Tools>Summarize Highlights in 3.0 Menu

ngan · April 29, 2019, 3:22pm

May I know what this function does? It is not mentioned or explained in the help file under “The Tools menu”.

Thanks

vinschger · April 29, 2019, 4:14pm

It should summarize all highlighted text in a new document.
I do not know how to use it in practice: if I have a PDF with highlighted text and select “Summarize Highlights”, nothing happens in DT3 beta 1… I did not find any explanation in the help document.

ngan · April 29, 2019, 4:31pm

I see… I think this item is the same additional script provided with 2.n, which only works for .rtf documents. It works in the 3.0 beta if the document is an .rtf file (just checked).

vinschger · April 29, 2019, 4:38pm

Oh, that’s interesting… there is no error message and no information in DT’s log if I try to summarize the highlights of a PDF…
@cgrunenberg this should be improved (help file, and also an error message if this feature is used on a non-supported file).

As written before in the forum: it would be great if such a feature could be used on a selection of multiple PDF files, creating ONE summary file that includes all highlighted sentences/paragraphs, with a reference link to each individual PDF file in which the highlight was made… (feature suggestion)

lutefish · April 29, 2019, 6:17pm

It worked (somewhat) for an indexed PDF that contained highlighted text, inasmuch as it extracted some words from each highlighted instance across all pages of the PDF. The summaries were nonsensical, but it was accurate.

BLUEFROG · April 29, 2019, 10:20pm

It is missing in the documentation, but the function works when selecting highlighted PDFs or RTFs. It does not work with highlighted web content.

Select a highlighted PDF or two and choose the command.

If the files are selected in a local smart group, a Summary document will be created in the root of the database.
If the selected files are in the same group, the document will be created in that group.
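
As a rough illustration of the placement rule just described, here is a minimal Python sketch. It is not DEVONthink's actual implementation: the function name `summary_location` and its parameters are hypothetical, and the case of files spread over different groups is left open because it is not covered in this thread.

```python
from typing import Optional, Sequence


def summary_location(in_smart_group: bool, parent_groups: Sequence[str]) -> Optional[str]:
    """Return where the summary would land under the rule stated above.

    Hypothetical helper for illustration only; not part of DEVONthink.
    """
    if in_smart_group:
        # Files selected in a local smart group: summary goes to the database root.
        return "database root"
    if parent_groups and len(set(parent_groups)) == 1:
        # All selected files share one group: summary goes into that group.
        return parent_groups[0]
    # Files spread over several groups: not covered in this thread.
    return None
```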

@lutefish If you are experiencing unexpected behavior, please be more specific in your report.

vinschger · April 30, 2019, 3:56am

It works as described with highlighted RTF files, but not with highlighted PDFs (no summary file is created, no error message, no message in DT’s log).

ngan · April 30, 2019, 4:40am

Thanks to DT for the great effort to finally put in place this excellent utility.

For me, it works on both single and multiple selections of PDF files (kind=“PDF+Text”) in my indexed groups. The function worked smoothly almost every time, except for some specific files.

I am guessing that the script works as well as it can, but not all indexed PDFs are the same. If the function does not seem to work, it may have to do with OCR problems in some user-specific PDF+Text files. Those who encounter issues may want to try the function on other PDF+Text files.

In a future update of 3.0, I wish we could modify the script, or choose an option, to (1) decide whether to paste the full name or the aliases of the file(s) in the summary, and (2) choose whether to extract underlined text or highlighted text.

Updated results: the function also extracts underlined text, but it is displayed as a “blackened” block in the summary.rtf file. However, when I select the block I can see the underlined text.

cgrunenberg · April 30, 2019, 6:46am

Just a bug, the next beta will fix this.

darwin · April 30, 2019, 6:49am

Just tried it a moment ago: it worked insofar as it creates an .rtf with an entry, but the entry consists of only a few words (actually one line of a three-line highlight), and it missed two other highlights completely (five lines long and two lines long). See the attached part of the PDF and the RTF.

ngan · April 30, 2019, 6:53am

You may want to test with different PDFs to check whether it’s a consistent pattern or file-specific. I tested the function on 50 PDF+Text files and it failed on only 2 of them. Most of my files are generated by “print to PDF”, extracted from journal databases, or OCRed journals (some older journal articles are not text indexed).

Also, some journal articles are intentionally scrambled (as far as I am aware); even if you just copy and paste some selected text into an .rtf file, you won’t get what you want.

cgrunenberg · April 30, 2019, 6:53am

Could you send the PDF document to cgrunenberg - at - devon-technologies.com? Thanks

darwin · April 30, 2019, 6:57am

I tried it now with a second PDF, and it worked like a charm.
By the way (feature request): it would be nice if the generated RTF automatically used the PDF’s title as its title/name instead of a plain “Zusammenfassung” (“Summary”; I’m German), so that the RTFs are easier to tell apart.

ngan · April 30, 2019, 6:59am

This is not straightforward, because the function works for both single and multiple selections.

darwin · April 30, 2019, 7:04am

Sorry, I don’t get what you mean.

ngan · April 30, 2019, 7:05am

If you select multiple files, the function will summarise all highlighted text of all selected files in one .rtf, listing each file name together with its extracted text and page index. The script can’t aggregate all file names into a single title.
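
To make the layout of such a combined summary concrete, here is a minimal Python sketch. It only illustrates the structure described above (file name, then extracted text with its page) and is not how DEVONthink builds the file; the `Highlights` type, the `build_summary` function and the sample file names are made up for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical input: file name -> list of (page number, highlighted text).
Highlights = Dict[str, List[Tuple[int, str]]]


def build_summary(highlights: Highlights) -> str:
    """Combine the highlights of several files into one plain-text summary."""
    lines: List[str] = []
    for file_name, entries in highlights.items():
        lines.append(file_name)  # one heading per source file
        for page, text in entries:
            lines.append(f"  p. {page}: {text}")  # page index plus extracted text
        lines.append("")  # blank line between files
    return "\n".join(lines).rstrip()


if __name__ == "__main__":
    print(build_summary({
        "a.pdf": [(1, "first highlight")],
        "b.pdf": [(3, "second highlight"), (4, "third highlight")],
    }))
```

With several source files in one document there is no single PDF title to reuse, which is why the summary name stays generic in this case.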

darwin · April 30, 2019, 7:33am

Okay, got it. Yeah, the feature request is only meant for a single file, which is the most common case, I think.

cgrunenberg · April 30, 2019, 7:34am

Makes sense, thanks for the suggestion.

BLUEFROG · April 30, 2019, 12:59pm

“Yeah, the feature request is only meant for a single file, which is the most common case, I think.”

Actually, it isn’t. The requests were actually skewed toward summarizing more than one document. Just FYI.

darwin · April 30, 2019, 1:28pm

in my case, I should have add