PDF And Duplicate Recognition

Problems with the function which finds duplicates automatically:

  1. I had a duplicate pdf and accidentally moved one page (in the thumbnail pane on the right). I undid the action (Cmd-Z), but the pdf was written to disk anyway (as happens with undo and text files). Apart from the file doubling in size, DT no longer recognizes the two pdf files as duplicates, even though they are exactly the same.

  2. Outside of DT, I annotated a pdf that already exists in DT. After dropping it into the sorter, DT tells me that both are duplicates, which they aren’t (because they differ in the annotations and file size).


Could you please send the two files to cgrunenberg - at - devon-technologies.com? Thanks in advance!

Annotations and filesize don’t have any impact on the recognition of duplicates as only the text and number of pages are compared.
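To make the rule above concrete, here is a minimal Python sketch of a duplicate check that compares only the text and the page count. It is not DEVONthink's actual implementation: the `PdfDoc` type, the `is_duplicate` function and the whitespace normalization are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PdfDoc:
    """Hypothetical stand-in for a PDF record; not a DEVONthink type."""
    text: str                 # extracted text of the document
    page_count: int
    file_size: int            # in bytes; ignored by the comparison
    annotations: tuple = ()   # ignored by the comparison


def normalize(text: str) -> str:
    # Assumption: collapse runs of whitespace before comparing.
    return " ".join(text.split())


def is_duplicate(a: PdfDoc, b: PdfDoc) -> bool:
    # Only the text and the number of pages are compared;
    # file size and annotations play no role.
    return (a.page_count == b.page_count
            and normalize(a.text) == normalize(b.text))


original = PdfDoc(text="Some report text", page_count=3, file_size=1000)
annotated = PdfDoc(text="Some report text", page_count=3, file_size=2000,
                   annotations=("note on page 2",))
print(is_duplicate(original, annotated))
```

Under these assumptions the annotated copy counts as a duplicate of the original, which matches the behaviour described in point 2.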

Done. This happens with almost every pdf file.

That’s good for the filesize, but I wish DT would recognize when two pdf files have totally different annotations.

For comparison of documents, DEVONthink looks only at the content of the documents. Annotations of PDFs are not part of the document content and are not currently recognized by DEVONthink.

We often describe PDFs as having an “image layer” and a “text layer” – the latter being the “content” of the searchable PDF. Adobe designed the PDF format such that text annotations are not part of the “text layer” of the PDF, but something else entirely.

Some years ago I was working with an Acrobat Pro database that contained more than 600 PDFs, using Adobe’s own indexed database module that allowed searching across the PDFs. But Adobe didn’t index PDF text annotations, so I started working with annotations as separate linked PDFs, which did work with Acrobat’s PDF database design as those annotations were searchable. I also liked the fact that this approach to annotations was much richer and thus much more powerful than the simple plain text annotations allowed within PDFs. For example, these annotations could contain as many hyperlinks as I wished.

Adobe’s approach to creating a PDF database was really clumsy, though. Each time a document was modified the entire database had to be reindexed, and that was time-consuming and fairly complicated.

Later on, I found that I could simply drop the collection of PDFs into DEVONthink Pro and all the clumsiness disappeared. Better yet, I was no longer limited to using the PDF file format for my annotations, but could instantly create them using rich text notes. Not only that; now I could set up such rich text notes to annotate and relate by hyperlinks documents of any file type, not just PDFs.

Which may explain why I haven’t used the plain text annotation feature of PDFs for years. I don’t have any problems relating my rich text notes to my referenced documents. Suppose I want to find all my notes that refer to a certain document. All I have to do is select the document’s Name and run a “Lookup” search (“Command-/”) on it. The referenced document and my rich text notes about it are immediately listed. And of course this works for all file types, not just PDFs.

Thanks, but my preferred method of annotating is to draw a box around an important fact on page 13, highlight text on page 44 and write a note in the margin of page 66, just as I would work with a pen and a book. The annotations in DT are too separate from the document to do this efficiently. In DTTG I would have to switch between two documents (pdf and annotation) for that.

However, these annotations may come in handy for other document types. I couldn’t find the “lookup search”, and cmd-/ does nothing, maybe because I run the German version? Does someone know the translation of the menu item or the shortcut?

“Lookup” is a Service installed by DEVONthink. As with all Services, the application should be installed in Applications, and if DEVONthink has been installed for the first time, the Service requires logout/login or a restart in order to become activated.

The Lookup Service is contextual and isn’t applicable unless a text string has been selected. If you have selected a text string and the keyboard shortcut doesn’t open a Search window with the selected text string already entered, it’s possible that the keyboard shortcut has been “hijacked” by another command in a different application. You can check whether the Lookup Service was installed by clicking on the application name in the menubar, then examining the list of applicable Services under the “Services” submenu. If it’s present, click on “Lookup” to invoke it.

If a text string comprising multiple words was selected, by default the search will be conducted as though the AND operator were inserted between each term. To instruct DEVONthink to search as an exact string (phrase), enclose the query string in quotation marks.
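The difference between the default AND search and a quoted phrase search can be sketched in a few lines of Python. This only illustrates the semantics described above and is not how DEVONthink parses queries: the function names, the case-insensitive comparison and the simple substring matching are all assumptions.

```python
def parse_query(query: str):
    """Return ("phrase", [phrase]) for a quoted query, else ("and", terms)."""
    query = query.strip()
    if len(query) >= 2 and query[0] == '"' and query[-1] == '"':
        # Quoted: treat the whole string as one exact phrase.
        return ("phrase", [query[1:-1].lower()])
    # Unquoted: every whitespace-separated term must be present (AND).
    return ("and", query.lower().split())


def matches(query: str, text: str) -> bool:
    mode, terms = parse_query(query)
    haystack = text.lower()   # assumption: case-insensitive matching
    if mode == "phrase":
        return terms[0] in haystack
    return all(term in haystack for term in terms)


doc = "The report is published on an annual basis."
print(matches('annual report', doc))     # both terms occur somewhere
print(matches('"annual report"', doc))   # exact phrase does not occur
```

With this sketch, the unquoted query matches because both words appear in the text, while the quoted query does not because the words never appear next to each other in that order.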

Found it. For some reason I had to activate the service in the service settings first. And there I ran across other interesting services… omg, I’ve been working intensively with DT for nearly 4 weeks now, but there is still so much more functionality I haven’t even seen yet.