I’ve been trying to import a PDF document into DEVONthink 3 and am having a problem. The document is my income tax return that has been produced by a tax agent and sent to me as a PDF for signature.

I have used PDF Expert to edit the original document, and add signatures and dates. I have then saved the document to a separate file. When I view the new document in PDF Expert, the changes are there. When I view the new document in Apple’s Preview application, the changes are there, too. However…

When I import the new document into DEVONthink 3 (via a watched folder in the file system), the document that appears in my Inbox is missing the changes.

I don’t understand what is going on here. Is the edited PDF in a format that DEVONthink 3 doesn’t import and index?

Here are two documents to try it for yourself.

  1. This is the original PDF - without the signature. DT3-pdf-experiment.pdf (20.6 KB)

  2. This is the edited PDF copy with signature and date added. This document appears to lose the signature and date upon import into DEVONthink 3. DT3-pdf-experiment-signed.pdf (39.7 KB)

  3. Just for comparison: a document edited with Apple Preview to add the signature and date (and the word “(preview)”). Importing this document into DEVONthink 3 preserves the signature but loses the added text (date). DT3-pdf-experiment-preview-signed.pdf (58.5 KB)

Do you get the same outcome?

This is stopping me from using DEVONthink 3 as my document store of choice. :confused:

Cheers, S t u a r t .

macOS v10.15.4
DEVONthink 3 v3.0.4
PDF Expert v2.5.4 (675)

I’m not seeing any issue with the last two documents. Here is the second PDF

I’ve experienced that in the past; it’s never been consistent enough for me to follow up or to pinpoint when it happens. At least some of the time, however, the document has only looked as if it is missing the changes - i.e. they weren’t visible in DT, but if I opened the document from within DT using “open with”, the external app did show the changes. (Can you try that? Are your changes really lost, or just not showing up in DT?)

Interestingly, I use PDF Expert too. But as I said, I can’t reliably reproduce the issue.

Thanks for taking a look; very much appreciated.

It’s weird that you get a different result. I wonder why. I wish I was getting the same result as you.

I’ll try a few more experiments and see if I can find more indicators of the variability.

Cheers, S t u a r t .

Okay, here’s another data point.

If I import the document via the menu option File > Import > Files and Folders…, then the document is imported into DT3 correctly. However, the imported document is missing searchability (is that even a word?).

If I then OCR the manually imported document to add searchability, by right clicking the document in DT3 and selecting OCR > to searchable PDF, the resulting document will lose the signature and added text.

When I use the watched folder route, by dropping the document into my target Finder folder and letting DT3 suck it into the application (and OCR is automatically applied), then I lose the signature and other added text.

So, it seems that it’s the OCR process that knocks out the additions.


Cheers, S t u a r t .

Hold the Option key and choose Help > Report bug to start a support ticket.


Thanks for persisting with this weirdness.

I assume your recent result is because I created the original PDF document from a text editing application, which will have included a text layer in the PDF document output. This text layer makes that original PDF searchable.

Trying a different approach, I’ve now scanned the document without OCR, and imported it into DT3. This gives me a document that is not searchable, as you’d expect.

If I now annotate this original scanned document with the signature and date (using PDF Expert, outside DT3) and then manually import it into DT3, it’s still not searchable; which is, again, as you’d expect.

However, if I then manually OCR the imported and annotated scanned document within DT3, everything is visually preserved. Even more intriguing, searching the document in DT3 now finds and highlights text from the original scanned document in the document pane, but text from the annotations is only listed in the search panel and is not highlighted in the document pane.

I suspect that there are layers of content within the PDF that are being treated differently during the OCR process. From my limited reading of the PDF format, it is complicated (and messy).
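One way to probe that suspicion is to check whether the added signature and date are stored as PDF annotations rather than as part of the page content. The following is only a rough Python sketch under that assumption; nothing in this thread confirms how PDF Expert or Preview actually store their edits. The function names are made up for illustration, and the check is a plain byte scan, so it will miss anything stored inside compressed object streams.

```python
import re
from collections import Counter

# Annotation subtypes from the PDF specification that editors commonly use
# for typed text, stamps, drawn signatures, and form fields.
# Assumption: the additions in question use one of these subtypes.
ANNOTATION_SUBTYPES = (b"FreeText", b"Stamp", b"Ink", b"Widget")


def count_annotation_markers(pdf_bytes: bytes) -> Counter:
    """Count raw '/Subtype /<name>' markers for the selected subtypes.

    Heuristic only: objects inside compressed object streams are not seen.
    """
    counts = Counter()
    for name in ANNOTATION_SUBTYPES:
        pattern = rb"/Subtype\s*/" + name + rb"\b"
        counts[name.decode("ascii")] = len(re.findall(pattern, pdf_bytes))
    return counts


def has_annots_array(pdf_bytes: bytes) -> bool:
    """Return True if any '/Annots' key appears in the raw file bytes."""
    return re.search(rb"/Annots\b", pdf_bytes) is not None
```

To try it, read a file with `open("DT3-pdf-experiment-signed.pdf", "rb").read()` and pass the bytes to both functions, once for the copy before OCR and once for the copy after. If the markers are present before OCR and gone afterwards, that would be consistent with the OCR step dropping annotations, although it would not prove it.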

Cheers, S t u a r t .

No problem.

From my limited reading of the PDF format, it is complicated (and messy).

Incredibly so. The main PDF Evangelist at Adobe has often been cited as telling people that, even though there is some human-readable text in the code of PDFs, they shouldn’t mess about in the internals of a PDF without using proper tools.

Why not do the OCR in DEVONthink before annotating in PDF Expert?

I just tried your idea of doing the OCR in DT3 first, and then editing with PDF Expert.

When I go this way, I get the result where the original OCR text is searchable and highlighted in the document, while the annotated text added by PDF Expert is searchable but not highlighted in the document (as before, with my very last example).

I can probably live with this outcome, if it’s consistent :wink:

I’ll see how it goes over the next few documents … and thanks for the idea.

Cheers, S t u a r t .

No problem :slight_smile:

the annotated text added by PDF Expert

Can you post a screencap of this?

Here are two screen captures, as requested. See the red arrows for context.

  1. The first one shows the search for text (the word “existing”) from the original document text. You can see that the searched text is displayed in the occurrences section on the right. And the text is highlighted in yellow within the document.

  2. I then changed the search to look for the word “May” which only occurs in the annotated text that was added with PDF Expert. You can see in the occurrences section that it has found the text, but it is not highlighted within the document. In fact, the previous search remains highlighted.

Another interesting aspect of the second search is that if the annotation text is not within the view port of the document when searching, then the document does scroll on search to show the text (but still doesn’t highlight) implying that it knows where the result is located. :thinking:

