Cannot move to topic in PDF from table of contents

Hello. When I open up a PDF in Preview, if it has a table of contents, I can usually find the area/topic there, and then click it and I’m taken to the exact area in the PDF. When I OCR import the same PDF into DTPO and go to the table of contents, if I click the topic I want it will not take me to the spot within the PDF.

Is there a way to do this? I am using DTPO 2.8.6

In advance, many thanks for the support! :smiley:

Before importing that PDF into DEVONthink, is the text searchable and selectable?

If so, there should be no need to OCR the PDF upon importing to DT. By applying OCR to the already-OCR’d PDF you’re wiping out the internal links because you’ve replaced the text metadata.

Many thanks, scottlougheed. Yes indeed, that was the problem. Under Preferences > OCR, I have incoming scans set to be converted to searchable PDFs. Most of the PDFs I import are not already searchable (e.g., journal articles, etc.). However, a few are (like when I import a downloaded ebook that is a PDF).

Is there any way to tell DTPO to notice if the incoming PDF is already searchable and selectable so that it ignores the preference setting for this particular PDF?

As far as I understand it, the setting you are referring to applies specifically to scans, and not to importing already-existing PDFs.
That is, with this setting enabled, dragging and dropping a PDF into DTPO should not initiate OCR. It is only when a PDF is being imported from recognized scanning software that this setting is relevant.

I import academic journal articles on a daily basis (all of which are OCR’d already anyway), and DTPO does not OCR those files (even though I have the setting enabled). If I were to scan a consent form via the built-in scanning interface in DTPO, THAT would be OCR’d because it is entering DTPO via known scanning software.

So it is unclear to me why already-existing PDF files are having their internal links wiped out. I think it is for some other reason than I previously speculated.

Thanks for sharing your thoughts scottlougheed! I did a bit of experimenting and here’s what I found:

If I drag the PDF to the Sorter, it is imported and all the table of contents links work just fine. On the other hand, if I open the PDF in Preview, then select Save PDF to DEVONthink Pro from the print menu, the PDF is imported but the links do not work. It does not matter whether or not I have the box checked in DT for incoming scans to be converted to searchable PDF (you were correct on this).

Additionally, if I select the PDF in the Finder and right-click > Services > Add to DEVONthink Pro Office, this works fine too (which is what I now realize I had done previously, when I reported above that the problem was resolved).

Is this a bug, or do the two different import methods inherently create different types of imported PDFs by design?

Okay, this points more directly to the likely root cause of the problem. It is not related to OCR (since that isn’t taking place at any point in this scenario), and it is not directly related to DTPO. The issue arises, I think, from limitations of Apple’s Quartz PDF engine. I have found that certain kinds of PDF metadata (things like internal document links, attachments, and OCR text layers) are sometimes handled poorly by Apple’s engine. This also happens to be the same engine used in DTPO.

What is likely happening is that importing it through a drag and drop or the sorter is literally just duplicating the file. The actual contents of the file are not being re-interpreted or re-encoded or anything. This is a filesystem operation just duplicating the bits.

When you “Save to DEVONthink”, you are actually creating a new PDF file that is being re-encoded by Preview (or more specifically, Apple’s Quartz PDF engine) during the saving process. I strongly suspect that this re-encoding is wiping the internal links out.
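If you want to check this suspicion on your own files, here is a rough Python sketch that compares an original PDF with its re-encoded copy. It simply counts raw byte markers for outlines and link annotations. This is only a heuristic and rests on an assumption: it will miss markers that sit inside compressed object streams, so a count of zero does not prove the links are gone. The function names are my own, not anything from DEVONthink or Preview.

```python
import re
from pathlib import Path

# Raw byte patterns that usually accompany navigation features in a PDF.
# Assumption: these objects are stored uncompressed; compressed object
# streams will hide them from this simple scan.
PATTERNS = {
    "outline": re.compile(rb"/Outlines\b"),
    "link_annotations": re.compile(rb"/Subtype\s*/Link\b"),
}


def count_navigation_markers(data: bytes) -> dict[str, int]:
    """Count how often each navigation marker appears in the raw bytes."""
    return {name: len(pattern.findall(data)) for name, pattern in PATTERNS.items()}


def compare_pdfs(original: Path, reencoded: Path) -> dict[str, tuple[int, int]]:
    """Return (count in original, count in re-encoded copy) for each marker."""
    before = count_navigation_markers(original.read_bytes())
    after = count_navigation_markers(reencoded.read_bytes())
    return {name: (before[name], after[name]) for name in PATTERNS}
```

If the original shows a healthy number of link annotations and the copy saved through the print dialog shows far fewer, that would be consistent with the re-encoding explanation.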

Thanks for your explanation. This makes sense. Too bad.

Is it fair to assume that the command Print > Save PDF to DTPO is an AppleScript? If so, I wonder if there might be a workaround in a future update so that the script is similar to the one used when right-clicking > Services. (I realize using the Print dialog bypasses Services. I’m asking whether the same or a similar script could be used in the Print dialog in the future, to avoid the problem of Preview creating a new PDF and thereby removing the links, etc.)

The best way to bypass the re-encoding of a PDF is to use the filesystem to relocate the PDF, such as by copying it and moving the copy to DEVONthink, dragging the file to DEVONthink, or dragging the file to the Sorter. You can also right-click a file in Finder and choose “Add to DEVONthink”, which is a script, but it uses the filesystem rather than the PDF engine.

Basically, anything you do, manually or by AppleScript, that uses the filesystem rather than the PDF engine will not re-encode your file and will preserve its contents.
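To illustrate what "using the filesystem" means in practice, here is a minimal Python sketch that copies a PDF byte for byte and confirms with a checksum that nothing was re-encoded. The destination folder is a hypothetical placeholder, and this is not how DEVONthink itself imports files. It only shows that a plain file copy leaves the contents, links included, untouched.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_pdf_unchanged(source: Path, destination_dir: Path) -> Path:
    """Copy a PDF with a plain filesystem copy and verify it is identical.

    destination_dir is a hypothetical folder chosen by the caller.
    """
    destination_dir.mkdir(parents=True, exist_ok=True)
    target = destination_dir / source.name
    shutil.copy2(source, target)  # duplicates the bytes, no PDF engine involved
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Copy of {source.name} does not match the original")
    return target
```

Because the checksums match, whatever links and text layer were in the source file are in the copy as well. A file saved through the print dialog would generally not pass this comparison, since it is a newly generated PDF.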

The limitations of Apple’s PDF engine have been problematic at times. I’ve observed it wipe the OCR from some files, which has been immensely frustrating. But I also have to place some of the blame on Adobe for doing some proprietary wonkiness and facilitating the proliferation of various, sometimes sketchy encoding schemes that Apple and other developers might not support for various reasons. For a standard, there sure are some majorly frustrating quirks.

Agreed! Thanks so much for all of your time in helping me troubleshoot this, and for your thoughts on why there is this problem to begin with :exclamation:

Have a great rest of your work week. :slight_smile:

Cheers, happy to help when and where I can! :slight_smile: