Buggy handling of PDF text due to PDFKit engine

When I convert EPUB books to PDF with Calibre, the text comes through correctly, but DEVONthink imports the resulting files as PDFs without text.

DEVONthink and Apple’s Preview don’t recognize the text in these PDFs; PDF Expert and Adobe Acrobat do. Therefore, I assume this problem, which is a serious one for my use of DEVONthink, is caused by the app’s use of the buggy PDFKit engine.

Is DEVONtechnologies planning to move to the standard Adobe system? If not, can you suggest a way to address this problem so that these standards-compliant pdfs can be read and manipulated in DEVONthink?

On the Mac, DT uses Apple’s integrated SDK (or whatever it is called) to handle PDFs, and Apple today is not the Apple of yesterday. Open the PDF with Apple Preview and try a search. No text. The problem is not in DT but in macOS itself.
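If you want to run the same check as the Preview search test on many files, here is a minimal sketch that asks PDFKit directly which pages return text. It assumes macOS with the Swift toolchain installed; the script name `pdfkit-text-check.swift` is just a hypothetical example, and the result only tells you what Apple’s engine can extract, not whether the PDF itself is valid.

```swift
import Foundation
import PDFKit

// Usage (hypothetical file name): swift pdfkit-text-check.swift /path/to/book.pdf
let args = CommandLine.arguments
guard args.count == 2 else {
    print("usage: swift pdfkit-text-check.swift <file.pdf>")
    exit(64)
}

let url = URL(fileURLWithPath: args[1])
guard let document = PDFDocument(url: url) else {
    print("PDFKit could not open \(url.path)")
    exit(1)
}

// Collect the 1-based numbers of pages for which PDFKit returns no text.
var pagesWithoutText: [Int] = []
for index in 0..<document.pageCount {
    let text = document.page(at: index)?.string ?? ""
    if text.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty {
        pagesWithoutText.append(index + 1)
    }
}

if pagesWithoutText.isEmpty {
    print("PDFKit extracted text from all \(document.pageCount) pages")
} else {
    print("PDFKit found no text on \(pagesWithoutText.count) of \(document.pageCount) pages")
}
```

Pages that consist only of images will also be reported as having no text, so a few empty pages in an otherwise readable book are not necessarily a sign of this bug.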

I bet you’ve converted them with Calibre. :smiley:

Calibre generates PDFs with Unicode characters, which is the current de facto standard, but Apple does not recognize them as text.

We have a workaround for that: open your PDF with Preview, duplicate it (File menu), and then save the copy applying “optimize size” or “generate standard blah PDF”. The new file will be readable by both Preview and DT.

However, this has a problem: embedded fonts stop working, and if your PDF has embedded fonts, it will show blank pages inside DT.

That said, I would like DT to throw away the Apple SDK and use the same SDK they use on iOS, which does not have those problems.

Thanks, I have tried the workaround you suggest. I don’t know why it works with some PDFs but not with all of them. For the ones where it fails, I don’t get blank pages but PDFs with viewable and selectable text. However, DT imports these files as PDFs with no text, and when I select and copy their text, Apple’s clipboard is empty. All these files are perfectly OK in PDF Expert.

I understand the problem is that DT is using Apple’s PDF engine, but it is in DT that I have the problem. Is there any other conversion you can suggest to get DT to properly read PDFs with text made from EPUB?

(As for Apple, I find it difficult to accept that a company of its size and pricing doesn’t get things like this right, along with other long-standing issues that are increasingly making macOS a bother to use. As soon as most of the handful of third-party apps I frequently use, DEVONthink being one of them, are made available for Windows, I’ll most probably switch OS.)

No, sorry, and I have the same problem you have. My solution is a bit of a commitment.

In macOS I have PDF Expert as the default PDF viewer. Double-clicking a PDF inside DT opens PDF Expert, and if it opens the integrated viewer instead (I really wonder why some PDFs open in one and some in the other), I right-click and choose “open with default”.

In iOS, I use PDF Viewer from PSPDFKit via Apple Files. I open DT, download the PDF if it is not downloaded yet, then go to PDF Viewer and use the app to navigate to the DT location and open/annotate the file. It has a lot of options (dark mode, for example), and when I finish I don’t forget to go back to DT to let it synchronize the changes.

This is my latest document workflow, but it is clumsy, and I think I’m going to stop using DT for that and simply open the files directly from PDF Viewer/PDF Expert, leaving DT only for in-depth searching and relations between documents. Currently, all my files are in OneDrive (moved from iCloud Drive, as it gave me a lot of sync problems).

BTW, the clipboard problem is the same: the copied text is in a variation of Unicode that Apple does not understand… Another bug in macOS. One of those hidden bugs that make your life not as easy as it used to be.

Thanks for the detailed description of your workflow. It saves me the time of looking for a nonexistent solution.

I stopped using iOS. In macOS my workflow is the same as yours, with the exception that I have the “Open Externally” icon in DT’s toolbar.

But my workflow gets clumsier than yours because I annotate pdfs. I tried MarginNote but, although a good app by itself, I gave it up because of its non-standard pdf system and the additional steps I have to add to my DT-based workflow. I prefer to use the straightforward Annotation Pane.

It is DT’s handling of PDFs that spoils what would be an acceptable workflow. I have to keep searching for, or going to, pages in DT in order to extract with the Annotation Pane the paragraphs I highlight in PDF Expert (that app has faultless PDF handling, ToC and annotations summary). To make things worse, some PDFs cannot be read by DT.

Apple’s PDFKit has some of the longstanding bugs in macOS (another important one is non-persistent Edit>Substitutions) that I have to circumvent dozens of times a day. It gets annoying enough to make me want to change OS as soon as I can find in Windows the handful of third-party apps my workflow relies on. Hopefully, Microsoft is big enough to solve the bugs in its OS! As for Apple’s spectacular hardware, I never valued it as much as my previous IBM ThinkPads, which I used first with Windows and later with Linux.