I don’t know if this is a Tahoe 26.2 problem, but in DT (4) I am getting very variable results when trying to highlight text in PDFs.
Some highlight fine: the text is highlighted and the full text appears in the annotation inspector.
In some, the text doesn’t highlight and a single-line entry appears in the annotation inspector. If you click on it, it takes you to the unhighlighted text.
Some won’t highlight at all and nothing appears.
All PDFs show as PDF+Text. An internet search shows that there are problems with highlighting text in Preview (it shows the same behaviour as DT for all these PDFs).
Is there any setting I have wrong (macOS or DT) that is causing this?
One PDF is not all PDFs and they come from a variety of sources / PDF creation tools / etc. And yes, Tahoe is buggy so that could also come into play, but I would start with the PDFs.
You need to examine a series of problematic PDFs and determine any commonalities, e.g., they’re all from a particular website, or the Properties info inspector in DEVONthink shows the same Creator or Producer for all of them.
Also, see if you see the same behavior in Preview.
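If there are a lot of problem PDFs, the Creator/Producer comparison can be roughed out with a script instead of checking each one in the inspector. This is only a minimal sketch: it assumes the PDFs have been exported to a folder, and that each file stores /Producer and /Creator as plain, uncompressed literal strings (many do, but files that use object streams or hex/UTF-16 strings will just show up as "unknown"). The function names pdf_tool_info and tally are made up for illustration.

```python
import re
from collections import Counter
from pathlib import Path

# Matches e.g. "/Producer (Some PDF Library 1.0)" in the raw file bytes.
# Assumption: a literal string with no nested or escaped parentheses.
_FIELD = re.compile(rb"/(Producer|Creator)\s*\(([^()\\]*)\)")


def pdf_tool_info(data: bytes) -> dict:
    """Return the Producer/Creator found in raw PDF bytes, else 'unknown'."""
    info = {"Producer": "unknown", "Creator": "unknown"}
    # If a field occurs more than once (incremental saves), the last one wins.
    for key, value in _FIELD.findall(data):
        info[key.decode("ascii")] = value.decode("latin-1").strip()
    return info


def tally(folder: str) -> Counter:
    """Count how many PDFs in `folder` share each (Creator, Producer) pair."""
    counts = Counter()
    for path in Path(folder).glob("*.pdf"):
        info = pdf_tool_info(path.read_bytes())
        counts[(info["Creator"], info["Producer"])] += 1
    return counts
```

Running tally() once on a folder of PDFs that highlight correctly and once on a folder of PDFs that don’t, then comparing the two counts, should show whether one Creator or Producer dominates the problem set.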
The PDFs behave in the same way in both DT and Preview (Preview has trouble highlighting too).
Most of the problematic PDFs are DOCX files that have been saved from within MS Word as PDFs (using Save As). This is a problem for me as most files get sent to me as DOCX and I convert them to PDF so I can mark them up. I have a script within DT (posted on this forum) that opens the DOCX in Word, does the Save As and moves the original DOCX to the database trash.
It does appear to be a Tahoe issue (an internet search turns up lots of complaints about it). I wondered if DT had any workarounds?
Use DEVONthink’s “Convert to PDF” function on the DOCX files you import into DEVONthink … I don’t know if Word is used, or something internal to DEVONthink, but give it a go
In Word, instead of “Save as..” try “Print to PDF”. I don’t know if a different algorithm is used in Word for these two methods, but maybe. Give it a go.
Try Apple Pages and/or the “free” LibreOffice to create PDFs. Surely other PDF algorithms are used.
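For the LibreOffice route, the conversion can be done without opening the app, which might slot into an existing DOCX-to-PDF workflow. A minimal sketch, assuming LibreOffice is installed in its default macOS location (the SOFFICE path below is that assumption) and using its headless --convert-to option; convert_command and convert are hypothetical helper names. Whether PDFs made this way highlight correctly under Tahoe is untested here.

```python
import subprocess
from pathlib import Path

# Assumption: default install location of LibreOffice on macOS.
SOFFICE = "/Applications/LibreOffice.app/Contents/MacOS/soffice"


def convert_command(docx: str, outdir: str, soffice: str = SOFFICE) -> list:
    """Build the headless LibreOffice command that converts one file to PDF."""
    return [soffice, "--headless", "--convert-to", "pdf",
            "--outdir", outdir, docx]


def convert(docx: str, outdir: str) -> Path:
    """Run the conversion and return the path where the PDF should appear."""
    # check=True raises CalledProcessError if LibreOffice exits with an error.
    subprocess.run(convert_command(docx, outdir), check=True)
    return Path(outdir) / (Path(docx).stem + ".pdf")
```

Building the command in its own function makes it easy to print and try by hand in Terminal before wiring it into anything else.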
Many of the DOCX files are minutes or agendas in a table format (I never understand why people do this, as you spend more time fighting with tables in Word than actually typing content). DT’s Convert to PDF does not render these very well (they look nothing like the original DOCX).
I will try your print-from-Word-to-PDF idea and see what happens. Thanks.
Tables are complicated … and there seems to be a correlation between people who write minutes in tables and people who don’t use Word styles in those documents. Been there, done that.
Menu: File → Save As … then pick PDF as file format in the dialog box
Menu: File → Print … then pick “save as PDF …” on the button at bottom. Can also save to “Preview” and then save to PDF from there. Also on that button are other PDF-making options if installed on the machine, e.g. I have PDF Pen available, so I’d try that also.
I have no idea if different PDF generation algorithms are involved. But I’d try all if having these sorts of issues, I guess. There is no Menu: File → Export that I can see. I’d also give Pages and LibreOffice a go as they might use a different PDF generation algorithm.
Ah, my bad. Thanks for clearing that up!
Agreed, tables are complicated. (But even outside of tables, PDF text selection can sometimes behave unexpectedly.)
OK. I tried printing to PDF from MS Word but get the same issues. Here is an example. No highlights are showing in the PDF itself. The annotations inspector shows two highlights but none of the highlighted text. If you click on one of the lines in the inspector, it does go to the text in the document, but with a light blue shading (not the colour of the highlight).
Sorry, can’t really help. I have no experience debugging PDF annotation problems … If this were me with apparent issues highlighting PDFs, I’d drop back to highlighting the Word DOCX file. Word does a good job of highlighting. Then I’d get on with highlighting the backlog.
No worries. Oddly, if I open the PDF and annotate it in Adobe Reader, it all shows correctly in DT. I can use that for now, but would prefer to use DT itself. It seems to be a Tahoe and PDFKit issue (I think), as highlighting in Preview doesn’t work either.