PDF Highlighting issues

Cassady · September 20, 2015, 1:43pm

Hello all,

Support Ticket(?) – or maybe someone has some suggestions.

Sitting with the same issue spread out over several PDFs now. They were originally photocopied, then scanned, and imported into DTPO. I converted them to searchable inside DTPO.

The quality is not great – being old – but good enough for it to still be possible to select individual text lines.

Problem:

In a nutshell, my highlights inside DTPO aren’t sticking. In 3-pane mode, I will sometimes annotate inside the bottom view window, highlighting the text. I hit Cmd-S, and move on to the next PDF. When I return to the initial PDF, it’s blank again (i.e. no highlights).

At first I thought it might be due to my annotating in 3-pane – but when I double-click to open the PDF in a separate window, it behaves even more strangely. After annotating, when I click on SAVE, all appears fine.
But as soon as I try and close the PDF, the dialogue “Do you Want to Save the changes made to this document” pops up. If I again click on Save, the popup disappears, but as soon as I try and close the pdf, the process repeats.

So the only way I can close the PDF is by clicking “Don’t Save” – which obviously drops the annotations again.

Here’s the strange bit – If I open the same PDF in Preview, and annotate/highlight there – when I hit save, close and return to DTPO – the changes have stuck… So it appears to only be the internal highlighting that is the cause of the problem…

Any suggestions about what I might do to try and narrow down the issue?

korm · September 20, 2015, 3:32pm

Do your highlighting in Preview?

The problem with OCR not-so-good scans of text is that the OCR text layer can get garbled up with lots of character artifacts. You’re not highlighting the scanned image, you’re highlighting the not-visible text layer. I have no idea why PDF Kit in DEVONthink does things differently than Preview, but sometimes it’s just necessary to find the PDF tool that works for the job and use it – since everyone’s tool has tweaks and quirks that are nearly impossible to reconcile with one another.

Try selecting one of your problem-child PDFs and convert it to text – to force DEVONthink to make a copy of the text layer. See how good or bad the quality really is by doing that.
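For anyone who would rather check the text layer outside DEVONthink, here is a rough sketch of the same idea in Python. It assumes the third-party pypdf package is installed; the `garble_ratio` and `report` names and the "share of non-alphanumeric characters" measure are made up for illustration, not anything DEVONthink itself does. Normal punctuation also counts toward the ratio, so treat the number as a crude hint only.

```python
def garble_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are neither letters nor digits.

    Hypothetical quality measure: higher values hint at more OCR artifacts.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 1.0  # an empty text layer is treated as unusable
    odd = sum(1 for c in chars if not c.isalnum())
    return odd / len(chars)


def report(path: str) -> None:
    """Print the size and garble ratio of the text layer on each page."""
    # Assumption: pypdf is available (pip install pypdf). Imported here so
    # garble_ratio can be used without it.
    from pypdf import PdfReader

    reader = PdfReader(path)
    for number, page in enumerate(reader.pages, start=1):
        text = page.extract_text() or ""
        print(
            f"page {number}: {len(text.strip())} chars, "
            f"garble ratio {garble_ratio(text):.2f}"
        )
```

Calling `report("problem-child.pdf")` (the file name is a placeholder) prints one line per page. A page that shows 0 chars has no extractable text layer, which would match a conversion that comes up blank.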

Cassady · September 20, 2015, 3:57pm

Thanks for the suggestion – will give the conversion a go, and see what happens.

[EDIT]: Comes up blank…

So be it – will use Preview.

korm · September 20, 2015, 4:24pm

Well … didn’t expect that

Are you sure this is a PDF+Text document?

The quality must really be bad. If DEVONthink’s conversion is coming up empty you’re probably just adding bloat to the file by processing them. Save a little space maybe and skip the conversion step?

Cassady · September 20, 2015, 7:35pm

That’s what’s throwing me. They’re not that bad – I have worse, that don’t have problems. And yes, definitely PDF+Text.

Here be a snapshot, with the individual lines selected:

OCR example.png

No going back now – batch processed a whole bunch of them – never had issues before, so OCRed them inside DTPO without thinking much of it. It’s probably affected fewer than 10 so far – but I’m now getting to the point where I check first, before doing a complete annotation. Quite annoying doing 30 pages of annotations, only to have them disappear!