I’ve gotten up and running with DEVONthink Pro on the Mac in recent weeks, and overall things are working great. However, I’ve run into two issues that prompted a workaround, and that workaround now appears to be causing a new issue. Here is what’s happening:
I get PDFs to review and annotate (almost entirely highlighting) from lots of different sources for my work. I started noticing two issues with a few of the PDFs that I’ve imported into DEVONthink on the Mac:
On a few PDFs, the annotations come through (i.e. highlights show up in all the right places), but the text that shows up in the annotation window (or when I copy/paste text from the PDF) is gibberish characters, despite appearing correct on screen both inside DEVONthink and in the PDF file itself.
In some of these cases, I’m also unable to edit the author and title fields under “properties” for the PDF document in DEVONthink.
I saw a reference to at least one of the issues above on the forum here and the advice was to re-do OCR (even though they were OCR’d previously by another source) inside of DEVONthink. So, I did this.
Great news – in all cases, it resolves both problems above.
Bad news – in all cases, it appears to create a new problem.
The newly OCR’d PDF text now appears slightly fuzzy (i.e. the quality of the text is noticeably reduced). It wouldn’t be a huge issue if I were just using these for archival purposes, but between the amount of reading I do on iPad and 40+ year old eyes, it’s a noticeable problem. =)
I have 300 dpi selected in my settings for OCR and have attempted OCR with compression both enabled and disabled. Both results appear equally fuzzy after the DEVONthink OCR (see attached examples). File size of the original PDF is 2.3 MB, the newly OCR’d version with compression is 162.4 MB, and the one without compression is 108 MB (it’s also odd that the uncompressed version is smaller).
Guessing I’m missing something obvious since I don’t see any references here to anybody else having quality issues after OCRing inside DEVONthink. Suggestions on what to try?
Related issue – assuming we get the fuzzy issue addressed, is there a way to do this without massively increasing file sizes?
The tough variable here is that I get PDFs from all different sources that I don’t have control over. Almost always these are computer generated (i.e. not scanned) but the software that did the original OCR/creation could be anything since I work with lots of different stakeholders.
Thank you in advance for any insight!