Best Process and Apps to go from Scanned Notes → Cleaned OCR Notes → DEVONthink → PDF annotator on iPad

I have a massive cache of a professor's scanned notes. I would like to be able to open them in DEVONthink To Go 3, send them to GoodNotes, highlight, and extract the highlights. Can the OCR technology in DEVONthink create a readable document that I can open in GoodNotes (or another app) for annotating? The original scans are “picture” PDFs. I may need to use other software to get them into a usable format. If so, what do you suggest? The text is mostly English with some Hebrew. I would love to get them into a clean enough format to be able to use an app like VoiceDream to listen to them. Any thoughts on a process to go from Scanned Notes → Cleaned OCR Notes → DEVONthink storage → opened in PDF annotator on iPad?

Any thoughts or suggestions would be greatly appreciated!

Can the OCR technology in DEVONthink create a readable document

Yes.

that I can open in GoodNotes (or another app) for annotating?

Why in another app when DEVONthink To Go has PDF annotation tools built in?

The original scans are “picture” PDFs. I may need to use other software to get them into a usable format.

Raster PDFs are fine to process.

The text is mostly English with some Hebrew.

Hebrew is currently not supported.

I would love to get them in a clean enough format to be able to use an app like VoiceDream to listen to them…

OCR is never 100% accurate, but it may be close enough for government work :wink:

Any thoughts on a process to go from Scanned Notes → Cleaned OCR Notes → DEVONthink storage → opened in PDF annotator on iPad?

I have a massive cache of a professor's scanned notes.

Where is this cache?

The PDFs are already in DEVONthink. Why not DEVONthink To Go? I would like the highlights to be extracted like highlights on a Kindle. After reading a book (in Kindle), I extract the highlights into a document. Perhaps DEVONthink can do this. I’m a newbie, so I haven’t discovered this yet, if it can.

You mentioned raster PDFs are fine to process. How would I know whether the scanned PDFs are raster or not?

“Picture PDF” is what people often call them, i.e., a PDF made of images, not text. “Raster PDF” is the computer nerd / graphic arts term :wink:

Also, you can create a smart group with criteria of:
Kind is PDF/PS
Word Count is 0

This will identify PDFs with no words in them, i.e., no text layer. While not guaranteed to catch 100% of them, it should come close.

You could then do a Data > OCR > To Searchable PDF on files in the results. However, I would not suggest queuing up thousands of files at once.
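One way to avoid queuing everything at once is to work through the results in fixed-size batches. Below is a minimal Python sketch of that idea. It only plans the batches and does not talk to DEVONthink; the file names, the batch size of 50, and the `batched` helper are all made up for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical file names standing in for the smart group's results.
pending = [f"notes-{n:04d}.pdf" for n in range(1, 1201)]

# Assumed batch size; pick whatever your machine handles comfortably.
batches = list(batched(pending, 50))
for number, batch in enumerate(batches, start=1):
    print(f"Batch {number}: {batch[0]} .. {batch[-1]} ({len(batch)} files)")
```

You would then select one batch at a time in the smart group, run the OCR command on it, and move on once it finishes.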

Also, check Preferences > OCR for the settings, assuming you’re doing the processing in DEVONthink.