Where is the OCR'd text?

tinksquared · March 20, 2024, 10:34am

This is probably a ridiculous question. When I import a .pdf into DT and it runs OCR on that, I see it listed under “Kind” as PDF+Text.

I get that the text is now searchable. I use this feature a lot. But is there now a text copy of the .pdf somewhere?

cgrunenberg · March 20, 2024, 10:46am

The result of the OCR operation is added as an invisible text layer to the PDF document. For example, the Concordance inspector lists the indexed words of the document, and a conversion to plain text uses this layer too.

tinksquared · March 20, 2024, 10:58am

Ah, okay, thank you. I figured it was a silly question.

BLUEFROG · March 20, 2024, 11:42am

Not silly at all. In fact, you can even OCR a document to some other formats; see the Data > OCR submenu. However, creating a searchable PDF is the most common use.

tinksquared · March 20, 2024, 11:48am

Much appreciated. DEVONthink has changed my life for genealogy research! I brag on it in my genealogy groups a lot!

amalis · March 21, 2024, 3:05pm

You can also convert an OCR’ed PDF to a text file; see Data > Convert > To Plain Text. This will create a .txt file in addition to the PDF file.

slipkid · March 26, 2024, 2:48am

Me, too!