Where is the OCR'd text?

This is probably a ridiculous question. When I import a .pdf into DT and it runs OCR on that, I see it listed under “Kind” as PDF+Text.

I get that the text is now searchable. I use this feature a lot. But is there now a text copy of the .pdf somewhere?

The result of the OCR operation is added to the PDF document as an invisible text layer. For example, the Concordance inspector lists the document's indexed words, and a conversion to plain text uses this layer too.
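For anyone curious what "an invisible text layer" looks like inside the file, here is a rough Python sketch. It is not how DEVONthink reads the layer. It assumes the text sits in Flate-compressed content streams as simple literal strings shown with the `Tj` operator, which real OCR output often does not (fonts, encodings and `TJ` arrays complicate things). The function names `extract_text_layer` and `make_demo_pdf_fragment` are made up for this illustration, and the demo data is a hand-built fragment, not a full PDF.

```python
import re
import zlib


def extract_text_layer(pdf_bytes: bytes) -> list[str]:
    """Collect literal strings shown by Tj operators in Flate-compressed streams."""
    texts = []
    # A PDF stream body sits between the 'stream' and 'endstream' keywords.
    for match in re.finditer(rb"stream\n(.*?)\nendstream", pdf_bytes, re.DOTALL):
        try:
            content = zlib.decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (assumption: e.g. a scanned image), skip it
        # '(some text) Tj' shows a literal string on the page.
        for literal in re.findall(rb"\((.*?)\)\s*Tj", content, re.DOTALL):
            texts.append(literal.decode("latin-1"))
    return texts


def make_demo_pdf_fragment(text: str) -> bytes:
    """Build a hypothetical stream object for the demo (not a complete PDF)."""
    # '3 Tr' selects text rendering mode 3 in the PDF spec: neither fill nor
    # stroke, i.e. invisible. OCR layers are commonly written this way.
    content = b"BT /F1 12 Tf 3 Tr (" + text.encode("latin-1") + b") Tj ET"
    return (
        b"4 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
        + zlib.compress(content)
        + b"\nendstream\nendobj\n"
    )


if __name__ == "__main__":
    print(extract_text_layer(make_demo_pdf_fragment("Census 1880")))
```

For real documents, the built-in conversion to plain text (or a proper PDF library) is the practical route; the sketch only shows that the OCR text lives inside the same PDF file rather than in a separate copy.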

Ah, okay thank you. I figured it was a silly question :slight_smile:

Not silly at all :slight_smile: In fact, you can even OCR a document to some other formats. See the Data > OCR submenu. However, creating a searchable PDF is the most common use.

Much appreciated. DEVONthink has changed my life for genealogy research! I brag about it in my genealogy groups a lot!

You can also convert an OCR’ed PDF to a text file: see Data > Convert > To Plain Text. This will create a .txt file in addition to the PDF file.

Me, too!
