OCR to replace original

benoit.pointet · November 27, 2024, 9:54am

Hello!

I’d love an option for “OCR to searchable PDF” to replace the original item.

Use case: sometimes I have already taken reading notes or excerpts from a PDF, and they point to it through wikilinks or the URL field. Then I realize that the PDF’s text flow is crappy and I want to re-OCR it.

Hope it’s clear enough.

chrillek · November 27, 2024, 11:23am

If you use wikilinks, it’s a question of naming the PDF, I think. But if you use an x-devonthink-item link … not possible. Those links use unique IDs, which are not recycled.

meowky · November 27, 2024, 11:59am

As a workaround, you can swap the data of the PDF files using a script.

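-- This is a demonstrator. You need to modify this script for it to work.
-- oldPDF and OCRedPDF are placeholders: set them to the two DEVONthink records first.
tell application id "DNtp"
	-- Read the file data of the OCRed copy...
	set newData to (data of OCRedPDF)
	-- ...and write it into the original record, which itself stays in place.
	set (data of oldPDF) to newData
end tell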

cgrunenberg · November 27, 2024, 12:23pm

You could create a smart rule with the condition Kind is PDF and the action OCR > Apply. Then either drag & drop your PDFs onto the smart rule or use Tools > Apply Rules > …

benoit.pointet · November 28, 2024, 7:13am

@cgrunenberg, to be sure: “OCR > Apply” in a smart rule does replace the original, and “Data > OCR to searchable PDF” does not?

cgrunenberg · November 28, 2024, 7:17am

Data > OCR > to searchable PDF is identical to the smart rule action OCR > to searchable PDF, both create a new document.

benoit.pointet · November 28, 2024, 8:49am

OK, so there’s no way to “OCR in place”, except for going the AppleScript way and replacing the data.

cgrunenberg · November 28, 2024, 9:01am

See above, just use the OCR > Apply smart rule action.