I’m scanning a lot of magazine cuttings, but a page can include parts of other, irrelevant articles. What’s the best way to manage this? Can I cut out the irrelevant parts of the PDF image? Or can I edit the OCR text? (Or both?)
You can open a PDF in Preview (or Skim, Acrobat, etc.) and crop it. This way only the desired text area will appear. Cropping doesn’t actually delete sections of PDFs. It’s more like hiding what you don’t want to see.
Is there a way to edit the OCR text layer?
Sure. OCR then crop, or crop then OCR.
When you scan and OCR immediately, DEVONthink stores the file as “PDF+Text”. However, I couldn’t find a way to look at the text it has stored, just the PDF.
Choose Data > Convert > to (plain or rich) text. This will create a new text document that contains the text resulting from OCR.
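Once the text is in a plain text document, trimming the parts that belong to other articles could also be scripted outside DEVONthink. Here is a minimal sketch in Python, assuming the exported text separates paragraphs with blank lines and that the article you want can be recognized by a few keywords. The function name and the sample text are made up for illustration; this is not a DEVONthink feature.

```python
import re


def keep_relevant_paragraphs(ocr_text, keywords):
    """Return only the paragraphs that mention at least one keyword."""
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = re.split(r"\n\s*\n", ocr_text.strip())
    lowered = [k.lower() for k in keywords]
    # Case-insensitive substring match; a paragraph is kept on the first hit.
    kept = [p for p in paragraphs if any(k in p.lower() for k in lowered)]
    return "\n\n".join(kept)


# Hypothetical OCR output: two paragraphs from the wanted article,
# with a fragment of an unrelated article in between.
sample = (
    "Garden birds in winter\nRobins visit feeders daily.\n\n"
    "Car review: the new hatchback\nIt has five doors.\n\n"
    "Feeding robins\nOffer mealworms and suet."
)

cleaned = keep_relevant_paragraphs(sample, ["robin", "bird"])
print(cleaned)
```

Keyword matching is crude, so it is worth checking the result by eye. Note that this only cleans the converted text document; the text layer inside the original PDF is left as it was.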