What is the best way to scan and store articles?

lunedi_hax · January 4, 2013, 2:36pm

I’m scanning a lot of cuttings from magazines, but a page can include parts of other, irrelevant articles. What’s the best way to manage this? Can I cut out the irrelevant parts of the PDF image? Or can I edit the OCR text? (Or both.)

korm · January 4, 2013, 2:44pm

You can open a PDF in Preview (or Skim, Acrobat, etc.) and crop it. This way only the desired text area will appear. Cropping doesn’t actually delete sections of PDFs. It’s more like hiding what you don’t want to see.
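To illustrate why a crop only hides content: the PDF format lets a page carry a visible region (the CropBox) alongside the full page rectangle (the MediaBox), and the page content itself stays in the file. The sketch below only does the rectangle arithmetic; the `Box` type, the `crop_box` helper, and the page dimensions are hypothetical names and values chosen for illustration, and it is an assumption that Preview or Skim store their crops exactly this way.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """A PDF rectangle in points (1/72 inch), origin at the lower-left corner."""
    left: float
    bottom: float
    right: float
    top: float


def crop_box(media_box: Box, keep: Box) -> Box:
    """Return the visible region: `keep` clamped to the full page (`media_box`)."""
    box = Box(
        max(media_box.left, keep.left),
        max(media_box.bottom, keep.bottom),
        min(media_box.right, keep.right),
        min(media_box.top, keep.top),
    )
    if box.left >= box.right or box.bottom >= box.top:
        raise ValueError("crop region does not overlap the page")
    return box


# Hypothetical A4 scan (595 x 842 pt) where the wanted article fills the top half.
page = Box(0, 0, 595, 842)
visible = crop_box(page, Box(30, 421, 565, 900))  # the overshoot at the top is clamped
```

Here `page` is unchanged after the crop; only the separate `visible` rectangle says what a viewer should show, which is why the hidden articles can still be in the file (and may still turn up in searches of the text layer).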

lunedi_hax · January 4, 2013, 2:55pm

Is there a way to edit the OCR text layer?

korm · January 4, 2013, 3:03pm

Sure. OCR then crop, or crop then OCR.

lunedi_hax · January 4, 2013, 5:31pm

When you do a scan and immediate OCR, DEVONthink stores the file as “PDF+Text”. However, I couldn’t find how to look at the text it has stored, just the PDF.

Bill_DeVille · January 4, 2013, 5:33pm

Choose Data > Convert > to (plain or rich) text. This will create a new text document that contains the text resulting from OCR.

lunedi_hax · January 4, 2013, 5:37pm

OK! Thanks