Best way to scan to .rtf?

I’m scanning a lot of research (documents and articles) into a new DTPO database, and would like them to be .rtf documents, not .pdfs.

I’m using a ScanSnap scanner - how can I have it send documents to DTPO as .rtf files?

I have limited experience with going beyond OCR’ed pdfs. Here is what I can say: I recently got the full version of ABBYY FineReader to accomplish some more involved OCR. In particular, I was interested in replacing the (image/text layer) method for certain documents with “text only”, i.e. the scanned image characters get replaced in the pdf by similar-looking actual font characters (for the obvious reason that the file is vastly smaller). This works quite well for many documents. Then I saw that FineReader also offered saving as docx (and probably rtf as well). I figured that once the text (characters and words) and structure (paragraphs, sections, headlines) are properly recognized, it should be easy to dump that into a Word file. The ability to re-format this document further (e.g. globally changing the font size to create one version that reads well on an iPhone and another for an iPad Pro) makes this attractive. However, the resulting formatting was very poor and I gave up on it. Writing to EPUB was also disappointing. Writing to “font only” pdf, on the other hand, is often amazingly good.

This might work much better for you, who knows. But: the beauty of (image/text layer) is that even if there are numerous mistakes in the OCR, you always know that you have the original image to go by. At most, your text search can be faulty here and there, which is a lot more benign than whole sections being unreadable.

So be careful about going into production mode with any method that does not preserve the image layer, unless you check and fix each document in detail anyway at the time it gets scanned.

Edit: I see that you refer to “documents and articles”. Hence, there will be figures, there will be two-column content. The Mac version of rtf cannot handle this. For figures it would be rtfd. The latter is rather simplistic, and no one outside the Mac domain can look at them. I’d stay away from that.

In any case: if these articles are original literature, why do you want to make them editable? That, I imagine, is the only reason to leave the rather robust pdf format. Putting annotations in?

You didn’t mention which ScanSnap, so I’ll assume you’re using the more recent iX500. ScanSnap always creates a PDF; it does not scan to rich text. As @gg378 explained, however, the ABBYY FineReader software included with your ScanSnap can convert the PDF to Word. Word files can be opened and saved as RTF – though that involves additional steps in Word.
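
If you do end up with a folder of Word files from FineReader, the save-as-RTF step doesn’t have to be done by hand in Word. Here is a minimal sketch, assuming you’re on a Mac, where the built-in `textutil` command-line tool can convert .docx to .rtf. The function names and the folder argument are just illustrative, not anything ScanSnap or ABBYY provides, and I’d expect the same formatting limitations described above to carry over.

```python
import subprocess
import sys
from pathlib import Path


def build_textutil_command(docx_path):
    """Return the textutil argument list that converts one .docx file to .rtf."""
    src = Path(docx_path)
    dest = src.with_suffix(".rtf")  # same folder and name, new extension
    return ["textutil", "-convert", "rtf", "-output", str(dest), str(src)]


def convert_folder(folder):
    """Convert every .docx directly inside `folder`; return the new .rtf paths."""
    converted = []
    for src in sorted(Path(folder).glob("*.docx")):
        # check=True raises if textutil reports a failure for this file
        subprocess.run(build_textutil_command(src), check=True)
        converted.append(src.with_suffix(".rtf"))
    return converted


if __name__ == "__main__":
    # Pass your own export folder as the first argument (defaults to the current folder).
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for rtf in convert_folder(target):
        print(rtf)
```

The original .docx files are left in place, so you can compare the results before deciding which copy to import.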

An alternative I’d suggest is to send those PDFs to DEVONthink and use Data > Convert > To Rich Text – a command that can be applied to multiple PDFs at the same time. The DEVONthink conversion does not replace or delete the existing PDF, which is a good safety net. You never want to delete scanned PDFs unless you have access to the source paper doc and can rescan.

The resulting RTFs from a DEVONthink conversion can have widely varying quality depending on what’s in the PDF. For example, tables rarely come out right. Scanned prints of Excel documents look nothing like the original. If you have lots of tables or spreadsheets to convert, then look into ABBYY’s conversion to Excel – or consider getting Acrobat.

To add to the great advice here, note that RTF - while common on the desktop - has far less support on iOS. Just something to consider.

I second Jim’s point that because the image layer of PDFs faithfully reflects the original paper document, it’s protection against OCR errors – whereas other document formats would inevitably present erroneous information to the viewer, as they would not retain the original appearance of the paper document. OCR errors can and do happen.

Do you want OCRed documents captured as RTF to contain “bloopers” resulting from text recognition errors? That could be serious for legal documents, receipts used at tax time, etc.

korm’s point that RTF conversion of scanned tables is not good should also be remembered.

I’m a big fan of rich text for my notes and Web captures. But for whatever reason, Apple’s support of rich text in iOS is poor. If you want to see your scanned documents in that environment, you may have problems.