I have been scanning some magazines to import into DEVONthink as PDFs. JPEGs are smaller in file size, but I’ve had problems with some of my PDFs not being viewable by the Mac’s Quartz engine. So I’m wondering if I should switch to TIFFs, despite their larger file size. Anyone have a suggestion?
Out of curiosity, why don’t you just import them into DT as jpegs?
I sometimes scan journal articles or book pages with the intention of putting them into DEVONthink. In such cases, I want to be able to transfer the text content into DT.
I’ve got a Canon flatbed scanner with a plugin for Adobe Photoshop Elements 2 that lets me save scans as PDF (or in another format if desired).
ReadIRIS 9 for OS X lets me open a series of page PDFs together and OCR them. I’ve had good luck generally with setting ReadIRIS output as PDF + Text, then importing the resulting PDF document into DT using Index import. (ReadIRIS also does a good job of making the text of most encrypted PDFs available to DEVONthink.)
Scanning followed by OCR takes time, but sometimes it’s worth it to me.
I’m also converting to PDF so that I can run OCR on the page. I’m using Adobe Acrobat to do it. I didn’t realize that I needed to use Index import. Better look at the manual again. Can you elaborate? Also, does anyone recommend TIFF over JPEG?
Does your Canon scanner (which model?) work with Hi-Speed USB? According to a comment on amazon.com, the LiDE 50 (which I have) didn’t, though that was irrelevant on my G3 iBook. There’s been at least one driver update since then, but I haven’t checked yet whether the scanner uses Hi-Speed USB on my new eMac.
I’m curious why you chose Readiris over OmniPage, if you made that comparison. My wife is interested in using OCR on some German documents.
It’s been my preference to keep my large PDF and Word file collections outside of DT, and to import text or use Index import rather than storing these files inside the DT database. That way, I can use DT’s search and context recognition features, yet not worry about the security of my reference files. This past week Safari crashed and locked up my computer while a test copy of DT Pro was optimizing and backing up my database. As a result, both the database and its backup were damaged. Was that a disaster? No. I was able to recover everything, and the latest alpha of DT Pro now has features designed to prevent the worst-case event that I experienced: corruption of both the database and its backup file.
Kudos to DEVONtechnologies. One of these days I may change old habits and start storing my PDFs entirely within DT. External backups are still recommended, of course. Note: the most recent alpha of DT Pro lets one back up a database to an external optical medium such as CD or DVD, yet still open the database from that medium! (Read-only, of course.) That may be useful for distributing information.
TIFF files are generally better than JPEG files for OCR, since TIFF scans are usually stored losslessly while JPEG compression can leave artifacts around the edges of characters. I used to use Acrobat to do OCR, but switched over to dedicated OCR applications because they are faster and more accurate than Acrobat. These days I scan to PDF images, then run the PDF images through ReadIRIS 9 for OS X, with final output as PDF + Text. If the original document being scanned has clean text, I don’t even bother to do spell checking during the OCR process; I’m getting very high OCR accuracy.
Your final statement convinced me to buy Readiris 9 ($60 for the “upgrade” from whatever I wasn’t using before). I’m quite satisfied with the results so far, even with the funky UI. Scanning is faster too, but I’ve only tested text-heavy documents, so I’m not sure how much faster.
After installing the scanner plugin, I acquired documents directly into Readiris and saved them as PDF, which is useful when you don’t want to keep an original scan of a document. The resulting PDFs can be imported into DEVONthink with searchable text. Or save and import plain text or RTF instead of PDF. You can even save from Readiris to a folder that has DT’s Action Import folder action script enabled. Very nice.
Thanks for the inspiration, Bill!
Some scanners (for example, the HP OfficeJet over here) even let you define “actions”, e.g. scan a page, start ReadIris, create a PDF with an invisible text layer, save the PDF, and then import it into DEVONthink. These actions can be triggered from the scanner panel without touching the computer. It’s pretty cool but unfortunately not useful for me (I don’t scan at all; I’ve only tried this once).
Buttons on the Canon can be assigned to launch applications, so with a bit of scripting they might achieve the kind of action behavior you described. In the past I haven’t done the type or amount of scanning where I’ve wanted it to be that automatic (my brother, on the other hand, has a document feeder on his scanner). But now that my system and scanner are fast enough, I’m motivated to finally start digitally archiving a bunch of random paperwork and discard the hard copies. I expect the process will get streamlined over the course of that project.
Thanks to Bill for suggesting ReadIris Pro 9. I gave up on OCR a long time ago, as I think many did, and it’s probably about time I gave it another shot.
I wrote to I.R.I.S. asking if they had an academic version or discount, and was told they offer a 15% discount for academic users, which brings the product to about $100 USD.
Good thing I didn’t order, though, as today I found a $75 rebate for ReadIris Pro 9 at Amazon, which brings the total cost of the software down to $23 or so. I’m reluctant to include the link here, as I don’t know if DEVON wants commercial links in their support forums, but the deal should be easy enough to find on Amazon’s site.
I notice that ReadIris Pro 9 will accept files from digital cameras. I’ve been using a number of cameras as makeshift copystands, which lets me bundle images as a PDF for reading. If I can then OCR those PDFs into searchable text to import into DT, well, that opens up a whole universe of articles and book sections that I’d love to be able to search and reference.
Argh, the one time I don’t check Amazon of course there’s a good deal. :’(
Why include the link? Anyone who searches and doesn’t quickly find the product/coupon on Amazon probably isn’t savvy enough to use an OCR program.
We offer a limited 30-day money back guarantee on all products purchased through I.R.I.S. online shop available at the shop.irislink.com.
It’s worth writing them to find out how “limited” that is if it’ll save me $27 over what I paid…
[edit: It was a surprisingly pleasant experience getting a refund for my original Readiris purchase from Dan at I.R.I.S. (nice guy, professional service), and I ordered it from Amazon instead. The offer is listed on MacInTouch, which should give it good exposure.]