Starting a major scanning / paperless project; can DEVONthink reorganize docs?

jenmei · July 23, 2019, 8:38am

I have a lot of paper I want to digitize – several banker boxes worth. Some of this probably isn’t important, but I’m moving in a couple of days and really don’t want to put all these boxes back in storage. Scanning docs is reasonably quick (I have a Fujitsu iX500), but I want to make sure I’m not making a bigger problem for myself down the road. My plan is to create a few large documents using continuous scanning (because it’s faster), and organize them later. Neither Preview nor ScanSnap Home seems to be really great at that, so I started looking for a better document management solution for Macs. I saw more positive posts about DEVONthink than anything else, which is what brought me here. I downloaded the app but it’s a lot … I don’t want to spend a lot of time learning it right now. I mostly want to know: if I go ahead and scan a bunch of stuff in a kind of chaotic way, will it be easy to make sense of it later using DEVONthink? E.g. scan arbitrarily into large docs of hundreds of pages, then use DEVONthink to split up docs, reorder pages, add metadata, OCR, etc.? I couldn’t easily figure out how to split docs, and when I did a search I saw someone say that it couldn’t be done (on this forum, but years ago; someone said they use Acrobat to do this before bringing the doc into DEVONthink).

Assuming DEVONthink works out, any recommendations on things to streamline the process? Or any advice in general for someone wanting to go paperless?

Also, when I was first looking for document managers, I saw people recommend DEVONthink Office Pro. Is there still an Office edition or is that merged into Pro?

Thanks!

Jen-Mei

cgrunenberg · July 24, 2019, 12:32pm

DEVONthink can perform all these steps but not automatically.

You can split PDF documents via the thumbnails inspector, see Tools > Inspectors > Content > Thumbnails, by right- or control-clicking on a page and using the contextual menu.

DEVONthink Pro Office 2.x isn’t sold anymore and the editions of version 3 are slightly different; see the DEVONtechnologies Upgrader's Guide.

BLUEFROG · July 24, 2019, 3:46pm

My suggestion from a production standpoint: if you’re in a rush, I wouldn’t even bother with OCR in the process. Just get your scanning done with an external scanning app (like the app that comes with the scanner) at 200-300 dpi, making sure your local backups are ongoing, then process the files with OCR, etc. after you’ve moved.

If you scan each document into its own file, multi-page for docs with more than one page, it will likely be easier than scanning unrelated documents into several giant files.

Just a thought.

jenmei · July 25, 2019, 5:49am

Thanks @cgrunenberg for the tips on editing PDFs!

And thanks @BLUEFROG for suggestions on process. You’re right, the scanner software makes it pretty easy to scan separate docs. I just needed to spend some time figuring out how it works.

Got through a lot of docs today, but still have more to go. Will try to finish that before moving stuff into DEVONthink … Looking forward to exploring DEVONthink’s document management features.

BLUEFROG · July 25, 2019, 2:09pm

You’re welcome. Have fun and an uneventful move!

mog · July 25, 2019, 2:54pm

Having undertaken a similar task, I can offer some suggestions from experience.
Scan the lot to PDF. Never mind OCR, that can be done later. Depending upon the scanner features, there are at least 2 things to watch out for. (1) If scanning duplex, then ideally the scanner should be set to ignore blank pages. (2) The scanner memory: if the total file size of a large document exceeds the memory, then chances are the scanning process won’t warn you in advance and the first you discover it is when the PDF won’t open or cannot be viewed. I found out the hard way: nothing more frustrating - probably there is but I don’t get frustrated very often so wouldn’t know - than having to scan the lot again, the second time fewer pages at a time.

I am not knowledgeable enough to suggest how to check how much the process can cope with; I guess it depends upon the nature of the content: images, diagrams, etc. occupy more space than text. All I can tell you is that every so often I have about 800 pages printed on both sides to scan. I have a high-speed duplex scanner whose memory is 2GB and I find that 800 pages duplex needs to be split into 4 lots of 200 or so.
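The batching arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `plan_batches` is made up for the example, and the 200-sheet limit is the figure from the experience described above, not a documented scanner setting. Find your own limit by trial.

```python
import math


def plan_batches(total_sheets: int, max_per_batch: int) -> list[int]:
    """Split a stack into the fewest batches that each stay within
    max_per_batch sheets, keeping the batch sizes as even as possible.

    Both the name and the limit are illustrative assumptions.
    """
    if total_sheets < 0 or max_per_batch <= 0:
        raise ValueError("total_sheets must be >= 0 and max_per_batch > 0")
    if total_sheets == 0:
        return []
    # Fewest batches that keep every batch at or under the limit.
    batch_count = math.ceil(total_sheets / max_per_batch)
    # Spread the sheets evenly; the first `extra` batches get one more sheet.
    base, extra = divmod(total_sheets, batch_count)
    return [base + 1] * extra + [base] * (batch_count - extra)


if __name__ == "__main__":
    # The 800-sheet stack from the post, at most 200 sheets per scan run.
    print(plan_batches(800, 200))
```

Evening out the batch sizes, rather than filling each batch to the limit and leaving a small remainder, keeps every scan run comfortably under whatever limit you settle on.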

Once you’ve a PDF that can be viewed, the next step is to look at the thumbnails of each page and manually delete all blank pages. Depending upon the viewer it may not be necessary to delete one page at a time: I use Acrobat XI - I may be the someone that you mention - and can select as many blank pages as I can scroll to view and delete them all in one go.

It’s not only blank pages that should be deleted. You may also find, as you skim through the thumbnails, that the content on other pages is not wanted, in which case delete those too.

Having pruned the PDF, then OCR. DT3’s OCR engine speed is a vast improvement, in my opinion, on DTPO’s. Earlier this week, I OCR’d 140 pages (text and diagrams) in about 9 minutes. DT3 uses ABBYY FineReader for OCR. You could buy AFR for Mac direct from ABBYY, and I think it is quicker, but the price of AFR is not far short of the price of DT3, so I’d suggest AFR is disproportionately expensive for an OCR engine/reader that comes without the other benefits of DT3.
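For planning purposes, the one data point above (140 pages in about 9 minutes) can be turned into a rough time estimate. The sketch below assumes OCR time scales linearly with page count, which is an assumption and not something measured here; actual speed will vary with the hardware and the page content. The function name `estimate_ocr_minutes` is hypothetical.

```python
def estimate_ocr_minutes(pages: int,
                         sample_pages: int = 140,
                         sample_minutes: float = 9.0) -> float:
    """Estimate OCR time from a single timed sample.

    Defaults come from the one run reported in the post (140 pages in
    about 9 minutes). Linear scaling is an assumption, not a measurement.
    """
    if pages < 0 or sample_pages <= 0 or sample_minutes < 0:
        raise ValueError("page counts and minutes must be non-negative")
    minutes_per_page = sample_minutes / sample_pages
    return pages * minutes_per_page


if __name__ == "__main__":
    # Rough estimate for an 800-page stack at the reported rate.
    print(f"{estimate_ocr_minutes(800):.0f} minutes")
```

Timing a sample batch of your own and passing it in as `sample_pages` and `sample_minutes` will give a more useful figure than the defaults.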

Hens · July 25, 2019, 11:54pm

Maybe in the future we’ll be able to have a 10-second auto-scan option for mass scanning.

jenmei · July 30, 2019, 5:45am

Thanks for the tips, @mog! I haven’t had a document as long as 200 pages yet, but I’ll keep your advice in mind when I get to those (coming up soon). I haven’t gotten to the editing stage yet but might check out Acrobat if Preview doesn’t cut it. What does Acrobat do that Preview doesn’t that you find particularly handy?

mog · July 30, 2019, 7:17am

I don’t know the answer to your question because I rarely use Preview, so I have little idea of its capabilities. My only use for Preview is cropping and zooming images dragged from websites. For photo image manipulation otherwise I use Lightroom. I do not think Acrobat XI Pro is available any more, having been superseded by Acrobat DC. I use it for customised actions, watermarking and Bates numbers. Also for OCR - but I read that ABBYY FineReader for Mac is more accurate, although for the most part Acrobat doesn’t let me down. Acrobat also has measuring tools, but having found them fiddly, I use PDF Studio for its measurement tools.

Regarding 200-page documents, I was referring to the number of individual items to scan at a time, as in a batch of items.