Organizing docs when you scan a bunch of papers

Perhaps my approach is not the best, and I’m certainly open to suggestion.

I am buried under mountains of paper and want to scan-and-shred a bunch of stuff.

In at least two batches, I’ve done just that. I’d like to better split things out after the fact.

My input method is ScanSnap :arrow_right: DEVONthink 3 (D3). I also have PDFPenPro, with which to edit PDFs.

I know that stuff gets OCR’d when scanning right into D3. I know that PDFPenPro also has OCR. If I take the resulting PDF file from D3 to PDFPenPro, and split it out into multiple separate PDFs, can I still get OCR when I bring things back if it doesn’t go through the scanner?

I realize I’ve said a lot here. If anyone has suggestions/advice/experience/criticism, please share.

The reason I’ve done it this way, so far, is to get to see my desk again, and reclaim office space.

Don’t know if it helps, but here’s what I do:

  1. scan the daily mail in one chunk
  2. open the file in DTPro after OCR and keep the thumbnails in sight
  3. in the thumbnails view, select the pages I want to have in a separate file, choose “Cut” from the popup menu, and press command-N afterwards
  4. the selected pages get cut out of the file and a separate file with just the pages gets created, retaining its OCR layer.
  5. repeat until finished.

So you can do it all from within DT? No need to split it out into a 3rd-party app, and bring it back in? :astonished:

Absolutely. Otherwise I would have gone mad a long time ago. Well, more mad :slight_smile:


I don’t have an answer to your specific question, because I don’t use PDFPen Pro. In my case: ScanSnap -> OCR with Adobe X Pro -> name the file -> Hazel sorts everything into the appropriate folders for me (I index my files), usually with rules based on the names of the files.

I probably scan about 1,000 pages a week. Some of it is lumped together as a single large “item” (last week I scanned a research journal with 200+ pages, for example). Other things are individual scans (a handwritten note for the day, for example). There’s a variety of stuff. I never mix “items” (scanning the journal together with my handwritten note). That would be bedlam.

The only time-consuming parts are feeding the scanner (it’s together with my computer, so I feed it while I work on other stuff) and giving everything a name: date + title + keywords (for Hazel). Most of the categories are already in place, so there is no thinking involved. It’s mainly the minor tedium of typing out the name. If I had a little less variety in what I scan, I could imagine just leaving the date stamp and having Hazel sort it all out according to the content. And, I could become a billionaire and pay folks to feed documents into the scanner. But, I don’t live in that ideal world yet.
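For anyone curious what a name-based sorting rule boils down to, here is a rough Python sketch. It is not how Hazel works internally, just a stand-in for the idea. The filename layout (`YYYY-MM-DD Title [keywords].pdf`), the keyword-to-folder table, and the function names are all made up for illustration; only the "Unsorted" fallback mirrors what I actually do.

```python
import re
from pathlib import Path

# Hypothetical naming scheme: "YYYY-MM-DD Title [kw1 kw2].pdf".
# The keyword block in square brackets is optional.
NAME_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<title>.+?)"
    r"(?: \[(?P<keywords>[^\]]*)\])?\.pdf$"
)

# Hypothetical keyword -> folder table; adjust to your own categories.
RULES = {
    "tax": "Finance/Taxes",
    "invoice": "Finance/Invoices",
    "journal": "Research/Journals",
}


def destination_for(filename, rules=RULES, fallback="Unsorted"):
    """Return the folder for a file name, or the fallback if no rule matches."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        # The name does not follow the scheme at all.
        return fallback
    keywords = (match.group("keywords") or "").lower().split()
    for keyword in keywords:
        if keyword in rules:
            # The first keyword with a rule wins.
            return rules[keyword]
    return fallback


def plan_moves(folder, rules=RULES):
    """List (source, destination folder) pairs without moving anything."""
    folder = Path(folder)
    return [
        (path, folder / destination_for(path.name, rules))
        for path in sorted(folder.glob("*.pdf"))
    ]
```

`plan_moves` only reports where each PDF would go, so you can check the plan before letting anything touch your files.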

At this moment, I don’t have a backlog, so what I am doing seems to be working OK for me (your mileage may vary). Hazel is a relatively new part of my workflow (I incorporated it a few years back), but everything else has been in place since about 2009, so I can vouch for its reliability over time. I should mention that a fair amount of stuff is random and doesn’t have a category, so it sits in an “unsorted” folder. I’m OK with this, but ultra-organized folks might find this to be unconscionable. Searching by Spotlight (I prefer HoudahSpot as a front-end) finds anything that I might need in there. In general, I only end up needing stuff from this folder a few times a year. I scan everything, so I know that whatever I am looking for is definitely somewhere on my HD.

Adobe Pro gets the job done (I think I bought it 9 years ago in the pre-document cloud days / non-subscription days). I use Japanese and Chinese a lot, though, so up until recently DT couldn’t help me out there (as far as I know). It looks like the version of ABBYY they have now has support for Japanese (and maybe even Japanese+English at the same time). If so, I may switch to that. I’ll do a few test runs for accuracy and so forth before I jump into anything. I don’t like to try and “fix” a workflow that isn’t broken. There are some improvements to DT3 that may also make Hazel unnecessary for me, but I’ll have to test things out a lot more before I give up that magical app.

Just a little heads-up, in case people didn’t notice: In the Content Inspector: Thumbnails, if you widen the inspector you can get two columns of thumbnails, just in case it’s helpful to anyone.

Actually up to three.

It must be a limitation of screen space. I can only widen to accommodate two on a 13" MacBook Pro. That includes hiding the sidebar for more space.

A 27" iMac helps :slight_smile:

Must be nice. Show-off :stuck_out_tongue:

This thread is both helpful and entertaining. Thank you, folks. :slight_smile: