Organizing docs when you scan a bunch of papers

akulbe · April 29, 2019, 3:22pm

Perhaps my approach is not the best, and I’m certainly open to suggestion.

I am buried under mountains of paper and want to scan-and-shred a bunch of stuff.

In at least two batches, I’ve done just that. I’d like to better split things out after the fact.

My input method is ScanSnap -> DEVONthink 3 (D3). I also have PDFPenPro for editing PDFs.

I know that stuff gets OCR’d when scanning right into D3. I know that PDFPenPro also has OCR. If I take the resulting PDF file from D3 to PDFPenPro, and split it out into multiple separate PDFs, can I still get OCR when I bring things back if it doesn’t go through the scanner?

I realize I’ve said a lot here. If anyone has suggestions/advice/experience/criticism, please share.

The reason I’ve done it this way, so far, is to get to see my desk again, and reclaim office space.

Zikade · April 29, 2019, 3:51pm

Don’t know if it helps, but here’s what I do:

scan the daily mail in one chunk
open the file in DTPro after OCR, with the thumbnails in sight
in the thumbnails view, select the pages I want to have in a separate file, choose “Cut” from the popup menu, and press Command-N afterwards
the selected pages get cut out of the file, and a separate file with just those pages gets created, retaining its OCR layer
repeat until finished
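
If it helps to plan the cuts before clicking through the thumbnails, the splitting steps above can be sketched as a small calculation. This is only an illustration, not anything DEVONthink provides: the function name `plan_splits` and its arguments are made up here, and it assumes you have noted the page on which each document in the batch starts.

```python
def plan_splits(total_pages, first_pages):
    """Return inclusive, 1-based (start, end) page ranges, one per document.

    total_pages: number of pages in the combined scan.
    first_pages: page numbers where a new document begins (hypothetical input,
                 e.g. jotted down while skimming the thumbnails).
    """
    if total_pages < 1:
        raise ValueError("total_pages must be at least 1")
    # Page 1 always starts a document, whether or not it was listed.
    starts = sorted(set(first_pages) | {1})
    if starts[0] < 1 or starts[-1] > total_pages:
        raise ValueError("first pages must lie within 1..total_pages")
    # Each document ends on the page before the next one starts;
    # the last document runs to the end of the scan.
    ends = [s - 1 for s in starts[1:]] + [total_pages]
    return list(zip(starts, ends))


if __name__ == "__main__":
    # A 10-page batch with new documents starting on pages 1, 4 and 9.
    for start, end in plan_splits(10, [1, 4, 9]):
        print(f"pages {start}-{end}")
```

Each range is then one select, cut, Command-N round in the thumbnails view.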

akulbe · April 29, 2019, 4:22pm

So you can do it all from within DT? No need to split it out into a 3rd-party app, and bring it back in?

Zikade · April 29, 2019, 4:52pm

Absolutely. Otherwise I would have gone mad a long time ago. Well, more mad

FROBGOBLIN · April 30, 2019, 11:02am

I don’t have an answer to your specific question, because I don’t use PDFPen Pro. In my case: ScanSnap -> OCR with Adobe X Pro -> name the file -> Hazel sorts everything into the appropriate folders for me (I index my files), usually with rules based on the names of the files.

I probably scan about 1,000 pages a week. Some of it is lumped together as a single large “item” (last week I scanned a research journal with 200+ pages, for example). Other things are individual scans (a handwritten note for the day, for example). There’s a variety of stuff. I never mix “items” (scanning the journal together with my handwritten note). That would be bedlam.

The only time-consuming parts are feeding the scanner (it’s together with my computer, so I feed it while I work on other stuff) and giving everything a name: date + title + keywords (for Hazel). Most of the categories are already in place, so there is no thinking involved. It’s mainly the minor tedium of typing out the name. If I had a little less variety in what I scan, I could imagine just leaving the date stamp and having Hazel sort it all out according to the content. And, I could become a billionaire and pay folks to feed documents into the scanner. But, I don’t live in that ideal world yet.
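
For anyone curious what name-based sorting amounts to, here is a rough sketch of the idea in Python. It is not Hazel and does not use Hazel's rule format; the exact filename layout (`YYYY-MM-DD Title [keyword, keyword].pdf`), the `RULES` table, and the folder names are all assumptions invented for the example.

```python
import re
from pathlib import PurePosixPath

# Hypothetical keyword -> folder rules; adjust to your own categories.
RULES = {
    "invoice": "Finance/Invoices",
    "journal": "Research/Journals",
    "note": "Notes",
}

# Assumed naming scheme: date, then title, then optional [keywords] before .pdf
NAME_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<title>.+?)"
    r"(?: \[(?P<keywords>[^\]]*)\])?\.pdf$"
)


def parse_name(filename):
    """Split a filename into date, title and keywords, or return None."""
    m = NAME_RE.match(filename)
    if m is None:
        return None
    raw = m.group("keywords") or ""
    keywords = [k.strip().lower() for k in raw.split(",") if k.strip()]
    return {"date": m.group("date"), "title": m.group("title"), "keywords": keywords}


def destination(filename, rules=RULES, fallback="Unsorted"):
    """Pick a target path from the first keyword that has a rule."""
    parsed = parse_name(filename)
    if parsed is not None:
        for kw in parsed["keywords"]:
            if kw in rules:
                return str(PurePosixPath(rules[kw]) / filename)
    # Anything without a recognised keyword lands in the catch-all folder.
    return str(PurePosixPath(fallback) / filename)


if __name__ == "__main__":
    print(destination("2019-04-29 Electric bill [invoice, utilities].pdf"))
    print(destination("2019-04-30 Handwritten note.pdf"))
```

The function only computes where a file would go; actually moving it is left out on purpose, so the sketch is safe to run against real filenames.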

At this moment, I don’t have a backlog, so what I am doing seems to be working OK for me (your mileage may vary). Hazel is a relatively new part of my workflow (I incorporated it a few years back), but everything else has been in place since about 2009, so I can vouch for its reliability over time. I should mention that a fair amount of stuff is random and doesn’t have a category, so it sits in an “unsorted” folder. I’m OK with this, but ultra-organized folks might find this to be unconscionable. Searching by Spotlight (I prefer HoudahSpot as a front-end) finds anything that I might need in there. In general, I only end up needing stuff from this folder a few times a year. I scan everything, so I know that whatever I am looking for is definitely somewhere on my HD.

Adobe Pro gets the job done (I think I bought it 9 years ago in the pre-document cloud days / non-subscription days). I use Japanese and Chinese a lot, though, so up until recently DT couldn’t help me out there (as far as I know). It looks like the version of ABBYY they have now has support for Japanese (and maybe even Japanese+English at the same time). If so, I may switch to that. I’ll do a few test runs for accuracy and so forth before I jump into anything. I don’t like to try and “fix” a workflow that isn’t broken. There are some improvements to DT3 that may also make Hazel unnecessary for me, but I’ll have to test things out a lot more before I give up that magical app.

BLUEFROG · April 30, 2019, 1:57pm

Just a little heads-up, in case people didn’t notice: In the Content Inspector: Thumbnails, if you widen the inspector you can get two columns of thumbnails, just in case it’s helpful to anyone.

cgrunenberg · April 30, 2019, 2:17pm

Actually up to three.

BLUEFROG · April 30, 2019, 2:20pm

It must be a limitation of screen space. I can only widen to accommodate two on a 13" MacBook Pro. That includes hiding the sidebar for more space.

cgrunenberg · April 30, 2019, 2:22pm

A 27" iMac helps

BLUEFROG · April 30, 2019, 2:23pm

Must be nice. Show-off

akulbe · April 30, 2019, 6:44pm

This thread is both helpful and entertaining. Thank you, folks.