Preparing to Scan Hundreds of Books, Magazines - Necessary HD Space

Hello, I’ve owned DTPO for a while now, but have only been tinkering with it and DEVONthink To Go a little. This summer I finally have some time off. I’m preparing to change my life around next year and move to a new country, and I would really like to be more mobile and able to fit into a cheap, tiny 1-room apartment again. I’m a newbie, so I apologize if this is posted in the wrong forum (I would have posted in the ‘Newbie’ section if there was one).

Basically, I’m preparing to scan about 500 books, 500 magazines, and thousands of pieces of paper. I’m not even worried yet about organizing the database for the perfect research project; I’ll only be using simple tags to cover a wide range of subjects.

I own DTPO, an S1500M scanner, and a 2009 MacBook that’s been getting really slow lately. I was going to invest in an Other World Computing 480 GB SSD, but now I’m thinking about getting one of the brand new MacBook Airs (256 GB of flash storage) or Mac Minis (at least 500 GB) instead. I’ll wait until I have the new machine before really beginning my DEVONthink project.

To help me decide, I’m trying to figure out how much space I will need for all the scanned PDF files (aside from the obvious portability and processor speed issues). Let’s say: 500 books * 300 pages = 150,000 OCR’ed pages. And then magazines, where I’d like to keep the full contents with pictures as they are: 500 magazines * 100 pages = 50,000 OCR’ed 8.5 by 11/A4-sized pages with pictures, etc. Can anyone give me an idea of how much space I will need before beginning such a task? I’ll do the calculations myself, but if anyone has a rough reference point for this, I’d really appreciate it; I just have no idea how much space I should set aside. I’ve seen comparisons between Adobe OCR and ABBYY that give figures like 1.5 MB or so per page, but I don’t know if I should expect my books to use a lot less than my magazines, which is why I differentiated above.

I do own a couple of external HDs with 2 TB each or so. But IF the total of the above (along with other notes, papers, stuff, and pictures saved from Aperture/iPhoto, though maybe that’s a whole other story) grows beyond the 100 or 500 GB available to me on my new computer’s hard drive, can I store parts of my DEVONthink database on an external hard drive, all files over 50 MB for instance? Or is this not generally done by DEVONthink users who want to access their data relatively quickly? Perhaps I’m misunderstanding some of the main principles of the DEVONthink database method in even asking this, though.

If it matters, I intend to read the scanned files later on with my current Kindle, or maybe a future iPad 3 or larger e-reader… I’ve been using calibre to make epub/mobi files, which is usually fine, but there are sometimes spelling errors in the text it picks up from the PDFs.

I have not bought my cutter for the books yet, but will be doing that soon as well.

Anyway, thanks for any possible help.

I have been scanning magazines for some time. I just grabbed a bunch of the PDFs and they average about 334 KB/page for full-color, letter-sized magazines. As part of the scanning process, DTPO resamples the image down to save space (controlled by the OCR->Resolution preference).

I haven’t scanned books yet, but you should definitely see smaller sizes if they are black and white.

If it gets too big, you can definitely index the files and store them on separate disks.
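
If you go that route, something like this could help pick out which files to move. It’s a minimal Python sketch, assuming your scans sit in an ordinary folder. The `~/Scans` path and the `files_over` helper are made up for illustration, and the 50 MB cutoff just echoes the figure from the question. It only lists files; it doesn’t touch DEVONthink itself.

```python
from pathlib import Path

BYTES_PER_MB = 1024 * 1024


def files_over(root, min_mb=50):
    """Return (path, size_mb) pairs for files under root larger than min_mb,
    largest first. Returns an empty list if root is not a directory."""
    root = Path(root)
    if not root.is_dir():
        return []
    hits = []
    for path in root.rglob("*"):
        if path.is_file():
            size_mb = path.stat().st_size / BYTES_PER_MB
            if size_mb > min_mb:
                hits.append((path, size_mb))
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical scan folder; change this to wherever your PDFs live.
    for path, size_mb in files_over(Path.home() / "Scans"):
        print(f"{size_mb:8.1f} MB  {path}")
```

Anything it prints is a candidate for living on the external disk and being indexed rather than imported.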

I’ve scanned a few books. Here are my numbers:

Book 1: 328 pages, 75.7MB = 236KB/page
Book 2: 211 pages, 48.4MB = 235KB/page
Book 3: 217 pages, 58.6MB = 277KB/page
Book 4: 130 pages, 31.5MB = 248KB/page

Assuming my math is right =)
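
For anyone who wants to redo the arithmetic, here is a rough Python sketch. It rechecks the four per-page figures above and then scales them up to the page counts from the original question (500 books * 300 pages, 500 magazines * 100 pages). It assumes 1 MB = 1024 KB, that the four books are representative, and that the 334 KB/page magazine average reported earlier in the thread holds. The function names are just for illustration.

```python
KB_PER_MB = 1024
KB_PER_GB = 1024 * 1024

# (pages, size in MB) for the four scanned books listed above
books = [(328, 75.7), (211, 48.4), (217, 58.6), (130, 31.5)]


def kb_per_page(pages, size_mb):
    """Average size of one page in KB."""
    return size_mb * KB_PER_MB / pages


def pooled_kb_per_page(samples):
    """Average KB/page across all samples, weighted by page count."""
    total_pages = sum(pages for pages, _ in samples)
    total_mb = sum(size_mb for _, size_mb in samples)
    return total_mb * KB_PER_MB / total_pages


def estimate_gb(page_count, rate_kb):
    """Estimated size in GB of page_count pages at rate_kb KB per page."""
    return page_count * rate_kb / KB_PER_GB


book_rate = pooled_kb_per_page(books)
magazine_rate = 334  # KB/page, magazine average from earlier in the thread

book_gb = estimate_gb(500 * 300, book_rate)
magazine_gb = estimate_gb(500 * 100, magazine_rate)

for pages, size_mb in books:
    print(f"{pages} pages, {size_mb} MB -> {kb_per_page(pages, size_mb):.0f} KB/page")
print(f"Books:     about {book_gb:.0f} GB")
print(f"Magazines: about {magazine_gb:.0f} GB")
print(f"Total:     about {book_gb + magazine_gb:.0f} GB")
```

With these inputs it comes out to roughly 35 GB for the books and 16 GB for the magazines, so about 51 GB in total, not counting the loose papers. Real numbers will shift with scan resolution, color, and OCR settings.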

Thanks for the reference points/benchmarks, guys. Looks like scanning a room full of books won’t be as bad on disk space as I thought, but I’ll have to clean up my hard drive if I want to fit it all on my computer and not just on external HDs. I’m planning to get the new MacBook Air, but I’m not sure it’ll be so easy (or at least not cheap) to get anything bigger than the 256 GB flash drive it comes with.

I will also be making .mobi or .epub files (I forget which) for my e-reader, at least for the pictureless books, so maybe one day I can delete PDFs to save space if I’m counting my megabytes. I certainly won’t do that until I can be sure that my conversion with calibre or whatever is accurately capturing the contents.

Thanks again, but any other tips or stuff to watch out for (if necessary) would be appreciated.

Any recommendations on book cutters? Just recalled again that I need to buy that thing still!

Have a backup strategy and stick to it.

Korm, thanks for the tips and, more importantly, the principles to stick by when dealing with this project. I’ll be backing things up, both on drives I already own in a couple of different locations and online.

About the OCR’ing, I have Adobe Acrobat Pro 8 for Mac, which came with my ScanSnap. If I take your advice to rely on this for OCR (which I probably will), would it be worthwhile to upgrade to version 10?