I have some old books now falling to bits: what is the best way to digitalize them into DEVONthink 3?

In fact, I only need a few pages from most of these books, sometimes only two or three from an article, so I will rarely if ever need to digitalize a whole book. I have looked at the Adobe app, but I am sure there are options that are as good or better and slightly cheaper? To be clear, I don’t think the Adobe app is overpriced, but…
This is the most obvious place to ask about this.

Mostly they are collections of papers from journals. If it is any help, most are from the “Oxford Readings in Philosophy” series. Some are 50 years old and I still need the papers in them!
In passing, there was a point when paperbacks seemed to stop falling apart; I think they changed the formula for the glue. A lot of mine seem to predate that and are already yellow and brittle (already after 50 years, I mean :wink: ).

vFlat has been praised here


Hi. It sounds like you need to scan to get to the PDF stage, but if (for one reason or another) you have something scanned already into a PDF, you can make it searchable (do OCR on it) in DT (it is packaged with the product).

Every day I am cutting off spines with a guillotine-style paper cutter and scanning old books and journal articles into PDF format. Some of them would be useless even for reading copies, because they are in such a terrible state, but as PDFs, they get a new lease on life. In fact, I prefer to buy old copies of books in terrible condition (extremely inexpensive) just to digitize them.

I recommend Fujitsu’s ScanSnap series. For regular scanning (in my case, hundreds of pages a week), the ix1600 is well worth the money. There are other high-quality scanners from other companies (Epson, for example), but Fujitsu’s software + hardware + price has been the best combination for me. Files are automatically made searchable with OCR by Adobe, which comes bundled with the product. For hopelessly crumbling materials that won’t survive a pass through the ix1600 scanner (100-year-old journals are fine, but some paperbacks from just a few decades ago might as well be crumbling stale bread—poor quality paper and binding), items you don’t want to deconstruct by cutting off the spines, or larger items that won’t fit into the scanner, Fujitsu’s SV600 is wonderful, because it works by capturing an image from overhead.

If it really is just a handful of pages, one inexpensive option is an iPhone using the Scanner Pro app, which also performs OCR and exports files as PDFs or images. The iPhone cameras are pretty good, especially in the right lighting with a steady hand or a device (less than ten dollars these days) to hold the phone for you. I also use this app on a daily basis for my handwritten notes and newspapers. The developers are Ukrainian and nice folks, so I like to think / hope that some of the money is helping someone over there. I’ve been working with their products for many years (sometimes as a beta tester) and I can say without reservation that they have been amazingly high quality.

To sum up—no need to subscribe to Adobe. Take that money (actually, a lot less than you’d pay for 2 or 3 years of Adobe) and get high quality hardware + software that you can use for decades.

As a final note, for older items that are not subject to copyright law restrictions, many things can be found online already in PDF form. There are digital treasure troves out there from legitimate sources. Even with copyright, authors and publishers sometimes make new items available immediately online for free. The Internet Archive ought to be your first stop. It has some items from the Oxford series, though Singer’s edited volume (the one I have on hand at home) doesn’t appear to be there, for example, so maybe it isn’t complete online.


Thanks for such a detailed reply. That is already the second recommendation for Scanner Pro, which seems just right. The troves online… we could talk for hours! :grinning:
As I said in another reply, once you start with this kind of thing, well, off one goes. I am at the limit of my physical book space, in all honesty. I don’t want to get into a digitalizing project, but I might start to nibble at the book piles.

If you’re ok with destroying the book, you’ll likely get the best quality image if you guillotine the binding and run the pages through a scanner. The smaller Fujitsu ScanSnap scanners are great, as they scan both sides of the page simultaneously and are pretty quick.

If you can’t destroy the book, overhead scanning is a bit of a pain. Skewing is a real problem. You could try the Adobe Scan app with the iPhone on a raised mount, an angled desk lamp (so the lighting is uniform) and a thick perspex sheet (to keep the page flat). You may still get skewed text if the phone camera isn’t positioned correctly.

There are some videos online by people with more time and DIY skill who have created some impressive setups.
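
For anyone wondering what "skew" amounts to in numbers, here is a minimal sketch. It assumes you have picked two points along one printed line of text in the photo (in pixel coordinates); the function name and the sample points are made up for illustration, and this is not how any particular scanning app measures or corrects skew.

```python
import math

def skew_angle_degrees(left, right):
    """Angle, in degrees, of the line through two (x, y) points on a text baseline.

    0.0 means the line of text is level. In image coordinates y usually grows
    downward, so a positive result means the line slopes down to the right.
    """
    dx = right[0] - left[0]
    dy = right[1] - left[1]
    return math.degrees(math.atan2(dy, dx))

# Hypothetical points: the baseline drops 35 pixels over 1000 pixels of width.
angle = skew_angle_degrees((100, 400), (1100, 435))
print(f"skew: {angle:.2f} degrees")
```

Even a couple of degrees is visible on a page of text, which is why the position of the phone over the page matters so much.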


I am a big fan of Scanner Pro, as others have also mentioned.

For lengthier scans, I use the lower-end book scanner from https://www.czur.com


I had no idea how much one could do with DEVONthink 3 regarding converting and merging files. So I think the free app (vFlat), which works fine, will meet my needs. I am only likely to need a few pages at a time, and it is handy to have them in one document; it seems I can do that easily by merging the scans taken with vFlat and OCRing them in DEVONthink 3.


No-one’s mentioned that iOS can take scans directly with no need for a third-party app. It’s serviceable for everyday use, though I don’t know about the requirements of scanning historical documents, so if you want to preserve things other than words it might not be so good.

Anyway, to use your iPhone’s scanner mode, open Apple Notes and start a new note. Tap the plus button (this seems to have appeared recently on the latest iOS?) and then tap the image button to insert an image. A little menu will open, and you want the third option, scan document. It’s fairly self-explanatory. It will correct some skewing, and it will try to adapt to low light levels. However, it’s far better to take scans in a well-lit room if you can. (You can also use the iPhone’s flash, although it does tend to bleach out pages sometimes.) I also try to do it on a surface that contrasts with the paper if I can, so that Apple can find the paper edges easily (it’s not really necessary, but if you try to scan one page that’s on a pile of messy papers, it will struggle!).

You can add lots of pages to a single scan.

When the scan is finished, you have a note that has a PDF file in it. Apple has already cleaned the image as best it can and performed OCR. All that’s left is moving it to your preferred location. I do that on my Mac: I just open the note, “right click” on the PDF, then save to DT. You can also do it from iOS; using the Mac is just my personal preference.

I do all my scanning this way nowadays.


and performed OCR

And how have you verified this, especially across devices?

I use the scanner feature on my iPad via the Files app
(How to Scan Documents in the iOS Files App - MacRumors).
It stores the scan in PDF format, directly in my DTTG inbox.

OCR is added by DEVONthink.


I tested it in Preview ages ago when I wondered how it works before I switched to scanning like this. Nowadays I make sure the text is searchable when I import to DT (I also have a smart group that finds any PDFs with <200 words just in case, although that’s more of a safeguard against poor online PDFs).
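
The smart group itself is set up inside DEVONthink, but the idea behind it (flag any PDF whose text layer has fewer than 200 words) is easy to sketch outside the app. The snippet below is only an illustration of that threshold check. It assumes the text has already been extracted from the PDF by some other tool, and the function names are invented for this example.

```python
import re

# A "word" here is a run of two or more letters; digits and punctuation are ignored.
WORD_RE = re.compile(r"[^\W\d_]{2,}")

def count_words(text: str) -> int:
    """Count letter-only words in the text extracted from a PDF."""
    return len(WORD_RE.findall(text))

def probably_needs_ocr(text: str, min_words: int = 200) -> bool:
    """True when the text layer is missing or suspiciously short."""
    return count_words(text) < min_words

# An image-only scan typically yields an empty string when text is extracted.
print(probably_needs_ocr(""))                # True
print(probably_needs_ocr("word " * 500))     # False
```

The 200-word cut-off is the one mentioned above; a short but legitimate document (a one-page receipt, say) would be flagged too, so treat the result as a prompt to check rather than a verdict.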

Huh, there was me revealing my secret trick, and I didn’t even know Apple had a second “secret” way of doing it! Thank you for sharing. This seems quicker than doing it via Notes (although Notes is handy sometimes if you need to add a couple of notes to the file for action at the same time).

For anybody interested in using this feature (or features): it is useful to me beyond measure, because I can now scan the contents page from a volume of collected papers, store it in DEVONthink 3, and it will be searchable. That is a big deal for me with a lot of old volumes.
The problem with paper journals and books is knowing what is in them, especially collections. I have spent some time looking for papers that I actually had on my shelves, in a volume where I would not intuitively have looked for them; that is a common problem for me. Even if I know a paper well from a journal in the old days, I forget that I have it in a collected volume.
So I just scan the contents page, one scan, and if I find it in a search I always know which book it is by a kind of Gestalt. I hope that is useful for somebody. Thanks for the time and trouble you all spend on my behalf here.

I only really investigated the OCR recently. There is a lot in DEVONthink 3 that I am doing without fully understanding it, and that is fine with me, though it is nice sometimes to get under the hood a bit. This did that for me. I have to say it has added another dimension for me.
I am fine with the free vFlat, and I assume the OCR takes place when I put it into DEVONthink 3? Have I got that right? Or do scan apps routinely OCR?

I assume the OCR takes place when I put it into DEVONthink 3?

No, but there are mechanisms to make that happen.

However, the vFlat app you mentioned appears to have OCR built-in.


I too use ScanSnap, and did just that, scanning book pages (book was disassembled). Was very easy, and the scanner (ix500) took lots of pages at a time without any glitches.

I can only agree. I run my paper mail through my ScanSnap and send it to DT3.
If I started a second process for books that I could disassemble, it would just add complexity. So I would strongly recommend keeping it simple. One process is enough.

BTW, ScanSnap uses ABBYY OCR software built in, not Adobe.

I use the ix500 with ScanSnap and it is very good at batch processing papers and books. It does mean that you have to cut up a book to get separate pages, which is less traumatic if the book is an old paperback with yellow/brown pages that are starting to self-destruct (paper that is not acid-free).
MacSparky did report that he used a power saw to remove the spines of books!

For smaller numbers of pages, or to scan a book that you do not wish to destroy, Readdle Scanner Pro is excellent and well worth the small cost. It will deskew and OCR. It is best used in cold northern daylight to reduce colour casts, reflections and shadows. Best practice for thinner papers is to place a black sheet under the page being scanned to prevent the writing on the other side of the page bleeding through.
‘Professional’ scans by Google etc. do not seem to do this, as you can see bleed-through of the writing on the other side of the page being scanned. NB this is not a problem if you are scanning to Word and just saving the text (on the way to creating an ePub). Scanning to PDF with OCR just means saving an image of the page and adding the OCR’d text as a separate layer.
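
To make the "image plus separate text layer" point concrete, here is a toy model. It is not the PDF format or any real library, and the class and function names are invented for illustration. It only shows why search works on the text layer while bleed-through lives in the image, and why a text-only export is unaffected.

```python
from dataclasses import dataclass

@dataclass
class ScannedPage:
    image: bytes     # what the reader sees, including any bleed-through
    text_layer: str  # the OCR result, used for search and copy/paste

def search(pages, query):
    """Return the 1-based numbers of pages whose text layer contains the query."""
    q = query.casefold()
    return [n for n, page in enumerate(pages, start=1)
            if q in page.text_layer.casefold()]

def export_text_only(pages):
    """Keep only the OCR text and drop the images (as when scanning to Word)."""
    return "\n\n".join(page.text_layer for page in pages)

# Hypothetical two-page scan; the image bytes are placeholders.
pages = [
    ScannedPage(image=b"<page 1 pixels>", text_layer="Contents: Singer on ethics"),
    ScannedPage(image=b"<page 2 pixels>", text_layer="Index"),
]
print(search(pages, "singer"))
print(export_text_only(pages))
```

In this model the quality of the image only matters to a human reading the page; searching and text export depend entirely on how well the OCR step filled in `text_layer`.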

Love this thread. I have taken books to FedEx to have the spines cut, which costs money, of course. I would like to scan many of the books in my office library, so where does one find a guillotine cutter like the ones mentioned here? I cannot live without my ScanSnap; I have the ix1600. I am also interested in the book scanner for books that I don’t ‘behead’. I looked at the czur.com site. Which specific one do you use, GordonMyer?
For scanning apps, I have had VERY good luck with Genius Scan although I own several others.


Might be of interest

Personally I use a Scansnap 600, which has a similar form factor, but its decurving algorithm is not very fast.


Hi. The paper cutters I use look something like this item from Amazon.

Heavy Duty Paper Cutter,17 inch Guillotine Paper Cutter (one made to cut 500+ sheets at a time)

I don’t think it matters much who the company / seller is, because I have ordered a couple over the years and they have both been high-quality items made in China. Mine were less expensive, I think, but that was pre-pandemic days without supply-chain issues and runaway inflation. Even at this price, they’d be well worth it. I have digitized many thousands of books and journals over the years, and the guillotine + Scansnap combination has made the process considerably less onerous. I can usually go through about 20 books a day / 6000+ pages without much trouble. I work on the computer while pages feed through the scanner, add pages when it gets low, and name the file when it is finished. OCR happens automatically. The scanner is located under my computer (on a stand—I work standing up), so I don’t even have to step away from the computer. I often feed papers in before and after I get back from classes or other things I have to do outside the office, so it scans even when I am away.
