Three Cheers for the free vFlat app for de-warping iPhone photos of book pages

One of my longer-term projects has been to digitize a bunch of out-of-print history and genealogy books on my shelves. I’ve been putting this off after doing a lot of research on HOW to do this, because I don’t own a scanner and didn’t want to mutilate the books themselves. I also had my fingers crossed that a decades-old open-source desktop app for the PC, called Scan Tailor, would become available to Mac users (that did happen, very recently).

I just completed my first couple of books, using a table clamp mount for an iPhone (I bought this one).

My issue, as expected, was in de-warping pages at the beginning and end of the book. I’m a bit of a perfectionist and tried, first, to manually de-warp with Photoshop. This works, but is painstakingly time-consuming.

I then tried a free app, much raved about in a few online forums devoted to book scanning. HOLY SMOKES! For a free app, this one takes the cake. The app not only cropped each page, but also deskewed and, above all, de-warped the necessary pages with mind-blowing results. Better than I was able to do manually in Photoshop.

My workflow is to:

  1. Take pictures of each page with the three-second timer mode, on iPhone. This gives me time to use my hands to flatten each page.
  2. Launch vFlat for the initial cropping and de-skewing. I set the output to full color.
  3. In the Photos app, on my laptop, I then export all the images without modification (export as original).
  4. In Lightroom, I do a batch conversion to B&W and adjust contrast, shadows, etc.
  5. Export from Lightroom as 300 DPI TIFF images, to a folder on the Desktop.
  6. Import these images into Scan Tailor Advanced, which runs on Mac and has many uses for everyday OCRing, too.
  7. Begin Scan Tailor at the content discovery phase, then do the remaining phases (set margins, and finally, the output iteration).
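As a rough check on what the 300 DPI export in step 5 means in pixels, here is a minimal sketch. The 6 x 9 inch page size is only an assumed example, not a measurement from the books above, and `pixels_for_page` is a hypothetical helper name.

```python
def pixels_for_page(width_in: float, height_in: float, dpi: int = 300) -> tuple:
    """Return the (width, height) in pixels needed to hold a page at the given DPI."""
    # Pixels along each edge = physical length in inches * dots per inch.
    return round(width_in * dpi), round(height_in * dpi)


if __name__ == "__main__":
    # Assumed example: a 6 x 9 inch page at the 300 DPI used in the workflow.
    print(pixels_for_page(6, 9))
```

If the cropped iPhone photo has fewer pixels than this along either edge, exporting at 300 DPI means upscaling rather than capturing more detail.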

Scan Tailor outputs the lot, magically, as tiny files for each page. It converts the background to pure white, makes my fingers disappear, blackens and smooths the text for optimal OCR, etc.

Finally, I bring all those tiny TIFF files into DT to (a) convert to PDF, (b) merge to single PDF, then, finally, OCR at 300 DPI resolution, which I set in DT preferences.

I’m very happy with the result. The worst part of this process was standing at the side of my table. Consider instead setting this contraption up on a coffee table, so you can sit while doing all the picture taking. Otherwise, be ready for neck pain!

Very nice!
Thanks for sharing, especially as I have a support ticket asking for something similar :smiley:

To clarify: Scan Tailor does not provide an OCR function itself.

As a side note, I do more or less the same with old books, but as I’m lazy and not too much of a perfectionist, I do it in another way.

With scanner:

  • Scan the pages with an OptiBook 3600, using a hacked driver to avoid delays, under a Windows XP VM.
  • Batch rename/reorder pages, as sometimes I only scan the odd or the even pages in one pass for speed.
  • Drop the images into ABBYY FineReader 15 under Windows and let it “do the magic” of improving the images, rotating them and splitting pages.
  • Correct some not-so-magic changes ABBYY has made.
  • Generate a PDF with MRC compression and desired image quality.
    (Yes, all of this is Windows-only, and I’ve not found an equivalent way to do it with a Mac).
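The rename/reorder step for separate odd and even passes can be sketched like this. It is only an illustration under assumptions: the file names, the `page_0001.tif` naming scheme and the `evens_reversed` option (for when the even pages were scanned back to front) are all hypothetical, and the functions only compute a plan without touching any files.

```python
from itertools import zip_longest


def interleave_pages(odd_scans, even_scans, evens_reversed=False):
    """Merge a pass of odd pages and a pass of even pages into reading order."""
    if evens_reversed:
        # Assumption: the even pass was scanned from the back of the book forward.
        even_scans = list(reversed(even_scans))
    ordered = []
    for odd, even in zip_longest(odd_scans, even_scans):
        if odd is not None:
            ordered.append(odd)
        if even is not None:
            ordered.append(even)
    return ordered


def rename_plan(ordered, prefix="page_", ext=".tif"):
    """Pair each file with a zero-padded sequential name (hypothetical scheme)."""
    return [(old, f"{prefix}{i:04d}{ext}") for i, old in enumerate(ordered, start=1)]


if __name__ == "__main__":
    odds = ["odd_1.tif", "odd_2.tif", "odd_3.tif"]
    evens = ["even_1.tif", "even_2.tif"]
    for old, new in rename_plan(interleave_pages(odds, evens)):
        print(old, "->", new)
```

Printing the plan first, and renaming only after checking it, avoids overwriting scans when one pass has a missing or duplicated page.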

If a book is too old or too damaged, you can end up with a destroyed book after scanning. In that case I use an iPhone:

  • Use ScannerPro (Readdle). This application used to be superior at page cropping, flattening curved pages and so on. For the last two years or so it has been a piece of crap, but it is sometimes still useful. It crashes again and again and has other issues.
  • Export a full image quality PDF.
  • Continue in Windows with the ABBYY FineReader 15 PDF editor.

However, I’m awaiting this beauty (ET24 Pro - Incomparable Professional Book Scanner | Indiegogo). Currently it is held up at Customs. I think it will change my entire scanning flow and force me to re-scan books I have already scanned. Or be a 400 euro piece of garbage. Time will tell.

That looks pretty amazing! Enjoy, and perhaps drop some feedback when you have it up and running :slight_smile:

I’ve been using the ET24 Pro scanner for a couple of days, and I think it will be my definitive scanner (until they release a new one with more DPI). It can do 320 DPI now, but it also has a 350 option and a custom one, up to 7000, that I’ve not used.

My current flow with this scanner is:

  • Run the CZUR software (on macOS; yes, the macOS version works faster and more reliably than the Windows one on the same iMac). Be careful, because it seems it does not work on Apple Silicon.
  • Scan both covers with auto-crop or manual crop in scan interface.
  • Select the dual curved page option, take the finger cots (yellow thingies), and start scanning with the pedal like a pro. It is fast: next page, handle pages with cots, pedal, next page, handle pages with cots, pedal… No delays waiting for images to be processed or sent from scanner to program, etc…
  • Close scanner interface, go back to the main program interface.
  • Batch process images, cropping the excess at the left and right sides caused by the book’s thickness.
  • Batch color image optimizing.
  • Generate the PDF with OCR and MRC.

A 400-page book takes about half an hour for the whole process if the pages are well glued and the book opens more or less well. The curve of the pages is automagically removed, even if the central curve is very pronounced.
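To put that pace in per-page terms, a quick back-of-the-envelope sketch. The figure of two pages per pedal press is an assumption based on the dual page option, and the half hour covers the whole process, not only capture.

```python
def seconds_per_page(pages: int, total_minutes: float) -> float:
    """Average seconds spent per page over the whole process."""
    return total_minutes * 60 / pages


if __name__ == "__main__":
    per_page = seconds_per_page(400, 30)
    # Assumption: each pedal press captures two facing pages.
    per_capture = per_page * 2
    print(f"{per_page} s per page, {per_capture} s per capture")
```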

And the crop processing is like magic. There are two options, each with sub-options: one removes the excess at the left and right sides caused by the book’s changing thickness as you go from the start to the end of the book, and the other simply crops or fills with white.

The idea behind the program’s cropping is enormously powerful. You have a crop baseline that is general for all pages and a special crop-line for each page. In preview mode, you can manually adjust both lines, and the program is intelligent enough to know if a page is not large enough. You can crop from side to side, or crop from the image center… and preview everything before doing it, all on the same screen. I’m in love with it.
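One way to picture the baseline plus per-page crop-line idea is the following sketch. It is a hypothetical model written for illustration, not how the CZUR software is implemented; `CropBox`, `CropPlan` and the pixel values are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class CropBox:
    """Crop rectangle in pixels (hypothetical representation)."""
    left: int
    top: int
    right: int
    bottom: int


@dataclass
class CropPlan:
    """A general baseline crop plus optional per-page overrides."""
    baseline: CropBox
    overrides: Dict[int, CropBox] = field(default_factory=dict)

    def box_for(self, page: int) -> CropBox:
        # A page-specific crop-line wins; otherwise fall back to the baseline.
        return self.overrides.get(page, self.baseline)

    def fits(self, page: int, image_width: int, image_height: int) -> bool:
        # Check that the chosen crop lies entirely inside the scanned image.
        box = self.box_for(page)
        return (0 <= box.left < box.right <= image_width
                and 0 <= box.top < box.bottom <= image_height)


if __name__ == "__main__":
    plan = CropPlan(baseline=CropBox(100, 50, 1900, 2750))
    plan.overrides[7] = CropBox(150, 50, 1900, 2750)  # page 7 needs a tighter left edge
    print(plan.box_for(1), plan.box_for(7))
    print(plan.fits(1, image_width=1800, image_height=2800))  # image too narrow for the baseline
```

Keeping the overrides separate from the baseline means that adjusting the baseline moves every page that has no special crop-line, which matches the "general for all pages" behaviour described above.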

Generated PDFs are very small if you select mid image quality and the “process images” option, which is a synonym for MRC compression and rotation compensation. You can generate a good facsimile PDF without excessive file size by selecting no image processing and 95% image quality, which enables the JPEG compression routines without visible quality loss. You don’t need professional-quality PDF processing unless you need sub-pixel MRC optimization, want to mess with the detected OCR areas inside pages, or need dictionary correction.

I have in mind to write some blog posts (in Spanish) and even do some Twitch streams or YouTube videos showing the scan process. Next month or even later.
