Three Cheers for the free vFlat app for de-warping iPhone photos of book pages

Ryan_N · August 29, 2022, 7:19pm

One of my longer-term projects has been to digitize a bunch of out-of-print history and genealogy books on my shelves. I’ve been putting this off after doing a lot of research on HOW to do this, because I don’t own a scanner and didn’t want to mutilate the books themselves. I also had my fingers crossed that a decades-old open-source desktop app for the PC, called Scan Tailor, would become available for Mac users (that did happen, very recently).

I just completed my first couple of books, using a table clamp mount for an iPhone (I bought this one).

My issue, as expected, was in de-warping pages at the beginning and end of the book. I’m a bit of a perfectionist and tried, first, to manually de-warp with Photoshop. This works, but is painstakingly time-consuming.

I then tried a free app, much raved about in a few online forums devoted to book scanning. HOLY SMOKES! For a free app, this one takes the cake. The app not only cropped each page, but deskewed and, above all, de-warped the necessary pages with mind-blowing results. Better than I was able to do manually in Photoshop.

My workflow is to:

Take pictures of each page with the three-second timer mode, on iPhone. This gives me time to use my hands to flatten each page.
Launch vFlat for the initial cropping and de-skewing. I set the output to full color.
In Photos app, on my laptop, I then export all this stuff without modification (export as original).
In Lightroom, I do a batch conversion to B&W, change contrast, shadow etc.
Export from Lightroom as 300 DPI TIFF images, to a folder on the Desktop.
Import these images into Scan Tailor Advanced, which runs on Mac and has many uses for everyday OCRing, too.
Begin Scan Tailor at the content discovery phase, then do the remaining phases (set margins, and finally, the output iteration).

Scan Tailor outputs the lot, magically, as tiny files for each page. It converts the background to pure white, makes my fingers disappear, blackens and smoothens text for optimal OCR, etc.

Finally, I bring all those tiny TIFF files into DT to (a) convert to PDF, (b) merge to single PDF, then, finally, OCR at 300 DPI resolution, which I set in DT preferences.
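As a rough check on the 300 DPI figure used in this workflow, the arithmetic can be sketched in a few lines of Python. The page height (9 inches) and the 4032-pixel photo height are assumptions for illustration only, not values from this post; substitute your own book and camera.

```python
def pixels_needed(page_inches: float, dpi: int = 300) -> int:
    """Pixels required along one side of the page at the given DPI."""
    return round(page_inches * dpi)


def effective_dpi(photo_pixels: int, page_inches: float) -> float:
    """Resolution the photo actually delivers along one side of the page."""
    return photo_pixels / page_inches


# Assumed values: a page 9 inches tall that fills a photo 4032 pixels tall.
PAGE_HEIGHT_IN = 9.0
PHOTO_HEIGHT_PX = 4032

needed = pixels_needed(PAGE_HEIGHT_IN)
delivered = effective_dpi(PHOTO_HEIGHT_PX, PAGE_HEIGHT_IN)
print(needed, delivered, delivered >= 300)
```

In practice the page rarely fills the whole frame, so the delivered figure will be lower than this best case. If it stays above 300, the 300 DPI export is a downsample rather than an upscale.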

I’m very happy with the result. The worst part of this process was standing at the side of my table. Consider instead setting this contraption up on a coffee table, so you can sit while doing all the picture taking. Otherwise, be ready for neck pain!

BLUEFROG · August 29, 2022, 8:14pm

Very nice!
Thanks for sharing, especially as I have a support ticket asking for something similar.

chrillek · August 30, 2022, 6:54am

To clarify: Scan Tailor does not provide an OCR function itself.

rfog · August 30, 2022, 9:59am

As a side note, I do more or less the same with old books, but as I’m lazy and not too much of a perfectionist, I do it in another way.

With scanner:

Scan the pages: with an OptiBook 3600 with a hacked driver to avoid delays under a Windows XP VM
Batch rename/reorder pages, as sometimes I only do odd or even for speed.
Drop the images into Abbyy Finereader 15 under Windows and let it “do the magic” of improving images, rotation, and page division.
Correct some not-so-magic changes Abbyy has done.
Generate a PDF with MRC compression and desired image quality.
(Yes, all of this is Windows-only, and I’ve not found an equivalent way to do it with a Mac).
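The batch rename/reorder step, for the case where odd and even pages are scanned in separate passes, could be sketched roughly as follows. This is a minimal illustration in Python, not the tool actually used here: the function name, the `page_0001` naming scheme and the `even_reversed` flag are all hypothetical.

```python
from pathlib import Path


def interleave_pages(odd_files, even_files, even_reversed=False):
    """Return (source, new_name) pairs in reading order.

    odd_files  -- scans of pages 1, 3, 5, ... in the order they were taken
    even_files -- scans of pages 2, 4, 6, ...; pass even_reversed=True if
                  that pass was shot from the back of the book forwards
    """
    evens = list(reversed(even_files)) if even_reversed else list(even_files)
    # A book has either as many odd pages as even ones, or one more.
    if len(odd_files) - len(evens) not in (0, 1):
        raise ValueError("odd and even passes do not line up")

    numbered = [(src, 2 * i + 1) for i, src in enumerate(odd_files)]
    numbered += [(src, 2 * i + 2) for i, src in enumerate(evens)]
    numbered.sort(key=lambda item: item[1])

    # Zero-padded names sort correctly in any file browser.
    return [(src, f"page_{num:04d}{Path(src).suffix}") for src, num in numbered]


plan = interleave_pages(["a1.tif", "a2.tif", "a3.tif"], ["b1.tif", "b2.tif"])
for src, new_name in plan:
    print(src, "->", new_name)
```

The function only builds a rename plan and touches no files, so the plan can be checked by eye before anything is actually renamed.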

If a book is too old, or it is too damaged, you can end up with a destroyed book after scanning. In that case I use iPhone:

Use ScannerPro (Readdle). This application used to be superior at page cropping, flattening curved pages and so on. For the last two years or so it has been a piece of crap, though sometimes still useful. Crashes and crashes and other issues.
Export a full image quality PDF.
Continue in Windows with Abbyy Finereader 15 PDF editor.

However, I’m awaiting this beauty (ET24 Pro - Incomparable Professional Book Scanner | Indiegogo). Currently it is held up at Customs. I think it will change my entire scanning flow and force me to re-scan books I have already scanned. Or be a 400 euro piece of garbage. Time will tell.

Blanc · September 2, 2022, 7:36am

That looks pretty amazing! Enjoy, and perhaps drop some feedback when you have it up and running

rfog · September 14, 2022, 10:30am

I’ve been using the ET24 Pro scanner for a couple of days, and I think it will be my definitive scanner (until they release a new one with more DPI). It can do 320 DPI now, but has a 350 option and a custom one going up to 7000 that I’ve not used.

My current flow with this scanner is:

Run the CZUR software (on macOS; yes, the macOS version is faster and more reliable than the Windows one on the same iMac). Be careful, because it seems it does not work on Apple Silicon.
Scan both covers with auto-crop or manual crop in scan interface.
Select the dual curved page option, take the finger cots (yellow thingies), and start scanning with the pedal like a pro. It is fast: next page, handle pages with cots, pedal, next page, handle pages with cots, pedal… No delays waiting for images to be processed or sent from scanner to program, etc…
Close scanner interface, go back to the main program interface.
Batch process images, cropping the excess of left and right sides generated by the book thickness.
Batch color image optimizing.
Generate the PDF with OCR and MRC.

A 400-page book takes about half an hour for the whole process if the pages are well glued and it opens more or less well. The curve of the pages is automagically removed even if the central curve is very pronounced.

And crop processing is like magic. There are two options with sub-options: one removes the excess at the left and right sides caused by the changing thickness of the book as you go from start to end, and the other simply crops or fills with white.

The idea behind the program’s cropping is enormously powerful. You have a crop baseline that is general for all pages and a special crop line for each page. In preview mode, you can manually adjust both lines, and the program is intelligent enough to know if a page is not large enough. You can crop from side to side, or crop from the image center… and preview it all before doing it, all on the same screen. I’m in love with it.
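The baseline-plus-override idea can be illustrated with a small Python sketch. To be clear, this is not CZUR’s code or API: the `CropPlan` class, its field names and the clamping behaviour are hypothetical, written only to show how one general crop box with per-page exceptions might be modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Box = Tuple[int, int, int, int]  # left, top, right, bottom, in pixels


@dataclass
class CropPlan:
    """One crop box shared by all pages, plus optional per-page exceptions."""
    baseline: Box
    overrides: Dict[int, Box] = field(default_factory=dict)

    def box_for(self, page: int, page_size: Tuple[int, int]) -> Box:
        # Use the page's own crop line if one was set, else the baseline.
        left, top, right, bottom = self.overrides.get(page, self.baseline)
        width, height = page_size
        # Clamp to the image so a box larger than the page cannot overrun it.
        left, top = max(0, left), max(0, top)
        right, bottom = min(width, right), min(height, bottom)
        if right <= left or bottom <= top:
            raise ValueError(f"crop box is empty for page {page}")
        return (left, top, right, bottom)


plan = CropPlan(baseline=(100, 50, 2100, 3050),
                overrides={7: (150, 50, 2100, 3050)})
print(plan.box_for(1, (2400, 3200)))  # baseline applies
print(plan.box_for(7, (2400, 3200)))  # page 7 uses its own box
print(plan.box_for(2, (2000, 3000)))  # baseline clamped to a smaller image
```

Keeping the exceptions in a separate mapping means that adjusting the baseline moves every page except the ones tuned by hand.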

Generated PDFs are very small if you select mid image quality and the “process images” option, which is a synonym for MRC compression and rotation compensation. You can generate a good facsimile PDF without excessive size by selecting no image processing and 95% image quality, which enables the JPEG compression routines without visible quality loss. You don’t need professional-quality PDF processing unless you need sub-pixel MRC optimization, want to mess with detected OCR areas inside pages, or need dictionary correction.

I have in mind to write some blog posts (in Spanish) and even do some Twitch streams or YT videos showing the scan process. Next month or even later.

Ryan_N · September 27, 2022, 2:56pm

Thank you for posting this!! I’ve been eyeing this Chinese pile of plastic for quite a while now. How well does it flatten/dewarp? The marketing images for this scanner seem to suggest it somehow measures length from the overhead lens to the page surface, with a laser. If true–and it then somehow algorithmically makes distorted text near a tight binding look as though the page was actually cut from the book and scanned on a flatbed scanner, I think I’m sold.

Lots of reviewers also complained of it crashing with a Mac. Any similar experience?

As a side note–and I really should make a separate post about this–the most significant game-changer for me, with book scanning and/or making PDFs from books already scanned and online–is the free app called Scan Tailor, mentioned in OP. Trust me on this; you’ll want to suffer through the process of learning how to use it. I have downloaded in the past 600+ page books, page by page, as .jpg images because that was the only downloadable format available by the hosting repository. I used to do nothing to these images except bring them into DT, convert to PDF, then OCR as a merged file. This results not only in absolutely massive file sizes, but also, an OCR job that can take upwards of six hours for a 650-ish page book.

Now I run Scan Tailor first, which takes around 15 minutes to overhaul each page image. Result: I can now OCR a book of that size in about a half hour, or less. ABBYY’s API in DT absolutely flies like a jet, and the finished PDF is a tiny file size, and the OCR is more accurate, and the Scan Tailor app is free.
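To put those timings side by side, here is a small Python calculation. It uses the approximate figures from this post (about six hours before, versus about 15 minutes of Scan Tailor plus about 30 minutes of OCR, for roughly 650 pages) and assumes the 15 minutes covers the whole batch rather than a single page; treat the results as ballpark numbers.

```python
def summarize(pages: int, minutes_before: float, minutes_after: float) -> dict:
    """Compare two end-to-end processing times for the same book."""
    return {
        "speedup": minutes_before / minutes_after,
        "sec_per_page_before": minutes_before * 60 / pages,
        "sec_per_page_after": minutes_after * 60 / pages,
    }


# Approximate figures from the post; the 15-minute batch time is an assumption.
PAGES = 650
BEFORE_MIN = 6 * 60   # OCR on the raw downloaded images
AFTER_MIN = 15 + 30   # Scan Tailor pass plus OCR on the cleaned pages

stats = summarize(PAGES, BEFORE_MIN, AFTER_MIN)
print(f"speedup: {stats['speedup']:.1f}x")
print(f"before:  {stats['sec_per_page_before']:.1f} s/page")
print(f"after:   {stats['sec_per_page_after']:.1f} s/page")
```

With these inputs the whole job comes out about eight times faster, even counting the extra Scan Tailor pass.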

I must admit however to not knowing how to use all the various settings in the Scan Tailor Advanced “output” settings. Still looking for a decent guide on what the various smoothing filters do, and so forth.

Below is an example of a portion of a pre-processed downloaded page from a book. This paragraph comes from a page that was originally 204 KB (but some pages were 800 KB+ !!):

After running Scan Tailor, the page was an 88 KB TIFF file. I couldn’t crop the TIFF and post it here (uploading that filetype was not possible), but that doesn’t matter anyway. Here’s a screenshot of same:

I downloaded Scan Tailor Advanced from GitHub based on the developer’s post as a comment at the bottom of this blog post. Note: the blog post is incorrect about the install procedure. All you need to do is download the .dmg file from GitHub, and it runs. No need for Homebrew, terminal commands etc. Edit: Here’s the GitHub link.

chrillek · September 27, 2022, 3:44pm

Do I understand the description in the repository correctly that this is an interactive application, or can you run it from the command line, too?

Ryan_N · September 27, 2022, 5:23pm

I think it’s only GUI based.

rfog · September 27, 2022, 7:17pm

I think you aren’t sold, but sold-out.

I haven’t experienced any crash on my 2019 iMac 27" with custom SSD disks (yes, two), and the program is what the advertising says and more. Of course, you must learn to use it, as it needs some tricks with finger positions, light configuration, correct book position, etc., but once you are comfortable with the program… oh, man, it is like magic.

It flattens curved pages. Really flattens them via 3 laser beams (or one running 3 times very fast); you can see the red lines going across the page curves. There is a thickness limit, of course, but I’ve scanned 1600-page bible-paper books without any issue (look at the images). Apart from that, it has very reliable and productive batch image processing.

Photo quality is not like a flatbed scanner, but it is enough, and it is fast. Blazing fast.

I think it is worth it.

tudoreynon · January 13, 2023, 4:28pm

Wish I had found this before I posted my own inquiry today. Hope I am not wasting people’s time here. Do you have any new tips? I am pretty much in the position you describe, in fact. Some paperbacks now well over 40 years old are starting to break up, yellow and get brittle, but I don’t want to destroy them.
I appreciate you are a perfectionist, but would this work well if one was not too fussy as it were?
My need is really to be able to carry round only a couple of pages at a time which I need to mull over. More like a math thing than a ‘reading’ project if you see what I mean. I don’t even need to keep them for long, though the book will stay in its traditional place for sure.
In most cases I even know exactly where the passages I need to reflect on are, very rarely over four pages in length.