PDFs from stapled brochures

If I scan a stapled brochure (after removing the staples first :slight_smile:), I end up with the following scenario, shown here for the simplest case with 2 sheets:

Sheet 1 contains pages 1 and 8 recto, 2 and 7 verso; sheet 2 contains pages 3 and 6 recto, 4 and 5 verso.

Is there any tool/script/workflow that lets me end up with a PDF with 8 pages, in order from 1 to 8?

TIA and best regards,
Franz-Josef

Ahh… recto and verso. Old printing terms I haven’t heard in years :smiley: They describe what are called “printer’s spreads”. The order you read them in is called (unsurprisingly) “reader’s spreads”.

The simplest answer is to drag and drop the thumbnails. There is nothing built into DEVONthink for this.

But there are 2 pages on each page, so you have to cut them into 2 halves before reordering …

I assumed you scanned each page individually.

Use a flatbed scanner and Image Capture (via File > Import > From Scanner or Camera) and use the selection tool to crop one side and then the other side of each page – i.e., two separate scans for each page. You’ll need to merge and reorganize the pages later.

I have used ScanTailor as a tool to clean up and split 2-up pages. It may be a consideration for your needs if they persist after a couple years.

I have also used a plugin for Adobe Acrobat Pro called Quite Imposing that can split 2-up pages. “Imposing” is the printers’ term for placing pages in a particular arrangement and sequence to enable copy to be printed and folded and read in order.

In either case, you will have images that may be in the wrong order. Fixing this is time-consuming. If you can number the pages, you can assemble a new PDF (for example) in the correct order. When I have done this, it was tedious. I did make some scripts to help, but they were specialized to my use.
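As a rough illustration of the reordering step, the position of every page in a saddle-stitched booklet follows from the sheet count alone. The sketch below is not one of the scripts mentioned above; `reading_order` is a hypothetical helper name. It assumes the scans were made in the order sheet 1 recto, sheet 1 verso, sheet 2 recto, and so on, with sheet 1 being the outermost sheet, and that each recto scan shows the higher page number on the left. If your sheets went onto the scanner the other way round, swap `left` and `right`.

```shell
#!/bin/bash
# Hypothetical helper: for a booklet made of SHEETS folded sheets, print
# which scan (1-based) and which half of that scan holds each page.
# Assumed scan order: sheet 1 recto, sheet 1 verso, sheet 2 recto, ...
reading_order() {
    local sheets=$1
    local total=$((4 * sheets))
    local page mirror s scan half
    for ((page = 1; page <= total; page++)); do
        if ((page <= total / 2)); then
            # Front half of the booklet: odd pages sit on the right of a
            # recto scan, even pages on the left of a verso scan.
            s=$(((page - 1) / 2))
            if ((page % 2 == 1)); then
                scan=$((2 * s + 1)); half=right
            else
                scan=$((2 * s + 2)); half=left
            fi
        else
            # Back half: each page shares a scan with its "mirror" page
            # from the front half and occupies the opposite half.
            mirror=$((total + 1 - page))
            s=$(((mirror - 1) / 2))
            if ((mirror % 2 == 1)); then
                scan=$((2 * s + 1)); half=left
            else
                scan=$((2 * s + 2)); half=right
            fi
        fi
        echo "page $page: scan $scan, $half half"
    done
}

reading_order 2
```

For the two-sheet example from the first post, this reports page 1 as the right half of scan 1 and page 8 as the left half of the same scan. The list can then be used as a checklist when dragging the split halves into order.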

Lately I have been using a CZUR Aura book scanner which can detect page curvature (laser lines) and the division of a 2-up page. For new scans, I find this method preferable for making a PDF that is ready to use. I find their OCR to be slow so I export a generic PDF and use Acrobat Pro for OCR.

These are in order of cost, from free to moderately expensive. I hope one of them helps you or anyone else reading this.

James

There’s a multithreaded fork of scantailor called scantailor-advanced.

It can be quite a bit faster than the older scantailor at certain operations. You may need to compile it from source; unfortunately, I can’t quite remember what changes I needed to make.

@Keeline and @jerwin: thank you for those ideas. I occasionally need to scan A5 booklets and have never had much success cropping A4 scans so that each page is exactly the same size. In any case, doesn’t that just hide the part beyond the crop lines?

I like the look of Scan Tailor, but is there a binary that will run on a Mac or do you have to build it yourself?

At the moment, it’s strictly a build your own type deal.

If I remember correctly, it requires a working installation of Qt5 and Boost, and some tweaks to the CMake script to use libc++ instead of libstdc++.

Thank you. That’s a shame, because it’s probably beyond my skills.

I found my old development snapshot in time machine. Apparently, I wrote a little build script in bash and put it in the scantailor-advanced-1.0.16 folder.

#!/bin/bash

export Qt5_DIR=/sw/lib/qt5-mac/
export Qt5Core_DIR=/sw/lib/qt5-mac/lib/cmake/Qt5Core/
export Qt5Gui_DIR=/sw/lib/qt5-mac/lib/cmake/Qt5Gui/
export Qt5Widgets_DIR=/sw/lib/qt5-mac/lib/cmake/Qt5Widgets/
export Qt5Xml_DIR=/sw/lib/qt5-mac/lib/cmake/Qt5Xml/
export Qt5Network_DIR=/sw/lib/qt5-mac/lib/cmake/Qt5Network/
export Qt5LinguistTools_DIR=/sw/lib/qt5-mac/lib/cmake/Qt5LinguistTools
export Qt5OpenGL_DIR=/sw/lib/qt5-mac/lib/cmake/Qt5OpenGL
export BOOST_ROOT=/sw/opt/boost-1_68
cmake -DCMAKE_CXX_FLAGS="-std=c++11 -stdlib=libc++" -H. -Bbuild

I typed the following commands:

mkdir build
chmod +x build.sh
./build.sh
cd build
make
sudo make install

All of this relies on fink for the support libraries (Boost, Qt 5, tiff, zlib, png, jpeg9). Other package managers will install their libraries and include files elsewhere; the build script will need to be updated to reflect that.
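For anyone on Homebrew rather than fink, the same idea might look roughly like the sketch below. It is untested and rests on several assumptions: that the `qt@5` and `boost` formulae are installed, that Qt's CMake files sit under `lib/cmake` inside the Qt prefix, and that pointing `CMAKE_PREFIX_PATH` at that prefix is enough for CMake to find the individual Qt5 modules. The image libraries (tiff, zlib, png, jpeg) would still need to be installed separately.

```shell
#!/bin/bash
# Hypothetical Homebrew variant of the fink-based build script (untested).
# Assumes: brew install cmake qt@5 boost

# Ask Homebrew where it put each package instead of hard-coding paths.
QT_PREFIX="$(brew --prefix qt@5)"
BOOST_PREFIX="$(brew --prefix boost)"

export Qt5_DIR="$QT_PREFIX/lib/cmake/Qt5"   # assumed layout of the Qt5 CMake files
export BOOST_ROOT="$BOOST_PREFIX"

# Same compiler flags as the original script; CMAKE_PREFIX_PATH stands in
# for the per-module Qt5*_DIR exports.
cmake -DCMAKE_PREFIX_PATH="$QT_PREFIX" \
      -DCMAKE_CXX_FLAGS="-std=c++11 -stdlib=libc++" \
      -H. -Bbuild
```

The remaining steps (`cd build`, `make`, `sudo make install`) would stay the same.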

Right now, the program can be invoked from the command line.

/usr/local/bin/scantailor

Technically there should be a way to package it up like a regular Mac app, but I’m too lazy to read the Qt5 docs.

I don’t have a Mac development signing key, so I’m not going to be able to distribute a binary.

And, yes, it does use all my cores.

ABBYY FineReader can do it out of the box. You’ll just need to reorder some thumbnails afterwards.

That came to my mind as well. But maybe only with the $96 Pro version?

I’m not sure if you’re referring to rearranging pages in a PDF, but that can be done in DEVONthink’s Tools > Inspectors > Contents > Thumbnails.

This is discussed in the built-in Help > Documentation > Inspectors > Content > Thumbnails.

No, rearranging the pages can also be done in Preview, as well as in DT.

If you have a split-page PDF (as you get from scanning unstapled sheets), you can split each image horizontally or vertically into two separate PDF pages within the FR 13 Pro Image Editor.

This can also be done automatically for every scanned page within a multipage PDF document.

FR also has a usable text deskewing function (image distortion correction) in the Pro version. Another $100 though…

The ABBYY OCR engine is the only usable thing for us Scandinavians with funny letters :wink:

The deskewing function is rather slow, in my experience. But, then again, I usually deal with antique documents that present somewhat of a challenge to the OCR program.

I did say “usable” :wink:
There are better apps for this out there, and I’ve tried some of them over the years, but as I pointed out, only the ABBYY OCR engine is good with Scandinavian text.
