ScanSnap 1500M: what to install and configure

Just getting ready to install my 1500M; I already have DTPO. It looks like the ScanSnap software installer wants to install ABBYY, which I have already installed with DTPO. Do I need both?

Any other tips about the install, and about configuring DTPO for things like DT-searchable but reasonably sized PDFs?

Seeking info or pointers to an existing description.

Thanks

To send scanner output directly to DT Pro Office for OCR and storage of searchable PDFs in your databases, you don’t need to install the ABBYY FineReader OCR software that was included with your ScanSnap.

Scanning to DT Pro Office is controlled by the ScanSnap Manager application. To configure it, first uncheck the “quick” feature if that’s checked.

Open ScanSnap Manager settings and set DT Pro as the application to which output is to be sent. Also, by default, ScanSnap Manager will send originals of PDFs to your Pictures folder. That’s OK. (You can set DTPO Preferences > OCR to delete the original PDF after creating a searchable PDF.)

Click on the “Scanning” tab of ScanSnap Manager Settings. I usually use the Best (Slow) image quality setting. For most copy (if color isn’t important) I set color mode to B&W. I use the Duplex setting, which automatically detects and scans text on both sides of a sheet of paper. I check the option to continue scanning after the current scan is finished. That allows one to automatically merge sections of a document fed into the sheet feeder; when all segments of a document have been scanned, click on the “Finish” button.

Click on the “File option” tab. I use PDF as the file format to be output.

Click on the “Paper size” tab. I use automatic detection.

Click on the “Compression” tab. I move the slider all the way to the left, for minimum compression (only effective for color).

To initiate a scan, put paper copy in the sheet feeder and press the Scan button.

Now for DT Pro Office 2 Preferences > OCR:

Check the box, “Convert to searchable PDF”.

If desired, check the box, “Move to trash” for the original documents (I have it checked).

If desired, check the “Set attributes” box. Note: I have that unchecked, as I don’t bother with the attributes and don’t want each scan to be interrupted by a modal dialog that stops the OCR queue until I respond. I rename documents within the database, usually by selecting appropriate text, then right-clicking and choosing “Set Title as”.

The original PDF image is not retained after OCR and must be recreated. This can result in a considerable increase in the PDF file size. DTPO provides resolution and image quality settings that allow a compromise between file size and view/print quality.

By default, Resolution is 150 dpi and image quality is 50%. At the moment, my settings are 200 dpi resolution and 50% image quality. There’s also an option to check a box to keep the original resolution of the PDF, but with a larger file size as the result. For very large PDF documents I sometimes use the PDF Shrink application with custom settings to reduce file size.
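As a rough illustration of why the resolution setting matters for file size, here is a minimal sketch that compares the pixel count of one page at the 150 dpi default and at 200 dpi. The US Letter page size and the function name `page_pixels` are assumptions for illustration only. Actual PDF size also depends on the image quality setting and compression, so the pixel count is just an indicator.

```python
def page_pixels(width_in: float, height_in: float, dpi: int) -> int:
    """Return the pixel count of a page rasterized at the given resolution."""
    return round(width_in * dpi) * round(height_in * dpi)


# Assumption: a US Letter page, 8.5 x 11 inches.
LETTER = (8.5, 11.0)

default_px = page_pixels(*LETTER, 150)  # DTPO default resolution
higher_px = page_pixels(*LETTER, 200)   # the 200 dpi setting mentioned above
ratio = higher_px / default_px

print(f"150 dpi: {default_px:,} pixels")  # 2,103,750
print(f"200 dpi: {higher_px:,} pixels")   # 3,740,000
print(f"ratio:   {ratio:.2f}x")           # 1.78x
```

Pixel count grows with the square of the resolution, which is why a modest step from 150 to 200 dpi means roughly 1.8 times as much image data per page before compression.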

I use the “Accurate” setting for Recognition. No OCR software is perfect, and there can be character recognition errors, but ABBYY compares favorably to other OCR software for the Mac. The “Accurate” setting is slower, but as it takes place in the background, one can continue using the computer while OCR is proceeding. (Click on Window > OCR Activity to check progress.)

I use “English” as the primary language, with German and French as secondary languages.

If you wish to run OCR on a queue of documents, set DTPO Preferences > Import - Destination to send new content to the Global Inbox, or to the Inbox of the frontmost database. If I’m scanning only a few documents I leave Preferences > Import - Destination as “Set group”.

There are some other options in ScanSnap Manager that I didn’t mention, such as setting up file names of the scan output (I use YYYYMMDDTime). You may wish to look at ScanSnap Manager > Help for a complete description of the settings.
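For anyone renaming scans outside ScanSnap Manager, a date-plus-time name of this kind can be sketched in a few lines. This is not how ScanSnap Manager itself builds names; the function `scan_name` and the exact `%Y%m%d%H%M%S` layout are assumptions chosen for illustration.

```python
from datetime import datetime


def scan_name(when: datetime, ext: str = "pdf") -> str:
    """Build a file name from a timestamp: year, month, day, then time."""
    return f"{when.strftime('%Y%m%d%H%M%S')}.{ext}"


# Example with a fixed, made-up timestamp.
print(scan_name(datetime(2010, 3, 14, 9, 26, 53)))  # 20100314092653.pdf
```

Names built this way sort chronologically in Finder or in a database, which is handy until the documents are given proper titles.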

Hi Bill. Thanks for posting your configuration; it was quite helpful in getting started.

You wrote:

“The original PDF image is not retained after OCR and must be recreated. This can result in a considerable increase in the PDF file size. DTPO provides resolution and image quality settings that allow a compromise between file size and view/print quality.”

I don’t quite understand what you are saying. Do you mean that OCR creates a new PDF, discarding the original?

Also, my newly scanned and OCR’d PDFs show as PDF+Text. What does that mean? Why don’t they just show up as PDF?

Thanks - Ryan

No, the original PDF is retained, unless you have set the OCR preferences to move the original to the trash. The original PDF contains only the image, not recognizable text, and it is this image that is not retained in the new PDF.

Related to your first question, the new PDF now includes text data; the PDF+Text document kind indicates that the PDF has recognizable text.

Thanks Greg.

“No, the original PDF is retained, unless you have set the OCR preferences to move the original to the trash.”

Where exactly is the original PDF? I’m scanning from a ScanSnap and am not seeing the original file anywhere. Though now that I have read your description, I don’t see any reason to keep the original file.

- Ryan

The default setting in ScanSnap Manager saves the original PDF in your Pictures folder.

Unless you set DTPO Preferences > OCR to send the original PDF to the Trash, the originals will keep accumulating in your Pictures folder.
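If originals have already piled up, a small read-only script can show what is there before you decide what to delete. This is a minimal sketch, assuming the originals sit directly in `~/Pictures`; the function name `list_pdfs` is made up for illustration, and the script only lists files without removing anything.

```python
from pathlib import Path


def list_pdfs(folder: Path) -> list[tuple[Path, int]]:
    """Return (path, size in bytes) for each PDF directly in folder, largest first."""
    pdfs = [
        (p, p.stat().st_size)
        for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    ]
    return sorted(pdfs, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Assumption: ScanSnap Manager is still saving originals to ~/Pictures.
    pictures = Path.home() / "Pictures"
    if pictures.is_dir():
        found = list_pdfs(pictures)
        total = sum(size for _, size in found)
        for path, size in found:
            print(f"{size / 1_000_000:8.2f} MB  {path.name}")
        print(f"{len(found)} PDFs, {total / 1_000_000:.2f} MB total")
```

Check that the searchable copies are safely in your database before trashing any of the listed originals.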