Can't get PDFs to OCR using ScanSnap S1500

Hi,

This is driving me crazy… I may be missing something very simple here, but I can’t seem to get my PDFs to OCR using ScanSnap Pro. I’ve tried pretty well every setting on the ScanSnap, even upping the quality to maximum (see image 1). I’ve also tried OCR by ScanSnap before entry to DTP. I attach a screenshot showing a document which was scanned at maximum resolution, yet it still seems to have a resolution of only 402 x 606 pixels according to the DTP menu bar.

The document is 15 pages, slightly smaller than A5.
Using Mac OS X 10.6.6
DTP 2.0.9
Scansnap S1500 3.0.23.1608

When I try to convert the PDF to text, I only get a few characters!

I have not set up the ScanSnap to send all documents to DTP, as I scan a lot of business info which only needs archiving. There’s no need to clog up my database with this.

Can anyone help?

Thanks,

Joe Lafferty


What are your OCR settings in the DEVONthink preferences? Unless you have ‘Same as scan’ checked, DT will use the resolution/quality settings to the left of the check box.

That was fast!

Meant to mention this, Greg: yep, I’ve got it set to ‘Same as scan’, see below.
But I see it looks like I have multiple languages selected somehow. Maybe this is the problem?

Joe

I don’t believe the language settings would have any impact; mine are set the same as yours.

Also, I could be mistaken but I don’t believe that what you have highlighted is the image resolution. That’s the image size of the scanned document reported in pixels, and 402 x 646 pixels is ~5 1/3" x 8 2/3", or the size of A5 paper. I always get a number around 620 x 820 for 8 1/2" x 11" paper regardless of the resolution of the scan.
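As a rough check of the arithmetic above, here is a minimal sketch that converts a reported pixel size to inches. The figure of 75 pixels per inch is an assumption inferred from the numbers quoted in this post, not a documented DEVONthink value, and the function name is made up for illustration.

```python
def pixels_to_inches(width_px, height_px, pixels_per_inch):
    """Convert a pixel size to inches at the given pixels-per-inch value."""
    if pixels_per_inch <= 0:
        raise ValueError("pixels_per_inch must be positive")
    return width_px / pixels_per_inch, height_px / pixels_per_inch


# Assumed value: roughly 75 pixels per inch reproduces the sizes quoted above.
ASSUMED_PPI = 75

sizes = {"A5-ish scan": (402, 646), "8 1/2 x 11 scan": (620, 820)}
for label, (w, h) in sizes.items():
    w_in, h_in = pixels_to_inches(w, h, ASSUMED_PPI)
    print(f"{label}: {w} x {h} px is about {w_in:.2f} x {h_in:.2f} in")
```

If the reported numbers stay the same when the scan resolution is changed, they describe the page size rather than the scan resolution.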

Have you tried scanning with Image Capture? That might help isolate the problem.

Greg,

I spent a bit of time trying to scan with Image Capture, but can’t even see the S1500! I searched a bit on Apple and other sites, but to no avail. Very frustrating… I just tried to reinstall the ScanSnap software, but I see it has kept all my old preferences, so clearly the uninstall didn’t get it all!

Thanks for your help. How might I see the scanner with Image Capture?

Joe

Does DTPO give you the option to search for the scanner in the browser? That’s the way I need to locate my Epson wifi scanner.

Edited to add: in case you have not yet seen this, DT has a FAQ on scanning that might be helpful.

Greg,
yep, I get that dialogue box, but when I search it comes up blank, apart from my iPhone, which is plugged in.
I don’t normally get this stuck!!
Joe

I’m out of ideas, but perhaps Bill_D or one of the members that use a ScanSnap can offer a solution for you. Good luck!

Greg,
thanks for the edit. I didn’t find that FAQ section earlier!
I was very hopeful that removing the plist setting mentioned would do the trick, but no, I’m still getting the same file size and no text when I convert to text…
Joe

Thanks, Greg!
I just tried removing all the plist settings from ScanSnap, but even this doesn’t help…
Anyone else?
Joe

  1. All settings for the resolution, etc. of the original scanner output are controlled by ScanSnap Manager. The settings in DT Pro Office Preferences > OCR then determine processing and characteristics of the searchable PDF in DTPO. I see that you chose DEVONthink Pro as the destination application in ScanSnap Manager Settings’ Application tab; that’s the proper choice.

  2. Make certain that ‘Use Quick Menu’ is NOT checked in ScanSnap Manager.

  3. If OCR software was supplied with your scanner, DO NOT check any option to perform OCR in your Fujitsu-supplied software.

  4. Under ScanSnap Manager Settings ‘Save’ tab, DO NOT select the ‘Inbox’ place in the Finder as the destination to save the original scan output. The default setting is your Pictures folder – that’s OK.

I’ve attached screenshots of the settings I use most of the time, both in ScanSnap Manager Settings and in DTPO Preferences > OCR. In the latter, I don’t check the box to use the original scan resolution, as that can result in huge files without commensurate viewing quality improvement. I’m usually satisfied with the default 150 dpi/50% image quality. For receipts and other short items I’ll often drop down to FAX quality of 96 dpi, as that results in searchable PDFs that are smaller than the original scan image.
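To give a rough feel for why a lower dpi setting yields smaller files, this sketch computes the pixel dimensions of a page at a given dpi. The A5 dimensions (148 x 210 mm) and the function name are assumptions for illustration only. Actual file size also depends on the image quality (compression) setting, so pixel count is only a proxy.

```python
MM_PER_INCH = 25.4


def scan_pixels(width_mm, height_mm, dpi):
    """Return the (width, height) in pixels of a page scanned at the given dpi."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px


# Assumed page size: A5 (148 x 210 mm), close to the document in this thread.
A5_MM = (148, 210)

for dpi in (96, 150, 300):
    w, h = scan_pixels(*A5_MM, dpi)
    print(f"{dpi} dpi: {w} x {h} px, about {w * h / 1e6:.2f} megapixels")
```

Pixel count grows with the square of the dpi, so a 96 dpi page carries about 41% of the pixels of the same page at 150 dpi.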

I get very good text recognition accuracy – assuming, of course, that the original paper copy isn’t blemished and doesn’t have artifacts such as handwritten underlining, highlighter marks, etc. and doesn’t have very small or unusual fonts.





Bill,
thanks for your clear instructions and screenshots. I had already done most of what you suggested, following the earlier instructions. This time I followed what you outlined to the letter, and the result is a much smaller file, as you suggest (2.9MB as opposed to 7.6MB). See screenshot 1.

DTP tells me that it has scanned, and it seems to take a long time to recognise the text.
However, when I try to click on the PDF to select text, nothing happens. And when I try to convert to Rich or Plain Text I only get a few characters, see screenshot 2.

By the way, your screenshots look like an earlier version of ScanSnap Manager, so I uninstalled my v23 and reinstalled v20 from the DVD that came with the scanner, which is OK with OS X 10.6. Same results.
So, I’m scanning with the right settings, and no OCR in ScanSnap. When the scan gets into DTP it seems to be doing the ‘work’ behind the scenes to convert, but something’s not working.
By the way, the document is clean, clear A5 black text on white paper with no markings or annotations.
…a mystery to me!
Regards,
Joe

I don’t think I’m adding anything new to the dialog, but I have a 1500M and these are my settings in the 1500M version of ScanSnap Manager and in DTPO. The only other tab I’ve set in ScanSnap Manager is the Save tab, to change the default folder to another place. All the other tabs (Scanning, File Option, Paper, Compression) are vanilla out-of-the-box. I’ve never adjusted any of them. With these settings, documents go to DTPO’s ABBYY engine and get recognized, usually in about 10 seconds per page or less.

Thanks for your contribution. I think there is a problem with DTP OCR somehow…
I hope I don’t have to uninstall and reinstall! I might email in a support request to see if they can help.
Joe

Joe, when you perform a scan, switch to DTPO and choose Window > OCR Activity. Does that window remain blank, or does it display activity?

If OCR isn’t working properly, check the Console Log for messages corresponding to the time period of the scan/OCR attempt. Copy the messages and send them in a message to Support.

Yeah, my ScanSnap dates back to 2005 and has been a real workhorse. Later models of the ScanSnap are a bit faster, but I haven’t yet been able to justify getting one. :)

Hi Bill,

Just emailed you.

Re your questions:

  1. The OCR window displays activity:
  • recognising pages
  • collating pages
  • importing
    So it looks like it’s doing real work.

Re 2: I can see a window called ‘log’, but this is blank.
I don’t know how to find the ‘Console Log’, sorry.

I just exported the OCRed PDF from my DTP database to the desktop and ran an OCR in Adobe, which worked fine. I can click and select the text in this file, but not in any of the files in DTP!

Joe

In your Applications/Utilities folder, find and launch Console.

In the left hand pane, select this item at the top of the list

Scroll the log listing (right hand pane) down to the bottom. Attempt a scan. If messages appear in the Console log from DEVONthink, copy (select the rows with messages) and paste them in a message to DTech Support.

You haven’t mentioned reinstalling DTPO and the ABBYY plugin. Since nothing else seems to solve the problem, I’d suggest doing that.

To remove the ABBYY plugin so that DTPO will reinstall it, here is Christian’s advice (from another posting):

Thanks for the tip. I’ve now found Console and have sent the file to DTP support.
I actually tried a reinstall of ScanSnap Manager and also DTP, and it seemed to reinstall ABBYY without my removing the file as you suggest. But perhaps that’s left behind a rogue setting?

Joe

That’s … strange.

You’ve mentioned “DTP” a couple times in your posts. Which version of DEVONthink do you have?

OK, I just posted a reply but it seems not to have made it…

In response to your question, I have DEVONthink Pro Office v 2.0.9
MacBook Pro 3.06GHz Intel Dual Core 2 Duo, 8 GB RAM
OS X v 10.6.6
using ScanSnap S1500 with
ScanSnap Manager v 3.0 L23

Another thing: I tried to reinstall ABBYY using the process you outlined, and instead of installing ‘just for my mac’ I installed the larger file that works on any Mac. It made no difference.

Incidentally, I have PDFs scanned in a while ago in which I can select text with no problem. And if I open the scanned PDF in Adobe Acrobat Pro (I have CS5 for Mac), then Adobe seems able to recognise the text fine.

Very strange…

Joe