Search doesn't find all matches

I am using a Fujitsu ScanSnap with DEVONthink Pro Office. I am a new user and wonder why, when I try a search, the results vary. Sometimes a word isn't found at all, or it is found in fewer documents than it actually appears in. Should DT find every instance of a word in every document? Are there settings in ScanSnap Manager, Adobe or DT that I should adjust?

Yes, DEVONthink should find every instance of every word in documents that meet the search criteria.

But if you are working with scanned documents that have been OCRed, it’s possible that a word that you can read in the image layer of the PDF wasn’t recognized properly and so doesn’t exist in the searchable text layer.

No OCR software is 100% accurate in recognizing the characters and words in images of printed text. Indeed, that's the basis of login security checks (CAPTCHAs) that show the human reader characters that are distorted, misaligned or in strange-looking fonts.

We recommend scanning at 300 dpi so that the OCR software receives images of sufficiently high resolution.

Even so, paper copy that contains unusual or small fonts, or blemishes such as coffee stains, penciled underlining or other marks, can result in OCR mistakes.

If you have a PDF in which you can see a word in the screen image that cannot be found by searching that document, try this: select the document in the view window list and choose Data > Convert > to plain text. This produces a new text document that contains the text layer of the PDF. You will probably find that the word is missing from, or misspelled in, the OCR result.

The ABBYY OCR software used in DT Pro Office is as good as it gets for OCR on the Mac. I use the highest setting for accuracy in Preferences > OCR and I choose languages appropriate for the copy I’m scanning. I scan at 300 dpi and I generally use the black & white setting to produce maximum contrast scans. Even so, for reasons such as those noted above, some copy will produce errors when the scans are OCRed.

I’m extremely pleased with the accuracy of OCR of scans produced by my ScanSnap. Not all scanners are of equal quality, as is also the case with digital cameras. And in the case of flatbed scanners, don’t let the glass plate get smudged and dirty, as that will reduce image quality (like a dirty lens in a camera). Some scanners provide a routine to calibrate (optimize) scans; use it periodically if that’s available.

Scanners have mechanical components. Either the paper copy is moved through a feeder, or the optics move across the paper copy. Misalignment or speed variations will reduce image quality.

I think my problem may be my ScanSnap settings. In the ScanSnap settings, which application should I choose: Scan to File, Adobe, or Scan to Searchable PDF?

If “Use Quick Menus” is checked, uncheck it. You can then open the configuration tabs in ScanSnap Manager and set it up to scan directly to DT Pro Office.

I have a variation on this problem. I scanned a document on an HP Officejet Pro 8500, imported it into the main Inbox with OCR, then copied it to a subfolder. No problem so far.

I can open the document, choose a word that is most likely found only in that document, highlight it and look it up in the dictionary, and I get a definition without any problem. So the OCR has worked fine.

However, if I go out to the main Inbox and search for the same word, the document cannot be found. If I go to that subfolder and run the search again, hey presto, it finds the document. What am I doing wrong? I'm on a 30-day trial license and like what I see so far; this is just a niggle to overcome.

I suspect the reason the document isn’t being found is that the search isn’t set for the database in which the document is located.

Try the full Search window (Tools > Search). Check to make certain that the search spans all open databases, or the specific database in which that document is located.