DTPO OCR creates 10x bigger files than ScanSnap!!

padillac · August 24, 2015, 6:34pm

All these years I’ve been using DTPO’s built-in OCR technology… and wasting tons of hard drive space.

I scanned the same three-page document twice: once using DTPO’s built-in OCR, and once using ScanSnap’s OCR. The difference?

DTPO: 4.7 MB
ScanSnap: 431.2 KB

And ScanSnap seems to have recognized words better, too: if I search for a word that shows up under a stamp, it finds the ScanSnap document but not the DTPO one.

Crazy!!
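
For anyone who wants to check the "10x" figure, here is a quick back-of-the-envelope sketch. It assumes the sizes above are in decimal units (1 MB = 1000 KB); the function names are just illustrative and not part of any tool mentioned here.

```python
# Rough comparison of the two file sizes quoted above.
# Assumption: sizes are reported in decimal units (1 MB = 1000 KB).
KB = 1000
MB = 1000 * KB


def size_ratio(larger_bytes, smaller_bytes):
    """How many times bigger the larger file is."""
    return larger_bytes / smaller_bytes


def percent_saved(larger_bytes, smaller_bytes):
    """Space saved by the smaller file, as a percentage of the larger one."""
    return 100.0 * (1 - smaller_bytes / larger_bytes)


dtpo = 4.7 * MB        # DTPO built-in OCR result
scansnap = 431.2 * KB  # ScanSnap OCR result

print(round(size_ratio(dtpo, scansnap), 1))     # 10.9
print(round(percent_saved(dtpo, scansnap), 1))  # 90.8
```

So under that assumption the DTPO file is roughly 11 times the size of the ScanSnap one, which is about a 91% saving for the ScanSnap version.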

cgrunenberg · August 25, 2015, 7:43am

Which OCR settings do you use?

padillac · August 25, 2015, 3:00pm

Screen Shot 2015-08-25 at 8.58.36 AM.png

TruckTurner · August 28, 2015, 9:42pm

Without commenting on the OP’s settings, I have to say that the standalone ABBYY FineReader program does a MUCH better job than the built-in DT OCR engine (in terms of file size, not quality). The file size difference is sometimes as great as 80%. I’ve always been pleased with the quality of the DT OCR engine, but those with hard drive space concerns may want to look into the standalone program.

hnmif · September 2, 2015, 11:02pm

I also raised this issue a while back in the thread “Large PDF file growth when OCRing”.

hf

alanshutko · September 3, 2015, 12:49am

DTPO uses Apple’s PDFKit to save PDFs. Apple’s code is not as space-efficient as it could be.

padillac · September 8, 2015, 4:07pm

I don’t have a separate ABBYY program installed… I only have whatever comes with ScanSnap.

Is there any way to use that to convert documents, instead of DTPO’s built-in?

At this point, the only way I can think of is to print out the document and scan it in again… obviously what I save in space I lose in time.

It would be neat if I could “scan-from-file” and get the much smaller OCR output from ScanSnap.

korm · September 8, 2015, 4:59pm

(wrong answer; removed)

padillac · September 8, 2015, 5:03pm

Sorry, I meant that I already have a non-searchable PDF and want to use ScanSnap to OCR it, because it yields much smaller files than DTPO (as mentioned in the initial post).

I don’t see a way to load a PDF through ScanSnap… it only does its thing when scanning a paper document.

alanshutko · September 9, 2015, 1:48pm

Unfortunately, the only way to get the ABBYY engine that comes with the ScanSnap to OCR a document would be to print it and scan it again. That bundled engine is limited to OCRing documents that come from the scanner, to induce you to purchase the full FineReader product if you want to OCR general documents.