Loss of scan quality when importing to DTPO

cgraber · December 4, 2007, 10:52pm

I’m a new DTPO user, and I have been systematically moving all of my scanned .pdfs from Finder into DTPO. I have a paperless office, so scan quality is very important to me: I need future printouts of my documents to look as good as if made by a copier. I use a ScanSnap and scan at 600 dpi, which is good enough for my needs. However, I’ve just noticed, after importing about 1000 documents, that importing the scanned documents greatly reduces the scan quality, so a printout of the imported .pdf looks inferior to a printout made before the import. The file size is also substantially smaller.

I have even tried adjusting the OCR settings to 600 dpi and 100% Image Quality. This improved the scan quality, though it is still not as good as it was before the import, and the file is now 3x the size of the original.

Any tips?

If nothing works, I suppose I could just import the .pdfs w/o the OCR treatment and wait for version 2.0 and rely on Spotlight to search my .pdfs.

Thank you for any help,

Cameron

annard · December 5, 2007, 12:10am

The IRIS software we are using is not capable of keeping the original image intact, so you will always get artifacts. However, in my tests I found that the defaults are quite usable, but YMMV. If space is not an issue for you, you could write an Automator workflow that imports the scan as is and then converts it to a searchable PDF while keeping the original. If you set the image dpi and quality to match your screen, you would end up with less disk space overhead. You would still have full search capability, and the original would be there for printing.
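To give a rough feel for why a screen-resolution copy adds little overhead, here is a minimal sketch that compares raw pixel counts at two resolutions. The US Letter page size and the 100 dpi "screen" figure are assumptions chosen for illustration, not values taken from DTPO or the ScanSnap, and `page_pixels` is a hypothetical helper.

```python
def page_pixels(width_in: float, height_in: float, dpi: int) -> int:
    """Number of pixels in a page rendered at the given dpi."""
    return round(width_in * dpi) * round(height_in * dpi)


# Assumptions: US Letter page; 100 dpi stands in for "screen" resolution.
WIDTH_IN, HEIGHT_IN = 8.5, 11.0

scan_pixels = page_pixels(WIDTH_IN, HEIGHT_IN, 600)    # archival scan
screen_pixels = page_pixels(WIDTH_IN, HEIGHT_IN, 100)  # searchable copy

# Pixel count grows with the square of the dpi.
print(scan_pixels, screen_pixels, scan_pixels / screen_pixels)
```

Under these assumptions the 600 dpi page has 36 times as many pixels as the 100 dpi copy. Compressed file sizes will not follow that ratio exactly, since JPEG compression depends on the page content and the quality setting, but the searchable copy should be a small fraction of the original.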

I fail to understand how Spotlight would help here, because it doesn’t have OCR capabilities. If you put the documents in DT without OCR, then regardless of the future database structure there is nothing for Spotlight to index other than the filename (which with the ScanSnap Manager is quite useless).

cgraber · December 5, 2007, 2:03pm

Thank you very much for your reply. My thought on Spotlight is that Spotlight searches the contents of .pdf files, so once version 2.0 has Spotlight support, even if my documents do not have OCR added, I will still be able to search the text of the documents via Spotlight. Is my assumption incorrect?

Thank you,

Cameron Graber

annard · December 5, 2007, 2:07pm

The PDF documents generated by the ScanSnap Manager are JPEG files wrapped in a PDF skin and as such do not contain any textual information that will allow you to find them using any indexing technology. You need to run OCR on them first.
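If you want to check which of your files are image-only, a crude heuristic is to look at the raw bytes of the PDF: a page that is just a wrapped image usually references an image object but no font. The sketch below assumes exactly that, and `looks_image_only` is a hypothetical helper, not part of DTPO or any PDF library. It can be fooled by PDFs that store their object dictionaries in compressed streams, so treat the result as a hint rather than proof.

```python
import re


def looks_image_only(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the raw PDF bytes mention an image but no font.

    Assumption: object dictionaries are stored uncompressed, so the
    /Subtype /Image and /Font names are visible in the raw bytes.
    """
    has_image = re.search(rb"/Subtype\s*/Image", pdf_bytes) is not None
    has_font = re.search(rb"/Font\b", pdf_bytes) is not None
    return has_image and not has_font
```

To try it on a file, read the file in binary mode, for example `looks_image_only(open("scan.pdf", "rb").read())`, where `scan.pdf` is a placeholder name. Files that come back `True` are the ones that likely need OCR before any indexer can find text in them.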

cgraber · December 5, 2007, 2:27pm

Oh, ok. Thanks for the clarification.

Cameron Graber