Problem importing PDFs / missing pages

betterbutter · February 1, 2010, 3:45am

Hi.

I’ve tried a number of times now to scan a 25-page document into DTPO with my ScanSnap 1500.

Somewhere around pages 12-15, pages start going missing: they appear blank.

I’ve scanned them in 4 times and every time, one to three pages are blank.

I’ve tried with OCR both on and off in DTPO, and it makes no difference either way.

When I open that document in Acrobat, the pages appear fine. However, when I open the imported version in DT, the pages are blank.

When I export that DT version back out of the database, or try to open it in Acrobat, it opens, but when it gets to the pages in question it returns the error “There was a problem reading this document.” So my guess is that the file is somehow being corrupted during the import into DT. That scares me, because I’ve more or less committed to making my office paperless, and I’m now worried that my documents imported into DT are not stable.

Does anyone have a solution to this, or has anyone else encountered PDFs that are fine in Acrobat but become corrupt upon import into DT?

Thanks

korm · February 1, 2010, 10:59am

ScanSnap Manager creates a file on disk and writes pages to it as the scan progresses. The manager does not know in advance how many pages to expect. Sometimes the manager pauses between pages, for processing. At that time the file is not a complete PDF (with internal PDF metadata). If you are scanning to the DT global inbox it is possible for DT to grab the file before it is “finished”. This can result in various errors and damage.
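To illustrate what "not a complete PDF" can look like on disk, here is a minimal Python sketch of a completeness check. It relies only on the general PDF file layout (a `%PDF-` header at the start and a `%%EOF` marker near the end). The function name `looks_like_finished_pdf` and the `tail_bytes` parameter are made up for this example, and nothing here reflects how ScanSnap Manager or DT actually behave internally. It is a heuristic, not a validator: a file that passes may still be damaged.

```python
import os


def looks_like_finished_pdf(path, tail_bytes=1024):
    """Heuristic check that a PDF has been written out completely.

    Assumption: a finished PDF begins with the bytes %PDF- and carries
    a %%EOF marker within the last `tail_bytes` bytes of the file.
    """
    with open(path, "rb") as f:
        head = f.read(5)                    # expected to be b"%PDF-"
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - tail_bytes))   # jump to the tail of the file
        tail = f.read()
    return head == b"%PDF-" and b"%%EOF" in tail
```

A file that is still being written by the scanner software would typically fail this check because the trailer has not been written yet.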

I’ve discovered the foregoing in my own experience with ScanSnap, particularly long multi-page scans and documents with numerous images. For this reason I don’t scan into DT directly, but into a working folder for later import into DT. Undoubtedly others have different approaches.

(In my case, I use Hazel to examine the file and copy it to the DT inbox after the file reaches a pre-determined age. The Hazel step is entirely unneeded.)
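The "wait until the file reaches a certain age" idea can be sketched in Python as follows. This is not Hazel and not korm's actual rule, only an illustration of the same approach: a scan is copied onward once it has gone unmodified for a while. The folder arguments, the 60-second default, and the function names are all assumptions for the example. The location of the DT inbox on disk is not given in this thread, so `inbox_dir` is a placeholder.

```python
import shutil
import time
from pathlib import Path


def is_settled(path, min_age_seconds=60, now=None):
    """True if the file has not been modified for at least min_age_seconds."""
    now = time.time() if now is None else now
    return now - Path(path).stat().st_mtime >= min_age_seconds


def copy_settled_pdfs(working_dir, inbox_dir, min_age_seconds=60):
    """Copy PDFs that have stopped changing from working_dir to inbox_dir.

    Both directories are hypothetical placeholders. Files already present
    in inbox_dir under the same name are skipped.
    """
    inbox = Path(inbox_dir)
    copied = []
    for pdf in sorted(Path(working_dir).glob("*.pdf")):
        target = inbox / pdf.name
        if target.exists():
            continue                      # already copied on an earlier run
        if is_settled(pdf, min_age_seconds):
            shutil.copy2(pdf, target)     # copy2 also keeps timestamps
            copied.append(target)
    return copied
```

Run periodically (for example from a scheduled job), this leaves a scan that is still in progress alone, because its modification time keeps changing while pages are being added.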

betterbutter · February 2, 2010, 5:04am

Thanks for your reply to my problem.

At first glance I think your suggestion about importing into DT could be right. However, there is one catch: when I scanned to a folder on my computer and then opened the document in Acrobat, it looked and worked fine.

Importing that document from my computer into DT (by dragging it into a DT folder) then produced the same error of missing pages, albeit on different pages. So I know it’s not a communication or processing error between the scanner and DT.

Something happens to some PDFs when they are brought into DT that produces errors or corrupted, incomplete PDFs.

Seems a mystery. In the meantime I divided the document into 3 parts of 8 pages each, which seemed to work.

I don’t have many long documents, so the problem is averted for the moment, but the thought of importing important documents into DT and having them become corrupt is frightening.

betterbutter · February 2, 2010, 5:07am

Another thought.

I keep my databases in DropBox.

Could that be causing the errors, especially if DropBox is updating while the document is still processing?

korm · February 2, 2010, 11:50am

That’s the problem. Lots of timing issues introduced (scanner to folder, folder to DT, folder to DropBox, etc.). DropBox syncing problems have been well documented in these forums.