I have been using DEVONthink Pro Office (DTPO) with a Fujitsu ScanSnap for a few days. While it is very nice to be able to keep scanning while OCR runs in the background, there are some annoying problems with how DTPO handles OCR errors.
Problem 1.
When multiple PDF files are queued for OCR and one of them causes an OCR problem of some sort, I can choose to give up OCR on that file and continue. In reality, however, this aborts the rest of the OCR job. Files that were waiting to be OCRed in DTPO are no longer processed, and they are not imported as-is either. So far, I haven’t found any workaround other than manually digging out the PDF files saved by the scanner driver, renaming them, dragging them into DTPO, and converting them to searchable PDFs. This is very annoying.
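Until this is fixed, the digging and renaming part of the manual recovery could probably be scripted. Here is a rough sketch in Python, assuming the scanner driver saves its PDFs into a single folder. The function name, the folder paths, and the `scan_0001.pdf` naming scheme are all made up for illustration; nothing here talks to DTPO or the ScanSnap driver.

```python
import shutil
from pathlib import Path


def stage_unprocessed_pdfs(scan_dir, staging_dir, prefix="scan"):
    """Copy every PDF in scan_dir into staging_dir under a sequential name.

    Files are ordered by modification time so the numbering follows the
    scanning order. The originals are left untouched.
    """
    scan_dir, staging_dir = Path(scan_dir), Path(staging_dir)
    staging_dir.mkdir(parents=True, exist_ok=True)

    # Oldest first; the file name breaks ties between identical timestamps.
    pdfs = sorted(
        (p for p in scan_dir.iterdir() if p.suffix.lower() == ".pdf"),
        key=lambda p: (p.stat().st_mtime, p.name),
    )

    staged = []
    for index, src in enumerate(pdfs, start=1):
        dest = staging_dir / f"{prefix}_{index:04d}.pdf"
        shutil.copy2(src, dest)  # copy2 also keeps the original timestamps
        staged.append(dest)
    return staged
```

The staged copies would still have to be dragged into DTPO and converted to searchable PDFs by hand.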
Problem 2.
There were several cases where DTPO rejected a PDF because it caused an OCR error. However, when I attempted to OCR the same file later, it sometimes went through. I don’t understand why. Most of the time, when DTPO failed on a file twice, I could OCR it in Acrobat 7.0.9. I don’t understand why either, as I find the OCR in DTPO generally superior to that in Acrobat 7. (I haven’t used Acrobat 8, though.)
Problem 3.
DTPO cannot OCR documents larger than 50 pages. HOWEVER, this error comes up only after DTPO has already spent time OCRing the document. It is treated as an OCR error, so Problem 1 above applies, on top of the wasted time and not getting a single searchable page. Can this be detected before wasting time? Even better, can DTPO detect scanned PDFs longer than 50 pages and invoke Acrobat to do the OCR? (I have been using Acrobat for documents longer than 50 pages.)
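To illustrate the kind of pre-check I mean, here is a rough sketch in Python that sorts PDFs into "within the limit" and "over the limit" before any OCR is attempted. The function names are made up, and the page count is only an estimate obtained by counting page objects in the raw file. PDFs that store their objects in compressed streams would be undercounted, so a proper PDF library would be more reliable.

```python
import re
from pathlib import Path

PAGE_LIMIT = 50  # the limit I ran into with DTPO

# Matches "/Type /Page" (a single page object) but not "/Type /Pages".
_PAGE_OBJECT = re.compile(rb"/Type\s*/Page(?![A-Za-z])")


def estimate_page_count(pdf_bytes):
    """Estimate the page count by counting page objects in the raw bytes."""
    return len(_PAGE_OBJECT.findall(pdf_bytes))


def split_by_page_limit(pdf_paths, limit=PAGE_LIMIT):
    """Return (within, over): paths at or under the limit, and paths above it."""
    within, over = [], []
    for path in pdf_paths:
        count = estimate_page_count(Path(path).read_bytes())
        (over if count > limit else within).append(path)
    return within, over
```

The files in `over` could then be sent to Acrobat, and only the files in `within` queued in DTPO.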
Problem 4 (feature request)
Say I have 1000 documents in the OCR queue. A popup window comes up and asks for the file name and keywords every time the OCR of a single file is completed, before the file is saved.
a. While the popup dialog is active, it doesn’t seem that the next document in the OCR queue is processed. That is, if I don’t realize that the popup dialog is there, I’m wasting time. Can this be fixed?
b. Can the OCRed results be saved in a temporary folder, where I can rename the files and add keywords all at once at a later time? This is so that I can scan a bunch of documents and go away for a cup of coffee or even go home.
c. The filename dialog above currently has a postage-stamp size preview of the first page. Can this be made bigger, and also pageable? I scan a lot of academic journal papers, and the information needed to make up a filename is often printed in 8 or 10 point type, so there is no way to read it before the file is saved with a filename. This is very annoying. I usually have to keep a stack of paper in the scanned order so that I can look at it to make up a filename. Otherwise, I had to save the document with a random filename such as foo1, foo2, whatever (oops, these are deterministic and highly predictable, not random) and rename it later, until I realized the problem below.
d. Is there a way to rename the file once it has been saved from the dialog? Changing the document title in DTPO does not seem to change the filename. Also, how can I edit the keywords and subject metadata in the PDF file once the file is saved?
Thanks!
Ryuji