Greetings, I’m new to DTPO and just starting to get the hang of it.
I have a couple of questions regarding Acrobat, which I have been using for a number of years in my attempts to stay organized.
When I scan into Acrobat (I’ve had a Fujitsu 1500 for a couple of years), I routinely optimize the scanned document and reduce its file size. Optimizing straightens and cleans up the document, so it prints out neater. Reducing file size gives up some backward compatibility (I choose compatibility with Acrobat 8.0 and later), and this significantly reduces file size.
Question 1: With regard to DEVONthink, is there any reason not to optimize and reduce, other than the extra time (which I’m more than willing to take)?
Question 2: Although I’m willing to take the extra time, I would like to make the process more efficient and automated. Right now I scan into DTPO, open the new PDF in Acrobat, choose Optimize, choose Reduce File Size, replace the original with the optimized/reduced file, close Acrobat, and move on to the next file. I haven’t figured out how to batch process or automate this. If someone has come up with an Automator workflow, a script, or a plug-in for this process, I would appreciate some help. If not, someday I’ll figure it out.
Mark, you didn’t mention what you are scanning, or whether your goal in putting your documents in DEVONthink is to have searchable, OCRed PDFs in your database. The nature and quality of your originals, what you want to do with the scans, and whether you need to print scanned documents later all affect your workflow.
Assuming that, like many users, you have paper documents that are mainly text, that you want to OCR the files so they are searchable in DEVONthink, and that you won’t be printing PDFs frequently, then answers to your questions could be:
You’d want to have sufficient quality (resolution) for the documents to be acceptably readable and for the OCR to successfully recognize 100% (or close to it) of the text. If quality and OCR don’t matter, then there’s no inherent reason to optimize and reduce – other than the normal constraints of disk space and recommended database size.
An efficient process is to configure your S1500M to scan to DEVONthink, and to configure DEVONthink (in DEVONthink > Preferences > OCR) to:
- Convert incoming scans to searchable PDF
- Set resolution to Same as Scan, with quality at 100%, and use the ScanSnap settings to choose your actual resolution
- Set recognition (OCR) to Automatic
This lets you scan document after document: the scanner software sends each document to DTPO, DTPO has ABBYY do the OCR, and DTPO places the documents in your global Inbox, where you can rename, tag, move, or copy them as needed.
In ScanSnap Manager > Settings:
- In the Application tab, set DEVONthink as the target application for incoming scans
- In the Scanning tab, set image quality to Normal or Better (check the results, and adjust if they are not good)
- In the Compression tab, set compression rate in the middle of the range (again, check the results, and adjust if needed)
There is always a little fine-tuning based on your particular circumstances, but these are standard settings that could meet the objectives you described.
Thanks for your quick response, especially considering the holiday.
I was a little more concerned with the mechanics of Acrobat “doing things” to PDF files, both before they are imported into DTPO and once they are already in it.
I scan many types of documents, almost exclusively text-based: contracts, correspondence, reports, and so on. Many of the documents are old: copies of copies, faxes, and copies of faxes, very often of relatively poor quality. Before I go on, I have to say I am extremely impressed with DTPO’s OCR ability; it does a far better job than Acrobat. Having said that, I like Acrobat’s Optimize and Reduce features because I do need to print these documents, and Acrobat does a remarkable job of “cleaning up” the PDFs. I also prefer to reduce them because I often email copies (I frequently run into email size limits, and out of consideration for the people I deal with, I try to avoid FTP and other methods of sending large files). I also receive PDFs from other sources, and it is not uncommon to be able to reduce them to as little as 10% of their original size.
If I read your response correctly, although “there’s no inherent reason to optimize and reduce,” there is also no harm in doing so. Besides scanning documents into DTPO, I am in the process of importing roughly a thousand PDFs that I optimized and reduced before acquiring DTPO. I have workflow habits that will likely change; I just wanted some assurance that, while I may be wasting some time, I’m not doing anything I’m going to regret.
As for the basic workflow on simple documents, I have been successfully scanning, importing, and letting DTPO handle the OCR. Thanks for the advice; I’m going to check my settings and tweak my workflow.
Thanks again for the help on a holiday. I have very high hopes for DTPO; I’m extremely impressed and eager to become proficient.
When I don’t want ScanSnap to send documents directly to DTPO, I scan to a folder. I often use Acrobat’s Document > OCR Text Recognition > Recognize Text in Multiple Files Using OCR batch process. Once you’ve told Acrobat which files or folders to process, the Output Options dialog opens where numerous settings can be configured – including the PDF Optimizer settings.
In fact, using the batch OCR command is a convenient way to do batch optimization – a little obscure, but lots of Acrobat’s techniques are obscure.
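If you scan to a folder first, a small script can help decide which files are worth sending through Acrobat’s batch command, for example the ones too large to email. The following is a minimal Python sketch under stated assumptions, not a feature of Acrobat or DEVONthink: the function name `pdfs_needing_reduction` and the 10 MB threshold are hypothetical, the threshold standing in for whatever attachment limit your mail provider actually enforces. It only lists candidates; the optimizing and reducing still happen in Acrobat.

```python
from pathlib import Path

# Hypothetical attachment ceiling (10 MB); real limits vary by mail provider.
EMAIL_LIMIT_BYTES = 10 * 1024 * 1024


def pdfs_needing_reduction(folder, limit=EMAIL_LIMIT_BYTES):
    """Return PDFs under `folder` (searched recursively) larger than `limit`
    bytes, largest first. This only selects files; it does not modify them."""
    candidates = []
    for path in Path(folder).rglob("*"):
        # Match .pdf case-insensitively, since scanners may write .PDF.
        if path.is_file() and path.suffix.lower() == ".pdf":
            size = path.stat().st_size
            if size > limit:
                candidates.append((size, path))
    # Sort by size only, biggest files first.
    candidates.sort(key=lambda item: item[0], reverse=True)
    return [path for _, path in candidates]
```

Calling `pdfs_needing_reduction` on your scan folder returns the oversized PDFs, which you could then point Acrobat’s batch OCR/optimize command at. Keeping the script read-only avoids any risk of overwriting originals before Acrobat has processed them.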
You’re right, Acrobat has obscure abilities. I’ve been using Acrobat for years and didn’t realize how much it could do with batches of files.
My philosophy is that you should learn something new every day. You’ve taught me something early in the day, so the pressure is off, and I can enjoy the rest of the day without having to learn something new.