How do I change the 72dpi that DTPO uses to create PDF?

wdfs · December 17, 2009, 1:48pm

Arrrggh.

I just exported PDFs that I had captured in DTPO with the bookmarklet over the last three months, to include at the back of a report.

The quality was awful: 72 dpi. Type faded and pixelated. Images can’t handle grey.

So I checked the original in DTPO: 72dpi. This is completely unacceptable. How do I change it?

I checked the latest user manual. Nothing.

P.S. I haven’t used my ScanSnap in DTPO yet. I need a database of great, high-quality PDF scans, and was thinking of using DTPO because I can tag. Am I looking at headaches?

cgrunenberg · December 17, 2009, 2:10pm

Well, the bookmarklets use the available online images and those usually use 72 dpi.

You can ignore that information; it’s quite useless in the case of PDF documents and will therefore probably be removed in the next release.

wdfs · December 17, 2009, 3:33pm

Christian,

Is my only workaround to save the PDFs to a folder on my hard drive, then import them into DTPO or index them?

In DTPO I collect articles that back up the reports I create in InDesign, and I have to include them in those professional reports for printing. But when they look like this, I can’t.

My issue is printing. These PDFs get attached to the end of an InDesign doc and I prefer to keep them in DTPO because I can reference them.

Should I ditch that idea?

Bill_DeVille · December 17, 2009, 8:34pm

No, your conclusion that DTPO is converting or storing PDFs at 72 dpi isn’t correct, as noted by Christian. That reference to 72 dpi has nothing to do with capture or storage resolution or image quality; as it is easily misinterpreted, Christian is thinking about removing the reference.

If I wish to capture a PDF file that’s presented by a Web site, I will use my browser’s File > ‘Save As’ command, and choose ‘Inbox’ as the destination. Now I’ve got the actual PDF file in my Global Inbox. It will have the same resolution and image quality as the PDF put up by the Web site. If the PDF is searchable, it will not require OCR.

When you Import a PDF into DTPO it isn’t changed in any way, with the exception of PDFs that are subjected to OCR. Because you need to include PDFs in professional reports for printing, this matters when you set up DTPO Preferences > OCR.

When the OCR module converts an image received from your scanner, or an image-only PDF downloaded from the Web, it rasterizes the original image. The tools built-in to OS X for that purpose are not very efficient for the size of the resulting new image layer of the searchable PDF, so it can balloon substantially in storage size.

That’s why Preferences > OCR allows the user to configure the resolution (dpi) and the quality of images in the PDF (quality %) as a compromise between the view/print quality of the stored searchable PDF, and the file size. The lower the settings for dpi and quality, the smaller the file size, but with increasing degradation of the view/print display of the PDF.

The default settings of Preferences > OCR are 150 dpi and 50% image quality. When I scan paper copy with my ScanSnap scanner, I’m scanning at a higher resolution and image quality, as OCR requires a scan image of 300 dpi and high quality (uncompressed) images for good OCR accuracy. After OCR has been done, the Preferences > OCR settings then save the searchable PDF at lower resolution and image quality to save disk space on my computer.
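To put the dpi trade-off in numbers, here is a rough Python sketch of how the pixel count of a scanned page scales with resolution. The US Letter page size and the helper name `page_pixels` are assumptions for illustration, not anything DTPO exposes, and the actual file size also depends on the compression applied to the image.

```python
def page_pixels(width_in: float, height_in: float, dpi: int) -> int:
    """Return the number of pixels in a page rasterized at the given dpi."""
    return round(width_in * dpi) * round(height_in * dpi)

# Assumed page size: US Letter, 8.5 x 11 inches.
scan = page_pixels(8.5, 11, 300)    # resolution recommended for OCR input
stored = page_pixels(8.5, 11, 150)  # default Preferences > OCR resolution

print(scan, stored, scan / stored)
```

Halving the resolution cuts the pixel count to a quarter, which is why lowering the dpi setting shrinks files so quickly.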

I’m satisfied with the default Preferences > OCR settings for most of the documents that I routinely scan to a database, such as receipts, invoices, contracts, letters and so forth that I want to keep in a database for personal use. The resulting searchable PDFs approximate FAX quality for viewing and printing, and that’s good enough for my purposes for those documents. I have a lot of such paperwork coming into my home, which I want to keep for various purposes, including tax records, etc. The advantage of storing them into my databases is that I can actually find information when I need it, much more easily than I could find the paper copy — and I don’t end up with hundreds of pounds of paper in file boxes and file cabinets. Although I’ve got terabytes of disk storage space, I see no need to save all searchable PDFs with high view/print quality.

When you need higher quality OCR results: In a case such as yours, you don’t want significant degradation of the view/print quality of a PDF that you intend to include in a print publication.

Beginning with DTPO public beta 8, there’s a simple alternative to ‘tweaking’ the Preferences > OCR dpi and image quality settings upwards. There’s now a check box in the Images section, ‘Same as scan’. When that box is checked, the quality of the searchable PDF resulting from OCR will approximate the quality of the original scanner output. That should work for your print publication needs, assuming your scanner was producing acceptable results.

Another alternative that you had thought of would also work. You could Index-capture a PDF that you planned to use in a print publication. If you then OCR the indexed PDF within your database using Data > Convert > to Searchable PDF, the original PDF (external to the database) would be untouched (remain image-only) and retain its full view/print quality, while the searchable PDF would be Imported into the database with the resolution/quality settings in Preferences > OCR. This approach would allow you to keep a copy of the PDF for use in a print publication and also place a searchable copy of it into a DTPO database.

Comment: Many PDFs from Web and other sources are searchable; there’s no need to run a PDF through OCR if it is already searchable.

bstadelman · December 18, 2009, 12:59pm

“When the OCR module converts an image received from your scanner, or an image-only PDF downloaded from the Web, it rasterizes the original image. The tools built-in to OS X for that purpose are not very efficient for the size of the resulting new image layer of the searchable PDF, so it can balloon substantially in storage size.”

I have noticed that when I used a pre-scanned PDF of 6.3 MB, OCRing it in DTPO made it balloon up to 65.5 MB!

What is interesting to me is that OCRing with another product (PDFpen) only took the size to 6.7MB - which is in line with what I expected, since text takes so little space. Granted, PDFpen made a couple of mistakes that DTPO did not, but considering the filesize difference, I’d probably be willing to make that sacrifice.

Any ideas?

annard · December 19, 2009, 11:08am

This has been discussed many times on this forum, so let me repeat:

The OCR module as used by Pro Office converts all PDF images to colour images; this is why PDF files with 1-bit images become (much) larger. This is not going to change until Abbyy’s PDF parser becomes more sophisticated.

That said, with version 1 the files were even larger.

You could always use an Automator workflow to convert the images back to black and white if so desired.
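As a rough illustration of why converting 1-bit images to colour inflates a file, the sketch below compares the uncompressed size of the same page at two bit depths. The page dimensions (US Letter at 300 dpi) and the 24-bit RGB colour depth are assumptions, and `raw_image_bytes` is a hypothetical helper. Real PDFs compress their images, so the ratio seen in practice will differ from this raw figure.

```python
def raw_image_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of an image with the given bit depth."""
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes

# Assumed example: a US Letter page scanned at 300 dpi (2550 x 3300 pixels).
WIDTH, HEIGHT = 2550, 3300

bilevel = raw_image_bytes(WIDTH, HEIGHT, 1)   # 1-bit black and white
colour = raw_image_bytes(WIDTH, HEIGHT, 24)   # 24-bit RGB colour (assumed)

print(f"1-bit:  {bilevel:,} bytes")
print(f"24-bit: {colour:,} bytes")
print(f"ratio:  {colour // bilevel}x")
```

Before compression, the colour version of a page holds 24 times as much image data as the 1-bit version, so a tenfold increase in file size after OCR is not surprising.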

wdfs · December 19, 2009, 11:16pm

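Bill, you wrote:

“If I wish to capture a PDF file that’s presented by a Web site, I will use my browser’s File > ‘Save As’ command, and choose ‘Inbox’ as the destination.”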

How do you do that? I can’t access the DT’s Inbox via Safari’s File > ‘Save As’ command. What browser are you using?

Bill_DeVille · December 20, 2009, 2:30am

I’m using DTPO pb8 under OS X 10.6.2. I can Save As a PDF displayed by any of the browsers I normally use, Safari and DEVONagent, or from most Cocoa browsers, as well as from Firefox. Note that an HTML page would be saved as a WebArchive file, except from Firefox.

If ‘Inbox’ isn’t already listed under ‘PLACES’ in the left column of a Finder window, you can install it by locating the ‘Inbox’ folder at ~/Library/Application Support/DEVONthink Pro 2/ and dragging that folder into the list of Places in the left column of the Finder window.

Using ‘Save’ or ‘Save As’ from an application that provides that command allows one to directly send a file to the DT Pro or DT Pro Office Global Inbox. In this way you can send files directly into DTPO from a great many applications, including Word, Pages, Numbers, TextEdit, Mellel, OmniGraffle, etc.
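Since the Global Inbox is an ordinary folder, the same thing can presumably be done from a script. The following is a minimal Python sketch, assuming the Inbox lives at the path given above and that DTPO picks up a copied file the same way it picks up a saved one. `copy_to_inbox` is a hypothetical helper name, not part of DEVONthink.

```python
import shutil
from pathlib import Path

# Inbox location as described in this post; it may differ on other
# DEVONthink versions, so treat it as an assumption.
DEFAULT_INBOX = (
    Path.home() / "Library" / "Application Support" / "DEVONthink Pro 2" / "Inbox"
)

def copy_to_inbox(source: Path, inbox: Path = DEFAULT_INBOX) -> Path:
    """Copy a file into the Inbox folder and return the path of the copy."""
    if not inbox.is_dir():
        raise FileNotFoundError(f"Inbox folder not found: {inbox}")
    destination = inbox / source.name
    shutil.copy2(source, destination)  # copy2 also preserves timestamps
    return destination
```

Calling `copy_to_inbox(Path("article.pdf"))` (a made-up file name) would leave the original PDF where it is and place a copy in the Inbox folder.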

wdfs · December 20, 2009, 9:21am

Well, I be d**ned. I never even imagined that you could put the Global Inbox Folder in Places. I realize the Sorter is located there, but…

Wonders never cease.