Problems opening an OCRed PDF in Preview (2 GB of RAM needed!)

I tried to open a 33 MB PDF file (4 scanned pages, OCRed with DEVONthink Pro Office) in Preview.app (version 5.0.1 on Mac OS X 10.6.5).

For a long time I saw only the spinning beach ball. When I finally managed to open Activity Monitor (which took ages), I saw that Preview.app was using nearly 2 GB of physical memory (3.75 GB of virtual memory)!

Opening such a PDF can make my MacBook unusable for minutes!

How is that possible?

Only this one PDF file was open, and I had just launched Preview.app.

(I’m using a white 13" MacBook with a 2.16 GHz processor and the maximum of 3 GB of RAM.)

I often have performance problems when opening or viewing PDFs created from scanned pages and OCRed with DT Pro Office.
I have lots of them, and Preview.app, Skim.app, and DT Pro Office itself all perform quite poorly with them.

Does anybody have a similar problem and a solution?


File information says:
Version: 1.3
Pages: 4
Image size: 2551x3147
Created by: DEVONthink Pro 2.0.3 OCRPlugin 2.0
Coding Software: Mac OS X 10.6.3 Quartz PDFContext
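For a sense of scale, the image size reported above can be turned into a rough memory estimate. The sketch below assumes that each page is decoded into an uncompressed bitmap at 4 bytes per pixel (RGBA); that is an assumption for illustration, not documented Preview behavior.

```python
# Rough estimate of the memory needed to hold the decoded page images.
# Assumption: each page is decoded to an uncompressed 4-bytes-per-pixel (RGBA) bitmap;
# Preview's actual internal format is not known from this thread.

WIDTH, HEIGHT = 2551, 3147   # "Image size" from the file information
PAGES = 4
BYTES_PER_PIXEL = 4


def decoded_bytes(width: int, height: int, bytes_per_pixel: int = BYTES_PER_PIXEL) -> int:
    """Size in bytes of an uncompressed bitmap with the given dimensions."""
    return width * height * bytes_per_pixel


per_page = decoded_bytes(WIDTH, HEIGHT)
print(f"per page: {per_page / 2**20:.1f} MiB")
print(f"all {PAGES} pages: {PAGES * per_page / 2**20:.1f} MiB")
```

On that assumption all four pages come to roughly 120 MiB of decoded pixels, far below the nearly 2 GB observed, so the image dimensions alone would not seem to account for the memory use.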


Edit:

I’ve found the original scanned pdf (without OCR).

It is 8.1 MB in size.
Coding software: iText 1.4.9 (by lowagie.com; I don’t remember how I created it from the scanned original)

It opens in Preview.app without a problem, using only 99 MB of physical memory.

It’s not just the size of the PDF. I’ve got a book in one of my databases that takes 159.9 MB for 421 pages. It opens instantly in my database or in Preview, on my MacBook Air with 4 GB RAM, running under OS X 10.6.7. (Yes, the Air has an SSD instead of a rotating hard drive, but it’s not infinitely faster.)

What are your settings in DTPO Preferences > OCR? I use the default 150 dpi and 50% image quality most of the time, and my searchable PDFs are much smaller than yours. I’ve got lots of 4-page documents that are under 1 MB in size. If the original scan was sharp and had good black/white contrast, the searchable PDFs are very readable.

For items such as receipts or other documents that would be OK at fax view/print quality, I use 96 dpi and 50% image quality, which results in searchable PDFs that are significantly smaller than the original scan.
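Why lowering the dpi shrinks the files so much: the pixel count grows with the square of the resolution, so halving the dpi quarters the number of pixels before any JPEG compression is applied. The sketch below assumes a US Letter page (8.5 x 11 inches), which the thread does not state, and 3 bytes per pixel (8-bit RGB). If the page is 8.5 inches wide, the 2551-pixel width in the file information corresponds to roughly 300 dpi.

```python
# Pixel-count arithmetic for a page re-rasterized at different resolutions.
# Assumptions: US Letter page size and 3 bytes per pixel (uncompressed 8-bit RGB).

PAGE_W_IN, PAGE_H_IN = 8.5, 11.0


def pixel_dimensions(dpi: int) -> tuple[int, int]:
    """Width and height in pixels of the page rasterized at the given dpi."""
    return round(PAGE_W_IN * dpi), round(PAGE_H_IN * dpi)


def raw_rgb_bytes(dpi: int) -> int:
    """Uncompressed size in bytes of the page bitmap (before JPEG compression)."""
    w, h = pixel_dimensions(dpi)
    return w * h * 3


for dpi in (300, 150, 96):
    w, h = pixel_dimensions(dpi)
    print(f"{dpi:>3} dpi: {w} x {h} px, {raw_rgb_bytes(dpi) / 1e6:.1f} MB raw")
```

These are raw sizes; the JPEG image layer stored in the searchable PDF is compressed and therefore much smaller.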

Hi Bill,

Thanks for your answer!

It’s been quite a while since I OCRed those PDFs, but I think I did not let DT Pro change the resolution of the scans because I feared the graphics (diagrams) would lose too much quality.

Maybe that was not the best choice, as it greatly increased the disk usage of my database.
I have now applied a Quartz Filter in Preview.app to some of the PDFs to reduce their file size by lowering the resolution and applying JPEG compression (see here: discussions.apple.com/thread/12 … 0&tstart=0).

So does anyone have an idea what could be wrong with this PDF and how I could fix it?

When a page is re-rasterized to create the image layer after OCR, the result depends on the dpi and image quality settings in Preferences > OCR.

Graphics, including pictures, are converted to JPEG images. As you may know if you have experimented with saving JPEG images at various quality settings, saving at 50% rather than 100% quality results in a very substantial reduction in file size, yet in most cases the 50% quality image isn’t “too bad”. For graphics such as charts with small text in the image, you might try raising image quality to, for example, 75% or 80%.

In some cases a PDF that seems quirky can be made more usable by opening it in Preview (or Acrobat) and using Save As to make a copy. If the PDF is stored in your database and you are using DT Pro or Pro Office, save the copy to the ‘Inbox’ place in the Finder, which sends it to your Global Inbox. Then test the copy; if it is an improvement, delete the original and file the copy in the original’s group.

Hi Bill,

Thank you. I tried that (saving it again with Preview), but without success.
It still freezes my MacBook for minutes (until I force-quit Preview because I can’t stand it any longer…).

Today I OCRed another PDF and had the same problem:
extreme memory usage and a paralyzed MacBook. :-(

Interestingly:
In Skim.app, I can open it and memory usage is reasonable.
If I open it in Preview, nothing works anymore, and if I select it in DEVONthink (so that the first page is shown in another pane of the DT Pro window), DEVONthink also becomes very slow and I see the spinning beach ball for some time.

Any ideas on how to find out what makes these PDFs so difficult to display?

Martin