Images cropped in Preview revert to uncropped state in DTPO

twicks · December 13, 2009, 10:35pm

I just scanned in 12 sheets printed on both sides, with 2-up pages (booklet style: 2 pages per side in landscape mode). I scanned them to a folder and bypassed DTPO, since I have to crop and rearrange the pages. In these scans, page 24 is on the left and page 1 on the right; page 2 is on the left and page 23 on the right; and so on.

Therefore I need to crop page 1, then crop page 2 and add it after page 1, then crop page 3 and add it after pages 1 and 2, and so on. I did this in Preview. It works quite easily and fairly fast. The finished PDF document had each page on its own PDF page in portrait mode, all 24 of them.
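The layout described above looks like standard saddle-stitch imposition: the first scanned side pairs page 24 with page 1, the next pairs 2 with 23, and the pattern works inward to the centre spread. As a rough sketch, assuming a 24-page booklet scanned as 12 two-up sides in order (front, then back, sheet by sheet), the pairing and the order in which to reassemble the cropped halves can be computed like this. The function names are made up for illustration.

```python
def booklet_sides(n_pages):
    """Return (left, right) page numbers for each scanned side, in scan order.

    Assumes a saddle-stitched booklet (n_pages a multiple of 4) whose sides
    were scanned sheet by sheet, front then back.
    """
    if n_pages % 4:
        raise ValueError("a saddle-stitched booklet has a multiple of 4 pages")
    sides = []
    for s in range(n_pages // 2):
        outer, inner = n_pages - s, s + 1
        # Fronts (even s) carry the high page on the left; backs are mirrored.
        sides.append((outer, inner) if s % 2 == 0 else (inner, outer))
    return sides


def reading_order(n_pages):
    """For pages 1..n_pages, give (scan_index, 'left' or 'right') to crop from."""
    where = {}
    for i, (left, right) in enumerate(booklet_sides(n_pages)):
        where[left] = (i, "left")
        where[right] = (i, "right")
    return [where[p] for p in range(1, n_pages + 1)]


if __name__ == "__main__":
    for i, (left, right) in enumerate(booklet_sides(24)):
        print(f"scan {i + 1:2d}: page {left:2d} | page {right:2d}")
```

Walking `reading_order(24)` from the start gives the crop sequence: right half of scan 1, left half of scan 2, right half of scan 3, and so on.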

However, when I did an “Import (with OCR)…” into DTPO, the captured PDF document showed up just like the original document (which by now was in the Trash), with pages 24-1, 2-23, etc. Doing a Convert to Plain Text produced such a mess of stale gum that it added insult to injury. Because articles in the booklet sometimes flow from, say, p. 3 to p. 4 and then to halfway down p. 5, I have to carefully cut and paste the plain text into some semblance of order, which at this point is painstaking and slow.

So I tried an experiment: I took page 3 (the first page with a story on it), cropped it, and saved it alone. The PDF file in Preview showed just page 3. Page 3 only. When I imported it (with OCR) into DTPO, it showed up as pages 22-3 in the PDF file. When I converted it to plain text, the same flow of page 22 into page 3 resulted. Total chaos.

So I’m wondering why (and how) DTPO figures out the doggone original layout (the original is by now in the Trash) and displays the so-called cropped image as uncropped. How can I work around this? Or better yet, how can I fix this unorthodox behavior, behavior that I’ve never experienced before?

I’m using DTPO PB8 on an early 2009 Mac mini with 4 GB RAM, OS X 10.6.2.

Any worthy tips would be really appreciated.

korm · December 14, 2009, 10:09am

I can reproduce this.

This seems to be what’s going on: The Preview help file says

If you drag the saved document into DTPO (plain import), the crops show up in the saved document. Although the “hidden” data is still in the file, DT has no command to reveal it, unlike Preview. A side effect of importing with OCR is that the OCR engine apparently reveals all the data, with no way to hide it. In other words, Preview apparently does not do destructive crops. The answer seems to be not to use OCR on documents cropped in Preview, or to save the document from Preview in some format other than PDF (PNG, TIFF). If you need a PDF, you can convert it back. This, however, will play hob with OCR.
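If this explanation is right, the crop lives only in the PDF's page geometry. PDF pages carry a /MediaBox (the full page) and an optional /CropBox (the region a viewer displays), and my assumption is that Preview records its crop as a smaller /CropBox while leaving the scanned image untouched. A crude way to check a file is to look for a CropBox that differs from the MediaBox. This Python sketch (the helper name `hidden_crops` is hypothetical) only sees boxes written out literally, so a file that uses indirect references or compressed object streams may wrongly report nothing.

```python
import re

# Matches boxes written out literally, e.g. "/CropBox [396 0 792 612]".
# Boxes given as indirect references ("/CropBox 12 0 R") or stored in
# compressed object streams are not seen by this crude byte scan.
_BOX = re.compile(rb"/(MediaBox|CropBox)\s*\[([^\]]*)\]")


def hidden_crops(pdf_bytes):
    """Return CropBox rectangles that match no MediaBox in the file.

    A non-empty result suggests some page shows only part of its content,
    while the rest (e.g. the full scanned image) is still stored in the file.
    """
    boxes = {"MediaBox": [], "CropBox": []}
    for key, nums in _BOX.findall(pdf_bytes):
        rect = tuple(float(x) for x in nums.decode("ascii").split())
        boxes[key.decode("ascii")].append(rect)
    return [c for c in boxes["CropBox"] if c not in boxes["MediaBox"]]
```

For example, `with open("page3.pdf", "rb") as f: print(hidden_crops(f.read()))` (the file name is a placeholder); an empty list is inconclusive, a non-empty one means some page is showing less than it stores.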

Rock + hard place.

twicks · December 14, 2009, 8:43pm

@korm

Thanks for digging into this and pinpointing the issue with Preview. This helped, as I did some experimenting earlier after reading your post.

I set ScanSnap to save the files as JPG rather than PDF, then cropped them in Preview. Cropping a JPG apparently keeps only the visible cropped portion: when I then did an “Import (with OCR)…”, single pages appeared, as I had expected the first time. I did this to all my pages, imported the JPGs into DTPO, and all looks well.
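This result fits the idea that cropping a JPG discards pixels while cropping a PDF in Preview only hides them. The difference can be shown with a toy Python model; it is not how Preview or ABBYY work internally, and the function names are invented. A non-destructive crop keeps the full pixel grid plus a viewing window, a destructive crop keeps only the window, and an OCR step that ignores the window sees the whole two-up spread in the first case.

```python
# Toy model: an "image" is a list of pixel rows; a box is (left, upper, right, lower)
# in pixels, with right and lower exclusive.

def crop_nondestructive(pixels, box):
    """Keep every pixel and merely record which window to display."""
    return {"pixels": pixels, "view": box}


def crop_destructive(pixels, box):
    """Return a new image holding only the pixels inside box."""
    left, upper, right, lower = box
    return [row[left:right] for row in pixels[upper:lower]]


def ocr_input(image):
    """What an engine that ignores any display window would read."""
    return image["pixels"] if isinstance(image, dict) else image


if __name__ == "__main__":
    spread = [[r * 10 + c for c in range(4)] for r in range(2)]  # 2 rows x 4 columns
    right_page = (2, 0, 4, 2)  # right half of the two-up scan
    print(ocr_input(crop_nondestructive(spread, right_page)))  # the whole spread
    print(ocr_input(crop_destructive(spread, right_page)))     # the right half only
```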

Apparently my scans of earlier issues (done with a different scanner) were saved as JPGs so I didn’t notice the cropping issue then.

Thanks for your help in nudging me in the right direction.

annard · December 18, 2009, 4:18am

There is not much we can do about it, because the way ABBYY takes apart the PDF to get at its images isn’t very sophisticated. This is unlike Windows/Linux, where they use the Adobe PDF library and might be able to get the cropping data to reconstitute the original page.
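Purely to illustrate what “using the cropping data” could involve, here is a Python sketch (the helper name is hypothetical) that maps a page’s CropBox onto pixel coordinates of the embedded scan, so the image could be cut destructively before OCR. It assumes the scanned image is drawn unrotated and stretched over the whole MediaBox, which is common for scanner output but not guaranteed.

```python
def cropbox_to_pixels(mediabox, cropbox, img_w, img_h):
    """Convert a PDF CropBox into a pixel box (left, upper, right, lower).

    Assumes the scan is drawn unrotated and fills the whole MediaBox.
    PDF boxes are (x0, y0, x1, y1) in points with the origin at the
    bottom-left; pixel boxes have their origin at the top-left.
    """
    mx0, my0, mx1, my1 = mediabox
    cx0, cy0, cx1, cy1 = cropbox
    # Clip the crop to the page.
    cx0, cy0 = max(cx0, mx0), max(cy0, my0)
    cx1, cy1 = min(cx1, mx1), min(cy1, my1)
    page_w, page_h = mx1 - mx0, my1 - my0
    left = round((cx0 - mx0) * img_w / page_w)
    right = round((cx1 - mx0) * img_w / page_w)
    upper = round((my1 - cy1) * img_h / page_h)  # flip the y axis
    lower = round((my1 - cy0) * img_h / page_h)
    return left, upper, right, lower
```

For a 792 x 612 pt landscape page scanned at 300 dpi (3300 x 2550 px), a CropBox of [396 0 792 612], the right half, maps to the pixel box (1650, 0, 3300, 2550).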