- You do not need to rotate a JPEG scan image to prepare it for OCR.
The ABBYY OCR software first analyzes the image, including the orientation of the majority of text characters on the page, and automatically chooses the orientation in which the text reads correctly.
EXAMPLE: Using a digital camera, I took an upside-down snapshot of your post as displayed on my iPad. The camera was hand-held and I didn’t try to be especially careful in taking the shot. I then transferred the JPEG to my iMac, opened it in GraphicConverter, kicked the resolution up to 300 dpi and did a quick-and-dirty white scale correction. The image was then imported into DT Pro Office, clearly upside-down. Data > Convert > to searchable PDF then produced a searchable PDF that was properly oriented. The text copied below resulted from using Data > Convert > to rich text.
The OCR wasn’t entirely accurate, probably because a bit of camera shake emphasized moiré patterns, especially on the right side of the image. For instance, the phrase “that the bug” wasn’t recognized in the first sentence below:
I have just retested the new version and can confirm affecting the rotate of images is still there in 2.0.4.
Steps to reproduce the problem:
1- select a jpg image in the database 2- rotate the image left or right (f5 or f6)
3- select another image
4- return to the previously rotated image
5- one can verify that the image is not rotated like she had, and the image changed format from jpg to tiff (higher disk space usage)
I would like to send a bug report in some way to the development team so that it can be fixed as soon as possible (in 2.0.5 or in a hotfix way)
since this is in my opinion a major bug in an application that has to deal with lots of images (scanning documents has often to do with rotating
images i.e. if the document is landscape oriented)
The example illustrates that ABBYY’s OCR is “smart” enough to handle an orientation problem. That’s why it works so well with the ScanSnap scanner, which requires that pages be inserted in portrait orientation, even for a document containing a mix of portrait and landscape pages.
WHY DID I POST-PROCESS THE IMAGE? I could have set the camera to produce a higher resolution JPEG, but thought it might be useful to demonstrate this trick. The original image had a resolution of less than 200 dpi, and OCR of that image was less successful than OCR of the processed image. I used GraphicConverter to kick the resolution up to 300 dpi, with resampling. That involved some interpolation, which improved character recognition. Resampling increased the size of the image from about 400 KB to 14 MB, but that doesn’t matter in preparation for OCR if one has set Preferences > OCR to move the original image to the Trash.
The white scale fix improved the contrast. All in all, the OCR result was pretty good. It would have been better still had I used a tripod or other camera mount and carefully aligned the camera to the iPad’s screen.
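For anyone curious about the arithmetic behind resampling to a higher dpi, here is a minimal Python sketch. It is only an illustration: the 1600 x 1200 pixel dimensions are hypothetical (I didn't record the real ones), the function names are made up for this example, and it does not reproduce whatever interpolation GraphicConverter actually performs. It just shows that keeping the same physical size at a higher dpi multiplies the pixel count by the square of the scale factor.

```python
def resampled_size(width_px, height_px, src_dpi, dst_dpi):
    """Pixel dimensions after resampling so the physical size stays the same."""
    scale = dst_dpi / src_dpi
    return round(width_px * scale), round(height_px * scale)


def uncompressed_rgb_bytes(width_px, height_px):
    """Size of the raw 8-bit RGB pixel data, before any file compression."""
    return width_px * height_px * 3


# Hypothetical snapshot: 1600 x 1200 pixels tagged as 200 dpi (assumed values).
new_w, new_h = resampled_size(1600, 1200, 200, 300)
growth = (new_w * new_h) / (1600 * 1200)
print(new_w, new_h, growth)  # 2400 1800 2.25
```

The size of the file on disk also depends on the file format and compression settings used when saving, so the pixel-count growth alone doesn't predict the final file size.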
- The little image rotation procedures provided in DEVONthink are not intended for critical work. They are useful when viewing photos that were not properly oriented, but I generally wouldn’t save the result afterwards.
Above all, don’t use those routines on a multipage PDF or TIFF: if the change is saved, only the rotated page will be viewable afterwards.
If you wish to change the orientation of a page in a PDF, that can easily and safely be done in DEVONthink’s view of the PDF, or in Preview.