major bug with images

hybra · September 22, 2010, 7:32pm

Hello,
I have just retested the new version and can confirm that the bug affecting the rotate of images is still there in 2.0.4.

Steps to reproduce the problem:

1- select a jpg image in the database
2- rotate the image left or right (f5 or f6)
3- select another image
4- return to the previously rotated image
5- one can verify that the image is not rotated like she had, and the image changed format from jpg to tiff (higher disk space usage)

I would like to send a bug report in some way to the development team so that it can be fixed as soon as possible (in 2.0.5 or in a hotfix way) since this is in my opinion a major bug in an application that has to deal with lots of images (scanning documents has often to do with rotating images i.e. if the document is landscape oriented)

Bill_DeVille · September 22, 2010, 11:49pm

You do not need to rotate a JPEG scan image to prepare it for OCR.

The ABBYY OCR software first analyzes an image, including the orientation of the majority of text characters on the page, and will automatically choose the orientation that has the text properly oriented.

EXAMPLE: I took an upside-down snapshot of your post as displayed on my iPad, using a digital camera. The camera was hand-held and I didn’t try to be especially careful in taking the shot. I then transferred it to my iMac, opened the JPEG in GraphicConverter and kicked the resolution up to 300 dpi and did a quick and dirty white scale correction. The image was then imported into DT Pro Office, clearly upside-down. Data > Convert > to searchable PDF then produced a searchable PDF that was properly oriented. The text copied below resulted from using Data > Convert > to rich text.

The OCR recognition wasn’t entirely accurate, probably because a bit of camera shake emphasized moire patterns in the image, especially on the right side of the image. The phrase “that the bug” wasn’t recognized in the first sentence, below:

I have just retested the new version and can confirm affecting the rotate of images is still there in 2.0.4.
Steps to reproduce the problem:
1- select a jpg image in the database 2- rotate the image left or right (f5 or f6)
3- select another image
4- return to the previously rotated image
5- one can verify that the image is not rotated like she had, and the image changed format from jpg to tiff (higher disk space usage)
I would like to send a bug report in some way to the development team so that it can be fixed as soon as possible (in 2.0.5 or in a hotfix way)
since this is in my opinion a major bug in an application that has to deal with lots of images (scanning documents has often to do with rotating
images i.e. if the document is landscape oriented)

The example illustrates that ABBYY’s OCR is “smart” enough to handle an orientation problem. That’s why it works so well with the ScanSnap scanner, which requires that pages be inserted in portrait orientation, even for a document containing a mix of portrait and landscape pages.

WHY DID I POST-PROCESS THE IMAGE? I could have set the camera to produce a higher resolution JPEG, but thought it might be useful to demonstrate this trick. Indeed, the original image had less than 200 dpi resolution, and the OCR recognition of that image was less successful than recognition of the processed image. I used GraphicConverter to kick the resolution up to 300 dpi, with resampling. That involved some interpolation, which improved character recognition. This resulted in increasing the size of the image from about 400 KB to 14 MB. But that’s not important in preparation for OCR, if one has selected Preferences > OCR to move the original image to the Trash.
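As a rough illustration of what resampling to 300 dpi does to the pixel dimensions, here is a minimal sketch. The 1600 x 1200 px starting size and the 200 dpi figure are hypothetical numbers chosen for the arithmetic, not measurements from the snapshot described above, and `resampled_size` is an illustrative helper, not a GraphicConverter function.

```python
def resampled_size(width_px, height_px, source_dpi, target_dpi):
    """Pixel dimensions needed to keep the same physical size at target_dpi."""
    scale = target_dpi / source_dpi
    return round(width_px * scale), round(height_px * scale)


# Hypothetical snapshot: 1600 x 1200 px at 200 dpi, resampled to 300 dpi.
new_w, new_h = resampled_size(1600, 1200, 200, 300)
print(new_w, new_h)                      # 2400 1800
print((new_w * new_h) / (1600 * 1200))   # 2.25
```

The pixel count grows with the square of the scale factor, which is one reason the resampled file is larger. The final file size also depends on the file format and compression used.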

The white scale fix improved the contrast. All in all, the OCR recognition was pretty good. It would have been still better had I used a tripod or other camera mount and carefully aligned the camera to the iPad’s screen.

The little image rotation procedures provided in DEVONthink are not intended for critical work. They are useful when viewing photos that were not properly oriented, but I generally wouldn’t save the result afterwards.

Above all, don’t use those routines on a multipage PDF or TIFF, as only the rotated page will be viewable afterwards, if the change is saved.

If you wish to change the orientation of a page in a PDF, that can easily and safely be done in DEVONthink’s view of the PDF, or in Preview.

hybra · September 23, 2010, 8:46am

Hi,
my report was not really about the OCR procedure. I know the OCR engine can recognize even wrongly oriented documents.
The problem is that a document, image, or photo always has to be rotated to be viewed the right way up. Furthermore, when rotating an image, DEVONthink behaves as if it opens the file, enters some kind of editing mode, and saves the file in TIFF format (even if the file was in another format, e.g. JPG).
My suggestion would be that DEVONthink, when adding images to the database, flags them with a logical orientation status of “0” and then, if the user rotates an image 90° right, just updates that flag to “90”. If the user rotates it again, the flag becomes “180”; if rotated once more, it becomes “270”. If instead the image is rotated 90° left from the original orientation, the flag becomes “270”, then “180”, then “90”, and so on.
This way there is no need for DEVONthink to save orientation changes by modifying the original file (i.e. a non-destructive orientation change, or edit, à la the Lightroom database). DT can retrieve the original file from the database, look at the orientation flag, and rotate the image in memory before displaying it.
I hope I have been clear; if not, I am available to explain the suggestion further.
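The flag idea can be sketched in a few lines. This is only an illustration of the suggestion, not how DEVONthink stores images: `ImageRecord`, its fields, and `display_size` are hypothetical names, and the flag is assumed to count degrees clockwise from the stored file.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ImageRecord:
    """Hypothetical database entry; the file on disk is never rewritten."""
    path: str
    orientation: int = 0  # degrees clockwise from the stored file: 0, 90, 180, 270

    def rotate_right(self) -> None:
        self.orientation = (self.orientation + 90) % 360

    def rotate_left(self) -> None:
        # Python's % always returns a non-negative result here, so 0 - 90 -> 270.
        self.orientation = (self.orientation - 90) % 360


def display_size(width: int, height: int, orientation: int) -> Tuple[int, int]:
    """Size of the image on screen after the in-memory rotation is applied."""
    if orientation in (90, 270):
        return height, width
    return width, height


record = ImageRecord("scan-0001.jpg")
record.rotate_right()
record.rotate_right()
print(record.orientation)                            # 180
record.rotate_left()
print(record.orientation)                            # 90
print(display_size(3000, 2000, record.orientation))  # (2000, 3000)
```

Only the integer flag changes when the user rotates, so the JPG stays a JPG and keeps its original size on disk. The viewer would apply the rotation each time it loads the file.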