Information is lost after an OCR scan - half of the document is missing

Repro:
Try to convert the attached file to PDF+Text.

=> The rotation of the document is wrong -> half of the document is missing

I had a support ticket on a Hangul (Korean) conversion showing that issue, but this file OCR’d as expected. Take your original (without OCR) in the Finder, ZIP it, and upload the ZIP, just to preserve the exact file you were processing.

I should probably mention that this was created with beta 1 of DT3.

See the before and after screenshots - the result isn’t what I would expect.

ZIP FILE:
00000001-1.JPG.zip (4.0 MB)

Thanks! That is reproducible. Where did this image come from?

I scanned it with a Xerox WorkCentre 6515DN.

There’s a rotation applied in the image that appears to be messing this up.

Orientation : Rotate 90 CW

Development will have to look into whether this is a DEVONthink or ABBYY issue.
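
For readers unfamiliar with the tag quoted above: "Rotate 90 CW" is how tools such as ExifTool display EXIF Orientation value 6, meaning the stored pixels must be turned 90 degrees clockwise to appear upright. Below is a minimal Python sketch of what honoring that tag involves, using a toy pixel grid rather than a real JPEG. The function names are made up for illustration, mirrored orientations are ignored, and nothing here is taken from DEVONthink or ABBYY.

```python
# Clockwise rotation (in degrees) a viewer must apply for each EXIF
# Orientation value. Only the non-mirrored values are covered here;
# the mirrored ones (2, 4, 5, 7) are deliberately left out.
EXIF_ROTATION_CW = {1: 0, 3: 180, 6: 90, 8: 270}


def rotate_90_cw(pixels):
    """Rotate a row-major pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*pixels[::-1])]


def apply_orientation(pixels, orientation):
    """Return a grid with the rotation baked into the pixels.

    After this step the Orientation tag could be reset to 1, so software
    that ignores the tag still sees an upright image.
    """
    degrees = EXIF_ROTATION_CW[orientation]
    for _ in range(degrees // 90):
        pixels = rotate_90_cw(pixels)
    return pixels


# Toy example: a 2-row by 3-column grid stored with Orientation 6.
stored = [["a", "b", "c"],
          ["d", "e", "f"]]
upright = apply_orientation(stored, 6)
```

Note that the upright grid has swapped dimensions (3 rows by 2 columns instead of 2 by 3). Software that rotates the content but keeps the old page dimensions, or the reverse, would clip part of the image. Whether that is what happens in this case is only a guess.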

This is fixed, and the fix will be in the next beta.

@aedwards - I’m still seeing this problem in DT3b2 - is that known, or do you need more details?

It’s an issue with the ABBYY engine and we are in discussion with them about it. Thanks for your patience and understanding.


This is only an FYI, because I don’t know whether any changes regarding this problem were made in beta 3: the problem is still present in beta 3. Although the OCR result is now different (maybe a coincidence), the document is still not complete.

I’m not sure if this specific issue was consistently fixed. I know there are discussions about OCR going on right now though.

I’m new to DEVONthink and got the same problem. After excluding all other causes, it seems to be the same problem on my system.

My workflow looks as follows:

  • Brother ADS-2800W (Auto Start Scan Mode)
  • PDF (no autorotate, no OCR) on an SMB share
  • macOS folder action on the SMB scan folder: “DEVONthink - Import, OCR & Delete”
    Up to this point, all is fine.

As soon as the document gets imported and OCR’d, landscape pages get rotated and cut off on the right side, while a white area is added on the left.

Unfortunately, I didn’t figure this out during my trial… Hopefully this will be fixed soon, as I am now kind of stuck with no idea how to proceed in the meantime.

Welcome @BergNerd

I can’t comment on “soon”, as we don’t give release dates.

  • Are the original scans in your system Trash still?
  • Do you not have the original documents to rescan?

Thanks for the reply, @BLUEFROG.

I still have the originals for re-scanning, so no information is lost. It was my fault not to check the inbox earlier… I recognized this after scanning about 900 documents… :roll_eyes:

Okay, but that is the risk when using beta releases…

My question is more about how to rescan the documents without running into the same problem again… Is there any workaround, or do I have to wait until it is fixed?

Thank you!

I recognized this after scanning about 900 documents…

Yikes! :flushed:

Okay, but that is the risk when using beta releases…

Indeed!

According to the product page (https://www.brother-usa.com/products/ads2800w), the scanner software has OCR capabilities. You could use its OCR and then just import the files post-conversion.

A workaround is to rotate the landscape pages first in DT, and then run OCR. That works fine, at least on my setup.

Which of course might not be fun if you have 900 documents and the position of the landscape pages is unpredictable.

Looking forward to a proper fix. The auto-rotate feature is really useful.
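
For anyone applying the rotate-first workaround to a large batch, the tedious part is finding the landscape pages. Here is a minimal Python sketch of that check, assuming you can already obtain each page's width, height, and rotation (in degrees, a multiple of 90) from some PDF library, which is not shown. The function names and the sample page sizes are hypothetical and not part of DEVONthink.

```python
def is_landscape(width, height, rotate=0):
    """True if the page is wider than tall once its rotation is applied."""
    # A quarter-turn (90 or 270 degrees) swaps the visible width and height.
    if rotate % 180 == 90:
        width, height = height, width
    return width > height


def landscape_pages(pages):
    """Return 1-based page numbers of landscape pages.

    `pages` is a list of (width, height, rotate) tuples.
    """
    return [n for n, (w, h, r) in enumerate(pages, start=1)
            if is_landscape(w, h, r)]


# Hypothetical page sizes in points (A4 portrait is roughly 595 x 842).
pages = [(595, 842, 0), (842, 595, 0), (595, 842, 90), (842, 595, 270)]
found = landscape_pages(pages)  # pages 2 and 3 are landscape
```

The resulting page numbers tell you which pages to rotate in DT before running OCR. The check only classifies pages by shape and does nothing to the files themselves.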

I saw this problem yesterday as well (DT Pro 3.0). It is a long, strip-shaped document scanned with my ScanSnap S1300i. My only option was to scan to file and then import without OCR (and leave it un-OCR’d). Is there a fix in the works?

See:

Hi - yes, I did see that, but it was written 3 months ago, when DT 3 was in beta. I wasn’t sure if you guys thought it had been fixed or not.

cheers.

We are still talking to ABBYY about this.