Information lost after OCR: half of the document is missing

GOODRIDDANCE · May 5, 2019, 6:20pm

Repro:
Convert the attached file to PDF+Text.

=> The document’s rotation is wrong, so half of the document is missing.

BLUEFROG · May 5, 2019, 7:02pm

I had a support ticket on a Hangul (Korean) conversion showing that issue, but this one OCR’d as expected. Take your original, un-OCR’d file in the Finder, zip it, and upload the ZIP, just to preserve the exact file you were processing.

GOODRIDDANCE · May 5, 2019, 9:28pm

I should mention that this was created with DT3 beta 1.

See the before and after screenshots; the result isn’t what I would expect.

ZIP FILE:
00000001-1.JPG.zip (4.0 MB)

BLUEFROG · May 5, 2019, 9:32pm

Thanks! That is reproducible. Where did this image come from?

GOODRIDDANCE · May 5, 2019, 9:44pm

I scanned it with a Xerox WorkCentre 6515DN.

BLUEFROG · May 5, 2019, 10:37pm

There’s a rotation applied in the image that appears to be messing this up.

Orientation : Rotate 90 CW

Development will have to look into whether this is a DEVONthink or ABBYY issue.
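For context, the “Orientation : Rotate 90 CW” line above is the EXIF Orientation tag (0x0112), which tells a viewer to rotate the stored pixels before display. The sketch below is illustrative only, not DEVONthink’s or ABBYY’s code: it maps the eight standard tag values to the labels ExifTool uses and shows that values 5 to 8 swap width and height. The function name `displayed_size` is hypothetical.

```python
# EXIF Orientation tag (0x0112) values, labelled as ExifTool reports them.
EXIF_ORIENTATION = {
    1: "Horizontal (normal)",
    2: "Mirror horizontal",
    3: "Rotate 180",
    4: "Mirror vertical",
    5: "Mirror horizontal and rotate 270 CW",
    6: "Rotate 90 CW",
    7: "Mirror horizontal and rotate 90 CW",
    8: "Rotate 270 CW",
}


def displayed_size(width, height, orientation):
    """Return (width, height) of the image once the orientation tag is honoured.

    Hypothetical helper: values 5-8 involve a 90 or 270 degree turn,
    so the displayed width and height are swapped.
    """
    if orientation not in EXIF_ORIENTATION:
        raise ValueError(f"invalid EXIF orientation: {orientation}")
    if orientation >= 5:
        return height, width
    return width, height
```

For example, a scan stored as 3508 × 2480 pixels with orientation 6 should be shown as 2480 × 3508. If one stage of a pipeline used the rotated size while another used the raw pixel layout, part of the page could end up outside the canvas. Whether that is what happens here is an assumption; the thread only confirms that the rotation tag is involved.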

aedwards · May 7, 2019, 3:29pm

This is fixed and will be in the next beta.

Blanc · June 1, 2019, 11:41am

@aedwards - I’m still seeing this problem in DT3b2 - is that known, or do you need more details?

BLUEFROG · June 1, 2019, 3:48pm

It’s an issue with the ABBYY engine and we are in discussion with them about it. Thanks for your patience and understanding.

Blanc · June 14, 2019, 7:47am

Just FYI, since I don’t know whether any changes were made in beta 3 regarding this problem: it is still present in beta 3. Although the OCR result is now different (maybe a coincidence), the document is still incomplete.

BLUEFROG · June 14, 2019, 12:56pm

I’m not sure if this specific issue was consistently fixed. I know there are discussions about OCR going on right now though.

BergNerd · June 23, 2019, 5:22pm

I’m new to DEVONthink and ran into the same problem. After ruling out all other causes, it appears to be the same issue on my system.

My workflow looks as follows:

Brother ADS-2800W (Auto Start Scan Mode)
PDF (no autorotate, no OCR) on an SMB share
macOS folder action on the SMB scan folder: “DEVONthink - Import, OCR & Delete”
Up to this point, everything is fine.

As soon as the document is imported and OCR’d, landscape pages are rotated and cut off on the right side, while a white area is added on the left.

Unfortunately, I didn’t notice this during my trial. Hopefully this will be fixed soon, as I’m now stuck with no idea how to proceed in the meantime.

BLUEFROG · June 23, 2019, 5:28pm

Welcome @BergNerd

I can’t comment on “soon”, as we don’t give release dates.

Are the original scans still in your system Trash?
Do you still have the original documents to rescan?

BergNerd · June 23, 2019, 5:34pm

Thanks for the reply, @BLUEFROG.

I still have the originals for rescanning, so no information is lost. It was my fault for not checking the inbox earlier… I recognized this after scanning about 900 documents…

Okay, but that is the risk when using beta releases…

My question is more about how to rescan the documents without running into the same problem again. Is there a workaround, or do I have to wait until it is fixed?

Thank you!

BLUEFROG · June 23, 2019, 5:38pm

I recognized this after scanning about 900 documents…

Yikes!

Okay, but that is the risk when using beta releases…

Indeed!

According to https://www.brother-usa.com/products/ads2800w, the scanner software has OCR capabilities. You could use its OCR, then import the files after conversion.

Chazzo · September 24, 2019, 7:31pm

A workaround is to rotate the landscape pages first in DT, and then run OCR. That works fine, at least on my setup.

Of course, that may not be fun if you have 900 documents and the landscape pages could be anywhere in them.

Looking forward to a proper fix. The auto-rotate feature is really useful.
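For the workaround Chazzo describes (rotating landscape pages before running OCR), the tedious part with many documents is finding those pages. The sketch below is a hypothetical helper, not a DEVONthink feature: it assumes you already have each page’s displayed size as a (width, height) pair, for example from a PDF tool, and simply lists the 1-based page numbers that are wider than tall. The rotation itself would still be done in DEVONthink or another PDF editor.

```python
def landscape_pages(page_sizes):
    """Return 1-based page numbers whose width exceeds their height.

    Hypothetical helper; assumes page_sizes holds the *displayed*
    (width, height) of each page, i.e. any existing rotation already applied.
    """
    return [
        number
        for number, (width, height) in enumerate(page_sizes, start=1)
        if width > height
    ]


# A4 pages in PDF points: portrait is 595 x 842, landscape is 842 x 595.
sizes = [(595, 842), (842, 595), (595, 842), (842, 595)]
print(landscape_pages(sizes))
```

With the sample sizes above, pages 2 and 4 would be flagged for rotation before OCR.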

JCHHenderson · September 25, 2019, 11:18am

I saw this problem yesterday as well (DT Pro 3.0). It is a long, strip-shaped document scanned with my ScanSnap S1300i. My only option was to scan to file and then import without OCR (leaving it un-OCR’d). Is there a fix in the works?

BLUEFROG · September 25, 2019, 12:22pm

See:

JCHHenderson · September 25, 2019, 1:48pm

Hi, yes, I did see that, but it was written three months ago, when DT 3 was still in beta. I wasn’t sure whether you thought it had been fixed or not.

Cheers.

BLUEFROG · September 25, 2019, 3:15pm

We are still talking to ABBYY about this.