Markups wrong after OCR

Andreas76 · May 14, 2020, 12:41pm

Hello,

I have a document which has been digitally signed. I have had it in portrait format from the beginning and I have imported it in this format into DT.
After I ran OCR on the document, the signature is in the wrong place. Is this a known issue? This is VERY annoying, because I didn’t notice it and I have already distributed the document further, which doesn’t leave a good impression.

Andreas76 · May 14, 2020, 12:46pm

In addition, I spent the weekend cleaning up my duplicates and then running OCR on 3,000 documents. Thank you… which of my documents might have the same or other problems now? Do I need to recover everything from a backup?
Sorry, but I am really getting frustrated - almost every time I try to reorganize something I run into trouble. I have spent so much time since V3 on the forum, Google, workarounds, waiting, reorganizing… I am not willing to do so anymore. Sorry for my open words, but I have been patient enough and need to focus on work.

aedwards · May 14, 2020, 1:19pm

I have just tried this with the new update of DEVONthink (v3.5), which now uses ABBYY OCR v12. This version of the OCR has worked with the digital signatures I tested. Once you have updated, if you still encounter this issue, could you send a sample document?

Andreas76 · May 14, 2020, 1:55pm

OMG, of course I waited for months to OCR everything, and right now the new version comes out.
But no, it’s not working - same result.
Sorry, I cannot share the document because it’s a sensitive kind of contract.

cgrunenberg · May 14, 2020, 2:06pm

Which app did you use to add the signature to the PDF?

Andreas76 · May 21, 2020, 9:08am

I just got the response.
It was made via a PDF attachment using Apple Mail on iOS and the native Apple Pencil integration. No third-party software.

aedwards · May 21, 2020, 10:05am

I have tried using the same method and unfortunately haven’t been able to reproduce the issue. Is it possible to send a signed document that does not contain sensitive information?

Andreas76 · May 30, 2020, 9:43am

I cannot reproduce it here either… I will try different scenarios.

But what about my other questions? I have to decide whether to roll back.

The new PDF files have a slightly worse resolution - why can’t they just stay the same when I only want to add the OCR layer?
Are there other known issues that I should be aware of? I noticed the rotation problem only by chance.
The new OCR files have an updated “added date” - how can I keep all the original attributes?

aedwards · June 1, 2020, 9:04am

1. The ABBYY OCR will recreate the image layer as well as adding a text layer when it generates the new PDF file. Have you turned off the Compress PDF option in the OCR preferences?
2. Which rotation problem are you referring to?
3. The “added date” is a DEVONthink value and is not stored in the PDF file.

Andreas76 · June 4, 2020, 4:24am

Hi Alan,

thanks for trying to help me.
The rotation problem I am talking about is the signature problem of my initial post.

Yes, I have turned off the option to compress PDFs.
Is this a limitation of DT, or does EVERY OCR software recreate the image layer? I can’t imagine why it should not be possible to just add the text layer and keep the image layer as it is. This way the PDF files do not get any better once several adjustments have added up over the years.
I may be remembering this wrong, but in DT 2 I was able to decide which compression level to use. If there is no compression at all (as DT3 gives me the option to do), I expect the image layer to stay as it is.
Why is software trying to get simpler when in the end I have more questions than before?
Okay, that’s the reason… but no solution. If I use a DT feature, it should at least be scripted in a way that lets me decide whether I want to keep the attributes. In the end I am using DT to assist me, not to offer features that I have to understand in detail before I can make use of them. I want to keep the added date for my 2000 documents - do I really have to deal with scripting again? As a user… do I really have to know which field is a DT field and which is not, as well as how each is handled?

BLUEFROG · June 4, 2020, 5:26am

On 2., that is non-negotiable. The Date Added is the date a file is added to a database, not the Creation Date. Clearly, if a new file is generated via OCR, it has a new Addition Date since it’s added at a different time.

Andreas76 · June 4, 2020, 7:11am

Yes, understood. But I used this date to identify different tranches of documents I was working on (importing, working, reorganizing). So this was initially purely technical information, but I went on to use it as content information.
On the other hand, the Date Added stays the same when moving a document from one database to another. That is inconsistent if it is meant to be purely technical information about when a file was added to a database.