Convert incoming PDF automatically

Doyanole · October 3, 2019, 11:42am

Hi,
I have to add PDFs to DEVONthink on a daily basis and convert them to searchable documents with OCR.
Can this be automated, so that putting a PDF in the Inbox starts the OCR immediately?

Thanks in advance

cgrunenberg · October 3, 2019, 11:56am

Which version/edition of DEVONthink do you use? In version 3 smart rules can be used to automate this.

Doyanole · October 3, 2019, 11:56am

I just bought 3.0, but I’ve already updated to 3.0.1, Pro version.

cgrunenberg · October 3, 2019, 11:58am

In that case a smart rule which is performed on import and using the conditions…

Kind is PDF/PS
Word Count is 0

…could perform the action OCR > Apply.
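
As a rough illustration only, the matching logic of such a rule can be sketched in Python. This is not DEVONthink code and does not use its scripting interface; the `Record` class, its fields, and the sample file names are hypothetical stand-ins chosen to mirror the two conditions above.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for a database item; not a DEVONthink type."""
    name: str
    kind: str        # e.g. "PDF/PS" or "Markdown"
    word_count: int  # 0 means no text layer was found


def needs_ocr(record: Record) -> bool:
    """Mirror the two rule conditions: Kind is PDF/PS and Word Count is 0."""
    return record.kind == "PDF/PS" and record.word_count == 0


# Made-up sample items standing in for freshly imported documents.
imported = [
    Record("scan.pdf", "PDF/PS", 0),       # image-only scan
    Record("report.pdf", "PDF/PS", 1250),  # already contains text
    Record("notes.md", "Markdown", 0),     # not a PDF at all
]

to_ocr = [r.name for r in imported if needs_ocr(r)]
print(to_ocr)
```

Only the image-only scan satisfies both conditions, so only that item would be handed to the OCR action.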

Doyanole · October 3, 2019, 12:16pm

mmmh

I did this:

but it doesn’t seem to work.

Am I forgetting something?

Doyanole · October 3, 2019, 12:17pm

Forget it … never mind. I found the error. It works now. Thanks for your help!

cgrunenberg · October 3, 2019, 12:18pm

Just in case somebody else reads this: the action has to be performed on import.

MarkPM · October 23, 2019, 9:01pm

Please share the error and fix. I’ve not tried this yet and would like to know of potential issues.

amalis · October 23, 2019, 9:37pm

This works for me …

rkaplan · October 23, 2019, 9:56pm

Silly question perhaps here…

I have my Settings under OCR set to “Convert Incoming Scans” to PDF/OCR.

As far as I can tell, all PDF files I import via Drag/Drop or through the sorter get converted to OCR format.

In what situations would I add a non-OCR PDF that does not get OCR run on it when added to my database?

BLUEFROG · October 24, 2019, 5:26am

rkaplan: “As far as I can tell, all PDF files I import via Drag/Drop or through the sorter get converted to OCR format.”

Dragging and dropping a PDF into DEVONthink will not do OCR on it automatically without smart rules. That is not an incoming scan.
That setting is for input detected from apps like ScanSnap Home.

Many PDFs don’t require OCR, e.g., ones from Word or apps like InDesign, print to PDF from macOS apps, web clippings as PDFs,…

Tanner · January 6, 2020, 8:08pm

I have tried using smart rules to run OCR on PDFs or emails dropped into the Inbox (“Eingang”), with no effect.
Can you tell me how I have to set up the actions?

My goal is this: after I drop a PDF from an external application into the Inbox, the OCR process should begin. I just want to always end up with searchable PDFs.

Thank you so much.
Tanner

amalis · January 6, 2020, 9:16pm

Tanner,

This works for me:

https://devontech-discourse.s3.dualstack.us-east-1.amazonaws.com/uploads/original/2X/a/a127a02a69ccc299eccf83fef327664dfbe8cc00.png

BLUEFROG · January 6, 2020, 11:33pm

@amalis’s example is a simple and effective one.

Tanner · January 7, 2020, 5:15pm

This works great! Thank you very much, amalis.

amalis · January 7, 2020, 5:50pm

Glad to help.

Tanner · January 8, 2020, 10:39am

amalis,

How can I avoid the duplicates?

amalis · January 8, 2020, 12:30pm

Tanner, I don’t get duplicates. The “word count=0” clause is only true on files that haven’t been OCRed, so that should prevent duplicates from occurring. You might want to file a bug report on that.
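
To illustrate why the word count condition keeps a file from being processed twice, here is a small Python sketch. It is a toy model, not DEVONthink's implementation: `Pdf`, `fake_ocr`, and `run_rule` are hypothetical names, and the sketch assumes that OCR is applied in place and always finds at least one word.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pdf:
    """Hypothetical stand-in for a PDF record; not a DEVONthink type."""
    name: str
    word_count: int


def fake_ocr(pdf: Pdf) -> None:
    """Placeholder for OCR: pretend recognition found some text."""
    pdf.word_count = 100  # arbitrary illustrative value


def run_rule(pdfs: List[Pdf]) -> List[str]:
    """Apply OCR in place to every PDF whose word count is 0.

    Returns the names of the PDFs that were processed in this pass.
    """
    processed = []
    for pdf in pdfs:
        if pdf.word_count == 0:
            fake_ocr(pdf)  # modifies the existing record, creates no copy
            processed.append(pdf.name)
    return processed


inbox = [Pdf("scan.pdf", 0), Pdf("report.pdf", 1250)]
first_pass = run_rule(inbox)
second_pass = run_rule(inbox)
print(first_pass, second_pass, len(inbox))
```

Under those assumptions the second pass has nothing left to do and the number of records stays the same. A scan in which OCR recognizes no text at all would keep a word count of 0 and would match the condition again.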

BLUEFROG · January 8, 2020, 2:05pm

Avoid duplicates… in what way?
Duplicates created by OCR (which shouldn’t happen with OCR > Apply)?

Tanner · January 8, 2020, 3:50pm

Hi Bluefrog,
If I right-click on the PDF and choose “OCR -> in durchsuchbares PDF” (to searchable PDF), then I get a second copy of the PDF and the original is still there. That way I end up with a lot of duplicates.