PDF file type not updating after OCR scanning

mesroporsem · May 30, 2017, 11:15pm

i’ve set up a workflow in acrobat pro that runs ocr (clearscan) on pdfs. i add the pdf files directly to the workflow by dragging them from the devonthink db. the workflow is set up to overwrite/replace the original files after scanning is done. there is one issue with this: dtpo doesn’t seem to update the file type afterwards. i have a smart group that shows all pdfs without ocr, but the scanned files still remain there. their type is still “PDF”. how can i force dtpo to update the file type to “PDF+Text”?

note: exporting and reimporting the files is not a solution that i want to use, although this works

BLUEFROG · May 30, 2017, 11:19pm

There is a good chance you are creating inconsistencies in your database by doing this. Acrobat is likely writing a completely new file, not changing the original. Do a Tools > Verify & Repair on the database.

mesroporsem · May 31, 2017, 6:19pm

already tried “verify & repair”, but this didn’t work … is there an option to select the file and run something like “update metadata”? or some script that would do the trick?

BLUEFROG · June 1, 2017, 5:12pm

If the PDF is searchable, it doesn’t matter if it’s marked as PDF instead of PDF+Text. Rely on searching as the indicator.

mesroporsem · June 1, 2017, 10:19pm

the thing is, i need the smart group to sort out pdfs without ocr, so i know which have yet to be ocr-scanned …

so the question still is: how can i ocr a pdf file from a dtpo db in adobe and have its type updated?

BLUEFROG · June 1, 2017, 10:43pm

This isn’t done with the Kind criterion. You could use Word Count is 0.

mesroporsem · June 5, 2017, 9:38pm

i know … but as already mentioned, even with wordcount=0, the smart group still shows the clear-scanned pdfs …
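
For anyone who needs a check that doesn't depend on the database's Kind or Word Count at all, here is a rough sketch of testing the PDF files themselves for a text layer. It is only a heuristic, not anything DEVONthink or Acrobat provides: it assumes the page content is either uncompressed or Flate-compressed, that the file is not encrypted, and that any text is drawn with the standard `Tj`/`TJ` operators. The function names and the folder argument are made up for illustration, and it should be pointed at exported or indexed copies rather than at files inside the database package.

```python
import re
import zlib
from pathlib import Path

# Heuristic only: assumes unencrypted PDFs whose content streams are
# either uncompressed or Flate-compressed.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)
BEGIN_TEXT_RE = re.compile(rb"\bBT\b")      # "begin text object" operator
SHOW_TEXT_RE = re.compile(rb"\bT[jJ]\b")    # "show text" operators Tj / TJ


def has_text_layer(pdf_bytes: bytes) -> bool:
    """Return True if any non-image stream appears to draw text."""
    for match in STREAM_RE.finditer(pdf_bytes):
        # Look at the object dictionary that precedes this stream.
        obj_start = max(pdf_bytes.rfind(b"obj", 0, match.start()), 0)
        header = pdf_bytes[obj_start:match.start()]
        if b"/Image" in header:
            continue  # pixel data can contain operator-like bytes by chance

        body = match.group(1)
        try:
            # decompressobj tolerates a stray trailing byte or a clipped checksum
            body = zlib.decompressobj().decompress(body)
        except zlib.error:
            pass  # not Flate data; inspect the raw bytes instead

        if BEGIN_TEXT_RE.search(body) and SHOW_TEXT_RE.search(body):
            return True
    return False


def pdfs_without_text(folder: str) -> list:
    """List PDFs in a (hypothetical) folder that seem to lack a text layer."""
    return sorted(
        path for path in Path(folder).glob("*.pdf")
        if not has_text_layer(path.read_bytes())
    )
```

Calling `pdfs_without_text` on a folder of exported copies gives a list of files that probably still need OCR. It does not change what DEVONthink shows as the file type, so it only helps with working out which files are still unscanned.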