i’ve set up a workflow in acrobat pro that runs ocr (clearscan) on pdfs. i add the pdf files directly to the workflow by dragging them from the devonthink db. the workflow is set up to overwrite the original files once ocr is done. there is one issue with this: dtpo doesn’t seem to update the file type afterwards. i have a smart group that shows all pdfs without ocr, but the processed files still remain there; their type is still “PDF”. how can i force dtpo to update the file type to “PDF/text”?
note: exporting and reimporting the files works, but it is not a solution i want to use.
There is a good chance you are creating inconsistencies in your database by doing this. Acrobat is likely writing a completely new file, not changing the original. Do a Tools > Verify & Repair on the database.
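To illustrate the difference between changing a file and writing a completely new one, here is a minimal Python sketch for a POSIX file system. It is only an illustration: how Acrobat actually saves and how DEVONthink tracks its files are not known here, and the function names and the `scan.pdf` file name are made up for the example. It compares the file's inode number before and after each kind of save.

```python
import os
import tempfile


def overwrite_in_place(path, data):
    """Rewrite the contents of the existing file; the file keeps its inode."""
    with open(path, "wb") as f:
        f.write(data)


def replace_with_new_file(path, data):
    """Write a new file next to the original, then rename it over the original."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp_path, path)  # same path afterwards, but a different file


def inode_changes(save):
    """Return True if saving with `save` leaves a different inode at the path."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "scan.pdf")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"original")
        before = os.stat(path).st_ino
        save(path, b"after ocr")
        after = os.stat(path).st_ino
        return before != after


if __name__ == "__main__":
    print("in-place overwrite changes inode:", inode_changes(overwrite_in_place))
    print("replace-by-rename changes inode:", inode_changes(replace_with_new_file))
```

If an application saves the second way, the path looks unchanged but anything that remembered the old file no longer points at the new contents, which is one plausible way a database could end up out of sync with the file on disk.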
already tried “verify & repair”, but this didn’t work … is there an option to select the file and run something like “update metadata”? or a script that would do the trick?
If the PDF is searchable, it doesn’t matter if it’s marked as PDF instead of PDF+Text. Rely on searching as the indicator.
the thing is, i need the smart group to sort out pdfs without ocr, so i know which have yet to be ocr-scanned …
so the question still is: how can i ocr a pdf file from a dtpo db in acrobat and have its type updated?
This isn’t done with the Kind criterion. You could use Word Count is 0.
i know … but as already mentioned, even with wordcount=0, the smart group still shows the clear-scanned pdfs …