When I manually apply OCR to individual PDFs, a replicant of the resulting searchable PDF is occasionally created in the Global Inbox. It happens sporadically, without any discernible pattern.
I searched in the forum and found two potentially related posts, neither of which seemed to reach a definitive answer:
Here are some of my settings that may be relevant, judging from the above threads:
DEVONthink Preferences > OCR > Original Document: Move to Trash is unchecked.
This affects only a small portion of the OCR’d files; I’d estimate under 10%.
I only have one Smart Rule, Filter duplicates, installed at the moment.
(Screenshot of the Smart Rule)
That might be the case. I’m using a Smart Group to filter all the documents that need OCR, but I do use ⌘R to Reveal the record before applying OCR. Taking your theory into consideration, maybe for some records I forgot to Reveal them first, and some of them happen to exist only in the Tags group.
Some weeks ago, I did notice that a small portion of my records only belonged to the Tags group. I tried to fix it but didn’t find a way, so I decided to forget about it. Now that you brought it up again, is there any way (e.g. Smart Groups) to filter all the records that only exist in the Tags group? Thanks
I selected all the records (by searching for kind:any in All Databases) and ran the script in Apple’s Script Editor but did not notice any new replicants created (by searching for item:replicated in All Databases). Now I’m confused again: does this result mean I do not have any records that only exist in the Tags group? Then I guess all such records have already been discovered and dealt with during my OCR process?
The issue happened again, for the first time since I made this post. This time I am confident the original scanned PDF did exist in a group before I applied OCR, so there is indeed something else going on here.
Yes, I could reproduce it several times in a row with a specific PDF document. I revealed it beforehand.
Each time two files were created, one in the original folder and one in the global inbox, both indicated as replicants. When I delete the one in the inbox, the file in the original folder is no longer indicated as replicant.
Note: the option to delete the original file during the OCR process is set to OFF in my settings (I like to check manually if everything was converted correctly).
It only happens to SOME pdf files, not to each and every one.
Today, I went through some groups in my databases and manually OCR’d pdf files that needed it (low or 0 word count). I stumbled upon a random pdf file, manually OCR’d it, and the behaviour described above appeared. I deleted both replicants and manually OCR’d the original file again, and again, two replicants were produced (one in the same folder, one in the global inbox).
At this point, I can’t see a specific pattern in which PDFs are vulnerable to this behaviour. This one had a 0 word count and was somewhat old (created 2016).
Yes. I am confident I found the cause now: it happens every time you OCR a protected pdf file (read-only mode). Somehow, when the OCR has finished, two new identical files are created: one in the same folder as the original file (source), and one in the inbox. Both are red and italic (so marked as replicants). If I delete the one in the inbox, then the other new file in the source folder is no longer marked as a replicant. The OCR process seems to have worked fine, and the new OCR’d pdf file is also no longer protected (read-only mode).
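If anyone wants to check in advance which PDFs might trigger this, here is a minimal Python sketch. It rests on an assumption: that "protected (read-only mode)" means the PDF carries permission restrictions, which PDF files declare through an `/Encrypt` entry. The function names `looks_protected` and `find_protected_pdfs` are made up for this illustration, and the check is a plain byte search on files in a folder on disk, not anything provided by DEVONthink.

```python
from pathlib import Path


def looks_protected(pdf_bytes: bytes) -> bool:
    """Heuristic: treat a PDF that declares an /Encrypt entry as protected.

    This is a plain byte search, not a PDF parser, so false positives are
    possible if the literal text "/Encrypt" occurs elsewhere in the file.
    """
    return pdf_bytes.startswith(b"%PDF-") and b"/Encrypt" in pdf_bytes


def find_protected_pdfs(folder: str) -> list[Path]:
    """Return the PDFs under `folder` (searched recursively) that look protected."""
    return sorted(
        path
        for path in Path(folder).rglob("*.pdf")
        if path.is_file() and looks_protected(path.read_bytes())
    )
```

Calling `find_protected_pdfs` on a folder of exported or indexed PDFs returns the paths worth testing first. If the read-only state turns out to be something other than PDF permission restrictions, this check will not catch it.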