Hello! I’m a new and unskilled DT user, so this problem may be a result of my ignorance. Anyway, I need to import photographs of texts (JPG files) into a DT database and then use DT (with the ABBYY add-on, of course) to convert them into OCR’d PDFs. I’m doing this on a pretty massive scale, which is the whole reason I’m using DT in the first place.
I just want to convert each JPG to an OCR’d PDF.
But I have run into what may be a bug with both of the ways I’ve tried to do this.
1) The easiest (but least automated) way is as follows: I select the file or files (JPG or JPGs) and then choose Data>OCR>to searchable PDF. When I do this, it creates three copies of the OCR’d PDF. (If I select Preferences>OCR>Move to Trash, then it deletes the original JPG file; if I don’t, then it makes the three PDFs and leaves the JPG file in the same location.) Curiously, each of the OCR’d PDFs is also of a slightly different file size.
2) The more automated way: I’ve made a Smart Rule configured as follows:
This, however, leaves me with three files: the original JPG, a regular PDF, and an OCR’d PDF. Needless to say, I just want the last one! (I’ve tried to make another Smart Rule that I could run after this to delete the unwanted files, but so far to no avail.)
Obviously, I’d love to be able to automate this. I had hoped that by using DT I could set things up so that I could drag and drop a folder into my DT database and have DT automatically convert all the JPG files nested within its folders/groups into OCR’d PDFs. I guess I would settle for getting 1) above to just convert each JPG to a single OCR’d PDF, without triplicating or duplicating.
Help would be appreciated!