Problem with OCR - creating searchable PDF

@cgrunenberg Thanks again for your help.

I understand that. But you mentioned that folder actions are the old, possibly deprecated way, so I’m now trying to set it up the way the developers intended.

Thank you, bluefrog!

OCR is running now, but after the files are automatically moved to the “Out” folder, I’d like to move them to different groups with a smart rule based on the filename’s prefix. The rule works fine when I apply it manually to the files in the Out folder (by clicking on the file and then on Apply Rule), but I can’t get it to work automatically. I already tried

  • on import
  • on moving
  • on creation

and so on, but it does not work.

I wanted to split the OCR and the moving by filename into separate rules to make them easier to manage, since I have a lot of prefixes.

So overall, these are the steps I want to perform:

  1. import the file into DT when the scanner puts the PDF in the “In” folder
  2. OCR the file
  3. move it to certain groups based on the filename’s prefix
  4. change the date
  5. change the name
  6. mark it as new
  7. delete the non-OCRed file in DT
  8. delete the source PDF in the “In” folder in the file system

Perhaps you or @cgrunenberg have some ideas on how to do it. Here are my two smart rules:


And when I use one smart rule for all the actions, the OCRed file still stays in the Finder and isn’t deleted even when I empty the trash. Is this a bug?

There’s actually no Move to Trash action in your rule, meaning that the rule doesn’t delete the original. Convert & Continue creates and processes a new copy.

So would Convert alone work in that it converts the original PDF on import to a paginated one and applies OCR afterwards?

No, the conversion always creates a copy. The difference is that after the Convert action, the original item is used by the following actions, whereas Convert & Continue uses the copy for the remaining actions (likewise Duplicate vs. Duplicate & Continue).
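To make the difference concrete, here is a toy model in Python. It is only a sketch under stated assumptions: `Item`, `convert`, `run_rule` and `move_to` are hypothetical names that mimic the behaviour described above (the conversion always creates a copy; the only difference is which item the following actions receive). They are not DEVONthink's API; real smart rules are configured in the GUI.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Item:
    name: str
    group: str  # database group the item currently lives in


# Hypothetical stand-in for a smart rule action operating on one item.
Action = Callable[[Item], None]


def convert(item: Item) -> Item:
    """Conversion always produces a new copy; the original is left untouched."""
    return Item(name=item.name + " (converted)", group=item.group)


def run_rule(item: Item, and_continue: bool, remaining: List[Action]) -> Tuple[Item, Item]:
    """Apply a Convert-style action, then the remaining actions.

    and_continue=False models "Convert": the remaining actions see the original.
    and_continue=True models "Convert & Continue": they see the copy.
    """
    copy = convert(item)
    target = copy if and_continue else item
    for action in remaining:
        action(target)
    return item, copy


def move_to(group: str) -> Action:
    """Return a hypothetical 'Move' action targeting the given group."""
    def action(item: Item) -> None:
        item.group = group
    return action


if __name__ == "__main__":
    for and_continue in (False, True):
        original, copy = run_rule(Item("scan.pdf", "In"), and_continue, [move_to("Out")])
        label = "Convert & Continue" if and_continue else "Convert"
        print(label, "-> original:", original.group, "| copy:", copy.group)
```

Running it shows which of the two items ends up in the Out group for each variant; the same reasoning applies to Duplicate vs. Duplicate & Continue.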

This question would’ve deserved its own thread, I think. Anyway: One smart rule’s actions (like Move to or whatever) do not trigger another smart rule, as far as I know. You could add an explicit “execute smart rule” (or maybe “apply smart rule”, not sure about the English UI) to your first smart rule. Not sure about the second smart rule’s trigger, though.

As you said elsewhere, you have “a lot of prefixes” determining which group your documents should be moved to. That, in turn, would (I believe) require a lot of near-identical smart rules if you used the approach illustrated by your screenshot (a Name begins with ... condition).

I’d suggest that you have a look at the Scan Name and File actions. The latter can deal with variable text detected by the former. So you can, for example, sort all documents whose names begin with “PKV_” into the group PKV (or whatever).
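As a rough illustration of why one prefix-based rule scales better than one rule per prefix, here is a Python sketch of the matching logic. It assumes (hypothetically) that every prefix is a run of letters or digits followed by an underscore and that the target group has the same name as the prefix. `group_for` and `PREFIX_PATTERN` are illustrative names, not part of DEVONthink, where the equivalent would be set up with the Scan Name and File actions.

```python
import re
from typing import Optional

# Assumed naming scheme: "<PREFIX>_<rest>", e.g. "PKV_2022-12-01_invoice.pdf".
PREFIX_PATTERN = re.compile(r"^([A-Za-z0-9]+)_")


def group_for(filename: str, fallback: Optional[str] = None) -> Optional[str]:
    """Return the group a file would be filed into, derived from its prefix.

    One pattern covers every prefix, instead of one "Name begins with"
    condition (and one smart rule) per prefix. Files without a prefix
    get the fallback group.
    """
    match = PREFIX_PATTERN.match(filename)
    return match.group(1) if match else fallback


if __name__ == "__main__":
    for name in ["PKV_2022-12-01_invoice.pdf", "TAX_2021_statement.pdf", "scan0001.pdf"]:
        print(name, "->", group_for(name, fallback="Inbox"))
```

Adding a new prefix then needs no new rule, only a group with the matching name.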

Besides, why do you use Move into Database in your first rule right after moving the document to a group? Wouldn’t one of these actions be sufficient?

You’re welcome 🙂

As @cgrunenberg has clarified (for me as well), you can use a Move to Trash action to treat the database’s Trash as a temporary location. This way, the conversion happens in the Trash, leaving the original file there while you move the file produced by the Convert & Continue action.

Also, if you don’t include a Move Into Database action, the original file remains indexed. Use this action to essentially import the file so it’s not left behind in the Finder.

Dear @BLUEFROG, @chrillek and @cgrunenberg,

Thank you guys so much. It finally works as planned. The solution was to use these three steps

  1. move into database
  2. move to trash
  3. convert & continue

in this exact order.

It’s a pity that the names of the actions don’t tell you what they really do. @cgrunenberg, please just read the three lines as I listed them. They don’t make any sense at all when you’re not the developer. Just renaming them to

  1. move file from filesystem into database
  2. move original file from file system to trash
  3. convert & continue

would say exactly what those actions do (hopefully I finally got it right). Clear naming would have saved me hours. So, just a little hint to make great software even better.

Thanks for your support!

Actually, this action moves the item in the database to the database’s trash; it doesn’t have any immediate impact on the filesystem.

But which action deletes the file in the file system?

With the configuration mentioned above, everything finally works perfectly. But I’d like to understand what those steps actually do.

The Delete action bypasses the trash and immediately deletes the item & its file.

OK, then there must be a bug in the software.

I’m using the following steps:

  1. move to db
  2. move to trash

and the file on the hard drive is deleted when those actions are executed.

When you do that, the file is moved from location x on your hard drive to a location within the database package. It is not deleted from your hard drive, but moved. Moving a file to the trash in the database doesn’t actually move it at all; it just changes its association within the database.
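The distinction can be sketched as a toy Python model in which the on-disk path and the group a record appears in are two separate things. `Record`, `move_into_database` and `move_to_trash` are hypothetical names, the paths are made-up examples, and the package's internal folder layout is deliberately not modelled; this only mirrors the behaviour described above, not DEVONthink's implementation.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class Record:
    path: PurePosixPath  # where the file physically lives on disk
    group: str           # where the record appears inside the database


def move_into_database(record: Record, package: PurePosixPath) -> None:
    """Model of Move into Database: the file is relocated into the package, not deleted.

    The real internal layout of the package is not modelled here.
    """
    record.path = package / record.path.name


def move_to_trash(record: Record) -> None:
    """Model of Move to Trash: only the database association changes; the file stays put."""
    record.group = "Trash"


if __name__ == "__main__":
    rec = Record(PurePosixPath("/Users/me/Scans/In/scan.pdf"), group="Inbox")
    move_into_database(rec, PurePosixPath("/Users/me/Databases/Docs.dtBase2"))
    print(rec)  # the file left the In folder because it was moved, not deleted
    location_before = rec.path
    move_to_trash(rec)
    assert rec.path == location_before  # trashing did not touch the file's location
    print(rec)
```

In this model, the file disappearing from the “In” folder is explained entirely by the first step; the second step never touches the file.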

Hello Christian, is there an ETA for a fix for this issue?

Will this be a fix in the ABBYY code, or will you apply a workaround in DT3 so that ABBYY doesn’t fail when OCRing these specific files coming from Brother scanners?

Many thanks,

Luca

Have you checked DEVONthink 3 > Install Add-Ons for an OCR update?

Yes, Jim,

I did; everything is greyed out.

[Screenshot 2022-12-20 at 15.08.54]

I remember doing an update a few weeks ago, but the problem is still there with two files coming from a Brother scanner.

Thanks,

Luca

I’ve been informed that this is an issue specifically with the Brother software and the OCR engine, so we can’t work around it. Using Apple’s Image Capture application or scanning via our scanning interface (via the View > Import sidebar) should produce results acceptable for ABBYY’s processing.

No worries, Jim. I was able to work around the issue by reading the long message thread.

Converting from PDF to PDF and then OCRing the resulting PDF fixed the issue.

It was only a couple of files which I received via email which triggered the bug, and it took me a little while to figure things out.

This forum is a gold mine of information, BTW! 👍🏻 💪🏻

Bye, Luca

We’re glad to hear it. ❤️ 🙂