OCR on Idle

LibertyTrooper · January 25, 2015, 5:18pm

Hi,

Is there a way to have DTPO run OCR on the files that need it while the application is idle? This would let me keep working when a large number of files are waiting to be processed.

Thanks!

korm · January 25, 2015, 7:58pm

The user cannot explicitly schedule the process to run in the background, though OCR can run while other activities are occurring in DEVONthink, and DEVONthink adjusts priorities to accommodate demand. I’ve never noticed a slowdown while OCRing files and doing other tasks concurrently in DEVONthink (or any other app, for that matter).

Or, you could kick off a batch of OCR during your off hours?

LibertyTrooper · February 5, 2015, 6:41pm

Actually, perhaps I should say “Index” on idle. I suppose what I want is for DTPO not to prevent me from working with other items while it is indexing or, perhaps, even importing. Getting information into DTPO should be easy and should not force me to walk away from it when bringing in a plethora of documents (as one does when dealing with legal cases).

R,
LT

korm · February 5, 2015, 7:36pm

Is it fair to assume you must be indexing a large folder hierarchy – perhaps in the 100s or 1000s of MB range? Today, DEVONthink 2.x does not index in the background – that would be a feature request. Personally, I do a fair amount of indexing, but I try to break this down into smaller chunks. So, in this case:

A
--B
--C
----D
--E

I wouldn’t index “A”, I would index B, C and E so that the index update addresses smaller data sets.
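
To illustrate the chunking idea, here is a minimal Python sketch that lists the immediate subfolders of a top-level folder along with their sizes, so it is easier to decide which ones to index separately. It does not talk to DEVONthink at all; the function names `folder_size_bytes` and `index_candidates` are made up for this example, and the folder path is whatever you pass in.

```python
import os
import sys


def folder_size_bytes(path):
    """Total size in bytes of all regular files under path (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def index_candidates(root):
    """Return (path, size_in_bytes) for each immediate subfolder of root.

    For the hierarchy above, passing "A" yields entries for B, C and E;
    D is counted inside C rather than listed on its own.
    """
    candidates = []
    for entry in sorted(os.scandir(root), key=lambda e: e.name):
        if entry.is_dir(follow_symlinks=False):
            candidates.append((entry.path, folder_size_bytes(entry.path)))
    return candidates


if __name__ == "__main__":
    # Hypothetical usage: python chunks.py /path/to/A
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, size in index_candidates(root):
        print(f"{size / 1_000_000:10.1f} MB  {path}")
```

Each folder it prints can then be indexed on its own, smallest first if time is short.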

LibertyTrooper · February 6, 2015, 2:41pm

Exactly. I have thousands upon thousands of documents that are related to a case in which I’m involved, not to mention the ‘normal’ things that I wish to do. The luxury I do not have is time, which is why I’d love to be able to figure out a way to index, import, and/or OCR without having to go get a cup of coffee. Indeed, right now I’m running into a problem where DTPO shows the spinning beachball while trying to empty the trash. Each time I force quit, I get a bit concerned because I have lost entire databases before as a result.

My next approach is to try to create an Automator workflow which will index, import, or OCR one item at a time. However, I do not believe this is really going to address my use case.