Import with OCR in DEVONthink 3 limited to processing only one document at a time?

I sent this question to the devs as well, but I'm wondering if anyone here knows the answer. Like many, I'm currently trialing the new 3.0 beta and came across a potential issue when using import with OCR.

I noticed that in DEVONthink 3, OCR on import seems to wait for you to tag an imported item before moving on to the next item (assuming you are importing multiple items). This is different from DTPro Office 2, which continues on to the next item. Is this a limitation of DT3 in general or just of the beta? When I do an import with OCR, I usually stack up about 20 to 30 documents, then come back and tag them after a few have been processed. If DT3 handles this differently and waits for tagging after each document before moving on, that unfortunately has a big impact on those of us who like to bulk import and OCR.

Has anyone seen the same, or does anyone know the answer or a workaround?

Thanks

I noticed that in Devonthink 3 that OCR on import seems to wait for you to tag an imported item before moving onto the next item (assuming you are importing multiple items).

This is not a requirement. You can disable Enter metadata after text recognition in Preferences > OCR.

PS: This is also present in DEVONthink 2.x.

Jim, I respectfully disagree (am I actually allowed to do that? :stuck_out_tongue_winking_eye:?)

DT2.x: I scan 20 items to DT with OCR on import. When the first item has OCR’d, a window opens, requesting me to enter metadata. Whilst that window is open, OCR continues on the 2nd, 3rd, 4th, … documents.

DT3: I scan 20 items to DT with OCR on import. When the first item has OCR’d, a window opens, requesting me to enter metadata. Whilst that window is open, OCR does not continue on further documents. OCR is performed on a document by document basis, waiting for metadata to be entered.

Whilst I am aware that I can choose not to enter metadata on import, but rather add it later, the behaviour most certainly has changed between DT2 and DT3. Whilst OCR has also become rather faster, this limitation is still a bit of a pain. I posted about it some while ago here: DT3b3 - Waiting for next OCR

@BLUEFROG, Jim, thanks for the reply. I have to agree with @Blanc on this one: the behavior in DT2 was more beneficial and user friendly for importing multiple docs with OCR. I don’t want to turn off metadata entry on import; instead, I'd prefer the OCR engine to keep moving through the batch list while the metadata window remains open, so the user can enter metadata whenever they are available. That behavior in DT2 allows me to check back in periodically and update metadata on, say, 10 to 20 imports at a time while the engine continues. I find this much more conducive to importing multiple docs than going back afterwards to add metadata. In DT2 Pro Office I oftentimes stack up 50 to 60 imports in the queue.

The OCR engine in DT3 waiting on 1 document at a time is less than an ideal workflow IMO.

Is there any chance this can be modified by the team before release to implement the DT2 functionality and approach to OCR?

Thanks for the consideration

Development would have to respond to this.

And yes, civil (even if passionate) disagreements are allowed on our forums.


@BLUEFROG, Jim, I just wanted to check in and see if there was any update on this request. As a longtime DTPro user, I have to say this OCR engine change would seriously impact my inbound workflow and would therefore, unfortunately, be a deal breaker for me personally. Hoping something can be done about this. Thanks so much in advance for the consideration.

There is no change in the process of entering metadata.

  • If you disable Enter metadata after text recognition in Preferences > OCR, you can scan in an uninterrupted fashion.
  • If you enable this option, you will be prompted after DEVONthink receives each scan.

Jim, that was your answer above, but as @Blanc and I explained, we don't want to turn off metadata entry; the point is that the behavior is different in DT3. As I stated above, I would prefer the OCR engine to keep moving through the batch list while the metadata window remains open, so I can check back periodically and enter metadata for 10 to 20 imports at a time while the engine continues, as DT2 Pro Office does. The OCR engine in DT3 waiting on one document at a time is less than an ideal workflow IMO.

Is there any chance this can be modified by the team before release to implement the DT2 functionality and approach to OCR?

So I'm wondering whether the dev team has considered reverting to the behavior of DTPro Office 2.x, where scanning and OCR continue.

Thanks

The OCR engine in DEVONthink 3 is not the same one as was used in 2. This means the former options may not even be technically feasible any longer.

Development would have to assess the feasibility of this.

Is there any chance this can be modified by the team before release to implement the DT2 functionality and approach to OCR?

I’m not sure what you’re referring to here, as DEVONthink 3 was publicly released last Thursday.

Jim, thanks for your reply. Yes, it was publicly released last Thursday, but as I had heard no updates since our last conversation 20 days ago, where you mentioned ‘Development would have to respond to this.’, I wanted to check in to see if anything further had been or could be done.

Thanks, and I hope that makes more sense now.

Nothing new to note.

Remember to check the Help > Release Notes with each release of DEVONthink. While we usually don’t report deep non-user facing technical details, a change like this would be recorded.

I agree with this thread; this is a huge downgrade from DEVONthink 2. What were the devs thinking? I scanned a huge stack of documents today, thinking I could come back later and quickly assign the tags and other metadata one after another. However, when I came back, the same huge stack was still at “adding document…” with only one document in the queue waiting for me to assign metadata. This is a huge disappointment. It makes the OCR function completely impractical. Who wants to sit at their computer while it chugs through hundreds of pages, only to spend one second entering metadata every 4 minutes?

I too would appreciate a more detailed response from DevonTech on this; I actually don’t understand why this would have been changed from DT2 to DT3 and would appreciate an explanation (whilst nobody owes me any explanation, I find things easier to accept when I understand them; ideally of course I would like to hear whether and when uninterrupted sequential processing will be available again).

I love DT3 even so, though :heart: used daily, appreciated daily. Life-changing software as far as I am concerned :slight_smile:

We are currently working on a fix for this issue with the aim of including it in the next update.


That’s great to hear, thank you :+1:

All, I’ve not checked back in on this topic since I started it and, to be honest, was losing hope that this very real and very impactful issue would be resolved. As a longtime user of DTPro Office, I can honestly say that the ONLY thing holding me back from upgrading to DT3 is this issue. I love the work the devs do and think this is an amazing product, but there is no way that removing this feature from the OCR part of the workflow could ever be seen as an upgrade or a benefit by most users.

So, with all that said, I am extremely happy to hear, @aedwards, that a fix is being built to address this. Are there any details you can share? For example, will it re-enable the functionality from DTPro Office 2 (hopefully), or is it being implemented in some other way? Is there any timeline, even a ballpark, for when the next update may be delivered?

Thanks again for listening to this issue and at least agreeing to attempt a fix. I hope more details can be shared on the solution being worked on and the timeframe.

The OCR queue functionality should now work in a similar manner to v2. Whilst we do not have a release date for the next update it is likely to be towards the end of Jan or early Feb.


Hi there Alan. A couple of follow-up questions:

  1. When you say it should work in a similar manner, can you explain how this has been implemented and how it may still differ from v2?

  2. Any update on when this will be available so we can try it?

  3. Will this update be included in the latest trial version, so I can make sure it once again fits my needs before purchasing?

Thanks again

  1. What I mean by "similar" is that the PDF metadata entry window no longer blocks the next OCR job from starting. Therefore, if you had 10 documents to OCR and ran them overnight, in the morning all of them would be OCR’d and you could then enter the metadata.
  2. I don’t have an update on the release date at the moment.
  3. Yes, this update will be available in the trial version.

Thanks Alan. Hopefully coming soon.

One last question: how many docs/pages can the queue hold before it stops and waits for metadata entry? Are any limits, short of hardware limits (memory, etc.), being imposed on this?

Thanks again!