DT 3.5 update - DTOCRHelper eats up the memory!

cgrunenberg · May 22, 2020, 7:51am

3.0.4 (or a Time Machine backup) is necessary to install the old engine.

nano5 · May 22, 2020, 7:53am

Thanks, I will stay with version 3.5, and leave the OCR idle for now.

cgrunenberg · May 22, 2020, 7:54am

Splitting the document into smaller ones and processing them separately might also be a possibility.
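
As a rough illustration of the splitting idea (this is not a DEVONthink feature; the function name and the chunk size are assumptions made up for this sketch), the page ranges for the smaller documents could be worked out like this:

```python
def page_chunks(total_pages, pages_per_chunk):
    """Return 1-based, inclusive (first, last) page ranges covering a document.

    Hypothetical helper for illustration only; it does not touch any PDF,
    it just computes which pages would go into each smaller document.
    """
    if total_pages < 0 or pages_per_chunk < 1:
        raise ValueError("total_pages must be >= 0 and pages_per_chunk >= 1")
    return [
        (first, min(first + pages_per_chunk - 1, total_pages))
        for first in range(1, total_pages + 1, pages_per_chunk)
    ]


# A 50-page document in chunks of 20 pages:
# page_chunks(50, 20) -> [(1, 20), (21, 40), (41, 50)]
```

Each range could then be exported as its own PDF with whatever PDF tool is at hand and sent through OCR separately, which should keep the amount of data handled per job smaller.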

nano5 · May 22, 2020, 7:57am

Not sure I am comfortable with splitting documents from now on.

By the way, do the DT developers consider this an issue to be fixed in a future release, or is splitting into smaller documents for OCR the way to go from now on?

cgrunenberg · May 22, 2020, 8:21am

This is definitely not a fixed issue yet, and we’re in contact with ABBYY (as it’s their updated engine that causes the trouble).

nano5 · May 22, 2020, 8:23am

Thanks.

ebowman · May 22, 2020, 9:13am

Your QA processes let it through to release to your customers, though.

An option to revert to a previous version of the OCR engine without rolling back all of DT3 might be nice if they can’t fix it quickly.

WhyO_74 · May 22, 2020, 9:50am

I’m also having problems after the update, but only CPU-hogging.
I haven’t done any scans since the update, only saved articles as multi-page PDFs. DT3 is therefore mostly open in the background, but I instantly noticed that my Mac Mini got asthma with the update.

Watching Activity Monitor, DT3’s memory usage while open in the background is stable at around 500 MB, but its CPU usage idles between 50% and 80%. As soon as I activate the DT3 window, CPU usage rises to 89-140%, even all these days after the update. At the beginning, this CPU-hogging also happened while DT3 was in the background.

I’m wondering: is DT3 maybe rescanning/updating the OCR version of all documents in the Inbox and open libraries? Then I could understand the “underground” work in the background: the DT3 database is on my NAS, and I hear the NAS disk reading files all the time.

My mid-2012 Mac Mini (i5 Duo 2.5 GHz) is still my working hero, maxed out with 16 GB RAM and an SSD. It is shorter of breath now, but not giving up.

cgrunenberg · May 22, 2020, 9:52am

DEVONthink 3 doesn’t automatically OCR files again, only scans on demand (see Preferences > OCR). Or do you use any smart rules that perform OCR?

BTW:
How is the NAS connected? Storing databases on network volumes is only recommended in case of wired connections.

BLUEFROG · May 22, 2020, 12:30pm

“Your QA processes let it through to release to your customers, though.”

We saw none of this behavior and received no reports from any beta testers.

ebowman · May 22, 2020, 12:47pm

I’m sure … that means there is a gap somewhere. When something like this happens and gives your customers a bad experience, part of a customer-centric, continual-improvement mindset is to respond by finding the missing automation or process steps that would have prevented it, and to make sure they are in place going forward. This is how the world’s best software companies operate.

BLUEFROG · May 22, 2020, 12:51pm

Sorry, but we are one of “the world’s best software companies”.

ebowman · May 22, 2020, 1:09pm

I would so love for that to be true. You certainly have created one of my favorite pieces of software.

BLUEFROG · May 22, 2020, 1:41pm

It’s a matter of opinion. Many of our clients tell us exactly that we are.
You are free to disagree with them, as you wish.

ebowman · May 22, 2020, 2:04pm

Of course. You folks do an admirable job as a small company that makes an enormously useful piece of software. I value it and use it daily. I’ve convinced others to adopt it.

But there is plenty of room for improvement, in my opinion. I find there is generally an undercurrent of defensiveness and a victim mindset in almost all my support requests, for example: blaming a vendor, or claiming the sync code hasn’t changed in a long time, so what can we do? That mindset is not what we have come to expect from the best software companies in the world.

One thing the best companies know is that there is always room to improve, and they are always looking for those ways. If you are also looking, I’d be happy to give more feedback privately.

WhyO_74 · May 23, 2020, 8:07am

My only DT3 usage since the update has been clipping Safari articles with the “Clip to DEVONthink” Safari extension, mostly as multi-page PDFs (I check the page formatting of the imported result in DT3), and possibly as web archives for badly formatted web pages.

The only automation in the preferences is converting incoming scans to searchable PDFs, but thanks to Catalina, my Canon P215 document scanner does not work any more (it was bad anyhow).

My Synology DS415play NAS is connected via Gigabit Ethernet to my always-running Mac Mini.

WhyO_74 · May 23, 2020, 8:13am

BTW: CPU usage is now back to “normal” on the Mac Mini.

(As we speak, I am updating my older MacBook Pro from DT3 v3.0.4 to v3.5. No open library, only a freshly synced Inbox. Just keeping an eye on Activity Monitor.)

WhyO_74 · May 23, 2020, 8:44am

No problems whatsoever after updating the MacBook Pro. It is also a mid-2012 2.5 GHz i5 Duo, with 10 GB RAM and an SSD.

It peaked at 130% CPU for the first 5 seconds after upgrading the ABBYY library, but is now back to under 13% CPU usage.
There is also no lag when opening 120-page PDFs and immediately scrolling fast; the old lad peaks at maybe 80% CPU.

I also tried opening a library for the first time, and there was no sudden “background updating” or anything like that.

I think this confirms that DT3 v3.5 is rock solid!

nano5 · June 16, 2020, 1:06am

After I updated to DT v3.5.1, it initiated a reinstall of the ABBYY OCR module (761 MB). But the memory issue remains: DTOCRHelper quickly eats up memory after the OCR job starts, and then freezes.

Now I use standalone FineReader with the smart rule created by @Silverstone (“Script to OCR PDFs with the latest FineReader”), set to “high resolution”, which works well.

By the way, standalone FineReader seems to work a bit differently in terms of resource strategy: on average it takes 25% of CPU capacity during “import” and “saving”, and 75-80% when “recognizing”. My iMac runs a quad-core Intel i5.

The test document is an 86-page BBC Science magazine with a lot of images:

during Import, FineReader consumes up to 800 MB of memory;
during Recognizing, up to 2.8 GB;
during Saving, around 2.6 GB.

The original PDF is about 78 MB, and the OCRed PDF doubles in size to 156 MB.

cgrunenberg · June 16, 2020, 10:05am

How much memory does the DTOCRHelper.app require to process this document? Or could you send us a copy?
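
For anyone who wants to answer this kind of question with numbers rather than by watching Activity Monitor, here is a minimal sketch that sums the resident memory reported by `ps` for processes whose command contains a given name. The function names are made up for this example, and matching processes by the substring "DTOCRHelper" is an assumption about how the helper shows up in the process list:

```python
import subprocess


def rss_kb_for(ps_output, name):
    """Sum the RSS column (in 1024-byte units) of `ps` lines whose command contains `name`.

    Expects lines of the form "<rss> <command>", as printed by
    `ps -ax -o rss= -o comm=`. Commands may contain spaces.
    """
    total = 0
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)  # split off the RSS number only
        if len(parts) == 2 and parts[0].isdigit() and name in parts[1]:
            total += int(parts[0])
    return total


def current_rss_kb(name):
    """Run `ps` once and return the summed RSS for matching processes."""
    result = subprocess.run(
        ["ps", "-ax", "-o", "rss=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return rss_kb_for(result.stdout, name)
```

Calling `current_rss_kb("DTOCRHelper")` every few seconds while an OCR job runs would give a rough picture of how quickly memory grows; it returns 0 when no matching process is running.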