OCR - Unable to create PDF export file

bill.raynor · November 1, 2021, 6:54pm

I have a recurring error that seems to be due to a memory leak. I have indexed a directory with 9500 PDF files. Of these, ~950 do not have a text layer. I have a smart group to select those files. When I select a small number, say 5, and OCR them to searchable PDF, all goes well the first time. However, when I repeat this, it eventually fails with the error shown above. Freeing memory between batches lets it run further, but the same error eventually recurs, until every OCR job fails. Quitting and restarting DT3 just brings me back to the beginning (the first conversion batches work, but eventually DT3 reports the same error again). I would like to select all the documents and do them as one batch.

Hardware: 2019 MBP, 16 GB RAM, 400 GB of 2 TB SSD available.
Software: DT 3.8 Pro Edition; deleted and re-installed the ABBYY engine as suggested in this forum (for the licenses:0 error).

BLUEFROG · November 2, 2021, 2:35am

So the issue only occurs when doing large batches?
Are you running the trial edition (I’m guessing no but have to ask)?

bill.raynor · November 2, 2021, 3:12am

Multiple small batches also cause the error. If I monitor memory, free memory decreases as more papers are processed, whether they are in small batches or a large one.
Here’s the About window:

[Image: screenshot of the About window]

bill.raynor · November 2, 2021, 4:28pm

Also, these are all professional-level math/statistics papers, with lots of equations and Greek mixed in with the text (e.g. on the same line).

BLUEFROG · November 2, 2021, 4:47pm

Please hold the Option key and choose Help > Report bug to start a support ticket and attach a problematic PDF for us to test. Thanks!

Blanc · November 2, 2021, 7:15pm

Bill, which macOS version are you using? I’m reading an increasing number of reports on memory leaks in Monterey (search, e.g., on DuckDuckGo).

bill.raynor · November 2, 2021, 7:27pm

There is not one problematic paper; it’s the volume.
@Bluefrog:

If I start DT3 and do a single paper, it works even for large books (500 pages), even if that paper was the one that failed in a previous run.
If I start DT3 and do multiple single papers, one at a time, it eventually fails.
If I start DT3 and do a batch of, say, 10-15 papers, DT3 usually fails partway through the batch.
I can send you several papers if you wish.

@Blanc: I’m using Monterey 12.0.1. Thanks for the tip. I’ll check it out.

bill.raynor · November 2, 2021, 7:53pm

As suggested, I just filed a bug report (#699397) and attached a single paper.

bill.raynor · November 13, 2021, 12:37am

The bug appears to be in the FineReader engine, and support has opened a ticket with them. In the meantime, OCR works if you do single OCR jobs, not batches. Hardly fun with 950 documents. It does give me a chance to add tags, though.

Blanc · November 13, 2021, 5:40am

Thanks for reporting back.

Mgfrei · January 16, 2022, 5:16pm

I am having the same problem… any update?

BLUEFROG · January 16, 2022, 5:25pm

Do you have a support ticket open?