How do I remove OCR?

vauha27 · March 14, 2019, 12:54pm

I started using DTPO from scratch after some years of not using it. I’ve imported some files. Two of them are PDF files (digital planners to be used with the Apple Pencil and GoodNotes or Noteshelf 2 on my iPad).

Those files were 36 MB each. After OCR, DTPO turned each of them into a 1 GB file. That is for a file which is basically an empty template with plenty of internal hyperlinks.

Is there a way to remove the OCR?
How do I prevent files from being OCR’ed?

TIA

BLUEFROG · March 14, 2019, 1:02pm

DEVONthink doesn’t do OCR on its own. You need to initiate the process yourself.

What are your resolution and quality settings in DEVONthink’s Preferences > OCR?

vauha27 · March 14, 2019, 1:22pm

I did not say that DEVONthink did the OCR on its own. It was me who started the whole batch. Sorry for not being precise enough.

Nevertheless, the result is unexpected: 36 MB -> 1 GB just because of OCR.

Can this be undone, other than by removing the two big files from the database and re-importing the smaller originals?

And how do I prevent those files from being listed in the smart group “Reine PDF suchen”?

cgrunenberg · March 14, 2019, 1:50pm

This smart group lists all PDF documents without searchable text. It’s not possible to remove individual items from a smart group; you can only remove the smart group itself or change its conditions.

BLUEFROG · March 14, 2019, 3:36pm

You shouldn’t use Selbe wie Scan (“Same as scan”) unless you know the resolution a file was scanned at. It’s best to set a maximum resolution yourself; we suggest no more than 300 dpi, and ideally 200 dpi.

There isn’t an easy way to fix these files. I would suggest deleting and reimporting them. If you’re going to do OCR again, adjust the settings first.
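To illustrate why the resolution setting matters so much, here is a rough sketch of how the pixel count of a rasterized page grows with dpi. It assumes an A4 page stored as uncompressed 24-bit RGB; the function names are made up for this example, and real OCR output is compressed according to the quality setting, so actual files will be smaller. The point is only the scaling: pixel count grows with the square of the resolution.

```python
def page_pixels(dpi, width_in=8.27, height_in=11.69):
    """Pixel count of one page rasterized at the given dpi (A4 assumed by default)."""
    return round(width_in * dpi) * round(height_in * dpi)


def uncompressed_mb(dpi, bytes_per_pixel=3):
    """Rough uncompressed size in MB of one page as a 24-bit RGB image (assumption)."""
    return page_pixels(dpi) * bytes_per_pixel / 1_000_000


if __name__ == "__main__":
    baseline = page_pixels(200)
    for dpi in (200, 300, 600):
        ratio = page_pixels(dpi) / baseline
        print(f"{dpi} dpi: about {uncompressed_mb(dpi):.1f} MB per page uncompressed, "
              f"{ratio:.2f}x the pixels of 200 dpi")
```

Going from 200 to 300 dpi gives roughly 2.25 times the pixels per page, and 600 dpi gives roughly 9 times, which is why a many-page PDF can grow dramatically if it is re-rendered at a high resolution.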

mbarton98 · March 15, 2019, 3:00pm

I often need to reduce the file size after I import a document into DT. Open the file from DT, reduce its size in an external app, and save it back. I suspect that may take a while for a 1 GB file, though. I used PDF Expert, but Preview.app should do the trick if you use the Export option under the File menu. This will not remove the OCR, but it will downsample the file to a more reasonable resolution for typical viewing.

Happy_DB · March 23, 2019, 3:26am

My experiments showed that at 300 dpi the OCR was clearly less accurate than with the dpi set to “Same as scan”. So 200 dpi would presumably be even worse.

The Quality setting did not have much influence, and 75% seems to be fine.

As for the file size, I never suffer from exploding file sizes. This was only a problem in earlier versions of DT.