Speeding up DTTGO with high quality OCR scans?

syntagm · December 20, 2019, 3:12am

I always scan at the highest possible resolution and apply OCR without compression, so many of my files are very large and take a while to load. On my Mac this is less of a problem because it’s more powerful, but on iOS it is a real problem: most of my documents, even single-page ones, take 1-3 s to load. This makes quickly processing and skimming through scans very painful.
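To put rough numbers on why maximum-resolution scans get heavy: a page's pixel count grows with the square of the resolution, so doubling the dpi quadruples the raw image data. The sketch below is only an estimate under stated assumptions: an A4 page, 24-bit color, and a hypothetical helper name. Real file sizes depend on whatever compression the scanning or OCR app applies.

```python
# Rough estimate of uncompressed raster size for one scanned page.
# Assumptions (illustrative only): A4 page of 8.27 x 11.69 inches and
# 24-bit color (3 bytes per pixel). Actual PDF sizes depend on compression.

A4_WIDTH_IN = 8.27
A4_HEIGHT_IN = 11.69


def raw_page_bytes(dpi: int, bytes_per_pixel: int = 3) -> int:
    """Uncompressed size in bytes of one A4 page scanned at `dpi` (hypothetical helper)."""
    width_px = round(A4_WIDTH_IN * dpi)
    height_px = round(A4_HEIGHT_IN * dpi)
    return width_px * height_px * bytes_per_pixel


if __name__ == "__main__":
    # Compare a few common scan resolutions.
    for dpi in (150, 300, 600):
        mb = raw_page_bytes(dpi) / 1_000_000
        print(f"{dpi:>4} dpi: {mb:6.1f} MB uncompressed")
```

Going from 300 to 600 dpi multiplies the raw pixel data by four, which is why a lower-resolution copy can load noticeably faster even when the page count is the same.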

Does anyone have a good workflow for keeping the high-quality originals but also a downscaled version for mobile devices? Sadly, DTTGO can’t create the downscaled versions itself, so I’d have to offload them or preprocess them somewhere else.

anon6914418 · December 20, 2019, 5:46am

Could you elaborate on what outcome you’re after? People use DT and DTTG for many different purposes and hence end up with different accessibility questions.

Personally I don’t “skim” through my database very often, but frequently use a combination of OCR search and classification to find files in DTTG.

Is the highest resolution necessary? Do you need near-100% accurate OCR, or would lower accuracy suffice?

As I don’t experience this problem myself, I don’t have a workflow, but you might be able to duplicate your high-res PDFs into low-res versions, without near-perfect OCR, that you can skim through. Whether this fulfills your needs depends on your use case, of course. You do end up with a database that is more than 100% of its current size, since the originals stay alongside the copies.
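One way to produce such low-res duplicates outside DEVONthink is Ghostscript’s `pdfwrite` device with a `-dPDFSETTINGS` preset, which downsamples embedded images. Below is a minimal Python sketch under several assumptions: Ghostscript is installed and on `PATH` as `gs`, and the `/ebook` preset (about 150 dpi images, per Ghostscript’s documentation) is good enough for skimming. The function names, script name and `-mobile` file-name suffix are hypothetical. Check that the OCR text layer is still searchable in a copy before relying on it. How the copies then get into DTTG is left open here.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def build_gs_command(src: Path, dst: Path, preset: str = "/ebook") -> list[str]:
    """Build a Ghostscript command that rewrites `src` into a downsampled `dst`."""
    return [
        "gs",
        "-sDEVICE=pdfwrite",          # write a new PDF
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS={preset}",    # /screen, /ebook, /printer, /prepress
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        f"-sOutputFile={dst}",
        str(src),
    ]


def make_mobile_copy(src: Path, out_dir: Path, preset: str = "/ebook") -> Path:
    """Write a low-res copy of `src` into `out_dir`; the original is left untouched."""
    out_dir.mkdir(parents=True, exist_ok=True)
    dst = out_dir / f"{src.stem}-mobile.pdf"  # hypothetical naming scheme
    subprocess.run(build_gs_command(src, dst, preset), check=True)
    return dst


def main(argv: list[str]) -> int:
    # Expected usage (hypothetical script name): make_mobile_copies.py SOURCE_DIR OUTPUT_DIR
    if len(argv) != 3:
        print("usage: make_mobile_copies.py SOURCE_DIR OUTPUT_DIR")
        return 1
    src_dir, out_dir = Path(argv[1]), Path(argv[2])
    for pdf in sorted(src_dir.glob("*.pdf")):
        print(make_mobile_copy(pdf, out_dir))
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

For example, `python make_mobile_copies.py ~/Scans ~/Scans-mobile` (paths illustrative). Swapping the preset to `/screen` gives smaller, lower-quality copies, while `/printer` keeps more detail at a larger size.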

You didn’t mention which mobile device you’re running DTTG on, but a more expensive and less elegant solution could be to buy a new iPad Pro. Whether that will work in your case is also something only you can decide.

On the other hand, if you’re looking for “reasons” to buy a new iPad, this might be it.

rfog · December 20, 2019, 10:19am

To get good OCR, I use two tools. Sometimes one does the job the way I want, sometimes the other. I don’t know why, but with the same kind of file from the same scanner (well, iPhone photos) and mostly the same kind of material (old books and magazines), one tool does better than the other, and vice versa.

For non-trivial stuff (like a screenshot converted into PDF and OCRed), I use Cisdem PDF convert (a one-time purchase of about 50 bucks) and PDFpen (I don’t know its price, as I’m using it via SetApp).

The result from Cisdem is very good: it compresses a lot, and the final result is more or less the same as the original scan; you only notice the loss of quality when zooming in more than 4x or so. But sometimes the result is 10x the original size instead of half or a quarter of it, and the program has no option to change its parameters. Sometimes the resulting PDF is corrupted, with pages half-rotated, simply cut in half, or displaced.

Then PDFpen comes to the rescue. It is slower, it sometimes fails so that you need to split the PDF into smaller PDF files, and it takes forever to do the OCR, but the result is always the original size plus a few KB. And it can’t batch OCR as Cisdem can.