OCR in Japanese

theworldinunion · February 24, 2009, 11:38am

I’m delighted to see that DT uses Abbyy for OCR.

My problem is that I am scanning a lot of documents that are partly in Japanese and partly in English. I believe that Abbyy’s SDK for Mac is only at ver 8.0, so the OCR only works for the English parts. Abbyy SDK for Windows v9.0 has full support for Japanese.

My questions are:

Will DT be incorporating Abbyy SDK ver 9 when it is available?
When Abbyy ver 9 is incorporated, will it be possible to re-run OCR on ALL the existing PDF documents in the DT database EASILY, so that the Japanese parts are recognized as well? It’d be a nightmare to have to manually re-run OCR on every single PDF.

Thanks.

annard · February 24, 2009, 12:16pm

I cannot guarantee we will ship Japanese, as it requires a separate licence. If we do, it might happen through our Japanese version of DTPO 2.
To prepare, it is best to set the Image settings in the OCR preference pane to 300 dpi and the quality to at least 85%, preferably higher, depending on how much disk space you’re willing to sacrifice. This will create a searchable PDF that should be able to run through OCR again without too much loss in quality.

theworldinunion · February 24, 2009, 2:30pm

Thank you for a quick reply.

If I purchase DT2.0 English now, does the licence allow me to use DT2.0 Japanese when it is released in the future?
Is there a way to create a searchable PDF without a loss in quality of the original PDF?

Thanks.

annard · February 24, 2009, 2:59pm

We have no idea whether we will be supporting Japanese OCR. We only know that we will have to get a separate licence for it (and of course Abbyy does not give these away for free).

Your document will always be changed by our OCR module, because we do fire-and-forget OCR. Of course, nothing stops you from keeping both the original and the OCRed document in the database.