OCR on images in PDF

dstay · April 15, 2011, 9:01am

I have a few PDF and PPT files with images that contain text. It would be great to run OCR only on the respective images or slides, cutting down on quality loss and processing time. Am I overlooking something, or is this possible at all?

Bill_DeVille · April 15, 2011, 1:41pm

In the case of the PowerPoint file, you can edit it, extract or copy the image files, run them through OCR, and finally put them back in place of the originals.
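For anyone who would rather script the extraction step, here is a minimal sketch. It assumes the presentation is saved in the newer .pptx format, which is an ordinary ZIP archive that keeps its embedded pictures under ppt/media/. It will not work on legacy binary .ppt files. The function name extract_pptx_images and the list of image suffixes are my own choices for illustration, not part of any application mentioned here.

```python
import zipfile
from pathlib import Path, PurePosixPath

# Suffixes treated as images; an assumption, extend as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tif", ".tiff"}


def extract_pptx_images(pptx_path, out_dir):
    """Copy every image stored under ppt/media/ in a .pptx into out_dir.

    Returns the list of files written, sorted by name.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(pptx_path) as archive:
        for name in archive.namelist():
            member = PurePosixPath(name)  # ZIP member names use forward slashes
            is_media = name.startswith("ppt/media/")
            if is_media and member.suffix.lower() in IMAGE_SUFFIXES:
                target = out / member.name
                target.write_bytes(archive.read(name))
                written.append(target)
    return sorted(written)
```

Calling extract_pptx_images("talk.pptx", "talk_images") (both names are placeholders) leaves the pictures in a folder where they can be run through OCR one by one. Putting the results back into the slides is still a manual step.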

That can’t be done on the PDF images, however. I suspect the best that could be achieved would be to OCR only the pages that contain the images that need it. For example, one can select the page icon of a PDF page and copy it to the clipboard, then paste it into the same group as a single-page PDF. Use Data > Convert > to Searchable PDF on that page, then put it back into the original PDF in place of the old page and Save.

The accuracy of text conversion of the images will depend on the quality and resolution of the images.