OCR and images

parlar · February 8, 2007, 2:45pm

So I’m a relatively new, but very happy, user of DT Pro. Now that I’ve been using it for a bit, I’m starting to notice just how many papers I come across that were originally done in PostScript, and are not searchable after converting to PDF.

So, I’m considering upgrading to Pro Office just for the OCR, to turn these converted PS files into something more usable. My question, though, is how the OCR software handles images in the paper. Does it leave them in place, or remove them? It’d be pretty useless to me if the pictures were gone.

Thanks in advance,
Jay P.

annard · February 8, 2007, 4:29pm

You will get a PDF with the original image and invisible text under the letters, which can be selected and is used by our indexing mechanisms. In other words: you lose none of the images. Try it out and you’ll see!

parlar · February 8, 2007, 4:41pm

That sounds very cool!

Question then: I have DT Pro installed. If I install the demo version of Pro Office, is it going to “mess up” my DT Pro install? Like add extra scripts that aren’t compatible?

Thanks,
Jay P.

annard · February 8, 2007, 4:58pm

Well, the core application of Pro Office is identical to Pro, so the database will not suffer. It is better to remove Pro temporarily, because otherwise the scripts and the Services may fight with one another. You can always put it back later. The online help has a detailed listing of what is installed where, in case you change your mind and don’t want to keep Office installed.

parlar · February 8, 2007, 5:03pm

Is there an easy way to uninstall? Neither the built-in help nor a search of your website comes up with any hits on “uninstall”. That’d probably be the easiest way for me to make sure everything is temporarily removed.

Jay P.

annard · February 8, 2007, 5:18pm

Try to search for “remove” instead.

Actually, I see an omission:
~/Library/Mail/Bundles/DEVONMailConduit.mailbundle

This will be added to the next update of the documentation.

parlar · February 8, 2007, 5:20pm

Ahh, perfect. I definitely understand the use of the word “remove” in the context of that particular help page, but it’s not the first thing that comes to mind when I want to uninstall something. “Uninstall” is, though.

Jay P.

parlar · February 8, 2007, 10:44pm

Wow… Wow. I downloaded and tried Pro Office, and 10 minutes later I had my wallet out, buying an upgrade. The OCR is fantastic! It even worked on some badly scanned PDFs I had, where the pages were two-column splits, scanned at horrible angles.

The 50-page limit is a pain, so I wrote up some Python code that will split and merge PDFs. Someday, when I have time, I’ll have to find a way to automate the process from DT (I have no AppleScript experience, and it always looks like such a painful language to learn, too imprecise).

Thanks!
Jay P.
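
Jay's script was not posted in the thread, so the following is only a rough sketch of the planning half of such a split: working out which page ranges fit under the limit. The function name `page_chunks` and the reading of the limit as "at most 50 pages per OCR run" are assumptions for illustration. Reading and writing the PDF pages themselves is left to whatever PDF library is used.

```python
def page_chunks(total_pages, limit=50):
    """Return half-open (start, end) page ranges of at most `limit` pages.

    Hypothetical helper: the 50-page default mirrors the limit mentioned
    in the thread and is an assumption about how that limit is counted.
    """
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Step through the document in blocks of `limit` pages; the last
    # block is cut short so it never runs past the final page.
    return [
        (start, min(start + limit, total_pages))
        for start in range(0, total_pages, limit)
    ]


if __name__ == "__main__":
    # A 120-page paper becomes three pieces to OCR separately.
    for start, end in page_chunks(120):
        print(f"pages {start + 1} to {end}")
```

Since the ranges come back in page order, merging the OCR'd pieces afterwards is just a matter of concatenating them in the same order.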

annard · February 8, 2007, 11:06pm

I’m glad to hear that! Note that there are bindings for AppleEvents into Python, so you could invoke AppleScript commands from Python…
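
As a rough illustration of driving AppleScript from Python: the sketch below does not use the AppleEvent bindings mentioned above. It swaps in a plainer route and shells out to the macOS `osascript` command-line tool instead. The helper names `build_osascript_command` and `run_applescript` are hypothetical, and no application-specific scripting commands are assumed.

```python
import subprocess


def build_osascript_command(source):
    """Build the argument list that runs AppleScript `source` via osascript."""
    # `osascript -e <text>` executes the given text as a script.
    return ["osascript", "-e", source]


def run_applescript(source):
    """Run AppleScript source and return its textual result (macOS only)."""
    result = subprocess.run(
        build_osascript_command(source),
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode output as str rather than bytes
        check=True,           # raise CalledProcessError if the script fails
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Generic example script; only meaningful on a Mac.
    print(run_applescript('return "hello from AppleScript"'))
```

Passing the command as a list rather than one shell string avoids quoting problems when the script text itself contains quotes.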

parlar · February 9, 2007, 12:39am

That’s pretty much my plan. When I get around to it, I’ll make sure to put the results into the scripting forum.

Jay P.