ReadIRIS 11 and/or DTPO

I’m not sure what to do:

It seems I need the full version of ReadIRIS, because I have a lot of newspaper articles and old documents (many of them handwritten) to scan and, if possible, to OCR. I tried this yesterday and did not find a convenient way to import the scanned and OCR'd files into DTP, where I would like the documents stored as PDF+Text, as in DTPO. I got only .rtf or .pdf documents.

Am I missing something?

I don’t need the email function of DTPO, so I thought buying ReadIRIS 11 would be a (rather expensive) workaround to get what I need in DEVONthink Pro. But it seems I would have to buy both DTPO and ReadIRIS. Please tell me I’m wrong.

Ursula, if you already have ReadIRIS Pro 9 or 11, you can set the output of OCR’d files to PDF+Text and then import them into your DT Pro database.

If you don’t already have an OCR application, the OCR engine in DTPO should meet your needs. In other words, you don’t need both DTPO and a separate OCR application.

I haven’t bought ReadIRIS yet, but after a conversation with Annard here in the forums (he suggested the full version would suit my needs better) I wanted to. That’s why I tried yesterday to find a workaround for importing OCR’d files into DTP. I will try again; it seems I did not find the right settings.

If you use an external OCR application and are saving the OCR’d files to the Finder, you can ‘tell’ the OCR application to send those files to a designated folder.

Now you can attach a Folder Action script to that folder. Whenever you save or drag & drop a file into that folder, the script will perform its designed action.

I’ve got a Finder folder with a Folder Action script attached, so that any file dropped or saved into the folder is imported into my open DT Pro database. For example, when I download a PDF file from the Internet, I choose ‘Save As’ in my browser, select the Folder Action folder as the destination and presto! That PDF goes into my database. Periodically, I trash the previously added files in the Folder Action folder.
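Conceptually, a "hot folder" just reacts to new items appearing in a watched folder. Purely to illustrate that idea, here is a minimal polling sketch in Python. It is not how macOS implements Folder Actions and not the DEVONthink scripts: `new_items`, `import_into_database` and `watch` are hypothetical names, and the import step is a placeholder that only prints the file name.

```python
from __future__ import annotations

import time
from pathlib import Path


def new_items(folder: Path, seen: set[str]) -> list[Path]:
    """Return entries in `folder` whose names are not yet in `seen`, then record them.

    Hidden files such as .DS_Store are skipped. Tracking is by name only,
    so a re-added file with an old name is not reported again.
    """
    current = sorted(p.name for p in folder.iterdir() if not p.name.startswith("."))
    fresh = [folder / name for name in current if name not in seen]
    seen.update(current)
    return fresh


def import_into_database(path: Path) -> None:
    # HYPOTHETICAL placeholder: the real Folder Action scripts do the
    # import via AppleScript; this sketch only reports the file.
    print(f"would import: {path.name}")


def watch(folder: Path, interval: float = 5.0, rounds: int = 10) -> None:
    """Poll `folder` a fixed number of times, 'importing' anything new."""
    seen: set[str] = set()
    new_items(folder, seen)  # files already present count as old
    for _ in range(rounds):
        time.sleep(interval)
        for item in new_items(folder, seen):
            import_into_database(item)
```

A real Folder Action is event-driven (the system runs the script when items are added), which avoids polling altogether; the sketch only shows the "act on whatever is new" logic.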

The Extras folder on the DT Pro download disk image contains two Folder Action scripts. One will Import newly added items to your database, the other will Index newly added files into the database. For PDFs you will probably use the Import script.

Here’s some information about how to set that up, from online Help:

"Folder Actions

Folder Actions are scripts that you can attach to folders in the file system, and that act on all items you add to these ‘hot folders.’ DEVONthink Pro Office comes with two folder action scripts (located in the ‘Extras/Scripts/Folder Actions’ folder) on the DEVONthink Pro Office disk image.

Note: It’s recommended to copy all folder action scripts to the directory ‘/Library/Scripts/Folder Action Scripts.’

The two scripts,

  • Action Import
  • Action Index

do what their names suggest: import or index files or folders that you drop into a folder to which one of these scripts is attached.

Attaching a Folder Action: To attach a folder action script to a folder, do the following:

1. Control-click (right-click) the folder.
2. Select Attach a Folder Action from the contextual menu.
3. Select the folder action script you want to attach and click Choose.

Manage Folder Actions: To manage all your folder actions, Control-click (right-click) a folder and choose Configure Folder Actions from the contextual menu. Use the Folder Actions Setup utility to see which folders have scripts attached, and to remove folder actions from folders."

Hope this helps.

Thank you, Bill, you are the hero of my day! I hadn’t realized what Folder Actions could do. I’m sure this helps. I will try it tomorrow morning.

Has anyone noticed how big these PDF+Text files are? Any idea why?

mbizer, the PDF size depends on the resolution of the image layer.

The current beta of DT Pro produces substantially smaller files than previous betas, as the image layer is down-sampled to 150 dots per inch. That resolution was chosen so as to retain reasonable print quality in the OCR’d PDF files.
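Because the image layer is a raster, its pixel count grows with the square of the resolution. A rough Python sketch of the arithmetic, assuming a US Letter page and uncompressed pixels (real PDFs compress the image layer, so actual files are far smaller than these raw figures); `raw_page_bytes` is a hypothetical helper, not part of any DEVONthink tool:

```python
def raw_page_bytes(dpi: int, bits_per_pixel: int = 1,
                   width_in: float = 8.5, height_in: float = 11.0) -> int:
    """Uncompressed bytes for one page rasterized at `dpi`.

    Assumes a US Letter page by default; each pixel row is padded
    to a whole number of bytes.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


# Compare a few resolutions for a 24-bit color page.
for dpi in (300, 150, 96):
    mb = raw_page_bytes(dpi, bits_per_pixel=24) / 1_000_000
    print(f"{dpi:>3} dpi, 24-bit color: {mb:6.1f} MB raw")
```

Halving the resolution from 300 to 150 dpi cuts the pixel count, and hence the raw size, to a quarter; dropping to 96 dpi shrinks it further. The `bits_per_pixel` parameter shows the other lever: at the same resolution, a 24-bit color layer holds roughly 24 times as much raw data as a 1-bit black-and-white one.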

There’s an Extras folder on the download disk image of the current beta that contains a number of potentially useful items. Among these, if you would like to experiment, is an Automator workflow, created by a user, that down-samples to 96 dots per inch and produces still smaller PDF files.

And DTPO 1.3beta3 has added a contextual menu option that lets you OCR an image-only PDF file that has already been captured into a database and contains no text. The OCR process re-rasterizes the image, so the new OCR’d PDF file will likely have a different file size: sometimes larger, sometimes smaller, depending on the resolution of the original PDF and its images.

Perhaps I wasn’t clear. I’m not talking about DT Pro specifically. I’m starting with image files and running them through ReadIRIS 11, and they grow 16-fold in size going from image to image + text. That makes no sense to me.

All OCR engines that I’ve checked re-rasterize the PDF image during the process, which very often balloons the resulting file size. Some OCR applications let the user select the resolution of the saved PDFs, but even then it’s common to see larger files after OCR.

The exception might be a PDF with many graphic/picture elements that are included at high resolution. In that case, OCR might produce a smaller file size (and certainly will with the current DTPO beta).