Entering data from scanned PDF files into a database

I have DEVONthink Pro Office and a Fujitsu ScanSnap. I want to be able to scan multiple documents containing point-in-time and time-series numerical data (e.g. pathology test reports) into DT Pro Office and then pull them into a SQL, XML or some other type of relational database for further analysis. Is there a way of doing this from within DT?

For this it would be better to use ReadIRIS, since you probably need only the text for processing, and we store the OCRed data only as an invisible text layer in a PDF.

Perhaps you have two objectives: [1] maintain in searchable form the documents related to a patient’s care and [2] transfer some of that information to other software for analysis.

If so, there will be value in maintaining the records as PDF files in DT Pro Office.

To transfer quantitative information in the text of those documents, just make another copy as a plain text document using Data > Convert > Plain Text. That data can then be transferred by export, or perhaps by parsing the text (if it’s in a structure that a script could interpret and transform into tabular data) and sending it to another application.
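If the reports share a consistent layout, the parsing step could look something like the following Python sketch. It assumes the plain-text copies have been exported from DT Pro Office, that each result sits on its own line as name, value and unit (e.g. "Haemoglobin 135 g/L"), and that a "Collected:" line gives the date. Those line patterns, the function names `parse_report` and `load_rows`, and the `results` table are all hypothetical and would need adjusting to the real reports. SQLite, via Python's built-in `sqlite3` module, stands in here for whatever database you actually use.

```python
import re
import sqlite3

# Hypothetical result line: "<test name> <numeric value> <unit>",
# e.g. "Haemoglobin 135 g/L". Tune this to the layout of your own reports.
RESULT_LINE = re.compile(
    r"^(?P<test>[A-Za-z][A-Za-z .()-]*?)\s+"
    r"(?P<value>-?\d+(?:\.\d+)?)\s+(?P<unit>\S+)\s*$"
)
# Hypothetical date line, e.g. "Collected: 2012-03-14".
DATE_LINE = re.compile(r"^Collected:\s*(?P<date>\d{4}-\d{2}-\d{2})\s*$")


def parse_report(text):
    """Turn the plain text of one report into (date, test, value, unit) rows."""
    date = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        m = DATE_LINE.match(line)
        if m:
            # Remember the collection date for the results that follow.
            date = m.group("date")
            continue
        m = RESULT_LINE.match(line)
        if m:
            rows.append((date, m.group("test").strip(),
                         float(m.group("value")), m.group("unit")))
        # Any other line (headers, page numbers, comments) is ignored.
    return rows


def load_rows(conn, rows):
    """Store parsed rows in a simple results table for later analysis."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results "
        "(collected TEXT, test TEXT, value REAL, unit TEXT)"
    )
    conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    sample = "Collected: 2012-03-14\nHaemoglobin 135 g/L\nSodium 140 mmol/L\n"
    conn = sqlite3.connect(":memory:")
    load_rows(conn, parse_report(sample))
    print(conn.execute("SELECT test, value FROM results ORDER BY test").fetchall())
    # prints [('Haemoglobin', 135.0), ('Sodium', 140.0)]
```

Keeping each reading as one row (date, test, value, unit) makes the time-series analysis a matter of ordinary queries, e.g. selecting one test ordered by date. In practice you would read each exported `.txt` file and pass its contents to `parse_report`, and connect to a database file rather than `:memory:`.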

Annard and Bill

Thanks for the useful ideas. A question for anyone: when I try the convert-to-text option Bill suggested, having selected all the text ("Select All") in the scanned document stored in DT Pro Office, the items in the Convert submenu remain dimmed. Why is this happening? Thanks in advance for any feedback.

Select the Name of the document (left column in Vertical Split View, top pane in Three Panes view), then select Data > Convert > Plain Text.

Thanks, Bill. I was trying to convert an open document. :(