PDF vs PDF+Text

Much of the data I keep in DTPO consists of scanned documents converted to PDF. I’ve noticed of late that documents I thought I had converted to PDF+Text are listed in DT as just PDF. Checking on this, I find that the documents in question do, in fact, contain text that I can freely highlight and copy into a text editor, even though they are listed as PDF only (not PDF+Text) and the source was a scanned image (the June 1865 Scientific American, for example).

What is the difference between PDF and PDF+Text? I thought this was rather obvious, but now I’m not sure I know the answer. Ultimately, the real issue comes down to this: is the text in these PDF-only documents included in the DT index, making the documents searchable? Experimentation indicates “yes,” but I’m at a loss to understand what PDF+Text really means.

PDF+Text should indicate an OCR’d document or a document that originally came from a text-based source (a web page or a Word document, for example).

But errors can happen. I suggest you send Support examples of the documents that are not behaving as expected so the developers can check for anomalies.

Does exporting & reimporting the files or rebuilding the database fix this?

Yes, that appears to have resolved the issue. The rebuild, however, took most of the night, with the last 10% taking 90% of the time.

Does Devon Technologies have a recommended rebuild interval? I seem to recall a recommendation on one of the boards saying that typical users should do a rebuild every…

Anyway, all is good.

Thank you.

I did some looking around and found this:

blog.devontechnologies.com/2015/ … -database/

Helpful, but rather vague beyond the car analogy. Since my original post is filed under “Feedback, Requests, and Suggestions,” in the spirit of that heading, could future releases of DT monitor my database size, usage, elapsed time, etc. and suggest when a verify or rebuild is due? To continue with the automotive analogy, my car uses miles driven and elapsed time to compute “oil life,” which is a suggestion that an oil change is due. Users could set a simple time-based “tickler” in another app, but I think DT already has all the information it needs to know when these tasks should be done.
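
A minimal sketch of that “oil life” idea, in Python, might look like the following. Everything in it is hypothetical: the MaintenanceState record, the verify_due function, the change counter, and both thresholds are placeholders for illustration and do not reflect anything DT actually tracks or exposes.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class MaintenanceState:
    """Hypothetical bookkeeping for one database."""
    last_verify: date          # date of the last verify
    changes_since_verify: int  # items added or modified since then (assumed counter)


def verify_due(state, today, max_days=14, max_changes=5000):
    """Return True once either the elapsed-time or the usage threshold is reached.

    Both default thresholds are arbitrary placeholders, not recommendations.
    """
    days_elapsed = (today - state.last_verify).days
    return days_elapsed >= max_days or state.changes_since_verify >= max_changes
```

Like the car’s oil-life monitor, the check fires on whichever limit is hit first, so a heavily used database would be flagged sooner than an idle one.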


We don’t suggest a Rebuild often, but running a Verify & Repair weekly or bi-weekly won’t hurt anything.