What is the best document format for storing a high volume of documents in a single database? The loss of formatting with plain text files is no good for me, so the remaining options are HTML, PDF and RTF. Which of those formats is best for managing the size and functionality of the database?


PDF is recommended if you’re not going to edit the documents and if the original layout is important. Otherwise RTF is the way to go.

Why is PDF better than HTML?

Images in an HTML capture can disappear if the capture links to a Web page that is later changed or deleted from the Web. So HTML is evanescent. An HTML capture of a Web page, e.g. a New York Times article, will almost certainly lose its image references within a few weeks. Of course, if you need to recover access to the images, the NY Times will happily accept a fee to allow that.
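
If you want to see which images in a saved HTML capture are at risk, something like the following rough Python sketch could help. It is only an illustration, not anything built into DT: the names `ImageLinkFinder` and `remote_images` are made up for this example, and it assumes that any image whose `src` points at a remote server (rather than a local file) is one that could vanish when the original page changes.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImageLinkFinder(HTMLParser):
    """Collects the src attribute of every <img> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def remote_images(html_text):
    """Return the image references that point at a remote server.

    Assumption for this sketch: http(s) and protocol-relative ("//...")
    URLs are remote; everything else is treated as a local file.
    """
    finder = ImageLinkFinder()
    finder.feed(html_text)
    return [
        src for src in finder.sources
        if urlparse(src).scheme in ("http", "https") or src.startswith("//")
    ]


# Hypothetical capture with one remote image and one local image.
sample = (
    '<p>Article text</p>'
    '<img src="https://example.com/photo.jpg">'
    '<img src="images/local-chart.png">'
)
print(remote_images(sample))  # ['https://example.com/photo.jpg']
```

Any image it lists lives on someone else's server, so the capture only shows it for as long as that server keeps it available.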

I do 99% of my captures from the Web as rich text that includes only the desired text and images. Most of the journals I visit, and most of the news sources, include a lot of extraneous, totally unrelated material – ads, summaries of other articles, and the like – that I don’t want to capture, and that could otherwise show up in searches or on See Also lists.

PDFs retain a structured layout, retain images, and usually have a memory advantage compared to WebArchive files. Also, one can easily send PDFs out as attachments regardless of the platform the recipient uses. Under Leopard, “printing” a Web page, Word document or what have you to PDF now retains any working hyperlinks from the source document.

I do most of my drafting in rich text inside DT Pro Office. I’ll do final polishing in a competent word processor, usually Papyrus 12 or Pages. Final output is as PDF, which is viewable on all platforms.

Thanks for the explanation. I use DT for law school and the student legal aid clinic. I grab any case law that seems like it might be useful, so I store a very large number of reference cases in my databases. DT is involved in almost everything I do at school these days. I’ve only been using DT for about six months, but now I can’t imagine what I did before I had it, especially with the ScanSnap scanner. So thanks for all the hard work!