Text-type files such as rich and plain text, HTML and WebArchive files are stored in a monolithic database rather than as individual files in the Finder. So corruption of the database can make those files unrecoverable.
That’s not the case with PDF, PostScript, image and QuickTime media files, which are stored as individual files within the Files folder, inside the database package file. In the case of database corruption those files will likely not be damaged or lost – unless your computer has serious directory problems, in which case all files are at risk. In the Finder you can look inside the database package file and find your PDF files, which can easily be copied to another location.
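If you'd rather script that rescue than drag files out by hand, here is a minimal Python sketch. It assumes only what's described above, namely that the PDFs sit somewhere under a folder named Files inside the database package. The function name and the paths in the usage note are hypothetical, not anything DEVONthink provides. It only reads from the package and writes copies elsewhere.

```python
import shutil
from pathlib import Path
from typing import List


def copy_pdfs_from_package(package: Path, destination: Path) -> List[Path]:
    """Copy every PDF found under <package>/Files into destination.

    Assumption: the package keeps PDFs under a "Files" folder, as described
    in the post. Nothing inside the package is modified.
    """
    files_dir = package / "Files"
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    # Walk the Files folder recursively, in a stable (sorted) order.
    for src in sorted(files_dir.rglob("*")):
        if not (src.is_file() and src.suffix.lower() == ".pdf"):
            continue
        target = destination / src.name
        # Two PDFs in different subfolders may share a name; add a numeric
        # suffix rather than overwrite an earlier copy.
        counter = 1
        while target.exists():
            target = destination / f"{src.stem}-{counter}{src.suffix}"
            counter += 1
        shutil.copy2(src, target)  # copy2 also preserves timestamps
        copied.append(target)
    return copied
```

Usage would look something like copy_pdfs_from_package(Path("/path/to/MyDatabase.dtBase"), Path("/path/to/Rescued PDFs")), where both paths are placeholders for your own locations.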
The version 2.0 database structure will be substantially revised, and all files will be stored as individual files in the Finder, the way PDF files are now. And they will be visible to Spotlight.
Yes, you can do everything with Index-captured files that you can do with Import-captured files. But there really are logical pitfalls that can “bite” in the case of Finder and database reorganization of Index-captured files.
I’m currently managing more than 150,000 documents among a number of topically designed DT Pro databases. If I were to try to merge those into a single database it wouldn’t fit on my MacBook Pro with its 100 GB hard drive (my Power Mac has 1.5 terabytes of online storage). Moreover, it would be slow and unresponsive, as it would force continual use of Virtual Memory. I’m spoiled. I like most search queries to take less than 100 milliseconds.
I find that using topical databases works well. There are very few cases where I feel the need to duplicate any material in more than one database. And that rare need will go away in version 2.0. My main database provides a wide-ranging and comprehensive set of reference materials and notes (about 23,000) for my professional interests in environmental science, technology and policy matters. There’s no need to mix in my financial database, which contains lots of detail about my financial accounts, taxes, etc. My email archive, with about 25,000 messages, constitutes still another database, and so on.
I try to keep my databases to a maximum size of no more than about 24 million total words, so that they are quick and responsive on my MacBook Pro with 2 GB RAM. (My Power Mac G5 dual core has 5 GB RAM, so can handle much larger databases without needing to use Virtual Memory.)
Another important advantage of topically designed databases is that the artificial intelligence features become more focussed and effective (and fast). That really helps with literature research.
My next Mac laptop will have 4 GB RAM, so I’ll be able to handle multiple open databases – still with fast performance – when that becomes possible.
Example: I have another environmentally-related database that is about the same size as my main database, but deals with the details of chemical analytical methodologies, statistical data evaluation procedures, sampling design procedures and similar technical literature. The AI features work much better (for both databases) with the split of this material from the main database. But once in a while I do find it useful to switch from a question raised in my main database to the technical procedures involved, contained in the auxiliary database.