The revised database structure planned for a future release will somewhat reduce the memory footprint.
But for those with large collections of documents, I’ll probably continue to recommend topical databases, both to keep the databases speedy within the RAM constraints of a given computer and to assist the AI features by keeping less topically relevant materials out of each database.
In my case I’ve got a separate database into which I’ve archived my collections of email from Entourage and Mail. It holds almost as many messages as yours, many with attachments. I don’t often need to access the older messages, although when I do, I really appreciate the improved searching/filtering features in my database.
Some of the more recent messages in that collection, however, are relevant to current projects. So I identify those for export to the appropriate database.
The needs of other users might lead to different decisions. For me, the email archive is a secondary resource. For others, it might be a primary reference resource to be incorporated into their default database.
As to scanning/OCR with DTPO, I’m approaching 2,000 pages scanned through my ScanSnap to DTPO databases.
Most of those scanned documents don’t fit the topical coverage of my main database. The majority fit into a special topical database covering administrative and medical policy and procedure documents for a health care facility. Most of the remainder fit into a database of my financial records.
I’ve probably got well over 100,000 documents in my various DT databases. If I were to consolidate those documents into a single database, the performance I’m used to for searches and AI functions would take a big hit. I don’t have a computer with enough physical RAM to avoid heavy use of Virtual Memory, which is disk-based and slows down many processes. My MacBook Pro has 2 GB RAM, and my Power Mac G5 has 5 GB RAM. I try to hold my individual topical databases to a size that’s comfortable and fast on my MacBook Pro, roughly 20 to 25 thousand documents and a total word count up to about 25 million words. (A Mac Pro fully populated with RAM could handle my document collection in a single database, but I would have trouble running it on my notebook computer, and I favor portable databases.)
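As a rough illustration of that sizing rule of thumb, here is a minimal Python sketch that checks whether a database stays within a document and word budget, and what happens when databases are merged. The names (`DatabaseStats`, `fits_budget`, `merged`) and the sample counts are hypothetical, the limits are simply the upper figures quoted above, and nothing here is a DEVONthink API or setting.

```python
from dataclasses import dataclass

# Assumed budgets: the upper figures from the rule of thumb above,
# not values read from or enforced by DEVONthink.
MAX_DOCUMENTS = 25_000
MAX_WORDS = 25_000_000


@dataclass
class DatabaseStats:
    """Hypothetical summary of one topical database."""
    name: str
    documents: int
    words: int


def fits_budget(db, max_documents=MAX_DOCUMENTS, max_words=MAX_WORDS):
    """Return True if the database is within both the document and word limits."""
    return db.documents <= max_documents and db.words <= max_words


def merged(name, databases):
    """Return the combined stats that would result from consolidating databases."""
    return DatabaseStats(
        name=name,
        documents=sum(db.documents for db in databases),
        words=sum(db.words for db in databases),
    )


# Illustrative numbers only; these are not real counts from any database.
main = DatabaseStats("main", documents=22_000, words=21_000_000)
email = DatabaseStats("email archive", documents=18_000, words=9_000_000)
combined = merged("everything", [main, email])
```

With these made-up figures, each database fits the budget on its own, while the consolidated one exceeds the document limit, which is the situation that argues for keeping them separate.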
Perhaps more importantly, I make a lot of use of searches and AI features when I’m researching material. I’m interested in the history of science, which fits into my main database. But I’ve also got a very large collection of materials about the Apple Newton. Documents by or about Isaac Newton are in my main database, and when I search for “Newton” they are what I want to find. In that case, I wouldn’t want to see thousands of hits for the Newton PDA. Nor do I want to confuse the ‘See Also’ operation by mixing those materials. I think it makes good sense to separate such materials topically into different databases.
By carving out my collection of documents into topical databases that fit comfortably into the capabilities of my MacBook Pro I can switch between them fairly quickly. The occasional documents that fit in more than one database can easily be exported from their current database into one or more additional databases.
Even when, in a future release, memory requirements are somewhat lessened and it becomes possible to have multiple databases open concurrently, there will still be ultimate resource restrictions, especially of physical RAM, on a user’s computer. At some point cross-database searches will be possible, so that a search result can be opened in a different database. But as a practical reality, smaller databases will always open more quickly than larger databases.
I’m spoiled. Many of the searches on my databases can be completed in a few milliseconds. And when I’m running a series of See Also trails, or using Classify on a series of documents, I want real-time interactivity. That wouldn’t be possible (and the results would have far less ‘focus’) on my MacBook Pro if I tried to run a single database compiled from all my existing databases.
Perhaps there’s a sense in which everything is related to everything else. But as a practical matter I have little trouble creating topical databases in which the relationships among the items in each one are infinitely richer than their relationships to the contents of my other databases.
Back in my days as a professional graduate student I picked up more than a hundred hours in philosophy and logic. Let’s say that I wanted to manage philosophical books and papers in a way that would assist me in analyzing them. Would it make a lot of sense, for example, to include in a single database everything dealing with Aristotle, Plato, Aquinas, Kant, Sartre, Hegel, James, Carnap, Ayer, Hume, Locke, Whitehead, Russell, Popper and so on? Not, IMHO, if one hopes to make much sense of the basic differences and approaches. Similar terminology, for example, doesn’t mean similar concepts. Nor, for that matter, do similar concepts mean similar terminology.
If I were a graduate student in philosophy these days I suspect that I would find DT Pro very useful. But I’m pretty sure I would have a number of different databases covering my studies and research, in order to make the most effective use of DT Pro.