Maximum Database Size

Carts · December 21, 2008, 2:14am

Back in 2005 and then again in 2006 we discussed maximum database sizes and the value of indexing (over importing) for limiting RAM use.

Bill submitted some detailed and very helpful advice.

Does 2.0 resolve this issue and remove any performance concerns when importing, rather than indexing, a large number of files?

CatOne · December 22, 2008, 3:06am

A guy just commented he was having issues moving his 45 GB database from DTP 1.5 to 2.0… it was taking a while.

How large are we talking? I’d suspect the 2.0 limits are a fair bit higher than 1.5, because of the way HTML, RTF, and Web Archives are handled, but if you’re worried about a 10 GB database and the practical limit is 500 GB, this whole thing is moot.

Carts · December 22, 2008, 1:56pm

It sounds as though the point is moot. The files for import total 23 GB. This was enough to noticeably affect computer performance using v1.5, and so I opted for indexing. Based on your remarks, I’m assuming that 23 GB should not be an issue using 2.0.

Thanks CatOne

Bill_DeVille · December 22, 2008, 7:57pm

I’m sure I would find a 500 GB database on my laptop with 4 GB RAM to be clumsy and slow, probably very unsatisfactory. But as I only have a 200 GB hard drive, it can’t happen anyway.

Although the memory footprints of databases have been reduced in DEVONthink 2, there are still RAM-dependent and other practical considerations for designing/sizing databases.

I think the memory difference in DEVONthink 2 between Imported and Indexed databases is less significant than it was in DEVONthink 1 – especially for RTFD and WebArchive content that contains a lot of images, which (when Import-captured) had to be stored in the monolithic database structure of DEVONthink 1 – those images took memory space at loading. Those images were also incorporated in the Backup folders in DEVONthink 1, so disk storage space of internal backups in DEVONthink 1 had to be larger than for the same database content in DEVONthink 2.

The principal advantage for me of Import-captured databases is that they are highly portable. I can move such a database from one computer to another, or run it off an external drive.

A large fraction of my main database is devoted to RTFD files. In DEVONthink 1, they required more memory to open the database than corresponding content in PDF format. When I was running Macs with 1 or 2 GB RAM, I found that database sizes above 24 to 30 MB of total words were about the upper limit for my databases if I wanted to retain quick performance. Above that, especially if I made a lot of use of searches, See Also and Classify, I would run out of free RAM and move into heavy usage of Virtual Memory, resulting in slowdowns and occasional pauses during operations. I hate that.

I’ve always created topically designed DT Pro/DT Pro Office databases, initially to keep database sizes under control. Now, under DEVONthink 2, in principle I could merge some of those databases. I suppose I would be satisfied with database performance at a size of 50 MB, or perhaps 100 MB total words. I haven’t experimented with a maximum size that I would still find acceptable on my laptop with 4 GB RAM.

But there are other reasons that persuade me to keep topically designed databases, and perhaps even to split some of my current databases.

For example, I found that the focus of searching and See Also operations in some of my reference collections was much improved by managing those references in separate databases. Example: my main database deals with environmental science, regulatory and policy issues. I also have a large collection of references dealing with environmental sampling, chemical analytical procedures, data evaluation and quality assurance protocols.

But when analytical procedures, for example, were in the same database as the policy and regulatory issues, a search or See Also in which I wanted to focus on health effects of a contaminant, such as mercury in fish, also pulled a large number of references to analytical procedures. And vice versa. My information becomes more useful when I split such topically different material into separate databases. Once in a while, I may need to search across both databases. Now, with DEVONthink 2, that’s also possible. I can assemble the information scope of topical databases like Lego blocks, by having multiple open databases.

I’m exhilarated by things that See Also sometimes shows me in my main database, such as the fundamental quantitative relationships between chemical reaction equilibria and the influence of invasive species on population dynamics in an ecosystem. My main database is still “rich” enough to do that for me.