Memory Leaks

I’m a newbie and have DT Pro. So far the program uses a TON of memory, and it gets worse the longer I have the program active. I’ve only used it to set up two databases which indexed subsections of my Documents directory. When DT Pro first opens, it is using 400 MB of memory, but after about an hour it is using over 1 GB of RAM while sitting idle. Is there something I can do about this?

Is that with DT Pro 2.0.7, which version of OS X are you running, and how much total RAM is on your system?

Yes, it is DT Pro 2.0.7. I’m running OS X 10.6.6 with 8 GB RAM. Yes, a gig of RAM won’t bring my system down to a crawl, but that is still way too memory intensive compared to some other programs I use.

What are your database sizes? Maybe post the Statistics for them from File > Database Properties…, e.g.:

[Screenshot: DTPO stats.png]

I’ve attached a screenshot of statistics for my two databases. When I created them, I only indexed the files since I’m not yet ready to reorganize some of my files. What was most important to me, for now, was to have a more effective tool for searching through them than what Spotlight offers.

Even though I’m just indexing them for now, should I have a greater number of databases, each indexing a portion of what I have already indexed?

Thanks for posting those Statistics images. Doesn’t surprise me DT will (eventually) consume a lot of RAM with 6.3 GB and 22 GB databases open. :slight_smile:

Maybe it seems idle but is actually doing some normal internal/background processing during that time, which would account for the increased RAM usage? I’ll let you know if I can reproduce similar behavior here, though with smaller databases and no indexed documents. Edit: Nope; slightly less real memory usage after leaving a freshly launched DTPO process sitting idle for a couple of hours. I don’t know if interacting with it for a while before leaving it idle would make a difference.

I’m not sure; hopefully someone else posts a definite answer.

I checked the file sizes of each of those databases. The one that indexed 22 GB worth of stuff is 859 MB, and the other (which indexed 6 GB) is 400 MB. For file sizes like that to cause a program to use THAT MUCH RAM doesn’t yet make sense to me. If I had imported files into those databases, it would be more understandable (but still excessive).

I like what I’ve seen so far from this software but I would like to have it eat up much less ram.

Thanks for all of your insights sjk!

Reminds me of why web browsers suck so much memory to access “indexed” content. Caching?

Thanks for piquing my interest in this topic.

The two most important size numbers in Database Properties are the number of documents in a database and (especially) the total number of words. The file size (disk storage space) is relatively unimportant as concerns performance.

A DEVONthink database loads into memory a number of files about the documents it holds. Think about what two of my favorite features in DEVONthink do.

The Classify assistant examines the contextual relationships in the content of a new document, compares those to the common contextual relationships of each of the groups in your database, and suggests one or more possible groups into which that new document will fit.

The See Also assistant examines the contextual relationships of the words in a document you are viewing, compares them to all the other documents in the database, and suggests other documents that may be similar. I use it frequently in one of my databases that holds more than 30,000 documents.

Those artificial intelligence features are built into the core of the database. Although my brain can manage much more data than any Mac, my brain wasn’t constructed to do what See Also can do, to quickly see similarities among word patterns in tens of thousands of documents.
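To make the idea concrete, here’s a toy sketch (plain Python, and definitely not DEVONthink’s actual algorithm) of what comparing word patterns across documents looks like in principle: build a word-frequency vector per document and rank candidates by cosine similarity. The document names and texts are made up for illustration.

```python
import math
from collections import Counter

def word_vector(text: str) -> Counter:
    """Bag-of-words frequency vector for a chunk of text."""
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-frequency vectors (0..1)."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical documents, just for illustration.
docs = {
    "note1": "wetland regulation under the clean water act",
    "note2": "clean water act permitting for wetland mitigation",
    "note3": "battery chemistry for grid scale storage",
}
query = word_vector("wetland permitting and water quality regulation")
for name, text in docs.items():
    print(name, round(cosine_similarity(query, word_vector(text)), 3))
```

The real AI in DEVONthink is far more sophisticated, but the sketch shows why holding that word data in RAM matters: every comparison touches the word statistics of many documents at once.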

All told, I’ve got a number of DEVONthink databases that hold in the aggregate more than 250,000 documents.

My databases are topically designed to meet specific interests and needs. I don’t put all the files on my computers into databases. Nor do I put all my databases on my laptop, as they wouldn’t fit. Indeed, I rarely open some of them, but I always have open the 2 or 3 databases that I use every day. I can assemble databases for a particular purpose, like ‘Lego blocks’ of information content.

I work on two Macs, a quad-core iMac with 8 GB RAM and (currently) a MacBook Air with 4 GB RAM. I find that I’m usually working with the Air.

I’ve found that databases of up to about 40,000,000 total words run comfortably in free RAM on a machine with 4 GB of RAM, so they are very responsive. I like most of my single-word queries to take 50 milliseconds or less, and See Also suggestions to pop up quickly. I never want to see a spinning ball.

Forty million total words is a LOT of information. I don’t find it difficult to construct topically-designed databases within that size limitation. My primary database for writing and research reflects my professional interests in environmental science and technology, law and policy issues. It contains scientific and engineering literature covering a number of disciplines, as well as references concerning environmental laws and regulations (U.S. and EU) and policy issues. It holds about 30,000 references (from abstracts to books) and about 5,000 of my notes. I’ve lovingly built it for a number of years, both adding to it and pruning outdated or less useful items (which may be moved to an archive database of older references).

I hammer my databases pretty hard, with frequent use of searches and the Classify and See Also assistants. I would probably find your smaller database of about 78 million total words becoming less responsive than I like, not to mention the larger one of about 140 million total words, even on my iMac with 8 GB RAM. My tendency would be to split them.

I prefer Import-captured databases, as it’s easier to migrate them among my computers because they are self-contained (and are fully backed up in database archives). By the way, there’s essentially no difference in memory requirements for Imported or Indexed databases. Version 2.0.9 allows one to convert an Indexed database to an Imported database, or vice versa.

Note that currently DEVONthink remains a 32-bit application (primarily for compatibility reasons), so that the maximum addressable memory space is 4 GB. One of these days, a 64-bit version will be introduced, so that as more RAM becomes cheaper, larger and larger databases will remain responsive during use.

I don’t find that the increase in RAM usage over time with DT Pro Office and my ordinary set of databases is a significant problem. As others have commented, it’s my Web browsers that keep grabbing more and more free RAM during use.

I just happened upon your posting. I just want to thank you for taking the time to write it. It is very helpful to understand the practical limits and good strategies for large database management. Again, thanks. Konrad

I am replying to a very helpful comment on using multiple databases in light of possible memory leaks. I may have a memory issue as I am struggling to import large numbers of emails (many with attachments) from Outlook 2011 into DT. But my immediate question is around the use of multiple databases. Is there a way to “split” an existing database or do I need to create the new ones from scratch? I tried creating one and then dragging a group from the old/larger DB to the new/smaller DB but only 209 of 1,863 items actually transferred.

Is there some sort of command for splitting out a group or groups into a new database?

Replicants in one database cannot be dragged to another. It’s possible that’s why some items didn’t transfer. Over here, I’ve always found the brute-force method the most reliable way of splitting databases.

Close the database you want to split. In Finder, make two copies (with new names). In DEVONthink, open the copy. Delete the records/groups/tags, etc., that you don’t want in the copy. Empty the trash.

Reverse the procedure with the other copy.

When you’re satisfied that the two new databases are correct, you can delete or archive the old original database (making sure it is closed, first).

This is a one-time effort. It might be time-consuming, but it does a better job than any scripts or automation shortcuts.
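If you’d rather script the copying step than do it in the Finder, here’s a minimal sketch under the assumption that your database is a standard .dtBase2 package; the path and names are hypothetical. Make sure the database is closed in DEVONthink before copying, then do the deleting inside DEVONthink as described above.

```python
import shutil
from pathlib import Path

# Hypothetical path; a DEVONthink Pro database is a .dtBase2 package (a folder),
# so copytree duplicates it. Close the database in DEVONthink before copying.
original = Path("~/Databases/Research.dtBase2").expanduser()

for new_name in ("Research-PartA.dtBase2", "Research-PartB.dtBase2"):
    target = original.with_name(new_name)
    if target.exists():
        print(f"Skipping {target.name}: already exists")
    else:
        shutil.copytree(original, target)
        print(f"Copied {original.name} -> {target.name}")
```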

I’m in 100% agreement with this process to split a database. A search on the forums here will turn up alternative processes, but korm’s method is the way to go.

I am a new user and found this thread because I have the same problem as the OP with DTPO version 2.5.1 on a 2010 MBP with 10.8.3 and 8 GB of RAM.

I left the machine on overnight and this morning this was the situation:

I have six databases totalling 8.4 GB in size. The two largest have 6.8 million and 600,000 words.

Is there a memory overhead for having separate databases, i.e., would it use less memory if I combined some or all of the databases into one?

I really love DT but something seems wrong here.

Thanks for any input

@mikebore: There’s nothing obviously wrong with that picture. These days Unix kernels are pretty aggressive about their use of inactive memory. Such pages could be previously used pages kept around in case they’re needed again, or they could be additional pages read in during idle time in an attempt to optimize for future reads, or… well, there are lots of scenarios. Seeing little or no free memory isn’t a problem. The kernel will happily abandon the old data on inactive pages for use by current processes whenever needed.

Indicators of a significant problem would be either seeing significant amounts of paging activity (your screen shot shows none) or seeing the sum of inactive (blue) and free (green) memory getting low.
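If you want to watch those numbers outside Activity Monitor, a rough sketch like the following reads them from vm_stat on OS X. The exact label text can vary a little between versions, so treat the parsing as an assumption to adjust rather than a guaranteed recipe.

```python
import re
import subprocess

out = subprocess.run(["vm_stat"], capture_output=True, text=True).stdout
page_size = int(re.search(r"page size of (\d+) bytes", out).group(1))

def stat(label: str) -> int:
    """Pull a count from a vm_stat line like 'Pages free:  12345.'"""
    m = re.search(rf"{re.escape(label)}:\s+(\d+)", out)
    return int(m.group(1)) if m else 0

free_mb = stat("Pages free") * page_size / 1024 ** 2
inactive_mb = stat("Pages inactive") * page_size / 1024 ** 2
print(f"Free: {free_mb:.0f} MB  Inactive: {inactive_mb:.0f} MB  Pageouts: {stat('Pageouts')}")
```

Free plus inactive is the figure worth watching; the cumulative Pageouts count tells you whether the machine has actually been forced to swap.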

Xenophon

@mikebore: Your screenshot shows little remaining free memory, a lot of pageouts and considerable usage of Virtual Memory swap files. I would predict slow performance of DEVONthink in those conditions.

The title of this thread, “Memory Leaks”, is misleading. That’s not the problem.

While Apple’s memory management is pretty good, my guess is that the large amount of inactive memory shown in your screenshot isn’t available to DEVONthink. Sometimes inactive memory accretes as “crud” that isn’t released back for use as free RAM. There are utilities available that purge inactive memory, freeing up RAM. I use Cocktail, which has such a procedure to purge inactive memory and optimize memory, and it is set to do that automatically every hour.

I’ve attached a screenshot of Activity Monitor on my MacBook Pro (Retina) with 16 GB RAM. The set of databases open in DEVONthink Pro Office total about 50 million words of content (probably the most important size measurement relative to performance). There’s a LOT of free RAM available, 11.93 GB and 0 pageouts when the screenshot was taken – I never see a spinning ball on this computer. Notice that at the moment the screenshot was taken, DEVONthink Pro Office ranked only 7th in Real Memory usage.

Now that DEVONthink runs in 64-bit mode on recent Macs, more than 4 GB memory can be addressed. As a practical matter, larger databases can be used, especially if free RAM is available.

But if available RAM is limited, splitting databases can help keep performance fast. When I used Macs with 4 GB maximum RAM my rule of thumb was to keep the total word count of open DEVONthink databases below 40 million total words.

Bill, thanks very much for the response.

I wonder if you scrolled my screenshot across to the right-hand side, which shows that DT had taken 4.55 GB of real memory?

Today I have been doing some more testing and see that after quitting and relaunching DT, the real memory used by DT drops to about 250 MB and then rises steadily during the day. After six hours it was back up to 2.5 GB of real memory.

What I have further noticed is that the memory leak is directly associated with the sync process. If I manually sync all, a chunk of real memory is grabbed and not returned. There is no memory increase between syncs. I have the syncs set to occur every hour and I think this is the reason for the memory build up.

I have currently turned syncing off completely to check what happens to memory. So far it has not changed in three hours.
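One way to pin this down is to log DEVONthink’s resident memory at intervals and line the log up against the hourly syncs. Here’s a rough sketch using ps; the process-name match is an assumption, so adjust it to whatever your DEVONthink process is actually called.

```python
import subprocess
import time
from datetime import datetime

def devonthink_rss_mb() -> float:
    """Sum resident set size (ps reports rss in KB) for matching processes."""
    out = subprocess.run(["ps", "axo", "rss,comm"], capture_output=True, text=True).stdout
    total_kb = sum(
        int(line.split(None, 1)[0])
        for line in out.splitlines()[1:]
        if "DEVONthink" in line  # assumption: adjust to your process name
    )
    return total_kb / 1024

while True:
    print(f"{datetime.now():%H:%M}  DEVONthink real memory: {devonthink_rss_mb():.0f} MB")
    time.sleep(600)  # sample every 10 minutes
```

A step in the log right after each scheduled sync, with flat stretches in between, would support the idea that sync is where the memory is being grabbed.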

This effect is much more pronounced on one of my Macs than the other, with the same databases (16 million words).

I have not seen any spinning beachballs on either machine.

Thanks for the reply, Xenophon.

As I said to Bill, I wonder if you scrolled my screenshot across to show that DT had taken 4.55 GB of real memory?

Also, I agree that at the moment the screenshot was captured the pageouts were 0 bytes/sec, but the total pageouts since the last boot were over 8 GB and more than the pageins. This has never happened on this machine before. The last boot was less than a day ago.

I agree that low free memory is not a concern per se, since the inactive memory is still available, but are you saying it is normal for DT to grab more and more real memory like it appears to be doing?

Interesting. I haven’t done a Sync in some time (as I’m the sole user of my databases and work on a laptop, I really don’t need to sync), and have since restarted the computer.

I often hammer my databases hard, doing searches, using the AI assistants, adding content and opening and closing windows. There can be significant variations in memory use, which isn’t surprising when you consider what DEVONthink is doing.

In Apple’s scheme of memory management, data that has been recently called into RAM, and especially data that is frequently called, “sticks” longer as inactive memory. The memory purging routine that I use will tend over time to clear data from RAM that hasn’t been called for recently. (That’s an oversimplification, but gives the picture.) And of course a computer Restart clears RAM.

Next time I test Sync I’ll keep an eye on memory use.

That’s essentially what running the purge command will do.
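For reference, a minimal sketch of invoking purge from a script instead of Cocktail’s scheduler; whether purge is installed (it ships with the Xcode command line tools) and whether it needs sudo depends on your OS version and setup, so treat this as an assumption about your machine.

```python
import subprocess

try:
    # purge drops inactive/cached file memory; on newer systems it may require sudo.
    subprocess.run(["purge"], check=True)
    print("Inactive/cached memory purged.")
except FileNotFoundError:
    print("purge not found; install the Xcode command line tools.")
except subprocess.CalledProcessError as err:
    print(f"purge failed: {err}")
```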