Maximum Database Size

I’m new to the program (purchased yesterday), and before restructuring my personal processes I wondered: what is the maximum recommended database size with respect to performance, and is there in fact a hard maximum?

The answer will really help determine just how much I rely on the program.

Thanks very much :exclamation:

I’m sure it’s in the gigabytes.

Thanks - gigabytes are good; 4-5 GB would do the job(s). I appreciate the reply.

Carts

One user tells me he has dumped 24 GB of images into his DT Pro database.

It takes about 90 seconds to initialize and load, but runs OK.

As of this morning, my DT Pro database contains the following:

286,598 unique words, 29,714,455 total words, made up as follows:

1,857 groups
569 HTML Pages
1 XML file
130 images
171 Web Archives
5,236 Links
25,085 Rich Texts

I am about one third of the way through capturing data for a knowledge base I’m creating, so I fully expect this database to end up with 70-80 million words, maybe more.

So far, no problems. It runs fine. Searches are usually rapid, apart from phrase searches, which I have always found to be a complete waste of time: they take forever, and I have better things to do than watch a spinning beachball for minutes at a time.

I do make sure I back up several times a day, just to be safe. That takes time, so I tend to do it when I go to have a meal or am going out.

I don’t ever see this mentioned in other posts, but I have found that following lots of links pushes up the size of the file astronomically. I can browse, say, 100 links, and it will often add 3-4 GB to the file size. That is taken care of as soon as I “back up and optimise”, but it doesn’t half fill the disk rapidly if you don’t!

I would like to see some kind of preference that limits cache size, since I think this is way over the top. Has anyone else had this issue?

Rollo

You guys are outstanding - thanks to all for the excellent information. I’ll begin the data dump this weekend. Interesting comments re web surfing and cache size - I’ll watch for it.

Thanks again to all, and a good weekend to you,
Carts

Carts:

A couple of suggestions, as it sounds like you have a great many files that you plan to import into DT.

[1] There’s always a possibility that one or more of your existing files may be corrupt. If you batch-load everything at once into DT or DT Pro and then run Verify & Repair, it may tell you that there’s an unfixable error in the database. The best course then would be a database rebuild, which will probably no longer include the corrupted file(s).

It may be best to do File > Import > Files and Folders one folder at a time, then run Verify & Repair after each import. This will either give you confidence or alert you that DT/DT Pro is having trouble with some of your files.
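If you want a quick sanity pass over a folder before importing it, something like the sketch below can flag files that are obviously damaged. This is a hypothetical helper script, not part of DT/DT Pro, and it only catches gross problems (zero-byte or unreadable files); Verify & Repair remains the real test.

```python
# Hypothetical pre-import sanity pass (not part of DT/DT Pro): flag
# files that are zero bytes or cannot be read at all. This only catches
# gross damage; Verify & Repair inside DT/DT Pro is the real test.
import os
import sys

def screen_folder(root: str) -> None:
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getsize(path) == 0:
                    print(f"EMPTY:      {path}")
                    continue
                with open(path, "rb") as f:
                    f.read(4096)  # can the first 4 KB be read?
            except OSError as exc:
                print(f"UNREADABLE: {path} ({exc})")

if __name__ == "__main__":
    screen_folder(sys.argv[1] if len(sys.argv) > 1 else ".")
```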

[2] Unsupported files: DT/DT Pro can recognize a great number of file types, which are listed in the Help documentation. If unsupported file types are encountered, DT/DT Pro will report that in a log file, visible if you select Tools > Log.

Another problem: DT/DT Pro sees unsupported package files (such as NoteTaker, NoteBook, Keynote, Pages, etc.) as folders rather than files. Often, this means that the database doesn’t really capture the text content for searching and analysis.

So it may be best to sort the Finder view of a folder by Kind, and select for import those file types that DT/DT Pro supports. Then you can experiment with importing your unsupported file types.
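If sorting by Kind in the Finder gets tedious, a small script can do a similar triage. Here’s a minimal sketch; the SUPPORTED set below is only a placeholder, since the authoritative list of supported types is in the Help documentation.

```python
# Bucket a folder's files by extension so supported types can be
# imported first. SUPPORTED is a placeholder set -- the real list of
# supported types is in the DT/DT Pro Help documentation.
import os
import sys
from collections import defaultdict

SUPPORTED = {".txt", ".rtf", ".html", ".pdf", ".jpg", ".png"}  # placeholder

def bucket_by_extension(root: str) -> dict:
    buckets = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            buckets[ext].append(os.path.join(dirpath, name))
    return buckets

if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for ext, paths in sorted(bucket_by_extension(root).items()):
        status = "import now" if ext in SUPPORTED else "experiment later"
        print(f"{ext or '(no extension)'}: {len(paths)} file(s) -> {status}")
```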

If some of those unsupported file types are important to you, and a normal Files and Folders or Index import doesn’t work well, post a request about importing specific unsupported file types and we may be able to suggest a workaround.

Thanks, Bill - all is noted. I’ll take the cautious approach and do this in steps. The export from my old Windows database was done using supported file types, but I will be cautious with folders of random files.

Thanks again,
Carts

Hello, I used to be active on the DT forums and beta forums, but have been busy lately. So pardon me if this question has already been asked and answered - I didn’t come across a definitive answer in my searches.

I use DEVONthink primarily to archive PDFs of scientific and medical journal articles, and have noticed that “See also” and “Classify” have become almost unusably slow, especially the first time they are used in a session. This is a pity, because these are the features I use most to “show off” the power of DEVONthink.

I’m wondering whether I’ve hit a database size/performance limit. My specs:

Database:

  • ~311,000 unique words, ~10M total words
  • ~2,500 PDF files, ~350 HTML files, ~370 RTF files
    (this is after removing literary text files: I used to have ~350k unique words and ~13M words)

Computer:

  • 1.5 GHz PowerBook G4 running Mac OS X 10.3.9, 1 GB RAM

Thanks in advance for any insight.

Hi, if you check out my previous posting (see above), you’ll see my database is already much larger than yours. As of today its specs are as follows:

302,543 unique words, 32,696,590 total words
2,250 groups
221 Web Archives
6,447 Links
28,337 Rich Texts
569 HTML Pages

Based on this, I don’t think you’ll be anywhere close to DT Pro’s capacity. I am fully expecting to double the current size of my database.

Rollo

Hi Rollo – good to recognize a familiar “face” from way back when. Thanks for the reply.

My question wasn’t whether I was near a database capacity limit, but whether I was seeing functional consequences of having a large database. That is, I recall that you have always had difficulty with “all words” searches, but you haven’t commented on the “See also” and “Classify” functions.

I have no problem with my database if all I’m doing is browsing it or adding new PDFs. It’s not even too bad with word searches. However, the more “intelligent” functions of finding similar documents or suggesting where to categorize them have become slow.

Any comments on that, from anyone?

Hi … good to see you here as well. I have never had problems with “all words”; it’s really quick these days. My problem is with phrases, which take forever. I don’t know whether that is a function of database size, since phrase search has always been close to impossible to use.

I have to say that “See also” is rather too slow for my liking too, but I live with it by not using it much. And I never use “Classify”, because I have created a filing system within DT Pro that suits my needs, and I usually have a good idea where to find what I want without ever using it.

Rollo

Hi all,

I’m new to DT Pro and this forum. Please excuse me if my meaning isn’t clear - I’m from Germany.

So, here goes.

In one of my databases I’m storing over 500 e-books. It comes to about 30 million words in all. It takes some time to start DT, but then it runs as fast as it did in the beginning.

But if I try to store pictures, PDFs, or anything else that is not a text file, DT’s performance goes down.

I think DT is a great tool for working with TXT and RTF, but for photos and the like, you should consider a multimedia database like Cumulus instead.

Cheers

Thyrfing:

There are several variables that can affect the perceived or actual performance of a database. The most important are the Preferences settings for how certain file types are imported and stored.

Let’s talk about images, for example. I’m going to ignore mere external linking (File > Link To) in this discussion. (The discussion also applies to PDF files.)

[1] Preferences > Images > Link to originals. Example: Importing a JPEG image 6456 KB in size results in a new database file that’s 36 KB in size. The full image file is stored in the Finder, external to the database package.

[2] Preferences > Images > Copy files to database folder. Example: Importing a JPEG image 6456 KB in size results in a new database file that’s 36 KB in size. The full image file is copied into the database Files folder, and also remains stored in the Finder, external to the database package.

[3] Preferences > Images > Copy file into database. Example: Importing a JPEG image 6456 KB in size results in a new database file that’s 6456 KB in size. The full image file is copied into the database ‘body’, and also remains stored in the Finder, external to the database package.

Now if I import 100 images of that size into my database, I should expect a significant difference in the package size of my database, depending on whether I’ve used import method [1] above (minimal size growth) or import methods [2] or [3] above (a significant package-size increase). That’s pretty obvious. Let’s say my database size (not package size) is roughly 5 GB with option [3] and less than 1 GB with options [1] or [2].
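To make that arithmetic concrete, here is a back-of-the-envelope sketch using the example sizes above (the 36 KB database record versus the 6456 KB image file). The per-option split of body versus package growth is my reading of options [1]-[3], and the numbers are rough approximations, per the caveats below.

```python
# Back-of-the-envelope growth estimates for importing N images, using
# the example sizes from this post: a 36 KB database record vs. the
# full 6456 KB JPEG. Rough approximations only (see the caveats below).
N = 100            # images imported
STUB_KB = 36       # record created in the database body
IMAGE_KB = 6456    # full image file

options = {
    "[1] Link to originals":       (STUB_KB, STUB_KB),             # file stays external
    "[2] Copy to database folder": (STUB_KB, STUB_KB + IMAGE_KB),  # file copied into the package
    "[3] Copy into database":      (IMAGE_KB, IMAGE_KB),           # file lives in the body
}

for name, (body_kb, package_kb) in options.items():
    print(f"{name}: body +{body_kb * N / 1024:,.0f} MB, "
          f"package +{package_kb * N / 1024:,.0f} MB")
```

With these example numbers, only option [3] swells the database body itself, which is what matters for the RAM discussion that follows.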

But what about differences in the performance of the database, depending on the import method used? Should I expect differences? The answer is yes, but there are variables affecting that answer. The most important variable is the amount of free RAM I’ve got when I launch the database.

Case 1: I’ve got 8 GB RAM, and the database size is roughly 5 GB (if I used import option [3]) or less than 1 GB (for import options [1] or [2]). I’ve got plenty of RAM “headroom”, so I can load the entire database into RAM, and in this case I won’t see much performance difference among the three import options above. The important point is that I won’t need much virtual memory (in the form of swap files requiring disk access). But, needless to say, most of us don’t have 8 GB RAM!

Case 2: I’ve got the same database size conditions as in Case 1, but only 500 MB RAM. Oops! I’m going to start using virtual memory (and building up swap files on the hard disk) pretty fast as I access images, and that’s going to happen a lot more with the database created using import option [3]. So I would expect to see more performance deterioration for the database created using import option [3].
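As a toy illustration of that headroom argument (the all-or-nothing “fits in RAM” test below is a deliberate simplification of how virtual memory really behaves):

```python
# Toy illustration of the RAM-headroom argument. Real virtual-memory
# behavior is more gradual; this all-or-nothing test is a simplification.
def expect_heavy_swapping(db_body_gb: float, free_ram_gb: float) -> bool:
    """Crudely predict swap-file buildup: True if the database body
    cannot fit into the free physical RAM available at launch."""
    return db_body_gb > free_ram_gb

print(expect_heavy_swapping(5.0, 8.0))  # Case 1: False -- plenty of headroom
print(expect_heavy_swapping(5.0, 0.5))  # Case 2: True -- expect the beachball
```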

Caveat A: Don’t take my numbers too seriously. Just very rough approximations.

Caveat B: It’s really more efficient to import items directly into the database body, because the other options result in a “pointer” document that then requires disk access to pull up the desired image. But that concept of efficiency assumes no limitation on physical RAM; and as a practical matter, enough free physical RAM is something most of us just don’t have. :slight_smile:

Bottom line: There are always trade-offs, depending on what one wants to do. It’s up to the user.

If I want my database to be highly portable between computers, I import images, PDFs and unknown file types into my database Files folder, so that the files will be available if I move the database to another computer (or run it off a portable FireWire drive or even a CD). That’s the option [2] approach. My database package file can become very large, but the database body remains relatively compact.

Most of the time, I leave my digital camera photos in the Finder, so that I have lots of options, including importing some or all of them into my iPhoto library, or doing digital editing of the originals in other applications. Sometimes I will import a few photos into my DT Pro database using import option [1]. That sacrifices portability of the database, but can be useful for other purposes, e.g., creating a website using DT Pro’s File > Export > As Website.

I rarely use the import option [3] to import photos directly into the database body.

Bill_DeVille:

Thanks for your detailed rundown of the facts. I left that information out of my first posting. I know the options, but I forgot to explain that I see no use in NOT copying the files into the database. Storing files in two places always confuses me :wink:

But after your explanation, I’m going to try the option of syncing a folder with the database.

The point about RAM is very important. But sometimes a database will make memory use explode.

On the other hand, I’ve got a lot of data from the internet and I’m not connected all the time, so it is important for me to store everything offline inside the database.

Nevertheless, your explanation is very useful.

Thank you very much.