Indexing and DB file size

I thought I understood indexing until I indexed about 20 files. My database size prior to this was 2.6 MB. Now it is 4.4 MB. The file sizes ranged from 100 KB to 1 MB.

I thought indexing items does NOT bring a copy into the DB. Therefore if this is true, why did my DB increase in size after I indexed these files?
Thanks
Kim

The database will hold the text content, index information and other metadata.

The file size may drop a bit after you run Tools > Verify & Repair, followed by Tools > Backup & Optimize. The latter procedure will compact the storage of the database files.
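As a rough back-of-the-envelope check, the figures from the question can be turned into an average overhead per indexed file. This is only a sketch: it assumes decimal units (1 MB = 1000 KB) and takes "about 20 files" as exactly 20; the variable names are made up for illustration.

```python
# Figures quoted in the question above (assumption: 1 MB = 1000 KB).
size_before_kb = 2600   # database size before indexing (2.6 MB)
size_after_kb = 4400    # database size after indexing (4.4 MB)
files_indexed = 20      # "about 20 files", treated here as exactly 20

# Growth of the database file and the average growth per indexed file.
growth_kb = size_after_kb - size_before_kb
per_file_kb = growth_kb / files_indexed

print(f"Database grew by {growth_kb} KB, roughly {per_file_kb:.0f} KB per file")
```

With source files ranging from 100 KB to 1 MB, an average of this order per file fits with the database storing text content, index information and metadata rather than copies of the files.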

Thanks Bill,
So it’s normal for the file size of the DB to increase with indexing? And even though the indexed documents show their actual file size in DT, that is for file information only, correct? In other words, even if a document file shows 1.3 MB in DT (which is the actual size in the OS X Finder), DT only brings in info as you mentioned but not the entire 1.3 MB of the indexed file… Correct?

Thanks again
Kim

Correct. Indexed files are not copied into the database. But there will be overhead in the database, as noted previously.

The alternative to Index-captures is Import-captures. Import is my own preferred mode, as it’s just as efficient in storage space, if one then deletes or sends to an archival external drive the folders and files that were imported into the database. As an Import-captured database is self-contained, it’s easy to migrate it among computers, or to run it on an external drive. Another advantage is that there are no constraints on reorganization of content, as there are in Index-captured databases.

Note that even if you start with an Index-captured database but then add new content such as notes written within the database, documents saved directly to the Inbox by external applications, scans OCRed by DT Pro Office or items captured directly from the Internet – your database will become “mixed” with some content stored externally and some stored internally.

If one is sharing a collection of PDFs, for example, between two database applications, such as a citation manager and DEVONthink, Indexing the material into DEVONthink may be preferable.

There’s nothing wrong about choosing Index or Import as the capture mode; the user should choose the mode that works best for his/her workflows and preferences.

If you are experimenting with a DEVONthink application for the first time, and don’t yet know whether you will purchase it, I would suggest Import capture of sample content from the Finder. Your external files in the Finder remain perfectly safe, as nothing you do within the database can affect those external files. You can also test a separate Index-captured database. Playing with DEVONthink is a good way to learn how to use it!

Kim, as one example I have two databases that are mirror images of one another with respect to the content. One of the databases is entirely indexed, the other entirely contained (imported) in the DT database. The databases each contain 4.6 million words, 80 thousand of which are unique. The indexed database file is 320 MB in size while the imported database is 12.5 GB.

If anyone’s curious, I took the imported database and created the indexed database from it for testing purposes of tagging, OpenMeta searching, etc. and I’m ‘feeding’ them concurrently. One of these days I’m going to drop one in favor of the other; I just haven’t decided which one to keep yet.

Wow. That is a difference.
Thanks

Not really. :slight_smile:

Sum the sizes of the Indexed database and its externally linked content and the sum will be almost exactly the size of the Imported database.

The only difference would result from the storage overhead for Paths. If anything, the Imported database would probably be a tad smaller in storage requirements (assuming that Imported Finder folders and files were then deleted or moved off the disk to archival storage elsewhere).

Bill is correct and I can see where the way I presented the numbers could be misleading. The total size (internally and externally) of both databases would be ~25 GB on disk. The point that I wanted to make was that the file size of an indexed database will increase as the number of files indexed increases, even though the actual content (documents, etc.) is not contained in the indexed database.
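The arithmetic behind these figures can be sketched as follows. The size of the externally linked files was not stated in the thread; here it is estimated by assuming, per Bill's point, that the indexed database plus its external content comes to about the size of the imported database. Units are decimal (1 GB = 1000 MB) and all variable names are illustrative.

```python
# Sizes quoted in the thread (assumption: 1 GB = 1000 MB).
indexed_db_mb = 320       # the indexed database file
imported_db_mb = 12_500   # the imported (self-contained) database

# Assumption: indexed database + external files is about the imported size,
# so the external content is estimated as the difference.
external_files_mb = imported_db_mb - indexed_db_mb

# Disk space used when both databases and the external files are kept.
total_on_disk_mb = imported_db_mb + indexed_db_mb + external_files_mb

print(f"External files: ~{external_files_mb / 1000:.1f} GB")
print(f"Total on disk:  ~{total_on_disk_mb / 1000:.0f} GB")
```

Under that assumption the external content is about 12.2 GB, and keeping both setups side by side takes about 25 GB.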

That makes sense.
thanks

Ok
Having used DTO for several months, my DB size is 110 MB. I know this is small. My question is this (and I know I have asked about this in the past): when does the time come when one should split up the DB?

  1. Is there a max size that you should not go beyond?
  2. Is it a function of how well the DB is performing, i.e. slowdowns, sluggish performance, etc.?

I am trying to weigh having a snappy fast and lean DB with no speed issues vs having to split it up. I should say, it is working great right now and from what I read on the forum, my 100 MB sounds small.

Am I worrying about nothing or do I have any valid concerns?

Kim Marietta

Snow Leopard 10.6.4
2.8 GHz Intel Core 2 Duo
4 GB memory
Mac Book Pro
Fujitsu S1300 Scanner
DTPO 2.0.6

Here is the relevant database size that Christian has posted in the past:

The filesize doesn’t matter, only the number of items (not more than 200000 recommended) and total number of words (not more than 300 million recommended) are important.

The most important reason to split, or not to split, for most users is not the database size but rather the database content. Keep the topically related data together and DEVONthink can best apply the AI logic to classify, see also, etc. Split the database when the content begins to lose a topical connection for the same reason. In addition to my larger databases, I have 3 very small databases that are maintained individually just because the content in them does not ‘fit’ with the content in any of the other databases.

Great, so going by the attached database properties screen capture shown below (correct me if I am wrong), it looks like I have:

Total 902 items
Total 21,120 unique words, or 300,698 words

Is the significant word count the number of unique words or the total number of words?

Did I correctly interpret the database properties screenshot? In other words, is this the correct information I needed to interpret the guidelines you posted above?

Thanks so much

Kim Marietta

Correct, so you can add another 199,000+ items and/or 299+ million words before you start to max out the database! :smiley:
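For anyone who wants to run the same check on their own database, here is a minimal sketch of that headroom calculation. The two limits are the recommended values quoted from Christian above; the function name and structure are made up for illustration and are not part of DEVONthink. The comparison uses the total word count, since that is what the quoted guideline refers to.

```python
# Recommended limits quoted earlier in the thread.
MAX_ITEMS = 200_000        # recommended maximum number of items
MAX_WORDS = 300_000_000    # recommended maximum total number of words


def headroom(items: int, total_words: int) -> tuple[int, int]:
    """Return how many items and words remain before the recommended limits."""
    return MAX_ITEMS - items, MAX_WORDS - total_words


# Figures from Kim's database properties.
items_left, words_left = headroom(items=902, total_words=300_698)
print(f"{items_left:,} items and {words_left:,} words of headroom")
```

For Kim's database this gives 199,098 items and 299,699,302 words remaining, which matches the "199,000+ items and/or 299+ million words" above.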

:smiley:
Thanks again
Kim

PS. I think my database will outlive me before I get to those stats! :smiley:

I just dragged over 1000 files from the Finder into an inbox in DT. I thought it would copy the files into the DT database; it is currently taking a while “Indexing Items”. Does that mean the files themselves are not being copied across?

No, everything’s fine. What DEVONthink is doing right now is indexing the text content. It does this regardless of the state of records.

Records can have one of two states: Either imported or indexed.

The usage of “indexing” / “indexed” can be confusing, but once you know the difference it’s pretty clear.
