I’m finally taking the plunge into DEVONthink. Over the years I have built up a digital law library of roughly 500 GB. It is mostly a combination of PDFs, Word documents, Excel spreadsheets, etc.
Is there a general target size for each database I should aim for? For example, instead of importing everything into a single database, should I try to create, say, 25 databases of 20 GB each? For performance purposes, would having that many databases open at the same time be essentially the same as having one giant database?
Edited to add: I am running DT on a Mac mini with the following specs:
Apple M2 Pro with 10‑core CPU, 16-core GPU, 16‑core Neural Engine
@ManTooth could you explain a bit more about how you generally expect to use the law library database(s)? For example, at one end of the spectrum, if you’re mainly searching the library for citations, cases, etc., then a powerful search tool such as DEVONsphere, FoxTrot Professional, or EasyFind might be a good approach, while leaving your library in its existing location.
Or are you looking for software to help organize or rationalize the library collection? Are you creating briefings or other documents based on the library?
Personally, though my own library is in the neighborhood of 500 GB, I would never consider putting it all into DEVONthink. (My five daily-use databases hold 50 GB of files in total.)
Welcome, @ManTooth!
Both replies are good ones indeed: they cover different facets, but both are pertinent.
It sounds like you thought carefully about your Mac purchase, and the extra RAM is definitely a plus.
As mentioned, for DEVONthink it’s about the number of words. Depending on the documents involved, you likely have quite a few words. However, file size isn’t a reliable indicator of this: a scanned PDF with no text layer can be much larger than a PDF printed from, say, Word. The effect on the database would be quite different, though, because the smaller text-based PDF adds its words to the database’s index, whereas the raster one (just scanned pages with no text layer) would have little effect.
@korm’s line of inquiry is where I would go as well.
What’s the purpose of this database?
Who is the end-user?
What do you imagine doing with the database, e.g., for consulting, nostalgia, teaching, …?
I would also take this a step further and ask: _What do you need in the database?_ You mentioned things like Excel files. Are those necessary given the focus and intent of this database? Remember, just because it’s easy to keep things doesn’t mean keeping everything is expedient.
But why? As I mentioned previously, permissibility does not imply expedience. You should have good, well-thought-out reasons for putting that volume of data in a single database. And so far, we don’t know much about the OP’s use case or goals.
The files are a combination of reference material (secondary-source treatises, articles, etc.) and old client files. I consider the old client files “research” material because I often refer back to an old client file where I drafted a certain type of pleading or transactional document. For example, I might want to see all the past instances where a client filed an 850 Petition; or, if I’m drafting a trust, I may want to see whether I have used certain distribution language before.
I am in a somewhat similar situation as an expert witness in legal cases.
If your client files contain only material that is now in the public domain because it was filed in court without a seal or restriction, then I would suggest it is easiest to keep all of these items in one database. If the database gets too big for your hardware and you start to get beach balls or other performance issues, you can decide then whether to upgrade your hardware or split the database.
But, importantly, if your client files contain confidential information that you would not want inadvertently mixed with your reference material, then you might want to split the collection into two databases.
Remember that you can search across all open databases, so searching alone is not a reason to keep everything in one database.
My main criterion for deciding how to segregate databases: if replication would be useful, I keep all of that material in one database, because replicants can only exist within the same database. I have never even looked at the size of my biggest and most useful database, as I have never noticed any performance degradation.
Because I don’t need access in DEVONthink to the other 450 GB of data (documents) on a regular basis. It’s there on disk, and I can search it with FoxTrot. If a block of documents needs to be swapped into a database, I’ll index the relevant folders while the need exists, then remove the index when I’m finished. It’s not a performance rationale (my M3 has 36 GB onboard); cramming a database, my daily working space, full of dust is simply untidy.
We each have different ways of doing things - nothing wrong with that.
I find it easier to keep everything in DT and search for it there, though I recognize that for some advanced searches FoxTrot may be faster or more capable.