New to DTP - Organization Strategies question

I am very new to DEVONthink (Pro) and have a very simple, basic question.

I have a wide variety of information in all kinds of different formats (images, movies, music, scripts, Word .doc files, email, Excel files, etc.), stored in a very well-structured series of directories that all share the common root ~/Projects.

In the past I have used a variety of information management tools, as well as everything OS X has under the hood, including AppleScript, Automator actions, Perl scripts, etc.

DTP looks like an absolutely wonderful way to get a much better overview of all my data. So, my question is this:

I have roughly 3 million different files in the aforementioned Projects collection. What I would like to do is use DTP to access this entire collection, while also keeping my directory structure and the ability to work with the individual files through the Finder, Spotlight, whatever Apple decides to add to OS X with 10.5, etc.

I would like to be able to work with – for instance – a rich text file, sort, organize, and manipulate the data within DTP, and have the changes reflected in the original file(s) on the filesystem.

Is there any specific advantage to importing the entire Projects directory, with its 3 million+ files, into a DTP database? Or can I simply drag and drop the whole thing and have DTP use links that reference the original files?

Far more important: Can DTP resynchronize itself to reflect changes/additions to my Projects directory that occur OUTSIDE of DTP? (For instance, if I add a new directory to ~/Projects/ … I would like that change in the filesystem, and any new data/files, to simply show up in DTP, without my having to manually keep track of or re-import the files.)

Is what I need to do feasible within DTP? Are there any significant disadvantages to using the filesystem itself to store the files, vs. using DTP’s database(s)? (What are they, BTW: MySQL, SQLite, something else?) My entire collection of information is well tagged, sorted, etc. I would like to leverage all of this and add the extra level of granular control and wide-angle overviews that DTP appears to offer.

Thank you for any insights and/or suggestions. I am very new to DTP, and very impressed with what I see. In short, it is a very solid foundation with a series of tools built on top of it that leverage all the core Apple technologies I am already using.

EPS

Eugene, your questions are far from simple or basic. :slight_smile:

First, in the Import mode, the current version of DT Pro stores text-type files (plain text, rich text, HTML and the like) in the monolithic “body” of the database, which is loaded into memory when the database is launched. Other file types, including PDF, postscript, images, QuickTime media and (if Preferences is so set) “unknown” file types are copied into the Files folder inside the database package file, and need not be loaded into memory when the database is opened. (When DT Pro 2.0 is released, all files will be stored in the Files folder, so that the memory required to load the database will be reduced.)

My main database comprises about 20,000 documents containing about 20,000,000 total words. The majority of those imported files are text-type, so are loaded into memory.

This has consequences for RAM usage and operational speed. I’ll confess to being somewhat spoiled, as I expect very speedy searches and other actions in my database. So I try to minimize the slowdowns that would happen if my computer were to make heavy use of Apple’s OS X Virtual Memory, as that involves swapping data to and from hard drive storage; of course, data reads from disk are much slower than reads from physical RAM.

Although the 20,000,000 word size of my main database is by no means a “ceiling” size, it works well for me because it holds a very comprehensive and growing document collection reflecting my interests in environmental science and technology and policy, plus “holding” groups that I periodically spin off as separate databases, or send to existing separate databases as new material is added, e.g. from web browsing or DEVONagent search result transfers. As a practical matter, this database size lets me get search results in a few milliseconds for most short search terms, and See Also results appear very quickly on my MacBook Pro 2.0 GHz with 2 GB RAM, or my PowerMac G5 dual core 2.3 GHz with 5 GB RAM.

My PowerMac has two 500 GB drives, and I have two external 500 GB FireWire drives for backup. The drives hold a great many files from years of collecting information. I haven’t – and probably never will – put all of that material into a DT Pro database. The Documents directory alone on the PowerMac holds more files than would fit on the MacBook Pro.

I’ve never considered DT Pro as a Finder replacement or supplement. I use it to manage and help me “mine” information of special interest, in topically-oriented databases.

I often need to use some of my databases away from my office. My MacBook Pro has only 100 GB of drive space, much less than the terabyte of the PowerMac. So by creating self-contained topical databases I’m able to carry with me the material I need for a project or meeting. That means I’ve used Import mode, rather than Index mode, for capturing files. Although Index mode requires less memory than a database created using Import (copy) mode, the result is less portable. Because my databases are self-contained (with the sole exception of Word .doc files), I couldn’t care less about the original files and folders left back in the Finder. I do, however, make external backups of my databases whenever significant changes or additions are made.

As I add new material to a topical collection and organize it, I do so in the database rather than in the Finder. I often use DT Pro’s Classify feature to help me decide where to put new material, and often use the replicate feature to place an item into more than one group if that’s appropriate. So over time my database structure will diverge from the original structure of the folders and files in the Finder. I see no need to mirror my database in the Finder. But if needed, I can easily export the contents of the database back to the Finder, and the exported files and folders will be organized to match the groups and documents in the database.

So, would I place 3 million files in a single DT Pro database? I suspect I would want to see how much RAM can be crammed into Apple’s Intel Pro Macs, and experiment with DT Pro 2.0. :slight_smile: And the Index mode of capturing data, perhaps deselecting certain file types in Preferences > Import, would make more sense than copying everything into the database.

But first I would question what you mean by an “overview” of the files, and second I would look for topical “splits” in that huge collection. Calling everything “Projects” doesn’t necessarily mean that the contents are topically cohesive.

I don’t like spinning balls. I like speed. :slight_smile:

Bill, thank you very much for taking the time to write me a very detailed reply; my understanding of what I may want to do with DevonThink has increased considerably.

My ~/Projects directory is a completely arbitrary starting point, and contains symbolic links to other directories and portions of the filesystem anyway. As mentioned, it is already well organized, sorted, tagged, and can easily be broken down into 45 or 50 much smaller databases, which would then presumably function in a speedy manner. My main machine is a MBP with 2GB RAM, which has a FW800 mirror to the internal 100GB disk, and a G-Technology RAID containing 1TB of redundant disk. Hmmmm, great minds think alike? Here’s hoping for a MBP with 64bit Merom, and at least 4-8GB of RAM onboard in the near future!

Really, I think that kind of focuses me down to two further questions:

  • If I have an entire series of smaller databases, can I link them to one another when I find it useful to sort/sift/play with data that may pertain to 3 or 4 of those databases? In other words, can I tell a database to look at another database (or 5) when giving me an overview, or do I have to create a whole other “monolithic” database and group all this data manually?

  • My PRIMARY concern is still the “live” filesystem, and effectively needing two copies of the same document (one inside DTP and another in my external filesystem). Primarily these are the ubiquitous .doc files, which completely litter my filesystem. While I am not a big fan of Word, as far as nearly everyone I have to collaborate with is concerned, “word processing” and “document” have become synonymous with Word.

I guess my last main hurdle/question would be… What exactly do I do with all the doc files? Right now they are scattered all over my Projects directories; a directory may hold images, text, rich text, html, Pages, Word, PowerPoint, Quark, InDesign, etc … in short I am organized by content/topic, not by file type…

So, I get 3 more .doc files in email that pertain to a project I am working on. I tag them, put them into the appropriate directory, and … THEN WHAT? Can DTP update itself with regard to linked files, so that changes in particular portions of my filesystem are automatically reflected in the appropriate DTP database, or do I have to add another step to my workflow and constantly re-import these files?

Thank you very much for your time and insights.

EPS

Hi, Eugene:

You will likely want to consider using Index capture of files. In that case, DT Pro will capture text for searching and analysis and link to the external files. Note that information will be lost if the external files are deleted, or moved in such a way as to break the Path link.

The initial organizational structure of your database will correspond to the Finder structure of the captured files.

You can establish a synchronization (one-way from the Finder to DT Pro) between folders and groups. That can accommodate additions of new content both to your Finder folders and to the database. Read up on that in the documentation for a bit more detail.

Files captured via the Index mode cannot be directly edited in DT Pro. Instead, use the toolbar Actions > Launch Path to open the corresponding Finder file under its native application (or Actions > Open With to open and modify the file under another application). There will be one-way synchronization of the modified file, e.g. a Word file, from the Finder to the database content.

Remember that DT Pro cannot read all file types. “Unknown” file types include PowerPoint, Excel, Pages and others not listed in the documentation as recognized. Note that you can ignore such unknown file types if Preferences > Import - Files leaves the option “Unknown file types” unchecked.

If you check that option, however, in Index mode DT Pro will create an empty link document to the external file. In Import mode, an empty link document will be created and the file will be copied to the Files folder inside the database package file. Your database will hold a name for each such linked file and metadata such as date of creation, etc. Although you cannot directly add text to the empty document, you can add plain text comments and notes to the Content field of the document’s Info panel.

If it’s important for the content of an unknown file type item to be in the database, e.g. the content of a Pages file or Excel sheet, you can do this by “printing” the file from its native application. When the Print panel appears, press the PDF button and choose “Save to DEVONthink Pro.scpt”. You will be asked to designate the group to which the PDF version is to be saved. The text and other content of the “unknown” file are then available for viewing, searching and analysis in your database.