Indexing vs. Importing

After half a year of heavy DTP use, I keep asking myself whether I should index or import my documents. Here are the benefits of indexing over importing:

  1. Does not strip away Word formatting.

  2. Files remain referenceable (using file:// links) from all other applications.

  3. Files can be located via Spotlight.

  4. Files can be rearranged external to DTP using the Finder, with a simple Synchronize to incorporate the changes.

  5. Files can exist on separate volumes! This is important to me, as I have 3 categories of data: Private (on an encrypted volume), Reference, and Offline (for really huge stuff I don’t want to carry around everywhere).

  6. My database file does not become an enormous monolith (which is problematic when doing rsync updates, since it copies nearly a gigabyte every time there are changes).

  7. Searching and replication still work great.

  8. PDF updates are easier to incorporate.
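
To make benefit 2 concrete: an indexed file keeps its ordinary path on disk, so any application can point at it with a file:// link. Here is a minimal sketch in Python, assuming macOS-style absolute paths. The path shown is made up for illustration, and nothing here is part of DTP itself.

```python
from pathlib import Path


def file_url(path: str) -> str:
    """Build a file:// URL for an absolute path (the file need not exist)."""
    # as_uri() percent-encodes characters such as spaces and raises
    # ValueError if the path is not absolute.
    return Path(path).as_uri()


# Hypothetical location of an indexed document on a reference volume.
example = file_url("/Volumes/Reference/Papers/My Notes.txt")
print(example)
```

A link like this keeps working only as long as the file stays where it is, which is the trade-off for being able to rearrange files in the Finder.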

-----

The advantages of importing over indexing:

  1. Files can be edited directly in DTP (or easily in TextMate using “Edit in TextMate…”).

  2. Labels are persistent (Finder labels do not synchronize to DTP).

So I wonder: if I don’t use DTP as my editor (which I never do), why would I really want to use Importing? It seems to cost me more than it gains.

John

John, there’s nothing wrong with your analysis, as Indexing clearly meets your needs. And currently, Indexing is especially appropriate if you have a large collection of Word files.

You might add one more advantage of Indexing: a somewhat lower memory footprint for loading the database, as well as a smaller database package file.

I’ve gone in the other direction, creating a number of topical databases that are self-contained (Imported) and which in total contain well over 100,000 documents. In my case, easy portability of the databases is important, and that was my driving reason for this approach.

There’s no ‘right or wrong’ prescription for choosing how to capture information. DT Pro provides flexibility to meet your needs.

I would encourage new users to start small, perhaps capturing just a few folders of files that especially interest them, and experiment. It’s usually a bad idea to start by ‘dumping’ the contents of a hard drive into a database, as this may or may not work well given the RAM and storage resources of the user’s computer.

The Backup Archive script (DT Pro’s Scripts > Export > Backup Archive) is highly recommended to make sure you have current internal and external backups of your database(s).

Thanks for the reply, Bill! Also, I forgot yet another advantage of Indexing:

If you capture a website locally (using SiteSucker or something similar), Importing it will break the links between pages (and links to the captured images), whereas Indexing will allow you to browse the site naturally.
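
A rough illustration of why the links break, sketched in Python: a relative link is resolved against the location of the page that contains it, so a page that is copied somewhere else without its sibling files ends up pointing at things that are not there. Both paths below are hypothetical, and this says nothing about where DTP actually stores imported pages.

```python
from urllib.parse import urljoin


def resolve_link(page_url: str, href: str) -> str:
    """Resolve a relative href against the URL of the page containing it."""
    return urljoin(page_url, href)


# Hypothetical paths: the captured site on disk, and a copy of one page
# that has been moved elsewhere without its sibling files.
captured_page = "file:///Users/john/Sites/example/index.html"
moved_page = "file:///Users/john/Elsewhere/index.html"

# Same href, different targets: only the first one points at a file that
# the site capture actually wrote to disk.
print(resolve_link(captured_page, "images/logo.png"))
print(resolve_link(moved_page, "images/logo.png"))
```

With Indexing, every page stays next to its images and sibling pages, so the relative links keep resolving to real files.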

John

I just watched the videocasts and realized that a) I hadn’t nearly begun to scratch the surface of what DT can do and b) I probably didn’t need to buy Yojimbo :-/

It’s like figuring out that the world isn’t nearly as flat as you thought it was.

I too was leaning towards indexing rather than importing, but one of the videos says that indexing is not recommended by DT, without really going on to say why.

Can someone elaborate?

Are imported files NOT visible via Spotlight?

They are not visible to Spotlight yet (until 2.0, from what I’ve heard).

After switching over to completely indexing everything, I’ve recently switched back to importing everything except for websites (which are not navigable after you do an import). So I keep a “Sites” folder where I put all website snapshots, and I index them in as needed; but everything else is imported.

Importing lets you edit-in-place, and it makes it easier to snarf a webarchive from a website (well, I could write a script to save it and then index it…). Importing just feels more solid, though I’m not sure yet why.

John

Actually, all indexed (external) files are also indexed by Spotlight, although Spotlight has no way of knowing which files are also Index-captured into your database.

My own preference is for self-contained (Imported) databases, as I can easily move them from one computer to another, or distribute a database on DVD.

Warning: Don’t forget about those pesky Word .doc files, as these remain externally linked even when Import-captured. For those who do a lot of work using Word, Index-capturing those files may present fewer problems, i.e. ways of making mistakes when editing and saving.

Further to this thread, I have just tried importing an OmniOutliner file, but DTP seems to only be indexing it. That is, it seems to be there, and I can open it, but when I do it opens in OO, not in DTP. This isn’t a problem, really, I’m just confused. Does it matter? And is the document still searchable by DTP?

Thanks!

When you import files that DTP does not recognize natively (and you have to turn this on in the Preferences), it will “pseudo-import” them. That is, it copies each file into Database.dtBase/Files and then Indexes it. So, when you double-click on the Omni file, it opens an actual file in Omni, and yet the file does sort of live within the Database package; it’s just not in the actual DT database proper.

I have found that such files are neither searchable by DT, nor searchable by Spotlight! It seems that Spotlight omits digging into packages.
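
To picture what “omits digging into packages” would mean, here is a small sketch in Python. It is not how Spotlight actually works. It only assumes that a crawler treats any folder whose name ends in .dtBase as opaque and never descends into it. The folder layout and file names are made up.

```python
import os
import tempfile

# Assumption for this sketch: folders with these suffixes count as packages.
PACKAGE_SUFFIXES = (".dtBase",)


def visible_files(root, package_suffixes=PACKAGE_SUFFIXES):
    """List files under root, never descending into package folders."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from entering packages.
        dirnames[:] = [d for d in dirnames if not d.endswith(package_suffixes)]
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)


def demo():
    """Build a throwaway tree with one pseudo-imported and one plain file."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "Database.dtBase", "Files"))
        os.makedirs(os.path.join(root, "Reference"))
        for rel in ("Database.dtBase/Files/outline.oo3", "Reference/notes.txt"):
            with open(os.path.join(root, rel), "w") as fh:
                fh.write("placeholder")
        return visible_files(root)


print(demo())
```

Under that assumption, the file copied into the package is never reached by the crawl, while the file in the ordinary folder is listed.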

John

Now I’m even more confused, not by your explanation, which is very clear, but because I thought it handled OPML files more easily than that.