Import vs index (again)

I’m new to DT. After spending some time on the forum, it is clear that importing and indexing are both options. But–why would I want to do one and not the other? I’ve not found a list anywhere that says, “If you import, you’ll get (higher performance? automatic indexing? [I’m guessing here])” “But if you index, there is [this cost] and [this benefit].”

And I also gather one may choose to index some things, and import others. But I have found nothing to help me decide what to do with which files.

I’d love to know that I just missed the page in the manual, or did not search the forum correctly. Please lead me to a forehead smack! :slight_smile:

Joel

As you saw there have been many discussions in the forum about Indexing or Importing content into DEVONthink databases.

You didn’t find a discussion that conclusively resolved the question, “Which mode should I use?” Nor should you. That’s because personal preferences, workflows, and the uses and locations of data collections vary among the user community. Nor is this an either/or issue. You may decide to Index certain items that are shared externally (e.g., files shared via Dropbox, or the PDFs in a citation manager database), and Import others.

I generally prefer self-contained databases, and so try to avoid Indexing. Even so, I do Index a few items in one of my databases. I usually note that I prefer the organizational environment within DEVONthink, which provides tools such as Classify that are not available for organization in the Finder environment. And I like to be able to move my databases among my computers with minimal worry about any external files “dangling” from them. Almost all of my Indexed files are iWork documents shared with my Macs and iOS devices via iCloud, so that doesn’t cause missing file messages when a database is moved from one Mac to another that holds my user account.

I’m not you. I don’t share data with another database application such as Papers or Endnote. I don’t use data that’s structured from outside, such as client or other information that must be maintained externally to my DEVONthink databases.

There are power users such as korm and Greg who make heavy use of Indexing, for reasons that are entirely valid for them.

DEVONthink provides the flexibility of adapting to your choice(s) of Indexing and/or Importing database content. Think about this and make choices that best fit your wishes.

Thanks, Bill, for both the speed and breadth of your reply!

You gave me one benefit of importing: The ability to classify, not available if docs are merely indexed. And another: No “dangling” files, when moving between computers.

Also, I read in one of the threads–in a message from you, I think–that the files are not altered in any way when they are imported. They are merely “wrapped” and can easily be extracted, unaltered, from the “wrapping.” So that would seem to eliminate a major fear of the uninitiated, that of being absorbed by some Borg…

Both you and some others have mentioned creating separate databases for separate topics. I understand doing so because of professional/personal boundaries. Are there also performance considerations?

And finally, someone pointed out that importing does place some restrictions on access via the creating program. Any guidelines there?

For me, the huge potential of “See Also,” in terms of mining my rat’s nest of articles I’ve written and stuff that I’ve saved over the years for new value, makes it worthwhile to wade through some complexity. There’s gold in them hills! :slight_smile:

Joel

No, that’s not the case. Nor what Bill wrote - he said Classify was not available in the file system, which means Finder does not have such a feature. DEVONthink’s Classify is available for indexed and imported documents. Under the covers, DEVONthink knows as much about indexed documents as it does about imported documents – the word concordance, the patterns, frequencies, etc., that it uses for the AI features such as Classify.

It’s sometimes easier to test claims than read postings about them. I suggest making a database and importing several folders of documents. Then make a second database and index the same folders. In all respects the databases will be the same – except for indexing / importing. Now test them. See if you notice differences that matter.

korm’s comment above is correct: Classify and other tools for filing are available in DEVONthink for organization, whether documents are Indexed or Imported.

Every now and then I tend to reorganize a collection of documents, either to improve the functional meaning of the organization to me (and, hopefully, to Classify), to move documents from one database to another, or to prune obsolete or less useful references. That can get confusing for Indexed items, especially if I want to move some but not all the contents of an Indexed group to a non-Indexed group, or to a different database. The confusion is amplified if I need to keep the corresponding external files and folders untouched except, perhaps, moving some items that appear only in a database out to an Indexed folder. I try to avoid puzzles about what “belongs” where, so I feel constrained about reorganizing Indexed items. Tip: When in doubt, replicate rather than move Indexed items (but replication doesn’t work across databases).

But the largest constraint on reorganization is on the external files & folders. Renaming Indexed folders or files will break DEVONthink’s paths to them and result in those “missing file” errors. So will deleting those external items, as will moving them outside their existing hierarchy. The safest approach for organization of Indexed folders and files is to place them within a top-level folder the name of which doesn’t change, and Index that folder.
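To illustrate why a rename breaks a path-based reference, here is a minimal Python sketch. It is only an analogy and makes no claim about how DEVONthink is implemented: it assumes an index that remembers nothing but the file's path, and the names `find_missing` and `demo` are made up for the example.

```python
import os
import tempfile


def find_missing(recorded_paths):
    """Return the recorded paths that no longer exist on disk."""
    return [p for p in recorded_paths if not os.path.exists(p)]


def demo():
    """Record a path, rename the file externally, and re-check the record."""
    with tempfile.TemporaryDirectory() as top:
        original = os.path.join(top, "notes.txt")
        with open(original, "w") as f:
            f.write("example")

        # What a purely path-based index would remember (an assumption).
        recorded = [original]
        before = find_missing(recorded)

        # Rename the file outside the "index", as one might in the Finder.
        os.rename(original, os.path.join(top, "renamed.txt"))
        after = find_missing(recorded)

        return before, [os.path.basename(p) for p in after]
```

Calling `demo()` returns an empty list for the first check and the old file name for the second: the file still exists, but the recorded path no longer points at it.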

Re access to Imported files by an external application: Yes, while any file in your database can be opened under an application capable of opening it, using commands available to the user, there are limitations here. For those (few) applications that can properly interpret DEVONthink’s Item Link, that’s a good approach. DEVONthink databases store files within the Files.noindex folder inside the database, using Apple’s filesystem. At first blush, one might think an outside application could link to the path of a file stored in Files.noindex and access it whenever needed. But for performance reasons, the paths to those files can change. So we have to caution that this approach is unreliable.

Re reasons for creating multiple databases: The original reason I split my document collection into multiple databases was for performance, as my old TiBook had only 1 GB RAM, and it slowed to a crawl when asked to work with a large database. My current MacBook Pro Retina has 16 GB RAM, but as I have about 250,000 documents among all my databases, it would choke on a single database that attempted to hold everything.

There are other valid reasons for creating databases that meet a specific need. My research collections hold references and notes about different topics. The performance of Classify and See Also is improved by those topical designs. I have other databases in which Classify and See Also don’t work well, but are irrelevant, as in the case of my financial database, which holds, e.g., invoices and receipts classified by tax year.

Thanks, Korm and Bill. I can see we are once again reaching the point that has perplexed me in attempting to follow the many other threads on this topic.

Korm, you said:

What is there about the comparison you suggest that makes it preferable to your simply stating the likely outcome? What differences should I look for? Performance? What, in brief, is it that makes it hard to answer the question, “What are the relative benefits of importing vs indexing?”

Joel

Well, sometimes it’s better to test software than read about it? Just suggesting you use it and see which method you like.

I second korm’s recommendation. Reading about DEVONthink isn’t as useful as playing with it, to see what happens. Playing can be the best form of experimentation and learning.

Start by creating a new database and importing files into it. That copies files from the Finder, and nothing you do to the files in your database will affect the original files. You can play with editing, merging or splitting documents without any damage to the originals. If you screw up everything, no harm done. Close the database or Quit the DEVONthink application, delete the database and start over.

Also play with a similar Indexed new database. But remember that the external files will be affected when you do document edits, merges, splits, etc. Assuming you don’t do anything to those external files, you can treat this database as a learning plaything: Quit the DEVONthink application, delete the database and start over.

Here’s one direct answer: If you create two identical databases in content, one Imported and one Indexed, you won’t see any performance differences, so long as your computer has enough free RAM available. If your computer runs out of free RAM, slowdowns may occur, of course. But the text indexes and metadata of the Indexed and Imported databases that must be loaded into memory do not differ significantly in performance or in memory requirements.

I sure would appreciate at minimum a sticky topic in this forum that covers importing v. indexing, such sticky topic to be updated by the DT team as needed. Many times I have wished I had the time to play around with test databases to try and discover the differences in how DT behaves with indexed and imported files, but found myself asking the same questions as the OP of this thread:

Even better than a sticky topic in this forum would be a real treatment of the topic of indexing v. importing in the DT manual, maybe even with a table that covers what happens with various DT functionality if files are imported or indexed – maybe even this as a decision tree.

There’s no shortage of advice in this forum, sprinkled about in “this and that” postings, but as far as I can see, there is no single organized, complete, definitive, up-to-date and officially maintained treatment of the topic in terms of how DT behaves with indexed v. imported data, tips and tricks, benefits and drawbacks of each approach, and so forth. I believe such a treatment is sorely needed. Who knows what such a treatment might reveal in terms of unleashing DT’s power.

But as it stands now, every time I think “I’ll have to look into this some more,” I find it just too complicated to unravel by looking for tidbits in threads in the forum, and I give up out of frustration and lack of time to recreate this wheel.

Formally: I request that the DT team create such a writeup on the topic of indexing v. importing. Thanks, DT team.

@Shoolie, I think you’re right - a sticky post might be useful for some readers. I’d add, though, that there is nothing missing from the above or in any of the existing lengthy posts on this topic in the forum. The answer to “should I import or index” (as it is with most features in any software) is always “it depends on what you want to do”. The OP’s “what I want to do” hasn’t yet come out in this thread though it can be useful info for the other readers when answering the question.

Which explains the index/import decision in terms of the user’s requirement for accessing files – this is the fundamental reason for indexing instead of importing.

Joe Kissel’s book is endorsed by DEVONtechnologies and sold by them, and has an extensive in-depth discussion of import vs. index. Joe takes the same approach as the manual.

The decision point, again, is “If you want to maintain access to a file in the Finder

Korm’s point (via Joe Kissel) is that if you want to maintain access to files within Finder, then indexing is the best strategy. I occasionally wish to share material that I’ve gathered for research projects with others who do not use DEVONthink and are on Windows. This can be a problem, as I make liberal use of OS X’s ability to use special characters in file names (e.g., a file whose name includes an archival repository call number such as M4/B/3). When adding such a file to Dropbox, I need to edit the name: tiresome, but necessary for the intended recipient, as otherwise the file won’t appear in their Dropbox folder. Maybe I’m implementing file sharing in a klutzy manner.
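One way to take some of the tedium out of that renaming step would be a small script. What follows is a minimal Python sketch, not a tested workflow: the function name `windows_safe_name` and the choice of `-` as the replacement are my own assumptions, and the character list is the set that Windows reserves in file names.

```python
import re

# Characters that Windows does not allow in file names.
_WINDOWS_RESERVED = re.compile(r'[<>:"/\\|?*]')


def windows_safe_name(name, replacement="-"):
    """Return a copy of `name` with Windows-reserved characters replaced.

    Trailing spaces and periods are also stripped, since Windows does not
    accept names that end with them.
    """
    cleaned = _WINDOWS_RESERVED.sub(replacement, name)
    return cleaned.rstrip(" .")
```

For example, `windows_safe_name("M4/B/3.pdf")` gives `M4-B-3.pdf`. The sketch only computes the new name; actually renaming the copies placed in the shared folder is left out on purpose, and renaming Indexed originals would of course break the paths to them.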

For me, indexing research material works best. I have my databases on an SSD, with research files on a secondary HDD located in the optical drive bay. Yet every now and then I review my use of indexing as there is occasionally a 1/2 second delay when I click on an indexed file, probably due to SATA 2 restrictions for the HDD. This isn’t a huge problem for the moment, and the benefit outlined above outweighs that drawback.

It is slowly dawning on me, from the patient and explicit responses to my questions in this thread, that I may be making more of a mystery of this matter than it warrants. Let me try to express what I’ve understood so far:

Importing gives me a neatly packaged single database, that makes it easy to move to other machines.

Indexing leaves the files accessible via Finder, for ease of access from other apps, and for traditional organization (for visual recollection of organization, perhaps).

There are no performance advantages to either approach.

It’s ok to mix and match, indexing and importing, although with care, lest excessive complexity be thereby introduced.

Have I missed anything?

That is mostly accurate. However, note that you can very easily change your setup from one to the other by choosing “move to database” or “move to external folder”.

If you rely on replicants a lot, experiment with a backup database until you know what to expect.

It’s worth mentioning that ‘Move To External Folder’ is only an option for groups that are already indexed. In other words, one cannot just select group(s) and/or document(s) contained in a database, right-click, and move the item(s) out of the database and into the filesystem (Finder) unless the top-level group is already indexed in the database.

Though this utility (for moving groups externally and indexing them) can help get to the same point.

Even if you have a self-contained database (i.e. imported) you can decide at any time to move the actual content to any folder in the Finder without too much trouble. Here’s how:

  1. Create a folder in the Finder, name it sensibly
  2. Inside Devonthink, index this folder (it is empty in the Finder so don’t be surprised to find it empty in Devonthink after indexing it)
  3. Now move everything into that group in DTPO that you want to appear in the Finder
  4. Right-click that group and select “move to external folder”. Done.

It sounds a bit involved but takes only a minute to do or less. I vaguely recall Bill DeVille recommending this earlier.
This is not to say that there is anything wrong with korm’s script. I haven’t tried it but he is one of the Devonthink wizards and knows what he is doing.
HTH
Prion

All perfectly fine ways to move imported documents to a Finder folder that is indexed. The point that I wanted to make to the OP, who I gather is new to DEVONthink, is that one cannot just right-click on contained groups and/or imported documents and select ‘Move To External Folder’. You’ll need to run the script linked above or perform additional steps like the ones mentioned in this thread to first create the indexed structure.

Greg
so was I (assuming the OP was new to Devonthink). The preceding post was only meant to avoid a possible misinterpretation of your comment, which is 100% correct.

@OP:
GJ falls into the same user group as korm :smiley: