Why do people prefer importing rather than indexing?

Hi, I would like to throw out some questions here.

I recently started using DEVONthink and am trying to figure out when to index and when to import files.

I’ve read through several forum discussions and noticed that people generally prefer importing rather than indexing.

But why is that?

I’m studying multiple subjects as an undergraduate, and the PDFs I use usually cover more than one topic.

For example, suppose there’s a book titled ‘Math and AI.’ With DT’s approach, isn’t it impossible to have the ‘Math and AI’ book appear in both the ‘Math’ database and the ‘AI’ database without indexing? (Duplicates aren’t a good approach since they double the storage space even within DT, and replicants don’t work across databases.) So, would I have to use a “Books” database, create subfolders like ‘AI’ and ‘Physics,’ and then replicate files across these subfolders?

Also, doesn’t importing files into DT limit the effectiveness of search tools like HoudahSpot that rely on Finder, since DT stores files in its own structure?

I originally thought of DT as an additional organizational layer built on top of Finder, which I use as the basic storage tool. For instance, all video files go into ‘Videos,’ images into ‘Pictures,’ and text documents into ‘Documents’ within Finder, with DT serving as an additional layer. Videos might include downloaded movies, personal trip videos, or lecture recordings, and this kind of information would be organized within DT.

From this perspective, the import feature falls short, since a single file often needs to appear in multiple databases. It seems only useful if one file exclusively covers a single topic, which is quite rare in my experience.

Have I missed something important here?


Inside a database it’s possible to replicate items to multiple groups.


The thing to remember about DEVONthink is that you can do as you like.

Indexing lets you keep working with the files in the other apps you mention. Importing gives you more access to DEVONthink features designed for imported files.

As you are indexing, see the DEVONthink Manual and take note of the additional cautions and advice about Indexing that you should be cognisant of.


Yes, I know, but that doesn’t seem like a good approach though…

Then users would have to create ‘Semester 4 handouts’, ‘Physics’, ‘Math’, and a bunch of other groups in the ‘Books’ database. And what if the book is part of a ‘Physics’ class that also includes lecture videos, which is a very common case? Should I rename the database, put all the videos inside, and make another group? But then the sync feature would sync the videos to DTTG3 too, since, as far as I can tell, sync applies to the whole database rather than to specific files.

In what context will you want to retrieve the information? A mathematician will use the “Math and AI” book differently from an AI researcher.

Remember that search exists, and therefore creating a detailed hierarchical outline of your material is generally not necessary to find things.

Remember that you are laying the foundation for your entire career. That is, you may find that some material is ephemeral and can be discarded after the semester ends, but some may still be a useful reference for years or even decades. You may prefer a structure that makes it easy to separate the two.

DTTG’s “Sync On Demand” feature is configurable at the Group level.


I would put the question the other way round: why would you index something instead of importing it?

In my case I have a single database which holds material related to multiple different topics and areas of my life. These days, I don’t expend much effort in organising items into groups – I prefer to use search to find things, which is blindingly fast and often turns up items I hadn’t thought of. (My database totals over 32 million words and search results happen in milliseconds.) Having only one database means I don’t have to worry about where to put things, nor where to search. I know it is in the database. The only exception to this is pdfs of academic papers, which are all handled by Bookends. However, I do index the Bookends attachments folder, so I can still search those pdfs within DT.

When it comes to pictures and videos, I don’t store those in DT. I keep them in Photos. There are advantages and disadvantages in both approaches.


Personally, I think that is the wrong way to think of it. To my mind, DT and Finder have completely different purposes. Finder has to exist because everyone needs to save and open files. DT is a tool for research, writing, admin, and various other activities, and it has tools to support that sort of work.


Yes, that is exactly why importing isn’t working for me.
The book may be for Semester 5, then also for Semester 6, and beyond.
So the book should appear in both S5 and S6, without my having to search for its title every time or tag it ‘S5’ and ‘S6’. The problem gets bigger as the amount of material grows.
Doesn’t the import feature just put the book in one ‘exact’ place, which isn’t ideal when it is referenced in various ways? That’s exactly why I don’t want a hierarchical outline of my material.

For me, translating this into what my hunch says you are doing, I would import everything into a single database, into a group called “References and Books” or something similar. Then, to assemble the references and books for a particular semester, I’d create a group for each semester and replicate the applicable documents into the appropriate group. This avoids duplication. Everything is still easily found, either by browsing the group structure or with a quick DEVONthink search, whether ad hoc or a stored Smart Rule.
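To make the replicate-versus-duplicate distinction concrete, here is a toy model in Python. It is only a sketch of the idea and says nothing about how DEVONthink stores things internally or about its scripting interface. The class names, group names, and the 120 MB size are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    name: str
    size_mb: int


@dataclass
class Database:
    # Maps a group name to the list of Document objects shown in that group.
    groups: dict = field(default_factory=dict)

    def add(self, group, doc):
        self.groups.setdefault(group, []).append(doc)

    def replicate(self, doc, group):
        # A replicant is modelled as the same object listed in another group.
        self.add(group, doc)

    def duplicate(self, doc, group):
        # A duplicate is modelled as an independent copy with its own storage.
        copy = Document(doc.name, doc.size_mb)
        self.add(group, copy)
        return copy

    def stored_mb(self):
        # Count each distinct object once, however many groups list it.
        unique = {id(d): d for docs in self.groups.values() for d in docs}
        return sum(d.size_mb for d in unique.values())


db = Database()
book = Document("Math and AI", 120)  # hypothetical size
db.add("References and Books", book)
db.replicate(book, "Semester 5")
db.replicate(book, "Semester 6")
print(db.stored_mb())  # 120: three groups, one stored item

db.duplicate(book, "Semester 7")
print(db.stored_mb())  # 240: the duplicate takes its own space
```

In this model the book shows up in three groups while being counted once, which is the property that makes replication attractive for semester groups.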


As well as replicants, don’t forget that tags offer what is effectively an additional folder-like hierarchy independently of the group structure. This is another argument for a single database for all your academic materials, whether you go with importing or indexing. (The choice between those is really about whether you need to sync your database between devices, which is a case for importing, or retain filesystem access to the folder structure, which argues for indexing. It’s slightly easier to go from indexing to importing than vice-versa, thanks to the Data => Move Into Database command.)


Your data is your data. But if it were me, I would think more about topics and less about “projects.” That is, a paper about the mathematical foundation of AI might be relevant to Semester 4, Semester 6, and your thesis project, but looking in the “mathematical foundations of AI” group will always find it. You can use a tag to identify “Semester 4” materials if that’s useful, but how important will that grouping be once the semester ends?

Also, a handy tip for books: DEVONthink can split PDF-based books into multiple files based on the Table of Contents, and then you can sort the individual chapters based on your own organization.


<Why I Switched from Indexing to Importing (And Why You Shouldn’t Overthink It)>

Hi. I used to be an indexer. Now? I exclusively use importing. What changed? My use case and my priorities—I spend a lot of time in transit (planes and trains) and I have to rely on my phone to get work done, but over the course of the day I’ll use a phone, an iPad, and two computers, so importing avoids sync conflicts, user error, and quirks of indexing. DEVONthink is flexible enough to accommodate rather significant shifts in how you work, whether you’re reorganizing your entire workflow or just having a midlife crisis about digital storage.

<There Is No Perfect System (But There Are Many Trade-Offs)>

I encourage you to experiment with both indexing and importing, but keep in mind:
1. There is no perfect system.
2. There is definitely no app that does everything for everyone.
3. Any system you set up today will, at some point, make you question your past life choices.

There are always trade-offs. But for me, importing ultimately made more sense with my commute.

I am a history professor. I accumulate absurd amounts of data—far more than my computer, much less DEVONthink, could ever hope to contain. Every aspect of my work—teaching, research, administrative tasks—demands access to a mountain of digital resources on external SSD drives. And when I say “mountain,” I mean avalanches of PDFs, videos, archival documents, museum catalogs, and various sources that I swear I’ll read someday. Several terabytes.

For every single class reading or handout students see, I have dozens of supporting research articles, primary sources, video materials, and other references. A single day of teaching can involve several gigabytes of material. I wish I were exaggerating.

<My Folder/File System (A Necessary Evil)>

To manage this chaos, I’ve settled on a folder/file system for mission critical stuff that is, frankly, a pain in the butt to organize upfront (when creating, editing, and gathering sources). But in the long run, it functions well as an archive and remains stable enough that I can link to external storage reliably.

Some key elements of my system:
• File Naming Conventions: Everything follows a yyyymmdd_keywords format (e.g., 20240310_medieval_trade_routes.pdf). This helps me quickly locate files (HoudahSpot is one option) and allows automation tools (like Hazel) to handle them efficiently. Only things I know I will repeatedly refer to (key research projects, class materials, etc.) get sorted into files and folders. If it’s not mission-critical, it doesn’t deserve the time or effort, and just gets a descriptive name before getting tossed into the pile somewhere.
• DEVONthink Database: Contains plaintext files (yes, plain text, not even Markdown) with lecture transcripts, research notes, and all sorts of things in a “zettelkasten” manner.
• Wikilinks & Bibliographic Tracking: In my files I use DEVONthink’s wikilinks to reference internal files (text) and keep track of related resources using bibliographic data (complete with page numbers) — think of it as a footnote with a file name appended to it. If I am at my computer, in my experience, I can pull up any file I need after a few seconds of searching. In rare cases, usually with decades-old stuff, it takes a few minutes, but I rarely “can’t” find something.
• Digitization & Search Workflow: I digitize everything, even textbooks. With a simple name search, I can find what I need. If it’s a frequently used source, I extract pages and import them into DEVONthink. But since that’s labor-intensive, I only do it for mission-critical materials. As a student (lifelong learning), I generally leave the textbooks and other sources in my digital library without importing them into DT. There are a handful of exceptions.
• Why DEVONthink? If I’m primarily working with plaintext files and storing everything on an external SSD, you might wonder: Why use DEVONthink at all?
The answer comes down to flexibility, integration, and convenience—even for those of us who keep our notes minimalist and portable. All my notes are linked, encrypted, and immediately available on every device (using my phone on the train now). I find myself frequently moving from my phone (now), to the iPad (in class), and to the computer (in my office). I do have mission critical stuff (my current research project) with some data in various other formats, but only a few gigabytes. Yes, you can build a plaintext-only knowledge base with something like Obsidian or a simple folder structure, but I prefer the flexibility I get with DT. For example, I went on a research trip the other day, moved a bunch of photos, PDFs, and other data into DT for the trip, referred to it while wandering through the mountains to find a largely forgotten historical site, and then deleted the data when I got home (I only needed it for that trip and the originals are on the SSD). The data was useful then, I’ve still got my notes about it in plaintext, and those notes will probably get linked to a lecture transcript someday.
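For anyone who wants to automate the yyyymmdd_keywords naming convention described in the list above, here is a minimal Python sketch. The helper names and the exact pattern (lowercase keywords joined by underscores, a single extension) are my assumptions for illustration, not the poster's actual tooling.

```python
import re
from datetime import date

# Assumed shape: 8 digits, underscore, lowercase keywords, dot, extension.
NAME_PATTERN = re.compile(r"^(\d{8})_([a-z0-9_]+)\.([A-Za-z0-9]+)$")


def build_name(day, keywords, extension):
    """Return a name such as 20240310_medieval_trade_routes.pdf."""
    slug = "_".join(word.lower() for word in keywords)
    return f"{day:%Y%m%d}_{slug}.{extension}"


def parse_name(filename):
    """Split a conforming name into (date, keywords, extension), else None."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    stamp, slug, extension = match.groups()
    try:
        day = date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:8]))
    except ValueError:
        # Eight digits that do not form a real calendar date.
        return None
    return day, slug.split("_"), extension


print(build_name(date(2024, 3, 10), ["medieval", "trade", "routes"], "pdf"))
print(parse_name("20240310_medieval_trade_routes.pdf"))
```

A check like parse_name could be used to flag files that do not follow the convention before they get sorted.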

<Final Thoughts: Embrace the Chaos (But Make It Searchable)>

If you’re struggling with whether to index or import, my advice is simple: pick whatever makes your life easier today. Your future self will probably curse your past self no matter what you choose—so at least make sure your system is functional enough that you can find what you need, even if it’s buried under a digital avalanche of historical texts.

Personally? I’m definitely not using DEVONthink to its full potential.

I’m not out here crafting elaborate taxonomies, embedding custom scripts, or running AI-powered classification algorithms. I’m just doing the minimal amount of organizing necessary to get stuff done.

And yet—DEVONthink is open all day, every day, on all my devices.

That’s the real test. If an app is always running, always useful, and never in the way? Then it’s doing something right.

Good luck, and may your archives remain (somewhat) organized.


Preamble: I don’t index if I can avoid it.

I ditched Houdah recently, even though I liked it a lot. Reason? Spotlight indexing was making my system very slow, and Houdah relies on it. Nowadays, if I need to find anything outside of DEVONthink, I use FAF - Find Any File. A terrific app.

I would (and do) consolidate the books in one database. That’s how I do things.

Search limitations? DEVONthink is more than enough for me. Search / tags / smart groups / labels.

DT is not an organizational layer on top of the Finder, it is my PIM, personal information manager. I capture, author, organize, share in DT. I live in DT, all day long. The Finder? Nah. Little use for finder labels, comments, file and folder dates. Too brittle.

Indexing. I got in trouble too many times by indexing Finder folders that I later moved, deleted, or stuff happened to them somehow. DT is golden, and all my databases (20+) are backed-up religiously, to many destinations, cloud included.

I’ve made my databases fairly orthogonal, but in the rare event I need to search something across, DT’s search allows it.

I suspect other people may use indexing a lot. I think it all comes down to one’s personal way of doing things. But if you keep what should be semantically kept together in a database, why not?


I bet there are some ways to take the pain out of organizing for class readings or handouts.

Tags come to mind, as a way of creating an on-the-fly group of documents of interest. Browse the database, tag things of interest with something like Monday Handout, and then you have one place to look for the information you need.

You could make maps of content, as the productivity nerds call them.

That’s a document with links to things that relate. I would do that in Markdown. Use item links (or wiki links) in a Markdown document to link to the component documents you want to use.

Another trick I like for building an outline out of a lot of ideas is transclusion. I think you’ll need to do this in a Markdown document, but Markdown is just plain text. A few characters have meaning for formatting, but it’s plain text otherwise.

The way I use this is I’ll write notes about what I want to say. To reference facts in other documents, I’ll transclude them. The Markdown document becomes a little dossier with notes and supporting evidence.

Transcluding a document is done with {{document name}} or {{x-DEVONthink-item://4976E704-85BD-470F-B00C-B9E124ED3281}}

The incoming links on the other document show where you transcluded it, and the outgoing links in the main document let you visit the separate components.

While it takes a Markdown document to contain a transclusion, you can transclude plain text and RTF documents as well as other Markdown files.
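If you end up building these dossiers often, the Markdown text can be generated. Below is a rough Python sketch that only assembles text using the {{...}} transclusion form quoted above. It does not talk to DEVONthink at all, and the function names, the "Monday Handout" title, and the "Reading 1" document name are hypothetical.

```python
def transclusion(target):
    """Wrap a document name or item link in the {{...}} form quoted above."""
    return "{{" + target + "}}"


def build_dossier(title, sections):
    """Return Markdown: a heading, then each note followed by its transcluded source."""
    lines = [f"# {title}", ""]
    for note, target in sections:
        lines.append(note)
        lines.append("")
        lines.append(transclusion(target))
        lines.append("")
    return "\n".join(lines)


text = build_dossier(
    "Monday Handout",  # hypothetical title
    [
        # "Reading 1" is a made-up document name.
        ("Background on the first reading.", "Reading 1"),
        # Item link copied from the example above.
        ("Supporting evidence.",
         "x-DEVONthink-item://4976E704-85BD-470F-B00C-B9E124ED3281"),
    ],
)
print(text)
```

The resulting text would then be saved as a Markdown document inside the database, where the transclusions get resolved when it is rendered.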

Would any of that help?


My personal take is to create a database called “Stuff” and throw everything in it. Semester 5. Semester 6. Cable bills. Romance novels. Physics. AI. Origami. The PDF manual for the toaster.

Having multiple databases is not necessarily needed. After using things for a while, you may find that some stuff truly needs to be separate. Or maybe you won’t. My first instinct is always to put stuff in my main DB, which I call Filing Cabinet and gave a nice filing cabinet icon. (And it would be really nice if database icons synced.)

I do use different databases. In fact, I have 35 of them right now. Most of them are for separate roleplaying games that are more libraries than workspaces. But I still have one DB that pretty much rules them all.

I initially indexed most things and often ran into problems with missing links when I had moved something in Finder rather than DT. These are easily fixed, but it becomes a pain. I then stood back and looked carefully at what information I actually needed to be accessible from more than one application. This turned out to be remarkably little (don’t forget you can open any DT document in an external app by double click - if it is set in settings - or else by a key stroke).

The only information/data I needed to keep within Finder (on iCloud) and index in DT was:

  1. A single DT group containing all my markdown bird notes and blog posts - They needed to be available for my website app and 1Writer on my phone (to read in preview)
  2. A group (and subgroups) of my cookbook/recipes - I found Obsidian to be better for display here, as I can have the index, recipe, and ingredient graph displayed together, which makes navigating and finding recipes quick
  3. Bookends attachments - I use Bookends to catalogue all my reading and notes

Everything else I imported. An example of why this works for me: most ‘work’ documents come to me as MS Word files. Imported in DT I can preview them to quickly skim without opening in Word. If I do need to edit or read more closely, double click and they are open in Word.

Don’t forget, if you import a group and then decide it would be better back in Finder, it is easy to export and index again.


“I then stood back and looked carefully at what information I actually needed to be accessible from more than one application. This turned out to be remarkably little (don’t forget you can open any DT document in an external app by double click - if it is set in settings - or else by a key stroke)”

This kind of self-reflection would be useful to many people and you’re spot on here.


Various commentators have identified excellent strategies, and I expect that yours will evolve with use over time as mine have.

I have found that one large DT database is preferable over many small ones in order to enable replication. However, the size of the database(s) and type of storage used are also important when the databases get to be multi-terabyte in size. I keep a large, catch-all database on SSD storage, and a couple of other databases on magnetic drive storage, which is cheaper but slower.

Everything is imported, or created inside DEVONthink (which is not “importing”). A few cases suggest to me that indexing is a good path, but each of these cases is rare for me. Thus indexing is not the rule for me.

  1. Documents frequently edited in external applications. (Usually when I edit the document in a Windows app under a Windows VM.)
  2. Documents that need version control. (This is a disappearing case for me.)
  3. Files you share with others. (The most common case for me.)

This is not a prescriptive recommendation for decision making. Just my own.


I could be wrong, but I haven’t seen a difference between an imported file and one created within DEVONthink. A Markdown file imported from the outside seems pretty much the same as one created within DT.

I have seen some mild issues with some external editors. An editor that renames its input file to .bak or something like that leaves orphaned files behind. Easily remedied, and it’s been so long since I had the problem I can’t remember what application left backups behind.