Importing, indexing, tagging — how do you handle it?

Hello everyone,

I’ve been using DT and DTTG for several years now. Currently I’m on version 4, running on a Mac mini M1 (Tahoe 26.3.1) and DTTG 4 on an iPad Pro M1 12.9 (iPadOS 26.3.1).

Recently I ran into sync issues with what I’d call a “near data loss” situation (thank God for backups). I want to use the upgrade to version 4 as an opportunity to consolidate my five databases and clean up issues that have accumulated over the years. The plan is to move all data from the old databases into a new one and tidy everything up along the way. So far, so simple.

Up to now, I’ve imported all data, organized it into groups, and used replicants to assign items to additional groups. But this has become a bit confusing. Also, I increasingly need access to files outside of DT. That works, but I’d like to avoid the extra steps (show in Finder / open in external app).

My idea: Only index files going forward. That way, all files would also be accessible without DT. It gets a bit tricky with files coming in via DTTG—they are imported first and would need to be switched. Alternatively, I could use a cloud folder that gets indexed.

I also see an advantage of indexing for my PDF archive: it’s a few hundred files, around 800 MB total. These could live in a cloud folder and be available on all devices, accessible with any reader. I wouldn’t need to sync a large database but could still use DT’s excellent search.

Combination of groups and tagging: Instead of creating many groups like before, I would reduce it to, say, one group per year. The actual organization of documents would then be handled via tags.

My questions:

Does this approach work well long-term, or can it get out of hand? I read in a forum that someone ended up with an unmanageable number of tags. How reliable is indexing in practice? I understand that I shouldn’t move or rename files outside of DT—but is it realistic to keep things cleanly indexed, or do you eventually end up with a mix of imported and indexed items? That would make backups more complicated.

How do you handle this? What structure has worked best for you in the long run?

Thanks a lot and best regards

Michael

(translated using ChatGPT)

I only use indexed folders if I need to access files outside of DEVONthink
It’s so much simpler to import files and have DEVONthink handle the filing

Combination of groups and tagging

For organization, I use tags; minimal groups
I reflect hierarchy in my tag names; for example Journal, Journal-Health, Journal-Finance
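
To make the naming idea concrete, here is a minimal Python sketch of how flat, hyphenated tag names can be read back as a two-level hierarchy. It is not DEVONthink code and does not talk to DEVONthink at all; the helper name `group_by_parent` and the extra sample tag `Taxes-2024` are made up for illustration, and it assumes the hyphen is used only as a level separator.

```python
def group_by_parent(tags, sep="-"):
    """Group flat tag names by the part before the first separator.

    Hypothetical helper: "Journal-Health" is filed under parent "Journal"
    with child "Health"; a bare "Journal" just registers the parent.
    """
    tree = {}
    for tag in sorted(tags):
        parent, _, child = tag.partition(sep)
        children = tree.setdefault(parent, [])
        if child:
            children.append(child)
    return tree


# Sample tags; "Taxes-2024" is invented for the example.
tags = ["Journal", "Journal-Health", "Journal-Finance", "Taxes-2024"]
hierarchy = group_by_parent(tags)
```

The point of the convention is that the hierarchy lives in the name itself, so a plain alphabetical sort already keeps related tags together.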

I would reduce it to, say, one group per year

What is the purpose of these year groups?

This thread may be of interest to anyone interested in tagging. I got a lot of very interesting responses there some time ago: “Do you use tags in DTP?”

Recently I ran into sync issues with what I’d call a “near data loss” situation (thank God for backups).

Our often-spoken mantra: If your data is important to you, backups should be a top priority (regardless of whether you use our apps).

clean up issues that have accumulated over the years.

What issues?

The plan is to move all data from the old databases into a new one and tidy everything up along the way. So far, so simple.

Why one large database?

Up to now, I’ve imported all data, organized it into groups, and used replicants to assign items to additional groups. But this has become a bit confusing.

Replicants are optional. If they don’t make sense to you, don’t use them.

Also, I increasingly need access to files outside of DT. That works, but I’d like to avoid the extra steps (show in Finder / open in external app).

Those steps are almost always unnecessary.

My idea: Only index files going forward. That way, all files would also be accessible without DT. It gets a bit tricky with files coming in via DTTG—they are imported first and would need to be switched. Alternatively, I could use a cloud folder that gets indexed.

Indexing is not a panacea and we caution you to not commit to this until you’ve read and understood the In & Out > Importing & Indexing section of the built-in Help and manual.

I also see an advantage of indexing for my PDF archive: it’s a few hundred files, around 800 MB total.

800MB of PDFs is quite small.

These could live in a cloud folder and be available on all devices

But do you have a practical need for this?

accessible with any reader.

Again, do you have a practical need for it, i.e., are you actively using another PDF application on a mobile device? The Mac doesn’t matter, as you can open PDFs from DEVONthink via Data > Open, Open With, or double-click to open in the system default application.

I wouldn’t need to sync a large database but could still use DT’s excellent search.

That is incorrect, especially if DEVONthink To Go is involved. You can disable syncing the content of indexed items but that will not work with DEVONthink To Go. Also, it won’t work if syncing to another Mac unless it also has access to the indexed files outside your database.

Combination of groups and tagging: Instead of creating many groups like before, I would reduce it to, say, one group per year. The actual organization of documents would then be handled via tags.

Feasible but I would build a parallel database and see how it feels. It’s easy to theorize but actually living with the new structure may not suit you.

There are considerably more gotchas with indexing than with importing.

There are valid reasons to index - but you should only do it if you thoroughly understand the pitfalls. Even then it is hard to imagine a situation where indexing makes sense for all or even most of your data.

Importing with a solid backup plan likely is safer.

For sure import instead of index if you are at all in doubt.

I will speak first with a few extreme counters to the seemingly conventional mantras here. By background, I am a moderate user, not a pro. I see DT as a tool offering the best options from two distinctly different approaches. You can bring Finder content to copy into its database and plan forever thereafter to be able to ignore the source content at the Finder (import). Or you can introduce a representation of Finder content into its database and prepare forever thereafter that you may have to keep track of what application did what to the content (index). The former treats DT as a fortress, where no one sees over the walls and everything that comes in or goes out is tightly regulated. The latter is an open commune, where everyone sees what everyone else is doing with everything.

My comments are as below.


I would import only when you have the fullest intent to take everything you just imported into DT, delete it from the Finder, and never plan to need to find those same files in the same places (folders) directly from the Finder.

I would never, when in doubt, trust to import first and index later. Rather bluntly, when in doubt, I would never trust to import or index. Rather, I’d read the friendly manual and map my use cases carefully to the pros and cons of each approach.

I would never import files that I intend to continue editing at the Finder level. I would index them.

I would never directly edit a file that is imported in DT and continue using the edited file as though it is the working version. I would create a copy of the file, rename the copy with a proper versioning system, and edit on the copy.

Before editing a file that is indexed in DT, I would take caution to assure that I am editing a version that is the most recent at the Finder level as well as within DT.
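
The "copy first, then edit the copy" rule from the list above can be sketched in Python for plain files on disk. This is only an illustration under stated assumptions: the `_vN` suffix scheme and the function name `make_versioned_copy` are my own choices, not a DEVONthink feature, and the sketch works at the file-system level rather than inside a database.

```python
import re
import shutil
from pathlib import Path


def make_versioned_copy(path):
    """Copy a file to a sibling named with the next free _vN suffix.

    Hypothetical naming scheme: "notes.txt" is treated as version 1, so its
    first copy becomes "notes_v2.txt"; copying "notes_v2.txt" gives
    "notes_v3.txt". The original file is never modified.
    """
    src = Path(path)
    match = re.fullmatch(r"(.*)_v(\d+)", src.stem)
    if match:
        base, version = match.group(1), int(match.group(2))
    else:
        base, version = src.stem, 1
    # Skip over version numbers that are already taken.
    while True:
        version += 1
        candidate = src.with_name(f"{base}_v{version}{src.suffix}")
        if not candidate.exists():
            break
    shutil.copy2(src, candidate)  # copy2 also keeps timestamps where possible
    return candidate
```

Calling `make_versioned_copy("report.docx")` would leave `report.docx` untouched and return the path of the new copy to edit.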


I can offer one example of using importing and indexing combined. Suppose that you have taught a course once a year over a span of a few decades. You have bunches and bunches (and bunches more) of files of various types assembled dutifully in “archive type” folders in the Finder, with the folders named by course year. As it happens, you also have an active course this semester.

You’d like to write a textbook based on your course notes over the years. You know you have tons and tons (and even more tons) of duplicates and replicates and things that may be somewhat the same with just a few tweaks. You also have lots … of junk stuff.

Import all your prior year archived folders into DT. Create ZIP archives of each folder at the Finder level. Move the ZIP archives to an external SSD (the ZIP archive step could be optional if your SSD is large enough). Delete all the prior year folders from your internal SSD. From now on, work on the imported files only from within DT.
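
The ZIP step could be scripted along these lines. This is a rough sketch, assuming one sub-folder per course year under a single archive folder; the function name `zip_year_folders` and the folder layout are hypothetical. It only creates archives and deletes nothing, so checking the ZIP files before removing the originals is still up to you.

```python
import shutil
from pathlib import Path


def zip_year_folders(archive_root, destination):
    """Create one ZIP archive per sub-folder of archive_root in destination.

    Hypothetical helper: returns the paths of the archives it created and
    leaves the source folders untouched.
    """
    root = Path(archive_root)
    dest = Path(destination)
    dest.mkdir(parents=True, exist_ok=True)
    created = []
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        # make_archive adds the ".zip" extension itself and returns the
        # path of the archive it wrote.
        archive = shutil.make_archive(str(dest / folder.name), "zip", root_dir=folder)
        created.append(Path(archive))
    return created
```

Keep the destination outside the folder being archived (for example on the external SSD), so the archives do not end up inside each other.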

Index your current-year folder into the same DT database, in a group that is clearly distinguished from the others.

Now, go to town clearing out the clutter from the imported files, stripping down to the bare minimum of content. Realize that nothing you do to the imported files in DT will destroy the files that you stored safely as ZIP archives on your external SSD. Realize that you should be working on the imported files only to clear duplicates and replicates and files that should have been deleted ages ago. Perhaps you will also plan to use the AI tools in DT to assemble information from the imported files, creating a new document. Great! Store that new document within DT only when you expect that you will never want to access it from outside DT. Otherwise, export that new document to an organized location at the Finder level, delete the document in DT, and index the document back into DT. Agree that, if you ever make edits to files imported into DT, no one outside of your personal DT universe will ever see those changes. If you want someone else to see a document that was imported in DT and edited within DT, export the document back to the Finder level, rename the file with an updated version number, and go about sharing the renamed version.

In the meantime, also go to town updating the indexed files in this year’s folder, both from within DT as well as directly at the Finder level. Be happy that you can work on the indexed files within DT or using some Windows computer with remote access to those same files, and both places will (with some cautions) see the same thing, time and time again.

As for tags – your choice. Use them exhaustively. Don’t use them at all. But be well reasoned in why you will or will not use them. Otherwise, it is not the overwhelming abundance of or excessive absence of tags that kills good work, it is not knowing at the outset why you should or should not have them to improve your workflow habits. Do you use tags at the Finder level? What do you already know from this experience about why (or why not)?

In summary, I do not intend to strike fear and anxiety about using DT with my statements counter to seemingly conventional mantras. The example I gave about writing a textbook is indeed exactly on my roadmap for using DT. I have great expectations about the positive outcomes from this upcoming adventure. I hope instead to demarcate some clearer boundaries about why you should say YES or NO to certain practices. Sometimes, saying yes, especially doing so based on advice that says “when in doubt … do this”, can get you deep in thorn bushes rather quickly.


JJW

Why not? It’s bullet-proof. The only gotcha is to not CREATE a file within DT by any means except importing. No detailed manuals / rules to understand - that’s all you need to do.

Put simply - There are many posts you will read here over the years from people who lost data when indexing and wish they had imported instead. I can’t recall such a story where someone lost data due to importing and wished they had indexed instead.

I’ve modified to emphasize a caveat.

Let’s do a survey about the number of folks who edit a file that was imported in DT and then are fully lost to figure out why their edits never show up in the original Finder file. That number will not be zero. Hence, importing is not bullet proof.

Not really. If it were so, DEVONtechnologies would not have a manual with so much devoted to each case, and this forum would not be swamped with folks who continue to ask which case is best for which situation (and why or why not).

I respect your expertise. I raise concerns whether it is a blindness to nuances and complexities that trip up others time and time again.


JJW

Why not?

No data is lost in that situation so it’s a harmless learning exercise.

Certification.

I do not trust DT as a self-contained document editor. Absent the duplicate-first, then edit rule, one has no trusted way to return to the original source. And, folks who do not know well enough that DT is its own fortress cut off from the Finder can overwrite an original document at the Finder with a version edited in DT and thereby have no way to recover the original document.

Even if this happens only once, it is a failure. Perhaps a drastic failure.

EDIT: Also as such, this is not a harmless learning exercise. And, even when the file can be recovered, such “learning exercises” are not entirely harmless in any case.

To be fair, I give the same advice to anyone intending to make any change to any electronic document using any other application. It is however easier to teach folks why they need to make a duplicate of a Word document at the Finder level before they make the next round of revisions on that document using Word. It is less transparent to train someone why to do the same thing on an imported file in DT. Such messages as yours do not help.


JJW

Right. I’ve seen someone complain about that. But I don’t get it: Why would anyone expect that edits to a copy appear in the original? No program that I know of does that. Import, then delete from Finder. It’s as easy as that. But importing, modifying in DT and then expecting that the file you left lying around in Finder has changed?

It is. But it requires people to think about it and understand it. As does indexing, BTW.

And I seem to see a lot more reports of indexing gone awry because of a lack of understanding than importing. IMO, people can do what they want. If they understand what they’re doing and what the implications are.

What makes you say that? DT’s databases are just glorified folders that can be opened in Finder. Not that they should, but DT is definitely not cutting anything off from the file system (to avoid the term “Finder” here, that is just misleading, imo).

How many duplicates do people finally end up with? Why not simply advocate a strict backup regime? Or, if that’s really needed, a version control system? But making a copy of a document before editing it, and then another copy of the copy before editing that one … And all that in the filesystem, without any proper way of seeing what is what and what has been modified by whom?

Let’s agree for sake of discussion that this is a concern. You have just explained the problem and it is very easy to understand (and thus avoid) the issue as you explained it. So crisis averted.

But with indexed folders it’s quite nuanced - and quite easy for even an experienced DT user to relocate an indexed folder, indexed file, or indexed group and then either lose the data or lose track of it. There is no quick/easy explanation of “Just do this and it will be fine.”

To put it another way - I think it’s fine to get started with DT4 importing real data with the one rule mentioned above - it’s bullet proof that way.

But if one is to do indexing with DT4 I think it should be done first with test data or at least unimportant data or easily replaced data. Don’t move to real critical indexed data until you thoroughly understand the pitfalls and have seen the gotchas yourself.

Overall, I will not hold to any simple mantra about importing versus indexing. The simplicity of such messages fails to appreciate how commonly the general audience lacks an understanding of even the basics of managing files properly in the Finder, let alone managing them with a tool that can (and does) take over actions that the Finder performs.

I shall leave it at this.


JJW

But you think someone who cannot manage Finder can manage the nuances and gotchas of indexing?

With due respect, both importing and indexing should require nuanced forethought. Otherwise, I find that “when in doubt, do this” pronouncements cause the most trouble later for folks who do not fully comprehend the Finder.


JJW

First of all, many thanks to everyone who is participating so actively here and writing such detailed posts. Thank you very much for your time.

That’s arbitrary and could just as well be named something else. But many of my documents, such as invoices and bank statements, are only valid within the respective year.

Absolutely. No backup, no mercy.

Duplicate documents, messy organization. All on me—nothing that DT did wrong.

I find myself increasingly spending time copying documents and information back and forth between databases because I need them in multiple places.

Until now, it seemed like the right solution for documents that belong to multiple categories. For example, an insurance invoice belongs to the respective insurance policy, but also to the car and to the tax records. I used replicants for that. In the future, I plan to handle this with tags.

For one thing, I read, print, and email many of the PDFs I have. Many of them are press articles and specialist books from which I extract excerpts. Handling them through DT still adds an extra step on all devices.

BTW: my PDF database is 16 GB in size. No idea where the 800 MB came from.

These days, I’m experimenting with a test database and syncing via Bonjour to see what works best. I’ve read the manual on import and indexing, and I understand the basic differences. I also realize how quickly things can get messy if DT and I start moving data around independently 😉

I’ve found some really good ideas in your responses and questions that help me find the right path. Thank you very much.

Kind regards, Michael

If I’ve imported a file into DT, there’s a good chance that the Finder copy no longer even exists, because I deleted it. Maintaining multiple copies of a document is an easy way to create confusion no matter what software you’re using.

Folks who do not fully comprehend the Finder are going to find themselves in difficulty regardless of what software they use or how they use it. It’s like saying that people who can’t read will find themselves in difficulty if they try to get information from a newspaper. That may be true, but it’s not the newspaper publisher’s problem to solve.

My index/import strategy is a bit different but aligns with some of the comments here. It could be nuanced, but I prefer simple choices. I index in two cases: shared directories used by others who will not use DEVONthink (even a server version), and very large datasets that will be used by other software. In both cases, the decision is based on external software and processes using the data. My use cases within DT are search, “connecting the dots” of related information, and some of the new AI analysis capabilities. Everything else I import.

Additionally, I too have recently combined a few of my databases to eliminate duplications between them and better enable replication. I’ve begun using a group to contain a single paper with any closely related files - annotations, AI chat summary, etc. Then I can replicate the group itself to any other needed areas. For instance, I may have multiple projects that will reference the same paper. I can replicate the paper’s group to any such project group and quickly access all particulars. Any new items specifically related to that paper that I want to keep, I can put in the paper’s group and find them again easily. I had separate databases that I wanted to replicate across in this manner, so I combined them. I don’t use tagging very much, and I don’t really want to think about how to use tagging for similar purposes while this seems so easy.

Imported data doesn’t exist on my file system otherwise. And yes, backups are your friend regardless.
