Indexed databases - weird and dangerous DT behavior when splitting one into several new databases

I have an indexed database on a local external drive that has grown quite large (over 700 GB), and today I decided to split it into several new databases, when a series of weird, and dangerous, behaviors happened.

I created a new database and dragged the selected group(s)/folders from the old database into the new one. A grayed-out group icon remains in the old database after the drag, so I put it in DT’s trash, closed the newly created database, and repeated the cycle until all the new databases were created and populated. At that point everything seemed OK.

Then a few minutes later, I noticed DT start indexing items for all the newly created databases, even though they were closed and all the folders/items on the external drive were untouched the whole time. So why index again?

And the DT trash suddenly appeared to hold more items (over 100k) than my just-operated databases contained. When I chose to empty the trash, and carefully selected “Only in DEVONthink… not folders and files on the local drive”, DT still deleted all the folders and files on the local drive.

I have little clue where this went wrong; maybe there is complicated logic behind the simple drag-and-drop in an indexed database?

How many items did you try to drag and drop between databases?

Just the drag and drop of first-level group(s), though each may contain several thousand items or more.

Also, is there a way to stop DEVONthink’s automatic indexing/checksumming operation?

Now I want to recover the files from the Finder trash, but their folder/subfolder/file structure has been lost, so there is no easy way to just “Put Back” the way you can with normally deleted files. I have to rely on the DEVONthink database to show me the folder/subfolder (or group/subgroup) structure, and the files within, for these deleted items… but while I re-create the folders/subfolders and put the items back, DEVONthink keeps up its automatic indexing/checksumming, which messes things up even more.
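
In case it helps anyone in the same spot: once the group paths are visible in the database, re-creating the empty folder skeleton can be scripted. Here is a rough sketch, assuming the paths have been exported as plain text, one per line; every name and path in it is hypothetical:

```python
from pathlib import Path

# Hypothetical input: group paths exported from the database,
# one per line, e.g. "Research/Papers/2023".
paths = Path("group-paths.txt").read_text().splitlines()

# Hypothetical restore target on the external drive.
root = Path("/Volumes/Data/Restored")

for p in paths:
    # Re-create each folder, including any missing parent levels.
    (root / p).mkdir(parents=True, exist_ok=True)
```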

It’s like a cleaning bot persistently following its routine, with no way to pause it short of unplugging it; but I need the file structure from it, so I have to keep it on… a total headache.

I have a few afterthoughts. I don’t know whether these are good or bad ideas, but they might make actions more consistent and perhaps simpler:

  • remove the option “Import files and folders” and keep only “Index files and folders”;
  • then, in addition to “Move to Trash” (a Finder function performed inside DEVONthink), add an “Un-index” option that just removes the selected items from the DEVONthink database records, leaving the original files untouched;
  • add another user option to pause the automatic indexing/checksumming, just in case.

This might help reduce unnecessary risk and keep user actions, DEVONthink, and the macOS Finder consistent with one another.

I also have a few questions:

  • when is a good time to use “Update Items” and “Update Indexed Items” from the File menu?
  • and when is a good time to “Empty Cache”?

No, you can’t stop the checksumming process.

Also, transferring several thousand items is not a simple, easily reversible operation.

Your requests are noted, but this one will surely not come to fruition…

  • remove the option of “Import files and folders”, only keep “Index files and folders”;

There are good reasons why indexing is not the default method for getting documents into DEVONthink.

  • when is a good time to use “Update Items” and “Update Indexed Items” from the File menu?

Usually used when indexing items in a cloud-synced folder, e.g., indexed Dropbox items. It can also be used after opening some document types externally and editing them, e.g., .xlsx files; the way Excel saves files can cause an erroneous checksum report in DEVONthink.
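
To illustrate why an external save can trigger this, here is a minimal sketch of checksum-based change detection; it is not DEVONthink’s actual implementation, and SHA-256 is only an assumption. Any rewrite of the file’s raw bytes, even one that leaves the visible content identical, produces a new digest:

```python
import hashlib
from pathlib import Path

def file_digest(path: Path) -> str:
    """Return the SHA-256 digest of a file's raw bytes."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical stored record for an indexed item.
record = {"path": Path("Report.xlsx"), "checksum": "ab12cd34..."}

# Excel may rewrite the file's internal packaging on save, so the
# digest can change even when the spreadsheet looks untouched.
if file_digest(record["path"]) != record["checksum"]:
    print("Checksum mismatch: file reported as modified")
```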

  • and when is a good time to “Empty Cache”?

This is generally used with web-related content, e.g., unexpected styling or behavior in HTML-based files.

Thank you for your reply and clarifications.

I just feel I stepped on two red lines yesterday: splitting a database and deleting files in DEVONthink :slight_smile:

You need to remember – or realize – that databases are not mere Finder folders. Moving masses of data is not a simple file move between two locations. Each database is an independent entity, with its own internal storage as well as metadata about its contents. So when you move a document from one database to another, data has to be purged from one database’s records and then added to the receiving database’s. Now multiply that by thousands in a single move, and I hope you’ll see the potential problems at scale.
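
As a purely conceptual sketch (not DEVONthink’s code): a cross-database move behaves like a purge from one store followed by an insert into another, two separate writes per item rather than one atomic rename. Interrupt a batch of thousands mid-way and the items end up split between the two:

```python
from typing import Dict, List

def move_items(source: Dict[str, dict], dest: Dict[str, dict],
               ids: List[str]) -> None:
    """Move records between two independent stores, one at a time."""
    for item_id in ids:
        record = source.pop(item_id)  # purge from the old database's records
        dest[item_id] = record        # add to the receiving database's records

# Two hypothetical databases, each with its own records and metadata.
old_db = {f"doc{i}": {"name": f"Document {i}"} for i in range(5000)}
new_db: Dict[str, dict] = {}

move_items(old_db, new_db, list(old_db))  # thousands of paired writes
```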

So it’s not that it can’t be done. It’s a matter of “once the train is rolling, there’s no easy way to stop it”. :slight_smile:

Well put.

I have observed that the ease with which new data can be added to DT3 hides the complexity of the database backend and can lead to surprises when trying to do the same thing at scale - and even worse when trying to interrupt such an at-scale process.

Ultimately, what I have realized is that if there is some major change I want to make that involves moving items to a different database, it is best not to do it on a production database. I make a copy of the relevant databases and do the reorganization on a second Mac, so it all happens in the background.


Sorry to hear about the OP’s problem. For what it’s worth, I have encountered problems with massive changes to indexed items as well. It’s uncommon, but possible, so it’s good to take precautions.

Before I do any major moving or deleting of files, I make sure to (1) select the groups in DT and use “Move to External Folder” to catch any stragglers that are in the database but not in the indexed folder (which occurs when creating something new in DTTG), (2) update indexed items, (3) sync, and (4) copy the relevant folders to an external drive (see the sketch below). Of course, I also have regular backups.
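
For step (4), here is a minimal snapshot sketch; all paths are hypothetical. A timestamped copy means no earlier snapshot is ever overwritten:

```python
import shutil
from datetime import datetime
from pathlib import Path

# Hypothetical locations: the indexed folder and a backup drive.
indexed = Path("/Volumes/Data/IndexedFiles")
backups = Path("/Volumes/Backup/DT-snapshots")

# Timestamped destination keeps each pre-reorganization state.
stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
shutil.copytree(indexed, backups / f"IndexedFiles-{stamp}")
```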

If you do all of this, then even if things go wrong, you won’t be stressed over it. Obviously, perfectly smooth operation would be ideal, but I assume there is a lot of complex stuff going on in the background, and I consider this limitation a relatively innocuous issue that can easily be avoided by just making incremental changes.