Only import files which don't already exist in a database?

At one point, some time ago, I did a full export of my large (100GB+) DT3 Pro database. Now, I want to delete this export, but due to a few happenings, I am nervous there may be some files in the export that do not exist in DT, and I don’t want to lose them.

What is the most efficient way to import only files which don’t already exist? Should I index the external folder and then set up a smart group to show me those files which aren’t duplicated? Or should I import all these files and then use a smart group?

Or…something else? I’m currently running an index job on the folder to test, but it’s taking a very…long…time.

Thanks in advance!
d1rewolf

I’m currently running an index job on the folder to test, but it’s taking a very…long…time.

Indeed, it would - indexing 100GB+ of data. And no, I wouldn’t interrupt it now.

Did you do a full export at some point then continue using the exported folder outside DEVONthink?

Correct, but if I did use it, it was an accident. I just want to gut check to make sure all those files exist in my DT database if that makes sense.

The interesting thing also is that indexing seems to have locked up DT. I left it with the spinning rainbow pinwheel overnight, but just had to kill it as it continued to spin this morning and was unresponsive :frowning:

If this should happen again, then a sample of the frozen app would be useful.

Sure thing…are there instructions for gathering such a sample?

Just launch Apple’s Activity Monitor application (see Applications > Utilities), select DEVONthink 3 in the list of processes while (!) it’s frozen, choose the menu item View > Sample Process and send the result to cgrunenberg - at - devon-technologies.com. Thanks in advance!

Ok…I’ll do that next time it happens @cgrunenberg. Shall I assume that indexing and then using a smart group is the right approach for this sort of thing?

There’s no smart group condition to check whether files already exist in the database, therefore the only option is probably a script.

@d1rewolf is considering indexing everything and using a Smart Group that checks for duplicates. Not sure whether it’s a good idea to index 100GB+ into a database that already has 100GB+.

@pete31 That’s right…that’s what I was thinking of doing. @cgrunenberg, specifically following your advice here: Compare two groups and look for files which *aren't* duplicated between them? - #2 by cgrunenberg

That’s a suggestion related to two groups, not a 100GB+ database! :flushed: :slight_smile:

Good point @BLUEFROG. I’ll do it on the command line instead. Is there a way to safely kill an index process?

Force quit, while not ideal, is the only option.

What are you planning on doing on the command line?

@BLUEFROG my thought was to roll through the DT3 database Files.noindex directory, and generate checksums to a text file. Then, do the same for the exported directory. Finally, compare the two to find checksums which exist in the exported directory checksums but not in the DT3 one.
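
For anyone wanting to try the checksum comparison described above, here is a minimal sketch in Python. It is not an official DEVONthink tool; the function names (`sha256_of`, `checksums`, `missing_from_database`) are made up for illustration, and it assumes that a file counts as "already in the database" only when a byte-identical copy exists somewhere under the database's Files.noindex directory. Files that were changed after the export would therefore show up as missing.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums(root):
    """Walk root recursively and map each checksum to the list of files that have it."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            result.setdefault(sha256_of(path), []).append(path)
    return result


def missing_from_database(export_root, database_root):
    """Return sorted paths under export_root whose content is absent from database_root."""
    in_database = checksums(database_root)
    exported = checksums(export_root)
    return sorted(
        path
        for digest, paths in exported.items()
        if digest not in in_database
        for path in paths
    )
```

Calling `missing_from_database("/path/to/export", "/path/to/database/Files.noindex")` (both paths are placeholders) returns the exported files that have no byte-identical counterpart in the database. It only reads files and never modifies either folder, but with 100GB+ on each side, hashing everything will still take a while.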

I’m very happy to take suggestions from the expert though :wink:

@BLUEFROG any thoughts on this approach? Thanks in advance!

Oh, and just a side note…I did finally force quit because indexing was still running, and yet, when I launched DT again, it resumed indexing… :flushed:

And now, sadly, it hangs on Inbox verification and just spins…

@cgrunenberg since it’s freezing, albeit under slightly different circumstances, I sent you the sample. Please let me know if you have any ideas what to do, or if restoring from time machine is my only path forward.

Thanks,
d1rewolf

Actually, just as I was about to press send, DT came back reporting inconsistencies in the database. So, no sample sent, but now it continues to index. Indexing makes the app very slow to respond :frowning: Any suggestions? If I close the database and restore from backup, will it still try to index?