Can DTPO show which documents are uniquely held by DTPO?

EarthDemon · June 12, 2018, 11:15pm

I am avoiding the use of the terms ‘duplicate’ and ‘replicant’ as I am not sure I can wield them accurately.

I have imported a lot of stuff into DTPO (including 7000+ Evernote notes), but I may want to start again and ideally would just like to completely reset DTPO. Hence I would want to export those documents back to Finder first.

Many thanks

E.D.

BLUEFROG · June 13, 2018, 12:05am

I’m not sure what the subject line means but you can make a selection in DEVONthink and choose File > Export > Files and Folders to a newly created folder in the Finder. (This just helps contain any ungrouped files.) You could then do as you wish: delete the database (or not), start a new database, etc.

I wouldn’t advise any deeper “reset” of DEVONthink than that.

EarthDemon · June 13, 2018, 12:17am

For clarity, suppose I imported files into DT in various ways, in particular using the import command within DT and via the Sorter. Presumably at this stage the files have been copied and the originals remain in Finder?

But then say some of the originals got deleted via Finder… if DT were completely wiped then data loss would occur.

And exporting all DT data into Finder would cause many duplicates would it not?

So what is the best way of ensuring no duplicates and no data loss?

E.D.

BLUEFROG · June 13, 2018, 1:46am

Correct.

Logically, yes, but you should be diligent in your local backups.

Not according to what you just said. If the originals were deleted in the Finder, why would there be duplicates when you exported?

It is still unclear as to any issue regarding duplicates.
Regarding safeguards against data loss, again, Time Machine.

EarthDemon · June 13, 2018, 10:15am

What I am asking boils down to this - can DT go beyond its own boundaries and provide info on files which have not been imported to within DT?

Conversely, can Finder reach into DT and see which files it contains?

Many Thanks

BLUEFROG · June 13, 2018, 12:46pm

If they are not indexed either, no.
Potentially, but only as a list of files (i.e. not content), and it’s not something we suggest people do. So technically, yes. Practically, no.

Greg_Jones · June 13, 2018, 2:07pm

Assuming that you do not by design already have a large number of duplicates in the database, and also assuming the bulk of your documents in the Finder are in a central location (e.g. the ~/Documents folder), then there are ways one could hack what you are looking to do. Here is one example:

1. Back up your database.
2. I’d use a Duplicates Smart Group to make sure there are no, or at least an insignificant number of, duplicates in the database now*.
3. I’d Index (NOT IMPORT) the Finder document folders into the root level of the database. This will only be practical if your documents are for the most part centrally located (the ~/Documents folder plus perhaps 2-3 other Finder folder locations). It’s not important that there may be documents in sub-folders that you don’t need in the database; you will reverse this later. For now you are looking to identify the duplicates of the documents that you do want in the database.
4. Create a new Smart Group, Duplicates Not Indexed, that only identifies the duplicates that are imported (NOT INDEXED) into the database.
5. Select all the documents in the Duplicates Not Indexed Smart Group and assign them a unique tag, e.g. ‘Duplicates’.

At this point you have identified all the documents imported into the database that DEVONthink considers duplicates of documents in the Finder*. You tagged only the imported copies because you do not want the indexed documents located in the Finder to carry this temporary ‘Duplicates’ tag. It’s harmless if they do, but it clutters the Finder tags when you don’t need them tagged there.

Now you are going to Phase 2 of the process, where ultimately you will have a database that does not have any documents that are duplicated in the Finder*.

1. Make sure the trash in the database and the trash in the Finder are empty.
2. In your database, select the group(s) that you indexed in Step 3 above and move them to the database trash.
3. Empty the trash in the database, making sure that you select the option ‘Only in database’. This removes the Indexed documents from the database, and leaves them in their original location in the Finder.
4. Check the Trash in the Finder to ensure that no indexed documents were inadvertently moved to the Finder Trash.
5. Now you should no longer have any duplicates in the database, but you have identified all the previous duplicates with a ‘Duplicates’ tag. Create a Smart Group that identifies all the documents in the database with the tag ‘Duplicates’.
6. Select all the documents in the Duplicate Tag smart group and delete them, and empty the trash. Now you no longer have any documents in the database that have a corresponding document in the Finder*.
7. Now you can begin the process of importing the Finder documents into the current database, or export all the documents out of the current database and begin from scratch with a new database.

It is important to note that the way DEVONthink identifies duplicates is NOT an exact science! DEVONthink can, and does, identify documents that are very similar, yet not 1:1 identical, as duplicates. As this lengthy and convoluted process is destructive to some of your data, it is possible that you may delete documents in your database that DEVONthink identified as duplicates but that are in fact NOT identical to the documents in the Finder.

As such, I would evaluate carefully if this level of work, and risk, is justified in restructuring your database. You have been warned! Good luck with it if you want to try it.
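
Since DEVONthink's duplicate detection is fuzzy, one way to cross-check before deleting anything is to compare files by exact content outside DEVONthink. The following is only a minimal sketch in Python and not a DEVONthink feature: it assumes you have already exported the database with File > Export > Files and Folders into a folder, and it lists the exported files whose bytes do not appear anywhere in the Finder folders you name. The function names (file_digest, digests_under, unique_to_export) are made up for this illustration. It only catches byte-for-byte identical files, so anything that was converted or altered on import or export will show up as unique.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def digests_under(root: Path) -> dict[str, list[Path]]:
    """Map each content digest to the regular files under root that have it."""
    found: dict[str, list[Path]] = {}
    for p in sorted(root.rglob("*")):
        # Skip directories and symlinks; only real files are hashed.
        if p.is_file() and not p.is_symlink():
            found.setdefault(file_digest(p), []).append(p)
    return found


def unique_to_export(export_dir: Path, finder_dirs: list[Path]) -> list[Path]:
    """List exported files whose exact bytes appear in none of finder_dirs."""
    finder_hashes: set[str] = set()
    for d in finder_dirs:
        finder_hashes.update(digests_under(d))
    unique: list[Path] = []
    for digest, paths in digests_under(export_dir).items():
        if digest not in finder_hashes:
            unique.extend(paths)
    return sorted(unique)
```

For example, with a hypothetical export folder on the Desktop, calling unique_to_export(Path.home() / "Desktop" / "DT Export", [Path.home() / "Documents"]) returns the exported files that have no identical copy under ~/Documents. File names are ignored, so a renamed copy still counts as a match. The script only reads files and deletes nothing.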

EarthDemon · June 13, 2018, 11:30pm

Thank you Greg, that is extremely helpful. I realise I need to get my data more organised before bringing it into DT, and this I will now do.