I am new to DTP. Can I “safely” delete files that are tagged as duplicates in the smart folder “duplicates,” or does that folder contain both the original and the duplicate?
A DEVONthink smart group doesn’t really “contain” any documents; it can be considered a virtual list of documents that are actually located in other non-smart groups.
Documents deleted from DT smart groups (using Data > Move All Instances to Trash, Option-Command-Delete) will be deleted from their non-smart group locations.
If that explanation doesn’t make sense or is incorrect I’m sure someone else can do better.
The smart folder lists both files. If you see two identical files, those are all that exist; there is no third identical copy.
Don’t use Data > Move All Instances to Trash: the original is an instance, the duplicate is an instance, and the replicant is an instance, so all of those files would be trashed.
The right way to delete duplicates while keeping the original is via the script menu: Data > move duplicates to the trash.
Be careful!
You have to check what DT lists as duplicates. Some files with identical names but different content can appear in the list. I also saw single files listed for which no obvious duplicate was shown; that may be caused by tags, but I didn’t check the tags of those files.
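If you want to double-check a suspicious pair outside DEVONthink, one option is to compare the exported files byte for byte. The following is only a minimal sketch in Python: the helper names `content_digest` and `same_content` are made up for illustration, and DEVONthink may use different criteria to decide that two documents are duplicates, so a mismatch here does not prove that DT is wrong.

```python
import hashlib
from pathlib import Path


def content_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes (the name is ignored)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(first: Path, second: Path) -> bool:
    """True only if both files hold identical bytes, whatever they are called."""
    return content_digest(first) == content_digest(second)
```

Two files with the same name in different folders give `False` if their contents differ, and two differently named files give `True` if their bytes match.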
Edit, 23-11-2010: I have to correct the following passage in italics. I cannot reproduce all of the tests I made, because not all of the files are available anymore, but I could reproduce one and noticed that I had made an error: I had not marked exact duplicates. The fault was mine, and I now think that the script in DT works correctly. I apologize for this.
I tested extensively whether the script works correctly, and to my astonishment I must say that it does not always. I don’t trust it yet, and I am still trying to find a pattern so that I can submit a report to the support team.
E.g. I have 100 pictures and 100 duplicates of them. I selected all of these files and let the script move the duplicates to the trash. If the action works as it should, you will find 100 files in the trash.
I tested that three times in the beginning and it worked perfectly. I continued, and at some point I noticed that the original was sometimes deleted too. I did restarts (system and app), emptied the cache etc., and created new databases to run new tests. Sometimes 150 of those 200 pictures were moved to the trash. My current impression is that errors come up when I select only part of what is listed.
Whatever is happening here on my otherwise perfectly working system, you should in any case check what is listed and what is moved to the trash. When you select 100 files that really match the criterion ‘duplicate’, 50 of them should be trashed. When you select those 100, the preview pane tells you how many you have selected, so checking is easy.
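The arithmetic behind that check can be written down as a small standalone model. This is only a sketch in Python, not anything DEVONthink provides: `expected_trash_count` and `content_keys` are hypothetical names, and a "content key" stands for any value that is equal exactly when two files are duplicates of each other.

```python
from collections import Counter


def expected_trash_count(content_keys):
    """How many files should be trashed if one file per duplicate group is kept."""
    groups = Counter(content_keys)
    # Each group of n identical files keeps one and trashes the other n - 1.
    return sum(count - 1 for count in groups.values())


# 100 selected files that form 50 exact pairs: 50 should end up in the trash.
pairs_100 = [i // 2 for i in range(100)]
print(expected_trash_count(pairs_100))  # 50

# 100 pictures plus 100 duplicates, all selected: 100 should end up in the trash.
pairs_200 = [i // 2 for i in range(200)]
print(expected_trash_count(pairs_200))  # 100
```

If the number of files that actually land in the trash differs from this count, either the selection contained files that were not exact duplicates or something else went wrong.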
Regards,
Bernd
The script below takes a different approach than the inbuilt “Duplicates to trash” script.
It is important to understand the introductory comments in the script. DT has no concept of an “original” file. Duplicates are files that DT considers equal. (See sjk’s comment and the Help file for why.) The first file selected in a group of duplicates is the “original” at that time. If the group is sorted and another file is the first in the selection, then that file is the “original”. It is not really the “original” anything, it is just primus inter pares. Thus this script and the inbuilt “Duplicates to trash” script will keep the first file in a selection and move the others to the trash. Sort the selection, and you’ll get a different result.
Use this at your own risk. I have not tested it in your database.
-- Locate and Delete Duplicates
-- Uses a different approach than the inbuilt Duplicates to Trash script.
-- The script will delete all duplicates of the CURRENTLY SELECTED record.
-- It is important to know that once a duplicate is made in DT that there
-- is no concept of ORIGINAL -- duplicates are equal to one another.
-- Thus, the record that is selected is ARBITRARILY considered the "ORIGINAL"
-- and any other record is a DUPLICATE.
-- So, in a pair of records that are duplicates, the first one that the
-- script encounters (in theSelection) is the original and the second one
-- is the duplicate. If the order of records in theSelection is reversed, then
-- the identification of original and duplicate is reversed.
-- Use this script at your own risk.
-- Losing all of your data is your own risk.
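tell application id "com.devon-technologies.thinkpro2"
    try
        set theSelection to the selection
        if theSelection is {} then error "Please select some records"
        -- UUIDs of records this run has already trashed. A selected record that
        -- was trashed as a duplicate must not be treated as an "original" later.
        set trashedIDs to {}
        repeat with thisItem in theSelection
            if trashedIDs does not contain (uuid of thisItem) then
                set theseDuplicates to duplicates of thisItem
                repeat with thisDuplicate in theseDuplicates
                    set thisDatabase to database of thisDuplicate
                    set end of trashedIDs to uuid of thisDuplicate
                    move record thisDuplicate to trash group of thisDatabase
                end repeat
            end if
        end repeat
    on error errorMessage number errorNumber
        -- Report problems (such as an empty selection) instead of failing silently.
        if errorNumber is not -128 then display alert "DEVONthink Pro" message errorMessage as warning
    end try
end tell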
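To see why the order of the selection matters, here is a small standalone model of the keep-the-first rule described above. It is only a sketch in Python and does not talk to DEVONthink: the function name `keep_first`, the file names, and the content keys are all made up for illustration.

```python
def keep_first(selection):
    """Split a selection into kept and trashed names.

    selection is a list of (name, content_key) pairs in selection order;
    records with the same content_key count as duplicates of each other.
    """
    seen_keys = set()
    kept, trashed = [], []
    for name, content_key in selection:
        if content_key in seen_keys:
            # A record with this content was already kept, so this one goes.
            trashed.append(name)
        else:
            seen_keys.add(content_key)
            kept.append(name)
    return kept, trashed


selection = [("scan.pdf", "k1"), ("scan copy.pdf", "k1"), ("notes.txt", "k2")]
print(keep_first(selection))
# (['scan.pdf', 'notes.txt'], ['scan copy.pdf'])

print(keep_first(list(reversed(selection))))
# (['notes.txt', 'scan copy.pdf'], ['scan.pdf'])
```

The same three records produce a different survivor once the order is reversed, which is the "primus inter pares" behaviour in practice. Note also that each record ends up in exactly one of the two lists, so a record that has been trashed is never used as the "original" for a later pass.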