I use DTPO to store all my email by archiving it in a database. Last year, for a short period, I used Email Archiver, which converts all your emails into PDFs. I’ve since decided I want all my email in one place, so I imported the PDF emails into DTPO as well.
My problem is that some of the PDFs are duplicates of emails already in the database, and the duplicate finder doesn’t pick them up. I have some 15,000 PDFs and have no desire to go through them manually. As not all of the PDFs are duplicates, I cannot just delete them all. In the listing, the eml file in DTPO and the PDF file carry the same date and timestamp. Email Archiver prefixes the email name with the date, the time and a letter, as below:
Original imported email name:
Google Apps: sign-up confirmation and next steps
Duplicate PDF name:
2011-12-02 09.40.12Z Google Apps sign-up confirmation and next steps.pdf
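To make the name mapping concrete, here is a rough Python sketch of the matching I have in mind. It is only an illustration: it assumes the email names and PDF names have already been exported from DTPO into plain lists, that the prefix always looks like the one example above, and that Email Archiver simply drops characters such as ":" from the name. The function names are my own and not part of DTPO or Email Archiver.

```python
import re

# Assumed prefix format, taken from the single example: "YYYY-MM-DD HH.MM.SSZ "
PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}\.\d{2}\.\d{2}Z ")


def normalize_email_name(name):
    """Reduce an email name to a comparable form."""
    # Assumption: characters that are awkward in file names are just removed.
    cleaned = re.sub(r'[:/\\?*"<>|]', "", name)
    # Collapse runs of whitespace and ignore case.
    return re.sub(r"\s+", " ", cleaned).strip().lower()


def normalize_pdf_name(filename):
    """Strip the .pdf extension and the date/time prefix, then normalize."""
    stem = filename[:-4] if filename.lower().endswith(".pdf") else filename
    stem = PREFIX.sub("", stem)
    return normalize_email_name(stem)


def find_duplicate_pdfs(email_names, pdf_names):
    """Return the PDF names whose normalized name matches a known email."""
    known = {normalize_email_name(n) for n in email_names}
    return [p for p in pdf_names if normalize_pdf_name(p) in known]
```

Matching on the name alone could flag different emails that share a subject, so in practice the date and timestamp would need to be compared as well before deleting anything.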
Is there any way to find all the PDFs that are duplicates so I can remove them?
Any help would be much appreciated!