Sorry to bother you with this but it has me puzzled…
I’ve just added seven sample sections of an O’Reilly book to my DT Pro database. Each sample is a differently sized .pdf file consisting of the same first page (the cover image from the book in question) followed by a different number of pages and (obviously!) different text.
So why does DT think the files are duplicates when they are not? (The file names are all Egyptian blue, and getting info on one reveals that it has six duplicates.)
My prefs are set to ‘Copy files to database folder’ and ‘Use built-in pdftotext’.