No duplicate recognition

Jwbr · May 15, 2019, 1:39pm

I’m seeing PDF files that are definitely duplicates not being designated as such. They have the exact same Creation and Modification dates but are not seen as dupes: they are not flagged in the inspector, and the filenames do not change color.

cgrunenberg · May 15, 2019, 1:40pm

Is the word count of the files identical? See e.g. navigation bar above preview pane.

Jwbr · May 15, 2019, 2:08pm

OK, I’m confused here. In this case I downloaded a PDF file from online and did nothing to it. Yet the word count between one copy already in DT and one being imported is 6 words off. Confusing. Also, using the same 2 files in DT2 (one already imported, one to be) had them both show in blue as dupes in the Inbox.

Also, in DT3, if I try to import a file I know is a dupe, it appears in the Inbox and then “poof”, it disappears. Is this a new feature?

cgrunenberg · May 15, 2019, 2:17pm

Did you import the PDF documents on the same macOS version? Depending on the version, the results of macOS’ PDFKit might vary. Finally, is the smart rule “Filter Duplicates” performed on import?

Jwbr · May 15, 2019, 3:40pm

I did recently update the OS, but I believe this was happening before that. I’ll monitor my files and see if there’s really an issue here.

The smart rule was being performed on import. I removed it because in the past I’ve found that with DT2, PDFs saved from the web from different accounts on the same site were designated as dupes. Unless something has changed in DT3, I don’t want to have a file unnecessarily trashed.

Bozol · May 15, 2019, 4:01pm

Hi, the same is happening right now with me (macOS 10.14.5):

always the same scanner,
always the same document,
3 duplex scans with page 1 front,
3 duplex scans with page 2 front,
3 duplex scans with page 2 flipped 180º

results in 9 scans, 9 different file sizes and 9 different word counts.

How come?

cgrunenberg · May 15, 2019, 4:05pm

If you scanned each page multiple times, the quality might vary slightly, and this could affect the OCR engine. But if you scanned each page once and duplicated the scans in the Finder, then the results should be identical.
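For anyone who wants to check outside DEVONthink whether two files really are byte-for-byte copies (as Finder duplicates would be), here is a minimal Python sketch. It is not how DEVONthink detects duplicates, which this thread does not describe; it only compares raw file contents with SHA-256. The helper names `file_digest` and `group_identical` and the folder path are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's raw bytes."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large PDFs are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def group_identical(folder):
    """Group the PDFs under `folder` by digest; keep groups of two or more."""
    groups = {}
    for p in sorted(Path(folder).rglob("*.pdf")):
        groups.setdefault(file_digest(p), []).append(p)
    return {d: paths for d, paths in groups.items() if len(paths) > 1}


if __name__ == "__main__":
    # Hypothetical folder; replace with wherever the PDFs were exported.
    for digest, paths in group_identical("/path/to/exported/pdfs").items():
        print(digest[:12], [p.name for p in paths])
```

Repeated scans of the same sheet of paper will normally land in different groups, since each scan produces different bytes. Files that share a digest but still show different word counts would suggest that the difference comes from text extraction rather than from the files themselves.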

atdnorth · October 29, 2019, 10:31pm

I have a large number of journal articles stored in DT. They were originally imported in DT2 and I have since upgraded to DT3. In addition, I recently upgraded my OS to Catalina.

As part of the DT3 workflow, I wanted to apply smart rules to all imported journal articles. Thus, I exported all the journal articles to my hard drive and then reimported the articles into DT3 in order to apply the smart rule.

As expected, I now have two copies of each article (the exact same PDF, since I exported and then imported the same file). Under Preferences–>General, I have enabled “Stricter recognition of duplicates”.

The problem is that although each pair of PDFs is identical, DT3 does not identify them as duplicate documents. I have attached a screenshot showing examples; the newly imported documents have a red label. In some pairs the word count is the same for both documents. In others it differs.

What can I do to force duplicate recognition here?

Thanks.

cgrunenberg · October 30, 2019, 1:53pm

Could you please export such a pair of PDF documents and send them to cgrunenberg - at - devon-technologies.com? Thanks.

atdnorth · October 30, 2019, 8:02pm

I sent you copies today.