Thousands of .html documents with only a hyperlink in them - what to do?

All of a sudden I noticed several thousand (!) .html documents in my DT4 database. Each .html document contains nothing but one hyperlink, pointing to a PDF file. What are these .html documents? They are cluttering my database, but I’m not sure whether it’s safe to delete them. FYI, a lot of my files were imported from Evernote (over a year ago); no idea if that’s got anything to do with it. TIA!

Importing from Evernote is indeed the most likely explanation (assuming that you don’t use any scripts or other automation tools on your own). Is the addition/creation date of these files a recent one?

I thought they were added recently, but they show much older dates. Are they safe to delete?

There is nothing wrong there. It is part of how Evernote created notes. Your PDFs were not embedded into the notes in Evernote. They only appeared that way for your convenience. The PDFs were stored separately and linked into an HTML document (the .enex file).

When importing these notes, DEVONthink imported them as-is. So unless you added other text to the note(s) in Evernote, yes you can delete them.
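Before deleting, it can help to confirm that a note really contains nothing besides the link. The following is a minimal sketch in Python, assuming the .html documents have been exported to disk or their source is available as text. The helper name `is_link_only_html` and the rule "exactly one link to a .pdf and no visible text outside it" are assumptions made for illustration; this is not how DEVONthink or the Clean Evernote Imports rule decides.

```python
from html.parser import HTMLParser


class _LinkOnlyProbe(HTMLParser):
    """Collects link targets and any visible text that sits outside links."""

    # Content of these tags is not visible body text, so it is ignored.
    IGNORED = {"title", "script", "style"}

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.outside_text = []
        self._in_link = 0
        self._in_ignored = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link += 1
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)
        elif tag in self.IGNORED:
            self._in_ignored += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self._in_link -= 1
        elif tag in self.IGNORED and self._in_ignored:
            self._in_ignored -= 1

    def handle_data(self, data):
        # Record non-blank text that is neither link text nor ignored content.
        if not self._in_link and not self._in_ignored and data.strip():
            self.outside_text.append(data.strip())


def is_link_only_html(html_text):
    """True if the document holds exactly one link to a .pdf and no other text."""
    probe = _LinkOnlyProbe()
    probe.feed(html_text)
    probe.close()
    return (
        len(probe.hrefs) == 1
        and probe.hrefs[0].lower().endswith(".pdf")
        and not probe.outside_text
    )
```

A note for which this returns False has extra text or more than one link and deserves a look before it is removed.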

You could also choose Script menu > More Scripts, switch to Smart Rules, and install the Clean Evernote Imports rule. Double-click the newly installed rule and choose the Evernote import group in your database in the Search in dropdown and close the predicate editor. Then Control-click the rule and choose Apply Rule. This should remove the files with only the links and the now extraneous groups created for them.

Thank you! It looks like that worked. I think I know what triggered this whole thing for me. I am running DT4 and upon restarting my Mac, DT3 accidentally started automatically. In Apple Mail, I had a rule running to automatically save mail attachments (pdf’s) to DT, and that got stuck in a loop when DT3 opened. That added a lot of pdf’s to DT (copies upon copies). When looking at my files, I saw all these .html’s, that must have been there for a while, but I think I just never noticed them. I cleaned those out. Now to remove all those duplicate pdf’s that came from mail, without removing the originals. I have those smart rules to find duplicates, but I’m not sure if they delete all of the copies of a file. They need to leave one copy/original behind of course!
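The "leave one copy behind" requirement can be sketched generically. The Python below is an illustration only, assuming the PDFs sit in an ordinary folder on disk (for example an export) and that "duplicate" means byte-identical content. It is independent of DEVONthink's own duplicate detection and smart rules; the function names and the choice to keep the file with the oldest modification time are assumptions. It only builds a plan and deletes nothing.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def plan_duplicate_removal(folder, pattern="*.pdf"):
    """Group files by content and return (keep, remove) lists of paths.

    Exactly one file per group of identical files ends up in `keep`.
    """
    groups = defaultdict(list)
    for path in sorted(Path(folder).rglob(pattern)):
        if path.is_file():
            groups[file_digest(path)].append(path)

    keep, remove = [], []
    for paths in groups.values():
        # Assumption: treat the oldest file as the original; ties go by name.
        paths.sort(key=lambda p: (p.stat().st_mtime, str(p)))
        keep.append(paths[0])
        remove.extend(paths[1:])
    return keep, remove
```

Reviewing the `remove` list before acting on it is the point of returning a plan: every group of identical files contributes exactly one entry to `keep`, so an original always survives.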

You should not have DT3 and DT4 installed on the same machine, as is clearly stated in the document introducing the DT4 beta.


You’re welcome 🙂 And as noted, you should not have both installed on the same account, on the same Mac.