Disappearing Files

Here is how I made DT Office eat files:

  1. Created a file containing random data, with a .pdf extension:

$ dd if=/dev/urandom of=Desktop/file.pdf bs=1024 count=10

  2. Dragged this file, together with some others, into the Sorter with the Command key held down.
  3. The other files are in the inbox, but file.pdf is gone. No warning is given, although there is an entry in the log.

It doesn’t have to be random data: just giving a plain text file the .pdf extension is enough to trigger this behaviour.

This worries me, because I plan to import very large numbers of files into DEVONthink. Imagine losing 1% of my import. Even if a file is not a real PDF, I would like to preserve it, and it is quite likely that some files in my collection have the wrong extension. Silently deleting them is not what I expected.

The files have not been deleted. They were moved (because you held down the Command key while dragging them to the Sorter) but not imported (because their content does not match their file type). The Log window lists the files that failed to import, and you can locate them either by a) right-clicking them in the Log window and selecting ‘Show In Finder’, or b) going in the Finder to ‘~/Library/Application Support/DEVONthink Sorter/Global Inbox/’ (assuming the documents were dragged to the Global Inbox).

I see. Does DEVONthink generally refuse to import files that have a known extension but content it doesn’t understand, or is this limited to PDF files?

And is there a way to change this, i.e. to tell DEVONthink “Please import every PDF file, even if you don’t understand it”?

It is not a question of DEVONthink not ‘understanding’ the file. DEVONthink recognizes that the file’s content doesn’t match its extension, and it skips the import because the document may be corrupt. For the same reason, Preview will not open one of these (non-)PDF documents either.

Even if DEVONthink could be forced to import these documents, I would personally rather find out at import time that they may be corrupt, so that I can reconstruct them or recover an intact copy from a backup. If I import them and discover days, weeks, or months later that they are corrupt, it may be too late to do anything about it.
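Before a large import, files whose content doesn't match a .pdf extension can be listed in advance, so they can be inspected rather than discovered later in the log. A minimal sketch, assuming a POSIX shell with `find`, `head` and `tr` (all present on macOS). The function name `check_pdfs` is hypothetical and not part of DEVONthink, and DEVONthink's own detection may work differently: this only tests whether a file begins with the `%PDF-` header that well-formed PDF files start with.

```shell
# Hypothetical helper (not a DEVONthink feature): print every *.pdf file
# under a directory whose first five bytes are not the PDF header "%PDF-".
check_pdfs() {
  find "${1:-.}" -type f -iname '*.pdf' | while IFS= read -r f; do
    # Read the first five bytes; tr drops NUL bytes so the shell does not
    # complain about them inside $(...).
    header=$(head -c 5 "$f" | tr -d '\000')
    if [ "$header" != "%PDF-" ]; then
      printf 'suspect: %s\n' "$f"
    fi
  done
}
```

Running `check_pdfs ~/Desktop` should then report the file.pdf created with `dd` above. Some PDF readers tolerate a few stray bytes before the header, so a hit is best treated as "worth inspecting" rather than "certainly broken".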