Perfect! Thank you very much. I have now managed to replicate the database, and the file integrity errors have gone.
I wonder what caused DT to mess up the DB? I suppose we will never know, but I am going to make sure I export all the documents, and work with indexed files in the future.
Just my opinion, but I don’t think this is a good plan. I know you think imported files were the root cause of your file corruption, but without more evidence I’d be reluctant to conclude that. I believe (?) importing is simpler and more robust than indexing. Just what I think. Your mileage may differ.
If you do index everything, then please make sure you read and understand the advice about indexing that is documented in the DEVONthink Handbook.
If I were you, I’d look into the robustness of the sync service you use. I scanned above and didn’t see at first glance what you are using. Syncing is very complex and frankly once a file gets corrupted (perhaps in file transfer) and not noticed, the corrupt file could just continue to sync itself in the corrupt state forever as if nothing happened. DEVONthink is sometimes just the “messenger” about file corruption, and not the “cause”. Just keep this in mind. Interruptions in syncing can happen. I know for sure sync services have a lot of technology in place to prevent or detect that, but computers are what they are and not infallible.
Syncing is very complex and frankly once a file gets corrupted (perhaps in file transfer) and not noticed, the corrupt file could just continue to sync itself in the corrupt state forever as if nothing happened.
And we continue to monitor and put measures in place to mitigate such transfers when possible.
I am using the default CloudKit sync. This seems to work well. However, I am equally happy to use Dropbox or any alternative if it is seen as more reliable.
You are quite right that the problem may have been introduced anywhere in the process.
I will definitely look into the pros and cons of indexing in more detail. My concern is simply that if things go wrong, the restore process is very fiddly, because of the need to cherry pick the files.
I have just had 3 scares where Dropbox, in front of my eyes, started to delete ALL 4000 files from a Dropbox folder.
Just one Dropbox folder, which happened to be externally indexed by DEVONthink.
With any sync service, you have to be very careful about inadvertent propagation of user-generated errors.
If someone with access to one of your devices accidentally (or maliciously) deletes a large number of files, most sync services will happily propagate the damage to all connected devices, typically without asking for confirmation. Ditto for the contents of shared folders. This is not “the fault” of DEVONthink or any other application, it’s the sync service behaving exactly as designed.
For this reason, trusting a sync service as your only backup for critical data is a terrible idea.
It would be extremely helpful if DEVONthink computed a cryptographic “checksum” and kept it with the metadata.
That way we can verify each file’s integrity whenever we want. Or is that already done?
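For anyone who wants this kind of check today, independent of whatever DEVONthink does internally, here is a minimal Python sketch of the idea. It assumes your documents live in an ordinary folder (for example an export, or an indexed folder); the function names `sha256_of`, `build_manifest` and `verify` are made up for illustration and are not part of DEVONthink.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose content no longer matches."""
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(target) != expected:
            problems.append(f"changed: {rel_path}")
    return problems
```

You would run `build_manifest` once while you know the files are good, keep the result somewhere safe (e.g. dumped to JSON), and run `verify` against it later. Note that a file you edited on purpose will also show up as "changed", so this only tells you that something differs, not why.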
I have had quite a few files disappear over the years and still feel I can’t really trust DT/DTTG because of it. It’s much better than it used to be now you have the new iCloud sync.
I don’t like to store critical data like tax returns in it. I lost a bunch of files by accident when I thought I was deleting a tag (favourites), but it actually deleted the files. Time Machine came to the rescue but it was quite tough to get them back.
Is there some way to compare two databases, a live one and a restored backup? I would like a list of files that differ or are missing.
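I don’t know of a built-in way, but outside DEVONthink a rough comparison can be sketched in Python. This assumes both copies are available as plain folder trees with the same relative layout (e.g. two exports, or an indexed folder and its restored backup); whether that holds for the inside of a database package is something I have not checked. The names `file_digests` and `compare_trees` are hypothetical helpers, not a DEVONthink API.

```python
import hashlib
from pathlib import Path


def file_digests(root: Path) -> dict[str, str]:
    """Map each file's relative path to the SHA-256 of its content."""
    digests = {}
    for path in root.rglob("*"):
        if path.is_file():
            # read_bytes() loads the whole file into memory; fine for
            # documents, but very large files would need chunked reads.
            rel_path = path.relative_to(root).as_posix()
            digests[rel_path] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_trees(live: Path, backup: Path) -> dict[str, list[str]]:
    """Report files missing on either side and files whose content differs."""
    live_files = file_digests(live)
    backup_files = file_digests(backup)
    shared = live_files.keys() & backup_files.keys()
    return {
        "missing_from_live": sorted(backup_files.keys() - live_files.keys()),
        "missing_from_backup": sorted(live_files.keys() - backup_files.keys()),
        "differ": sorted(p for p in shared if live_files[p] != backup_files[p]),
    }
```

Calling `compare_trees(Path("/path/to/live"), Path("/path/to/restored"))` (both paths are placeholders) gives three sorted lists. Files that were renamed or moved will appear as missing on one side and extra on the other, since the match is by relative path.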
It’s worth building this into your weekly review/digital housekeeping if you do such a thing (and if you don’t, it’s well worth doing generally). Our databases are very important, so the least we can do is check that they’re ok and that CloudKit/Dropbox hasn’t done something daft. I also do my weekly back-up then (yes, I know I should back up more regularly) as I know that the back-up is accurate at that point in time. If I know I’ve made some big changes to my databases (e.g. moved a lot of files around), I will sync to my iPad and iPhone while I’m at my desk so I can check that the sync has carried over correctly (thus preventing any surprises a day later when I’m away from my Mac and need a file).
(Worth noting that I don’t believe this is an issue unique to DT, it’s just more “visible” because we’re in control of the sync. One of my note apps this week had duplicate files due to a corrupted sync, and asked me which file I wanted to save - and of course they were different.)