DTTG 3 is corrupting files

So it could be that the problem happened as part of the sync? That would render tonight’s fix at least halfway useless. Puzzled. We’ll investigate deeper tomorrow.

Maybe I can send you a console.log from DTTG 3, because I haven’t uninstalled it yet. And I also still have these corrupted files. I haven’t opened them all, and some of them were generated by a ScanSnap iX1500 in 2018.

→ I’ve made a ticket via DTTG 3 and also put in some additional screenshots I’ve made after discovering the problem with corrupted files.

But no negative feedback at all and no own encounter of a problem does indicate that there is no problem. Not even “there” could be specified more precisely.

This is where the beta testers come in. And I assume that most of them are a bit more attentive to and communicative about even small misbehaviours of a software and not just eager for the newest features.

I was and still am one of these beta testers of both DTTG and DT and I can assure you that my focus of testing lay on synchronising. Because my experience with iCloud (legacy) syncing was really, really bad. Synchronisation sometimes took days even for a small number of small files. So I created a database for the sole purpose of getting stuff into DT and gave it the telling name Charon. And for some content I even bypassed DTTG completely by using an iCloud Drive folder I indexed with DT.

This changed completely with DTTG 3’s CloudKit sync, which I have experienced as significantly faster and more reliable.

Just another white swan, of course, but from someone who volunteered as a black swan spotter.

I don’t do Twitter as they have overstepped and become political. My concern is that an app has been released with a major bug.

I wouldn’t have noticed one thing were it not for the post by @Blanc. Many of my files are sitting there until I need them, and I’ve never had the intention to start looking for 0-byte records until today. And yet there they are. And they might have been there much longer, who knows?

That’s one of the reasons why the absence of feedback isn’t necessarily a good basis for estimating the size of a problem, although of course, as you state, it tells you something. It’s a matter of where you place a ‘camera of quality’, so to speak.

Whether an aberration can actually be detected also comes into play, but ultimately the number of permutations increases significantly compared to a pool of beta testers, as users suddenly do all kinds of things the beta testers never had a reason or opportunity to try. What happens, for example, if you use multiple machines with various operating systems and versions of DT or DTTG together on a network? Was that tested? If you use 2 different macOS versions, 2 different iOS versions, 2 different DT versions and 2 different DTTG versions spread across 8 devices, it might be hard to say which combination led to which outcome, I think. Yet DTTG supports iOS 13 and 14, Big Sur has been introduced only recently, and obviously DT and DTTG have had multiple updates.

At least my version of this bug (see other thread: Incomplete sync in DTTG3?) has nothing to do with opening the PDF files. DTTG3 was already listing the files in question as “0 pages” before I opened them. Also, I rarely annotate PDF files in any way, and definitely hadn’t annotated the ones showing the problem.

Now, it’s possible that there are two related bugs with similar symptoms, as my version also doesn’t appear to affect any other copies of the database.

I’d like to add some findings to the case. I retired my Mac and use a mobile-only setup with one iPad and one iPhone successfully for a couple of years now. The content of the sync store is comprised of an initial upload from a Mac database and later additions from my mobile devices. I also experience the spread of corrupted files since I migrated to DTTG 3 on my iPad.

Those files are already listed as corrupt—no need to open them to become like that.

Most corrupted files do not contain annotations of any kind.

The files were previously accessible in DTTG 2 but now they aren’t.

I also discovered that (in my case) the only files affected were ones I added from a mobile device (that is, by DTTG 2) over time. All the files from the initial Mac upload seem OK (for now). Every* corrupted file I have examined so far is still intact on the very device it was uploaded from and only seems to have become damaged as soon as DTTG 3 synced the database. I cannot imagine how this could’ve happened. Is there any kind of two-way data flow when v3 downloads metadata from a legacy iCloud sync store?

*Files that were uploaded from mobile hardware that I have upgraded in the meantime and therefore had to be re-synced to all of my devices are broken as one might expect.

Thanks for the follow-up. Knowing the files originated in iOS/DEVONthink To Go is a good detail to know.

Good advice. I just ran it on mine and found 3 PDFs with a size of 0. Those 3 are in the Inbox, which means they haven’t been processed or read yet on either Mac or iPad, if that helps in any way to track down the bug.

Gents, I have been working with PDFs throughout the betas and had no issues with them. My use of DTTG is entirely (or almost entirely) PDF-related. Across the betas I’ve re-synced, re-installed, updated, and lost all my databases (no problem, as I have a strict backup policy), but never had a corrupted PDF.

My scenario is quite complex: I have about 400 GB of PDF files in Dropbox/iCloud Drive, all indexed into DT/DTTG, plus about 3,000 more inside DT/DTTG. I have mistakenly annotated the same indexed PDF at the same time in DTTG, in PDF Expert on macOS (opened both from inside DT and from the file system), and in PDF Viewer on iOS. These collisions gave me some duplicates or cost me my latest annotations, but never a corrupted PDF.

Speaking in defence of beta testers: the whole reason to be one is to report all bugs and glitches.

How do you know?

Because I periodically review all my PDFs; in other words, I use FreeFileSync to find “differences” between my working collection in Dropbox and iCloud Drive and my “master” copy on my NAS.

If I see something strange in untouched PDFs, like a change in size or date/time (which sometimes happens due to syncs and re-syncs), I verify those files by opening them.

Edit to add: But be careful with FreeFileSync, as it does not understand iCloud Drive placeholders. You can use more Mac-friendly tools like Sync Folders Pro, or create a script that generates a fingerprint for each of your files, which is something still on my to-do list.
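
For anyone who wants to try the fingerprint idea, here is a minimal sketch in Python. It is not what FreeFileSync or DEVONthink do internally; the function names (`fingerprint`, `build_manifest`, `changed_files`) are made up for illustration, and SHA-256 is just one reasonable choice of hash. It builds a manifest of digests for a "master" folder and a working folder and lists the files whose content differs.

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): fingerprint(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(master: dict[str, str], working: dict[str, str]) -> list[str]:
    """List paths present in both manifests whose digests differ."""
    shared = master.keys() & working.keys()
    return sorted(path for path in shared if master[path] != working[path])
```

Usage would be something like `changed_files(build_manifest(Path("/path/to/master")), build_manifest(Path("/path/to/working")))`, with both paths being placeholders. Like FreeFileSync, this sketch knows nothing about iCloud Drive placeholders, so it only makes sense on folders that are fully downloaded.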

With the additional information from @Blanc, @Solar-Glare, and @chrillek and some help from @sherlockholmes and @johnhwatson we have adjusted our theory about what happened:

  1. At some point in the past someone added PDFs (or other files) to DEVONthink To Go or DEVONthink for Mac. Maybe, the indexed path of the document was set wrong (there was a bug here that we fixed a while ago), or the PDF was corrupted through some other means. And all this was synced.
  2. The user now upgrades to DEVONthink To Go 3. The data store is copied and all is still well, all files are still intact.
  3. Now the user switches to a different sync method, presumably to CloudKit.
  4. This forces DEVONthink To Go to download all metadata again and merge it with the local copy migrated from version 2.
  5. Depending on many factors the sync determines that the PDF needs updating and downloads it from the sync store — and receives a zero-bytes file. The harm is done.

This scenario would explain the error messages that some of you have seen (“Couldn't move *.pdf into the database package”) and why the problem occurs together with the switch to DEVONthink To Go 3.

Using an observation @chrillek made and @Blanc’s smart group I also found a few of these empty PDFs, all imported from Scanner Pro in April 2020. This would confirm the theory that the actual cause of the problem lies far in the past and just manifests now with DEVONthink To Go 3 and the switch to, e.g., CloudKit.

Now to confirm: Those of you who experienced this problem: Did you switch your sync method together with migrating to DEVONthink To Go 3? Did you receive error messages like the one above?

As a guard against data loss we will add a check for documents with a size of 0 bytes (excluding Markdown and text files), stop the sync if we find any, and activate a new smart group listing the affected files.
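
Until that check ships, the same idea can be approximated from outside the app. The following Python sketch is only an illustration of the described guard, not DEVONthink To Go's actual implementation: `find_empty_documents` and `guard_sync` are hypothetical names, and the set of extensions treated as "Markdown and text" is an assumption.

```python
from pathlib import Path

# Assumption: which extensions count as "Markdown and text files".
TEXT_SUFFIXES = {".md", ".markdown", ".txt"}


def find_empty_documents(root: Path) -> list[Path]:
    """Return files under root that are 0 bytes, skipping Markdown and text files."""
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file()
        and p.suffix.lower() not in TEXT_SUFFIXES
        and p.stat().st_size == 0
    )


def guard_sync(root: Path) -> bool:
    """Return True if a sync may proceed, False if any empty documents were found."""
    empty = find_empty_documents(root)
    for path in empty:
        print(f"0-byte document: {path}")
    return not empty
```

Run against a copy of a folder or database package, `guard_sync` plays the role of the "stop the sync" decision, and the printed list stands in for the smart group of affected files.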

No. Solely WebDAV.

Yes, but much earlier than the DTTG3 release. They occurred with the ‘UUID already present’ error.

I agree that, judging by the file dates, the 0-byte files might have existed for a long time, but I would also point you once more to my observation that a database contains both 0-byte and regular PDFs with identical file names.

In my case, the file was added to DT/desktop and synced to DTTG. It was never indexed.

Kind of: I kept my WebDAV sync and added CloudKit. No error messages so far. I have turned off CloudKit in the meantime.

This is on the Mac, right? And where do these files reside, both in the same folder inside the database package or in different sub-folders? A screenshot might help tremendously.

With DTTG 2 and 3 installed on a total of 4 iOS devices with multiple sync locations, a certain number of PDF/image/text files show up as 0-byte/corrupted in both versions on all devices. That means the corrupted files “traveled” to different sync locations.

Not at my Mac currently, but I’ve done the following.

  • created a smart group, scope all databases, size 0 bytes
  • wrote down which files were 0 bytes and where they were located (say file A, database B)
  • closed DT
  • created a copy of database B and a copy of a week-old backup of the inbox, where those files were before they were moved to database B

Results Inbox backup:

  • folder ‘File.noindex’ (I think, I’m doing this from memory)
  • subfolder pdf
  • search for filename A
  • filename A present once

Results database B:

  • folder ‘File.noindex’ (I think, I’m doing this from memory)
  • subfolder pdf
  • search for filename A
  • identical filename A present in folder ‘number’ and folder ‘letter’

The two files have identical names, but obviously differ in size: one is 0 bytes and the other has the size expected from the backup. Their creation/modification/addition dates also differ. One of the two has all dates going back to 2019; the other has all dates going back to November last year. If I copy the files to the Documents folder, Finder creates two files but adds a number to one of the two copies, proving that their names are identical in the package.
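
The manual search above could be automated roughly like this. This is a sketch under assumptions: it should be run on a copy of the database package (as in the steps above), it makes no assumptions about the package's internal folder names, and `shadowed_by_empty_copy` is a name invented for this example. It reports every file name that occurs more than once with at least one 0-byte copy and at least one non-empty copy.

```python
from collections import defaultdict
from pathlib import Path


def shadowed_by_empty_copy(package: Path) -> dict[str, list[tuple[Path, int]]]:
    """Group files under package by name and keep the names that have
    both a 0-byte copy and a non-empty copy somewhere in the tree."""
    by_name: dict[str, list[tuple[Path, int]]] = defaultdict(list)
    for p in package.rglob("*"):
        if p.is_file():
            by_name[p.name].append((p, p.stat().st_size))
    return {
        name: sorted(copies)
        for name, copies in by_name.items()
        if any(size == 0 for _, size in copies)
        and any(size > 0 for _, size in copies)
    }
```

Each entry in the result lists the paths and sizes of all copies of that name, which makes it easy to compare their dates in Finder afterwards.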

The PDF document was created externally by some company and I certainly could have imported that file previously (perhaps explaining the ‘UUID already present’ error).

Addition: if my findings are correct, the files aren’t necessarily corrupt, but duplicates with 0-bytes exist that are referenced wrong by DT. I.e. DT shows some files being 0-bytes, but their original counterpart seems to still be present like matter and anti-matter :grinning:. But this is just a hunch of course.

I enabled CloudKit alongside iCloud (legacy) on my iPad (but not on my iPhone which still only runs DTTG 2) and downloaded the whole database with the intent of syncing all files again with CloudKit.

I’ve got at least a dozen PDFs that were uploaded from and synced between two mobile devices (DTTG 2) in the past and were completely accessible on both devices at the time. All of a sudden they were displayed as corrupted in DTTG 3 without having been opened at all and without any error messages*. After going through the download process of DTTG 3 they were also displayed as corrupted in DTTG 2 on the device they were synced to (iPhone) but are still accessible on the device I uploaded them with (iPad).

To be clear: The corruption occurred before I could activate the sync via CloudKit during the process of downloading the very PDFs from the sync store.

*When I tried the OCR function and put DTTG 3 in the background while recognising a PDF the document got all split apart into zero-byte files and I received the very same error message you mentioned; this did not happen when I kept DTTG 3 open—haven’t reported this, yet.

Edit: The statement above corroborates #5 of your theory but I want to object against #1 because the PDFs I mentioned were definitely not kaput before—they got synced correctly and became damaged only when they were downloaded by DTTG 3.

Don’t know whether it has been already established, but this bug seems to affect at least .png files, too.