Files lost again

Well, it happened again. No warning, the database passes all verifications, but suddenly there is a zero-byte file in my system. This time, at least, it was a file I was using regularly, so I noticed quickly and got it back from a backup.

The item in question was created on 3 May 2021 and went AWOL between August 7 and today.

I cannot send you the item or more info as the data are proprietary.

But this is the same ghost file bug that previously cost me over 500 files.

What kind of file? Is there enough disk space or is this an encrypted database? Did DEVONthink freeze or crash?

Plain text
997.1 GB space available on my disk
unencrypted database

Neither. I got the report on the zero-byte file from the smart rule I borrowed from a post here and went looking. I located the file in the Finder via Show in Finder, then did Show Package Contents, navigated to the correct folder, and found the file. It had a zero length in the Finder, and its contents were blank when opened in TextEdit.
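
For anyone who wants to run the same check outside DEVONthink, here’s a minimal sketch that walks a database package and lists zero-byte files; the path is just an example, so adjust it for your setup:

```python
#!/usr/bin/env python3
# Walk a DEVONthink database package and list any zero-byte files.
# The path below is a hypothetical example; point it at your own
# .dtBase2 package.
import os

DB_PATH = os.path.expanduser("~/Databases/Notes.dtBase2")

for root, _dirs, files in os.walk(DB_PATH):
    for name in files:
        path = os.path.join(root, name)
        try:
            if os.path.getsize(path) == 0:
                print(path)
        except OSError:
            pass  # skip files that vanish or can't be read mid-scan
```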

I went to an older backup that had been created with a daily export backup script run out of DT and found the file intact by opening the package contents and looking in the same folder structure where the empty one was. I copied the contents using TextEdit and promptly put this critical file into Obsidian.
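
If you want to automate that comparison, a rough sketch along these lines, assuming the backup is a plain copy of the database package with the same internal folder structure (both paths here are hypothetical), would flag files that are empty in the live database but intact in the backup:

```python
#!/usr/bin/env python3
# Flag files that are zero bytes in the live database package but
# non-empty at the same relative path in a backup copy.
import os

LIVE = os.path.expanduser("~/Databases/Notes.dtBase2")    # hypothetical path
BACKUP = os.path.expanduser("~/Backups/Notes.dtBase2")    # hypothetical path

for root, _dirs, files in os.walk(LIVE):
    for name in files:
        live_path = os.path.join(root, name)
        rel = os.path.relpath(live_path, LIVE)
        backup_path = os.path.join(BACKUP, rel)
        try:
            if (os.path.getsize(live_path) == 0
                    and os.path.isfile(backup_path)
                    and os.path.getsize(backup_path) > 0):
                print(f"recoverable from backup: {rel}")
        except OSError:
            pass  # skip unreadable entries
```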

Then I deleted all references to that file within DEVONthink.

Please note that nothing DT did indicated I might have a problem. It was the custom smart rule that a user here created that alerted me to the potential problem, and I was able to recover fairly easily because I found out in time to go back to a backup and get the contents out of the file.

As plain text documents can be empty (e.g. new ones), such files do not cause any warnings. Do you know which action caused this?

No. It was a file I had been using regularly and needed to reference again today.

I described the problem to my husband, and he reminded me of a problem he saw back on a CP/M system, where during an update the file system’s directory of which locations belonged to the file got corrupted due to a bug in how a board-design app updated files. Determining the real cause took looking at the actual on-disk, bit-level contents of the entire file and tracing where the length got set to zero while the directory of the segments that made up the file on disk was still intact. The cause was related to the way the app handled file updates, but it was a very intermittent error.

His suggestion was to have someone with intimate knowledge of exactly how the Apple file system stores the details of where file bytes are located read out, bit by bit, a disk with the problem and see if that sheds any light on the issue. That’s a summary of the discussion, but it’s a reasonable place to look.

The question is whether the corruption starts on the Mac or on the iOS side. I did sync the database with iOS since the backup where the file was good. Because of the issues with DTTG and DT, I have avoided using it as much as possible.
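
Short of that level of forensics, one cheaper way to bracket when a file’s length drops to zero would be to snapshot file sizes periodically and diff against the previous snapshot. A minimal sketch, with both paths below being hypothetical examples to adjust for your own setup:

```python
#!/usr/bin/env python3
# Record a manifest of file sizes for a database package and report
# any file whose size was non-zero last run but is zero now.
# Run periodically (e.g. from cron or launchd) to bracket when a
# file was truncated.
import json
import os

DB_PATH = os.path.expanduser("~/Databases/Notes.dtBase2")    # hypothetical
MANIFEST = os.path.expanduser("~/db_size_manifest.json")     # hypothetical

# Collect current sizes, keyed by path relative to the package root.
sizes = {}
for root, _dirs, files in os.walk(DB_PATH):
    for name in files:
        path = os.path.join(root, name)
        try:
            sizes[os.path.relpath(path, DB_PATH)] = os.path.getsize(path)
        except OSError:
            pass

# Compare against the previous snapshot, if one exists.
if os.path.exists(MANIFEST):
    with open(MANIFEST) as f:
        previous = json.load(f)
    for rel, old_size in previous.items():
        if old_size > 0 and sizes.get(rel) == 0:
            print(f"truncated since last run: {rel} (was {old_size} bytes)")

with open(MANIFEST, "w") as f:
    json.dump(sizes, f)
```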

Interesting. Hopefully you have good backups.

I did. I actually had good backups before (going back two years), but the lost files were still gone from them. In this case I was able to recover the data in the file.

One idea that comes to mind is to add a smart rule like this to all your Macs:


I just added that, thanks! How do you get Name after the warning? By the way, I made two rules: the one above and one that scans on startup. Is that OK?

Are there any other rules I can run to ensure no data loss? I solely index files.

Placeholders like Name can be inserted via the contextual menu. And two rules are of course fine but unnecessary; rules can be triggered on multiple events, see the + button to the right of After Saving in my screenshot.

Added.

I just want to point out that the horse has already left the barn by then, and this is an attempt to close the door after it. I’d really like to see some sort of action on DEVONthink’s part to actually locate and fix the problem, not just put a band-aid on what has clearly been a longstanding bug.

Further to that: a similar bug happened years ago in another system, and the methods used to determine its cause and fix it, as I described above, should be attempted on this one.

As a thought on how to try to trigger it: creating a new DT database on a thumb drive, using it until you see the problem, and then sending the thumb drive off to a file system expert for a bit-by-bit evaluation to determine what happened might be a good start. On the other hand, if the issue is related to old databases that have been converted over time, that won’t work. Then you might have to dust off a DT version 1 or 2 build, use the database for a while, then migrate it up through the versions and see if that causes the issue.

I find it rather amazing that none of the people actually working at DT have experienced the problem.

Another question for those of us who have had it happen: how old are our databases?

In my case the primary database that shows the problem was created back in 2013. Given the age of the bug, if all you have are more recently created databases, maybe you will never see it. I don’t know; I’m just speculating. But I’d like to see some sort of reports and updates on the work being done to determine the cause, rather than the silence we get here on the forum.


FYI, the last place I’d put a database or any important set of files would be on a thumb drive. Not the most reliable bit of kit on Apple’s OS.

Perhaps a better target for this very low level investigation would be Apple.


Create a new DT database on a thumb drive

Is this something you’re actually doing in practice?

It’s a debugging tool, not a use case. I’m trying to offer some ideas on what could be done. A reasonable suspicion is something at a very low level, in how DT tells the system to update files: it almost looks as if a pointer into the table that indicates where things are located gets corrupted, or the byte length gets set to zero, or something similar.

Not as a regular practice, no; I was offering it as a debugging tool only.

On the other hand, I do have use cases where having a database on a portable external drive is important. I have already moved all of those items out of DT, as I can’t afford to risk them in what has become, for me, an unreliable place to archive data.

That’s good to know as a thumb drive isn’t appropriate for daily or long-term use with a DEVONthink database.

Portable hard drives are a much better option, but all hardware is susceptible to an eventual demise. I switched from Seagate to Western Digital externals three years ago as I had three Seagates spontaneously die of the dreaded “stuck head” issue in a very short time. Seagate’s QA seems to have suffered in the past 10 years.

But I find it exceedingly unrealistic to claim that files lost in a single application over time are all related to a hardware failure that has never caused a problem for any other program or files.

When will DT admit that there is a serious, if rare, bug that totally undermines the entire premise of the software? That it doesn’t affect more people or files is scant comfort. You need to admit there is a problem and explain to us what you, as a company, are doing to determine the cause, not just keep offering platitudes that are irrelevant to the situation at hand.

Yes, in this particular instance I was able to recover. But what about the poor suckers who depend on your software to store critical data and have no clue that, behind the scenes, valuable files may go missing, and whose only recourse is having found out in time to recover from hopefully sufficient backups?


From the information you have produced, I’m not sure it’s fair to assume there is such a bug. What I’ve read leaves the possibility open that such a bug could be responsible; it doesn’t exclude other possibilities.

Again, not enough information to verify that; the probability must relate to the number of files, their size, the number of accesses, and so on.

I’m not saying there is no bug; I’d be somewhat surprised if it were related to the “ghost file bug” as known, because the number of people previously affected and the number now affected are disproportionate.

What am I saying? There is a lack of useful information; in my opinion both you and DT probably need to look into this. Stopgap measures can help pinpoint the problem. As such, you may be interested in a script I implemented when the original “ghost file” issue surfaced. It has, to date, not turned up any problems on my devices.

Addendum: I’m grateful to any user working toward making DT an even better product; let’s all work together with that goal in mind 🙂


Well, here are a few quick numbers for you. My current active imported-files databases have
3,170 items,
81,333 items, and
5,712 items,
respectively.
The age of those items goes back to my first instance of DEVONthink in 2010.
File sizes vary widely, from as small as 2 bytes to as large as 47.4 MB.
Access is almost impossible to determine. I have archive files I only access once every 5-9 years and some things I look at and work with multiple times a day.

I did look at your script but have not implemented it.

I’m giving up on DT and moving files out. I can’t risk data loss I don’t catch in time to recover from, and the initial failure with over 500 files lost was the straw that broke this camel’s back. I only pointed out the most recent issue because the error or problem STILL HAPPENS!

Well, FWIW, after seeing this I realized that this has happened to me before as well. At least three or four times that I can remember since I started using DT about 1.5 years ago. In those cases I just assumed I had done something stupid and simply restored from backup and moved on (hourly backups FTW!).

And, like @OogieM, I ran verification on the databases (I do so periodically anyway) and found no issues at the time. Not sure where I would go from here!