Database Corruption and Backups

Another Newbie question/comment:

I read on another forum the following quote: “If the DEVONthink database gets corrupted, you are screwed. Eaglefiler leaves the files intact but lets you search through them using the interface.”

Is database corruption a common thing or is this something that happens occasionally and can be avoided with backups? How often should a person back up his/her DT databases?

I was a little spooked by the above comment, and I wondered what DT users think about the storage differences between DTP and EF (i.e. EF leaves files intact).

Thanks,

Susan

Database corruption shouldn’t happen on a computer with a sound operating system and disk directory. The only time I’ve had to resort to a backup in more than three years was last year when I deliberately installed an extension that had caused problems for a user. It blew my database, too. But I had a current backup.

DEVONthink 1.x stores text files (including RTF and RTFD), HTML and WebArchive files within its “monolithic” database rather than in the Finder. If the database is corrupted, those files can’t be accessed from the Finder. Other file types including PDF, PostScript, images, QuickTime media and all “unknown” file types are stored in the Finder, inside the database package file. These, of course, remain accessible even if the “monolithic” database is corrupted.

DEVONthink 2.x will store all file types in the Finder. So files will be accessible in the event of database corruption, and there will be lower memory requirements when the database is loaded.

Most of the files in my main database are stored in the monolithic database, 15,988 of them as RTF or RTFD. I’ve been building that topical collection since 2002. I don’t lose data. I’ve got lists of file contents or additions from the past, and when I check the current contents against those old lists, everything is there. That said, the potential of data loss will be further reduced in DT 2 – but not enough to make me comfortable. Things could happen.

I’m a stickler for backups, and I initiate a backup whenever I’ve spent significant time and effort, rather than waiting for an automatic backup. Because a hard drive failure is always a possibility, I store Backup Archive files on external media, and I use Time Machine. Because my computer equipment might be stolen, or my log cabin might burn down, I periodically copy Backup Archive files of my databases to DVD and store them offsite. (The 20 GB size of my “me”/.Mac account isn’t big enough to hold my archive files.)

The value to me of some of my databases is much greater than the value of the computers that host them. I’ve spent hundreds to thousands of hours on some of them. As there’s no such thing as an absolutely bullet-proof computer, operating system, file system or storage medium I “buy insurance” through a simple set of procedures that are, nevertheless, much less costly to me than the annual cost of my property insurance premiums.

Is my data completely safe? No, it would be hosed if the Sun goes nova or an asteroid strikes. But I wouldn’t be around to worry about it. It’s those little things that I can protect against, such as a failed hard drive or stolen computer equipment. Several years ago the hard drive on my TiBook crashed; I had backup on an external drive. A few weeks ago the power supply on my Power Mac G5 blew; my important databases have external backup archives, and I’ll get that computer fixed under warranty. Meanwhile, I still have access to important data.

“Database corruption shouldn’t happen on a computer with a sound operating system and disk directory”

That’s a nice theory but I have experienced more than one instance of database corruption forcing me to go to the backups. As far as I know, my disk directory is fine as I periodically verify both the disk and the directory (using DiskWarrior). I find the DT database in its current form to be somewhat “touchy.”

A sound disk directory is important, and I like DiskWarrior, too.

But the operating system is also important. I keep a pretty stock OS X. I’m always highly suspicious of extensions that modify it. So, under OS X 10.5.4 and on my primary work computer, I don’t have anything installed that uses an Input Method plugin, no Application Enhancers, no third-party QuickTime media plugins. No Safari add-ons. I always delete the Adobe PDF viewer plugin that Acrobat or Adobe Reader installs in Internet Plugins (it’s not necessary in Tiger or Leopard, and may cause problems).

After adding a large batch of new content, or every 2 or 3 days, I’ll run Tools > Verify & Repair, followed by Tools > Backup & Optimize (or Backup Archive). I can’t remember the last time I saw an error report – it has been a very long time.

Every week or so I run a C*cktail (the asterisk stands for “o” – the forum software thinks the name is a bad word) suite that includes clearing all cache files. Yes, that slows things down a bit at first run, but minimizes the possibility of errors resulting from corrupted or “stuck” cache files. I’ve got a stock set of font files, none of them corrupted.

Whatever the reason, my databases are rock-solid.

Your experience is not typical. I use DT in preference to the alternatives in part because of its reputation for stability, and it’s never let me down. If you are experiencing data corruption on a regular basis, there is something wrong with your system and/or your DT installation.

Katherine

"If you are experiencing data corruption on a regular basis, there is something wrong with your system and/or your DT installation. "

I didn’t say anything about a “regular basis”…I said “more than one instance” which means that it has happened, well, more than once. It’s nice for you that you’ve never had any problems, but every database I have ever used in 35 years of computing has become corrupted from time to time, and DT is no different. I do find your statement to be kind of outrageous. You have no idea what went wrong, but you are positive it must be on my end? Interesting…I didn’t realize that DT, or any other product, was so perfect that anything which goes wrong must be the fault of the user.

“But the operating system is also important.”

No doubt, but I am not so spartan about the way I do things, and I do run some Safari add-ons, etc. I doubt that is the problem, however. It is more likely that people use the program in different ways and that some unknown combination of uses can cause a problem at times. Either that or cosmic rays, as they used to say.

Please read my statement again. I never said any such thing.

A system with data corruption has a problem and should be fixed. Without examining it further, there’s no way to know what the problem might be.

Katherine

I don’t think I’ve ever had any problems with DTP, and I would say that, on average, I’ve used it solidly for at least four hours a day over the time that I’ve owned it (since October of 2006, IIRC). I did have a problem with exporting database contents and then reimporting them (the “comments” were lost), but this was fixed in the next DTP update.

Back up. Back up religiously.

I’m definitely welcoming the new setup in DTP2, but it’s for speed and memory usage, not because I doubt the integrity of the database.

"A system with data corruption has a problem and should be fixed. "

There is no “system with data corruption”…there are only DT databases that sometimes and for some people become corrupt. There are many possible reasons why a database can become corrupt, some of which can be avoided and some of which cannot. That’s why there is no point telling somebody about your experience, because that may not be his experience. As everybody says, back up, and back up as often as you need to given how much data you can afford to lose. That goes for most things, including word processors. I also have used DT just about every day for years, and 99.9% of the time it’s fine (although finicky), but I have had to go to my backups on occasion. If you haven’t, then great. My girlfriend thinks I’m insane because I don’t use Windows. She says in 5 years, she never has had a single problem with it and can’t imagine that anybody else would either. She insists that I must have been doing something wrong when Windows drove me over the brink.

There you go.

Hey, that’s funny. Thanks for the chuckle. :slight_smile:

Yes, I recall reading a couple of papers noting that cosmic rays are capable of causing memory errors, with calculated frequencies of such errors – fortunately, not too common. And there are times I’m almost convinced that gremlins are not mythical creatures, but really do exist.

Back around the mid-1990s the agency where I worked removed all Macs and replaced them with a truckload of PCs. The IT guys hooked up one of the PCs, with a big monitor, in my office. They turned it on, speaking to me in praise of the wonders of Windows. As we watched, suddenly the screen turned blue – the Blue Screen of Death. The IT guys went into a huddle, conversing among themselves in low voices. One of them restarted the computer. Success! They set up my Outlook email account and Intranet privileges. They started out the door. Then one of them looked back in time to see the screen turn blue. They replaced that computer with another PC, and set it up. A couple of hours later, while I was reading email, the screen turned blue.

I had a PowerBook 170 that I carried to my office every day (I had bought it for a science exchange project in Egypt). I had been accessing the Intranet through an ethernet jack in my office back in Mac days, but access via that port had been changed. So I went to see the last Mac guy in the IT group, just before he left to take another job. He gave me the access codes, so I could use my Mac just as before. The only thing I couldn’t do directly was access the Outlook database and calendar, but I had remote access via a Web browser, and high speed Internet access, so that was OK.

From that point forward, I turned on the PC when I came into my office. Sometimes I could get to email. But it always crashed, as many as seven times a day, even when I hadn’t touched it. As PCs got bigger and better, once in a while the IT guys would bring me a bigger and better PC, but always with the same results. I was amused. Did all my work on the PowerBook, which became a Blackbird, then a TiBook. I sat on the agency IT oversight committee, and built one of the largest databases on the Intranet. I never commented that this was done using a Mac, and with back door access to the system. Did a gremlin lurk in those PCs, or did I have a mystical, antithetical power over PCs that made them die? (It wasn’t just me. Computer problems were so common that the IT staff tripled in a 5-year period.)

I think the reason my girlfriend believes Windows to be so stable is that she almost never installs/uninstalls new software and refuses to download almost anything, whereas I am a software junkie who experiments with hundreds of software packages. That’s the kiss of death for Windows, as it gets clogged up with DLLs and a bloated Registry.

I think the moral here is “robustness.” Yes, it’s probably better not to touch the Mac OS in any way, avoiding plugins, etc., but what’s the fun in that? Databases, by definition being so highly structured, are also notoriously finicky, and I am not always as careful with them as I should be. Perhaps I would have the pristine experience that some have had if I were as rigorous in the way I maintained my databases, but knowing myself, that won’t happen.

So… BACKUP BACKUP BACKUP. I myself use Time Machine and SuperDuper, with onsite and offsite backups, and I still worry.

My (very minor) knowledge of databases comes from using PHP and MySQL. Even there, in probably the simplest real database development environment in existence, it’s painfully easy to b0rk everything.

The problem with database software is that it seems like air traffic control software – any bug, anywhere, could cause disaster. If someone forgets to escape a quote, the whole thing is mangled.
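To make the unescaped-quote failure concrete, here is a minimal sketch. It uses Python's standard-library sqlite3 module rather than the PHP/MySQL setup mentioned above, and the table and function names (notes, insert_unsafe, insert_safe, demo) are made up for illustration. It builds one INSERT by string concatenation and one with a bound parameter, then feeds both a title containing an apostrophe.

```python
import sqlite3

def insert_unsafe(conn, title):
    # SQL assembled by concatenation: an apostrophe in `title`
    # closes the string literal early and breaks the statement.
    conn.execute("INSERT INTO notes (title) VALUES ('" + title + "')")

def insert_safe(conn, title):
    # Parameter binding: the value travels separately from the SQL text,
    # so quotes in it need no manual escaping.
    conn.execute("INSERT INTO notes (title) VALUES (?)", (title,))

def demo():
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE notes (title TEXT)")
    title = "Susan's question"
    try:
        insert_unsafe(conn, title)
        unsafe_ok = True
    except sqlite3.OperationalError:
        # SQLite rejects the malformed statement as a syntax error.
        unsafe_ok = False
    insert_safe(conn, title)
    rows = [row[0] for row in conn.execute("SELECT title FROM notes")]
    conn.close()
    return unsafe_ok, rows
```

Calling demo() should show the concatenated insert failing while the bound-parameter insert stores the title intact. In this case the broken statement is rejected outright; the nastier cases are the ones where the mangled SQL still parses and does something unintended.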

My wife knows nothing about coding, and since I started on my newest hobby (thehumanresource.org/), I’ve been complaining to her that 90% of all programming is error-checking. Fortunately, I’m so lousy at coding that I have to check every line for proper syntax, so I feel relatively confident that I’m not going to be dumping people’s deepest thoughts into the bit bucket. At least not often.

Anyway, I did remember this morning that I had a database corruption problem with DTP. It followed such an unlikely chain of events, though (there was a bad hard drive and a power outage during an import), and I recovered the information and imported it into a fresh database, so I couldn’t hold it against DTP really.

Why isn’t there an option for Verify & Repair and/or Optimize to run automatically during automatic backups (if enabled)? Seems to me there ought to be some method for at least V & R to be automatic at regular intervals instead of having to do it manually. Or maybe I’m not really clear on its intended purpose?

Not sure I’ve ever seen one, though there was some corruption in a DT database on my iBook G3 after tinkering with Smart Group scripts several years ago that V & R didn’t catch. And I noticed a single item in another db whose name had mysteriously changed to a timestamp.

A while back I wanted to compare my largest database with a restored copy. The Statistics in the Database Properties windows differed, but there wasn’t (and still isn’t) an easy method for isolating those differences, which I posted about at the time. So if I ever had to do a real restoration there might be something lost in the process, which is a concern I hope DT 2.0 will address.
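One rough workaround, sketched here in Python, assumes that both the live database and the restored copy have first been exported to ordinary folders in the Finder. It does not read a DEVONthink database directly, and the function names (tree_digest, compare_trees) are my own. It hashes every file in each folder and reports which relative paths are missing, extra, or changed.

```python
import hashlib
from pathlib import Path

def tree_digest(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            # Reads the whole file into memory; fine for a sketch,
            # but very large files would call for chunked hashing.
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests

def compare_trees(original, restored):
    """Report files missing from, extra in, or changed in the restored tree."""
    a = tree_digest(original)
    b = tree_digest(restored)
    return {
        "missing": sorted(set(a) - set(b)),
        "extra": sorted(set(b) - set(a)),
        "changed": sorted(p for p in set(a) & set(b) if a[p] != b[p]),
    }
```

Something like compare_trees("export-original", "export-restored") would then list the differing paths. It only compares file contents, so metadata kept inside the database itself (comments, labels and so on) would not show up here.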

I’m a bit amused you’re conservative like this:

… yet still have a reason to do the C*cktail ritual:

:slight_smile:

I’ve got several Input Managers, extra QuickTime components, Safari add-ons, etc. installed (but no APE). I’m not overly meek about what I install but don’t get careless and reckless about it. I’ve never had any system instabilities that would tempt me to run Cocktail-like maintenance apps. I rarely have the kinds of issues that some people will suggest “delete the preferences” (ugh) or “reinstall the app” (sigh) to attempt resolving.

I’ve done every major OS X upgrade as a clean install, which is when most of my housekeeping occurs. I’ve never done combo updates, as some people prefer. I’ll occasionally run Verify Disk in Disk Utility, but I don’t share the common misconceptions or believe the ridiculous claims that Repair Disk Permissions does magical things it demonstrably cannot do. And I run DiskWarrior when it’s convenient or when I have reason to suspect filesystem errors.

I think it’s possible to overmaintain a system in ways that actually increase the risks of certain problems rather than decreasing them. And some people undermaintain their systems relative to the amount they “abuse” them. Even with skill and experience there’s no guarantee certain maintenance routines will be effective with particular system usage styles.

Scott, I’m in general agreement with you. I know about all the cautions recommending clean installs or combo updates for OS updates or upgrades. But my attitude is that if the existing OS is already in good shape, the easy Software Update installation will work just fine. I’ve never had any problems with that.

However, we’ve had several instances, one very recent, where the solution to weird problems (such as the inability of DT Pro to launch and open any database) was deletion of the DEVONthink app preferences and cache files. Cache files are a neat way of “remembering” a procedure, for example, so that the next time it’s run it doesn’t have to be reconstructed. That saves time. But once in a while a cache can get “stuck” so that it doesn’t act properly. Over the years, I’ve seen that in different applications. I don’t think I’ve ever had a corrupt DT preferences .plist file. But it happens to some people. I’ve never had a problem that could be solved by reinstallation of the application. But it has happened to some people.

There are a number of Mac models in use, a wide variety of applications and extensions, and a lot of individual differences in the ways people use and maintain their computers. All that can complicate support issues.

But I’m grateful that I don’t have to do support on Windows/Vista computers. :slight_smile: