Continuous saving?

jrothman · May 24, 2004, 1:48am

One feature I’d really like to see is continuous saving of the DT database. It seems really silly that I can be working on a document, save it (using Command-S), and then, after a crash caused by another program, re-open my database to find several minutes worth of work missing. I’d (somehow) been under the impression that this had been implemented in the newest build, but it just happened to me in one of the betas.

If this could be improved it would make me a lot more comfortable using DT.

sjk · May 24, 2004, 4:56pm

Yesterday morning my eMac crashed and DT’s database was corrupted after restarting, requiring me to rebuild it. I have the “Automatic flush to disk every 5 minutes” preference set, but hadn’t added anything new to the database for at least 30 minutes before the crash so I was a surprised (and disappointed) about this corruption during what I thought would be a period of inactivity. Fortunately it was okay after two other system crashes during the week.

I’ll have a replacement UPS tomorrow to protect the system against more power outage crashes but I wish something could be done for improving the database integrity in a situation like I’ve described.

milhouse · May 25, 2004, 12:03am

Yes, I need a very stable app/database in which to save my coming PhD work.

As much as I love DT (pro), the stability and robustness must be a no brainer.

Bill_DeVille · May 25, 2004, 1:19am

After going forever without DT database corruption, it happened to me last week – a Safari crash/system freeze while DT Pro was optimizing and backing up the database (the worst case event, as the backup was also corrupted). I did have a 4-day-old external backup of the database, but I had done a lot of work that wasn’t in the backup file.

I use a TiBook, so don’t have to worry about power outages, of which we’ve had several in the last few months. I even have my PowerBook AC adapter plugged into a UPS!

But my very large database was rebuilt without incident, although I found it best to do manual group-by-group export, and imported into a new database. DEVONtechnologies has done a great job with the rebuilding/recovery tool.

And since I reported my problem to Christian, the latest alpha of DT Pro has a modification of the Backup & Optimize tool that makes backup corruption very much less likely in a situtation like the one that bit me.

Bottom line: I trust DEVONthink even more after my corruption incident. The experience of recovering everything (nearly 7,000 files) after a bad crash was very reassuring. DT is as bullet-proof as they come. My data is critical to me, and it’s in pristine condition again.

Of course, I’ll continue to do external backups – my hard drive tests out OK, and I’ve never lost a drive in many years of using Apple computers, but the worst could happen.

milhouse · May 25, 2004, 6:59pm

Thanks for the report Bill.

sjk · May 25, 2004, 7:06pm

Today I noticed a top-level NetNewsWire.app group that looked like it had been imported from /Applications and the entire app tree was browsable in DT. Seems peculiar since I’ve never entered anything from /Applications in DT.

Last week I copied the database from my iBook to my eMac and am wondering if rebuilding it after the system crash (explained earlier) was confused by something in it that had originally been created on the iBook. The directory hierarchies on the two systems aren’t identical and it’s likely something in the database referred to files under ~/Documents that aren’t (yet) on the eMac.

I’d like a way to export the History in plain text format so different “snapshots” of it could be compared (e.g. with the diff command). That would provide a crude way of checking for missing or new items in the database if rebuilding was necessary. Or some other mechanism that makes it possible to compare item lists from different points in time.

Bill_DeVille · May 25, 2004, 9:02pm

sjk:

I did the database rebuild by exporting a group at a time, then doing a quick look at the exported folders to make sure they contained everything, including all the subgroups and their contents.

I’ve got 177 groups (including sub-groups), but only 25 groups at the ‘root’ level, so this wasn’t hard to do.

Then I created a new, empty DT Pro database and dragged the group folders into it.

I’ve got a NoteTaker notebook with logs of many hundreds of the items entered into DT. I went back and selected at random 20 files to check; all of them were in the new database. Also, the number of files in the new database agreed with the number of files in the old database, so I can conclude everything is there. All the links to external files, such as PDFs, that I’ve examined are working.

Took a while to do it this way, but I’m pleased that there were no surprises or lost material.

Because my database contains links to rather than inclusion of PDF, Word and some other file types, it’s not as portable as yours. One of these days, I may replicate my database with PDFs and so on captured into the DT database. That would let me take advantage of the new DT Pro feature of storing databases on optical media for distribution to other DT Pro users.

I second your vote for some way of capturing History files so that comparisons between databases could be simple. Christian, how about that? After your holiday, of course. :)

sjk · May 26, 2004, 3:19am

(emphasis mine)

I’m impressed. How could you be certain they contained everything?

My root- and sub-group numbers are about the same as yours, with around 2500 total items. That’s was too much for my unphotographic memory to keep track of and know for sure if all groups (much less individual items) were recovered. Any missing older items could easily be overlooked, which was the motivation for the "exported History" suggestion. Whether or not the data itself is still intact is another matter but at least with items names there’s more possibility to recover it from original sources. Logging those numbers automatically at regular intervals might be a nice feature.

I’ve occasionally saved screen captures of Database Properties to track the numbers over time, but that’s too dynamic to be helpful in a post-crash recovery scenario, unless there was a drastic difference that would be noticeable anyway. At that point recovering a backup database rather than salvaging the current one becomes a more likely alternative.

After my last post I saved the History as a PDF (18MB!) and converted it to text with "pdftotext -layout History.pdf History.txt" (same utility that DT uses). After cleaning up extra newlines, page breaks, and duplicate entries I was amazed that the "wc -l History.txt" result exactly matched the total number of entries in the database (w/o groups). Later I’ll do that conversion with the last pre-crash backup database and compare it with the current item list.

So, I’m satisfied that’s a usable hack until Christian implements a more elegant solution for item History memory.

Bill_DeVille · May 26, 2004, 2:59pm

sjk:

After exporting a group I looked at the folder in the Finder, including number of items. Should be one larger (for each of the folders/subfolders created) than the number of items in DT groups, as DT exports a file into each folder that contains database information.

True, I didn’t actually compare item numbers for all 177 groups, but I did inspect about 12 fairly carefully and spot-checked many others. Found no discrepancies. If I had, I would have dug deeper to compare the groups/folders.

So I made; a total of three checks on the completeness of my new database:
[1] Random checks of files that I knew should be in the database (all were found);
[2] Quick comparisons for all groups/export folders, and number of items checks on about 8% of the groups/folders (looked OK); and
[3] Comparison of item numbers, e.g. Rich Text, HTML, in Database Properties for the old and new DT databases (looked OK). (Words count for the new db was larger than for the old db.)

These steps gave me a pretty high level of confidence in the integrity and completeness of the new database.

I did find a number of duplicate PDF files in the new database, that should have been replicants rather than duplicates. Spent about an hour looking for such duplicates, deleting them and replacing with replicants of the files. (Note: most items set up as replicants in the old db were properly recreated as replicants in the new db.)

Your suggestion of History text files that could be examined for differences is great, and would have saved me some time.

sjk · May 30, 2004, 8:47pm

Hopefully that’s not something to do too often, if at all.

After my crash I used Rebuild Database in DT, which was a one-step Export/Import operation. It also preserved pre-rebuild database in …/DEVONthink/Backup~. This morning I temporarily installed that backup database and exported everything, then exported everything from my current database and compared the exported directories with "diff -r …". Some of the RTF files had changes like:

4c4
< {\colortbl;\red255\green255\blue255;\red0\green0\blue255;}

> {\colortbl;\red255\green255\blue255;}
28c28
< Feedback? \cf0 \ulnone Late-breakers@macfixit.com\cf2 \ulnone \cf0 .\

> Feedback? Late-breakers@macfixit.com\cf2 \cf0 .\

Otherwise the content looked fine.

And I noticed the mysterious NetNewsWire.app group creation seemed related to a link item with a path in that app’s hierarchy (keyboard shortcut list), which I’d forgotten about. The link item looks fine, but maybe NNW wasn’t installed on the eMac when DT crashed. It’s a minor thing since only a bogus group item was added and no items were damaged or deleted.

Anyway, after this experience and the results I’m satisfied with the integrity of rebuilt database.

cgrunenberg · June 6, 2004, 12:06pm

Version 1.9 will be probably one of the most robust databases for OS X available. However, this will never be a no brainer as filesystems can get corrupted, harddiscs or computers may fail and bugs of OS X or third party extensions might cause troubles too.

We’ve experienced this in the past and some of our users too. Sometimes we were able to recover the data, sometimes not. Therefore it’s always a good idea to be prepared for the worst case scenario.

cgrunenberg · June 6, 2004, 12:07pm

Version 1.9 will remove the auto-flush preference and update the database files every 3 minutes (as a lot of users haven’t activated this option). And DT Pro will save all changes immediately.

cgrunenberg · June 6, 2004, 12:11pm

Such a problem is usually caused by the disk/filesystem cache of OS X (especially as this cache is part of the virtual memory system). However, version 1.9 will force OS X to flush its cache after every update of the database files and this improvement should make such troubles much more unlikely (in fact almost impossible as long as the filesystem and/or the harddisc are more or less intact after a system failure).

sjk · June 7, 2004, 8:16pm

That’ll help avoid the trouble I had, especially with the system on working UPS backup again. Btw, do DEVON apps exit cleanly when sent a SIGTERM signal?

cgrunenberg · June 8, 2004, 7:32am

Just checked this and seems that this is identical to quitting the application (probably Cocoa or OS X handles this).

sjk · June 8, 2004, 7:38pm

Thanks for checking that. I’ve added “killall -m DEVON Devon” to the /usr/libexec/upsshutdown script for a clean database shutdown if insufficient UPS power triggers a system shutdown.

cgrunenberg · June 8, 2004, 8:41pm

I was wrong - thought that the Activity Viewer command “Quit” sends a SIGTERM signal but this doesn’t seem to be the case as using killall or kill (using a SIGTERM signal) seems to be identical to “Force Quit”.

Using a simple AppleScript like

tell application "DEVONthink" to quit

or

osascript -e "tell application "DEVONthink" to quit"

is probably a better solution.

sjk · June 8, 2004, 11:23pm

Interesting. I’d have thought a normal Quit would be equivalent to SIGTERM and Force Quit to SIGKILL (aka, the infamous “kill -9 …”). Thanks for the AppleScript suggestion; I’ll use that instead.

Poetsfolly · June 10, 2004, 1:59pm

One form of continuous saving that I haven’t seen mentioned is keystroke logging. Adobe InDesign does this, and the /usr/bin/vi Unix text editor has done this for over 20 years. While editing a document, keystrokes and edits are logged as they occur (or in small batches for efficiency). If there is a crash or power loss, upon restart the application reopens the file and applies the edits. As long as the keystroke log is around, at most a few seconds of work is lost. For most applications these days, implementing this isn’t too hard since they already have unlimited undo/redo.

Particularly in the early days when Unix was still pretty unreliable, I depended on /usr/bin/vi pretty heavily because of this feature. Now I still use /usr/bin/vi, because I am a curmudgeon and proud of it! :-)

sjk · June 10, 2004, 7:52pm

Hmm. Are you sure Bill Joy’s original vi had “unlimited” undo and saved a keystroke log? I just remember post-crash file recovery (“vi -r …”) capability.

Continuous saving?

4c4 < {\colortbl;\red255\green255\blue255;\red0\green0\blue255;}

> {\colortbl;\red255\green255\blue255;} 28c28 < Feedback? \cf0 \ulnone Late-breakers@macfixit.com\cf2 \ulnone \cf0 .\

4c4
< {\colortbl;\red255\green255\blue255;\red0\green0\blue255;}

> {\colortbl;\red255\green255\blue255;}
28c28
< Feedback? \cf0 \ulnone Late-breakers@macfixit.com\cf2 \ulnone \cf0 .\