better crash recovery

Dear all,

So I just had my primary DT research database get corrupted, and I have spent the last twenty-four hours in sheer panic, trying to save the three years’ worth of scholarly and teaching work that I have poured into it. I’m still trying to figure out the best way to fix things, but one thing is clear: when a package of some 65 GB goes wrong, it’s a nightmare, no matter how well things are designed. One hopes that nothing ever goes wrong, and generally I have had no trouble with DEVONthink. But given that the virtue of DT is that it allows one to build HUGE databases, the flip side is that when these fail, there need to be multiple levels of support to get one’s data back.

So here are a few thoughts on this, with some suggestions for features and safeguards that might help prevent anyone else from having the day I just had…

  1. The new DT Pro 2 file format, which stores everything in regular files, is obviously superior to the old DT format. You now have some hope of getting your data back if the database becomes corrupted. The question is whether the organization of your data and your metadata can be recovered…

  2. The problem is that if you have organized all your data in DT Pro, those files (for me, some 80K of them) are very disorganized inside the package, and it would take months if not years to re-sort a raw dump of them.

  3. Just because your database passes the “verifying” check when DEVONthink starts up or opens a database does not mean that your database is clean. Mine had some 400 errors in it that I had never noticed, and it is very hard to keep restoring versions from Time Machine, guessing at when the whole package might last have been good.

  4. I had thought, foolishly, that the “rebuild database” command would always work to rebuild even the most corrupted database. I was wrong. The rebuild command has crashed on me 20 times in the last 24 hours, running it on different iterations of the recovered file on two different computers.

  5. Given that “rebuild database” can crash, it is really too bad that there is no safeguard built into the rebuild command that would allow it to pick up where it left off after a crash. If it takes three hours on an octo-core machine to rebuild your database and it crashes halfway through, this can add up to days of lost time. It was unbelievably frustrating to watch DT Pro create 215 GB of stuff in the “Recover Folder”, start to import it into a new database, and then crash! I literally couldn’t get the program to get that far again. I would really like to see the rebuild process recover from its own crash and continue the rebuild once the program is restarted (and if the problem comes from a particular file, it would be so nice to notify the user and carry on with the rest of the rebuild).

  6. Time Machine support? It would be really, really nice if DT Pro had some sort of Time Machine support built in. I actually had a (desperation) dream about being able to just click on Time Machine inside a DT group and have it show me the deleted or missing versions of a file and let me recover them from a TM backup. This would make good sense for a program whose filing system is intentionally uninterpretable.

  7. I found that Back-In-Time from Tri-Edre was the only thing that stood between me and suicide. It lets one go through Time Machine backups and pull out individual files much faster and more readily than Apple’s interface, which works very poorly inside DT’s packages and their indecipherable folders. Too bad Back-In-Time is not scriptable, or I’d also be able to use it to recover each “missing file” in DT Pro from the backup as I find it, sending it the full path. Instead, I have to copy the path, do a search, and then put the file into place inside the package. It is tedious surgery and kills LOTS of time. When you have 400 missing files to hunt down in DT and then search for in Time Machine or BIT, you can drive yourself nuts. I still have a long weekend ahead of me.

  8. I really, really wish that when DT Pro verifies a database and finds missing files, it would tell me at startup. I equally wish that it would create a “group” containing aliases to all the missing files it finds, so I don’t have to hunt around for them and slowly discover my data rot (a rough script sketch of such an audit follows this list). This is imperative for those of us who don’t have 50 TB of backup space, because it is important to catch a deleted file before it gets pushed off the back end of a Time Machine expiration cycle. Please, I am begging here; consider this an urgent feature request, along with resumable rebuilds.

  9. If there were a way to tell DT Pro to log the entire rebuild process to the console or a log file, so I knew which file it died on, that would be nice.

  10. I do wonder whether there is enough rebuild information inside DT’s file system to actually rebuild everything from just the raw data. One would think that, at least in theory, you could destroy all the dtMeta files in the package and the devontech_storage files would still have enough information to rebuild the database from the nodes up. Nope.

  11. I wonder if the problem with #10 has to do with nested replicants. I have quite a number of folders whose parents are nested in children that are in turn replicated in the parent. In other words: group A contains replicants of group B, which in turn contains replicants of group A. Is this a no-no, and is it what is causing my database to not rebuild? If so, this should be fixed or not permitted (I prefer the former of course :slight_smile: ). A little cycle-detector sketch also follows this list.

  12. Please, please consider a preference for a “full report” of database checking at startup. I want to know the second my database goes bad, so I can fix it, rather than back up bad data over top of good.

  13. I notice that my DEVONthink tends to crash a lot when its memory allocation climbs over 2.83 GB. Is this a magic number for malloc failures? I wonder if DT Pro 2.0 has been adequately stress-tested under low-memory conditions. I tend to think my large database might be hitting the limits, but I wish I knew what the limit was, rather than imagining it as a dark, scary beast that will kill me the moment I step off the magic trail.
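
Since I am wishing out loud in #8, here is a rough sketch of how one might run such a missing-file audit oneself today. It is only a sketch: it assumes DT Pro 2’s AppleScript dictionary exposes current database, root, children, type, uuid, path, replicate record, and create location the way I believe it does, and the group name and handler names are my own invention. It keeps a list of visited group uuids so that the circular replicants of #11 cannot send the recursion into an endless loop.

----sketch: gather records whose files are missing on disk into one group
property visited : {}

tell application "DEVONthink Pro"
	set visited to {}
	set missingList to {}
	my auditGroup(root of current database, missingList)
	----"create location" is assumed to build a group from a slash-delimited path
	set missingGroup to create location "/Missing Files (audit)" in current database
	repeat with r in missingList
		replicate record (contents of r) to missingGroup
	end repeat
	display dialog ((count of missingList) as string) & " missing file(s) collected."
end tell

on auditGroup(theGroup, missingList)
	tell application "DEVONthink Pro"
		set theUuid to uuid of theGroup
		if theUuid is in visited then return ----already seen: a replicant cycle (#11)
		set end of visited to theUuid
		repeat with r in children of theGroup
			if type of r is group then
				my auditGroup(r, missingList)
			else
				set thePath to path of r
				if thePath is not "" then
					tell application "System Events" to set fileThere to exists file thePath
					if not fileThere then set end of missingList to (contents of r)
				end if
			end if
		end repeat
	end tell
end auditGroup

And for #11, a companion sketch that walks the group tree and logs (to Script Editor’s event log) any group that turns out to contain a replicant of one of its own ancestors. Same caveats about the dictionary; uuid and location are assumed record properties.

----sketch: report circularly nested replicated groups
tell application "DEVONthink Pro"
	my findCycles(root of current database, {})
end tell

on findCycles(theGroup, ancestorUuids)
	tell application "DEVONthink Pro"
		set theUuid to uuid of theGroup
		if theUuid is in ancestorUuids then
			log "replicant cycle at: " & (location of theGroup) & (name of theGroup)
			return ----stop here, or the recursion would never end
		end if
		repeat with r in children of theGroup
			if type of r is group then my findCycles(r, ancestorUuids & {theUuid})
		end repeat
	end tell
end findCycles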

Okay, that was a rant, but hopefully a helpful one. I wanted to type it up while all this trauma and injury is fresh in my mind. In short, I am hoping for:

  1. an option for a “full” verify and repair at startup
  2. automatic aliasing of all “missing files” found by said verify
  3. a resumable rebuild command
  4. Time Machine support
  5. a complete node-recovery “salvage” command that will rebuild DT’s storage and replicant structure
  6. some guidance as to whether there are storage limits in DT Pro under 32-bit malloc, or whether I just have “gremlins”. I need to know!

Don’t get me wrong, I love this program. But I’ve just been through a train wreck, and I would like to do my part to prevent future disasters for myself and others.

best,
eric o
:neutral_face:

p.s. In order to be maximally constructive, I will post a note below this one giving some tips on how to rebuild when the rebuild command doesn’t work.

ouch ouch ouch :open_mouth: this was painful reading, Eric. I’m glad you’re back up and running.

You mentioned using Back in Time (I agree, a great app) to grab files from Time Machine and put them back into the database package. Did I misread that? My understanding is that putting files into a database package directly, without using the DT interface, does not work well for the AI and other purposes. Is everything working ok?

I agree that the chore of building a meaningful folder structure from the obscure folder structure inside a database package is daunting if not impossible for large databases. I’d like to see an option for a kind of backup or links export that creates, in some destination, a folder structure mirroring the group structure of a database, where each group’s folder contains an alias back to each of its records’ original documents inside the package. Not export the file; export an alias. (A sketch of the idea follows.)
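
Something close to this may already be scriptable, in fact. Here is a rough sketch of what I mean, with heavy caveats: it assumes children, type, name, and path behave as I expect in DT Pro 2’s dictionary; the destination folder is hypothetical; name collisions and slashes in group names are not handled; and circularly nested replicants would need a visited-list guard of the kind Eric describes.

----sketch: mirror the group tree as folders full of Finder aliases
tell application "DEVONthink Pro"
	set destRoot to POSIX path of (path to desktop) & "DT Mirror" ----hypothetical destination
	do shell script "mkdir -p " & quoted form of destRoot
	my exportAliases(root of current database, destRoot)
end tell

on exportAliases(theGroup, destFolder)
	tell application "DEVONthink Pro"
		repeat with r in children of theGroup
			if type of r is group then
				set subFolder to destFolder & "/" & (name of r)
				do shell script "mkdir -p " & quoted form of subFolder
				my exportAliases(r, subFolder)
			else if path of r is not "" then
				try ----Finder errors out if an alias of the same name already exists
					tell application "Finder" to make new alias file at ¬
						((POSIX file destFolder) as alias) to ((POSIX file (path of r)) as alias)
				end try
			end if
		end repeat
	end tell
end exportAliases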

I also wonder if your experience tips the index/import question in favor of indexing?

This topic reminds me that I still don’t have a good sense of when to run Tools > Verify & Repair. The built-in help only says:

Use this whenever you feel it is necessary.

Hard to believe that the choice of running what may be a crucial command is supposed to be based on a feeling.

Korm,

Thanks for your reply. I had a hunch that this post might draw out a few of the Devon die-hards, among whom I humbly like to count myself.

To answer your main question: would I be better off with indexing? No, I don’t think so, because I use DT Pro to modify the contents of folders all the time, and there is still no two-way synchronization. So I don’t think that would work. Also, as I tried to suggest, I have some circular structures in my replicant system, which can confuse many command-line backup tools, though I haven’t tried it with Time Machine (which should of course understand hard links, as it relies on them). I look forward to the day when DEVONthink does have two-way live file-system synchronization, but I know it is hard to implement, AND DT Pro does what I need it to do now.

I also wanted to note, amid my rant-ish documentation, that the backup situation now is 1000x better than with the old DT Pro. Those big blob files were absolutely opaque and caused backups to expire very fast. Daily work in DT Pro now churns about 2-3 GB of data, as opposed to 60-100 GB; it was literally impossible for me to back up the old system daily over wifi. So there are continuing advantages, and some real improvement, to having files “in the database.” That, of course, and terabyte drives have become cheaper.

As for recovering files into the folders, you are right that you had better know what you are doing. It makes no sense if DT Pro has lost all reference to the file. But if one has a bunch of “missing files,” as I did, which is to say files that DT still knows about but which were deleted in the file system (in my case because of a crash that occurred while moving items between two databases), THEN it makes perfect sense: put the file back in the place where DEVONthink expects it, and poof, everything is back the way it should be. (I suspect the worst that can happen is that it doesn’t work and creates an “orphaned file” on the next rebuild, but I’ll let a true expert weigh in on that.) Just for reference, I’ll post here an AppleScript that helps use Time Machine in this situation.

The big problem I was having was not the missing files, though. That problem is serious, but it merely led to the big one: the failure of the rebuild command on my database. Nothing could get that to work :frowning:

What I ended up doing was manually rebuilding my database. DT Pro allows two databases to be open at once, so I created a new one and hand-copied folders over, one at a time, with hour-long pauses in between for DT Pro to digest the data. It was through this process that I gained some sense of why the circularly nested groups (#11 above) might have caused problems. At any rate, I hand-copied all the root folders over to the new structure with only a few crashes. After this manual rebuild, I was left with 12,000 duplicates that did not exist in the previous database: these were replicants that the hand-copying method had turned into duplicates (each time a file was copied in, a duplicate was created rather than being mapped as a replicant). To solve this, I used my old “convert duplicate to replicant” script pretty liberally on the Duplicates smart group. I enclose that script below, as it was posted to the forum years ago and I couldn’t find it quickly. Might be useful to someone!

But, finally, to go back to your main question, korm: my hope is that someday DT Pro will have full two-way syncing with the file system. It would be nice if, in the meantime, there were an aliased structure in the package that reflected at least the basic root structure of the DB. Resumable rebuilds, an option for a fuller data-integrity check, more console logging during rebuilds, automatic aliasing of missing files (like that of orphans), and (why not?) Time Machine support would all really help me sleep better at night.

Does anyone else have these problems, or additions to this list? I feel invigorated, in the way that sheer terror puts one on edge!

best,

Eric o


----Point Time Machine to the path of a DT Pro item, by Eric Oberle
----This script queries DEVONthink for the path of the currently selected document.
----It then opens a Finder window pointing at that path, copies the file name to the clipboard,
----and tries to activate Time Machine on that Finder window, so that it shows the DT
----file location inside the package. The author found this useful for recovering "missing files"
----during painful database rebuilds. It is not for the meek or the unknowledgeable,
----or for those without backups. Be very, very careful.

tell application "DEVONthink Pro"
	set cursel to selection
	if cursel is {} then return ----nothing selected
	set z to first item of cursel
	set the_filename to path of z
	----split the full path into the directory (returned with its trailing slash) and the file name
	set {the_path, the_file} to my reverse_truncate(the_filename, "/")
	set the clipboard to the_file
	tell application "Finder"
		activate
		set the_dir to (POSIX file the_path) as alias
		make new Finder window to the_dir
	end tell
	try
		tell application "Time Machine"
			activate
		end tell
	end try
end tell
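
(To use it: select the record of a “missing file” in DEVONthink and run the script. A Finder window opens at the spot inside the package where the file belongs, with the file name already on the clipboard, ready to paste into a Back-In-Time or Time Machine search.)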

on reverse_truncate(this_text, search_string)
	----splits this_text at the LAST occurrence of search_string by reversing the text,
	----splitting at the first occurrence, and reversing the two pieces back
	set inverted to reverse of characters in this_text as string
	set {the_path, the_file} to my truncate_up_to(inverted, search_string)
	set the_path to (reverse of characters in the_path as string)
	set the_file to (reverse of characters in the_file as string)
	return {the_path, the_file}
end reverse_truncate


on truncate_up_to(this_text, search_string)
	----returns {remainder, truncated}: the text from the first search_string onward,
	----and the text before it; if search_string is absent, the whole text counts as truncated
	if this_text contains search_string then
		set save_delims to AppleScript's text item delimiters
		set AppleScript's text item delimiters to search_string
		set item_list to every text item of this_text
		set truncated to item 1 of item_list
		set item_list to rest of item_list
		----the delimiters are still search_string here, so the items are rejoined with it
		set remainder to search_string & (item_list as string)
		set AppleScript's text item delimiters to save_delims
	else
		set truncated to this_text
		set remainder to ""
	end if
	return {remainder, truncated}
end truncate_up_to


---This script turns the duplicates of the selected items into replicants of the same.
---It is only as safe as DT Pro's duplicate detection is accurate... be careful!
---Important: do not use this on .doc files stored in the database; it will turn them into RTFs!

tell application "DEVONthink Pro"
	set cursel to selection
	
	repeat with the_item in cursel
		set the_path to path of the_item
		
		----a cautionary addition for DT Pro 2: if the path is empty, the record is invalid,
		----so if one of its "duplicates" does have a path, delete the bogus record
		----and replicate the duplicate with the path and data to the bogus record's parent
		if the_path is "" then
			set the_dups to duplicates of the_item
			repeat with this_dup in the_dups
				if path of this_dup is not "" then
					replicate record this_dup to (first parent of the_item)
					delete record the_item
					exit repeat
				else
					set label of this_dup to 1 ----flag it; label is an integer index in DT Pro (1 = red by default)
				end if
			end repeat
		else
			----normal case: replicate the selected original into each duplicate's group,
			----then delete the duplicate
			set the_dups to duplicates of the_item
			repeat with this_dup in the_dups
				set the_dup_parent to first parent of this_dup
				log (name of the_dup_parent)
				replicate record the_item to the_dup_parent
				delete record this_dup
			end repeat
		end if
		
	end repeat
end tell
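
(Usage: select the originals, e.g. everything shown in a smart group of duplicates, and run. For each selected item, every duplicate is deleted and the item itself is replicated into the duplicate’s former group, restoring the replicant structure.)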

Yes. This sums up the prevention side of the problem.

In my case, it was a very, very bad feeling that came too late. I’d love to see this automated; I’d be happy with running it at every startup, with a cancel button. That would spare me the scary feeling, and the feeling that maybe I should be having that feeling! I’d just go make some coffee and feel better while I waited!
-Eric

Perhaps with a “Run Verify & Repair” checkbox under Startup in Preferences > General, although that could be icky if multiple databases open at startup.

The documentation says:

By default, DEVONthink Pro Office automatically verifies the database structure every time you open a database, and advises you to run this command when it finds significant errors.

… which seems more reasoned than the statement preceding it:

Use this whenever you feel it is necessary.

But apparently there are other cases where one should feel it’s necessary, beyond the ones where DTPO advises me it is? :wink:

Interesting, that quote. I’ve never been prompted with advice to run V&R, even though I have run V&R and come across some nasty troubles in the past. IMO, if a system runs a utility such as V&R it should announce what it is doing and show the results somewhere. Otherwise, how do we know that V&R is actually happening in the background?

You would not be told to run it, but DEVONthink would report the problem it found and advise you to rebuild your database. The startup checks are not as thorough as Verify & Repair, but they are usually enough to catch the usual problems, e.g. those generated by frequent forced quits.

I doubt most of us can limn the subtle differences between “verify & repair” (the heavy duty explicit command) and simply “verify” (the light-duty variety performed on startup). Perhaps the manual ought to explain what’s going on?

(FWIW, I’ve tested “frequent forced quits” and never been warned about errors by the startup variety of “verify”.)

V2.0.6 will actually just claim to be “Initializing” a database to avoid further confusion.