Database integrity: Should I abandon DT as my database?

Despite gripes about the pace of development of DT products, and the oft-expressed impression that users’ requests are not always given high priority, I appreciate the power of the DT applications. However, I have recently discovered a serious deficiency in DT’s process of testing the integrity of the database. The Verify function checks the integrity (checksum) of items stored in the “monolithic” database, but does not test the integrity of items stored outside this structure. This includes, for example, PDF files and Mail messages. (I lost several months’ worth of messages because of this.) Because these items make up the bulk of my database, this is a serious concern indeed.

Annard and Bill were sympathetic (and responsive, as always), and promised a fix “in the future” (I am surprised that a fix was not issued for a problem of this severity). Bill admonished me for not backing up my data; I back up religiously. However, because DT did not alert me that records were missing, more recent backups overwrote the older copies that still contained the missing files.

Now I am in a quandary: Should I abandon DT Pro Office for an application with more reliable data integrity features, such as EagleFiler? It is an appealing proposition, especially with the advanced features in that application’s most recent release.

Actually, the Mail archives are rich text documents and are stored in the monolithic body of the database, so they are subject to the checksum verification process.

Your PDFs are as safe as any other files in the Finder. But Backup Archive also stores a copy of them in a compressed archive file.

Over the past three years I haven’t had to resort to a backup except fairly recently, when I tested an Input Manager plugin that was installed on a user’s computer. It thoroughly bollixed my database.

Personally, I never depend on scheduled backups. Whenever I’ve been making significant changes to a database (including updating Mail archives), at break time I invoke Scripts > Export > Backup Archive. Takes about 5 seconds to start it. When I return from break the database has been verified, optimized and there are internal and external backups. Plus I keep a recent Time Machine backup.

Those external Backup Archive files are the smallest possible compressed and dated copies of the database. I want them because I can move them to an external drive, in case of something like a hard drive failure, a stolen laptop or whatever. On that external drive I’ve got a historical collection of states of my database. Once in a while I copy recent archives of important databases onto a DVD, which I keep at my bank. I’ve never had a house fire, or had all of my computer equipment stolen. But things happen. Belt & Suspenders! In the worst case, I’ve got my important data at the bank. Everything else, including Time Machine backups, might be gone.

Bill,

Once again you sidestep the issue. The issue is not whether or how I back up; the issue is whether DT does its job verifying the database. As far as PDFs go, it obviously does not.

I am also surprised by your claim that Mail is stored in the monolithic database. According to Annard (and a quick look at the package contents confirms this), mail imported directly from Mail (via the Add to DT menu command) is stored outside the monolithic database, and is not included in the checksum verification.

Ah. You are right that importing a Mail message as the Mail file type brings it into the database as an “unknown” file type that cannot be displayed in the database, nor can its text content be indexed for searching. All you can see (or search for) in a database is the name of the message, and any associated metadata in the Info panel. And yes, unknown file types are stored in the internal Files folder.

Is that what you’ve done? If so, the database cannot index, interpret or display the contents of those files. But they can be viewed under Mail. I haven’t experimented, but what happens if you click on such a file and choose Open With… Mail?

But DT Pro Office provides hooks to Mail that insert commands into Mail’s menu bar. Under Mail’s Message menu is the command Add to DEVONthink Pro Office, and under Mail’s Mailbox menu is a command of the same name. So if you select one or more listed messages and invoke the appropriate command under the Message menu, rich text versions of those selected messages, including images and attachments, will be stored in your database. The text of a message will be displayed and is searchable.

DEVONthink Pro provides simpler archiving of Mail to a database as a plain text version of the message, without images or attachments. The scripts that send messages or mailboxes to the open database appear in the global Scripts menu only when Mail is the frontmost application. The database will display the text of the message, and it’s searchable.

No. As you well know from our extensive communication through direct email, I did use the “hook” provided by the Add to DEVONthink Pro Office command under Mail’s Message menu. This resulted in the messages being saved outside the monolithic database (according to Annard). As a result, these messages are not verified via checksum.

And, yet again, you refuse to address the issue of DT’s inability to verify the integrity of files outside the monolithic database. Most glaring is the failure to verify PDFs, which surely represent the bulk of the data stored by most users.

If Annard admitted that this failure is a deficiency (a “bug”, in his words), why has it not been fixed yet?! All Annard is able to promise is:

So we wait for the ever elusive v.2.

Just add this script to the folder ~/Library/Application Support/DEVONthink Pro and run it via the Scripts menu afterwards:


-- Verify file references.scpt
-- Created by Christian Grunenberg on Mon Feb 05 2007.
-- Copyright (c) 2007-2008.

tell application "DEVONthink Pro"
	try
		if not (exists current database) then error "No database is open."
		
		-- Verify indexed stuff (including archived emails)
		show progress indicator "Verifying indexed files" steps -1
		set theItems to contents of current database whose (indexed is yes and path is not "")
		my verifyFileReferences(theItems, "Verifying indexed files")
		
		-- Verify imported images (including PDF documents)
		show progress indicator "Verifying imported images" steps -1
		set theItems to contents of current database whose (type is picture and indexed is no and path is not "")
		my verifyFileReferences(theItems, "Verifying imported images")
		hide progress indicator
		
	on error error_message number error_number
		hide progress indicator
		if the error_number is not -128 then display alert "DEVONthink Pro" message error_message as warning
	end try
end tell

on verifyFileReferences(theItems, theTitle)
	local theItem, theName, theFile, theInfo
	tell application "DEVONthink Pro"
		show progress indicator theTitle steps (count of theItems)
		try
			repeat with theItem in theItems
				set theName to (name of theItem) as string
				step progress indicator theName
				try
					set theFile to (path of theItem) as POSIX file
					set theInfo to info for theFile
					if theInfo is missing value then error
				on error
					log message (path of theItem) as string info theName
				end try
			end repeat
		end try
	end tell
end verifyFileReferences

Thanks for the script, Christian.
To test it, I viewed the package contents of my database, and deleted one of the items in the Files folder. I restarted DEVON, and ran the script: it did not find the error.
In addition, it would be nice if a message were generated at the end to notify the user whether the verify operation was successful.

What kind of file did you remove? And did you really delete it or just move it to the trash? If it’s still in the trash, DEVONthink is usually able to locate it.

I deleted (on separate trials) a PDF file and an RTF (mail message). In both cases I emptied the trash and restarted DT Pro Office. In neither case did the script report an error.

Please try this:

  1. Import a PDF document
  2. Remove the PDF file from the “Files” folder inside the database package and empty the trash
  3. Run the script

This should open the Log panel and log the missing file (same applies to imported emails).

I tried this in a new database, and it works properly there. Thanks much!