Adding and verifying checksums to detect inapparent data loss

I’m just playing with an idea, and would welcome any input. Reading through this forum, my feeling is that there are two types of data loss - apparent and inapparent. Apparent data loss would be a catastrophic failure, e.g. the database will not open, is empty, fails to verify etc.; inapparent data loss is loss that is noticed only at some random point in time, when the user wants to access a specific file - see here for an example. There are a small number of other reports of data loss in this category dotted around the forum. The number is small enough not to really worry me.

Whilst backups are an easy solution to apparent data loss, they are less helpful in the case of inapparent loss. The backup may already have been deleted by the time the data loss is noticed, or the loss may affect only a small selection of files, which can be complex to restore from a backup of the database.

I wondered whether a checksum could help detect otherwise inapparent changes to files; it would not, of course, make loss of the file as a whole apparent.

The following script - run from a smart rule that limits it e.g. to locked PDFs - either adds a checksum to custom metadata or compares the stored checksum with the current checksum of the file. Run e.g. once a week, this could help detect otherwise inapparent changes to data. It requires a custom single-line text metadata field called “SHA1” with identifier “sha1”. It could easily be changed to run without user intervention (i.e. without a warning dialog, only tagging failed files). Edit: the dialog is now self-dismissing, so the script will run without user intervention. Files failing the check are tagged “Checksum”.

The script isn’t madly fast - taking approx 4.5 s for 246 PDFs with a total size of 1.8 GB.

I’m not posting the script to advocate using it - it’s a thought experiment, and I look forward to feedback and ideas. Clearly this script would be no use for files which are regularly altered - most of mine are not and are, instead, marked as locked.

property pTag : "Checksum"

on performSmartRule(theRecords)
	tell application id "DNtp"
		try
			set theCount to 0
			show progress indicator "Processing Checksums" cancel button 1 steps count of theRecords
			repeat with theRecord in theRecords
				step progress indicator
				-- if available get saved Checksum from record
				set md to custom meta data of theRecord
				try
					set OldCheck to mdsha1 of md
				on error
					set OldCheck to ""
				end try
				-- get the current Checksum of the record
				set thePath to path of theRecord as string
				set CheckSum to do shell script "/usr/bin/openssl sha1 " & quoted form of thePath
				set CheckSum to texts ((offset of "= " in CheckSum) + 2) thru -1 of CheckSum
				-- set the Checksum if none previously set - otherwise compare previous and current, warn and add "Checksum" tag if discrepancy
				if OldCheck is equal to "" then
					add custom meta data CheckSum for "SHA1" to theRecord
				else if OldCheck is not equal to CheckSum then
					set theDialog to "Record " & name of theRecord & "
Has Changed! Reset Checksum?"
					set AskUser to display alert "Checksum Error!" message theDialog as critical buttons {"Fail", "Reset"} default button "Fail" giving up after 30
					if button returned of AskUser is "Reset" then
						add custom meta data CheckSum for "SHA1" to theRecord
						-- tags routine adapted from suavito, posted May 2020 https://discourse.devontechnologies.com/t/applescript-to-delete-tags/9583/17
						-- remove "Checksum" tag if Checksum is reset
						set theNewList to {}
						set theList to tags of theRecord
						repeat with n from 1 to count of theList
							set theNewItem to item n of theList
							if theNewItem is not "Checksum" then set theNewList to theNewList & theNewItem
						end repeat
						set tags of theRecord to theNewList
					else
						set theCount to theCount + 1
						set tags of theRecord to tags of theRecord & pTag
					end if
				end if
			end repeat
			display notification (theCount as string) & " Records Failed" with title "Processed Checksums"
		on error error_message number error_number
			hide progress indicator
			if the error_number is not -128 then display alert "DEVONthink" message error_message as warning
		end try
		hide progress indicator
	end tell
end performSmartRule

Edit: changed dialog to be self-dismissing, added notification
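
One detail of the script above may be worth hardening. It extracts the digest by looking for the first "= " in the openssl output, and that output also echoes the file path, so a path that itself contains "= " would presumably throw the extraction off. A minimal sketch of one way around this is to feed the file on standard input, so that no path appears in the output. The handler name sha1OfFile is made up for illustration and is not part of the original script.

```applescript
-- Hypothetical helper (not in the original script):
-- returns the SHA-1 digest of the file at thePath as a hex string.
on sha1OfFile(thePath)
	-- Reading the file via stdin keeps the path out of openssl's output,
	-- so the only "= " in the result is the one directly before the digest.
	set theOutput to do shell script "/usr/bin/openssl sha1 < " & quoted form of thePath
	return text ((offset of "= " in theOutput) + 2) thru -1 of theOutput
end sha1OfFile
```

Inside the tell block it would be called as `set CheckSum to my sha1OfFile(thePath)`, in place of the two lines that currently build CheckSum.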


Development would have to assess the feasibility of such an approach.

I wouldn’t want to trouble them :blush: I was kind of expecting @chrillek to chime in with a single line of JS doing the same, and hoping @pete31 and yourself might notice any inconsistencies in my script :wink:

I’ve actually applied it across my databases now, using an appropriate smart rule; it takes less than 3 minutes for 10,000 files. I’ve added a NoCheck tag for the small number of files I edit only occasionally (which are locked by the locking rule and would otherwise be picked up by my script). I’ll report back if the script ends up eating the lizard or something else bad happens.

PS @BLUEFROG you recently showed me how to add a line break to a dialog without actually “physically” putting a line break in the code; I failed to make a note and can’t remember. Would you tell me again, pls?

An aside, because I didn’t previously mention it: the script checks the integrity of the file; not its location, metadata or even existence.
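
As the script stands, a missing file would make the openssl call fail, and it appears that this would end the whole run via the outer error handler rather than flag the individual record. If existence should be checked too, a separate test could run before hashing. The following is only a sketch: the handler name fileExists is hypothetical, and it relies on the standard /bin/test utility rather than on anything DEVONthink provides.

```applescript
-- Hypothetical helper (not in the original script):
-- returns true if something exists at the given POSIX path, false otherwise.
on fileExists(thePath)
	try
		-- "test -e" exits with a non-zero status when the path is missing,
		-- which "do shell script" reports as an AppleScript error.
		do shell script "/bin/test -e " & quoted form of thePath
		return true
	on error
		return false
	end try
end fileExists
```

In the loop, a record whose path fails this check could be tagged and skipped, e.g. `if not (my fileExists(thePath)) then ...`, before the checksum is computed.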

Hmm… I don’t recall the conversation at the moment. Are you referring to creating an AppleScript dialog and setting the prompt?

display dialog "this is the text on the first line" & "
" & "this is the text on the second line"

vs

display dialog "this is the text on the first line" & JimsMagicCommandForALineBreak & "this is the text on the second line"

return is what I suspect you’re after.

display dialog "This is line one." & return & "This is line two"


return is what I’m after. It’s the simple things… thanks Jim :slight_smile:

No problem :slight_smile:

Is checksum verification something that we might see in a future DT release? What does the DT team think about it?

Checksumming is not completely without pitfalls. I only perform it on files which are locked, which reduces the probability that files I routinely change are constantly flagged. I also exclude files with a nocheck tag; this lets me mark files which I change only occasionally but which would otherwise be locked by my locking rule. How useful a checksum routine is probably depends on the type of data a user produces and how they interact with that data.
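
The post does not say whether the tag exclusion lives in the smart rule's criteria or in the script. If it were done in the script, it might look something like the sketch below. The handler name shouldVerify is hypothetical; the tag name is taken from the earlier post.

```applescript
-- Hypothetical helper (not in the original script):
-- returns true unless the record's tag list contains the skip tag.
on shouldVerify(theTags, skipTag)
	-- AppleScript text comparisons ignore case by default,
	-- so "NoCheck" and "nocheck" are treated as the same tag here.
	return theTags does not contain skipTag
end shouldVerify
```

Inside the repeat loop this would be used as `if my shouldVerify(tags of theRecord, "NoCheck") then ...` around the checksum logic.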

I’ve actually updated my script a little along the way; here is the current version, which includes some improvements and excludes rtfd files, as they are a file bundle and I can’t create a checksum using the methods I use for all other files:

property pTag : "Checksum"

on performSmartRule(theRecords)
	tell application id "DNtp"
		try
			set theCount to 0
			show progress indicator "Processing Checksums" cancel button 1 steps count of theRecords
			repeat with theRecord in theRecords
				if type of theRecord is not rtfd then
					if cancelled progress then error number -128
					step progress indicator (name of theRecord) as string
					-- if available get saved Checksum from record
					set md to custom meta data of theRecord
					try
						set OldCheck to mdsha1 of md
					on error
						set OldCheck to ""
					end try
					-- get the current Checksum of the record
					set thePath to path of theRecord as string
					set CheckSum to do shell script "/usr/bin/openssl sha1 " & quoted form of thePath
					set CheckSum to texts ((offset of "= " in CheckSum) + 2) thru -1 of CheckSum
					-- set the Checksum if none previously set - otherwise compare previous and current, warn and add "Checksum" tag if discrepancy
					if OldCheck is equal to "" then
						add custom meta data CheckSum for "SHA1" to theRecord
					else if OldCheck is not equal to CheckSum then
						set theDialog to "Record " & name of theRecord & return & "Has Changed! Reset Checksum?"
						set AskUser to display alert "Checksum Error!" message theDialog as critical buttons {"Fail", "Reset"} default button "Fail" giving up after 30
						if button returned of AskUser is "Reset" then
							add custom meta data CheckSum for "SHA1" to theRecord
							-- tags routine adapted from suavito, posted May 2020 https://discourse.devontechnologies.com/t/applescript-to-delete-tags/9583/17
							-- remove "Checksum" tag if Checksum is reset
							set theNewList to {}
							set theList to tags of theRecord
							repeat with n from 1 to count of theList
								set theNewItem to item n of theList
								if theNewItem is not pTag then set theNewList to theNewList & theNewItem
							end repeat
							set tags of theRecord to theNewList
						else
							set theCount to theCount + 1
							set tags of theRecord to tags of theRecord & pTag
						end if
					end if
				end if
			end repeat
			display notification (theCount as string) & " Records Failed" with title "Processed Checksums"
			if theCount > 0 then log message "Checksum Verification" info "Found " & theCount & " Errors!"
		on error error_message number error_number
			hide progress indicator
			if the error_number is not -128 then display alert "DEVONthink" message error_message as warning
		end try
		hide progress indicator
	end tell
end performSmartRule

It is run by the following smart rule:

As originally posted, it requires a custom single-line text metadata field called “SHA1” with identifier “sha1”.
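
On the rtfd exclusion mentioned above: one conceivable way to cover file bundles is to hash every file inside the bundle and then hash the sorted list of those digests. This is a sketch only and has not been tried against DEVONthink. It assumes that the path of an rtfd record points at the bundle directory, and the handler name sha1OfBundle is made up for illustration.

```applescript
-- Hypothetical helper (not in the original script):
-- returns one SHA-1 digest covering all files inside a bundle directory.
on sha1OfBundle(bundlePath)
	-- 1. cd into the bundle so the per-file lines contain relative paths only
	-- 2. hash each regular file, 3. sort the lines for a stable order,
	-- 4. hash the sorted list itself (read from stdin, so no path in the output)
	set theCommand to "cd " & quoted form of bundlePath & ¬
		" && /usr/bin/find . -type f -exec /usr/bin/openssl sha1 {} + " & ¬
		"| LC_ALL=C /usr/bin/sort | /usr/bin/openssl sha1"
	set theOutput to do shell script theCommand
	return text ((offset of "= " in theOutput) + 2) thru -1 of theOutput
end sha1OfBundle
```

Because the relative file names are part of what gets hashed, renaming a file inside the bundle would change the digest as well as editing its content.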