AppleScript/SHA1/Content Hash

Hi all

I’m writing an AppleScript to export files from DT. After a lot of searching and reading docs I’ve got it more or less working. I’ve also got some Python scripts that further process the DT export. Complicated and over-engineered. Perfect.

Long story short, I was gonna use a DT record’s content hash, since the (AppleScript) documentation says it’s a stored SHA1 hash of the file. But when I run shasum on the same file in a Mac terminal and compare the two, they’re different.
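For reference, `shasum` with no `-a` option computes SHA-1 over a file's raw bytes. A minimal Python sketch of that same computation is below (the function name `file_sha1` is just for illustration). As the replies further down explain, DEVONthink's content hash is not computed this way, so the two values shouldn't be expected to match.

```python
import hashlib


def file_sha1(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of a regular file's bytes, as `shasum` (default -a 1) does."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large scans/PDFs never need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    import sys

    for p in sys.argv[1:]:
        # Same "digest  path" layout that shasum prints.
        print(f"{file_sha1(p)}  {p}")
```

For a regular file, `python3 file_sha1.py clipping.pdf` should print the same digest as `shasum clipping.pdf`.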

I’m sure there’s some straightforward thing I’m completely missing so can anyone shed some light on this?

Welcome @Mrmagpie

I’m curious what you’re using the checksums for - validating the export has been done correctly?

Basically, I’m building a website and I’m using DT to manage my files. I could have just used directories, but I wanted DT features such as replication: some files will be placed in multiple folders, so rather than having to remember where I put them, I can easily see them there. There are some other things that make DT a good choice too. Each folder (group) is a collection of files plus some TSVs that contain metadata about those files and the group itself. Each file has a line in the TSV, and one column is the content hash.

Rather than relying on filenames (though I can if this proves impossible), since they can be changed accidentally, I was going to use each file’s content hash to match files to their rows. One of my scripts adds files to the TSV and stores the content hash, and in Python I was gonna generate a file’s hash and match it to a row in the TSV. But as a test I generated a SHA1 in Terminal, and it’s different from what AppleScript/DT has stored.
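A minimal sketch of that matching step, assuming each TSV has a header row and a column named `sha1` (a hypothetical name; use whatever the real column is called) holding a plain SHA-1 of each file's bytes. It only works if the stored value and the value computed in Python come from the same algorithm, which is exactly what breaks when the stored value is DT's content hash.

```python
import csv
import hashlib
import io


def sha1_bytes(data: bytes) -> str:
    """Plain SHA-1 of the raw bytes (same digest `shasum` prints)."""
    return hashlib.sha1(data).hexdigest()


def index_rows_by_hash(tsv_text: str, hash_column: str = "sha1") -> dict:
    """Map hash -> row dict for a tab-separated file with a header row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    index = {}
    for row in reader:
        # Normalise so stray whitespace or uppercase hex still matches.
        key = row[hash_column].strip().lower()
        if key in index:
            # Two rows with identical content can't be told apart by hash alone.
            raise ValueError(f"duplicate hash {key!r} in TSV")
        index[key] = row
    return index


def match_file(data: bytes, index: dict):
    """Return the TSV row whose hash matches these bytes, or None."""
    return index.get(sha1_bytes(data))
```

In practice you would read the TSV with `open(path, encoding="utf-8", newline="").read()` and each exported file with `open(file_path, "rb").read()`. Raising on duplicates is a deliberate choice: a repeated hash within one group's TSV means two rows point at identical bytes, and silently keeping one would hide that.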

I assume I’m missing something quite obvious, tbh, so it may well just be a dead end.

:+1:

No, I’m seeing the same thing… when I apply the following script to an item, the result I receive is a 40-digit hexadecimal number which looks like an SHA1 hash, but which is different to what is returned by, say, ObjectiveSee’s WhatsYourSign or openssl when calculating the SHA1 hash.

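-- Return DEVONthink's stored content hash for the first selected record
tell application id "DNtp"
	set theRecords to the selection
	if theRecords is {} then error "Please select a record first."
	set theRecord to first item of theRecords
	return content hash of theRecord
end tell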

Perhaps @cgrunenberg can explain the discrepancy?

To make a long story short: the hashes aren’t comparable. For example, DEVONthink’s hash also supports document packages. But it is indeed based on SHA-1.

Such as RTFD, for example?

That’s right.

That’s clever… thanks :slight_smile:

Huh, OK, well at least I’m not going insane. Thanks all!

If I understand you correctly, you build a website, presumably using HTML, JavaScript and CSS files. These files must sit in a folder structure on your machine that the webserver understands.

Personal opinions ahead

Instead of putting the files directly where they belong (or in a CMS that knows where to put them), you store them in DT, set up a mapping that enables you to export the files to their proper places, and on top of that try to use a hash value instead of the file names to make sure you have a bijective mapping. That requires recalculating the hash every time the content changes, and consequently updating the table. And all that because you want to replicate files?

It’s all up to you, of course. But aren’t there easier ways to shoot oneself in the foot (like foot, gun, trigger)? If you really need replication (i.e. duplicate content, which Google frowns upon in its SEO algorithm), you could simply use links in the filesystem.
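As a rough illustration of the "links in the filesystem" alternative, here is a minimal Python sketch that keeps one canonical copy of a file and places relative symlinks to it in several folders. The function name `link_into` and the folder layout are hypothetical; it refuses to overwrite a real file and only replaces an existing symlink of the same name.

```python
import os
from pathlib import Path


def link_into(canonical: Path, folders) -> list:
    """Create a relative symlink to `canonical` inside each folder; return the link paths."""
    canonical = Path(canonical)
    links = []
    for folder in folders:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        link = folder / canonical.name
        if link.is_symlink():
            link.unlink()  # replace an old (possibly broken) link
        elif link.exists():
            raise FileExistsError(f"{link} is a real file, not a link")
        # Relative target, so the whole tree can be moved or synced together.
        target = os.path.relpath(canonical, start=folder)
        link.symlink_to(target)
        links.append(link)
    return links
```

Whether symlinks survive the trip to the web server depends on how the files are uploaded and on the server's configuration, so that is worth checking before relying on them.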

Websites are complicated enough already. Is it really a good idea to force DT into the role of a CMS, given that it doesn’t have anything even close to an HTML or CSS editor…

Ha, it’s not orthodox, I know.

It’s a very simple website really, but it will just have loads and loads of files. It’s a site about old Dublin pubs, like 1800s old, and I suppose once that’s exhausted I’ll move forward. Each pub is a group in DT and contains newspaper clippings and other images or documents. Each group has a TSV that contains the street address etc., and another TSV with each file listed. Some clippings reference several pubs. So I’m using DT as a research assistant and CMS. There’s a bit more to it than that, but that’s the gist.

DT doesn’t have to export to any specific folder structure. It just exports my main group, which contains all the data. My Python scripts will update my DB and copy files to the correct destinations. As I said, I can use filenames to match a file to its metadata, but I’d rather use something immutable, and when I saw that DT generated a SHA1 of a file I thought that would be ideal, until nothing matched.

And yeah, I’m not using DT to write any HTML or CSS! I’m the only one adding content, and as it balloons I didn’t want to be messing with folders etc. I’m writing the entire site from scratch because I’m mad, and since it’s just a side project I’m using it as an excuse to play around with some languages and frameworks I’m not familiar with.

You could use an SHA1 hash which you generate yourself and store in custom metadata (assuming you are using either DT3 Pro or Server, that is - DT3 Vanilla doesn’t offer custom metadata). Note, though, that document packages (such as rtfd) don’t necessarily play nice with hashing algorithms.
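A minimal Python sketch of generating such a hash yourself. For a regular file it is plain SHA-1 and matches `shasum`. For a document package (a folder such as an `.rtfd`), it uses a hypothetical scheme invented for this example (not DEVONthink's algorithm, and it will not reproduce DT's content hash): every file inside is hashed in a fixed order, each prefixed by its relative path and size. Storing the result in DT's custom metadata would be a separate AppleScript step, not shown here.

```python
import hashlib
from pathlib import Path


def _update_with_file(h, path: Path, chunk_size: int = 1 << 20) -> None:
    """Feed a file's bytes into hash object `h` in chunks."""
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)


def item_sha1(path) -> str:
    """SHA-1 of a regular file, or of a package folder under a made-up scheme."""
    path = Path(path)
    h = hashlib.sha1()
    if path.is_file():
        # Plain file: identical to `shasum` output.
        _update_with_file(h, path)
        return h.hexdigest()
    if not path.is_dir():
        raise FileNotFoundError(path)
    # Package: collect regular files (skipping Finder's .DS_Store), sorted by
    # relative path so the digest doesn't depend on directory listing order.
    files = sorted(
        (p for p in path.rglob("*") if p.is_file() and p.name != ".DS_Store"),
        key=lambda p: p.relative_to(path).as_posix(),
    )
    for p in files:
        rel = p.relative_to(path).as_posix()
        # Path and size prefixes keep file boundaries unambiguous.
        h.update(rel.encode("utf-8") + b"\0")
        h.update(str(p.stat().st_size).encode("ascii") + b"\0")
        _update_with_file(h, p)
    return h.hexdigest()
```

Because the package digest includes relative paths, renaming a file inside the package changes the hash, just as editing its contents does.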