A new tool for Zotero users

This looks great, though I have a question about how it handles Zotero libraries that link to files stored elsewhere. Everything in my Zotero library is stored in cloud-synced Box.com Finder folders, which are then indexed in DT. As there are no PDF files in the ~/Zotero/storage folders, only .zotero-ft-cache and .zotero-ft-info files, how does the script handle this?

So I’m not familiar with how Box.com works, but if it has a Finder location, you should be just fine. If I’m reading this help article correctly, simply point zowie to the correct location:

zowie -f pdf /Users/GWashington/Box/FolderWhereIKeepTheFiles

That’s actually how the Hazel rule works: Hazel watches the entire folder and all subfolders. When it detects a match (say, because I just added a new journal article to my Zotero database), it passes the full path of that PDF in the variable $1 to zowie, which then runs on just that one file. (Doing it this way would be really inefficient if it were the first time zowie had ever run on a folder!)

Thanks. I’m hitting some other errors with Zowie (tuple index out of range), but can take it up on GitHub.

A quick question on DT <-> Zotero integration: Is there a way for DT to ignore the subfolder layout when indexing Zotero’s storage folder, and to ignore file types that aren’t desired (e.g., only index PDFs)?

If I index the storage folder directly, I get the Zotero link identifier folders and lots of .js files stored as part of online web abstracts. I could use ZotFile, but that has some additional complexity with storing attachments outside of the storage directory, since then I will not be able to sync with my work laptop.

I am wondering the same thing–I created a smart group of my Zotero PDFs as a stopgap.

That is a good question! To my knowledge, there is no way to directly control DEVONthink’s indexing mechanism and get it to ignore certain file types. However, one can resort to using a smart folder. Here’s how I do it, as an example to get you started.

I have a top-level folder in my database called Sources. Within that, I have a folder called Zotero and within that, I put the actual indexed folder. (The reason for this indirection is purely aesthetic: I index the storage folder inside my Zotero database folder, since that’s where Zotero stores PDFs, but I find the name storage confusing in the context of the rest of my folder structure in DEVONthink. The indexed folder cannot be renamed in DEVONthink because it matches whatever’s on disk, so the only option left is to create another folder in DEVONthink and call that one whatever I want.)

Within the folder Sources, I put a smart folder that searches inside the storage folder. Here’s its definition:

I’m not convinced this particular smart folder definition is ideal; for one thing, if I ever put another file type into my Zotero database, I have to remember to update the smart folder definition to include it. But it’s good enough for now, and it avoids all those css, js, and other files that may end up in a Zotero folder. It produces a very clean list of articles:

Hope this helps!
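
For readers who want to see what this kind of filtering amounts to outside DEVONthink, here is a minimal Python sketch of the same idea: walk the storage folder and keep only files with wanted extensions. The function name `list_articles` and the default extension set are illustrative assumptions; they are not part of DEVONthink, Zotero, or Zowie.

```python
from pathlib import Path


def list_articles(storage_dir, extensions=(".pdf",)):
    """Return sorted paths under storage_dir whose suffix is in extensions.

    Hypothetical helper: it mimics a smart folder that shows only
    articles and skips the .js, .css, and cache files that Zotero
    keeps alongside them.
    """
    wanted = {ext.lower() for ext in extensions}
    root = Path(storage_dir).expanduser()
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in wanted
    )
```

Calling `list_articles("~/Zotero/storage")` would list the PDFs in the default storage location mentioned earlier in this thread. Adding another file type means passing a longer `extensions` tuple, which is the same maintenance burden noted above for the smart folder definition.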

No, you can’t control DEVONthink’s indexing or exclude items from being indexed.

Thanks! Good to understand how you have that set up.

For anyone using this approach, I wanted to alert you that I recently updated the embedded AppleScript mentioned above in order to deal with some problems with special characters in file names. The latest version is available on GitHub and here it is for convenience:

# The following function is based on code posted by user "mb21" on
# 2016-06-26 at https://stackoverflow.com/a/38042023/743730

on substituted(search_string, replacement_string, this_text)
	set AppleScript's text item delimiters to the search_string
	set the item_list to every text item of this_text
	set AppleScript's text item delimiters to the replacement_string
	set this_text to the item_list as string
	set AppleScript's text item delimiters to ""
	return this_text
end substituted

on performSmartRule(selectedRecords)
	repeat with _record in selectedRecords
		# In my environment, Zotero takes time to upload a newly-added
		# PDF to the cloud. The following delay is needed to give time
		# for the upload to take place, so that when Zowie runs and
		# queries Zotero via the network API, the data will be there.
		delay 30

		set raw_path to the path of the _record

		# Some chars in file names are problematic due to having special
		# meanings in shell command strings.  Need to quote them with 2
		# backslashes, b/c the 1st backslash will be removed when the
		# shell command string is handed to the shell.
		set sanitized_path to substituted("&", "\\\\&", raw_path)

		# Another problem with file names is embedded single quotes. The
		# combination of changing the text delimiter and using the
		# AppleScript "quoted form of" below, seems to do the trick.
		set AppleScript's text item delimiters to "\\\\"
		set result to do shell script ¬
			"/usr/local/bin/zowie -q " & (quoted form of sanitized_path)

		# Display a DEVONthink notification if an error occurred.
		if result is not equal to "" then
			display notification result
		end if
	end repeat
end performSmartRule
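
The quoting problem handled by the AppleScript above comes up in any language that builds a shell command string. As a point of comparison only, here is a small Python sketch (not part of Zowie or the smart rule) showing two common ways around it: quoting the path with `shlex.quote`, or skipping the shell altogether by passing the arguments as a list. The helper names are hypothetical; the `/usr/local/bin/zowie -q` invocation is the one used in the script above.

```python
import shlex
import subprocess

ZOWIE = "/usr/local/bin/zowie"  # same location as in the AppleScript above


def zowie_command_string(pdf_path):
    """Build a shell command string with the path safely quoted."""
    return f"{ZOWIE} -q {shlex.quote(pdf_path)}"


def run_zowie(pdf_path):
    """Run zowie without a shell, so no quoting is needed at all.

    Returns whatever zowie printed. As in the smart rule, non-empty
    output is treated by the caller as something worth reporting.
    """
    completed = subprocess.run(
        [ZOWIE, "-q", pdf_path], capture_output=True, text=True
    )
    return (completed.stdout + completed.stderr).strip()
```

With the list form, characters such as `&` and embedded single quotes reach the program unchanged because no shell ever parses them.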

Thanks for this application! Works great (once I downgraded sidetrack as you suggested on GitHub). I’m trying to make Zotero my central source of wisdom for information sources I find, be it websites, articles or books. This lets it be so!

Welcome @NationalInterest

I’m sure @mhucka appreciates the pat on the back! :)

In case anyone is interested, I’ve released a new version of Zowie (1.2.0) that fixes a software compatibility issue, adds a new feature that will be of interest to DEVONthink users, has faster startup time, and comes with single-file self-contained runnable binaries for macOS 10.15 and higher. See https://mhucka.github.io/zowie/.

Another note for anyone using Zowie for Zotero: I wrote an explanation of my smart rule scheme in a wiki page associated with the GitHub repository.

If I understand it correctly, Zowie is currently capable of embedding Zotero select links into files in four different metadata locations, depending on the chosen method (findercomment, pdfproducer, pdfsubject, and wherefrom).

Have you considered adding a feature that would enable Zowie to embed other metadata from Zotero into those four locations? So instead of writing Zotero select links, maybe it takes the URL from Zotero and writes that to the file’s com.apple.metadata:kMDItemWhereFroms extended attribute.

I know Zotero “cleans” the kMDItemWhereFroms xattr on import. But I would prefer to keep it – or better yet, replace the “dirty” Where From: field with the proper URL displayed in Zotero. Sometimes I do this manually in DEVONthink, but I would prefer to automate it. So far I’ve been unsuccessful.

And if you don’t plan to support this feature, @mhucka, could you perhaps point me in the right direction to implementing this myself?
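
For anyone who wants to experiment with this by hand, here is a rough Python sketch of writing a URL into a file’s `com.apple.metadata:kMDItemWhereFroms` attribute. It rests on two assumptions that should be verified on your own system: that the attribute holds a binary property list containing an array of strings, and that the macOS `xattr` command-line tool is available with its `-w` (write) and `-x` (hexadecimal value) options. The function names are made up for illustration, and none of this is Zowie code.

```python
import plistlib
import subprocess

WHERE_FROMS = "com.apple.metadata:kMDItemWhereFroms"


def encode_where_froms(urls):
    """Encode a list of URL strings as a binary property list."""
    return plistlib.dumps(list(urls), fmt=plistlib.FMT_BINARY)


def write_where_froms(file_path, urls):
    """Replace the file's "Where from" metadata (macOS only).

    Assumes the stock macOS `xattr` tool: -w writes an attribute and
    -x says the value is given as hexadecimal.
    """
    hex_value = encode_where_froms(urls).hex()
    subprocess.run(
        ["xattr", "-wx", WHERE_FROMS, hex_value, str(file_path)],
        check=True,
    )
```

Only `encode_where_froms` is portable; `write_where_froms` will fail on systems without the `xattr` tool. Trying it on a copy of a file first is the cautious route.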

@aaaaaaaaaaaaaaaa Thanks for your interest. Some questions:

  1. Can you clarify what you mean by “the URL from Zotero” in the sentence “maybe it takes the URL from Zotero”? (A concrete example might be useful.)
  2. Regarding your statement that Zotero “cleans” the kMDItemWhereFroms xattr on import:

Sorry to be dense, but I can’t quite follow this part. Can you please elaborate on what is this “cleaning” of extended attributes being done by Zotero in the sentence “I know Zotero cleans …”? It sounds like something I didn’t know Zotero was doing.

More generally, I wouldn’t be opposed to adding an option to Zowie to let it write something other than the Zotero select link. If the result still only required writing a single item of data into a file, then architecturally, the change to the software would be small – it just means letting the user pick what is to be written, and figuring out how to get the data from Zotero somehow. OTOH, if you had in mind to make Zowie write multiple values into each file at once, that would be a bit more work. Right now, it’s only designed to write one thing at a time.

A complication when it comes to Zotero is that some of the data fields are not present for all bibliography types (e.g., not everything has a volume number), and some data depends on the user having certain extensions installed (like BibTeX citekeys, which needs Better BibTeX installed and configured in a certain way). So, it’s important to keep that in mind.

For a single value that is available for all bib types, it would mean adding a command-line flag (starting with the interface definition in __main__.py) to let the user specify what is desired, then changing some of the middle code that passes the information down to lower levels, then changing the code that works with the Zotero data (probably just extending the code in zotero.py that currently constructs the Zotero select link), and finally, updating the documentation.

TL;DR if you just want a way to tell Zowie to switch between writing the select link and writing a different URL, that should be pretty easy and I could probably do that soon-ish (not immediately, but more quickly than other changes). If you wanted to let it write the user’s choice of other Zotero data items instead of the select link, that would take longer but still doable. If you wanted to make it write several different pieces of data at one time into every file, that’s probably beyond the scope of Zowie.
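
Purely as an illustration of the “single value” case, here is a minimal Python sketch of the choice being described: given one Zotero item, return either its select link or its URL field. The dictionary layout (a `key` plus a `data` mapping with a `url` entry) and the select-link form for a personal library are assumptions, and the function and parameter names are hypothetical. This is not how the code in zotero.py is actually organized.

```python
def value_to_write(item, what="select-link"):
    """Return the string to store in a file, or None if unavailable.

    `item` is assumed to look like {"key": "...", "data": {"url": "..."}}.
    """
    if what == "select-link":
        # Assumed link form for an item in a personal (non-group) library.
        return f"zotero://select/library/items/{item['key']}"
    if what == "url":
        # Not every Zotero entry has a URL, so this may yield None.
        return item.get("data", {}).get("url") or None
    raise ValueError(f"unsupported choice: {what!r}")
```

The `None` return matters in practice: a select link can always be built from the item key, whereas a field such as the URL may simply be empty, and the caller has to decide what to do then.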

At some point you need a “buy me a coffee” link…unless I missed it somewhere.

Here’s an image that I hope clarifies what I mean - it’s the URL field in a Zotero entry.

I was under the impression that Zotero removes junk metadata in files downloaded from the web (e.g., the name of the admin who converted the file to PDF and added their name for some reason). But maybe I’m misinformed. Whether or not this feature ships with Zotero isn’t so important to my point, which is that I’d like to be able to import additional metadata from Zotero into DEVONthink.

Actually, somebody made a similar request in reference to your tool over on the Zotero discussion board, in the thread “Adding metadata to PDF properties” (Zotero Forums).

Here’s a use case I had in mind: Say I’ve just pulled up a PDF for a scholarly article in my web browser. I use the Zotero Connector to import it into the reference manager, and Zotero retrieves high quality metadata. But the PDF file itself doesn’t carry that metadata. If I use DEVONthink to index the file, usually I’d either add that metadata manually, or perhaps copy the DOI into DT’s custom metadata field so that I can run the excellent script to ‘Download Bibliographic Metadata.’ But even that process is a bit of a hassle. I was hoping that with Zowie (+ some DEVONthink smart rules), I could automate this. Zotero will detect the paper’s DOI. Then a tool like Zowie would copy the DOI into the Finder Comment extended attribute. Then I could have a DEVONthink smart rule copy that Finder Comment into custom metadata and follow it up with the ‘Download Bibliographic Metadata’ (or a slightly modified version that doesn’t change the file name).

Importing the DOI is just one use case. But it would be amazing if Zowie were configurable enough that I could easily copy any metadata field from Zotero into the file’s Finder Comment. Or use it to copy multiple metadata properties from Zotero into the Finder Comment. To use the screenshot I shared above as an example, maybe I want my Finder Comment to say:

Publication: Postmodern Culture
Volume 26, Issue 2
Date: 2016
DOI: 10.1353/pmc.2016.0005
ISSN: 1053-1920
Library Catalog: Project MUSE
Publisher: Johns Hopkins University Press
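
A comment in this shape could be assembled from a Zotero record along the following lines. This is a Python sketch only: the field names (`publicationTitle`, `volume`, `issue`, `date`, `DOI`, `ISSN`, `libraryCatalog`, `publisher`) are assumed to match Zotero’s item data, and `finder_comment` is a hypothetical helper, not an existing Zowie feature. Fields missing from a record are skipped.

```python
# (label shown in the comment, assumed Zotero field name)
LABELED_FIELDS = [
    ("Date", "date"),
    ("DOI", "DOI"),
    ("ISSN", "ISSN"),
    ("Library Catalog", "libraryCatalog"),
    ("Publisher", "publisher"),
]


def finder_comment(data):
    """Format selected Zotero fields as a multi-line Finder comment."""
    lines = []
    if data.get("publicationTitle"):
        lines.append(f"Publication: {data['publicationTitle']}")

    # Volume and issue share one line, with either part optional.
    parts = []
    if data.get("volume"):
        parts.append(f"Volume {data['volume']}")
    if data.get("issue"):
        parts.append(f"Issue {data['issue']}")
    if parts:
        lines.append(", ".join(parts))

    for label, field in LABELED_FIELDS:
        if data.get(field):
            lines.append(f"{label}: {data[field]}")
    return "\n".join(lines)
```

Keeping one `Label: value` pair per line would also make it straightforward for a DEVONthink smart rule to pick out a single value, such as the DOI, later on.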

Thanks for the explanations. The first one (the URL) shouldn’t be too hard, although it’s worth noting that not all papers’ Zotero entries will have a URL (or at least, in my experience, it’s a bit spotty), so unlike writing Zotero select links, Zowie may often fail to be able to write anything.

The second topic is getting a bit far afield for Zowie, but I had the same desire at one point, and started writing Zoinks for exactly the purpose of asking Zotero for data and writing the results into DT metadata fields using smart rules. That effort stalled, unfortunately, due to lack of time, and I also found that for my needs, being able to jump from a PDF in DEVONthink to its Zotero entry was enough (because from there, I can get other info).

Since my speed lately is only slightly faster than the speed at which pitch drips in the pitch drop experiment, here are some pointers to potential alternatives in case you or anyone else wants to explore them:

If you happen to use Alfred, you may be able to use ZotHero to achieve what you’re looking for. Someone pointed me to it after I mentioned wanting to write Zoinks.

If you happen to use Better BibTeX (or maybe even if you don’t), and can write some basic JavaScript, you may also be able to use its JSON-RPC interface to get the Zotero data. In fact, I’ve asked the BBT developer about adding an endpoint that would allow looking up a record based on its attachment key. If it could do that, then I could vastly speed up Zowie by avoiding network calls to the Zotero API servers, and it would also make writing Zoinks a piece of cake.

Lastly, I looked through the ZotHero code, and what it’s doing is reading Zotero’s local SQLite database directly. Every major programming language has an interface package for SQLite, so in principle, if you didn’t know JavaScript or Python and didn’t use BBT, but you knew a little bit of some other language, you could probably write something that reads the local Zotero database to get the necessary info. I don’t know what the structure of the database is (you could probably get some clues from the ZotHero code) but it’s probably not too hard to work out what the tables are.
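
As a starting point for that kind of exploration, here is a hedged Python sketch using the standard `sqlite3` module. The table and column names in the query (`items`, `itemData`, `itemDataValues`, `fields`) are an assumption about how the Zotero database is laid out and should be checked against a real database before relying on them. The function names are hypothetical.

```python
import sqlite3
from pathlib import Path

# Assumed schema: one row per (item, field) pair, with the field names
# and the values stored in separate lookup tables.
QUERY = """
SELECT fields.fieldName, itemDataValues.value
FROM items
JOIN itemData ON itemData.itemID = items.itemID
JOIN fields ON fields.fieldID = itemData.fieldID
JOIN itemDataValues ON itemDataValues.valueID = itemData.valueID
WHERE items.key = ?
"""


def open_read_only(db_path):
    """Open a SQLite file read-only so the database cannot be modified."""
    uri = Path(db_path).expanduser().resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def item_fields(connection, item_key):
    """Return {fieldName: value} for the item with the given key."""
    return dict(connection.execute(QUERY, (item_key,)).fetchall())
```

Zotero may hold a lock on the file while it is running, so working on a copy of the database is the cautious choice. Listing the tables with `SELECT name FROM sqlite_master` is a quick way to check the schema assumption.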

Hi @mhucka, I’m curious what your approach is with regard to annotating and commenting on your documents. Do you currently have a workflow with Zotero that keeps them attached to your PDFs?

From what I know, Zotero does not automatically save markups unless you do it manually.

Do you then only do annotations in DEVONthink?

Whoops, sorry this went unanswered for so long!

My approach has been to do all annotations in DEVONthink. I know the recent version 6 of Zotero is supposed to have much improved PDF annotation facilities, but I haven’t even tried them because DEVONthink’s facilities are so good and because sticking to a single common workflow for all PDFs brings benefits.

The software configuration is:

  1. Configure Zotero to use stored (local) file storage (not linked attachments)
  2. Index Zotero’s storage folder from within a projects database in DEVONthink
  3. Sync my projects database with DEVONthink To Go on an iPad

Then for reading & annotating, the workflow is:

  1. Using an Apple Pencil, highlight, mark up, write notes by hand on the document in DTTG.
  2. The results get sync’ed automatically with DEVONthink on my Mac, and since DEVONthink automatically synchronizes changes in the contents of indexed folders on disk, the annotations also become visible in Zotero after a short time. (But I don’t do anything further with them in Zotero.)
  3. Back in DEVONthink on the Mac, I make use of DEVONthink’s annotation documents facility, with a custom template for “reading notes”. My goal is, after reading and marking up something, to write a summary and reflections about it in a reading notes annotation document.

Tip for indexing the Zotero storage folder in DEVONthink: a smart folder in DEVONthink can be made to search this indexed folder and provide a nicer view. I described my approach here.
