Fetching DOI-based metadata for my large PDF library

Bear in mind that you were trying to identify the PDF from the file itself, whereas I obtained the PDF through its DOI.

However, I ran the DOI (no. 10.2307/163639) in Zotero on iOS and it didn’t return anything.

I originally found it on the PC version of Zotero. I’ll replicate the process as soon as I get access to that computer again.

Okay, I ran it on that same PC and it didn’t return a result.

The way I found it was by indexing the entirety of an academic journal according to this list of DOIs. That’s how I got this and many other files that I’ve since been unable to ID.

How did you go about doing this?

On the PC app, there’s a button with a picture of a magic wand. I pasted them there.

There is something similar on the Mac version. However, I pasted in 10-15 DOIs and got mostly single items with just the metadata (which are functionally groups), but only four actual PDFs.

You might not be able to replicate downloading that particular PDF because I used a custom script with my school’s login details.

Since I can’t tell which of these DOIs is the correct one (the one I provided you with doesn’t work on doi.org), it seems like we might have reached a dead end.
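For sorting out which DOIs in a list are usable, a rough Python sketch along these lines might help. It only tidies up a DOI, builds the doi.org resolver URL, and asks whether the DOI is registered at all. The function names are made up for illustration, the shape check is a loose heuristic rather than the official DOI syntax, and the lookup assumes the doi.org handle endpoint (`/api/handles/`) answers with a JSON `responseCode` of 1 for known DOIs.

```python
import json
import re
import urllib.error
import urllib.request
from urllib.parse import quote

# Loose heuristic for "looks like a DOI"; not the official syntax.
DOI_SHAPE = re.compile(r"^10\.\d{4,9}/\S+$")
PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Strip whitespace and a leading resolver/doi: prefix, then sanity-check."""
    doi = raw.strip()
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not DOI_SHAPE.match(doi):
        raise ValueError(f"not a plausible DOI: {raw!r}")
    return doi


def resolver_url(raw):
    """Return the doi.org URL a browser would use for this DOI."""
    return "https://doi.org/" + quote(normalize_doi(raw), safe="/")


def is_registered(raw, timeout=10):
    """Ask doi.org whether the DOI exists (assumed endpoint, needs network)."""
    url = "https://doi.org/api/handles/" + quote(normalize_doi(raw), safe="/")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response).get("responseCode") == 1
    except urllib.error.HTTPError as error:
        if error.code == 404:  # unknown DOI
            return False
        raise
```

Looping `is_registered` over the pasted list would at least separate DOIs that doi.org knows about from ones that were mistyped or never registered.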

I don’t think so. I seem to remember that someone asked a similar question here in the last two weeks or so; the thread was titled something like “Limit search results to selection”.

You could use a script to find those values, using Apple’s PDFKit framework, and extract them into custom metadata fields. Those can then be searched. Analogously for your “Experimental methods”.
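As a rough illustration of the "find those values" step only, here is a minimal Python sketch. It assumes the page text has already been pulled out of the PDF (by PDFKit or any other extractor) and that the values in question are received/accepted dates printed in a form like "Received: 12 March 2020". Both the function name and the date pattern are assumptions; real journals format these lines in many different ways.

```python
import re

# Assumed layout: "Received: 12 March 2020" or "Accepted 3 May 2020".
DATE_LINE = re.compile(
    r"\b(received|accepted)\b\s*:?\s*(\d{1,2}\s+[A-Za-z]+\s+\d{4})",
    re.IGNORECASE,
)


def find_dates(text):
    """Return the first received/accepted date found in the text, by label."""
    found = {}
    for label, date in DATE_LINE.findall(text):
        # Keep only the first occurrence of each label.
        found.setdefault(label.lower(), date)
    return found


sample = "Received: 12 March 2020; Accepted 3 May 2020"
print(find_dates(sample))
```

The resulting dictionary could then be written into whatever custom metadata fields you define, which is the part that makes the values searchable.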

What makes you say that? PDF has been an ISO standard for ages. And its derivatives like PDF/X and PDF/A are very much standardized by various bodies. Even the PDF metadata is, AFAIK, standardized. And it does not contain received or accepted dates.

Maybe that statement needs some qualifications. I’ve been heavily using Python on Macs since the late 1990s, for data science work. It works as well on macOS as on Linux (the other platform I do a lot of work on), and much better than on Windows (which I sometimes have to support because some of my students use it).

If you are trying to build a GUI consumer app on macOS, it’s not a great choice. But for its main intended purposes (scripting, data science) it’s a great language.

That said, the demise of Appscript (an abandoned Python-AppleScript interface) has made it definitely suboptimal for driving Mac apps. But once you get data (from an AppleScript call, say), I find Python a great choice for doing something with the data.

There’s a person over at macscripter.net who claims to use Python all the time to drive their Mac apps (sometimes through an XMLRPC server, of all things). Maybe they can give you new ideas.
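I don’t know how that person actually sets it up, but the general idea of an XML-RPC bridge can be sketched with Python’s standard library alone. In this sketch `run_applescript` and `make_server` are names I made up; the only Mac-specific piece is the `osascript` command-line tool, which runs AppleScript source passed with `-e`.

```python
import subprocess
from xmlrpc.server import SimpleXMLRPCServer


def run_applescript(source):
    """Run AppleScript source through osascript (macOS only), return its output."""
    result = subprocess.run(
        ["osascript", "-e", source],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def make_server(handlers, host="127.0.0.1", port=0):
    """Create an XML-RPC server exposing the given {name: function} mapping.

    Port 0 lets the OS pick a free port; binding to 127.0.0.1 keeps the
    server reachable from this machine only.
    """
    server = SimpleXMLRPCServer((host, port), logRequests=False, allow_none=True)
    for name, function in handlers.items():
        server.register_function(function, name)
    return server


if __name__ == "__main__":
    server = make_server({"run_applescript": run_applescript})
    print("listening on port", server.server_address[1])
    server.serve_forever()
```

A client would then use `xmlrpc.client.ServerProxy` against that port and call `run_applescript` with a script string. Exposing arbitrary script execution like this is only sensible on localhost.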

Thanks for the tip. I did a quick search for “python” there, and all that came up were a few simple examples, the most recent from 2019. If I missed something, I’d love to see a pointer.

[Edit: Never mind—searching instead for “XMLRPC” turned up what you mentioned. Python is explicitly referenced, but the puzzling MacScripter search engine doesn’t include those threads in a “python” search.]

Back when Appscript was being maintained, I was able to use it to control OmniOutliner to implement a GTD workflow (something like a simpler version of OmniFocus). It would be painful to do that nowadays, without Appscript.

As someone who has been programming since the 1970s, and likes to experiment with new programming languages, I’ve never understood the allure of AppleScript. Its attempts to be “conversational” just make its behavior too mysterious. I don’t understand why Apple didn’t build better connections to one or more scripting languages like Python (Appscript was such a connection, but from a 3rd party). Then again, I suppose they went a step further with Swift, which has definite Python influences.

The next release will also check the title if no DOI is found in the document’s body.
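To illustrate the idea only (this is not the app’s actual code), a DOI-first lookup with a title fallback could look like the Python sketch below. The name `lookup_key` and the DOI pattern are assumptions; the pattern is a common heuristic and will miss unusual DOIs.

```python
import re

# Heuristic DOI pattern: "10.", a registrant code, a slash, then a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def lookup_key(body_text, title):
    """Prefer a DOI found in the body text; otherwise fall back to the title."""
    match = DOI_PATTERN.search(body_text)
    if match:
        # Trim punctuation that often trails a DOI at the end of a sentence.
        return ("doi", match.group(0).rstrip(".,;)"))
    return ("title", title.strip())


print(lookup_key("Available at https://doi.org/10.2307/163639.", "Some Title"))
print(lookup_key("No identifier printed here.", "  Some Title "))
```

The returned pair says which kind of query to send to the metadata service, so the caller can treat DOI and title lookups differently.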

I feel your pain. Especially since I’m lurking at macscripter – all that code people write to get things done, which should be simple, like a regular expression search. (Larry Wall’s old motto: Make easy things easy and hard things possible).

Well, they moved on to the “Digital Hub”, and they know what the users want. Which is just more bling. The whole automation thing is a shambles, and it’s not getting better with Shortcuts. But I guess the demand is simply not big enough for some solid scripting support, so they would rather build VR glasses.

Much appreciated!