Script or Smart Rule to extract DOI from a PDF and add it to Custom Metadata?

I haven’t found anything for this in the forum, hence starting this new thread. I apologise if my search wasn’t thorough enough.

I would like to extract DOI numbers from the PDFs in my library and place that DOI into that document’s Custom Metadata Field.

The Script Crossref Lookup doesn’t seem to accomplish this. I have tried it with different files and the script found title and author(s), even keywords but no DOI.

Does anyone have an idea for a smart rule, maybe with regex or already have a script?

If the DOI can be identified unequivocally in the text, that should be possible with a regular expression. Can it?

That’s what I thought and I also found some expressions such as

/^10\.\d{4,9}\/[-._;()\/:A-Z0-9]+$/i

but I am not proficient enough to apply this as a smart rule. Does anyone have any idea?
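For what it's worth, here is a minimal sketch of how an expression like the one quoted above could be applied to text already extracted from a PDF. It is plain Python, not a DEVONthink smart rule or script, and the function name `find_first_doi` is made up for illustration. The pattern is a variant of the quoted one: the dot after `10` is matched literally, and the `^`/`$` anchors are dropped so the DOI can be found in the middle of running text rather than only when it is the whole string.

```python
import re

# Variant of the expression quoted above: literal dot after "10",
# no ^/$ anchors, case-insensitive suffix characters.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def find_first_doi(text):
    """Return the first DOI-like string found in text, or None (hypothetical helper)."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # The character class allows '.', ';' and similar, so sentence punctuation
    # right after a DOI gets captured too; trim the most common cases.
    return match.group(0).rstrip(".,;")


if __name__ == "__main__":
    sample = "Available at https://doi.org/10.1000/xyz123. All rights reserved."
    print(find_first_doi(sample))
```

Trimming trailing punctuation is a judgment call: a DOI that really ends in one of those characters would be cut short, so the result is best treated as a candidate to check rather than a guaranteed identifier.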

Smart rules & batch processing provide a DOI placeholder.

I am sorry, I am a bit lost. Where can I find these in Smart Rules and Batch Processing?

In the menu to insert placeholders, e.g. accessible via the button in text fields or via the Insert Placeholder submenu of contextual menus.

Oh, neat! I didn’t know it could be that easy!

A search for “PDF DOI” in the forum might have enlightened you.

Okay, not so easy, apparently. I tried it with the Abstract custom metadata field, which is a multi-line text type. Then I changed the DOI custom metadata field from identifier to single-line text, because apparently smart rules cannot change identifier-type custom metadata. Then, in the smart rule, I set "Change DOI" → Document DOI placeholder. No result: the DOI metadata field stays empty. Any thoughts?

Probably. Thanks!