Search and set ISBN as intelligent rule

Hey there,
I have a selection of PDFs (books and papers) that have an ISBN or a DOI in their text.
I would like to have an intelligent rule that finds the ISBN and puts it in a custom metadata field.
Can you help me?
Thanks

IMO it would be difficult to find the ISBN identifiers inside the document without generating a lot of noise.

  • Books present ISBN in various formats, e.g. 978-0-00000-000-0 and 9-78-000-0000-000. A script would have to take all possible formats into consideration.
  • Older books may contain ISBN-10 only.
  • If the PDF is scanned, there could be further complications introduced by the OCR process.
  • An ISBN identifier (ISBN-10 especially) can be indistinguishable from e.g. a phone number.
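
One way to cut down on the noise described above is to check the check digit of every candidate. The following is only a sketch in Python, not something DEVONthink provides; the function names are made up for illustration. It relies on the standard ISBN rules: an ISBN-10 is valid if the sum of its digits weighted 10 down to 1 is divisible by 11 (a final "X" counts as 10), and an ISBN-13 is valid if the sum of its digits weighted 1, 3, 1, 3, ... is divisible by 10.

```python
import re


def is_valid_isbn10(candidate: str) -> bool:
    """Check an ISBN-10 given as exactly 10 characters, hyphens already removed."""
    if not re.fullmatch(r"[0-9]{9}[0-9Xx]", candidate):
        return False
    values = [int(c) for c in candidate[:9]]
    # A trailing "X" stands for the value 10.
    values.append(10 if candidate[9] in "Xx" else int(candidate[9]))
    # Weights run from 10 down to 1; the weighted sum must be divisible by 11.
    total = sum(w * v for w, v in zip(range(10, 0, -1), values))
    return total % 11 == 0


def is_valid_isbn13(candidate: str) -> bool:
    """Check an ISBN-13 given as exactly 13 digits, hyphens already removed."""
    if not re.fullmatch(r"[0-9]{13}", candidate):
        return False
    # Weights alternate 1, 3, 1, 3, ...; the weighted sum must be divisible by 10.
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(candidate))
    return total % 10 == 0
```

A random 10-digit number such as a phone number still passes the ISBN-10 test roughly one time in eleven, so this reduces false hits but does not eliminate them.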

An alternative is to match the book with an online database. Try running the Google Books Metadata add-on script. Sometimes it returns satisfactory results. Sometimes not.

Thanks for that. Yes, I also experienced that: there are too many variants to get it 100% right. But if we could catch only 75-80% AND mark the others, that would be a big help.

  • About the script: I do not see it in my script repository and cannot find a way to download it. Do you have a link? Thanks.

I suppose that an ISBN is prefixed by “ISBN” – how else would one know that it’s the ISBN and not a phone number, for example? So, looking for the regular expression
ISBN[- ]((?978[- ]?)[0-9- ]+)[^0-9-]
might work. Searching the net for “ISBN regular expression” turns up many variants.

Indeed.

The digital object identifier AppleScript property and the same placeholder in smart rules/batch processing should be able to handle those PDFs having a DOI. Support for ISBN might be added to future releases.

Thanks for that.
And (sorry for my ignorance) what would the output then be, i.e. the ISBN that I want to put in a custom metadata field?
\1 ??

The RE contained an error. This should work better:
ISBN[- ]((?:978[- ]?)[0-9- ]+)\b
Using \1 in the “Set ISBN to…” field of a smart rule works as expected, “ISBN” being defined as a single-line text-field.
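
To see what \1 refers to outside of a smart rule, here is a small Python sketch using the same regular expression. The function name find_isbn is hypothetical, and Python's re module is only a stand-in for whatever regex engine the smart rule uses, so details may differ. \1 corresponds to group(1), the part inside the outer parentheses.

```python
import re

# The corrected expression from the post above: the literal "ISBN", one
# hyphen or space, then a group starting with 978 followed by digits,
# hyphens and spaces.
ISBN_PATTERN = re.compile(r"ISBN[- ]((?:978[- ]?)[0-9- ]+)\b")


def find_isbn(text):
    """Return the first captured ISBN string in text, or None if there is none."""
    match = ISBN_PATTERN.search(text)
    if match is None:
        return None
    # The character class also allows spaces, so the capture can end with
    # a trailing space; strip it off.
    return match.group(1).strip()
```

Note that this pattern only finds ISBN-13 numbers beginning with 978 that directly follow "ISBN" and a single space or hyphen, so something like "ISBN: 978..." is not matched.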

But I’d rather go for a small script that standardizes the ISBN before storing it as a custom metadata field. As @meowky pointed out, ISBNs come in many variants, and if you want to use them e.g. for searching, you’d be better off with a unified format containing only digits.

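
As a rough idea of what such a standardization could look like, here is a sketch in Python rather than in DEVONthink's own scripting languages; normalize_isbn is a made-up name, and this is not the script discussed in this thread. It removes hyphens and spaces and, as an additional assumption, converts an ISBN-10 to ISBN-13 (prefix 978, recomputed check digit) so that every stored value is 13 digits.

```python
import re


def normalize_isbn(raw):
    """Return the ISBN as 13 digits, or None if raw has an unexpected length."""
    # Drop everything except digits and the "X" an ISBN-10 may end with.
    compact = re.sub(r"[^0-9Xx]", "", raw).upper()
    if re.fullmatch(r"[0-9]{13}", compact):
        return compact
    if re.fullmatch(r"[0-9]{9}[0-9X]", compact):
        # ISBN-10 to ISBN-13: prefix 978, drop the old check digit,
        # then compute the new one with weights 1, 3, 1, 3, ...
        body = "978" + compact[:9]
        total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(body))
        return body + str((10 - total % 10) % 10)
    return None
```

The sketch does not verify the check digit of its input; it only unifies the format. Anything that is neither 10 nor 13 characters long after cleaning comes back as None, which could be used to mark a document for manual review.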
Many thanks for that! It seems to work, unless the ISBN is at the end of the book. That helps a lot.
Now: what would that script look like? Would you be up for helping me with that?
Or is there an existing one?

I sent you a PM.