Hi, request for help here.
What I’d like to do is have DEVONthink scan a database of PDFs (many of them, highly varied and unstructured), extract any dates or possible dates it finds in them, and save those as metadata. A user could then search for, say, all documents referring to a specific date, month, or year and, if possible, see the relevant passage of each matching document.
So one document, say, might include a date when it was created and a date when it was declassified, and may include in the body numerous strings that might be dates (a day of the week, a month, a year, a decade, etc.), which DEVONthink could identify and store as separate pieces of custom metadata.
When searching that metadata later, the user could ask for all documents that include a reference to a specific date or range of dates, and see the matching documents with the relevant sections highlighted or otherwise identified.
Hopefully the code would be able to make some assumptions. For example, if a document mentions a day, the 23rd, that is not contiguous with but near a month, say June, which is itself not far from a year, say 1987, then the assumption could be made that the 23rd means June 23, 1987.
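As a rough illustration of that proximity idea, here is a minimal sketch in plain Python. It is not DEVONthink code and does not touch PDFs or metadata; it only works on text that has already been extracted. The function names `guess_dates` and `nearest`, the 80-character window, and the year range 1800 to 2099 are all assumptions made up for this example.

```python
import re
from datetime import date

# Assumption: English month names, capitalised as they usually appear in prose.
MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}

DAY_RE = re.compile(r"\b(\d{1,2})(?:st|nd|rd|th)\b")     # "23rd", "1st", ...
MONTH_RE = re.compile(r"\b(" + "|".join(MONTH_NAMES) + r")\b")
YEAR_RE = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")        # assumed range 1800-2099


def nearest(position, matches, window):
    """Return the match starting closest to `position` within `window` characters, or None."""
    best = None
    for match in matches:
        distance = abs(match.start() - position)
        if distance <= window and (best is None or distance < abs(best.start() - position)):
            best = match
    return best


def guess_dates(text, window=80):
    """Pair each ordinal day with the nearest month, then that month with the nearest year.

    Returns a list of (date, offset_of_day_in_text) tuples. The offset could later be
    used to locate the relevant passage.
    """
    months = list(MONTH_RE.finditer(text))
    years = list(YEAR_RE.finditer(text))
    results = []
    for day in DAY_RE.finditer(text):
        month = nearest(day.start(), months, window)
        if month is None:
            continue
        year = nearest(month.start(), years, window)
        if year is None:
            continue
        try:
            guessed = date(int(year.group(1)), MONTHS[month.group(1)], int(day.group(1)))
        except ValueError:
            continue  # impossible combination such as "31st" near "June"
        results.append((guessed, day.start()))
    return results


sample = ("The meeting was held on the 23rd, as agreed. Later that June the "
          "committee reported. The file was closed in 1987.")
for guessed, offset in guess_dates(sample):
    print(guessed.isoformat(), "at character", offset)
```

A sketch like this will produce false positives (for instance "May" at the start of a sentence, or a four-digit number that is not a year), so any dates it guesses would probably need to be marked as tentative rather than stored as fact.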
I’ve seen a few other threads not far from this idea, but haven’t seen any that deal with extracting multiple dates, and multiple kinds of dates, per document. I also have no regex knowledge or experience to speak of, so perhaps my request is far too ambitious. But I’d welcome any advice.