Populating Devonthink Database Metadata from PDF DOIs

fmenard123 · December 15, 2015, 4:04pm

Hello,

I find myself wanting to see Devonthink allow an automatic extraction of the DOIs inside the PDF file and to convert these DOIs to Devonthink metadata. Mainly the PDF Title as well as the PDF Authors. The Highlights App picks those up, but the only thing it exports to Devonthink is an HTML file with the same title as the PDF file. This doesn’t solve my wish to have the ‘Title’ field in Devonthink populated with the actual Title of the article. Furthermore, inside the Devonthink Database, the ‘Name’ field has nothing to do with the file name … and can be populated. Yet there is no way to show the actual ‘file name’ as one of the columns, other than seeing it as a component of the ‘Path’. I’ve added all ‘Columns’ and there is no ‘Horizontal Scrollbar’, which means I have to widen the window beyond the screen to see all columns… none of them is the actual file name (other than as a component of ‘Path’).

So I find myself wondering how anyone has managed to get a simple workflow for basic PDF file management… such as keeping Title and Authors in the Devonthink metadata database without doing this manually every time a new PDF is added to the database…

F.
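
For the DOI-extraction step asked about above, a minimal sketch in Python, assuming the PDF text has already been pulled out into a plain string by some other tool (that part is not shown). The pattern follows the regular expression Crossref suggests for modern DOIs; it is a heuristic, not a guarantee, and the function name find_dois is made up for this illustration. Nothing here talks to Devonthink.

```python
import re

# Heuristic pattern modelled on Crossref's suggested DOI regex.
# It will miss some old or unusual DOIs.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_dois(text):
    """Return the unique DOI-like strings in text, in order of first appearance."""
    found = []
    for match in DOI_PATTERN.finditer(text):
        # Drop sentence punctuation that got swept into the match.
        # Assumption: a real DOI rarely ends in one of these characters.
        doi = match.group(0).rstrip(".,;:)")
        if doi not in found:
            found.append(doi)
    return found


if __name__ == "__main__":
    sample = "Published as doi:10.1364/OE.23.012345. See https://doi.org/10.1000/xyz123"
    for doi in find_dois(sample):
        print(doi)
```

Once a DOI is in hand, the title and authors would still have to be looked up from a bibliographic service and written into the record by a separate script.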

macula · December 16, 2015, 4:11pm

I’d use a dedicated bibliography manager for the data extraction, then a script or folder action to import the data into DT. Also, last time I checked there were a few Alfred workflows for DOI-based searching.

fmenard123 · December 17, 2015, 1:24am

Any idea what that script may be to import … ?

I’m able to get Highlights to generate an HTML file.

The best I’ve done so far is to take the HTML file generated by Highlights and merge it with the PDF file into one document. This creates an entry with a clearly searchable title, but it doesn’t populate the Title field in the DT Database.

The other thing I want to do is to do something like:

Read 1st line of PDF, dump it into the Title field.
Read 2nd line of PDF, dump it into the Authors field.

If there was a way to open the PDF, read the first line, and dump it into the Title field via a script… then I think this would work. The kind of stuff I find myself reading the most is papers from OpticsInfoBase, and those articles always have the Title as the first line of the content and the authors as the 2nd line in the PDF.

F.
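
The "first line is the Title, second line is the Authors" idea above can be sketched as follows. This is only an illustration in Python under two assumptions: the PDF's text layer has already been extracted to a string by another tool, and the layout really is as regular as described. The function name guess_title_and_authors is hypothetical, and writing the values into Devonthink is not covered.

```python
def guess_title_and_authors(text):
    """Take the first two non-empty lines of text as (title, authors).

    Missing lines come back as empty strings instead of raising an error.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    title = lines[0] if len(lines) > 0 else ""
    authors = lines[1] if len(lines) > 1 else ""
    return title, authors


if __name__ == "__main__":
    sample = "\n  Some Paper Title \nA. Author, B. Author\nAbstract: ...\n"
    title, authors = guess_title_and_authors(sample)
    print("Title:", title)
    print("Authors:", authors)
```

A journal banner, a page header, or a title that wraps onto two lines would all break this, so the result is best treated as a guess to be checked.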

BLUEFROG · December 17, 2015, 3:26pm

These two words lead to problems. always makes it easier to script.

Also, just as an FYI - dealing with PDFs under-the-hood is a challenge in itself. It’s not that it can’t be done, but that you will find many custom tools for getting at the underlying data (which also means they’re not going to be installed on your Mac by default).

fmenard123 · December 23, 2015, 4:40pm

I find that I cannot enter the ‘Title’ field manually for any entry, even if I open the ‘i’ button. How can the ‘Title’ field in the metadata be edited?

F.

BLUEFROG · December 23, 2015, 5:31pm

It can’t be edited in DEVONthink. It is part of the metadata of the PDF file and is ReadOnly.

Greg_Jones · December 23, 2015, 6:13pm

It should be possible to edit the Title field for PDFs in DEVONthink: Tools Menu > Show Properties. The 7 property metadata fields that are editable vary (from none to most) based on document type, but Title is one of the editable fields for PDFs.

korm · December 23, 2015, 7:05pm

… and you can set columns to display Title and other metadata – so for an imported PDF with values in those metadata fields, those values will appear in your document display.

BLUEFROG · December 23, 2015, 7:47pm

It should, but… found a bug.
When viewing in List View, Properties is not editable. If you guys can see that too, let me know and I’ll file a bug. Thanks.

korm · December 23, 2015, 8:03pm

True. In both View as Icons and View as List, the Properties panel neither shows the metadata for a PDF, nor allows that data to be edited. The other Views seem to be OK. Didn’t test in the case of non-PDF documents.

BLUEFROG · December 23, 2015, 8:51pm

Excellent. Thanks for the assist, korm.