Keywords and PDFs in Devonthink 4

Hi. I’m a longtime and enthusiastic user of Devonthink, although I barely scratch the surface of its capabilities. I am currently working on a community archive project which involves processing and organising thousands of scanned documents, photos and sources, many of which are in PDF (PDF+Text) format. As most of my colleagues are non-Mac users, Tags are not an option, and for most of the group the best option would seem to be using Adobe Bridge and Keywords, plus a few basic IPTC fields, to label and locate files. For standard image files Keywords are recognised in both directions by Bridge and Devonthink, but I seem to have hit a wall when it comes to getting Keywords to stick to PDFs.

I have read the relevant (I believe) documentation, and also a number of posts warning of the rudimentary implementation of IPTC metadata in DT. And in a better world I would love to park everything we have in a DAM system.

However, I have had some success using the Exiftool command

exiftool "-Keywords<Subject"

to copy the Keywords from the standard IPTC/DC fields to the pdf:Keywords (xmp-pdf) field that (if I’m understanding it right) DT uses. But I’m not a command line champ and would like a cleaner, more user-friendly solution. I’m happy enough to use Bridge for Keywording, though I’d obviously be happier in DT. Is there a script of some kind that might automatically or simply copy the data in batches from one field to the other that would keep Bridge users happy?

Thanks for any help or pointers anyone can offer.

Modifying XMP is not a trivial thing, hence the existence of specialized tools like exiftool. And no, there is no built-in function or script for it.

A real-world example would be useful, including what data you specifically mean by “simply copy the data in batches from one field to the other”.

Thanks. I guess I’m just trying to iron out a kink in my workflow from DT in which I import, edit, crop, run OCR, add Keywords (and maybe Title and Subject fields, though they travel fine), sort and file content, and would like to have the resulting files and folders, complete with the same metadata, available to others in Bridge (or other apps and OS’s). As for “simply copy the data …” my only immediate wish is to have the Keywords migrate to Bridge. With as little friction as possible.

E.g., if in the Keywords field of the Properties Inspector (note: the blank fields in this example are not available for PDF files) I write this:

In Bridge I see this.

My hope was there might be a scripted way of duplicating the PDF Keywords to the IPTC Keywords, but if as you say this is a non-trivial issue then maybe I need to rethink my workflow.

Also (forgive me if this is a dumb question), what is it about PDFs that they cannot be tagged in the same way as TIFFs, JPEGs etc. using the standard metadata fields? Is that a Mac thing? Or a DT thing?

Regards

I think that you’re trying to drive a nail into a wall with a screw driver. It may work, but the tool is not the best one for the task.

PDF keywords are one thing. EXIF/IPTC keywords are another thing. An image can’t have PDF keywords. I doubt that any tool but exiftool can handle IPTC keywords in/for PDF. A brief search on the net didn’t reveal anything else (but I’m in China now, so that might influence results).

I’m not sure that I understand the question. What “standard metadata fields” are you referring to – exposure? GPS coordinates? author/title/publisher?

The PDF standard provides for a (very limited) set of metadata fields. Adding others like IPTC/EXIF might be possible today, but there’s e.g. no support for that in Apple’s PDFKit framework. Little surprise there: IPTC/EXIF was developed in a completely different field than PDF (Portable Document Format – all about print, not document management). The only relevant overlap that I see is in the PDF metadata themselves.

Feasible by using a doShellCommand with a properly crafted exiftool call. Although I’d forego Apple/JavaScript completely in this situation and write a simple shell script for it.
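A minimal sketch of what such a shell script could look like, assuming exiftool is installed and on the PATH. It applies the PDF:Keywords to XMP-dc:Subject mapping discussed in this thread to a single file or, recursively, to every PDF in a folder. The function name copy_pdf_keywords and the EXIFTOOL override variable are made up for illustration; they are not part of exiftool or DT.

```shell
#!/bin/sh
# Sketch only: copy the PDF Info keywords into XMP-dc:Subject so that
# tools reading Dublin Core keywords can see them.
# EXIFTOOL is a hypothetical override for the binary name or path.

copy_pdf_keywords() {
    target=${1:-}

    if [ -z "$target" ]; then
        echo "usage: copy_pdf_keywords FILE_OR_FOLDER" >&2
        return 2
    fi
    if [ ! -e "$target" ]; then
        echo "not found: $target" >&2
        return 1
    fi

    # -r                   descend into subfolders when target is a folder
    # -ext pdf             only touch files with a .pdf extension
    # -overwrite_original  do not leave *_original backup copies behind
    "${EXIFTOOL:-exiftool}" -r -ext pdf -overwrite_original \
        "-XMP-dc:Subject<PDF:Keywords" "$target"
}
```

After loading the function into a shell, it would be called with one path, a file or a folder. Because -overwrite_original suppresses exiftool's backup copies, it seems wise to try this on a copy of a few files first and check the result in Bridge before running it on the whole archive.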

My question: Why do you even want to replicate PDF keywords to IPTC ones? Every self-respecting PDF reader can display the keywords. Bridge is not a PDF reader, afaik, so it might be lacking in that respect. Is everyone using Bridge because it seems to be the smallest common denominator?

And one more thing: What do you want to happen if someone changes the IPTC keywords of a PDF file (deletes, adds, whatever) – do you want them to be updated in the PDF keywords, too? If not, DT will show you something different than Bridge will do…
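To see whether the two sets of keywords have drifted apart for a given file, one option is to print both fields side by side. This is only a sketch, assuming exiftool is installed; the function name show_keyword_fields and the EXIFTOOL override variable are hypothetical.

```shell
#!/bin/sh
# Sketch only: print the PDF Info keywords and the XMP-dc subject of one
# file so the two can be compared by eye.
# EXIFTOOL is a hypothetical override for the binary name or path.

show_keyword_fields() {
    # -G1 prefixes each value with its metadata group (e.g. PDF, XMP-dc)
    # -s  prints tag names instead of descriptions
    "${EXIFTOOL:-exiftool}" -G1 -s -PDF:Keywords -XMP-dc:Subject "$1"
}
```

If the two lines list different keywords, someone has edited one side only, and it has to be decided which side should win before copying in either direction.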

DT does, to my knowledge, not use xmp-pdf, but simply the PDF metadata structure that has existed in PDF since eons.

Thanks for the helpful comments. I am forced to the conclusion that, much as I prefer working within the DT interface, for this project I will henceforward do my Keywording in Bridge. The purpose of my original question was really to export the keywords of the many files that I had amassed in DT (PDFs to be precise, since the image files were fine) to Adobe Bridge, which is our group’s best (only?) shared option. I have now figured out (with a little help from AI) how to get exiftool to do that for files or folders:

exiftool "-XMP-dc:Subject<PDF:Keywords" -overwrite_original filename/pathtofolder 

but I don’t relish the prospect of doing that on an ongoing basis.

As to your specific points:

… you’re trying to drive a nail into a wall with a screw driver. It may work, but the tool is not the best one for the task.

Yes, almost certainly. I am not familiar with all the tools in the DT toolbox and was hoping someone could suggest the right one to use.

I’m not sure that I understand the question. What “standard metadata fields” …

To be more precise I had in mind the IPTC Core set of fields: Headline, Description, Keywords, Credits etc. I’m aware that there is a bewildering proliferation of metadata schemas and options, but at our level of resources and know-how we need to keep things simple.

Feasible by using a doShellCommand with a properly crafted exiftool call.

As I say we’ll probably stick to Bridge (and its plugins) going forward. That said if putting the exiftool command into a script that I can run from DT is an easy task for someone here, it might prove useful in the future. I’m afraid I wouldn’t know where to begin with that.

Why do you even want to replicate PDF keywords to IPTC ones? Every self-respecting PDF reader can display the keywords. Bridge is not a PDF reader, afaik, so it might be lacking in that respect.

I had keyworded c.2000 files (pdfs, jpegs, gifs, tifs) before I realised that the world of pdf metadata was a special case. I realise that a proper PDF reader would manage this fine, but for all its shortcomings, Bridge can display PDFs and pretty much all the filetypes we need to worry about. AND read/edit their metadata (just not Applekit/PDF data, I now know). And it’s free, which is a big +++ for us.

DT does, to my knowledge, not use xmp-pdf, but simply the PDF metadata structure that has existed in PDF since eons.

Yes, you’re right. I got that completely back to front. Or something.

Thanks again for your help and attention. And for putting up with the dumb questions–I have learned some things I didn’t know before, which is always time well spent.