Meta information of PDFs is lost during import (downl. from

I just stumbled upon a problem with DevonThink Pro Office 2.0.3:

If I load PDF files from the web with Safari and then save them on the disk (with “save” or “save as …”), they usually contain information about the document (title, author, etc.) and also the weblink where they were found.

However, if I save a PDF directly to the DevonThink Inbox, or import the downloaded file into DevonThink Pro, this information seems to get lost.

Is this a bug or a feature?
I’d like to preserve this information, if possible.
Is there a way to do that?
(or is there a good reason to remove it automatically?)

Kind regards

Martin

My guess is “Weitere Informationen” (“More Info”) isn’t displayed (actually, it shows as “–”) for the file in DT because it’s under …/Files.noindex, which isn’t indexed by Spotlight. If you copy the file somewhere outside that folder (e.g. Desktop) then the metadata should reappear.

Also, if you select the document in DT and run Tools > Show Info… (Shift-Command-I) the metadata should show up under Additional Information. And in the Tools > Show Properties… (Option-Shift-Command-P) panel.

Notice that “Erstellt” (Created) and “Geändert” (Modified) in your images are identical for both the original and imported files, which is a clue that DT didn’t modify the latter when importing.

Thanks, sjk: you are right:

  • when I drag the file from DTPro to the Finder or export it, the Finder metadata is visible again, just as it was before
  • and most of the metadata is also visible in DT Pro (see screenshot - I overlooked it, sorry!)

But in Devonthink I found no way to display or search for the source URL, which I would find important for scientific work where you have to correctly cite your references (with URL and date of last access).

I’m not sure if there’s a way to import/save PDFs into DT so the URL field will be populated. An example comparison:

When saving a Web Archive from Safari to DT’s global inbox the URL will (conveniently) be added to the URL field, but not (unfortunately) when saving a PDF. Both have kMDItemWhereFroms (“Where from”) metadata, but that’s not visible from within DT because it lacks Spotlight indexing. If you drag those documents out of DT then you can see “Where from” metadata in Finder Get Info windows.

thanks, sjk, for your help.

I don’t understand how that is possible:

  1. I see the pdf in finder, the information is there.
  2. I import it into DTPro and the information is no longer visible (neither in DTPro nor in the Finder, if I look at the file inside the database package)
  3. I drag or export the same file to finder, and the information is visible again

-> So where was it stored while the file existed only in the DTPro database, and can’t it be accessed from there?
(If it were accessible via AppleScript, one should be able to search all PDF files for this source URL information and, if it exists, write it to a place where DTPro can use it…)

What exactly does “DT Pro lacks Spotlight indexing” mean?

Martin

At this point it would be helpful if someone else can step in and clearly explain this to you in German. :slight_smile:

Hey, hey, I’m not that slow on the uptake :wink:

After re-reading your post several times, the following explanation came to my mind (but I don’t find it convincing yet):

The metadata seems to be (and stay) in the PDF file, but as long as the file “lives” in the DevonThink database package, neither the Finder nor Spotlight accesses it?!
(and DevonThink unfortunately has no access to it either?)

But after reading about kMDItemWhereFroms, I tried the terminal command mdls on both the indexed and the imported PDF file, and the imported one does not contain all the meta information:

Imported file:
mdls symbols-a4.pdf
kMDItemFSContentChangeDate = 2010-08-13 22:06:33 +0200
kMDItemFSCreationDate = 2010-08-13 22:06:33 +0200
kMDItemFSCreatorCode = “”
kMDItemFSFinderFlags = 0
kMDItemFSHasCustomIcon = 0
kMDItemFSInvisible = 0
kMDItemFSIsExtensionHidden = 0
kMDItemFSIsStationery = 0
kMDItemFSLabel = 0
kMDItemFSName = “symbols-a4.pdf”
kMDItemFSNodeCount = 0
kMDItemFSOwnerGroupID = 504
kMDItemFSOwnerUserID = 504
kMDItemFSSize = 4387686
kMDItemFSTypeCode = “”

file in Finder:
mdls symbols-a4.pdf
kMDItemAuthors = (
“Scott Pakin <scott+clsl@pakin.org>”
)
kMDItemContentCreationDate = 2010-08-13 22:06:33 +0200
kMDItemContentModificationDate = 2010-08-13 22:06:33 +0200
kMDItemContentType = “com.adobe.pdf”
kMDItemContentTypeTree = (
“com.adobe.pdf”,
“public.data”,
“public.item”,
“public.composite-content”,
“public.content”
)
kMDItemCreator = “LaTeX with hyperref package”
kMDItemDescription = “List of 5913 symbols that can be typeset using LaTeX”
kMDItemDisplayName = “symbols-a4.pdf”
kMDItemEncodingApplications = (
“pdfTeX-1.40.9”
)
kMDItemFSContentChangeDate = 2010-08-13 22:06:33 +0200
kMDItemFSCreationDate = 2010-08-13 22:06:33 +0200
kMDItemFSCreatorCode = “”
kMDItemFSFinderFlags = 6
kMDItemFSHasCustomIcon = 0
kMDItemFSInvisible = 0
kMDItemFSIsExtensionHidden = 0
kMDItemFSIsStationery = 0
kMDItemFSLabel = 3
kMDItemFSName = “symbols-a4.pdf”
kMDItemFSNodeCount = 0
kMDItemFSOwnerGroupID = 80
kMDItemFSOwnerUserID = 504
kMDItemFSSize = 4387686
kMDItemFSTypeCode = “”
kMDItemKeywords = (
“LaTeX; symbols; glyphs; characters; typesetting; macros; commands; accents; phonetics; mathematics; operators; arrows; harpoons; astronomy; dingbats; geometry”
)
kMDItemKind = “PDF”
kMDItemLastUsedDate = 2010-08-13 22:06:33 +0200
kMDItemNumberOfPages = 164
kMDItemPageHeight = 841.89
kMDItemPageWidth = 595.276
kMDItemSecurityMethod = “None”
kMDItemTitle = “The Comprehensive LaTeX Symbol List”
kMDItemUsedDates = (
“2010-08-13 00:00:00 +0200”
)
kMDItemVersion = “1.6”
kMDItemWhereFroms = (
http://tug.ctan.org/tex-archive/info/symbols/comprehensive/symbols-a4.pdf
)

my conclusion:
DevonThink Pro takes the meta information from the file during import and stores it somewhere else.
Why can’t it just leave it where it is? (maybe a stupid question from a non-programmer)

Martin

Sorry, I didn’t mean to imply that. I meant this seems tougher for both of us because of my inability to communicate in your native language. I’m slow with German… in and out. :slight_smile:

That’s a side effect of it being stored under …/Files.noindex/…, which isn’t indexed by Spotlight.

Copy that file out of the db package and the results of mdls will be identical to those for the original (non-imported) file.
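To check this without comparing two long listings by eye, something like the following might help. It is only a rough sketch: parse_mdls, missing_attributes and run_mdls are names made up for this example, the parser assumes the plain "name = value" layout seen in the mdls output posted above (with straight quotes, as the real tool prints them), and run_mdls only works on a Mac where the mdls tool is available.

```python
import subprocess


def parse_mdls(text):
    """Parse `mdls` text output into a dict of attribute name -> value(s)."""
    attrs = {}
    current_key = None  # set while reading a parenthesised multi-value list
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if current_key is not None:
            if line == ")":
                current_key = None
            else:
                attrs[current_key].append(line.rstrip(",").strip('"'))
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a "name = value" line, skip it
        key, value = key.strip(), value.strip()
        if value == "(":
            attrs[key] = []
            current_key = key
        else:
            attrs[key] = value.strip('"')
    return attrs


def missing_attributes(original, imported):
    """Names of attributes present in `original` but absent from `imported`."""
    return sorted(set(original) - set(imported))


def run_mdls(path):
    """Run the macOS `mdls` tool on a file and parse its output (macOS only)."""
    result = subprocess.run(
        ["mdls", path], capture_output=True, text=True, check=True
    )
    return parse_mdls(result.stdout)
```

Calling missing_attributes(run_mdls(original_path), run_mdls(copy_path)) on the original and on a copy dragged out of the database should then return an empty list if the explanation above is right; both path variables are placeholders.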

DT doesn’t modify the file when importing. While stored in DT a subset of metadata shows up in Information and Document Properties windows, presumably only what can be obtained directly from the file (which excludes Spotlight metadata like kMDItemWhereFroms).

Another question might be:

Why doesn’t DT make use of certain Spotlight metadata for DT metadata during importing, e.g. grab kMDItemWhereFroms (if available) for populating the document URL field?

I’m mostly trying to describe why certain metadata is/isn’t available/visible in certain contexts. I don’t fully understand how DT and Spotlight handle metadata so some explanations might be inaccurate/incomplete.

Don’t worry! :slight_smile:

so that’s the point where the DevonThink team could join in …
[is somebody listening out there?]
:wink:

There are at least 4 possibilities:

  1. Add it via DEVONagent
  2. Download it via DEVONthink Pro’s download manager
  3. Add it using the “Add PDF/web document to DEVONthink” scripts while viewing the document in Safari
  4. Print it to DEVONthink

AFAIK Safari stores the URL neither as an extended attribute nor as a Spotlight comment, and of course it doesn’t modify the downloaded file. Otherwise DEVONthink would already use the information.

Thanks, Christian, for your comment.

OK, next time I’ll try the script (3).
I tried to save the pdf document to the Inbox from within Safari, which did not work as expected.

But: (why) can DTPro not read/use the kMDItemWhereFroms information, where the source URL is stored after a “normal” download and save to disk with Safari?

Martin

At least on Snow Leopard, PDFs downloaded/saved with Safari use com.apple.metadata:kMDItemWhereFroms.
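For anyone who wants to look at that attribute directly, here is a minimal sketch. It assumes the value is stored as a property list (as far as I know a binary plist holding a list of strings) and that the macOS xattr command-line tool with its -p and -x options is available. The function names decode_where_froms and read_where_froms are made up for this example.

```python
import plistlib
import subprocess

# Attribute name as given in the post above.
WHERE_FROMS = "com.apple.metadata:kMDItemWhereFroms"


def decode_where_froms(raw):
    """Decode raw attribute bytes (assumed to be a property list) into URLs."""
    value = plistlib.loads(raw)  # handles both binary and XML property lists
    if isinstance(value, str):
        return [value]
    return [str(item) for item in value]


def read_where_froms(path):
    """Return the URLs recorded for `path`, or [] if there are none (macOS only)."""
    # `xattr -px` prints the attribute value as hexadecimal text.
    result = subprocess.run(
        ["xattr", "-px", WHERE_FROMS, path],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return []  # attribute missing or file not readable
    hex_text = "".join(result.stdout.split())  # drop spaces and line breaks
    return decode_where_froms(bytes.fromhex(hex_text))
```

Since this reads the extended attribute itself rather than asking Spotlight, it should not matter whether the file sits in an indexed folder, though I have not verified that against a file inside a database package.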

Me, too.

Thanks for the suggestions, Christian.

Safari seems to add the extended attribute only if Spotlight is enabled. v2.0.4 will support this.

Is it Safari adding it or a Spotlight/system process? I’m guessing the latter, and that if Spotlight is disabled then the com.apple.metadata:kMDItemFinderComment extended attribute won’t be added to a file when its Spotlight Comments are updated.

Using com.apple.metadata:kMDItemWhereFroms? It can contain multiple (two?) URLs; the first would be what I’d want to populate a DT document’s URL field, if that’s what 2.0.4 will support.
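To make the "first URL" idea concrete, a tiny sketch, purely illustrative: pick_source_url is a hypothetical helper, and the assumption that the first entry is the download URL is mine, not confirmed behaviour of Safari or of DT 2.0.4.

```python
def pick_source_url(where_froms):
    """Return the first http(s) entry from a "Where from" list, or None."""
    for candidate in where_froms:
        if candidate.startswith(("http://", "https://")):
            return candidate
    return None
```

With a list such as the one in the mdls output earlier in this thread, this would return the symbols-a4.pdf URL. An empty list, or one without any http(s) entry, gives None.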