Parsing DOI from Pubmed xml output

Dear all,

I am trying to extract the DOI from the XML result of a PubMed search. Using a modified PubMed search script I can already get the authors, date, journal, etc., but the DOI listed in

14734504 10.1161/01.CIR.0000102381.57477.50 109/2/159

I cannot parse using

set thePublicationDOI to (value of first XML element of (first XML element of thePubmedDataElement whose name is "ArticleIdList") whose name is ArticleId IdType="doi"")

Is there a way to get the doi from a Pubmed xml file?

Thank you!

Stephan

I cannot parse using

Is it returning an error or … ? Please provide more information about the data being returned.

Multiple fields are combined there

14734504 is the Pubmed ID or PMID

The doi is:

10.1161/01.CIR.0000102381.57477.50

Hi!

Sorry, here is more information:

I am using this script to search Pubmed by pmids:

use AppleScript version "2.3"
use scripting additions
use re : script "RegEx"

property pDefaultQuery : ""
property pURLSearch : "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
property pURLFetch : "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
property pURLView : "http://www.ncbi.nlm.nih.gov/pubmed/"

on run argv
	-- Import helper library
	
	tell application "Bookends"
		set theIDs to «event ToySRUID» "Selection"
		repeat with theID in paragraphs of theIDs
			tell front library window
				try
					set BookendsLink to ("bookends://sonnysoftware.com/" & theID) as text
				end try
			end tell
		end repeat
	end tell
	
	tell application "Finder" to set pathToAdditions to ((path to application id "DNtp" as string) & "Contents:Resources:Template Script Additions.scpt") as alias
	set helperLibrary to load script pathToAdditions
	
	
	try
		
		-- We're later working in DEVONthink, we need to cache localized strings while still in our realm
		set theTemplateFile to "/Users/zellerhs/Temp/" & "%articleTitle%.rtf"
		
		tell application id "DNtp"
			
			-- Ask the user for their query, then make group for the results
			set theQuery to item 1 of argv
			--	set theKey to item 2 of argv
			--	set theQuery to display name editor my helperLibrary's localizedString("New PubMed Query") default answer pDefaultQuery info (my helperLibrary's localizedString("Please enter your query for PubMed:"))
			set theQueryEscaped to my helperLibrary's replaceText(" ", "+", theQuery)
			
			show progress indicator my helperLibrary's localizedString("Researching on PubMed") steps -1
			step progress indicator my helperLibrary's localizedString("Sending query")
			
			-- Run the search, get list of IDs
			
			
			-- Download articles
			step progress indicator my helperLibrary's localizedString("Downloading result list")
			set theIDs to theQuery
			set theIDString to theQuery
			set theFetchURL to (pURLFetch & "?db=pubmed&id=" & theIDString & "&retmode=xml") as string
			set theXML to download markup from theFetchURL encoding "UTF-8"
			if theXML is missing value or theXML is "" then
				error "Download failed."
			else if theXML contains "503 Service Temporarily Unavailable" then
				error "503 Service Temporarily Unavailable"
			end if
			
			-- Process articles
			
			tell application "System Events"
				set theXMLData to make new XML data with data theXML
				set thePubmedArticleElements to (every XML element of (first XML element of theXMLData whose name is "PubmedArticleSet") whose name is "PubmedArticle")
				
			end tell
			
			if thePubmedArticleElements is not {} then
				step progress indicator my helperLibrary's localizedString("Making a group for the results")
				
				--set theGroup to create record with {type:group, name:(my helperLibrary's localizedString("PubMed Research: ") & theQuery) as string} in current group
				set theGroup to get record with uuid "50C292EF-960B-4618-AD55-31F0B858109B"
				if theGroup is missing value then error my helperLibrary's localizedString("Could not create group.")
				
				show progress indicator my helperLibrary's localizedString("Downloading articles from PubMed") steps (count of thePubmedArticleElements) with cancel button
				
				repeat with thePubmedArticleElement in thePubmedArticleElements
					
					tell application "System Events"
						
						set theArticleWrapperElement to first XML element of thePubmedArticleElement
						set theArticleElement to (first XML element of theArticleWrapperElement whose name is "Article")
						set thePubmedDataElement to (first XML element of thePubmedArticleElement whose name is "PubmedData")
						set theArticle to {doi:"", pubmedID:"", link:"", name:"", abstract:"", publication:{}, authors:{}, publicationtypes:{}}
						
						
						
						-- Get article ID and so the URL
						try
							set the pubmedID of theArticle to (value of first XML element of theArticleWrapperElement whose name is "PMID")
							set the doi of theArticle to (value of first XML element of thePubmedArticleElements whose name is "doi")
						end try
						set the link of theArticle to (pURLView & pubmedID of theArticle) as string
						
						
						
						
						
						-- Get article name
						try
							set name of theArticle to (value of first XML element of theArticleElement whose name is "ArticleTitle")
							tell application id "DNtp" to step progress indicator (name of theArticle)
						end try
						
						-- Get article abstract
						try
							set abstract of theArticle to value of (first XML element of (first XML element of theArticleElement whose name is "Abstract") whose name is "AbstractText")
						end try
						
						
						-- Get publication information
						set thePublication to {doi:"", publication:"", vol:"", issue:"", publicationdate:"", page:""}
						try
							set theJournalIssueElement to (first XML element of (first XML element of theArticleElement whose name is "Journal") whose name is "JournalIssue")
							try
								set the vol of thePublication to value of (first XML element of theJournalIssueElement whose name is "Volume")
							end try
							try
								set the issue of thePublication to value of (first XML element of theJournalIssueElement whose name is "Issue")
							end try
							try
								set the publication of thePublication to (value of first XML element of (first XML element of theArticleElement whose name is "Journal") whose name is "Title")
							end try
							try
								set theMedlineDate to (value of first XML element of (first XML element of theJournalIssueElement whose name is "PubDate") whose name is "MedlineDate")
								
								set publicationdate of thePublication to theMedlineDate
							on error
								try
									set thePublicationMonth to (value of first XML element of (first XML element of theJournalIssueElement whose name is "PubDate") whose name is "Month")
									set publicationdate of thePublication to publicationdate of thePublication & thePublicationMonth
								end try
								try
									set thePublicationDay to (value of first XML element of (first XML element of theJournalIssueElement whose name is "PubDate") whose name is "Day")
									set publicationdate of thePublication to publicationdate of thePublication & " " & thePublicationDay
								end try
								try
									set thePublicationYear to (value of first XML element of (first XML element of theJournalIssueElement whose name is "PubDate") whose name is "Year")
									set publicationdate of thePublication to publicationdate of thePublication & ", " & thePublicationYear
								end try
								
							end try
							
						end try
						set publication of theArticle to thePublication
						
						
						try
							set thePublicationPage to (value of first XML element of (first XML element of theArticleElement whose name is "Pagination") whose name is "MedlinePgn")
							set page of thePublication to thePublicationPage
						end try
						
						
						try
							set thePublicationDOI to (value of last XML element of (first XML element of thePubmedDataElement whose name is "ArticleIdList"))
							set doi of thePublication to thePublicationDOI
						end try
						
						-- Get author names
						set theAuthorElements to {}
						try
							set theAuthorElements to every XML element of (first XML element of theArticleElement whose name is "AuthorList")
						end try
						if theAuthorElements ≠ {} then
							repeat with theAuthorElement in theAuthorElements
								try
									set theAuthor to {firstname:"", initial:"", lastname:""}
									set firstname of theAuthor to (value of first XML element of theAuthorElement whose name is "ForeName")
									set initial of theAuthor to (value of first XML element of theAuthorElement whose name is "Initials")
									set lastname of theAuthor to (value of first XML element of theAuthorElement whose name is "LastName")
									set authors of theArticle to (authors of theArticle) & {theAuthor}
								end try
							end repeat
						end if
						
						-- Get publication types
						try
							set thePublicationTypeListElement to (first XML element of theArticleElement whose name is "PublicationTypeList")
							set thePublicationTypes to (every XML element of thePublicationTypeListElement whose name is "PublicationType")
							repeat with thePublicationType in thePublicationTypes
								set publicationtypes of theArticle to (publicationtypes of theArticle) & {value of thePublicationType}
							end repeat
						end try
						
					end tell
					
					-- Prepare strings for some elements
					set theAuthorsString2 to ""
					set theAuthorsString to ""
					repeat with theAuthor in authors of theArticle
						set theAuthorString to firstname of theAuthor
						if (theAuthorString ≠ "" and initial of theAuthor ≠ "") then set theAuthorString to theAuthorString
						if (theAuthorString ≠ "" and lastname of theAuthor ≠ "") then set theAuthorString to theAuthorString & " " & lastname of theAuthor
						if theAuthorString ≠ "" then
							if theAuthorsString2 ≠ "" then set theAuthorsString2 to theAuthorsString2 & linefeed
							set theAuthorsString2 to theAuthorsString2 & theAuthorString
						end if
					end repeat
					
					repeat with theAuthor in authors of theArticle
						set theAuthorString to firstname of theAuthor
						if (theAuthorString ≠ "" and initial of theAuthor ≠ "") then set theAuthorString to theAuthorString
						if (theAuthorString ≠ "" and lastname of theAuthor ≠ "") then set theAuthorString to theAuthorString & " " & lastname of theAuthor
						if theAuthorString ≠ "" then
							if theAuthorsString ≠ "" then set theAuthorsString to theAuthorsString & ", "
							set theAuthorsString to theAuthorsString & theAuthorString
						end if
					end repeat
					
					set thePublicationString to publication of publication of theArticle
					if thePublicationString ≠ "" then
						if vol of publication of theArticle ≠ "" then
							set thePublicationString to thePublicationString & "; vol. " & vol of publication of theArticle
						else
							set thePublicationString to thePublicationString & ";"
						end if
					end if
					if thePublicationString ≠ "" and issue of publication of theArticle ≠ "" then set thePublicationString to thePublicationString & " issue " & issue of publication of theArticle
					if thePublicationString ≠ "" and publicationdate of publication of theArticle ≠ "" then set thePublicationString to thePublicationString & "; " & publicationdate of publication of theArticle
					if publicationtypes of theArticle = {} then
						set theTypeString to ""
					else
						set theTypeString to item 1 of publicationtypes of theArticle
						if (count of publicationtypes of theArticle) > 1 then
							repeat with thePublicationType in items 2 through -1 of publicationtypes of theArticle
								set theTypeString to theTypeString & ", " & thePublicationType
							end repeat
						end if
					end if
					
					-- Add article to DEVONthink
					if abstract of theArticle ≠ "" then
						set theAbstractString to abstract of theArticle
					else
						set theAbstractString to my helperLibrary's localizedString("No abstract available.") & return & return
					end if
					
					set thePlaceholders to {|%articleTitle%|:name of theArticle, |%articleAuthors%|:theAuthorsString, |%articleLocation%|:thePublicationString, |%articleType%|:theTypeString, |%articleBookends%|:{|URL|:BookendsLink, |name|:my helperLibrary's localizedString("Click here to view article in Bookends")}, |%articleAbstract%|:theAbstractString, |%articleLink%|:{|URL|:link of theArticle, |name|:my helperLibrary's localizedString("Click here to view article in PubMed")}}
					
					set theRecord to import theTemplateFile to theGroup placeholders thePlaceholders
					
					add custom meta data theAuthorsString2 for "authors" to theRecord
					add custom meta data theTypeString for "type" to theRecord
					add custom meta data page of thePublication for "page" to theRecord
					add custom meta data publication of thePublication for "journal" to theRecord
					add custom meta data vol of thePublication for "volume" to theRecord
					add custom meta data issue of thePublication for "issue" to theRecord
					add custom meta data publicationdate of thePublication for "date" to theRecord
					
					add custom meta data (pubmedID of theArticle) for "pmid" to theRecord
					--set doi of thePublication to digital object identifier of theRecord
					add custom meta data (doi of thePublication) for "mddoi" to theRecord
					
					set URL of theRecord to link of theArticle
					--					set tags of theRecord to theKey
					if cancelled progress is true then exit repeat
					
				end repeat
				
			end if
			hide progress indicator
		end tell
		
	on error errMsg number errNum
		
		tell application id "DNtp"
			hide progress indicator
			if errNum ≠ -128 then display alert my helperLibrary's localizedString("An error has occured.") message errMsg & " :: " & errNum as warning
		end tell
		
	end try 
end run

The PMIDs are provided by a shell script.

    try
          set thePublicationDOI to (value of last XML element of (first XML element of thePubmedDataElement whose name is "ArticleIdList"))
          set doi of thePublication to thePublicationDOI
    end try

This works if the DOI is in the last XML element, which is not always the case. Searching for ArticleId IdType="doi" does not seem to work: doi of thePublication is then empty, and there is no error message.

Thanks in advance!

Stephan

Is this script yours or ours (bearing in mind some scripts predate my arrival at DEVONtech)?

It is a DEVONthink script that I modified and extended to add the Bookends part and the custom metadata.

This is something that should work, though it’s just the DOI check. Note the XML scraping with System Events is a bit wonky and it doesn’t necessarily behave as expected. Apple’s own example (which is sparse) uses a similar method of walking elements.

Note I’ve included the XML setup from the original for clarity and so the snippet has a functional return for other testers.

property pURLFetch : "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

tell application id "DNtp"
	set theIDString to "46444"
	set theFetchURL to (pURLFetch & "?db=pubmed&id=" & theIDString & "&retmode=xml") as string
	
	set theXML to download markup from theFetchURL encoding "UTF-8"
end tell

tell application "System Events"
	set theXMLData to make new XML data with data theXML
	set thePubmedArticleElements to (every XML element of (first XML element of theXMLData whose name is "PubmedArticleSet") whose name is "PubmedArticle")
	
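	--- THIS IS THE SECTION GETTING THE DOI
	set thePublicationDOI to "" -- stays empty if no ArticleId has the attribute value "doi"
	tell item 1 of thePubmedArticleElements
		repeat with thisElement in (XML elements ¬
			of (XML element "ArticleIdList") ¬
			of (XML element "PubmedData") ¬
			)
			-- ArticleId carries a single attribute (IdType), so its list of attribute values is {"doi"} for the DOI entry
			if (value of XML attributes of thisElement) = {"doi"} then
				set thePublicationDOI to value of thisElement
				exit repeat
			end if
		end repeat
	end tell
	---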
	
end tell
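
To slot the same check into the original script, where thePubmedDataElement is already set inside the repeat loop, one option is a small handler. This is only a sketch: the handler name doiFromPubmedData and the inline sample XML are made up for illustration, and the IdType values "pubmed" and "pii" in the sample are an assumption about what the stripped tags in the first post looked like. Looking the attribute up by name mirrors the "whose name is" pattern the script already uses for elements.

```applescript
-- Hypothetical helper: returns the DOI as text, or "" when no ArticleId has IdType "doi"
on doiFromPubmedData(thePubmedDataElement)
	tell application "System Events"
		try
			set theIdList to (first XML element of thePubmedDataElement whose name is "ArticleIdList")
		on error
			return "" -- no ArticleIdList in this record
		end try
		repeat with thisElement in (XML elements of theIdList)
			try
				set theIdType to value of (first XML attribute of thisElement whose name is "IdType")
				if theIdType is "doi" then return (value of thisElement)
			end try
		end repeat
	end tell
	return ""
end doiFromPubmedData

-- Illustrative sample only (assumed tag layout), so the handler can be tried without a download
set sampleXML to "<PubmedData><ArticleIdList>" & ¬
	"<ArticleId IdType=\"pubmed\">14734504</ArticleId>" & ¬
	"<ArticleId IdType=\"doi\">10.1161/01.CIR.0000102381.57477.50</ArticleId>" & ¬
	"<ArticleId IdType=\"pii\">109/2/159</ArticleId>" & ¬
	"</ArticleIdList></PubmedData>"

tell application "System Events"
	set theXMLData to make new XML data with data sampleXML
	set thePubmedDataElement to (first XML element of theXMLData whose name is "PubmedData")
end tell

doiFromPubmedData(thePubmedDataElement)
```

In the original script the call would sit where the "last XML element" lookup is now, for example: set doi of thePublication to my doiFromPubmedData(thePubmedDataElement). The "my" is needed there because that code runs inside a tell block.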

It does work, thank you! By the way: if you want to export these PubMed results to Bookends, the script provided by DEVONthink passes the author/creator of the RTF file to Bookends rather than the authors of the publication. This can be fixed by deleting

set theAuthors to my metaDataItems(|kMDItemAuthors| of theMD)

and adding

try
	set theAuthors to mdAuthors of theCustomMD
end try

We can also, FWIW, find the value with an XPath search:

//ArticleId['doi'=@IdType]

The trick is to include the opening incantation use framework "Foundation", and then use the nodesForXPath method of NSXMLDocument.

For example:

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

property pURLFetch : "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

on run
    tell application id "DNtp"
        set theIDString to "46444"
        set theFetchURL to (pURLFetch & "?db=pubmed&id=" & theIDString & "&retmode=xml") as string
        
        set theXML to download markup from theFetchURL encoding "UTF-8"
    end tell
    
    matchesForXPathInXML("//ArticleId['doi'=@IdType]", theXML)
    
    --> {"10.1016/s0140-6736(75)91205-2"}
end run


-- matchesForXPathInXML :: String -> String -> [String]
on matchesForXPathInXML(strXPath, strXML)
    set {docXML, xmlError} to current application's (NSXMLDocument's alloc()'s ¬
        initWithXMLString:(strXML) options:0 |error|:(reference))
    
    if docXML is not missing value then
        set {matchFound, xpathError} to ¬
            docXML's nodesForXPath:strXPath |error|:(reference)
        
        if matchFound is not missing value then
            return (matchFound's valueForKey:"stringValue") as list
        else
            (localizedDescription of xpathError) as string
        end if
    else
        (localizedDescription of xmlError) as string
    end if
end matchesForXPathInXML
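
One caveat with //ArticleId is that it matches ArticleId elements anywhere in the document. As far as I know, newer PubMed records can also carry ArticleId entries for cited references inside a ReferenceList, so a path anchored at PubmedData/ArticleIdList may be the safer choice. The following is a minimal sketch under that assumption: firstDOIInXML is a made-up handler name, and the sample XML (including the reference DOI) is invented for illustration.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Illustrative sample only; the ReferenceList layout is an assumption about newer PubMed records
set sampleXML to "<PubmedArticleSet><PubmedArticle><PubmedData>" & ¬
	"<ArticleIdList>" & ¬
	"<ArticleId IdType=\"pubmed\">14734504</ArticleId>" & ¬
	"<ArticleId IdType=\"doi\">10.1161/01.CIR.0000102381.57477.50</ArticleId>" & ¬
	"</ArticleIdList>" & ¬
	"<ReferenceList><Reference><ArticleIdList>" & ¬
	"<ArticleId IdType=\"doi\">10.1000/made-up-reference</ArticleId>" & ¬
	"</ArticleIdList></Reference></ReferenceList>" & ¬
	"</PubmedData></PubmedArticle></PubmedArticleSet>"

firstDOIInXML(sampleXML)

-- firstDOIInXML :: String -> String
-- Returns the first DOI found directly under PubmedData/ArticleIdList, or "" if there is none
on firstDOIInXML(strXML)
	set {theDoc, xmlError} to current application's NSXMLDocument's alloc()'s ¬
		initWithXMLString:strXML options:0 |error|:(reference)
	if theDoc is missing value then return "" -- XML could not be parsed
	
	-- Only ArticleId elements that are children of PubmedData/ArticleIdList are considered
	set strXPath to "//PubmedData/ArticleIdList/ArticleId[@IdType='doi']"
	set {theMatches, xpathError} to theDoc's nodesForXPath:strXPath |error|:(reference)
	if theMatches is missing value then return "" -- XPath could not be evaluated
	if (theMatches's |count|()) is 0 then return "" -- no DOI in this record
	
	return ((theMatches's firstObject())'s stringValue()) as text
end firstDOIInXML
```

Returning plain text with "" as the "not found" value keeps the result easy to hand straight to add custom meta data in the original script.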

That’s a bit more than a “trick” 😛 😉

matchesForXPathInXML is quite general and reusable, so you can just paste it in as a black box – no need to bother too much with its internal details.

XPath queries, on which TaskPaper 3’s search language is based, make a very straightforward and easy way of handling anything with an XML structure.
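
As a rough illustration of that point, the same approach can pull other fields that the original script walks to element by element, such as the article title and the authors' last names. This is a sketch only: stringsForXPath is a hypothetical helper name, and the sample record is invented rather than a real PubMed response. The element names are the ones the original script already uses.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Made-up sample record for illustration; not a real PubMed response
set sampleXML to "<PubmedArticleSet><PubmedArticle><MedlineCitation>" & ¬
	"<PMID>1</PMID><Article>" & ¬
	"<ArticleTitle>Sample title</ArticleTitle>" & ¬
	"<AuthorList>" & ¬
	"<Author><LastName>Doe</LastName><ForeName>Jane</ForeName></Author>" & ¬
	"<Author><LastName>Roe</LastName><ForeName>Richard</ForeName></Author>" & ¬
	"</AuthorList>" & ¬
	"</Article></MedlineCitation></PubmedArticle></PubmedArticleSet>"

-- Parse once, then run as many XPath queries against the document as needed
set {theDoc, docError} to current application's NSXMLDocument's alloc()'s ¬
	initWithXMLString:sampleXML options:0 |error|:(reference)
if theDoc is missing value then error ((docError's localizedDescription()) as text)

set theTitles to stringsForXPath(theDoc, "//Article/ArticleTitle")
set theLastNames to stringsForXPath(theDoc, "//AuthorList/Author/LastName")
{theTitles, theLastNames}

-- Hypothetical helper: the string values of all nodes matching the XPath, or {} on failure
on stringsForXPath(theDoc, strXPath)
	set {theNodes, xpathError} to theDoc's nodesForXPath:strXPath |error|:(reference)
	if theNodes is missing value then return {}
	return (theNodes's valueForKey:"stringValue") as list
end stringsForXPath
```

Parsing the document once and passing it to the helper avoids re-parsing the XML for every field, which matters a little when a fetch returns many articles.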

The Mozilla pages are good:

and so are the www.w3.org pages (Apple’s built-in NSXMLDocument XPath version is XPath 1.0)

https://www.w3.org/TR/1999/REC-xpath-19991116/#path-abbrev