XML PubMed Plugin

While waiting for the updated PubMed plugin that is planned for DA 2.2 (in Cocoa), I decided to try building an XML-based plugin.

I got it working but it leads to a few questions.

The special operators NEAR, BEFORE, and AFTER: these are available even for search engines that don’t offer them, correct? That was my presumption when first reading the docs: DevonAgent applies these operators itself in a post-processing step. For this to work, I guessed DevonAgent must replace these operators with AND in the original query before sending it to the search engine.

Since the Entrez search engine for PubMed does not support these special operators, I mapped them to AND in the OperatorsDictionary key of my plugin. In the log I can see that instances of NEAR or BEFORE got replaced by DevonAgent. In the query results, it does appear that all the hits satisfy a query with a restrictive connector like “NEAR/1”. So, were my presumptions correct, or am I just fooling myself with the limited number of test queries I have run?
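To make the two-step behaviour I am guessing at concrete, here is a minimal Python sketch. It is not DEVONagent’s code: the function names are made up for illustration, and reading “NEAR/n” as “at most n words between the two terms” is my assumption.

```python
import re

# Hypothetical illustration only; this is NOT how DEVONagent is implemented.
PROXIMITY = re.compile(r"\b(NEAR|BEFORE|AFTER)(/\d+)?\b")

def to_engine_query(query):
    """Step 1: replace proximity operators with AND before sending the query."""
    return PROXIMITY.sub("AND", query)

def near(text, left, right, max_gap):
    """Step 2: post-filter a hit.

    Returns True if `left` and `right` occur with at most `max_gap`
    words between them (assumed meaning of NEAR/max_gap).
    """
    tokens = re.findall(r"\w+", text.lower())
    left_pos = [i for i, t in enumerate(tokens) if t == left.lower()]
    right_pos = [i for i, t in enumerate(tokens) if t == right.lower()]
    return any(abs(i - j) - 1 <= max_gap for i in left_pos for j in right_pos)

print(to_engine_query("(MARK2 OR LKB1) NEAR AMPK"))
print(near("LKB1 phosphorylates AMPK directly", "LKB1", "AMPK", 1))
```

If the real mechanism works along these lines, the search engine only ever sees the AND form, and the proximity test is applied to whatever text comes back.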

One thing I don’t like about this implementation is that the maximum number of hits from the primary query is hardwired. I am using the “dispmax” parameter for the Entrez search engine. Since more stringent queries relying on the special DevonAgent operators might filter out many of the primary hits, I have set the dispmax parameter to 200. There’s no way around that, correct?

I tried to include a field restriction in my query, which for Entrez is done with square bracket-delimited codes, but DevonAgent stripped out the codes before sending the query. Based on the docs I wasn’t expecting that, but maybe this feature is turned off for a custom plugin? Anyway, it would be useful if there were a key in the plugin API that let the user specify which delimited blocks should be passed through untouched by DevonAgent, e.g. “[*]”.

Lastly, for the Cocoa-based PubMed plugin, I wondered whether the EUtilities API is being used.

Thanks – mpm

It’s not necessary to use the OperatorsDictionary in this case. Just define the supported operators via the Operators bitmap, and DEVONagent converts the operators on its own before sending the query to the search engine.

You could retrieve multiple pages by defining OffsetPerPage, Start & ResultsPerPage values and by adding at least the agentOffset parameter to the URL (EngineURL).
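As a rough illustration of the paging arithmetic, here is a short Python sketch. It assumes that Start is the offset of the first page and OffsetPerPage the increment between pages; the base URL and the `term`, `start`, and `count` parameter names are hypothetical placeholders, not Entrez or DEVONagent names.

```python
from urllib.parse import urlencode

def page_offsets(start, offset_per_page, pages):
    """Offsets of successive result pages (assumed meaning of the plugin keys)."""
    return [start + n * offset_per_page for n in range(pages)]

def page_urls(base_url, query, start, offset_per_page, results_per_page, pages):
    """Build one URL per page; parameter names are hypothetical placeholders."""
    return [
        base_url + "?" + urlencode(
            {"term": query, "start": offset, "count": results_per_page}
        )
        for offset in page_offsets(start, offset_per_page, pages)
    ]

for url in page_urls("https://example.org/search", "AMPK", 0, 200, 200, 3):
    print(url)
```

With three pages of 200 results each, the offsets come out as 0, 200, and 400, so the number of primary hits no longer depends on a single hardwired page size.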

Ok, that worked once I also corrected the Operators value.

The trouble I am having now is that the initial query is generating a different number of hits each time I run it. For example, with just a single search term and the Entrez parameter for maximum hits per page set to 200, the query returned 125, then 114, then 81, then 96 hits (links). No particular pattern, but never the 200 I should get. In the log view, if I reload the URL I get all 200 hits in the resulting page. DevonAgent appears to be truncating the page at some point during its search process.

You might post the plugin here and I’ll check this, but DA 2.2 is not far away ;-)

I tried DevonAgent v2.2.

When using the Devon-supplied plugin for PubMed, the NEAR operator fails to work, and nothing shows up in the log either. On the other hand, both BEFORE and AFTER produce results (but even for them nothing shows up in the log). Actually, while a query with BEFORE/1 results in fewer hits, some of the hits have many more than one word separating the operands of the query.

You can try:

(MARK2 OR LKB1) AND AMPK

(MARK2 OR LKB1) NEAR AMPK

(MARK2 OR LKB1) BEFORE/1 AMPK

There were several requests some time ago to return all PubMed/Nucleotide results, as DEVONagent scans (more or less) only the abstracts of the articles. Therefore proximity operators are not supported. However, this will be changed in v2.2.1 as it’s definitely confusing. Thanks for reporting this!

I noticed that the USPTO plugin in DA version 2.2 does not support double quoted phrases. This is really critical for limiting the number of hits with certain queries.

Also, would it be possible to generate a plugin for the USPTO that is based on the Advanced Search web page, rather than Quick Search? My particular interest is staying on top of the latest patent applications, not the issued patents. The Advanced Search option allows more flexible queries.

I have tried to build my own plugin for the advanced USPTO search, but keep encountering difficulties.

See: appft1.uspto.gov/netahtml/PTO/search-adv.html

With the advanced search form, you can enter a boolean query using parenthesized expressions and double quoted phrases. It is not necessary to use the field restriction options the advanced form provides, although it would be great if the DevonAgent plugin API had a general mechanism to support field restriction codes. I suppose this may not be easy for all search engines.

The ability to restrict the advanced search to a date range would help tremendously. What would help in that regard is if DevonAgent had a general mechanism to substitute the current date into the query at runtime. Since different search engines expect different date formats, you could use a format string like the one the Unix ‘date’ command takes. The end user could then format the date in nearly any desired way.
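Here is a minimal Python sketch of the kind of substitution I have in mind. The query template and the `filed:` field are invented for illustration and are not real USPTO or DEVONagent syntax; the `%Y`, `%m`, and `%d` codes are the same ones the Unix ‘date’ command uses.

```python
from datetime import date

def fill_dates(template, today=None):
    """Expand strftime-style codes in a query template with a date.

    `template` is a hypothetical query string; a literal percent sign
    would have to be written as %%.
    """
    if today is None:
        today = date.today()
    return today.strftime(template)

# Fixed date so the example is reproducible.
print(fill_dates("AMPK AND filed:%Y%m%d", date(2008, 3, 5)))
print(fill_dates("%m/%d/%Y", date(2008, 3, 5)))
```

The same template mechanism would cover engines that want slashes, dashes, or no separators at all, since the user controls the format string.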

Other useful fields for restriction with patent searches are the Title, Abstract, and Claims fields.

I just thought – another way to enable more flexible query generation at runtime would be to have a key that names an AppleScript. The user could then write whatever script is necessary to generate the correct query. If possible, it would be useful to be able to pass in the query text.

Ok, I’m starting to ramble…

The next release will support this. In the meantime, navigate to Contents > PlugIns > Patents in DEVONagent’s application package and open “US Patent Office.plist” either in Apple’s Property List Editor or TextEdit (requires UTF-8 encoding for loading and saving). Replace the Operators value 34859 with 34875, save the plugin and launch DEVONagent to check the results.
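For anyone curious what that change does at the bit level, here is a small Python check. It only shows which bit differs between the two values; which operator that bit stands for is not stated here, so the sketch makes no claim about it.

```python
OLD_OPERATORS = 34859
NEW_OPERATORS = 34875

def set_bits(value):
    """Return the positions of the bits that are set in `value`."""
    return [bit for bit in range(value.bit_length()) if value >> bit & 1]

changed = OLD_OPERATORS ^ NEW_OPERATORS  # bits that differ between the two values

print(set_bits(OLD_OPERATORS))
print(set_bits(NEW_OPERATORS))
print(changed, set_bits(changed))
```

The two values differ by 16, that is, the new value has one additional flag (bit 4) switched on in the Operators bitmap.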

Due to a shortcoming of DEVONagent 2.2, you have to add this definition to the plugin too:

<key>ParseLinks</key>
<true/>

In Property List Editor, should that be a Boolean “Yes”, or a String “True”?

Thanks,

(I had mistakenly thought the USPTO plugin was another one done in Cocoa. It’s nice to see your XML.)

– mpm

It’s a boolean.
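To see the difference outside Property List Editor, here is a short sketch using Python’s standard plistlib module. The plist below is a stripped-down stand-in containing only the ParseLinks key, not the real plugin file.

```python
import plistlib

# Minimal stand-in plist; the real plugin contains many more keys.
fragment = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>ParseLinks</key>
    <true/>
</dict>
</plist>"""

plugin = plistlib.loads(fragment)
print(type(plugin["ParseLinks"]), plugin["ParseLinks"])

# Writing the value back: a boolean becomes <true/>, a string becomes <string>.
as_boolean = plistlib.dumps({"ParseLinks": True})
as_string = plistlib.dumps({"ParseLinks": "True"})
print(b"<true/>" in as_boolean, b"<string>True</string>" in as_string)
```

A boolean is written as the bare `<true/>` element, whereas a string would appear as `<string>True</string>`, which is a different type.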