[HELP] Trying to build a Scirus Plugin

jcf · November 23, 2004, 11:59am

Hello, I’m not sure this is the right place to ask, but if anyone knows…
I’m trying to build a plugin to query the Scirus search engine because it provides relevant information for my academic searches.
scirus.com
For the moment I wrote this:

DA performs the query well, but what I don’t understand is why DA analyses only the result pages and not the pages they link to. My aim, of course, is for DA to analyse the abstract page of each paper so that I get the relevant ones.
If any of you has an idea…
Thank you very much in advance,
Jean-Christophe
PS: maybe it would be a good idea to provide a space on your website where users could share their plugins (maybe I didn’t find it?).

subscriber3 · November 23, 2004, 3:30pm

I am testing your plugin, and I need more information.

could you give a sample query we could talk about?

what I am seeing is that the digest contains only pages of scirus search results.

if I open a page and examine the link to an article, it contains a redirect to a website such as sciencedirect.com.

the third tab in a DEVONagent search is “Log”. this section lists each page examined, and places the page in a category. select one of the pages, and you can see it in the small window below.

I have examined the “Log” for several of the pages at sciencedirect.com that were referenced in the links. each was in the category “No match” and, looking at the page, this was correct.

in order to get past these pages to the actual articles your “Settings” (in the toolbar at far right) have to instruct your DEVONagent search to follow links another level deeper.

in “DEVONagent > Tools > Edit Search Sets…”, the “Settings” tab has a “Follow Links” section. to follow all links found, use a * wildcard in the text box.

in addition, you have to have the proper subscription to the website and the corresponding cookies set. test the links in a DEVONagent browser and be sure you can reach the articles that way.

are you seeing the same things when you run your plugin?

jcf · November 23, 2004, 10:58pm

Hello Douglas! Thank you for your answer!

Yes, for example: fuzzy logic

That’s the problem I have!

Yes, in fact Scirus is a web crawler run by ScienceDirect, so when it searches journals, it searches ScienceDirect content. I have also tried to write a plugin for ScienceDirect, but their search URL seems to be encoded, because you can’t find your query anywhere in the URL.
Scirus also searches academic web pages (&ds=web), but for now I have asked my plugin to search only journals (&ds=jnl).

That’s right, I can see the same “No match” pages in the log. But in the Scirus search results pages displayed in the digest, the links are valid, and I don’t understand why none of them appears in the digest.
I would like DA to process the abstracts behind these links instead of processing the Scirus result pages.

I did it, but even with the highest level, DA still gives me pages of scirus search results in the digest.

I don’t understand where I am supposed to use a wildcard. I do not have the menu you’re talking about; I have a follow-links option in the ‘Settings’ tab, but there is no place to enter text. Do you mean that I have to use ‘*’ in the query itself? (I’ve tried, but it changes nothing.)

Of course, but access to abstracts is free at ScienceDirect; you need a subscription only to download the articles.

I’ve noticed no difference between browsing Scirus with DA and with Safari. I’ve noticed just one thing: the links often don’t work the first time you click them. You get a warning telling you that the page doesn’t exist, and the second time it works. It seems to be a random problem, and it also happens in Safari.
But as I write this, I’m wondering whether that might be the problem…

Other ideas?
Thank you in advance for your help,
Jean-Christophe

subscriber3 · November 24, 2004, 1:34am

I would like to apologize for getting my versions mixed up. the testers are showing up in the forums, and they are just as eager to share what they know as the rest of you are to see the new public release. there have been many alpha releases, and there have been interface changes, bug fixes, and many, many features added. it is worth the wait.

I have simply downgraded temporarily to the current public release in order to see what is going on. I think there may be problems.

I have a couple of general observations:

what you are trying to do is to search pages from a variety of sources. when you use the keys “LinksStart” and “LinksEnd” they should be found in all of the pages to be searched. they are for efficiency, and in your case you don’t have many choices because of the variety of pages. you may want to omit these keys.

you are using the key “LinksNotMatching” with two Google specific terms you copied from an example. delete this key, and add a key “LinksMatching” as a * wildcard.

now, the bad news. I may be wrong about this.

if you examine the links, each has a redirect to the source.

after running a number of tests, if I examine the “Log”, there is only one page for each site, and it is cut off. for example:

http://www.scirus.com/srsapp/sciruslink?src=mdl&url=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi%3Fcmd%3DRetrieve%26db%3Dpubmed%26dopt%3DAbstract%26list_uids%3D14656488

has become:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve

note that if I load the first URL in another browser (I use Firefox) it will change to the second. but, if I load just the decoded value of its “url” parameter:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=14656488

I get the page you expect.
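
To make the comparison above concrete, here is a minimal Python sketch (standard library only) that pulls the `url` parameter out of the Scirus redirect link and decodes it. The helper name `extract_target` is made up for illustration. The last step only shows that cutting the decoded address at its first `&` gives the same shortened URL seen in the Log; that is a guess at what is going wrong, not a confirmed diagnosis.

```python
from urllib.parse import urlsplit, parse_qs


def extract_target(redirect_url: str) -> str:
    """Return the decoded value of the `url` query parameter (hypothetical helper)."""
    query = urlsplit(redirect_url).query
    # parse_qs splits on the literal '&' first, then percent-decodes each value,
    # so the encoded '%26' inside the target address stays part of that value.
    params = parse_qs(query)
    return params["url"][0]


scirus_link = (
    "http://www.scirus.com/srsapp/sciruslink?src=mdl&url="
    "http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fentrez%2Fquery.fcgi"
    "%3Fcmd%3DRetrieve%26db%3Dpubmed%26dopt%3DAbstract%26list_uids%3D14656488"
)

target = extract_target(scirus_link)
print(target)

# Assumption: the shortened URL in the Log looks like the decoded address
# cut off at its first '&'.
truncated = target.split("&", 1)[0]
print(truncated)
```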

what I think may be happening is that Scirus may be deliberately blocking access to the site. in the terms of service at:

http://www.scirus.com/srsapp/termsofservice/

they state in part:

the usual way of detecting robots is to see how rapidly the http requests are being sent from an internet address. I am probably blocked for 24 hours.
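
as an illustration of that kind of rate-based detection, here is a minimal Python sketch of a sliding-window counter. it is an assumption about how such a check could work, not a description of what Scirus actually does; the class name `RateWatcher` and the thresholds are hypothetical.

```python
from collections import deque


class RateWatcher:
    """Flags a client that sends more than max_requests within window_seconds.

    Hypothetical illustration only; real sites may use very different rules.
    """

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._recent = deque()  # timestamps of requests still inside the window

    def is_too_fast(self, timestamp: float) -> bool:
        """Record one request and report whether the client now exceeds the limit."""
        self._recent.append(timestamp)
        # Drop requests that have fallen out of the time window.
        while self._recent and timestamp - self._recent[0] >= self.window_seconds:
            self._recent.popleft()
        return len(self._recent) > self.max_requests


# Allow at most 3 requests in any 10-second window (made-up numbers).
watcher = RateWatcher(max_requests=3, window_seconds=10.0)
flags = [watcher.is_too_fast(t) for t in (0.0, 1.0, 2.0, 3.0, 20.0)]
print(flags)  # [False, False, False, True, False]
```

a crawler that follows every link on a results page sends its requests in a burst like the first four above, which is the pattern such a check would flag.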

unless someone else has another interpretation, you may have to write separate plugins for each of the databases and combine them into a single search set, avoiding Scirus.

mse · November 30, 2004, 5:02pm

I had the same conflict with the American Chemical Society publication service. When I tried a search from within DA using a crawler, they locked my (legitimate) access by IP with a similar notification message.