DEVONagent seems to stop crawling prematurely

… in other words, searching the engine directly returns more results than the plugin I am developing for it (my first DA plugin).

The engine at hand is Oxford Music Online (oxfordmusiconline.com, subscription required).

The plugin is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>Description</key>
	<string>Search the Web using &lt;www.oxfordmusiconline.com&gt;.</string>
	<key>EngineUrl</key>
	<string>http://www.oxfordmusiconline.com/subscriber/search_results?q=_agentQuery_&amp;search=quick&amp;_start=_agentOffset_</string>
	<key>Identifier</key>
	<string>www.oxfordmusiconline.com</string>
	<key>Info</key>
	<string>Oxford Music Online Plugin</string>
	<key>Keyword</key>
	<array>
		<string>omo</string>
	</array>
	<key>LinksNotMatching</key>
	<array>
		<string>*www.oup.com/*</string>
		<string>*www.oxfordmusiconline.com/public/logout*</string>
		<string>*www.oxfordmusiconline.com/subscriber/about*</string>
		<string>*www.oxfordmusiconline.com/subscriber/advanced_search*</string>
		<string>*www.oxfordmusiconline.com/subscriber/book/*</string>
		<string>*www.oxfordmusiconline.com/subscriber/browse*</string>
		<string>*www.oxfordmusiconline.com/subscriber/browse?type=biography*</string>
		<string>*www.oxfordmusiconline.com/subscriber/contact-us*</string>
		<string>*www.oxfordmusiconline.com/subscriber/email_search_results*</string>
		<string>*www.oxfordmusiconline.com/subscriber/help/*</string>
		<string>*www.oxfordmusiconline.com/subscriber/page/*</string>
		<string>*www.oxfordmusiconline.com/subscriber/page/privacy_policy*</string>
		<string>*www.oxfordmusiconline.com/subscriber/page/resources*</string>
		<string>*www.oxfordmusiconline.com/subscriber/search_results?*</string>
	</array>
	<key>Name</key>
	<string>Oxford Music Online</string>
	<key>NextLinkName</key>
	<string>"Next page"</string>
	<key>OffsetPerPage</key>
	<integer>25</integer>
	<key>Operator0</key>
	<integer>32882</integer>
	<key>Operators</key>
	<integer>59</integer>
	<key>ParseLinks</key>
	<true/>
	<key>ResultsPerPage</key>
	<integer>25</integer>
	<key>Start</key>
	<integer>1</integer>
	<key>Version</key>
	<string>1.0</string>
</dict>
</plist>

I tried both with and without the NextLinkName key, but it made no difference whatsoever.

Any ideas?

Thank you.

Unfortunately, a subscription is required for the website. But what do the result URLs look like? And what kind of results are skipped? Finally, you could also use the secondary query “*” (without quotes) to ensure that all results are used, even those that don’t match.

The first page of results has the following URL:


http://www.oxfordmusiconline.com/subscriber/search_results?q=bernstein&button_search.x=53&button_search.y=6&button_search=search&search=quick

The second page:


http://www.oxfordmusiconline.com/subscriber/search_results?q=bernstein&search=quick&_start=26

Once on the second page of results, clicking the “previous” button returns to the first page, but with the following URL:


http://www.oxfordmusiconline.com/subscriber/search_results?q=bernstein&search=quick&_start=1
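As a consistency check of the plist against these URLs, assume (the thread does not confirm this) that DA replaces `_agentOffset_` with `Start` plus the zero-based page index times `OffsetPerPage`. With `Start` = 1 and `OffsetPerPage` = 25, that yields exactly the `_start` values seen above (1 for page 1, 26 for page 2), so the paging parameters look correct:

```latex
\[
  \mathit{offset}(n) = \mathit{Start} + (n-1)\cdot\mathit{OffsetPerPage} = 1 + 25\,(n-1),
  \qquad n = 1, 2, 3, 4 \;\Rightarrow\; 1,\ 26,\ 51,\ 76
\]
```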

As far as I can tell, DA returns 100 results, whereas the website returns over 400. My impression is that DA simply does not click the “next” button more than four times :slight_smile:

Using the wildcard * as the secondary search string didn’t change the results. Am I missing something?

By default, plugins return up to 100 results. To get more, you’d have to create a search set and increase the number of results per plugin.
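The 100-result default is consistent with the four-page observation earlier in the thread. With `ResultsPerPage` set to 25, the cap is reached after four pages. Retrieving the 400+ results the website reports would take at least 16 pages, assuming 25 results per page throughout:

```latex
\[
  \left\lceil \frac{100}{25} \right\rceil = 4 \text{ pages},
  \qquad
  \left\lceil \frac{N}{25} \right\rceil \ge 16 \text{ pages for } N \ge 400
\]
```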