Everything seems to work pretty well, although I can’t work out how to get it to go beyond the first results page. Depending on small changes here and there I get between 50 and 90 results, but the search only ever hits around 103 pages. Looking at GBS, they only seem to provide up to the first 1000 hits, which would be fine (you can use date ranges to get them all in the end).
Is there something pretty simple I’m missing here? I read a little about LinksStart and LinksEnd but didn’t really understand. I used the Google plugin .plist as a template for this so I was hoping that most of the mechanics would just work out. Thanks for any help you can provide.
Not sure if this is a solution or just a quick fix, but if I increase the number of links being followed (previously 0), I do get more of the results. I can’t tell whether the extra results come from links on the pages identified in the first 100 search results. I was hoping to avoid following links since, of course, blogs link to other pages and I just want to end up with all the blog entries from the site in question (e.g., Engadget).
I can filter out a lot of them in post-processing (does the site URL contain engadget.com?), but if there is a simple fix to the question above, I would certainly appreciate it!
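For anyone who wants to do that post-processing filter outside DEVONagent, here is a minimal Python sketch. It assumes the result URLs have already been exported to a plain list; the function names `is_on_site` and `keep_site_results` are made up for illustration and are not part of DEVONagent. It compares the host name rather than searching the whole URL, so a page that merely mentions engadget.com in its query string is not kept.

```python
from urllib.parse import urlsplit


def is_on_site(url: str, domain: str) -> bool:
    """Return True if the URL's host is the domain itself or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def keep_site_results(urls, domain):
    """Keep only the URLs hosted on the given domain (hypothetical helper)."""
    return [u for u in urls if is_on_site(u, domain)]


if __name__ == "__main__":
    # Example URLs are invented for illustration.
    results = [
        "http://www.engadget.com/2008/01/01/post/",
        "http://example.com/?ref=engadget.com",
        "http://engadget.com/x",
    ]
    for url in keep_site_results(results, "engadget.com"):
        print(url)
```

A plain "does the URL contain engadget.com" check would also keep the second example URL, which is why the sketch looks at the host only.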
In the online help there is a chapter on writing your own plugin. It also explains how to go beyond the first page of search results. Search for the keyword “EngineUrl” in the online help for DEVONagent. Please read that and then the following paragraph should make more sense.
The trick is to compare the URL of the first results page with the URL of a later results page (say, the 3rd). If these are completely different you can use the “EngineNextUrl” keyword; otherwise the two URLs normally differ in only one parameter, which you can catch with the “agentOffset” variable.
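The comparison described above can be done by eye, but a small Python sketch may help when the URLs are long. This is only an illustration of the comparison step, not DEVONagent plugin syntax; the function name `changed_query_params` and the example URLs (including the `start` parameter) are assumptions made up for the example. It returns `None` when the two URLs differ in more than their query string (the "completely different" case), and otherwise a dictionary of the query parameters that changed.

```python
from urllib.parse import urlsplit, parse_qs


def changed_query_params(first_url: str, later_url: str):
    """Compare two results-page URLs.

    Returns None if scheme, host, or path differ (the URLs are
    "completely different"); otherwise returns a dict mapping each
    changed query parameter to its (first, later) values.
    """
    a, b = urlsplit(first_url), urlsplit(later_url)
    if (a.scheme, a.netloc, a.path) != (b.scheme, b.netloc, b.path):
        return None
    qa = parse_qs(a.query, keep_blank_values=True)
    qb = parse_qs(b.query, keep_blank_values=True)
    return {
        key: (qa.get(key), qb.get(key))
        for key in sorted(set(qa) | set(qb))
        if qa.get(key) != qb.get(key)
    }


if __name__ == "__main__":
    # Hypothetical URLs; substitute the ones copied from your browser.
    page1 = "http://search.example.com/blogsearch?q=engadget&num=10"
    page3 = "http://search.example.com/blogsearch?q=engadget&num=10&start=20"
    print(changed_query_params(page1, page3))
```

If the result is a single parameter whose value grows from page to page, that is presumably the one to drive with “agentOffset”; if the result is `None`, the “EngineNextUrl” route from the reply above is the one to look at.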
PS: One potential reason why DT might choose not to include such a plugin is that Google keeps a close check on hits to their servers. After using the plugin for a while I start to get ‘hey, you seem to be doing something automated, you probably shouldn’t do that’ messages. I think at one time they had something in their terms of service about using automated means to capture search results. So, if anyone decides to use this, they might want to use it sparingly.
If you use automated means to capture search results, then you could at least theoretically set up your own search service, piggybacking on Google’s huge infrastructure investments while siphoning off the resulting ad revenue. So yes, I can see how they might become annoyed.
OTOH, they do offer a variety of tools for people who want to build custom searches, either for personal use or for web sites. So they’re not completely opposed; you just need to play by the rules.
PS Full disclosure: My husband works for Google. But he’s asleep right now, and even if I asked he would send me to the public information pages.