Search mailing list archives (operators in follow links fiel

Okay, here’s a little tip that may help some people. A lot of open source software projects use the Mailman mailing list manager. They typically have archives up somewhere, but the archives are not searchable. lists.squeakfoundation.org/pipermail/seaside/ is a good example. You can use DEVONagent to search the entire archive for you. Basically, you just add the page to the search set and then set DA to follow links two levels deep.

In setting this up, I learned something really cool: the DA operators (and, not, or, near, before, after) are all usable inside the follow links field. This is useful because many of the linked pages are just different views of the same content. The archive page lets you view by Thread, Subject, Author, and Date. If DA just followed every link on the page, it would end up visiting a lot more pages than it really needs to. So you can use the follow links field to zoom in on only the links you want to follow, not by specifying the text to follow, but by specifying the text not to follow. Here’s a complete example:

  1. Create a new search set using “edit search sets”
  2. Check the “follow links” box and set it to 2 levels deep
  3. Change follow links input to: not ("[ Subject ]" or "[ Date ]" or "[ Author ]")
  4. Add the archive index page via the Sites tab

You’re good to go! Now a search with this search set will follow every remaining link in the archive and present you with a list of pages that include your search term. It skips over the alternate views to minimize wasted effort and make things go a bit faster.
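For anyone who wants to see the idea behind step 3 spelled out, here is a rough Python sketch of the same exclusion logic. It is not how DEVONagent works internally; the substring match on link text, the `links_to_follow` helper, and the sample index markup are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Link labels to skip; these mirror the follow-links expression in step 3.
SKIP_LABELS = ("[ Subject ]", "[ Date ]", "[ Author ]")


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only accumulate text while inside an <a href=...> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def links_to_follow(html):
    """Return hrefs whose link text contains none of the skipped labels."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [href for href, text in parser.links
            if not any(label in text for label in SKIP_LABELS)]


# Hypothetical excerpt of an archive index row, made up for this example.
SAMPLE_INDEX = """
<tr><td>January 2008:</td><td>
<a href="2008-January/thread.html">[ Thread ]</a>
<a href="2008-January/subject.html">[ Subject ]</a>
<a href="2008-January/author.html">[ Author ]</a>
<a href="2008-January/date.html">[ Date ]</a>
</td></tr>
"""

print(links_to_follow(SAMPLE_INDEX))  # ['2008-January/thread.html']
```

Only the Thread view survives the filter, so each message gets reached through one path instead of four.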

Figuring this out really has me hooked on DEVONagent now. I use tons of mailing lists that have all the info I need…somewhere…but unfortunately they’re not searchable. DEVONagent makes them all searchable for me, and with an extra powerful search mechanism at that.

Thanks for posting this. I’m curious to give it a try, starting with The dovecot Archives.

It takes much longer than a Google search because it downloads every page. Also, I’ve found that some pages fail with a “too big” error. I’m not sure whether the page itself was too big, there were too many links, or something else.

“Too big” means that the page is larger than the maximum download size specified in the preferences. DEVONagent will support up to 4096 KB.

Thanks for mentioning that. Maybe a good reason not to use it with someone else’s servers.

I think it’s okay. In the context of the whole Internet we don’t have much of an impact. Plenty of bots do the same thing. Also the pages stay cached if you don’t quit, which reduces the load on their end if you run multiple searches.

Maybe a future version of DEVONagent could selectively keep a cache for certain pages, sites, or search sets, instead of it being an all-or-nothing deal (there’s a setting in preferences to remove the cache on quit).