For example, let’s say I want to create a search set that searches the Life Extension Foundation website. Their URL is lef.org.
If all I do is add lef.org, nothing happens; I get zero results returned. So obviously I need to do more than that. The user doc has a sentence about adding a string to utilize the website’s built-in search functionality, but the doc is, well, less than helpful.
hey Bill…did all that, except for entering a default search string (never quite sure of the value in that since i’d never run a search without entering a search term). The query runs and I get nothing.
for example, i created a new search set called “medical sites”. set it to follow max links. added the site "http://www.lef.org". ran the query on the term ‘osteoarthritis’.
I took a quick look at the site and I believe you may need to create an XML plug-in. Rather than linking articles off of the main URL (lef.org), the site has its own internal search engine. That’s why the “Follow Links” option on your search set is not giving you the hits that you expect.
that’s what i figured it was, but the user doc seems to indicate there’s a way to construct the url in the sites pane such that it will leverage the site’s search functionality. i just can’t figure out the user doc instructions on this.
also, how do I determine which sites need this and which don’t? is the mere presence of a search field sufficient to make that determination?
Just checked the URL at 3:41 PM Central Time – the site is down for maintenance still, so no search strategy is going to work!
My test set used ‘life’ for both the default term and the follow links term. Earlier today, the site was up and I got 62 hits. Even though the site has its own search function, DA was able to act as a human visitor to the site would and find pages that contained the search string.
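As a rough mental model of the behaviour described above (not DEVONagent's actual implementation), a link-following search can be pictured as a crawl that starts at the site URL, follows links to a limited depth, and keeps only the pages whose text contains the term. The in-memory site data and the function name `crawl_for_term` are made up for illustration.

```python
from collections import deque

# Hypothetical in-memory "site": URL -> (page text, outgoing links).
SITE = {
    "/": ("Welcome to our life sciences store", ["/products", "/research"]),
    "/products": ("Supplements for a long life", ["/"]),
    "/research": ("Abstracts on osteoarthritis", ["/research/oa"]),
    "/research/oa": ("Osteoarthritis and joint life quality", []),
}

def crawl_for_term(site, start, term, max_depth=2):
    """Breadth-first crawl from `start`; return URLs whose text contains `term`."""
    term = term.lower()
    hits, seen = [], {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        text, links = site[url]
        if term in text.lower():
            hits.append(url)
        # Only follow links while we are still within the depth limit.
        if depth < max_depth:
            for link in links:
                if link in site and link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return hits

print(crawl_for_term(SITE, "/", "life"))
print(crawl_for_term(SITE, "/", "osteoarthritis", max_depth=1))
```

In this toy model a term only produces hits on pages the crawl can actually reach by following links, which would be one possible reason a site whose articles sit behind its own search box returns little or nothing.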
i’m confused. i realized one thing i wasn’t doing was populating the follow links field with a default search term. one problem with the screen layout is it doesn’t make it clear just what the heck that field is. that should be remedied in a future version. it just says follow links with a blank field underneath.
now when i put ‘life’ into both fields, then i get some hits but it’s all basically product returns. it doesn’t access the meat of their research.
the thing I’m finding REALLY confusing is if I change the default search terms from ‘life’ to ‘test’, i get only one link returned. so that begs the question, how does DA use the default search term? i figured it was a placeholder that would be substituted with the entered search term, but that’s clearly not the case. this doesn’t make any sense to me.
Now, with ‘life’ as the default search terms, if I go to their search field and enter “fibromyalgia”, i get a ton of hits. At the bottom of the page I can narrow the query to just return results from their medical abstracts. The URL of this native search is…
is there a way to use DA to leverage a site’s internal search functionality?
For this, you need plugins that “tell” DEVONagent how to access the site’s search function.
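Whatever form such a plug-in takes, the core idea is presumably a URL pattern for the site's search page with a slot where the query goes. Here is a minimal Python sketch of that idea only. It is not DEVONagent's plug-in format, and the template URL, the `{query}` placeholder and the `build_search_url` helper are all hypothetical.

```python
from urllib.parse import quote_plus

# Hypothetical template; the real search URL for a given site will differ.
SEARCH_TEMPLATE = "https://www.example.com/search?q={query}"

def build_search_url(template, query):
    """Substitute the URL-encoded query into the template's {query} slot."""
    return template.format(query=quote_plus(query))

print(build_search_url(SEARCH_TEMPLATE, "fibromyalgia"))
print(build_search_url(SEARCH_TEMPLATE, "joint pain"))
```

To find the real pattern for a site, one approach is to run a search in the browser and look at where the term ends up in the resulting address.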
I am trying to find out how this precisely works right now, but am having some difficulties as well … I actually posted a question on this in another board. Will keep you posted.
I realize this is now an old discussion, but I was going through exactly the same process and just discovered the solution. I too wanted to get DA to crawl a particular site and grab every page from it. I discovered that you need to enter an asterisk [i.e., Shift-8] in BOTH the default query and the box under “Follow Links”. Once I entered that second asterisk and hit Go, DA took off and happily downloaded the entire website for me. I’m happy. Hope this helps the next person trying to get DA to crawl sites.
Mind you, it’s now up to 608 pages, so I will clearly need to introduce some filtering and restrictions!
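To picture what "filtering and restrictions" might look like, here is a small Python sketch that treats the asterisk as a match-everything wildcard and adds a page cap. The shell-style pattern matching is an assumption made for illustration (DA's own matching rules may differ), and `select_pages` and the sample paths are hypothetical.

```python
from fnmatch import fnmatchcase

def select_pages(urls, pattern="*", max_pages=None):
    """Keep URLs matching a shell-style pattern, optionally capped at max_pages."""
    matched = [u for u in urls if fnmatchcase(u, pattern)]
    return matched if max_pages is None else matched[:max_pages]

# Hypothetical paths standing in for crawled pages.
urls = ["/about", "/research/a", "/research/b", "/products/c"]

print(select_pages(urls))                               # "*" keeps everything
print(select_pages(urls, "/research/*"))                # narrower pattern
print(select_pages(urls, "/research/*", max_pages=1))   # pattern plus a cap
```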