I have created a search set with a single RSS feed, set to crawl mode: https://blog.testdouble.com/index.xml
I want to search the contents of the articles present on that feed.
Example: https://blog.testdouble.com/posts/2021-09-09-how-to-build-a-search-engine-with-ruby-on-rails/ contains the term “postgres” multiple times.
However, if I search for “postgres”, that article doesn’t come up. It appears that DA only searches the RSS preview content rather than the page content itself.
I have also tried enabling “follow links”, to no avail.
How can I set up a search set that will crawl all of the links in an RSS feed, and search the contents?
Would you be able to provide some more detail on what you meant by the following?
“Currently following of links is only applied to matching pages in case of feeds”
Specifically:
What are the criteria for a link to be a “matching page”?
Are feeds the only instance of this restriction, or does it hold true for every page searched by DEVONagent that happens to be an .xml file (for example)?
I’m curious how this logic affects the end search results, and whether I should be aware that DA doesn’t go any further down when searching a URL that is (or resembles) an XML RSS feed. That way I won’t assume that all related pages have been exhaustively searched just because X is found in the results.
Thanks so much for the help! Def gonna be looking forward to the next update
A page is a “matching page” if it matches the primary (or secondary) search term. Depending on the settings, the title, text, keywords, description, URL and/or objects are used for the comparison.
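To make that rule concrete, here is a rough Python sketch of how such a "matching page" check could behave. It is only an illustration of the description above, not DEVONagent's actual implementation: the `Page` class, the `page_matches` function, and the field names are all hypothetical, the real matching is presumably more sophisticated than a case-insensitive substring test, and "objects" are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for a fetched page or feed entry."""
    url: str
    title: str = ""
    text: str = ""
    keywords: str = ""
    description: str = ""


def page_matches(page, term, fields=("title", "text")):
    """Return True if the term occurs in any of the selected fields.

    `fields` plays the role of the search set's settings: only the
    fields listed there are compared against the search term.
    """
    needle = term.casefold()
    return any(needle in getattr(page, name).casefold() for name in fields)


def matching_urls(pages, term, fields=("title", "text")):
    """URLs of the pages that match; under this sketch, only these
    would have their links followed when the source is a feed."""
    return [page.url for page in pages if page_matches(page, term, fields)]
```

Under this reading, a feed entry whose preview text never mentions “postgres” would not count as matching, which would be consistent with the article above not showing up even with “follow links” enabled.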
I hope you’ll consider it for sitemap.xml as well, since that’s probably the most reliable and comprehensive index of a site’s contents. Doing site search often fails because of search engine rate-limiting. If I could plug in a sitemap (XML or txt) and treat it as a search engine results list, that would be great. Then DA would do the work of crawling each link and comparing it against my query.
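In the meantime, the idea can be approximated outside DA. The following is a minimal Python sketch, assuming a standard XML sitemap that uses the sitemaps.org namespace. The function names (`sitemap_urls`, `search_sitemap`, `fetch_text`) are made up for this example, and the match is a plain case-insensitive substring test against the raw page source, not anything DEVONagent does.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Namespace used by standard sitemap.xml files (sitemaps.org protocol).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(sitemap_xml):
    """Extract every <loc> URL from a sitemap XML document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def fetch_text(url):
    """Download a page and return its raw source as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def search_sitemap(sitemap_xml, query, fetch=fetch_text):
    """Treat the sitemap as a results list: fetch each URL and keep
    the ones whose content contains the query."""
    needle = query.casefold()
    hits = []
    for url in sitemap_urls(sitemap_xml):
        if needle in fetch(url).casefold():
            hits.append(url)
    return hits
```

Passing `fetch` as a parameter makes it easy to swap in a cached or rate-limited downloader, which matters on larger sites. Because the comparison runs against raw HTML, a term that only appears in markup or scripts would also count as a hit.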