I’m attempting to do a site-specific search of the domain fujixweekly.com by using the following searches, set to Deep, Deeper, and Deepest Web in DEVONagent Pro v3.11.6:
site:https://fujixweekly.com (“X-Trans V” OR “X-Trans IV” OR X-T5 OR X100V) AND (“Film Simulation*” NEAR/5 Astia)
site:https://fujixweekly.com/ (“X-Trans V” OR “X-Trans IV” OR X-T5 OR X100V) AND (“Film Simulation*” NEAR/5 Astia)
site:https://fujixweekly.com/* (“X-Trans V” OR “X-Trans IV” OR X-T5 OR X100V) AND (“Film Simulation*” NEAR/5 Astia)
I’m getting very limited results, in this instance 2 hits, though I know for a fact that there are more.
This page, and many like it, contain the terms that I’m searching for, yet they do not appear in my results.
It appears that DA isn’t able to traverse the whole site, though I have enabled “Ignore instructions for robots” in preferences.
I’d welcome suggestions and feedback.
That’s not how these search sets work. See, for example, their description:
Following of links is neither unlimited (maximum level is 5) nor do these sets use all links by default. Instead only “promising” links are followed to reduce the traffic. For more details see Help > Documentation > Search Sets
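To illustrate what a depth limit plus a link filter means in practice, here is a minimal sketch in Python. It is not DEVONagent's implementation: the `crawl` function, the in-memory `links` dictionary, and the `is_promising` callback are all hypothetical stand-ins, and the actual heuristic DEVONagent uses to pick "promising" links is not described in this thread.

```python
from collections import deque

def crawl(start, links, max_depth=5, is_promising=lambda url: True):
    """Breadth-first traversal of a link graph.

    links        -- hypothetical dict mapping a URL to the URLs it links to
    max_depth    -- links are not followed beyond this level
    is_promising -- hypothetical filter; only links it accepts are followed
    """
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # depth limit reached: do not follow this page's links
        for nxt in links.get(url, []):
            if nxt not in seen and is_promising(nxt):
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return visited

# Toy graph: a -> b, c ; b -> d ; d -> e
toy = {"a": ["b", "c"], "b": ["d"], "d": ["e"]}
pages = crawl("a", toy, max_depth=2)
```

With `max_depth=2`, page `e` (three links away from the start) is never reached, which is one way a page that clearly contains the search terms can still be missing from the results.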
Please excuse my ignorance. I’ve created my first Search Set, and when I start it, I’m getting unexpected results. Though I’ve specified the exact website that I want crawled, it’s not crawling that site. Also, though I chose Crawl (and not Search) under Sites, the Search doesn’t run unless I choose a plugin. This confuses me, as I thought that Crawling didn’t require a plugin. Perhaps I’ve completely misunderstood something.
Chosen Search Set:
Search Set’s URL for crawling:
Results from crawl:
The URLs in the Sites tab are just starting points for the search but the search is not restricted to the same site. Which and how many links will be followed is defined by the settings in the General tab.
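If the goal is to keep only hits from one site, one option outside DEVONagent is to filter a list of result URLs by host name afterwards. The following is a rough sketch, assuming the results are available as plain URL strings; `same_site` is a hypothetical helper, not a DEVONagent feature.

```python
from urllib.parse import urlparse

def same_site(url, site="fujixweekly.com"):
    """Return True if the URL's host is the site itself or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return host == site or host.endswith("." + site)

# Hypothetical result list for illustration only.
results = [
    "https://fujixweekly.com/recipes/",
    "https://www.fujixweekly.com/about/",
    "https://fujiweekly.com/",
    "https://example.org/links/",
]
on_site = [u for u in results if same_site(u)]
```

Comparing host names exactly, rather than checking whether the URL merely contains the site name, also makes a near-miss such as a one-letter difference in the domain easy to spot.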
Chris, sorry but I’m incredibly confused.
Here are my settings for the General tab.
And here are my settings for the Sites tab, which I’ve set to Crawl
If I just run this search, as is, with no plugin checked, then nothing at all happens. No searching/crawling is taking place.
However, if I add a plugin, then searching occurs. But, I didn’t think that I needed to add a plugin to be able to crawl a site.
If I want to limit the crawl to only one site, the one listed on the sites tab, how do I go about doing that?
Your URL in the screen capture is incorrect. You entered
Oh my! How did I miss that “x”!? Time for some new glasses!
The crawl now appears to be working as I expected. Thanks!
However, any idea why I’d be getting these hits, if I’ve only specified fujixweekly.com?
Export, ZIP, and post your search set. Thanks!
Also, is there any way to exclude certain content? For instance, I’m now getting “hits” for links that are at the bottom of the page, that aren’t germane to the content in the page.
I just came across this discussion today, so if you’re still working on this…
The “exclude” feature in the “advanced” tab of a search set’s configuration looks helpful. Quoting the documentation,
The Exclude Domains and Exclude Links fields allow you to explicitly exclude domains and links from being used or followed when you use this search set. Add the domains or links that you want to be excluded into the text fields. You can use asterisk (*) wildcards to include e.g. all subdomains or partial links.
As soon as you can find a string that sets those links apart from “real” ones, you should be good to go.
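To get a feel for how asterisk patterns separate unwanted links from real ones, here is a small sketch using Python's shell-style wildcards. This is an assumption about how such patterns behave in general, not a statement of how DEVONagent matches them internally; the `is_excluded` helper and the example patterns are hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse

def is_excluded(url, exclude_domains, exclude_links):
    """Return True if the URL's host or the full URL matches an exclude pattern."""
    host = urlparse(url).hostname or ""
    if any(fnmatchcase(host, pattern) for pattern in exclude_domains):
        return True
    return any(fnmatchcase(url, pattern) for pattern in exclude_links)

# Hypothetical patterns: drop every subdomain of example.com and any tag page.
domains = ["*.example.com"]
links = ["*/tag/*"]

footer_hit = is_excluded("https://fujixweekly.com/tag/astia/", domains, links)
real_hit = is_excluded("https://fujixweekly.com/2023/01/recipe/", domains, links)
```

Trying candidate patterns against a handful of wanted and unwanted URLs like this is a quick way to check that a pattern is neither too broad nor too narrow before entering it in the search set.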