I’m still very much in the stage of learning how to get the best out of DA.
For instance, I ran a search for the following string using the built-in Web (Fast) search set (tweaked so that it only searches in English), with the option to omit similar pages unticked in the Settings tab:
"New Content Creation Tools Offer Benefits But Have The Potential To Disrupt"
(It’s the title of a short piece I wrote, and I wanted to check whether it had been reposted anywhere, hence not wanting to filter out similar pages.)
So, with no links being followed in this set, I’d expect it to take just a few seconds. It actually takes ages and ends up evaluating a huge number of files (~36,000), yet it only returns 2 hits (compared to the 32 hits that Google gives on its own).
Can anyone explain why this search takes so long to run in DA, and why you get fewer hits than just using Google directly?
This is an intriguing question, and I have also wondered why the search takes a long time. I can understand it in some of the sets, as they are very comprehensive, but I have found that a simple Google search in DEVONagent is slower than doing the same search in Safari. I must say, though, that searching has always fallen into the category I term “a dark art”, and I have often wished for a book on DEVONagent such as Joe Kissell’s excellent one on DEVONthink.
By default DEVONagent downloads all (!) results and evaluates them on its own (e.g. to apply advanced operators/wildcards, filter similar results etc.). This of course requires much more time than just retrieving one results page.
But this can be skipped by enabling the “Express” mode. However, advanced features like digest, summaries, filters and support for operators/wildcards are limited in this mode, and support for scanners isn’t available at all.
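To illustrate the difference, here is a rough Python sketch of the two approaches. It is not DEVONagent’s actual code: the names `full_mode`, `express_mode`, `fetch` and the in-memory `pages` are made up for the example, and the pages are faked so nothing touches the network. The point is only that checking every candidate page yourself costs one download per result, and that a page which doesn’t really contain the phrase gets thrown away, which would be one plausible way to end up with many files evaluated but few hits.

```python
from typing import Callable, Iterable, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching ignores page layout."""
    return " ".join(text.lower().split())


def full_mode(urls: Iterable[str], fetch: Callable[[str], str], phrase: str) -> List[str]:
    """Download every candidate and keep only pages containing the exact phrase.

    Hypothetical stand-in for the default (non-Express) behaviour:
    one fetch per result, then a local check.
    """
    wanted = normalize(phrase)
    hits = []
    for url in urls:
        try:
            body = fetch(url)  # the expensive step: one download per result
        except OSError:
            continue  # unreachable page: skip it
        if wanted in normalize(body):
            hits.append(url)
    return hits


def express_mode(urls: Iterable[str]) -> List[str]:
    """Trust the search engine's result list as-is; nothing is downloaded."""
    return list(urls)


# Fake "web" for the demo (assumption: real pages would be fetched over HTTP).
pages = {
    "a": "New Content Creation Tools Offer Benefits But Have The Potential To Disrupt",
    "b": "Content creation tools: new benefits",
    "c": "new   content creation tools offer benefits\nbut have the potential to disrupt",
}


def fetch(url: str) -> str:
    """Return the fake page body, or fail like a dead link would."""
    if url not in pages:
        raise OSError(f"cannot reach {url}")
    return pages[url]


phrase = "New Content Creation Tools Offer Benefits But Have The Potential To Disrupt"
candidates = ["a", "b", "c", "d"]

print(full_mode(candidates, fetch, phrase))  # ['a', 'c']
print(express_mode(candidates))              # ['a', 'b', 'c', 'd']
```

In this toy version the full pass does four fetches to confirm two hits, while the express pass returns all four candidates immediately without verifying any of them.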
Hmm, in that case I would expect my download activity and/or my CPU activity to be high. However, I just went through a 4-hour query that slowed to a crawl towards the end, showing virtually no bandwidth or CPU activity.
Umm… If DA downloaded and scraped 36,000 pages in 4 hours and didn’t show a huge spike in bandwidth or CPU, I’d say that is pretty optimized. If your machine had come to a standstill or you couldn’t continue normal operations, then I’d say it’s not optimized.
(grin) It seems ‘optimized’ is open to more than one interpretation. Yes, no standstill and no hanging in 4 hours. But the majority of those 4 hours is spent with the progress bar sitting somewhere north of 90% while CPU and bandwidth show no significant activity. So I have no idea what is going on while waiting. Of course the progress bar is likely to be non-linear, but still, why that wait with seemingly no activity going on?