Deep scan: has been scanning since early morning

Hi!
I’m using a Mac Mini PPC G4 with 512 MB of RAM and a 4 Mb Internet connection.
This morning I started a deep scan on the keyword “wotan”.
Some 8 hours later, it had come across 83923 pages and 7755 documents, and the search was still in progress.
I had to stop it manually, as the computer was getting painfully slow and I was unable to do anything else.
Also, the search itself was very slow.
Is this normal behaviour? (DEVONagent is configured with its defaults.)
Best regards

I’ve had to quit normal “research” scans because they seem to take forever to finish (especially when the search is broad). Keep in mind that, IIRC, DA actually scans each page, not just the search engine’s index.

Performing a targeted deep search using other boolean operators such as NEAR with other relevant terms might be more effective.

You can also turn off “follow links” in the settings drawer as well as set additional preferences.

For example, I performed a deep scan using wotan NEAR mythology with follow links turned off and it took about 30 seconds on a MacBook Pro using wireless to locate 485 pages.

Also, from the DA help file:

[i]Case

All terms are case-insensitive. You may, if you wish, use capitalization for proper names in a query, but DEVONagent will ignore case in interpreting the query.

Precedence of Terms

Search terms and associated operators will be interpreted from left to right, except as modified by including portions of the query within parentheses.

Boolean Operators

The operators (often called Boolean operators) are words or symbols that establish logical rules for the terms in the search query. These are:

term1 AND term2: Contains term1 AND term2
term1 BUT term2: Contains term1 AND term2
term1 OR term2: Contains term1 OR term2
term1 XOR term2: Contains term1 or term2, but not both
term1 EOR term2: Contains term1 or term2, but not both
NOT term: Does not contain term
“term1”: Contains the string term1, in exactly this form
Beside the classic Boolean operators, DEVONagent features a number of operators that can usually only be found on high-end databases. Use them as a replacement for AND and “quotes” to fine-tune your query.

term1 NEAR term2: term1 occurs 10 words or less before or after term2
term1 NEAR/n term2: term1 occurs n or less words before or after term2
term1 BEFORE term2: term1 occurs before term2
term1 BEFORE/n term2: term1 occurs n or less words before term2
term1 AFTER term2: term1 occurs after term2
term1 AFTER/n term2: term1 occurs n or less words after term2
~term1: Contains all words that begin or end with term1 (words containing term1 as a part of the word; depends on the queried search engine)
Note: See chapter “Designing a Search Query” for examples on how to use all these operators effectively.

For convenience, some of these operators can also be abbreviated using commonly used symbols:

AND: &, &&, +
OR: |, ||
XOR: ^, ^^
NOT: !, -
Note: The symbols above are also used by the Finder and Tiger’s Spotlight for searches. Enter the vertical ruler character for the OR operator by pressing Shift-7.

Special Rules

To search for a word that is also the name of an operator, put the word inside quotation marks. The following example searches all four terms including the word “near”:

Example: Beach “near” Los Angeles

DEVONagent ignores parts of query terms inside square […] brackets. This is useful for scanning to titles or authors inside some databases, e.g., PubMed or Nucleotide.

Example: name[Author]word[Title]

Restrictions

Queries are restricted to ten search words due to the limitations of most Internet search engines. Only secondary queries are not restricted.[/i]
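To make the proximity operators quoted above a bit more concrete, here is a rough Python sketch of how I read NEAR/n and BEFORE/n. This is not DEVONagent's code: the function names, the word-splitting rule and the way distance is counted (difference in word positions) are all my own assumptions.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into words (queries ignore case)."""
    return re.findall(r"\w+", text.lower())


def positions(words, term):
    """Return every index at which `term` appears in the word list."""
    term = term.lower()
    return [i for i, word in enumerate(words) if word == term]


def near(text, term1, term2, n=10):
    """term1 NEAR/n term2: term1 occurs n or fewer words before or after term2.

    Assumption: distance is the difference between word positions.
    """
    words = tokenize(text)
    return any(
        abs(i - j) <= n
        for i in positions(words, term1)
        for j in positions(words, term2)
    )


def before(text, term1, term2, n=None):
    """term1 BEFORE term2, or BEFORE/n when a limit n is given."""
    words = tokenize(text)
    return any(
        i < j and (n is None or j - i <= n)
        for i in positions(words, term1)
        for j in positions(words, term2)
    )


sample = "Wotan is the chief god in Norse and Germanic mythology"
print(near(sample, "wotan", "mythology"))       # default window of 10 words
print(near(sample, "wotan", "mythology", n=3))  # much tighter window
print(before(sample, "wotan", "god"))
```

A tighter window (NEAR/3 instead of plain NEAR) rejects pages where the two words merely happen to share a paragraph, which is why these operators narrow a deep scan so much more than AND does.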

Yes, this is normal behavior. A “deep scan” for a simple word could potentially walk through a substantial portion of the Internet itself.

I’ve let scans run that downloaded 10 GB worth of web pages.

I find that the number of “hits” you get from a deep scan seems to grow logarithmically: the longer you let it run, the fewer new hits you’ll get back (probably because it’s finding more and more duplicates). So, set yourself a target number of pages to look through, and then stop it after that. Otherwise, I can envision cases where it wouldn’t stop for days.
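If you wanted to automate that rule of thumb, it might look something like the sketch below. This is plain Python over a list of URLs, not anything DEVONagent exposes; the function name `scan_with_budget` and the default numbers (target, window size, minimum share of new pages) are made up for illustration.

```python
def scan_with_budget(pages, target=500, window=100, min_new_ratio=0.05):
    """Consume page URLs until a stopping rule fires.

    Stops when `target` unique pages have been collected, or when fewer
    than `min_new_ratio` of the last `window` pages were new (that is,
    the scan is mostly turning up duplicates).
    Returns the set of unique pages and the reason for stopping.
    """
    seen = set()
    new_in_window = 0
    for count, url in enumerate(pages, start=1):
        if url not in seen:
            seen.add(url)
            new_in_window += 1
        if len(seen) >= target:
            return seen, "target reached"
        if count % window == 0:
            # End of a window: check how much of it was new material.
            if new_in_window / window < min_new_ratio:
                return seen, "diminishing returns"
            new_in_window = 0
    return seen, "input exhausted"


# 50 distinct pages followed by a long run of duplicates.
stream = [f"page-{i}" for i in range(50)] + ["page-0"] * 200
unique_pages, reason = scan_with_budget(stream)
print(len(unique_pages), reason)
```

The target gives you a hard ceiling, and the window check catches the case where the scan keeps running but has stopped finding anything new.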

John

Christian and I once joked about how many levels of a deep search might trigger a search of the entire Internet; the answer is only a few levels deeper than currently allowed, and it’s probably a very bad idea to try to do that with the current state of the art of computer power and connections to the Internet. :)

As jwiegley noted, DEVONagent is probably filtering (depending on your filter settings) previously archived pages, similar pages and junk pages as it goes through the search. Then it’s actually downloading the search results to your computer, and preparing digest summaries of each.

That set of 7755 search results perhaps already contains more information about Wotan than you may want or need to know, unless you are trying to find every page on which the term was used, for some reason. In the latter case, remember that the default setting for each plugin is, as I recall, 100 results per plugin, so you might want to archive your set of search results, turn on the filter to not capture already archived pages, turn off the filter to reject similar or junk pages, and repeat the search, perhaps several times. Given the relatively limited RAM on your Mac, be prepared to keep doing this for, perhaps, several days. Even then you may never get a complete capture of all the pages on the Internet that mention Wotan. And new pages are probably going to be added while (or right after) you are searching.

Google tells me that there are about 1,560,000 pages that contain the term “Wotan”. It took only 0.23 seconds to return that answer. But then I would have to start looking at each of those pages, choose those that look useful, and capture them to my computer should I want to use them.
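For a sense of scale, here is a back-of-the-envelope calculation in Python using only the numbers from this thread. It assumes the scan keeps the same average pace as in the first 8 hours and that it would have to visit every page Google counts; neither is likely to hold exactly, so treat the result as a rough order of magnitude.

```python
def estimate_scan_time(pages_scanned, hours_elapsed, pages_to_cover):
    """Extrapolate total scan time from an observed average rate.

    Assumption: the rate stays constant, which a real scan will not
    guarantee. Returns (pages per second, hours needed, days needed).
    """
    rate_per_second = pages_scanned / (hours_elapsed * 3600)
    hours_needed = pages_to_cover / rate_per_second / 3600
    return rate_per_second, hours_needed, hours_needed / 24


# 83923 pages in about 8 hours, against Google's estimate of 1,560,000 pages.
rate, hours, days = estimate_scan_time(83_923, 8, 1_560_000)
print(f"{rate:.2f} pages/s, about {hours:.0f} hours ({days:.1f} days)")
```

That works out to roughly three pages per second, and on the order of six days of continuous scanning to cover everything Google reports.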

Those 7755 documents DEVONagent has found are already filtered by DEVONagent, downloaded to your disk, and summarized. The Digest page provides a number of terms and visual cues that can speed up your analysis of pages that may be especially useful, and you can transfer them (and/or all of the summaries) to a DT database for further searching and analysis.

Yes, DEVONagent’s search for “Wotan” takes a lot more time than 0.23 seconds. But it’s doing a lot of ‘front-end’ work that you yourself would have to do if you looked at the Google results.

That’s why I like DEVONagent.

Thank you so much for the quick and clear answer. Being a new user of DA + DT, I did not know much about how DA works and its behaviour.
Following your advice, I refined my deep search using “wotan AND god”; this one ended with 685 relevant documents. Nice!
My system was quite slowed down and not so responsive, but well, my Mac Mini PPC G4 and its 512 MB of RAM are maybe a bit “light” to smoothly handle such amounts of information.
Best regards.