UI is confusing - I say crawl, but DA does not crawl?

#1

I must admit, after all those years of using DEVONagent Pro, I am STILL confused by the interface.

I managed to create a few search sets which somehow do perform as intended, although the process of getting there is not very logical to me.

Example one: Crawling a website.

I create a search set with the default search term, tell it to follow links on the same host (the site I want to crawl), set the depth to 5 (the maximum), and then go to Sites and add the website I want to crawl.
When I hit Start, DEVONagent searches the title page of the site, says “nothing found” and stops. I told it to click on all available links and crawl the deeper pages, but it did not.
What happened, and why?

Example two: Scanning a site repeatedly

As stated, I made a set to search a single page (the Twitter page of one account) and return a result whenever a new tweet matches the criteria. I was confused that, to achieve this, I need to enter a VERY generic primary search term that is always on the page (e.g. “Twitter”), and only in the filter/secondary search terms can I enter what I am actually looking for. Putting the filter terms in the first line doesn’t work for some reason, but I think it should, because, you know, that is what I want to search for.

I don’t know if I am using the program wrong, or if I am just too confused by the UI and the kinda weak tutorial videos. Am I right? Am I wrong? Please tell me… :cry:

#2

By default, not all links are followed, only those matching the Follow Links term (see the Search Sets window) or appearing near this term. To follow all links, you can specify the wildcard * to match every link.
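To illustrate why a crawl can stop at the start page, here is a minimal Python model of the rule described above. It is a hypothetical sketch, not DEVONagent's actual implementation: the in-memory `SITE` dictionary, the `crawl()` function and the shell-style pattern matching are assumptions made for illustration, and the "appearing near this term" part of the rule is not modelled.

```python
from collections import deque
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# Hypothetical in-memory website: page URL -> list of (link text, target URL).
SITE = {
    "https://example.com/": [("About", "https://example.com/about"),
                             ("News", "https://example.com/news"),
                             ("Partner", "https://other.example.org/")],
    "https://example.com/about": [("Team", "https://example.com/team")],
    "https://example.com/news": [("Archive", "https://example.com/archive")],
    "https://example.com/team": [],
    "https://example.com/archive": [],
}


def crawl(start, follow_term, max_depth):
    """Return pages visited breadth-first, following only same-host links
    whose text matches follow_term (shell-style pattern; "*" matches all)."""
    host = urlparse(start).netloc
    visited = [start]
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # depth limit reached: do not expand this page's links
        for text, target in SITE.get(url, []):
            if urlparse(target).netloc != host:
                continue  # models "follow links on same host" only
            if not fnmatchcase(text.lower(), follow_term.lower()):
                continue  # link text does not match the follow-links term
            if target not in visited:
                visited.append(target)
                queue.append((target, depth + 1))
    return visited


# A term that matches no link text: only the start page is searched.
print(crawl("https://example.com/", "game*", 5))  # → ['https://example.com/']
# The wildcard: every same-host page within the depth limit is searched.
print(crawl("https://example.com/", "*", 5))
```

In this model, a follow-links term that matches nothing reproduces the "nothing found" stop from the first post even with maximum depth, while `*` visits every same-host page (the off-site "Partner" link is still skipped).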

The primary search term should actually be sufficient in this case. What’s the URL of the page and which primary/secondary terms did you use?


#3

I’m facing the same issue: DA does not follow links, especially when using the Baidu plugin in China. Every search returns empty results. It looks like DA doesn’t manage to parse the links on the page it visits:

And using the wildcard doesn’t help either:

The first weird thing is that if I use a VPN connected to a US proxy, it works normally.

But I don’t want to have to connect to a VPN every time I use DA.

The second weird thing is that the first time I launch DA, the Baidu plugin works normally, but after that it’s broken.

This problem has persisted for a long time. There must be a bug in the app.

#4

And if you open the link shown in the log, does the page contain any results or e.g. a captcha?

#5

It opens a page full of usable links, but DA won’t pick them up. Check the screenshot:

So we can be sure this is a bug, can’t we?

EDIT1: Here is some more information.

I’m using DA 3.11.1 downloaded directly from your website; the Baidu plugin is unmodified, everything is clean.
macOS 10.14.3 on a 13-inch Early 2015 MBP.
My IP is currently in China.
When I’m not using any proxy, the problem appears.
When I’m using a SOCKS proxy in the USA, the problem disappears and the Baidu plugin works normally.
Whether I use a proxy or not, the first link shown in the log is exactly the same, and clicking it displays the same web page with exactly the same content.

The Bing plugin works fine. Bing is another search engine that is currently not banned in China.

EDIT2: I have another VPN located in China. When I activate that proxy, the problem remains, and the link in the log is the same one, clickable and showing the same content.
So one can conclude that this bug is not related to whether a proxy is used but to where your IP address is located: if your IP is in China, something hidden prevents DA from working.

Please check internally what’s going wrong! I hope this information helps.

#6

The links are probably different while you’re in China and not using a proxy and the plugin doesn’t support these links yet. Could you please post the URL of such a result link?

#7

As I already said, the links are exactly the same whether I use a proxy or not.

Without proxy:
baidu.com/s?wd=hello&pn=0&oq … tf-8&usm=1
With proxy at USA:
baidu.com/s?wd=hello&pn=0&oq … tf-8&usm=1

#8

Then something else must be different; it’s a static plist plugin and doesn’t handle proxies on its own. Could you please save the HTML source of the results page and send it to cgrunenberg - at - devon-technologies.com? Thanks!

#9

I’ve sent the samples, thanks!

#10

Okay, this seems to work. So I just put a * in the primary search and let DA crawl the site for the secondary term!? Is there any other way to tell DA to crawl everything? I thought the “follow links on same host” setting with maximum depth was meant to do just that.

I am attaching screenshots. The goal was to monitor a few Twitter feeds related to new game announcements, and in order to prevent multiple hits I checked the “new pages only” option.

Here are the URLs

#11

By default, only links near the primary search terms are followed. But you can specify your own term for following links, e.g. * to follow everything.