User Agent ID of DTPro

I’m trying to use DTPro to capture some web pages, but on one site (i.e., Netlibrary.com, which I get through my university) the page won’t view in DTPro because it is not a “supported browser.” That made me wonder how DTPro (using WebKit) is identifying itself. It seems like it should be identifying as “Safari” or however Safari IDs itself. That way if it displays in Safari, it will display in DTPro.

Doug

DT Pro’s WebKit is a subset of Safari. DEVONagent is, also, but with more browser capability than DT Pro’s.

For some sites, you may need to use Safari, as it can handle Java objects, etc. that DT Pro’s browser cannot. (If you have DEVONagent, try it on your university’s site.)

That won’t help Bill. Lemme explain…

In DEVONagent the User-Agent is:
Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit (KHTML, like Gecko) DEVONtech

Safari’s default User-Agent is:
Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/412 (KHTML, like Gecko) Safari/412

In Safari there is a Debug menu, which can be activated through defaults, that offers other User-Agent choices. They are:
Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.1) Gecko/20020826
Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.0.1) Gecko/20020823 Netscape/7.0
Mozilla/5.0 (Macintosh; U; PPC; en-US; rv:0.9.4.1) Gecko/20020318 Netscape6/6.2.2
Mozilla/4.79 (Macintosh; U; PPC)
Mozilla/4.0 (compatible; MSIE 5.22; Mac_PowerPC)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2)
Mozilla/5.0 (compatible; Konqueror/3)

Some web servers discriminate between browsers based on how their website was designed. A site that only supports Windows customers, for example, does not want tech support calls from users running software it doesn’t support. The way around these “filters” is to spoof your browser’s identity. Since Safari et al. use W3C standards, the sites will often work regardless; in other cases YMMV.

Both DT[Pro] and DA need settings to spoof the User Agent to mine some websites. Maybe some default options like the debug menu in Safari – but preferably a text field to be able to define your own user agent.
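
For anyone who wants to see what spoofing amounts to at the HTTP level, here is a minimal sketch in Python (not something DT Pro or DA can do themselves). It only builds a request carrying Safari’s User-Agent string quoted above; the helper name `build_request` and the example URL are made up for illustration.

```python
import urllib.request

# Safari's default User-Agent string, as quoted earlier in this thread.
SAFARI_UA = ("Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) "
             "AppleWebKit/412 (KHTML, like Gecko) Safari/412")

def build_request(url, user_agent=SAFARI_UA):
    """Return a Request that presents user_agent instead of Python's default."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

req = build_request("http://example.com/")
# urllib normalizes the header name to "User-agent" internally.
print(req.get_header("User-agent"))
# Passing req to urllib.request.urlopen() would actually send it;
# that call is left out here so the sketch runs without network access.
```

A site that filters on the User-Agent header alone cannot tell this request from one made by Safari; a site that needs Java or other features the client lacks will still fail.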


Good suggestion. I’ll relay your idea to Christian when he gets back from a well-deserved vacation.

But unless the DT Pro browser gets more beef on things like Java, spoofing won’t help if the target site requires features it doesn’t have.

The DA browser is currently more capable, and spoofing might get it accepted.

Note that the university site in question is happy with Safari. They might be persuaded to add DT Pro’s browser if it will work otherwise on their site.

This is an old, old post, but it is worth revisiting six years later…because netlibrary and a few other sites are blocking the useragent string of Devonagent again. Now that the Javascript functionality is up to par and capable of responding to fancy websites, it would be nice to be able to change the useragent string so that websites can’t simply block Devonagent out of the easy bigotry that comes from seeing that it does “deep scans.” It’s far more appropriate, IMHO, to block abusers by IP address, but trying to reason with netlibrary about this is tiring. Is there a way to change the useragent for devonthink/devonagent, perhaps by a “defaults write” command in the terminal?

I know it’s a specialized thing, but such is the lot of the “information worker.” There’s a lot of silliness in the wild, wild west of the internet, and with some sheriffs it is better just not to wear black in their town, as they shoot on sight. Any way to borrow a white hat?

cheers,

Eric O

Okay, I dug a bit further on this, and now I’m not sure that it is the user agent. Webkit nightlies do work fine for signing in to my university system. They have the user agent string:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_7) AppleWebKit/534.36+ (KHTML, like Gecko) Version/5.0.5 Safari/533.21.1

Devonagent sends out a different one (notice the U; and the en;) :

Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en) AppleWebKit/533.21.1 (KHTML, like Gecko) Version 4.0.5 Safari/533.21.1

Obviously, one could detect this. And I suspect they are…because whenever I go to my signin page with devonagent it redirects me to a resource forbidden page. Works fine with safari. Or is this some sort of quirk in the way redirects are handled?

-erico
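
To make the differences between the two strings above easier to spot, here is a rough Python sketch that splits each one into tokens and lists what appears in only one of them. The function names `ua_tokens` and `only_in` are invented for this example, and the tokenizing is a simplification, not a full User-Agent parser.

```python
import re

WEBKIT_NIGHTLY = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_7) "
                  "AppleWebKit/534.36+ (KHTML, like Gecko) "
                  "Version/5.0.5 Safari/533.21.1")
DEVONAGENT = ("Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en) "
              "AppleWebKit/533.21.1 (KHTML, like Gecko) "
              "Version 4.0.5 Safari/533.21.1")

def ua_tokens(ua):
    """Split a User-Agent string into product tokens and comment fields."""
    tokens = []
    # Either a parenthesised comment or a run of non-space characters.
    for comment, product in re.findall(r"\(([^)]*)\)|(\S+)", ua):
        if product:
            tokens.append(product)
        else:
            # Comment fields are separated by semicolons.
            tokens.extend(part.strip() for part in comment.split(";"))
    return tokens

def only_in(a, b):
    """Tokens of a that do not occur in b, in their original order."""
    other = ua_tokens(b)
    return [t for t in ua_tokens(a) if t not in other]

print(only_in(DEVONAGENT, WEBKIT_NIGHTLY))
print(only_in(WEBKIT_NIGHTLY, DEVONAGENT))
```

Besides the `U` and `en` fields, this shows that DEVONagent’s string has `Version 4.0.5` with a space where the nightly has `Version/5.0.5` with a slash, so the version splits into two separate tokens.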

You could try to delete ~/Library/Caches/DEVONagent/_crawl-patterns (while DEVONagent isn’t running), does this fix the problem?
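
If you would rather keep a copy than delete outright, a small Python sketch like the following renames the item instead. The path is the one given above; the `.bak` suffix and the function name `reset_crawl_patterns` are just choices made for this example, and it assumes nothing about whether `_crawl-patterns` is a file or a folder. Quit DEVONagent first.

```python
from pathlib import Path
import shutil

def reset_crawl_patterns(cache_dir=Path.home() / "Library/Caches/DEVONagent"):
    """Rename _crawl-patterns to _crawl-patterns.bak.

    Returns the backup path, or None if there was nothing to move.
    """
    target = Path(cache_dir) / "_crawl-patterns"
    if not target.exists():
        return None
    backup = target.with_name(target.name + ".bak")
    # Clear out an older backup so the rename cannot collide with it.
    if backup.is_dir():
        shutil.rmtree(backup)
    elif backup.exists():
        backup.unlink()
    target.rename(backup)
    return backup

# Usage (run only while DEVONagent is not running):
#     reset_crawl_patterns()
```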

Yep, that fixed it. Thank you!

I have no idea why that works though. It’s not a cookie or a useragent, but it was being detected somehow…

Does that file contain a list of domains that should/should not be respected for robots.txt?

best,

Erico

No, it contains patterns of HTTP redirects/forwards. If possible, DEVONagent tries to skip stuff like click handlers etc.

The latest version of DT Pro Office (2.5.2) sends an incorrect user agent string, which causes some sites to refuse to render themselves, and the trick mentioned above does not help. Here are steps to reproduce the issue:

  1. Add a new bookmark in DTPO with a URL of trello.com and note that the site complains that the browser is unsupported. Click the “Your browser is unsupported” link and note that the site requires Safari 5.0.5 and above.

  2. Add a new bookmark in DTPO with a URL of whatsmyuseragent.com

    • It shows that the User Agent String that DTPO is sending is:
      Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en) AppleWebKit/536.28.10 (KHTML, like Gecko) Version 4.0.5 Safari/536.28.10

  3. Open Safari and navigate to whatsmyuseragent.com

    • It shows that the User Agent String that Safari is sending is:
      Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.28.10 (KHTML, like Gecko) Version/6.0.3 Safari/536.28.10

I’ve searched extensively to see whether there’s a bug in WebKit that might be causing this issue but thus far haven’t found anything.

Thanks,
Brian
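
As an illustration of why a string like the one in step 2 could trip a version check, here is a hypothetical Python sketch of the kind of test a site might run. How Trello actually detects browsers is not known; `safari_version` and `is_supported` are invented names, and the 5.0.5 minimum is taken from the “unsupported” page described above.

```python
import re

DTPO_UA = ("Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en) "
           "AppleWebKit/536.28.10 (KHTML, like Gecko) "
           "Version 4.0.5 Safari/536.28.10")
SAFARI_UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) "
             "AppleWebKit/536.28.10 (KHTML, like Gecko) "
             "Version/6.0.3 Safari/536.28.10")

def safari_version(ua):
    """Return the Version/x.y.z token as a tuple of ints, or None if absent."""
    match = re.search(r"Version/(\d+(?:\.\d+)*)", ua)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))

def is_supported(ua, minimum=(5, 0, 5)):
    """Hypothetical site-side check: require Safari >= minimum."""
    version = safari_version(ua)
    return version is not None and version >= minimum

print(safari_version(DTPO_UA), is_supported(DTPO_UA))
print(safari_version(SAFARI_UA), is_supported(SAFARI_UA))
```

Under this assumed check, DTPO fails twice over: `Version 4.0.5` has a space instead of a slash, so no version is found at all, and 4.0.5 would be below the minimum even if it were parsed.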

Interesting. Earlier today I read about some other web client misidentifying itself with “Version 4.0.5” in its User Agent string, but can’t remember/find that reference now (not that it matters).

The UserAgent string that DTPO sends is hardcoded into the executable, /Applications/DEVONthink Pro Office/DEVONthink Pro.app/Contents/MacOS/DEVONthink Pro in several places, no doubt due to static linking of some frameworks when the application is built, but the version string that matters is at offset 0x01a1eeb. Patching that location to be “6.0.3” instead of “4.0.5” caused the UserAgent string sent by DTPO to change accordingly, but www.trello.com still says the browser is too old. The number of bytes for the UserAgent string just happened to be exactly the same length as what is sent by Safari, so I changed them to be identical, and trello.com still complains, even though whatsmyuseragent.com shows them as being identical. Very strange.
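
For anyone curious how a same-length patch like this can be done, here is a rough Python sketch that works on bytes in memory. It assumes the version is stored as plain ASCII; `find_all` and `patch_at` are names made up for this example, and the sample data is a stand-in, not the real executable. On a real binary you would work on a copy, and changing the file may invalidate the application’s code signature.

```python
def find_all(data: bytes, needle: bytes):
    """Return every offset at which needle occurs in data."""
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets

def patch_at(data: bytes, offset: int, old: bytes, new: bytes) -> bytes:
    """Replace old with new at offset, refusing anything that shifts bytes."""
    if len(old) != len(new):
        raise ValueError("replacement must be the same length as the original")
    if data[offset:offset + len(old)] != old:
        raise ValueError("bytes at offset do not match the expected original")
    return data[:offset] + new + data[offset + len(old):]

# Stand-in for an executable that embeds the string more than once.
sample = b"\x00Version 4.0.5 Safari\x00pad\x00Version 4.0.5 Safari\x00"
hits = find_all(sample, b"4.0.5")
patched = patch_at(sample, hits[0], b"4.0.5", b"6.0.3")
print(hits, find_all(patched, b"6.0.3"))
```

Keeping the replacement the same length matters because every later offset in the file stays where it was.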

What’s even stranger is that hitting whatsmyuseragent.com with the latest version of Chrome lists AppleWebKit/537.31, which is a more recent version than Safari uses.

Brian