Hey all,
Looking at page 25 of the DevonThink Pro Office Manual, it describes a web archive as saving both the HTML code and all resources needed to display the page.
I was curious if there’s a way to tweak how WebArchive works, such as having it pull more information than just the base page referenced. Here’s my use case.
I’m studying Japanese Grammar, and am storing the grammar for each item I come across in a website. I always get a little worried that these sites can go away at any time, or that I may not have internet access. An example page is:
renshuu.org/grammar/287/%E3 … 6%E3%81%A1
When pulling a web archive, it seems to pull just the main page, but not the “example sentences”. Their example sentences work a little differently than other sites in the sense that when you click on the link, it does an Ajax call to the server to fetch the sentences. Clicking on “next” will pull the next set, etc.
My overall goal is to pull both the main page and all those links. I can do this programmatically, and will fall back on that if necessary, but I’m hoping for a non-programmatic solution too. So my main question is: is there a way to configure “Web Archive” to capture additional content as I click around a web-archived page? That way I can pull the base page, and as I navigate, it pulls that information into the cache.
If you are thinking PDF may be better, it kind of would be, but it isn’t perfect. The problem with PDF in this case is that the clipper reloads the page before clipping, so it goes back to the base grammar point. I’m using Chrome for this test.
I haven’t tried DevonAgent yet; it may handle this better, but I’m unsure. As for a programmatic solution, I’m tempted to write an AppleScript that goes through each element here and spawns a Python script using Beautiful Soup, which would pull down the examples, pick them out, and write them to a file. The AppleScript would then pick up that file and import it into DevonThink. I haven’t written this yet, and it doesn’t sound too hard, but I wanted to ask here first before going down that road.
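
For reference, here is a rough sketch of what the Python half might look like. It is only an illustration under a few assumptions: I use the standard library's html.parser in place of Beautiful Soup so it runs without extra installs, the CSS class name "example-sentence" is a made-up placeholder (I haven't checked the site's real markup), and the URL passed to fetch_html would have to be the real Ajax endpoint, which I don't know yet. The AppleScript import step is not shown.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.request import urlopen

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta", "wbr"}


class SentenceExtractor(HTMLParser):
    """Collect the text of every element that carries a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.sentences = []
        self._depth = 0    # greater than 0 while inside a matching element
        self._buffer = []  # text pieces of the element being collected

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1  # nested tag inside a matching element
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            text = "".join(self._buffer).strip()
            if text:
                self.sentences.append(text)

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def fetch_html(url):
    """Download one page or Ajax fragment as text (assumes UTF-8)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def extract_sentences(html, css_class="example-sentence"):
    """Return the text of all elements with the (hypothetical) class name."""
    parser = SentenceExtractor(css_class)
    parser.feed(html)
    parser.close()
    return parser.sentences


def save_sentences(sentences, path):
    """Write one sentence per line so another script can pick the file up."""
    Path(path).write_text("\n".join(sentences) + "\n", encoding="utf-8")
```

The idea would be to call fetch_html once per "next" page, run extract_sentences on each result, and pass the combined list to save_sentences. Keeping the download separate from the parsing means the parsing can be tried on a saved copy of the page first.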