Web archiving problem

Hi,
I ran across a quirk using the Web Archive bookmarklet in Safari. It works most of the time, but not always. For example, after many repeated attempts, I couldn’t successfully archive this link using the bookmarklet.

analyticjournalism.blogharbor.co … 76006.html

Nothing happens inside DTPro (Office).

Now, if I try to archive via the script menu > “Add Web Archive to DevonThink”, I get a “Download Failed” message from Safari. When I check DTPro, I see that the URL was added to the root directory (which isn’t the one I’ve specified in my “import” preferences), but the URL needs an internet connection to work.

Are there certain pages that cause DTPro to snag?

Thanks
Francesco

I had no trouble sending the WebArchive of your link to DT Pro using the contextual menu option in DEVONagent. But I did not succeed in capturing that page as a WebArchive either with the bookmarklet or with the script (in Safari).

But why would you want to capture the article “Science and simulation for the greater good” as a WebArchive, anyway? That Web page is beautifully set up to allow a quick selection of the entire content of the article, with subsequent capture as a rich text note.

Captured as RTF: 14 KB, retaining all hyperlinks and (if there were any) images. Fully viewable whether online or offline.

Captured as WebArchive: 364 KB, with lots of extraneous, unrelated material that might show up in searches or See Also suggestions.

That’s why I do 99%+ of my Web captures as rich notes. Saves tons of memory and improves the focus of searches and AI features. I’ve got all of the “real” information content without lots of distracting excess baggage. The source URL is documented if I’ve used the Services option to capture as rich text in Safari, or a contextual menu option in the built-in DT Pro browser.

Bill,
Thanks for the quick response. I used Command-) to capture the selection as a rich text note and it worked fine, as you said; it even kept the source URL.

I guess I’m still puzzled as to why the bookmarklet or the script didn’t work (for the 1% of the time that you… or I, now… want to use the web archive feature).

Francesco

Due to a rare WebKit bug, the scripts/bookmarklets to create web archives don’t always work. Version 1.3.1 will fix this.

"But why would you want to capture the article “Science and simulation for the greater good” as a WebArchive, anyway? That Web page is beautifully set up to allow a quick selection of the entire content of the article, with subsequent capture as a rich text note. "

Respectfully, there is one very good reason why I, for one, want to capture pages as web archives instead of notes. When I am first doing my research, I am going relatively quickly through a ton of material. At the time I decide to capture a page, I am not entirely sure of what is or is not relevant on that particular page. Sometimes, when I go back later, I see things that only became apparent to me after doing further work (if this is not clear, I can explain further).

So, what I suggest for people in my position is to go ahead and web archive the pages that seem relevant and then later, during the organizing process, create notes out of pages with a lot of extraneous material, because clearly the auto-group/classify process works far better with irrelevant material trimmed away. I would class these as two different steps, something like Collection and Pre-Processing. Perhaps some people can get away with doing this in one step, but for me, as I said, I prefer to do my Collecting first and put off the Pre-Processing of the pages until I am more sure of what I am looking at.

I have said before on this forum that I also keep an archive of pages outside of DT because I have learned from hard experience that nothing is forever and I don’t want to be stuck in one program, especially not with about 8 GB of archived web pages and other documents. Maybe this will change when DT stores data in “normal” data structures.
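
For anyone wondering what I mean by an archive outside of DT: nothing fancy. Below is a minimal sketch of my own habit (not a DEVONthink feature) in Python, which just fetches a page and saves the raw HTML under a timestamped, URL-derived filename so the copy stays readable no matter what software I happen to be using years from now. The folder name and the example URL are placeholders.

#!/usr/bin/env python3
# Minimal sketch of a program-independent page archive (not a DEVONthink feature):
# fetch a URL and save the raw HTML under a timestamped, URL-derived filename.
import re
import urllib.request
from datetime import datetime
from pathlib import Path

ARCHIVE_DIR = Path.home() / "WebArchiveBackup"  # placeholder folder name

def archive_page(url: str) -> Path:
    """Download url and write its HTML to a plain file that needs no particular application to read."""
    ARCHIVE_DIR.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response:
        html = response.read()
    # Build a filesystem-safe name from the URL plus a capture timestamp.
    safe_name = re.sub(r"[^A-Za-z0-9._-]+", "_", url.split("//", 1)[-1])[:80]
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = ARCHIVE_DIR / f"{stamp}_{safe_name}.html"
    target.write_bytes(html)
    return target

if __name__ == "__main__":
    # Placeholder URL; substitute whatever page is being captured.
    print(archive_page("http://example.com/"))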