question re bringing in pages from DA

rmathes23 · March 7, 2005, 1:01pm

If I have a Web page open in the DEVONagent browser and I click the drop-down arrow and select DT/Add HTML page, does that give me the same result as dragging the URL to DT and then clicking the ‘capture page’ button? I want to make sure that if the host changes the content or breaks the link, I still have it.

Bill_DeVille · March 7, 2005, 3:22pm

Yes. Use Data > Add to DEVONthink > HTML Page if you want to save the source code of the Web page, which will be ‘frozen’ as of the time you save it to DT.

Remember, however, that if the host changes content or breaks the link, you won’t be able to see images when you view your captured HTML page in DT. If there are important graphics on the page, you may wish to use the option of selecting desired text and images and choosing Data > Add to DEVONthink > Selection. That option results in saving the selected portion of the page as an RTF(D) file, which contains any images selected.

Both options, HTML Page and Selection, will ‘freeze’ the information you wish to keep, even if the host subsequently changes content or closes its site.

Of course, DEVONthink retains the URL of the captured information, so you can check the current status of the Web page at any time.

DEVONagent also allows you to capture the URL to DT as a ‘bookmark’ that allows you to conveniently return to the Web page at any time. You will be able to see the current status and content of the site, even if it changes frequently. If that’s what you wish to do, choose Data > Add to DEVONthink > Link.

Example: I have a bookmark to the Science Magazine Web site, which adds a new issue each week. Clicking on that bookmark lets me open the current site. If I see an interesting article that I wish to add to my DT database, I would (within DEVONagent) send either the HTML Page or Selection to DEVONthink.

quentinsf · March 9, 2005, 9:55am

If you import a file, the path of the file is captured, and doing a ‘synchronize’ updates it with the latest version.

Things would be more consistent, I think, if captured web pages did the same, though you’d probably want an Undo in case the latest version is a ‘404 Page not found’ error.

quentinsf · March 9, 2005, 10:07am

Have just seen that the contextual menu on a captured page has an ‘Update captured page’ option, which does this, so the functionality is there.

I suppose the distinction is that ‘synchronizing’ only happens if the remote file has been updated, whereas ‘update’ happens regardless.

rmathes23 · March 9, 2005, 12:52pm

I don’t want Web pages auto-updated unless I specifically request it for a given page. That would be a problem of such magnitude for me that I wouldn’t use synchronize if that were the case.

Once I capture a Web page, I’ve got it, and that’s where I want it to stay. I’ve already got something like 500 captured pages; I can’t imagine having to go through each one to make sure that changes to the original pages didn’t blow away the key reason I imported it in the first place.

Bill_DeVille · March 9, 2005, 9:53pm

I agree, as I have thousands of items captured from Web pages. Only in a few instances might I want to update such captured information. I use bookmarks (URL links) to access frequently changing sites, as in my example of Science Magazine’s bookmark in an earlier post in this thread.

My normal capture mode is to select the text and images that I want to place in my db. That way, I can view the images even when offline. The exception to that rule is pages that contain tables, which don’t render well in RTFD captures; for those I’ll capture the page as HTML (or perhaps as an Acrobat Web page capture).