very basic question

rmathes23 · December 14, 2004, 9:59pm

Hi guys, new user here. I’m testing trial versions of DT and DA, and if the few hours I’ve used them this morning are any indication, I’ll be purchasing both this week.

One question I have is this, and I’m sure it’s very basic but in searching for an answer I’ve read conflicting things:

When I “capture” a webpage in DT, is the view of that page frozen, or does it depend on the source of the page maintaining it on their servers?

I tested this by capturing the lead page from cnn.com and giving it a name like ‘cnn - 11:00am’. Then I waited until the lead page changed pictures and did another capture. The two results are different: each shows the state of that page at the time of the capture. That’s what I was hoping to see, but I’m making sure it means what I hope it means.

One of the main selling points for me in getting these apps is that I’m constantly finding stuff online I want to archive for later use, and I have never found a particularly good tool to do it. If I bookmark a link, it sometimes goes dead later and the information becomes inaccessible. I want to make sure that once I capture a page, I’ve got it no matter what happens to the servers where I got it.

I’m a small business owner who also enjoys researching and writing. I’ve already got a stable of very effective programs I thoroughly enjoy using (Notebook, NovaMind and Ulysses along with the new beta of OmniWeb) and I think DA/DT will fill some gaps in that suite quite nicely.

subscriber3 · December 14, 2004, 10:26pm

When you capture a web page, it captures the page itself (the HTML), not the images, external style sheets, and other things the page refers to.

This may be hard to tell, since they may be in your cache and may be maintained by the site for quite some time. If you shut down your internet connection and clear your cache, you will see what is missing.
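
As a rough way to see in advance which pieces a capture depends on, here is a minimal Python sketch that lists the images, scripts, and style sheets an HTML file refers to. It is not part of DEVONthink; the class name, the function name, and the example.com URLs are made up for illustration, and it only checks img/script src attributes and stylesheet link tags.

```python
from html.parser import HTMLParser


class ExternalResourceFinder(HTMLParser):
    """Collects (tag, url) pairs for resources an HTML page loads separately."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            # Images and scripts are fetched from their src URL.
            self.resources.append((tag, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            # Only count <link> tags that point at a style sheet.
            if "stylesheet" in (attrs.get("rel") or "").lower():
                self.resources.append((tag, attrs["href"]))


def find_external_resources(html_text):
    """Return the external resources referenced by html_text, in document order."""
    finder = ExternalResourceFinder()
    finder.feed(html_text)
    return finder.resources


# Hypothetical captured page: the text is in the HTML, the rest is not.
sample = """<html><head>
<link rel="stylesheet" href="http://example.com/site.css">
</head><body>
<p>Headline text stays in the capture.</p>
<img src="http://example.com/lead.jpg">
</body></html>"""

for tag, url in find_external_resources(sample):
    print(tag, url)
```

Anything this prints is something that would go missing offline unless it is archived along with the HTML.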

To avoid losing things if the site shuts down or moves the files, you have three options:

1. Select the part you want and take a rich note (RTF). This will give you the most accurate searching and classifying, since unnecessary material that may be shared with other pages taken from the same site is eliminated.
2. Use the PDF Service script “Save to DEVONthink” to save as a PDF file. Although this saves a “picture” of the web page, DEVONthink will also save a word list, allowing search and classification.
3. Use the “Create Offline Archive” script to store the additional files in your database.

These two scripts will be in DEVONthink Pro. I am sure the PDF Service is in DEVONthink PE, but I am not sure about the offline archive script.

rmathes23 · December 15, 2004, 12:57pm

Thanks for the quick reply. I copied your post for future reference and captured it as RTF in DT!

Here’s the thing I’m still trying to get my arms around: I’ve got these two cnn links, right? All that’s different is the time they were captured; the URL is identical for each (cnn.com). When I clear the cache and disable my AirPort connection I do indeed see what’s missing, but the text of the respective pages remains intact, reflecting the state of the page at the time I captured it.

My fear was that with a link like cnn.com, even though the page was captured, when I subsequently redisplayed it in DT I would get the current state of the page, not its state at the time of the capture. That is obviously not the case (which is a very good thing).

So I guess I’m trying to determine exactly what a ‘capture page’ command does. It’s obviously not behaving like a bookmark, saving the URL and redisplaying it on command. Is it like the copy to RTF command except it’s HTML instead of RTF?

system · December 15, 2004, 2:48pm

Exactly.

Best,

Eric.

subscriber3 · December 15, 2004, 3:04pm

You can always use
File > Export > Files and Folders…
to obtain a copy of what it saved in the database.
This will not remove the entry from your database.

If you do this with a captured page, you will get an HTML file and a DEVONtech_storage file. Both can be opened with Safari, among other apps, and should answer your questions.

The HTML file is the captured HTML Eric referred to, and the DEVONtech_storage file holds other information such as the URL and the entry name.

The DEVONtech_storage file is used by
File > Import > Files and Folders…
to build the entries.

For example, if you have a friend who is using DEVONthink, you can Export entries and, when they Import them, they will have the complete entry that you had, not just the data.

sjk · December 15, 2004, 11:54pm

And how does Update Captured Page (on the contextual menu for HTML documents) fit into the picture? In the past I’m pretty sure Christian said it’s not working as intended, and I’m not sure if anything’s different now. I can’t figure out what use it has (if any); it just seems to work like Touch, updating the document’s Modified time.

subscriber3 · December 16, 2004, 12:32am

I normally work in Notepad view.

If I select a captured HTML page, it is shown in the window.

If I use the context menu to “Capture Page” I get a new entry, a duplicate of the page.
If I use the context menu to “Update Captured Page” I overwrite the existing entry with the duplicate information. I think it might have been intended that it would get a fresh copy from the URL instead.

However, don’t forget that you can follow links in the DEVONthink window. When you do that, the information is no longer a duplicate. (In many cases you can follow a link away, and then a link back, and get a fresh copy.) Now the context menu gives you the choice of “Capture Page”, creating a new entry with the new information, or “Update Captured Page”, overwriting the existing entry with new information.

I have found occasional use for this command. If you want a fresh copy all the time, you would capture a link to the page, so I think this is really for pages you only want to update from time to time.

sjk · December 16, 2004, 3:09am

Notepad view was renamed to Vertical Split, at least in DT Pro. I use that and List view most often.

The page shown in the window is WebKit-rendered, yep.

With Capture Page I get what appears to be a duplicate, but the original and new names aren’t displayed in blue text to indicate a duplicate (any idea why?). [edit: got blue-text dups when I tried again later, hmm]. If I use Duplicate [command-D] instead then it’s a genuine duplicate, with “copy” appended to the name and both it and the original in blue text.

The Update Captured Page part is where I lose you. Did you run that on the original page? And which duplicate information overwrites the existing entry? [edit: never mind]

That’s been my understanding too: it was meant to get a fresh copy from the URL.

Follow links, as in click a link on the current page? [edit: never mind]

Ahh, I finally see how that works, thanks! I don’t do much browsing in DT so it never occurred to me to try it that way.

And I noticed using Update Captured Page preserves the item’s original name (confusing) while Capture Page uses whatever the new item’s page title is (clear).

What I’m looking for is the optimal way to update local HTML copies of pages with the current version from the original site. Known dynamic pages like ones from this forum, for instance. I don’t want links to them 1) so they can be viewed offline and 2) so I’ll still retain a copy if the original site/content goes away (which happened with the YaBB DEVONthink forum). With those last bits of info you’ve provided I think I can manage to get that working… will do a test with this page (which I just captured) after posting this.

To summarize, Update Captured Page updates the local HTML content if the page being viewed has changed (e.g. by clicking a link on it) from what was originally captured. It doesn’t “recapture” the page from the original site, as one might anticipate (which I’m attempting to achieve, without wiping out a good original with a bogus refresh). Capture Page grabs the content currently being viewed and Duplicate clones the original content. Correct?

Whew. That gives me enough to play with. Thanks again, douglas!

subscriber3 · December 16, 2004, 12:00pm

Well, a duplicate is a duplicate. Some web pages seem to have “invisible” information, like a date stamp. So if you capture them twice in the same day, they are duplicates, but if you capture them on two different days they are not.

I don’t know what all the possible causes are. If you are curious, you can export the HTML and run FileMerge (if you have the Apple Developer tools) or maybe check VersionTracker for something like Diff’npatch (if you don’t).
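
If neither tool is handy, a few lines of Python can do the same kind of comparison on two exported HTML files. This is only a sketch using the standard difflib module; the function name, the file names, and the sample pages with their “generated” comments are invented to stand in for a hidden date stamp.

```python
import difflib


def diff_captures(old_html, new_html,
                  old_name="capture-1.html", new_name="capture-2.html"):
    """Return unified-diff lines between two captures of the same page."""
    return list(difflib.unified_diff(
        old_html.splitlines(),
        new_html.splitlines(),
        fromfile=old_name,
        tofile=new_name,
        lineterm="",
    ))


# Hypothetical captures taken on two different days; only a hidden
# comment differs, so the pages look identical when displayed.
first = "<html>\n<!-- generated 2004-12-15 -->\n<p>Same story</p>\n</html>"
second = "<html>\n<!-- generated 2004-12-16 -->\n<p>Same story</p>\n</html>"

for line in diff_captures(first, second):
    print(line)
```

Lines starting with a single “-” or “+” are the ones that differ; an empty result means the two captures are byte-for-byte the same text.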

Knight_of_Nee · January 2, 2005, 8:17pm

I have a feeling I’m missing something, but I do not see a “Create Offline Archive” script in my DT scripts folder. Also, my “Save to DEVONthink” script does nothing from within Safari. What is the appropriate usage for the scripts that are provided with DT PE? Right now the way I import web pages is to create an active link in DT and then use “Capture Page”, but if I have read this thread correctly, this will not archive the images with the page. That would be a big problem. Thanks.

eboehnisch · January 3, 2005, 9:12am

The “Create Offline Archive” script comes only with DEVONthink Pro. Sometimes, long-time beta testers forget about the “real world”.

The “Save to DEVONthink” script is a PDF service script that allows you to “print” a web page as a PDF file directly into DEVONthink. Copy the script to the folder “~/Library/PDF Services/” (create it if it doesn’t exist), then open your web page in Safari and choose “File > Print…”. There you’ll find a PDF button at the bottom of the print dialogue which pops up a menu that contains a “Save to DEVONthink” item.
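
For anyone who prefers to script the copy step, here is a small Python sketch of it. The function name is made up, and the script’s file name and location on your disk are assumptions, so adjust the path to wherever your copy of the script actually is. The demo at the bottom works in a temporary folder so it does not touch your real Library.

```python
import shutil
import tempfile
from pathlib import Path


def install_pdf_service(script_path, services_dir=None):
    """Copy a script into the PDF Services folder, creating the folder if needed.

    By default the target is ~/Library/PDF Services, the folder named above.
    Returns the path of the copied file.
    """
    if services_dir is None:
        services_dir = Path.home() / "Library" / "PDF Services"
    services_dir = Path(services_dir)
    services_dir.mkdir(parents=True, exist_ok=True)  # create if it doesn't exist
    return Path(shutil.copy2(script_path, services_dir))


if __name__ == "__main__":
    # Dry run in a throwaway directory with a placeholder file standing in
    # for the real script (file name assumed for illustration).
    with tempfile.TemporaryDirectory() as tmp:
        placeholder = Path(tmp) / "Save to DEVONthink"
        placeholder.write_text("placeholder")
        target = Path(tmp) / "Library" / "PDF Services"
        print(install_pdf_service(placeholder, target))
```

To install for real, call install_pdf_service with the path to the script and leave out the second argument.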

Best,

Eric.