Browsing and capturing

Tribulatio · January 1, 2006, 4:58pm

I am a new user of DEVONthink Pro, and this will probably be reflected in my question!

I have actually two questions:

1. If I want to open a webpage in DT Pro, it is easy to drag it from my browser into DT and thus open it. But suppose my browser is not open and I just happen to have a URL in a document that I want to open straight in DT Pro: what should I do? Of course, I can first open the browser and then drag and drop, but I have not managed to find a short way to open a URL straight in DT Pro. If I create a new document and add the URL in the URL column, nothing happens. Is it possible to do it in just one click, or is it better to enter the URL in the browser first and then drag and drop?
2. Regarding the capture function: it seems the toolbar only allows for “Capture HTML”, while the contextual menu allows for both “Capture HTML” and “Capture Webarchive”. What do you advise, and what is the difference? Is a webarchive the most stable copy of a webpage, something similar to PDF? I capture webpages for use in academic work, so the copy definitely needs to be very stable and to look exactly as the webpage used to (including pictures, etc.), even if I open it five years later. Could you explain the comparative advantages and uses of the two capture modes? If they are very different, could DT consider offering both modes as buttons in the toolbar?

I have not found a clear explanation in the tutorial (or maybe I didn’t see it) or in the forum - hence this question! Thank you for clarifying!

sjk · January 2, 2006, 12:41am

Welcome to the forum and DEVONthink Pro.

Re: #1

One way to do it is to copy the URL to the clipboard, then either switch to DTP and run Data>New>With Clipboard… [command-N shortcut] or use New With Clipboard from the Dock menu (which allows you to select the destination group). That’ll work for most other types of items in the clipboard, not just URLs.

Another possibility with apps that support Services is to select the URL (or other text), then run DEVONthink Pro>Take Plain Note [command-( shortcut] from the Services menu.

Still another way is to select the URL (etc.), then click-hold-drag it to DTP or its Dock icon. Having the floating Groups window open (Tools>Show Groups [control-command-G shortcut]) can be a useful target for that.

Depending on which DTP view you’re in, that last method will open the item in a DTP window. With the other methods you’d have to select the item in DTP yourself since it won’t be selected automatically.

You may want to open the Information window (Tools>Show Info… [shift-command-I shortcut]) for items you add, to check what metadata gets added by the different methods. The item name can also differ depending on which method you use.

There are probably other ways to do it but those should be enough to keep you busy for now.

Re: #2

Yesterday I saw several posts discussing that topic while catching up with forum browsing. You could try searching, but getting good results can be tricky since search support on this forum is pretty weak (e.g. no phrase searching). I probably saved a couple of the threads in a DTP database. Time permitting, I’ll check for them and post links to any relevant info I find. I’d rather link to previous posts/threads than add new content with redundant explanations, although I suppose my long-winded response to #1 is an exception.

Tribulatio · January 2, 2006, 10:08am

Thank you very much, sjk!

I had not realized how I could do it with Data>New>With Clipboard… Wonderful indeed, it meets entirely my needs and expectations!

Regarding Capture Webarchive and Capture HTML: after posting, I discovered some discussions in another section of the forum. If I understand correctly (and please correct me if I don’t!):

a) Capture HTML will capture the code, but linked pictures and other items may change, so the saved webpage might look quite different in a few months from what it is now. If I need not just the text (in which case Save as Rich Text would do as well) but the page with its full appearance, Capture HTML seems NOT to be the right strategy.

b) Capture Webarchive would preserve the content entirely, as it is, including linked pictures contained in the page, even if the original page completely disappears, but it can be read only on Mac, not on Windows.

Is this a correct understanding?

My other question remains: is it possible to have a Capture Webarchive button in the toolbar instead of Capture HTML? My own strategy would be to capture Rich Text, PDF or Webarchive, but rarely HTML.

Anyway, thank you for your kind help! DT Pro is an amazing tool, and it seems to be very stable - it inspires confidence, not making a user afraid of losing one’s data.

cgrunenberg · January 2, 2006, 12:45pm

It’s not yet possible to add this to the toolbar but v1.1 will support this.

Another solution is to use DEVONagent to store web pages as PDFs in DEVONthink (e.g. by using the action menu in the upper right corner of browser windows - that way you can store pages as one huge page). Or print pages to DEVONthink using the “Save to DEVONthink Pro” script and the print panel of any browser (but that will generate a paginated PDF).

The PDFs will be searchable and work on any platform.

Tribulatio · January 2, 2006, 12:52pm

Thank you very much for this clarification!

I assume that your reply means that my understanding of the primary difference between Capture HTML and Capture Webarchive is correct?

I am very glad to hear that new improvements are planned for version 1.1. DT Pro is definitely an amazing tool, from what I have already been able to see.

cgrunenberg · January 2, 2006, 12:55pm

Yes, that’s correct. But some captured archives might still miss some images when you’re offline if the page is using scripts to load random pictures for example.
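
To see how much of a page lives outside its HTML, the following is a minimal Python sketch that lists the images, scripts and stylesheets a saved HTML file would still have to fetch from the web. It is not how DEVONthink captures pages; the class name, the function name, the sample markup and the example.com URLs are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ExternalResourceLister(HTMLParser):
    """Collect URLs that a saved HTML page refers to but does not contain."""

    # Tag -> attribute that points at a separately stored resource.
    # Note: <link href> also covers non-stylesheet links; kept simple here.
    RESOURCE_ATTRS = {"img": "src", "script": "src", "link": "href"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = self.RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative references against the page's own URL.
                self.resources.append((tag, urljoin(self.base_url, value)))


def list_external_resources(html_text, base_url):
    """Return (tag, absolute_url) pairs for resources stored outside the HTML."""
    parser = ExternalResourceLister(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.resources


# Hypothetical page: the text is inside the HTML, everything else is not.
sample = """<html><head><link rel="stylesheet" href="/style.css"></head>
<body><p>Article text stays searchable.</p>
<img src="images/photo.jpg">
<script src="http://ads.example.com/random.js"></script>
</body></html>"""

for tag, url in list_external_resources(sample, "http://www.example.com/articles/page.html"):
    print(tag, url)
```

Every URL printed is something an HTML-only capture depends on remaining online. The script entry is the kind of item that can also trip up a web archive, since whatever that script loads at view time is not part of the stored page.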

Oofy · January 17, 2006, 1:20pm

As an even newer user of DEVONthink (not yet bought it but will) I was interested in your two-part question. The first part has been answered even better (I think) by something in Usage Scenarios/Tips and Tricks on Oct 14, 2005 called “Enter a URL without creating a link”. This is pretty simple to put into any database (it’s in the tutorial db). I managed it quickly and I THINK it does exactly what you (and all of us) need, simply if a little inelegantly.

I’m still mightily confused, though, about the form in which webpages are best brought into DEVONthink and wondered if you were clearer about it now. If you want to have something you can refer back to for years to come (isn’t this usually the case?), I still can’t work out whether a PDF (with, presumably, no live links), a Webarchive or just HTML will be best. Is the size of the file likely to become an issue later if you always opt for Webarchive? (It took me a while to realise I was only capturing URLs, and I was puzzling over why search never came up with anything I KNEW was on the webpages.)

If it’s only the text you’re after, presumably PDF or rich text is best, but for the life of me I can’t work out how to get from a webpage in another browser to having RTF text in DT without a bit of a palaver.

Thanks.

cgrunenberg · January 17, 2006, 1:35pm

There are several solutions:

Taking plain or rich notes (via Services) or capturing notes (within DT):
Stores the interesting, selected part and the original URL. This is sufficient for most articles and does not waste space. In addition, the notes can easily be edited/formatted!

Printing PDFs to DT Pro via DEVONagent:
Stores a static “snapshot” of the page. Great for viewing and searching. Bad if the page contains dynamic content like plugins or applets, or if you want to browse the page again in the future. But one advantage is that PDFs work on any platform.

Storing web archives:
Might waste a lot of space depending on the page. And some pages might not look as expected when offline or if the original page disappears, due to limitations of Apple’s WebKit and due to dynamic page content. Another disadvantage is that this is limited to Mac OS X.

Storing HTML pages:
Does not waste space but can’t render the page fully when offline or if the original page disappears. At least the text content should remain intact and searchable.

Storing links:
Great for bookmarking or temporary storage but definitely not for archiving and searching.

Usually I would recommend to…

use links for temporary items or bookmarks
take/capture notes of interesting articles
store PDFs if (and only if) the layout is important

Web archives are usually not recommended as they waste space and as they’re limited to Mac OS X.