Bringing the Web into DT

I suspect that for many of us DEVONthink personal edition (DTpe) users, the Internet is the primary source of our document collection. That is probably true for images, too. I’d like to explore how folks get that content into DTpe.

First, everyone should know that Christian has told us that DTse will include an “‘import URL’ command to automatically import single web pages or whole websites.” This sounds excellent, and will no doubt become the default method of capturing web content.

For today, however, here are the methods of capturing web content that I know of. Are there others that people use?

  1. DT’s Services, which allow users of Cocoa apps to capture either plain text or rich text.

Drawback: these services will not capture the images from a web page, nor will they capture an entire site at once.

  2. Safari features a "Save As" command which saves the HTML of a web page. This HTML can be imported into DTpe, dragged into the database, or dragged onto the application icon. Best of all, it can be saved to a folder with the "Save in DEVONthink" folder action script attached. This works wonderfully: you never have to leave your browser. Of course, DT does a great job of rendering the HTML, and will launch the browser for external links.

Drawback: this command doesn’t download the images referenced by the HTML, and can’t capture more than a page at a time.

  3. IE features a "Save As" command which saves the web page, or entire web site, as a WAFF file.

Drawback: DT doesn’t read the WAFF file, and imports the file as a URL only.

  4. PDF files downloaded from the web can be saved using the Schubertit PDF browser plug-in (see discussion on this board at: http://www.devon-technologies.com/cgi-bin/yabb/YaBB.cgi?board=requests;action=display;num=1054002614). This is a great solution for PDF files downloaded from the web. The plug-in will place the original URL in the Finder’s “comments” field. Moreover, as above, the PDF can be saved to a folder with the DEVONthink folder action script attached, for automated import into DTpe.

Drawback: PDF files are subject to bloat, and often require further intervention (e.g. PdfCompress or PDFshrink) before importing into DTpe.

Are there other, better methods for bringing the web into DTpe?

Well, I usually suck the pages down and then import the folder…
Possible programs are:
PageSucker
SiteSucker
Web Devil

It is not a very satisfying solution, so I’ll wait for the SE in order to be able to add those pages directly into DT.

stephan

I use the DEVONagent beta to bring the web into DT, because I get the original site with images and URL. But I hope it will be possible from Safari too.

The reason why you get the images is that DEVONagent provides the URL of the pages to DEVONthink. Therefore, if you’ve added HTML pages on your own to DEVONthink, enter the proper URL of those pages and the display should be fine (at least as good as possible using the old OS X HTML engine).
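To make the point about the URL concrete, here is a minimal sketch of how a page URL lets relative image references be resolved. It is not DEVONthink's actual code; the page URL and image paths are made-up examples, and Python's standard `urljoin` simply stands in for whatever the HTML engine does internally.

```python
from urllib.parse import urljoin

# Hypothetical page URL, standing in for the URL stored with the HTML.
page_url = "http://www.example.com/articles/2003/index.html"

# Hypothetical image references as they might appear in <img src="..."> attributes.
image_refs = [
    "images/logo.gif",                   # relative to the page's folder
    "/shared/banner.jpg",                # relative to the site root
    "http://cdn.example.com/photo.png",  # already absolute
]

# With the page URL known, every reference becomes a fetchable address.
resolved = [urljoin(page_url, ref) for ref in image_refs]

for ref, url in zip(image_refs, resolved):
    print(f"{ref} -> {url}")
```

Without the page URL, the first two references have nothing to be resolved against, which would explain why hand-imported HTML shows up without its images.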

Basically, DEVONagent executes DEVONthink’s "Take Rich Note" service, providing HTML sources and URLs. Therefore, if a browser provided the same contents, you would get the same results after using "Take Rich Note" on your own. Unfortunately, no browser provides HTML to this service.

BTW: DEVONthink’s services work in combination with Carbon applications too if they support services.

And can you now ask Apple to implement this in Safari? Or should I write a bug report? That is a really nice feature :)

Apple does not listen. Maybe a lot of user requests will help  ;D

Hello

I haven’t tried it, but I would “print” the web page from Safari and choose “Save as PDF”. Combined with a folder script to bring the PDF into DT, that should be convenient enough to work with…

OT:  So, you are one of the DA beta testers!  Lucky you!!  :)

What determines where DT files a Services import in the database?

A few times I’ve copied through Services (from Safari) to DT, only to check later and not see the file at the top level of folders. A search turns it up in a folder, but I’m not sure why that particular folder.

At other times, the imported Services clip is listed at the top level, with all the folders, ready for me to categorize and store it at my leisure.

Can someone advise?

Thanks,
Craig Turner

Please check the destination for new notes (see Preferences > Summarization).

Ah, yes, thank you. I also was reading through the new manual tonight and saw that description. With a new folder marked for "Web clippings" I now have a drawer to throw things into at my leisure to be sorted out later.

Excellent work on 1.7!! Many thanks.

Craig :)

But could the DT service copy the URL of the source page to the URL field in DEVONthink?

NoteTaker’s "Clipping Services" will correctly write the URL as the first line of the new entry. Works with Safari and IE. Great for noting where the clipping came from.

If DEVONthink could do this I would stop using NoteTaker for web clippings immediately…

The service can’t do this as the service does not receive the URL (otherwise all DT versions would use the URL!). But an upcoming contextual menu plugin will provide this feature (probably in v1.8).

Great!

But I wonder how NoteTaker 2003 does it then. AquaMinds seem to have it integrated into their Clipping Services…

Adrian

Since I kicked off this discussion a while back, I should also point out that there is a new means available to bring the web into DT: DEVONagent.

If you haven’t tried this yet, stop everything and do it. Search in DA, select a results page, and click the DEVONthink button in the menu bar. DA will save the HTML to DT, and will even automatically place the URL of the page in DT’s URL field.

This is great stuff, and is now my favorite way to save web sources locally. Flemming noted this possibility much earlier in this thread, and now that DA has a public beta, we can all try it out. The only drawbacks I can think of:

  1. DA => DT saves the HTML to DT, not some kind of representation.  This means:

a) rendering of the saved page inside of DT is dependent on DT’s HTML engine, which is currently somewhat “iffy” (but will be replaced by WebKit soon)

b) the page “saved” in DT is volatile; if the page is deleted from the web, then it’s ipso facto gone from your DT knowledgebase.

  2. DA => DT can only save a page at a time, not a collection of pages or an entire site.

Overall, this feature is so useful that I now find myself wishing I could just use DA as my default browser.  Christian, is it possible to fire up DA and just open a blank "preview" window, into which I could type a URL?  Right now, it looks like the only way to get a preview window is to originate a query and select a page to preview.  

Fred:

I like your suggestion for enabling a browser window in DEVONagent.

The HTML text sent from DA to DEVONthink isn’t volatile (even if the referenced Web site disappears), but you are correct that images will be lost if the referenced page disappears from the Web. I’m looking forward to actually capturing the complete page in DEVONthink.

DA can save multiple or all results in a row but DT PE can’t import whole websites. In addition, at least the HTML code remains inside the knowledge base. Only images could disappear.
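As a rough way to see which parts of a saved page still live on the web, the sketch below lists the image references in a piece of HTML. The sample markup and the names `ImageCollector` and `image_references` are made up for illustration; this is not something DT or DA provides.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it is fed."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> tags without a usable src
                self.sources.append(src)


def image_references(html):
    """Return the image references found in an HTML string, in document order."""
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


# Hypothetical saved page: the text is stored, the image is only referenced.
sample_html = (
    "<html><body><h1>Saved text</h1>"
    '<img src="pics/chart.gif"><p>More text</p>'
    '<img alt="no source"></body></html>'
)
sample_images = image_references(sample_html)
print(sample_images)
```

Everything in the returned list is only a pointer to a file on the original server, so those are the pieces that vanish if the site goes away, while the surrounding text stays in the database.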

??? When I try to add a web page from DA to DT I just get a beep. Here is what I tried. First I did a search and then navigated to a window with a single site, the kind of window that says 16 of 60 in the window title. Then I clicked on DEVONthink in the list of actions and just got a beep. What am I doing wrong?

Thanks

Verify that DEVONthink 1.7.1 is installed inside the folder /Applications, and log out and in again if you haven’t done so since installing DEVONthink. Afterwards, check whether DEVONthink’s services (see the Services menu) are available. If that’s not the case, check whether any APE modules are installed and deactivate them.

And if adding still does not work, please check if there are any entries inside the system console related to DEVONthink or DEVONagent as it’s still a beta.

Beta 2 will provide a "New Browser" command and an option to open a new empty browser on startup. In the meantime you could archive a page in DA and use this archived page to get a preview window without starting a query.