DT Web Clipper Appears to be Defective

sjdavis1224 · May 22, 2013, 9:13pm

Colleagues,

In case you’re not aware, the DEVONthink web clipper has a bug that produces unexpected results when you clip from a password-protected web site. Instead of the page you see in the browser, the clip that arrives in the DT inbox contains only the login page of the source web site.

Thus, rather than doing a screen scrape or screen dump from the browser (or some other type of copy function) that collects the desired content, the clipper merely captures a copy of the login page of the system being accessed from the browser.

It can be very disappointing when a person thinks they are capturing information, only to find out later, after it is too late, that they did not record what they were attempting to collect. This anomaly is especially problematic because no warning is generated when there is a problem. The issue occurred in the past and has now resurfaced. (Refer to the post linked below.)

After many hours of research and failed alternatives, I have been able to find a reasonable option. To begin with, performing a screen grab (via the Mac’s utility) helps, but it is not adequate. If you’re on a lengthy web page, it is very time consuming and tedious to capture several screen dumps and piece them together.

Evernote, the world-famous software application, also has a web clipper, and it will capture the entire contents of a browser’s window even on a site that’s password protected. Unfortunately, if the source isn’t already stored in PDF format, Evernote will store the content in its proprietary file format. As far as I am aware, it will not perform a conversion to PDF.

Nonetheless, we’re able to clip web content to Evernote’s online site. That’s the first step. We’re then able to log in to Evernote and clip from their site: when we’re on Evernote’s web site, we can use DEVONthink’s web clipper to copy the desired content into DT’s inbox. This takes extra steps but works as a reasonable solution until DEVONtechnologies fixes the bug.

Today, I was trying to archive a copy of the default settings from a web-based application. This documentation was needed before I began to make changes to it. I thought I was documenting the system, only to find out otherwise. The DT bug wasted a great deal of my time today, and I’m sure others have been hurt by the anomaly as well, given that no exceptions are generated.

Since Evernote is able to clip from such sites, I am hopeful DEVONtechnologies will eventually have a working solution as well. I searched the DT forums for a solution earlier today but didn’t find one. Because the ability to clip content from the web is so important, I hope others won’t be surprised by this software defect when it is already too late.

In order to end on a positive note, I hope others will find this solution helpful. Now that a workaround has been posted, others will have a solution and save time.

Respectfully,

Stephen Davis
Solution Architect

Documentation - This has occurred before …

blog.devontechnologies.com/2010/ … k-updated-—

Bill_DeVille · May 22, 2013, 10:18pm

The Clip to DEVONthink extension and the Bookmarklets require a second, parallel access to the page, which is not allowed by secure sites such as your online banking site and by some others that control access, such as the New York Times site. Instead of a successful clip, the result will be a clip of the login window.
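To illustrate why a second, parallel access ends up at the login window, here is a minimal Python sketch. It is a toy model only, not how the clipper or any real site is implemented: the `serve` function, the `session` cookie name and the token value are all hypothetical, and it assumes the site decides what to return based on a session cookie that only the logged-in browser holds.

```python
# Toy model (all names hypothetical): a site that returns the article
# only when the request carries a valid session cookie.
LOGIN_PAGE = "<html>Please log in</html>"
ARTICLE = "<html>The article you wanted</html>"


def serve(path, cookies):
    """Return the article for a logged-in session, else the login page."""
    if cookies.get("session") == "valid-token":
        return ARTICLE
    return LOGIN_PAGE


# The browser logged in earlier, so it holds the session cookie.
browser_cookies = {"session": "valid-token"}
in_browser = serve("/article", browser_cookies)

# Assumption: the second, parallel request does not carry that cookie.
clipped = serve("/article", {})

print("Browser shows:", in_browser)
print("Clip contains:", clipped)
```

Under that assumption, the browser shows the article while the clip contains only the login page, which matches the behaviour described in the original post.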

This has been discussed on the forum in the past.

A workaround on such sites is to ‘print’ the page as PDF to DEVONthink. To do that, press Command-P to invoke the Print panel, click on the PDF button, then choose the option to Save PDF to DEVONthink.

Alternatively, one can select all or a portion of the Web page and capture as rich text using the Command-) keyboard shortcut for that Service (doesn’t work in Chrome or Firefox, as those browsers don’t properly handle Services, but does work in Safari, DEVONagent and DEVONthink’s browser).

Especially in cases such as articles segmented into multiple pages (often presented this way on the New York Times site), Safari’s Reader button will create a text selection, including download of all the segments of the article. Then press Command-A to select the Reader conversion and Command-) to capture as rich text to DEVONthink.

korm · May 23, 2013, 1:32am

Yes, but what Mr. Davis points out is that Evernote, with its clipper, has been able to solve the problem of clipping from secured and paywalled sites (e.g., NY Times) and succeeds where DEVONthink does not. The original post suggests an approach that I’ve also found useful. Obviously the technical solution exists. I believe Mr. Davis’ note might be an encouragement to DEVONtech to consider modern alternatives to its clipper.

Bill_DeVille · May 23, 2013, 4:27pm

Note that if you wish to capture full Web pages as PDF, this works for all sites in DEVONthink via the ‘print as PDF’ approach: Press Command-P to invoke the Print panel, click on the PDF button, then select the Save to DEVONthink option. The capture will be quicker than if Clip to DEVONthink were used.

My own approach to Web captures is to exclude irrelevant content from pages, and capture only a selected area of the page. The major benefit is to improve the efficiency of searches and use of the AI assistants, as my captured documents don’t include text that’s irrelevant to the article of interest. Another benefit is reduction of document file size, often by two or even three orders of magnitude (file sizes that are sometimes only a hundredth or thousandth of the file size if captured as full-page PDF or WebArchive).

Here’s a real-world example. I’m interested in issues related to intellectual property. The MacDailyNews site has an article that I wished to capture, titled “Apple CEO Tim Cook: U.S. court system is too slow to adequately protect tech firms’ intellectual property” posted 22 May, 2013. The URL is macdailynews.com/2013/05/22/appl … -property/

Using Clip to DEVONthink, if I capture the full page as 1-page PDF the resulting document has a file size of 1.2 MB. Using Clip to DEVONthink, if I capture the full page as WebArchive, the file size is 3.8 MB. Using the rich text Service to capture a selected area of the page that contains only the article, the file size is 2.5 KB. See what I mean?
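For anyone who wants to check the arithmetic, here is a small Python sketch of the size comparison. It assumes decimal units (1 MB = 1000 KB), and the variable names are mine, purely for illustration.

```python
import math

# File sizes reported above, in KB (assuming 1 MB = 1000 KB).
sizes_kb = {
    "full-page PDF": 1200,
    "full-page WebArchive": 3800,
}
rich_text_kb = 2.5  # rich text capture of the selected article only

ratios = {}
for name, size_kb in sizes_kb.items():
    ratio = size_kb / rich_text_kb
    ratios[name] = ratio
    # log10 of the ratio gives the difference in orders of magnitude.
    print(f"{name}: {ratio:.0f}x larger, "
          f"about {math.log10(ratio):.1f} orders of magnitude")
```

With those figures the full-page PDF is 480 times larger than the rich text capture and the WebArchive is 1520 times larger, so between two and three orders of magnitude.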

Because the full-page captures contain a great deal of text on a wide variety of topics, searches might list that page because of content that has no relevance to the actual article topic. The usefulness of the Classify and See Also assistants will be compromised by all that irrelevant content. Indeed, when I look at the full-page capture the information content that interests me appears almost minuscule compared to all the ads and other unrelated content on the page, which are highly distracting. I don’t want that full-page version in my database!

That’s why my normal capture of Web items is as rich text or WebArchive of a selected area of the page that focusses on the content I want to capture, and excludes everything else. In Safari, DEVONagent Pro or DEVONthink’s browser, selections are captured using the keyboard shortcuts Command-) for rich text or Command-% for WebArchive, using the appropriate Services. Note that Firefox and Chrome can’t handle those Services, so I don’t use them for captures.