Making the web clipper as good as Evernote's

Using the Safari contextual menu and its Share > Add to DEVONthink command with the Rich Text option does copy only the selection I’ve made on a webpage and nothing else (which is what I want). But even though I’ve selected the Rich Text option, the selection arrives as plain text, without the bold text (or other rich text formatting) and without the embedded URLs in my selection.

Again, what I want DTP to do, which Evernote does flawlessly, is to import my selection—and only my selection—with whatever rich text formatting and embedded URLs that exist in the selection.
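To make the request above concrete, here is a minimal sketch of what "keeping the formatting and the embedded URLs" of a selection involves. It is not how DEVONthink or Evernote actually work. It assumes the browser hands the clipper the selection as an HTML fragment, and it uses Markdown only as an easy-to-read target format. The names `SelectionToMarkdown` and `selection_to_markdown` are hypothetical.

```python
from html.parser import HTMLParser


class SelectionToMarkdown(HTMLParser):
    """Hypothetical converter: HTML fragment -> Markdown, keeping bold, italic and links."""

    # Inline tags mapped to the Markdown marker that wraps their text.
    MARKS = {"b": "**", "strong": "**", "i": "*", "em": "*"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []   # output pieces, joined at the end
        self.hrefs = []   # stack of href values for currently open <a> tags

    def handle_starttag(self, tag, attrs):
        if tag in self.MARKS:
            self.parts.append(self.MARKS[tag])
        elif tag == "a":
            # Remember the link target so it can be written when the tag closes.
            self.hrefs.append(dict(attrs).get("href") or "")
            self.parts.append("[")
        elif tag in ("p", "br"):
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.MARKS:
            self.parts.append(self.MARKS[tag])
        elif tag == "a" and self.hrefs:
            self.parts.append(f"]({self.hrefs.pop()})")

    def handle_data(self, data):
        self.parts.append(data)


def selection_to_markdown(fragment: str) -> str:
    """Convert one selected HTML fragment; all other tags are dropped, their text kept."""
    parser = SelectionToMarkdown()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts).strip()
```

For example, `selection_to_markdown('Read <b>this</b> <a href="https://example.com">link</a>.')` keeps both the bold run and the link target. A plain-text import, by contrast, is what you get when only `handle_data` is implemented.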

It’s the same with dragging a selection to the Sorter: the selected text does import, but with only some of the selection’s original rich text formatting and as some sort of weird faded capture (see the screen capture below and compare it to korm’s original post).

And why should I have to use such an inefficient process as dragging the selected Safari content onto the desktop as a .webclipping and then indexing or importing the clippings collected there into my DEVONthink database? It’s the same inefficiency with invoking “Take Note” with the keyboard shortcut and dragging the selected Safari content into a note’s text field (for which I’d first have to create a note).

I haven’t tried either of these last two methods to see if they would do what I want, but why can’t I get the selection I want just by invoking DEVONthink’s webclipper?

I prefer DTP over Evernote for a variety of reasons, but at least with my OS (Mac OS 10.11.4), Evernote cleans DTP’s clock with respect to capturing just a webpage selection with original formatting and embedded links. I just don’t get why DTP can’t accomplish the same task with its webclipper.

DEVONthink (and Evernote for that matter) is at the mercy of the webpage. I’ve had rich text copy failures in DEVONthink, as described by the correspondent above, and in Evernote. Sometimes a selection from a given page fails to come across as rich text in DEVONthink and succeeds in Evernote, and other times it’s the opposite. The web is a mess of bad coding – I wouldn’t blame anyone’s clipper for failing since success depends on exogenous factors out of any developer’s control.

Note: we don’t have 400+ employees available to work on the browser extension alone. So, while nothing is just “good enough” for us, we also have to allocate our resources far more judiciously than they do.

Also, as korm pointed out, Evernote’s extension fails too. In fact, just recently I had to use it for a Support Ticket and it failed on several pages our extension was working on. The Web is indeed the Wild West, despite whatever standards may be in place. (It’s also why I wish we could remove the word “automatically” from people’s vocabulary when referring to software and the web.) And consider that any standard created now faces many years, and literally billions of web pages, of legacy (i.e., non-standardized) code that no one would EVER go back and maintain. Sometimes, it’s amazing capturing web data works at all! 8) :mrgreen:


Preface: I can’t include links in my post?


For me, the benefits, from highest priority to lowest, are:

  • get the full-text, for searching within DT
  • save some visual representation of the page, so when scanning, remembering things is easy because ‘oh yeah, I remember it’s this mostly blue page, with large white headers’
  • an image/pdf export
  • tagging
  • control over where it goes: database, groups, etc.
  • smart processing, i.e., removing ads, a clean reading view, etc.

How

I don’t know if DT has the capability of running JS under the hood, but the OSS projects I reference further below allow for clipping seamlessly, directly into HTML files.


Repos

github[dot]com/gildas-lormeau/SingleFileZ
github[dot]com/gildas-lormeau/SingleFile


Why

  • It’s really really good
  • Saves as a standard HTML file, without the lockin of a .webarchive.
  • When browsing my stuff in DT, highlighting an HTML file in the main results would bring up the preview right away, vs having to drill down to the “main” file. These are what my clipped notes from Evernote import look like:


Comparison of resulting file types:

github[dot]com/gildas-lormeau/SingleFile#file-format-comparison


CLI version for devs + Browser Extensions

  • Firefox: addons[dot]mozilla[dot]org/firefox/addon/single-file
  • Chrome: chrome[dot]google[dot]com/extensions/detail/mpiodijhokgodhhofbcjdecpffjipk

Deferred Processing

I tried the Chrome extension, and I love how it runs asynchronously, with the little notification at the bottom… this lets me continue what I’m doing without interruption. The UX on this is really important: hopping back and forth between things quickly without losing flow.


A solid clipper would mean going all in on DT and no more Evernote.


Just noticed how old this thread is… this is my first day with the DT trial
Hoping this has been improved/resolved

ditto to the request

a better web clipper is critical for a knowledge management db
saving as html or web archive isn’t ideal because of the file size and the clutter
websites such as reddit contain much valuable info in the discussion section, which can be used as future reference. however, the current clipper is weak at clipping a whole reddit thread in a clean format

A better web clipper would be a huge benefit. Right now I’m using the Evernote clipper, then taking the note in Evernote, printing to pdf and then importing that into DT3. It’s painful…

Love DT and have been a user since 1.x. I think the web clipper, working properly, would be a huge deal.

Thanks!

I get the frustration.

I’ve been using DT Clip to Markdown or copying and pasting into a Formatted Note when I need fairly simple text and images.

But when I want to keep an accurate copy of a web page, I use a Safari extension called “Page Screenshot” (available in the Mac App Store) that allows me to take a “screenshot” of either the visible page or the entire page, as a single image, in PDF, JPG, and/or PNG format. That preserves exactly what I’m seeing. Then I make a PDF with text by using OCR.

The only downside is that it captures the page as a single image, which makes for a very tall image; I’m not sure how one would print it to paper. But it does allow me to keep an exact, WYSIWYG “archive” of a page.

And I’ve found that when I leave “Keep full retina resolution quality” checked, the file is very large (15+ MB) and won’t OCR properly in DT 3. I don’t know if it’s the size or dpi or what, but I keep that unchecked if I’m going to want to extract a text layer from it.

The extension looks like this:

All, I believe that this is still not resolved. Is there any hope of seeing a “save as article” function? Just now I tried to capture a question and answer from Quora and just despaired. Whatever I saved showed only Quora’s login screen.

Hello,

As this thread is long, some info is old and the word ‘reload’ doesn’t seem to be in here, I was wondering:

Why does the web clipper ‘reload’ a page before it’s clipped? The problem I experience is that cookie walls and pop-overs get clipped, and I have a hard time manually removing them from the webarchive or html.

Other clippers like the one from Evernote or Nimbus Note seem to use another capturing trick, so the loaded page gets clipped as-is. The latter even allows me to edit / modify the contents before it’s stored.
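As an illustration of the manual cleanup described above, here is a minimal sketch of stripping a cookie banner from an already saved HTML file. It is a post-processing workaround under stated assumptions, not a DEVONthink feature: it assumes reasonably well-formed markup and that the overlay can be recognised by a keyword in its `id` or `class`. The keyword list and the names `OverlayStripper` and `strip_overlays` are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}
# Assumed keywords; real pages will need their own list.
KEYWORDS = ("cookie", "consent")


class OverlayStripper(HTMLParser):
    """Hypothetical filter: re-emit HTML, omitting elements that look like overlays."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside an element being removed

    def _emit(self, text):
        if not self.skip_depth:
            self.out.append(text)

    @staticmethod
    def _is_overlay(attrs):
        return any(name in ("id", "class") and value
                   and any(k in value.lower() for k in KEYWORDS)
                   for name, value in attrs)

    def handle_starttag(self, tag, attrs):
        if self.skip_depth or self._is_overlay(attrs):
            if tag not in VOID:
                self.skip_depth += 1  # track nesting so the matching end tag is found
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if not self._is_overlay(attrs):
            self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag not in VOID:
                self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")


def strip_overlays(html: str) -> str:
    """Return the HTML with keyword-matched elements (and their children) removed."""
    parser = OverlayStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Keyword matching like this is crude and will miss overlays with opaque class names, which is part of why capturing the page as it is currently displayed is the more robust approach.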

Best regards,
Maik


That’s just the way the browser extension works at this time.

Note: There is no “clipping standard”. These things are developed independently and with their own solutions. Though our extension works in many instances, we have it on our list to enhance in the future. Thanks for your patience and understanding.
(Also, note that Evernote has 300+ employees and millions of dollars in funding. We are a small development house, completely self-contained and funded through our sales alone. And at one point, I heard a rumor (though I didn’t try to substantiate it) that they had at least 40 people working on the clipping extension technology.)


I appreciate the frustration. And BLUEFROG’s explanation makes sense: apparently it’s just not that easy to consistently clip material from a huge variety of web pages. I’ve come up with two solutions, using Safari:

  1. Page Screenshot for Safari (on the Mac App Store), per my post above, then OCRing the result to PDF; the advantage of this (even over Evernote) is that it preserves the exact look and layout of the page. Inline links, unfortunately, don’t work.

  2. With some help from DT Support, I made an AppleScript shortcut that creates and opens a blank Formatted Note with the correct URL and page title. Then I simply copy and paste whatever I want from the page into that open note. If the formatting is wonky, I will usually just select “Reader View” before copying; that at least gets the text and usually any inline images. The helpful post with the script was here: Difference between clipping Safari page to formatted note and copying/pasting into formatted note - #8 by pete31

Of course the third option is just to use Evernote, and then import into DT3 if necessary. That’s not something I do too often, but it works for some scenarios. When repeatedly clipping simple things – for example, I keep a list of words and definitions I’ve looked up, and DT3 is useless at clipping the dictionary I use – I just use Evernote.

And I didn’t know about Nimbus Clipper; I’ll check it out!

Thanks,

W.F.


OMG: :confused:

Indeed! Now, that was some time ago when I ran into that. The finer point of it is, Evernote has always had a big influx of capital investment and a team that far exceeds ours in numbers. So they have the ability to create a group of developers who can concentrate on singular features or smaller sets of functionality.


If you have an Instapaper account you can also use the Text bookmarklet from the Instapaper page.

The cleaned up page can be added to DT the usual way using the extension.

Rehashing the discussion, I think this question is related to the last comment from Jim about “reloading the page”.
Is there a setting so that, instead of giving DEVONthink permission each and every time I try to clip a page or URL, I can give the entire Chrome application permission one time?

  1. Stop using Chrome - ugh! :wink:
  2. This is a Chrome issue not something DEVONthink controls.

Try this in Applications/Utilities/Terminal.app:

defaults write com.google.Chrome ExternalProtocolDialogShowAlwaysOpenCheckbox -bool true

from…

yeah, I know. I’ve been meaning to move away from Chrome for a while.

Cheers!
