Making the web clipper as good as Evernote's

Using the Safari contextual menu and its Share > Add to DEVONthink command with the Rich Text option does copy only the selection I’ve made on a webpage and nothing else (which is what I want), but it copies it as plain text (even though I’ve selected the Rich Text option), without the bold text or other rich text formatting and without the embedded URLs in my selection.

Again, what I want DTP to do, which Evernote does flawlessly, is to import my selection, and only my selection, with whatever rich text formatting and embedded URLs exist in the selection.

It’s the same with dragging a selection to the Sorter; the selected text does import, but only with some of the selection’s original rich text formatting and in some sort of weird faded capture (see screen capture below and compare to korm’s original post).

And why should I have to use such an inefficient process as dragging the selected Safari content onto the desktop as a .webclipping and then indexing or importing the clippings collected there into my DEVONthink database? It’s the same inefficiency with invoking “Take Note” with the keyboard shortcut and dragging the selected Safari content into a note’s text field (for which I’d first have to create a note).

I haven’t tried either of these last two methods to see if they would do what I want, but why can’t I get the selection I want just by invoking DEVONthink’s web clipper?

I prefer DTP over Evernote for a variety of reasons, but at least with my OS (Mac OS 10.11.4), Evernote cleans DTP’s clock with respect to capturing just a webpage selection with original formatting and embedded links. I just don’t get why DTP can’t accomplish the same task with its web clipper.

DEVONthink (and Evernote for that matter) is at the mercy of the webpage. I’ve had rich text copy failures in DEVONthink, as described by the correspondent above, and in Evernote. Sometimes a selection from a given page fails to come across as rich text in DEVONthink and succeeds in Evernote, and other times it’s the opposite. The web is a mess of bad coding – I wouldn’t blame anyone’s clipper for failing since success depends on exogenous factors out of any developer’s control.

Note: we don’t have 400+ employees available to work on the browser extension alone. So, while nothing is just “good enough” for us, we also have to allocate our resources far more judiciously than they do.

Also, as korm pointed out, Evernote also fails with their extension. In fact, just recently I had to use it for a Support Ticket and it failed on several pages our extension was working on. The Web is indeed the Wild West, despite whatever standards may be in place. (It’s also why I wish we could remove the word “automatically” from people’s vocabulary when referring to software and the web.) And consider that any standard created now faces many years, and literally billions of web pages, of legacy (i.e. non-standardized) code that no one would EVER go back and maintain. Sometimes, it’s amazing capturing web data works at all! 8) :mrgreen:

Preface: I can’t include links in my post?


For me, the benefits, from highest priority to lowest, are:

  • get the full text, for searching within DT
  • save some visual representation of the page, so when scanning, remembering things is easy because ‘oh yeah, I remember it’s this mostly blue page, with large white headers’
  • an image/pdf export
  • tagging
  • where does it go? db/groups etc.
  • smart processing, i.e. remove ads, clean reading view etc.

How

I don’t know if DT has the capability of running JS under the hood. The OSS projects I reference further below allow for clipping seamlessly, directly into HTML files.
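To make "clipping directly into HTML files" a bit more concrete, here is a minimal Python sketch of the general idea behind a single-file capture: image references are replaced with inline data: URIs so the page no longer depends on external files. This is only an illustration under stated assumptions, not how SingleFile or DEVONthink actually work; the function name `inline_images` and the `resources` mapping are hypothetical, and a real clipper would also need to handle stylesheets, fonts, scripts and relative URLs.

```python
import base64
import re

# Hypothetical helper: matches the src="..." attribute of <img> tags.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]*)(")', re.IGNORECASE)


def inline_images(html: str, resources: dict[str, tuple[str, bytes]]) -> str:
    """Return html with known <img> sources replaced by data: URIs.

    resources maps an image URL to (mime_type, raw_bytes). URLs that are
    not in the mapping are left untouched.
    """
    def replace(match: re.Match) -> str:
        url = match.group(2)
        if url not in resources:
            return match.group(0)  # unknown resource: keep the original tag
        mime, payload = resources[url]
        encoded = base64.b64encode(payload).decode("ascii")
        return f"{match.group(1)}data:{mime};base64,{encoded}{match.group(3)}"

    return IMG_SRC.sub(replace, html)


page = '<p>Logo: <img src="logo.png" alt="logo"></p>'
single_file = inline_images(page, {"logo.png": ("image/png", b"\x89PNG")})
print(single_file)
```

The result is one self-contained HTML string that could be saved as an ordinary .html file, which is the property that makes this kind of capture easy to preview and search.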


Repos

github[dot]com/gildas-lormeau/SingleFileZ
github[dot]com/gildas-lormeau/SingleFile


Why

  • It’s really really good
  • Saves as a standard HTML file, without the lock-in of a .webarchive.
  • When browsing my stuff in DT, highlighting an HTML file in the main results would bring up the preview right away, vs having to drill down to the “main” file. These are what my clipped notes from Evernote import look like:


Comparison of resulting file types:

github[dot]com/gildas-lormeau/SingleFile#file-format-comparison


CLI version for devs + Browser Extensions

  • Firefox: addons[dot]mozilla[dot]org/firefox/addon/single-file
  • Chrome: chrome[dot]google[dot]com/extensions/detail/mpiodijhokgodhhofbcjdecpffjipk

Deferred Processing

I tried the Chrome extension, and I love how it runs asynchronously, with the little notification at the bottom… this lets me continue what I’m doing without interruption. UX on this is really important: hopping back and forth between things quickly without losing flow.


A solid clipper would mean all in on DT and no more Evernote

Just noticed how old this thread is… this is my first day with the DT trial.
Hoping this has been improved/resolved.

ditto to the request

a better web clipper is critical for a knowledge management db
saving as html or web archive ain’t best because of the size and cleanliness
websites such as reddit contain much valuable info in the discussion section which can be used as future reference. however, the current clipper is weak at clipping a whole reddit thread in a clean format

A better web clipper would be a huge benefit. Right now I’m using the Evernote clipper, then taking the note in Evernote, printing to pdf and then importing that into DT3. It’s painful…

Love DT and have been a user since 1.x. I think the web clipper, working properly, would be a huge deal.

Thanks!

I get the frustration.

I’ve been using DT Clip to Markdown or copying and pasting into a Formatted Note when I need fairly simple text and images.

But when I want to keep an accurate copy of a web page, I use a Safari extension called “Page Screenshot” (available in the Mac App Store) that allows me to take a “screenshot” of either the visible page or the entire page, as a single image, in PDF, JPG, and/or PNG format. That preserves exactly what I’m seeing; then I make a PDF with text by using OCR.

The only downside is that it captures the page as a single image, which makes for a very tall image; I’m not sure how one would print it to paper. But it does allow me to keep an exact, WYSIWYG “archive” of a page.

And I’ve found that when I have “Keep full retina resolution quality” checked, the file is very large (15+ MB) and won’t OCR properly in DT 3. I don’t know if it’s the size or dpi or what, but I keep that unchecked if I’m going to want to extract a text layer from it.

The extension looks like this:

All, I believe that this is still not resolved. Is there any hope of seeing a save-as-article function? Just now I tried to capture a question and answer from Quora and just despaired. Whatever I saved only showed the Quora login screen.

Hello,

As this thread is long, some info is old and the word ‘reload’ doesn’t seem to be in here, I was wondering:

Why does the web clipper ‘reload’ a page before it’s clipped? The problem I experience is that cookie walls and pop-overs get clipped, and I have a hard time manually removing them from the webarchive or html.

Other clippers like the one from Evernote or Nimbus Note seem to use another capturing trick, so the loaded page gets clipped as-is. The latter even allows me to edit / modify the contents before it’s stored.
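For anyone who ends up cleaning clipped HTML by hand, a rough Python sketch of the kind of post-processing involved might look like the following. It drops any element whose id or class contains a cookie-wall-like keyword. Everything here is an assumption for illustration: the class name `OverlayStripper`, the helper `strip_overlays` and the `HINTS` keyword list are made up, real pages name their overlays in countless ways, and this is not what DEVONthink, Evernote or Nimbus Note do internally. It also assumes reasonably well-formed markup and silently drops comments and the doctype.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
# Hypothetical keywords that suggest a cookie wall or pop-over.
HINTS = ("cookie", "consent", "overlay", "popover")


class OverlayStripper(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []        # pieces of the cleaned document
        self.skip_depth = 0  # > 0 while inside a dropped element

    def _is_overlay(self, attrs):
        text = " ".join(v or "" for k, v in attrs if k in ("id", "class"))
        return any(hint in text.lower() for hint in HINTS)

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag not in VOID:
                self.skip_depth += 1  # track nesting inside the dropped element
            return
        if self._is_overlay(attrs):
            if tag not in VOID:
                self.skip_depth = 1   # start dropping until the matching close
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if not self.skip_depth and not self._is_overlay(attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag not in VOID:
                self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")


def strip_overlays(html: str) -> str:
    parser = OverlayStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


clipped = ('<body><div class="cookie-banner"><p>Accept?</p><br></div>'
           '<p>Keep <b>this</b> &amp; that</p></body>')
print(strip_overlays(clipped))
```

A keyword filter like this is crude and will miss many overlays (or remove legitimate content), which is one reason clipping the already-loaded page as-is tends to be the more reliable approach.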

Best regards,
Maik

That’s just the way the browser extension works at this time.

Note: There is no “clipping standard”. These things are developed independently and with their own solutions. Though our extension works in many instances, we have it on our list to enhance in the future. Thanks for your patience and understanding.
(Also, note that Evernote has 300+ employees and millions of dollars in funding. We are a small development house, completely self-contained and funded through our sales alone. And at one point, I heard a rumor (though I didn’t try to substantiate it) that they had at least 40 people working on the clipping extension technology.)

I appreciate the frustration. And BLUEFROG’s explanation makes sense: apparently it’s just not that easy to consistently clip material from a huge variety of web pages. I’ve come up with two solutions, using Safari:

  1. Page Screenshot for Safari on the Mac App Store, per my post above, then OCRing it to pdf; the advantage of this (even over Evernote) is that it preserves the exact look and layout of the page. Inline links, unfortunately, don’t work.

  2. With some help from DT Support, I made an AppleScript shortcut that creates and opens a blank Formatted Note with the correct URL and page title. Then I simply copy and paste whatever I want from the page into that open note. If the formatting is wonky, I will usually just select “Reader View” before copying; that at least gets the text and usually any inline images. The helpful post with the script was here: Difference between clipping Safari page to formatted note and copying/pasting into formatted note - #8 by pete31

Of course the third option is just to use Evernote, and then import into DT3 if necessary. That’s not something I do too often, but it works for some scenarios. When repeatedly clipping simple things – for example, I keep a list of words and definitions I’ve looked up, and DT3 is useless at clipping the dictionary I use – I just use Evernote.

And I didn’t know about Nimbus Clipper; I’ll check it out!

Thanks,

W.F.

OMG: :confused:

Indeed! Now, that was some time ago when I ran into that. The finer point of it is, Evernote has always had a big influx of capital investment and a team that far exceeds ours in numbers. So they have the ability to create a group of developers who can concentrate on singular features or smaller sets of functionality.

If you have an Instapaper account you can also use the Text bookmarklet from the Instapaper page: Instapaper

The cleaned up page can be added to DT the usual way using the extension.

Rehashing the discussion, and I think this question is related to the last comment from Jim about “reloading the page”.
Is there a setting so that, instead of having to give DEVONthink permission every time I try to clip a page or URL, I can give the entire Chrome application permission one time?

  1. Stop using Chrome - ugh! :wink:
  2. This is a Chrome issue not something DEVONthink controls.

Try this in Applications/Utilities/Terminal.app…

defaults write com.google.Chrome ExternalProtocolDialogShowAlwaysOpenCheckbox -bool true

from…

yeah, I know. I’ve been meaning to move away from Chrome for a while.

Cheers!
