A not-so-perfect idea for better web clipping (improvements welcome)

I love using DEVONthink. But like many others, one thing that bothers me a lot is the app’s web clipping capability, both on desktop and on mobile. I am constantly looking for a better method, one that is easy and consistent and gives me better clipping results. Web archives don’t work on dynamic sites, PDF clippings lose images, and Markdown can’t capture the full page; these are issues I have run into many times.

So, enough grumbling. Yesterday I found a browser extension called ‘Print Friendly & PDF’. I use Chrome, but I think there are versions for Safari and Firefox too. This extension grabs any page and converts it into a beautiful PDF file (compared to the default print-to-PDF and save-to-DEVONthink options), which you can then download to your Mac. The text is selectable, the layout is clean and beautiful, and you can remove unnecessary clutter before downloading.

So far this is the best solution I have found for improving the quality of clipped material. That said, it has some drawbacks:

  1. It only works on desktop. I have yet to find a way to make it work on mobile.
  2. You have to manually move the PDFs into DEVONthink. I know you can let DEVONthink ‘watch’ a folder, but this extension only downloads to the ‘Downloads’ folder, which I don’t want DEVONthink to watch constantly, since all kinds of files get downloaded there.
  3. You lose the original URL in the DEVONthink data sheet. It is replaced with the URL of the extension’s download page. The original URL is still on the PDF, and it is clickable.
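
One possible workaround for drawback 2, as a minimal sketch rather than a tested setup: a small Python script that moves only the PDF files out of ‘Downloads’ into a separate folder, so DEVONthink can watch that folder instead. The function name `move_pdfs` and the folder names in the example are assumptions, and note that it moves every PDF in ‘Downloads’, not just the ones this extension produced.

```python
import shutil
from pathlib import Path


def move_pdfs(source: Path, target: Path) -> list[Path]:
    """Move every *.pdf directly inside `source` into `target`.

    Returns the new paths of the moved files.
    """
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    for pdf in sorted(source.glob("*.pdf")):
        destination = target / pdf.name
        # Avoid overwriting an earlier clipping that has the same file name.
        counter = 1
        while destination.exists():
            destination = target / f"{pdf.stem}-{counter}{pdf.suffix}"
            counter += 1
        shutil.move(str(pdf), str(destination))
        moved.append(destination)
    return moved


# Example call (both folder names are assumptions, adjust to your setup):
# move_pdfs(Path.home() / "Downloads", Path.home() / "Clippings for DEVONthink")
```

Run by hand or on a schedule, this would leave everything else in ‘Downloads’ untouched. It does nothing about drawback 3; the original URL would still have to be restored separately.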

So I wanted to share this cute little extension with you, and I hope it helps. It is not perfect, but it is better than the default methods.

Also, if anyone can come up with ideas to overcome the drawbacks I have listed and make this work even better, please do. You have my thanks in advance.

I suspect you are looking for an impossible holy grail.

If it is a dynamic/JavaScript-oriented page, a bookmark in DT3 works best; by definition you can never clip such a page.

If you are trying to eliminate ads or otherwise improve upon the original webpage, then that is a subjective moving target. (But Evernote probably works better than any other option in that case; they have put so much effort into that task that it is unlikely, in my view, that anyone else will ever improve upon it.)

If you are trying to capture a static page as-is, one or more of the existing DT3 options should work fine.


I like that plugin a lot. However, I’ve been trying to find a more automation-friendly solution.

The Safari ‘Export as PDF’ function works well and can be fairly easily automated with Keyboard Maestro and/or AppleScript. With a good ad-blocker (I’m using AdBlock right now), you can get a fairly reliably clean PDF, and by triggering Reader mode you can get it completely decluttered.

It’d be nice if DEVONthink worked better with its own clipper. But its internal browser has very limited content blocking. The worst part is the cookie popups. The ‘Accept Cookies’ setting in preferences doesn’t manage to deal with them. Almost any website (particularly news-type websites) that I try to clip via DEVONthink directly has the ‘accept cookies’ popup right in the middle of the first PDF page.

Ironically, I bought DEVONthink originally primarily for its web clipper. I ended up using it for everything in my life except web clipping!…

The problem of saving web pages faithfully in any format is something that occupies many researchers in academia and industry today. The fact that DEVONthink doesn’t do a great job shouldn’t be held against it; as @rkaplan alluded to above, many kinds of web pages today contain dynamic content that is very difficult to capture using a single universal approach, and basically nothing does a perfect job.

In some cases, you can get better results if you know something about the software running the site (e.g., Discourse has special features for printing to PDF that make it possible to overcome its dynamic loading/unloading behavior). Failing that, another approach to capturing pages that load content dynamically as the user scrolls (e.g., Twitter, some product web sites) is to simulate user behavior such as scrolling down the page. My approach in DEVONthink has been a hack: use a script that runs some JavaScript commands to scroll the page down, in an attempt to force content to be loaded, before saving the page as PDF. An explanation and code can be found here:
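
The linked script is not reproduced in this post. Purely as an illustration of the scroll-until-nothing-new-loads idea, here is a minimal Python sketch of the control loop. It is not the actual DEVONthink script: `scroll_until_stable`, `get_height`, `scroll_to`, `pause`, and `max_rounds` are hypothetical names. In a browser, the two callbacks would roughly correspond to reading `document.body.scrollHeight` and calling `window.scrollTo(0, y)`.

```python
import time
from typing import Callable


def scroll_until_stable(
    get_height: Callable[[], int],
    scroll_to: Callable[[int], None],
    pause: float = 1.0,
    max_rounds: int = 30,
) -> int:
    """Scroll to the bottom repeatedly until the page stops growing.

    Returns the number of scroll steps performed.
    """
    rounds = 0
    last_height = get_height()
    while rounds < max_rounds:
        scroll_to(last_height)  # jump to the current bottom of the page
        time.sleep(pause)       # give lazily loaded content time to arrive
        rounds += 1
        new_height = get_height()
        if new_height <= last_height:
            break               # nothing new was loaded; assume we are done
        last_height = new_height
    return rounds
```

The `max_rounds` cap is there because some pages (infinite feeds, for example) never stop growing, so the loop needs a way to give up before saving the PDF.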

Some other past DEVONthink discussions related to this topic:
