I tried each file type, each with different failures:
PDF: Contains essentially everything on the page except the actual article: all the links to other articles, enable-cookies banners, subscribe banners, etc. Everything I don’t want and nothing that I do want.
Webarchive: similar to the PDF, but now interactive! I can click on elements like denying cookies, but…the actual article isn’t present.
Markdown: Grabs the text of only the last section (Part 4).
HTML: This is the strangest one. From my Mac, it produces something akin to the actual webpage, with all the interactivity and animations and such. The Kind is listed as an HTML document in DT. It syncs successfully to DTTG. However, after I open it in DTTG and then go back to DT, it shows up as a Formatted Note with an un-dismissable subscribe banner in DT. After syncing again, if I come back to the file on my iPad, it’s garbled code (see below). Once it’s in this state, there’s no way back.
Summary: Clip to HTML on DT>Sync>View in DTTG on iPad; looks good>Sync>Open in DT>Auto-converted to garbled Formatted Note.
What would you suggest? Should I just give up trying to archive the articles from MIT Tech Review (which I have access to)?
I used Bear’s Web Clipper and it captured all the text and images as Markdown, albeit without the interactive elements (like the charts with hover tooltips and such), which is understandable.
On my iPhone I showed the page in Reader View and made a PDF. It looks complete, but I did not check for all the content. I did notice it is a very fancy and complicated web page. Very. See attached
Unfortunately, that only grabs the first three sections. Thanks for checking though.
Agreed, this is a very complex webpage. Bear was able to clip the full text just fine, though.
I’m trying to determine how to integrate DEVONthink into my ecosystem. I’ve used Bear for nearly a decade for web clipping, archiving PDFs, and my own notes, but would really like to move the clippings/PDFs out of there and reserve it for only things that I myself have written. DEVONthink is so close to satisfying this, but there are a few breaking issues rather than the “it just works” stability that I’ve come to trust Bear with.
Oh yeah I agree, Bear and DEVONthink satisfy different niches, which is why I want to use both, especially since I want Bear to be for notes I’ve written and DEVONthink (or some other program) to be for things I haven’t written (research papers, articles, etc.).
Bear:
Pros:
Phenomenal writing experience; everything gets out of the way of writing.
Best-in-class web clipper.
Images more seamlessly integrated into the note file itself.
Cons:
Markdown only; not designed for PDF management
DEVONthink:
Pros:
Supports many different file types.
Better PDF management than Bear.
Cons:
Writing notes in DT is not as clean as Bear (subjective, I know)
Web clipper not as performant as Bear’s.
Images in markdown need to be stored in their own assets folder, which isn’t as seamless.
All these cons aren’t enough to not use one or the other, but it does require me to understand how I need to adapt my workflow. I want to use DEVONthink, just a matter of finding the right way of doing so within my ecosystem.
And yeah, MIT has definitely made it hard to archive their articles. If I pay for an article, I want to be able to reference it whenever. But it seems as though more and more of the Internet is becoming ephemeral and subject to the “you don’t own this; you can view/use for as long as we deem allowable” philosophy.
No, they “do not have to”. You can do that, but there’s no obligation. It depends on how you link to the images.
Whenever you want to use images in MD, you have to make decisions. There are tools that make them for you, which might be easier at a first glance. DT leaves it to you where you put your images – more flexibility, but a greater burden (perhaps) for the user.
It all boils down to the fact that MD was not developed as a note-taking format, but as an easier way to write HTML. And in the latter you also have to decide where to put your images.
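To make the "it depends on how you link to the images" point concrete, here is a minimal Python sketch that lists the image references in a Markdown file and labels how each one is linked. The function names and category labels are made up for illustration, and the regular expression only handles simple inline `![alt](target)` images, not reference-style ones.

```python
import re
from urllib.parse import urlparse

# Matches simple inline Markdown images: ![alt](target "optional title")
IMAGE_PATTERN = re.compile(r'!\[([^\]]*)\]\(\s*<?([^)\s>]+)>?(?:\s+"[^"]*")?\s*\)')


def classify_image_target(target):
    """Label an image target by the way it is linked (hypothetical categories)."""
    scheme = urlparse(target).scheme.lower()
    if scheme in ("http", "https"):
        return "remote"        # loaded from the web each time
    if scheme:
        return "other-scheme"  # any other URL scheme, e.g. file:
    if target.startswith("/"):
        return "absolute"      # tied to one location on one machine
    return "relative"          # resolved relative to the Markdown file


def list_image_links(markdown_text):
    """Return (target, category) pairs for every inline image found."""
    return [
        (match.group(2), classify_image_target(match.group(2)))
        for match in IMAGE_PATTERN.finditer(markdown_text)
    ]


if __name__ == "__main__":
    sample = "![Chart](assets/chart.png)\n![Logo](https://example.com/logo.png)"
    for target, category in list_image_links(sample):
        print(f"{category:12} {target}")
```

A relative target such as `assets/chart.png` is the case where the image has to travel with the note; the other kinds of link do not need an assets folder next to the file.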
Don’t take this the wrong way, but I know. I’ve read all the guidance on images in markdown files in DT. I’ve read all the threads here on the matter. I know there are a ton of options in DT (but not all of those options work for DTTG). None of them are done the way Bear does it, which is all about reducing the burden on the user to near-zero even if it’s not pure markdown.
This isn’t a criticism, it’s an observation based on using both programs. DEVONthink, for me, has a different use-case and value proposition than Bear. When I want to write, I just want to write, and sometimes that writing requires more than what plain text can support but not so much that a full blown text editor is required. That’s Bear, which is markdown with some limited extra bells and whistles.
Even right now, I’m in DTTG on my iPad trying to add an image to a markdown file, and it doesn’t work. As in, the picture just doesn’t even paste into the .md file in Edit mode. On DT on my Mac, it works; image goes into the Assets group exactly as I set it up. In DTTG, it doesn’t. On Bear, it works no matter what platform I’m using.
I don’t want to turn this into another “Images in Markdown” debate. That’s been beaten to death. I want to focus on the issues I’m having with the DT/DTTG web clipper not performing to the standards I need.
But you use Bear’s web clipper because you think it’s better (even though you probably didn’t write the clipped text yourself)
I don’t know Bear, but if a note-taking program has a better web clipper than DT, I find that surprising. Shouldn’t that be one of DT’s core competencies?
Could you be so kind as to briefly explain what Bear does better? Maybe it could be (easily?) integrated into DT if the developers also think it’s worthwhile. Thank you very much.
Interesting, but I have no idea what that means. I can write anywhere, even in TextEdit. It’s just that I can’t organize my work there.
I use Bear’s clipper because it’s better (which I’ll explain in a bit) and because Bear is what I’ve gravitated towards over the years. But:
I’d prefer to clip web articles and PDFs into DEVONthink since I haven’t written them; that’s a philosophical preference.
My workflow historically has been to clip articles and PDFs into Bear and also use Bear for writing my own notes.
Bear is not designed for being a PDF manager the way DT is and it doesn’t support the wide array of file types the way DT does.
Therefore, I’m wanting to adapt my workflow so that: I clip resources into DT and then write in Bear.
As for why Bear’s web clipper is better (in my experience): it captures all the text of a webpage minus the clutter. If there is an article with four parts (not four pages, just four sub-header parts), I expect all four to be clipped. Too often have I seen DT (and other programs) fail in this regard. Yes, sometimes Bear fails, but it’s far less often than DT and the other programs I’ve tested over the years.
Case in point, the article I linked in my original post. DT failed in clipping that article using every file type I tested: PDF (paginated/one page; reader mode or not), web archive, HTML, markdown. When I used Bear, it clipped the entire article minus the interactive charts, which is fine because it’s markdown. Admittedly, it’s a complex article. But Bear still grabbed it all minus the clutter, so it’s definitely possible.
Finally, what I mean by Bear being a phenomenal writing tool is this (from a previous comment): When I want to write, I just want to write, and sometimes that writing requires more than what plain text can support but not so much that a full blown text editor is required. That’s Bear, which is markdown with some limited extra bells and whistles.
I’m absolutely fine with DT not being my writing program. That’s not what my post is about. I want DT to be the best web clipper, resource repo, reference materials archive, etc. that it can be.
Just an FYI: Bear’s clipping extension is much more complicated than ours, employing multiple intersecting JavaScripts to isolate content. Remember, there is no clipping standard and teams generally develop their own methods independently. Clipping will be improved as time allows.
“More complicated” … I see, in this case that probably means “better.”
I’m glad to hear that. I think many DT users hope that every type of content can be transferred and stored as perfectly as possible in DT, at least as well as in Bear.
You must have your reasons. “Writing” is a weird process that takes place mainly in your head. The app plays a minor role … for me … maybe it’s different for others.
Cognition surely happens in your head, but writing is inextricably linked to the medium. The medium plays a huge role in writing, and this has been established in numerous studies on knowledge retention, recall, and creativity. Anyway, I’m really not here for a discussion on the philosophy/psychology of writing. That’s another topic that has been beaten to death, and anything we say here will not be novel, nor will it contribute to the thread topic.
@BLUEFROG I’m glad to hear that continued work is being done on the DT web clipper. It’s not a deal-breaker. I just wanted to make sure I wasn’t missing something and wanted to bring attention to its odd behavior with that particular article. Yes, Bear has a great web clipper, even if it’s complex under the hood. I only mentioned it as an example of a web clipper that successfully clipped that particular article. I definitely see a central place for both programs in my workflow, each doing the thing that I think they are best suited for (according to my uses).
I’m not sure if this is what you want. It was done in Edge on macOS using Just Read with custom CSS: first scroll to the end of the page, then print to PDF with custom margins.
Your example makes very heavy use of JavaScript to load and render content dynamically. For pages and sites like that, you often get a better result if you load them in DEVONthink’s internal browser and capture from there.
Sometimes it’s enough to log in to a site once and let DT store cookies to get rid of popups and banners in future clippings. But on this particular page, where the article content itself relies so heavily on JavaScript, I doubt that would be enough.
If you use Safari, another alternative is File > Export as PDF. That loses interactive elements but otherwise generally keeps a very faithful representation of the page.
I think that is a bug under investigation – nothing to do with this example in particular. But even after the bug is fixed, I wouldn’t choose HTML in the clipper to capture this page. The HTML is mostly a wrapper for all the javascript; it won’t work offline without all the external resources it’s supposed to load. And it will almost certainly break at some point because of changes on the host.
You could use something like wget to download a true offline copy, HTML + all dependencies. That’s another story. Firefox has a similar option for saving complete pages.
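As a rough sketch of the wget route, the Python snippet below assembles a command for saving one page plus the resources it references. The helper names, the example URL, and the output folder are placeholders; the flags are standard GNU wget options. Note that wget does not execute JavaScript, so content that the page loads dynamically may still be missing from the copy.

```python
import shutil
import subprocess
from typing import List


def build_wget_command(url: str, output_dir: str) -> List[str]:
    """Build a wget command for an offline copy of a single page."""
    return [
        "wget",
        "--page-requisites",   # also fetch images, CSS and scripts the page references
        "--convert-links",     # rewrite links so the local copy points at local files
        "--adjust-extension",  # add .html/.css extensions where they are missing
        "--span-hosts",        # allow resources hosted on other domains (e.g. CDNs)
        f"--directory-prefix={output_dir}",
        url,
    ]


def mirror_page(url: str, output_dir: str) -> int:
    """Run wget and return its exit code; raises if wget is not installed."""
    if shutil.which("wget") is None:
        raise RuntimeError("wget is not installed or not on PATH")
    return subprocess.run(build_wget_command(url, output_dir), check=False).returncode


if __name__ == "__main__":
    # Placeholder URL and folder; substitute the article you want to keep.
    print(" ".join(build_wget_command("https://example.com/article", "archive")))
```

The resulting folder can then be imported into DEVONthink like any other set of files.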
But if you prefer the result from Bear’s clipper, why not continue using it? You can always export the output to DEVONthink. I see that Bear has some support for Shortcuts, so you could probably automate the export to some degree.
Thank you for the detailed response. I agree that this webpage is heavy and filled with features. There are a number of other examples that I can provide that don’t seem to be as heavy (e.g. NatGeo articles that are mainly just text and images, though I’m sure the parent company Disney has them bake in a ton of trackers and other nonsense that actually interferes with clipping). I’ll try the tip of loading the webpage within DT’s browser and clipping from there; hopefully that has better results.
Good to know that the html oddity is a known bug under investigation. I only tried html here to round out my attempts at clipping the webpage, sending me down a bit of a rabbit hole.
As for why not keep using Bear’s clipper? I might do that until DT’s performs consistently enough for me. It’s a bit of a philosophical decision for me to keep Bear for what I’ve written and DT for what I haven’t written. It’s not the end of the world for me to clip to Bear and then send to DT, but I’d like to eventually see the need to do so obviated by DT web clipper improvements. No software is perfect, and the Bear+DT pair seems to cover pretty much everything I need for research and writing.
FYI, I tried clipping with DT as clutterfree webdoc and RTF, and got only part 4. I tried several other tools – Safari’s Reader mode, Feedly, and Instapaper – and none of them were able to get past Part 3.
Using Safari’s inspector, it looks like the issue is that the animated heading for part 4 has been coded poorly and is missing resources. It has enough to display but is throwing errors, which I assume the clippers don’t like.
The web isn’t the static system it once was; pages are built dynamically depending on JavaScript logic, late requests, DOM manipulation, and GDPR compliance blocks. DEVONthink’s internal capture system needs to run headless, so it can’t let you interact with the myriad hurdles, or even benefit from content blockers and existing cookies (as it cannot share Safari data).
If you’re prepared to capture with Markdown, and the page is befouled, I use MarkDownloader to copy the page as markdown to the pasteboard in Safari and paste over DT’s capture in the DT document viewer; this at least preserves metadata, but can blow out embedded media captures.
The other option is to ‘print to inbox’ from Safari, which uses the Safari renderer to create the PDF. The result won’t have all the metadata that DT’s own clipper records, though; for example, it won’t know what URL it came from.
Kid-developers these days, they’ve never known a site without 500kb of javascript scaffolding
I have not been able to make any of the DEVONthink web clippers work with any of my web browsers. However, this is what I do instead, starting with my iPhone.
iPhone
I have DTTG installed on my iPhone. When I come across an interesting article, what I do is I use the iPhone share option which then enables me to share to DTTG and as a default it saves as a PDF. This is what I did in respect of your MIT article and it saved it perfectly. It then synced into the cloud and then came down on my DT macOS application.
Mac Studio
If I come across an interesting article in my web browser (Safari), I have two ways of capturing it into DEVONthink. Most of the time I can just right-click on the article and select Print Page. If that is not available, and sometimes it is not, I go to the Safari menu bar and select File > Print.
With the print window open, I click the PDF button with its down-pointing arrow to bring up more options, one of which is Save PDF to DEVONthink (I think I added this from something in DEVONtechnologies | Handbooks and Extras). Selecting it instantly puts the PDF into my DEVONthink Inbox.