Websites are not created equal. In fact, there are many ways people design and code them. This means a capture, especially one trying to extract an article as the clutter-free option does, isn’t always feasible.
Does it capture as a PDF without the clutter-free option?
I tried MD, MD/clutter free and PDF (cluttered), to no avail. The two markdown options gave exactly the same result as posted before (though on another page): basically surrounding noise, not the article itself. The PDF was simply blank.
This is from the free site, so no paywall, no login required.
The relevant text seems to be wrapped in an article element, as seems appropriate. As much as I agree that not all websites are created equal, extracting the text content from an article element should be fairly basic stuff. What can be seen in the MD snippets is actually the content of an inline style element contained in a div (yes, I’d consider that very bad style). Wouldn’t it be sensible to skip inline styles in any case?
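To illustrate what I mean, here is a minimal sketch using only Python's standard `html.parser`. It is not how DEVONthink or any reader mode actually works; the names `ArticleTextExtractor` and `extract_article_text` are made up for this example, and it assumes the article text really does sit inside an article element.

```python
from html.parser import HTMLParser


class ArticleTextExtractor(HTMLParser):
    """Collect text inside <article>, ignoring <style> and <script> content.

    Hypothetical helper for illustration only.
    """

    SKIPPED = {"style", "script"}

    def __init__(self):
        super().__init__()
        self.article_depth = 0  # > 0 while inside an <article>
        self.skip_depth = 0     # > 0 while inside <style> or <script>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.article_depth += 1
        elif tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.article_depth:
            self.article_depth -= 1
        elif tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when inside an article and outside style/script.
        if self.article_depth and not self.skip_depth:
            text = data.strip()
            if text:
                self.chunks.append(text)


def extract_article_text(html):
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Fed a page where a div inside the article contains a style element, this returns only the headline and paragraph text, not the CSS. Real pages are messier, of course, so treat it as a sketch of the idea rather than a working clipper.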
Print to PDF (instead of clipping) works with faz.net, Reader Mode active and not.
Clipping works with the print version of a FAZ page. When the printer icon is visible—in my case: not visible in Safari but in Firefox—just click it before clipping. If not, add ?service=printPreview to the URL.
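If you want to script the URL change, the following is a small sketch in Python using the standard `urllib.parse` module. The function name `to_print_preview` and the article path in the usage line are made up for illustration; only the `service=printPreview` parameter comes from the tip above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_print_preview(url):
    """Return the URL with service=printPreview set in its query string."""
    parts = urlsplit(url)
    # Keep existing parameters, but drop any earlier "service" value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "service"
    ]
    query.append(("service", "printPreview"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical article path, for illustration only.
print(to_print_preview("https://www.faz.net/aktuell/example.html"))
```

Going through the parsed query rather than just appending text means a URL that already has parameters gets `&service=printPreview` instead of a second `?`.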
It’s a pain that there is no single way to fetch the content of different webpages. At the moment I can’t get the content of zeit.de on iOS/iPadOS except by the tortuous Share to Print routine, which loses the URL and thus leads to even more steps of copying and pasting the URL between Safari and DTTG. No, shortcuts are of no help.
Just to prevent any misunderstandings: My complaint was not about DEVONthink’s clutter-free technology.
All clutter-free view modes I know of (in feed readers like Newsify or Reeder, and the reader modes of various browsers) hit a brick wall with some web pages at some point. There is no unified markup or structure for web pages that makes the actual content reliably distinguishable from other page elements such as advertisements. Those advertisements are, of course, the main reason why web page providers have no interest in putting any effort into making this distinction clear.
My guess is that the developers, apart from using inline styles, have set the HTML to be minified (that is, whitespace, line breaks and the like are removed). This is a way to reduce load times, and while common for JavaScript and CSS files, it’s less common for HTML. This can also happen when JavaScript is used to output the HTML, perhaps from a JS-based content management system or static site generator.
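For anyone unsure what minified HTML looks like, here is a deliberately naive sketch in Python. Real minifiers do far more (and are careful with elements like pre and textarea); `naive_minify` is a made-up name, and this is only my guess at the kind of output involved, not what faz.net actually runs.

```python
import re


def naive_minify(html):
    """Remove whitespace between tags; an illustration, not a real minifier."""
    # Drops line breaks and indentation that sit between ">" and "<".
    return re.sub(r">\s+<", "><", html.strip())


readable = """<div>
  <p>Hi</p>
</div>
"""
print(naive_minify(readable))
```

The result is a single line with no breaks between elements, which is harder for a human to read in the page source but makes no difference to the browser.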
Could you be more specific about which KM macro you’re recommending? There are lots of different ones in that thread, and none of them come from the handle @houthakker.
Whatever changed in the updates over the last period:
Clipping faz.net clutter-free as md works like a charm.
Since I don’t believe FAZ changed their website structure, great job!