Hi,
I’m currently exploring the most suitable way to store content from the forum in my personal knowledge database. So far, I’ve been experimenting with different formats. I was reasonably happy with using the webarchive format—until I learned (from other posts here) that it’s not fully self-contained, as dynamic content may still be downloaded from the internet.
PDFs aren’t fully satisfactory either, especially when code snippets require horizontal scrolling—the text often gets truncated because the PDF captures only the visible state of the page.
I’m leaning toward saving content as formatted notes, but I’ve encountered an issue that seems to be related to Discourse: when I clip a forum page as a formatted note, the background is set to black with white text. This doesn’t happen when clipping in formatted note format from other websites.
Could this be a Discourse-specific issue? Is there a way to fix it? Tweaking DEVONthink settings (under Editing → Format) doesn’t seem to resolve the problem…
N.B. I’m using DT4 beta1
Did you capture this in light or dark mode? Which browser did you use?
I did capture in light mode (the “Appearance” setting is set to “Auto”)
Safari 18.4
Sequoia 15.4
Confirmed in 15.5 beta2 as well …
… and Sonoma…
… and 3.9.9 on Ventura.
Clipping from Safari and the internal browser yields the same result.
For some reason clipping as a formatted note gets you the print page, even if you’re viewing the normal page… While still saving all of the CSS for the normal page. (That’s 1 MB of just CSS! It even includes styling for the chat window)
Looking at the HTML source of the live page, it seems the light and dark themes are loaded dynamically:
<head>
...
<link href="/stylesheets/color_definitions_light_2_3_851c5d3147238d1f6aaff24e498fea8e72138b47.css?__ws=discourse.devontechnologies.com" media="(prefers-color-scheme: light)" rel="stylesheet" class="light-scheme" data-scheme-id="2"/>
<link href="/stylesheets/color_definitions_dark_1_3_87fdfbc0c83dfea454c3d82549deb4d860d1236d.css?__ws=discourse.devontechnologies.com" media="(prefers-color-scheme: dark)" rel="stylesheet" class="dark-scheme" data-scheme-id="1"/>
...
</head>
But in a formatted note, the CSS from both files is saved to a single inline <style> element. Both are just a :root selector setting values for the same variables. The dark values appear after the light values in the inline style, so they take precedence.
(Clipping directly from the print page doesn’t change anything.)
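A minimal sketch of the cascade problem described above, and one possible manual fix, assuming you are willing to edit the note's HTML source by hand. The variable names and color values are placeholders for illustration, not copied from the Discourse stylesheets. The thread does not confirm that DEVONthink's viewer honors the media query.

```css
/* Light values: apply unconditionally as the baseline. */
:root {
  --primary: #222222;   /* placeholder text color */
  --secondary: #ffffff; /* placeholder background color */
}

/* Dark values: in the clipped note this rule appears without any condition,
   so it always overrides the rule above (same selector, later in the source).
   Wrapping it in the media query that the original <link> carried
   makes it apply only when the system is in dark mode. */
@media (prefers-color-scheme: dark) {
  :root {
    --primary: #dddddd;
    --secondary: #222222;
  }
}
```

Deleting the second rule entirely would also work if you only ever want the light appearance.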
FTR: I see the same behavior in Firefox, macOS 15.3.2. I’m using light mode, always.
I don’t want to hijack my own thread, so I’m happy to open a new one if more appropriate, but I’ve also noticed something else:
When I clip a Discourse page (like this one) in Markdown format, only one post from the thread gets saved—seemingly at random (in this case, it was the post from @troejgaard).
I tested this behavior in both Safari and Edge with the same result.
(I’m also not sure if we should split the thread, but I’ll post here for now)
That’s because Discourse relies heavily on JavaScript to load content dynamically, which means the page has to be rendered in a browser window.
Tools > Capture > Markdown Text, the Sorter, and the browser extension all download the page separately in the background before converting, so the dynamically loaded posts never appear.
Some alternatives:
- Use the SingleFile browser extension.
  - This is somewhat similar to clipping as a formatted note: a self-contained HTML file. But it gives greater control, usually results in much smaller files, and it preserves the look of the page better.
  - If you save the normal page, make sure to scroll and load the full thread before saving. (Again, Discourse loads everything dynamically.)
  - This also keeps the syntax highlighting for code blocks, which is nice.
- Override the print styling. You can do this with a custom style sheet in Safari, or by using a browser extension to inject CSS in the page.
- Use DEVONthink’s system services to clip from the print page. (The print page loads the full thread in a simplified, static view, which makes the services easier to use.)
Since PDFs are nice for annotating, I like that option. But like you, I’m not satisfied with the default print styling: the text is too small, there are no margins, code blocks get truncated, and I prefer a different font (Safari defaults to Times New Roman). I haven’t figured out a way to keep the syntax highlighting, but I’ve solved the rest.
I use the Userscripts extension for Safari with this:
/* ==UserStyle==
@name discourse Print
@description Adjust print styling of discourse.devontechnologies.com etc.
@match https://discourse.devontechnologies.com/*
@match https://talk.macpowerusers.com/*
@match https://meta.discourse.org/*
==/UserStyle== */
/* Note: Doesn't always load before the print dialog appears. Have to close print dialog and print again. */
@media print {
  :root {
    --base-font-size: 16px;
  }
  body.crawler, body.crawler :is(h1,h2,h3,h4,h5,h6), body > noscript {
    /* !important is necessary to override. Main stylesheet is set to `serif`. */
    font-family: Georgia !important;
  }
  .wrap {
    max-width: 100%;
    padding: 0;
  }
  #topic-title h1 {
    width: 100%;
  }
  .topic-body {
    width: 100%;
  }
  body.crawler div#main-outlet, body > noscript div#main-outlet {
    padding: 0;
  }
  #main-outlet {
    padding-top: 0;
  }
  /* Let code blocks grow to full height instead of being cut off. */
  pre {
    max-height: none !important;
  }
  /* Wrap long code lines so nothing is truncated at the page edge. */
  pre > code {
    text-wrap: wrap;
    overflow-wrap: break-word;
    font-size: .8em !important;
    max-height: none !important;
    tab-size:;
  }
  .badge-category {
    display: inline !important;
  }
  @page {
    margin-top: 1.5cm;
    margin-bottom: 1.5cm;
    margin-right: 1.75cm;
    margin-left: 1.75cm;
  }
}
Thank you very much for pointing me to SingleFile!
It seems to do a great job saving Discourse content as a self-contained .html file, including images and preserving the layout of the page. The file size is also about half that of a formatted note. 
While it’s not directly integrated with DEVONthink, importing it is just an extra step, so I think I’ll rely on this method—at least for now—to collect snapshots of relevant forum posts.
For other websites, I’ll continue experimenting with the DT clipper.
I checked with a formatted note for this very thread and the file produced by SingleFile. Same size (1.2 MB). And it’s very improbable that SingleFile produces a self-contained HTML that is much smaller – the size is mostly determined by the img elements, which use data URIs. And those take up a lot of space.
SingleFile has a lot of settings to filter, block and/or compress various page resources. You can create different profiles and set them to automatically apply depending on the URL. And if you use the “Annotate” feature before saving, you get a GUI where you can delete parts of the page, which can potentially reduce the size a lot. (You can also select part of the page and choose “save selection” from the context menu, which can be quicker than saving the whole page and manually deleting parts.)
If nothing else, it preserves the page more faithfully than a formatted note.
Quite possible – at least in the case relevant here. White background.
I was only referring to @weirded’s statement, that it renders a smaller HTML. Which it did not in my one-document test without any particular settings.
And I still find it difficult to imagine that an HTML document is at the same time complete, self-contained and smaller than a formatted note capturing the same original. They can apparently remove alternative images, like those needed for mobile devices, resulting in a smaller file. But is that still the same document?
And there’s the option to “remove hidden elements”. Which is quite ominous – does it remove all elements that currently have a hidden attribute? Or where display is set to none? Or those that are currently positioned outside the viewport? The first two approaches might result in the document displaying differently on mobile devices, for example.
I’m not saying that the tool is not good or even ingenious. But one should be aware of the consequences settings can have. If an HTML document is designed to be displayed differently on different devices, removing seemingly unneeded information might result in it not working as intended anymore.
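The concern can be made concrete with a small sketch. The class names are hypothetical, and this is not a claim about what SingleFile actually removes; it only shows why "currently hidden" depends on the viewport the page was captured in.

```css
/* Hypothetical page whose HTML contains two navigation elements,
   .nav-desktop and .nav-mobile. Only one is visible at a time. */
.nav-mobile {
  display: none;    /* hidden on wide viewports */
}

@media (max-width: 600px) {
  .nav-desktop {
    display: none;  /* hidden on narrow viewports */
  }
  .nav-mobile {
    display: block; /* shown only on narrow viewports */
  }
}
```

If a capture made in a desktop browser dropped every element whose computed display is none, .nav-mobile would be missing from the saved file, and the narrow layout would be left without navigation.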
It actually includes both the light and dark theme. It just saves them in a way that works. Instead of two :root selectors (and everything else) in the same style element, it does this:
Saving this whole page with my default configuration gives me a file of 910 kB (vs. a 1.3 MB formatted note). I don’t remember exactly what I changed from the factory settings.
Yeah, I won’t claim to understand what all of the settings do. There are some explanations in the built-in help, but for the full details you probably need to check the GitHub repo. For example:
Option: remove hidden elements
Check this option to remove all hidden elements. Checking this option can help to reduce the size of the file without altering the document. It may also increase the CPU consumption and the time needed to save a page.
It is recommended to check this option
I would guess that doesn’t fully satisfy you.
In my experience it generally displays fine on mobile, and better than a formatted note.
True, if you want a pristine copy of the original page, I don’t see how you can make it smaller. There are tradeoffs. But for a working visual representation with some of the cruft removed, I think it’s a good option.
I did try again with this same thread and I got the opposite result, i.e. formatted note smaller than the file produced by SingleFile (I didn’t modify any of the default options…). 
Anyway, I’ll stick to SingleFile for Discourse captures for the moment - at least until the background issue of the formatted note is resolved.