Clutter-Free Capture Fails on WeChat Articles (mp.weixin.qq.com)

igrso · April 16, 2025, 11:48am

Hello DEVONtechnologies team,

First of all, thank you for your continued work on DEVONthink — I’ve greatly enjoyed exploring both version 3 and version 4. The attention to detail in the UI, automation tools, and overall ecosystem makes it one of the most powerful tools I’ve used for research and long-term knowledge management.

Recently, I encountered a persistent issue when trying to capture articles from WeChat’s official public platform (mp.weixin.qq.com) using the Clutter-Free mode. Here’s an example URL that demonstrates the issue:

https://mp.weixin.qq.com/s/4gG0wmpjHS94fXEmAbfIXQ

Problem Description:

When I attempt to download this page using Clutter-Free mode (in both DT3 and DT4), the resulting note is not the actual article, but rather a message page that says:

“环境异常 当前环境异常，完成验证后即可继续访问。”
(“Abnormal environment. The current environment is abnormal; complete the verification to continue.”)

In contrast, when I choose “Save as Formatted Note”, the full article is captured correctly — indicating that the page can be accessed after full rendering. So it seems the issue lies in how Clutter-Free processes and extracts the page content.

Technical Analysis:

WeChat articles are heavily dependent on JavaScript to render content dynamically, especially the main body within the #js_content element.

The Clutter-Free mode appears to rely on static HTML + server-side parsing (e.g., Mercury or a similar engine), which does not support client-side JS rendering.

Additionally, WeChat has anti-bot mechanisms that return verification pages when certain User-Agents or headers are missing.
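
To make the first point concrete, here is a minimal Python sketch of how one could check whether a saved HTML snapshot actually contains article text inside #js_content. It uses only the standard library. The names ElementTextExtractor and extract_by_id are my own for illustration and say nothing about how DEVONthink works internally. It also assumes reasonably well-formed markup, so unclosed tags inside the article would throw off the depth counting.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ElementTextExtractor(HTMLParser):
    """Collect the text inside the first element that has the given id."""

    def __init__(self, element_id):
        super().__init__(convert_charrefs=True)
        self.element_id = element_id
        self.depth = 0        # nesting depth inside the target element
        self.found = False    # True once the target element has been seen
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.element_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_by_id(html, element_id="js_content"):
    """Return the normalized text of the element, or None if absent or empty."""
    parser = ElementTextExtractor(element_id)
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())
    return text or None


article = ('<div id="page"><div id="js_content">'
           '<p>Hello <b>world</b><br>again</p></div><p>footer</p></div>')
verification = '<div class="weui-msg"><h2>环境异常</h2></div>'

print(extract_by_id(article))       # text found inside #js_content
print(extract_by_id(verification))  # None: no #js_content in this snapshot
```

If extract_by_id returns None for the HTML that the Clutter-Free request received, the article body was never in that response, which would be consistent with the verification page shown above.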

Suggestion for Improvement:

Since this issue affects many research users working with Chinese media sources, I’d like to suggest the following possible improvements:

1. Add domain-specific fallbacks
   Allow DEVONthink to switch to a full WebKit-rendered view before applying clutter-free parsing, especially for domains like mp.weixin.qq.com.

2. Support custom extraction rules
   Offer a way for users to define content selectors (e.g., #js_content) for sites where automatic detection fails. This could work similarly to Readability or Instapaper’s custom site rules.

3. Enable a graceful fallback mechanism
   When Clutter-Free fails or returns a generic page, allow automatic fallback to full-page or formatted note capture.
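
As a rough illustration of the rule lookup and the fallback idea, here is a small Python sketch. Everything in it is hypothetical: SITE_RULES, BLOCK_MARKERS, the 200-character threshold, and the capture functions passed in are placeholders I made up, not DEVONthink settings or APIs. The marker phrases are taken from the verification page quoted above.

```python
from urllib.parse import urlparse

# Hypothetical per-domain rules: host name -> id of the element holding the article.
SITE_RULES = {"mp.weixin.qq.com": "js_content"}

# Phrases from the verification page quoted in this post.
BLOCK_MARKERS = ("环境异常", "完成验证后即可继续访问")


def rule_for(url, rules=SITE_RULES):
    """Return the content element id configured for the URL's host, if any."""
    host = urlparse(url).hostname or ""
    return rules.get(host)


def looks_blocked(text, min_length=200):
    """Heuristic: empty output, or a short capture containing a marker phrase."""
    if not text or not text.strip():
        return True
    return len(text) < min_length and any(m in text for m in BLOCK_MARKERS)


def capture_with_fallback(url, strategies):
    """Try each (name, capture_fn) pair in order; return the first usable result."""
    for name, capture in strategies:
        try:
            text = capture(url)
        except Exception:
            # A failed strategy just means "try the next one" in this sketch.
            continue
        if not looks_blocked(text):
            return name, text
    return None, None


# Stand-ins for real capture modes, for demonstration only.
def fake_clutter_free(url):
    return "环境异常 当前环境异常，完成验证后即可继续访问。"


def fake_formatted_note(url):
    return "Full article text " * 20


url = "https://mp.weixin.qq.com/s/4gG0wmpjHS94fXEmAbfIXQ"
name, text = capture_with_fallback(
    url, [("clutter_free", fake_clutter_free), ("formatted_note", fake_formatted_note)]
)
print(name, rule_for(url))
```

The length check is there so that a real article that merely mentions one of the marker phrases is not mistaken for the verification page.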

Closing Thought:

I fully understand that capturing content from dynamic and protected platforms is technically challenging. Still, WeChat is a major platform in Chinese academic and media ecosystems, and improving capture accuracy here would be extremely valuable for many users.

Thanks again for your excellent work — and for continuing to listen to your community. Looking forward to seeing how DEVONthink continues to evolve.

eboehnisch · April 16, 2025, 2:28pm

Thank you, noted for later examination.

igrso · April 17, 2025, 1:00am

Appreciate it. Hoping for a fix in the near future! It’s a small thing, but it affects a key part of how I use the app daily.