Some pages fail to clip

I am starting to run into some pages I can’t seem to clip from Safari or Chrome. This page, for example: https://www.regulations.gov/document?D=APHIS-2011-0003-0001

I would like to capture it as a PDF. Paginated or single page, clutter-free or not, none of the options captures this page. All of them lead to an empty page or a page that says: “Your web browser must have JavaScript enabled in order for Regulations.gov to display correctly.” (JavaScript is enabled, by the way.)

Capturing it as a web archive works only partially; only some of the content is captured. Only capturing it as an HTML page appears to work.

Using DT 3.0.4 and macOS 10.13.6 or 10.15.3

One workaround might be to print the page to DEVONthink.

For stubborn and very badly designed web pages (like the one you’ve pasted here), I first clip the page as a formatted note with clutter-free enabled. Once it is captured in DT, I go back to the page itself, select the parts I want, copy them, and paste them into the captured formatted note in DT, normally overwriting everything.

Then I format it the way I like and, if I want a “static” copy, convert it to PDF.

BTW,

Regulations.gov - Proposed Rule Document.pdf (133.5 KB)

Ai, that looks MUCH better than the original. Nice!

Well, it does have .gov at the end, so “stubborn”, “bad”, and a few more such words are built in and assumed :)

I hesitate to go through as many steps as you outline, but I must admit that the result you provided is very good. Thanks!

EDIT:
Going through your steps, the result is invariably the same: I get a page with “Your web browser must have JavaScript enabled in order for regulations.gov to display correctly.”
Both Safari and DT3 have JavaScript enabled.

Oh, I never knew that existed (!) Goes to show the depth of DT3’s functionality (or my lack of exploration). Anyhow, that method works and gets me by.

Still, it makes me wonder: what is the essential nature of the failure here? Is it the website’s interaction with its own context/data loading mechanism that DT3 cannot overcome, or is it a DT3 issue? Or both? Even if the website is built a bit crappily, there are quite a few like it, and it would be nice to see how DT3 could overcome this…

It’s probably caused by the dynamic nature (JavaScript) and the slow initial loading of the webpage. This probably causes a timeout, and/or the background renderer assumes that the page has finished loading and creates the result too early.
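
To illustrate the “result created too early” idea, here is a minimal Python sketch of the general technique of polling a page until its content stops changing before capturing it. It is not how DEVONthink’s renderer works internally; `wait_until_stable`, `is_placeholder`, and the `snapshot` callable are hypothetical names made up for this example, and the notice text is simply taken from the message quoted above.

```python
import time
from typing import Callable

# Text of the notice shown before the page's scripts have filled in the content.
JS_NOTICE = "must have JavaScript enabled"


def is_placeholder(text: str) -> bool:
    """Return True for an empty page or one that only shows the JavaScript notice."""
    return not text.strip() or JS_NOTICE in text


def wait_until_stable(
    snapshot: Callable[[], str],
    interval: float = 0.5,
    timeout: float = 30.0,
    required_repeats: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll snapshot() until it returns the same real content several times in a row.

    snapshot is a hypothetical callable that returns the page's current text.
    Raises TimeoutError if the content never settles within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    last = None
    streak = 0  # consecutive identical, non-placeholder snapshots
    while time.monotonic() < deadline:
        current = snapshot()
        if is_placeholder(current):
            streak = 0
        elif current == last:
            streak += 1
        else:
            streak = 1
        last = current
        if streak >= required_repeats:
            return current
        sleep(interval)
    raise TimeoutError("page content did not settle before the timeout")


if __name__ == "__main__":
    # Simulated load sequence: blank page, JavaScript notice, partial content, full content.
    frames = iter([
        "",
        "Your web browser must have JavaScript enabled",
        "partial",
        "full document text",
        "full document text",
        "full document text",
    ])
    print(wait_until_stable(lambda: next(frames), sleep=lambda _: None))
```

A capture taken on the first or second poll in this simulation would contain exactly the blank page or the JavaScript notice described in this thread; waiting for a few identical snapshots trades capture speed for a better chance of getting the finished page.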

Thanks for that. Any helpful suggestions to overcome the “JavaScript must be enabled” problem?

It is the same in Spain. You must have a specific Java version and, yes, a specific Internet Explorer (yes, good old IE).

Once you are used to it, it is not so slow. Shift-C does the capture, then you select the parts of the web page, go to DT, select all the text in the captured page, and paste. I have a shortcut to justify text (Control+Option+Cmd+J) and another to convert to PDF (Control+Option+Cmd+P). The steps are then:

  1. Select the parts I want to capture
  2. Press Shift-C
  3. Press enter
  4. Press Cmd+C
  5. Go to captured Formatted Note
  6. Cmd+A
  7. Cmd+V
  8. Cmd+A
  9. Control+Option+Cmd+J
  10. Control+Option+Cmd+P
  11. Delete Formatted Note.

It seems more complex than it really is, and I guess most parts could be scripted.


Only the already suggested one to print the page to DEVONthink.