Capturing and Storing Webpages

kappabear · November 19, 2019, 9:08pm

Over the years, I’ve used Pocket and Apple’s Reading List to keep web articles that I’ve encountered for posterity. To my knowledge, there’s nothing wrong with Pocket, though when I just re-installed the extension for Safari, I was warned that it “can read sensitive information from webpages, including passwords, phone numbers, and credit cards on all webpages.” Seeing this gave me pause, and I realized that they’re undoubtedly keeping track of my browsing history, which is something that I’m not enamored with.

To that end, I’m sure that DT3 can easily do exactly what Pocket & Reading List can do. I’m no forerunner, so I’m sure that many others are already doing so successfully, and I’m now curious as to exactly how you all are doing it. Are you simply storing a bookmark to the site, downloading the web archive (which could potentially change), storing a PDF of the page, or something else that I haven’t thought of or mentioned?

And by the way, my browser of choice is Safari, so I’ll need a solution for that, unless you can convince me to switch to Firefox.

Thanks!

cgrunenberg · November 20, 2019, 8:10am

That’s exactly the reason why I don’t use any browser extension - you have to trust all of them in the end.

This depends on the website, actually. If only the online link is important, then a bookmark is fine. For news, I take rich notes, but any other format like Markdown, formatted notes or PDF is also okay. I wouldn’t recommend web archives, as they’re limited to Apple’s platforms, not always compatible with earlier versions of macOS/iOS due to their poor architecture, and might access online resources in the case of dynamic websites (and most websites are dynamic today).
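
To illustrate the portability concern with web archives: a .webarchive file is, as far as I know, a property list, so its main HTML can usually be pulled out into a plain file that opens anywhere. The following is only a minimal sketch. The key names (`WebMainResource`, `WebResourceData`, `WebResourceURL`, `WebResourceTextEncodingName`) are the ones Safari-written archives commonly use and should be treated as assumptions, and `extract_main_html` is a hypothetical helper name, not part of DEVONthink or Safari.

```python
import plistlib


def extract_main_html(webarchive_bytes):
    """Return (url, html) for the main resource of a .webarchive file.

    Assumes the archive is a property list with a "WebMainResource"
    dictionary, which is the layout Safari commonly writes.
    """
    archive = plistlib.loads(webarchive_bytes)
    main = archive["WebMainResource"]

    # The encoding key is optional; fall back to UTF-8 when it is missing
    # or names an encoding Python does not know.
    encoding = main.get("WebResourceTextEncodingName") or "utf-8"
    try:
        html = main["WebResourceData"].decode(encoding, errors="replace")
    except LookupError:
        html = main["WebResourceData"].decode("utf-8", errors="replace")

    return main.get("WebResourceURL", ""), html
```

Typical use would be to read the file with `open(path, "rb").read()`, pass the bytes in, and write the returned HTML to a `.html` file. Images and stylesheets stored as subresources are not handled here, so the result may still reach out to online resources when opened.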

kappabear · November 20, 2019, 4:43pm

Criss,

When and how do you determine which format to save your web document in? When do you choose PDF vs. Markdown vs. Formatted Notes vs. Rich Notes? Is it a trial and error thing, to see which format does the best job with a particular site, or something else?

cgrunenberg · November 21, 2019, 8:10am

I use only bookmarks & rich text, as I like to edit the text afterwards (WYSIWYG), but others might prefer a different format. It’s usually not necessary to choose the format page by page.

kappabear · November 21, 2019, 10:37pm

Criss,

How would you go about capturing all of this page, with or without the ads? I’m currently unable to capture the entire page including all of the images and text.

cgrunenberg · November 22, 2019, 8:17am

Hard to tell, I don’t have an account for this website. Probably I would take a rich note (via services, not via the clipper).

RobH · November 22, 2019, 4:07pm

Is the question about the preference of file type to use, or how to capture everything on the page?

kappabear · November 22, 2019, 4:47pm

@RobH, the question is about which file format to use, and when. My general preference is a single page PDF.

kappabear · November 22, 2019, 5:05pm

Thanks for the tip regarding capturing as a Service vs. via the Clipper. I’d forgotten that was an option, and since you’ve mentioned it, I’ve played around with it and it generally does a much better job in capturing the page.

I intentionally sent you a New York Times article because they are often difficult to capture in a style that I like, because they’re riddled with ads of all sizes. This one in particular not only has ads, but also has a lot of pictures, and capturing the page via Clipping omits the pictures. Capturing the page a single paragraph or picture at a time via Services -> Append Rich Notes works very well, but is tedious. Using Apple Reader View in Safari helps a lot.

kappabear · December 1, 2019, 6:42pm

Here’s a page that I’m having difficulty capturing in a clean, easily readable manner:

How would you go about easily capturing this page, without having to do a whole lot of manual steps? (Format doesn’t matter to me)

rkaplan · December 1, 2019, 10:14pm

That page works fine as an HTML page - except the ads are distracting in both the original and the HTML page format. Is that your concern? The content itself displays fine.

kappabear · December 1, 2019, 10:28pm

For whatever reason, I generally don’t save pages as HTML, usually preferring PDF or Rich Text. Perhaps it’s because they can’t be saved Clutter Free, as you mentioned.

rkaplan · December 2, 2019, 12:41am

Well, you said format does not matter.

Either HTML or a bookmark works fine to reproduce the original page.

If the concern is not whether this page “works” to save but rather whether it can be saved clutter-free, I think that is a different issue. I doubt any software can consistently and reliably remove ads and leave all else intact on any website.
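
To show why clutter removal tends to be hit-or-miss, here is a minimal sketch of the kind of heuristic a "clutter-free" mode might apply: keep paragraph text and skip anything inside elements that look like navigation or ads. Everything in it is an assumption for illustration: the tag list, the class/id hint words, and the names `ParagraphExtractor` and `extract_paragraphs` are hypothetical, and this is not how DEVONthink or Safari Reader actually works.

```python
import re
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Assumed "clutter" containers; real sites vary widely.
BLOCKED_TAGS = {"script", "style", "nav", "aside", "footer", "iframe"}
# Assumed hint words, matched as whole tokens in class/id values.
AD_HINT = re.compile(
    r"(^|[\s_-])(ad|ads|advert|advertisement|sponsor|promo)([\s_-]|$)", re.I)


class ParagraphExtractor(HTMLParser):
    """Collect text from <p> elements that are not inside blocked elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []        # (tag, blocked) for each open element
        self.current = []      # text pieces of the paragraph being read
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attr_map = dict(attrs)
        label = " ".join(filter(None, (attr_map.get("class"),
                                       attr_map.get("id"))))
        inherited = bool(self.stack and self.stack[-1][1])
        blocked = (inherited or tag in BLOCKED_TAGS
                   or bool(AD_HINT.search(label)))
        self.stack.append((tag, blocked))
        if tag == "p" and not blocked:
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Close the nearest open element with this name; ignore stray tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break
        else:
            return
        if tag == "p":
            text = " ".join("".join(self.current).split())
            if text:
                self.paragraphs.append(text)
            self.current = []

    def handle_data(self, data):
        if not self.stack or self.stack[-1][1]:
            return
        if any(name == "p" for name, _ in self.stack):
            self.current.append(data)


def extract_paragraphs(html):
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

A site that labels its ad container with an unexpected class name, or that places article text inside an `aside`, would defeat this sketch immediately, which is the inconsistency described above. Images are dropped entirely here, so it does not address the missing-pictures problem either.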