I clip a lot of news reports from the web, and I usually use web archive with the clutter-free layout option. I see elsewhere that someone has asked about markdownifier (which I didn’t realise was the Instapaper replacement) replacing the original URL. But it also removes any dateline from the main body of the news report. As I need to reference these reports properly in my work, I have to wend my way back to the original piece to get full attribution (byline, dateline). As I like to switch off the internet for hours at a time to get work done, this is very frustrating. Will this be fixed?
I switched from RTF-ing because I was getting loads of blank documents within my DT database. If anyone can suggest how best I can clip cleanly and efficiently with all the right data included, I’d be grateful.
Attaching screenshot showing the oversimplifiedness, i.e. no byline, no dateline.
Mac OS 10.12.4
Safari 10.1
DT up to date
There is no standard for authorship used by web designers. If you look at a byline in express.co.uk and one in, say, Breitbart, you would see they are referenced in different ways. This makes capturing this kind of data much more difficult. That doesn’t mean it couldn’t be done for these sites, but it wouldn’t necessarily work for many other sites.
PS: If it wasn’t for the monetization of these supposed “news sites”, we could just capture pages intact without all the noise, but there’s money to be made with all those link-bait adverts, right? Ugh!
Thanks for your reply, Jim, but I’ve just checked dozens of clippings taken from loads of different sites (sorry, no Breitbart). They are all identical: I’m attaching three, from the Guardian, NYT and Globe and Mail. But I could attach dozens more screenshots and they’d look alike. So I think the problem is not in the source, but in the markdownifier.
That proves what I was saying. There isn’t a unified standard for the byline format or for the way it’s presented in the underlying code. We can certainly look at this, but it’s not necessarily an easy fix.
Thanks, Jim. Is there another option for me then? Should I be saving in a different format to what I currently favour, which is web archive with clutter-free? Ideally, I don’t want to need an internet connection to read clipped files in my database. Nor do I want enormous files. I’ve used RTF in the past, but the files are usually either blank or corrupted and I can’t risk using that format.
Why not PDF? I just clipped a PDF from the Guardian and it’s only a 262KB file.