I’ve started to archive some web pages instead of just saving the URL.
I would like to store them as PDFs, but I don’t want to save the background images, ads, etc. I do want to keep the images that are part of the article.
I have played a bit with DT PDF page capture and the best result for me has been when I use the Instapaper option.
Now I’m curious how others do this (I’m not interested in web archives). Is there some other option that is “better” in some sense? Are there other programs that can be used for the capture?
I also like the Markdown capture but I haven’t played enough with it (and I want to keep images on my machine … in case they are removed from the original site).
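For the Markdown route, one way to keep images on your own machine is to rewrite the remote image links to local paths and then download the files. What follows is only a minimal Python sketch, not a DEVONthink feature: the function names (`localize_images`, `download_images`) and the `assets` folder are assumptions made for illustration, and it handles only the plain `![alt](url)` image syntax.

```python
import hashlib
import os
import re
import urllib.request
from urllib.parse import urlparse

# Matches Markdown images with a remote URL: ![alt](https://... optional "title")
IMAGE_RE = re.compile(r'!\[([^\]]*)\]\((https?://[^\s)]+)([^)]*)\)')


def local_name(url):
    """Build a stable local filename: short hash of the URL plus its basename."""
    base = os.path.basename(urlparse(url).path) or "image"
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:8]
    return f"{digest}-{base}"


def localize_images(markdown, assets_dir="assets"):
    """Return (rewritten_markdown, {remote_url: local_path}); no network access."""
    mapping = {}

    def repl(match):
        alt, url, rest = match.groups()
        # Reuse the same local path if an image URL appears more than once.
        path = mapping.setdefault(url, f"{assets_dir}/{local_name(url)}")
        return f"![{alt}]({path}{rest})"

    return IMAGE_RE.sub(repl, markdown), mapping


def download_images(mapping):
    """Fetch every remote image to its local path (requires network access)."""
    for url, path in mapping.items():
        os.makedirs(os.path.dirname(path), exist_ok=True)
        urllib.request.urlretrieve(url, path)
```

A possible use would be to read the captured Markdown file, call `localize_images` on its text, write the rewritten text back, and then call `download_images` on the returned mapping. Keeping the rewrite separate from the download means the link changes can be checked before anything is fetched.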
I do this frequently. My method is to open a text editor (I like Bean, for a reason that will become apparent later), select everything I want to keep on the web page, and copy/paste it into Bean. I then go through the pasted text, which includes any images, and edit where necessary. That includes resizing the text and images; in Bean, an image can be resized by double-clicking it and moving a slider to the required size. Then I print > save as PDF to DEVONthink. For me this method works a treat.
The “Reader View” in Safari (View > Show Reader) can produce an acceptable result which can be printed to PDF and sent to DEVONthink using the “Save to DEVONthink Pro” option in the Print view. Reader View doesn’t always yield good results, and is not always available, nor does it always include images.
The Evernote Clipper is also pretty good – but then you have to clip to Evernote and then import to DEVONthink which is a pain if you’ve got a lot to do.
There are tools / bookmarklets available at heckyesmarkdown.com that are useful, too. For strange reasons, the author has made part of their site NSFW, although it’s really just a geeky toolbox.
I tend to capture news items in web archive form, but I highlight the text I would like with my mouse first and then use a keyboard shortcut to import into DTPO.
This tends to keep the file size down, particularly if I can select just the text and not the extraneous ads and other filler content from the site itself.
What keyboard shortcut are you using? Selecting text and capturing a web archive still yields a full web archive with the selected text as a Comment on the file.
This is doable using the Service ‘DEVONthink Pro Office: Capture Web Archive’, Command-%. I then often convert the Web Archive to a Formatted Note where I can edit/annotate the text.
Yes, exactly as Greg has indicated.
For instance, I just imported the text from this page: wto.org/english/news_e/news … ul15_e.htm
And the resulting import looks like:
File size is 7.2KB.