Web archives just aren’t practical as an offline solution.
WebArchives are still practical for many sites, just not all of them.
PDF is too rigid, especially since being able to edit the captured page is key (for example, to remove useless footer stuff).
PDFs are still the best option for static representations of a page. If they’re captured as Paginated PDFs, you may be able to excise pages at the end of the document.
I guess, but then you get all kinds of crap that can’t be removed. And the pagination is static, too. What if you want to convert from Letter to A4 at some later point?
Have you tried the SingleFile extension? You can use it to remove practically any elements you like, then save the page as HTML, which is static but also infinitely editable, and which can be converted to practically any other file format at a later date.
I've found that when converting WebArchives to Paginated PDFs, documents containing mathematical formulas with exponents are rendered incorrectly: the exponent comes down off its perch onto the baseline. Any thoughts on correcting this?
First, sorry for the long delay in responding to this thread; things were busy IRL.
Unfortunately, I have come across many pages where capturing as PDF cuts text in the middle of lines, e.g. https://www.keensoft.es/en/alfresco-devcon-2019/ . At the moment, capturing web content is a bit of a mixed bag with DEVONthink – I would welcome a new, independent approach beyond PDF and WebArchives. However, I also understand how tricky it is to even decide on a format in this context. Let’s hope some alternative will present itself.
I simply love WebArchives!!!
I can't imagine living without them.
What I noticed is that DTTG (probably DT too) seems to access the remote site!
So most of the time when I do something with WebArchives, I disable Internet access on the iPad first and then handle the WebArchives; otherwise, DTTG apparently tries to reach the original website.
This can be annoying, of course.
So a way to disable live Internet access for WebArchives would be GREAT.
Alternatively, the clipper can capture web pages with SingleFile. It produces a self-contained HTML file with all the images, styles, and scripts embedded. The files can be viewed in a regular browser, so they don't require anything special beyond WebKit.
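To make "self-contained" concrete: one common way a single HTML file can carry its own resources is to replace each remote URL with a base64 `data:` URI (RFC 2397) holding the resource's bytes. The Python sketch below only illustrates that idea under simplifying assumptions (plain string replacement, resource bytes already downloaded); it is not SingleFile's actual implementation, and `to_data_uri` and `inline_resource` are hypothetical helper names.

```python
import base64
import mimetypes

def to_data_uri(data: bytes, filename: str) -> str:
    """Encode raw resource bytes as a data: URI, guessing the MIME type from the name."""
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"  # fallback when the extension is unknown
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"

def inline_resource(html: str, url: str, data: bytes) -> str:
    """Replace every occurrence of `url` in the HTML with an embedded data: URI.

    Naive string replacement, for illustration only; a real tool would parse
    the HTML and CSS instead.
    """
    return html.replace(url, to_data_uri(data, url))

# Hypothetical example: the first four bytes of a PNG stand in for a downloaded image.
page = '<img src="https://example.com/logo.png">'
print(inline_resource(page, "https://example.com/logo.png", b"\x89PNG"))
```

Once inlined, the page no longer has to contact the original site to show that image, which is the property that matters for offline viewing. Stylesheets and scripts would need the same treatment, and `url(...)` references inside CSS complicate things further.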
Thanks a lot for this reference! SingleFile looks like a very interesting project; unfortunately, due to “real life” interfering, I won’t have the time to test it for a while. Perhaps someone else is also interested and could look into DEVONthink / Safari integration of this tool, possibly via a script?
Thank you for the suggestions, @pete31. I'm just not having much luck capturing the entire document. When I print to PDF, the file is subject to the whims of a paper page size, which turns out to be too narrow to contain the code samples.
True, @BLUEFROG. I was able to get a much higher percentage of the code parts to render by switching from Portrait to Landscape, but it did not get everything. Just for kicks, I created a page that was 50x50 inches, and it appears to have captured all the information: monstrous page pdf.pdf (377.5 KB)
It is obviously too wide, and having to tweak and proofread kind of defeats the purpose of quick and easy clipping.
Umm, I have now checked the records I captured more carefully, and they are also missing parts (PDF) or do not wrap (WebArchive). So that was a bad suggestion (I was in a hurry …)