@pete31,
I was too. And I think I have.
As we’ve said all along, I don’t think there’s anything wrong with your script. I think the process is just failing on pages which are heavily dynamically created, or which have other content/elements that it can’t fetch.
The counterargument to that remains the fact that I can reliably Clip a webarchive in DT 100% of the time.
So… this has been the process today:
1 I ran the ‘Check Links’ script on my ‘Computer’ (grey) top level Group, which has 459 webarchives:
2 The script found, logged and put into its ‘Invalid URLs’ Group 52 webarchives (just over 10%… I have kept the (DT) log if it’ll help) that needed attention.
I worked through them one by one - inspecting, confirming and correcting.
3 The first such invalid URL was for Spamhaus, which displays like this in the View/Edit pane with the outdated URL http://www.spamhaus.org/rosko/index.lasso:
The correct URL, of course is: https://www.spamhaus.org
and it should look like this:
(Re-)running your Update webarchive script does not correct it.
4 Another example is this page: `https://www.snoize.com` (MIDI monitoring software called Snoize)
Again, the DT ‘Check Links’ script finds that the URL in the webarchive is incorrect. I correct it in the inspector.
But both before and after running the Update webarchive script the page is incorrectly rendered in the View/Edit pane:
5 One last example is a forum where somehow the webarchive insists on hanging on to the arguments/parameters for one particular post, rather than displaying the home page, which is all that the corrected URL points to.
https://www.sibeliusforum.com
should look like this:
Again, I correct the URL in the DT inspector and run the script; but no matter how many times I rerun it, the page always renders wildly inaccurately in the View/Edit pane:
It does seem to be hanging on to - and trying to render - parameters for a particular thread by ‘keyrkenat’, doesn’t it?
Now - I did notice that if I right/Ctrl-click in the View/Edit pane and choose Open Page in Safari, I go to a page which renders correctly as well:
Today I have not touched Update Captured Archive once.
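For reference, the difference in step 5 between the URL I expect and the one the webarchive seems to hang on to can be sketched in Python. This is only an illustration: the thread URL below is hypothetical (it is not the real one from my database), and `site_root` and `has_extra_parts` are helper names made up for the sketch, not anything in DT or in the script.

```python
from urllib.parse import urlsplit, urlunsplit


def site_root(url: str) -> str:
    """Return only scheme and host, dropping path, query and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, "", "", ""))


def has_extra_parts(url: str) -> bool:
    """True if the URL carries a path, query string or fragment beyond the home page."""
    parts = urlsplit(url)
    return bool(parts.path.strip("/") or parts.query or parts.fragment)


# Hypothetical example of a URL that still points at one particular post.
stale = "https://www.sibeliusforum.com/viewtopic.php?t=123#p1"

print(has_extra_parts(stale))   # the stale URL carries thread parameters
print(site_root(stale))         # the home page I actually want
```

A check like this would only flag webarchives whose stored URL still carries thread parameters; it does nothing to fix them.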
Happy to provide further information if it helps, @pete31!
I still feel that I may not be approaching the whole question of updating/correcting/editing webarchives properly or as DT expects. I’d love to know, please.
Particularly given the disconnect which has been pointed out between the notional URL visible in the View/Edit pane and the URL field in the Inspector.
Am I missing the preferred/best practice way generally to correct URLs in webarchives? @BLUEFROG, @cgrunenberg, please?
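In case it helps with the diagnosis: as far as I understand it, a .webarchive file is a property list whose `WebMainResource` entry carries its own `WebResourceURL`, which may be separate from whatever DT shows in the Inspector. Below is a minimal Python sketch to read that embedded URL, assuming that layout. The sample archive is built in memory purely for illustration, and `main_resource_url` is a hypothetical helper name.

```python
import plistlib


def main_resource_url(archive_bytes: bytes) -> str:
    """Return the URL stored for the main resource inside a .webarchive."""
    # plistlib.loads detects binary vs XML property lists automatically.
    archive = plistlib.loads(archive_bytes)
    return archive["WebMainResource"]["WebResourceURL"]


# Illustrative stand-in for a real file: a tiny archive built in memory.
sample = plistlib.dumps(
    {
        "WebMainResource": {
            "WebResourceURL": "http://www.spamhaus.org/rosko/index.lasso",
            "WebResourceMIMEType": "text/html",
            "WebResourceData": b"<html></html>",
        }
    },
    fmt=plistlib.FMT_BINARY,
)

print(main_resource_url(sample))
```

To try it on a real file, pass the bytes of the .webarchive instead of `sample`. If the embedded URL still shows the old address after the Inspector has been corrected, that might explain the disconnect, though I can't confirm that this is how DT handles it.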