Modify HTML and CSS inside a web archive

I am looking for a way (preferably using AppleScript in DTnp) to modify the CSS and HTML inside a web archive.

I have a number of applications in mind, including things like changing CSS media queries to alter conversion of web archives to PDF. It can also be nice to do things as simple as dropping inline style attributes into HTML elements.

I need to be able to automate the process to run on multiple archived pages from the same source with common structure and class names.

Thoughts and/or pointers to example scripts? Thx in advance.

I tried that some time ago but never ended up using it.

Original webarchive:

Modified webarchive:

The script currently handles a single path, as I only tested how to do it and never needed a repeat loop.

You can safely test the script as it creates a new webarchive.

-- Manipulate CSS in webarchive


use AppleScript version "2.4"
use framework "Foundation"
use framework "WebKit"
use scripting additions

property thePath_text : "/Users/User/Desktop/Original.webarchive"

set thePath to (current application's NSString's stringWithString:thePath_text)

set theWebarchiveData to current application's NSData's dataWithContentsOfFile:thePath
set theWebarchivePlist to (current application's NSPropertyListSerialization's propertyListWithData:theWebarchiveData options:(current application's NSPropertyListMutableContainersAndLeaves) format:(missing value) |error|:(missing value))
set theWebResourceData to theWebarchivePlist's valueForKeyPath:"WebMainResource.WebResourceData"

set theString to current application's NSString's alloc()'s initWithData:(theWebResourceData) encoding:(current application's NSUTF8StringEncoding)

set theTextToReplace_escaped to "background-color: rgb\\(230, 198, 132\\)"
set theReplacementText_escaped to "background-color: rgb\\(230, 100, 132\\)"
set theText_new to my regexReplace(theString as string, theTextToReplace_escaped, theReplacementText_escaped)
set theString_new to (current application's NSString's stringWithString:theText_new)

set theString_new_Data to theString_new's dataUsingEncoding:(current application's NSUTF8StringEncoding)

theWebarchivePlist's setValue:theString_new_Data forKeyPath:"WebMainResource.WebResourceData"

set {theWebarchiveData_new, theError} to (current application's NSPropertyListSerialization's dataWithPropertyList:theWebarchivePlist format:(current application's NSPropertyListBinaryFormat_v1_0) options:0 |error|:(reference))

set theOutputPath to (((thePath's stringByDeletingPathExtension())'s stringByAppendingString:" (Modified CSS)")'s stringByAppendingPathExtension:(thePath's pathExtension()))

set {success, theError} to theWebarchiveData_new's writeToURL:(current application's |NSURL|'s fileURLWithPath:theOutputPath) options:0 |error|:(reference)

-------------------------------------------------------------- Handler Section ---------------------------------------------------------------

on regexReplace(theText, thePattern, theReplacement)
	try
		set theString to current application's NSString's stringWithString:theText
		-- Use the NSString's own length (UTF-16 units) so the range always matches the string being searched
		set newString to theString's stringByReplacingOccurrencesOfString:(thePattern) withString:(theReplacement) options:(current application's NSRegularExpressionSearch) range:{location:0, |length|:theString's |length|()}
		return newString as string
	on error error_message number error_number
		if the error_number is not -128 then display alert "Error: Handler \"regexReplace\"" message error_message as warning
		error number -128
	end try
end regexReplace

Here’s the webarchive I used for testing (2.6 KB)


First off, I’d forget about AppleScript in that context: you’ll need JavaScript anyway to work with the DOM.
Second, why modify the archive instead of changing the CSS itself? For that, simply appending the relevant media queries should be sufficient.
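
To illustrate the "append the media queries" idea, here is a minimal sketch that works on the page's HTML as a plain string (for example, the text decoded from WebMainResource.WebResourceData in the script above). The function name `appendCss`, the sample page, and the `.note` rule are hypothetical and only there for illustration.

```javascript
// Sketch: append extra CSS (e.g. media queries) to an HTML string.
// `appendCss` is a hypothetical helper, not part of any web archive API.
function appendCss(html, css) {
  const styleBlock = `<style>\n${css}\n</style>`;
  // Insert just before the first closing </head>, so the added rules
  // come after the rules already present in the head.
  const headClose = /<\/head\s*>/i;
  if (headClose.test(html)) {
    // A replacer function avoids special handling of "$" in the CSS text.
    return html.replace(headClose, (match) => `${styleBlock}\n${match}`);
  }
  // No </head> found: fall back to prepending the style block.
  return `${styleBlock}\n${html}`;
}

// Hypothetical sample input and rule.
const page =
  '<html><head><title>t</title></head><body><p class="note">hi</p></body></html>';
const printCss = "@media print { .note { display: none; } }";
const modified = appendCss(page, printCss);
```

Because the new rules are placed after the existing ones in the head, they generally take precedence over earlier rules of equal specificity, which is why appending is often enough.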

Horribile dictu. But even for that, JavaScript and the DOM methods are the way to go. Of course, CSS would be the better choice here: modifying it is much less trouble than adding inline style attributes.
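
For what it's worth, a rough sketch of the DOM route for inline styles could look like the following. It assumes you already have a DOM `Document` for the archived page (in a browser or web view); the helper name `setInlineStyles` and the `.sidebar` selector are hypothetical.

```javascript
// Sketch: set inline style declarations on every element matching a selector.
// `setInlineStyles` is a hypothetical helper; `doc` is any DOM Document.
function setInlineStyles(doc, selector, declarations) {
  const elements = doc.querySelectorAll(selector);
  for (const element of elements) {
    for (const [property, value] of Object.entries(declarations)) {
      // Property names use CSS syntax, e.g. "background-color".
      element.style.setProperty(property, value);
    }
  }
  return elements.length; // number of elements touched
}

// Usage in a browser or web view (selector is an assumption):
// setInlineStyles(document, ".sidebar", { display: "none" });
```

A single CSS rule targeting the same selector achieves the same effect with one edit, which is the reason CSS is usually less trouble than per-element attributes.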

Thanks much! That should give me the toehold into Cocoa that I need.
