I exported my blog articles from Squarespace as an XML file. I want to convert all of the articles in the XML file to Markdown in DEVONthink. My understanding is that I’ll need a script to accomplish this. If so, can someone explain how to find such a script and how to install it in DEVONthink? I know nothing about scripts. Or is there an alternative way to convert an XML file that contains hundreds of articles to Markdown?
This is probably the best solution because it’s readily available. Parsing XML in JXA/AppleScript-ObjC looks like a pain, and that’s only half of the task…
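For a sense of what the parsing half involves, here is a minimal Python sketch (not the script discussed in this thread). It assumes the export uses the WordPress-style RSS layout, with each post in an <item> element and its HTML body in content:encoded under the RSS 1.0 content namespace; check your own file before relying on that. Converting the extracted HTML to Markdown is the other half and is not shown.

```python
import xml.etree.ElementTree as ET

# Assumption: post bodies live in <content:encoded>, using the RSS 1.0 content namespace.
NS = {"content": "http://purl.org/rss/1.0/modules/content/"}


def extract_posts(xml_data: bytes) -> list[tuple[str, str]]:
    """Return (title, html_body) pairs for every <item> in the export."""
    root = ET.fromstring(xml_data)
    posts = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        # findtext falls back to the default when the element is missing.
        body = item.findtext("content:encoded", default="", namespaces=NS)
        posts.append((title, body))
    return posts
```

For example, extract_posts(open("squarespace.xml", "rb").read()) gives one pair per item. A real export may also contain pages or attachments as items, so some filtering is likely needed.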
I agree with the gentle-persons above about using the zcaceres script, and note that the script is NOT installed in DEVONthink. Follow the instructions exactly as provided by the author.
Another approach, of course, is to scrape each blog post one by one while it is displayed in Squarespace, using DEVONthink’s Sorter, DEVONthink’s services, or third-party apps such as MarkDownload.
If the Python script others have discussed works for you, that’s a great solution.
I have done a good bit of XML transformation work, though, and sometimes it is not quite as easy as it first seems. In particular, if your XML data is nested, many off-the-shelf scripts fail.
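One common way nesting trips up a simple script is reading only an element’s direct text. A short illustration with Python’s standard ElementTree module (the sample markup is made up):

```python
import xml.etree.ElementTree as ET


def shallow_text(elem: ET.Element) -> str:
    # .text is only the text before the first child element.
    return elem.text or ""


def full_text(elem: ET.Element) -> str:
    # itertext() walks the element and all of its descendants in document order.
    return "".join(elem.itertext())


para = ET.fromstring("<p>Hello <b>bold</b> world</p>")
```

Here shallow_text(para) returns "Hello " and silently drops the rest, while full_text(para) returns "Hello bold world".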
There is a scripting language called XSLT which is purpose-built for converting XML files to other formats.
There is an XSLT processor for the Mac called Xmplify which works well for running XSLT scripts.
Xmplify is from a small independent developer in Australia. The developer, Damian, is very helpful with advice on XSLT projects and is available for a reasonable fee to write an XSLT script if needed.
I create all my blog items (they go to Substack) in DEVONthink and then paste them into the blog. This way I never have to extract anything from the blog. After you get all your content out, you may wish to reconsider how you write blog posts.
Maybe I was a bit too quick. @Bmosbacker does say he knows nothing about scripts. The instructions take for granted that you already have git and Python (+ pip) installed. And if you’re unfamiliar with these, it might not be obvious that you’re supposed to interact with all of this through the Terminal.
Thankfully, it’s very easy to install the requirements with a package manager like Homebrew. If installing Homebrew and git is too much overhead just for this, you can get Python from https://www.python.org/downloads/. And it should be enough to download the GitHub repository directly from the page.
Then unzip the directory and place your squarespace.xml file in it as stated in the instructions. Open the directory in Terminal (step 2 under “Installation”) and follow the rest of the instructions.
@rmschne Indeed. I don’t have a blog. But if I ever wanted to start one, I would always write – and keep – a local copy of every post. In general I never write anything more than a few lines in a web interface. Even for this forum I write most posts in a local text editor and paste them into Discourse. And even for just a few lines, I still often prefer to write locally.
Thanks everyone for your kind responses. Using Git and any programming language is beyond my skills. I’ll work out something, but I appreciate the kind responses.
I think it looks scarier to you than it really is. The script I found looks dead simple to use. You only need a little bit of setup, then a single command to run the script – no further input should be required. Why not spend 20 minutes giving it a try?
Then download a zip of the GitHub repository as shown in the screenshot. Unzip it. You will have a folder named squarespace-export-to-markdown-main. Its location shouldn’t matter much, but to keep things simple, put it in your ~/Documents directory.
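If you would rather script the download, a minimal standard-library Python sketch could look like this. The owner and repository names in the example are assumptions inferred from the “zcaceres” script and the squarespace-export-to-markdown-main folder name; the URL pattern is GitHub’s usual one for a branch zip.

```python
import urllib.request
import zipfile
from pathlib import Path


def archive_url(owner: str, repo: str, branch: str = "main") -> str:
    # GitHub serves a zip of a branch at this URL.
    return f"https://github.com/{owner}/{repo}/archive/refs/heads/{branch}.zip"


def unzip(zip_path: Path, dest: Path) -> None:
    # The archive holds one top-level folder, e.g. <repo>-main/.
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)


def fetch_repo(owner: str, repo: str, dest: Path) -> None:
    zip_path = dest / f"{repo}.zip"
    urllib.request.urlretrieve(archive_url(owner, repo), zip_path)
    unzip(zip_path, dest)


# Example (assumed owner/repo; needs network access):
# fetch_repo("zcaceres", "squarespace-export-to-markdown", Path.home() / "Documents")
```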
Just for good measure, duplicate your exported .xml file to work with a copy. Move it to ~/Documents/squarespace-export-to-markdown-main. Rename the file squarespace.xml.
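The copy-and-rename step can also be written as a few lines of Python. The export file name in the example comment is a hypothetical placeholder for wherever your export landed.

```python
import shutil
from pathlib import Path


def stage_export(export_file: Path, project_dir: Path) -> Path:
    """Copy the export into the project folder as squarespace.xml.

    The original export is left untouched, so you always keep a clean copy.
    """
    target = project_dir / "squarespace.xml"
    shutil.copy2(export_file, target)  # copy2 also preserves timestamps
    return target


# Example (hypothetical export file name):
# stage_export(Path.home() / "Downloads" / "my-export.xml",
#              Path.home() / "Documents" / "squarespace-export-to-markdown-main")
```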
Now:
Launch Terminal.app.
Drag and drop your squarespace-export-to-markdown-main folder onto the Terminal icon in the Dock. (This opens a Terminal session at that location.)
Type, or copy & paste:
python3 -m pip install -r requirements.txt
This installs two Python packages that the script needs.
(lxml to process XML, and BeautifulSoup to parse HTML, a library often used for scraping the web. I assume this is for the images.)
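To confirm the install worked, you can ask Python whether both packages are importable. A minimal sketch, assuming the import names lxml and bs4 (the name BeautifulSoup is imported under); run it with the same python3 you will use for the script, since each Python installation has its own packages.

```python
import importlib.util

# Assumed import names for lxml and BeautifulSoup.
REQUIRED = ["lxml", "bs4"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the names that the current Python cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Running print(missing_modules(REQUIRED)) should print [] once both packages are installed.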
You’re now ready to run the script. You run it by typing this in the Terminal:
python3 script.py
That’s it!
The script does have three different flags (options) you can choose to specify. From the readme:
--download_images: Use this flag to download images.
--img_url IMG_URL: (Optional) Specify the base URL for images. The default value is https://images.squarespace-cdn.com. In most cases, you won’t need to change this.
--namespace NAMESPACE_URI: (Optional) Specify the namespace URI. The default value is RDF Site Summary 1.0 Modules: Content. In most cases, you won’t need to change this.
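Flags like these are typically declared with Python’s argparse module. This is a hypothetical reconstruction of the three documented options, not the script’s actual code; the namespace default is assumed to be the RSS 1.0 content module URI that the readme’s link title refers to. A flag declared with store_true, like --download_images, is off unless you type it.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical: mirrors the readme's flag names; the real script may differ.
    parser = argparse.ArgumentParser(description="Convert a Squarespace export to Markdown")
    parser.add_argument("--download_images", action="store_true",
                        help="Download images referenced in posts")
    parser.add_argument("--img_url", default="https://images.squarespace-cdn.com",
                        help="Base URL for images")
    parser.add_argument("--namespace", metavar="NAMESPACE_URI",
                        default="http://purl.org/rss/1.0/modules/content/",  # assumed default
                        help="Namespace URI for post content")
    return parser
```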
I would guess your blog has some images you’d like to include in the end result. In that case, this is what you type in the Terminal:
python3 script.py --download_images
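A base URL like the --img_url default suggests how such a script can tell Squarespace-hosted images apart from other links. As an illustration of that general idea only (an assumption, not the script’s actual code), this standard-library sketch collects matching <img> sources from a post’s HTML:

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect <img src> values that start with a given base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags like <img ... />.
        if tag == "img":
            src = dict(attrs).get("src")
            if src and src.startswith(self.base_url):
                self.sources.append(src)


def find_images(html: str, base_url: str = "https://images.squarespace-cdn.com") -> list[str]:
    collector = ImageCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.sources
```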
Unless he already has some other version of Python installed.
I am pretty comfortable with Terminal, GitHub, and similar command-line tools. But when Apple dropped Python 2.7 from macOS, I ran into quite a bit of frustration just trying to keep using Python apps I had already installed. I know some experienced developers use tools that create a sandbox (a virtual environment) for a designated Python version to avoid this.
I have concluded that using Replit is much simpler. It is an online IDE that supports almost every language in the universe and will even install apps directly from a GitHub URL. If you have questions or problems, its AI feature is pretty good at solving them for you. It handles 100% of the headaches of setting up dependencies.
I don’t rely much on Python, so I still have to read up on a proper sandboxing workflow. Homebrew changed its handling of Python a while back, which did give me a little trouble.
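For reference, the usual sandboxing approach is a virtual environment, which Python’s standard venv module can create. This sketch creates one and installs requirements with that environment’s own pip, so nothing touches the system Python. The .venv folder name in the example is just a common convention, and the install step needs network access.

```python
import subprocess
import sys
import venv
from pathlib import Path


def create_sandbox(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment and return the path to its Python."""
    venv.create(env_dir, with_pip=with_pip)
    # The interpreter lives in bin/ on macOS and Linux, Scripts\ on Windows.
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_requirements(env_python: Path, requirements: Path) -> None:
    # Using the sandbox's own pip keeps the packages inside the sandbox.
    subprocess.run([str(env_python), "-m", "pip", "install", "-r", str(requirements)], check=True)


# Example:
# project = Path.home() / "Documents" / "squarespace-export-to-markdown-main"
# py = create_sandbox(project / ".venv")
# install_requirements(py, project / "requirements.txt")
```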
Recent versions of macOS include a python3 command in /usr/bin/python3 that links to a usually older and incomplete version of Python provided by and for use by the Apple development tools, Xcode or the Command Line Tools for Xcode. You should never modify or attempt to delete this installation, as it is Apple-controlled and is used by Apple-provided or third-party software. If you choose to install a newer Python version from python.org, you will have two different but functional Python installations on your computer that can co-exist. The default installer options should ensure that its python3 will be used instead of the system python3.
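To see which of the two installations a given command actually runs, a tiny Python check helps. Save it under any name (which_python.py is a hypothetical example) and run it with python3 to compare; a python.org install typically reports a path under /Library/Frameworks/Python.framework. The 3.13 minimum is only an example matching the installer discussed in this thread.

```python
import sys


def describe_interpreter() -> str:
    """Report the running Python's version and location."""
    major, minor, micro = sys.version_info[:3]
    return f"Python {major}.{minor}.{micro} at {sys.executable}"


def meets_minimum(minimum: tuple[int, int] = (3, 13)) -> bool:
    # Compare only (major, minor).
    return sys.version_info[:2] >= minimum


print(describe_interpreter())
```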
But that is the 3.13.1 documentation – and I linked the 3.12.8 installer, as it was the top one. The 3.12.8 documentation doesn’t include the same explicit statement. I’m not sure how different the installers are, so I’ve changed the link to the 3.13.1 installer.
To be extra sure, maybe it’s better to invoke Python with python3.13:
Thank you for your kindness in taking the time to provide a detailed explanation. I appreciate it. But I’m not willing to deal with the terminal and Python scripts when I have zero experience with either, particularly given that this is a one-off project. I have too much work going on to risk messing with Python scripts and end up “constricting” my ability to get my work done.
It is important, but not important enough to take the risk (or the time) of messing up my system given the many projects I have underway. It is probably not wise for someone with zero experience with coding, Git, or the terminal to mess around with such tools, given that this is a one-off project. I’ll find another way to move my content.
Your question presupposes that one should devote time to gaining experience in any field. I believe it is wiser to carefully choose to devote time and energy to endeavors that are likely to have a long-term high ROI. In my case, I would not realize a high ROI by devoting limited time to learning a skill that, given my profession, I will have little use for in the future. I also have no passion for scripting, coding, or mastering the terminal.