Mass-convert existing RTF/RTFD files to Markdown?

I have years of web-captured items, stored as RTF with inline images, that I’d like to convert to Markdown.

Does anyone have a decent strategy for getting this done?

This is not easily done in any kind of efficient way, as a Google search quickly shows.

Third parties may offer solutions, but I personally tread very cautiously with any site that wants me to upload my files to it.

Is the hard part the images?

That’s some of it, for sure. But it’s like translating between human languages: possible, but with a chance of error. That’s why you don’t see a definitive “how-to” on it. Also, RTF(D) is a standard, but not one that’s always been fully adhered to, and it has been extended. In fact, regarding images, Apple has made its own modifications that aren’t found in files coming from an app like Word.
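Apple’s image modifications are easy to spot in the raw file. Here is a minimal Python sketch that scans the RTF source (for an .rtfd bundle, the TXT.rtf inside it) for image references. It assumes the `\NeXTGraphic <file name> \width…` control-word pattern that Apple’s text system typically writes for inline attachments, where the named file sits next to TXT.rtf in the bundle. Word-style RTF instead embeds pictures in `{\pict …}` groups with the image data inline, so the Apple pattern finds nothing there. The function name `find_rtfd_images` is hypothetical.

```python
import re

# Assumption: Apple's RTFD writer references each attachment roughly as
#   {{\NeXTGraphic <file name> \width<n> \height<n> ...}}
# with <file name> stored alongside TXT.rtf inside the .rtfd bundle.
NEXT_GRAPHIC = re.compile(r"\\NeXTGraphic (.+?) \\width")

# Standard (e.g. Word-generated) RTF embeds picture data in {\pict ...} groups.
PICT_GROUP = re.compile(r"\{\\pict\b")


def find_rtfd_images(rtf_source: str) -> dict:
    """List Apple-style attachment names and count standard \\pict groups."""
    return {
        "nextgraphic": NEXT_GRAPHIC.findall(rtf_source),
        "pict_groups": len(PICT_GROUP.findall(rtf_source)),
    }


if __name__ == "__main__":
    apple = r"{{\NeXTGraphic Pasted Graphic.tiff \width2560 \height1600}}"
    word = r"{\pict\pngblip\picw100\pich50 89504e47}"
    print(find_rtfd_images(apple))  # {'nextgraphic': ['Pasted Graphic.tiff'], 'pict_groups': 0}
    print(find_rtfd_images(word))   # {'nextgraphic': [], 'pict_groups': 1}
```

A converter that only understands `\pict` will silently drop the Apple-style references, which is one concrete way “RTF to Markdown” goes wrong.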

Well, they can be converted to Base64, if you don’t mind a 33% size overhead on your images, and many text editors barfing when they come across a single unbroken string of 1,000,000 characters.
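To make the Base64 option concrete, here is a minimal Python sketch that turns an image’s bytes into a Markdown image with an inline `data:` URI. Base64 encodes every 3 input bytes as 4 output characters, which is where the roughly 33% overhead comes from. Whether a particular Markdown renderer displays `data:` URIs is an assumption to check, and the function names are hypothetical.

```python
import base64


def to_data_uri_markdown(image_bytes: bytes, mime: str = "image/png", alt: str = "image") -> str:
    """Return a Markdown image whose source is the image itself, Base64-encoded in a data: URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"![{alt}](data:{mime};base64,{encoded})"


def base64_length(n_bytes: int) -> int:
    """Length of padded Base64 output: 4 characters for every started 3-byte block."""
    return 4 * ((n_bytes + 2) // 3)


if __name__ == "__main__":
    # A 750,000-byte image turns into one unbroken 1,000,000-character string.
    print(base64_length(750_000))  # 1000000
    print(to_data_uri_markdown(b"abc", alt="demo"))  # ![demo](data:image/png;base64,YWJj)
```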

I think there are a couple of Markdown editors that will import RTF and convert it to Markdown (Byword may do this, but it’s been a long time since I checked), so you may have some joy AppleScripting this. But Markdown was always designed to be a source format, not something to be converted into, and RTF was always just visual text: 15pt bold text may mean a second-level heading or just really big bold text, and there are no cues about the intent behind the display.
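Because RTF only records how text looks, any converter has to guess at intent. A minimal Python sketch of such a guess follows: it maps a run’s font size and boldness to a Markdown heading prefix. The 12pt body size and all thresholds are purely hypothetical and would need tuning against your own documents. The one fixed fact it relies on is that RTF’s `\fs` control word gives font size in half-points, so `\fs30` is 15pt.

```python
def heading_prefix(font_size_pt: float, bold: bool, body_size_pt: float = 12.0) -> str:
    """Guess a Markdown heading prefix from visual formatting alone.

    Hypothetical thresholds: these ratios are illustrative, not taken from
    any specification, and will misfire on text that is just big and bold.
    """
    ratio = font_size_pt / body_size_pt
    if ratio >= 1.6:
        return "# "
    if ratio >= 1.2 and bold:
        return "## "
    if ratio >= 1.1 and bold:
        return "### "
    return ""  # treat everything else as body text


def half_points_to_pt(fs_value: int) -> float:
    """RTF's \\fs control word measures font size in half-points."""
    return fs_value / 2


if __name__ == "__main__":
    size = half_points_to_pt(30)  # \fs30 -> 15pt
    print(repr(heading_prefix(size, bold=True) + "Chapter title"))  # '## Chapter title'
```

Any such table of thresholds encodes a guess about one person’s documents, which is exactly why no general-purpose converter gets headings right every time.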

If you poke around Pandoc, you may get somewhere. Pandoc is a ‘universal document converter’ that will have a go at converting between RTF, Word, HTML, Markdown, etc. It’s amazing that it exists, but I’ve had patchy results with it that required manual intervention. The marvel is not that the bear dances well, but that the bear dances at all. At least you can drive it from Terminal or a shell script, though.
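For the scripting route, here is a minimal Python sketch of a batch run. It assumes a Pandoc release that includes an RTF reader (`-f rtf`; older releases could write RTF but not read it) and uses Pandoc’s `--extract-media` option so any images it does manage to read are written out beside the Markdown. For an .rtfd bundle it points Pandoc at the TXT.rtf inside; Apple-style attachments may well not survive. The function names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def pandoc_command(source: Path, out_dir: Path) -> list[str]:
    """Build a Pandoc command converting one .rtf file or .rtfd bundle to GitHub-flavored Markdown."""
    # An .rtfd "file" is really a folder; its text lives in TXT.rtf inside it.
    rtf_file = source / "TXT.rtf" if source.suffix.lower() == ".rtfd" else source
    target = out_dir / f"{source.stem}.md"
    media_dir = out_dir / f"{source.stem}_media"
    return [
        "pandoc",
        "-f", "rtf",
        "-t", "gfm",
        f"--extract-media={media_dir}",  # writes out any images Pandoc manages to read
        "-o", str(target),
        str(rtf_file),
    ]


def convert_tree(src_root: Path, out_dir: Path) -> list[Path]:
    """Convert every .rtf/.rtfd under src_root; return the sources Pandoc failed on.

    Assumes file stems are unique, since all output lands in one flat folder.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    failures = []
    for source in sorted(src_root.rglob("*")):
        if source.suffix.lower() not in (".rtf", ".rtfd"):
            continue
        # Skip the TXT.rtf inside a bundle; the bundle itself is converted instead.
        if any(parent.suffix.lower() == ".rtfd" for parent in source.parents):
            continue
        result = subprocess.run(pandoc_command(source, out_dir), capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(source)
    return failures


if __name__ == "__main__" and len(sys.argv) == 3:
    failed = convert_tree(Path(sys.argv[1]), Path(sys.argv[2]))
    print(f"{len(failed)} file(s) need manual attention")
```

Saved as, say, `convert.py` (a hypothetical name), it would be run as `python3 convert.py ~/Exports/rtf ~/Exports/md`. Returning the failures rather than stopping keeps one odd file from blocking the other few thousand, which matters given how patchy the results can be.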

I agree that Pandoc is an impressive tool, but it’s certainly not for the faint of heart (and I don’t advocate messing about in Terminal without disclaimers). I’ve also had spotty results with it myself.

Fly, meet sledgehammer! :laughing: (Couldn’t resist. :mrgreen: )

More like: sledgehammer machine part, meet 10,000 flies of different shapes and sizes!

The good news is that the lion’s share of the content is general text, headers, and quotes, plus some images and italic/bold/underline formatting.

There are actually a few Python libraries that should be sufficient; what’s missing is a strategy for extracting and placing the images so that they aren’t seen as separate items in DTPO (DEVONthink Pro Office) but are still displayed by the Markdown.
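One possible image strategy, sketched in Python: copy each image out of the .rtfd bundle into a folder that DEVONthink neither imports nor indexes, and emit Markdown image links that point at the copies with absolute `file://` URLs. Whether DEVONthink’s Markdown preview follows `file://` links, and which extensions count as images, are assumptions to verify; `export_bundle_images` and the `md-assets` folder are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

# Assumption: these extensions cover the attachments found in your bundles.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".tif", ".tiff"}


def export_bundle_images(bundle: Path, assets_root: Path) -> list[str]:
    """Copy images out of an .rtfd bundle and return Markdown links to the copies."""
    target_dir = assets_root / bundle.stem
    target_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for item in sorted(bundle.iterdir()):
        if item.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skips TXT.rtf and anything else that isn't an image
        copy = target_dir / item.name
        shutil.copy2(item, copy)
        # as_uri() requires an absolute path and percent-encodes spaces.
        links.append(f"![{item.stem}]({copy.resolve().as_uri()})")
    return links


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bundle = Path(tmp, "Clipping.rtfd")
        bundle.mkdir()
        (bundle / "TXT.rtf").write_text(r"{\rtf1 hello}")
        (bundle / "Pasted Graphic.png").write_bytes(b"\x89PNG")
        for link in export_bundle_images(bundle, Path(tmp, "md-assets")):
            print(link)  # one link ending in /md-assets/Clipping/Pasted%20Graphic.png)
```

Keeping one subfolder per bundle avoids collisions between the many “Pasted Graphic” files that web clippings tend to produce.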

@mikes: Sounds like an interesting project for you to tackle!

Pseudocoding it now! :laughing:

I’m in the same situation, with hundreds of files to convert, nearly all without images, which should make it easier. I’ll be watching this thread, and I’m also looking elsewhere and will report back if I find anything.

Don’t hold your breath - the thread is three years old.