Dumping RTF for Markdown

I am interested in converting all my rich text notes to plain text Markdown.

The three obstacles to overcome are:

  1. Volume: I currently have almost 1,300 RTF notes in my database, the vast majority of which are imported rather than indexed. Obviously, this calls for a script.
  2. Data conversion: DEVONthink URIs aside (see below), this should not be a problem; textutil, for example, could handle it.
  3. Hypertext: My notes are interlinked in a tight web using x-devonthink-item:// URIs. It is crucial that these links remain valid, and this is, to my mind, the thorniest issue.
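
Since the links are the crux, it may help to measure the web before converting anything: which items are linked to, and how often. A minimal Python sketch, assuming the identifier after x-devonthink-item:// is a hyphenated hexadecimal UUID and that the notes' text has already been read into strings. Because the URI appears literally inside an RTF HYPERLINK field, the same pattern should also work on raw .rtf source. The helper names `linked_uuids` and `link_census` are hypothetical.

```python
import re
from collections import Counter

# Assumption: links look like x-devonthink-item://<UUID>, where <UUID> is
# 8-4-4-4-12 hexadecimal digits. Anything after the UUID is ignored.
LINK_RE = re.compile(
    r"x-devonthink-item://([0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12})"
)


def linked_uuids(text: str) -> list[str]:
    """Return every UUID referenced by an x-devonthink-item:// link, upper-cased."""
    return [uuid.upper() for uuid in LINK_RE.findall(text)]


def link_census(notes: dict[str, str]) -> Counter:
    """Count how often each UUID is linked to across notes (name -> text)."""
    counts: Counter = Counter()
    for text in notes.values():
        counts.update(linked_uuids(text))
    return counts
```

UUIDs that show up with high counts are the notes whose identity matters most during the migration.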

Grateful in advance for any ideas.

Thanks for the pointer, @korm. I’ll check out Terpstra’s utility, though that is really the smoother part of the journey. I should have clarified that I don’t merely want to replace formatted text with unrendered Markdown still stored in an .rtf file. Instead, I want to replace all my (binary) .rtf files with plain-text .md files. The trouble is that, as far as I understand, it is impossible to take the unique DEVONthink identifier (and therefore the URI) of an .rtf file and assign it to the corresponding .md file. Or is it? Am I missing an obvious solution?

Excellent, @korm. Many thanks. (I had been convinced that .rtf is binary, probably because I had never looked carefully at a dump; at first sight it looks atrocious and gives a very hexdump-y impression.)

Nitpick: RTF files can contain things like images encoded as binary data: biblioscape.com/rtf15_spec.htm#Heading49

Not nitpicking at all, actually: I now realize that some of my .rtf files do contain photos, so I need to sift through my database and exclude those from the conversion process (or, alternatively, convert them to Markdown documents with links to image files).

Cheers.
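
The sifting can be scripted, because RTF is text: an embedded picture appears as a \pict group and an OLE object as \objdata, per the RTF specification linked above. A rough Python sketch that flags such files; it is a regex heuristic rather than a real RTF parser, and it assumes the notes have been exported as .rtf files into one folder. Images stored as attachments in .rtfd packages would not be caught this way. The function names are hypothetical.

```python
import re
from pathlib import Path

# RTF control words for embedded pictures and OLE objects. A control word is a
# backslash plus lowercase letters, ended by any non-letter; the lookbehind skips
# an escaped backslash ("\\pict" in RTF source is literal text, not a control word).
EMBED_RE = re.compile(rb"(?<!\\)\\(pict|objdata)(?![a-z])")


def has_embedded_objects(rtf: bytes) -> bool:
    """Heuristic: True if the RTF source contains a \\pict or \\objdata control word."""
    return EMBED_RE.search(rtf) is not None


def split_by_embeds(folder: Path) -> tuple[list[Path], list[Path]]:
    """Partition the .rtf files in folder into (text-only, with-embedded-objects)."""
    plain: list[Path] = []
    with_embeds: list[Path] = []
    for path in sorted(folder.glob("*.rtf")):
        target = with_embeds if has_embedded_objects(path.read_bytes()) else plain
        target.append(path)
    return plain, with_embeds
```

The text-only list can go straight to conversion; the other list is the set to handle by hand or to convert with separate image files.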

A .mdd format analogous to .rtfd wouldn’t be a bad idea, actually: a package bundling the Markdown text and the images it links to. Maybe I should try to lobby John Gruber about this!

      • UPDATE: Problem solved, I think. I was oblivious to the “Format > Change to RTF/Plain text” menu item (Cmd-T), which does precisely what I need, namely convert an .rtf into a .md file with the same UUID (and vice versa).

OK, progress update, folks.

I used a Ruby script available on GitHub, which nicely converts .rtf into .md (I had to tweak it to handle special characters in filenames; I will submit a pull request soon).

Then, using Automator, I wrote a service that applies the script to all selected Finder files.
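
For reference, an Automator “Run Shell Script” action with “Pass input: as arguments” hands the selected Finder paths to the script as separate command-line arguments. Filenames with spaces, quotes or ampersands typically break only when a script pastes them into a shell command string; passing an argument list to the converter avoids quoting altogether. The Python sketch below shows that pattern. It does not reproduce the Ruby script: `CONVERTER` (here `rtf2md`) is a hypothetical placeholder, assumed to take the input and output paths as its two arguments.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical converter: assumed to be called as  rtf2md <input.rtf> <output.md>
CONVERTER = ["rtf2md"]


def planned_jobs(args: list[str]) -> list[tuple[Path, Path]]:
    """Pair each .rtf path with a .md path beside it; ignore other files."""
    jobs = []
    for arg in args:
        src = Path(arg)
        if src.suffix.lower() == ".rtf":
            jobs.append((src, src.with_suffix(".md")))
    return jobs


def run(args: list[str]) -> int:
    """Convert every .rtf in args; return 0 if all succeeded, 1 otherwise."""
    failures = 0
    for src, dst in planned_jobs(args):
        try:
            # An argument list (no shell=True) hands each path over verbatim,
            # so spaces, quotes, "&" or "$" in a filename need no escaping.
            ok = subprocess.run([*CONVERTER, str(src), str(dst)]).returncode == 0
        except OSError:  # e.g. the converter is not installed
            ok = False
        if not ok:
            print(f"failed: {src}", file=sys.stderr)
            failures += 1
    return 1 if failures else 0


# Automator's "Run Shell Script" (Pass input: as arguments) supplies the paths.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run(sys.argv[1:]))
```

Note that this only produces .md files next to the originals in the file system; it does nothing about DEVONthink UUIDs.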

So far so good. The trouble is that:

  1. If I take each .rtf file in DT and produce a corresponding .md file, the latter will have no UUID (let alone the same one as the original .rtf file). All x-devonthink-item:// links to that content therefore still lead to the .rtf file. In short, as soon as the .rtf is deleted, we get broken links and an orphaned .md. “Repairing” the database does not help.

  2. If I erase the contents of the .rtf file and replace them with plain-text Markdown while keeping the .rtf extension (a hack intended to preserve the file’s UUID), DEVONthink fails to display it, apparently because it invokes the RTF viewer, which finds no RTF markup.
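
The second failure is consistent with the RTF format itself: an RTF document begins with the header group {\rtf1, so a file named .rtf that holds bare Markdown offers an RTF reader nothing to parse. A small Python check, a sketch with hypothetical function names, tells the two apart by their first bytes, which could also be used to find any notes already altered this way.

```python
from pathlib import Path


def looks_like_rtf(data: bytes) -> bool:
    """True if data starts (after an optional UTF-8 BOM and whitespace)
    with the RTF header "{\\rtf"."""
    head = data.removeprefix(b"\xef\xbb\xbf").lstrip()
    return head.startswith(b"{\\rtf")


def mislabelled_rtf(folder: Path) -> list[Path]:
    """List the .rtf files in folder whose contents are not actually RTF."""
    return [p for p in sorted(folder.glob("*.rtf")) if not looks_like_rtf(p.read_bytes())]
```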

Obviously, this is the domain of DEVONthink’s internal database, a black box to which we have no access.

Does anyone have any further ideas? Surely there should be a way to properly convert an .rtf file into plain text, as the menu entry suggests, i.e. to replace the .rtf file with a plain-text file carrying the same UUID. At present, the “Convert to” menu entry is a misnomer: rather than converting, DEVONthink creates a plain-text copy of the .rtf file with a different UUID.

@korm,

My requirements have not changed, but perhaps I haven’t been clear enough.

In a nutshell, I want to replace every .rtf file in my database with an equivalent .md file, while also retaining the x-devonthink-item:// links between these documents.

(I realize I inadvertently typed .txt instead of .md in my earlier post. My apologies, typo now corrected.)
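
If the UUIDs could not be carried over, one fallback would be to let the new .md files receive new UUIDs and then rewrite the links themselves. This is only a sketch under a big assumption: that a table mapping each old .rtf UUID to the UUID of its .md replacement can be obtained somehow (for instance by scripting DEVONthink, not shown here). Given such a table, updating the x-devonthink-item:// links inside the Markdown text is a plain substitution; `rewrite_links` is a hypothetical helper.

```python
import re

UUID = r"[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}"
LINK_RE = re.compile(r"(x-devonthink-item://)(" + UUID + r")")


def rewrite_links(markdown: str, old_to_new: dict[str, str]) -> tuple[str, set[str]]:
    """Point links at the new UUIDs in old_to_new (keys must be upper-case).

    Returns the rewritten text and the set of linked UUIDs with no entry in
    the table; those links are left unchanged so they can be checked by hand.
    """
    unmapped: set[str] = set()

    def swap(m: re.Match) -> str:
        old = m.group(2).upper()
        new = old_to_new.get(old)
        if new is None:
            unmapped.add(old)
            return m.group(0)  # keep the original link text untouched
        return m.group(1) + new

    return LINK_RE.sub(swap, markdown), unmapped
```

Reporting the unmapped UUIDs rather than silently skipping them makes it easy to spot links that point outside the converted set.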

With regard to your two points:

#1. The “Format > Change to [rich text/plain text]” toggle in DEVONthink actually replaces the .rtf file with a .txt file carrying an identical UUID. So while your understanding of UUIDs is strictly correct, DEVONthink seems to use the term in a looser sense, since two different entities (the .rtf and the .txt) can have the same UUID (albeit not at the same time in the same database).

#2. Yes, it was a Marked.app-based scenario that I was contemplating as a worst-case workaround. But it now seems this won’t be necessary.