How to prevent external image links from being converted into base64-encoded images in markdown files?

As shown, when converting from HTML to Markdown, can we opt to keep the original URL instead of embedding the whole image? A Markdown file is basically uneditable with several images hardcoded in the middle of its text.

Apparently, not all images are converted to Base64. Can you indicate the original URL of the HTML document?

Sorry if I didn’t make it clear: the screenshot shows both how I wish the image were converted (a Markdown link with the original URL) and how it is actually converted (a Markdown link with a long base64 string).
For now, all images in an HTML file can only be converted into base64.

Where is your source HTML document from?

It’s from an RSS feed. Does that make it different from a saved webpage?

Please send me the original HTML document (and its URL) and we’ll have a look at this, thanks.

I came up with an educational script written in JavaScript:

(() => {
  /* Replacement function: Gets an MD image with data "![](...)" and
     returns the corresponding "![](URL)" from the original HTML
  */
  function replaceDataWithURL(match) {
    return `![](${imgURLs.shift()})`;
  }
  
  const app = Application("DEVONthink 3");
  /* Just an example – use your own UUID or a loop like 
     app.selectedRecords.forEach(r => {...})
  */
  const r = app.getRecordWithUuid("x-devonthink-item://2EFA66B8-725E-4AAD-9616-5E1F0D18A917");
  if (r.type() !== "html") return;
  /* get the raw HTML */
  const rawHTML = r.source();

  /* get the URLs of all img elements in the HTML */
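  /* [^>]* keeps each match inside a single img tag, so several images on one line are all found */
  const imgMatches = rawHTML.matchAll(/<img[^>]*?\ssrc="([^"]+)"/gi);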
  const imgURLs = [...imgMatches].map(m => m[1]);
  
  /* Convert the current record to Markdown */
  const mdRecord = app.convert({record: r, to: "markdown"});
  
  /* Find all data URIs in the MD and replace them with the corresponding 
    original URL */
  mdRecord.plainText = mdRecord.plainText().replaceAll(
    /!\[\]\(data:image\/png;base64,([^)]+)\)/g,
    replaceDataWithURL);
})()

It works on HTML (!) records by

  • first retrieving the src attribute from all img elements
  • then converting the HTML to Markdown
  • then replacing every base64 image in the Markdown file with the corresponding original URL

Shortcomings:

  • relative URLs will not work correctly
  • if only some of the MD images are base64 encoded, the replacements will not be correct.

The first issue can be overcome by looking for a base element in the HTML and prepending its href attribute to relative URLs. The second issue … that’s more complicated. One would have to consider all images in the Markdown source, not only the base64 ones, and then replace only the base64 images with the corresponding URL. Feasible, but a lot more work.
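Here is a rough sketch of how both shortcomings might be tackled, written as plain string functions so it can be tried outside DEVONthink. Everything in it is an assumption on my part, not something the script above or DEVONthink provides: the function names `extractImgURLs`, `findBaseHref`, `resolveURL` and `restoreImageURLs` are made up for illustration, the standard `URL` class is assumed to exist (it does in Node.js and browsers, but may not in a JXA script, where plain prepending of the `href` would be the fallback), and the converter is assumed to emit exactly one Markdown image per `img` element, in the same order.

```javascript
/* Collect the src attribute of every img element, in document order. */
function extractImgURLs(html) {
  const matches = html.matchAll(/<img\b[^>]*?\ssrc="([^"]+)"/gi);
  return [...matches].map(m => m[1]);
}

/* Return the href of the first base element, or null if there is none. */
function findBaseHref(html) {
  const m = html.match(/<base\b[^>]*?\shref="([^"]+)"/i);
  return m ? m[1] : null;
}

/* Resolve a possibly relative src against the base href.
   Assumption: the runtime has the standard URL class. */
function resolveURL(src, baseHref) {
  if (!baseHref) return src;
  try {
    return new URL(src, baseHref).href;
  } catch {
    return src; /* leave the URL untouched if it cannot be resolved */
  }
}

/* Walk over ALL Markdown images, counting each one, and replace only
   those whose target is a data URI with the URL at the same position.
   Assumption: the n-th Markdown image corresponds to the n-th img element. */
function restoreImageURLs(markdown, imgURLs) {
  let index = 0;
  return markdown.replace(
    /!\[([^\]]*)\]\(([^)\s]+)[^)]*\)/g,
    (match, alt, target) => {
      const url = imgURLs[index++];
      if (!target.startsWith("data:") || url === undefined) return match;
      return `![${alt}](${url})`;
    });
}
```

In the script above, one would build the list with something like `extractImgURLs(rawHTML).map(u => resolveURL(u, findBaseHref(rawHTML)))` and pass it together with `mdRecord.plainText()` to `restoreImageURLs`. Because every image advances the counter, images that were kept as normal links no longer shift the replacements.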

The next release will revise this if the option to copy Markdown images into the database is enabled.

The man in the forest - Podcasts-1.html.zip (1.9 KB)
Sure, here is a copy. It’s converted to this:

Huge thanks! This is really helpful :sunny:

Just illustrating @cgrunenberg’s comment…
In the upcoming release, DEVONthink will convert the image link if Preferences > Files > Markdown > Import images to group… is enabled…

The image is in the Assets group.

Thanks, this feature is truly awesome! I just enabled it but the images are still converted to base64 ones. Looking forward to the next release.

Hello,

I know that’s an old topic.
But a question:

Having the images in a subfolder makes it hard for me to move the note, especially when this subfolder contains multiple notes.

Is it possible either to have one subfolder per note, maybe even linked somehow so that it is renamed and moved along with the note (and perhaps hidden when its name starts with a dot),

or to have the base64 data in the Markdown simply truncated in the editor for the “human editor”?

That way, images would stay in the document and move with the document, and it would still be possible to edit them without thousands of base64 lines?

And it’s only faintly related to your question – so a new topic would be a lot better.

Not only possible but advocated by some, as a search in the forum would have revealed. Put the MD and the images in the same group (DT does not have folders!) and refer to the images in MD like ![title](image1.png).

Nope. That would break everything. You shouldn’t even have data URIs in your MD file in the first place because they’re just a PITA.

No.

@chrillek is correct on this. There is no requirement to put images in a separate group. In fact, here’s a little trick that helps with copy and paste, drag and drop, or Tools > Import Online Markdown Images:

Leave the Settings > Files > Markdown > Import: Import images to group field empty, like so…

This will automatically put the assets in the same group as the Markdown document. Then just move the entire group as you need to.

Hi,

Thanks, but I think I was misunderstood.

In fact, I already use the “Import images to group” feature.

Maybe my workflow is wrong: I add a note in the Inbox, e.g. during a call, and add screenshots to it. After the call or in the evening, I want to move the notes to the correct project group. The problem then is that the images are still in the Resources subgroup of the Inbox.

The better question could be how to improve the workflow.