As shown, when converting from HTML to Markdown, can we opt to keep the original URL instead of hardcoding the whole image? A Markdown file is basically uneditable with several images hardcoded into its text.
Apparently, not all images are converted to Base64. Can you indicate the original URL of the HTML document?
Sorry if I didn’t make it clear, but the screenshot just shows how I wish the image were converted (a Markdown link with the original URL) and how it is actually converted (a Markdown link with a long Base64 string).
All images in an HTML file can only be converted into Base64 for now.
Where is your source HTML document from?
It’s from an RSS feed. Does that make it different from a saved webpage?
Please send me the original HTML document (and its URL) and we’ll have a look at this, thanks.
I came up with an educational script written in JavaScript:
(() => {
  /* Replacement function: gets a Markdown image with a data URI,
     ![](data:…), and returns the corresponding ![](URL) built from
     the next image URL found in the original HTML
  */
  function replaceDataWithURL(match) {
    return `![](${imgURLs.shift()})`;
  }
  const app = Application("DEVONthink 3");
  /* Just an example – use your own UUID or a loop like
     app.selectedRecords.forEach(r => {...})
  */
  const r = app.getRecordWithUuid("x-devonthink-item://2EFA66B8-725E-4AAD-9616-5E1F0D18A917");
  if (r.type() !== "html") return;
  /* get the raw HTML */
  const rawHTML = r.source();
  /* get the URLs of all img elements in the HTML;
     [^>]*? keeps each match inside a single tag, so several
     img elements on one line are all found */
  const imgMatches = rawHTML.matchAll(/<img[^>]*?\ssrc="(.+?)"/gi);
  const imgURLs = [...imgMatches].map(m => m[1]);
  /* Convert the current record to Markdown */
  const mdRecord = app.convert({record: r, to: "markdown"});
  /* Find all data URIs in the MD and replace them with the corresponding
     original URL */
  mdRecord.plainText = mdRecord.plainText().replaceAll(
    /!\[\]\(data:image\/png;base64,([^)]+)\)/g,
    replaceDataWithURL);
})()
It works on HTML (!) records by
- first retrieving the `src` attribute from all `img` elements
- then converting the HTML to Markdown
- then replacing every base64 image in the Markdown file with the corresponding original URL
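The first and last of these steps can be tried outside DEVONthink on plain strings. The following is only a minimal sketch: the helper names `extractImgURLs` and `replaceDataURIs` and the sample HTML and Markdown are made up for illustration, and it assumes the converter emits images in the same order as the `img` elements appear in the HTML.

```javascript
// Hypothetical helpers for illustration; not part of DEVONthink's scripting API.

/* Collect the src attribute of every <img> element, in document order. */
function extractImgURLs(html) {
  const matches = html.matchAll(/<img\b[^>]*?\ssrc="([^"]+)"/gi);
  return [...matches].map(m => m[1]);
}

/* Replace the n-th base64 image in the Markdown with the n-th URL.
   If the URLs run out, the base64 image is left untouched. */
function replaceDataURIs(markdown, urls) {
  const queue = [...urls]; // copy, so the caller's array is not modified
  return markdown.replace(
    /!\[([^\]]*)\]\(data:image\/[a-z0-9.+-]+;base64,[^)]+\)/gi,
    (match, alt) => (queue.length ? `![${alt}](${queue.shift()})` : match)
  );
}

// Made-up sample data standing in for the record's HTML source
// and the converted Markdown.
const html = '<p>Intro</p><img alt="a" src="https://example.com/a.png">' +
             '<img src="https://example.com/b.jpg">';
const md = 'Intro\n\n![](data:image/png;base64,AAAA)\n\n' +
           '![](data:image/jpeg;base64,BBBB)';

console.log(replaceDataURIs(md, extractImgURLs(html)));
```

Unlike the script above, this variant accepts any image MIME type in the data URI, not only `image/png`, and keeps existing alt text.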
Shortcomings:
- relative URLs will not work correctly
- if only some of the MD images are base64 encoded, the replacements will not be correct.
The first issue can be overcome by looking for a `base` element in the HTML and prepending its `href` attribute to relative URLs. The second issue … that’s more complicated. One would have to consider all images in the Markdown source, not only the base64 ones, and then replace only the base64 images with the corresponding URL. Feasible, but a lot more work.
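A rough sketch of both workarounds on plain strings might look like the following. Everything here is an assumption for illustration: the helper names `resolveImgURLs` and `replaceOnlyDataURIs`, the `pageURL` parameter, and the sample data are hypothetical, and the code relies on the n-th Markdown image corresponding to the n-th `img` element. It uses the standard `URL` class as found in Node.js and browsers; I have not verified that it is available in DEVONthink's JXA environment.

```javascript
// Hypothetical helpers for illustration; plain JavaScript, no DEVONthink calls.

/* Resolve every img src against the <base href> if there is one,
   otherwise against the URL of the page itself (pageURL, assumed known). */
function resolveImgURLs(html, pageURL) {
  const baseMatch = html.match(/<base\b[^>]*?\shref="([^"]+)"/i);
  const base = baseMatch ? new URL(baseMatch[1], pageURL).href : pageURL;
  const imgs = html.matchAll(/<img\b[^>]*?\ssrc="([^"]+)"/gi);
  // new URL(src, base) leaves absolute URLs alone and resolves relative ones
  return [...imgs].map(m => new URL(m[1], base).href);
}

/* Walk over ALL Markdown images so the positions stay aligned with the
   img elements, but rewrite only those whose target is a data URI. */
function replaceOnlyDataURIs(markdown, urls) {
  let index = 0;
  return markdown.replace(
    /!\[([^\]]*)\]\(([^)\s]+)[^)]*\)/g,
    (match, alt, target) => {
      const url = urls[index++]; // consumed for every image, replaced or not
      return target.startsWith("data:") && url ? `![${alt}](${url})` : match;
    }
  );
}

// Made-up sample: one relative, one absolute, one root-relative image.
const sampleHTML = '<head><base href="https://example.com/blog/"></head>' +
  '<img src="img/a.png"><img src="https://cdn.example.com/b.png"><img src="/c.png">';
const sampleMD = '![](data:image/png;base64,AAAA)\n' +
  '![kept](https://cdn.example.com/b.png)\n' +
  '![](data:image/png;base64,CCCC)';

const resolved = resolveImgURLs(sampleHTML, "https://example.com/feed");
console.log(replaceOnlyDataURIs(sampleMD, resolved));
```

If the converter drops or reorders images, the position-based matching breaks, so the result is worth checking by eye before overwriting the Markdown record.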
The next release will revise this if the option to copy Markdown images into the database is enabled.
Huge thanks! This is really helpful
Just illustrating @cgrunenberg’s comment…
In the upcoming release, DEVONthink will convert the image link if Preferences > Files > Markdown > Import images to group… is enabled…
The image is in the Assets group.
Thanks, this feature is truly awesome! I just enabled it, but the images are still converted to base64 ones. Looking forward to the next release.