Markdown - auto download linked web images?

vit · October 10, 2021, 8:15pm

Recently I have been using the Markdownload extension for web clipping quite a bit, because it is sleek and generates good results. I also use DevonThink as an RSS archive, with HTML as the default news format.

The downloaded Markdown/HTML files usually contain images, which are only links to the original sites. I want to make these embedded images, so that I won’t lose them if the original site goes down.

I supposed converting them to web archive format would do the job, but when I look at the code of the web archive, it is still just a src attribute linking to the web image instead of an embedded image.

A possible workaround is to convert the file first to RTF(D) and then to web archive, but that does not retain the adaptive layout that CSS offers, so it is not ideal either.

I would like to ask whether this is the default behavior: if I convert a Markdown file with outgoing image links into a web archive, are the images not converted to embedded images? If so, is there any way to force the conversion to embed all linked images in my preferred format?

(My preferred formats are Markdown, HTML, and web archive, because they are better for mobile viewing. For now I am converting them to PDF because PDF still retains a good layout and embeds images. But PDF is generally bad on small screens.)

I realize that the newly released DevonThink 3.8 adds the ability to insert an image by simple drag and drop, with the image being added to a designated folder automatically. It would be very nice if you could extend this functionality to download all web images linked in the Markdown file and save them to a designated folder, preferably with smart rule/AppleScript support. Please consider.

Thank you!

cgrunenberg · October 11, 2021, 12:53pm

This isn’t possible (at least without scripting, parsing Markdown & downloading images on your own) but we’ll consider this for future releases.

vit · October 16, 2021, 10:52am

Thanks for considering! That will be useful for both web clipping and RSS (as automatic archive of some important sites).

chrillek · October 16, 2021, 2:39pm

You could write a script that loads the image, converts its content to base64 and replaces the src attribute with a data URL. This will, however, increase the file size tremendously and probably cause problems when you want to edit the MD file.
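
A minimal sketch of that first approach, in Python and using only the standard library. It assumes the images appear in Markdown image syntax (`![alt](url)`) rather than as HTML `<img>` tags; the names `embed_images`, `fetch` and `IMAGE_RE` are made up for this illustration and are not part of DEVONthink or any clipper.

```python
import base64
import re
import urllib.request

# Matches Markdown images such as ![alt](https://host/pic.png "optional title").
# Assumption: image URLs contain no whitespace or closing parentheses.
IMAGE_RE = re.compile(r'!\[([^\]]*)\]\((https?://[^\s)]+)([^)]*)\)')


def fetch(url):
    """Download url and return (raw bytes, MIME type). Needs network access."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read(), resp.headers.get_content_type()


def embed_images(markdown, fetcher=fetch):
    """Replace every remote image URL with a base64 data URL."""
    def to_data_url(match):
        alt, url, rest = match.groups()
        try:
            data, mime = fetcher(url)
        except OSError:
            # Download failed: keep the original link untouched.
            return match.group(0)
        encoded = base64.b64encode(data).decode("ascii")
        return f"![{alt}](data:{mime};base64,{encoded}{rest})"

    return IMAGE_RE.sub(to_data_url, markdown)
```

Calling `embed_images(text)` on the contents of an MD file returns the text with the images inlined. Base64 needs four characters for every three bytes, so each image grows by roughly a third, on top of making the file hard to read in a text editor.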

Alternatively, you could write a script that downloads the image and replaces the src attribute with a URL pointing to the local copy. That will break your MD file if you send it to someone else or want to use it outside of DT.
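
The second approach could be sketched like this, again in Python with the standard library only. The folder name `assets`, the hashed file names and the function names `localize_images` and `download` are assumptions chosen for the example; it handles Markdown image syntax only, not HTML `<img>` tags.

```python
import hashlib
import os
import re
import urllib.request
from urllib.parse import urlparse

# Captures the "![alt](" prefix and the remote URL of a Markdown image.
IMAGE_RE = re.compile(r'(!\[[^\]]*\]\()(https?://[^\s)]+)')


def download(url):
    """Return the raw bytes behind url. Needs network access."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def localize_images(markdown, folder="assets", downloader=download):
    """Save each remote image into folder and point the link at the copy."""
    os.makedirs(folder, exist_ok=True)

    def to_local(match):
        prefix, url = match.groups()
        try:
            data = downloader(url)
        except OSError:
            # Download failed: keep the original link untouched.
            return match.group(0)
        # Name the file after a hash of the URL so repeated runs are stable.
        ext = os.path.splitext(urlparse(url).path)[1] or ".img"
        name = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16] + ext
        with open(os.path.join(folder, name), "wb") as fh:
            fh.write(data)
        return f"{prefix}{folder}/{name}"

    return IMAGE_RE.sub(to_local, markdown)
```

The rewritten links are relative to wherever the MD file lives, so the file and the image folder have to travel together.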

There’s no perfect solution.

tedhogan · November 28, 2021, 3:06pm

That would be a fantastic feature! The problem with trying to persist anything from the web without downloading the images is that the web is not guaranteed to persist, and images often make up a significant part of the meaning of a resource (e.g., technical articles). The benefit of Markdown is that you gain text that is persisted and can be easily manipulated; the downside is that you do not have the images. Of course, you can just import via PDF and OCR it, which is what I do sometimes. It just feels like a lot of overhead for something that should be text and images. It’s basically using a whole platform for that.

chrillek · November 28, 2021, 3:58pm

There are formats that were developed to contain (!) text and images, like PDF. Others, like Markdown and HTML, were not developed with this goal in mind. Both of the latter can be made to include images, at the cost of huge disadvantages (horribly big files and loss of legibility).
So instead of forcing a hammer to behave like a screwdriver, use the screwdriver. Aka PDF (or rich text).

tedhogan · November 28, 2021, 6:31pm

I agree that we shouldn’t alter a standard format that doesn’t support the feature we want. I was approaching the discussion from a product perspective, not solutioning. Since we are now discussing solutions, I think that PDF is a very heavy format for this: you are essentially creating a different type of web platform and running that engine just to save some text and images. RTF seems to be a bit wonky, with the results not being a very accurate representation, in my experience.

Markdown is flawed, but it does allow the text to live as plain text, which is beneficial. I wasn’t thinking of modifying Markdown in any way. One solution might be to copy the images to local storage within DEVONthink and modify the links in the Markdown to point to the new location. As mentioned, this would make moving the file elsewhere difficult. However, it would be trivial within the scope of DEVONthink instances, and it is probably not too difficult to have some kind of export that would at least leave the file in an OK state, which might not even be necessary for many users. Something to consider at least.

cgrunenberg · February 25, 2022, 12:43pm

This is supported since version 3.8.1.

ulmulm · August 14, 2023, 7:57am

Is it possible to do the same thing when I import an MD file from the local disk?

cgrunenberg · August 14, 2023, 8:09am

No, this option supports only online images.

melik · April 20, 2025, 2:07pm

This script does not work for Firebase URLs which Roam Research and Tana use. I believe it works for URLs that end with image extensions like .png or .jpeg, but it fails when the URL doesn’t include an image extension. Is there any way to modify it to work with these URLs, or could you please share the script so I can customize it myself?
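
For illustration only, here is a Python sketch of how a check based on the server's Content-Type header differs from one based on the URL suffix. This is not the code of the script under discussion, whose actual behavior is unknown here; the function names and the example URL are hypothetical.

```python
import mimetypes
import urllib.request
from urllib.parse import urlparse


def looks_like_image_by_extension(url):
    """Guess from the URL path alone; misses URLs without a file extension."""
    mime, _ = mimetypes.guess_type(urlparse(url).path)
    return bool(mime) and mime.startswith("image/")


def head_content_type(url):
    """Ask the server for the MIME type with a HEAD request. Needs network."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.headers.get_content_type()


def is_image(url, content_type_of=head_content_type):
    """Accept a URL if its suffix or the server's Content-Type says image."""
    if looks_like_image_by_extension(url):
        return True
    try:
        return content_type_of(url).startswith("image/")
    except OSError:
        # Unreachable server or rejected request: treat as not an image.
        return False
```

With a hypothetical URL such as `https://example.com/o/abc123?alt=media`, the extension check returns `False`, while `is_image` can still succeed if the server reports an `image/...` type. Some servers reject HEAD requests, in which case a regular GET would be needed instead.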

BLUEFROG · April 20, 2025, 2:27pm

Are you referring to these…

melik · April 20, 2025, 5:19pm

Yes, I am.

If you don’t have a meaningful answer, then don’t bother replying.

chrillek · April 20, 2025, 7:50pm

The answer is meaningful. No one will bother to accommodate a deprecated technology. And a snappy attitude will probably not motivate more people to provide answers that you might consider “meaningful”.

And the script you’re referring to is provided neither by DEVONtechnologies nor by a forum regular. Just raise an issue on its GitHub page if it doesn’t do what you want.

melik · April 21, 2025, 4:53am

How did you think his answer was even remotely meaningful? Did you really believe that telling me it’s deprecated would somehow make my need disappear?

Also, I’m genuinely getting tired of empty, self-important replies that don’t move anything forward. If you don’t want to help, maybe just skip the thread instead of lecturing people.

chrillek · April 21, 2025, 8:22am

You refer to a script. The only script ever mentioned in this thread, which is nearly four years old, is the one in the first post, which is a browser extension available on GitHub. This forum is not about code on GitHub.

OTOH, you are mentioning three products (Firebase, Roam Research, Tana) without linking to them or providing any useful information as to what these products do, what kind of links they produce, what you’d like to happen, etc. Which means you expect readers either to know these products in detail or to find out what they do in order to provide answers that you deem “meaningful”.

Your final question is if you can “modify it to work with these URLs”, where “it” probably is “the script” – which is publicly available on GitHub. Of course, you can clone the repo and modify the code if you are so inclined. But as you’re responding to a post saying “this is supported since version 3.8.1”, you might be referring to something else – but what?

You want “meaningful” responses. What about “meaningful” questions that make it easy for people to help you? Most of us do not get paid to read and answer here, and we do it in our spare time. The only person who gets paid here asked you a simple question because your post was cryptic. And you felt immediately treated badly.

If you really want help, I suggest you start a new thread with a meaningful title and a post that enables people to understand what you’re trying to do and with which tools.

You are welcome to ignore this post.