Markdown - auto download linked web images?

Recently I have been using the MarkDownload extension for web clipping quite a bit, because it is sleek and generates good results. I also use DEVONthink as an RSS archive, with HTML as the default news format.

The downloaded Markdown/HTML files usually contain images, which are only links to the original sites. I want to embed these images, so that I won’t lose them if the original site goes down.

I supposed converting them to web archive format would do the job, but when I look at the code of the web archive, it is still just a src linking to the web image rather than an embedded image.

A possible workaround is to convert it first to RTF(D) and then to web archive, but that loses the adaptive layout that CSS provides, so it is not ideal either.

I would like to ask whether this is the default behavior: if I convert a Markdown file with external image links into a web archive, are the linked images not converted to embedded images? If so, is there any way to force the conversion to embed all linked images in my preferred format?

(My preferred formats are Markdown, HTML, and web archive, because they are better for mobile viewing. For now I am converting to PDF, because PDF retains a good layout and embeds the images. But PDF is generally bad on small screens.)

I see that the newly released DEVONthink 3.8 adds the ability to insert an image by simple drag and drop, with the image saved to a designated folder automatically. It would be very nice if you could extend this functionality to download all web images linked in a Markdown file and save them to a designated folder, preferably with smart rule/AppleScript support. Please consider it.

Thank you!

This isn’t possible (at least without scripting, parsing Markdown & downloading images on your own) but we’ll consider this for future releases.

Thanks for considering! That will be useful for both web clipping and RSS (as automatic archive of some important sites).

You could write a script that loads the image, converts its content to base64 and replaces the src attribute with a data url. This will, however, increase the file size tremendously and probably cause problems when you want to edit the MD file.
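A minimal sketch of the data URL approach, written in Python rather than AppleScript and not tied to DEVONthink in any way. The names `embed_images`, `fetch_bytes`, and `IMAGE_RE` are made up for illustration, and the regular expression only handles the plain inline `![alt](url)` syntax, not HTML `<img>` tags or reference-style images.

```python
import base64
import mimetypes
import re
import urllib.request

# Inline Markdown images with an http(s) URL: ![alt](https://... "optional title")
IMAGE_RE = re.compile(r'!\[([^\]]*)\]\((https?://[^)\s]+)([^)]*)\)')


def fetch_bytes(url):
    """Download a URL and return (raw bytes, MIME type reported by the server)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read(), resp.headers.get_content_type()


def embed_images(markdown, fetch=fetch_bytes):
    """Replace remote image links with base64 data URLs (hypothetical helper)."""
    def replace(match):
        alt, url, rest = match.groups()
        try:
            data, mime = fetch(url)
        except OSError:
            # Download failed (urllib errors derive from OSError): keep the link.
            return match.group(0)
        # Fall back to guessing from the file extension if no type was reported.
        mime = mime or mimetypes.guess_type(url)[0] or "application/octet-stream"
        encoded = base64.b64encode(data).decode("ascii")
        return f"![{alt}](data:{mime};base64,{encoded}{rest})"

    return IMAGE_RE.sub(replace, markdown)
```

Calling `embed_images(text)` on the contents of a clipped file returns the same text with each reachable image inlined. Base64 adds roughly a third to the size of every image, which is where the file size problem mentioned above comes from.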

Alternatively, you could write a script that downloads the image and replaces the src attribute with a URL pointing to the local copy. That will break your MD file if you send it to someone else or want to use it outside of DT.
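The local copy variant could look roughly like this, again as a Python sketch under assumptions: `localize_images`, `download`, and the assets folder are hypothetical, the rewritten links are relative and only resolve if the Markdown file sits next to that folder, and only inline `![alt](url)` images are handled.

```python
import hashlib
import os
import re
import urllib.request
from urllib.parse import urlparse

# Captures the "![alt](" prefix and the remote URL of an inline Markdown image.
IMAGE_RE = re.compile(r'(!\[[^\]]*\]\()(https?://[^)\s]+)')


def download(url):
    """Download a URL and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def localize_images(markdown, assets_dir, fetch=download):
    """Save remote images into assets_dir and rewrite the links (hypothetical helper)."""
    os.makedirs(assets_dir, exist_ok=True)
    folder = os.path.basename(os.path.normpath(assets_dir))

    def replace(match):
        prefix, url = match.groups()
        try:
            data = fetch(url)
        except OSError:
            # Download failed: leave the original remote link in place.
            return match.group(0)
        # Name the file after a hash of the URL so repeated runs reuse one name.
        ext = os.path.splitext(urlparse(url).path)[1] or ".img"
        name = hashlib.sha1(url.encode("utf-8")).hexdigest()[:12] + ext
        with open(os.path.join(assets_dir, name), "wb") as fh:
            fh.write(data)
        return f"{prefix}{folder}/{name}"

    return IMAGE_RE.sub(replace, markdown)
```

Hashing the URL for the file name avoids collisions between images that share a name such as `image.png` on different sites, at the cost of unreadable file names.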

There’s no perfect solution.

That would be a fantastic feature! The problem with trying to persist anything from the web without downloading the images is that the web is not guaranteed to persist, and images often make up a significant part of the meaning of a resource (e.g., in technical articles). The benefit of Markdown is that you get text that persists and can be easily manipulated; the downside is that you do not have the images. Of course, you can just import as PDF and OCR it, which is what I do sometimes. It just feels like a lot of overhead for something that should be text and images. It’s basically using a whole platform for that.

There are formats that were developed to contain (!) text and images, like PDF. Others, like Markdown and HTML, were not developed with this goal in mind. Both of the latter can be made to include images, but with huge disadvantages (horribly big files and loss of legibility).
So instead of forcing a hammer to behave like a screwdriver, use the screwdriver, i.e. PDF (or rich text).

I agree that we shouldn’t alter a standard format that doesn’t support the feature we want. I was approaching the discussion from a product perspective, not proposing a solution. Since we are now discussing solutions: I think PDF is a very heavy format for this. You are essentially creating a different type of web platform and running that engine just to save some text and images. RTF seems a bit wonky, and in my experience the results are not a very accurate representation of the original.

Markdown is flawed, but it does allow the text to live as plain text, which is beneficial. I wasn’t thinking of modifying Markdown in any way. One solution might be to copy the images to local storage within DEVONthink and modify the links in the Markdown to point to the new location. As mentioned, this would make moving the file elsewhere difficult. However, it would be trivial within the scope of DEVONthink instances, and it is probably not too difficult to offer some kind of export that would at least leave the file in an OK state. That might not even be necessary for many users. Something to consider at least.