Importing a URL, Markdown and PDF version of a website?

I used to add just URLs to DEVONthink or DTTG.
But converting them later to Markdown and PDF or WebArchive can be difficult, and it seems better to do this right from Safari and just do all the steps there …

So, is there some way to do exactly this?

Add the URL of the currently visible website to DT, ideally with some tags, and automatically also add the Markdown and PDF / WebArchive versions, using the same tags?

Just hoping :wink:

Otherwise, any idea how this could best be done?

Many thanks

Currently visible where?

Here is a possibility with a smart rule processing captured bookmarks…

The tags added in the clipping are preserved on the conversion.


I’m curious why you want to keep three versions of the same document (bookmark, Markdown and web archive)?

Welcome @lecrazyfrog

I am curious about that as well. :thinking: :slight_smile:

Ohh, that’s fantastic.
I will try this, many many thanks!

I just noticed that a later conversion sometimes fails or gives a different result compared to doing the same thing directly from the open page in Safari.

And this is what I meant: importing a version of a website that is currently visible in Safari … as it would be when adding it from Safari.
I don’t know any other way to do this.

Anyway, I don’t know of a way to run such a smart rule for a website …, so I would need to add the website from Safari first, then switch to DT and run the smart rule.

Or could this be done directly from Safari, so that only one step is required?

Strange question :wink:

  1. A bookmark is a URL, not the content …
    So this should be clear to you.

And for PDF, WebArchive and Markdown:

The content is different!
Sometimes massively different:
the type of content, what is included in general, and in which format.

Should be easy to understand.

The WebArchive should be the most natural version of a website that I want to archive.
It requires a web browser to view, but it is the best type of archive.

If I am only interested in the text, which may be the main part of a website, a Markdown archive may be far better and easier to handle and work on, for example in an editor.

But sometimes the content is a mix of text, layout and images; in this case, it is much better stored as a PDF and worked on in a PDF viewer.

So, I don’t really understand the question … it should be logical, given the type of content and the possible use cases.

And I don’t want to think hard about which version may be best!
I just want them all: I need the WebArchive anyway (it’s the best archive in case the website vanishes), plus Markdown or PDF … so why not both? Less thinking and fewer bad decisions. Quite easy :slight_smile:

You’re welcome :slight_smile:

I am using this Smart Rule now:

[Screenshot: smart rule settings, 2021-12-19]

While it works OK, I noticed problems with PDFs and WebArchives!

  1. PDFs often don’t show the relevant content but, for example, an overlay asking you to accept cookies.
    We already discussed this, and there doesn’t seem to be a solution.

Can the DT URL Importer (some other product; I can’t remember its real name) fix this?

  2. Strangely, WebArchives are NOT used to display the content when opened in Safari or DT.
    Instead, the REAL website gets opened … which is surely not what people want. It’s the same behavior as on the iPad.

On the iPad, I could fix this by simply disabling Internet access before opening the WebArchive.
But this would be a bigger problem on a Mac …

Is there any solution?

For example, the bookmark / webloc from this URL:

When converted to a WebArchive, it still refers to the real live website and also displays a cookie banner; sadly, this banner cannot even be closed!
The rectangle in the lower-left corner is always there!

  1. We have some modifications that may provide better capturing of dynamically delivered content, which is all too prevalent on sites nowadays.
  2. This is not a DEVONthink issue. The webarchive isn’t a static format in all cases. It captures the page’s markup. If that page loads dynamic content, that content won’t display when the network is disconnected, since the page contacts remote servers to download it.
    Again, this may be alleviated in some cases, on some sites, but you’re never going to have a 100% solution.
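To check which parts of a particular web archive still depend on the network, one could inspect the file directly. The sketch below is a minimal illustration, assuming the .webarchive is a property list using the keys Safari writes (`WebMainResource`, `WebSubresources`, `WebResourceURL`, `WebResourceData`, `WebResourceTextEncodingName`). The helper name `inspect_webarchive` is hypothetical, and it only scans absolute `src="http…"` attributes, so requests made by scripts at runtime are not detected.

```python
import plistlib
import re

# Absolute URLs in src="..." attributes (a rough heuristic, not an HTML parser).
SRC_PATTERN = re.compile(r"""src=["'](https?://[^"']+)["']""")


def inspect_webarchive(data: bytes) -> dict:
    """Report which src URLs in a .webarchive were not stored in the archive.

    Hypothetical helper; assumes the Safari-style property-list layout.
    """
    archive = plistlib.loads(data)  # accepts both binary and XML plists
    main = archive["WebMainResource"]
    encoding = main.get("WebResourceTextEncodingName", "utf-8")
    html = main["WebResourceData"].decode(encoding, errors="replace")

    # URLs of resources that were saved inside the archive itself.
    stored = {res["WebResourceURL"] for res in archive.get("WebSubresources", [])}

    # Referenced URLs, de-duplicated while keeping document order.
    referenced = list(dict.fromkeys(SRC_PATTERN.findall(html)))
    not_archived = [url for url in referenced if url not in stored]

    return {
        "main_url": main["WebResourceURL"],
        "stored": len(stored),
        "not_archived": not_archived,
    }
```

For example, `inspect_webarchive(open("page.webarchive", "rb").read())["not_archived"]` lists URLs that a viewer would presumably have to fetch live. An empty list does not prove the archive works offline, since script-driven requests are invisible to this check.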

So PDFs may get better soon?
Great to hear :slight_smile:

About the WebArchives …
When I create them manually in Safari, they look “better” to me.

I uploaded both versions and sent you a PM …

Also, I think that DT should offer a way to display the pure web archive content, without reaching out to the internet.

I had hoped that “Quick Look” from the “View” menu would offer this, but sadly, it also reaches out to the internet.