Clipping a web page as a Web Archive often fails and I end up with a bookmark

sardonicfireplace · September 11, 2024, 5:25pm

This has been happening for a while. When I’m browsing the web in Safari and want to capture a page as a Web Archive, I use the Safari extension, which then seems to open the Sorter. When I use the Sorter to save as a Web Archive, it very often saves a bookmark instead of a Web Archive. I can then convert the bookmark to a Web Archive inside DEVONthink, and that works.

Is there a reason this happens so often, or am I doing something wrong?

BLUEFROG · September 11, 2024, 5:29pm

Clipping has been discussed at length, and it is well known that if you’re getting a bookmark, it’s because the page couldn’t be clipped (for whatever reason). If this is persistent, reboot the Mac.

sardonicfireplace · September 11, 2024, 5:44pm

Ah, OK. It seems to be happening more often recently, and it’s also odd that the Sorter can’t clip the page but DEVONthink itself can.

Edit: I have no data to back up my claim about it happening more recently.

BLUEFROG · September 11, 2024, 5:54pm

Safari runs its own process, while converting a bookmark in DEVONthink happens in a different process.

fredap · September 12, 2024, 9:22pm

Yes, this has been discussed a lot in the past but I do believe there is an underlying bug here.

I have two reasons for saying this:

1. If I make the capture a second time, it succeeds 100% of the time. I have done this hundreds of times: make a capture, see a bookmark, make the capture again, and it succeeds.
2. I have an application (not public) that creates a web archive using the API. It doesn’t need to download anything, since the HTML content is part of the API request, but the request does contain the original URL the content comes from. That fails about 50% of the time and results in a bookmark. Again, doing the export a second time succeeds 100% of the time.

FrankT · September 13, 2024, 5:33pm

Haha, that’s actually true. Thank you very much for the valuable hint.

Now all we’d need is a rule: if a bookmark document and a web archive document have the same name, delete the bookmark.

Or does anyone have a better idea?
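The matching rule described above (trash a bookmark whenever a web archive with the same name exists) is simple to state as code. The Python below is only a minimal model of that logic under stated assumptions: `Record`, its `kind` labels and `redundant_bookmarks` are hypothetical names, not part of DEVONthink’s scripting interface, and a real version would have to be written against DEVONthink’s own scripting support.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for a DEVONthink record (not the real scripting API)."""
    name: str
    kind: str  # assumed labels: "bookmark" or "webarchive"


def redundant_bookmarks(records: list[Record]) -> list[Record]:
    """Return bookmarks whose name matches a web archive in the same group."""
    # Collect the names of every web archive first, then test bookmarks against them.
    archive_names = {r.name for r in records if r.kind == "webarchive"}
    return [r for r in records if r.kind == "bookmark" and r.name in archive_names]


if __name__ == "__main__":
    group = [
        Record("Example article", "bookmark"),
        Record("Example article", "webarchive"),
        Record("Page that never got clipped", "bookmark"),
    ]
    for rec in redundant_bookmarks(group):
        print("would move to trash:", rec.name)
```

Matching on name alone can misfire when two unrelated pages share a title; comparing the stored URL as well would likely be a safer key.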

fredap · September 13, 2024, 7:09pm

Sounds good. Please share it once you’ve created it.

BLUEFROG · September 13, 2024, 8:42pm

This won’t be accomplished without scripting, as you can’t use a placeholder in the criteria. Criteria have to be specific, like Name is such-and-so or URL begins with https://doublerainbow.com. Those things are obviously going to vary with the content being clipped.

FrankT · September 14, 2024, 8:21am

@fredap Since my imports (initially) are all in the same group, I tried it this way.

If the document is a web archive, nothing happens. If the document is a bookmark (instead of a web archive), it is converted to a web archive. Then a second rule is executed that moves all bookmarks in this group to the trash.

I don’t know if this is the best solution, but it seems to work for my purposes.

fredap · September 14, 2024, 8:50am

Looks pretty good to me. It took me a while to figure out that Lesezeichen means Bookmark.

I believe there is a small risk that the second rule removes a bookmark that hasn’t been converted yet.

You can address that by adding a tag in the first rule, e.g. converted_to_webarchive, and having the second rule check that the tag exists.

I also noticed that the second rule is set to Manual. Is that to address this issue, so you can be certain it really is the right article? If you add the tag, you could perhaps make this fully automated.
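The tag-based safeguard suggested here can be modeled in a few lines. This is a hypothetical Python sketch, not DEVONthink code: `Record`, `rule_convert`, `rule_trash` and the `convert` callback (standing in for DEVONthink’s bookmark-to-web-archive conversion) are all assumptions; only the tag name `converted_to_webarchive` comes from the thread. The trash step touches only bookmarks the convert step has explicitly marked as successfully converted, so a bookmark whose conversion failed is never deleted.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional

# Tag name suggested in the thread; everything else here is hypothetical.
CONVERTED_TAG = "converted_to_webarchive"


@dataclass
class Record:
    """Hypothetical stand-in for a DEVONthink record (not the real scripting API)."""
    name: str
    kind: str  # assumed labels: "bookmark" or "webarchive"
    tags: set[str] = field(default_factory=set)


def rule_convert(group: list[Record],
                 convert: Callable[[Record], Optional[Record]]) -> None:
    """First rule: convert untagged bookmarks and tag only the ones that succeeded."""
    for rec in list(group):  # iterate over a copy, since new archives are appended
        if rec.kind == "bookmark" and CONVERTED_TAG not in rec.tags:
            archive = convert(rec)  # assumed to return None when conversion fails
            if archive is not None:
                group.append(archive)
                rec.tags.add(CONVERTED_TAG)


def rule_trash(group: list[Record]) -> list[Record]:
    """Second rule: remove only bookmarks carrying the tag; return what was removed."""
    keep: list[Record] = []
    trashed: list[Record] = []
    for rec in group:
        if rec.kind == "bookmark" and CONVERTED_TAG in rec.tags:
            trashed.append(rec)
        else:
            keep.append(rec)
    group[:] = keep  # update the group in place
    return trashed


if __name__ == "__main__":
    def fake_convert(bookmark: Record) -> Optional[Record]:
        # Simulate one page that fails to convert.
        if bookmark.name == "Flaky page":
            return None
        return Record(bookmark.name, "webarchive")

    group = [Record("Example article", "bookmark"), Record("Flaky page", "bookmark")]
    rule_convert(group, fake_convert)
    print("trashed:", [r.name for r in rule_trash(group)])
    print("remaining:", [(r.name, r.kind) for r in group])
```

Because the second step keys on the tag rather than on every bookmark in the group, it no longer depends on running immediately after the first rule, which is what would make a fully automatic trigger safe.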

FrankT · September 14, 2024, 9:15am

Ah yes, sorry, my system is in German.

True, but everything in this group has already been converted.

This (second) rule should only be executed by the first rule, so only when there is exactly one bookmark in this group. I assume that “manual” means it is not executed automatically.