Problem: capturing content from Safari via Sorter brings up empty file in DT3

bdhond · August 13, 2023, 7:44pm

I’m having this problem, have had it for a long time.
I clip loads of stuff from Safari as rich text, via the Sorter.
Usually a rich text version of the page appears in the Inbox. But sometimes the log window pops up and says “bookmark” and in that case, indeed, a bookmark to the page is stored instead. This is pretty random: repeating the clip from the same web page usually results in an actual rich text document.
Sometimes an empty file is produced instead, with a length of 0 bytes, although it does have a correct title, URL and the tag I gave it. This is also random: using the ‘launch URL’ function I can go back to the page, repeat the clipping and it will work correctly (usually) the next time.
This is frequent enough that I now have a smart group “Empty” to spot such files and do the chore of clipping them again…
It’s just not working reliably and thus a major irritant.
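The “Empty” smart group described above amounts to a size check. As a rough illustration only, and not anything DEVONthink itself provides, here is a minimal Python sketch that lists zero-byte rich text files. It assumes the clipped documents are reachable as ordinary `.rtf` files in some folder on disk; the function name `find_empty_files` and the folder path are hypothetical.

```python
from pathlib import Path


def find_empty_files(folder, suffixes=(".rtf",)):
    """Return the sorted paths of zero-byte files under `folder`.

    `folder` is an assumed location of exported or indexed documents;
    it is not a path that DEVONthink is known to use.
    """
    root = Path(folder)
    return sorted(
        path
        for path in root.rglob("*")
        # Only regular files with a matching extension and no content.
        if path.is_file()
        and path.suffix.lower() in suffixes
        and path.stat().st_size == 0
    )
```

Calling `find_empty_files("/path/to/folder")` returns the candidates that would need to be clipped again.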

cgrunenberg · August 14, 2023, 7:54am

This might be a network issue or an issue related to dynamic websites. Does a certain URL always cause this? One alternative is to use the Take Rich Note service instead, but you first have to select the interesting part of the webpage. Not all browsers support this service (Safari does). The URL is clipped in this case too.

bdhond · August 14, 2023, 2:22pm

It seems random, and the very same site will on some subsequent try store correctly. My log now shows dozens of failed (“bookmark”) attempts to save pages from the NYT and Wikipedia, but usually these are no problem.
Because of this, I also have a shortcut-activated rule to convert any bookmarks in my Inbox to rich text, which will usually succeed, proving there’s nothing wrong with those pages. But this doesn’t work on the empty files; those just give an error message. Which is strange in itself, because in both cases the URL is available. So for those I have to do a ‘launch URL’ and try the whole thing anew, usually succeeding on the second try, but again not always. You can see why this will mess up a workflow…
If it’s a network issue, I’d be inclined to call it a time-out issue… DT should maybe not be so quick to give up?
And finally, when hopefully testing it, the document created by the service “Take Rich Note” did not have a URL attached to it, which is crucial for me.

cgrunenberg · August 14, 2023, 2:59pm

Which browser do you use? Is DEVONthink allowed to automate the browser? In addition, which version of macOS and DEVONthink do you use?

rmschne · August 14, 2023, 3:00pm

The New York Times has a lot of programming behind the scenes to deliver HTML to you. When I was a subscriber, very little of what they published was easily captured in DEVONthink. As with other mainstream media newspapers, often the best way to capture is to “print” to PDF (or hit the Print button they provide, but that is rare nowadays), and “save to DEVONthink”.

I don’t do a lot of saves from Wikipedia, but in a small test I see that Clutter Free PDF and Markdown do not preview, while PDF does, as does a “print” to PDF saved to DEVONthink.

Sometimes, if I really want a web site page, I use DEVONtechnologies’ “DEVONagent Pro”, and for whatever reason it sometimes gets the page when the Clipper doesn’t.

I think the “randomness” you see is due to the diversity of technology running the different web sites. The internet is a complex place.

bdhond · August 14, 2023, 3:08pm

I appreciate the input, was editing my reply before realizing it was replied to, so repeating a few points here:
NYT and Wikipedia are usually no problem. It’s not the sites. It’s DT.
I also have a shortcut-activated rule to convert any bookmarks this problem makes appear in my Inbox to rich text, which will usually succeed, again proving there’s nothing wrong with those pages.
But this doesn’t work on the empty files; those just give an error message. Which is strange in itself, because in both cases the URL is available. So for those I have to do a ‘launch URL’ and try the whole thing anew, usually succeeding on the second try, but again not always. You can see why this will mess up a workflow…
If it’s a network issue, I’d be inclined to call it a time-out issue… DT should maybe not be so quick to give up?
And finally, when hopefully testing it, the document created by the service “Take Rich Note” did not have a URL attached to it, which is crucial for me.

bdhond · August 14, 2023, 3:11pm

Safari. 16.5.1
DT Pro: 3.9.2
Allowed to automate Safari: I don’t know what that means, I get no requests for that and the Sorter often works fine with it. Anything I need to do?

cgrunenberg · August 14, 2023, 3:12pm

See System Settings > Security & Privacy > Automation > DEVONthink 3

bdhond · August 14, 2023, 3:18pm

Safari automation is and was activated.

cgrunenberg · August 14, 2023, 3:23pm

In that case the URL should actually be stored when using services. Does a reboot fix this? If not then a screenshot of System Settings > Security & Privacy > Automation > DEVONthink 3 would be great, thanks.

BLUEFROG · August 14, 2023, 3:43pm

I wouldn’t be too surprised at this. When browsing the web, you are still connecting to a remote server whose connection isn’t going to be static. As an example, notice how YouTube or Netflix will stall and buffer. Add to this the dynamic content coming from remote(r) servers, sending data into the page you’re viewing. This daisychain of servers and networks isn’t a simple straight pipe from the NYT to your device.

bdhond · August 14, 2023, 3:48pm

Wouldn’t the same reasoning explain why it’s impossible for a browser to reliably display web pages…?
Anyway, seeing that web browsers can do this, I could work with selecting text and clipping, if a URL came with it (in case I need more).
But that isn’t the case, even after a reboot. Screenshot attached.

cgrunenberg · August 14, 2023, 4:00pm

Browsers and automation are different stories, e.g. browsers can (and sometimes do) endlessly load additional stuff if required but automation sooner or later has to time out. And of course there’s no user interaction during automation but some web pages require this.
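To make the time-out point concrete: an automated fetch needs a bounded wait, but it can still retry before giving up, much like the manual “clip it again” step described earlier in the thread. The following is a minimal Python sketch of that idea using only the standard library. It is not how DEVONthink or the Sorter work internally; `fetch_with_retry` and all of its parameters are hypothetical names chosen for illustration.

```python
import time
import urllib.request


def fetch_with_retry(url, attempts=3, timeout=10.0, backoff=2.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Download `url`, retrying when the request fails or the body is empty.

    Each attempt waits at most `timeout` seconds. Between attempts the
    pause doubles, starting at `backoff` seconds.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            with opener(url, timeout=timeout) as response:
                body = response.read()
            if body:
                return body
            # A zero-byte result is treated as a failure worth retrying.
            last_error = ValueError("empty response body")
        except OSError as exc:
            # urllib.error.URLError and socket timeouts are OSError subclasses.
            last_error = exc
        if attempt < attempts - 1:
            sleep(backoff * (2 ** attempt))
    raise RuntimeError(
        f"giving up on {url} after {attempts} attempts"
    ) from last_error
```

The trade-off is the one described above: a longer `timeout` or more `attempts` makes an empty result less likely, but the automation still has to stop at some point and report failure rather than wait forever.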

bdhond · August 19, 2023, 7:46pm

I can credit that web content is dynamic and so grabbing content is often not perfect, or even impossible.
My workaround for this is that I can at least open the resulting rich text file and do the following dance:
open document
launch URL (in Safari)
go to Reader Mode (if possible)
select all - copy all
go back to DT
select all - paste.
This is relatively painless, because I’ve automated it in Keyboard Maestro.
But it does depend on there being a rich text file to open. And that is something that I think I should be able to count on for DT to do: if I clip something to a rich text file, the result should be a new rich text file in my Inbox. Not a bookmark, as sometimes randomly happens. Not a defective URL-less empty document, as sometimes randomly happens.
Or am I missing a good reason for this behavior?

BLUEFROG · August 19, 2023, 9:49pm

In Safari you can select text in a document, then use Safari > Services > DEVONthink 3: Take Rich Note. This produces a rich text file in DEVONthink without going through the browser extension.

bdhond · August 20, 2023, 2:13am

I’ve tried that. It has some disadvantages.
For instance, this leaves the focus in the Sorter dialog inside the body, at the end of the clipped text. As far as I know, I have to use the mouse to get out of that and get the focus on the tag field, which I always use. This makes it hard to dispatch with just the keyboard (my preference) or to automate.