Search inside RSS feed's html file?

dogstir · May 1, 2022, 8:02pm

I subscribe to an RSS feed and the feed format is set to automatic. That means it downloads html files. The html files contain numerous links to websites. DEVONthink will search the displayed contents, but it will not search the internal contents of the html file.

For example, the html file displays Plexicam… but inside the html file the link is “https://www.plexicam.com/”. If I search DEVONthink for “Plexicam” it will find that file. However, if I search DEVONthink for “plexicam.com” it will NOT find that file.
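
A minimal sketch of why this happens, assuming the search index is built from the rendered text only (an assumption about the behavior described above, not DEVONthink's actual implementation). It uses Python's standard `html.parser`; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class LinkAndTextExtractor(HTMLParser):
    """Collect visible text and <a href> values separately (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.text_parts = []  # what a reader sees on screen
        self.hrefs = []       # link targets that live only in the markup

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def split_html(source):
    """Return (visible_text, hrefs) for an HTML string."""
    parser = LinkAndTextExtractor()
    parser.feed(source)
    parser.close()
    return "".join(parser.text_parts), parser.hrefs


sample = '<p>Check out <a href="https://www.plexicam.com/">Plexicam</a> today.</p>'
visible_text, hrefs = split_html(sample)

# A search over visible_text can match "Plexicam" but never "plexicam.com",
# because the URL only appears in hrefs.
print(visible_text)
print(hrefs)
```

An index that wanted to find "plexicam.com" would have to include the collected `hrefs` (or the raw source) alongside the visible text.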

Any solutions anyone can think of for this?

I have tried converting to every other available file type and it will not find the internal link data regardless of format.

Thanks, guys!

pete31 · May 1, 2022, 8:56pm

Convert to Markdown and use hidden preference IndexRawMarkdownSource.

From help:

IndexRawMarkdownSource: Index the source code of Markdown files instead of the rendered content only

You probably don’t want to convert every new record manually, so set up a Smart Rule:

kind:news
convert to Markdown
(and, if you like, delete the HTML record)

Didn’t test but it should work.

searchingforanswers · May 2, 2022, 10:30pm

Hi @dogstir - do you have a URL for the RSS feed?

That may be more useful to help troubleshoot.

What’s creating the RSS feed?

dogstir · May 3, 2022, 4:46pm

https://www.macgeekgab.com/rss/mgg_mp3.xml

dogstir · May 3, 2022, 4:46pm

I couldn’t get the hidden preferences to work, but thanks for the tip… I’ll keep trying.

BLUEFROG · May 3, 2022, 4:52pm

The hidden preference relates specifically to the source of Markdown files.

PS: Did you try @pete31’s smart rule suggestion?

searchingforanswers · May 3, 2022, 6:06pm

Hmm. That’s a puzzler. Just tried it myself, and I’m experiencing the same thing.

Let us know if you find a solution!

pete31 · May 3, 2022, 6:55pm

What did you try? This hidden preference is either disabled (which is the default) or enabled. So there’s little that could go wrong.

dogstir · May 3, 2022, 11:40pm

I tried the command line to enable that hidden preference.

pete31 · May 4, 2022, 12:02am

Please try this

menu Help > DEVONthink 3 help
search for IndexRawMarkdownSource
click the second search result
find the IndexRawMarkdownSource hidden preference
click the On link
restart DEVONthink

(I’m pretty sure my suggestion should have worked even without restarting DEVONthink, since the source of newly created Markdown records should be indexed immediately, i.e. you should have been able to find URLs in them. Rebuilding a database is only necessary for Markdown records that were created before IndexRawMarkdownSource was enabled, which is not the case here.)

BLUEFROG · May 4, 2022, 3:21am

A restart of DEVONthink never hurts when changing hidden prefs.

dogstir · May 4, 2022, 11:35pm

Okay, I finally got indexing of raw Markdown to work (confirmed by creating my own Markdown file and searching for a non-displayed link).
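
A rough sketch of that kind of check, assuming a simplified model in which "rendered" Markdown just replaces each inline link with its label. The regex and function names are hypothetical and cover only plain `[label](url)` links, not reference-style links or other Markdown syntax.

```python
import re

# Matches simple inline links like [label](url) or [label](url "title").
INLINE_LINK = re.compile(r"\[([^\]]*)\]\(([^)\s]+)[^)]*\)")


def rendered_text(markdown_source):
    """Approximate the displayed text by keeping only each link's label."""
    return INLINE_LINK.sub(r"\1", markdown_source)


def matches(query, text):
    """Case-insensitive substring search, standing in for a real index lookup."""
    return query.lower() in text.lower()


source = "Check out [Plexicam](https://www.plexicam.com/) today."

# Searching the rendered text misses the URL; searching the raw source finds it.
print(matches("plexicam.com", rendered_text(source)))
print(matches("plexicam.com", source))
```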

However, I have also confirmed that “convert to markdown” on an HTML file saved from the RSS feed does NOT include the links.

Thanks for all of the suggestions, though. Hopefully DEVONthink will add this feature soon.

pete31 · May 4, 2022, 11:47pm

It does include the links if you first move (or duplicate) the HTML record out of the feed.

That’s something I wasn’t aware of. Before posting the suggestion, I only tested whether converting a normal HTML record (i.e. one that was not created via a feed) to Markdown would work.

If you include a step in a Smart Rule that moves (or duplicates) the HTML record before converting it to Markdown, it should work.

dogstir · May 5, 2022, 1:12am

That did it! Thank you so much!

Also, for anyone who is looking at this in the future: I forgot to mention that the only way I could get the hidden preference setting to work was to right-click the “On” link, choose copy link, then paste the link into Safari and press Enter to open it there. For some reason, clicking the link in the help document does nothing.
