Search inside RSS feed's html file?

I subscribe to an RSS feed and the feed format is set to automatic, which means it downloads HTML files. The HTML files contain numerous links to websites. DEVONthink will search the displayed text, but it will not search the underlying source of the HTML file, so the link URLs can’t be found.

For example, the html file displays Plexicam… but inside the html file the link is “https://www.plexicam.com/”. If I search DEVONthink for “Plexicam” it will find that file. However, if I search DEVONthink for “plexicam.com” it will NOT find that file.
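To make the distinction concrete, here is a minimal Python sketch of the difference between the text a reader sees and the URLs stored in the HTML source. It only illustrates the general idea and is not how DEVONthink's indexer works; the class name `TextAndLinks`, the helper `split_html`, and the sample markup are made up for this example.

```python
from html.parser import HTMLParser


class TextAndLinks(HTMLParser):
    """Collect the visible text and the href values separately (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The link target lives in an attribute, not in the displayed text.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Everything between tags is what a reader actually sees.
        self.text_parts.append(data)


def split_html(source):
    """Return (visible_text, list_of_link_urls) for an HTML string."""
    parser = TextAndLinks()
    parser.feed(source)
    parser.close()
    return "".join(parser.text_parts), parser.links


# Hypothetical sample resembling the feed item described above.
sample = '<p>Sponsor: <a href="https://www.plexicam.com/">Plexicam</a></p>'
visible, links = split_html(sample)

print("plexicam" in visible.lower())                # found in the displayed text
print("plexicam.com" in visible.lower())            # not found in the displayed text
print(any("plexicam.com" in url for url in links))  # found only in the source
```

A search that only looks at `visible` matches “Plexicam” but never “plexicam.com”, which is the behavior described above.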

Can anyone think of a solution for this?

I have tried converting to every other available file type, and search does not find the link URLs in any of those formats.

Thanks, guys!


Convert to Markdown and use the hidden preference IndexRawMarkdownSource.

From help:

IndexRawMarkdownSource: Index the source code of Markdown files instead of the rendered content only


You probably don’t want to convert every new record manually, so set up a Smart Rule:

  • kind:news
  • convert to Markdown
  • (and, if you like, delete the HTML record)

I didn’t test it, but it should work.
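As a rough illustration of why indexing the Markdown source helps: in Markdown the URL is part of the plain-text source, while the rendered view shows only the link label. The Python sketch below approximates the rendered text by replacing inline links with their labels. The regular expression and the function name `rendered_text` are assumptions for this example, handle only simple `[label](url)` links, and say nothing about DEVONthink's own converter or indexer.

```python
import re

# Matches simple inline Markdown links of the form [label](url).
# Assumption: no nested brackets or parentheses inside the link.
INLINE_LINK = re.compile(r"\[([^\]]*)\]\([^)]*\)")


def rendered_text(markdown_source):
    """Approximate the rendered text by keeping only each link's label."""
    return INLINE_LINK.sub(r"\1", markdown_source)


# Hypothetical Markdown produced from the feed item.
source = "Sponsor: [Plexicam](https://www.plexicam.com/)"

print(rendered_text(source))                    # what the rendered view would show
print("plexicam.com" in rendered_text(source))  # a rendered-only index misses the URL
print("plexicam.com" in source)                 # a raw-source index can match it
```

With only the rendered content indexed, a search for the URL fails; with the raw source indexed, the same search can succeed.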

Hi @dogstir - do you have a URL for the RSS feed?

That may be more useful to help troubleshoot.

What’s creating the RSS feed?

https://www.macgeekgab.com/rss/mgg_mp3.xml

I couldn’t get the hidden preferences to work, but thanks for the tip… I’ll keep trying.

The hidden preference relates specifically to the source of Markdown files.

PS: Did you try @pete31’s smart rule suggestion?

Hmm. That’s a puzzler. Just tried it myself, and I’m experiencing the same thing.

Let us know if you find a solution!

What did you try? This hidden preference is either disabled (which is the default) or enabled. So there’s little that could go wrong.

I tried the command line to enable that hidden preference.

Please try this:

  • menu Help > DEVONthink 3 help
  • search for IndexRawMarkdownSource
  • click the second search result
  • find the IndexRawMarkdownSource hidden preference
  • click the On link
  • restart DEVONthink

(I’m pretty sure my suggestion should have worked even without restarting DEVONthink, since the sources of newly created Markdown records should be indexed immediately, i.e. you should have been able to find URLs in them. Rebuilding a database is only necessary for Markdown records that were created before IndexRawMarkdownSource was enabled, but that’s not the case here.)

A restart of DEVONthink never hurts when changing hidden prefs.

Okay, I finally got indexing of raw Markdown to work (confirmed by creating my own Markdown file and searching for a non-displayed link).

However, I have also confirmed that using “convert to Markdown” on an HTML file saved from the RSS feed does NOT include the links.

Thanks for all of the suggestions, though. Hopefully DEVONthink will add this feature soon.

It does include the links if you first move (or duplicate) the HTML record out of the feed.

That’s something I wasn’t aware of. Before posting the suggestion, I only tested whether converting a normal HTML record (i.e. one that was not created via a feed) to Markdown would work.

If you include a step in the Smart Rule that moves (or duplicates) the HTML record out of the feed before converting it to Markdown, it should work.

That did it! Thank you so much!

Also, for anyone who is looking at this in the future, I forgot to mention that the only way I could get the hidden preference setting to work was to right-click the “On” link, copy the link, paste it into Safari’s address bar, and press Enter to open it. For some reason, clicking the link in the help document does nothing.
