I’m trying to use TextStart and TextEnd to focus on the actual text content of a page. I have an entry that looks like:
"TextStart" : "<div class=\"usertext-body may-blank-within md-container \" ><div class=\"md\"><p>",
The problem is that the same HTML also appears earlier in the page, before the content I want, so I need to skip past its first occurrence. I can’t choose a different start string either, because the server generates the surrounding markup dynamically and there is nothing stable for me to match against.
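For reference, the behaviour I’m after (ignore the first match of the start marker and extract from the second one) can be sketched in Python. This is only an illustration of the idea, not how DEVONagent’s TextStart/TextEnd work internally; extract_after_nth is a hypothetical helper, and the "</p>" end marker below is a stand-in for whatever TextEnd holds.

```python
from typing import Optional


def extract_after_nth(html: str, start: str, end: str, n: int = 2) -> Optional[str]:
    """Return the text between the n-th occurrence of `start` and the next `end`.

    Hypothetical helper illustrating "skip past the first match";
    it is not DEVONagent's TextStart/TextEnd implementation.
    """
    search_from = 0
    for _ in range(n):
        pos = html.find(start, search_from)
        if pos == -1:
            return None  # fewer than n occurrences of the start marker
        search_from = pos + len(start)  # continue searching after this match
    stop = html.find(end, search_from)
    # If no end marker follows, return everything after the start marker.
    return html[search_from:] if stop == -1 else html[search_from:stop]


page = ('<div class="md"><p>sidebar blurb</p></div>'
        '<div class="md"><p>actual post</p></div>')
print(extract_after_nth(page, '<div class="md"><p>', '</p>'))  # actual post
```

With n=1 the helper behaves like a plain "first match" start marker, which is what currently picks up the wrong block.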
So my next thought was to use StripTags to strip out the content I don’t want. It sits in a simple sidebar that looks like <div class="side">... all the content ...</div>, so I tried:
"StripTags" : [
"<div class=\"side\">", "</div>"
]
Unfortunately, DEVONagent crashes when it tries to process the results. I don’t know whether I’m using StripTags correctly; the help says “[array] Array of strings marking HTML tags defining blocks to strip.” but doesn’t give any examples.
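One thing I’m unsure about: the sidebar almost certainly contains nested <div> elements, so if StripTags simply ends the block at the first "</div>" it finds (an assumption on my part; the help doesn’t say), it would stop inside the sidebar rather than at its end. The Python sketch below shows what depth-aware stripping of that block would look like. strip_div_block is a hypothetical helper, not DEVONagent’s algorithm, and the regex tag scan is a simplification rather than a real HTML parser.

```python
import re

# Matches an opening <div ...> or a closing </div>; group 1 is "/" for closing tags.
DIV_TAG = re.compile(r"<(/?)div\b[^>]*>", re.IGNORECASE)


def strip_div_block(html: str, open_tag: str) -> str:
    """Remove the block starting at `open_tag` (a <div ...> tag) up to its
    *matching* </div>, counting nested divs along the way.

    Hypothetical sketch of depth-aware stripping; not how StripTags works.
    Comments, scripts, or "<div" inside attribute values could fool the regex.
    """
    start = html.find(open_tag)
    if start == -1:
        return html  # nothing to strip
    depth = 0
    for m in DIV_TAG.finditer(html, start):
        depth += -1 if m.group(1) else 1
        if depth == 0:
            # Found the </div> that closes open_tag: cut the whole block out.
            return html[:start] + html[m.end():]
    return html  # unbalanced markup: leave the page unchanged


page = ('<div class="side"><div class="box">ad</div>links</div>'
        '<div class="md"><p>post</p></div>')
print(strip_div_block(page, '<div class="side">'))
# <div class="md"><p>post</p></div>
```

By contrast, cutting at the first "</div>" after <div class="side"> would end after "ad" and leave "links</div>" behind, producing an unbalanced fragment.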
How can I exclude part of the web page, and match against the second occurrence of my TextStart string?