Writing a DA plugin for Discourse-based forums

Hi there,

is it possible to write a DA plugin for forums based on Discourse (http://www.discourse.org), such as

sendegate.de
or
elixirforum.com

Is this possible, given that Discourse relies heavily on JavaScript?
If so, can you point me in the right direction?

.
Pfitz -

I also follow a Discourse forum (“Discourse” = software name).

There is no need to write a plugin for this.
DAgent can crawl the forum using just the URL.

Instructions:
DEVONagent Pro > Window > Search Sets > Sites > Location (at the bottom) > “+” (to add).

DAgent sets the mode to “crawl” by default.

Below is an example of how I do that for a Discourse forum.
If anything I wrote here is not clear, please tell me and I will revise until it is clear.

OK, that's one way to do it. But as I understand it, plugins have the advantage of getting better results than simply crawling a page; otherwise there would be no need for plugins.
That's why I'm asking whether it is possible to write a plugin for Discourse-based forums.

Thank you … I didn’t know that … so now I am interested in the answer.

Better results how? Faster? More link following?
Where did you see information about this advantage?
I want to read about it, too.

That's my poor man's understanding. Please correct me if I am wrong.

With a plugin you can narrow the results returned by the search more precisely and filter out pages that contain no useful information, etc.

A searchSet has that ability, just as a plugin does. A search set's settings include secondary terms, page scanners, etc. for this purpose.

No joy with these settings. I generally find that DEVONagent is blocked from searching Discourse fora. Perhaps an obscure setting needs tweaking? Not sure.

.
Korm -

Good catch – Thank you for looking carefully.

When I followed my own advice, it was worse than “no joy”.

  1. My results from the Discourse forum matched yours: “No Content”.
    The search term I used (“action”) should have produced hundreds or thousands of results.

  2. In DA search sets > sites, I checked [√] only the Discourse forum.
    But DA went ahead and crawled the UN-checked sites as well.

Something is wrong, but where?
I have no idea where to begin looking.

The OP, “Pfitz”, is certainly on target with his question.
It's beginning to look like finding the answer won’t be easy.
.

It’s usually a good idea to test a recommendation before posting it, so others don’t go down rabbit holes. :confused:

It’s always a good idea to check one’s premises before posting.
Because … the Internet is riddled with rabbit holes.

DEVONagent is an early attempt at building a boardwalk over those rabbit holes.
But, like all software, it is often remodeled.
And the Internet is continuously under construction, too.
So, expect rabbit holes ahead; lots and lots of them.

Here is the search I just now tested on a Discourse forum:

* site:forum.keyboardmaestro.com action
“Action” is the fundamental element in the software on that forum,
so I expected search results in the thousands.

To minimize variables, I did all searches using browser windows in DEVONagent.

Results:

  • Google: “About 2,220 results”

  • Yandex: “1 thousand results found”

  • Boardreader “no matches”

  • omgili: “no results”

  • Yahoo: “909 results”

  • DEVONagent search set/crawl: “no content”
    (Note: the results above are very messy in other ways.)

What I’ve learned so far:

  • Discourse does expose itself to search engines. (But maybe not fully. Didn’t evaluate that.)

  • Some search engines cannot read Discourse.

  • DEVONagent is one of those that cannot. That is a problem.
    (Or maybe the underlying problem is an error I made in my DA search set settings.)

I have no idea where to start looking for the problem.
Even if I found it, I don’t have the coding skills to work around it.

Thoughts, please.

(Thanks, again, to Pfitz, for opening this topic.)

Bump… Anyone at DevonTech willing/able to provide help/answers for this issue?

Discourse forums are getting quite popular, and searching them with DA would be very useful. The web crawl method usually finds only a few links, far fewer than the site’s own search returns.

EDIT July 1/2018 ----------------------
I got a little bit further with this. Using the usual method to determine the search string, Discourse sites show something like:


 <site url>/search?q=<your search string>  

If that is translated to a DA search query with:


<site url>/search?q=_agentQuery_  

we get nowhere. What comes back is a web page stating “best used with javascript turned on”. Checking the JavaScript option in DA does not seem to make a difference. However, I noticed that Discourse sites can also return JSON, so I changed the query to:


<site url>/search.json?q=_agentQuery_  

This returned at least some results that DA could process.
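For anyone testing outside DA, here is a minimal Python sketch of building that JSON search URL, assuming only the `/search.json?q=` pattern observed above. `discourse_search_url` is a hypothetical helper name; in a DA plugin the query would come from `_agentQuery_` instead, and the host below is the Mac Power Users forum used as the example further down.

```python
from urllib.parse import urlencode


def discourse_search_url(site_url, query):
    """Build a Discourse JSON search URL (hypothetical helper).

    Mirrors the <site url>/search.json?q=_agentQuery_ pattern from the post.
    """
    # Strip a trailing slash so we never produce "//search.json".
    base = site_url.rstrip("/")
    # urlencode percent-encodes the query (spaces become "+").
    return f"{base}/search.json?{urlencode({'q': query})}"


if __name__ == "__main__":
    print(discourse_search_url("https://talk.macpowerusers.com/", "hello computer"))
```

Encoding the query matters: a search term containing `&` or `#` would otherwise break the URL.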

All Discourse sites appear to return JSON of the following form:


{
    "posts": [
        {
            "id": 366,
            "name": "mpubot",
            "username": "mpubot",
            "avatar_template": "/user_avatar/talk.macpowerusers.com/mpubot/{size}/12_1.png",
            "created_at": "2018-05-18T14:10:25.888Z",
            "cooked": "<p>The \u201cLive\u201d show is now MPU+. We discuss Macs Sierra left behind, the problem with \u201cfree\u201d software, network vs. direct attached storage, listeners share feedback on Amazon, Ulysses and Scrivener. We learn how to print to PDF from Apple Mail and more.<br><br><a href=\"http://relay.fm/mpu/344\">Show Notes</a></p>",
            "like_count": 0,
            "blurb": "The \u201cLive\u201d show is now MPU+. We discuss Macs Sierra left behind, the problem with \u201cfree\u201d software, network vs. direct attached storage, listeners share feedback on Amazon, Ulysses and Scrivener. We learn how to print to PDF from Apple Mail and more. Show Notes",
            "post_number": 1,
            "topic_id": 360
        }
    ],
    "topics": [
        {
            "id": 360,
            "title": "344: MPU+: Hello, Computer",
            "fancy_title": "344: MPU+: Hello, Computer",
            "slug": "344-mpu-hello-computer",
            "posts_count": 1,
            "reply_count": 0,
            "highest_post_number": 1,
            "image_url": null,
            "created_at": "2018-05-18T14:10:25.816Z",
            "last_posted_at": "2018-05-18T14:10:25.888Z",
            "bumped": true,
            "bumped_at": "2018-05-18T14:10:25.888Z",
            "unseen": false,
            "pinned": false,
            "unpinned": null,
            "visible": true,
            "closed": false,
            "archived": false,
            "bookmarked": null,
            "liked": null,
            "tags": [],
            "category_id": 6
        }
    ]
}

(using the new Mac Power Users Discourse forum as an example).
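In that response each entry in `posts` points to its topic only through `topic_id`, while the title lives in `topics`. A minimal Python sketch of joining the two lists, assuming nothing beyond the field names in the sample (it says nothing about how DA itself reads the JSON); the sample is trimmed and its text shortened:

```python
import json

# Trimmed copy of the sample response above (only the fields used here).
SAMPLE = """
{
  "posts": [
    {"id": 366, "blurb": "The “Live” show is now MPU+.", "post_number": 1,
     "topic_id": 360}
  ],
  "topics": [
    {"id": 360, "title": "344: MPU+: Hello, Computer",
     "last_posted_at": "2018-05-18T14:10:25.888Z"}
  ]
}
"""


def join_posts_to_topics(data):
    """Pair each matching post with the topic it belongs to."""
    # Index topics by id so each post lookup is a dict access.
    topics_by_id = {t["id"]: t for t in data.get("topics", [])}
    joined = []
    for post in data.get("posts", []):
        topic = topics_by_id.get(post["topic_id"])
        if topic is None:
            continue  # skip posts whose topic is missing from the response
        joined.append({
            "topic_id": topic["id"],
            "title": topic["title"],
            "blurb": post["blurb"],
            "post_number": post["post_number"],
        })
    return joined


if __name__ == "__main__":
    for row in join_posts_to_topics(json.loads(SAMPLE)):
        print(row["title"], "|", row["blurb"])
```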

But here is where I get stuck: DA provides these keys to customize a plugin, but it is not entirely clear what to set them to:

ResultsKeyPath: set to 'topics' I think?
DatesKeyPath: set to 'last_posted_at'
LinksKeyPath:  set to what?
TitlesKeyPath: set to 'title'
DescriptionsKeyPath: set to 'fancy_title' ?
ContentsKeyPath: set to what?
ThumbnailsKeyPath: leave empty?

I also saw that /t/id leads to the actual post, but I have no idea how to specify this in the plugin definition.
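Part of the difficulty with LinksKeyPath is that the JSON has no URL field at all, so a link has to be built from the topic id. A minimal Python sketch of that flattening step, assuming the /t/id pattern mentioned above; the output keys `title`, `date` and `link` are hypothetical, and whether a DA plugin definition can perform this transformation itself is not established in this thread:

```python
import json

# Trimmed "topics" entry from the sample response above.
SAMPLE_TOPICS = [
    {
        "id": 360,
        "title": "344: MPU+: Hello, Computer",
        "slug": "344-mpu-hello-computer",
        "last_posted_at": "2018-05-18T14:10:25.888Z",
    }
]


def flatten_topics(site_url, topics):
    """Turn Discourse topics into flat records with an explicit link.

    The link is built as <site url>/t/<id>, since the JSON carries no URL.
    """
    base = site_url.rstrip("/")
    return [
        {
            "title": t["title"],
            "date": t["last_posted_at"],
            "link": f"{base}/t/{t['id']}",
        }
        for t in topics
    ]


if __name__ == "__main__":
    records = flatten_topics("https://talk.macpowerusers.com/", SAMPLE_TOPICS)
    print(json.dumps(records, indent=2))
```

With records shaped like this, each key path would point at a single flat key, which is the mapping the list above is trying to find.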

Comparing results from the site directly with DA's, DA returns either zero or far fewer results, even with the ‘similar filter’ turned off. I’ll keep tinkering with this, but it would be great if the boys and girls at DevonTech gave some helpful hints… I am not sure whether JSON is the way to go or whether a special Discourse plugin needs to be developed…

Anyway, fooling around with this stuff during the Russia-Spain soccer match seemed more interesting than the match (haha :wink: )

OT, but my favourite comment on that match was from a report that said:

“The VAR team had very little to look at. Much like everyone else in the stadium.”

Would love to get some discussion/answers from the Devon team on this issue…

Yes, one would think there would be at least some sort of developer response on this.

SG