RSS feed not retrieving all items

I recently used the n8n automation platform to create an on-demand RSS feed for the journal First Monday.

Example Feed XML:

<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
  <channel>
    <title>First Monday</title>
    <link>https://firstmonday.org/ojs/index.php/fm/index</link>
    <description>First Monday is one of the first openly accessible, peer–reviewed journals on the Internet, solely devoted to the Internet.</description>
    <item>
      <title>Twitter spam and false accounts prevalence, detection, and characterization: A survey</title>
      <link>https://firstmonday.org/ojs/index.php/fm/article/view/12872/10749</link>
      <description><![CDATA[The issue of quantifying and characterizing various forms of social media manipulation and abuse has been at the forefront of the computational social science research community for over a decade. In this paper, I provide a (non-comprehensive) survey of research efforts aimed at estimating the prevalence of spam and false accounts on Twitter, as well as characterizing their use, activity, and behavior. I propose a taxonomy of spam and false accounts, enumerating known techniques used to create and detect them. Then, I summarize studies estimating the prevalence of spam and false accounts on Twitter. Finally, I report on research that illustrates how spam and false accounts are used for scams and frauds, stock market manipulation, political disinformation and deception, conspiracy amplification, coordinated influence, public health misinformation campaigns, radical propaganda and recruitment, and more. I will conclude with a set of recommendations aimed at charting the path forward to combat these problems.]]></description>
      <guid isPermaLink="false">b31e78b95c3f841f64bf74695bb81353</guid>
    </item>
    <item>
      <title>Dreading big brother or dreading big profit?</title>
      <link>https://firstmonday.org/ojs/index.php/fm/article/view/12679/10753</link>
      <description><![CDATA[States and companies around the world have intensified their collection of personal information. China’s information state and its digital economy are particularly industrious data collectors. The resulting extensive exposure of Chinese citizens’ personal information could reasonably provoke privacy concerns. To date, the relative distribution of concerns toward government and companies, as well as the structural and ideological roots of privacy concerns in China, are not yet well understood. Concerns over personal information being combined in a big data scenario have not yet been examined in the Chinese context. Drawing on an original online survey from 2019 (N = 1,500), representative of the Chinese online population, this study reveals that concerns about data collection by government are low, albeit modestly elevated among individuals who are ideologically not aligned with the state. By contrast, concerns over data collection by companies are both extensive and consensual across key socio-structural and ideological divides. Surprisingly, the combination of government and commercially collected personal information does not multiply concerns. Thus, the Chinese authoritarian information state is perceived as a safety device for, rather than a threat to, citizens’ personal information. Extensive state interventions in the digital economy converge with broadly shared popular concerns about corporate information privacy practices.]]></description>
      <guid isPermaLink="false">1b79b11f21d3c549a6a426ea8dfa46af</guid>
    </item>
    <item>
      <title>Radical bubbles on YouTube? Revisiting algorithmic extremism with personalised recommendations</title>
      <link>https://firstmonday.org/ojs/index.php/fm/article/view/12552/10752</link>
      <description><![CDATA[Radicalisation via algorithmic recommendations on social media is an ongoing concern. Our prior study, Ledwich and Zaitsev (2020), investigated the flow of recommendations presented to anonymous control users with no prior watch history. This study extends our work on the behaviour of the YouTube recommendation algorithm by introducing personalised recommendations via personas: bots with content preferences and watch history. We have extended our prior dataset to include several thousand YouTube channels via a machine learning algorithm used to identify and classify channel data. Each persona was first shown content that corresponded with their preference. A set of YouTube content was then shown to each persona. The study reveals that YouTube generates moderate filter bubbles for most personas. However, the filter bubble effect is weak for personas who engaged in niche content, such as Conspiracy and QAnon channels. Surprisingly, all political personas, excluding the mainstream media persona, are recommended less videos from the mainstream media content category than an anonymous viewer with no personalisation. The study also shows that personalization has a larger influence on the home page rather than the videos recommended in the Up Next recommendations feed.]]></description>
      <guid isPermaLink="false">4fab72a0406c6a1f402e74e42a808ed1</guid>
    </item>
    <item>
      <title>Pundits, presenters, and promoters: Investigating gaps in digital production among social media users using self-reported and behavioral measures</title>
      <link>https://firstmonday.org/ojs/index.php/fm/article/view/11604/10748</link>
      <description><![CDATA[Through the lens of Bourdieu’s field theory, we investigate the relationship between the social characteristics of social media users and their differentiating practices in producing digital content. Matching survey data with self-reported user profiles and one year of actual posts on Twitter, we found four online fields of lifecasting, politics, promotion, and entertainment. Users tweeting positively about entertainment held higher levels of social capital. From 2011 and 2017, we found a reduction in lifecasting was accompanied by the rise of promotion.]]></description>
      <guid isPermaLink="false">13595a7e5470cc6f6a4ba92a2208d33f</guid>
    </item>
    <item>
      <title>Evaluating e-Government: Themes, trends, and directions for future research</title>
      <link>https://firstmonday.org/ojs/index.php/fm/article/view/12526/10750</link>
      <description><![CDATA[As observed by e-Government scholars, the use of digital technology in the public sector has intensified during the last decade. With digital infrastructures becoming more complex, we argue that it is important to study the evaluation of their use. In this paper, we study the literature on the evaluation of e-Government. With a bibliometric analysis of keywords, combined with a narrative analysis of highly cited papers, we sought to characterize the literature on evaluation and identify themes and trends to propose directions for further research. Our findings reveal seven themes of e-Government evaluation research. Our analysis of highly cited papers disclosed that the literature is characterized by a service-dominant logic, with citizens’ adoption of government services and Web sites being assessed by statistical analysis of surveys. We note the shaky theoretical foundations of e-Government evaluation, a field that has been subject to a plethora of localized models. We conclude that evaluation efforts in e-Government research have been characterized by studies in which digital technology efforts have been judged on narrow grounds, such as ease of use. Based on these findings, we propose a research agenda that includes a shift from evaluation of services to a focus on “big questions”, such as emancipation and democracy. We propose that scholars should undertake more case studies of evaluative practices to form stronger theories with solid empirical foundations.]]></description>
      <guid isPermaLink="false">9272d6d9399d8e8d260570fc7753c692</guid>
    </item>
    <item>
      <title>Leaking black boxes: Whistleblowing and big tech invisibility</title>
      <link>https://firstmonday.org/ojs/index.php/fm/article/view/12670/10751</link>
      <description><![CDATA[In a time when socially impactful technology plays a central part in a variety of political and societal dynamics and processes, new forms of secrecy have emerged. The “black box” metaphor is used to define socio-technical systems that operate in non-transparent and prone-to-abuse ways. Frequently, Big Tech companies, their platforms, services and practices have been described as such, especially for their secretive nature and lack of transparency. Whistleblowers and leaks have contributed extensively and at various levels to the understanding of these systems, providing otherwise unaccessible information for public debate. Based on the discussion of a series of recent instances and a review of the available literature, this paper discusses the peculiarities of whistleblowing from Big Tech companies, and how the practice is helping to shed light on various and new technological black boxes and secrecy, while also expanding the scope of whistleblowing itself.]]></description>
      <guid isPermaLink="false">a76e59a79339726a6893a55d3d070016</guid>
    </item>
  </channel>
</rss>

This feed generates 6 items (in roughly 4.5 seconds). In the example XML feed they are, in order:

  • (1) Twitter spam and false accounts prevalence, detection, and characterization: A survey
  • (2) Dreading big brother or dreading big profit?
  • (3) Radical bubbles on YouTube? Revisiting algorithmic extremism with personalised recommendations
  • (4) Pundits, presenters, and promoters: Investigating gaps in digital production among social media users using self-reported and behavioral measures
  • (5) Evaluating e-Government: Themes, trends, and directions for future research
  • (6) Leaking black boxes: Whistleblowing and big tech invisibility

However, DTP3 only retrieves items 2, 4, and 6; it appears to be skipping every other item.
Is this due to how I am formatting the feed, or is it DTP3?
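
For anyone who wants to double-check the pasted XML outside of DTP3, a minimal sketch along these lines would parse it and list the items it contains. It assumes Python with the third-party feedparser library installed and the example XML saved locally as feed.xml (both the tooling and the file name are my assumptions, not part of the original setup):

import feedparser

# Parse a saved copy of the example feed (feed.xml is a placeholder file name).
parsed = feedparser.parse("feed.xml")

# feedparser sets bozo when it hits a well-formedness problem.
print("well-formed:", not parsed.bozo)
print("items found:", len(parsed.entries))

for i, entry in enumerate(parsed.entries, start=1):
    # Each item should expose its title and the non-permalink guid.
    print(f"{i}. {entry.title}")
    print(f"   guid: {entry.get('id')}")

If this reports all 6 items for the same XML that DTP3 trims to three, the problem is more likely on the consuming side (or in what is served at the moment the feed is fetched) than in the feed markup itself.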

The URL of the feed would be useful to check this.

  • Have you run the feed URL through a validator?
  • Have you successfully used the feed URL in a bespoke RSS application?

I am running n8n on a self-hosted instance, so I don’t think the URL would be too helpful. The example XML feed I posted is the (current) output returned when the webhook is called.

  • I have not run the feed URL through a validator, but I ran the feed output through the W3C RSS feed validator.
    It said my feed was valid, only suggesting that I add an Atom element.

  • NetNewsWire is able to retrieve all items that currently populate the feed.

The URL would make it easier to reproduce and investigate the issue.

They posted the XML, since their URL is private. Shouldn’t that help, too?

Possibly, but a URL is definitely easier for reproducing and investigating.

This is my feed URL.

http://192.168.1.41:49161/webhook/firstmonday

That’s unfortunately a local URL; only a public one would have been useful. I just tried to reproduce the issue using your example XML, but all 6 items were shown/added in DEVONagent/DEVONthink. Maybe a smart rule is causing the issue?

The strange thing is, it works now for me as well. The only smart rule I have that affects News is manually triggered.

The only thing I can think of that might have caused my issue is my network.
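
Since an intermittent network or webhook problem is hard to catch by hand, one way to narrow it down is to poll the feed a few times and log how many items each response contains. A rough sketch, assuming Python with the requests and feedparser libraries installed and using the local webhook URL posted above:

import time

import feedparser
import requests

FEED_URL = "http://192.168.1.41:49161/webhook/firstmonday"  # local webhook URL from above

for attempt in range(1, 6):
    try:
        # Fetch the feed roughly the way a reader would, with a short timeout.
        response = requests.get(FEED_URL, timeout=10)
        parsed = feedparser.parse(response.text)
        print(f"attempt {attempt}: HTTP {response.status_code}, "
              f"{len(parsed.entries)} items, bozo={parsed.bozo}")
    except requests.RequestException as exc:
        # A transient network failure would surface here.
        print(f"attempt {attempt}: request failed: {exc}")
    time.sleep(30)  # pause between polls

If the item count (or the HTTP status) varies between polls, the feed delivery is intermittent rather than DTP3 silently dropping items.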

That’s certainly always something to consider 🙂