Capturing Twitter in 2022

Hi, I need to convert Twitter bookmarks into a format that’s searchable.

How I did it in the past

Between 2018 and 2020 I used a script (found on the forum) to convert tweet bookmarks into HTML. Records created via this script show the tweet’s text even without an internet connection (but no images or videos), which is what I need.

Unfortunately the .scpt file got corrupted at some point. As I hadn’t used it for a long time, I didn’t notice the corruption. I now have no backup of the script and can’t find it on the forum anymore (that’s one reason why I now have everything in git…)

But I suspect Twitter also changed something, so it might well be that saving an HTML file that also works offline isn’t possible at all anymore.

Just in case someone who knows HTML wants to take a look, here’s a record created in 2018 via the lost script and one created today via DEVONthink’s script Download > As HTML pages

Archive.zip (80.6 KB).

Captures taken without internet connection:

I can’t tell whether the difference comes from the script or from something Twitter changed; I’m just adding this in case someone knows. I think DEVONthink’s script Download > As HTML pages never produced a record whose text is available offline, though, as otherwise I would have had no reason to use the now lost script.

Needed solution

As I captured some thousand bookmarks (but never converted them) a programmatic solution is needed.

  • Has anyone used the Twitter API?
    • Can it be used by everyone (without working at an academic institution)?

If the API can’t be used by everyone

  • Has anyone found a programmatic solution that doesn’t rely on the URL being opened in a browser?

If there’s no way to capture without the URL being opened in a browser

  • How do you capture tweets in 2022?

I’m willing to put a lot of work into finding a solution; however, at the moment I don’t even know where to start.

There’s nothing good on Twitter anyways, except our @devontech account. :stuck_out_tongue:

I wish I had more time to answer this better and do a bit more research, but I regret this is the best I can do right now:

  • The Twitter API (and the format of tweets) has almost certainly changed since 2018. It seems to change often.
  • A change announced last year may or may not be relevant to you.
  • Archive Team has a page about archiving twitter in which they list some tools that may be helpful in this context, as well as mentioning some notes, such as how to get the full-sized version of images embedded in a tweet.
  • There are other tools not mentioned on the Archive Team page, such as SFM and Thread Reader.
  • I’ve personally given up on saving tweets in HTML format. Yes, proper archiving best practices would probably say web archives should be stored in WARC format, but in my experience, the saved content is invariably broken when I view it (e.g.) a year later. Plus, if the author deletes the original tweet, then things are even worse.
  • In another forum posting, I described some things I do. Basically, I resort to saving a PDF rendering of the tweet.

My understanding is that Twitter have made it more and more difficult to automate any kind of data collection, whether programmatically or via the GUI. I don’t think academics have any special privileges in that respect.

Your best bet might be going via the archive.org Wayback Machine. In principle, they will have archived all tweets so you should be able to look up the archived version. May be easier to scrape the data that way. There’s an API for making such requests without a browser.
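For the Wayback Machine route, here is a minimal Python sketch of a lookup without a browser. It assumes the public availability endpoint at `https://archive.org/wayback/available` and its JSON shape (`archived_snapshots` > `closest`); the function names are made up for illustration. Not every tweet will necessarily have a snapshot, so the lookup returns `None` in that case.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(tweet_url, timestamp=None):
    """Build the request URL that asks for a snapshot of tweet_url."""
    params = {"url": tweet_url}
    if timestamp:
        # Optional YYYYMMDD[hhmmss]; the closest snapshot to it is returned.
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def closest_snapshot(response):
    """Pull the snapshot URL out of a decoded JSON response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(tweet_url, timestamp=None):
    """Network call: ask whether an archived copy of tweet_url exists."""
    request_url = build_availability_url(tweet_url, timestamp)
    with urllib.request.urlopen(request_url, timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

Calling `lookup()` once per bookmarked URL would give a list of archived copies to download; it would be polite to pause between requests when going through thousands of bookmarks.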

Might also want to check out this project on GitHub: salcoast/deleted-tweets-archive (“These tweets display several bad actors' most divisive uses of the Twitter platform.”)

Different goals but maybe some useful info.

In terms of capturing tweets in the present, there are some tools in development that allow this en masse (i.e. for quantitative analysis). E.g.: COSMOS – Social Data Science Lab (though I think this is restricted to academic users at the moment, and it’s in request-only beta).

If it’s a matter of grabbing the occasional tweet here and there, I usually take a screenshot and automatically OCR in DEVONthink.

Yeah, that’s what I resort to doing too. A little bit of automation can help capture threads (after logging in to Twitter from within DEVONthink), but alas, only for short threads.

Wow, that’s a lot to read/test. Thanks everyone!

This morning I attempted to save the very many, very informative replies to a particular tweet, which is a case that my usual approach doesn’t handle well. It seems that as you scroll down a long list of replies in the web page, Twitter unloads previous tweets – similar to what Discourse does with long discussions. Just another example of what makes it difficult to capture things on Twitter …

Dynamic content strikes again.
Makes sense for content delivery in cases like this but also makes things difficult for capturing without a bespoke public API available.

Incidentally, great thread…

It might be interesting for DT to monitor how Readwise Reader will do this, as saving Twitter threads is one of their supported features.
If DT doesn’t support it themselves in the future, I might use the Readwise Reader app as an intermediary.

For following twitter accounts, I have a subscription to inoreader for social media intelligence (Facebook, Twitter, Telegram, VKontakte). From there I can search and export to other apps/tools.

For capturing tweets, I clip to DEVONthink from Tweetbot. It is very handy.

Thanks to everyone for sharing their finds and experiences on this subject :slight_smile:

I’m also a Tweetbot user, but I’m not sure I understand: is it any different from clipping from twitter.com? If I do Share > Add to DT from Tweetbot, it just retrieves the URL of the tweet, which doesn’t work well for the reasons mentioned above.

Just a quick update: the Twitter API works fine. I extracted the tweet IDs from the bookmarks’ URLs and got all of those tweets as JSON. (Note: You’ll need to create a Twitter Developer Account to use the API.)
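For anyone who wants to try the same, the ID extraction and batching could look roughly like the Python sketch below. This is not the exact script used here. It assumes the v2 tweet lookup endpoint (`https://api.twitter.com/2/tweets`, up to 100 comma-separated IDs per request) and a bearer token from the developer account; the function names and the URL pattern are illustrative assumptions.

```python
import json
import re
import urllib.request

# Assumption: bookmark URLs look like https://twitter.com/<user>/status/<id>
# (optionally with a mobile. prefix, /i/web/ path, or a ?s=20 style suffix).
STATUS_RE = re.compile(r"twitter\.com/(?:\w+/)*status(?:es)?/(\d+)")


def extract_ids(urls):
    """Return the tweet IDs found in urls, de-duplicated, in original order."""
    found = []
    for url in urls:
        match = STATUS_RE.search(url)
        if match:
            found.append(match.group(1))  # keep IDs as strings
    return list(dict.fromkeys(found))


def batches(items, size=100):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def lookup_url(ids):
    """Build the lookup URL for one batch of IDs (assumed v2 endpoint)."""
    return "https://api.twitter.com/2/tweets?ids=" + ",".join(ids)


def fetch_batch(ids, bearer_token):
    """Network call: fetch one batch of tweets as decoded JSON."""
    request = urllib.request.Request(
        lookup_url(ids),
        headers={"Authorization": "Bearer " + bearer_token},
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)
```

Looping over `batches(extract_ids(urls))` and writing each `fetch_batch()` result to a file would leave one JSON file per 100 tweets, which DEVONthink can then index and search.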

Finding out whether a tweet has “siblings” (i.e. whether it’s a single tweet or the first tweet of a thread) seems to be quite difficult, according to the results of a quick search: afaik only sub-tweets carry a marker that shows they belong to a thread. It’s probably possible to create a query that gets all tweets a user posted in a given time period, but I haven’t tried that yet.
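Once the candidate tweets are available as JSON, stitching a thread together locally might look like the Python sketch below. It assumes the tweet objects carry `id`, `author_id` and `referenced_tweets` entries of type `replied_to`, which as far as I know are the v2 field names and have to be requested explicitly; `self_thread` is a made-up helper. It only works on tweets that were already fetched, so it doesn’t solve the problem of finding the siblings in the first place.

```python
def self_thread(tweets, root_id):
    """Return the root tweet plus the author's own chain of replies to it.

    tweets is a list of decoded tweet dicts; fields are assumed to follow
    the v2 format (id, author_id, referenced_tweets).
    """
    by_id = {tweet["id"]: tweet for tweet in tweets}
    root = by_id.get(root_id)
    if root is None:
        return []

    thread = [root]
    thread_ids = {root_id}
    # Assumption: numeric IDs increase over time, so sorting by ID
    # visits each parent before its replies.
    for tweet in sorted(tweets, key=lambda t: int(t["id"])):
        if tweet["id"] in thread_ids:
            continue
        if tweet.get("author_id") != root.get("author_id"):
            continue  # replies by other people are not part of the thread
        parents = {
            ref["id"]
            for ref in tweet.get("referenced_tweets", [])
            if ref.get("type") == "replied_to"
        }
        if parents & thread_ids:
            thread.append(tweet)
            thread_ids.add(tweet["id"])
    return thread
```

A result of length 1 means that no continuation was found among the fetched tweets, which is not the same as the tweet being a single tweet.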

Anyway, it feels really good to have the (previously only bookmarked) tweets I’m interested in as JSON.

To be continued …