We classify seven months' worth of Belgian COVID-related tweets using multilingual BERT and relate them to the Belgian governments' COVID measures. We classify tweets by their stated opinion on the curfew measures (too strict, ok, too loose). We examine how the topics discussed and the views expressed change over time and around the dates of related events, such as the implementation of new measures or COVID-19 announcements in the media.

Collecting and labeling data

Since October 13, 2020, we have been collecting tweets from Belgian users on topics related to the corona pandemic. We now have a dataset of 1.3 million tweets, so there are some cool things we can do with that.

We (ok, we delegated that task to job students) spent some time labeling a small set of these tweets. About half (53%) of the tweets are what we consider 'irrelevant', which is often the case for tweets from or about other countries, or for tweets that express no opinion. These irrelevant tweets are a small problem during labeling, because our poor job students spend time on tweets that are of no use to us. To address this, we used a pipeline with multiple sieves, each of which filters out some tweets.

Figure: Our filtering and classification pipeline.
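To make the idea of stacked sieves concrete, here is a minimal sketch. The two sieves (`supported_language` and `mentions_measure`), the keyword list, and the tweet record layout are hypothetical and only serve as illustration; they are not the stages we actually used.

```python
from typing import Callable, Dict, List, Tuple

Tweet = Dict[str, str]            # hypothetical record: {"text": ..., "lang": ...}
Sieve = Callable[[Tweet], bool]   # returns True if the tweet should be kept


def supported_language(tweet: Tweet) -> bool:
    # Hypothetical sieve: keep only the languages the models cover.
    return tweet.get("lang") in {"nl", "fr", "en"}


def mentions_measure(tweet: Tweet) -> bool:
    # Hypothetical sieve: keep tweets that contain at least one keyword.
    keywords = ("curfew", "avondklok", "couvre-feu", "mask", "lockdown")
    text = tweet["text"].lower()
    return any(keyword in text for keyword in keywords)


def run_sieves(
    tweets: List[Tweet], sieves: List[Tuple[str, Sieve]]
) -> Tuple[List[Tweet], Dict[str, int]]:
    """Apply the sieves in order and record how many tweets each one drops."""
    remaining = list(tweets)
    dropped: Dict[str, int] = {}
    for name, sieve in sieves:
        kept = [tweet for tweet in remaining if sieve(tweet)]
        dropped[name] = len(remaining) - len(kept)
        remaining = kept
    return remaining, dropped


example = [
    {"text": "De avondklok is overdreven", "lang": "nl"},
    {"text": "Nice weather today", "lang": "en"},
    {"text": "Ausgangssperre in Berlin", "lang": "de"},
]
kept, dropped = run_sieves(
    example,
    [("language", supported_language), ("keywords", mentions_measure)],
)
```

Putting the cheapest sieves first means that the more expensive stages, and the human labelers at the end, see fewer tweets.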

After filtering out the clearly irrelevant ones, we labeled the remaining tweets along two axes: support for the government and support for the measures. We also kept the topic labels. All labeling was done with Doccano, which works really well for our purposes.

Figure: Screenshot of Doccano with a labeled tweet.

Based on these tweets and our labels, we created multiple BERT models to classify the tweets. Our paper currently describes two models, which we also made public on the Hugging Face Hub:

DTAI-KULeuven/mbert-corona-tweets-belgium-topics: Multilingual (NL, FR, EN) BERT model to classify tweets into the following topics:
- masks
- curfew
- quarantine
- lockdown
- schools
- testing
- closing-horeca
- vaccine
- other-measure

DTAI-KULeuven/mbert-corona-tweets-belgium-curfew-support: Another multilingual (NL, FR, EN) BERT model to classify the support for the measure that is expressed in the tweet, i.e. too-strict, too-loose, ok or not-applicable.
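As a rough sketch of how the two models could be chained, the snippet below loads them with the `transformers` text-classification pipeline and only asks the support model about tweets that the topic model tags as curfew-related. It assumes that each model returns a single label per tweet and that the label strings match the names listed above; check the model cards before relying on either. The helper names `load_classifier` and `label_tweets` are made up for this example.

```python
TOPIC_MODEL = "DTAI-KULeuven/mbert-corona-tweets-belgium-topics"
SUPPORT_MODEL = "DTAI-KULeuven/mbert-corona-tweets-belgium-curfew-support"


def load_classifier(model_id):
    # Imported lazily so the rest of the sketch runs without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def label_tweets(tweets, topic_clf, support_clf, curfew_label="curfew"):
    """Tag each tweet with a topic; add a support label for curfew tweets only.

    Assumption: both classifiers take a list of strings and return one
    {"label": ..., "score": ...} dict per input, as the pipeline does.
    """
    topics = topic_clf(tweets)
    rows = []
    for text, topic in zip(tweets, topics):
        row = {"text": text, "topic": topic["label"], "support": None}
        if topic["label"] == curfew_label:
            row["support"] = support_clf([text])[0]["label"]
        rows.append(row)
    return rows


# Usage (downloads both models from the Hub on first run):
# topic_clf = load_classifier(TOPIC_MODEL)
# support_clf = load_classifier(SUPPORT_MODEL)
# label_tweets(["De avondklok is te streng"], topic_clf, support_clf)
```

Passing the classifiers in as arguments keeps the labeling logic easy to test with stand-in functions, without downloading any weights.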

What are people talking about?

With these models, we can analyze the entire dataset and see some cool patterns. People talk a lot about certain topics, such as vaccines when the first good news about them came out, and not necessarily about the ones I care about, such as schools or masks (the subject of my application What The Mask).

Figures: The number of tweets on vaccines and on schools.
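Counts like these can be obtained by grouping the classified tweets per day and per topic. The sketch below uses only the standard library; the record layout (a `created_at` ISO timestamp and a `topic` field) is an assumption made for this example, not the format of our dataset.

```python
from collections import Counter
from datetime import datetime


def daily_topic_counts(rows):
    """Count tweets per (day, topic) pair.

    Assumption: each row is a dict with an ISO 8601 "created_at" string
    and a "topic" label produced by the topic model.
    """
    counts = Counter()
    for row in rows:
        day = datetime.fromisoformat(row["created_at"]).date()
        counts[(day, row["topic"])] += 1
    return counts


sample = [
    {"created_at": "2020-11-09T18:30:00", "topic": "vaccine"},
    {"created_at": "2020-11-09T21:05:00", "topic": "vaccine"},
    {"created_at": "2020-11-09T08:15:00", "topic": "schools"},
    {"created_at": "2020-11-10T10:00:00", "topic": "vaccine"},
]
counts = daily_topic_counts(sample)
```

Plotting one series per topic from `counts` then shows when a topic spikes, for instance around a media announcement.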

It's also really difficult to get a proper, meaningful signal from only a few tweets, so for the analysis of the support for specific measures, we focused on the curfew. This is in part because the curfew (or rather curfews, since Brussels, Flanders and Wallonia each had a different one) seemed to be a polarizing topic: either you support it, or you think it doesn't work and should be abolished. This leads to some interesting opinions on Twitter, which we can use to track the support for it.

The curfews: not so controversial after all?

As it turns out, we overestimated the support that exists for the curfews, or at least the support that is expressed online. We see a constant number of tweets on the curfew, most of which are neutral (note that we apply a really strict definition before a tweet no longer counts as neutral), with some spikes following media attention. As we left the second wave, the fraction of tweets saying the curfew was too strict grew. Only during the third wave did the opinion shift again, with the media repeating that we would be stuck with the curfew a little while longer.

Figure: The support for the curfew measures over time. For reference, we included the number of daily cases (top).
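The fractions in such a plot can be computed by grouping the curfew tweets per week and normalizing the label counts. This is a minimal sketch that assumes weekly bins and a made-up record layout (a `day` date and a `support` label); the bin size we used for the figure may differ.

```python
from collections import Counter, defaultdict
from datetime import date


def weekly_support_shares(rows):
    """Return, per ISO (year, week), the share of each support label.

    Assumption: each row is a dict with a datetime.date under "day" and
    a "support" label from the curfew-support model.
    """
    per_week = defaultdict(Counter)
    for row in rows:
        iso = row["day"].isocalendar()
        per_week[(iso[0], iso[1])][row["support"]] += 1

    shares = {}
    for week in sorted(per_week):
        counts = per_week[week]
        total = sum(counts.values())
        shares[week] = {label: n / total for label, n in counts.items()}
    return shares


sample = [
    {"day": date(2021, 1, 4), "support": "too-strict"},
    {"day": date(2021, 1, 5), "support": "too-strict"},
    {"day": date(2021, 1, 5), "support": "ok"},
    {"day": date(2021, 1, 6), "support": "not-applicable"},
    {"day": date(2021, 1, 12), "support": "too-loose"},
]
shares = weekly_support_shares(sample)
```

Weeks with only a handful of tweets give noisy shares, which is one reason to aggregate over a longer window than a single day.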

Try it yourself

If you like this kind of analysis, you can also do it yourself. We made our models available, and the code to scrape the tweets is in our GitHub repo. And you can read the paper if you want to know more!

Paper: Attitudes Towards COVID-19 Measures: Measuring Shifts in Belgium Using Multilingual BERT
