Research

My research spans LLM pretraining, data, tokenization, and AI safety. Marked papers appeared at top-tier venues.

For a list with citation counts, see Google Scholar (1,000+ citations, h-index 15).

2026

ProbLog4Fairness: A Neurosymbolic Approach to Modeling and Mitigating Bias

Rik Adriaensen, Lucas Van Praet, Jessa Bekker, Robin Manhaeve, Pieter Delobelle and Maarten Buyl

AAAI

Paper

2025

SHADES: Towards a Multilingual Assessment of Stereotypes in Large Language Models

Margaret Mitchell, Giuseppe Attanasio, Ioana Baldini, Miruna Clinciu, Jordan Clive, Pieter Delobelle, et al.

NAACL

Featured in MIT Technology Review, WIRED, and De Tijd

Paper

2024

Metrics for What, Metrics for Whom: Assessing Actionability of Bias Evaluation Metrics in NLP

Pieter Delobelle, Giuseppe Attanasio, Debora Nozza, Su Lin Blodgett and Zeerak Talat

EMNLP

Paper Blog Video

BPE-knockout: Pruning Pre-existing BPE Tokenisers with Backwards-compatible Morphological Semi-supervision

Thomas Bauwens and Pieter Delobelle

NAACL

Paper Blog Code Video

Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP

François Remy, Pieter Delobelle, Hayastan Avetisyan, Alfiya Khabibullina, Miryam de Lhoneux and Thomas Demeester

COLM

Blog

Whispering Experts: Neural Interventions for Toxicity Mitigation in Language Models

Xavier Suau, Pieter Delobelle, Katherine Metcalf, Armand Joulin, Nicholas Apostoloff, Luca Zappella and Pau Rodriguez

ICML

RobBERT-2023: Keeping Dutch Language Models Up-To-Date at a Lower Cost Thanks to Model Conversion

Pieter Delobelle and François Remy

CLIN

Blog

OneLove beyond the field - A few-shot pipeline for topic and sentiment analysis during the FIFA World Cup in Qatar

Christoph Rauchegger, Sonja Mei Wang and Pieter Delobelle

KONVENS

ItaEval and TweetyIta: A New Extensive Benchmark and Efficiency-First Language Model for Italian

Giuseppe Attanasio, Pieter Delobelle, Moreno La Quatra, Andrea Santilli and Beatrice Savoldi

CLiC-it

High stakes for LLMs: Analysis of Bias in BERT-based Recommender Systems

Nisse Degelin, Pieter Delobelle, Kristen Scott and Bettina Berendt

BNAIC/BeNeLearn

ChocoLlama: Lessons Learned From Teaching Llamas Dutch

Matthieu Meeus, Anthony Rathé, François Remy, Pieter Delobelle, Jens-Joris Decorte and Thomas Demeester

arXiv

arXiv

2023

How Far Can It Go? On Intrinsic Gender Bias Mitigation for Text Classification

Ewoenam Tokpo, Pieter Delobelle, Bettina Berendt and Toon Calders

EACL

Blog

ResumeTailor: Improving Resume Quality Through Co-Creative Tools

Pieter Delobelle, Sonja Mei Wang and Bettina Berendt

HHAI

Inferring Missing CV Skills using PU Learning and Variational Inference

Victor Verreet, Lennert De Smet, Robin Manhaeve, Pieter Delobelle and Jessa Bekker

AI4HR @ ECML-PKDD

Tik-to-Tok: Translating Language Models One Token at a Time

François Remy, Pieter Delobelle, Bettina Berendt, Kris Demuynck and Thomas Demeester

arXiv

Towards Fairer Foundation Models: On Measures and Mitigation Strategies

Pieter Delobelle

PhD thesis

Blog

2022

RobBERTje: A Distilled Dutch BERT Model

Pieter Delobelle, Thomas Winters and Bettina Berendt

CLIN

Algorithmic tools in public employment services: Towards a jobseeker-centric perspective

Kristen Scott, Sonja Mei Wang, Milagros Miceli, Pieter Delobelle, Karolina Sztandar-Sztanderska and Bettina Berendt

FAccT · Best Student Paper Award

Measuring Fairness with Biased Rulers: A Comparative Study on Bias Metrics for Pre-trained Language Models

Pieter Delobelle, Ewoenam Tokpo, Toon Calders and Bettina Berendt

NAACL

Blog

FairDistillation: Mitigating Stereotyping in Language Models

Pieter Delobelle and Bettina Berendt

ECML-PKDD

Blog

RobBERT-2022: Updating a Dutch Language Model to Account for Evolving Language Use

Pieter Delobelle, Thomas Winters and Bettina Berendt

arXiv

Blog arXiv

2021

Time to Question if We Should: Data-Driven and Algorithmic Tools in Public Employment Services

Pieter Delobelle, Kristen Scott, Sonja Mei Wang, Milagros Miceli, David Hartmann, Tianling Yang, Elena Murasso, Karolina Sztandar-Sztanderska and Bettina Berendt

FEAST @ ECML-PKDD

Learning to Rank Generated Portmanteaus

Lara Pollet, Thomas Winters and Pieter Delobelle

ICCC

Survival of the Wittiest: Evolving Satire with Language Models

Thomas Winters and Pieter Delobelle

ICCC

Measuring Shifts in Attitudes Towards COVID-19 Measures in Belgium

Kristen Scott, Pieter Delobelle and Bettina Berendt

CLIN Journal

Blog

2020

RobBERT: a Dutch RoBERTa-based Language Model

Pieter Delobelle, Thomas Winters and Bettina Berendt

EMNLP Findings

Ethical Adversaries: Towards Mitigating Unfairness with Adversarial Machine Learning

Pieter Delobelle, Paul Temple, Gilles Perrouin, Benoît Frénay, Patrick Heymans and Bettina Berendt

ACM SIGKDD Explorations Newsletter

Dutch Humor Detection by Generating Negative Examples

Thomas Winters and Pieter Delobelle

Benelearn · Best Video Award

2019

Computational Ad Hominem Detection

Pieter Delobelle, Murilo Cunha, Eric Massip Cano, Jeroen Peperkamp and Bettina Berendt

ACL Student Research Workshop

Time to Take Emoji Seriously: They Vastly Improve Casual Conversational Models

Pieter Delobelle and Bettina Berendt

Benelearn