Pieter Delobelle

Building and studying large language models — pretraining, data, tokenization & AI safety.

Lead AI Scientist at Pleias · Postdoc at KU Leuven

Pieter Delobelle at Station F, Paris

What I work on

At Pleias, I work on LLM pretraining and synthetic data — most recently releasing Nemotron-Personas-France together with NVIDIA. At KU Leuven, my research spans tokenization, fairness, and Dutch language models.

I created RobBERT, a family of Dutch language models ranked in the top 80 on Hugging Face, and have collaborated on most of the open Dutch LLMs — from Tweety to ChocoLlama. On the safety side, I work on measuring bias, reducing toxicity (with Apple, at ICML), and questioning who decides what “fair” means. I also work on tokenization and developed trans-tokenization, a method for translating LLMs from one language to another.

My work has been published at top AI venues such as ICML, NAACL, and EMNLP, including work done at Apple and Aleph Alpha. I've done research visits at MilaNLP (Bocconi), HU Berlin, and the Weizenbaum Institute, and accumulated 1,000+ citations across 32 publications. My research has been covered by WIRED, MIT Technology Review, De Tijd, and VTM Nieuws.

At KU Leuven, I served on the GenAI advisory board and sit on the council for research policy. I also contributed as an NLP expert to an EU AI Office workshop on the code of practice for general-purpose AI.

I also frequently give talks on AI and language models. I've spoken at companies including Apple, KBC, VRT, TechWolf, ML6, and Superlinear, and at events such as Wintercircus. Topics range from LLM pretraining and technical deep dives into LLM inference to AI safety and introductory talks on the Dutch NLP ecosystem.

News

April 23, 2026 I will give a half-day lecture on safety and fairness in LLMs.

February 24, 2026 I was interviewed for Nieuwsblad about AI model theft.

February 11, 2026 I will give a talk on building LLM inference from scratch at AI Tinkerers Ghent.

January 26, 2026 I joined Pleias to work on synthetic data and small language models 🥖🇫🇷.

January 24, 2026 I was interviewed live on VTM Nieuws about LLM hallucinations.

December 12, 2025 I gave a talk at UGent on steering LLMs.

December 11, 2025 I gave a talk on Dutch LLMs at the Wintercircus in Ghent.

November 19, 2025 Our paper on neurosymbolic bias mitigation got accepted at AAAI 2026 🎉.

October 15, 2025 I talked about our work on safety and fairness in LLMs at Flanders AI Research Day.
All talks & appearances