Measuring and mitigating bias
Abstract
Bias in language models, traced from the Bloomberg ChatGPT‑as‑recruiter case through allocational and representational harms, with an honest look at where the research has and hasn't actually moved.
A two‑part story. Part 1: how bias gets locked in by automated decision systems (SyRI, COMPAS, the Polish public employment service), and why “human‑in‑the‑loop” rarely rescues it. Part 2: what we can actually measure and mitigate, from RobBERT‑era intrinsic metrics through the multilingual SHADES benchmark, to inference‑time interventions like AurA.
Outline