Maarten Sap
Whether language models can produce or reason about social knowledge with the same competence they show on factual tasks — and what's at stake when they can't.
Sap's earlier work on COMET (Commonsense Transformers, 2019) was one of the field's first attempts to build machine-readable representations of everyday social inferences: that someone who borrowed money will probably want to repay it, that an apology implies prior wrongdoing. The same lens runs through his LLM-era work, which asks what kinds of social and commonsense knowledge LLMs produce fluently versus mimic superficially, and where the failures cluster. RealToxicityPrompts and adjacent benchmarks made that question quantitative for safety-relevant cases: models trained on the open web inherit that web's toxic patterns in measurable ways, even when alignment training tries to paper over the inheritance.