Christof Monz
Systematic, year-over-year evaluation of machine translation across dozens of language pairs, and what that tracking reveals about which translation problems are getting solved and which are not.
Most NLP evaluation is one-shot: a benchmark is released, saturated, and replaced. Monz has co-organized the WMT shared-task campaigns and co-authored their annual Findings reports for over a decade. The result is one of the few datasets in the field with longitudinal structure: the same translation task, evaluated under a comparable protocol across a recurring set of language pairs, year after year. The accumulated record shows which techniques actually carried over from the statistical MT era to the neural era to the LLM era, and where MT progress has stalled despite the impression of universal improvement.