Chris Callison-Burch
Whether human readers can distinguish text produced by a language model from text written by humans, and how that distinguishability decays as models improve.
Callison-Burch's "Real or Fake Text?" line of research (with Liam Dugan and others, 2020s) ran interactive experiments in which readers were asked to mark the point in a text where the human author stopped and an AI continuation began. The results have tracked the progress of LLMs from an angle most evaluation skips: through 2020 the boundary was easy to find; by 2023 readers struggled to find it at all, even when motivated. His earlier work pioneered crowdsourcing as a primary instrument for NLP evaluation, which gives this line of work additional weight, since he is unusually qualified to say what human evaluators can and cannot detect under realistic conditions.