Emma Strubell
Whether the compute and energy cost of training and serving language models belongs in the headline of an evaluation, where accuracy currently sits alone.
Strubell's 2019 paper with Ananya Ganesh and Andrew McCallum, "Energy and Policy Considerations for Deep Learning in NLP", put numbers to something the field had been quietly ignoring. In the most expensive case the authors estimated, a Transformer tuned with neural architecture search, training produced CO2 emissions of roughly five times the lifetime emissions of an average car, and the gap between the accuracy reported in research papers and the carbon cost of producing it was widening with every new state of the art. The work reframed efficiency from a nice-to-have engineering concern into an evaluation dimension that should appear next to accuracy in every model comparison. For ai100, which evaluates models whose inference costs differ by orders of magnitude, this is the methodological backing for treating "good answer" and "good answer at what cost" as separate questions.