Igor Mordatch
What it means to evaluate a language model that takes consequential actions through its text — not only producing answers but operating in environments that respond.
Mordatch's research line, spanning embodied reinforcement learning, multi-agent emergent communication, and language-conditioned agents, lies at the intersection of LLMs and the environments they could act on. As LLMs evolve into agent systems that take consequential actions (browsing, executing code, calling APIs, manipulating documents), the evaluation question changes shape: the right thing to measure is what happened in the world as a result of the action sequence, not only the surface correctness of the text produced. His earlier OpenAI work on multi-agent emergent language and his current DeepMind work on language-conditioned action policies sit at the methodological frontier for agent-grade evaluation. For ai100, the agent transition is on the horizon: when AI engines move from answering brand questions to taking actions on behalf of users, the evaluation methodology has to evolve to match.