Mordatch's research line (embodied reinforcement learning, multi-agent emergent communication, language-conditioned agents) sits at the intersection of LLMs and the environments they can act on. As LLMs evolve into agent systems that take consequential actions (browsing, executing code, calling APIs, manipulating documents), the evaluation question changes shape: the right thing to measure is what happened in the world as a result of the action sequence, not only the surface correctness of the text produced. His earlier OpenAI work on multi-agent emergent language and his current DeepMind work on language-conditioned action policies sit at the methodological frontier for agent-grade evaluation. For ai100, the agent transition is on the horizon: when AI engines move from answering brand questions to taking actions on behalf of users, the evaluation methodology has to evolve to match.

Worth following when
you want to understand evaluation methodology for AI-agent systems — where the unit of analysis is the action sequence and its consequences, not the text alone.
Topics
language-conditioned reinforcement learning agents; multi-agent emergent communication; the methodological transition from text-output evaluation to action-sequence evaluation.
Key works
body of work on multi-agent emergent communication and language-conditioned RL (OpenAI 2017–2020, then Google DeepMind); embodied RL publications; ongoing language-agent research at DeepMind.