The Microsoft Research paper "Sparks of Artificial General Intelligence", which Horvitz co-authored in 2023, was the field's most-cited qualitative evaluation of GPT-4 — a hundred-plus pages of examples showing the model doing things its predecessors couldn't, framed as preliminary evidence of capabilities approaching general intelligence. The paper drew immediate criticism for its lack of systematic protocol, but the criticism mostly missed what made the document influential: it gave the industry a vocabulary for what it was looking at before anyone had built rigorous instruments to measure it. Horvitz's longer career — probabilistic AI, medical decision-support, AAAI presidency, the build-out of Microsoft's responsible-AI infrastructure — gives him standing the casual reader of "Sparks" usually misses.

Worth following when
you want a senior industry researcher's perspective on what current LLMs can do, even when (especially when) that perspective is methodologically informal in ways that ai100's own evaluation is meant to counter.
Topics
qualitative evaluation of LLM capability (Sparks); probabilistic AI and decision-theoretic methods; the responsible-AI infrastructure inside large model labs (Aether).
Key works
"Sparks of Artificial General Intelligence: Early experiments with GPT-4" (2023, co-author); decades of probabilistic AI and medical decision-support publications; Microsoft's Aether Committee on responsible AI.