Jared Kaplan — Whom to read in AI

Kaplan was lead author, with OpenAI collaborators, of "Scaling Laws for Neural Language Models" (2020), the paper that formalized what the field had been seeing empirically: language-model cross-entropy loss falls as a power law in compute, parameter count, and dataset size, with empirically fitted exponents, provided the other two factors are not the bottleneck. The result became a standard planning instrument for large training runs: if you can fit the loss curve on small runs, you can budget compute and data to hit a target loss with few surprises. The methodological limitation, which the paper itself was careful about and the field has since rediscovered the hard way, is that smooth scaling of loss does not translate cleanly into smooth scaling of any particular capability. Apparent emergent-capability discontinuities live in the gap between aggregate loss and task-level evaluation.
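
To make the power-law claim concrete, here is a minimal Python sketch of how such a curve is typically fitted and extrapolated: a straight-line fit in log-log space. It is an illustration, not the paper's code. The function names (`fit_power_law`, `predict_loss`) and every number in it are made up for the example and are not values from the paper.

```python
import numpy as np


def fit_power_law(x, loss):
    """Fit loss ~= a * x**(-alpha) by least squares in log-log space.

    A power law is a straight line on log-log axes:
        log(loss) = log(a) - alpha * log(x)
    so the slope of the fitted line gives -alpha.
    """
    slope, intercept = np.polyfit(np.log(x), np.log(loss), 1)
    return float(np.exp(intercept)), float(-slope)


def predict_loss(a, alpha, x):
    """Evaluate the fitted power law at a new scale x."""
    return a * np.asarray(x, dtype=float) ** (-alpha)


# Synthetic, noise-free "small run" data (hypothetical values, NOT from the paper).
true_a, true_alpha = 10.0, 0.08
params = np.array([1e6, 1e7, 1e8, 1e9])      # model sizes of the small runs
losses = true_a * params ** (-true_alpha)    # losses generated from the assumed law

a_hat, alpha_hat = fit_power_law(params, losses)

# Extrapolate one order of magnitude beyond the largest fitted run.
extrapolated = float(predict_loss(a_hat, alpha_hat, 1e10))
```

The sketch predicts aggregate loss only. Nothing in it says how any particular downstream task will behave at the extrapolated scale, which is exactly the boundary described above.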

Worth following when: you want to understand what the underlying scaling theory predicts, what it doesn't, and where the methodological boundary between loss-level prediction and capability-level evaluation actually sits.
Topics: scaling laws for language model training; the gap between aggregate-loss scaling and task-level capability prediction; the methodological foundations of large-scale LLM training.
Key works: "Scaling Laws for Neural Language Models" (2020, lead author); subsequent scaling and capability research at Anthropic; a body of work carrying theoretical-physics methodology into ML.