Jared Kaplan
Whether language-model loss is a smooth function of compute, data, and model size — and what that smoothness lets you predict (and not predict) about capabilities at larger scales.
Kaplan was lead author, with OpenAI collaborators, of "Scaling Laws for Neural Language Models" (2020), the paper that formalized what the field had been seeing empirically: language-model loss falls as a clean power law in compute, parameters, and dataset size, with exponents that can be fitted on smaller runs and extrapolated. The result became a standard planning instrument for subsequent training runs at scale: if you know the loss curve, you can budget compute and data to hit a target loss with low surprise. The methodological limitation, which the paper itself was careful about and the field has since rediscovered the hard way, is that smooth scaling of loss does not translate cleanly into smooth scaling of any particular capability. Emergent-capability discontinuities live in the gap between aggregate loss and task-level evaluation.
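To make the "planning instrument" idea concrete, here is a minimal sketch of fitting a power law to loss measurements and extrapolating it to a larger budget. It assumes the simplest single-variable form, loss ≈ a · C^(−alpha), with no irreducible-loss term. The function names, compute values, and coefficients are all hypothetical and chosen for illustration; none of them come from the paper.

```python
import math


def fit_power_law(xs, losses):
    """Fit loss ~= a * x**(-alpha) by ordinary least squares in log-log space.

    Returns (a, alpha). Assumes every x and every loss is strictly positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in losses]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A power law is a straight line in log-log space with slope -alpha.
    return math.exp(intercept), -slope


def predict_loss(x, a, alpha):
    """Evaluate the fitted power law at a new scale x."""
    return a * x ** (-alpha)


# Synthetic, made-up measurements (arbitrary compute units), generated from
# a known power law so that the fit can be checked against it.
TRUE_A, TRUE_ALPHA = 5.0, 0.05
compute = [1e0, 1e1, 1e2, 1e3, 1e4]
loss = [TRUE_A * c ** (-TRUE_ALPHA) for c in compute]

a, alpha = fit_power_law(compute, loss)

# Extrapolate two orders of magnitude beyond the largest "measured" run.
extrapolated = predict_loss(1e6, a, alpha)
print(f"a={a:.3f}, alpha={alpha:.3f}, predicted loss at 1e6: {extrapolated:.3f}")
```

The sketch forecasts aggregate loss only. Turning that number into a prediction about a specific downstream task would need a separate mapping from loss to task score, and that mapping is where the discontinuities described above can appear.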