Marco Baroni
Whether neural language models can compose what they've learned into new combinations they've never seen — or whether they're really only doing sophisticated interpolation within their training distribution.
Baroni's group built SCAN (2018) as a controlled test of compositional generalization: a small synthetic language where the model has to combine verbs and modifiers in patterns absent from training. Neural sequence-to-sequence models, even at the time the dominant architecture, failed spectacularly — generalizing one short held-out combination but breaking on slightly longer ones. The result generalized: subsequent papers have shown the same compositional-generalization failures in modern LLMs at scale, which means the question Baroni put on the table in 2018 hasn't been answered by scale alone.