Spider, which Yasunaga co-authored at Yale, has been a standard benchmark for cross-domain text-to-SQL translation since 2018. The task looks deceptively simple ("turn this question into a SQL query") but requires the model to bind natural-language entities to schema columns, resolve aggregations, and structure joins correctly. This matters for evaluating LLMs as reasoning systems because text-to-SQL has a hard ground truth: a predicted query can be executed against the database and its rows compared with those of the gold query, so it is either right or wrong. His later work on QA-GNN and knowledge-graph-augmented reasoning extends the same posture: reasoning over structured knowledge that yields checkable answers.
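The "either right or wrong" property can be made concrete with a small execution-based check. The following is a minimal sketch, assuming an SQLite database and a hypothetical helper `execution_match`. It is not Spider's official evaluation script, and the toy `singer` table is invented for illustration. A predicted query counts as correct only if it runs and returns the same rows as the gold query.

```python
import sqlite3
from collections import Counter


def execution_match(conn: sqlite3.Connection, gold_sql: str, pred_sql: str) -> bool:
    """Hypothetical helper: True if pred_sql returns the same rows as gold_sql.

    Rows are compared as multisets, so row order is ignored but duplicates count.
    """
    gold_rows = conn.execute(gold_sql).fetchall()
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        # A predicted query that fails to parse or execute is simply wrong.
        return False
    return Counter(gold_rows) == Counter(pred_rows)


# Toy schema and data (hypothetical, not taken from Spider).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT, country TEXT, age INTEGER);
INSERT INTO singer VALUES (1, 'Ana', 'France', 31), (2, 'Ben', 'Japan', 45), (3, 'Cleo', 'France', 27);
""")

# Question: "Which singers are from France?"
gold = "SELECT name FROM singer WHERE country = 'France'"

print(execution_match(conn, gold, "SELECT name FROM singer WHERE country = 'France' ORDER BY age"))  # True
print(execution_match(conn, gold, "SELECT name FROM singer WHERE age > 30"))  # False: wrong rows
print(execution_match(conn, gold, "SELECT nme FROM singer"))  # False: no such column
```

Ignoring row order is a simplifying assumption here. A stricter checker would compare ordered results when the gold query contains `ORDER BY`. Execution matching can also accept a query that is wrong but happens to return the right rows on a particular database, which is why benchmark evaluations often pair it with other criteria.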

Worth following when
you want to evaluate LLM reasoning on tasks with strict, executable ground truth rather than correctness judged by humans.
Topics
text-to-SQL evaluation (Spider); knowledge-graph-augmented question answering; reasoning tasks with verifiable correctness criteria.
Key works
Spider text-to-SQL benchmark (2018, co-author); QA-GNN (2021, lead author); ongoing publications on reasoning over structured knowledge.