Danqi Chen
Whether a language model that produces an answer with citations is actually grounding the answer in those citations — or just attaching plausible references after the fact.
ALCE (2023, with Chen as senior author) is the benchmark that put the question on the table: when an LLM produces text-with-citations, do the cited sources actually support the claims they're attached to, or are the citations decorative? The framework decomposes attribution into measurable components — citation precision (the citation supports the claim), citation recall (every claim that should be cited is), and the interaction between the two — and shows that current LLMs vary enormously on which they get right. For ai100, which audits whether AI engines mention brands with appropriate sourcing, this is the closest precedent for the methodology: attribution as a separate evaluation axis from raw answer accuracy.