Clarke spent three decades inside the TREC evaluation tradition, where rigorous comparison of retrieval systems meant shared corpora, shared queries, shared relevance judgments, and an explicit protocol for resolving disagreements between assessors. The 2023 paper "Evaluating Open-Domain QA in the Era of LLMs", which he co-authored, carries that discipline forward into the current moment. It points out that much LLM-based QA evaluation has quietly dropped most of those guardrails: it relies on single-source ground truth, uses automatic judges that have not been validated against humans, and has no protocol for answers that are correct but worded differently from the reference. The methodological reset he argues for is closer to the original TREC posture than to anything currently in vogue.

Worth following when
you want to know what rigorous QA evaluation looked like before LLMs made everyone forget the rules, and which of those rules should come back.
Topics
TREC-style evaluation methodology; QA evaluation in the era of LLM-generated answers; assessor disagreement and ground-truth construction.
Key works
Information Retrieval: Implementing and Evaluating Search Engines (textbook, 2010, with Büttcher and Cormack); "Evaluating Open-Domain QA in the Era of LLMs" (2023, co-author); decades of TREC organizing and participation.