Steven Schockaert
How to evaluate a retrieval-augmented system end-to-end when each part of it can fail in different ways.
A RAG system has at least three places where it can go wrong: the retriever fetches the wrong context, the generator ignores the context it was given, or the answer sounds plausible but is unsupported by what was retrieved. RAGAs (2023), on which Schockaert was senior author, gave the field one of its first sets of automated, reference-free metrics that disentangle these failure modes rather than collapsing them into a single quality score. The paper defined faithfulness (are the answer's claims supported by the retrieved context?), answer relevance (does the answer address the question?) and context relevance (is the retrieved context focused on the question?); context precision and recall were added later in the open-source library. The methodological point is closer to ai100's own scoring philosophy than that of most evaluation frameworks: you don't get to call a system good if you can't say which part of it works.
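To make "disentangle" concrete, here is a minimal sketch of how per-component scores could be computed once the judging has been done. It assumes the verdicts already exist. In RAGAs an LLM extracts statements from the answer and checks each against the context, flags which context sentences are relevant, and regenerates questions from the answer for an embedding comparison. The `Judgements` container and every function name below are hypothetical illustrations, not the ragas library's API.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Judgements:
    """Hypothetical per-example verdicts, assumed to come from an LLM judge and an embedding model."""
    statement_supported: list[bool]               # one verdict per statement extracted from the answer
    sentence_relevant: list[bool]                 # one verdict per sentence of the retrieved context
    question_vec: list[float]                     # embedding of the original question
    regenerated_question_vecs: list[list[float]]  # embeddings of questions generated from the answer


def _fraction(flags: list[bool]) -> float:
    # Share of True verdicts; an empty list scores 0.0 rather than dividing by zero.
    return sum(flags) / len(flags) if flags else 0.0


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def faithfulness(j: Judgements) -> float:
    # Generator check: fraction of answer statements the retrieved context supports.
    return _fraction(j.statement_supported)


def answer_relevance(j: Judgements) -> float:
    # Answer check: mean cosine similarity between the real question and questions recovered from the answer.
    vecs = j.regenerated_question_vecs
    if not vecs:
        return 0.0
    return sum(_cosine(j.question_vec, v) for v in vecs) / len(vecs)


def context_relevance(j: Judgements) -> float:
    # Retriever check: fraction of retrieved sentences judged relevant to the question.
    return _fraction(j.sentence_relevant)


def report(j: Judgements) -> dict[str, float]:
    # Keep the scores separate instead of averaging them into one quality number.
    return {
        "faithfulness": faithfulness(j),
        "answer_relevance": answer_relevance(j),
        "context_relevance": context_relevance(j),
    }


example = Judgements(
    statement_supported=[True, True, True, True],
    sentence_relevant=[True, False, False, False],
    question_vec=[1.0, 0.0],
    regenerated_question_vecs=[[1.0, 0.0], [1.0, 0.0]],
)
print(report(example))
# {'faithfulness': 1.0, 'answer_relevance': 1.0, 'context_relevance': 0.25}
```

The example describes a faithful, on-topic answer built on mostly irrelevant retrieval. A single averaged score would hide that, while the separate scores point at the retriever rather than the generator.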