A RAG system has at least three places where it can go wrong: the retriever fetches the wrong context, the generator ignores the context it was given, or the answer sounds plausible but is unsupported by what was retrieved. The RAGAs framework (2023), on which Schockaert is senior author, gave the field its first set of automated metrics that disentangle these failure modes (faithfulness, answer relevance, context precision and context recall) rather than collapsing them into a single quality score. The methodological point is closer to ai100's own scoring philosophy than that of most evaluation frameworks: you don't get to call a system good if you can't say which part of it works.
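
To make the disentangling concrete, here is a minimal sketch of how separate retrieval-side and generation-side scores can point at different stages of a pipeline. It is not the RAGAs implementation: the real framework obtains its judgments from an LLM judge, whereas this sketch assumes they already exist as booleans. The names `JudgedSample` and `diagnose`, the 0.5 threshold, and the simplified formulas are illustrative assumptions, and answer relevance is left out for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JudgedSample:
    """Hypothetical container of per-sample judgments (assumed to be given)."""
    claim_supported: List[bool]    # per answer claim: supported by the retrieved context?
    chunk_relevant: List[bool]     # per retrieved chunk, in rank order: relevant to the question?
    reference_covered: List[bool]  # per reference-answer statement: present in the retrieved context?


def fraction_true(flags: List[bool]) -> float:
    """Share of True flags; 0.0 for an empty list."""
    return sum(flags) / len(flags) if flags else 0.0


def faithfulness(sample: JudgedSample) -> float:
    """Generation side: share of answer claims backed by the context."""
    return fraction_true(sample.claim_supported)


def context_recall(sample: JudgedSample) -> float:
    """Retrieval side: share of reference statements the context covers."""
    return fraction_true(sample.reference_covered)


def context_precision(sample: JudgedSample) -> float:
    """Retrieval side, rank-aware: mean of precision@k over the relevant ranks."""
    hits = 0
    total = 0.0
    for k, relevant in enumerate(sample.chunk_relevant, start=1):
        if relevant:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


def diagnose(sample: JudgedSample, threshold: float = 0.5) -> List[str]:
    """Name the stage(s) whose scores fall below an assumed threshold."""
    problems = []
    if context_precision(sample) < threshold or context_recall(sample) < threshold:
        problems.append("retrieval")
    if faithfulness(sample) < threshold:
        problems.append("generation")
    return problems


# Retrieval looks fine here, but only one of four answer claims is supported.
sample = JudgedSample(
    claim_supported=[True, False, False, False],
    chunk_relevant=[True, True, False],
    reference_covered=[True, True],
)
print(diagnose(sample))
```

A single averaged score for this sample would hide the fact that the retriever did its job and the generator did not, which is the attribution the separate metrics are meant to preserve.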

Worth following when
you need to evaluate a RAG pipeline and want to attribute failure to the retrieval, the generation, or the integration between them.
Topics
automated RAG evaluation metrics; disentangling retrieval-side and generation-side failures; metric design for multi-stage pipelines.
Key works
RAGAs: Automated Evaluation of Retrieval Augmented Generation (2023, senior author); ongoing Cardiff NLP work on RAG and knowledge-augmented LLMs.