Callan has been building search engines and retrieval evaluation corpora since before "RAG" existed as a term: Indri and the ClueWeb09 through ClueWeb22 datasets, artifacts that NLP papers cite as evaluation infrastructure without recognizing them as original IR research in their own right. His more recent contribution as co-author on FLARE (Forward-Looking Active RAG, 2023) is the bridge: a foundational IR figure helping a younger generation of NLP researchers recognize which retrieval problems are genuinely new and which were solved twenty years ago. Reading him is a way to avoid the standard NLP-side mistake of treating the retrieval layer as a black box that "just does search".

Worth following when
you want to understand the IR foundations underneath modern RAG systems — and the failure modes those foundations already mapped.
Topics
information retrieval foundations and history; web-scale retrieval corpora (ClueWeb family); active retrieval as the bridge between IR and LLM-driven NLP.
Key works
Indri search engine (early 2000s, ongoing); ClueWeb corpora (2009 through 2022); FLARE / Active Retrieval-Augmented Generation (2023, co-author).