"Is ChatGPT Good at Search?" (2023, with Ren as senior author) is one of the cleanest empirical studies of whether large language models can replace traditional ranking components in a retrieval pipeline. The answer turns out to be: for re-ranking a small candidate set, yes, quite well; for first-stage retrieval over a large corpus, no, not really — and the gap between those two tasks is one most LLM-centric papers gloss over. Ren's broader work in conversational IR pushes the question further: when retrieval happens inside a multi-turn conversation, the system is doing several different things at once, and they should not all be benchmarked the same way.

Worth following when
you want empirical results on where LLMs can replace traditional IR components and where they break down.
Topics
LLMs as re-rankers in retrieval pipelines; conversational information retrieval; the difference between candidate-set re-ranking and corpus-scale first-stage retrieval.
Key works
"Is ChatGPT Good at Search? Investigating LLMs as Re-Ranking Agents" (2023, co-author); ongoing Leiden LIACS publications on conversational IR and RAG.