Weijia Shi — Whom to read in AI

The dominant assumption in retrieval-augmented systems has been that the retriever and the generator are trained together, or that at least one is fine-tuned to fit the other. REPLUG (2023, lead author) showed that this isn't necessary: the retriever can be trained against a closed-source language model treated as a frozen black-box scoring function, and the resulting system improves the LLM's outputs without ever touching its weights. The methodological consequence is that retrieval-augmentation research is no longer limited to open-weight models, and that extension is what makes it relevant for evaluating systems like the major commercial AI engines.
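To make "frozen black-box scoring function" concrete, here is a minimal sketch of a retriever objective in that spirit. It is loosely modeled on likelihood-supervised retrieval and is not the paper's exact formulation: the function names, the temperatures `gamma` and `beta`, and the use of log-likelihoods as the LM score are all illustrative assumptions.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def retriever_loss(retriever_scores, lm_scores, gamma=1.0, beta=1.0):
    """KL divergence between the retriever's distribution and the LM's.

    retriever_scores: query-document similarities (the trainable side).
    lm_scores: how well the frozen LM predicts the reference continuation
        when each document is prepended to the input. Assumed here to be
        log-likelihoods read from model outputs; no weights or gradients
        of the LM are needed.
    """
    p_retriever = softmax(retriever_scores, gamma)
    q_lm = softmax(lm_scores, beta)
    return sum(
        p * math.log(p / q) for p, q in zip(p_retriever, q_lm) if p > 0
    )


# Hypothetical numbers: the LM finds document 0 far more helpful.
lm_scores = [-1.0, -5.0]
aligned = retriever_loss([2.0, 0.0], lm_scores)     # retriever also prefers doc 0
misaligned = retriever_loss([0.0, 2.0], lm_scores)  # retriever prefers doc 1
print(aligned < misaligned)
```

In a real system the retriever scores would come from a trainable encoder and the loss would be minimized with respect to that encoder only, so the language model stays untouched throughout.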

Worth following when: you want to understand how retrieval can improve closed-source LLMs that expose neither their weights nor their gradients.
Topics: black-box retrieval-augmented generation; retriever training without LM access; the architectural assumptions buried in earlier RAG research.
Key works: REPLUG: Retrieval-Augmented Black-Box Language Models (2023, lead author). Related: In-Context RALM (Ram et al., 2023).