Furu Wei — Whom to read in AI

The standard direction of information flow in retrieval-augmented systems is from retrieval to generation: fetch documents first, then write the answer. Query2doc (2023, with Wei as senior author) inverts that order at the front end: given the user's query, the language model first generates a plausible-looking answer document, and that synthetic document is then appended to the query as extra signal for the retrieval step. The trick works because such generated documents, while unreliable on specific facts, usually get right the vocabulary and the kind of document the user implicitly expects to find, which is the signal that both sparse (BM25) and dense retrievers can match against.
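
A minimal sketch of the expand-then-retrieve idea, under loud assumptions: `fake_llm` is a hypothetical stand-in for a real model call, the TF-IDF overlap scorer is a toy substitute for BM25 or a dense retriever, and the `repeats` parameter (which up-weights the original query against the longer pseudo-document) is an illustrative choice. None of these names come from the paper.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_query(query: str, generate: Callable[[str], str], repeats: int = 5) -> str:
    """Concatenate the query (repeated) with an LLM-written pseudo-document."""
    pseudo_doc = generate(query)
    return " ".join([query] * repeats + [pseudo_doc])


def rank(query_text: str, corpus: List[str]) -> List[int]:
    """Return corpus indices sorted by a toy TF-IDF overlap score (not BM25)."""
    n = len(corpus)
    doc_tokens = [Counter(tokenize(doc)) for doc in corpus]
    df = Counter(term for tokens in doc_tokens for term in tokens)
    idf = {term: math.log(1 + n / count) for term, count in df.items()}
    q = Counter(tokenize(query_text))

    def score(i: int) -> float:
        return sum(q[t] * doc_tokens[i][t] * idf.get(t, 0.0) for t in q)

    return sorted(range(n), key=score, reverse=True)


def fake_llm(query: str) -> str:
    # Hypothetical stand-in for an LLM call. A real model may state wrong
    # facts here; only the vocabulary needs to resemble the target passage.
    return "Zinc is a mineral that supports the immune system and speeds wound healing."


corpus = [
    "This mineral supports the immune system and speeds wound healing.",
    "Employee benefits include dental and vision coverage.",
    "Galvanized steel is coated to resist rust.",
]

query = "zinc benefits"
expanded = expand_query(query, fake_llm)

raw_top = rank(query, corpus)[0]          # index of best match for the raw query
expanded_top = rank(expanded, corpus)[0]  # index of best match after expansion
```

In this toy corpus the raw query shares a term only with the distractor about employee benefits, while the expanded query shares enough vocabulary with the relevant passage to rank it first.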

Worth following when: you want to think about retrieval and generation as bidirectional rather than as a one-way pipeline.
Topics: query expansion via LLM-generated synthetic documents; the inverse direction of standard RAG; foundation-model contributions to retrieval methodology.
Key works: Query2doc: Query Expansion with Large Language Models (2023, senior author); broader foundation-model and pretraining contributions, including BEiT, the LayoutLM family, and MiniLM.