Wenjie Li — Whom to read in AI

Generative retrieval was first treated as a niche IR technique: train a sequence-to-sequence model to emit document identifiers given a query. Li's research, coming from a summarization and NLG background, reframed the picture: once a system can generate document IDs, it can also generate a summary of those documents conditioned on the query, and an answer to the query conditioned on the documents, all from the same architecture and weights. For ai100, this matters because the boundary between "the system retrieved sources" and "the system summarized them" blurs: a modern engine in answer-with-citations mode does both, and Li's line of work is the cleanest place to see why the two tasks were always entangled.

Worth following when: you want to think about retrieval, summarization, and question-answering as a single generation problem rather than as a pipeline of separate components.
Topics: generative document retrieval; the unification of summarization and retrieval as generation tasks; query-conditioned summarization in RAG-adjacent systems.
Key works: Neural Corpus Indexer (NCI, 2022, co-author) and the downstream generative-retrieval line; a long body of work on text summarization and query-focused NLG; HK PolyU Cognitive Computing Lab publications.