Jimmy Lin
Making information retrieval reproducible enough that an LLM researcher and an IR researcher can run the same experiment and get the same answer.
Lin's Anserini and Pyserini toolkits became a de facto reproducibility layer for retrieval research: when an NLP paper now reports a BM25 baseline number, that number usually comes from his group's code. The same posture carries into HyDE (Hypothetical Document Embeddings, 2023, co-author): a language model generates a hypothetical answer document for the query, that document is embedded, and retrieval runs against its embedding rather than the query's. The method is clean enough that other groups can replicate it without negotiating over implementation details. Reading Lin is the way to understand which retrieval results in current LLM papers actually mean something and which depend on undocumented baselines.
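
The HyDE pipeline described above (generate, embed, retrieve) can be sketched in a few lines of Python. This is an illustration under loud assumptions, not the authors' implementation: `generate_hypothetical_document` is a hypothetical stub standing in for a language-model call, `embed` is a toy bag-of-words vector standing in for a dense encoder, and the three-document corpus is invented for the example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a dense encoder: a bag-of-words count vector (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def generate_hypothetical_document(query):
    """Hypothetical stub for the language-model step.

    A real system would prompt an LLM to write a passage answering the query;
    here a canned answer is returned, falling back to the query itself.
    """
    canned = {
        "what ranking function do sparse baselines use": (
            "Sparse baselines typically rank documents with BM25, a "
            "term-weighting function based on term frequency and inverse "
            "document frequency."
        ),
    }
    return canned.get(query.lower().rstrip("?"), query)


def hyde_retrieve(query, corpus, k=1):
    """Embed the generated document, not the query, and rank the corpus by similarity."""
    hypothetical = generate_hypothetical_document(query)
    query_vector = embed(hypothetical)
    scored = [
        (cosine(query_vector, embed(doc)), doc_id)
        for doc_id, doc in corpus.items()
    ]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Invented three-document corpus, for illustration only.
corpus = {
    "d1": "BM25 is a ranking function that weights term frequency and inverse document frequency.",
    "d2": "Dense retrievers encode queries and passages into a shared vector space.",
    "d3": "Tokenizers split text into subword units before encoding.",
}

print(hyde_retrieve("What ranking function do sparse baselines use?", corpus))  # ['d1']
```

The only step that differs from ordinary dense retrieval is the first line of `hyde_retrieve`: the vector used for search comes from the generated passage rather than from the query text.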