Philip S. Yu — Whom to read in AI

Yu's published record covers most of what has counted as data mining since the field had that name (stream mining, sequential pattern discovery, knowledge graphs, anomaly detection), produced during his long tenure at IBM Research and then continued at the University of Illinois at Chicago (UIC), with several hundred patents along the way. His co-authorship of "A Survey on Evaluation of Large Language Models" reads as that body of work meeting the LLM-evaluation literature from the vantage of a much longer tradition: many of the statistical and methodological questions that current LLM evaluation faces have been answered, at least partially, by techniques developed decades ago for similar problems on smaller-scale data. Reading him is a good way to discover that some current "novel" evaluation methods are rediscoveries.

Worth following when: you want LLM evaluation grounded in the longer history of data-mining methodology rather than treated as a discipline beginning in 2020.
Topics: the long arc of data-mining methodology relevant to LLM evaluation; stream and sequential pattern mining; bridging legacy techniques to modern model evaluation.
Key works: foundational work on stream mining and sequential pattern discovery (1990s–2000s); co-authorship of "A Survey on Evaluation of Large Language Models" (2024); decades of data-mining methodology publications at IBM Research and UIC.