Yu's published record covers most of what has counted as data mining since the field acquired that name (stream mining, sequential pattern discovery, knowledge graphs, anomaly detection), produced during his long tenure at IBM Research and then continued at UIC, with several hundred patents along the way. His co-authorship of "A Survey on Evaluation of LLMs" reads as that body of work approaching the LLM-evaluation literature from the vantage of an older field: many of the statistical and methodological questions that current LLM evaluation faces were partially answered by techniques developed decades ago for similar problems on smaller-scale data. Reading him is one way to discover that some current "novel" evaluation methods are rediscoveries.

Worth following when
you want LLM evaluation grounded in the longer history of data-mining methodology rather than treated as a discipline beginning in 2020.
Topics
the long arc of data-mining methodology relevant to LLM evaluation; stream and sequential pattern mining; bridging legacy techniques to modern model evaluation.
Key works
foundational work on stream mining and sequential pattern discovery (1990s–2000s); co-authorship of "A Survey on Evaluation of LLMs" (2024); decades of data-mining methodology publications at IBM Research and UIC.