Xie's published career has crossed several adjacent fields (recommender systems, spatial data mining, responsible AI infrastructure) that all had to answer the same operational question: what does this AI system actually do for the user, and how do you measure that against the user's actual interests rather than against a benchmark you defined? His co-authorship of "A Survey on Evaluation of Large Language Models" (2023) reads as that long career meeting the LLM moment: many of the methodological frames for measuring fairness, drift, or unintended user-facing effects in recommender systems transfer directly to language models, often without modification. For ai100, which evaluates how language models shape what users hear about brands, this is the closest precedent literature: recommender-system evaluation methodology applied to a new substrate.

Worth following when
you want LLM evaluation methodology informed by the longer arc of evaluating user-facing AI systems before LLMs existed.
Topics
recommender-systems evaluation methodology applied to LLMs; responsible-AI infrastructure inside large research labs; the bridge between recommender-system metrics and LLM evaluation.
Key works
co-authorship of "A Survey on Evaluation of Large Language Models" (2023, with Wang and collaborators); long body of work on recommender systems and urban computing at Microsoft Research Asia; ongoing responsible-AI publications.