Mengnan Du — Whom to read in AI

The literature on explaining individual LLM outputs grew faster than the methods for validating those explanations against ground truth. Du's "Explainability for Large Language Models: A Survey" (2024, senior author with collaborators) sorts that landscape into the categories that have come to matter: attention-based, gradient-based, perturbation-based, and natural-language explanations the model generates about itself, each with documented strengths and failure modes. The survey is particularly useful in the part most other surveys skip: a case-by-case discussion of when an explanation method actively misleads rather than merely under-informs.

Worth following when: you need to choose an explainability method for an LLM and want to know which of the options have been independently validated.
Topics: taxonomy of LLM explainability methods; self-explanations by LLMs and their reliability; the methodology gap between proposing and validating explanations.
Key works: "Explainability for Large Language Models: A Survey" (2024, senior author); ongoing TMLR Lab publications on trustworthy and explainable LLMs.