Vaughan's research has long focused on a question that most ML evaluation skips: even when you have a clean number (accuracy, calibration, a fairness measure), what happens when that number meets a human decision-maker who has to use it? Her studies on interpretability repeatedly find that explanations meant to build user trust can produce overconfidence instead: people accept model outputs they should have questioned because the explanation looks authoritative regardless of its substance. For ai100, which produces evaluation reports customers will use to make six-figure decisions, the lesson is direct: the artifact we ship is not the score itself but whatever the customer ends up believing the score means.

Worth following when
you produce evaluation reports for non-ML audiences and want to know how those reports actually get read.
Topics
human interpretation of ML evaluation results; calibration and trust in human-AI collaboration; transparency frameworks for ML systems (Aether).
Key works
"Manipulating and Measuring Model Interpretability" (2021, co-author); long-standing work on fairness and accountability in ML (FATE); WiML community organization (co-founder, 2006 onward).