Yang co-authored the 2023 study "Is ChatGPT a General-Purpose Natural Language Processing Task Solver?", one of the first sober task-by-task answers to a question whose answer many had taken for granted: across 20 established NLP datasets spanning seven task categories, ChatGPT was strong on a few, mediocre on most, and bad on some, a pattern that complicated the narrative of broad generality. Her SALT Lab continues the harder line of inquiry: language models in roles where the right answer depends on social context (counselor, conflict mediator, persuasion target) and where the failure modes look different from what standard benchmarks reveal.

Worth following when
you want to know how LLMs behave outside the kinds of tasks they were tuned for, especially tasks where the human stakes are higher than benchmark scores.
Topics
task-level evaluation of LLMs on established NLP benchmarks; computational social science with language models; social context as a dimension of evaluation.
Key works
"Is ChatGPT a General-Purpose Natural Language Processing Task Solver?" (2023); SALT Lab publications on socially-grounded NLP (2022, ongoing).