Yoav Goldberg
Whether the chain of reasoning a language model produces is the chain of reasoning it actually followed.
Goldberg wrote one of the standard NLP textbooks for the neural era (2017) and has spent the years since pushing on a question that gets harder as models get more capable: is the explanation an LLM gives for its answer the actual cause of that answer, or a plausible-sounding cover story produced after the fact? His work on faithfulness in generated explanations is a steady reminder that "the model said why it did X" and "we know why the model did X" are different claims, and most current evaluation methodology conflates them.