"When Not to Trust Language Models" (2023, with collaborators across UW and AI2) is the cleanest formulation of a question the field had been talking around: a language model's internal knowledge is uneven by topic and by frequency, and an honest evaluation has to distinguish facts the model actually knows from facts it merely sounds confident about. Hajishirzi's broader work — leading the open-model line at AI2 that produced OLMo and the Tulu post-training family — extends the same logic upstream: if you want to study why models hallucinate, you have to study models you can open up, not ones served behind a closed API.

Worth following when
you want factuality evaluation methodology that works on open-weight models you can take apart, rather than only on whatever the latest closed API returns today.
Topics
parametric vs. retrieved knowledge in LLMs; atomic-fact factuality scoring; the open-model evaluation stack (OLMo, Tulu).
Key works
"When Not to Trust Language Models" (2023, co-author); OLMo open language model (2024, co-author); Tulu post-training framework (2023, co-author).