Bansal's MURGe-Lab at UNC has produced one of the more sustained research lines on multimodal evaluation, spanning vision-language reasoning, summarization faithfulness, and long-form generation assessment. Its papers follow a recognizable pattern: most identify a specific failure mode in existing evaluation methodology and then propose a replacement. The body of work matters more than any single paper, because it has shaped what later multimodal-evaluation papers treat as rigorous. For ai100, evaluating brand-mention behavior in models that also process images (charts, product photos, screenshots) is exactly the kind of multimodal-evaluation question Bansal's group has been working on for several years.

Worth following when
you need to evaluate a multimodal language model and want methodology informed by the longer arc of multimodal-evaluation research rather than ad-hoc extensions of text-only metrics.
Topics
multimodal evaluation methodology (vision-language and beyond); summarization faithfulness assessment; the methodological practice of identifying failure modes before proposing metrics.
Key works
MURGe-Lab body of publications on multimodal evaluation and reasoning (UNC, 2018 onward); summarization-faithfulness research line; ENGAGE NSF-AI Institute publications on multimodal AI.