Mohit Bansal
Whether the methods we use to evaluate language-only models still work when the same model has to handle images, speech, or other modalities at the same time, and what fails first when modalities are combined.
Bansal's MURGe-Lab at UNC has produced one of the more sustained research lines on multimodal evaluation, spanning vision-language reasoning, summarization faithfulness, and long-form generation assessment. Most of the lab's papers identify a specific failure mode in existing evaluation methodology and propose a replacement. The body of work matters more than any single paper: it has shaped what subsequent generations of multimodal-evaluation papers consider rigorous. For ai100, the question of how to evaluate brand-mention behavior in models that also process images (charts, product photos, screenshots) is precisely the multimodal-evaluation question Bansal's group has been carving out for several years.