Mann co-led BloombergGPT (2023), one of the first major vertical large language models: a 50-billion-parameter model trained on a roughly even mix of Bloomberg's financial-document corpus and general-purpose datasets, designed to outperform general-purpose LLMs on financial NLP tasks. The paper's evaluation section is methodologically interesting in its own right. It compares BloombergGPT side by side with general-purpose LLMs on both finance-specific tasks (where BloombergGPT was expected to win) and general benchmarks (where it had to remain competitive). For ai100, the literature on vertical LLM evaluation matters because it addresses a question we will eventually face: how to fairly evaluate a domain-specialist model against a generalist on queries that demand both kinds of competence.

Worth following when
you need a methodology for evaluating domain-specialist LLMs against general-purpose ones, or want the literature on what vertical LLM training actually buys you.
Topics
vertical-domain LLM development (BloombergGPT); financial-NLP evaluation; the methodological challenge of comparing specialist and generalist models on overlapping tasks.
Key works
BloombergGPT: A Large Language Model for Finance (2023, co-author); long body of ML research at Google and Bloomberg; ongoing publications on financial NLP at scale.