Gideon Mann
What language models look like when they're trained for a single high-value vertical — finance, in this case — and what evaluation that specialization requires beyond general-purpose benchmarks.
Mann co-led BloombergGPT (2023), one of the first major vertical large language models: a 50-billion-parameter model trained on a mix of Bloomberg's financial-document corpus and general-purpose public data, designed to outperform general-purpose LLMs on financial NLP tasks. The paper's evaluation section is methodologically interesting in its own right. It compares BloombergGPT side by side with general-purpose LLMs on both finance-specific tasks (where BloombergGPT was supposed to win) and general benchmarks (where it had to remain competitive). For ai100, the literature on vertical LLM evaluation matters because it addresses a question we'll eventually face: how do you fairly evaluate a domain-specialist model against a generalist on queries that span both kinds of competence?