Rosso has spent two decades organizing evaluation campaigns from inside the Spanish-language NLP community: IberLEF for the Iberian languages, PAN for plagiarism and authorship problems across languages, and a long line of shared tasks on author profiling, irony detection, hate-speech analysis, and disinformation. The substantive value lies in the task design, which reflects native-speaker sensitivity to which linguistic and cultural features actually carry signal: irony in Spanish is not the irony of English-language datasets, and hate-speech categories shift with the regulatory and cultural context. For ai100, which evaluates an engine's behavior in Spanish as one of its five language audits, this is the closest precedent for locale-native evaluation done with multi-year discipline.

Worth following when
you need to evaluate language-model behavior in Iberian-Romance languages with methodology designed for those languages, not adapted from English.
Topics
locale-native evaluation campaigns (IberLEF, PAN); irony, hate-speech, and disinformation detection in Spanish-language settings; multi-year shared-task organization as a research practice.
Key works
IberLEF evaluation campaigns (2018 onward, co-organizer); long-standing PAN co-organization with Stein and Potthast (2009 onward); UPV PRHLT publications on multi-language NLP evaluation.