Minlie Huang
Whether the categories of "harmful" content used to evaluate language-model safety transfer from Western to Chinese-language deployment contexts, where the regulatory framework and cultural categories differ.
Most public LLM safety evaluation uses category schemes developed by Western labs: bias, toxicity, hate speech, and jailbreak resistance, all benchmarked against English-language datasets and US/EU regulatory expectations. Huang's CoAI group at Tsinghua, together with its more recent AISafetyLab framework, runs an explicit parallel effort for Chinese-language LLMs. The underlying point is that what counts as "harmful" or "biased" output looks structurally different when the regulatory environment, the cultural taboos, and the expectations of the users a deployed model reaches all differ. For ai100, which evaluates engines that serve five language regions, locale-aware safety evaluation of this kind is the only honest approach: applying one safety taxonomy to all five locales would itself be a form of bias.