Mark Gales
Detecting hallucinations in a language model without any access to its weights or to ground truth.
Most hallucination-detection methods rely on at least one of three things: the model's internal probabilities, a labeled reference answer, or a more powerful second model acting as judge. SelfCheckGPT, which Gales's Cambridge group introduced in 2023, needs none of the three. The idea is to ask the model the same thing several times and check whether the answers stay consistent with one another. Inconsistency across samples is read as a signal of fabrication; consistency is read as a signal that the model is drawing on something it actually learned in training.
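The sampling idea can be illustrated with a minimal sketch. This is not the SelfCheckGPT implementation: the word-overlap measure below is a deliberately crude stand-in for a real consistency measure, and the names `support` and `inconsistency_scores` are hypothetical. The extra samples are assumed to have been drawn already by asking the model the same prompt several times.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased word set. Assumption: a crude proxy for semantic content."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support(sentence: str, sample: str) -> float:
    """Fraction of the sentence's words that also appear in one sample."""
    words = tokens(sentence)
    if not words:
        return 1.0  # an empty sentence makes no claim to contradict
    return len(words & tokens(sample)) / len(words)


def inconsistency_scores(sentences: List[str], samples: List[str]) -> List[float]:
    """Score each sentence in [0, 1]; higher means less support across samples."""
    if not samples:
        raise ValueError("need at least one resampled answer")
    scores = []
    for sentence in sentences:
        mean_support = sum(support(sentence, s) for s in samples) / len(samples)
        scores.append(1.0 - mean_support)
    return scores


if __name__ == "__main__":
    # Hypothetical data: one answer under test, split into sentences,
    # plus three further answers sampled for the same prompt.
    answer = [
        "Paris is the capital of France.",
        "It was founded in 1802 by Napoleon.",
    ]
    resampled = [
        "Paris is the capital of France.",
        "The capital of France is Paris.",
        "France's capital is Paris.",
    ]
    for sentence, score in zip(answer, inconsistency_scores(answer, resampled)):
        print(f"{score:.2f}  {sentence}")
```

In this toy run the second sentence finds no support in any resampled answer, so it receives the higher score. A practical version would replace word overlap with a measure that tolerates paraphrase, since two consistent answers can share few words.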