Pappas came to LLM research from a long career in control theory and formal methods, fields where the question "how does this system fail?" is a primary design constraint rather than an afterthought. The 2023 paper he co-authored, "Jailbreaking Black Box Large Language Models in Twenty Queries," brings that posture to alignment. Its PAIR algorithm (Prompt Automatic Iterative Refinement) gives an attacker LLM query-only access to a target model and instructs it to find an input that bypasses the target's safety training. The attacker then refines its prompt after each of the target's responses. The finding that PAIR often succeeds in fewer than twenty queries reframes the conversation about LLM safety: alignment training is not a binary fix but a graded barrier whose height can be measured in attacker effort.

Worth following when
you want to understand LLM safety as an attack-surface measurement problem informed by control theory and formal methods.
Topics
automated jailbreaking of black-box LLMs (PAIR algorithm); safety-as-attack-surface measurement; control-theory methodology applied to LLM behavior.
Key works
"Jailbreaking Black Box Large Language Models in Twenty Queries" (2023, co-author; introduces the PAIR algorithm); a long-running body of work on control theory and formal verification of autonomous systems.