Shafiq Joty — Whom to read in AI

Joty's earlier published work is in discourse-structure parsing — the kind of NLP where you ask whether a paragraph hangs together as an argument, not just whether each sentence parses. That habit carries into his more recent LLM-evaluation publications: an enterprise LLM application has to produce outputs that hold their argumentative shape across multi-turn use, and standard single-shot benchmarks don't capture that. The body of work reads as a steady reminder that "the model passes a benchmark" and "the model holds up under enterprise traffic" are not the same claim.

Worth following when: you need an evaluation perspective shaped by what shipping LLM features to enterprise customers actually demands of evaluation methodology.
Topics: discourse structure as a lens on long-form LLM output; multi-turn evaluation beyond single-shot benchmarks; the gap between research benchmarks and enterprise deployment.
Key works: earlier foundational work on discourse parsing (Rhetorical Structure Theory, RST, and beyond); ongoing publications on multi-turn LLM evaluation and reliability.