Shinji Watanabe
The open-source speech-processing infrastructure that lets academic and industrial groups train, evaluate, and compare voice-input or voice-output language systems on the same footing.
Watanabe is one of the founders and primary maintainers of ESPnet (the End-to-End Speech Processing Toolkit), an open-source platform that has become the default substrate for academic speech recognition and synthesis research. The toolkit's design encodes a methodological position: evaluation should be reproducible across labs, baselines should run out of the box, and the tooling should make it harder to publish a "new SOTA" without a fair comparison to prior work. For ai100, as the AI engines we evaluate add voice-mode interfaces, ESPnet is the methodological reference point that defines what a fair speech-vs-speech comparison looks like.