Watanabe is one of the founders and primary maintainers of ESPnet (the End-to-End Speech Processing Toolkit), the open-source platform that has become the default foundation for academic speech recognition and synthesis research since its 2018 release. The toolkit's design encodes a methodological position: evaluation should be reproducible across labs, baselines should run out of the box, and the tooling should make it harder to publish a "new SOTA" without comparing fairly to prior work. For ai100, as the AI engines we evaluate add voice-mode interfaces, ESPnet is the methodological reference point for what a fair comparison between speech systems looks like.

Worth following when
you need to evaluate speech-input or speech-output capabilities of AI engines and want the open-toolkit methodology that the academic community converged on.
Topics
open-source end-to-end speech processing (ESPnet); reproducibility-as-design in speech evaluation tooling; the methodology lineage for speech-recognition and speech-synthesis benchmarks.
Key works
ESPnet end-to-end speech toolkit (2018 onward, co-creator and maintainer); body of work on end-to-end speech recognition and synthesis; CMU LTI publications on speech processing.