Comparison - Analysis between ASR #6519
Replies: 3 comments
-
This may be a good start: ESB: A BENCHMARK FOR MULTI-DOMAIN
-
Another comparison: OpenAI Whisper Accuracy and other recent models (Nemo Transducer XLarge, Gigaspeech)
-
It's a bit frustrating that none of these recent benchmarks considers RTF (real-time factor) as a core metric. For any production system, resource consumption and online processing speed are nearly as important as WER. Whisper is great, for instance, but the accuracy delta between it and k2 is actually pretty small, whereas I can run k2 at 0.07xRT on my laptop's CPU while Whisper large runs at 2.5xRT, if I'm lucky, on the same hardware. That's a massive delta in terms of deployment, model size, and cost for an honestly quite small gain in accuracy on the most common domains.
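For anyone who wants to report RTF next to WER, here is a rough sketch of how it could be measured. It assumes RTF means processing time divided by audio duration, so values below 1.0 are faster than real time. `measure_rtf` and the `transcribe` callable are hypothetical placeholders, not part of k2 or Whisper; you would wrap your own model call.

```python
import time


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (assumed RTF definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(transcribe, audio, audio_seconds: float) -> float:
    """Time one call to a user-supplied transcribe(audio) and return its RTF.

    `transcribe` is a hypothetical stand-in for whatever model call you use.
    """
    start = time.perf_counter()
    transcribe(audio)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)


if __name__ == "__main__":
    # Figures quoted in the comment above, applied to one hour of audio.
    one_hour = 3600.0
    for name, rtf in (("k2", 0.07), ("Whisper large", 2.5)):
        minutes = rtf * one_hour / 60.0
        print(f"{name}: about {minutes:.1f} minutes of compute per hour of audio")
```

A single timed call is noisy, so in practice you would probably average over many utterances and note the hardware and thread count alongside the number.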
-
A lot of AI models are popping up for ASR purposes. I'd appreciate an analysis of the WER between different models, especially NeMo ASR vs Triton or Whisper.
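To make a comparison like this reproducible, every model's output should be scored with the same WER code. Below is a minimal sketch, assuming the usual definition of WER as word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. `word_error_rate` is a hypothetical helper written for illustration; it splits on whitespace only and does no text normalization, which real benchmarks usually apply first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    # One substitution (sat -> sit) and one deletion (the): 2 errors over 6 words.
    print(f"WER: {word_error_rate(ref, hyp):.3f}")
```

Differences in casing and punctuation handling can move WER by several points, so the same normalization has to be applied to every model's output before comparing.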