Comparison - Analysis between ASR #6519

arnavmehta7 · 2023-02-24T16:30:39Z

A lot of AI models are popping up for ASR purposes. I'd appreciate an analysis of the WER across different models, especially NeMo ASR vs. Triton or Whisper.
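
For reference, WER is usually computed as the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript. A minimal Python sketch, assuming plain whitespace tokenization and no text normalization (published benchmarks normally normalize casing and punctuation first, which changes the numbers); the function name `word_error_rate` is only illustrative:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, so it is not strictly a percentage of the reference.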

borisgin (Maintainer) · 2023-03-08T21:45:07Z

This may be a good start: "ESB: A Benchmark for Multi-Domain End-to-End Speech Recognition"

borisgin (Maintainer) · 2023-03-08T21:46:26Z

Another comparison: OpenAI Whisper accuracy and other recent models (NeMo Transducer XLarge, GigaSpeech)

AdolfVonKleist · 2023-04-29T01:13:58Z

It's a bit frustrating that none of these recent benchmarks consider RTF (real-time factor) as a core metric. For any production system, resource consumption and online processing speed are nearly as important as WER. Whisper is great, for instance, but the accuracy delta between it and k2 is actually pretty small, whereas I can run k2 at 0.07xRT on CPU on my laptop while Whisper large will run at 2.5xRT, if I'm lucky, on the same hardware. That's a massive delta in terms of deployment, model size, and cost for an honestly quite small gain in accuracy on the most common domains.
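
To put those numbers in perspective, RTF is just processing time divided by audio duration, so anything below 1.0 is faster than real time. A rough Python sketch of the arithmetic, using the 0.07 and 2.5 figures quoted above; the function names and the `transcribe` callable are hypothetical placeholders, not part of k2, Whisper, or NeMo:

```python
import time
from typing import Callable


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent decoding / duration of the audio decoded."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def processing_time(rtf: float, audio_seconds: float) -> float:
    """Wall-clock seconds needed to decode audio at a given RTF."""
    return rtf * audio_seconds


def measure_rtf(transcribe: Callable[[str], str], audio_path: str,
                audio_seconds: float) -> float:
    """Time one call to a user-supplied (hypothetical) transcribe function."""
    start = time.perf_counter()
    transcribe(audio_path)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)


if __name__ == "__main__":
    one_hour = 3600.0
    for name, rtf in [("k2", 0.07), ("Whisper large", 2.5)]:
        minutes = processing_time(rtf, one_hour) / 60
        print(f"{name}: about {minutes:.1f} min of CPU time per hour of audio")
    print(f"speed ratio: about {2.5 / 0.07:.0f}x")
```

At those figures, an hour of audio takes roughly 4 minutes at 0.07xRT versus 150 minutes at 2.5xRT, which is also the difference between being able to stream live audio on that machine and falling behind it. A single timed run is noisy, so in practice you would average `measure_rtf` over many files and report the hardware alongside the result.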
