ASR for long audio files (streaming) with Beam Search/LM #4597

Orelbabayoff · 2022-07-24T12:23:22Z

Hi
I'm following up on the older answers in closed issue #2307 and on the streaming tutorial: https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Streaming_ASR.ipynb
Are there any updates on NeMo supporting an LM with long (streaming) audio?

titu1994 (Maintainer) · 2022-07-24T18:23:04Z

Not really. We do plan to add a high-level API for beam search, but it will not include an LM. There is no plan for such an integration in the next few months either, since NVIDIA Riva already covers efficient streaming in production environments.

Orelbabayoff (Author) · 2022-07-24T18:38:23Z

Thank you for the answer.
Do you recommend using SAD/VAD models to slice the long audio and then running ASR with an LM and beam search on each slice?

Any recommendations for SAD models?

titu1994 (Maintainer) · 2022-07-24T19:15:40Z

If you have ground-truth labels, you can use CTC segmentation in NeMo to split the audio files automatically.

Any production-grade VAD model can also do it, though I don't know of any open-source production-grade VAD models; most are research grade. Maybe @fayejf knows.
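
To illustrate the slice-then-decode idea discussed in this thread, here is a minimal sketch in plain Python. It is not NeMo code and not a production-grade VAD: it assumes a simple RMS energy threshold stands in for the VAD model, and the function names (`find_speech_segments`, `transcribe_long_audio`) and the `decode_segment` callback are hypothetical. In practice `decode_segment` would wrap whatever offline ASR + beam search + LM decoder you already run on short clips.

```python
import math
from typing import Callable, List, Sequence, Tuple


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """Root-mean-square energy of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return energies


def find_speech_segments(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 30,
    threshold: float = 0.02,      # assumed energy threshold, tune per dataset
    min_silence_ms: int = 300,    # silence this long closes a segment
) -> List[Tuple[int, int]]:
    """Return (start_sample, end_sample) pairs for regions above the threshold."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    max_gap = max(1, min_silence_ms // frame_ms)
    energies = frame_rms(samples, frame_len)

    segments = []
    start = None      # index of the first frame of the open segment
    silent_run = 0    # consecutive silent frames seen inside the open segment
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= max_gap:
                end = i - silent_run + 1  # frame right after the last speech frame
                segments.append((start * frame_len, end * frame_len))
                start, silent_run = None, 0

    if start is not None:  # audio ended while a segment was still open
        end = len(energies) - silent_run
        segments.append((start * frame_len, min(len(samples), end * frame_len)))
    return segments


def transcribe_long_audio(
    samples: Sequence[float],
    sample_rate: int,
    decode_segment: Callable[[Sequence[float]], str],
) -> str:
    """Decode each detected segment independently and join the transcripts."""
    texts = []
    for start, end in find_speech_segments(samples, sample_rate):
        texts.append(decode_segment(samples[start:end]))
    return " ".join(t for t in texts if t)
```

Because each slice is decoded independently, the LM sees no context across slice boundaries, so cutting only at longer pauses (a larger `min_silence_ms`) is usually the safer choice.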
