On Kernel layouts for CitriNet #3881
-
Good question! We also use the kernel size factor to scale down Citrinet for streaming mode (there are some checkpoints on NGC marked with gamma_0_25, indicating a kernel scaling factor of 0.25x). There's actually a minor deviation from the original paper, in the sense that we scale down every single kernel after the first layer - even the final kernel of size 41 gets multiplied by gamma. We found that this further improves the streaming / buffered mode inference of this model. It's not just supported in Riva, btw - we have buffered CTC and RNNT support in NeMo too, but Riva is a lot more efficient and supports true streaming inference with low latency. Riva will do much better with a gamma-scaled Citrinet (including the final kernel * gamma) than with the original offline model (gamma = 1). Since you are doing kernel scaling, you must set the gamma value before you create the model - it cannot be changed once the model has been built or trained - and yes, weights are incompatible between different gammas (as the conv kernel shapes change).
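For illustration, here is roughly how that looks in code - a minimal sketch assuming a Citrinet training YAML similar to the ones shipped with NeMo (e.g. under examples/asr/conf/citrinet/), where the scaling factor lives under `model.model_defaults.kernel_size_factor` and is interpolated into each encoder block; the file name and config path are assumptions:

```python
from omegaconf import OmegaConf
from nemo.collections.asr.models import EncDecCTCModelBPE

# Load a Citrinet training config (file name here is hypothetical).
cfg = OmegaConf.load("citrinet_512.yaml")

# Gamma must be set *before* the model is instantiated; it changes the conv
# kernel shapes, so it cannot be edited on an already built/trained model.
cfg.model.model_defaults.kernel_size_factor = 0.25

# Building the model bakes the scaled kernel sizes into the encoder.
# (A valid tokenizer path in the config is still required to construct it.)
model = EncDecCTCModelBPE(cfg=cfg.model)
```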
-
For reference, you can see the config of the `encoder.jasper` part of the stt_en_citrinet_1024_gamma_0_25 model.
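If it helps, one way to dump that section locally (a small sketch; assumes the checkpoint name resolves via `from_pretrained`):

```python
from omegaconf import OmegaConf
from nemo.collections.asr.models import EncDecCTCModelBPE

# Download the NGC checkpoint and print its encoder block layout,
# including the gamma-scaled kernel size of each block.
model = EncDecCTCModelBPE.from_pretrained("stt_en_citrinet_1024_gamma_0_25")
print(OmegaConf.to_yaml(model.cfg.encoder.jasper))
```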
-
Nice, this is good to know, thanks!
Will do better in what way? Improved inference latency, WER accuracy, or both?
-
In the CitriNet paper, section 4.1, it's stated that:
Does this apply only to the PyTorch model generated using NeMo, or is it applicable to Riva as well?
Does Riva already optimize for streaming models somehow, so that this is not needed?
Also, if I fine-tune a pre-trained model but adjust the `kernel_size_factor` in my cfg, does that actually change the kernel layout? It doesn't seem to do so unless I train from scratch, since when I tried to `restore_from()` my finetuned model, I got an error similar to #3167, but with the encoder shapes corresponding to a different kernel layout configuration than the ones specified in Table 2 of the paper.
If this question is more suitable for the Riva forum, lmk. I thought to ask here since the question relates to the model's architecture and training w/ NeMo.
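To be concrete, the failing pattern is roughly the following (the checkpoint file name is hypothetical):

```python
from nemo.collections.asr.models import EncDecCTCModelBPE

# Restoring the fine-tuned .nemo checkpoint (hypothetical name) raises a
# size-mismatch error on the encoder conv weights, as if the checkpoint's
# kernel layout does not match the kernel_size_factor set in the cfg.
model = EncDecCTCModelBPE.restore_from("finetuned_citrinet.nemo")
```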