The reason creating the masks on the fly in the forward is avoided is that some operations like torch.triu were not supported by ONNX at that time. We can take another look and, if it is supported now, move it into the forward. Another simple fix is to call a method that recreates the masks in setup_streaming_params.
The left_chunks should be limited: it is a streaming model, and without a limit on the left context the memory consumption would keep growing over the streaming audio, so it is better to have a bounded left context. Also note that left contexts longer than about 6 seconds usually do not improve accuracy significantly. This is a layer-wise left context, so the effective left context is already much larger. Being able to change the left context after training would be a useful feature, but it is not critical, as users usually train their models with the left context they have in mind for streaming.
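For a rough sense of the scale of that layer-wise effect, here is a back-of-the-envelope calculation (all numbers below are illustrative assumptions, not taken from any particular config):

```python
# Illustration of why a layer-wise left context compounds across layers.
# All numbers are illustrative assumptions, not from an actual NeMo config.
num_layers = 17          # conformer layers
left_chunks = 4          # left chunks attended to per layer
chunk_frames = 40        # encoder frames per chunk
frame_shift_sec = 0.04   # seconds per encoder frame (e.g. 4x subsampling of 10 ms)

per_layer_left_sec = left_chunks * chunk_frames * frame_shift_sec
# Each layer can reach another `per_layer_left_sec` further back through the layer below.
effective_left_sec = num_layers * per_layer_left_sec

print(f"per-layer left context:  {per_layer_left_sec:.1f} s")
print(f"effective left context: ~{effective_left_sec:.1f} s")
```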
We have developed a multi-lookahead cache-aware model which supports multiple left and right contexts in a single model. We have not added it to NeMo yet. That one creates the masking in the forward on the fly.
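For reference, a minimal sketch of how such a chunked-limited mask could be built on the fly inside forward using only comparison ops (so it avoids torch.triu, which was the ONNX concern above). This is not the actual NeMo or multi-lookahead code, just an illustration:

```python
import torch

def chunked_limited_mask(seq_len: int, chunk_size: int, left_chunks: int,
                         device=None) -> torch.Tensor:
    """Boolean mask of shape (seq_len, seq_len); True = key position may be attended to.

    Built with arange/comparison ops only, so it can be recomputed in forward()
    for the actual sequence length of each batch.
    """
    idx = torch.arange(seq_len, device=device)
    chunk_idx = idx // chunk_size              # chunk id of every frame
    q_chunk = chunk_idx.unsqueeze(1)           # (seq_len, 1) query chunk ids
    k_chunk = chunk_idx.unsqueeze(0)           # (1, seq_len) key chunk ids
    # A query may look at keys in its own chunk and up to `left_chunks` chunks back.
    return (k_chunk <= q_chunk) & (k_chunk >= q_chunk - left_chunks)

# Example: 3-frame chunks, 2 left chunks
print(chunked_limited_mask(seq_len=9, chunk_size=3, left_chunks=2).int())
```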
TL;DR: The current implementation of the cache-aware Conformer doesn't look like it can attend beyond the fixed-length context it was trained with, while the implementations in k2 icefall and WeNet have the correct behavior.
Please let me know if this reasoning sounds good, or if I'm missing something.
In the cache-aware Conformer implementation, `left_chunks_num` is initialized here (see the linked code). Isn't one of the use cases of the cache to enable attending to history beyond what the model was trained on?
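The pattern being described, in a simplified sketch (names and logic here are illustrative, not the actual NeMo code): the mask is baked in once from `left_chunks_num` and `max_audio_length`, so changing `left_chunks` later has no effect unless the mask is rebuilt.

```python
import torch

class EncoderSketch:
    """Illustrative only: mimics a chunked-limited mask pre-computed at setup time."""

    def __init__(self, chunk_size: int, left_chunks_num: int, max_audio_length: int):
        self.chunk_size = chunk_size
        self.left_chunks_num = left_chunks_num           # fixed at init/training time
        self.set_max_audio_length(max_audio_length)      # bakes the mask in

    def set_max_audio_length(self, max_audio_length: int):
        chunk_idx = torch.arange(max_audio_length) // self.chunk_size
        diff = chunk_idx.unsqueeze(1) - chunk_idx.unsqueeze(0)   # query chunk - key chunk
        # Frozen until set_max_audio_length() is called again.
        self.chunked_limited_mask = (diff >= 0) & (diff <= self.left_chunks_num)

enc = EncoderSketch(chunk_size=40, left_chunks_num=4, max_audio_length=1000)
enc.left_chunks_num = 10      # changing the attribute alone does nothing...
enc.set_max_audio_length(1000)  # ...until the mask is rebuilt
```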
Looks like the `chunked_limited_mask` will not change once it's initialized, unless `left_chunks_num` also changes and a call to `set_max_audio_length` is made with a `max_audio_length` greater than what the model was trained on. Here's an example:
Consider a cache-aware Conformer trained with the following config, which uses 4 left chunks.
Now, at inference, if I want the model to use a larger context and set `left_chunks = 10` by calling `setup_streaming_params()`, it still attends to the context (cache + current chunk) set for 4 left chunks (in this case 170 frames), as shown in the resulting `att_mask`.
However, if I explicitly set `left_chunks_num = 10` and subsequently call `set_max_audio_length()` to update the `chunked_limited_mask`, I get the correct attention mask, where the latest chunk can attend to itself and everything in the history up to the specified number of left chunks, as the resulting `att_mask` shows.
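For reference, the two call paths compared above look roughly like this (the method and attribute names are the ones mentioned in this post; the exact arguments and the checkpoint name are assumptions, not verified against the current NeMo API):

```python
import nemo.collections.asr as nemo_asr

# Hypothetical cache-aware checkpoint; substitute your own model.
asr_model = nemo_asr.models.ASRModel.from_pretrained("<your_cache_aware_conformer>")

# Path 1: only request a larger left context via setup_streaming_params().
# The pre-built chunked_limited_mask (4 left chunks) is still used.
asr_model.encoder.setup_streaming_params(left_chunks=10)

# Path 2: manual workaround -- set the attribute and force the mask to be rebuilt.
asr_model.encoder.left_chunks_num = 10
asr_model.encoder.set_max_audio_length(5000)  # assumed max length in encoder frames
```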
This is mostly because the `chunked_limited_mask` is not updated at inference for a new `left_chunks`.

Currently, this is what `chunked_limited_mask` looks like for `left_chunks=10` (when it was set to 4 in training):

This is what `chunked_limited_mask` should ideally look like for `left_chunks=10` (irrespective of what it was set to in training):

The implementations in WeNet and k2 icefall (borrowed from WeNet) seem to have the correct behavior here.
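For comparison, WeNet's chunk mask is generated per call for whatever `num_left_chunks` is requested, so a larger left context at inference just works. A paraphrased sketch of that approach (not the verbatim WeNet code):

```python
import torch

def subsequent_chunk_mask(size: int, chunk_size: int, num_left_chunks: int = -1) -> torch.Tensor:
    """WeNet-style chunk mask: True = attendable; num_left_chunks < 0 means unlimited."""
    ret = torch.zeros(size, size, dtype=torch.bool)
    for i in range(size):
        if num_left_chunks < 0:
            start = 0
        else:
            start = max((i // chunk_size - num_left_chunks) * chunk_size, 0)
        ending = min((i // chunk_size + 1) * chunk_size, size)
        ret[i, start:ending] = True
    return ret

# Because the mask is rebuilt every call, left_chunks=10 at inference is honored.
mask = subsequent_chunk_mask(size=12, chunk_size=3, num_left_chunks=10)
print(mask.int())
```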
This is what `chunked_limited_mask` from icefall and WeNet looks like for `left_chunks=10`:

Not sure if I'm missing something here, please let me know. Happy to hear any thoughts or explanations on this.