WER metric converges to 1.0 when applying Conformer-Transducer Model #4324
nghiahuynh-ai started this conversation in General
-
Your model is far too large for a toy dataset such as an4. Reduce it to roughly 1M parameters and try again, or use a pretrained checkpoint as initialization.
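
For the second suggestion, here is a minimal sketch (not from this thread; the checkpoint name, manifest paths, and trainer settings are illustrative) of fine-tuning from a pretrained NeMo Conformer-Transducer instead of training from scratch:

```python
import pytorch_lightning as pl
import nemo.collections.asr as nemo_asr

# Start from a pretrained Conformer-Transducer checkpoint instead of a
# random initialization. The model name below is an example public NGC
# checkpoint; substitute any compatible pretrained RNNT model.
asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(
    model_name="stt_en_conformer_transducer_large"
)

# Re-point the data loaders at the an4 manifests before fine-tuning.
asr_model.setup_training_data(train_data_config={
    "manifest_filepath": "datasets/an4/train_manifest.json",
    "sample_rate": 16000,
    "batch_size": 16,
    "shuffle": True,
})
asr_model.setup_validation_data(val_data_config={
    "manifest_filepath": "datasets/an4/test_manifest.json",
    "sample_rate": 16000,
    "batch_size": 16,
    "shuffle": False,
})

# Fine-tune for a modest number of epochs.
trainer = pl.Trainer(devices=1, accelerator="auto", max_epochs=50)
trainer.fit(asr_model)
```

With a pretrained encoder the model only has to adapt to an4's vocabulary and acoustics, so it can converge in far fewer steps than a 100M-parameter model trained from random weights.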
-
I'm having trouble training a Conformer-Transducer model. I've tried changing the network config little by little, but it doesn't help: the WER metric always converges to 1.0. For instance, the prediction log shows:
```
[NeMo I 2022-06-03 12:19:00 rnnt_wer_bpe:231]
[NeMo I 2022-06-03 12:19:00 rnnt_wer_bpe:232] reference :p i t t s b u r g h
[NeMo I 2022-06-03 12:19:00 rnnt_wer_bpe:233] predicted :
```
It predicts nothing!
I use the an4 dataset introduced in the tutorials. Here is my config for the Conformer-Transducer model (sub-word):
```yaml
name: Conformer-Transducer-BPE

model:
  sample_rate: 16000
  compute_eval_loss: false
  log_prediction: true
  skip_nan_grad: false

  model_defaults:
    enc_hidden: ${model.encoder.d_model}
    pred_hidden: 64
    joint_hidden: 64

  train_ds:
    manifest_filepath: datasets/an4/train_manifest.json
    sample_rate: ${model.sample_rate}
    batch_size: 16
    shuffle: true
    num_workers: 8
    pin_memory: true
    use_start_end_token: false
    trim_silence: false
    max_duration: 16.7
    min_duration: 0.1
    is_tarred: false
    tarred_audio_filepaths: null
    shuffle_n: 2048
    bucketing_strategy: synced_randomized
    bucketing_batch_size: null

  validation_ds:
    manifest_filepath: datasets/an4/test_manifest.json
    sample_rate: ${model.sample_rate}
    batch_size: 16
    shuffle: false
    num_workers: 8
    pin_memory: true
    use_start_end_token: false

  test_ds:
    manifest_filepath: datasets/an4/test_manifest.json
    sample_rate: ${model.sample_rate}
    batch_size: 16
    shuffle: false
    num_workers: 8
    pin_memory: true
    use_start_end_token: false

  tokenizer:
    dir: tokenizers/tokenizer_spe_unigram_v32
    type: bpe

  preprocessor:
    _target_: nemo.collections.asr.modules.AudioToMelSpectrogramPreprocessor
    sample_rate: 16000
    normalize: per_feature
    window_size: 0.025
    window_stride: 0.01
    window: hann
    features: 80
    n_fft: 512
    frame_splicing: 1
    dither: 1.0e-05
    pad_to: 0

  spec_augment:
    _target_: nemo.collections.asr.modules.SpectrogramAugmentation
    freq_masks: 0
    time_masks: 0
    freq_width: 27
    time_width: 0.05

  encoder:
    _target_: nemo.collections.asr.modules.ConformerEncoder
    feat_in: ${model.preprocessor.features}
    feat_out: -1
    n_layers: 17
    d_model: 512
    subsampling: striding
    subsampling_factor: 4
    subsampling_conv_channels: -1
    ff_expansion_factor: 4
    self_attention_model: rel_pos
    n_heads: 8
    att_context_size: [-1, -1]
    xscaling: true
    untie_biases: true
    pos_emb_max_len: 5000
    conv_kernel_size: 31
    conv_norm_type: batch_norm
    dropout: 0.1
    dropout_emb: 0.0
    dropout_att: 0.1

  decoder:
    _target_: nemo.collections.asr.modules.RNNTDecoder
    normalization_mode: null
    random_state_sampling: false
    blank_as_pad: true
    prednet:
      pred_hidden: ${model.model_defaults.pred_hidden}
      pred_rnn_layers: 1
      t_max: null
      dropout: 0.1

  joint:
    _target_: nemo.collections.asr.modules.RNNTJoint
    log_softmax: null
    preserve_memory: false
    fuse_loss_wer: true
    fused_batch_size: 16
    jointnet:
      joint_hidden: ${model.model_defaults.joint_hidden}
      activation: relu
      dropout: 0.1

  decoding:
    strategy: greedy_batch
    greedy:
      max_symbols: 30
    beam:
      beam_size: 2
      return_best_hypothesis: false
      score_norm: true
      tsd_max_sym_exp: 50
      alsd_max_target_len: 2.0

  loss:
    loss_name: default
    warprnnt_numba_kwargs:
      fastemit_lambda: 0.0
      clamp: -1.0

  variational_noise:
    start_step: 0
    std: 0.0

  optim:
    name: adamw
    lr: 0.001
    betas: [0.9, 0.98]
    weight_decay: 0
    sched:
      name: NoamAnnealing
      d_model: ${model.encoder.d_model}
      warmup_steps: 10000
      warmup_ratio: null
      min_lr: 1.0e-06

trainer:
  devices: -1
  num_nodes: 1
  max_epochs: 500
  max_steps: null
  val_check_interval: 1.0
  accelerator: auto
  strategy: ddp
  accumulate_grad_batches: 1
  gradient_clip_val: 0.0
  precision: 32
  log_every_n_steps: 10
  progress_bar_refresh_rate: 10
  resume_from_checkpoint: null
  num_sanity_val_steps: 0
  check_val_every_n_epoch: 1
  sync_batchnorm: true
  enable_checkpointing: false
  logger: false

exp_manager:
  exp_dir: null
  name: ${name}
  create_tensorboard_logger: true
  create_checkpoint_callback: true
  checkpoint_callback_params:
    monitor: val_wer
    mode: min
    save_top_k: 5
    always_save_nemo: true
  resume_if_exists: false
  resume_ignore_no_checkpoint: false
  create_wandb_logger: false
  wandb_logger_kwargs:
    name: null
    project: null
```
Please show me how to solve this problem. Thanks.
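
For context on the answer above: with `n_layers: 17` and `d_model: 512`, the encoder in this config is on the order of 100M trainable weights, while an4 is well under an hour of speech. A minimal sketch (assuming the YAML above is saved as `conformer_transducer_bpe.yaml` and the manifests it references exist) for checking the parameter count before training:

```python
import torch
from omegaconf import OmegaConf
import nemo.collections.asr as nemo_asr

# Hypothetical path; save the YAML above to this file first. Building the
# model also builds the data loaders, so the an4 manifests must exist.
cfg = OmegaConf.load("conformer_transducer_bpe.yaml")
asr_model = nemo_asr.models.EncDecRNNTBPEModel(cfg=cfg.model)

def millions(module: torch.nn.Module) -> float:
    """Trainable parameter count, in millions."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad) / 1e6

print(f"encoder: {millions(asr_model.encoder):6.1f}M params")
print(f"total:   {millions(asr_model):6.1f}M params")
```

If the total is far beyond the ~1M suggested above, shrink the encoder (e.g. fewer layers and a smaller `d_model`) before retraining, or fine-tune from a pretrained checkpoint as shown earlier.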