Losses are Nan and Infinite #1869
Which directory did you get the code from? The later version in zipformer/ is more stable; there are earlier versions that eventually become unstable like that.
How can I find which version I'm using?
Well, if it's a git repo, `git log -1` might tell you; if you are using a pip package, then `pip show icefall` will.
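The same check can be done programmatically. This is a generic sketch (not part of icefall itself, and `installed_version` is a hypothetical helper name): it asks pip's metadata for the installed version and falls back to `None` when the package isn't pip-installed, which is the situation where `git log -1` in the checkout is the right tool instead.

```python
from importlib import metadata
from typing import Optional

def installed_version(package: str) -> Optional[str]:
    """Return the pip-installed version of `package`, or None if it
    is not installed as a package (e.g. running from a git checkout)."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Example: installed_version("icefall") returns a version string when
# icefall was installed via pip, otherwise None.
```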
I'm fine-tuning the model in zipformer/. When I fine-tuned with 100 hours of data there was no issue, but when I fine-tune with 3000 hours of data I get infinite or NaN losses. What could be the cause of this issue?
[1,mpirank:5,algo-1]:2025-01-19 09:00:32,064 INFO [finetune.py:1142] (5/8) Epoch 7, batch 1650, loss[loss=nan, simple_loss=inf, pruned_loss=inf, ctc_loss=nan, over 3310.00 frames. ], tot_loss[over 792510.56 frames. ], batch size: 14, lr: 4.28e-03,
[1,mpirank:0,algo-1]:2025-01-19 09:00:32,065 INFO [finetune.py:1142] (0/8) Epoch 7, batch 1650, loss[loss=nan, simple_loss=inf, pruned_loss=inf, ctc_loss=nan, over 4819.00 frames. ], tot_loss[over 814690.14 frames. ], batch size: 58, lr: 4.28e-03,
[1,mpirank:6,algo-1]:2025-01-19 09:00:32,068 INFO [finetune.py:1142] (6/8) Epoch 7, batch 1650, loss[loss=nan, simple_loss=inf, pruned_loss=inf, ctc_loss=nan, over 3370.00 frames. ], tot_loss[over 799670.96 frames. ], batch size: 13, lr: 4.28e-03,
[1,mpirank:3,algo-1]:2025-01-19 09:00:32,070 INFO [finetune.py:1142] (3/8) Epoch 7, batch 1650, loss[loss=nan, simple_loss=inf, pruned_loss=inf, ctc_loss=nan, over 4945.00 frames. ], tot_loss[over 807011.63 frames. ], batch size: 33, lr: 4.28e-03,
[1,mpirank:2,algo-1]:2025-01-19 09:00:32,071 INFO [finetune.py:1142] (2/8) Epoch 7, batch 1650, loss[loss=nan, simple_loss=inf, pruned_loss=inf, ctc_loss=nan, over 4949.00 frames. ], tot_loss[over 812248.61 frames. ], batch size: 66, lr: 4.28e-03,
[1,mpirank:1,algo-1]:2025-01-19 09:00:32,073 INFO [finetune.py:1142] (1/8) Epoch 7, batch 1650, loss[loss=nan, simple_loss=inf, pruned_loss=inf, ctc_loss=nan, over 4903.00 frames. ], tot_loss[over 823203.24 frames. ], batch size: 49, lr: 4.28e-03,
[1,mpirank:4,algo-1]:2025-01-19 09:00:32,075 INFO [finetune.py:1142] (4/8) Epoch 7, batch 1650, loss[loss=nan, simple_loss=inf, pruned_loss=inf, ctc_loss=nan, over 4743.00 frames. ], tot_loss[over 806376.22 frames. ], batch size: 27, lr: 4.28e-03,
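A common first-aid measure when a run diverges like the log above (a framework-agnostic sketch, not icefall's actual training loop) is to skip the optimizer step for any batch whose loss is non-finite, and to track only finite losses in the running average so the logs stay interpretable:

```python
import math
from typing import Iterable, List

def should_skip_batch(loss: float) -> bool:
    """True when the loss is NaN or infinite, so the backward pass
    and optimizer step for this batch can be skipped entirely."""
    return not math.isfinite(loss)

def finite_losses(losses: Iterable[float]) -> List[float]:
    """Keep only finite losses, for a running average that stays
    meaningful while individual batches diverge."""
    return [l for l in losses if math.isfinite(l)]
```

In a PyTorch loop the equivalent check is typically `torch.isfinite(loss).all()` before calling `loss.backward()`; lowering the learning rate (here 4.28e-03) is the other usual lever when divergence persists.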