Model parallelism on ASR tasks with DP accelerator. #2385
-
Hello! How can I use model parallelism with NeMo, or is it not implemented yet?
-
We do not support any mode other than DDP, because it is the most efficient way of multi-GPU training. Most of our models are also not picklable, so other methods of distributed training would not work anyway.
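Since NeMo models are PyTorch Lightning modules, DDP is selected through the `Trainer` rather than through model code. A minimal sketch, assuming a recent Lightning API (argument names such as `devices`/`strategy` have changed across versions) and using a pretrained QuartzNet checkpoint as a stand-in for your own model:

```python
# Minimal sketch: multi-GPU DDP training of a NeMo ASR model via
# PyTorch Lightning, which NeMo builds on. Trainer argument names
# vary by Lightning version; `accelerator`/`devices`/`strategy` are
# the newer-style API.
import pytorch_lightning as pl
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.EncDecCTCModel.from_pretrained("QuartzNet15x5Base-En")

trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,          # number of GPUs on this node
    strategy="ddp",     # one process per GPU
    max_epochs=50,
)
model.set_trainer(trainer)
# trainer.fit(model)   # requires train/val dataloaders to be configured on the model
```

Note that `ddp` launches a separate process per GPU, each of which constructs the model itself, so the model never needs to be pickled; that is why non-picklable models work under DDP but break under modes like DP or `ddp_spawn`.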
-
Our example scripts show how to train with multiple GPUs: https://github.com/NVIDIA/NeMo/tree/main/examples/asr
-
To anyone who might be interested: I was able to train a MatchBoxNet model from the notebook in 'DP' mode by tweaking some code in the torchaudio and NeMo libraries - just following the errors and applying `x.to(y.device)` or `x = x.type_as(y)` where needed. It's a hacky solution, but it works for my needs. If anyone has a better one, I'd be glad to see it.
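For reference, this is the general shape of those patches. Under `DataParallel`, tensors created inside a module (buffers, masks, `arange` outputs) default to CPU or to GPU 0 and then collide with inputs living on another replica's GPU. A minimal sketch of fixing one such error; `make_padding_mask`, `features`, and `lengths` are hypothetical illustrative names, not actual NeMo or torchaudio code:

```python
import torch

def make_padding_mask(features: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
    """Build a (batch, time) mask of valid frames for padded features."""
    # torch.arange defaults to CPU; under DataParallel it must be created
    # on the same device as the tensor it will be compared against.
    steps = torch.arange(features.size(-1), device=features.device)
    lengths = lengths.to(features.device)          # the x.to(y.device) pattern
    mask = steps.unsqueeze(0) < lengths.unsqueeze(1)
    return mask.type_as(features)                  # the x = x.type_as(y) pattern
```

The downside of this approach is that it has to be repeated at every failure site, which is why it stays a workaround rather than real DP support.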