Model parallelism on ASR tasks with DP accelerator. #2385
-
Hello! How can I use model parallelism with NeMo, or is it not implemented yet?
-
We do not support any mode other than DDP, because it is the most efficient way of multi-GPU training. Most of our models are also not picklable, so other methods of distributed training would not work anyway.
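Since NeMo models are PyTorch Lightning modules, DDP is selected through the `Trainer` rather than through model code. A minimal sketch, assuming a recent Lightning API (argument names such as `devices`/`strategy` have changed across versions) and using a pretrained QuartzNet checkpoint as a stand-in for your own model:

```python
# Minimal sketch: multi-GPU DDP training of a NeMo ASR model via
# PyTorch Lightning, which NeMo builds on. Trainer argument names
# vary by Lightning version; `accelerator`/`devices`/`strategy` are
# the newer-style API.
import pytorch_lightning as pl
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.EncDecCTCModel.from_pretrained("QuartzNet15x5Base-En")

trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,          # number of GPUs on this node
    strategy="ddp",     # one process per GPU
    max_epochs=50,
)
model.set_trainer(trainer)
# trainer.fit(model)   # requires train/val dataloaders to be configured on the model
```

Note that `ddp` launches a separate process per GPU, each of which constructs the model itself, so the model never needs to be pickled; that is why non-picklable models work under DDP but break under modes like DP or `ddp_spawn`.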
-
Our example scripts show how to train with multiple GPUs: https://github.com/NVIDIA/NeMo/tree/main/examples/asr
-
To anyone who might be interested: I was able to train a MatchBoxNet model from the notebook in 'DP' mode by tweaking some code in the torchaudio and NeMo libraries - just following the errors and applying `x.to(y.device)` or `x = x.type_as(y)` where needed. It's a hacky solution, but it works for my needs. If anyone has a better one, I'd be glad to see it.
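For reference, this is the general shape of those patches. Under `DataParallel`, tensors created inside a module (buffers, masks, `arange` outputs) default to CPU or to GPU 0 and then collide with inputs living on another replica's GPU. A minimal sketch of fixing one such error; `make_padding_mask`, `features`, and `lengths` are hypothetical illustrative names, not actual NeMo or torchaudio code:

```python
import torch

def make_padding_mask(features: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
    """Build a (batch, time) mask of valid frames for padded features."""
    # torch.arange defaults to CPU; under DataParallel it must be created
    # on the same device as the tensor it will be compared against.
    steps = torch.arange(features.size(-1), device=features.device)
    lengths = lengths.to(features.device)          # the x.to(y.device) pattern
    mask = steps.unsqueeze(0) < lengths.unsqueeze(1)
    return mask.type_as(features)                  # the x = x.type_as(y) pattern
```

The downside of this approach is that it has to be repeated at every failure site, which is why it stays a workaround rather than real DP support.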