Thanks for pointing this out. Currently we don't support dist ops with uneven sizes: XLA needs to know the shape of the gathered output tensor beforehand. One workaround is to pad the tensors to even sizes before the all_gather.
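A rough sketch of that padding workaround, assuming `xm.all_gather`; the helper name `all_gather_uneven` and the size-exchange step are illustrative, not an existing torch_xla API:

```python
# Illustrative helper (not an existing torch_xla API): pad each rank's tensor
# to the largest dim-0 size, all_gather evenly, then trim the padding off.
import torch
import torch_xla.core.xla_model as xm


def all_gather_uneven(t, dim=0):
    # Exchange the per-rank sizes first so every rank knows how much to trim.
    local_size = torch.tensor([t.size(dim)], device=t.device)
    sizes = xm.all_gather(local_size, dim=0)
    max_size = int(sizes.max())  # forces a device sync to read the sizes

    # Zero-pad the local tensor up to max_size along `dim`.
    pad_len = max_size - t.size(dim)
    if pad_len > 0:
        pad_shape = list(t.shape)
        pad_shape[dim] = pad_len
        t = torch.cat([t, t.new_zeros(pad_shape)], dim=dim)

    # Even-sized all_gather, then split per rank and drop the padding.
    gathered = xm.all_gather(t, dim=dim)
    chunks = gathered.split(max_size, dim=dim)
    return torch.cat(
        [c.narrow(dim, 0, int(s)) for c, s in zip(chunks, sizes)], dim=dim)
```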
By the way, we now support torch.distributed ops directly, for example:
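A minimal sketch, assuming the `xla` backend registered by importing `torch_xla.distributed.xla_backend` and the `xla://` init method:

```python
# Sketch: even-sized all_gather through torch.distributed with the xla backend.
import torch
import torch.distributed as dist
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_backend  # registers the 'xla' backend
import torch_xla.distributed.xla_multiprocessing as xmp


def _mp_fn(index):
    dist.init_process_group("xla", init_method="xla://")
    device = xm.xla_device()
    t = torch.ones(2, 4, device=device) * dist.get_rank()
    # One output slot per rank; shapes must match across ranks.
    output = [torch.zeros_like(t) for _ in range(dist.get_world_size())]
    dist.all_gather(output, t)
    xm.mark_step()
    print(dist.get_rank(), [tuple(o.shape) for o in output])


if __name__ == "__main__":
    xmp.spawn(_mp_fn)
```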
🐛 Bug
Torch XLA's xla_model.all_gather works with tensors of the same size along dim=0, but if the tensor sizes differ along dim=0, it hangs.

To Reproduce
Save this code in test_all_gather.py and run it:
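A minimal sketch of such a repro, assuming `xm.all_gather` and `xmp.spawn`, with each rank's dim=0 size depending on its ordinal:

```python
# test_all_gather.py -- sketch of a repro with uneven dim=0 sizes per rank.
import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp


def _mp_fn(index):
    device = xm.xla_device()
    rank = xm.get_ordinal()
    # Uneven sizes: rank 0 contributes 1 row, rank 1 contributes 2 rows, etc.
    t = torch.ones(rank + 1, 4, device=device) * rank
    gathered = xm.all_gather(t, dim=0)  # hangs when sizes differ across ranks
    xm.mark_step()
    print(f"rank {rank}: gathered shape {tuple(gathered.shape)}")


if __name__ == "__main__":
    xmp.spawn(_mp_fn)
```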
Expected behavior
It should gather all the tensors from all the devices along dim=0.
Environment
Docker image: us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.5.0_3.10_cuda_12.4
Additional context
According to the documentation for all_gather at https://pytorch.org/docs/stable/distributed.html, uneven tensor sizes are supported.