xla_backend make_send_channel_id NotImplementedError #8594

radna0 · 2025-01-21T14:32:17Z

I'm working on integrating the XLA backend with DeepSpeed and encountered this error:

File "/home/kojoe/.local/lib/python3.10/site-packages/deepspeed/__init__.py", line 209, in initialize
    engine = PipelineEngine(args=args,
  File "/home/kojoe/.local/lib/python3.10/site-packages/deepspeed/runtime/pipe/engine.py", line 239, in __init__
    p2p.send(self.loss, self.next_stage)
  File "/home/kojoe/.local/lib/python3.10/site-packages/deepspeed/runtime/pipe/p2p.py", line 60, in send
    return dist.send(tensor, dest_rank)
  File "/home/kojoe/.local/lib/python3.10/site-packages/deepspeed/comm/comm.py", line 117, in log_wrapper
    return func(*args, **kwargs)
  File "/home/kojoe/.local/lib/python3.10/site-packages/deepspeed/comm/comm.py", line 358, in send
    return cdb.send(tensor=tensor, dst=dst, group=group, tag=tag)
  File "/home/kojoe/.local/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
    return fn(*args, **kwargs)
  File "/home/kojoe/.local/lib/python3.10/site-packages/deepspeed/comm/torch.py", line 296, in send
    return torch.distributed.send(tensor=tensor, dst=dst, group=group, tag=tag)
  File "/home/kojoe/.local/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 83, in wrapper
    return func(*args, **kwargs)
  File "/home/kojoe/.local/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 2148, in send
    default_pg.send([tensor], dst, tag).wait()
  File "/usr/local/lib/python3.10/dist-packages/torch_xla/distributed/xla_backend.py", line 249, in send
    channel_id = self.make_send_channel_id(dst_rank, tag)
  File "/usr/local/lib/python3.10/dist-packages/torch_xla/distributed/xla_backend.py", line 242, in make_send_channel_id
    raise NotImplementedError
NotImplementedError
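
For readers following the traceback: the last two frames show `send` calling `make_send_channel_id`, which raises `NotImplementedError` before any data is transferred. The sketch below reproduces that pattern in plain Python so the failure mode is easy to see. It is not torch_xla's actual code. `StubProcessGroup`, `PatchedProcessGroup`, `try_send`, and the channel id formula are all hypothetical names and assumptions made for illustration.

```python
class StubProcessGroup:
    """Stand-in (hypothetical) for a backend whose send() needs an unimplemented hook."""

    def make_send_channel_id(self, dst_rank, tag):
        # Mirrors the bottom frame of the traceback: the hook is only a stub.
        raise NotImplementedError

    def send(self, tensors, dst_rank, tag=0):
        # The hook runs first, so send() fails before anything is transferred.
        channel_id = self.make_send_channel_id(dst_rank, tag)
        return (channel_id, tensors)


class PatchedProcessGroup(StubProcessGroup):
    """Hypothetical subclass that supplies a channel id scheme (an assumption)."""

    def __init__(self, rank, world_size):
        self.rank = rank
        self.world_size = world_size

    def make_send_channel_id(self, dst_rank, tag):
        # Assumed scheme: a distinct id per (tag, source rank, destination rank).
        return (tag * self.world_size + self.rank) * self.world_size + dst_rank


def try_send(pg, dst_rank):
    """Return the result of send(), or None if the backend lacks the hook."""
    try:
        return pg.send(["payload"], dst_rank)
    except NotImplementedError:
        return None
```

With the stub, `try_send(StubProcessGroup(), 1)` returns `None`, while the patched group returns a channel id together with the payload. Whether a scheme like this is appropriate for a real XLA device topology is an open question that the sketch does not answer.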


bhavya01 · 2025-01-23T19:19:40Z

There are a few missing ops in the XLA backend for torch_xla. I don't think we'll be able to implement this soon, but if you're interested, I'll be happy to review the PRs.

radna0 · 2025-01-24T00:14:03Z

@bhavya01 I'm not really sure that's the right thing to do. I saw this here from @JackCaoG, and it seems like a deliberate decision rather than missing ops. Still, would it be possible to get an example of these ops with different accelerators/backends?
