xla_backend make_send_channel_id NotImplementedError #8594

Open
radna0 opened this issue Jan 21, 2025 · 2 comments

Comments

@radna0

radna0 commented Jan 21, 2025

I'm working on integrating the XLA backend with DeepSpeed and ran into this error:

File "/home/kojoe/.local/lib/python3.10/site-packages/deepspeed/__init__.py", line 209, in initialize
    engine = PipelineEngine(args=args,
  File "/home/kojoe/.local/lib/python3.10/site-packages/deepspeed/runtime/pipe/engine.py", line 239, in __init__
    p2p.send(self.loss, self.next_stage)
  File "/home/kojoe/.local/lib/python3.10/site-packages/deepspeed/runtime/pipe/p2p.py", line 60, in send
    return dist.send(tensor, dest_rank)
  File "/home/kojoe/.local/lib/python3.10/site-packages/deepspeed/comm/comm.py", line 117, in log_wrapper
    return func(*args, **kwargs)
  File "/home/kojoe/.local/lib/python3.10/site-packages/deepspeed/comm/comm.py", line 358, in send
    return cdb.send(tensor=tensor, dst=dst, group=group, tag=tag)
  File "/home/kojoe/.local/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
    return fn(*args, **kwargs)
  File "/home/kojoe/.local/lib/python3.10/site-packages/deepspeed/comm/torch.py", line 296, in send
    return torch.distributed.send(tensor=tensor, dst=dst, group=group, tag=tag)
  File "/home/kojoe/.local/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 83, in wrapper
    return func(*args, **kwargs)
  File "/home/kojoe/.local/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 2148, in send
    default_pg.send([tensor], dst, tag).wait()
  File "/usr/local/lib/python3.10/dist-packages/torch_xla/distributed/xla_backend.py", line 249, in send
    channel_id = self.make_send_channel_id(dst_rank, tag)
  File "/usr/local/lib/python3.10/dist-packages/torch_xla/distributed/xla_backend.py", line 242, in make_send_channel_id
    raise NotImplementedError
NotImplementedError
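
Reading the traceback, `send` in `xla_backend.py` calls `self.make_send_channel_id(dst_rank, tag)`, and that hook raises `NotImplementedError`. The sketch below reproduces this pattern in plain Python to show where the error surfaces and how a caller could detect it. Every class and function name here (`BaseProcessGroup`, `ChannelProcessGroup`, `try_send`) is hypothetical, and the channel-id formula is an illustrative assumption, not torch_xla's actual implementation.

```python
class BaseProcessGroup:
    """Hypothetical stand-in for a backend process group (not torch_xla's class)."""

    def make_send_channel_id(self, dst_rank, tag):
        # Hook left unimplemented, mirroring what the traceback shows.
        raise NotImplementedError

    def send(self, tensors, dst_rank, tag=0):
        # send() needs a channel id first, so the error surfaces from here.
        channel_id = self.make_send_channel_id(dst_rank, tag)
        return ("sent", channel_id, len(tensors))


class ChannelProcessGroup(BaseProcessGroup):
    """Hypothetical subclass that supplies the missing hook."""

    def __init__(self, world_size):
        self.world_size = world_size

    def make_send_channel_id(self, dst_rank, tag):
        # Assumed scheme for illustration: one id per (tag, destination) pair.
        return tag * self.world_size + dst_rank


def try_send(pg, tensors, dst_rank, tag=0):
    """Return the send result, or None if the backend has no point-to-point send."""
    try:
        return pg.send(tensors, dst_rank, tag)
    except NotImplementedError:
        return None
```

With this shape, `try_send(BaseProcessGroup(), [1.0], 1)` returns `None`, while the subclass completes the call. Whether a real backend should override the hook or leave it unimplemented on purpose is a design decision for the maintainers.
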
@bhavya01
Collaborator

There are a few missing ops in the XLA backend for torch_xla. I don't think we'll be able to implement this soon, but if you're interested, I'll be happy to review PRs.

@radna0
Author

radna0 commented Jan 24, 2025

@bhavya01 I'm not sure that's the right thing to do. I saw this here from @JackCaoG, and it looks like a deliberate decision rather than missing ops. Still, it would be good to have an example of these ops with different accelerators/backends.
