-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Use max_inflight_executions for buffered channels on same node #49755
base: master
Are you sure you want to change the base?
Conversation
4369481
to
f78c82c
Compare
Gpu multinode Perf Resultsexec_nccl_gpu = [3875.943098419206, 110.49947845465303] |
Second run exec_nccl_gpu = [3859.74345306353, 138.13849291778993] |
bdb9b53
to
61d0124
Compare
Improvement on single node exec_torch_cpu_cpu = [12599.322323620976, 71.66839011380934] |
Signed-off-by: dayshah <[email protected]>
Signed-off-by: dayshah <[email protected]>
61d0124
to
32ff703
Compare
Signed-off-by: dayshah <[email protected]>
Signed-off-by: dayshah <[email protected]>
Multinode performance same as before with change to only create buffered when on same node exec_nccl_gpu = [3790.8402979063458, 56.6865765961406] |
Signed-off-by: dayshah <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall LGTM
if ( | ||
self._writer is None | ||
or self._writer == ray.get_runtime_context().current_actor | ||
): | ||
writer_node = ray.get_runtime_context().get_node_id() | ||
else: | ||
writer_node = ray.get( | ||
self._writer.__ray_call__.remote( | ||
lambda self: ray.get_runtime_context().get_node_id() | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extract this as a utils method? Maybe the larger scope can be extracted, i.e., have something like write_to_remote_node()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, made get_writer_node
Signed-off-by: dayshah <[email protected]>
Signed-off-by: dayshah <[email protected]>
if i % 2 == 0: | ||
results.append(dag.execute(b"x")) | ||
else: | ||
results.append(dag.execute(b"yyyyyyy")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this code? Add a comment
@@ -122,6 +122,7 @@ def exec_ray_dag( | |||
use_cgraph=True, | |||
static_shape=False, | |||
direct_return=False, | |||
num_times_execute=1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just num_executions
?
return readers_on_same_node, readers_on_different_node | ||
|
||
|
||
def get_writer_node(writer: "ray.actor.ActorHandle"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just call it get_node
?
return type annotation
@@ -41,3 +41,51 @@ def split_readers_by_locality( | |||
local_readers.append((reader, node)) | |||
|
|||
return remote_readers, local_readers | |||
|
|||
|
|||
def split_readers_by_node_locality( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a utils method, let's try to make it general. "readers" are not relevant here, the function just splits actors based on locality.
Why are these changes needed?
We reverted the change to have max_inflight_executions buffered channels due to performance regressions on multi-node benchmarks.
This change uses the buffered channels when they're between actors on the same node, and just uses a single channel when it's between actors on different nodes.
Related issue number
Closes #49044
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.