Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core][compiled-graphs] Register custom serializers for InputAttributeNodes and pass their type hints to downstream nodes #49236

Merged
merged 14 commits into from
Dec 15, 2024

Conversation

kevin85421
Copy link
Member

@kevin85421 kevin85421 commented Dec 12, 2024

Why are these changes needed?

Issue #47251 states that the tensor will not be moved to the GPU even if users specify TorchTensorType on the input attribute node. The behavior is inconsistent with specifying TorchTensorType on the InputNode.

There are two reasons contributing to this inconsistent behavior:

  • Reason 1: The InputAttributeNode's type hint is not passed to the downstream DAG nodes.
  • Reason 2: The driver process does not register the custom serializer for InputAttributeNode.

There are four changes in this PR:

  • Resolve "Reason 1"
  • Resolve "Reason 2"
  • Refactor the code to register custom serializers for InputNode, InputAttributeNode, and output nodes in a single location.
  • Add tests

Related issue number

Closes #47251

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Kai-Hsun Chen <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
@kevin85421 kevin85421 changed the title [core][compiled-graphs] Feeding multiple inputs to compiled graph and annotating with TorchTensorType does not move tensors to GPU [core][compiled-graphs] Register custom serializers for InputAttributeNodes and pass the type hints of InputAttributeNodes to downstream nodes Dec 12, 2024
@kevin85421 kevin85421 changed the title [core][compiled-graphs] Register custom serializers for InputAttributeNodes and pass the type hints of InputAttributeNodes to downstream nodes [core][compiled-graphs] Register custom serializers for InputAttributeNodes and pass their type hints to downstream nodes Dec 12, 2024
Signed-off-by: Kai-Hsun Chen <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
@@ -1088,6 +1092,9 @@ def _preprocess(self) -> None:
):
downstream_actor_handle = dag_node._get_actor_handle()

# Add the type hint of the upstream node to the task.
task.arg_type_hints.append(upstream_task.dag_node.type_hint)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve "Reason 1". In the logic below, if the upstream_task.dag_node is an InputAttributeNode, the upstream_task will be set to self.idx_to_task[self.input_task_idx]. This is why I moved this line above.

input_task.dag_node.type_hint.register_custom_serializer()

# Register custom serializers for input attribute nodes.
for input_attr_task_idx in self.input_attr_task_idx_list:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve "Reason 2"

@kevin85421 kevin85421 marked this pull request as ready for review December 13, 2024 02:42
Signed-off-by: Kai-Hsun Chen <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
Copy link
Contributor

@ruisearch42 ruisearch42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Minor comments.

python/ray/dag/compiled_dag_node.py Outdated Show resolved Hide resolved
python/ray/dag/tests/experimental/test_torch_tensor_dag.py Outdated Show resolved Hide resolved
python/ray/dag/tests/experimental/test_torch_tensor_dag.py Outdated Show resolved Hide resolved
Signed-off-by: Kai-Hsun Chen <[email protected]>
@kevin85421 kevin85421 added the go add ONLY when ready to merge, run all tests label Dec 13, 2024
@jjyao jjyao merged commit 1e0916e into ray-project:master Dec 15, 2024
5 checks passed
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Dec 17, 2024
…eNodes and pass their type hints to downstream nodes (ray-project#49236)

Signed-off-by: Kai-Hsun Chen <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go add ONLY when ready to merge, run all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[aDAG][Core] Feeding multiple inputs to aDAG and annotating with TorchTensor does not move tensors to GPU
3 participants