How can I implement pytorchjob's workers to use different images or configurations? #2365
Comments
Thank you for creating this issue! /remove-kind bug
OK, I got it, thank you for your reply. By the way, if I still want to achieve this behavior, is there any way to do it? I am open to any other job type (TFJob or anything else) or any other proposal. Thanks again and have a nice day.
Thank you for creating this @certainly-cyber!
/kind feature
What happened?
Hello everyone! As the title says, I would like the workers of a PyTorchJob to use different images or configurations.
Normally, my YAML looks similar to this:
apiVersion: "kubeflow.org/v1"
kind: "PyTorchJob"
metadata:
  name: "resnet-1"
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      ...
    Worker:
      replicas: 3
      ...
However, if I want to use different configurations (such as images) for different workers, how should I do it? I tried to configure multiple workers, like this:
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      ...
    Worker:
      replicas: 1
      ...
    Worker:
      replicas: 1
      ...
But unfortunately, this doesn't work.
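A likely reason, as far as I understand the v1 API: pytorchReplicaSpecs is a map keyed by replica type, so two Worker entries are the same key written twice and cannot describe two separate groups. Depending on the parser, a duplicate key is either rejected or silently collapsed into one entry. The sketch below only illustrates the collapse, using Python's standard json module on a simplified JSON rendering of the manifest. The "note" field and the find_duplicate_keys helper are made up for illustration and are not part of the PyTorchJob schema or the training operator.

```python
import json

# Simplified, hypothetical JSON rendering of the manifest above.
# "note" is a placeholder field used only to tell the two entries apart.
manifest = """
{
  "spec": {
    "pytorchReplicaSpecs": {
      "Master": {"replicas": 1},
      "Worker": {"replicas": 1, "note": "first"},
      "Worker": {"replicas": 1, "note": "second"}
    }
  }
}
"""


def find_duplicate_keys(text):
    """Return every key that appears more than once within a single JSON object."""
    duplicates = []

    def hook(pairs):
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(text, object_pairs_hook=hook)
    return duplicates


# Python's json module keeps only the last value for a repeated key.
replica_specs = json.loads(manifest)["spec"]["pytorchReplicaSpecs"]
print(sorted(replica_specs))            # ['Master', 'Worker']
print(replica_specs["Worker"]["note"])  # second
print(find_duplicate_keys(manifest))    # ['Worker']
```

Only one Worker entry survives parsing here, which is consistent with the second block not producing a second, differently configured worker group.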
What did you expect to happen?
How can I implement pytorchjob's workers to use different images or configurations?
Environment
Kubernetes version:
Client Version: v1.30.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.2
Training Operator version:
kubeflow/training-operator:v1-9e52eb7#
Impacted by this bug?
Give it a 👍 We prioritize the issues with most 👍