Change instance count for ACA workloadProfiles to be min 0, max 1 #143

Open · wants to merge 1 commit into base: ai-toolkit
@XiaofuHuang XiaofuHuang Jun 3, 2024


If the maximum instance count is set to one, the inference API will occupy that single instance exclusively, leaving none available for the fine-tuning job. As a result, any fine-tuning job submitted after the inference endpoint has been provisioned will wait indefinitely and never start. Our tests show that the ACA instance can automatically scale up to 2 and down to 0.
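Under that observation, the parameter files could keep the maximum at 2 while allowing scale-to-zero. A hypothetical sketch of what such a parameters fragment might look like (`maximumInstanceCount` appears in the diffs below; `minimumInstanceCount` is assumed here from the PR title and may not match the template's actual parameter name):

```json
{
  "minimumInstanceCount": {
    "value": 0
  },
  "maximumInstanceCount": {
    "value": 2
  }
}
```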

@XiaofuHuang XiaofuHuang Jun 3, 2024


The ACA instance for the inference API is currently set to scale down only to one, not zero. https://github.com/microsoft/windows-ai-studio-templates/blob/00b35ff85cbf1cfbaece85f55a294db81f6af1f1/configs/Phi-3-mini-4k-instruct/infra/provision/inference.bicep#L208C7-L209C11
This is because the container must run `pip install -r requirements.txt` on startup, which takes more than four minutes before the endpoint becomes accessible. However, this setting can change: once the image is updated to bundle all dependencies, the ACA for inference can be adjusted to scale down to zero automatically.
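A sketch of the kind of change described, in the scale block of the linked inference.bicep (property names follow the ACA Bicep schema; the surrounding resource and exact current values are elided, so treat this as illustrative only):

```bicep
// Sketch only: today the template keeps minReplicas at 1 so the endpoint
// stays warm through the ~4-minute "pip install" on cold start.
// Once dependencies are baked into the image, scale-to-zero becomes viable:
scale: {
  minReplicas: 0 // assumed change from 1; lets the inference ACA scale to zero
  maxReplicas: 2
}
```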


I have measured the cost of the ACA resources over the past three days.

After running fine-tuning once, the ACA instance automatically scaled down to 0, and we kept the ACA resources provisioned. The total cost for this three-day period was just $2.16.
[Screenshot 2024-06-03 103823]

The commit applies the same one-line change across the per-model parameter files: `maximumInstanceCount` drops from 2 to 1 (2 changes: 1 addition & 1 deletion per file). Most hunks in this view lost their file names; the two recurring hunk shapes are:

```diff
@@ -12,7 +12,7 @@
       ]
     },
     "maximumInstanceCount": {
-      "value": 2
+      "value": 1
     },
     "timeout": {
       "value": 10800
```

```diff
@@ -11,7 +11,7 @@
       ]
     },
     "maximumInstanceCount": {
-      "value": 2
+      "value": 1
     },
     "location": {
       "value": null
```

The files named in this view carry the identical change:

- configs/phi-1_5/infra/provision/finetuning.parameters.json
- configs/phi-1_5/infra/provision/inference.parameters.json
- configs/phi-2/infra/provision/finetuning.parameters.json
- configs/phi-2/infra/provision/inference.parameters.json