Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change instance count for ACA workloadProfiles to be min 0, max 1 #143

Open
wants to merge 1 commit into
base: ai-toolkit
Choose a base branch
from

Conversation

kaysabelle
Copy link
Member

Affects both finetuning and inferencing.

Copy link

@XiaofuHuang XiaofuHuang Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you set the maximum instance count to one, the inference API will exclusively use this single instance, leaving none for the fine-tuning job. Therefore, if you need to fine-tune, the job will indefinitely be on hold and never start after the inference endpoint has been provisioned. Our tests show that the ACA instance has the capacity to automatically scale up to 2 and down to 0.

Copy link

@XiaofuHuang XiaofuHuang Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ACA instance for the inference API is currently set to scale down only to one, not zero. https://github.com/microsoft/windows-ai-studio-templates/blob/00b35ff85cbf1cfbaece85f55a294db81f6af1f1/configs/Phi-3-mini-4k-instruct/infra/provision/inference.bicep#L208C7-L209C11
This is because we must execute the command pip install -r requirements.txt, which takes more than four minutes to initialize the container before the endpoint is accessible. However, we can change this setting. Once we have updated the image to include all dependencies, we will be able to adjust the ACA for inference to automatically scale down to zero.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tested on the cost of ACA resources over the past three days.

After run fine-tuning once, the ACA instance auto scale-down to 0, and we keep these ACA resources. The total cost for this three-day period was just $2.16.
屏幕截图 2024-06-03 103823

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants