A House United Within Itself: SLO-Awareness for On-Premises Containerized ML Inference Clusters via Faro (Eurosys 25)

Simulation

We provide a Dockerfile and a prebuilt Docker image: beomyeol/faro-operator-1600:sim.

See src/simulation/README.md for how to run the simulation.

Cluster experiments

Tested on an IBM Cloud VPC Kubernetes cluster with either two cx2-32x64 VM instances or 32 cx2-4x8 VM instances.

Currently, all components, including the Ray clusters, must run in the k8s-ray namespace.

Prerequisites

kubectl (https://kubernetes.io/docs/reference/kubectl/)
kustomize (https://kustomize.io/)
docker (https://www.docker.com/)

Prepare: build and push the controller Docker image, and download `kustomize`.

make build && make push && make kustomize

Setup: create the namespace, install the CRDs, etc.

make install
kubectl create -f example/ray/cluster_crd.yaml

Deploy a Ray controller

kubectl create -f example/ray/operator.yaml

Deploy trace replayer

kubectl create -f example/serve/trace_replayer/replayer.yaml

Build and push the autoscaler Docker image (skip this if the image is already built)

make build
make push

Deploy autoscaler

make deploy

Set a quota for the namespace to limit resources (apply one of the following, depending on the experiment)

kubectl apply -f experiments/top9_twitter_1_1600_avgproc_min_int5m_reduced_6hr_augmented/quota/30_workers.yaml
kubectl apply -f experiments/top9_twitter_1_1600_avgproc_min_int5m_reduced_6hr_augmented/quota/32_workers.yaml
kubectl apply -f experiments/top9_twitter_1_1600_avgproc_min_int5m_reduced_6hr_augmented/quota/36_workers.yaml
kubectl apply -f experiments/top9_twitter_1_1600_avgproc_min_int5m_reduced_6hr_augmented/quota/40_workers.yaml
kubectl apply -f experiments/top9_twitter_1_1600_avgproc_min_int5m_reduced_6hr_augmented/quota/44_workers.yaml
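The contents of the quota files above are not reproduced in this README. If you need a quota for a worker count that is not shipped, the following is a minimal sketch of a Kubernetes ResourceQuota manifest, assuming the shipped files cap total CPU in k8s-ray and that each Ray worker uses `cpus_per_worker` CPUs. The function `worker_quota`, the quota name `ray-worker-quota`, and the one-CPU default are hypothetical; compare the output against the files under quota/ before using it, since those may also account for the Ray head pods and other components.

```python
import json


def worker_quota(num_workers: int, cpus_per_worker: int = 1,
                 namespace: str = "k8s-ray",
                 name: str = "ray-worker-quota") -> dict:
    """Build a ResourceQuota manifest that caps total CPU in `namespace`.

    Hypothetical: the quota name and the one-CPU-per-worker default are
    assumptions, not values taken from the shipped quota/*.yaml files.
    """
    total_cpu = num_workers * cpus_per_worker
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Cap both requested and limited CPU summed over all pods in the namespace.
            "hard": {
                "requests.cpu": str(total_cpu),
                "limits.cpu": str(total_cpu),
            }
        },
    }


if __name__ == "__main__":
    # Emit JSON; kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(worker_quota(36), indent=2))
```

Since kubectl accepts JSON, the output can be piped directly, e.g. `python quota.py | kubectl apply -f -`, where quota.py is wherever you saved the sketch.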

Deploy Ray clusters

kubectl kustomize experiments/top9_twitter_1_1600_avgproc_min_int5m_reduced_6hr_augmented/k8s | kubectl create -f -

Deploy inference jobs to each cluster and copy the input

python scripts/deploy_jobs.py experiments/top9_twitter_1_1600_avgproc_min_int5m_reduced_6hr_augmented/input.json

Copy the autoscaler config for the policy under test (one per run), then open a terminal in the autoscaler deployment

python scripts/deploy_autoscaler_config.py config/autoscaler/aiad.yaml
python scripts/deploy_autoscaler_config.py config/autoscaler/oneshot.yaml
python scripts/deploy_autoscaler_config.py config/autoscaler/mark.yaml
python scripts/deploy_autoscaler_config.py config/autoscaler/32_workers/faro_sum.yaml
python scripts/deploy_autoscaler_config.py config/autoscaler/36_workers/faro_sum.yaml
python scripts/deploy_autoscaler_config.py config/autoscaler/40_workers/faro_sum.yaml
kubectl exec -n k8s-ray deployment/faro-operator -it -- /bin/bash

Launch autoscaler (inside `deployment/faro-operator`)

python src/autoscaler.py config.yaml |& tee run.log

Launch replayer

kubectl exec -n k8s-ray pod/replayer -it -- /bin/bash
sleep 230 && ./loadgen -i input.json -img image.jpg -max_idle_conn 100 -interval_type poisson -seed 42 -unit_time 60 --max_trials 2
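The replayer flags `-interval_type poisson`, `-seed 42`, and `-unit_time 60` suggest that each per-minute request count from input.json is spread over the minute with Poisson arrivals. The sketch below only illustrates that idea and is not loadgen's actual code: it assumes the rate for a window is `num_requests / unit_time` and draws exponential inter-arrival gaps from a seeded generator, so the schedule is reproducible for a fixed seed. `poisson_arrivals` is a hypothetical helper.

```python
import random


def poisson_arrivals(num_requests: int, unit_time: float = 60.0,
                     seed: int = 42) -> list[float]:
    """Sample request send times (seconds) within one unit_time window.

    Assumption: `num_requests` is the trace's request count for this window,
    so the Poisson rate is num_requests / unit_time. Gaps between requests are
    exponentially distributed; arrivals past the window end are dropped.
    """
    if num_requests <= 0:
        return []
    rate = num_requests / unit_time  # expected requests per second
    rng = random.Random(seed)        # seeded, so the schedule is reproducible
    times: list[float] = []
    t = 0.0
    while True:
        t += rng.expovariate(rate)   # exponential inter-arrival gap
        if t >= unit_time:
            return times
        times.append(t)


if __name__ == "__main__":
    schedule = poisson_arrivals(120)
    print(f"{len(schedule)} requests scheduled in a 60 s window")
```

Because arrivals past the window end are dropped, the realized count varies around `num_requests` from window to window. An implementation that must preserve the exact trace count per window would instead sort `num_requests` uniform samples drawn from [0, unit_time).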

Get logs and parse them (the result path below is for the AIAD run; `--with-worker` is optional)

python scripts/get_serve_logs.py results/faro-us-south/mixed/top9_twitter_1_1600_avgproc_min_int5m_reduced_6hr_augmented/32_cpus/aiad --with-autoscaler [--with-worker]
python scripts/parse_serve_logs.py results/faro-us-south/mixed/top9_twitter_1_1600_avgproc_min_int5m_reduced_6hr_augmented/32_cpus/aiad

Delete Ray clusters

kubectl kustomize experiments/top9_twitter_1_1600_avgproc_min_int5m_reduced_6hr_augmented/k8s | kubectl delete -f -

Repeat the steps from setting the quota through parsing the logs for each policy (AIAD, Faro-Sum, etc.).
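The per-run steps that are repeated for each policy can be collected into a dry-run plan. This minimal sketch only returns the host-side commands already listed in this README, in order, for one quota file, one autoscaler config, and one result directory, ending with deleting the Ray clusters so that the next run's `kubectl create` does not hit existing objects. Launching the autoscaler and the replayer still happens interactively inside their pods. `run_commands` is a hypothetical helper, and pairing a quota with a result directory is left to you.

```python
# Experiment directory used throughout this README.
EXP_NAME = "top9_twitter_1_1600_avgproc_min_int5m_reduced_6hr_augmented"
EXP = f"experiments/{EXP_NAME}"


def run_commands(workers: int, autoscaler_config: str, result_dir: str) -> list[str]:
    """Return one run's host-side commands from this README, in order.

    Hypothetical helper: it only builds strings and executes nothing.
    """
    return [
        f"kubectl apply -f {EXP}/quota/{workers}_workers.yaml",
        f"kubectl kustomize {EXP}/k8s | kubectl create -f -",
        f"python scripts/deploy_jobs.py {EXP}/input.json",
        f"python scripts/deploy_autoscaler_config.py {autoscaler_config}",
        # Manual step here: start the autoscaler and the replayer inside their pods.
        f"python scripts/get_serve_logs.py {result_dir} --with-autoscaler",
        f"python scripts/parse_serve_logs.py {result_dir}",
        f"kubectl kustomize {EXP}/k8s | kubectl delete -f -",
    ]


if __name__ == "__main__":
    # Example: the AIAD result path from this README; the 32-worker quota is
    # an illustrative choice, not a documented pairing.
    aiad_dir = f"results/faro-us-south/mixed/{EXP_NAME}/32_cpus/aiad"
    for cmd in run_commands(32, "config/autoscaler/aiad.yaml", aiad_dir):
        print(cmd)
```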

Generate stats from the parsed logs

python -m scripts.simulation.run_suite --max-workers=8 $(pwd)/results/faro-us-south/mixed/top9_twitter_1_1600_avgproc_min_int5m_reduced_6hr_augmented/32_cpus --stats --unit_time=60 --utility=latency

This creates latency_stats.pkl, which is used to generate the plots.
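To check what latency_stats.pkl contains before plotting, it can be loaded with Python's standard `pickle` module. This is a minimal sketch: the structure of the stored object is not documented in this README, so the hypothetical helper `load_stats` simply returns it for inspection. If the pickle holds pandas or NumPy objects, those libraries must be importable when loading.

```python
import pickle
from pathlib import Path
from typing import Any


def load_stats(path: str) -> Any:
    """Unpickle a stats file such as latency_stats.pkl and return the object.

    Only load files you generated yourself: unpickling can execute arbitrary code.
    """
    with Path(path).open("rb") as f:
        return pickle.load(f)
```

For example, `stats = load_stats("path/to/latency_stats.pkl")` followed by `print(type(stats))` shows what kind of object the stats step produced.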

Undeploy all components

make undeploy
kubectl kustomize experiments/top9_twitter_1_1600_avgproc_min_int5m_reduced_6hr_augmented/k8s | kubectl delete -f -
kubectl delete -f example/serve/trace_replayer/replayer.yaml
kubectl delete -f example/ray/operator.yaml

Generate plots

See scripts/plots/README.md

Miscellaneous

Set up IBM Cloud Container Registry

See the IBM Cloud Container Registry documentation for reference.

ibmcloud plugin install container-registry -r 'IBM Cloud'
ibmcloud cr region-set us-south
ibmcloud cr login

Push images to IBM Cloud CR

docker push us.icr.io/faro/faro-operator-1600:234b913

Copy the image pull secret to the k8s-ray namespace and patch the default service account to use it

kubectl get secret all-icr-io -n default -o yaml | sed 's/default/k8s-ray/g' | kubectl create -n k8s-ray -f - 
kubectl patch -n k8s-ray serviceaccount/default -p '{"imagePullSecrets":[{"name": "all-icr-io"}]}'
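The `sed 's/default/k8s-ray/g'` pipeline above rewrites every occurrence of the string "default" in the secret and carries over server-populated metadata. A slightly more targeted alternative, sketched here in Python, changes only metadata.namespace and drops fields such as uid and resourceVersion that the API server assigns. `retarget_secret` is a hypothetical helper that operates on the JSON produced by `kubectl get secret all-icr-io -n default -o json`.

```python
from typing import Any

# Metadata fields assigned by the API server; they should not be copied into a new object.
_SERVER_FIELDS = ("uid", "resourceVersion", "creationTimestamp", "managedFields", "selfLink")


def retarget_secret(secret: dict[str, Any], namespace: str = "k8s-ray") -> dict[str, Any]:
    """Return a copy of a Secret manifest that targets `namespace`.

    Only metadata.namespace changes; data, type, and other fields are kept as-is,
    and the input dict is not modified.
    """
    meta = {k: v for k, v in secret["metadata"].items() if k not in _SERVER_FIELDS}
    meta["namespace"] = namespace
    return {**secret, "metadata": meta}
```

One way to wire it up, assuming the sketch is saved as retarget.py: `kubectl get secret all-icr-io -n default -o json | python -c "import json, sys; from retarget import retarget_secret; json.dump(retarget_secret(json.load(sys.stdin)), sys.stdout)" | kubectl create -n k8s-ray -f -`.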

Cilantro

Use a Cilantro fork that supports ResNet34 and uses the same traces that Faro uses: https://github.com/beomyeol/cilantro and https://github.com/beomyeol/cilantro-workloads

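Build and push the Docker image for each repository (cilantro and cilantro-workloads, respectively) by running the following from that repository's root directory: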

docker build -t beomyeol/cilantro:latest . && docker push beomyeol/cilantro:latest
docker build -t beomyeol/cray-workloads:latest . && docker push beomyeol/cray-workloads:latest

For evaluation, we used beomyeol/cilantro:ef28039 and beomyeol/cray-workloads:2a1d9e1.

Run Cilantro and get logs

./starters/launch_cilantro_driver_kind.sh ~/.kube/config utilwelflearn

Wait six hours for the experiments to finish, then fetch the results with the following command.

./starters/fetch_results.sh

This creates workdirs_eks, which is the input for generating the Cilantro timeline plot. See scripts/plots/README.md.

License

University of Illinois/NCSA Open Source License
