Optimized SD3 pipeline #1682
* HPU graphs enabled
* Batching for inference enabled
* Fused SDPA integrated
* FP8 quantization enabled

Co-authored-by: Daniel Socek <[email protected]>
@libinta @imangohari1 @regisss Requesting your review of this PR.
Additional performance results of this PR:
@deepak-gowda-narayana |
[error][tid:C62] FP32 operations are not supported on this device. Node Name BatchGemm135
@sywangyi Thank you for pointing this out. We removed the |
Do you mean a user could run something like "PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST=ops_bf16.txt python3 xxxx" even if torch.autocast is not used?
@sywangyi Thanks. Yes, that is one way. Alternatively, one can add explicit op lists directly via the config file, like this: https://huggingface.co/Habana/stable-diffusion-2/blob/main/gaudi_config.json, which is then handled in the base GaudiDiffusers class: https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/diffusers/pipelines/pipeline_utils.py#L161
We did some testing with the Flux pipeline, and there we did see different ops being cast internally if we have Maybe to be safe we could add back the context and set enable to the config value, but we should not force
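For readers following the environment-variable route discussed above, here is a minimal sketch of preparing the op-list files from Python before launching the pipeline. The helper name `export_autocast_op_lists` and the op names in the usage note are hypothetical and purely illustrative. The sketch assumes the list files hold one op name per line and that the variables are read when the Habana PyTorch bridge is first loaded, so it would need to run before that import. `PT_HPU_AUTOCAST_FP32_OPS_LIST` is not mentioned in this thread; it is included here as the FP32 counterpart of the variable quoted above.

```python
import os
import tempfile
from pathlib import Path


def export_autocast_op_lists(bf16_ops, fp32_ops, directory=None):
    """Write the two op lists to text files and export their paths.

    Hypothetical helper: one op name per line is an assumption about
    the expected file format, not something confirmed in this PR.
    """
    # Use the given directory, or create a fresh temporary one.
    out_dir = Path(directory) if directory else Path(tempfile.mkdtemp(prefix="autocast_ops_"))
    out_dir.mkdir(parents=True, exist_ok=True)

    bf16_path = out_dir / "ops_bf16.txt"
    fp32_path = out_dir / "ops_fp32.txt"
    bf16_path.write_text("\n".join(bf16_ops) + "\n")
    fp32_path.write_text("\n".join(fp32_ops) + "\n")

    # Point the autocast environment variables at the files just written.
    os.environ["PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST"] = str(bf16_path)
    os.environ["PT_HPU_AUTOCAST_FP32_OPS_LIST"] = str(fp32_path)
    return bf16_path, fp32_path
```

A call such as `export_autocast_op_lists(["matmul", "linear"], ["softmax"])` would write both files and set both variables for the current process and any child processes it starts. Which ops belong in each list depends on the model and is not specified here.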
optimum/habana/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py
Signed-off-by: Daniel Socek <[email protected]>
@sywangyi Daniel has updated the autocasting in SD3. Please re-review the PR.
LGTM
What does this PR do?
Performance comparison of Pre and Post Optimizations for SD3 Pipeline
Achieved ~4x throughput improvement with HPU Graph and Fused SDPA
Diffusers CI Tests Pass
This PR is jointly co-authored with:
Before submitting