This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[Nvidia] Support fp8 to bf16 casting on RTX 4000 series (#5544)
I noticed that some of the tests were failing when I was testing on a workstation with a consumer RTX card. It turns out that sm_89 supports fp8 but doesn't support `cvt.bf16.f16`. From the PTX spec:

```
cvt.bf16.{u8/s8/u16/s16/u32/s32/u64/s64/f16/f64/bf16},
cvt.{u8/s8/u16/s16/u32/s32/u64/s64/f16/f64}.bf16,
and cvt.tf32.f32.{relu}.{rn/rz} require sm_90 or higher.
```

This adds a path that first converts to fp32 and then to bf16 when the compute capability is below 90. This path is already exercised by the tests (specifically several test cases in test core, many of them variations on dot_scaled).
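The fallback does not change results. Widening f16 to f32 is exact, so the only rounding happens in the f32 to bf16 step. That single rounding is the same one a direct `cvt.rn.bf16.f16` would perform. The sketch below models those bit-level semantics in Python. It is not the code from this commit. It assumes the fp8 value has already been widened to f16 and that bf16 rounding is round-to-nearest-even. The helper names (`f16_bits_to_f32`, `f32_to_bf16_bits`, `f16_to_bf16_via_f32`) are hypothetical.

```python
import math
import struct


def f16_bits_to_f32(h: int) -> float:
    """Widen an IEEE binary16 bit pattern to a float (exact, no rounding)."""
    # struct's 'e' format is IEEE 754 half precision.
    return struct.unpack("<e", struct.pack("<H", h & 0xFFFF))[0]


def f32_to_bf16_bits(x: float) -> int:
    """Round a float32-representable value to bf16 bits (round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if math.isnan(x):
        # Keep sign and exponent, force the quiet bit so the result stays a NaN.
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Adding 0x7FFF plus the lowest kept bit, then truncating the low 16 bits,
    # rounds to nearest with ties going to the even bf16 mantissa.
    lsb = (bits >> 16) & 1
    return ((bits + 0x7FFF + lsb) >> 16) & 0xFFFF


def f16_to_bf16_via_f32(h: int) -> int:
    """Hypothetical two-step path: f16 -> f32 (exact) -> bf16 (one rounding)."""
    return f32_to_bf16_bits(f16_bits_to_f32(h))
```

For example, `f16_to_bf16_via_f32(0x7BFF)` (the largest finite f16, 65504) returns `0x4780`, which is 65536 in bf16, because bf16 keeps only 7 mantissa bits. Since bf16 shares f32's exponent range, f16 subnormals become normal bf16 values, and no overflow is possible in the second step.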
1 parent 4a4dac9, commit 4947a95
Showing 1 changed file with 35 additions and 14 deletions.