Add float4_e2m1fn type support #26525

wenscarl · 2025-02-13T20:55:53Z

This PR adds f4E2M1FN type support.
E2M1FN is an OpenCompute MX (microscaling) format with the following properties:

4-bit floating-point type with 1 sign bit, 2 exponent bits, and 1 mantissa bit.

f4E2M1FN
- Exponent bias: 1
- Minimum stored exponent value: 1 (binary 01)
- Maximum stored exponent value: 3 (binary 11)
- Minimum unbiased exponent value: 1 - 1 = 0
- Maximum unbiased exponent value: 3 - 1 = 2
- Precision (total number of significand bits, including the implicit
    leading integer bit): 1 + 1 = 2
- Has positive and negative zero
- Doesn't have infinity
- Doesn't have NaN

Additional details:
- Zeros (+/-): S.00.0
- Max normal number (+/-): S.11.1 = 2^2 x (1+0.5) = ±6
- Min normal number (+/-): S.01.0 = 2^0 x (1+0) = ±1
- Min subnormal number (+/-): S.00.1 = 2^0 x 0.5 = ±0.5
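
To make the encoding above concrete, here is a minimal pure-Python sketch that decodes a 4-bit pattern using the stated layout (sign, 2 exponent bits, 1 mantissa bit, bias 1). The function name `decode_e2m1fn` and the list `ALL_VALUES` are hypothetical helpers for illustration only; they are not part of this PR or of any library API.

```python
def decode_e2m1fn(bits: int) -> float:
    """Decode a 4-bit E2M1FN pattern (0..15) into a Python float.

    Illustrative helper only; layout is S.EE.M with exponent bias 1.
    """
    if not 0 <= bits <= 0xF:
        raise ValueError("expected a 4-bit value in the range 0..15")

    sign = -1.0 if (bits >> 3) & 1 else 1.0
    exponent = (bits >> 1) & 0b11  # stored (biased) exponent
    mantissa = bits & 0b1          # single fraction bit, worth 0.5
    bias = 1

    if exponent == 0:
        # Zero or subnormal: no implicit leading 1, and the exponent is
        # fixed at the minimum unbiased value (1 - bias = 0).
        return sign * 2.0 ** (1 - bias) * (mantissa * 0.5)

    # Normal number: implicit leading 1 plus the fraction bit.
    return sign * 2.0 ** (exponent - bias) * (1.0 + mantissa * 0.5)


# All 16 bit patterns; no pattern is reserved for infinity or NaN.
ALL_VALUES = [decode_e2m1fn(b) for b in range(16)]
```

Under these assumptions, the eight non-negative patterns decode to 0, 0.5, 1, 1.5, 2, 3, 4 and 6, and the eight patterns with the sign bit set decode to their negations, including -0.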

support e2m1fn

81b3196

wenscarl marked this pull request as ready for review February 14, 2025 15:53
