In the implementation of LayerNorm (Liger-Kernel/src/liger_kernel/ops/layer_norm.py, line 60 at 134a13e), I wonder whether the `var` computation may produce a wrong value when `BLOCK_SIZE` is not the same as the feature dimension (in other words, when `mask` has some False elements).
My intuition is that the elements where `mask == False` are loaded as 0.0, become `-mean` after the mean is subtracted, and `var` then includes them in its summation.
I think we should properly mask those positions to 0.0 using `tl.where` or something similar.
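For concreteness, here is a minimal sketch of the pattern being described (not the Liger-Kernel source; the kernel and variable names are made up for illustration), assuming the usual Triton layer-norm setup where `BLOCK_SIZE = triton.next_power_of_2(n_cols)`:

```python
import triton
import triton.language as tl

@triton.jit
def _var_sketch_kernel(X_ptr, Var_ptr, n_cols, BLOCK_SIZE: tl.constexpr):
    cols = tl.arange(0, BLOCK_SIZE)
    mask = cols < n_cols                      # False for the padded tail when BLOCK_SIZE > n_cols
    X_row = tl.load(X_ptr + cols, mask=mask, other=0.0)

    mean = tl.sum(X_row, axis=0) / n_cols     # fine: padded positions contribute 0.0

    # The concern: without re-masking, padded positions become (0.0 - mean) = -mean,
    # and their squares leak into the variance sum.
    diff_unmasked = X_row - mean
    var_unmasked = tl.sum(diff_unmasked * diff_unmasked, axis=0) / n_cols

    # Proposed fix: zero the padded positions before squaring.
    diff = tl.where(mask, X_row - mean, 0.0)
    var_masked = tl.sum(diff * diff, axis=0) / n_cols

    # Store both so the difference can be inspected from the host.
    tl.store(Var_ptr + 0, var_unmasked)
    tl.store(Var_ptr + 1, var_masked)
```

With `n_cols == 768` and `BLOCK_SIZE == 1024`, the two results differ by `(1024 - 768) * mean**2 / 768`, which is zero only when the row mean happens to be zero.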
Thank you for the comment. Please see the discussion in #519.
To put it simply, `BLOCK_SIZE` must be a power of 2 in Triton, so it is easy to end up with situations where `BLOCK_SIZE != feature_dim`, e.g., `feat_dim == 768`, `1536`, or any other non-power-of-2 size.
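As an illustration (not code from the repo), Triton's power-of-2 rounding gives:

```python
import triton

for feat_dim in (768, 1536, 4096):
    print(feat_dim, triton.next_power_of_2(feat_dim))
# 768 -> 1024, 1536 -> 2048, 4096 -> 4096
# the mask is all True only when feat_dim is already a power of 2
```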