In the implementation of LayerNorm (Liger-Kernel/src/liger_kernel/ops/layer_norm.py, line 60 at 134a13e), I wonder whether the `var` computation may produce a wrong value when `BLOCK_SIZE` is not the same as the feature dimension (in other words, when `mask` has some False elements).
My intuition is that the elements where `mask == False` are loaded as 0.0, become `-mean` after the mean is subtracted, and `var` then includes them in its summation.
I think we should properly mask those positions to 0.0 using `tl.where` or something similar.
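For concreteness, here is a minimal sketch of the pattern being described (not the Liger-Kernel source; the kernel and variable names are made up for illustration), assuming the usual Triton layer-norm setup where `BLOCK_SIZE = triton.next_power_of_2(n_cols)`:

```python
import triton
import triton.language as tl

@triton.jit
def _var_sketch_kernel(X_ptr, Var_ptr, n_cols, BLOCK_SIZE: tl.constexpr):
    cols = tl.arange(0, BLOCK_SIZE)
    mask = cols < n_cols                      # False for the padded tail when BLOCK_SIZE > n_cols
    X_row = tl.load(X_ptr + cols, mask=mask, other=0.0)

    mean = tl.sum(X_row, axis=0) / n_cols     # fine: padded positions contribute 0.0

    # The concern: without re-masking, padded positions become (0.0 - mean) = -mean,
    # and their squares leak into the variance sum.
    diff_unmasked = X_row - mean
    var_unmasked = tl.sum(diff_unmasked * diff_unmasked, axis=0) / n_cols

    # Proposed fix: zero the padded positions before squaring.
    diff = tl.where(mask, X_row - mean, 0.0)
    var_masked = tl.sum(diff * diff, axis=0) / n_cols

    # Store both so the difference can be inspected from the host.
    tl.store(Var_ptr + 0, var_unmasked)
    tl.store(Var_ptr + 1, var_masked)
```

With `n_cols == 768` and `BLOCK_SIZE == 1024`, the two results differ by `(1024 - 768) * mean**2 / 768`, which is zero only when the row mean happens to be zero.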
Thank you for the comment. Please see the discussion in #519.
To put it simply, `BLOCK_SIZE` must be a power of 2 in Triton, so it is easy to end up with situations where `BLOCK_SIZE != feature_dim`, e.g., `feat_dim == 768`, `1536`, or any other non-power-of-2 size.
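As an illustration (not code from the repo), Triton's power-of-2 rounding gives:

```python
import triton

for feat_dim in (768, 1536, 4096):
    print(feat_dim, triton.next_power_of_2(feat_dim))
# 768 -> 1024, 1536 -> 2048, 4096 -> 4096
# the mask is all True only when feat_dim is already a power of 2
```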