A question about layernorm in MLA code #3082

Closed Answered by ispobock
hcyz33 asked this question in Q&A
The layer normalization is part of the original DeepSeek-V2 implementation.
Ref: https://huggingface.co/deepseek-ai/DeepSeek-V2.5/blob/c85b5ede86f2a598af339624cac5723861e557ed/modeling_deepseek.py#L825

It is also mentioned in the paper:

"For other tiny details (e.g., layer normalization and the activation function in FFNs), unless specifically stated, DeepSeek-V2 follows the settings of DeepSeek 67B"
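For readers who want to see where such a normalization sits, here is a minimal, dependency-free sketch. It assumes the layer in question is an RMSNorm applied to the compressed KV latent before the up-projection, which is how I read the referenced file. All names and sizes below (`w_kv_a`, `w_kv_b`, `mla_kv_path`, the toy dimensions) are hypothetical and chosen for illustration; this is not the SGLang or Hugging Face code.

```python
import math
import random


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by its root mean square, then scale by a learned weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * inv_rms for v, w in zip(x, weight)]


def matvec(matrix, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


# Hypothetical toy sizes, far smaller than any real model config.
HIDDEN, KV_LORA_RANK, ROPE_DIM, OUT_DIM = 16, 4, 2, 8

rng = random.Random(0)
w_kv_a = random_matrix(KV_LORA_RANK + ROPE_DIM, HIDDEN, rng)  # down-projection
w_kv_b = random_matrix(OUT_DIM, KV_LORA_RANK, rng)            # up-projection
norm_weight = [1.0] * KV_LORA_RANK                            # learned in a real model


def mla_kv_path(hidden_state):
    """Compress, split off the RoPE part, normalize the latent, then up-project."""
    compressed = matvec(w_kv_a, hidden_state)
    latent, k_pe = compressed[:KV_LORA_RANK], compressed[KV_LORA_RANK:]
    normed = rms_norm(latent, norm_weight)  # the normalization step in question
    kv = matvec(w_kv_b, normed)
    return normed, k_pe, kv


hidden = [rng.gauss(0.0, 1.0) for _ in range(HIDDEN)]
normed, k_pe, kv = mla_kv_path(hidden)
print(len(normed), len(k_pe), len(kv))
```

In this sketch only the low-rank latent is normalized; the RoPE slice `k_pe` bypasses the norm. Whether that matches a given implementation line for line should be checked against the linked file.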

Answer selected by hcyz33
Category: Q&A
Labels: help wanted
3 participants
This discussion was converted from issue #3072 on January 23, 2025 13:28.