KellerJordan committed Jan 4, 2025
1 parent ea96131 commit b944c19
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -90,7 +90,7 @@ The following is the progression of world records for the task of *training a mo
All new record attempts:

1. Must not modify the train or validation data pipelines. (Except to change batch size, seqlen, attention structure etc. I.e., just don't change the underlying tokens.)
-2. Must attain ≤ 3.28 val loss. Unfortunately, due to high inter-run variance, new record attempts must provide enough run logs to attain a statistical significance level of p<0.01 that their average val loss is lower than 3.28. You see see how to conduct a t-test [here](./records/120424_ValueEmbed).
+2. Must attain ≤ 3.28 val loss. Unfortunately, due to high inter-run variance, new record attempts must provide enough run logs to attain a statistical significance level of p<0.01 that their average val loss is lower than 3.28. You see see how to conduct a t-test [here](records/120424_ValueEmbed).

Other than that, go crazy! Anything is fair game (e.g., MoE is fair, but will probably require implementing fast kernels to be competitive).
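
For readers unfamiliar with the significance requirement in rule 2, here is a minimal sketch of a one-sided, one-sample t-test against the 3.28 target. It is not the script from the linked record directory. The function names `t_cdf` and `p_value_below` and the loss values are hypothetical, chosen only for illustration, and the code uses only the standard library. If SciPy is available, `scipy.stats.ttest_1samp(losses, 3.28, alternative="less")` should give the same p-value.

```python
import math
from statistics import mean, stdev


def t_cdf(t, df, steps=2000):
    """CDF of Student's t, via Simpson's rule on the density (stdlib only)."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps  # signed step, so negative t integrates leftwards from 0
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def p_value_below(losses, target=3.28):
    """One-sided p-value for H1: the true mean val loss is below `target`."""
    n = len(losses)
    # Needs at least two runs that are not all identical (stdev must be > 0).
    t_stat = (mean(losses) - target) / (stdev(losses) / math.sqrt(n))
    return t_cdf(t_stat, n - 1)


# Hypothetical final val losses from 8 runs (made-up numbers, not real logs).
losses = [3.2761, 3.2783, 3.2770, 3.2755, 3.2779, 3.2768, 3.2774, 3.2759]
p = p_value_below(losses)
print(f"mean={mean(losses):.4f}  p={p:.2e}  significant={p < 0.01}")
```

The test is one-sided because the rule asks only whether the average is lower than 3.28. More runs, or tighter run-to-run variance, shrink the standard error and make the threshold easier to clear.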
