Logo
Explore Help
Sign In
fam/nanochat-omni
1
0
Fork 0
You've already forked nanochat-omni
Code Issues Pull Requests Actions Packages Projects Releases Wiki Activity
245 Commits 3 Branches 0 Tags
e1dafc510f122d5c31c38a3c96e45e544f47930f
Commit Graph

3 Commits

Author SHA1 Message Date
Andrej Karpathy 6bb92403d5 changes and optimizations to muon, making it more efficient and simpler/cleaner a bit 2026-01-15 03:20:48 +00:00
Andrej Karpathy 2c4473dd1b Big Muon optimizer changes inspired by latest of modded-nanogpt. Added Polar Express, Adafactor-style variance reduction, cautious weight decay, schedule weight decay linearly to ramp down to zero. Tuned optimum weight decay for multiple model sizes d8, d12, d16, d20 and found a scaling law with optimum wd \propto 1/channels^2, including it as default into code. --weight_decay of base_train is now default on and configured optimally according to all of these experiments. Solid bump to val_bpb observed as a result of these changes. 2026-01-11 16:56:59 +00:00
karpathy 3a5e0bc50b initial commit 2025-10-13 06:49:24 -07:00
Powered by Gitea Version: 1.26.0+rc0 Page: 26ms Template: 5ms
Auto
English
Bahasa Indonesia Deutsch English Español Français Gaeilge Italiano Latviešu Magyar nyelv Nederlands Polski Português de Portugal Português do Brasil Suomi Svenska Türkçe Čeština Ελληνικά Български Русский Українська فارسی മലയാളം 日本語 简体中文 繁體中文(台灣) 繁體中文(香港) 한국어
Licenses API