fam/nanochat-omni
266 Commits 3 Branches 0 Tags
8630d32be43912c1f8670c03fe6c0bdc843c1215
Commit Graph

3 Commits

Author SHA1 Message Date
Andrej Karpathy 6bb92403d5 changes and optimizations to muon, making it more efficient and simpler/cleaner a bit 2026-01-15 03:20:48 +00:00
Andrej Karpathy 2c4473dd1b Big Muon optimizer changes inspired by latest of modded-nanogpt. Added Polar Express, Adafactor-style variance reduction, cautious weight decay, schedule weight decay linearly to ramp down to zero. Tuned optimum weight decay for multiple model sizes d8, d12, d16, d20 and found a scaling law with optimum wd \propto 1/channels^2, including it as default into code. --weight_decay of base_train is now default on and configured optimally according to all of these experiments. Solid bump to val_bpb observed as a result of these changes. 2026-01-11 16:56:59 +00:00
karpathy 3a5e0bc50b initial commit 2025-10-13 06:49:24 -07:00
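
The 2c4473dd1b message mentions two weight-decay rules: an optimum that scales as 1/channels^2, and a schedule that ramps the decay linearly down to zero. A minimal sketch of both, not the repository's actual code: the function names, the reference width of 768 channels, and the reference decay of 0.2 are hypothetical values chosen only for illustration.

```python
def scaled_weight_decay(channels, ref_wd=0.2, ref_channels=768):
    """Weight decay proportional to 1/channels^2.

    ref_wd and ref_channels are hypothetical anchor values, not taken
    from the commit; they only fix the constant of proportionality.
    """
    return ref_wd * (ref_channels / channels) ** 2


def weight_decay_at(step, total_steps, base_wd):
    """Linearly ramp weight decay from base_wd at step 0 down to zero."""
    # Clamp progress to [0, 1] so out-of-range steps stay well defined.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return base_wd * (1.0 - progress)


if __name__ == "__main__":
    # Doubling the width cuts the decay by a factor of four.
    base = scaled_weight_decay(channels=1536)
    for step in (0, 50, 100):
        print(step, weight_decay_at(step, total_steps=100, base_wd=base))
```

Under these assumptions, doubling the channel count divides the starting weight decay by four, and the scheduled value reaches zero at the final training step.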