This website requires JavaScript.
Explore
Help
Sign In
fam
/
nanochat-omni
Watch
1
Star
0
Fork
0
You've already forked nanochat-omni
Code
Issues
Pull Requests
Actions
Packages
Projects
Releases
Wiki
Activity
Files
061f83c152b359a145b2d76286a7d019d04fa882
nanochat-omni
/
scripts
/
base_train.py
T
Andrej Karpathy
061f83c152
delete grad_clip. appears to not be necessary at all. not only was it buggy because the clipping happened per gpu before grad synchronization, but it costs ~2% MFU, and it also doesn't even help. I tried deleting it a while ago and back then it did help. So I'm guessing that some hyperparameter tuning obviated the reason for it since then
2026-01-08 02:16:50 +00:00
22 KiB
Raw
Blame
History
View Raw
Reference in New Issue
View Git Blame
Copy Permalink