747ed4491f7fe77b1f99a385309804a3f2cca353
nanochat-omni/nanochat/gpt.py
Andrej Karpathy
fbc1484e8c
Add alternating window-size patterns for the GPT layers, following GPT-3. Experimented a bit and found the pattern SSSL to work well: 3 short-window layers, then 1 long-window layer, repeating. This is now the new default, and the plots of FLOPs vs. bpb look quite a bit better.
2026-01-11 21:49:54 +00:00
20 KiB
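As an illustration of the SSSL pattern named in the commit message, here is a minimal sketch of how a pattern string could be tiled across layers to give one attention window size per layer. The function name `expand_window_pattern`, its parameters, and the example window sizes are hypothetical and chosen only for illustration; they are not taken from gpt.py, and the commit may implement this differently.

```python
def expand_window_pattern(pattern: str, n_layer: int,
                          long_window: int, short_window: int) -> list[int]:
    """Tile a pattern such as "SSSL" across n_layer layers.

    Hypothetical helper: 'S' maps to short_window and 'L' to long_window.
    The actual window sizes used in gpt.py are not stated in the commit message.
    """
    pattern = pattern.upper()
    if not pattern or any(c not in "SL" for c in pattern):
        raise ValueError("pattern must be a non-empty string of 'S' and 'L'")
    sizes = {"S": short_window, "L": long_window}
    # Layer i takes the letter at position i modulo the pattern length,
    # so "SSSL" repeats as S, S, S, L, S, S, S, L, ...
    return [sizes[pattern[i % len(pattern)]] for i in range(n_layer)]


# Example with assumed values: 8 layers, short window 512, long window 2048.
windows = expand_window_pattern("SSSL", n_layer=8, long_window=2048, short_window=512)
print(windows)  # [512, 512, 512, 2048, 512, 512, 512, 2048]
```

With these assumed values, three out of every four layers attend over the short window, which is where the FLOPs savings mentioned in the commit message would come from.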