Commit Graph

3 Commits

Author SHA1 Message Date
Matěj Kripner f1bf69d562 feat: pad vocab size to 64 for DDP optimizers and efficiency 2025-12-09 12:38:18 +01:00
Sermet Pekin 49cd02f283 fix: remove unnecessary tensor allocation in DistAdamW optimizer
fix: remove unnecessary tensor allocation in DistAdamW optimizer
2025-10-20 12:03:26 +03:00
karpathy 3a5e0bc50b initial commit 2025-10-13 06:49:24 -07:00