Matěj Kripner | bbc57da7d5 | slightly nicer error message | 2025-12-09 12:46:48 +01:00
Matěj Kripner | f1bf69d562 | feat: pad vocab size to 64 for DDP optimizers and efficiency | 2025-12-09 12:38:18 +01:00
Sermet Pekin | 49cd02f283 | fix: remove unnecessary tensor allocation in DistAdamW optimizer | 2025-10-20 12:03:26 +03:00
karpathy | 3a5e0bc50b | initial commit | 2025-10-13 06:49:24 -07:00