Andrej Karpathy
|
d6c4f3b923
|
i think this is the new torch 2.9+ API for declaring tf32 preference
|
2026-01-30 17:03:15 +00:00 |
|
Andrej Karpathy
|
f5425245f9
|
more GPU types from PR 147 thanks @Qubitium
|
2026-01-17 03:22:20 +00:00 |
|
Andrej Karpathy
|
2955650327
|
add detection of device to report more correct mfu for bf16
|
2026-01-17 03:16:14 +00:00 |
|
Dipesh Babu
|
2f2d7ab80c
|
fix: safe DDP cleanup (check initialized PG, not just env) (#256)
|
2025-12-27 20:27:40 -08:00 |
|
Andrej Karpathy
|
49389ecaa8
|
fix tf32 warning for deprecated api use
|
2025-12-27 22:03:06 +00:00 |
|
Andrej Karpathy
|
c6abcdfe3a
|
big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster.
|
2025-11-13 15:34:40 +00:00 |
|
Andrej
|
3a2ae631c4
|
Merge branch 'master' into master
|
2025-11-04 16:35:02 -08:00 |
|
Yasser Makram
|
1e89af9862
|
Replace fcntl with filelock for Windows compatibility
|
2025-11-04 07:22:34 +00:00 |
|
svlandeg
|
c72b8b2309
|
add explicit UTF-8 encoding
|
2025-11-03 21:27:12 +01:00 |
|
Andrej
|
b6da6982f6
|
fix nanochat logo: the t was placed too far to the right
|
2025-11-02 08:17:00 -08:00 |
|
Andrej Karpathy
|
cf587acb1a
|
move eval bundle download to be lazy and inside the python code so that we can substantially simplify the run bash scripts
|
2025-11-01 16:04:38 +00:00 |
|
svlandeg
|
b996131570
|
Merge branch 'master' into logo/kerning-update
|
2025-10-29 11:45:40 +01:00 |
|
svlandeg
|
3fa974f93c
|
few more reverts
|
2025-10-29 11:45:02 +01:00 |
|
svlandeg
|
cbd560a83d
|
revert formatting changes to minimize diff and merge conflicts
|
2025-10-29 11:42:56 +01:00 |
|
Andrej Karpathy
|
8892470f29
|
add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think
|
2025-10-24 14:02:48 +00:00 |
|
obxium
|
2b58e2dd2a
|
Update logo in code as well
|
2025-10-18 09:31:11 -04:00 |
|
Andrej
|
cf2baf9933
|
fix typo
Co-authored-by: Tancrède Lepoint <tlepoint@users.noreply.github.com>
|
2025-10-17 08:35:41 -07:00 |
|
karpathy
|
df600b6ed5
|
many small tweaks. base, eval, core work now i think
|
2025-10-16 15:46:18 -07:00 |
|
karpathy
|
786119d593
|
add autodetect of device and related stuff. getting weird warnings/errors still, so wip
|
2025-10-16 10:26:19 -07:00 |
|
karpathy
|
306bc380ab
|
add support for CPU and for MPS. I had to change a few cosmetic things. I also discovered I think a bit of a bug, where I was casting wte to bfloat16 in the wrong place (the model init) instead of in init_weights
|
2025-10-16 10:04:43 -07:00 |
|
Andrej Karpathy
|
722da4f543
|
trying to add basic cpu support, will try mps too
|
2025-10-16 16:14:38 +00:00 |
|
karpathy
|
3a5e0bc50b
|
initial commit
|
2025-10-13 06:49:24 -07:00 |
|