25 Commits

Author SHA1 Message Date
Andrej Karpathy 1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly 2026-03-04 23:55:30 +00:00
George Shakan 0a23f87643 Fix bug in setting precision (#538) 2026-02-18 07:42:11 -08:00
Sofie Van Landeghem 4d6415b8ef use _PEAK_FLOPS_TABLE instead of if-else structure (#479) 2026-01-31 19:45:06 -08:00
Andrej Karpathy d6c4f3b923 i think this is the new torch 2.9+ API for declaring tf32 preference 2026-01-30 17:03:15 +00:00
Andrej Karpathy f5425245f9 more GPU types from PR 147 thanks @Qubitium 2026-01-17 03:22:20 +00:00
Andrej Karpathy 2955650327 add detection of device to report more correct mfu for bf16 2026-01-17 03:16:14 +00:00
Dipesh Babu 2f2d7ab80c fix: safe DDP cleanup (check initialized PG, not just env) (#256) 2025-12-27 20:27:40 -08:00
Andrej Karpathy 49389ecaa8 fix tf32 warning for deprecated api use 2025-12-27 22:03:06 +00:00
Andrej Karpathy c6abcdfe3a big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster. 2025-11-13 15:34:40 +00:00
Andrej 3a2ae631c4 Merge branch 'master' into master 2025-11-04 16:35:02 -08:00
Yasser Makram 1e89af9862 Replace fcntl with filelock for Windows compatibility 2025-11-04 07:22:34 +00:00
svlandeg c72b8b2309 add explicit UTF-8 encoding 2025-11-03 21:27:12 +01:00
Andrej b6da6982f6 fix nanochat logo: the t was placed too far to the right 2025-11-02 08:17:00 -08:00
Andrej Karpathy cf587acb1a move eval bundle download to be lazy and inside the python code so that we can substantially simplify the run bash scripts 2025-11-01 16:04:38 +00:00
svlandeg b996131570 Merge branch 'master' into logo/kerning-update 2025-10-29 11:45:40 +01:00
svlandeg 3fa974f93c few more reverts 2025-10-29 11:45:02 +01:00
svlandeg cbd560a83d revert formatting changes to minimize diff and merge conflicts 2025-10-29 11:42:56 +01:00
Andrej Karpathy 8892470f29 add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think 2025-10-24 14:02:48 +00:00
obxium 2b58e2dd2a Update logo in code as well 2025-10-18 09:31:11 -04:00
Andrej cf2baf9933 fix typo (Co-authored-by: Tancrède Lepoint <tlepoint@users.noreply.github.com>) 2025-10-17 08:35:41 -07:00
karpathy df600b6ed5 many small tweaks. base, eval, core work now i think 2025-10-16 15:46:18 -07:00
karpathy 786119d593 add autodetect of device and related stuff. getting weird warnings/errors still, so wip 2025-10-16 10:26:19 -07:00
karpathy 306bc380ab add support for CPU and for MPS. I had to change a few cosmetic things. I also discovered I think a bit of a bug, where I was casting wte to bfloat16 in the wrong place (the model init) instead of in init_weights 2025-10-16 10:04:43 -07:00
Andrej Karpathy 722da4f543 trying to add basic cpu support, will try mps too 2025-10-16 16:14:38 +00:00
karpathy 3a5e0bc50b initial commit 2025-10-13 06:49:24 -07:00