Commit Graph

388 Commits

Author SHA1 Message Date
svlandeg a2fb3c83a6 fix typos 2025-11-14 11:20:25 +01:00
svlandeg e5efb4b471 add test_engine.py to file structure 2025-11-14 11:13:42 +01:00
Andrej Karpathy 9a71d13688 typo oops 2025-11-13 16:08:30 +00:00
Andrej Karpathy 7b7fd0fe71 thank you Sophie for your help with nanochat 2025-11-13 16:07:54 +00:00
Andrej Karpathy c6abcdfe3a big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster. 2025-11-13 15:34:40 +00:00
Andrej Karpathy 91f09ccd0d minor fix comment in engine 2025-11-13 15:28:18 +00:00
Andrej Karpathy adb5d4a16c uv lock has to change when we removed numpy the other commit 2025-11-13 15:16:27 +00:00
howardgao@outlook.com b399e43168 fix engine test bug 2025-11-06 08:56:45 +08:00
Andrej Karpathy c6b7ab7440 grad clip logging and printing and cosmetics 2025-11-05 21:08:30 +00:00
Andrej 885a4f25e7 Replace fcntl with filelock for Windows compatibility 2025-11-04 16:35:39 -08:00
Andrej 3a2ae631c4 Merge branch 'master' into master 2025-11-04 16:35:02 -08:00
Andrej 12d995f58c Add NPROC_PER_NODE var to speedrun.sh and run1000.sh 2025-11-04 16:26:33 -08:00
svlandeg f1683c5b16 set nproc_per_node as var in speedrun and run1000 scripts 2025-11-04 21:36:10 +01:00
Andrej d1558c7873 handle bf16 on MPS by casting to fp32 during load checkpoint 2025-11-04 09:42:50 -08:00
Andrej df25293087 Add explicit UTF-8 encoding on open 2025-11-04 09:38:18 -08:00
Yasser Makram 1e89af9862 Replace fcntl with filelock for Windows compatibility 2025-11-04 07:22:34 +00:00
Dipesh Babu 7a40ee77b4 fix: cast bf16 to fp32 on MPS (like CPU) to avoid dtype issues 2025-11-03 16:00:56 -05:00
svlandeg 2ce62ec076 ensure consistency of quotes within each statement 2025-11-03 21:52:02 +01:00
svlandeg e22fc6f2fa few more explicit UTF-8 encodings 2025-11-03 21:46:39 +01:00
svlandeg c72b8b2309 add explicit UTF-8 encoding 2025-11-03 21:27:12 +01:00
Andrej a83646e098 fix(eval): use UTF-8 when reading CORE JSONL and writing CSV 2025-11-03 06:38:33 -08:00
Andrej 8681922328 fix lstrip bug, make it removeprefix, TIL. 2025-11-03 06:37:48 -08:00
Dipesh Babu 226953b841 fix: open JSONL and results CSV with UTF-8 encoding for portability 2025-11-03 01:20:56 -05:00
Josh Odom f1e15f5f4d Fixing subtle bug: lstrip removes all matching characters, including potentially required ones. Use removeprefix instead. 2025-11-02 23:40:37 -06:00
Andrej b6da6982f6 fix nanochat logo: the t was placed too far to the right 2025-11-02 08:17:00 -08:00
Andrej c2c4f77e22 oops small bugfix to run1000.sh missing kwarg 2025-11-02 08:14:41 -08:00
Andrej d1ac0b2d07 when loading models on CPU, convert tensors from bfloat16 to float 2025-11-02 07:58:56 -08:00
svlandeg 5bfcd31b73 revert more formatting changes 2025-11-02 14:17:10 +01:00
svlandeg 036a3c5881 revert formatting changes to facilitate review 2025-11-02 14:16:43 +01:00
svlandeg 52e85aaf80 Merge branch 'master' into fix/typo 2025-11-02 13:41:13 +01:00
Jing Zhang ba4f40bf58 Update run1000.sh to add missing --run=$WANDB_RUN 2025-11-01 21:27:00 -07:00
Manuel Saelices d54c9cbf8c CPU Support, as bfloat16 params breaks inference 2025-11-01 23:38:50 +01:00
Andrej Karpathy cf587acb1a move eval bundle download to be lazy and inside the python code so that we can substantially simplify the run bash scripts 2025-11-01 16:04:38 +00:00
Andrej Karpathy 7d2c4a3d95 delete pandas dep in base_eval use csv instead 2025-11-01 15:28:30 +00:00
Andrej ad39db5a23 tiny fix to comment; Update engine.py with correct error message on assert 2025-11-01 07:43:57 -07:00
Andrej 630f54ae5a use empty locals and globals in call to eval() in engine tool use; harden eval: prevent the calc tool from accessing globals and locals 2025-11-01 07:22:59 -07:00
Andrej Karpathy f15732524a make deepwiki link better 2025-11-01 14:13:29 +00:00
Andrej dfc88334b6 fix tok/sec calculation bug when grad accum steps > 1; Fix tok/sec metrics for base_train and mid_train when gradient accumulation is not 1 2025-10-30 08:36:32 -07:00
Andrej eb11bb0e2e remove numpy as dep; Remove explicit numpy dependency 2025-10-30 08:28:14 -07:00
svlandeg 70319851fc fix typo 2025-10-29 19:48:34 +01:00
Andrej 1ccbaf4416 nit delete redundant catch/raise in execute; Remove redundant exception handling in chdir 2025-10-29 08:10:03 -07:00
Andrej 29ff38d94b Merge pull request #35 from bhaskar0210s/master; fix: return inf instead of crashing when evaluate_bpb has zero total_bytes 2025-10-29 08:06:24 -07:00
svlandeg b996131570 Merge branch 'master' into logo/kerning-update 2025-10-29 11:45:40 +01:00
svlandeg 3fa974f93c few more reverts 2025-10-29 11:45:02 +01:00
svlandeg cbd560a83d revert formatting changes to minimize diff and merge conflicts 2025-10-29 11:42:56 +01:00
Andrej a1de1f46ad Merge pull request #156 from tlepoint/fix/export-base-dir; Export the base dir variable in runcpu.sh 2025-10-28 15:19:08 -07:00
Andrej ee00f523d0 fixing all the typos to make the pull requests stop; Batch of typo fixes 2025-10-28 13:36:07 -07:00
Ajeesh Sunil 5e0987a431 numpy isnt acting as a dependency for nanochat, so isnt it better to remove numpy from dependencies list 2025-10-28 20:05:38 +00:00
svlandeg 8c9b004c99 typo fixes in scripts 2025-10-28 20:17:31 +01:00
svlandeg 0a3ce7b0ff typo fixes in readme 2025-10-28 20:11:00 +01:00
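
Note on the lstrip fix (commits f1e15f5f4d and 8681922328): the messages say that lstrip removes all matching characters and that removeprefix should be used instead. The sketch below illustrates that difference in plain Python. It is not taken from the repository: the prefix "base_", the sample name "base_eval", and both helper function names are hypothetical, chosen only for illustration. str.removeprefix needs Python 3.9 or newer.

```python
# Hypothetical prefix, chosen only to illustrate the behavior; not from the repo.
PREFIX = "base_"


def strip_prefix_buggy(name: str) -> str:
    # str.lstrip treats its argument as a SET of characters and keeps
    # stripping from the left while the next character is in that set.
    return name.lstrip(PREFIX)


def strip_prefix_fixed(name: str) -> str:
    # str.removeprefix removes the exact prefix string at most once,
    # and returns the input unchanged if the prefix is absent.
    return name.removeprefix(PREFIX)


if __name__ == "__main__":
    name = "base_eval"
    # The leading "e" of "eval" is also in the character set {b, a, s, e, _},
    # so lstrip eats it as well.
    print(strip_prefix_buggy(name))  # val
    print(strip_prefix_fixed(name))  # eval
```

The buggy form only misbehaves when the remainder happens to start with a character that also appears in the prefix, which is why this kind of bug can go unnoticed for a while.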