eb7bbc1b66
delete the configurator in favor of argparse and clean up a lot of kwarg details to make them more consistent across all scripts
Andrej Karpathy
2026-01-04 19:14:23 +00:00
507d54224a
fix small bug where this would break if git stage has deleted files
Andrej Karpathy
2026-01-04 19:11:43 +00:00
9c60dfb64c
bump nanochat to use the latest stable pytorch, which is 2.9.1. Run e.g. to re-update your local environment if you git pull
Andrej Karpathy
2026-01-04 18:36:36 +00:00
be56d29b87
simplify redundant if/elif in bloat metrics
Andrej Karpathy
2026-01-04 01:40:42 +00:00
ee79f29fbd
replace files-to-prompt with git ls-files for bloat metrics
Andrej Karpathy
2026-01-04 01:38:15 +00:00
da8b7ea4cb
also delete the rustbpe test code, this now lives in rustbpe repo that is separate
Andrej Karpathy
2026-01-04 01:23:34 +00:00
aa42f40e66
delete the inline rustbpe project. it was ugly to have a project within project and rustbpe is now nicely a separate repo on my github karpathy/rustbpe and it's on pypi etc., so we just add it as a dependency to uv. i think it is appropriate that this is a separate repo because 1) it doesn't have too many knobs, other than the ones that are exposed - the regex pattern and vocab size and 2) all of its complexity is not algorithmic (it's equivalent to minbpe), instead it is efficiency-related, so it is ok to hide relatively speaking
Andrej Karpathy
2026-01-03 23:55:28 +00:00
48abd7d85f
simplify, clarify and slightly tune model initialization. should be very slightly better possibly, but certainly a lot clearer
Andrej Karpathy
2026-01-01 21:14:26 +00:00
10231dfb40
Fix conversation scroll to bottom on some browsers + remove duplicated padding (#348)
Paweł Krefta
2025-12-31 22:03:22 +01:00
389d019a0b
small change to doc string at top of tok_train.py (#402)
helloaidank
2025-12-31 20:57:26 +00:00
8c89661465
Update README to match current d34 demo (#314) (#381)
Hossein-Lakzaei
2025-12-30 12:47:11 +03:30
8f979a8bda
fix: sample first token independently for each row in multi-sample generation
Andrej Karpathy
2025-12-28 04:52:13 +00:00
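When several samples are drawn from one prompt, the prefill yields a single next-token distribution; drawing once and copying that token to every row makes all samples start identically. A minimal pure-Python sketch of the difference, assuming a plain probability list in place of nanochat's tensors (both function names are hypothetical):

```python
import random

def first_tokens_shared(probs, num_samples, rng):
    # Buggy pattern: one draw, copied to every row, so all samples begin the same way.
    token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return [token] * num_samples

def first_tokens_independent(probs, num_samples, rng):
    # Fixed pattern: one independent draw per row from the same distribution.
    return rng.choices(range(len(probs)), weights=probs, k=num_samples)
```

Only the first token needs this care in the sketch's framing; after it, each row already continues from its own state.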
2f2d7ab80c
fix: safe DDP cleanup (check initialized PG, not just env) (#256)
Dipesh Babu
2025-12-27 23:27:40 -05:00
91d76cc690
Replace speedup assertion with warning in batch_encode test
Andrej Karpathy
2025-12-28 04:10:49 +00:00
7a8769a40c
Merge pull request #383 from barisozmen/master
Andrej
2025-12-27 20:06:57 -08:00
088726aa7d
clean up model_tag handling across scripts a bit more.
Andrej
2025-12-27 20:01:09 -08:00
2874eda59a
update to new os env var to get rid of deprecation warning
Andrej Karpathy
2025-12-28 03:32:46 +00:00
e1770a3061
remove spurious cast, gets compiled away anyway but it's confusing people
Andrej Karpathy
2025-12-27 23:07:48 +00:00
49389ecaa8
fix tf32 warning for deprecated api use
Andrej Karpathy
2025-12-27 22:03:06 +00:00
ea4229851b
bugfix
DU Wenjie
2025-12-26 17:41:57 +08:00
7840049189
bugfix keep same args style in scripts/base_eval.py
DU Wenjie
2025-12-26 17:29:08 +08:00
bc51da8bac
pad vocab size to 64 for DDP optimizers and efficiency
Andrej
2025-12-23 09:13:31 -08:00
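Padding the vocab size means rounding the number of embedding rows up to the next multiple of 64; the commit gives DDP optimizers and efficiency as the motivation. A sketch of the rounding step only, with a hypothetical helper name (how nanochat handles the extra, unused rows is not shown):

```python
def pad_vocab_size(vocab_size: int, multiple: int = 64) -> int:
    """Round vocab_size up to the nearest multiple of `multiple`."""
    # Integer ceiling division, then scale back up.
    return ((vocab_size + multiple - 1) // multiple) * multiple
```

Already-aligned sizes such as 65536 are unchanged, while 50257 would become 50304.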
92c6654b95
bugfix save and load ckpt from model_tag dir
duwenjie
2025-12-21 15:07:04 +08:00
790f3be65c
add rust batch encode as a faster option over encode
Barış Özmen
2025-12-18 19:17:59 +03:00
f1bf69d562
feat: pad vocab size to 64 for DDP optimizers and efficiency
Matěj Kripner
2025-12-09 12:38:18 +01:00
d5759400f9
fixing two typos in comments
Andrej
2025-12-08 20:03:08 -08:00
e72c3299df
fix random.seed() footgun bug for SpellingBee data generation
Andrej
2025-12-08 19:58:45 -08:00
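The log does not spell out the footgun, but a common one is that `random.seed()` reseeds Python's global generator, silently changing the random stream of every other caller. One standard way to avoid it, shown here as an assumption rather than the exact change in the commit, is a private `random.Random(seed)` instance per example (the function name is hypothetical):

```python
import random

def make_spelling_example(seed: int, words: list[str]) -> str:
    # A dedicated generator: deterministic per seed, and it never touches
    # the global state shared by random.random(), random.choice(), etc.
    rng = random.Random(seed)
    return rng.choice(words)
```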
7931e0903a
rename checkpoint_dir to checkpoints_dir for consistency.
Andrej
2025-12-08 18:32:12 -08:00
849d95ae1f
remove unnecessary check to make the logic in CausalSelfAttention.forward() clearer
Andrej
2025-12-08 18:30:37 -08:00
39cccc527f
small bugfix make mid_train script work even with a tiny number of iterations
Andrej
2025-12-08 18:27:32 -08:00
8b1cecaa95
Apply suggestion from @svlandeg for nicer looking comparison
Andrej
2025-12-08 18:27:06 -08:00
58f3e84e01
clean up train/val loader in sft for consistency with mid/base
Andrej
2025-12-08 18:23:57 -08:00
1b2a675c88
Improve KV cache code readability
Andrej
2025-12-08 18:19:05 -08:00
d75e6ed711
Fix script comment to reference correct file
Andrej
2025-12-08 18:16:42 -08:00
72a7cf2bc4
Fix distributed Parquet dataloader resume for multi-epoch training
Andrej
2025-12-08 18:15:02 -08:00
bffdb2ef91
group common code to make things neater in gpt logit computation
Andrej Karpathy
2025-12-09 02:01:05 +00:00
cbf30c842c
apply float32 cast before logits softcapping so the tanh is in fp32. torch compile fuses this correctly with no extra memory costs.
Andrej
2025-12-08 14:17:43 -08:00
90442de35f
fix bug where any rank has to be able to create checkpoint_dir if saving optim
Andrej Karpathy
2025-12-08 20:45:11 +00:00
2fd0440355
fix: missing val_bpb on resume
Andrej
2025-12-08 12:35:08 -08:00
01ea71be39
Fix distributed Parquet dataloader resume for multi-epoch training
sunyujun03
2025-12-08 00:10:19 -06:00
e5efb4b471
add test_engine.py to file structure
svlandeg
2025-11-14 11:13:42 +01:00
9a71d13688
typo oops
Andrej Karpathy
2025-11-13 16:08:30 +00:00
7b7fd0fe71
thank you Sophie for your help with nanochat
Andrej Karpathy
2025-11-13 16:07:54 +00:00
c6abcdfe3a
big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster.
Andrej Karpathy
2025-11-13 15:34:40 +00:00
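The two flags named above, `--save_every` and `--resume_from_step`, could be wired up with argparse roughly as follows. This is an illustrative sketch: the defaults (-1 meaning disabled) and the `should_save` helper are assumptions, not nanochat's actual code.

```python
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="pretraining (sketch)")
    # Write a checkpoint every N steps; -1 disables periodic saving (assumed convention).
    parser.add_argument("--save_every", type=int, default=-1)
    # Step to resume optimization from; -1 means start fresh (assumed convention).
    parser.add_argument("--resume_from_step", type=int, default=-1)
    return parser.parse_args(argv)

def should_save(step: int, save_every: int) -> bool:
    # Save on positive multiples of save_every, skipping step 0.
    return save_every > 0 and step > 0 and step % save_every == 0
```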
91f09ccd0d
minor fix comment in engine
Andrej Karpathy
2025-11-13 15:28:18 +00:00
adb5d4a16c
uv lock has to change because we removed numpy in the other commit
Andrej Karpathy
2025-11-13 15:16:27 +00:00
b399e43168
fix engine test bug
howardgao@outlook.com
2025-11-06 08:56:45 +08:00
c6b7ab7440
grad clip logging and printing and cosmetics
Andrej Karpathy
2025-11-05 21:08:30 +00:00
885a4f25e7
Replace fcntl with filelock for Windows compatibility
Andrej
2025-11-04 16:35:39 -08:00
3a2ae631c4
Merge branch 'master' into master
Andrej
2025-11-04 16:35:02 -08:00
12d995f58c
Add NPROC_PER_NODE var to speedrun.sh and run1000.sh
Andrej
2025-11-04 16:26:33 -08:00
f1683c5b16
set nproc_per_node as var in speedrun and run1000 scripts
svlandeg
2025-11-04 21:36:10 +01:00
d1558c7873
handle bf16 on MPS by casting to fp32 during load checkpoint
Andrej
2025-11-04 09:42:50 -08:00
df25293087
Add explicit UTF-8 encoding on open
Andrej
2025-11-04 09:38:18 -08:00
1e89af9862
Replace fcntl with filelock for Windows compatibility
Yasser Makram
2025-11-04 07:22:34 +00:00
7a40ee77b4
fix: cast bf16 to fp32 on MPS (like CPU) to avoid dtype issues
Dipesh Babu
2025-11-03 16:00:56 -05:00
2ce62ec076
ensure consistency of quotes within each statement
svlandeg
2025-11-03 21:52:02 +01:00
e22fc6f2fa
few more explicit UTF-8 encodings
svlandeg
2025-11-03 21:46:39 +01:00
d54c9cbf8c
CPU support, as bfloat16 params break inference
Manuel Saelices
2025-11-01 23:38:50 +01:00
cf587acb1a
move eval bundle download to be lazy and inside the python code so that we can substantially simplify the run bash scripts
Andrej Karpathy
2025-11-01 16:04:38 +00:00
7d2c4a3d95
delete pandas dep in base_eval use csv instead
Andrej Karpathy
2025-11-01 15:28:30 +00:00
ad39db5a23
tiny fix to comment
Andrej
2025-11-01 07:43:57 -07:00
630f54ae5a
use empty locals and globals in call to eval() in engine tool use
Andrej
2025-11-01 07:22:59 -07:00
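Passing fresh, empty dicts as `globals` and `locals` to `eval()` keeps an expression produced during tool use from reading or overwriting names in the calling module. A minimal sketch with a hypothetical function name; note that Python inserts `__builtins__` into an empty globals dict, so this isolates scope but is not a sandbox:

```python
def eval_isolated(expr: str):
    # Empty globals/locals: caller-scope names are invisible to expr.
    # Builtins such as len() remain reachable via the injected __builtins__.
    return eval(expr, {}, {})
```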
f15732524a
make deepwiki link better
Andrej Karpathy
2025-11-01 14:13:29 +00:00
dfc88334b6
fix tok/sec calculation bug when grad accum steps > 1
Andrej
2025-10-30 08:36:32 -07:00
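With gradient accumulation, one optimizer step processes several micro-batches, so throughput has to count all of them. A sketch of the arithmetic, assuming each of `world_size` ranks runs micro-batches of `device_batch_size` sequences of `seq_len` tokens (parameter names are illustrative; the log does not describe the exact form of the original bug):

```python
def tokens_per_second(device_batch_size: int, seq_len: int, world_size: int,
                      grad_accum_steps: int, step_time_s: float) -> float:
    # Tokens in one full optimizer step: every micro-batch on every rank.
    tokens_per_step = device_batch_size * seq_len * world_size * grad_accum_steps
    return tokens_per_step / step_time_s
```

Doubling `grad_accum_steps` while the step time also doubles leaves throughput unchanged, which is the invariant a correct count should show.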
eb11bb0e2e
remove numpy as dep
Andrej
2025-10-30 08:28:14 -07:00