nanochat-omni/scripts at 348fbb301b8b709ad5d59bdf69e99a51982f594a

fam/nanochat-omni

Andrej Karpathy 348fbb301b fix dataloader for midtrain to never crop data. we can't just throw it away like we do in pretraining

2026-01-31 18:21:36 +00:00

base_eval.py

bugfix

2025-12-26 19:02:12 +08:00

base_loss.py

update the CPU/MPS script to give reasonable results. The model can at least answer that Paris is the capital of France and knows that the sky is blue, for about 40 minutes of training on my macbook. Also fixed a bug that existed due to KVCache bfloat16 dtype assumption

2026-01-17 12:27:30 -08:00

base_train.py

warmdown of 0.5 is slightly better:

2026-01-31 01:08:44 +00:00

chat_cli.py

upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming

2025-10-20 10:15:17 -07:00

chat_eval.py

Fix args in readme (#438)

2026-01-15 16:26:38 -08:00

chat_rl.py

Combine AdamW and Muon into single MuonAdamW optimizer, cleaner, ty @chrisjmccormick for idea/help

2026-01-29 00:52:08 +00:00

chat_sft.py

Combine AdamW and Muon into single MuonAdamW optimizer, cleaner, ty @chrisjmccormick for idea/help

2026-01-29 00:52:08 +00:00

chat_web.py

feat: allow top_k=0 in web api to disable filtering (#458)

2026-01-30 09:21:41 -08:00

mid_train.py

fix dataloader for midtrain to never crop data. we can't just throw it away like we do in pretraining

2026-01-31 18:21:36 +00:00

tok_eval.py

initial commit

2025-10-13 06:49:24 -07:00

tok_train.py

quick fix to not OOM main speedrun script

2026-01-26 22:31:42 +00:00