Commit Graph

264 Commits

Author SHA1 Message Date
Andrej Karpathy 190d9515d0 don't evaluate the sampling evals during SFT, they are too slow. keep the multiple choice evals. delete unused imports 2025-10-15 16:42:23 +00:00
Andrej Karpathy b8076dd367 fix bug in learning rate multiplier, it was ramping up instead of ramping down. see more in Issue #68. also add --dry_run option, useful for experimentation 2025-10-15 16:35:04 +00:00
Andrej 67aaca98f5 export NANOCHAT_BASE_DIR so child processes get it too
Export the cache directory so that users can use their own cache location
2025-10-14 16:01:28 -07:00
obxium 938cb31f1a Update logo 2025-10-14 14:19:44 -04:00
Zach Mueller f0855cbcc7 Update speedrun.sh 2025-10-14 14:12:01 -04:00
Bhaskar 02440f670d fix: return inf instead of crashing when evaluate_bpb has zero total_bytes
Edge case: all tokens are special tokens or ignored

Return infinity to indicate no meaningful bytes were processed
2025-10-14 17:21:11 +05:30
Andrej dd6ff9a1cc fix bug in fallback case of find_largest_model
Fix: Handle missing d<number> model tags in find_largest_model
ty
2025-10-13 14:38:34 -07:00
Mirza-Samad-Ahmed-Baig afaa5b4c90 Fix: Handle missing d<number> model tags in find_largest_model 2025-10-14 00:24:07 +03:00
Andrej 5fd0b13886 Merge pull request #2 from epoyraz/patch-1
Update README.md
2025-10-13 10:10:15 -07:00
Enes Poyraz 6a795baf27 Update README.md
fix typos
2025-10-13 18:40:12 +02:00
Andrej 626bd3e260 Add image of the WebUI to readme 2025-10-13 08:03:00 -07:00
karpathy da96b46565 update link to the new discussion 2025-10-13 07:42:09 -07:00
karpathy a53833d04f add nanochat logo png 2025-10-13 06:59:59 -07:00
karpathy 3a5e0bc50b initial commit 2025-10-13 06:49:24 -07:00
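
Commit 02440f670d above describes an edge case in evaluate_bpb: when every token is special or ignored, the byte count is zero and the metric cannot be computed. The following is a minimal sketch of that kind of guard, not the repository's code. It assumes bits per byte is the summed loss in nats divided by ln(2) times the number of bytes; the function name `bits_per_byte` and both parameters are hypothetical.

```python
import math


def bits_per_byte(total_nats: float, total_bytes: int) -> float:
    """Hypothetical helper illustrating the zero-bytes guard.

    total_nats:  summed negative log-likelihood over counted tokens, in nats.
    total_bytes: number of bytes those tokens represent (assumed >= 0).
    """
    if total_bytes == 0:
        # Nothing countable was processed (e.g. all tokens were special
        # or ignored), so report infinity instead of dividing by zero.
        return float("inf")
    # Convert nats to bits (divide by ln 2), then normalize per byte.
    return total_nats / (math.log(2) * total_bytes)
```

Returning infinity lets a caller log or compare the result without hitting a ZeroDivisionError, while still signaling that no meaningful bytes were processed.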