Commit Graph

15 Commits

Author SHA1 Message Date
Andrej 088726aa7d clean up model_tag handling across scripts a bit more. 2025-12-27 20:01:09 -08:00
Andrej Karpathy 2874eda59a update to new os env var to get rid of deprecation warning 2025-12-28 03:32:46 +00:00
duwenjie 92c6654b95 bugfix save and load ckpt from model_tag dir 2025-12-21 15:07:04 +08:00
Andrej 8b1cecaa95 Apply suggestion from @svlandeg for nicer looking comparison
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2025-12-08 18:27:06 -08:00
svlandeg 4bcc3bb698 clarify comment 2025-11-21 13:19:45 +01:00
Eric Silberstein dddb95caac make mid_train script work even with a tiny number of iterations 2025-11-19 15:52:20 -05:00
water-vapor a9de4b1038 Fix tok/sec metrics for base_train and mid_train when gradient accumulation is not 1 2025-10-26 01:43:49 -05:00
Andrej Karpathy 8892470f29 add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think 2025-10-24 14:02:48 +00:00
Luke Stanley defd1246aa Fix Torch crash caused by pinning on CPU 2025-10-21 20:28:10 +00:00
Andrej Karpathy 5bdc99abfb merge and resolve conflict 2025-10-21 17:19:10 +00:00
Andrej Karpathy fe5aed940b add personality to nanochat. breaks previous code on git pull and requires download of a new file from s3, but there is a helpful error message so hopefully its ok 2025-10-21 15:04:58 +00:00
karpathy 2e9669e03a upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes ,e.g. changing max_iterations to num_iterations in sft script for consistency in naming 2025-10-20 10:15:17 -07:00
karpathy ae02650afe update the midtraining script too 2025-10-16 16:33:17 -07:00
Andrej Karpathy b8076dd367 fix bug in learning rate multiplier, it was ramping up instead of ramping down. see more in Issue #68. also add --dry_run option useful for experimentation 2025-10-15 16:35:04 +00:00
karpathy 3a5e0bc50b initial commit 2025-10-13 06:49:24 -07:00