Commit Graph

302 Commits

Author SHA1 Message Date
Andrej Karpathy fe55b092b8 minor cosmetics for the table 2026-02-03 21:05:28 +00:00
Andrej Karpathy a67eba35dc add feb2 new leaderboard record from upgrading to fp8 training, +4.3% speedup to time to GPT-2 2026-02-03 21:03:42 +00:00
Andrej Karpathy 6079f78fc3 add fp8 training with torchao 2026-02-03 21:03:42 +00:00
Andrej Karpathy 8ebc14b348 small touchups to the eval script, re-order items etc, cosmetic 2026-02-03 21:03:42 +00:00
Sofie Van Landeghem 72b9064f9d remove leftover mid references (#491) 2026-02-02 08:33:46 -08:00
Andrej Karpathy b19b4f3e49 fix bug in speedrun script, batch size that doesn't OOM on 8XH100 for d24 is 16 2026-02-02 15:50:14 +00:00
Andrej Karpathy 230d6cf6c6 tune the synthetic data generation script. delete the king andrej stuff lol. also, upgrade to gemini 3 2026-02-02 01:45:59 +00:00
Andrej Karpathy 07c4dd4cd9 manually control the over-active garbage collector, save a small few minutes from a typical run 2026-02-02 01:44:30 +00:00
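For context on the commit above, manual control of Python's garbage collector in a long-running loop is commonly done along the following lines. This is only a sketch of the general pattern using the standard `gc` module, not the code from this commit; `train_loop`, `step_fn`, and `collect_every` are hypothetical names chosen for illustration.

```python
import gc


def train_loop(num_steps, step_fn, collect_every=1000):
    """Run step_fn(step) num_steps times with automatic GC turned off.

    Hypothetical helper: the names and the collect_every default are
    assumptions for illustration, not taken from the repository.
    """
    gc.collect()   # clean up whatever setup code left behind
    gc.freeze()    # move surviving objects out of the tracked generations
    gc.disable()   # stop automatic generational collections
    try:
        for step in range(num_steps):
            step_fn(step)
            # Collect explicitly at a point we choose, instead of letting
            # the collector trigger at arbitrary allocation counts.
            if (step + 1) % collect_every == 0:
                gc.collect()
    finally:
        # Restore default behaviour for the rest of the program.
        gc.enable()
        gc.unfreeze()
```

Reference counting still frees most objects immediately; only the cyclic collector is deferred, so the interval mainly trades pause frequency against memory held in reference cycles.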
Andrej Karpathy e8fec97d4c slightly more efficient dataloader that reduces the number of python objects flying around and causing strain on runtime and garbage collector 2026-02-02 01:17:30 +00:00
Andrej Karpathy 8b4849d548 fix bug in chat_sft, the attention window must be preserved sigh 2026-02-01 20:58:44 +00:00
Andrej Karpathy eaf49a33c8 fix path which i think was modified during the refactor and this is a bug introduced by claude i believe 2026-02-01 20:15:19 +00:00
Andrej Karpathy 31b61d2d17 fix broken import sigh 2026-02-01 05:03:44 +00:00
Sofie Van Landeghem 4d6415b8ef use _PEAK_FLOPS_TABLE instead of if-else structure (#479) 2026-01-31 19:45:06 -08:00
Sofie Van Landeghem 43078c347e clean up original tokenizing_distributed_data_loader (#478) 2026-01-31 19:44:12 -08:00
Franci Penov dc291c627f Add Blackwell (SM100) GPU support via SDPA fallback (#475) 2026-01-31 19:42:58 -08:00
Andrej Karpathy 0307997f9b merge two files base_loss and base_eval into a single file, it's nicer this way, and unify the huggingface code associated with both 2026-02-01 02:36:43 +00:00
Andrej Karpathy 1ddaad1c1c nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtraining is not yet fully properly erased across the board, but good enough for step 1 2026-01-31 19:12:25 +00:00
Andrej Karpathy 348fbb301b fix dataloader for midtrain to never crop data. we can't just throw it away like we do in pretraining 2026-01-31 18:21:36 +00:00
Andrej Karpathy 3c3a3d7042 warmdown of 0.5 is slightly better: 2026-01-31 01:08:44 +00:00
Andrei Panferov 4d8dbaf6e0 Fix escape character in README bibtex entry (#454) 2026-01-30 09:34:02 -08:00
Andrej Karpathy 3ba42e8135 Fix SDPA KV-cache decode to respect sliding window (#456)
SDPA fallback now respects sliding window during single-token KV-cache
decode by slicing K/V to the last (window + 1) tokens.

Also simplifies the mask building for chunk inference to properly apply
sliding window in that path as well.

Fixes #452

Co-Authored-By: Kartik Vashishta <kartikv776@gmail.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 17:32:12 +00:00
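The slicing described in the commit above can be illustrated with plain Python lists standing in for the cached K/V tensors. This is a minimal sketch, not the repository's implementation: `slice_kv_for_window` is a hypothetical name, and it assumes the cache is ordered oldest first and already contains the current token's key and value.

```python
def slice_kv_for_window(keys, values, window):
    """Keep only the cache entries visible to the current token.

    keys, values: per-token entries, oldest first, current token last
                  (assumed layout for this sketch).
    window:       number of previous tokens the current token may attend to;
                  None or a negative value is treated here as "no window".
    """
    if window is None or window < 0:
        return keys, values
    keep = window + 1  # the window of past tokens plus the current token
    # Negative slicing returns the whole list when keep exceeds its length.
    return keys[-keep:], values[-keep:]
```

For example, with five cached tokens and `window=2`, the sketch keeps the last three entries, so attention for the new token sees itself and its two predecessors.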
Aarushi Singh ace6740bdd feat: allow top_k=0 in web api to disable filtering (#458)
* allow top_k=0 in web api to disable filtering

* adding a comment for clear reasoning

* adding change to docstring
2026-01-30 09:21:41 -08:00
Harsh Gupta 2e17723817 Fix generate() crash when top_k=0 (#467)
Prevent a crash in generate() by skipping top-k filtering when top_k is set to 0
2026-01-30 09:21:02 -08:00
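The behaviour described in the two commits above, treating `top_k=0` as "no filtering", might look roughly like this in isolation. The sketch uses plain Python lists rather than tensors; `apply_top_k` is a hypothetical helper, not the repository's `generate()` code, and treating `None` the same as 0 is an assumption.

```python
import math


def apply_top_k(logits, top_k):
    """Return a copy of logits with everything below the k-th largest masked.

    top_k of 0 (or None, by assumption) disables filtering entirely,
    which avoids asking for the "0-th largest" value.
    """
    if not top_k:
        return list(logits)
    k = min(top_k, len(logits))  # clamp so k never exceeds the vocabulary size
    threshold = sorted(logits, reverse=True)[k - 1]  # k-th largest logit
    # Values tied with the threshold are kept, so more than k may survive.
    return [x if x >= threshold else -math.inf for x in logits]
```

Masked entries are set to negative infinity so that a later softmax assigns them zero probability.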
Andrej Karpathy 02baa15405 i am feeling in a delete mood today. i need to delete a lot of code. there is too much code and surface area and complexity. ew 2026-01-30 17:08:53 +00:00
Andrej Karpathy d6c4f3b923 i think this is the new torch 2.9+ API for declaring tf32 preference 2026-01-30 17:03:15 +00:00
Andrej Karpathy 067daa7758 small fix cpu script ty PR #474 2026-01-30 02:11:25 +00:00
Andrej Karpathy 6a341f2ecf contiguous views and single HtoD transfer for inputs/targets much cleaner 2026-01-30 00:23:01 +00:00
Andrej Karpathy ebd4d9bbf5 tried muonh, appealing but didn't work out of the box 2026-01-29 19:01:36 +00:00
Andrej Karpathy 41bb2eac32 Combine AdamW and Muon into single MuonAdamW optimizer, cleaner, ty @chrisjmccormick for idea/help 2026-01-29 00:52:08 +00:00
Andrej Karpathy 64a651a63c include .claude is ok 2026-01-29 00:35:02 +00:00
Andrej Karpathy 65df0de42b add arxiv reading skill 2026-01-29 00:34:24 +00:00
Andrej Karpathy 74554be3b5 revert engram, not seeing an improvement at larger scale 2026-01-28 20:07:39 +00:00
Sofie Van Landeghem d5418ea5a1 Fix link to DeepSeek Engram paper (#470)
* Fix link to DeepSeek Engram paper in LOG.md

Updated link to the DeepSeek Engram paper in the log.

* remove www
2026-01-28 08:31:44 -08:00
Andrej Karpathy c88bbf8133 Merge branch 'engram' 2026-01-27 22:33:16 +00:00
Andrej Karpathy c8d93beed2 add engram-lite, add log, tune scaling laws analysis scripts 2026-01-27 22:31:17 +00:00
Andrej Karpathy 8630d32be4 quick fix to not OOM main speedrun script 2026-01-26 22:31:42 +00:00
Andrej Karpathy 59e36cc727 first version of engram following modded nanogpt style 2026-01-25 18:59:51 +00:00
Andrej Karpathy 85b3e95e09 320 experiments just to tune the adam beta1 of x0 a little bit up from 0.8 to 0.96 2026-01-25 00:04:02 +00:00
xiayan0118 6a477eedbd fix: pass device_type to compute_init in engine.__main__ (#451)
When running engine.py directly on non-GPU devices (CPU, MPS),
compute_init() needs the device_type parameter to initialize correctly.
This fixes failures on machines without CUDA support.
2026-01-19 17:19:51 -08:00
Andrej Karpathy 63bb5831e2 something i've wanted to do for a while - move all .sh runs to their own directory so they don't pollute root dir 2026-01-18 15:27:41 +00:00
Andrej Karpathy a91743c168 Merge branch 've' 2026-01-18 15:14:39 +00:00
Andrej Karpathy d58fcd9d73 log for jan 17 2026-01-18 03:01:17 +00:00
Andrej Karpathy babde18ce1 small tweaks 2026-01-18 03:00:38 +00:00
Andrej Karpathy cf5c9e5b8e resolve a crash for odd depths because FA3 needs head_dim % 8 == 0 2026-01-18 00:07:08 +00:00
Andrej Karpathy 413e91aa0f optimal ratio is now around 4 2026-01-17 23:51:09 +00:00
Andrej Karpathy e7ed2082b8 update the default GPTConfig kwargs otherwise they are confusing 2026-01-17 21:16:46 +00:00
karpathy f9a7e0f111 update the CPU/MPS script to give reasonable results. The model can at least answer that Paris is the capital of France and knows that the sky is blue, for about 40 minutes of training on my macbook. Also fixed a bug that existed due to KVCache bfloat16 dtype assumption 2026-01-17 12:27:30 -08:00
Andrej Karpathy f5425245f9 more GPU types from PR 147 thanks @Qubitium 2026-01-17 03:22:20 +00:00
Andrej Karpathy 2955650327 add detection of device to report more correct mfu for bf16 2026-01-17 03:16:14 +00:00
Yury Kirpichev 77a46902e4 Fix WANDB_RUN parameter passing in runcpu.sh (#407)
- Add --run=$WANDB_RUN to base_train, mid_train, and chat_sft calls
- Ensures wandb logging works when WANDB_RUN environment variable is set
- Matches the behavior in speedrun.sh

Co-authored-by: svlandeg <svlandeg@github.com>
2026-01-16 18:59:44 -08:00