Commit Graph

171 Commits

Author SHA1 Message Date
Andrej d5759400f9 fixing two typos in comments 2025-12-08 20:03:08 -08:00
Andrej e72c3299df fix random.seed() footgun bug for SpellingBee data generation 2025-12-08 19:58:45 -08:00
Andrej 7931e0903a rename checkpoint_dir to checkpoints_dir for consistency. 2025-12-08 18:32:12 -08:00
Andrej 849d95ae1f remove unnecessary check to make the logic in CausalSelfAttention.forward() clearer 2025-12-08 18:30:37 -08:00
Andrej 39cccc527f small bugfix make mid_train script work even with a tiny number of iterations 2025-12-08 18:27:32 -08:00
Andrej 8b1cecaa95 Apply suggestion from @svlandeg for nicer looking comparison (Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>) 2025-12-08 18:27:06 -08:00
Andrej 58f3e84e01 clean up train/val loader in sft for consistency with mid/base 2025-12-08 18:23:57 -08:00
Andrej 1b2a675c88 Improve KV cache code readability 2025-12-08 18:19:05 -08:00
Andrej d75e6ed711 Fix script comment to reference correct file 2025-12-08 18:16:42 -08:00
Andrej 72a7cf2bc4 Fix distributed Parquet dataloader resume for multi-epoch training 2025-12-08 18:15:02 -08:00
Andrej Karpathy bffdb2ef91 group common code to make things neater in gpt logit computation 2025-12-09 02:01:05 +00:00
Andrej cbf30c842c apply float32 cast before logits softcapping so the tanh is in fp32. torch compile fuses this correctly with no extra memory costs. 2025-12-08 14:17:43 -08:00
Andrej Karpathy 90442de35f fix bug where any rank has to be able to create checkpoint_dir if saving optim 2025-12-08 20:45:19 +00:00
Andrej 2fd0440355 fix: missing val_bpb on resume 2025-12-08 12:35:08 -08:00
sunyujun03 01ea71be39 Fix distributed Parquet dataloader resume for multi-epoch training 2025-12-08 00:10:19 -06:00
KimYeongHyeon a8847a0f83 Fix script comment to reference correct file 2025-12-02 10:46:20 +09:00
deepbuilder 06677c30e0 Refactor dimension validation for KV cache 2025-11-28 15:22:18 -05:00
deepbuilder a770dcef2e Fix kv_cache indexing to explicitly include head dimension 2025-11-28 15:00:14 -05:00
spjosyula 16788eed3c fix(model): apply float32 cast before logits softcapping (This change ensures that the logits softcapping operation (tanh) is performed in float32 precision rather than bfloat16. Previously, the code cast to float32 after the tanh operation, which meant the non-linearity was computed with bfloat16 precision) 2025-11-23 20:12:09 +05:30
Sanzo00 53b3a4fb81 fix: missing val_bpb on resume 2025-11-22 11:04:20 +08:00
svlandeg 4bcc3bb698 clarify comment 2025-11-21 13:19:45 +01:00
Eric Silberstein f37d45c21f remove unneeded iter() 2025-11-20 15:14:56 -05:00
Eric Silberstein 5c93a56be5 remove unnecessary check 2025-11-19 16:31:41 -05:00
Eric Silberstein dddb95caac make mid_train script work even with a tiny number of iterations 2025-11-19 15:52:20 -05:00
Eric Silberstein a4a0959c73 renamed find_largest_model() argument checkpoint_dir to checkpoints_dir for clarity 2025-11-19 15:33:36 -05:00
Eric Silberstein 024781f9df fixing two typos in comments 2025-11-19 15:12:53 -05:00
Eric Silberstein 97770700f2 change test/train split approach because random.seed(1) and random.seed(-1) do the same thing 2025-11-19 14:51:02 -05:00
Andrej 4a87a0d19f Merge pull request #299 from samjabrahams/rotary_embedding_head_dim_comment_cleanup (Fix comment: rotary embeddings final dimension size) 2025-11-17 13:29:21 -08:00
Sam Abrahams 11e68bf442 Fix comment: rotary embeddings final dimension size 2025-11-17 11:32:56 -05:00
Andrej Karpathy bc1fca39f3 mqa -> gqa to reduce confusion 2025-11-15 15:43:37 +00:00
Andrej f66a780f68 Fix torch.dtype mismatching when running engine inline test. 2025-11-14 07:28:29 -08:00
Andrej 4763ce612a Small fixes to typos 2025-11-14 07:25:59 -08:00
Sofie Van Landeghem c6f5bd67db revert change of base to sft for quick inline test 2025-11-14 12:20:03 +01:00
svlandeg a2fb3c83a6 fix typos 2025-11-14 11:20:25 +01:00
svlandeg e5efb4b471 add test_engine.py to file structure 2025-11-14 11:13:42 +01:00
Andrej Karpathy 9a71d13688 typo oops 2025-11-13 16:08:30 +00:00
Andrej Karpathy 7b7fd0fe71 thank you Sophie for your help with nanochat 2025-11-13 16:07:54 +00:00
Andrej Karpathy c6abcdfe3a big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster. 2025-11-13 15:34:40 +00:00
Andrej Karpathy 91f09ccd0d minor fix comment in engine 2025-11-13 15:28:18 +00:00
Andrej Karpathy adb5d4a16c uv lock has to change when we removed numpy the other commit 2025-11-13 15:16:27 +00:00
howardgao@outlook.com b399e43168 fix engine test bug 2025-11-06 08:56:45 +08:00
Andrej Karpathy c6b7ab7440 grad clip logging and printing and cosmetics 2025-11-05 21:08:30 +00:00
Andrej 885a4f25e7 Replace fcntl with filelock for Windows compatibility 2025-11-04 16:35:39 -08:00
Andrej 3a2ae631c4 Merge branch 'master' into master 2025-11-04 16:35:02 -08:00
Andrej 12d995f58c Add NPROC_PER_NODE var to speedrun.sh and run1000.sh 2025-11-04 16:26:33 -08:00
svlandeg f1683c5b16 set nproc_per_node as var in speedrun and run1000 scripts 2025-11-04 21:36:10 +01:00
Andrej d1558c7873 handle bf16 on MPS by casting to fp32 during load checkpoint 2025-11-04 09:42:50 -08:00
Andrej df25293087 Add explicit UTF-8 encoding on open 2025-11-04 09:38:18 -08:00
Yasser Makram 1e89af9862 Replace fcntl with filelock for Windows compatibility 2025-11-04 07:22:34 +00:00
Dipesh Babu 7a40ee77b4 fix: cast bf16 to fp32 on MPS (like CPU) to avoid dtype issues 2025-11-03 16:00:56 -05:00
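
Note on commits 97770700f2 and e72c3299df: the log states that random.seed(1) and random.seed(-1) do the same thing, which makes a sign-flipped seed a poor way to separate train and test data. The sketch below is not the code from those commits. It only illustrates the idea under stated assumptions: the helper names (first_draws, seeds_collide, split_train_test) and the single-shuffle-then-slice split are hypothetical. It reports whether the two seeds collide on the interpreter it runs on rather than assuming the answer.

```python
import random


def first_draws(seed, n=5):
    """Return the first n floats from a private generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


def seeds_collide(a, b, n=5):
    """True if seeds a and b produce the same first n draws."""
    return first_draws(a, n) == first_draws(b, n)


def split_train_test(items, test_fraction=0.2, seed=42):
    """Hypothetical split: one seeded shuffle, then a slice.

    Train and test come from the same permutation, so they cannot overlap,
    and nothing depends on two seeds being different from each other.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Check the behavior described in the commit message on this interpreter.
    print("seed 1 and seed -1 collide:", seeds_collide(1, -1))
    train, test = split_train_test(range(10))
    print("train:", train)
    print("test:", test)
```

Using a private random.Random instance instead of the module-level random.seed() also keeps the split from disturbing, or being disturbed by, other code that uses the global generator.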