nanochat-omni

fam/nanochat-omni

Commit Graph

main

mochi/issue-4

pre-fork

#5

eb7bbc1b66 delete the configurator in favor of argparse and clean up a lot of kwarg details to make them more consistent across all scripts Andrej Karpathy 2026-01-04 19:14:23 +00:00
507d54224a fix small bug where this would break if git stage has deleted files Andrej Karpathy 2026-01-04 19:11:43 +00:00
9c60dfb64c bump nanochat to use the latest stable pytorch that is 2.9.1 . Run e.g. to re-update your local environment if you git pull Andrej Karpathy 2026-01-04 18:36:36 +00:00
be56d29b87 simplify redundant if/elif in bloat metrics Andrej Karpathy 2026-01-04 01:40:42 +00:00
ee79f29fbd replace files-to-prompt with git ls-files for bloat metrics Andrej Karpathy 2026-01-04 01:38:15 +00:00
da8b7ea4cb also delete the rustbpe test code, this now lives in rustbpe repo that is separate Andrej Karpathy 2026-01-04 01:23:34 +00:00
aa42f40e66 delete the inline rustbpe project. it was ugly to have a project within project and rustbpe is now nicely a separate repo on my github karpathy/rustbpe and it's on pypi etc., so we just add it as a dependency to uv. i think it is appropriate that this is a separate repo because 1) it doesn't have too many knobs, other than the ones that are exposed - the regex pattern and vocab size and 2) all of its complexity is not algorithmic (it's equivalent to minbpe), instead it is efficiency-related, so it is ok to hide relatively speaking Andrej Karpathy 2026-01-03 23:55:28 +00:00
48abd7d85f simplify, clarify and slightly tune model initialization. should be very slightly better possibly, but certainly a lot clearer Andrej Karpathy 2026-01-01 21:14:26 +00:00
10231dfb40 Fix conversation scroll to bottom on some browsers + remove duplicated padding (#348) Paweł Krefta 2025-12-31 22:03:22 +01:00
389d019a0b small change to doc string at top of tok_train.py (#402) helloaidank 2025-12-31 20:57:26 +00:00
8c89661465 Update README to match current d34 demo (#314) (#381) Hossein-Lakzaei 2025-12-30 12:47:11 +03:30
8f979a8bda fix: sample first token independently for each row in multi-sample generation Andrej Karpathy 2025-12-28 04:52:13 +00:00
2f2d7ab80c fix: safe DDP cleanup (check initialized PG, not just env) (#256) Dipesh Babu 2025-12-27 23:27:40 -05:00
91d76cc690 Replace speedup assertion with warning in batch_encode test Andrej Karpathy 2025-12-28 04:10:49 +00:00
7a8769a40c Merge pull request #383 from barisozmen/master Andrej 2025-12-27 20:06:57 -08:00
088726aa7d clean up model_tag handling across scripts a bit more. Andrej 2025-12-27 20:01:09 -08:00
2874eda59a update to new os env var to get rid of deprecation warning Andrej Karpathy 2025-12-28 03:32:46 +00:00
e1770a3061 remove spurious cast, gets compiled away anyway but it's confusing people Andrej Karpathy 2025-12-27 23:07:48 +00:00
49389ecaa8 fix tf32 warning for deprecated api use Andrej Karpathy 2025-12-27 22:03:06 +00:00
ea4229851b bugfix DU Wenjie 2025-12-26 17:41:57 +08:00
7840049189 bugfix keep same args style in scripts/base_eval.py DU Wenjie 2025-12-26 17:29:08 +08:00
bc51da8bac pad vocab size to 64 for DDP optimizers and efficiency Andrej 2025-12-23 09:13:31 -08:00
92c6654b95 bugfix save and load ckpt from model_tag dir duwenjie 2025-12-21 15:07:04 +08:00
790f3be65c add rust batch encode as a faster option over encode Barış Özmen 2025-12-18 19:17:59 +03:00
d314e96aa2 formatting Matěj Kripner 2025-12-09 12:48:46 +01:00
bbc57da7d5 slightly nicer error message Matěj Kripner 2025-12-09 12:46:48 +01:00
f1bf69d562 feat: pad vocab size to 64 for DDP optimizers and efficiency Matěj Kripner 2025-12-09 12:38:18 +01:00
d5759400f9 fixing two typos in comments Andrej 2025-12-08 20:03:08 -08:00
e72c3299df fix random.seed() footgun bug for SpellingBee data generation Andrej 2025-12-08 19:58:45 -08:00
7931e0903a rename checkpoint_dir to checkpoints_dir for consistency. Andrej 2025-12-08 18:32:12 -08:00
849d95ae1f remove unnecessary check to make the logic in CausalSelfAttention.forward() clearer Andrej 2025-12-08 18:30:37 -08:00
39cccc527f small bugfix make mid_train script work even with a tiny number of iterations Andrej 2025-12-08 18:27:32 -08:00
8b1cecaa95 Apply suggestion from @svlandeg for nicer looking comparison Andrej 2025-12-08 18:27:06 -08:00
58f3e84e01 clean up train/val loader in sft for consistency with mid/base Andrej 2025-12-08 18:23:57 -08:00
1b2a675c88 Improve KV cache code readability Andrej 2025-12-08 18:19:05 -08:00
d75e6ed711 Fix script comment to reference correct file Andrej 2025-12-08 18:16:42 -08:00
72a7cf2bc4 Fix distributed Parquet dataloader resume for multi-epoch training Andrej 2025-12-08 18:15:02 -08:00
bffdb2ef91 group common code to make things neater in gpt logit computation Andrej Karpathy 2025-12-09 02:01:05 +00:00
cbf30c842c apply float32 cast before logits softcapping so the tanh is in fp32. torch compile fuses this correctly with no extra memory costs. Andrej 2025-12-08 14:17:43 -08:00
90442de35f fix bug where any rank has to be able to create checkpoint_dir if saving optim Andrej Karpathy 2025-12-08 20:45:11 +00:00
2fd0440355 fix: missing val_bpb on resume Andrej 2025-12-08 12:35:08 -08:00
01ea71be39 Fix distributed Parquet dataloader resume for multi-epoch training sunyujun03 2025-12-08 00:10:19 -06:00
a8847a0f83 Fix script comment to reference correct file KimYeongHyeon 2025-12-02 10:46:20 +09:00
06677c30e0 Refactor dimension validation for KV cache deepbuilder 2025-11-28 15:22:18 -05:00
a770dcef2e Fix kv_cache indexing to explicitly include head dimension deepbuilder 2025-11-28 15:00:14 -05:00
16788eed3c fix(model): apply float32 cast before logits softcapping spjosyula 2025-11-23 20:12:09 +05:30
53b3a4fb81 fix: missing val_bpb on resume Sanzo00 2025-11-22 11:04:20 +08:00
4bcc3bb698 clarify comment svlandeg 2025-11-21 13:19:45 +01:00
f37d45c21f remove unneeded iter() Eric Silberstein 2025-11-20 15:14:56 -05:00
5c93a56be5 remove unnecessary check Eric Silberstein 2025-11-19 16:31:41 -05:00
dddb95caac make mid_train script work even with a tiny number of iterations Eric Silberstein 2025-11-19 15:52:20 -05:00
a4a0959c73 renamed find_largest_model() argument checkpoint_dir to checkpoints_dir for clarity Eric Silberstein 2025-11-19 15:33:36 -05:00
024781f9df fixing two typos in comments Eric Silberstein 2025-11-19 15:12:53 -05:00
97770700f2 change test/train split approach because random.seed(1) and random.seed(-1) do the same thing Eric Silberstein 2025-11-19 14:51:02 -05:00
4a87a0d19f Merge pull request #299 from samjabrahams/rotary_embedding_head_dim_comment_cleanup Andrej 2025-11-17 13:29:21 -08:00
11e68bf442 Fix comment: rotary embeddings final dimension size Sam Abrahams 2025-11-17 11:32:56 -05:00
bc1fca39f3 mqa -> gqa to reduce confusion Andrej Karpathy 2025-11-15 15:43:37 +00:00
f66a780f68 Fix torch.dtype mismatching when running engine inline test. Andrej 2025-11-14 07:28:29 -08:00
4763ce612a Small fixes to typos Andrej 2025-11-14 07:25:59 -08:00
c6f5bd67db revert change of base to sft for quick inline test Sofie Van Landeghem 2025-11-14 12:20:03 +01:00
a2fb3c83a6 fix typos svlandeg 2025-11-14 11:20:25 +01:00
e5efb4b471 add test_engine.py to file structure svlandeg 2025-11-14 11:13:42 +01:00
9a71d13688 typo oops Andrej Karpathy 2025-11-13 16:08:30 +00:00
7b7fd0fe71 thank you Sophie for your help with nanochat Andrej Karpathy 2025-11-13 16:07:54 +00:00
c6abcdfe3a big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster. Andrej Karpathy 2025-11-13 15:34:40 +00:00
91f09ccd0d minor fix comment in engine Andrej Karpathy 2025-11-13 15:28:18 +00:00
adb5d4a16c uv lock has to change when we removed numpy the other commit Andrej Karpathy 2025-11-13 15:16:27 +00:00
b399e43168 fix engine test bug howardgao@outlook.com 2025-11-06 08:56:45 +08:00
c6b7ab7440 grad clip logging and printing and cosmetics Andrej Karpathy 2025-11-05 21:08:30 +00:00
885a4f25e7 Replace fcntl with filelock for Windows compatibility Andrej 2025-11-04 16:35:39 -08:00
3a2ae631c4 Merge branch 'master' into master Andrej 2025-11-04 16:35:02 -08:00
12d995f58c Add NPROC_PER_NODE var to speedrun.sh and run1000.sh Andrej 2025-11-04 16:26:33 -08:00
f1683c5b16 set nproc_per_node as var in speedrun and run1000 scripts svlandeg 2025-11-04 21:36:10 +01:00
d1558c7873 handle bf16 on MPS by casting to fp32 during load checkpoint Andrej 2025-11-04 09:42:50 -08:00
df25293087 Add explicit UTF-8 encoding on open Andrej 2025-11-04 09:38:18 -08:00
1e89af9862 Replace fcntl with filelock for Windows compatibility Yasser Makram 2025-11-04 07:22:34 +00:00
7a40ee77b4 fix: cast bf16 to fp32 on MPS (like CPU) to avoid dtype issues Dipesh Babu 2025-11-03 16:00:56 -05:00
2ce62ec076 ensure consistency of quotes within each statement svlandeg 2025-11-03 21:52:02 +01:00
e22fc6f2fa few more explicit UTF-8 encodings svlandeg 2025-11-03 21:46:39 +01:00
c72b8b2309 add explicit UTF-8 encoding svlandeg 2025-11-03 21:27:12 +01:00
a83646e098 fix(eval): use UTF-8 when reading CORE JSONL and writing CSV Andrej 2025-11-03 06:38:33 -08:00
8681922328 fix lstrip bug, make it removeprefix, TIL. Andrej 2025-11-03 06:37:48 -08:00
226953b841 fix: open JSONL and results CSV with UTF-8 encoding for portability Dipesh Babu 2025-11-03 01:20:56 -05:00
f1e15f5f4d Fixing subtle bug: lstrip removes all matching characters, including potentially required ones. Use removeprefix instead. Josh Odom 2025-11-02 23:40:37 -06:00
b6da6982f6 fix nanochat logo: the t was placed too far to the right Andrej 2025-11-02 08:17:00 -08:00
c2c4f77e22 oops small bugfix to run1000.sh missing kwarg Andrej 2025-11-02 08:14:41 -08:00
d1ac0b2d07 when loading models on CPU, convert tensors from bfloat16 to float Andrej 2025-11-02 07:58:56 -08:00
5bfcd31b73 revert more formatting changes svlandeg 2025-11-02 14:17:10 +01:00
036a3c5881 revert formatting changes to facilitate review svlandeg 2025-11-02 14:16:43 +01:00
52e85aaf80 Merge branch 'master' into fix/typo svlandeg 2025-11-02 13:41:13 +01:00
ba4f40bf58 Update run1000.sh to add missing --run=$WANDB_RUN Jing Zhang 2025-11-01 21:27:00 -07:00
d54c9cbf8c CPU Support, as bfloat16 params breaks inference Manuel Saelices 2025-11-01 23:38:50 +01:00
cf587acb1a move eval bundle download to be lazy and inside the python code so that we can substantially simplify the run bash scripts Andrej Karpathy 2025-11-01 16:04:38 +00:00
7d2c4a3d95 delete pandas dep in base_eval use csv instead Andrej Karpathy 2025-11-01 15:28:30 +00:00
ad39db5a23 tiny fix to comment Andrej 2025-11-01 07:43:57 -07:00
630f54ae5a use empty locals and globals in call to eval() in engine tool use Andrej 2025-11-01 07:22:59 -07:00
f15732524a make deepwiki link better Andrej Karpathy 2025-11-01 14:13:29 +00:00
dfc88334b6 fix tok/sec calculation bug when grad accum steps > 1 Andrej 2025-10-30 08:36:32 -07:00
eb11bb0e2e remove numpy as dep Andrej 2025-10-30 08:28:14 -07:00
70319851fc fix typo svlandeg 2025-10-29 19:48:34 +01:00
