nanochat-omni

Author	SHA1	Message	Date
Andrej	cbf30c842c	apply float32 cast before logits softcapping so the tanh is in fp32. torch compile fuses this correctly with no extra memory costs.	2025-12-08 14:17:43 -08:00
Andrej Karpathy	90442de35f	fix bug where any rank has to be able to create checkpoint_dir if saving optim	2025-12-08 20:45:19 +00:00
spjosyula	16788eed3c	fix(model): apply float32 cast before logits softcapping. This change ensures that the logits softcapping operation (tanh) is performed in float32 precision rather than bfloat16. Previously, the code cast to float32 after the tanh operation, which meant the non-linearity was computed with bfloat16 precision.	2025-11-23 20:12:09 +05:30
Sam Abrahams	11e68bf442	Fix comment: rotary embeddings final dimension size	2025-11-17 11:32:56 -05:00
Andrej Karpathy	bc1fca39f3	mqa -> gqa to reduce confusion	2025-11-15 15:43:37 +00:00
Andrej	f66a780f68	Fix torch.dtype mismatching when running engine inline test.	2025-11-14 07:28:29 -08:00
Andrej	4763ce612a	Small fixes to typos	2025-11-14 07:25:59 -08:00
Sofie Van Landeghem	c6f5bd67db	revert change of base to sft for quick inline test	2025-11-14 12:20:03 +01:00
svlandeg	a2fb3c83a6	fix typos	2025-11-14 11:20:25 +01:00
Andrej Karpathy	c6abcdfe3a	big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster.	2025-11-13 15:34:40 +00:00
Andrej Karpathy	91f09ccd0d	minor fix comment in engine	2025-11-13 15:28:18 +00:00
howardgao@outlook.com	b399e43168	fix engine test bug	2025-11-06 08:56:45 +08:00
Andrej	3a2ae631c4	Merge branch 'master' into master	2025-11-04 16:35:02 -08:00
Andrej	d1558c7873	handle bf16 on MPS by casting to fp32 during load checkpoint	2025-11-04 09:42:50 -08:00
Yasser Makram	1e89af9862	Replace fcntl with filelock for Windows compatibility	2025-11-04 07:22:34 +00:00
Dipesh Babu	7a40ee77b4	fix: cast bf16 to fp32 on MPS (like CPU) to avoid dtype issues	2025-11-03 16:00:56 -05:00
svlandeg	2ce62ec076	ensure consistency of quotes within each statement	2025-11-03 21:52:02 +01:00
svlandeg	c72b8b2309	add explicit UTF-8 encoding	2025-11-03 21:27:12 +01:00
Josh Odom	f1e15f5f4d	Fixing subtle bug: lstrip removes all matching characters, including potentially required ones. Use removeprefix instead.	2025-11-02 23:40:37 -06:00
Andrej	b6da6982f6	fix nanochat logo: the t was placed too far to the right	2025-11-02 08:17:00 -08:00
Andrej	d1ac0b2d07	when loading models on CPU, convert tensors from bfloat16 to float	2025-11-02 07:58:56 -08:00
svlandeg	5bfcd31b73	revert more formatting changes	2025-11-02 14:17:10 +01:00
svlandeg	036a3c5881	revert formatting changes to facilitate review	2025-11-02 14:16:43 +01:00
Manuel Saelices	d54c9cbf8c	CPU Support, as bfloat16 params breaks inference	2025-11-01 23:38:50 +01:00
Andrej Karpathy	cf587acb1a	move eval bundle download to be lazy and inside the python code so that we can substantially simplify the run bash scripts	2025-11-01 16:04:38 +00:00
Andrej	ad39db5a23	tiny fix to comment Update engine.py with correct error message on assert	2025-11-01 07:43:57 -07:00
Andrej	630f54ae5a	use empty locals and globals in call to eval() in engine tool use harden eval: prevent the calc tool from accessing globals and locals	2025-11-01 07:22:59 -07:00
Andrej	1ccbaf4416	nit delete redundant catch/raise in execute Remove redundant exception handling in chdir	2025-10-29 08:10:03 -07:00
Andrej	29ff38d94b	Merge pull request #35 from bhaskar0210s/master fix: return inf instead of crashing when evaluate_bpb has zero total_bytes	2025-10-29 08:06:24 -07:00
svlandeg	b996131570	Merge branch 'master' into logo/kerning-update	2025-10-29 11:45:40 +01:00
svlandeg	3fa974f93c	few more reverts	2025-10-29 11:45:02 +01:00
svlandeg	cbd560a83d	revert formatting changes to minimize diff and merge conflicts	2025-10-29 11:42:56 +01:00
Haowei Zhang	2b9c085559	update the kv_shape	2025-10-27 02:47:13 -07:00
Haowei Zhang	b062b422ac	Fix kv cache, given that resize destroys the logical structure	2025-10-27 02:23:08 -07:00
Marius Wachtler	fca2b8cd07	harden eval: prevent the calc tool from accessing globals and locals By passing empty globals() and locals() to eval() we can prevent simple malicious cases where the user gets the model to output something like ```<global variable/func> or "a".count("a")``` e.g. ```signal.raise_signal(9) or "a".count("a")``` which would kill the process. or one could maybe get it to output secrets etc. I think to make it 100% secure one would need to parse the AST and only execute secure nodes but this should make it much more robust.	2025-10-24 14:41:12 -05:00
Andrej Karpathy	8892470f29	add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think	2025-10-24 14:02:48 +00:00
Andrej Karpathy	cc3636b01c	allow the tokenizer visualize_tokenization to also print the exact token id. you can never be paranoid enough	2025-10-24 13:27:05 +00:00
Luke Stanley	32571664b1	Fix Torch crash caused by pinning on CPU	2025-10-22 16:25:36 +00:00
ulanch	796f84527f	fix(ui): prevent iOS Safari toolbar from covering input on initial load	2025-10-21 17:34:40 -07:00
Andrej	2e938530ce	delete spurious torch.empty allocation in adamw fix: remove unnecessary tensor allocation in DistAdamW optimizer	2025-10-21 11:35:17 -07:00
Andrej Karpathy	a088b7a6ec	use enable_gqa of pytorch sdpa, allows us to delete some code, didnt realize it's available	2025-10-21 18:07:33 +00:00
Andrej Karpathy	5bdc99abfb	merge and resolve conflict	2025-10-21 17:19:10 +00:00
Andrej Karpathy	dfcb1c16f1	Merge branch 'master' into cpu-mps-dev	2025-10-21 17:15:53 +00:00
Andrej Karpathy	bb71c64579	fix silly issue in dataloader, this version is much faster and more portable to mps too	2025-10-21 17:12:50 +00:00
karpathy	2e9669e03a	upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming	2025-10-20 10:15:17 -07:00
Sermet Pekin	49cd02f283	fix: remove unnecessary tensor allocation in DistAdamW optimizer	2025-10-20 12:03:26 +03:00
obxium	2b58e2dd2a	Update logo in code as well	2025-10-18 09:31:11 -04:00
Andrej	cf2baf9933	fix typo Co-authored-by: Tancrède Lepoint <tlepoint@users.noreply.github.com>	2025-10-17 08:35:41 -07:00
Phúc H. Lê Khắc	ed519b0f24	Update engine.py with correct error message on assert	2025-10-17 17:21:25 +07:00
karpathy	df600b6ed5	many small tweaks. base, eval, core work now i think	2025-10-16 15:46:18 -07:00
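
Commit f1e15f5f4d replaces lstrip with removeprefix. The difference is standard Python behavior: str.lstrip treats its argument as a set of characters and keeps stripping while the leading character is in that set, whereas str.removeprefix (Python 3.9+) removes the exact prefix once. A minimal sketch follows; the prefix and key strings are hypothetical and are not the ones touched by the commit.

```python
# Hypothetical strings for illustration only; not taken from the repository.
prefix = "model."
key = "model.dense.weight"

# lstrip strips any leading characters found in the set {m, o, d, e, l, .},
# so it also eats the "de" at the start of "dense".
stripped_chars = key.lstrip(prefix)

# removeprefix removes the literal prefix once and leaves the rest intact.
stripped_prefix = key.removeprefix(prefix)

print(stripped_chars)   # nse.weight
print(stripped_prefix)  # dense.weight
```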

61 Commits
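
Commits fca2b8cd07 and 630f54ae5a describe passing empty globals and locals to eval() in the calc tool. The sketch below shows the effect of that pattern in isolation, assuming a hypothetical helper named calc; it is not the engine's actual code. Module-level names such as an imported signal module are no longer visible to the expression, while plain expressions still evaluate.

```python
import signal  # module-level name that the evaluated expression should not reach


def calc(expr):
    # Hypothetical helper: evaluate with empty globals and locals so the
    # expression cannot see this module's names.
    return eval(expr, {}, {})


# An ordinary expression still works.
count = calc('"strawberry".count("r")')
print(count)  # 3

# A reference to a module-level name now fails with NameError instead of running.
try:
    calc('signal.raise_signal(9) or "a".count("a")')
    blocked = False
except NameError:
    blocked = True
print(blocked)  # True
```

Python still adds __builtins__ to an empty globals dict, so built-in functions remain reachable from the expression. This matches the caveat in the commit message that the change makes the tool more robust but not fully secure.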