nanochat-omni

Author	SHA1	Message	Date
Andrej Karpathy	c6abcdfe3a	big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster.	2025-11-13 15:34:40 +00:00
Andrej	d1558c7873	handle bf16 on MPS by casting to fp32 during load checkpoint	2025-11-04 09:42:50 -08:00
Dipesh Babu	7a40ee77b4	fix: cast bf16 to fp32 on MPS (like CPU) to avoid dtype issues	2025-11-03 16:00:56 -05:00
svlandeg	2ce62ec076	ensure consistency of quotes within each statement	2025-11-03 21:52:02 +01:00
svlandeg	c72b8b2309	add explicit UTF-8 encoding	2025-11-03 21:27:12 +01:00
Josh Odom	f1e15f5f4d	Fixing subtle bug: lstrip removes all matching characters, including potentially required ones. Use removeprefix instead.	2025-11-02 23:40:37 -06:00
svlandeg	5bfcd31b73	revert more formatting changes	2025-11-02 14:17:10 +01:00
svlandeg	036a3c5881	revert formatting changes to facilitate review	2025-11-02 14:16:43 +01:00
Manuel Saelices	d54c9cbf8c	CPU Support, as bfloat16 params breaks inference	2025-11-01 23:38:50 +01:00
Mirza-Samad-Ahmed-Baig	afaa5b4c90	Fix: Handle missing d<number> model tags in find_largest_model	2025-10-14 00:24:07 +03:00
karpathy	3a5e0bc50b	initial commit	2025-10-13 06:49:24 -07:00

11 Commits
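
Note on commit f1e15f5f4d: the message says that lstrip removes all matching characters and that removeprefix should be used instead. A minimal sketch of that difference follows. The key and prefix strings are hypothetical, chosen only for illustration, and are not taken from the repository. str.removeprefix requires Python 3.9 or later.

```python
def strip_prefix_buggy(name: str, prefix: str) -> str:
    # str.lstrip treats its argument as a SET of characters, not a prefix:
    # it keeps removing leading characters while they appear in that set.
    return name.lstrip(prefix)


def strip_prefix_fixed(name: str, prefix: str) -> str:
    # str.removeprefix removes the exact prefix once, if present,
    # and otherwise returns the string unchanged.
    return name.removeprefix(prefix)


# Hypothetical example values (assumptions, not from the repository).
key = "_orig_mod.model.weight"
prefix = "_orig_mod."

print(strip_prefix_buggy(key, prefix))  # el.weight  ("m", "o", "d" were also eaten)
print(strip_prefix_fixed(key, prefix))  # model.weight
```

The buggy variant only misbehaves when the text after the prefix happens to start with characters that also occur in the prefix, which is why the bug is easy to miss.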