nanochat-omni

Author	SHA1	Message	Date
Andrej Karpathy	8892470f29	add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think	2025-10-24 14:02:48 +00:00
Andrej Karpathy	cc3636b01c	allow the tokenizer visualize_tokenization to also print the exact token id. you can never be paranoid enough	2025-10-24 13:27:05 +00:00
Luke Stanley	32571664b1	Fix Torch crash caused by pinning on CPU	2025-10-22 16:25:36 +00:00
ulanch	796f84527f	fix(ui): prevent iOS Safari toolbar from covering input on initial load	2025-10-21 17:34:40 -07:00
Andrej	2e938530ce	delete spurious torch.empty allocation in adamw fix: remove unnecessary tensor allocation in DistAdamW optimizer	2025-10-21 11:35:17 -07:00
Andrej Karpathy	a088b7a6ec	use enable_gqa of pytorch sdpa, allows us to delete some code, didn't realize it's available	2025-10-21 18:07:33 +00:00
Andrej Karpathy	5bdc99abfb	merge and resolve conflict	2025-10-21 17:19:10 +00:00
Andrej Karpathy	dfcb1c16f1	Merge branch 'master' into cpu-mps-dev	2025-10-21 17:15:53 +00:00
Andrej Karpathy	bb71c64579	fix silly issue in dataloader, this version is much faster and more portable to mps too	2025-10-21 17:12:50 +00:00
karpathy	2e9669e03a	upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming	2025-10-20 10:15:17 -07:00
Sermet Pekin	49cd02f283	fix: remove unnecessary tensor allocation in DistAdamW optimizer	2025-10-20 12:03:26 +03:00
Andrej	cf2baf9933	fix typo Co-authored-by: Tancrède Lepoint <tlepoint@users.noreply.github.com>	2025-10-17 08:35:41 -07:00
karpathy	df600b6ed5	many small tweaks. base, eval, core work now i think	2025-10-16 15:46:18 -07:00
karpathy	786119d593	add autodetect of device and related stuff. getting weird warnings/errors still, so wip	2025-10-16 10:26:19 -07:00
karpathy	306bc380ab	add support for CPU and for MPS. I had to change a few cosmetic things. I also discovered I think a bit of a bug, where I was casting wte to bfloat16 in the wrong place (the model init) instead of in init_weights	2025-10-16 10:04:43 -07:00
Andrej Karpathy	722da4f543	trying to add basic cpu support, will try mps too	2025-10-16 16:14:38 +00:00
Andrej Karpathy	4346536ab2	also allow regenerating assistant message by clicking it, and make sure to feed good seed to generate	2025-10-16 01:28:37 +00:00
Andrej Karpathy	2846999b8f	allow user to click on their message to edit them. conversation after that point is wiped	2025-10-16 01:16:22 +00:00
Andrej Karpathy	92d52ecc92	add slash commands to webui	2025-10-16 01:09:53 +00:00
Andrej Karpathy	01fb290f53	allow multiple GPUs to do inference in a data parallel way	2025-10-15 19:12:19 +00:00
Mirza-Samad-Ahmed-Baig	afaa5b4c90	Fix: Handle missing d<number> model tags in find_largest_model	2025-10-14 00:24:07 +03:00
karpathy	3a5e0bc50b	initial commit	2025-10-13 06:49:24 -07:00

22 Commits