Commit Graph

121 Commits

Author SHA1 Message Date
Andrej Karpathy a088b7a6ec use enable_gqa of pytorch sdpa, allows us to delete some code, didnt realize it's available 2025-10-21 18:07:33 +00:00
Andrej Karpathy 5bdc99abfb merge and resolve conflict 2025-10-21 17:19:10 +00:00
Andrej Karpathy dfcb1c16f1 Merge branch 'master' into cpu-mps-dev 2025-10-21 17:15:53 +00:00
Andrej Karpathy bb71c64579 fix silly issue in dataloader, this version is much faster and more portable to mps too 2025-10-21 17:12:50 +00:00
karpathy 2e9669e03a upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes ,e.g. changing max_iterations to num_iterations in sft script for consistency in naming 2025-10-20 10:15:17 -07:00
Sermet Pekin 49cd02f283 fix: remove unnecessary tensor allocation in DistAdamW optimizer 2025-10-20 12:03:26 +03:00
obxium 2b58e2dd2a Update logo in code as well 2025-10-18 09:31:11 -04:00
Andrej cf2baf9933 fix typo
Co-authored-by: Tancrède Lepoint <tlepoint@users.noreply.github.com>
2025-10-17 08:35:41 -07:00
Phúc H. Lê Khắc ed519b0f24 Update engine.py with correct error message on assert 2025-10-17 17:21:25 +07:00
karpathy df600b6ed5 many small tweaks. base, eval, core work now i think 2025-10-16 15:46:18 -07:00
karpathy 786119d593 add autodetect of device and related stuff. getting weird warnings/errors still, so wip 2025-10-16 10:26:19 -07:00
karpathy 306bc380ab add support for CPU and for MPS. I had to change a few cosmetic things. I also discovered I think a bit of a bug, where I was casting wte to bfloat16 in the wrong place (the model init) instead of in init_weights 2025-10-16 10:04:43 -07:00
Andrej Karpathy 722da4f543 trying to add basic cpu support, will try mps too 2025-10-16 16:14:38 +00:00
Ram Rachum 1f7ee5d3ce Remove redundant exception handling in chdir 2025-10-16 15:40:10 +03:00
Andrej Karpathy 4346536ab2 also allow regenerating assistant message by clicking it, and make sure to feed good seed to generate 2025-10-16 01:28:37 +00:00
Andrej Karpathy 2846999b8f allow user to click on their message to edit them. conversation after that point is wiped 2025-10-16 01:16:22 +00:00
Andrej Karpathy 92d52ecc92 add slash commands to webui 2025-10-16 01:09:53 +00:00
Andrej Karpathy 01fb290f53 allow multiple GPUs to do inference in a data parallel way 2025-10-15 19:12:19 +00:00
Bhaskar 02440f670d fix: return inf instead of crashing when evaluate_bpb has zero total_bytes
Edge case: all tokens are special tokens or ignored
Return infinity to indicate no meaningful bytes were processed
2025-10-14 17:21:11 +05:30
Mirza-Samad-Ahmed-Baig afaa5b4c90 Fix: Handle missing d<number> model tags in find_largest_model 2025-10-14 00:24:07 +03:00
karpathy 3a5e0bc50b initial commit 2025-10-13 06:49:24 -07:00