Andrej Karpathy
|
8892470f29
|
add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think
|
2025-10-24 14:02:48 +00:00 |
|
Andrej Karpathy
|
cc3636b01c
|
allow the tokenizer visualize_tokenization to also print the exact token id. you can never be paranoid enough
|
2025-10-24 13:27:05 +00:00 |
|
Luke Stanley
|
32571664b1
|
Fix Torch crash caused by pinning on CPU
|
2025-10-22 16:25:36 +00:00 |
|
ulanch
|
796f84527f
|
fix(ui): prevent iOS Safari toolbar from covering input on initial load
|
2025-10-21 17:34:40 -07:00 |
|
Andrej
|
2e938530ce
|
delete spurious torch.empty allocation in adamw
fix: remove unnecessary tensor allocation in DistAdamW optimizer
|
2025-10-21 11:35:17 -07:00 |
|
Andrej Karpathy
|
a088b7a6ec
|
use enable_gqa of pytorch sdpa, allows us to delete some code, didnt realize it's available
|
2025-10-21 18:07:33 +00:00 |
|
Andrej Karpathy
|
5bdc99abfb
|
merge and resolve conflict
|
2025-10-21 17:19:10 +00:00 |
|
Andrej Karpathy
|
dfcb1c16f1
|
Merge branch 'master' into cpu-mps-dev
|
2025-10-21 17:15:53 +00:00 |
|
Andrej Karpathy
|
bb71c64579
|
fix silly issue in dataloader, this version is much faster and more portable to mps too
|
2025-10-21 17:12:50 +00:00 |
|
karpathy
|
2e9669e03a
|
upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes ,e.g. changing max_iterations to num_iterations in sft script for consistency in naming
|
2025-10-20 10:15:17 -07:00 |
|
Sermet Pekin
|
49cd02f283
|
fix: remove unnecessary tensor allocation in DistAdamW optimizer
fix: remove unnecessary tensor allocation in DistAdamW optimizer
|
2025-10-20 12:03:26 +03:00 |
|
Andrej
|
cf2baf9933
|
fix typo
Co-authored-by: Tancrède Lepoint <tlepoint@users.noreply.github.com>
|
2025-10-17 08:35:41 -07:00 |
|
karpathy
|
df600b6ed5
|
many small tweaks. base, eval, core work now i think
|
2025-10-16 15:46:18 -07:00 |
|
karpathy
|
786119d593
|
add autodetect of device and related stuff. getting weird warnings/errors still, so wip
|
2025-10-16 10:26:19 -07:00 |
|
karpathy
|
306bc380ab
|
add support for CPU and for MPS. I had to change a few cosmetic things. I also discovered I think a bit of a bug, where I was casting wte to bfloat16 in the wrong place (the model init) instead of in init_weights
|
2025-10-16 10:04:43 -07:00 |
|
Andrej Karpathy
|
722da4f543
|
trying to add basic cpu support, will try mps too
|
2025-10-16 16:14:38 +00:00 |
|
Andrej Karpathy
|
4346536ab2
|
also allow regenerating assistant message by clicking it, and make sure to feed good seed to generate
|
2025-10-16 01:28:37 +00:00 |
|
Andrej Karpathy
|
2846999b8f
|
allow user to click on their message to edit them. conversation after that point is wiped
|
2025-10-16 01:16:22 +00:00 |
|
Andrej Karpathy
|
92d52ecc92
|
add slash commands to webui
|
2025-10-16 01:09:53 +00:00 |
|
Andrej Karpathy
|
01fb290f53
|
allow multiple GPUs to do inference in a data parallel way
|
2025-10-15 19:12:19 +00:00 |
|
Mirza-Samad-Ahmed-Baig
|
afaa5b4c90
|
Fix: Handle missing d<number> model tags in find_largest_model
|
2025-10-14 00:24:07 +03:00 |
|
karpathy
|
3a5e0bc50b
|
initial commit
|
2025-10-13 06:49:24 -07:00 |
|