Andrej Karpathy
cf587acb1a
move eval bundle download to be lazy and inside the python code so that we can substantially simplify the run bash scripts
2025-11-01 16:04:38 +00:00
Andrej
ad39db5a23
tiny fix to comment
...
Update engine.py with correct error message on assert
2025-11-01 07:43:57 -07:00
Andrej
630f54ae5a
use empty locals and globals in call to eval() in engine tool use
...
harden eval: prevent the calc tool from accessing globals and locals
2025-11-01 07:22:59 -07:00
Andrej
1ccbaf4416
nit delete redundant catch/raise in execute
...
Remove redundant exception handling in chdir
2025-10-29 08:10:03 -07:00
Andrej
29ff38d94b
Merge pull request #35 from bhaskar0210s/master
...
fix: return inf instead of crashing when evaluate_bpb has zero total_bytes
2025-10-29 08:06:24 -07:00
Haowei Zhang
2b9c085559
update the kv_shape
2025-10-27 02:47:13 -07:00
Haowei Zhang
b062b422ac
Fix kv cache, given resize will destroys the logical structure
2025-10-27 02:23:08 -07:00
Marius Wachtler
fca2b8cd07
harden eval: prevent the calc tool from accessing globals and locals
...
By passing empty globals() and locals() to eval() we can prevent simple
malicious cases where the user gets the model to output something like
```<global variable/func> or "a".count("a")```
e.g.
```signal.raise_signal(9) or "a".count("a")``` which would kill the process.
or one could maybe get it to output secrets etc.
I think to make it 100% secure one would need to parse the AST and only execute secure nodes but this should make it much more robust.
2025-10-24 14:41:12 -05:00
Andrej Karpathy
8892470f29
add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think
2025-10-24 14:02:48 +00:00
Andrej Karpathy
cc3636b01c
allow the tokenizer visualize_tokenization to also print the exact token id. you can never be paranoid enough
2025-10-24 13:27:05 +00:00
Luke Stanley
32571664b1
Fix Torch crash caused by pinning on CPU
2025-10-22 16:25:36 +00:00
ulanch
796f84527f
fix(ui): prevent iOS Safari toolbar from covering input on initial load
2025-10-21 17:34:40 -07:00
Andrej
2e938530ce
delete spurious torch.empty allocation in adamw
...
fix: remove unnecessary tensor allocation in DistAdamW optimizer
2025-10-21 11:35:17 -07:00
Andrej Karpathy
a088b7a6ec
use enable_gqa of pytorch sdpa, allows us to delete some code, didnt realize it's available
2025-10-21 18:07:33 +00:00
Andrej Karpathy
5bdc99abfb
merge and resolve conflict
2025-10-21 17:19:10 +00:00
Andrej Karpathy
dfcb1c16f1
Merge branch 'master' into cpu-mps-dev
2025-10-21 17:15:53 +00:00
Andrej Karpathy
bb71c64579
fix silly issue in dataloader, this version is much faster and more portable to mps too
2025-10-21 17:12:50 +00:00
karpathy
2e9669e03a
upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes ,e.g. changing max_iterations to num_iterations in sft script for consistency in naming
2025-10-20 10:15:17 -07:00
Sermet Pekin
49cd02f283
fix: remove unnecessary tensor allocation in DistAdamW optimizer
...
fix: remove unnecessary tensor allocation in DistAdamW optimizer
2025-10-20 12:03:26 +03:00
Andrej
cf2baf9933
fix typo
...
Co-authored-by: Tancrède Lepoint <tlepoint@users.noreply.github.com >
2025-10-17 08:35:41 -07:00
Phúc H. Lê Khắc
ed519b0f24
Update engine.py with correct error message on assert
2025-10-17 17:21:25 +07:00
karpathy
df600b6ed5
many small tweaks. base, eval, core work now i think
2025-10-16 15:46:18 -07:00
karpathy
786119d593
add autodetect of device and related stuff. getting weird warnings/errors still, so wip
2025-10-16 10:26:19 -07:00
karpathy
306bc380ab
add support for CPU and for MPS. I had to change a few cosmetic things. I also discovered I think a bit of a bug, where I was casting wte to bfloat16 in the wrong place (the model init) instead of in init_weights
2025-10-16 10:04:43 -07:00
Andrej Karpathy
722da4f543
trying to add basic cpu support, will try mps too
2025-10-16 16:14:38 +00:00
Ram Rachum
1f7ee5d3ce
Remove redundant exception handling in chdir
2025-10-16 15:40:10 +03:00
Andrej Karpathy
4346536ab2
also allow regenerating assistant message by clicking it, and make sure to feed good seed to generate
2025-10-16 01:28:37 +00:00
Andrej Karpathy
2846999b8f
allow user to click on their message to edit them. conversation after that point is wiped
2025-10-16 01:16:22 +00:00
Andrej Karpathy
92d52ecc92
add slash commands to webui
2025-10-16 01:09:53 +00:00
Andrej Karpathy
01fb290f53
allow multiple GPUs to do inference in a data parallel way
2025-10-15 19:12:19 +00:00
Bhaskar
02440f670d
fix: return inf instead of crashing when evaluate_bpb has zero total_bytes
...
Edge case: all tokens are special tokens or ignored
Return infinity to indicate no meaningful bytes were processed
2025-10-14 17:21:11 +05:30
Mirza-Samad-Ahmed-Baig
afaa5b4c90
Fix: Handle missing d<number> model tags in find_largest_model
2025-10-14 00:24:07 +03:00
karpathy
3a5e0bc50b
initial commit
2025-10-13 06:49:24 -07:00