Commit Graph

127 Commits

Author SHA1 Message Date
Marius Wachtler fca2b8cd07 harden eval: prevent the calc tool from accessing globals and locals
By passing empty globals and locals dicts to eval(), we can prevent simple
malicious cases where the user gets the model to output something like

```<global variable/func> or "a".count("a")```
e.g.
```signal.raise_signal(9) or "a".count("a")``` which would kill the process.
Alternatively, one could perhaps get it to output secrets, etc.

I think that to make it 100% secure one would need to parse the AST and only execute safe nodes, but this should make it much more robust.
2025-10-24 14:41:12 -05:00
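A minimal sketch of the idea described in this commit message, not the repository's actual code. The names `restricted_eval` and `is_blocked` are hypothetical. One assumption goes beyond the message: when the globals dict has no `"__builtins__"` key, Python inserts the real builtins before evaluating, so the sketch sets that key to an empty dict explicitly.

```python
import signal  # stands in for a module-level name the expression must not reach


def restricted_eval(expr):
    """Evaluate expr without access to module globals, locals, or builtins."""
    # The empty "__builtins__" entry keeps Python from injecting the real
    # builtins (and with them __import__, open, etc.) into the globals dict.
    return eval(expr, {"__builtins__": {}}, {})


def is_blocked(expr):
    """Return True if expr fails because it refers to a name it cannot see."""
    try:
        restricted_eval(expr)
    except NameError:
        return True
    return False
```

Under these assumptions, `restricted_eval('"strawberry".count("r")')` still evaluates, while an expression that starts with `signal.` fails on the name lookup before any call happens. As the message itself says, this is hardening rather than a sandbox: attribute access on literals can still reach more of the runtime than intended.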
Andrej Karpathy 8892470f29 add the SpellingBee task so that nanochat can count r in strawberry etc. Along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of Python. Possibly the current TaskMixture uses way too many synthetic examples of SpellingBee, because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here, I think. 2025-10-24 14:02:48 +00:00
Andrej Karpathy cc3636b01c allow the tokenizer visualize_tokenization to also print the exact token id. you can never be paranoid enough 2025-10-24 13:27:05 +00:00
Luke Stanley 32571664b1 Fix Torch crash caused by pinning on CPU 2025-10-22 16:25:36 +00:00
ulanch 796f84527f fix(ui): prevent iOS Safari toolbar from covering input on initial load 2025-10-21 17:34:40 -07:00
Andrej 2e938530ce delete spurious torch.empty allocation in adamw
fix: remove unnecessary tensor allocation in DistAdamW optimizer
2025-10-21 11:35:17 -07:00
Andrej Karpathy a088b7a6ec use enable_gqa of pytorch sdpa, allows us to delete some code, didn't realize it's available 2025-10-21 18:07:33 +00:00
Andrej Karpathy 5bdc99abfb merge and resolve conflict 2025-10-21 17:19:10 +00:00
Andrej Karpathy dfcb1c16f1 Merge branch 'master' into cpu-mps-dev 2025-10-21 17:15:53 +00:00
Andrej Karpathy bb71c64579 fix silly issue in dataloader, this version is much faster and more portable to mps too 2025-10-21 17:12:50 +00:00
karpathy 2e9669e03a upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming 2025-10-20 10:15:17 -07:00
Sermet Pekin 49cd02f283 fix: remove unnecessary tensor allocation in DistAdamW optimizer
2025-10-20 12:03:26 +03:00
obxium 2b58e2dd2a Update logo in code as well 2025-10-18 09:31:11 -04:00
Andrej cf2baf9933 fix typo
Co-authored-by: Tancrède Lepoint <tlepoint@users.noreply.github.com>
2025-10-17 08:35:41 -07:00
Phúc H. Lê Khắc ed519b0f24 Update engine.py with correct error message on assert 2025-10-17 17:21:25 +07:00
karpathy df600b6ed5 many small tweaks. base, eval, core work now i think 2025-10-16 15:46:18 -07:00
karpathy 786119d593 add autodetect of device and related stuff. getting weird warnings/errors still, so wip 2025-10-16 10:26:19 -07:00
karpathy 306bc380ab add support for CPU and for MPS. I had to change a few cosmetic things. I also discovered I think a bit of a bug, where I was casting wte to bfloat16 in the wrong place (the model init) instead of in init_weights 2025-10-16 10:04:43 -07:00
Andrej Karpathy 722da4f543 trying to add basic cpu support, will try mps too 2025-10-16 16:14:38 +00:00
Ram Rachum 1f7ee5d3ce Remove redundant exception handling in chdir 2025-10-16 15:40:10 +03:00
Andrej Karpathy 4346536ab2 also allow regenerating assistant message by clicking it, and make sure to feed good seed to generate 2025-10-16 01:28:37 +00:00
Andrej Karpathy 2846999b8f allow user to click on their message to edit it. conversation after that point is wiped 2025-10-16 01:16:22 +00:00
Andrej Karpathy 92d52ecc92 add slash commands to webui 2025-10-16 01:09:53 +00:00
Andrej Karpathy 01fb290f53 allow multiple GPUs to do inference in a data parallel way 2025-10-15 19:12:19 +00:00
Bhaskar 02440f670d fix: return inf instead of crashing when evaluate_bpb has zero total_bytes
Edge case: all tokens are special tokens or ignored

Return infinity to indicate no meaningful bytes were processed
2025-10-14 17:21:11 +05:30
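The edge case in this commit can be sketched as below. This is an illustration only: the helper name `bits_per_byte` and the formula (summed loss in nats, converted to bits and divided by the byte count) are assumptions, not taken from the repository's `evaluate_bpb`.

```python
import math


def bits_per_byte(total_nats, total_bytes):
    """Convert a summed loss in nats into bits per byte (hypothetical helper)."""
    # If every token was special or ignored, no bytes were counted. Return
    # infinity to signal "nothing meaningful was measured" instead of dividing
    # by zero.
    if total_bytes == 0:
        return float("inf")
    return total_nats / (math.log(2) * total_bytes)
```

Returning `float("inf")` keeps callers that compare or log the metric running, and the value cannot be mistaken for a good score.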
Mirza-Samad-Ahmed-Baig afaa5b4c90 Fix: Handle missing d<number> model tags in find_largest_model 2025-10-14 00:24:07 +03:00
karpathy 3a5e0bc50b initial commit 2025-10-13 06:49:24 -07:00