Commit Graph

282 Commits

Author SHA1 Message Date
water-vapor a9de4b1038 Fix tok/sec metrics for base_train and mid_train when gradient accumulation is not 1 2025-10-26 01:43:49 -05:00
Andrej Karpathy c75fe54aa7 readme tweak, link to new discussion and add file structure 2025-10-25 19:39:16 +00:00
Marius Wachtler fca2b8cd07 harden eval: prevent the calc tool from accessing globals and locals
By passing empty dicts as the globals and locals arguments of eval() we can prevent simple
malicious cases where the user gets the model to output something like

```<global variable/func> or "a".count("a")```
e.g.
```signal.raise_signal(9) or "a".count("a")```, which would kill the process,
or one could maybe get it to output secrets etc.

I think to make it 100% secure one would need to parse the AST and only execute secure nodes, but this should make it much more robust.
2025-10-24 14:41:12 -05:00
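A minimal sketch of the hardening described in the commit above, not the repository's actual code. The function name `calc` and the variable `secret` are hypothetical and exist only for illustration. It shows that `eval()` called with empty globals and locals dicts cannot resolve names from the surrounding scope. Python still inserts `__builtins__` into an empty globals dict, so this narrows the attack surface rather than providing a sandbox.

```python
def calc(expr: str, hardened: bool = True):
    # Hypothetical stand-in for state the calculator tool should not expose.
    secret = "hunter2"
    if hardened:
        # Empty globals and locals: names from the enclosing scope are invisible.
        return eval(expr, {}, {})
    # Bare eval: the expression can read this function's locals and the module globals.
    return eval(expr)


# A legitimate calculator-style expression still works when hardened.
print(calc('"strawberry".count("r")'))   # 3

# Without hardening, the expression can read surrounding state.
print(calc("secret", hardened=False))    # hunter2

# With hardening, the same lookup fails with NameError.
try:
    calc("secret")
except NameError as err:
    print("blocked:", err)

# Builtins remain reachable, which is why this is not a full sandbox.
print(calc("len('abc')"))                # 3
```

Closing the remaining gap would mean restricting builtins as well, or following the commit author's suggestion of parsing the expression with the `ast` module and allowing only a whitelist of node types.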
Andrej Karpathy 05a051dbe9 fix tokenization bug, there should be no space before first letter. sigh 2025-10-24 15:06:06 +00:00
Andrej Karpathy 8892470f29 add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think 2025-10-24 14:02:48 +00:00
Andrej Karpathy 81597cd616 move the lr schedule args up in base_train so they are tunable in configurator 2025-10-24 13:27:31 +00:00
Andrej Karpathy cc3636b01c allow the tokenizer visualize_tokenization to also print the exact token id. you can never be paranoid enough 2025-10-24 13:27:05 +00:00
Tancrède Lepoint d5cda11ab8 Export the base dir variable 2025-10-22 18:15:02 -04:00
Andrej Karpathy 5eeb2b6ef9 experiment: looking to 'hire' a nanochat repo czar to help the repo, mentioning in readme 2025-10-22 16:55:54 +00:00
Andrej Karpathy 2dda5c4c8d Merge branch 'ulanch-fix/ios-safari-input-overlap' 2025-10-22 16:26:35 +00:00
Andrej Karpathy 80b203ea59 also bump run1000.sh to new uv sync 2025-10-22 16:25:36 +00:00
Luke Stanley 917c858136 Updates lockfile with CPU package support without overwriting other architectures 2025-10-22 16:25:36 +00:00
Luke Stanley db1d5b595d Git ignore eval_bundle 2025-10-22 16:25:36 +00:00
Luke Stanley dd9387b362 Fix GPU-less CPU use on Linux with specific Torch indexes 2025-10-22 16:25:36 +00:00
Luke Stanley 32571664b1 Fix Torch crash caused by pinning on CPU 2025-10-22 16:25:36 +00:00
Andrej Karpathy 51e70f0d3c Merge branch 'lukestanley-fix-cpu-support-with-extras' 2025-10-22 16:11:15 +00:00
Andrej Karpathy 48387cd895 also bump run1000.sh to new uv sync 2025-10-22 16:08:31 +00:00
ulanch 796f84527f fix(ui): prevent iOS Safari toolbar from covering input on initial load 2025-10-21 17:34:40 -07:00
Luke Stanley 7a52f9bfbb Updates lockfile with CPU package support without overwriting other architectures 2025-10-21 23:14:34 +00:00
Luke Stanley 760af62e11 Git ignore eval_bundle 2025-10-21 23:14:34 +00:00
Luke Stanley 901b075605 Fix GPU-less CPU use on Linux with specific Torch indexes 2025-10-21 23:14:16 +00:00
Luke Stanley defd1246aa Fix Torch crash caused by pinning on CPU 2025-10-21 20:28:10 +00:00
Andrej 2e938530ce delete spurious torch.empty allocation in adamw
fix: remove unnecessary tensor allocation in DistAdamW optimizer
2025-10-21 11:35:17 -07:00
Andrej Karpathy a088b7a6ec use enable_gqa of pytorch sdpa, allows us to delete some code, didnt realize it's available 2025-10-21 18:07:33 +00:00
Andrej Karpathy 94ee507054 quick fix base eval due to fewshot requirement 2025-10-21 17:56:08 +00:00
Andrej 33e8a27f91 Merge karpathy/cpu-mps-dev, adding the ability to run on CPU, on MPS, or on CUDA, with autodetect. Gnarly PR, nonzero chance I broke something.
add cpu|mps support
2025-10-21 10:26:04 -07:00
Andrej Karpathy 50bea28ef9 also add readme mention of the cpu mps changes 2025-10-21 17:24:48 +00:00
Andrej Karpathy 5bdc99abfb merge and resolve conflict 2025-10-21 17:19:10 +00:00
Andrej Karpathy dfcb1c16f1 Merge branch 'master' into cpu-mps-dev 2025-10-21 17:15:53 +00:00
Andrej Karpathy bb71c64579 fix silly issue in dataloader, this version is much faster and more portable to mps too 2025-10-21 17:12:50 +00:00
karpathy bb786c5560 i shouldnt have committed the lock file, i missed that. revert to the flagship build which is linux. sorry to pollute the repo history... 2025-10-21 10:07:40 -07:00
Andrej c9ea7a91e2 Add customization instructions to README
Added a section on customization for nanochat.
2025-10-21 08:57:10 -07:00
Andrej Karpathy 03cddd9878 actually let's not brick code on git pull. change error to warning 2025-10-21 15:13:25 +00:00
Andrej Karpathy fe5aed940b add personality to nanochat. breaks previous code on git pull and requires download of a new file from s3, but there is a helpful error message so hopefully its ok 2025-10-21 15:04:58 +00:00
karpathy 2e9669e03a upgrading all other files to be able to use cpu/mps as well as cuda. various other minor changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming 2025-10-20 10:15:17 -07:00
Andrej a09ac812ed toml changes for cpu only install 2025-10-20 07:53:15 -07:00
Sermet Pekin 49cd02f283 fix: remove unnecessary tensor allocation in DistAdamW optimizer
2025-10-20 12:03:26 +03:00
burtenshaw 0abb0fa2e3 add both sides of the source check 2025-10-20 10:44:07 +02:00
burtenshaw c7ae920a77 add check for linux on cpu 2025-10-20 06:51:52 +02:00
Andrej 0f007889dd Add MIT License as a file to the project 2025-10-19 17:22:19 -07:00
Andrej 5a879f4947 export NANOCHAT_BASE_DIR so child processes get it too 2025-10-19 17:07:56 -07:00
Andrej Karpathy c1d2ed1c13 use orig_model in sampling, silly of me to miss this 2025-10-20 00:05:09 +00:00
Andrej Karpathy 2bc521a6de use orig_model in sampling, silly of me to miss this 2025-10-20 00:04:15 +00:00
Andrej Karpathy 9467d83cf2 fix memory leak bug in rust tokenizer ty @mitsuhiko 2025-10-19 23:54:31 +00:00
Tancrède Lepoint b1443dc98c export NANOCHAT_BASE_DIR so child processes get it too 2025-10-19 14:05:40 -04:00
obxium 2b58e2dd2a Update logo in code as well 2025-10-18 09:31:11 -04:00
Andrej cf2baf9933 fix typo
Co-authored-by: Tancrède Lepoint <tlepoint@users.noreply.github.com>
2025-10-17 08:35:41 -07:00
karpathy e4f9b9c64d revert to previous pyproject.toml 2025-10-17 08:08:16 -07:00
Andrej e883b1d597 Merge pull request #99 from burtenshaw/cpu-mps-dev-ben
Add mps and cpu dependency management
2025-10-17 07:24:38 -07:00
Phúc H. Lê Khắc ed519b0f24 Update engine.py with correct error message on assert 2025-10-17 17:21:25 +07:00