Commit Graph

318 Commits

Author SHA1 Message Date
Andrej a83646e098 fix(eval): use UTF-8 when reading CORE JSONL and writing CSV 2025-11-03 06:38:33 -08:00
Andrej 8681922328 fix lstrip bug, make it removeprefix, TIL. 2025-11-03 06:37:48 -08:00
Dipesh Babu 226953b841 fix: open JSONL and results CSV with UTF-8 encoding for portability 2025-11-03 01:20:56 -05:00
Josh Odom f1e15f5f4d Fixing subtle bug: lstrip removes all matching characters, including potentially required ones. Use removeprefix instead. 2025-11-02 23:40:37 -06:00
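
The pitfall named in this commit can be reproduced in a few lines. This is a minimal sketch, not the repository's code: the prefix and text below are hypothetical values chosen only to show how `str.lstrip` and `str.removeprefix` (Python 3.9+) differ.

```python
# Hypothetical values, chosen only to illustrate the difference.
prefix = "answer: "
text = "answer: seven"

# lstrip treats its argument as a SET of characters and keeps stripping
# while the leading character is in that set, so it also eats "se" from "seven".
stripped = text.lstrip(prefix)

# removeprefix removes the exact leading substring, at most once.
removed = text.removeprefix(prefix)

print(repr(stripped))  # 'ven'
print(repr(removed))   # 'seven'
```
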
Andrej b6da6982f6 fix nanochat logo: the t was placed too far to the right 2025-11-02 08:17:00 -08:00
Andrej c2c4f77e22 oops small bugfix to run1000.sh missing kwarg 2025-11-02 08:14:41 -08:00
Andrej d1ac0b2d07 when loading models on CPU, convert tensors from bfloat16 to float 2025-11-02 07:58:56 -08:00
svlandeg 5bfcd31b73 revert more formatting changes 2025-11-02 14:17:10 +01:00
svlandeg 036a3c5881 revert formatting changes to facilitate review 2025-11-02 14:16:43 +01:00
svlandeg 52e85aaf80 Merge branch 'master' into fix/typo 2025-11-02 13:41:13 +01:00
Jing Zhang ba4f40bf58 Update run1000.sh to add missing --run=$WANDB_RUN 2025-11-01 21:27:00 -07:00
Manuel Saelices d54c9cbf8c CPU Support, as bfloat16 params breaks inference 2025-11-01 23:38:50 +01:00
Andrej Karpathy cf587acb1a move eval bundle download to be lazy and inside the python code so that we can substantially simplify the run bash scripts 2025-11-01 16:04:38 +00:00
Andrej Karpathy 7d2c4a3d95 delete pandas dep in base_eval use csv instead 2025-11-01 15:28:30 +00:00
Andrej ad39db5a23 tiny fix to comment
Update engine.py with correct error message on assert
2025-11-01 07:43:57 -07:00
Andrej 630f54ae5a use empty locals and globals in call to eval() in engine tool use
harden eval: prevent the calc tool from accessing globals and locals
2025-11-01 07:22:59 -07:00
Andrej Karpathy f15732524a make deepwiki link better 2025-11-01 14:13:29 +00:00
Andrej dfc88334b6 fix tok/sec calculation bug when grad accum steps > 1
Fix tok/sec metrics for base_train and mid_train when gradient accumulation is not 1
2025-10-30 08:36:32 -07:00
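
As an illustration of why gradient accumulation matters for this metric, here is a hedged sketch. The function and parameter names are hypothetical and the actual code in base_train and mid_train may differ; the log does not say what the original bug was. The sketch only assumes that one optimizer step processes `grad_accum_steps` micro-batches on every rank.

```python
def tokens_per_second(device_batch_size: int, seq_len: int, world_size: int,
                      grad_accum_steps: int, step_seconds: float) -> float:
    """Hypothetical helper: throughput of one full optimizer step."""
    # Each rank processes grad_accum_steps micro-batches per optimizer step,
    # so the accumulation factor must be part of the token count.
    tokens_per_step = device_batch_size * seq_len * world_size * grad_accum_steps
    return tokens_per_step / step_seconds


# Example values (assumed): 32 sequences of 2048 tokens, 8 ranks, 4 micro-steps, 2 s per step.
rate = tokens_per_second(32, 2048, 8, 4, 2.0)
print(rate)  # 1048576.0
```

Leaving out the accumulation factor in this example would report 262144.0 tokens per second, four times too low.
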
Andrej eb11bb0e2e remove numpy as dep
Remove explicit numpy dependency
2025-10-30 08:28:14 -07:00
svlandeg 70319851fc fix typo 2025-10-29 19:48:34 +01:00
Andrej 1ccbaf4416 nit delete redundant catch/raise in execute
Remove redundant exception handling in chdir
2025-10-29 08:10:03 -07:00
Andrej 29ff38d94b Merge pull request #35 from bhaskar0210s/master
fix: return inf instead of crashing when evaluate_bpb has zero total_bytes
2025-10-29 08:06:24 -07:00
svlandeg b996131570 Merge branch 'master' into logo/kerning-update 2025-10-29 11:45:40 +01:00
svlandeg 3fa974f93c few more reverts 2025-10-29 11:45:02 +01:00
svlandeg cbd560a83d revert formatting changes to minimize diff and merge conflicts 2025-10-29 11:42:56 +01:00
Andrej a1de1f46ad Merge pull request #156 from tlepoint/fix/export-base-dir
Export the base dir variable in runcpu.sh
2025-10-28 15:19:08 -07:00
Andrej ee00f523d0 fixing all the typos to make the pull requests stop
Batch of typo fixes
2025-10-28 13:36:07 -07:00
Ajeesh Sunil 5e0987a431 numpy isn't acting as a dependency for nanochat, so isn't it better to remove numpy from the dependencies list 2025-10-28 20:05:38 +00:00
svlandeg 8c9b004c99 typo fixes in scripts 2025-10-28 20:17:31 +01:00
svlandeg 0a3ce7b0ff typo fixes in readme 2025-10-28 20:11:00 +01:00
Andrej Karpathy fdda5826e3 Merge branch 'haowei01-fix_kv_cache_due_to_resize' 2025-10-28 16:54:30 +00:00
Andrej Karpathy baf0b3fdda also add a test that failed before the fix and passes now with the fix for kv cache resize 2025-10-28 16:54:17 +00:00
Andrej Karpathy f1db6b4712 delete czar call for help, i'm working through the inbound on that now. add current LLM policy which just asks for disclosure atm 2025-10-28 16:51:41 +00:00
Andrej Karpathy 9415931f85 delete czar call for help, i'm working through the inbound on that now. add current LLM policy which just asks for disclosure atm 2025-10-28 15:17:43 +00:00
Haowei Zhang 2b9c085559 update the kv_shape 2025-10-27 02:47:13 -07:00
Haowei Zhang b062b422ac Fix kv cache, given that resize destroys the logical structure 2025-10-27 02:23:08 -07:00
water-vapor a9de4b1038 Fix tok/sec metrics for base_train and mid_train when gradient accumulation is not 1 2025-10-26 01:43:49 -05:00
Andrej Karpathy c75fe54aa7 readme tweak, link to new discussion and add file structure 2025-10-25 19:39:16 +00:00
Marius Wachtler fca2b8cd07 harden eval: prevent the calc tool from accessing globals and locals
By passing empty globals() and locals() to eval() we can prevent simple
malicious cases where the user gets the model to output something like

```<global variable/func> or "a".count("a")```
e.g.
```signal.raise_signal(9) or "a".count("a")``` which would kill the process.
or one could maybe get it to output secrets etc.

I think to make it 100% secure one would need to parse the AST and only execute secure nodes but this should make it much more robust.
2025-10-24 14:41:12 -05:00
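
The hardening described in this commit message can be sketched as follows. This is an assumption-laden illustration rather than the engine's actual code: `restricted_eval` is a hypothetical helper, and it additionally passes an empty `"__builtins__"` entry, because a plain `{}` for globals still lets Python inject the real builtins.

```python
import signal  # a module-level name that an unrestricted eval() could reach


def restricted_eval(expression: str):
    """Hypothetical helper: evaluate with no visible globals, locals, or builtins."""
    # The empty "__builtins__" entry stops Python from adding the real builtins.
    return eval(expression, {"__builtins__": {}}, {})


# Method calls on literals still work, which is all a count-style tool needs.
ok = restricted_eval('"strawberry".count("r")')

# Names from the surrounding module are no longer resolvable.
try:
    restricted_eval('signal.raise_signal(9) or "a".count("a")')
    blocked = False
except NameError:
    blocked = True

print(ok, blocked)  # 3 True
```

As the commit message notes, this is not a full sandbox: attribute access on literals can still reach other objects, so a strict solution would parse the AST and allow only known-safe nodes.
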
Andrej Karpathy 05a051dbe9 fix tokenization bug, there should be no space before first letter. sigh 2025-10-24 15:06:06 +00:00
Andrej Karpathy 8892470f29 add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think 2025-10-24 14:02:48 +00:00
Andrej Karpathy 81597cd616 move the lr schedule args up in base_train so they are tunable in configurator 2025-10-24 13:27:31 +00:00
Andrej Karpathy cc3636b01c allow the tokenizer visualize_tokenization to also print the exact token id. you can never be paranoid enough 2025-10-24 13:27:05 +00:00
Tancrède Lepoint d5cda11ab8 Export the base dir variable 2025-10-22 18:15:02 -04:00
Andrej Karpathy 5eeb2b6ef9 experiment: looking to 'hire' a nanochat repo czar to help the repo, mentioning in readme 2025-10-22 16:55:54 +00:00
Andrej Karpathy 2dda5c4c8d Merge branch 'ulanch-fix/ios-safari-input-overlap' 2025-10-22 16:26:35 +00:00
Andrej Karpathy 80b203ea59 also bump run1000.sh to new uv sync 2025-10-22 16:25:36 +00:00
Luke Stanley 917c858136 Updates lockfile with CPU package support without overwriting other architectures 2025-10-22 16:25:36 +00:00
Luke Stanley db1d5b595d Git ignore eval_bundle 2025-10-22 16:25:36 +00:00
Luke Stanley dd9387b362 Fix GPU-less CPU use on Linux with specific Torch indexes 2025-10-22 16:25:36 +00:00