Andrej
cbf30c842c
apply float32 cast before logits softcapping so the tanh is in fp32. torch compile fuses this correctly with no extra memory costs.
2025-12-08 14:17:43 -08:00
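As a rough illustration of why the cast order in cbf30c842c matters, the sketch below emulates bfloat16 rounding in plain Python and compares softcapping with the tanh done at bfloat16 precision against the same computation at float32 precision. The helper names, the softcap value of 15, and the sample logits are assumptions for illustration; this is not the repository's code, which operates on torch tensors.

```python
import math
import struct

def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even), returned as a float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round-to-nearest-even on the low 16 bits
    bits &= 0xFFFF0000                   # keep sign, exponent, top 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def to_fp32(x: float) -> float:
    """Round x to the nearest float32 value, returned as a float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def softcap(x: float, cap: float, rnd) -> float:
    """cap * tanh(x / cap), rounding every intermediate with `rnd`."""
    return rnd(cap * rnd(math.tanh(rnd(x / cap))))

SOFTCAP = 15.0  # assumed value, for illustration only
logits = [to_bf16(v) for v in (-20.0, -3.3, 0.7, 5.1, 12.9, 40.0)]  # emulated bf16 outputs

def max_error(rnd) -> float:
    # Compare against a float64 reference computed from the same bf16 inputs.
    return max(abs(softcap(x, SOFTCAP, rnd) - SOFTCAP * math.tanh(x / SOFTCAP)) for x in logits)

err_bf16 = max_error(to_bf16)  # old order: tanh in bf16, cast to fp32 afterwards
err_fp32 = max_error(to_fp32)  # new order: cast to fp32 first, then tanh
print(f"max error, tanh in bf16: {err_bf16:.2e}")
print(f"max error, tanh in fp32: {err_fp32:.2e}")
```

The inputs are identical in both paths; only the precision of the intermediates differs, which is the effect the commit message describes.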
Andrej Karpathy
90442de35f
fix bug where any rank has to be able to create checkpoint_dir if saving optim
2025-12-08 20:45:19 +00:00
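One common way to make directory creation safe on every rank is an idempotent `os.makedirs` call before each write. The sketch below is only an assumption about the shape of such a fix: the function name and the per-rank file name are hypothetical, and the repository's actual layout may differ.

```python
import os
import tempfile

def save_optimizer_shard(checkpoint_dir: str, step: int, rank: int, payload: bytes) -> str:
    """Write this rank's optimizer shard, creating checkpoint_dir if needed."""
    # exist_ok=True makes the call idempotent, so it is safe on every rank,
    # not only on rank 0, and tolerates another rank creating the directory first.
    os.makedirs(checkpoint_dir, exist_ok=True)
    path = os.path.join(checkpoint_dir, f"optim_{step:06d}_rank{rank}.bin")  # hypothetical name
    with open(path, "wb") as f:
        f.write(payload)
    return path

with tempfile.TemporaryDirectory() as tmp:
    ckpt_dir = os.path.join(tmp, "checkpoints", "run0")  # does not exist yet
    for r in range(4):  # stand-in for four ranks, run sequentially here
        save_optimizer_shard(ckpt_dir, step=100, rank=r, payload=b"state")
    shard_names = sorted(os.listdir(ckpt_dir))
print(shard_names)
```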
spjosyula
16788eed3c
fix(model): apply float32 cast before logits softcapping
...
This change ensures that the logits softcapping operation (tanh) is performed in float32 precision rather than bfloat16. Previously, the code cast to float32 after the tanh operation, which meant the non-linearity was computed with bfloat16 precision
2025-11-23 20:12:09 +05:30
Sam Abrahams
11e68bf442
Fix comment: rotary embeddings final dimension size
2025-11-17 11:32:56 -05:00
Andrej Karpathy
bc1fca39f3
mqa -> gqa to reduce confusion
2025-11-15 15:43:37 +00:00
Andrej
f66a780f68
Fix torch.dtype mismatching when running engine inline test.
2025-11-14 07:28:29 -08:00
Andrej
4763ce612a
Small fixes to typos
2025-11-14 07:25:59 -08:00
Sofie Van Landeghem
c6f5bd67db
revert change of base to sft for quick inline test
2025-11-14 12:20:03 +01:00
svlandeg
a2fb3c83a6
fix typos
2025-11-14 11:20:25 +01:00
Andrej Karpathy
c6abcdfe3a
big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster.
2025-11-13 15:34:40 +00:00
Andrej Karpathy
91f09ccd0d
minor fix comment in engine
2025-11-13 15:28:18 +00:00
howardgao@outlook.com
b399e43168
fix engine test bug
2025-11-06 08:56:45 +08:00
Andrej
3a2ae631c4
Merge branch 'master' into master
2025-11-04 16:35:02 -08:00
Andrej
d1558c7873
handle bf16 on MPS by casting to fp32 during load checkpoint
2025-11-04 09:42:50 -08:00
Yasser Makram
1e89af9862
Replace fcntl with filelock for Windows compatibility
2025-11-04 07:22:34 +00:00
Dipesh Babu
7a40ee77b4
fix: cast bf16 to fp32 on MPS (like CPU) to avoid dtype issues
2025-11-03 16:00:56 -05:00
svlandeg
2ce62ec076
ensure consistency of quotes within each statement
2025-11-03 21:52:02 +01:00
svlandeg
c72b8b2309
add explicit UTF-8 encoding
2025-11-03 21:27:12 +01:00
Josh Odom
f1e15f5f4d
Fixing subtle bug: lstrip removes all matching characters, including potentially required ones. Use removeprefix instead.
2025-11-02 23:40:37 -06:00
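The difference described in f1e15f5f4d is easy to reproduce. The key below is a hypothetical example chosen to trigger the problem, not necessarily the string the repository handles; `str.removeprefix` requires Python 3.9 or newer.

```python
key = "_orig_mod.model.weight"  # hypothetical state-dict key with a wrapper prefix

# lstrip treats its argument as a set of characters, not a prefix:
# it also eats the leading "mod" of "model" because m, o, d are in the set.
stripped = key.lstrip("_orig_mod.")

# removeprefix removes the exact prefix string once, if present.
fixed = key.removeprefix("_orig_mod.")

print(stripped)  # el.weight
print(fixed)     # model.weight
```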
Andrej
b6da6982f6
fix nanochat logo: the t was placed too far to the right
2025-11-02 08:17:00 -08:00
Andrej
d1ac0b2d07
when loading models on CPU, convert tensors from bfloat16 to float
2025-11-02 07:58:56 -08:00
svlandeg
5bfcd31b73
revert more formatting changes
2025-11-02 14:17:10 +01:00
svlandeg
036a3c5881
revert formatting changes to facilitate review
2025-11-02 14:16:43 +01:00
Manuel Saelices
d54c9cbf8c
CPU Support, as bfloat16 params breaks inference
2025-11-01 23:38:50 +01:00
Andrej Karpathy
cf587acb1a
move eval bundle download to be lazy and inside the python code so that we can substantially simplify the run bash scripts
2025-11-01 16:04:38 +00:00
Andrej
ad39db5a23
tiny fix to comment
...
Update engine.py with correct error message on assert
2025-11-01 07:43:57 -07:00
Andrej
630f54ae5a
use empty locals and globals in call to eval() in engine tool use
...
harden eval: prevent the calc tool from accessing globals and locals
2025-11-01 07:22:59 -07:00
Andrej
1ccbaf4416
nit delete redundant catch/raise in execute
...
Remove redundant exception handling in chdir
2025-10-29 08:10:03 -07:00
Andrej
29ff38d94b
Merge pull request #35 from bhaskar0210s/master
...
fix: return inf instead of crashing when evaluate_bpb has zero total_bytes
2025-10-29 08:06:24 -07:00
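A minimal sketch of the guard described in the merged fix, assuming the metric is a summed loss in nats normalized by a byte count. The function name and signature here are hypothetical; the repository's `evaluate_bpb` takes different inputs.

```python
import math

def bits_per_byte(total_nats: float, total_bytes: int) -> float:
    """Convert a summed cross-entropy loss (in nats) to bits per byte."""
    if total_bytes == 0:
        # Nothing to normalize by: return inf rather than raising ZeroDivisionError.
        return float("inf")
    return total_nats / (math.log(2) * total_bytes)

print(bits_per_byte(0.0, 0))  # inf
```

Returning `inf` keeps an evaluation loop running when a split contributes no bytes, while still making the degenerate result obvious in logs.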
svlandeg
b996131570
Merge branch 'master' into logo/kerning-update
2025-10-29 11:45:40 +01:00
svlandeg
3fa974f93c
few more reverts
2025-10-29 11:45:02 +01:00
svlandeg
cbd560a83d
revert formatting changes to minimize diff and merge conflicts
2025-10-29 11:42:56 +01:00
Haowei Zhang
2b9c085559
update the kv_shape
2025-10-27 02:47:13 -07:00
Haowei Zhang
b062b422ac
Fix kv cache, since resize destroys the logical structure
2025-10-27 02:23:08 -07:00
Marius Wachtler
fca2b8cd07
harden eval: prevent the calc tool from accessing globals and locals
...
By passing empty globals and locals to eval(), we can prevent simple
malicious cases where the user gets the model to output something like
```<global variable/func> or "a".count("a")```
e.g.
```signal.raise_signal(9) or "a".count("a")```, which would kill the process,
or possibly get it to output secrets.
I think making it 100% secure would require parsing the AST and executing only safe nodes, but this should make it much more robust.
2025-10-24 14:41:12 -05:00
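The hardening in fca2b8cd07 can be sketched as follows. The `calc` wrapper is a hypothetical name, and blanking `__builtins__` goes slightly beyond what the message states; it is an assumption added here because an empty globals dict alone still exposes Python's builtins.

```python
import signal  # imported only to show that module globals are not visible to eval

def calc(expr: str):
    """Evaluate expr with no access to the caller's globals, locals, or builtins."""
    # Note: eval(expr, {}, {}) alone would still expose builtins, because Python
    # inserts __builtins__ into an empty globals dict; here they are blanked too.
    return eval(expr, {"__builtins__": {}}, {})

print(calc('"strawberry".count("r")'))  # 3

try:
    calc("signal.SIGTERM")  # harmless probe; the name lookup itself should fail
    blocked = False
except NameError:
    blocked = True
print(blocked)  # True
```

As the commit message notes, this is not a full sandbox: method calls on literals still work, so a stricter design would need AST-level filtering.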
Andrej Karpathy
8892470f29
add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think
2025-10-24 14:02:48 +00:00
Andrej Karpathy
cc3636b01c
allow the tokenizer visualize_tokenization to also print the exact token id. you can never be paranoid enough
2025-10-24 13:27:05 +00:00
Luke Stanley
32571664b1
Fix Torch crash caused by pinning on CPU
2025-10-22 16:25:36 +00:00
ulanch
796f84527f
fix(ui): prevent iOS Safari toolbar from covering input on initial load
2025-10-21 17:34:40 -07:00
Andrej
2e938530ce
delete spurious torch.empty allocation in adamw
...
fix: remove unnecessary tensor allocation in DistAdamW optimizer
2025-10-21 11:35:17 -07:00
Andrej Karpathy
a088b7a6ec
use enable_gqa of pytorch sdpa, allows us to delete some code, didn't realize it's available
2025-10-21 18:07:33 +00:00
Andrej Karpathy
5bdc99abfb
merge and resolve conflict
2025-10-21 17:19:10 +00:00
Andrej Karpathy
dfcb1c16f1
Merge branch 'master' into cpu-mps-dev
2025-10-21 17:15:53 +00:00
Andrej Karpathy
bb71c64579
fix silly issue in dataloader, this version is much faster and more portable to mps too
2025-10-21 17:12:50 +00:00
karpathy
2e9669e03a
upgrading all other files to be able to use cpu/mps as well as cuda. various other minor changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming
2025-10-20 10:15:17 -07:00
Sermet Pekin
49cd02f283
fix: remove unnecessary tensor allocation in DistAdamW optimizer
2025-10-20 12:03:26 +03:00
obxium
2b58e2dd2a
Update logo in code as well
2025-10-18 09:31:11 -04:00
Andrej
cf2baf9933
fix typo
...
Co-authored-by: Tancrède Lepoint <tlepoint@users.noreply.github.com>
2025-10-17 08:35:41 -07:00
Phúc H. Lê Khắc
ed519b0f24
Update engine.py with correct error message on assert
2025-10-17 17:21:25 +07:00
karpathy
df600b6ed5
many small tweaks. base, eval, core work now i think
2025-10-16 15:46:18 -07:00