Eric Silberstein | 5c93a56be5 | remove unnecessary check | 2025-11-19 16:31:41 -05:00
Sam Abrahams | 11e68bf442 | Fix comment: rotary embeddings final dimension size | 2025-11-17 11:32:56 -05:00
Andrej Karpathy | bc1fca39f3 | mqa -> gqa to reduce confusion | 2025-11-15 15:43:37 +00:00
Andrej Karpathy | a088b7a6ec | use enable_gqa of pytorch sdpa, allows us to delete some code, didnt realize it's available | 2025-10-21 18:07:33 +00:00
karpathy | 306bc380ab | add support for CPU and for MPS. I had to change a few cosmetic things. I also discovered I think a bit of a bug, where I was casting wte to bfloat16 in the wrong place (the model init) instead of in init_weights | 2025-10-16 10:04:43 -07:00
karpathy | 3a5e0bc50b | initial commit | 2025-10-13 06:49:24 -07:00