Commit Graph

4 Commits

Author SHA1 Message Date
Andrej Karpathy 1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly 2026-03-04 23:55:30 +00:00
Alan 124f49be98 Removed redundant qunatization of gradients 2026-02-15 15:41:33 +00:00
Alan d9678ff0f9 Save FP8 tensors in autograd ctx instead of full-precision inputs
Store quantized input/weight and their inverse scales in _Float8Matmul ctx to avoid re-quantization in backward and reduce saved-activation memory without changing numerics.
2026-02-15 14:31:54 +00:00
Andrej Karpathy e569b59f92 delete torchao dependency, create our own exact API-matched version of Float8Linear, document it very well. for some poorly understood reason, the performance is not only ~identical but actually runs 3% faster. despite of it being significantly simpler and much less code. i don't fully understand why/how atm 2026-02-10 18:46:39 +00:00