Files
nanochat-omni/nanochat/optim.py
T
Andrej Karpathy c0dbf1f3ff use COMPUTE_DTYPE-aware cast in Muon polar express step
The bf16 cast is intentional for speed on Hopper+ GPUs, but should be
skipped on other platforms rather than blindly applied. fp16 is unstable
here due to its limited exponent range, and fp32 platforms don't benefit
from the cast. Now: bf16 when COMPUTE_DTYPE is bf16, no cast otherwise.

Inspired by PR #667.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 20:19:14 +00:00

25 KiB