Files
nanochat-omni/scripts/base_train.py
T
RoomWithOutRoof 47e983eea7 fix: use meta device in disable_fp8 to avoid VRAM spike (#616)
When swapping Float8Linear to Linear in disable_fp8 context manager,
using device=fp8_module.weight.device directly allocates new tensors
on GPU, causing unnecessary VRAM spike (~1GB for large models).

This fix uses device='meta' to avoid physical memory allocation,
then swaps in the weight tensor reference. This eliminates the
unnecessary VRAM spike during evaluation phase.

Fixes issue #592

Co-authored-by: RoomWithOutRoof <roomwithoutroof@sparklab.ai>
2026-03-25 14:24:57 -07:00

33 KiB