nanochat-omni/scripts/base_train.py at 8ef90bc154e8ffaa5ce53db4a0aef3d22ea73a6b

RoomWithOutRoof 47e983eea7 fix: use meta device in disable_fp8 to avoid VRAM spike (#616)

When swapping Float8Linear for Linear in the disable_fp8 context manager,
passing device=fp8_module.weight.device allocates new tensors directly
on the GPU, causing an unnecessary VRAM spike (~1 GB for large models).

This fix constructs the replacement with device='meta', which allocates no
physical memory, and then swaps in a reference to the existing weight
tensor. This eliminates the VRAM spike during the evaluation phase.

Fixes issue #592

Co-authored-by: RoomWithOutRoof <roomwithoutroof@sparklab.ai>

2026-03-25 14:24:57 -07:00
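
The swap described in the commit message can be sketched roughly as follows. This is a minimal illustration and not the code from base_train.py: the helper name `linear_sharing_weights` is hypothetical, and a plain `nn.Linear` stands in for `Float8Linear` so the snippet runs on CPU without any FP8 library.

```python
import torch
import torch.nn as nn


def linear_sharing_weights(src: nn.Linear) -> nn.Linear:
    """Build a plain nn.Linear that reuses src's parameters (hypothetical helper)."""
    # device="meta" creates parameters that carry only shape and dtype,
    # so constructing the module allocates no real memory.
    dst = nn.Linear(
        src.in_features,
        src.out_features,
        bias=src.bias is not None,
        device="meta",
        dtype=src.weight.dtype,
    )
    # Rebind the parameters to the existing tensors: a reference, not a copy.
    dst.weight = src.weight
    if src.bias is not None:
        dst.bias = src.bias
    return dst


# Assumption: a plain Linear plays the role of the FP8 module here.
source = nn.Linear(8, 4)
swapped = linear_sharing_weights(source)

x = torch.randn(2, 8)
same_storage = swapped.weight.data_ptr() == source.weight.data_ptr()
outputs_match = torch.equal(swapped(x), source(x))
```

Because the replacement module starts on the meta device, the only real tensors it ever holds are the ones it borrows from the source module, which is what avoids the temporary allocation.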

33 KiB
