nanochat-omni

Files

RoomWithOutRoof 47e983eea7 fix: use meta device in disable_fp8 to avoid VRAM spike (#616)

When swapping Float8Linear for Linear in the disable_fp8 context manager,
passing device=fp8_module.weight.device allocates new tensors directly
on the GPU, causing an unnecessary VRAM spike (~1GB for large models).

This fix passes device='meta' so that no physical memory is allocated,
then swaps in a reference to the existing weight tensor. This eliminates
the unnecessary VRAM spike during the evaluation phase.

Fixes issue #592

Co-authored-by: RoomWithOutRoof <roomwithoutroof@sparklab.ai>

2026-03-25 14:24:57 -07:00
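
The idea described in the commit message above can be sketched roughly as follows. This is an illustration only, not the code from base_train.py: the helper name linear_sharing_weight is hypothetical, and a plain nn.Linear stands in for Float8Linear on the assumption that the FP8 module exposes weight and bias the same way.

```python
import torch
import torch.nn as nn


def linear_sharing_weight(src: nn.Module) -> nn.Linear:
    """Build an nn.Linear that reuses src's parameters (hypothetical helper)."""
    out_features, in_features = src.weight.shape
    # device="meta" creates shape-only parameters, so no real storage
    # is allocated for the temporary weight and bias.
    linear = nn.Linear(
        in_features,
        out_features,
        bias=src.bias is not None,
        device="meta",
        dtype=src.weight.dtype,
    )
    # Rebind the parameters to the existing tensors: a reference swap, not a copy.
    linear.weight = src.weight
    if src.bias is not None:
        linear.bias = src.bias
    return linear


# Stand-in for a Float8Linear layer (assumption: same weight/bias attributes).
fp8_like = nn.Linear(8, 4)
plain = linear_sharing_weight(fp8_like)
```

Because the new module points at the same parameter objects, it runs on whatever device the original weights already live on, and constructing it adds no second copy of the weights.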

base_eval.py

delete autocast, an unnecessary thorn in my side, manage dtypes directly

2026-03-04 23:55:30 +00:00

base_train.py

fix: use meta device in disable_fp8 to avoid VRAM spike (#616)

2026-03-25 14:24:57 -07:00

chat_cli.py

delete autocast, an unnecessary thorn in my side, manage dtypes directly

2026-03-04 23:55:30 +00:00

chat_eval.py

delete autocast, an unnecessary thorn in my side, manage dtypes directly