nanochat-omni

Fam Zheng 7cc94cf584 omni: nanochat/audio.py — frozen Whisper encoder + Projector

The audio modality module that pairs with the gpt.forward audio_features
hook. Two things live here:

WhisperEncoder: thin wrapper around transformers' WhisperModel.encoder.
- Weight loading prefers ModelScope when WHISPER_MS_ID is set (this matches
  the CN-mirror policy in doc/todo.md: modelscope is first-class for model
  weights, hf-mirror is the fallback). Otherwise it falls back to plain HF,
  with WHISPER_HF_ID as the override and `openai/whisper-base` as the default
  (the smallest variant that still produces useful features for smoke tests).
- Encoder params have requires_grad=False from __init__, so they never
  appear in the optimizer's param list. The caller does not need to
  remember to freeze them.
- preprocess() runs the feature extractor; forward() takes (B, n_mels,
  T_mel) and returns last_hidden_state (B, T_enc, d_model). Whisper pads
  every clip to 30 s, so T_enc is a constant 1500 regardless of input
  duration — handy for batching, wasteful for short clips. We accept the
  waste at W1; W2 can switch to streaming-style chunking.
- Note for W3+/W5+: last_hidden_state is the most text-semantic layer.
  When we start caring about timbre / prosody / emotion ("texture
  perception"), we should expose middle layers or a learnable weighted sum
  across layers.
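
A minimal sketch of two details from the list above: the weight-source policy and where the constant T_enc = 1500 comes from. The helper names `resolve_whisper_source` and `encoder_frames` are hypothetical and are not taken from audio.py. The constants are Whisper's published front-end settings (16 kHz audio, hop length 160, 30 s window, one stride-2 convolution in the encoder).

```python
import os

# Whisper front-end constants (public Whisper configuration).
SAMPLE_RATE = 16_000   # Hz
HOP_LENGTH = 160       # samples between consecutive mel frames
CHUNK_SECONDS = 30     # every clip is padded or trimmed to this length
CONV_STRIDE = 2        # the encoder's second conv layer halves the time axis


def resolve_whisper_source(env=None):
    """Hypothetical helper: return (hub, model_id) under the stated policy."""
    env = os.environ if env is None else env
    ms_id = env.get("WHISPER_MS_ID")
    if ms_id:
        # ModelScope wins whenever its ID is set.
        return ("modelscope", ms_id)
    # Otherwise plain HF, with WHISPER_HF_ID as override and a fixed default.
    return ("huggingface", env.get("WHISPER_HF_ID") or "openai/whisper-base")


def encoder_frames(chunk_seconds=CHUNK_SECONDS):
    """Number of encoder positions (T_enc) for one padded clip."""
    t_mel = chunk_seconds * SAMPLE_RATE // HOP_LENGTH  # 3000 mel frames
    return t_mel // CONV_STRIDE


print(resolve_whisper_source({}))  # default when neither variable is set
print(encoder_frames())            # T_enc for the standard 30 s window
```

Because the padding happens before the encoder, a 2 s clip and a 29 s clip both cost the same 1500 positions, which is the waste the commit accepts at W1.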

Projector: 2-layer MLP (in_dim → out_dim → out_dim) with GELU and the
nanochat Linear class so master weights stay fp32 while forward runs in
the activation dtype (bf16). fc2 is zero-initialized, so the model starts
out ignoring audio entirely. That gives a clean baseline before any training
signal flows through: the audio path is off by default and is switched on
only by training.
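
The zero-init behaviour can be illustrated with a small PyTorch sketch. This is not the code in audio.py. It assumes plain `nn.Linear` with `bias=False` as a stand-in for the nanochat Linear class, an input width of 512 (the hidden size of whisper-base), and an arbitrary output width of 768 chosen only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Projector(nn.Module):
    """in_dim -> out_dim -> out_dim MLP with GELU; fc2 starts at zero."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        # Assumption: nn.Linear(bias=False) stands in for nanochat's Linear.
        self.fc1 = nn.Linear(in_dim, out_dim, bias=False)
        self.fc2 = nn.Linear(out_dim, out_dim, bias=False)
        # Zero weights in the last layer make the initial output exactly zero.
        nn.init.zeros_(self.fc2.weight)

    def forward(self, x):
        return self.fc2(F.gelu(self.fc1(x)))


torch.manual_seed(0)
proj = Projector(in_dim=512, out_dim=768)   # 768 is a hypothetical model width
feats = torch.randn(2, 1500, 512)           # (B, T_enc, d_model) from the encoder
out = proj(feats)
print(out.shape, out.abs().max().item())    # all-zero output before training
```

In this sketch fc2 still receives a nonzero gradient on the first step, because that gradient depends on the hidden activations rather than on fc2's own weights, so training can move the projector away from zero.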

2026-05-05 22:39:05 +01:00

__init__.py

initial commit

2025-10-13 06:49:24 -07:00

audio.py

omni: nanochat/audio.py — frozen Whisper encoder + Projector

2026-05-05 22:39:05 +01:00

checkpoint_manager.py

tune the data mixture a bit, load optimizer by default when SFT. These were confirmed to be best settings from sweeps of sft

2026-02-18 15:49:18 +00:00

common.py

delete autocast, an unnecessary thorn in my side, manage dtypes directly