nanochat-omni/.gitignore at 8c7210abebe28436c937a8efb7fe825de0c136b9

fam/nanochat-omni

Fam Zheng 3c1cc3302f omni: W1 audio align smoke — synthetic dataset + 50-step script

End-to-end smoke proving the audio path:
  wav -> WhisperEncoder (frozen) -> Projector -> prepend to text embeddings
      -> tiny d6 GPT (random init) -> CE loss on text only
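The "CE loss on text only" step can be sketched in plain Python. This is a minimal illustration, not the repository's code: it assumes the model input is the audio prefix followed by `text_ids[:-1]`, and that prefix positions are excluded from the loss with a sentinel label. The names `IGNORE_INDEX`, `build_targets`, and `masked_cross_entropy` are hypothetical.

```python
import math

# Hypothetical sentinel for "do not score this position"; it mirrors the
# common PyTorch convention but is an assumption here.
IGNORE_INDEX = -100


def build_targets(num_audio_positions, text_ids):
    """Next-token targets for an input laid out as [audio prefix, text_ids[:-1]].

    Audio positions carry no target. The text position holding text_ids[j]
    is asked to predict text_ids[j + 1], so the leading BOS is never a target.
    """
    if len(text_ids) < 2:
        raise ValueError("need BOS plus at least one text token")
    return [IGNORE_INDEX] * num_audio_positions + list(text_ids[1:])


def masked_cross_entropy(logits, targets):
    """Mean cross-entropy over the positions whose target is not IGNORE_INDEX."""
    total, count = 0.0, 0
    for row, target in zip(logits, targets):
        if target == IGNORE_INDEX:
            continue
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[target]
        count += 1
    if count == 0:
        raise ValueError("no supervised positions")
    return total / count
```

With this layout, whatever the model emits at the audio positions has no effect on the loss; only the text positions are scored.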

Pass criterion is a plain "loss drops by at least 0.5". On a 4090 the run
finishes in ~1 s and goes 5.55 -> 0.17 over 50 steps (a drop of 5.38), so
the threshold has plenty of headroom against false positives.
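A check of this kind might look like the sketch below. It assumes the drop is measured between the first and last recorded loss, which the message does not spell out. `smoke_passed` and `min_drop` are hypothetical names.

```python
def smoke_passed(losses, min_drop=0.5):
    """Return True when the loss fell by at least `min_drop` over the run.

    Assumption: the drop is first recorded loss minus last recorded loss.
    """
    if len(losses) < 2:
        raise ValueError("need at least two loss values to measure a drop")
    return (losses[0] - losses[-1]) >= min_drop
```

For the reported run, `smoke_passed([5.55, 0.17])` compares a drop of 5.38 against the 0.5 threshold.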

Two design calls worth keeping in mind:

1. Synthetic sine clips, not LibriSpeech. W1 is forward-path proof, not
   alignment quality, and a deterministic offline dataset means no network
   on the smoke path. data/audio_smoke/manifest.jsonl is the only thing
   committed; wavs are regenerated by audio_smoke_data.py and gitignored.
   W2 swaps in real LibriSpeech.

2. Standalone byte-level tokenizer (UTF-8 bytes + a single BOS, vocab=257).
   Avoids depending on a trained nanochat BPE — the d6 GPT is random
   anyway, so vocab choice doesn't matter for "does the gradient flow"
   smoke. W2 onwards uses the real BPE on a real base.
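The byte-level tokenizer in point 2 is small enough to sketch in full. This is an illustration under stated assumptions, not the committed implementation: the message fixes vocab=257 and a single BOS, but giving BOS the id 256 is a guess, and `encode` and `decode` are hypothetical names.

```python
VOCAB_SIZE = 257  # 256 byte values plus one BOS, as stated in the message
BOS_ID = 256      # assumption: BOS takes the one id above the byte range


def encode(text):
    """Prepend BOS, then map each UTF-8 byte to its own token id."""
    return [BOS_ID] + list(text.encode("utf-8"))


def decode(ids):
    """Drop BOS and turn the remaining byte ids back into a string."""
    data = bytes(i for i in ids if i != BOS_ID)
    return data.decode("utf-8", errors="replace")
```

Because every id is either a raw byte or BOS, no training step or vocabulary file is needed, which is what keeps the smoke independent of a trained BPE.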

Caveat documented in doc/todo.md: because the LM is also random and being
trained, the loss-down here mostly reflects the LM memorising 5 short
strings, not Whisper-Projector alignment. That's fine for proving
plumbing; W2 freezes the LM so projector-only gradient is the only path
to lower loss.

2026-05-05 22:39:20 +01:00

.venv/
__pycache__/
*.pyc
dev-ignore/
report.md
eval_bundle/
# Secrets
.env
# Local setup
CLAUDE.md
wandb/
# Claude Code runtime
.claude/
# W1 audio smoke: regenerated by scripts/audio_smoke_data.py, only manifest is committed
data/audio_smoke/wavs/
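
The last rule ignores wavs that a script regenerates on demand. As a rough illustration of why that is safe, a deterministic sine clip can be written with the standard library alone, as sketched below. This is not the contents of audio_smoke_data.py: the function name, the 440 Hz tone, the 1 s duration, and the amplitude are all assumptions. The 16 kHz mono format is chosen because that is the sample rate Whisper expects.

```python
import math
import struct
import wave


def write_sine_wav(path, freq_hz, duration_s=1.0, sample_rate=16000, amplitude=0.5):
    """Write a mono 16-bit PCM sine tone and return the number of frames.

    There is no randomness, so the same arguments always produce the same
    bytes. That property is what lets the wavs stay out of version control.
    """
    num_frames = int(duration_s * sample_rate)
    frames = bytearray()
    for i in range(num_frames):
        sample = amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
        frames += struct.pack("<h", int(round(sample * 32767)))
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))
    return num_frames
```

A call such as `write_sine_wav("clip_440.wav", 440.0)` writes a one-second tone; the file name here is only an example.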