nuonuo/doc/p1_embedding_models.md
Fam Zheng d923aa1e31 NuoNuo: Hippocampal memory module prototype
Hopfield + Hebbian hybrid memory system for LLMs.
Two nights of experiments (16 iterations), validated on LongMemEval (ICLR 2025).

Architecture:
- Single-hop: Two-Stage Hopfield (NN top-20 → softmax settle)
- Multi-hop: Hebbian W matrix with WTA pattern separation
- 64% on LongMemEval (500 questions), retrieval-only, no LLM dependency
- 4ms latency @ 20K memories, ~1GB VRAM

Key findings:
- Hopfield attention solved noise tolerance (20% → 100% vs flat Hebbian)
- WTA pattern separation enables 20K+ capacity
- Multi-hop associative chains (6 hops, CosSim=1.0) — RAG can't do this
- MiniLM-L6 is optimal (discrimination gap > absolute similarity)
- Paraphrase cue augmentation: 55% → 100% on synthetic, 36% → 64% on benchmark
- SNN encoder viable (CosSim 0.99) but not needed for current architecture
2026-04-07 10:37:24 +01:00


# P1: Embedding Model Comparison
## Key Finding: A Bigger Model ≠ Better Recall (Counterintuitive)
| Model | Dim | Same-Pair Sim | Diff-Pair Sim | **Gap** | **Recall** | Speed |
|-------|-----|---------------|---------------|---------|-----------|-------|
| **MiniLM (22M)** | 384 | 0.653 | 0.090 | **0.563** | **60%** | 11K/s |
| BGE-small (33M) | 384 | 0.808 | 0.534 | 0.274 | 25% | 7K/s |
| BGE-base (109M) | 768 | 0.793 | 0.506 | 0.287 | 35% | 5K/s |
| E5-small (33M) | 384 | 0.890 | 0.790 | 0.100 | 10% | 9K/s |
## Why
Recall depends on the **discrimination gap**, not on absolute similarity.
BGE/E5 are optimized for retrieval tasks and tend to map all texts into a narrow cone (high baseline similarity). As a result:
- The similarity margin between the correct cue and the background is too small
- Hopfield softmax attention cannot concentrate on the correct answer
MiniLM's embedding space is more spread out:
- Background texts really are dissimilar (0.09)
- Even when the paraphrase is imperfect (0.65), the relative margin is far larger
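The failure mode described above can be made concrete with the table's own numbers. A minimal sketch of modern-Hopfield-style softmax attention over cue-memory similarities; the inverse temperature `beta=8.0` and the 100 background memories are illustrative assumptions, not values from the experiments:

```python
import numpy as np

def attention_weight(same_sim, diff_sim, n_background=100, beta=8.0):
    """Softmax weight landing on the correct memory when one cue-matching
    item competes with n_background distractors.  beta and n_background
    are illustrative assumptions, not measured parameters."""
    logits = beta * np.array([same_sim] + [diff_sim] * n_background)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w[0]  # weight on the correct memory

# Similarities taken from the table above
print(f"MiniLM   gap=0.563 -> weight on target: {attention_weight(0.653, 0.090):.3f}")
print(f"E5-small gap=0.100 -> weight on target: {attention_weight(0.890, 0.790):.3f}")
```

With MiniLM's gap, roughly half the attention mass lands on the correct memory despite 100 distractors; with E5-small's gap, the correct memory gets only a few percent and the settle step is dominated by background, which is why absolute similarity (E5's 0.890 vs MiniLM's 0.653) is irrelevant here.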
## Conclusions
1. **MiniLM is the current optimum**: fastest, smallest, best discrimination
2. **Don't blindly switch to a larger model**: the gap matters more than absolute similarity
3. The right way to improve recall is **paraphrase augmentation** (validated at 95%), not swapping the embedding model
4. If you do switch models, pick the one with the **largest gap**, not the highest same-pair similarity
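Conclusion 4 implies a concrete selection criterion. A sketch of how the gap could be measured for any candidate model, assuming you already have `(n, d)` embedding arrays for cues, their matching targets, and background texts; the function name and inputs are hypothetical, not part of the prototype:

```python
import numpy as np

def discrimination_gap(cue_embs, target_embs, background_embs):
    """gap = mean cos-sim(cue, its target) - mean cos-sim(cue, background).
    Inputs are (n, d) arrays from any candidate embedding model
    (hypothetical helper; rows are L2-normalized here, so raw model
    outputs are fine)."""
    def norm(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)
    c, t, b = norm(cue_embs), norm(target_embs), norm(background_embs)
    same = np.mean(np.sum(c * t, axis=1))  # paired cue<->target similarities
    diff = np.mean(c @ b.T)                # all cue<->background similarities
    return same - diff
```

Encoding the same held-out cue/target/background set with each candidate model and comparing `discrimination_gap` values applies conclusion 4 directly: prefer the largest gap, not the highest same-pair similarity.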
## Impact on the Architecture
Keep MiniLM (384-dim). There is no need to enlarge code_dim to fit a larger embedding.
This saves VRAM (102 MB vs 656 MB) and preserves speed (11K/s vs 5K/s).