nuonuo/doc/exp05_benchmark.md
Fam Zheng d923aa1e31 NuoNuo: Hippocampal memory module prototype
Hopfield + Hebbian hybrid memory system for LLMs.
Two nights of experiments (16 iterations), validated on LongMemEval (ICLR 2025).

Architecture:
- Single-hop: Two-Stage Hopfield (NN top-20 → softmax settle)
- Multi-hop: Hebbian W matrix with WTA pattern separation
- 64% on LongMemEval (500 questions), retrieval-only, no LLM dependency
- 4ms latency @ 20K memories, ~1GB VRAM

Key findings:
- Hopfield attention solved noise tolerance (20% → 100% vs flat Hebbian)
- WTA pattern separation enables 20K+ capacity
- Multi-hop associative chains (6 hops, CosSim=1.0) — RAG can't do this
- MiniLM-L6 is optimal (discrimination gap > absolute similarity)
- Paraphrase cue augmentation: 55% → 100% on synthetic, 36% → 64% on benchmark
- SNN encoder viable (CosSim 0.99) but not needed for current architecture
2026-04-07 10:37:24 +01:00


# Experiment 5: Performance Benchmark
## Learning Throughput
| code_dim | k | Throughput | Time for 5,000 memories |
|----------|---|--------|-----------|
| 8192 | 50 | **794/s** | 6.3s |
| 16384 | 50 | 211/s | 23.7s |
| 32768 | 50 | 54/s | 92.7s |
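The bottleneck is the outer-product update, which costs O(code_dim²) per memory.
At 16384 dimensions, 211/s means that a day of conversation (assuming 1,000 memories) takes only ~5 seconds to learn.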
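
The quadratic cost can be illustrated with a minimal pure-Python sketch. It assumes the update is a dense `W += outer(target, cue)` over k-sparse binary codes; the function names and the toy codes are hypothetical and are not taken from the NuoNuo code. It counts the entries one update visits at two toy sizes, then compares the predicted 4x step with the ratios in the table above.

```python
# Measured learning throughput (memories per second), copied from the table above.
THROUGHPUT = {8192: 794, 16384: 211, 32768: 54}


def hebbian_update(W, cue_code, target_code):
    """Dense outer-product update: W[i][j] += target[i] * cue[j].

    Returns the number of entries visited, which is always code_dim ** 2.
    """
    visited = 0
    for i, t in enumerate(target_code):
        row = W[i]
        for j, c in enumerate(cue_code):
            row[j] += t * c
            visited += 1
    return visited


def toy_code(code_dim, k):
    """Hypothetical k-sparse binary code: ones in the first k positions."""
    return [1.0 if i < k else 0.0 for i in range(code_dim)]


def count_update_ops(code_dim, k=4):
    """Run one update on a fresh zero matrix and report the entries visited."""
    W = [[0.0] * code_dim for _ in range(code_dim)]
    code = toy_code(code_dim, k)
    return hebbian_update(W, code, code)


def seconds_to_learn(n_memories, code_dim):
    """Estimated time to store n_memories at a measured throughput."""
    return n_memories / THROUGHPUT[code_dim]


# Toy sizes keep the pure-Python loops fast; the benchmarked dims are 8192 and up.
ops_ratio = count_update_ops(64) / count_update_ops(32)
print(f"work ratio when code_dim doubles: {ops_ratio:.1f}x")
print(f"measured 8192 -> 16384: {THROUGHPUT[8192] / THROUGHPUT[16384]:.2f}x")
print(f"measured 16384 -> 32768: {THROUGHPUT[16384] / THROUGHPUT[32768]:.2f}x")
print(f"1,000 memories at 16384: {seconds_to_learn(1000, 16384):.1f} s")
```

The measured slowdowns come out slightly below 4x per doubling, which is in line with a dense update that dominates the cost plus a small fixed overhead per memory.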
## Recall Latency
| code_dim | k | Latency |
|----------|---|------|
| 8192 | 50 | **0.35 ms** |
| 16384 | 50 | 1.26 ms |
| 32768 | 50 | 4.63 ms |
**16384 dimensions: 1.3 ms/query**, which is fast enough for LLM conversation scenarios (an LLM takes ~20 ms to generate a single token).
## Multi-hop Latency
| Hops | Latency (code_dim=16384) |
|------|-------------------|
| 1 | 1.26 ms |
| 2 | 2.45 ms |
| 3 | 3.64 ms |
| 5 | 6.03 ms |
| 10 | 12.05 ms |
Linear growth: ~1.2 ms/hop. Ten hops at 12 ms is still far faster than LLM inference.
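
A toy sketch of why the cost grows linearly: if each hop is one matrix-vector product followed by a winner-take-all step, then n hops cost n times one hop. The code below is an illustration under that assumption, not the project's implementation. The helper names are hypothetical, and the codes use non-overlapping blocks of ones so that recall is exact.

```python
def sparse_code(index, k, code_dim):
    """Hypothetical toy code: k ones in a block that no other code shares."""
    code = [0.0] * code_dim
    for i in range(index * k, (index + 1) * k):
        code[i] = 1.0
    return code


def wta(activation, k):
    """Winner-take-all: keep the k most active units as a binary code."""
    order = sorted(range(len(activation)), key=activation.__getitem__, reverse=True)
    code = [0.0] * len(activation)
    for i in order[:k]:
        code[i] = 1.0
    return code


def store(W, cue, target):
    """Hebbian outer-product update W += outer(target, cue)."""
    for i, t in enumerate(target):
        if t:
            for j, c in enumerate(cue):
                W[i][j] += t * c


def hop(W, code, k):
    """One associative step: matrix-vector product, then WTA."""
    activation = [sum(w * c for w, c in zip(row, code)) for row in W]
    return wta(activation, k)


def multi_hop(W, code, k, hops):
    """Follow the chain by applying one hop at a time."""
    for _ in range(hops):
        code = hop(W, code, k)
    return code


code_dim, k = 32, 4
chain = [sparse_code(n, k, code_dim) for n in range(7)]  # 7 patterns, 6 links
W = [[0.0] * code_dim for _ in range(code_dim)]
for cue, target in zip(chain, chain[1:]):
    store(W, cue, target)

result = multi_hop(W, chain[0], k, hops=6)
print("6-hop recall exact:", result == chain[6])

# Per-hop cost implied by the first and last rows of the table above.
per_hop_ms = (12.05 - 1.26) / (10 - 1)
print(f"per-hop latency: {per_hop_ms:.2f} ms")
```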
## GPU Memory
| code_dim | W matrix | Total usage |
|----------|---------|--------|
| 4096 | 64 MB | 70 MB |
| 8192 | 256 MB | 268 MB |
| **16384** | **1024 MB** | **1048 MB** |
| 32768 | 4096 MB | 4144 MB |
Recommended: **16384 dimensions = 1 GB VRAM**, which easily coexists with Gemma 4B on an RTX 4090 (24 GB).
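
The W matrix column matches a dense float32 matrix of shape code_dim × code_dim. The remaining overhead (6, 12, 24, 48 MB) happens to equal a float32 matrix of shape code_dim × 384, where 384 is the output dimension of all-MiniLM-L6-v2. Both the float32 storage and the projection-matrix reading are inferences from the numbers, not something the source states. A short sketch that reproduces the table under those assumptions:

```python
BYTES_PER_FLOAT32 = 4   # assumption: W is stored as dense float32
EMBED_DIM = 384         # output dimension of all-MiniLM-L6-v2
MIB = 1024 ** 2


def w_matrix_mib(code_dim):
    """Size of a dense code_dim x code_dim float32 matrix in MiB."""
    return code_dim ** 2 * BYTES_PER_FLOAT32 / MIB


def projection_mib(code_dim, embed_dim=EMBED_DIM):
    """Size of a hypothetical embed_dim -> code_dim float32 projection in MiB."""
    return code_dim * embed_dim * BYTES_PER_FLOAT32 / MIB


def total_mib(code_dim):
    """Estimated total footprint: W plus the assumed projection matrix."""
    return w_matrix_mib(code_dim) + projection_mib(code_dim)


for code_dim in (4096, 8192, 16384, 32768):
    print(f"{code_dim:>6}  W={w_matrix_mib(code_dim):.0f} MiB  "
          f"total={total_mib(code_dim):.0f} MiB")
```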
## End-to-End Pipeline (including the embedding model)
| Step | Latency |
|------|------|
| Embedding (all-MiniLM-L6-v2) | 1.8 ms |
| Hebbian Recall (1-hop) | 1.3 ms |
| **Total** | **3.1 ms** |
Embedding and recall take comparable time. The ~3 ms total is far below LLM generation latency.
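
As a quick check of the budget, the sketch below sums the two steps and compares the total with the ~20 ms per-token figure quoted in the recall latency section. The variable names are illustrative only.

```python
# Step latencies in milliseconds, copied from the table above.
STEP_LATENCY_MS = {
    "Embedding (all-MiniLM-L6-v2)": 1.8,
    "Hebbian Recall (1-hop)": 1.3,
}
TOKEN_LATENCY_MS = 20.0  # approximate per-token LLM latency quoted earlier

total_ms = sum(STEP_LATENCY_MS.values())
share_of_token = total_ms / TOKEN_LATENCY_MS

print(f"total: {total_ms:.1f} ms")
print(f"share of one token: {share_of_token:.1%}")
```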
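## Conclusions
- code_dim=16384 is the best balance: 1 GB VRAM, 1.3 ms recall, 211/s learning
- Performance is not the bottleneck at all; LLM inference is
- 32768 dimensions is also an option if more capacity is needed (4 GB), but learning is about 4x slower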