nuonuo/doc/architecture.md
Fam Zheng d923aa1e31 NuoNuo: Hippocampal memory module prototype
Hopfield + Hebbian hybrid memory system for LLMs.
Two nights of experiments (16 iterations), validated on LongMemEval (ICLR 2025).

Architecture:
- Single-hop: Two-Stage Hopfield (NN top-20 → softmax settle)
- Multi-hop: Hebbian W matrix with WTA pattern separation
- 64% on LongMemEval (500 questions), retrieval-only, no LLM dependency
- 4ms latency @ 20K memories, ~1GB VRAM

Key findings:
- Hopfield attention solved noise tolerance (20% → 100% vs flat Hebbian)
- WTA pattern separation enables 20K+ capacity
- Multi-hop associative chains (6 hops, CosSim=1.0) — RAG can't do this
- MiniLM-L6 is optimal (discrimination gap > absolute similarity)
- Paraphrase cue augmentation: 55% → 100% on synthetic, 36% → 64% on benchmark
- SNN encoder viable (CosSim 0.99) but not needed for current architecture
2026-04-07 10:37:24 +01:00


# NuoNuo: Hippocampal Memory Module — Architecture v2

## Project Goals

Add a hippocampus-like long-term memory module to an LLM (e.g. Gemma):

- No traditional RAG (vector database + retrieval)
- Memories live in network weights (Hebbian) and in explicit patterns (Hopfield)
- Fuzzy, paraphrase-tolerant retrieval
- Multi-hop associative reasoning (A→B→C), which RAG cannot do
- Nightly consolidation / forgetting

## Core Architecture

```
┌─────────────────────────────────────────────────────────┐
│  Query Embedding (from Sentence Transformer)             │
│                    ↓                                     │
│  ┌──── Stage 1: NN Pre-filter ────────────────────────┐ │
│  │  cosine(query, stored_cues) → top-20 candidates     │ │
│  │  O(N) brute force, O(log N) with FAISS              │ │
│  └─────────────────────┬──────────────────────────────┘ │
│                        ↓                                 │
│  ┌──── Stage 2: Hopfield Settle ──────────────────────┐ │
│  │  softmax(β · query @ candidates^T) → attention       │ │
│  │  Iterate 3 steps → converge to nearest attractor     │ │
│  │  Aggregate attention by memory_id (cue variants)     │ │
│  └─────────────────────┬──────────────────────────────┘ │
│                        ↓                                 │
│  ┌──── Optional: Multi-hop Hebbian Chain ─────────────┐ │
│  │  Settled cue → WTA code → W @ code → next target     │ │
│  │  Repeat for N hops (A → B → C → ...)                 │ │
│  └─────────────────────┬──────────────────────────────┘ │
│                        ↓                                 │
│               Retrieved memories                          │
└─────────────────────────────────────────────────────────┘
```
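The two retrieval stages can be sketched in a few lines of numpy. This is a minimal illustration, not the actual `hippocampus.py` API; the function name and return values are made up for the example.

```python
import numpy as np

def two_stage_recall(query, cues, beta=16.0, top_k=20, steps=3):
    """Stage 1: cosine pre-filter to top_k candidates.
    Stage 2: iterated softmax attention (modern-Hopfield settle)."""
    # Normalize so dot products are cosine similarities
    q = query / np.linalg.norm(query)
    C = cues / np.linalg.norm(cues, axis=1, keepdims=True)

    # Stage 1: brute-force cosine scan, keep the top_k candidate indices
    sims = C @ q
    idx = np.argsort(sims)[-top_k:]
    cand = C[idx]                        # (top_k, dim)

    # Stage 2: settle toward the nearest stored attractor
    state = q
    for _ in range(steps):
        attn = np.exp(beta * (cand @ state))
        attn /= attn.sum()
        state = attn @ cand              # convex combination of candidates
        state /= np.linalg.norm(state)

    return idx, attn, state
```

With β = 16 the attention is sharp enough that, for a lightly noised query, the state converges onto the single matching stored cue within the 3 iterations.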

## Biological Analogy

| Brain region | System component | Function |
|---|---|---|
| Entorhinal cortex (EC) | Sentence Transformer | Perceptual encoding |
| Dentate gyrus (DG) | WTA Pattern Separation | Sparsification / orthogonalization |
| CA3 | Hebbian W matrix | Associative storage + multi-hop |
| CA1 | Hopfield attention | Retrieval output |
| Sleep replay | W rebuild | Consolidation / forgetting |
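The DG→CA3 path above reduces to two operations: a k-winner-take-all code from a fixed random projection, and Hebbian outer-product storage of cue→target links. A toy sketch (with `code_dim` shrunk from 16384 to keep the demo small; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim, code_dim, k = 384, 2048, 50

# DG analogue: fixed random projection followed by k-winner-take-all
P = rng.normal(size=(code_dim, embed_dim))

def wta_code(x):
    h = P @ x
    code = np.zeros(code_dim)
    code[np.argsort(h)[-k:]] = 1.0       # keep only the k strongest units
    return code

# CA3 analogue: Hebbian outer-product storage of cue -> target links
W = np.zeros((code_dim, code_dim))

def associate(cue_emb, target_emb):
    global W
    W += np.outer(wta_code(target_emb), wta_code(cue_emb))

def hop(cue_emb):
    """One associative hop: cue code -> W -> thresholded target code."""
    h = W @ wta_code(cue_emb)
    code = np.zeros(code_dim)
    code[np.argsort(h)[-k:]] = 1.0
    return code
```

Because the k-sparse codes of unrelated embeddings overlap in only ~k²/code_dim units, many pairs can be superimposed in W with little crosstalk; chaining `hop` calls gives the A→B→C traversal.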

## Experimental Validation Summary

| Capability | Result | Experiment |
|---|---|---|
| Paraphrase recall (+ augmentation) | 95% | exp07e |
| Multi-hop (3 hops, 500 background memories) | 100% (sim=1.0) | exp07b, 07c |
| Scale (20K memories) | 80% | exp07d |
| Exact cue recall | 100% | exp02c |
| Memory capacity | 20K+ | exp02d |
| Recall latency | 4 ms @ 20K | exp05, 07d |
| SNN encoder roundtrip | CosSim 0.99 | exp01b |

## Recommended Parameters

| Parameter | Value | Notes |
|---|---|---|
| embed_dim | 384-768 | Depends on the Sentence Transformer |
| code_dim | 16384 | Hebbian capacity 20K+ |
| k (WTA) | 50 | Balances noise tolerance and capacity |
| β (Hopfield) | 16.0 | Moderate sharpness |
| hopfield_top_k | 20 | Candidate set size; smaller is more stable |
| hopfield_steps | 3 | Convergence iterations |
| cue_variants | 3-5 per memory | LLM-generated paraphrases |
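For reference, the recommended defaults gathered into one config object. `HippocampusConfig` is a hypothetical name for this sketch, not a class from `hippocampus.py`:

```python
from dataclasses import dataclass

@dataclass
class HippocampusConfig:
    embed_dim: int = 384        # 384-768 depending on the Sentence Transformer
    code_dim: int = 16384       # Hebbian capacity 20K+
    k: int = 50                 # WTA winners: noise tolerance vs capacity
    beta: float = 16.0          # Hopfield inverse temperature, moderate sharpness
    hopfield_top_k: int = 20    # Stage-1 candidate set; smaller is more stable
    hopfield_steps: int = 3     # settle iterations
    cue_variants: int = 4       # LLM-generated paraphrases per memory (3-5)
```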

## VRAM Budget (RTX 4090, 24 GB)

| Component | Size |
|---|---|
| Hebbian W (16384²) | 1024 MB |
| WTA projection (384×16384) | 24 MB |
| Hopfield store (20K × 384 × 2) | ~60 MB |
| Sentence Transformer | ~90 MB |
| Gemma 4B (fp16) | ~8 GB |
| **Total** | ~9.2 GB |
| **Headroom** | ~14.8 GB |
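The memory-module entries in the table follow directly from the tensor shapes at fp32 (4 bytes per element):

```python
MB = 1 << 20   # bytes per MiB
FP32 = 4       # bytes per element

hebbian_w = 16384**2 * FP32 / MB           # 1024.0 MB
wta_proj  = 384 * 16384 * FP32 / MB        # 24.0 MB
hopfield  = 20_000 * 384 * 2 * FP32 / MB   # ~58.6 MB (cue + target vector per memory)
```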

## Gemma Integration

Recommended approach: Context Injection

```python
# 1. User input → embed
query_emb = encoder.encode(user_input)

# 2. Recall memories
results = memory.recall(query_emb, top_k=3)
chain = memory.recall_chain(query_emb, hops=2)

# 3. Format and inject
context = format_memories(results + chain)
prompt = f"[Recalled memories]\n{context}\n\n[User]\n{user_input}"

# 4. Generate response
response = gemma.generate(prompt)

# 5. Store new memory (with LLM-generated paraphrases)
response_emb = encoder.encode(response)
paraphrases = gemma.generate(f"Generate 3 paraphrases of: {user_input}")
memory.store(query_emb, response_emb,
             cue_variants=[encoder.encode(p) for p in paraphrases])
```

## File Structure

```
src/nuonuo/
├── hippocampus.py    # Final module v2 (Hopfield + Hebbian hybrid)
├── encoder.py        # SNN spike encoder/decoder
├── memory.py         # STDP + Hebbian memory (historical)
├── consolidation.py  # Sleep consolidation (historical)
└── __init__.py

doc/
├── architecture.md   # This file
├── findings.md       # Core findings and counterintuitive conclusions
├── exp01_*.md        # SNN Encoder
├── exp02_*.md        # Associative Recall
├── exp03_*.md        # Consolidation
├── exp04_*.md        # Real Embeddings
├── exp05_*.md        # Benchmarks
├── exp06_*.md        # BioHash
└── exp07_*.md        # Hopfield (breakthrough)
```