# P4: Memory Lifecycle Management

## Deduplication

**Feasible.** A cosine threshold of 0.7 correctly identified 2 groups of near-duplicates, reducing 9 memories to 6:
- "The database is slow" / "Database is really slow today" / "DB performance terrible" → merged
- "The API returns 500 errors" / "Getting 500 errors from API" → merged

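The implementation is simple: pairwise cosine similarity on cue embeddings → group → keep the best memory per group.
It is O(N²), but it can run offline (nightly consolidation) or be accelerated with ANN (approximate nearest neighbor) search.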
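
The pairwise → group → keep-best pipeline could look roughly like the sketch below. It is illustrative only: the memory layout (dicts with `embedding` and `importance` keys), the single-linkage grouping via union-find, and the choice of "best" as the highest importance are all assumptions, since the notes do not specify them.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def dedup_groups(embeddings, threshold=0.7):
    """Group indices connected by a pairwise similarity above threshold."""
    parent = list(range(len(embeddings)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # O(N^2) comparison of every pair of embeddings.
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine(embeddings[i], embeddings[j]) > threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(embeddings)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def deduplicate(memories, threshold=0.7):
    """Keep one memory per group; 'best' = highest importance (assumption)."""
    groups = dedup_groups([m["embedding"] for m in memories], threshold)
    return [max((memories[i] for i in g), key=lambda m: m["importance"])
            for g in groups]
```

Single-linkage grouping merges chains (A~B and B~C puts A and C together even if they are not similar to each other), which may or may not be desirable at a threshold as permissive as 0.7.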

## Importance Scoring

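The heuristic rules were correct in 6 of 7 cases:
- Keyword detection (crash, compromised, secret → critical) works
- Answers longer than 15 words are more likely to contain useful information
- Simple Q&A (time, weather) is correctly marked as low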

Once an LLM is available, it can do the scoring instead: more accurate, but with added latency.
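
As a rough illustration of the three heuristic rules, a scorer might be sketched as follows. The function name, the numeric scores (1.0 / 0.7 / 0.4 / 0.1), the prefix matching, and the rule precedence are assumptions made for the example; only the keywords, the 15-word cutoff, and the time/weather topics come from the notes.

```python
import re

CRITICAL_KEYWORDS = ("crash", "compromised", "secret")
TRIVIAL_TOPICS = {"time", "weather"}

def score_importance(question, answer):
    """Return a score in [0, 1]; the numeric values are illustrative."""
    words = set(re.findall(r"[a-z']+", f"{question} {answer}".lower()))
    # Critical keywords win; prefix match so "crashed" also counts.
    if any(w.startswith(k) for w in words for k in CRITICAL_KEYWORDS):
        return 1.0
    # Simple Q&A about time or weather is marked low.
    if words & TRIVIAL_TOPICS:
        return 0.1
    # Longer answers are more likely to carry useful information.
    return 0.7 if len(answer.split()) > 15 else 0.4
```

Exact-token matching on the trivial topics avoids false hits such as "sometimes", but a real rule set would likely need a broader vocabulary.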

## Forgetting Strategies

The three strategies (FIFO / LRU / importance-weighted) performed identically in the current tests, because the test data has no differentiated access pattern.

A real system should use a weighted combination of **importance + access count + recency**:
```
forget_score = age_days * 0.3 + (max_access - access_count) * 0.5 + (1 - importance) * 0.2
```
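Memories with the highest score are forgotten first: the score grows with age, with fewer accesses, and with lower importance.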
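
A minimal sketch of capacity pruning with this score is shown below. Because the score increases with age, with fewer accesses, and with lower importance, the sketch evicts the highest-scoring memories. The dict keys (`age_days`, `access_count`, `importance`) and the `prune` helper are hypothetical names; the weights are the ones from the formula above.

```python
def forget_score(age_days, access_count, max_access, importance):
    """Higher score = stronger candidate for forgetting."""
    return (age_days * 0.3
            + (max_access - access_count) * 0.5
            + (1 - importance) * 0.2)

def prune(memories, capacity):
    """Keep the `capacity` memories with the lowest forget_score."""
    if len(memories) <= capacity:
        return list(memories)
    max_access = max(m["access_count"] for m in memories)
    ranked = sorted(
        memories,
        key=lambda m: forget_score(
            m["age_days"], m["access_count"], max_access, m["importance"]
        ),
    )
    # Result is ordered by score (safest first), not by insertion order.
    return ranked[:capacity]
```

Note that the three terms are not on a common scale: `age_days` is unbounded while `1 - importance` stays within [0, 1], so age tends to dominate for old memories unless the inputs are normalized first.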

## Recommendations for Integrating into hippocampus.py

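1. **At store time**: importance scoring (heuristic or LLM); memories below the threshold are not stored
2. **Nightly**: deduplication (merge when cos > 0.7) + capacity check (when over the limit, prune by forget_score)
3. **At recall time**: automatically increment access_count by 1 (already implemented)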
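
The three hooks could be wired together along these lines. This is a hypothetical sketch, not the actual hippocampus.py interface: the class name `MemoryStore`, the `min_importance` default, and the caller-supplied `deduplicate` / `prune` callables (each taking a list of `(key, memory)` pairs) are all invented for illustration.

```python
class MemoryStore:
    """Hypothetical store; not the real hippocampus.py API."""

    def __init__(self, capacity, min_importance=0.3):
        self.capacity = capacity
        self.min_importance = min_importance  # assumed threshold
        self.memories = {}

    def store(self, key, text, importance):
        # Step 1: reject memories scored below the threshold.
        if importance < self.min_importance:
            return False
        self.memories[key] = {
            "text": text,
            "importance": importance,
            "access_count": 0,
        }
        return True

    def recall(self, key):
        # Step 3: each successful recall increments access_count.
        memory = self.memories.get(key)
        if memory is not None:
            memory["access_count"] += 1
        return memory

    def nightly(self, deduplicate, prune):
        # Step 2: merge near-duplicates, then trim only if over capacity.
        kept = deduplicate(list(self.memories.items()))
        if len(kept) > self.capacity:
            kept = prune(kept, self.capacity)
        self.memories = dict(kept)
```

Passing the deduplication and pruning steps in as callables keeps the store independent of how similarity and forget scores are computed.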