feat: step isolation — each step runs in independent sub-loop

Main loop becomes a coordinator that reviews step summaries and may revise the plan. Each step gets its own chat history and scratchpad, preventing context pollution across steps.

- Add run_step_loop with 50-iteration limit and isolated context
- Replace advance_step with step_done (sub-loop only)
- Add coordinator review after each step completion
- Add scratchpad 8K capacity check
- Add 33 unit tests for state, tools, and message building
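
The sub-loop described in this message can be pictured roughly as follows. This is a minimal sketch, not the code in src/agent.rs (whose hunks are not shown on this page): `Turn`, `run_step_loop_sketch`, `MAX_STEP_ITERATIONS`, and the `next_turn` closure standing in for the LLM call are all hypothetical names, and `PartialEq` is derived here only to make the sketch easy to check. Only `StepResult`, `StepResultStatus`, and the 50-iteration cap come from the commit itself.

```rust
// Hypothetical sketch of a per-step sub-loop with an iteration cap.
// StepResult / StepResultStatus mirror the types added in src/state.rs;
// everything else is an illustrative assumption.
#[derive(Debug, Clone, PartialEq)]
pub enum StepResultStatus {
    Done,
    Failed { error: String },
    NeedsApproval { message: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct StepResult {
    pub status: StepResultStatus,
    pub summary: String,
}

/// What one model turn asked for (assumed shape, standing in for real tool calls).
pub enum Turn {
    ToolCall(String),
    StepDone { summary: String },
    WaitForApproval { reason: String },
}

const MAX_STEP_ITERATIONS: usize = 50;

/// Runs a single step. `next_turn` stands in for the LLM request and only
/// ever sees this step's own history.
pub fn run_step_loop_sketch<F>(mut next_turn: F) -> StepResult
where
    F: FnMut(&[String]) -> Turn,
{
    // Fresh history per step: nothing from earlier steps leaks in.
    let mut history: Vec<String> = Vec::new();

    for _ in 0..MAX_STEP_ITERATIONS {
        match next_turn(&history) {
            // Ordinary tool call: record it and keep looping.
            Turn::ToolCall(call) => history.push(call),
            // step_done ends the sub-loop and hands the summary to the coordinator.
            Turn::StepDone { summary } => {
                return StepResult { status: StepResultStatus::Done, summary };
            }
            // wait_for_approval pauses the step.
            Turn::WaitForApproval { reason } => {
                return StepResult {
                    status: StepResultStatus::NeedsApproval { message: reason },
                    summary: String::new(),
                };
            }
        }
    }

    // The cap was hit without step_done being called.
    StepResult {
        status: StepResultStatus::Failed {
            error: format!("step exceeded {} iterations", MAX_STEP_ITERATIONS),
        },
        summary: String::new(),
    }
}
```

In this sketch only the returned `StepResult` crosses the step boundary, which is what keeps one step's tool chatter out of the next step's context.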
2026-03-08 08:35:41 +00:00 · 2026-03-08 08:35:41 +00:00 · feb2a08d97
commit feb2a08d97
parent 47546a9d15
4 changed files with 1159 additions and 225 deletions
--- a/src/agent.rs
+++ b/src/agent.rs
--- a/src/prompts/execution.md
+++ b/src/prompts/execution.md
@@ -1,25 +1,27 @@
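-You are an AI agent, currently in the [execution phase]. Focus on completing the current step's task.
+You are the coordinator of an AI agent, currently in the [execution phase]. Each step is carried out by an independent sub-executor; you review the results and coordinate overall progress.

-Available tools:
-- execute: run shell commands
-- read_file / write_file / list_files: file operations
-- start_service / stop_service: manage background services
-- update_requirement: update the project requirement
-- advance_step: finish the current step and move to the next (a summary is required)
-- update_scratchpad: save key information that persists across steps
+## Your role

-Workflow:
-1. Read the "current step" description below
-2. Use tools to perform the required operations
-3. When done, call advance_step(summary=...) to move to the next step
-4. After the last step, reply with a brief summary (without calling tools) to finish
+- Review the execution summary of each step
+- Based on the result, decide whether to continue to the next step, revise the remaining plan, or stop execution
+- Maintain the global scratchpad, recording key information that spans steps
+
+## Available tools
+
+- update_plan: revise the execution plan (provide the full step list; the system diffs it automatically)
+- update_scratchpad: update the global scratchpad (key information that persists across steps)
+- update_requirement: update the project requirement description
+
+## Workflow
+
+When you receive a step execution summary:
+1. Review the summary and judge whether the step achieved its intended goal
+2. If the remaining plan needs adjusting, use update_plan
+3. If no adjustment is needed, reply to confirm and continue (no tool call required)

 Environment:
-- The working directory is an isolated project workspace; a Python venv is already activated (.venv/)
-- Install dependencies with `uv add <package>` or `pip install <package>`
+- The working directory is an isolated project workspace
 - Static file access: /api/projects/{project_id}/files/{filename}
-- Background service access: /api/projects/{project_id}/app/ (the start command must listen on 0.0.0.0:$PORT)
-- [Important] The app is accessed through a reverse proxy, so fetch/XHR requests in front-end HTML/JS must use relative paths (e.g. fetch('todos')) and must never use paths starting with / (e.g. fetch('/todos')), otherwise they return 404
-- Knowledge base tools: kb_search(query) searches for relevant snippets, kb_read() reads the full text
+- Background service access: /api/projects/{project_id}/app/

 Please reply in Chinese.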
--- a/src/prompts/step_execution.md
+++ b/src/prompts/step_execution.md
@@ -0,0 +1,34 @@
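+You are a step executor, responsible for completing the step currently assigned to you.
+
+## Available tools
+
+- execute: run shell commands
+- read_file / write_file / list_files: file operations
+- start_service / stop_service: manage background services
+- kb_search / kb_read: search and read the knowledge base
+- update_scratchpad: record intermediate state within this step (discarded when the step ends; put the essentials in the summary)
+- wait_for_approval: pause execution and wait for user confirmation
+- step_done: **must be called when the current step is complete**, with a summary of the work done in this step
+
+## Workflow
+
+1. Read the description and context of the current step
+2. Use tools to perform the required operations
+3. When done, call step_done(summary=...) to report the result
+
+## Rules
+
+- **Focus on the current step**; do nothing outside its scope
+- When done you **must** call step_done(summary); the summary should concisely state what this step did and what the result was
+- Use wait_for_approval(reason) when user confirmation is needed
+- update_scratchpad records intermediate state within this step; it is working memory, not a log, so keep only information that is currently useful
+
+## Environment
+
+- The working directory is an isolated project workspace; a Python venv is already activated (.venv/)
+- Install dependencies with `uv add <package>` or `pip install <package>`
+- Static file access: /api/projects/{project_id}/files/{filename}
+- Background service access: /api/projects/{project_id}/app/ (the start command must listen on 0.0.0.0:$PORT)
+- [Important] The app is accessed through a reverse proxy, so fetch/XHR requests in front-end HTML/JS must use relative paths (e.g. fetch('todos')) and must never use paths starting with / (e.g. fetch('/todos')), otherwise they return 404
+
+Please reply in Chinese.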
--- a/src/state.rs
+++ b/src/state.rs
@@ -2,6 +2,36 @@ use serde::{Deserialize, Serialize};

 use crate::llm::ChatMessage;

+// --- Step result (returned by run_step_loop) ---
+
+#[derive(Debug, Clone)]
+pub struct StepResult {
+    pub status: StepResultStatus,
+    pub summary: String,
+}
+
+#[derive(Debug, Clone)]
+pub enum StepResultStatus {
+    Done,
+    Failed { error: String },
+    NeedsApproval { message: String },
+}
+
+/// Check scratchpad size. Limit: ~8K tokens ≈ 24K bytes.
+const SCRATCHPAD_MAX_BYTES: usize = 24_000;
+
+pub fn check_scratchpad_size(content: &str) -> Result<(), String> {
+    if content.len() > SCRATCHPAD_MAX_BYTES {
+        Err(format!(
+            "Scratchpad 超出容量限制（当前 {} 字节，上限 {} 字节）。请精简内容后重试。",
+            content.len(),
+            SCRATCHPAD_MAX_BYTES,
+        ))
+    } else {
+        Ok(())
+    }
+}
+
 // --- Agent phase state machine ---

 #[derive(Debug, Clone, Serialize, Deserialize)]
@@ -205,3 +235,312 @@ impl AgentState {
        msgs
    }
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    fn make_step(order: i32, title: &str, desc: &str, status: StepStatus) -> Step {
+        Step {
+            order,
+            title: title.into(),
+            description: desc.into(),
+            status,
+            summary: None,
+            user_feedbacks: Vec::new(),
+            db_id: String::new(),
+        }
+    }
+
+    // --- check_scratchpad_size ---
+
+    #[test]
+    fn scratchpad_empty_ok() {
+        assert!(check_scratchpad_size("").is_ok());
+    }
+
+    #[test]
+    fn scratchpad_under_limit_ok() {
+        let content = "a".repeat(SCRATCHPAD_MAX_BYTES - 1);
+        assert!(check_scratchpad_size(&content).is_ok());
+    }
+
+    #[test]
+    fn scratchpad_over_limit_err() {
+        let content = "a".repeat(24_001);
+        let err = check_scratchpad_size(&content).unwrap_err();
+        assert!(err.contains("24001"));
+        assert!(err.contains("24000"));
+    }
+
+    #[test]
+    fn scratchpad_exactly_at_limit() {
+        let content = "a".repeat(SCRATCHPAD_MAX_BYTES);
+        assert!(check_scratchpad_size(&content).is_ok());
+    }
+
+    #[test]
+    fn scratchpad_multibyte_counts_bytes_not_chars() {
+        // 8000 Chinese characters = 24000 bytes (UTF-8), exactly at limit
+        let content = "你".repeat(8000);
+        assert_eq!(content.len(), 24000);
+        assert!(check_scratchpad_size(&content).is_ok());
+
+        // One more char pushes over
+        let content_over = format!("{}你", content);
+        assert!(check_scratchpad_size(&content_over).is_err());
+    }
+
+    // --- first_actionable_step ---
+
+    #[test]
+    fn first_actionable_all_done() {
+        let state = AgentState {
+            phase: AgentPhase::Executing { step: 1 },
+            steps: vec![
+                make_step(1, "A", "a", StepStatus::Done),
+                make_step(2, "B", "b", StepStatus::Done),
+            ],
+            current_step_chat_history: Vec::new(),
+            scratchpad: String::new(),
+        };
+        assert_eq!(state.first_actionable_step(), None);
+    }
+
+    #[test]
+    fn first_actionable_skips_done() {
+        let state = AgentState {
+            phase: AgentPhase::Executing { step: 2 },
+            steps: vec![
+                make_step(1, "A", "a", StepStatus::Done),
+                make_step(2, "B", "b", StepStatus::Pending),
+                make_step(3, "C", "c", StepStatus::Pending),
+            ],
+            current_step_chat_history: Vec::new(),
+            scratchpad: String::new(),
+        };
+        assert_eq!(state.first_actionable_step(), Some(2));
+    }
+
+    #[test]
+    fn first_actionable_finds_running() {
+        let state = AgentState {
+            phase: AgentPhase::Executing { step: 2 },
+            steps: vec![
+                make_step(1, "A", "a", StepStatus::Done),
+                make_step(2, "B", "b", StepStatus::Running),
+            ],
+            current_step_chat_history: Vec::new(),
+            scratchpad: String::new(),
+        };
+        assert_eq!(state.first_actionable_step(), Some(2));
+    }
+
+    #[test]
+    fn first_actionable_finds_waiting_approval() {
+        let state = AgentState {
+            phase: AgentPhase::Executing { step: 1 },
+            steps: vec![
+                make_step(1, "A", "a", StepStatus::WaitingApproval),
+                make_step(2, "B", "b", StepStatus::Pending),
+            ],
+            current_step_chat_history: Vec::new(),
+            scratchpad: String::new(),
+        };
+        assert_eq!(state.first_actionable_step(), Some(1));
+    }
+
+    #[test]
+    fn first_actionable_skips_failed() {
+        let state = AgentState {
+            phase: AgentPhase::Executing { step: 2 },
+            steps: vec![
+                make_step(1, "A", "a", StepStatus::Failed),
+                make_step(2, "B", "b", StepStatus::Pending),
+            ],
+            current_step_chat_history: Vec::new(),
+            scratchpad: String::new(),
+        };
+        assert_eq!(state.first_actionable_step(), Some(2));
+    }
+
+    // --- apply_plan_diff ---
+
+    #[test]
+    fn plan_diff_identical_keeps_done() {
+        let mut state = AgentState::new();
+        state.steps = vec![
+            Step { status: StepStatus::Done, summary: Some("did A".into()),
+                ..make_step(1, "A", "desc A", StepStatus::Done) },
+            make_step(2, "B", "desc B", StepStatus::Pending),
+        ];
+
+        let new_steps = vec![
+            make_step(1, "A", "desc A", StepStatus::Pending),
+            make_step(2, "B", "desc B", StepStatus::Pending),
+        ];
+        state.apply_plan_diff(new_steps);
+
+        assert!(matches!(state.steps[0].status, StepStatus::Done));
+        assert_eq!(state.steps[0].summary.as_deref(), Some("did A"));
+        assert!(matches!(state.steps[1].status, StepStatus::Pending));
+    }
+
+    #[test]
+    fn plan_diff_change_invalidates_from_mismatch() {
+        let mut state = AgentState::new();
+        state.steps = vec![
+            Step { status: StepStatus::Done, summary: Some("did A".into()),
+                ..make_step(1, "A", "desc A", StepStatus::Done) },
+            Step { status: StepStatus::Done, summary: Some("did B".into()),
+                ..make_step(2, "B", "desc B", StepStatus::Done) },
+            make_step(3, "C", "desc C", StepStatus::Pending),
+        ];
+
+        // Change step 2's description → invalidates 2 and 3
+        let new_steps = vec![
+            make_step(1, "A", "desc A", StepStatus::Pending),
+            make_step(2, "B", "desc B CHANGED", StepStatus::Pending),
+            make_step(3, "C", "desc C", StepStatus::Pending),
+        ];
+        state.apply_plan_diff(new_steps);
+
+        assert!(matches!(state.steps[0].status, StepStatus::Done)); // kept
+        assert!(matches!(state.steps[1].status, StepStatus::Pending)); // invalidated
+        assert!(state.steps[1].summary.is_none()); // summary cleared
+        assert!(matches!(state.steps[2].status, StepStatus::Pending)); // invalidated
+    }
+
+    #[test]
+    fn plan_diff_add_new_steps() {
+        let mut state = AgentState::new();
+        state.steps = vec![
+            Step { status: StepStatus::Done, summary: Some("did A".into()),
+                ..make_step(1, "A", "desc A", StepStatus::Done) },
+        ];
+
+        let new_steps = vec![
+            make_step(1, "A", "desc A", StepStatus::Pending),
+            make_step(2, "New", "new step", StepStatus::Pending),
+        ];
+        state.apply_plan_diff(new_steps);
+
+        assert_eq!(state.steps.len(), 2);
+        assert!(matches!(state.steps[0].status, StepStatus::Done));
+        assert!(matches!(state.steps[1].status, StepStatus::Pending));
+        assert_eq!(state.steps[1].title, "New");
+    }
+
+    #[test]
+    fn plan_diff_remove_steps() {
+        let mut state = AgentState::new();
+        state.steps = vec![
+            Step { status: StepStatus::Done, summary: Some("did A".into()),
+                ..make_step(1, "A", "desc A", StepStatus::Done) },
+            make_step(2, "B", "desc B", StepStatus::Pending),
+            make_step(3, "C", "desc C", StepStatus::Pending),
+        ];
+
+        // New plan only has 1 step (same as step 1)
+        let new_steps = vec![
+            make_step(1, "A", "desc A", StepStatus::Pending),
+        ];
+        state.apply_plan_diff(new_steps);
+
+        assert_eq!(state.steps.len(), 1);
+        assert!(matches!(state.steps[0].status, StepStatus::Done));
+    }
+
+    // --- build_step_context ---
+
+    #[test]
+    fn step_context_includes_all_sections() {
+        let state = AgentState {
+            phase: AgentPhase::Executing { step: 2 },
+            steps: vec![
+                Step { status: StepStatus::Done, summary: Some("installed deps".into()),
+                    ..make_step(1, "Setup", "install deps", StepStatus::Done) },
+                make_step(2, "Build", "compile code", StepStatus::Running),
+                make_step(3, "Test", "run tests", StepStatus::Pending),
+            ],
+            current_step_chat_history: Vec::new(),
+            scratchpad: "key=value".into(),
+        };
+
+        let ctx = state.build_step_context("Build a web app");
+
+        assert!(ctx.contains("## 需求\nBuild a web app"));
+        assert!(ctx.contains("## 计划概览"));
+        assert!(ctx.contains("1. Setup  done"));
+        assert!(ctx.contains("2. Build  >> current"));
+        assert!(ctx.contains("3. Test"));
+        assert!(ctx.contains("## 当前步骤（步骤 2）"));
+        assert!(ctx.contains("标题：Build"));
+        assert!(ctx.contains("描述：compile code"));
+        assert!(ctx.contains("## 已完成步骤摘要"));
+        assert!(ctx.contains("installed deps"));
+        assert!(ctx.contains("## 备忘录\nkey=value"));
+    }
+
+    #[test]
+    fn step_context_user_feedback() {
+        let state = AgentState {
+            phase: AgentPhase::Executing { step: 1 },
+            steps: vec![
+                Step {
+                    user_feedbacks: vec!["please use React".into()],
+                    ..make_step(1, "Setup", "setup project", StepStatus::Running)
+                },
+            ],
+            current_step_chat_history: Vec::new(),
+            scratchpad: String::new(),
+        };
+
+        let ctx = state.build_step_context("Build app");
+        assert!(ctx.contains("用户反馈"));
+        assert!(ctx.contains("please use React"));
+    }
+
+    // --- build_messages ---
+
+    #[test]
+    fn build_messages_planning() {
+        let state = AgentState::new();
+        let msgs = state.build_messages("system prompt", "requirement text");
+
+        assert_eq!(msgs.len(), 2);
+        assert_eq!(msgs[0].role, "system");
+        assert_eq!(msgs[0].content.as_deref(), Some("system prompt"));
+        assert_eq!(msgs[1].role, "user");
+        assert_eq!(msgs[1].content.as_deref(), Some("requirement text"));
+    }
+
+    #[test]
+    fn build_messages_executing_includes_history() {
+        let state = AgentState {
+            phase: AgentPhase::Executing { step: 1 },
+            steps: vec![make_step(1, "Do thing", "details", StepStatus::Running)],
+            current_step_chat_history: vec![
+                ChatMessage { role: "assistant".into(), content: Some("let me help".into()), tool_calls: None, tool_call_id: None },
+            ],
+            scratchpad: String::new(),
+        };
+
+        let msgs = state.build_messages("sys", "req");
+        assert_eq!(msgs.len(), 3); // system + user context + 1 history
+        assert_eq!(msgs[2].role, "assistant");
+    }
+
+    #[test]
+    fn build_messages_completed_minimal() {
+        let state = AgentState {
+            phase: AgentPhase::Completed,
+            steps: Vec::new(),
+            current_step_chat_history: Vec::new(),
+            scratchpad: String::new(),
+        };
+
+        let msgs = state.build_messages("sys", "req");
+        assert_eq!(msgs.len(), 1); // only system
+    }
+}
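
The `plan_diff_*` tests above pin down the observable behaviour of `apply_plan_diff` without showing its body. A minimal sketch that would satisfy those four tests, assuming steps are matched position by position on title and description, might look like this. `PlanStep`, `Status`, and `apply_plan_diff_sketch` are hypothetical stand-ins for the real `Step`, `StepStatus`, and method, and the real implementation may compare different fields.

```rust
// Hypothetical reconstruction inferred from the tests only; not the code
// in src/state.rs.
#[derive(Debug, Clone, PartialEq)]
enum Status {
    Pending,
    Done,
}

#[derive(Debug, Clone)]
struct PlanStep {
    title: String,
    description: String,
    status: Status,
    summary: Option<String>,
}

fn apply_plan_diff_sketch(old: &[PlanStep], new: Vec<PlanStep>) -> Vec<PlanStep> {
    // Length of the common prefix in which title and description both match.
    let keep = old
        .iter()
        .zip(new.iter())
        .take_while(|(o, n)| o.title == n.title && o.description == n.description)
        .count();

    new.into_iter()
        .enumerate()
        .map(|(i, n)| {
            if i < keep {
                // Unchanged step: keep its old status and summary.
                old[i].clone()
            } else {
                // First mismatch and everything after it: reset to Pending
                // and clear any stale summary.
                PlanStep { status: Status::Pending, summary: None, ..n }
            }
        })
        .collect()
}
```

Because the result always has the length of the new plan, added steps appear as Pending and removed steps simply drop off the end, matching `plan_diff_add_new_steps` and `plan_diff_remove_steps`.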