# AgentRuntime Review Log

This file records AgentRuntime review findings and remediation status so a new
session can continue without reconstructing the same context.

## Target Contract

The current target is a Codex / GitHub Copilot style runtime:

- Backend run is the complete durable task. Frontend SSE is only an observation
  channel and must not control task lifecycle.
- Browser refresh, page switch, SSE disconnect, or proxy stream break must not
  cancel or advance a run.
- User cancellation must call an explicit backend cancel API. Local SSE close is
  not cancellation.
- Thread/run state must be reconnectable and recoverable from backend state.
  Frontend reload must query the true backend state, then replay/subscribe or
  display the persisted terminal state.
- The frontend must not duplicate execution when a thread already has a running
  or pending run.
- Pending requests are backend facts persisted in `AI_PENDING_REQUEST`.
- `maxTurnsPerRun` must suspend the run with a `confirm_action` pending request,
  not hard-fail or silently terminate. User confirmation continues from the
  current thread transcript with a fresh turn budget; user rejection terminates
  the run while preserving transcript.
- `/runs/stream` must support reconnect by `runId`, replay missed events from
  `AI_RUN_STREAM_EVENT`, then continue live delivery through the in-process mux.
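
The reconnect-by-cursor contract above can be sketched as a simple filter over persisted events. This is a minimal illustration with hypothetical names, not the real `/runs/stream` implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: on reconnect by runId, replay persisted events with
// eventSeq greater than the client's last acknowledged cursor, after which
// the caller switches to live delivery through the mux.
public class ReplayCursor {
    public record StreamEvent(long eventSeq, String type, String payload) {}

    // Return only the events the reconnecting observer has not acknowledged.
    public static List<StreamEvent> replayAfter(List<StreamEvent> persisted,
                                                long lastAckSeq) {
        List<StreamEvent> missed = new ArrayList<>();
        for (StreamEvent e : persisted) {
            if (e.eventSeq() > lastAckSeq) {
                missed.add(e);
            }
        }
        return missed;
    }
}
```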

## Entry Format

Use this format for each review or remediation pass:

- Date:
- Scope:
- Code snapshot:
- Result:
- Findings:
- Remediation applied:
- Verification:
- Remaining risk:

## 2026-05-15 LLM Stream Output Performance Remediation

Date: 2026-05-15

Scope:

- `src/com/gzzm/lobster/runtime/AgentRuntime.java`
- `src/com/gzzm/lobster/runtime/MultiplexStreamEmitter.java`
- `src/com/gzzm/lobster/runtime/StreamEmitter.java`
- `src/com/gzzm/lobster/api/SseStreamEmitter.java`
- `zm-ai-frontend/src/lib/lobster/lobsterSse.js`

Code snapshot: current workspace at review time. Static review only; compile/build
validation was intentionally not run.

Result:

The LLM token-consumption path was changed so slow observation channels and
stream-event persistence no longer apply per-token backpressure to the provider
stream reader. Runtime output remains reconnectable: active runs replay from a
combination of durable events, unflushed in-memory events, and the recent live
event window.

### Problem

Observed behavior showed a large latency gap between direct DeepSeek streaming
and Lobster conversation output on server 13. Direct network tests did not
reproduce the full slowdown, which pointed to Lobster's output-consumption path
rather than the provider endpoint.

The old chain had two per-token backpressure points:

- `SseStreamEmitter.emit()` wrote and flushed each SSE frame synchronously.
  A slow browser, proxy, or servlet response could block the run worker that was
  consuming the LLM stream.
- `RecordingStreamEmitter.emit()` persisted every stream event before forwarding
  it to observers. Per-token DB I/O therefore sat on the hot path between
  provider stream read and the next token read.

### Remediation Applied

- `MultiplexStreamEmitter` now gives each observer a bounded per-client queue
  and a single writer pump. The run worker only enqueues events; network writes
  happen outside the LLM consumption thread.
- Consecutive text-like events (`assistant_text`, `assistant_thinking`,
  `write_file_content_delta`) are merged before and during queue consumption.
  This reduces servlet flush count and avoids false queue overflow during long
  output or replay bursts.
- Slow observer overflow closes only that observation stream with
  `closeObservation()`. It does not cancel or slow the backend run. The frontend
  reconnects from its last acknowledged `eventSeq`.
- `SseStreamEmitter` is idempotent after completion. `emit()` and `heartbeat()`
  no-op/return false after close, and `closeObservation()` completes the async
  context without flushing.
- `RecordingStreamEmitter` no longer writes every output event to DB while the
  model is producing. Events are assigned `eventSeq`, cached in the active run,
  and streamed to observers. The buffer is flushed to `AI_RUN_STREAM_EVENT` only
  at status/final boundaries:
  - `run_started`
  - `pending_request`
  - `error`
  - `run_ended`
  - cancel/final cleanup paths
- Per-call `LlmCompleted` no longer forces a synchronous DB flush. A multi-turn
  run can proceed directly from one model response to tool execution or the next
  model call; the active in-memory buffer remains available for reconnect while
  the run is live.
- Status boundary events are flushed to durable storage before they are made
  visible over SSE. This keeps observers from seeing terminal or pending states
  that are not replayable.
- Active-run reconnect replay merges:
  - durable events from `AI_RUN_STREAM_EVENT`
  - buffered events captured before the DB replay query
  - recent events captured before the DB replay query
  - buffered events captured again after the DB replay query
  - recent events captured again after the DB replay query

  The double snapshot avoids a race where events move from memory to DB while a
  reconnect is preparing its replay set.
- The frontend SSE parser now flushes `TextDecoder` at stream end so a final
  multibyte UTF-8 character is not lost when a stream closes on a chunk boundary.
- `run_ended` is treated as the terminal observation boundary. Persisted-event
  replay stops as soon as it is sent, and the frontend cancels the reader after
  dispatching it instead of waiting for trailing heartbeat/drain intervals.
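
The enqueue side of the bounded per-observer queue can be sketched as follows. This is an illustration only, assuming names that may differ from the real `MultiplexStreamEmitter`; it shows the two key behaviors: consecutive text-like deltas merge in place, and overflow closes only the observation, never the run:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a bounded per-observer queue with text-delta
// merging. The run worker only calls offer(); network writes happen on a
// separate writer pump, so the LLM consumption thread never blocks.
public class ObserverQueue {
    public static final class Event {
        final String type;
        String text;
        long eventSeq;
        public Event(String type, String text, long eventSeq) {
            this.type = type; this.text = text; this.eventSeq = eventSeq;
        }
    }

    private final Deque<Event> queue = new ArrayDeque<>();
    private final int capacity;
    private boolean closed;

    public ObserverQueue(int capacity) { this.capacity = capacity; }

    // Called by the run worker; never blocks on network or DB.
    public synchronized boolean offer(Event e) {
        if (closed) return false;
        Event last = queue.peekLast();
        if (last != null && isTextLike(e.type) && last.type.equals(e.type)) {
            last.text += e.text;                              // merge deltas
            last.eventSeq = Math.max(last.eventSeq, e.eventSeq); // keep max cursor
            return true;
        }
        if (queue.size() >= capacity) {
            closed = true;   // overflow closes only this observation stream;
            return false;    // the backend run continues untouched
        }
        queue.addLast(e);
        return true;
    }

    public synchronized int depth() { return queue.size(); }
    public synchronized boolean isClosed() { return closed; }

    private static boolean isTextLike(String type) {
        return type.equals("assistant_text")
            || type.equals("assistant_thinking")
            || type.equals("write_file_content_delta");
    }
}
```

Merging before capacity checks is what prevents false overflow during long output bursts: a thousand token deltas of the same type occupy one queue slot.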

### Behavior Contract

- Frontend SSE is an observation channel. Browser disconnects, refreshes, slow
  clients, and proxy breaks must not affect backend run lifecycle.
- The provider stream-reader thread must not do network writes or per-token DB
  writes.
- Output events may live only in active-run memory while an LLM call is still
  producing. Running reconnects must therefore consult active memory in addition
  to DB.
- Do not merge persisted stream events across missing `eventSeq` ranges.
  Merging persisted events can duplicate text for clients whose last acknowledged
  cursor falls inside the merged range.
- Terminal/status events must be durable before being emitted to observers.
- Queue overflow is an observer failure, not a run failure. Close that
  observation stream and let the client reconnect by cursor.
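
The no-merge-across-gaps rule can be expressed as a contiguity check over persisted sequence numbers. A minimal sketch, with illustrative names:

```java
import java.util.List;

// Hypothetical sketch of the "no merge across missing eventSeq ranges" rule.
// Persisted text events may only be coalesced when their sequence numbers
// are strictly contiguous; otherwise a client whose acknowledged cursor
// falls inside the merged range would receive duplicated text on replay.
public class MergeGuard {
    public record Persisted(long eventSeq, String type) {}

    // True only when every adjacent pair differs by exactly one.
    public static boolean safeToMerge(List<Persisted> run) {
        for (int i = 1; i < run.size(); i++) {
            if (run.get(i).eventSeq() != run.get(i - 1).eventSeq() + 1) {
                return false; // a gap: some event between them is missing
            }
        }
        return true;
    }
}
```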

### Verification

- `git diff --check` for changed server files: pass.
- `git diff --check` for changed frontend SSE file: pass.
- Target-file trailing whitespace check: pass.
- Added a focused `MultiplexStreamEmitterTest` covering text batching that
  preserves the following terminal event and keeps the maximum merged `eventSeq`.

Remaining risk:

- Full Maven compile/test validation was intentionally skipped for this Lobster
  workspace unless explicitly requested.
- The current DAO layer exposes single-row `save` rather than a batch insert API,
  so boundary flush still writes records one by one. This is outside the token
  hot path, but a future DAO batch method would reduce boundary latency.

## 2026-05-13 Review

Date: 2026-05-13

Scope:

- `zm-ai-server/docs/RUNTIME-V2-PLAN.md`
- `zm-ai-server/src/com/gzzm/lobster/runtime/AgentRuntime.java`
- `zm-ai-server/src/com/gzzm/lobster/api/RunApi.java`
- `zm-ai-server/src/com/gzzm/lobster/api/RunStreamServlet.java`
- `zm-ai-server/src/com/gzzm/lobster/api/PendingRequestApi.java`
- `zm-ai-frontend/src/stores/lobsterStore.js`
- `zm-ai-frontend/src/components/lobster/ChatComposer.vue`

Code snapshot: current workspace at review time. Static review only; compile/build
validation was intentionally not run.

Result:

The backend runtime is aligned with the V0.3 durable-observer direction: run
creation is detached from SSE, stream replay is persisted by runId, explicit
cancel is required, pending is represented by backend state, and max-turn
continuation suspends through `confirm_action`.

### AR-2026-05-13-01 [P1] Frontend send flow could silently swallow backend-state blockers

Evidence:

- `lobsterStore.sendMessage` returned silently when `currentRun.streaming` or
  open pending requests blocked a new send.
- `lobsterStore.sendMessage` also handled start failure locally without
  rethrowing to callers.
- `ChatComposer.onSend` cleared the input before `sendMessage` had confirmed
  run creation.

Impact:

Callers could treat a blocked or failed send as successful. In the composer this
could lose user input; in secondary callers such as document-to-OA actions it
could show a submitted state even though no backend run was created. That lets
local UI state obscure backend truth, which conflicts with the V0.3 contract
that the frontend must only observe backend run state and issue explicit user
actions.

Remediation applied:

- `sendMessage` now throws when a running run or unresolved pending blocks a new
  send.
- `sendMessage` now rethrows run-start failures after rolling back local
  optimistic state.
- `ChatComposer` now clears text and pending attachments only after
  `sendMessage` succeeds.

Verification:

- `git diff --check` in `zm-ai-server`: pass.
- `git diff --check` in `zm-ai-frontend`: pass.
- `node --input-type=module --check` for `src/stores/lobsterStore.js`: pass.
- `@vue/compiler-sfc` parse for `src/components/lobster/ChatComposer.vue`: pass.

Remaining risk:

- Full compile/build validation was intentionally skipped for this Lobster
  workspace unless explicitly requested.

### AR-2026-05-13-02 [P1] Reused backend run was still treated as a successful new submit by callers

Evidence:

- `lobsterStore.sendMessage` returned only `{ runId, reused }`, and some early
  reused-run branches returned no structured result.
- `ChatComposer.onSend` cleared local input whenever `sendMessage` resolved,
  even when the backend had rejected duplicate execution by returning an
  existing active run.
- `DocWorkshopDrawer.onSaveToOa` showed a submitted OA write-back success state and
  closed the drawer after `sendMessage` resolved, even if no new user request
  was accepted because the backend restored an existing active run.

Impact:

The backend correctly prevents duplicate execution, but the frontend could still
  erase user input or close the OA write-back UI while only reconnecting to an existing
run. That violates the V0.3 rule that local UI state must not decide backend
task progress and must not imply that a duplicate user action was submitted.

Remediation applied:

- `sendMessage` now returns explicit `{ runId, reused, submitted }` metadata for
  reused-run and normal-start outcomes.
- `ChatComposer` preserves text and attachments when `submitted === false`, and
  tells the user the existing background run was restored.
- `DocWorkshopDrawer` keeps the drawer open and avoids the success flow when
  `submitted === false`.

Verification:

- `git diff --check` in `zm-ai-server`: pass.
- `git diff --check` in `zm-ai-frontend`: pass.
- `node --input-type=module --check` for `src/stores/lobsterStore.js`: pass.
- `@vue/compiler-sfc` parse for `src/components/lobster/ChatComposer.vue` and
  `src/components/lobster/DocWorkshopDrawer.vue`: pass.

Remaining risk:

- Full compile/build validation was intentionally skipped for this Lobster
  workspace unless explicitly requested.

### AR-2026-05-13-03 [P3] Frontend stream-event constants were not byte-aligned with backend enum

Evidence:

- Backend `StreamEventType` includes `write_file_content_delta` and
  `thread_renamed`.
- Frontend `lobsterTypes.js` says the shared constants are byte-aligned with the
  backend enum, but `StreamEventType` missed those two event names.
- `lobsterSse.js` still handled both events via string literals, so current
  runtime behavior was not broken.

Impact:

This is a low-risk contract drift. Future frontend logic that imports
`StreamEventType` instead of using raw strings could miss persisted/replayed
write-file preview events or thread rename events.

Remediation applied:

- Added `write_file_content_delta` and `thread_renamed` to frontend
  `StreamEventType`.
- Switched `lobsterSse.js` dispatch from raw event-name strings to the shared
  `StreamEventType` constants.

Verification:

- `git diff --check` in `zm-ai-server`: pass.
- `git diff --check` in `zm-ai-frontend`: pass.
- `node --input-type=module --check` for `src/lib/lobster/lobsterTypes.js`: pass.
- `node --input-type=module --check` for `src/lib/lobster/lobsterSse.js`: pass.

Remaining risk:

- Full compile/build validation was intentionally skipped for this Lobster
  workspace unless explicitly requested.

### AR-2026-05-13-04 [P3] SSE catch-all event type could be overwritten by payload type

Evidence:

- `lobsterSse.js` built the catch-all `onEvent` payload as
  `{ type, ...payload }`.
- Some backend payloads also have a `type` field, notably
  `pending_request.type` for `confirm_action` / `ask_user`.

Impact:

The typed event handlers still received the correct payload, and the current
catch-all consumer only records `eventSeq`, so runtime behavior was not broken.
But generic stream observers could see `evt.type === "confirm_action"` instead
of `evt.type === "pending_request"`, which weakens the replay/stream event
contract.

Remediation applied:

- `onEvent` now always exposes the SSE event name as `type` and `eventType`.
- When the payload has its own `type`, it is preserved as `payloadType`.
- Updated the SSE client JSDoc to describe the current observe-by-runId API.

Verification:

- `git diff --check` in `zm-ai-server`: pass.
- `git diff --check` in `zm-ai-frontend`: pass.
- `node --input-type=module --check` for `src/lib/lobster/lobsterSse.js`: pass.

Remaining risk:

- Full compile/build validation was intentionally skipped for this Lobster
  workspace unless explicitly requested.

## 2026-05-12 Review

Date: 2026-05-12

Scope:

- `zm-ai-server/src/com/gzzm/lobster/runtime/AgentRuntime.java`
- `zm-ai-server/src/com/gzzm/lobster/api/RunApi.java`
- `zm-ai-server/src/com/gzzm/lobster/api/PendingRequestApi.java`
- `zm-ai-server/src/com/gzzm/lobster/runtime/tools/ToolExecutorDispatcher.java`
- `zm-ai-server/src/com/gzzm/lobster/db/ToolExecutionRecordDao.java`
- `zm-ai-server/src/com/gzzm/lobster/runtime/ContextAssembler.java`
- `zm-ai-frontend/src/stores/lobsterStore.js`

Code snapshot: current workspace at review time. Static review only; compile/build
validation was intentionally not run.

Result:

The implementation is directionally aligned with the target contract, especially
around persisted run events and stream replay. It is not yet architecturally
complete enough for the required Codex-like semantics. The remaining gaps are
runtime-model gaps, not cosmetic frontend issues, so they should be fixed by
refactoring the run/pending/recovery model rather than by adding local UI
workarounds.

### AR-2026-05-12-01 [P0] Ordinary pending is not modeled as backend waiting state

Evidence:

- `AgentRuntime.java:998` detects pending and exits the current loop with
  `exitReason = "pending"`.
- `AgentRuntime.java:1022` persists the run as a terminal state and clears the
  active run.
- `RunApi.java:48` only checks `activeRunForThread` before starting a new run.
- `PendingRequestApi.java:68` resolves a pending request before starting the
  follow-up run.
- `AgentRuntime.java:273` returns an existing active run only when the thread has
  an active run.

Impact:

After an ordinary blocking pending is created, the thread no longer has an active
run guard. A page refresh or second send can create a new run while unresolved
pending still exists. This violates:

- pending is a backend fact;
- no duplicate execution;
- frontend must restore pending instead of deciding lifecycle locally.

Required remediation:

- Represent blocking pending as a first-class backend waiting state for the
  thread/run, or make `POST /runs` reject any unresolved blocking pending for the
  thread.
- Resolve pending and create the continuation run atomically. If continuation run
  creation fails or conflicts, do not leave the pending permanently resolved.
- Thread resume API must surface unresolved pending before allowing a new user
  input run.
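
The required `POST /runs` guard reduces to a single decision over backend thread state. A minimal sketch with hypothetical names; the real `RunApi` would check persisted run and `AI_PENDING_REQUEST` state rather than an enum:

```java
// Hypothetical sketch: a new user-input run is accepted only on an idle
// thread. Active, waiting, and unresolved-pending states must be surfaced
// to the client so it reconnects or resolves the pending instead of
// creating a duplicate run.
public class RunStartGuard {
    public enum ThreadState { IDLE, ACTIVE_RUN, WAITING_USER, UNRESOLVED_PENDING }

    public static boolean mayStartNewRun(ThreadState state) {
        return state == ThreadState.IDLE;
    }
}
```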

Remediation status: Not fixed in this pass.

Retest checklist:

- Create a blocking pending, refresh the page, and verify the frontend restores
  the pending from backend state.
- While the pending is unresolved, send another user message and verify backend
  rejects or attaches to pending flow instead of creating an unrelated duplicate
  run.
- Resolve the pending and verify only one continuation run is created.

### AR-2026-05-12-02 [P0] Durable recovery replays the old request instead of resuming the task journal

Evidence:

- `AgentRuntime.java:1694` restores the previous `RunRequest` and submits the run
  worker again.
- `ToolExecutorDispatcher.java:149` creates `ToolExecutionRecord`.
- `ToolExecutionRecordDao.java:15` has list/finish style persistence, but there
  is no recovery path that uses tool execution records for idempotent resume.
- `ContextAssembler.java:424` can inject placeholder tool results for orphan tool
  calls.

Impact:

Process restart can re-enter the runtime from an old request, but the backend
does not have a durable step-level state machine. In-flight model/tool work may
be duplicated, orphaned, or converted into placeholder transcript data instead
of being resumed from authoritative persisted state. This does not satisfy
"complete, durable, recoverable task execution".

Required remediation:

- Introduce a persistent run journal/state model with at least run, turn, model
  response, tool call, tool result, pending, and terminal events.
- Make tool execution idempotent by stable tool-call/run-step identity.
- On recovery, resume from the latest durable step state instead of blindly
  re-submitting the original request.
- Preserve the existing SSE projection as a view of the durable journal, not as
  the source of execution truth.

Remediation status: Not fixed in this pass.

Retest checklist:

- Kill/restart the backend during a tool call and verify the tool call is not
  duplicated unless the tool is explicitly marked retry-safe.
- Kill/restart during a model turn and verify the run resumes or reaches a clear
  recoverable terminal/waiting state.
- Verify stream replay after recovery matches persisted run state.

### AR-2026-05-12-03 [P1] Recovered same-run execution resets the turn budget

Evidence:

- `AgentRuntime.java:609` reads `initialTurns`.
- `AgentRuntime.java:752` initializes local counters with `turn = 0` and
  `totalTurns = initialTurns`.
- `AgentRuntime.java:769` loops while `turn < maxTurns`.

Impact:

A recovered same run may receive a fresh `maxTurnsPerRun` budget without user
confirmation. This blurs two different concepts:

- process recovery of the same run, which should preserve remaining budget;
- user-confirmed continuation after max-turn suspension, which should reset a new
  budget.

Required remediation:

- Persist consumed/remaining turn budget for the current run.
- Same-run recovery must resume with the persisted remaining budget.
- Only explicit max-turn continuation should create a new continuation segment or
  run with a fresh budget.
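
The budget rule above can be sketched as one function over persisted counters. Names are illustrative, not the real `AgentRuntime` fields:

```java
// Hypothetical sketch of the turn-budget rule: same-run recovery resumes
// with the persisted remaining budget, while a user-confirmed max-turn
// continuation starts with a fresh budget.
public class TurnBudget {
    public static int budgetFor(boolean userConfirmedContinuation,
                                int maxTurnsPerRun,
                                int consumedTurns) {
        if (userConfirmedContinuation) {
            return maxTurnsPerRun;                    // fresh budget after confirm
        }
        return Math.max(0, maxTurnsPerRun - consumedTurns); // resume remainder
    }
}
```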

Remediation status: Not fixed in this pass.

Retest checklist:

- Start a run near the turn limit, restart the backend, and verify it does not
  silently receive a full fresh budget.
- Hit `maxTurnsPerRun`, verify a pending confirmation is created.
- Confirm continuation, verify the new continuation has a fresh budget.

### AR-2026-05-12-04 [P1] Fresh frontend replay can duplicate persisted transcript projection

Evidence:

- `lobsterStore.js:238` loads thread messages into local state during
  `openThread`.
- `RunApi.java:192` creates a stream with `replayAfterSeq = 0` by default.
- `lobsterStore.js:769` reconnects using a local cursor or the supplied replay
  cursor; after a fresh page load the cursor can be absent, causing replay from
  the beginning.

Impact:

After page refresh, the frontend can have persisted messages from the thread API
and then receive old stream events again from sequence zero. If the frontend
applies both projections into the same UI state, messages/tool output can appear
duplicated or out of order.

Required remediation:

- Pick one projection source for restored UI state:
  - either rebuild live run UI from replayed stream events and merge with
    transcript by stable IDs; or
  - have the backend return a replay cursor that starts after transcript-covered
    events.
- All streamed UI updates must be keyed by stable message/tool/run-step IDs.
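
The stable-ID merge can be sketched as a map keyed by message id, so transcript rows and replayed stream projections collapse instead of duplicating. Illustrative names only:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch for AR-2026-05-12-04: transcript rows loaded from the
// thread API and messages rebuilt from replayed stream events are merged by
// stable message id, so replay from sequence zero cannot duplicate entries.
public class ProjectionMerge {
    public record Message(String id, String text) {}

    public static List<Message> merge(List<Message> transcript,
                                      List<Message> replayed) {
        Map<String, Message> byId = new LinkedHashMap<>();
        for (Message m : transcript) byId.put(m.id(), m);
        for (Message m : replayed) byId.put(m.id(), m); // replay wins per id
        return List.copyOf(byId.values());
    }
}
```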

Remediation status: Not fixed in this pass.

Retest checklist:

- Start a run, refresh the browser, and verify transcript/tool results are not
  duplicated.
- Reconnect after missed stream events and verify only missed events are replayed.
- Re-enter a completed thread and verify final messages, tool results, resources,
  and pending state are restored from backend facts.

## Remediation Tracking

| ID | Priority | Status | Fix summary | Verification |
| --- | --- | --- | --- | --- |
| AR-2026-05-12-01 | P0 | Fixed | Ordinary pending now leaves the source run in `waiting_user`; new user-input runs are guarded by active/waiting run and unresolved-pending checks; pending resolve starts a continuation that replaces the waiting source run. | Static checks passed |
| AR-2026-05-12-02 | P0 | Partial | Added tool-execution idempotent replay/unknown-side-effect protection and preserved same-run recovery budget. A full run-step journal remains the long-term architecture target. | Static checks passed |
| AR-2026-05-12-03 | P1 | Fixed | Same-run recovery now uses remaining turn budget; explicit pending continuation gets a fresh budget. | Static checks passed |
| AR-2026-05-12-04 | P1 | Fixed | Terminal/waiting run maps now return `replayAfterSeq=lastEventSeq`; frontend reused-run attach uses backend replay cursor instead of hard-coded zero. | Static checks passed |

## Next Session Checklist

1. Read this file first.
2. Re-check each evidence location against current source text before assuming
   the issue still exists.
3. If a fix has been applied, add a new dated remediation entry below and update
   the tracking table.
4. Validate with static inspection and targeted checks by default. Do not run
   compile/build in this workspace unless explicitly requested.

## Remediation Entries

Add future entries below this line.

### 2026-05-12 Remediation

Date: 2026-05-12

Scope:

- `zm-ai-server/src/com/gzzm/lobster/runtime/AgentRuntime.java`
- `zm-ai-server/src/com/gzzm/lobster/runtime/RunRequest.java`
- `zm-ai-server/src/com/gzzm/lobster/api/RunApi.java`
- `zm-ai-server/src/com/gzzm/lobster/api/PendingRequestApi.java`
- `zm-ai-server/src/com/gzzm/lobster/tool/ToolExecutorDispatcher.java`
- `zm-ai-frontend/src/stores/lobsterStore.js`

Remediation applied:

- Ordinary tool-created pending no longer closes as a normal terminal run. It
  persists the source run as `RunStatus.waiting_user`, keeps the thread
  `activeRunId`, emits `run_ended: pending` only as stream closure, and does not
  clear the active run claim until user action resolves or cancels it.
- `POST /runs` now refuses/reuses backend state when a thread has an active,
  waiting, or unresolved-pending run. Frontend duplicate sends cannot create a
  second backend run for the same pending user input.
- Pending resolution now creates a backend continuation with
  `asResumeContinuation(sourceRunId, appendUserInput)`. Ordinary pending
  responses still append the user's answer to transcript; max-turn continuation
  resumes without adding a duplicate user message and gets a fresh turn budget.
- Pending resolve failure now reopens the pending request for all pending types,
  not only max-turn confirmation.
- Same-run durable recovery distinguishes recovery from user-confirmed
  continuation. Recovery consumes only the remaining turn budget; continuation
  from a source run resets the new run budget.
- Tool execution records now protect recovery from duplicate side effects:
  completed tool calls are replayed by `(runId, toolCallId)`, while stale
  started non-read executions return an explicit unknown-state tool result
  instead of re-running side-effectful work.
- Run maps now expose replay cursors that skip stream replay for terminal or
  waiting projections already represented by transcript/pending state. The
  frontend uses the backend cursor when attaching to a reused run.
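
The conditional close of a waiting source run can be sketched as a compare-and-set transition, so a concurrent cancel or recovery is never silently overwritten. Enum and method names are illustrative:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: closing a continuation's source run succeeds only
// from waiting_user. Any other observed status means another actor already
// moved the run, and the caller must not claim the transition happened.
public class RunStatusTransition {
    public enum RunStatus { running, waiting_user, ended, cancelled, error }

    public static boolean closeWaitingSource(AtomicReference<RunStatus> status) {
        return status.compareAndSet(RunStatus.waiting_user, RunStatus.ended);
    }
}
```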

Verification:

- `git diff --check -- src/com/gzzm/lobster/runtime/RunRequest.java src/com/gzzm/lobster/runtime/AgentRuntime.java src/com/gzzm/lobster/api/PendingRequestApi.java src/com/gzzm/lobster/api/RunApi.java src/com/gzzm/lobster/tool/ToolExecutorDispatcher.java`
- `git diff --check -- src/stores/lobsterStore.js`
- Static symbol scan for changed methods:
  `runCore`, `asResumeContinuation`, `turnBudgetFor`,
  `blocksNewRunForOpenPending`, `pendingReuse`,
  `replayRecordedToolResult`.

Compile/build validation: not run, following the workspace preference to avoid
builds unless explicitly requested.

### 2026-05-12 Exhaustive Follow-up Remediation

Date: 2026-05-12

Scope:

- `zm-ai-server/src/com/gzzm/lobster/api/PendingRequestApi.java`
- `zm-ai-server/src/com/gzzm/lobster/pending/PendingActionResolver.java`
- `zm-ai-server/src/com/gzzm/lobster/tool/ToolExecutorDispatcher.java`
- `zm-ai-server/src/com/gzzm/lobster/tool/builtin/InteractionTools.java`

Remediation applied:

- Side-effectful tools now fail closed when `ToolExecutionRecord` cannot be
  created. `READ` tools may still execute without a durable record, but
  `WRITE_LOCAL` and `WRITE_EXTERNAL` tools return an explicit
  `error.record_unavailable` audit path instead of executing without a replay
  guard.
- `confirm_action` and `ask_user` are explicitly marked
  `SideEffectLevel.WRITE_LOCAL`, because both create local `PendingRequest`
  rows despite using `READ_ONLY` risk for permission/rate-limit purposes.
  Recovery therefore treats unfinished interaction calls as unknown local
  side effects instead of re-running them and creating duplicate pending rows.
- `PendingActionResolver` now returns an `ActionResult` that records whether a
  confirmed backend action already applied an external side effect.
- `PendingRequestApi.resolve` no longer reopens the same pending request after
  a confirmed external action has already been applied. If continuation startup
  fails after the external action, the source run is marked `error` and its
  active-run claim is cleared instead of allowing a user retry to repeat the
  external write.
- OA overwrite exceptions after the external write call has been attempted are
  now classified as possible side effects. This routes the API compensation path
  through `finishSourceRunAfterAppliedActionFailure` instead of reopening the
  original pending request.
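
The fail-closed dispatch decision can be sketched as a single check over side-effect level and record availability. Enum and result strings are illustrative, matching the terminology above rather than the real dispatcher API:

```java
// Hypothetical sketch of the fail-closed rule: when the durable
// ToolExecutionRecord cannot be created, READ tools may still run, but
// side-effectful tools must refuse rather than execute without a replay
// guard.
public class FailClosedDispatch {
    public enum SideEffectLevel { READ, WRITE_LOCAL, WRITE_EXTERNAL }

    public static String dispatch(SideEffectLevel level, boolean recordCreated) {
        if (!recordCreated && level != SideEffectLevel.READ) {
            return "error.record_unavailable"; // refuse unguarded side effects
        }
        return "execute";
    }
}
```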

Verification:

- `git diff --check -- src/com/gzzm/lobster/api/PendingRequestApi.java src/com/gzzm/lobster/pending/PendingActionResolver.java src/com/gzzm/lobster/tool/ToolExecutorDispatcher.java src/com/gzzm/lobster/tool/builtin/InteractionTools.java`
- Static symbol scan for `executeWithResultIfApplicable`,
  `ActionResult.sideEffectApplied`, `SideEffectLevel.WRITE_LOCAL`,
  `error.record_unavailable`, `isReadOnly`, and
  `finishSourceRunAfterAppliedActionFailure`.
- Static branch check for `externalWriteAttempted` before
  `oaFileClient.overwrite*` and side-effect classification in the catch block.

Compile/build validation: not run, following the workspace preference to avoid
builds unless explicitly requested.

Remaining risk:

- AR-2026-05-12-02 is materially hardened but not a full durable run-step
  scheduler. The long-term optimal architecture should still introduce an
  explicit run-step journal for model turns, tool calls, pending transitions,
  and terminal events, then make recovery resume from that journal instead of
  relying primarily on transcript plus tool-execution records.

### 2026-05-12 Re-review

Date: 2026-05-12

Scope:

- `zm-ai-server/src/com/gzzm/lobster/api/PendingRequestApi.java`
- `zm-ai-server/src/com/gzzm/lobster/runtime/AgentRuntime.java`
- `zm-ai-server/src/com/gzzm/lobster/api/RunApi.java`
- `zm-ai-server/src/com/gzzm/lobster/tool/ToolExecutorDispatcher.java`
- `zm-ai-frontend/src/stores/lobsterStore.js`

Result:

The main direction is correct, but the current patch still has blocking lifecycle
gaps before it can be considered aligned with the target runtime contract.

Findings:

- `AR-2026-05-12-RR-01 [P0]`: pending resolve is still not atomic. In
  `PendingRequestApi.resolve`, `pendingService.resolve(...)` marks the pending
  resolved before `actionResolver.executeIfApplicable(...)` and before
  `agentRuntime.startDetached(...)`. Only `startDetached` failure reopens the
  pending. If `actionResolver` fails, or if non-CONFIRM max-turn cancel fails,
  the pending remains resolved and the thread may lose its recoverable waiting
  fact.
- `AR-2026-05-12-RR-02 [P1]`: `AgentRuntime.startDetached` closes the
  continuation source run before worker submission is known to be accepted, and
  worker submission failure is converted to a terminal error. For pending
  continuation, this can leave the source `waiting_user` run ended, the pending
  already resolved, and the new run failed instead of recoverable.
- `AR-2026-05-12-RR-03 [P2]`: `ToolExecutorDispatcher.replayRecordedToolResult`
  replays by `(runId, toolCallId)` only. It does not verify tool name or
  arguments against the persisted record, so a provider/tool-call-id collision
  or divergent retry can replay the wrong tool result.

Verification:

- Static review of the exact file text and current diff.
- No compile/build run.

Required remediation:

- Wrap pending resolution, action execution, and continuation creation in a
  compensating flow: if any step after resolve fails, reopen the pending and keep
  or restore the source run waiting state.
- Do not end the source waiting run until the continuation worker is durably
  accepted, or keep submit failure recoverable instead of terminal.
- Compare persisted tool name and normalized/redacted arguments before replaying
  a recorded tool result.

### 2026-05-12 Re-review Remediation

Date: 2026-05-12

Scope:

- `zm-ai-server/src/com/gzzm/lobster/api/PendingRequestApi.java`
- `zm-ai-server/src/com/gzzm/lobster/runtime/AgentRuntime.java`
- `zm-ai-server/src/com/gzzm/lobster/tool/ToolExecutorDispatcher.java`

Remediation applied:

- Pending resolution now compensates every post-resolve failure path. If cancel,
  action execution, or continuation creation fails after the pending row was
  marked resolved, the API reopens the pending request and best-effort restores
  the source run to `waiting_user`.
- `AgentRuntime.startDetached` no longer closes the continuation source run
  before worker submission is accepted. Submit failure restores the source run
  claim, marks the unaccepted continuation run as cancelled, and throws so the
  pending API can reopen the pending request.
- Tool-result replay now requires the persisted tool name and canonical
  redacted argument JSON to match the current tool call before replaying a
  recorded result or applying unknown-side-effect protection.
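
The compensation shape can be sketched as a three-step flow where any failure after the resolve step triggers reopen-and-restore. This models only the control flow with hypothetical step names, not the real `PendingRequestApi` signatures:

```java
// Hypothetical sketch of the compensating pending-resolve flow: the pending
// row is marked resolved first, and any later failure (action execution or
// continuation start) reopens the pending and restores the waiting source
// run so the backend fact is never lost.
public class PendingResolveFlow {
    public interface Step { void run() throws Exception; }

    // Returns true when the whole flow succeeded; on a post-resolve failure
    // the compensation callbacks run and false is returned.
    public static boolean resolveWithCompensation(Step markResolved,
                                                  Step executeAction,
                                                  Step startContinuation,
                                                  Runnable reopenPending,
                                                  Runnable restoreWaitingRun) {
        try {
            markResolved.run();
        } catch (Exception e) {
            return false; // nothing resolved yet, nothing to compensate
        }
        try {
            executeAction.run();
            startContinuation.run();
            return true;
        } catch (Exception e) {
            reopenPending.run();        // pending is a backend fact again
            restoreWaitingRun.run();    // source run returns to waiting_user
            return false;
        }
    }
}
```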

Verification:

- `git diff --check -- src/com/gzzm/lobster/api/PendingRequestApi.java src/com/gzzm/lobster/runtime/AgentRuntime.java src/com/gzzm/lobster/tool/ToolExecutorDispatcher.java docs/AGENT_RUNTIME_REVIEW_LOG.md`
- Static symbol scan for `restoreSourceRunWaiting`,
  `restoreContinuationSourceClaim`, `markDetachedRunNotAccepted`, and
  `matchesRecordedToolCall`.

Compile/build validation: not run, following the workspace preference to avoid
builds unless explicitly requested.

Remaining risk:

- This fixes the re-review lifecycle bugs but does not replace the partial
  journal architecture noted under `AR-2026-05-12-02`.

### 2026-05-12 Agent-demand Re-review Remediation

Date: 2026-05-12

Scope:

- `zm-ai-server/src/com/gzzm/lobster/api/PendingRequestApi.java`
- `zm-ai-server/src/com/gzzm/lobster/runtime/AgentRuntime.java`
- `zm-ai-server/src/com/gzzm/lobster/tool/ToolExecutorDispatcher.java`
- `zm-ai-server/src/com/gzzm/lobster/tool/ToolExecutionRecord.java`

Remediation applied:

- Continuation source close is now part of the pre-submit acceptance path again,
  but submit failure restores the source waiting run and reopens the pending via
  the API compensation path. Source close uses a conditional `waiting_user` to
  `ended/pending` transition and throws instead of silently returning success
  when the source cannot be closed.
- Pre-submit failures after the continuation run row has been persisted now mark
  that unaccepted run cancelled before rethrowing, preventing stale recovery from
  executing a failed continuation while the pending has been reopened.
- Pending API compensation now restores the source run claim whether the thread
  active slot is empty or still points at a failed continuation run.
- Tool execution records now persist a SHA-256 fingerprint of canonical raw
  arguments inside the existing `argumentsJson` CLOB, alongside redacted
  arguments, so the fix does not depend on an `ALTER TABLE` for a new column.
- Tool replay now requires the tool name plus a raw-argument fingerprint match.
  A mismatch under the same `(runId, toolCallId)` on non-read tools returns an
  explicit error instead of executing another side-effectful call under an
  already-used call id.
- Legacy pre-fingerprint records replay only when the current args have no
  sensitive keys and the stored redacted args contain no redaction markers;
  otherwise non-read calls fail closed instead of risking a wrong replay.
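
The fingerprint-and-match step can be sketched as follows. This assumes the argument JSON has already been canonicalized; method names are illustrative, not the real `ToolExecutionRecord` API:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Hypothetical sketch of the raw-argument fingerprint: canonical argument
// JSON is hashed with SHA-256 and stored beside the redacted arguments, so
// replay can verify call identity without persisting sensitive raw values.
public class ArgumentFingerprint {
    public static String fingerprint(String canonicalArgsJson) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(canonicalArgsJson.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (Exception e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    // Replay is allowed only when name and fingerprint both match the record.
    public static boolean matchesRecord(String recordedName, String recordedFp,
                                        String currentName, String currentArgsJson) {
        return recordedName.equals(currentName)
            && recordedFp.equals(fingerprint(currentArgsJson));
    }
}
```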

Verification:

- `git diff --check -- src/com/gzzm/lobster/api/PendingRequestApi.java src/com/gzzm/lobster/runtime/AgentRuntime.java src/com/gzzm/lobster/tool/ToolExecutorDispatcher.java src/com/gzzm/lobster/tool/ToolExecutionRecord.java docs/AGENT_RUNTIME_REVIEW_LOG.md`
- Static symbol scan for `closeContinuationSourceRun`,
  `restoreContinuationSourceClaim`, `fingerprintArguments`,
  `parseRecordedArguments`, `containsSensitiveKey`, and
  `matchesRecordedToolCall`.

Compile/build validation: not run, following the workspace preference to avoid
builds unless explicitly requested.
