
Meshfleet v0.4.0 — Resilience

Per-fleet timeouts, a structured event log, and list_fleets. The mesh is starting to watch its own back.

We just shipped v0.4.0 of the Agent Mesh MCP server. Three new tools, one new observability surface, and a meaningful change to the timeout model. This post walks through what changed and why.

Install v0.4.0 · Read the spec · Browse the API

What’s new

1. Per-fleet timeouts

Until v0.4.0, the only way to control how long an agent could run was the global AGENT_MESH_AGENT_TIMEOUT_MS env var. That works fine when you want a uniform timeout across all your work. It fails the moment you have a heterogeneous workload — a quick lint agent that should time out in 30 seconds, alongside a deep-reasoning agent that needs the full 30 minutes.

set_fleet_timeout solves that:

await callTool("set_fleet_timeout", {
  fleet_id: "lint-fleet",
  timeout_ms: 30_000,
});

After this call, agents in lint-fleet are killed after 30 seconds. Other fleets in the same MCP server instance are unaffected. The effective timeout for each fleet is resolved in order: per-fleet override → env var → 30-minute default. To inspect the value for a given fleet, use get_fleet_timeout_ms(fleet_id).
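
The resolution order above can be sketched as a small pure function. This is only an illustration, not the server's actual code: the name resolveTimeoutMs, the env parameter, and the rule that a non-numeric or non-positive env value falls through to the default are all assumptions.

```typescript
// Hypothetical sketch of the timeout resolution order:
// per-fleet override, then env var, then the 30-minute default.
// Names and validation rules are illustrative, not Meshfleet internals.

const DEFAULT_TIMEOUT_MS = 30 * 60 * 1000; // 30 minutes

type Env = Record<string, string | undefined>;

function resolveTimeoutMs(
  fleetOverrideMs: number | undefined,
  env: Env,
): number {
  // 1. A per-fleet override wins when present.
  if (fleetOverrideMs !== undefined) return fleetOverrideMs;

  // 2. Otherwise use the global env var, if it parses to a positive number
  //    (assumption: invalid values are ignored rather than rejected).
  const raw = env["AGENT_MESH_AGENT_TIMEOUT_MS"];
  const parsed = raw === undefined ? NaN : Number(raw);
  if (isFinite(parsed) && parsed > 0) return parsed;

  // 3. Otherwise fall back to the default.
  return DEFAULT_TIMEOUT_MS;
}
```

Passing the environment in as an argument, rather than reading it inside the function, keeps the sketch easy to test.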

2. Structured event log

We’ve added an append-only NDJSON log at ~/.config/opencode/agent-mesh.events.log. Every fleet_created, agent_spawned, fleet_timeout_set, and spawn_fleet_called event is now written to it. Format:

{"event":"fleet_created","fleet_id":"abc-123","timestamp":1751347200000}
{"event":"agent_spawned","fleet_id":"abc-123","agent_id":"a1","role":"Explorer","timestamp":1751347201000}
{"event":"agent_spawned","fleet_id":"abc-123","agent_id":"a2","role":"Analyst","agent_file":"oracle","timestamp":1751347201100}

This is the foundation for the upcoming v0.5.0 fleet inspector (CLI/TUI). For now, you can tail -f the log to watch fleets spawn in real time, or jq it to query historical runs.

The log is intentionally minimal — just the events, no PII, no prompt contents, no agent output. If you want to add observability without leaking data, this is a safe baseline.
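
If jq is not at hand, the same kind of query can be done in a few lines of TypeScript. The sketch below assumes only the fields visible in the sample lines above; MeshEvent, parseEventLog, and agentsSpawnedPerFleet are hypothetical names for illustration, not part of the server.

```typescript
// Event shape inferred from the sample log lines; other fields are not assumed.
interface MeshEvent {
  event: string;
  fleet_id?: string;
  agent_id?: string;
  timestamp: number;
  [extra: string]: unknown;
}

// Parse NDJSON text into events, skipping blank or malformed lines.
function parseEventLog(text: string): MeshEvent[] {
  const events: MeshEvent[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const value = JSON.parse(trimmed) as Partial<MeshEvent> | null;
      if (value && typeof value.event === "string" && typeof value.timestamp === "number") {
        events.push(value as MeshEvent);
      }
    } catch {
      // A half-written last line is possible while the log is being appended to.
    }
  }
  return events;
}

// Count agent_spawned events per fleet_id.
function agentsSpawnedPerFleet(events: MeshEvent[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const e of events) {
    if (e.event !== "agent_spawned" || typeof e.fleet_id !== "string") continue;
    counts[e.fleet_id] = (counts[e.fleet_id] ?? 0) + 1;
  }
  return counts;
}
```

Reading the file contents (for example with Node's fs.readFileSync) and passing the text to parseEventLog gives an array you can filter or group however you like.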

3. list_fleets tool

Until v0.4.0, you could only inspect one fleet at a time via fleet_status({ fleet_id }). The new list_fleets tool returns a summary of every fleet the MCP server knows about:

const { fleets } = await callTool("list_fleets", {});
// → {
//   fleets: [
//     { id: "abc-123", status: "complete", agent_count: 3, agents_complete: 3, agents_failed: 0, agents_running: 0, ... },
//     { id: "def-456", status: "running",  agent_count: 2, agents_complete: 1, agents_failed: 0, agents_running: 1, ... },
//   ]
// }

Each summary includes agent counts broken down by status (complete, failed, running). The first thing we built with this was a “what’s running right now?” check in our own internal tooling.
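
A "what's running right now?" check along those lines might look like the sketch below. It is not our internal tool: FleetSummary lists only the fields shown in the sample output above, and describeRunningFleets is a hypothetical helper.

```typescript
// Summary shape taken from the sample output; further fields are omitted here.
interface FleetSummary {
  id: string;
  status: string;
  agent_count: number;
  agents_complete: number;
  agents_failed: number;
  agents_running: number;
}

// Return one human-readable line per fleet that still has running agents.
function describeRunningFleets(fleets: FleetSummary[]): string[] {
  return fleets
    .filter((f) => f.agents_running > 0)
    .map((f) => `${f.id}: ${f.agents_running}/${f.agent_count} agents running`);
}
```

Feed it the fleets array returned by list_fleets and print the resulting lines.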

4. Auto-emit events from core paths

createFleet, spawn_fleet, and set_fleet_timeout now all emit structured events to the log. This is what populates the log described above, and it also means external tools (custom dashboards, alerting, audit pipelines) can watch the file directly and know exactly when fleets come and go.
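
A tool that follows the file as it grows will usually receive text in arbitrary chunks, so a chunk can end in the middle of a line. The sketch below shows one way to deal with that: buffer the trailing partial line until its newline arrives. NdjsonLineBuffer is a hypothetical helper, not something the server ships, and how you obtain the chunks (a file watcher, a tail process) is left open.

```typescript
// Hypothetical helper for consumers that read the event log in chunks.
// It holds back a trailing partial line until the rest of it arrives.
class NdjsonLineBuffer {
  private pending = "";

  // Feed a chunk of text; returns the parsed value of every complete line so far.
  push(chunk: string): unknown[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    // The last element is "" if the chunk ended on a newline, else an incomplete line.
    this.pending = lines.pop() ?? "";
    const parsed: unknown[] = [];
    for (const line of lines) {
      if (line.trim() === "") continue;
      try {
        parsed.push(JSON.parse(line));
      } catch {
        // Skip lines that are not valid JSON rather than crash the consumer.
      }
    }
    return parsed;
  }
}
```

Each call to push returns only the events completed by that chunk, so a dashboard or alerting hook can process them as they arrive.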

Breaking changes

None. v0.3.0 callers continue to work unchanged. The new Fleet.timeout_ms field is optional and defaults to “use the env var.”

What’s next

v0.5.0 is where the mesh gets real-time. The plan:

  • Heartbeat / watchdog: emit periodic heartbeat events; auto-fail agents that miss N heartbeats (this is the one piece left over from the v0.4 resilience push)
  • SSE push notifications: subscribe_inbox(agent_id, callback) for real-time message delivery instead of polling
  • Fleet events: emit events on fleet start / agent complete / fleet complete, building on the event log
  • CLI inspector: npx agent-mesh inspect <fleet_id> shows a live TUI of running agents
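
To make the "miss N heartbeats" rule from the first item concrete, here is a speculative sketch of the check a watchdog could run. v0.5.0 is not released, so the design may differ: findStalledAgents, its parameters, and the choice to count whole elapsed intervals are all assumptions.

```typescript
// Speculative sketch: every name here is hypothetical, not a v0.5.0 API.
function findStalledAgents(
  lastHeartbeatMs: Record<string, number>, // agent_id -> timestamp of last heartbeat
  nowMs: number,
  heartbeatIntervalMs: number,
  maxMissed: number, // the "N" in "miss N heartbeats"
): string[] {
  const stalled: string[] = [];
  for (const agentId of Object.keys(lastHeartbeatMs)) {
    // Whole heartbeat intervals that have elapsed since the last heartbeat.
    const missed = Math.floor((nowMs - lastHeartbeatMs[agentId]) / heartbeatIntervalMs);
    if (missed >= maxMissed) stalled.push(agentId);
  }
  return stalled;
}
```

A real watchdog would run a check like this on a timer and mark the returned agents as failed.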

Full roadmap →

Try it

cd ~/.config/opencode/mcp-servers/agent-mesh
git pull
npm install
npm run build

Restart OpenCode. Check the new tools:

// Per-fleet timeout
await callTool("set_fleet_timeout", { fleet_id: "abc-123", timeout_ms: 60_000 });

// List all your fleets
const { fleets } = await callTool("list_fleets", {});

// Watch the event log
// tail -f ~/.config/opencode/agent-mesh.events.log

36 tests passing, MIT licensed, no telemetry, no cloud. Read the full spec →

— The Meshfleet team