Statewright Guide: Visual State Machines for Reliable AI Agents

Published: 2026-05-13 • Reading: 12 min • Tags: AI Agent, State Machine, LLM Reliability, Statewright, Visual Workflow, MCP, Agent Control Flow

Quick Summary:

What: Statewright — an open-source tool that uses visual state machines to make AI agents predictable and reliable
Stats: 70+ points on Hacker News, active GitHub repository at github.com/statewright/statewright
Key insight: Instead of making the model bigger, make the problem smaller — state machines constrain agent behavior with deterministic transitions
Core engine: Rust-based, evaluates state machine definitions with zero LLM involvement — pure deterministic logic
Integration: Works with Claude Code, Codex, Cursor, opencode, and Pi via MCP plugin layer
Visual editor: Drag-and-drop workflow builder at statewright.ai — non-developers can create and modify workflows

The Problem: AI Agents Are Unpredictable by Design

Here's a dirty secret about building with LLMs: every prompt is a suggestion, not a command. Give a model 40+ tools and an open-ended problem, and it barely gets out of the gate. It re-reads the same file 5+ times. It calls tools in the wrong order. It "verifies" its own output and returns a hallucination. It decides to skip a step you marked "MANDATORY".

This isn't the model's fault. It's an architectural problem. LLMs are generative — they produce plausible continuations, not guaranteed outputs. When you give them free rein over tool selection, execution order, and decision-making boundaries, you're gambling on alignment that doesn't exist yet.

The common fix? Bigger models and longer prompts. They help sometimes, but not reliably. Observability tells you what went wrong after the fact, but it doesn't prevent it. What if, instead of making the model bigger, we made the problem smaller?

The Solution: State Machines as Guardrails

This is where Statewright comes in. Its core philosophy is captured in the tagline: "Agents are suggestions, states are laws."

State machines have been a bedrock of reliable software engineering for decades. A finite state machine (FSM) defines a set of states, valid transitions between them, and actions that can occur in each state. The key property: transitions are deterministic. The machine doesn't "guess" what to do next — it follows the rules you defined.

Statewright applies this principle to AI agents. Instead of letting the LLM decide which tool to call and in what order, you define a state machine that restricts the available tools, commands, and actions in each phase. The LLM operates only within the boundaries of the current state. When it needs to move to the next phase, it requests a transition, which fires only if that transition's conditions are met.
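To make the idea concrete, here is a minimal Python sketch of per-state tool filtering. It is not Statewright's implementation: the tool registry, the two-state workflow, and the function names visible_tools and call_tool are assumptions made up for illustration.

```python
# Hypothetical tool registry and workflow; all names are illustrative only.
ALL_TOOLS = ["Read", "Grep", "Glob", "Edit", "Write", "Bash", "WebFetch"]

WORKFLOW = {
    "planning": {"allowed_tools": ["Read", "Grep", "Glob"]},
    "implementing": {"allowed_tools": ["Read", "Edit", "Write"]},
}


def visible_tools(state: str) -> list[str]:
    """Return only the tools the given state allows, in registry order."""
    allowed = set(WORKFLOW[state]["allowed_tools"])
    return [tool for tool in ALL_TOOLS if tool in allowed]


def call_tool(state: str, tool: str) -> str:
    """Refuse any call that falls outside the current state's allow-list."""
    if tool not in visible_tools(state):
        raise PermissionError(f"{tool} is not available in state '{state}'")
    return f"ran {tool}"  # stand-in for actually dispatching the tool


print(visible_tools("planning"))
```

The check lives outside the model: the agent never has to "remember" the rule, because a disallowed call fails before anything runs.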

The result: your agent gets 5 visible tools instead of 30, clear instructions for the current phase, and zero ambiguity about what's allowed. Models use fewer tokens to completion, and smaller local models (roughly 13GB and up) start solving tasks they'd otherwise fail.

How Statewright Works Under the Hood

Statewright has two layers. At the core is a Rust engine that evaluates state machine definitions. It's deterministic — no LLM in the loop. On top sits a plugin layer that integrates with your coding agent via the Model Context Protocol (MCP). When you activate a workflow, hooks enforce tool restrictions per state automatically.

Here's a concrete example — a bugfix workflow defined in JSON:

{
  "id": "bugfix",
  "initial": "planning",
  "states": {
    "planning": {
      "allowed_tools": ["Read", "Grep", "Glob"],
      "max_iterations": 8,
      "on": { "READY": "implementing" }
    },
    "implementing": {
      "allowed_tools": ["Read", "Edit", "Write"],
      "max_edit_lines": 20,
      "max_files_per_state": 3,
      "on": { "DONE": "testing" }
    },
    "testing": {
      "allowed_tools": ["Read", "Bash"],
      "allowed_commands": ["pytest", "cargo test", "npm test"],
      "on": {
        "PASS": { "target": "completed", "guard": "tests_passed" },
        "FAIL_TEST": "implementing"
      }
    },
    "completed": { "type": "final" }
  },
  "guards": {
    "tests_passed": { "field": "test_result", "op": "eq", "value": "pass" }
  }
}

In this workflow:

Planning phase: The agent can only read files (Read, Grep, Glob). No editing, no running code. It figures out the problem first.
Implementing phase: Edit tools unlock, but capped at 20 lines per edit and 3 files per state. No destructive operations.
Testing phase: Only designated test commands (pytest, cargo test, npm test) are allowed. If tests fail, the agent loops back to implementing.
Completed: A final state — the agent is done, and cannot proceed further.

What makes this different from a simple DAG? State machines loop and retry. In the example above, failed tests send the agent back to implementing. The agent can iterate, fix, and re-test — that's what agentic work actually needs, and what a directed acyclic graph can't express.
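The loop-and-retry behavior can be sketched in a few lines of Python. This is a toy interpreter for the transitions and the single eq guard in the bugfix definition above, not the Rust engine. One assumption to flag: a rejected event leaves the machine in its current state, which may not match how Statewright reports rejections.

```python
# Transition table copied from the bugfix workflow (tool limits omitted).
WORKFLOW = {
    "initial": "planning",
    "states": {
        "planning": {"on": {"READY": "implementing"}},
        "implementing": {"on": {"DONE": "testing"}},
        "testing": {
            "on": {
                "PASS": {"target": "completed", "guard": "tests_passed"},
                "FAIL_TEST": "implementing",
            }
        },
        "completed": {"type": "final"},
    },
    "guards": {
        "tests_passed": {"field": "test_result", "op": "eq", "value": "pass"}
    },
}


def guard_holds(name: str, context: dict) -> bool:
    """Evaluate a named guard; this sketch only supports the 'eq' operator."""
    guard = WORKFLOW["guards"][name]
    if guard["op"] != "eq":
        raise ValueError(f"unsupported op: {guard['op']}")
    return context.get(guard["field"]) == guard["value"]


def step(state: str, event: str, context: dict) -> str:
    """Return the next state, or the same state if the event is rejected."""
    transition = WORKFLOW["states"][state].get("on", {}).get(event)
    if transition is None:
        return state  # unknown event, or a final state with no transitions
    if isinstance(transition, str):
        return transition  # unguarded transition
    if guard_holds(transition["guard"], context):
        return transition["target"]
    return state  # guard failed, stay put


def run(events: list) -> list:
    """Feed (event, context) pairs through the machine; return visited states."""
    state = WORKFLOW["initial"]
    path = [state]
    for event, context in events:
        state = step(state, event, context)
        path.append(state)
    return path


path = run([
    ("READY", {}),
    ("DONE", {}),
    ("FAIL_TEST", {"test_result": "fail"}),  # loops back to implementing
    ("DONE", {}),
    ("PASS", {"test_result": "pass"}),
])
print(" -> ".join(path))
```

The path visits implementing and testing twice, which is exactly the cycle an acyclic graph has no way to represent.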

Research Results: Real Impact on Model Performance

Statewright's team tested the approach on a 5-task SWE-bench subset with local models:

Model	Size	Bug Fix	SWE-bench (5 tasks)
gemma3	3.3GB	FAIL	FAIL
gemma4:e2b	7.2GB	PASS*	FAIL
gpt-oss:20b	13.8GB	PASS	PASS (5/5)
gemma4:31b	19.9GB	PASS	PASS (5/5)
llama3.3	42.5GB	PASS	PASS (2/2)†

* with specialized edit_line tool adaptation • † tested on 2 of 5 tasks (added after initial run)

The most striking result: the two models at 13.8GB and 19.9GB went from a combined 2/10 to 10/10 (5 tasks each) with Statewright constraints, on the same tasks and the same hardware. Below 13GB, models can produce tool calls but can't retain enough file context for accurate edits. That's a model capability floor, not a Statewright limitation.

For frontier models, the win is more structural: breaking "read-loop death spirals" where models re-read the same file 5+ times without editing, and keeping the tool space small enough that the model actually reasons instead of flailing around.

Guardrail Reference: What Statewright Enforces

Statewright provides a comprehensive set of guardrails for production agent workloads:

Per-state tool enforcement — Tools are invisible to the agent when not in the current state's allowed_tools list
Bash discernment — Redirects (>>), destructive ops (rm, shred), and scripting interpreters blocked in non-write states
Edit guards — Rejects diffs exceeding max_edit_lines, caps files edited per state
Command allow-lists — Prefix-matched allowed commands per state (e.g., only pytest during testing)
Conditional transitions — Guards with programmatic predicates (eq, gt, exists) on context data
Approval gates — requires_approval pauses for human review before high-risk transitions
Environment scoping — blocked_env + env_overrides per state
Session isolation — Per-session state via CLAUDE_SESSION_ID

Visual Editor: Drag-and-Drop Workflow Design

One of Statewright's standout features is the visual workflow editor at statewright.ai/workflows. You can create, edit, and visualize state machine workflows with drag-and-drop — no JSON editing required.

This matters most for teams where domain experts, not developers, define agent behavior. A customer success manager can design a customer support workflow without writing a single line of code. A data pipeline operator can visually map out ETL stages. The visual editor generates the JSON definition automatically, which the Rust engine then enforces.

For developers, the JSON definition is always there as the source of truth. You can author workflows by hand, use the visual editor, or even ask your AI agent to generate one via statewright_create_workflow — point it at the JSON schema and it writes the workflow for you.

Real-World Use Cases

1. Customer Support Chatbots

A support bot moves through states: Identify Issue → Gather Context → Propose Solution → Escalate (if needed) → Confirm Resolution. In each state, the LLM has a narrow set of tools: reading knowledge base articles, checking order status, or drafting a reply. It can't accidentally delete a user account or escalate prematurely because those tools simply don't exist in the current state.

2. Data Processing Pipelines

ETL workflows with states: Extract → Validate Schema → Transform → Load → Verify. The LLM only has read access during validation, only sees transformation tools during the transform phase, and must pass verification before the pipeline moves forward. Failed validation loops back to extraction.

3. Form Filling & Document Generation

A form completion agent: Gather Requirements → Draft → Validate Fields → Human Review → Submit. Each state restricts what the agent can read, write, and modify. The human_review state requires approval before submission — preventing premature or incorrect form submission.

4. Workflow Automation

Complex business processes with branching logic, retry loops, and escalation paths. Conditional transitions enable workflows that adapt to real-world conditions — if a payment fails, transition to a retry state; if retries exceed max, escalate to human review.
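The payment example can be sketched with the guard operators listed in the guardrail reference (eq, gt, exists). The retry limit, the state names retrying and human_review, and the helper on_payment_failed are hypothetical, chosen only to illustrate a conditional transition.

```python
MAX_RETRIES = 3  # hypothetical limit

# Guard in the same shape as the workflow JSON: field, op, value.
RETRIES_EXHAUSTED = {"field": "retries", "op": "gt", "value": MAX_RETRIES}


def evaluate(guard: dict, context: dict) -> bool:
    """Evaluate an eq / gt / exists predicate against the run context."""
    field, op = guard["field"], guard["op"]
    if op == "exists":
        return field in context
    if field not in context:
        return False  # comparisons against a missing field never pass
    if op == "eq":
        return context[field] == guard["value"]
    if op == "gt":
        return context[field] > guard["value"]
    raise ValueError(f"unsupported op: {op}")


def on_payment_failed(context: dict) -> tuple:
    """Count the failure, then pick the next state based on the guard."""
    updated = {**context, "retries": context.get("retries", 0) + 1}
    if evaluate(RETRIES_EXHAUSTED, updated):
        return "human_review", updated
    return "retrying", updated


context = {}
states = []
for _ in range(4):
    state, context = on_payment_failed(context)
    states.append(state)
print(states)
```

With a limit of 3, the first three failures route to the retry state and the fourth escalates, because the guard uses a strict greater-than comparison.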

Statewright vs LangGraph vs Vellum vs Temporal

The agent orchestration space is crowded in 2026. Here's how Statewright compares to alternatives:

Feature	Statewright	LangGraph	Vellum	Temporal
Visual editor	✅ Drag-and-drop	❌ Code-only	✅ Visual canvas	❌ Code-only
Hard tool enforcement	✅ Protocol-level	❌ Advisory	❌ Advisory	N/A
Loops & retries	✅ Native	✅ Native	⚠️ Limited	✅ Robust
Deterministic engine	✅ Rust (no LLM)	⚠️ Python runtime	⚠️ Python runtime	✅ Go/Java
Non-developer friendly	✅ Visual editor	❌ Python required	✅ Low-code	❌ SDK required
Coding agent focus	✅ Primary (Claude Code, etc.)	✅ General agent	⚠️ Prompt engineering	❌ Workflow engine
Pricing	Free tier (3 workflows)	Open-source	Paid tiers	Open-source + Cloud

Key differentiator: Statewright enforces tool restrictions at the protocol layer (MCP) before the model even sees them. LangGraph and Vellum inject instructions into the context — the model can still ignore them. This is the difference between an "advisory" guardrail and a "hard" one.

Getting Started with Statewright

Installation is straightforward, especially for Claude Code users:

# In Claude Code, run:
/plugin marketplace add statewright/statewright
/plugin install statewright
/reload-plugins

# Then start a bugfix workflow:
/statewright start bugfix

This opens a browser tab where you sign up at statewright.ai, generate an API key, and paste it back. Once activated, every tool call is gated by the active workflow's state machine.

Statewright also supports Codex, opencode, Pi, and Cursor (with advisory enforcement on Cursor due to MCP architecture limitations). Integration coverage:

Agent	Integration	Enforcement
Claude Code	Hooks + MCP	✅ Hard
Codex	Hooks	✅ Hard (alpha)
opencode	TS plugin	✅ Hard (alpha)
Pi	Skills extension	✅ Hard (alpha)
Cursor	MCP + rules	⚠️ Advisory (alpha)

Hard enforcement means tool calls are blocked at the protocol layer before the model sees them. Advisory means rules are injected into the context but not enforced — the model can still violate them.

Pricing

Statewright is source-available under two licenses: the Rust engine is Apache 2.0, and the rest of the stack is FSL-1.1-ALv2. The managed cloud handles workflow storage, run history, and the MCP gateway:

Plan	Workflows	Transitions/mo	Run History	Price
Free	3	200	72 hours	$0
Pro	10	2,500	7 days	$29/mo
Team	30	10,000	90 days	$99/mo
Enterprise	Unlimited	Unlimited	Custom	Contact

Individual developers can self-host the full stack under the FSL license. The Rust engine (crates/engine) is Apache 2.0 and embeddable with no runtime dependencies.

Developer Value: Debuggable, Traceable, Controllable

Beyond reliability, state machines give developers three things that prompt-based agents can't:

Debuggability

When an agent fails, you know exactly which state it was in, which tool it tried to call, and which transition was blocked. Each run produces a chain of state transitions that serves as a clear audit trail. Compare this to a monolithic prompt agent, where all you have is a log of raw LLM calls and you have to infer intent.

Traceability

Every tool call is logged against a specific state. You can replay a workflow run step-by-step, see where the model got stuck, and understand why. Statewright's managed cloud stores run history with per-transition timestamps and token usage.
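As a rough picture of what such a trail might contain, here is a small in-memory sketch. The record fields and the RunTrace class are assumptions for illustration; they are not Statewright's run-history schema.

```python
import time
from dataclasses import dataclass, field


@dataclass
class RunTrace:
    """In-memory audit trail: one record per tool call or transition."""

    records: list = field(default_factory=list)

    def log(self, state: str, kind: str, detail: str, allowed: bool) -> None:
        """Append a timestamped record tagged with the state it happened in."""
        self.records.append({
            "ts": time.time(),
            "state": state,
            "kind": kind,  # "tool" or "transition"
            "detail": detail,
            "allowed": allowed,
        })

    def blocked(self) -> list:
        """Everything the state machine refused."""
        return [r for r in self.records if not r["allowed"]]

    def state_path(self) -> list:
        """States in visit order, collapsing consecutive repeats."""
        path = []
        for record in self.records:
            if not path or path[-1] != record["state"]:
                path.append(record["state"])
        return path


trace = RunTrace()
trace.log("planning", "tool", "Read", allowed=True)
trace.log("planning", "tool", "Edit", allowed=False)  # blocked: not in allowed_tools
trace.log("planning", "transition", "READY", allowed=True)
trace.log("implementing", "tool", "Edit", allowed=True)
print(trace.state_path(), [r["detail"] for r in trace.blocked()])
```

Because every record carries its state, "where did the model get stuck" becomes a filter over the trace instead of a reading exercise over raw LLM output.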

Controllability

Need to add a safety check? Add a state. Need to confine a sensitive tool to a single phase? List it in only that state's allowed_tools. Need to block the agent from running arbitrary shell commands during the planning phase? Remove Bash from that state's allowed_tools list. These are all data changes to a JSON definition: no code changes, no prompt modifications, no model retraining.
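Since the workflow is plain data, such a change can be scripted. A minimal sketch, assuming the definition has been loaded into a Python dict with the same shape as the bugfix JSON; the helper without_tool is hypothetical.

```python
import copy
import json


def without_tool(workflow: dict, state: str, tool: str) -> dict:
    """Return a copy of the workflow with one tool removed from one state."""
    updated = copy.deepcopy(workflow)  # leave the caller's definition untouched
    tools = updated["states"][state]["allowed_tools"]
    updated["states"][state]["allowed_tools"] = [t for t in tools if t != tool]
    return updated


original = {
    "id": "bugfix",
    "states": {"planning": {"allowed_tools": ["Read", "Grep", "Bash"]}},
}
tightened = without_tool(original, "planning", "Bash")
print(json.dumps(tightened, indent=2))
```

The diff between the two definitions is a single list entry, which is easy to review and easy to roll back.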

Limitations to Know

Statewright isn't a silver bullet. Here are the caveats:

Requires MCP support in the agent (or hooks for non-MCP agents like Codex)
Workflow definitions still have to be designed by a person, in JSON or in the visual editor (though agents can draft them via statewright_create_workflow)
Cursor enforcement is advisory — MCP alone can't gate tool calls in Cursor's architecture
Research results are from a 5-task SWE-bench subset, not the full 2,294-instance benchmark
Too restrictive and the agent gets stuck — statewright_deactivate is the escape hatch

Conclusion

The AI agent community has spent two years trying to make models "smarter" at staying on task. Statewright takes the opposite approach: instead of making the model bigger, make the problem smaller.

By constraining agents with deterministic state machines — enforced at the protocol layer, visualized with drag-and-drop, and debuggable with full traceability — Statewright offers a path to AI agents that are actually reliable enough for production.

Whether you're building customer support bots, data pipelines, form automation, or complex multi-step workflows, the state machine pattern gives you what no prompt can: guarantees, not suggestions.

Try it for free at statewright.ai or explore the source code on GitHub.

Related reading:

AI Agent Control Flow Guide: Stop Stacking Prompts, Write Code — same philosophy, practical code examples
Statewright Research Brief — detailed SWE-bench results
Statewright Documentation — install guide, workflow authoring, schema reference
Hacker News discussion: Statewright GitHub Show HN
