Statewright Guide: Visual State Machines for Reliable AI Agents
Published: 2026-05-13 • Reading: 12 min • Tags: AI Agent, State Machine, LLM Reliability, Statewright, Visual Workflow, MCP, Agent Control Flow
- What: Statewright — an open-source tool that uses visual state machines to make AI agents predictable and reliable
- Stats: 70+ points on Hacker News, active GitHub repository at github.com/statewright/statewright
- Key insight: Instead of making the model bigger, make the problem smaller — state machines constrain agent behavior with deterministic transitions
- Core engine: Rust-based, evaluates state machine definitions with zero LLM involvement — pure deterministic logic
- Integration: Works with Claude Code, Codex, Cursor, opencode, and Pi through a plugin layer (MCP, hooks, or agent-specific plugins, depending on the agent)
- Visual editor: Drag-and-drop workflow builder at statewright.ai — non-developers can create and modify workflows
The Problem: AI Agents Are Unpredictable by Design
Here's a dirty secret about building with LLMs: every prompt is a suggestion, not a command. Give a model 40+ tools and an open-ended problem, and it barely gets out of the gate. It re-reads the same file 5+ times. It calls tools in the wrong order. It "verifies" its own output and returns a hallucination. It decides to skip a step you marked "MANDATORY".
This isn't the model's fault. It's an architectural problem. LLMs are generative — they produce plausible continuations, not guaranteed outputs. When you give them free rein over tool selection, execution order, and decision-making boundaries, you're gambling on alignment that doesn't exist yet.
The common fix? Bigger models and longer prompts. It helps sometimes. Observability tells you what went wrong after the fact, but it doesn't prevent it. What if, instead of making the model bigger, we made the problem smaller?
The Solution: State Machines as Guardrails
This is where Statewright comes in. Its core philosophy is captured in the tagline: "Agents are suggestions, states are laws."
State machines have been a bedrock of reliable software engineering for decades. A finite state machine (FSM) defines a set of states, valid transitions between them, and actions that can occur in each state. The key property: transitions are deterministic. The machine doesn't "guess" what to do next — it follows the rules you defined.
Statewright applies this principle to AI agents. Instead of letting the LLM decide which tool to call and in what order, you define a state machine that restricts available tools, commands, and actions in each phase. The LLM only operates within the boundaries of the current state. When it needs to move to the next phase, it triggers a transition — if the transition conditions are met.
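To make the idea concrete, here is a minimal sketch of such a machine in Python. Statewright's own engine is written in Rust, and the State class and the visible_tools, is_valid, and transition functions below are hypothetical illustrations, not Statewright's API. The state names echo the bugfix workflow shown later in this guide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class State:
    """One phase of an agent workflow: which tools are visible and where each event leads."""
    allowed_tools: frozenset
    on: dict = field(default_factory=dict)


# Hypothetical three-phase machine echoing the bugfix workflow.
MACHINE = {
    "planning": State(frozenset({"Read", "Grep", "Glob"}), {"READY": "implementing"}),
    "implementing": State(frozenset({"Read", "Edit", "Write"}), {"DONE": "testing"}),
    "testing": State(frozenset({"Read", "Bash"}), {"FAIL_TEST": "implementing"}),
}


def visible_tools(state: str) -> frozenset:
    """The only tools the model is shown while the machine is in `state`."""
    return MACHINE[state].allowed_tools


def is_valid(state: str, event: str) -> bool:
    """True if the current state defines a transition for this event."""
    return event in MACHINE[state].on


def transition(state: str, event: str) -> str:
    """Deterministic step: the same (state, event) pair always yields the same next state.
    Events the current state does not define are rejected, never guessed at."""
    if not is_valid(state, event):
        raise ValueError(f"event {event!r} is not valid in state {state!r}")
    return MACHINE[state].on[event]


state = "planning"
print(sorted(visible_tools(state)))  # ['Glob', 'Grep', 'Read']
state = transition(state, "READY")
print(state)  # implementing
```

Rejecting an undefined event, rather than letting the model improvise a next step, is what keeps the control flow deterministic.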
The result: your agent sees 5 tools instead of 30, gets clear instructions for the current phase, and faces no ambiguity about what's allowed. Models use fewer tokens to reach completion, and smaller local models (roughly 13GB and up, per the results below) start solving tasks they would otherwise fail.
How Statewright Works Under the Hood
Statewright has two layers. At the core is a Rust engine that evaluates state machine definitions. It's deterministic — no LLM in the loop. On top sits a plugin layer that integrates with your coding agent via the Model Context Protocol (MCP). When you activate a workflow, hooks enforce tool restrictions per state automatically.
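The enforcement side can be pictured as a pre-tool-use check that runs before every call. This sketch assumes a JSON payload with a tool_name field, loosely modeled on how coding-agent hooks receive tool calls; the actual Statewright hook protocol may differ, and ALLOWED_TOOLS, Decision, and gate_tool_call are hypothetical names.

```python
import json
from typing import NamedTuple

# Hypothetical per-state allow-lists; in Statewright these come from the active workflow.
ALLOWED_TOOLS = {
    "planning": {"Read", "Grep", "Glob"},
    "implementing": {"Read", "Edit", "Write"},
    "testing": {"Read", "Bash"},
}


class Decision(NamedTuple):
    allow: bool
    reason: str


def gate_tool_call(active_state: str, payload: str) -> Decision:
    """Allow a tool call only if the active state lists that tool.

    This is a plain table lookup: no model is consulted, so the same state and
    tool name always produce the same decision.
    """
    call = json.loads(payload)
    tool = call.get("tool_name", "")
    allowed = ALLOWED_TOOLS.get(active_state, set())
    if tool in allowed:
        return Decision(True, f"{tool} is allowed in {active_state}")
    return Decision(False, f"{tool} is not available in {active_state}; allowed: {sorted(allowed)}")


print(gate_tool_call("planning", '{"tool_name": "Edit", "tool_input": {"file_path": "app.py"}}'))
```

Returning a reason alongside the verdict lets the agent see why it was blocked and which tools it may use instead.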
Here's a concrete example — a bugfix workflow defined in JSON:
```json
{
  "id": "bugfix",
  "initial": "planning",
  "states": {
    "planning": {
      "allowed_tools": ["Read", "Grep", "Glob"],
      "max_iterations": 8,
      "on": { "READY": "implementing" }
    },
    "implementing": {
      "allowed_tools": ["Read", "Edit", "Write"],
      "max_edit_lines": 20,
      "max_files_per_state": 3,
      "on": { "DONE": "testing" }
    },
    "testing": {
      "allowed_tools": ["Read", "Bash"],
      "allowed_commands": ["pytest", "cargo test", "npm test"],
      "on": {
        "PASS": { "target": "completed", "guard": "tests_passed" },
        "FAIL_TEST": "implementing"
      }
    },
    "completed": { "type": "final" }
  },
  "guards": {
    "tests_passed": { "field": "test_result", "op": "eq", "value": "pass" }
  }
}
```
In this workflow:
- Planning phase: The agent can only read files (Read, Grep, Glob). No editing, no running code. It figures out the problem first.
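- Implementing phase: Edit tools unlock, but each edit is capped at 20 lines and the agent can touch at most 3 files in this state. Bash is not available here, so no destructive operations.
- Testing phase: Only the designated test commands (pytest, cargo test, npm test) are allowed. A PASS event reaches completed only if the tests_passed guard confirms that test_result is "pass"; a FAIL_TEST event loops the agent back to implementing.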
- Completed: A final state — the agent is done, and cannot proceed further.
What makes this different from a simple DAG? State machines loop and retry. In the example above, failed tests send the agent back to implementing. The agent can iterate, fix, and re-test — that's what agentic work actually needs, and what a directed acyclic graph can't express.
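The following sketch interprets a trimmed copy of the bugfix definition and walks through a failed test run, the loop back to implementing, and the guarded PASS. The step and check_guard functions, and the choice to leave the state unchanged when an event is unknown or its guard fails, are our assumptions about reasonable semantics, not Statewright's documented behavior.

```python
import json

# The bugfix definition from above, trimmed to the fields this sketch interprets.
BUGFIX = json.loads("""
{
  "initial": "planning",
  "states": {
    "planning": {"on": {"READY": "implementing"}},
    "implementing": {"on": {"DONE": "testing"}},
    "testing": {
      "on": {
        "PASS": {"target": "completed", "guard": "tests_passed"},
        "FAIL_TEST": "implementing"
      }
    },
    "completed": {"type": "final"}
  },
  "guards": {
    "tests_passed": {"field": "test_result", "op": "eq", "value": "pass"}
  }
}
""")

# Guard operators named in the article; a missing context field is read as None.
OPS = {
    "eq": lambda actual, expected: actual == expected,
    "gt": lambda actual, expected: actual is not None and actual > expected,
    "exists": lambda actual, _expected: actual is not None,
}


def check_guard(workflow: dict, name: str, context: dict) -> bool:
    guard = workflow["guards"][name]
    return OPS[guard["op"]](context.get(guard["field"]), guard.get("value"))


def step(workflow: dict, state: str, event: str, context: dict) -> str:
    """Return the next state; stay put if the event is unknown or its guard fails."""
    spec = workflow["states"][state]
    if spec.get("type") == "final":
        return state  # a final state accepts no further events
    edge = spec.get("on", {}).get(event)
    if edge is None:
        return state
    if isinstance(edge, str):
        return edge
    if "guard" in edge and not check_guard(workflow, edge["guard"], context):
        return state
    return edge["target"]


# A run where the first test attempt fails and the agent loops back.
state, trace = BUGFIX["initial"], [BUGFIX["initial"]]
for event in ["READY", "DONE", "FAIL_TEST", "DONE"]:
    state = step(BUGFIX, state, event, {})
    trace.append(state)
print(trace)  # ['planning', 'implementing', 'testing', 'implementing', 'testing']
print(step(BUGFIX, state, "PASS", {"test_result": "fail"}))  # testing (guard blocks it)
print(step(BUGFIX, state, "PASS", {"test_result": "pass"}))  # completed
```

The cycle implementing → testing → implementing is exactly the shape a DAG cannot express, and the guard means the model claiming PASS is not enough: the context has to agree.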
Research Results: Real Impact on Model Performance
Statewright's team tested the approach on a 5-task SWE-bench subset with local models. The results speak for themselves:
| Model | Size | Bug Fix | SWE-bench (5 tasks) |
|---|---|---|---|
| gemma3 | 3.3GB | FAIL | FAIL |
| gemma4:e2b | 7.2GB | PASS* | FAIL |
| gpt-oss:20b | 13.8GB | PASS | PASS (5/5) |
| gemma4:31b | 19.9GB | PASS | PASS (5/5) |
| llama3.3 | 42.5GB | PASS | PASS (2/2)† |
* with specialized edit_line tool adaptation • † tested on 2 of 5 tasks (added after initial run)
The most striking result: the two mid-sized models (gpt-oss:20b at 13.8GB and gemma4:31b at 19.9GB) went from a combined 2/10 to 10/10 with Statewright constraints, on the same tasks and the same hardware. Below 13GB, models can produce tool calls but can't retain enough file context for accurate edits. That's a model capability floor, not a Statewright limitation.
For frontier models, the win is more structural: breaking "read-loop death spirals" where models re-read the same file 5+ times without editing, and keeping the tool space small enough that the model actually reasons instead of flailing around.
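Per-state limits such as the planning state's "max_iterations": 8 are one way to cut such a spiral short. This sketch assumes max_iterations counts tool calls made within a single state, which the article does not spell out; the StateBudget class and its repeated-read threshold are hypothetical, not Statewright settings.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StateBudget:
    """Hypothetical per-state budget that counts tool calls and repeated reads."""
    max_iterations: int
    max_same_file_reads: int = 3  # illustrative threshold, not a Statewright setting
    calls: int = 0
    reads: Counter = field(default_factory=Counter)

    def record(self, tool: str, path: Optional[str] = None) -> str:
        """Return 'ok', a nudge once the same file is re-read too often,
        or a stop signal once the state's iteration budget is spent."""
        self.calls += 1
        if self.calls > self.max_iterations:
            return "stop: iteration budget for this state is exhausted"
        if tool == "Read" and path is not None:
            self.reads[path] += 1
            if self.reads[path] > self.max_same_file_reads:
                return f"nudge: {path} has been read {self.reads[path]} times; transition or move on"
        return "ok"


budget = StateBudget(max_iterations=8)  # mirrors "max_iterations": 8 in planning
results = [budget.record("Read", "src/parser.py") for _ in range(9)]
print(results[2])  # ok
print(results[3])  # nudge: src/parser.py has been read 4 times; transition or move on
print(results[8])  # stop: iteration budget for this state is exhausted
```

Because the counter lives outside the model, the spiral ends after a fixed number of calls no matter how convinced the model is that one more read will help.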
Guardrail Reference: What Statewright Enforces
Statewright provides a comprehensive set of guardrails for production agent workloads:
- Per-state tool enforcement: Tools are invisible to the agent when not in the current state's allowed_tools list
- Bash discernment: Redirects (>>), destructive ops (rm, shred), and scripting interpreters blocked in non-write states
- Edit guards: Rejects diffs exceeding max_edit_lines, caps files edited per state
- Command allow-lists: Prefix-matched allowed commands per state (e.g., only pytest during testing)
- Conditional transitions: Guards with programmatic predicates (eq, gt, exists) on context data
- Approval gates: requires_approval pauses for human review before high-risk transitions
- Environment scoping: blocked_env + env_overrides per state
- Session isolation: Per-session state via CLAUDE_SESSION_ID
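A few of these checks can be sketched in plain Python. The blocked command sets, the token-wise prefix matching, the refusal of command chaining, and the way edit size is counted are our assumptions; Statewright's actual rules may be stricter or differ in detail. The function names are hypothetical.

```python
import difflib
import shlex

# Illustrative block-lists; the article names rm, shred, >> and "scripting
# interpreters" but not the exact sets Statewright uses.
DESTRUCTIVE = {"rm", "shred"}
INTERPRETERS = {"python", "python3", "node", "perl", "ruby", "sh", "bash"}
CHAINING = (";", "&", "|", "`", "$(")  # always refused here, so prefix matching stays meaningful
REDIRECTS = (">",)                     # also catches ">>"


def command_allowed(cmd: str, allowed_prefixes: list, write_state: bool) -> bool:
    """Token-wise prefix match against the state's allow-list, plus Bash discernment."""
    if any(m in cmd for m in CHAINING):
        return False
    if not write_state and any(r in cmd for r in REDIRECTS):
        return False
    try:
        tokens = shlex.split(cmd)
    except ValueError:
        return False  # unbalanced quotes: refuse rather than guess
    if not tokens:
        return False
    if not write_state and tokens[0] in (DESTRUCTIVE | INTERPRETERS):
        return False
    return any(tokens[: len(p.split())] == p.split() for p in allowed_prefixes)


def edit_allowed(old: str, new: str, max_edit_lines: int) -> bool:
    """Reject an edit whose changed-line count exceeds max_edit_lines.
    A replaced block counts as the larger of its old and new line counts."""
    matcher = difflib.SequenceMatcher(a=old.splitlines(), b=new.splitlines())
    changed = sum(
        max(i2 - i1, j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )
    return changed <= max_edit_lines


def file_cap_ok(edited: set, path: str, max_files: int) -> bool:
    """Allow edits to already-touched files, or to a new file while under the cap."""
    return path in edited or len(edited) < max_files


TESTING = ["pytest", "cargo test", "npm test"]  # allowed_commands from the bugfix example
print(command_allowed("cargo test --release", TESTING, write_state=False))  # True
print(command_allowed("pytestx", TESTING, write_state=False))               # False
print(command_allowed("pytest > log.txt", TESTING, write_state=False))      # False
```

Matching whole tokens rather than raw string prefixes is the important design choice here: a naive startswith("pytest") check would also accept pytestx.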
Visual Editor: Drag-and-Drop Workflow Design
One of Statewright's standout features is the visual workflow editor at statewright.ai/workflows. You can create, edit, and visualize state machine workflows with drag-and-drop — no JSON editing required.
This is a game-changer for teams where domain experts (not developers) define agent behavior. A customer success manager can design a customer support workflow without writing a single line of code. A data pipeline operator can visually map out ETL stages. The visual editor generates the JSON definition automatically, which the Rust engine then enforces.
For developers, the JSON definition is always there as the source of truth. You can author workflows by hand, use the visual editor, or even ask your AI agent to generate one via statewright_create_workflow — point it at the JSON schema and it writes the workflow for you.
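Whoever writes the definition, it is worth checking that it is internally consistent before activating it. This is a minimal sketch of such a structural check, not Statewright's JSON schema (which the article doesn't reproduce); validate_workflow and the specific checks are our own choices.

```python
from collections import deque


def _target(edge):
    """Transitions are either a bare state name or {"target": ..., "guard": ...}."""
    return edge if isinstance(edge, str) else edge.get("target")


def validate_workflow(wf: dict) -> list:
    """Return a list of structural problems; an empty list means none were found."""
    states, guards = wf.get("states", {}), wf.get("guards", {})
    initial = wf.get("initial")
    if initial not in states:
        return [f"initial state {initial!r} is not defined"]
    problems = []
    for name, spec in states.items():
        edges = spec.get("on", {})
        if spec.get("type") != "final" and not edges:
            problems.append(f"non-final state {name!r} has no outgoing transitions")
        for event, edge in edges.items():
            if _target(edge) not in states:
                problems.append(f"{name}.{event} targets unknown state {_target(edge)!r}")
            if isinstance(edge, dict) and "guard" in edge and edge["guard"] not in guards:
                problems.append(f"{name}.{event} uses undefined guard {edge['guard']!r}")
    # Breadth-first search from the initial state to find unreachable states.
    seen, queue = {initial}, deque([initial])
    while queue:
        for edge in states[queue.popleft()].get("on", {}).values():
            target = _target(edge)
            if target in states and target not in seen:
                seen.add(target)
                queue.append(target)
    problems += [f"state {s!r} is unreachable from {initial!r}" for s in states if s not in seen]
    return problems


# A generated draft with a typo in a target and a guard that was never defined.
draft = {
    "initial": "planning",
    "states": {
        "planning": {"on": {"READY": "implementing"}},
        "implementing": {"on": {"DONE": {"target": "testng", "guard": "tests_passed"}}},
        "testing": {"type": "final"},
    },
}
for problem in validate_workflow(draft):
    print(problem)
```

A single typo in a target name surfaces as two symptoms here: an unknown target and an unreachable final state.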
Real-World Use Cases
1. Customer Support Chatbots
A support bot moves through states: Identify Issue → Gather Context → Propose Solution → Escalate (if needed) → Confirm Resolution. In each state, the LLM has a narrow set of tools: reading knowledge base articles, checking order status, or drafting a reply. It can't accidentally delete a user account or escalate prematurely because those tools simply don't exist in the current state.
2. Data Processing Pipelines
ETL workflows with states: Extract → Validate Schema → Transform → Load → Verify. The LLM only has read access during validation, only sees transformation tools during the transform phase, and must pass verification before the pipeline moves forward. Failed validation loops back to extraction.
3. Form Filling & Document Generation
A form completion agent: Gather Requirements → Draft → Validate Fields → Human Review → Submit. Each state restricts what the agent can read, write, and modify. The human_review state requires approval before submission — preventing premature or incorrect form submission.
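An approval gate can be modeled as a transition that records the request and waits instead of firing. In this sketch, attaching requires_approval to the SUBMIT transition (rather than to the human_review state) is an assumption, and FORM_FLOW and Run are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical form-filling workflow. Putting requires_approval on the SUBMIT
# transition (rather than on the state) is an assumption of this sketch.
FORM_FLOW = {
    "gather_requirements": {"NEXT": {"target": "draft"}},
    "draft": {"NEXT": {"target": "validate_fields"}},
    "validate_fields": {"NEXT": {"target": "human_review"}, "INVALID": {"target": "draft"}},
    "human_review": {"SUBMIT": {"target": "submitted", "requires_approval": True}},
    "submitted": {},
}


@dataclass
class Run:
    state: str = "gather_requirements"
    pending: Optional[str] = None  # event paused until a human approves it

    def fire(self, event: str) -> str:
        """Called on the agent's behalf; gated transitions are paused, not taken."""
        edge = FORM_FLOW[self.state].get(event)
        if edge is None:
            return f"rejected: {event} is not valid in {self.state}"
        if edge.get("requires_approval"):
            self.pending = event
            return f"paused: {event} from {self.state} awaits approval"
        self.state = edge["target"]
        return f"moved to {self.state}"

    def approve(self, approver: str) -> str:
        """Called from the human review side only; never exposed to the agent as a tool."""
        if self.pending is None:
            return "nothing to approve"
        target = FORM_FLOW[self.state][self.pending]["target"]
        self.state, self.pending = target, None
        return f"{approver} approved; moved to {self.state}"


run = Run()
for event in ["NEXT", "NEXT", "NEXT", "SUBMIT"]:
    print(run.fire(event))
print(run.approve("reviewer@example.com"))
```

Keeping approve off the agent's tool list is what makes the gate meaningful: the model can request submission but cannot grant it.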
4. Workflow Automation
Complex business processes with branching logic, retry loops, and escalation paths. Conditional transitions enable workflows that adapt to real-world conditions — if a payment fails, transition to a retry state; if retries exceed max, escalate to human review.
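The payment example combines a retry loop with a numeric guard. This sketch reuses the field/op/value guard shape from the bugfix definition and evaluates an ordered list of candidate targets, taking the first whose guard passes. The ordered-candidates form, the failures counter, and MAX_RETRIES = 3 are assumptions, not documented Statewright syntax.

```python
import operator

MAX_RETRIES = 3  # hypothetical limit

# Each event maps to an ordered list of candidates; the first whose guard passes wins.
PAYMENT = {
    "charging": {
        "OK": [{"target": "paid"}],
        "FAILED": [
            {"target": "human_review", "guard": {"field": "failures", "op": "gt", "value": MAX_RETRIES}},
            {"target": "retrying"},
        ],
    },
    "retrying": {"RETRY": [{"target": "charging"}]},
    "paid": {},
    "human_review": {},
}
OPS = {"eq": operator.eq, "gt": operator.gt}


def guard_passes(guard, context: dict) -> bool:
    return guard is None or OPS[guard["op"]](context.get(guard["field"], 0), guard["value"])


def step(state: str, event: str, context: dict) -> str:
    if event == "FAILED" and event in PAYMENT[state]:
        context["failures"] = context.get("failures", 0) + 1  # action runs before guards
    for candidate in PAYMENT[state].get(event, []):
        if guard_passes(candidate.get("guard"), context):
            return candidate["target"]
    return state  # unknown event: stay put


# Every charge attempt fails: three retries, then escalation on the fourth failure.
state, context, path = "charging", {}, []
while state not in ("paid", "human_review"):
    state = step(state, "FAILED" if state == "charging" else "RETRY", context)
    path.append(state)
print(path)
```

Ordering the escalation candidate first means the retry branch is only reachable while the failure count is still within budget.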
Statewright vs LangGraph vs Vellum vs Temporal
The agent orchestration space is crowded in 2026. Here's how Statewright compares to alternatives:
| Feature | Statewright | LangGraph | Vellum | Temporal |
|---|---|---|---|---|
| Visual editor | ✅ Drag-and-drop | ❌ Code-only | ✅ Visual canvas | ❌ Code-only |
| Hard tool enforcement | ✅ Protocol-level | ❌ Advisory | ❌ Advisory | N/A |
| Loops & retries | ✅ Native | ✅ Native | ⚠️ Limited | ✅ Robust |
| Deterministic engine | ✅ Rust (no LLM) | ⚠️ Python runtime | ⚠️ Python runtime | ✅ Go/Java |
| Non-developer friendly | ✅ Visual editor | ❌ Python required | ✅ Low-code | ❌ SDK required |
| Coding agent focus | ✅ Primary (Claude Code, etc.) | ✅ General agent | ⚠️ Prompt engineering | ❌ Workflow engine |
| Pricing | Free tier (3 workflows) | Open-source | Paid tiers | Open-source + Cloud |
Key differentiator: Statewright enforces tool restrictions at the protocol layer (MCP) before the model even sees them. LangGraph and Vellum inject instructions into the context — the model can still ignore them. This is the difference between an "advisory" guardrail and a "hard" one.
Getting Started with Statewright
Installation is straightforward, especially for Claude Code users:
```text
# In Claude Code, run:
/plugin marketplace add statewright/statewright
/plugin install statewright
/reload-plugins
# Then start a bugfix workflow:
/statewright start bugfix
```
This opens a browser tab where you sign up at statewright.ai, generate an API key, and paste it back. Once activated, every tool call is gated by the active workflow's state machine.
Statewright also supports Codex, opencode, Pi, and Cursor (with advisory enforcement on Cursor due to MCP architecture limitations). Integration coverage:
| Agent | Integration | Enforcement |
|---|---|---|
| Claude Code | Hooks + MCP | ✅ Hard |
| Codex | Hooks | ✅ Hard (alpha) |
| opencode | TS plugin | ✅ Hard (alpha) |
| Pi | Skills extension | ✅ Hard (alpha) |
| Cursor | MCP + rules | ⚠️ Advisory (alpha) |
Hard enforcement means disallowed tools are hidden from the model and any call to them is blocked at the protocol layer. Advisory means the rules are injected into the context but not enforced, so the model can still violate them.
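The difference is easiest to see in what the model actually receives. The sketch below is loosely shaped like an MCP tools/list result (a tools array of entries with name, description, and inputSchema); the list_tools function and the wording of the advisory note are hypothetical.

```python
# A small tool catalog in the shape of MCP tools/list entries.
CATALOG = [
    {"name": "Read", "description": "Read a file", "inputSchema": {"type": "object"}},
    {"name": "Edit", "description": "Edit a file", "inputSchema": {"type": "object"}},
    {"name": "Bash", "description": "Run a shell command", "inputSchema": {"type": "object"}},
]


def list_tools(catalog: list, allowed: set, mode: str) -> tuple:
    """Return (tools_result, prompt_note) as the model would receive them.

    hard:     disallowed tools are filtered out, so the model cannot even name them.
    advisory: every tool is still listed; the restriction is only a line of text.
    """
    if mode == "hard":
        return {"tools": [t for t in catalog if t["name"] in allowed]}, ""
    if mode == "advisory":
        note = "In this phase, only use: " + ", ".join(sorted(allowed)) + "."
        return {"tools": list(catalog)}, note
    raise ValueError(f"unknown mode {mode!r}")


for mode in ("hard", "advisory"):
    result, note = list_tools(CATALOG, {"Read"}, mode)
    print(mode, [t["name"] for t in result["tools"]], repr(note))
```

In a hard setup you would still keep a call-time check as a backstop, so a call to a hidden tool is refused even if the model guesses its name.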
Pricing
Statewright's Rust engine is open-source under Apache 2.0, and the rest of the stack is available under FSL-1.1-ALv2. The managed cloud handles workflow storage, run history, and the MCP gateway:
| Plan | Workflows | Transitions/mo | Run History | Price |
|---|---|---|---|---|
| Free | 3 | 200 | 72 hours | $0 |
| Pro | 10 | 2,500 | 7 days | $29/mo |
| Team | 30 | 10,000 | 90 days | $99/mo |
| Enterprise | Unlimited | Unlimited | Custom | Contact |
Individual developers can self-host the full stack under the FSL license. The Rust engine (crates/engine) is Apache 2.0 and embeddable with no runtime dependencies.
Developer Value: Debuggable, Traceable, Controllable
Beyond reliability, state machines give developers three things that prompt-based agents can't:
Debuggability
When an agent fails, you know exactly which state it was in, which tool it tried to call, and which transition was blocked. Each run produces a chain of state transitions, which doubles as a clear audit trail. Compare this to a monolithic prompt agent, where all you have is a log of raw LLM calls from which you must infer intent.
Traceability
Every tool call is logged against a specific state. You can replay a workflow run step-by-step, see where the model got stuck, and understand why. Statewright's managed cloud stores run history with per-transition timestamps and token usage.
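A transition log of this kind is simple to model. The record fields and JSON-lines export below are illustrative and are not Statewright's run-history format; the token counts in the example are made-up numbers, not measurements.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class TransitionRecord:
    from_state: str
    event: str
    to_state: str
    tokens: int  # tokens the agent spent in from_state before this transition
    ts: float = field(default_factory=time.time)


class RunLog:
    """Append-only transition history; field names are illustrative only."""

    def __init__(self) -> None:
        self.records = []

    def record(self, from_state: str, event: str, to_state: str, tokens: int) -> None:
        self.records.append(TransitionRecord(from_state, event, to_state, tokens))

    def path(self) -> list:
        """The sequence of states visited, reconstructed from the log."""
        if not self.records:
            return []
        return [self.records[0].from_state] + [r.to_state for r in self.records]

    def tokens_by_state(self) -> dict:
        """Total tokens per state: a quick way to spot where a run got stuck."""
        totals = {}
        for r in self.records:
            totals[r.from_state] = totals.get(r.from_state, 0) + r.tokens
        return totals

    def to_jsonl(self) -> str:
        return "\n".join(json.dumps(asdict(r)) for r in self.records)


log = RunLog()
# Illustrative numbers only, not measurements.
log.record("planning", "READY", "implementing", 1200)
log.record("implementing", "DONE", "testing", 800)
log.record("testing", "FAIL_TEST", "implementing", 300)
log.record("implementing", "DONE", "testing", 500)
print(log.path())             # ['planning', 'implementing', 'testing', 'implementing', 'testing']
print(log.tokens_by_state())  # {'planning': 1200, 'implementing': 1300, 'testing': 300}
```

Because every record names a state and an event, replaying a run is just reading the log in order rather than reverse-engineering intent from raw LLM output.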
Controllability
Need to add a safety check? Add a state. Need to confine a high-risk tool to a single reviewed phase? Add it to the allowed_tools list of that one state and put requires_approval on the transition into it. Need to block the agent from running arbitrary shell commands during the planning phase? Remove Bash from that state's allowed_tools list. These are all data changes to a JSON definition: no code changes, no prompt modifications, no model retraining.
Limitations to Know
Statewright isn't a silver bullet. Here are the caveats:
- Requires MCP support in the agent (or hooks for non-MCP agents like Codex)
- Workflow definitions are authored by hand (though agents can generate them via statewright_create_workflow)
- Cursor enforcement is advisory: MCP alone can't gate tool calls in Cursor's architecture
- Research results are from a 5-task SWE-bench subset, not the full 2294-instance benchmark
- Too restrictive and the agent gets stuck: statewright_deactivate is the escape hatch
Conclusion
The AI agent community has spent two years trying to make models "smarter" at staying on task. Statewright takes the opposite approach: instead of making the model bigger, make the problem smaller.
By constraining agents with deterministic state machines — enforced at the protocol layer, visualized with drag-and-drop, and debuggable with full traceability — Statewright offers a path to AI agents that are actually reliable enough for production.
Whether you're building customer support bots, data pipelines, form automation, or complex multi-step workflows, the state machine pattern gives you what no prompt can: guarantees, not suggestions.
Try it for free at statewright.ai or explore the source code on GitHub.
Related reading:
- AI Agent Control Flow Guide: Stop Stacking Prompts, Write Code — same philosophy, practical code examples
- Statewright Research Brief — detailed SWE-bench results
- Statewright Documentation — install guide, workflow authoring, schema reference
- Hacker News discussion: Statewright GitHub Show HN