Cascade is a CLI orchestration system that decomposes any task into a three-tier hierarchy — Administrator → Manager → Worker — routing work across the best available models automatically.
Every prompt flows through a hierarchy. T1 plans, spawns T2 managers in parallel, each of which orchestrates T3 workers — all streaming back into a single final answer.
Analyzes complexity · Selects models · Decomposes into sections · Compiles final output
Cascade classifies your prompt before dispatching — simple questions go direct to a T3 worker, complex implementations spin up a full hierarchy.
| Classification | Example | Route | T2 Managers |
|---|---|---|---|
| Simple | "What is a closure?" | T3 | — |
| Moderate | "Add pagination to the users API" | T2 → T3 ×2 | 1 |
| Complex | "Refactor auth module to JWT, add tests, open PR" | T1 → T2 ×3 → T3 ×n | 3–5 |
| Highly Complex | "Research, benchmark, and document the full auth ecosystem" | T1 → T2 ×5+ → T3 ×n | 5+ |
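The routing in the table above can be pictured as a classification-to-topology map. The sketch below is purely illustrative — the `Complexity` type, `Route` shape, and lookup table are hypothetical names, not Cascade's internals:

```typescript
// Hypothetical sketch of complexity-based routing (not Cascade's real code).
type Complexity = 'simple' | 'moderate' | 'complex' | 'highly-complex';

interface Route {
  tiers: string[];    // which tiers participate, top-down
  t2Managers: number; // how many T2 managers to spawn (0 = none)
}

// Mirrors the table: simple prompts skip straight to a T3 worker,
// harder ones stand up progressively more of the hierarchy.
const routes: Record<Complexity, Route> = {
  'simple':         { tiers: ['T3'],             t2Managers: 0 },
  'moderate':       { tiers: ['T2', 'T3'],       t2Managers: 1 },
  'complex':        { tiers: ['T1', 'T2', 'T3'], t2Managers: 3 },
  'highly-complex': { tiers: ['T1', 'T2', 'T3'], t2Managers: 5 },
};

function route(c: Complexity): Route {
  return routes[c];
}
```

So `route('simple').tiers` is just `['T3']`, while `route('complex')` stands up all three tiers with multiple managers.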
Works with every major model provider
No plugin store to browse. The tools your agents need are already wired in.
Watch the T1→T2→T3 hierarchy execute in real time directly in the terminal via Ink rendering.
Dangerous tool calls escalate through T2 → T1 → user before executing. Never a silent file delete.
Rate-limit hit? Cascade auto-switches providers with exponential backoff. Zero config required.
Shell, file CRUD, git, GitHub/GitLab PRs, Playwright browser automation, PDF creation, code interpreter.
React + ReactFlow live topology graph, session browser, cost tracker, JWT auth, WebSocket updates.
Connect any Model Context Protocol server. Its tools become available to every T3 worker automatically.
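The provider-failover behavior described above follows a familiar pattern: rotate to the next provider on failure and back off exponentially between attempts. The sketch below is a generic illustration of that pattern under assumed names (`Provider`, `callWithFailover`, the delay schedule) — not Cascade's actual implementation:

```typescript
// Illustrative provider failover with exponential backoff (not Cascade's real code).
type Provider = (prompt: string) => Promise<string>;

async function callWithFailover(
  providers: Provider[],
  prompt: string,
  maxAttempts = 4,
  baseDelayMs = 250,
): Promise<string> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // Rotate through providers so a rate-limited one is skipped on the next try.
    const provider = providers[attempt % providers.length];
    try {
      return await provider(prompt);
    } catch (err) {
      lastError = err;
      // Exponential backoff: 250 ms, 500 ms, 1000 ms, ...
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

A rate-limited first provider simply costs one backoff delay before the second provider answers.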
Every result exposes costByTier, tokensByTier, and percentage attribution. Set a live session budget with /budget set 0.50 — Cascade warns you at 80% spend (configurable via warnAtPct) and stops new tasks the moment the cap is hit, with no config-file edits required.
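The budget check reduces to two thresholds: warn at warnAtPct of the cap, hard-stop at the cap itself. A minimal sketch using the numbers from the example above (`/budget set 0.50`, warnAtPct of 80); the `budgetStatus` function is illustrative, not Cascade's implementation:

```typescript
// Sketch of the session budget check (illustrative, not Cascade's source).
function budgetStatus(
  spentUsd: number,
  capUsd: number,
  warnAtPct = 80,
): 'ok' | 'warn' | 'stop' {
  if (spentUsd >= capUsd) return 'stop';                       // no new tasks past the cap
  if (spentUsd >= capUsd * (warnAtPct / 100)) return 'warn';   // proactive warning
  return 'ok';
}

console.log(budgetStatus(0.30, 0.50)); // ok
console.log(budgetStatus(0.42, 0.50)); // warn — past $0.40 (80% of $0.50)
console.log(budgetStatus(0.50, 0.50)); // stop
```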
First-run TUI collects API keys for every provider — including multiple Azure deployments and custom OpenAI-compatible endpoints. It fetches live model lists, then lets you assign T1/T2/T3 models yourself or hand the choice to Cascade Auto.
Redesigned terminal UI with a top status bar showing live tier models and cost, a compact agent tree for T1→T2→T3 progress, and a keyboard hint bar — all purpose-built for Cascade's multi-tier hierarchy.
Run /model inside the REPL for a three-step picker — provider → tier → model — with Auto at every step. Arrow keys, Tab, j/k and number keys all work; selections write .cascade/config.json and hot-swap the live router, no restart required.
Pass an AbortSignal to cascade.run() to stop any in-progress run mid-flight. All active tiers (T1 → T2 → T3) halt at the next safe checkpoint before the next LLM call — no mid-stream interruptions, no orphaned agents. A run:cancelled event fires with partial output so you can still surface what was produced. Prevents runaway token spend on long tasks.
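The checkpoint-based cancellation can be pictured with a plain AbortController. This is a self-contained sketch of the cooperative-cancellation pattern, not Cascade's source — the `runSteps` loop stands in for per-tier LLM calls, and its return shape is analogous to the run:cancelled event's partial output:

```typescript
// Sketch: cooperative cancellation at safe checkpoints (illustrative only).
async function runSteps(
  steps: Array<() => Promise<string>>,
  signal: AbortSignal,
): Promise<{ cancelled: boolean; partial: string[] }> {
  const partial: string[] = [];
  for (const step of steps) {
    // Checkpoint: check the signal *before* the next call — never mid-stream,
    // so an in-flight step always completes and its output is kept.
    if (signal.aborted) {
      return { cancelled: true, partial };
    }
    partial.push(await step());
  }
  return { cancelled: false, partial };
}

// Usage: abort from anywhere, the loop halts at the next checkpoint.
const ctrl = new AbortController();
// ... later, from a timeout or a user keypress:
// ctrl.abort();
```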
```jsonc
// .cascade/config.json
{
  "version": "1.0",
  "providers": [
    { "type": "anthropic", "apiKey": "sk-ant-..." },
    { "type": "ollama" }
  ],
  "models": {
    "t1": "claude-opus-4",
    "t2": "claude-sonnet-4",
    "t3": "llama3.2:3b"
  },
  "tools": {
    "shellBlocklist": ["rm -rf"],
    "requireApprovalFor": ["shell"]
  }
}
```
Cascade exposes a first-class TypeScript SDK. Bring your own approval flow, stream tokens to any UI, or wire it into a CI pipeline.
Full TypeScript types for every option and result
Token-by-token streaming via callback
Custom approval callbacks for tool gating
Per-tier cost & token breakdown — costByTier, tokensByTier, and percentage attribution in every result
Live budget management — /budget set <$amount> caps session spend at runtime; /budget shows a visual spend bar; proactive warning fires at 80% (configurable warnAtPct) before the hard stop
runCascade, createCascade, streamCascade — three entry points
```typescript
import { streamCascade } from 'cascade-ai';

await streamCascade(
  'Refactor auth module to use JWT, add tests, open a PR',
  (token) => process.stdout.write(token),
  {
    workspacePath: '/my/project',
    approvalCallback: async (req) => {
      console.log(`Allow ${req.toolName}?`);
      return { approved: true, always: false };
    },
  }
);
```
Open-source, MIT licensed. No telemetry by default. Runs local models via Ollama — your code never leaves your machine if you don't want it to.