Open-source · MIT License · v0.2.9

Agents that
think in tiers.

Cascade is a CLI orchestration system that decomposes any task into a three-tier hierarchy — Administrator → Manager → Worker — routing work across the best available models automatically.

$ npm install -g cascade-ai

Three tiers. One coherent output.

Every prompt flows through a hierarchy. T1 plans, spawns T2 managers in parallel, each of which orchestrates T3 workers — all streaming back into a single final answer.

T1 — Administrator
Analyzes complexity · Selects models · Decomposes into sections · Compiles final output

dispatches in parallel
├─ T2 — Section 1: Auth module refactor
│    ├─ T3 · Read files
│    └─ T3 · Write JWT logic
├─ T2 — Section 2: Test generation
│    ├─ T3 · Run tests
│    └─ T3 · Write test file
└─ T2 — Section 3: Open pull request
     └─ T3 · git commit & push

Complexity determines the tier count.

Cascade classifies your prompt before dispatching — simple questions go direct to a T3 worker, complex implementations spin up a full hierarchy.

| Classification | Example | Route | T2 Managers |
| --- | --- | --- | --- |
| Simple | "What is a closure?" | T3 | 0 |
| Moderate | "Add pagination to the users API" | T2 → T3 ×2 | 1 |
| Complex | "Refactor auth module to JWT, add tests, open PR" | T1 → T2 ×3 → T3 ×n | 3–5 |
| Highly Complex | "Research, benchmark, and document the full auth ecosystem" | T1 → T2 ×5+ → T3 ×n | 5+ |
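The routing rules above boil down to a lookup from classification to tier topology. A minimal sketch of that mapping (illustrative only: `routeFor` and the `Route` shape are hypothetical, not part of Cascade's API):

```typescript
// Hypothetical sketch of classification-based routing — not Cascade's
// actual implementation, just the table above expressed as data.
type Classification = 'simple' | 'moderate' | 'complex' | 'highly-complex';

interface Route {
  tiers: string[];    // which tiers participate in the run
  t2Managers: string; // how many T2 managers get spawned
}

const routes: Record<Classification, Route> = {
  simple:           { tiers: ['T3'],             t2Managers: '0' },
  moderate:         { tiers: ['T2', 'T3'],       t2Managers: '1' },
  complex:          { tiers: ['T1', 'T2', 'T3'], t2Managers: '3–5' },
  'highly-complex': { tiers: ['T1', 'T2', 'T3'], t2Managers: '5+' },
};

function routeFor(c: Classification): Route {
  return routes[c];
}
```

A simple prompt never pays for a T1 or T2 pass; only complex work funds the full hierarchy.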

Works with every major model provider

Anthropic Claude
OpenAI GPT
Google Gemini
Azure OpenAI
Ollama (local)
OpenAI-Compatible

Production-grade from day one.

No plugin store to browse. The tools your agents need are already wired in.

Live Agent Tree

Watch the T1 → T2 → T3 hierarchy execute in real time, rendered directly in the terminal via Ink.

🔒

Permission Escalation

Dangerous tool calls escalate through T2 → T1 → user before executing. Never a silent file delete.

🔄

Provider Failover

Rate-limit hit? Cascade auto-switches providers with exponential backoff. Zero config required.
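Failover with exponential backoff generally looks like trying providers in order and doubling the wait after each failure. A sketch under that assumption (the `withFailover` function and delay constants are hypothetical, not Cascade's internals):

```typescript
// Hypothetical sketch of provider failover with exponential backoff.
// Not the cascade-ai implementation — just the pattern it describes.
type Provider = (prompt: string) => Promise<string>;

async function withFailover(
  providers: Provider[],
  prompt: string,
  baseDelayMs = 100,
): Promise<string> {
  let attempt = 0;
  for (const provider of providers) {
    try {
      return await provider(prompt);
    } catch {
      // Back off exponentially (100ms, 200ms, 400ms, ...) before
      // switching to the next provider in the list.
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      attempt++;
    }
  }
  throw new Error('All providers failed');
}
```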

🛠️

Full Tool Suite

Shell, file CRUD, git, GitHub/GitLab PRs, Playwright browser automation, PDF creation, code interpreter.

🌐

Web Dashboard

React + ReactFlow live topology graph, session browser, cost tracker, JWT auth, WebSocket updates.

🔌

MCP Support

Connect any Model Context Protocol server. Its tools become available to every T3 worker automatically.

💰

Per-Tier Cost Breakdown & Budget Control

Every result exposes costByTier, tokensByTier, and percentage attribution. Set a live session budget with /budget set 0.50 — Cascade warns you at 80% spend (configurable via warnAtPct) and stops new tasks the moment the cap is hit, with no config-file edits required.
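The budget behavior above amounts to a threshold check on cumulative spend. A minimal sketch (the `BudgetGuard` class and its method names are hypothetical; only `warnAtPct` and the 80% default come from the feature description):

```typescript
// Hypothetical sketch of a session budget guard: warn at a configurable
// percentage of the cap, hard-stop once the cap is hit. Not Cascade's API.
class BudgetGuard {
  private spent = 0;

  constructor(
    private capUsd: number,
    private warnAtPct = 80, // mirrors the documented 80% warning default
  ) {}

  record(costUsd: number): 'ok' | 'warn' | 'stop' {
    this.spent += costUsd;
    if (this.spent >= this.capUsd) return 'stop'; // cap hit: no new tasks
    if (this.spent >= (this.capUsd * this.warnAtPct) / 100) return 'warn';
    return 'ok';
  }
}
```

With a $0.50 cap, spend crossing $0.40 triggers the warning and crossing $0.50 stops new task dispatch.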

⌨️

Guided Setup Wizard

First-run TUI collects API keys for every provider — including multiple Azure deployments and custom OpenAI-compatible endpoints. Fetches live model lists, then lets you assign T1/T2/T3 models yourself or let Cascade Auto decide.

🖥️

Claude Code-Style CLI

Redesigned terminal UI with a top status bar showing live tier models and cost, a compact agent tree for T1→T2→T3 progress, and a keyboard hint bar — all purpose-built for Cascade's multi-tier hierarchy.

🎛️

Interactive Model Picker

Run /model inside the REPL for a three-step picker — provider → tier → model — with Auto at every step. Arrow keys, Tab, j/k and number keys all work; selections write .cascade/config.json and hot-swap the live router, no restart required.

Task Cancellation via AbortSignal

Pass an AbortSignal to cascade.run() to stop any in-progress run mid-flight. All active tiers (T1 → T2 → T3) halt at the next safe checkpoint before the next LLM call — no mid-stream interruptions, no orphaned agents. A run:cancelled event fires with partial output so you can still surface what was produced. Prevents runaway token spend on long tasks.
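Checkpoint-style cancellation typically means polling the signal between units of work rather than killing anything mid-stream. A sketch under that assumption (`runWithCheckpoints` is hypothetical; only the `AbortSignal` contract comes from the description above):

```typescript
// Hypothetical sketch of checkpoint cancellation: check the AbortSignal
// between steps (i.e. between LLM calls), never mid-stream, and return
// whatever partial output exists. Not Cascade's internals.
async function runWithCheckpoints(
  steps: Array<() => Promise<string>>,
  signal: AbortSignal,
): Promise<{ cancelled: boolean; partial: string[] }> {
  const partial: string[] = [];
  for (const step of steps) {
    if (signal.aborted) {
      // Safe checkpoint reached with a pending abort: halt before the
      // next call, surfacing the partial output produced so far.
      return { cancelled: true, partial };
    }
    partial.push(await step());
  }
  return { cancelled: false, partial };
}
```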

// .cascade/config.json
{
  "version": "1.0",
  "providers": [
    { "type": "anthropic",
      "apiKey": "sk-ant-..." },
    { "type": "ollama" }
  ],
  "models": {
    "t1": "claude-opus-4",
    "t2": "claude-sonnet-4",
    "t3": "llama3.2:3b"
  },
  "tools": {
    "shellBlocklist": ["rm -rf"],
    "requireApprovalFor": ["shell"]
  }
}

Embed in any
Node.js project.

Cascade exposes a first-class TypeScript SDK. Bring your own approval flow, stream tokens to any UI, or wire it into a CI pipeline.

Full TypeScript types for every option and result

Token-by-token streaming via callback

Custom approval callbacks for tool gating

Per-tier cost & token breakdown — costByTier, tokensByTier, and percentage attribution in every result

Live budget management — /budget set <$amount> caps session spend at runtime; /budget shows a visual spend bar; proactive warning fires at 80% (configurable warnAtPct) before the hard stop

runCascade, createCascade, streamCascade — three entry points

example.ts
import { streamCascade } from 'cascade-ai';

await streamCascade(
  'Refactor auth module to use JWT, add tests, open a PR',
  (token) => process.stdout.write(token),
  {
    workspacePath: '/my/project',
    approvalCallback: async (req) => {
      console.log(`Allow ${req.toolName}?`);
      return { approved: true, always: false };
    },
  }
);
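Percentage attribution is easy to derive from `costByTier` yourself; a sketch using made-up sample numbers (only the `costByTier` field name comes from the docs above, the helper is hypothetical):

```typescript
// Hypothetical helper: derive per-tier percentage attribution from a
// costByTier map. The dollar amounts below are illustrative only.
function costPercentages(
  costByTier: Record<string, number>,
): Record<string, number> {
  const total = Object.values(costByTier).reduce((a, b) => a + b, 0);
  const pct: Record<string, number> = {};
  for (const [tier, cost] of Object.entries(costByTier)) {
    pct[tier] = total === 0 ? 0 : Math.round((cost / total) * 100);
  }
  return pct;
}

// On a $0.10 run split 0.02 / 0.05 / 0.03, attribution is 20 / 50 / 30.
const attribution = costPercentages({ t1: 0.02, t2: 0.05, t3: 0.03 });
```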

One command away.

Open-source, MIT licensed. No telemetry by default. Runs local models via Ollama — your code never leaves your machine if you don't want it to.