The 38-agent fleet that runs AskBaily
By Jason, Founder · Published · 4 min read · Waves 185, 194, 200
Summary
AskBaily runs on a fleet of 38 specialized agents, each with a narrow persona, tool registry, and runbook. Wave 200 finalized the fleet with three parallel ships: persona and tool corpus, canary-wired dispatch, and the Fleet Control Room dashboard with 152 red-team probes. Two agents are live in canary today. The rest are staged.
Article body
Three years ago I would have called this post AI-engineering hype. Then I built a contractor marketplace, and I needed 38 different kinds of judgment running in the background to keep it honest, and I could not hire 38 people to do them. So I built the fleet.
Wave 200 is the point in the project where the fleet is real and documented, not a plan. On 2026-04-22 three commits shipped in parallel: 6856cd14 defined the 38 personas and their tool registries; 818a8c2f wired the universal dispatch gate and fleet-dispatcher so the canary feature flag pipes through cleanly; 06842b72 shipped the Fleet Control Room dashboard and the 152-probe red-team matrix. Two agents — Growth-Sales from Wave 185 and Content-Moderation from Wave 194 — are live in canary. The rest are staged with feature flags off and runbooks ready.
The shape of an agent
An agent, in this fleet, is five artifacts: a persona definition (about 400 words of voice and responsibility), a tool registry (the list of APIs, read and write, it is allowed to call), a runbook (how it escalates, what it must not do, what its kill switch is), a canary gate (a feature flag plus a phone-number or city allowlist so it only runs for staging traffic until promoted), and a red-team suite (a subset of the 152 probes specifically tuned to adversarial inputs for that agent's domain).
All five artifacts are in the repo at lib/agents/, one subdirectory per agent. The fleet-dispatcher at lib/agents/dispatcher.ts is the only entry point. It reads a request, decides which agent should handle it, checks the canary gate, and either routes it or drops it. There is no agent-to-agent direct call. Everything goes through the dispatcher so the audit trail is one file, not 38.
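To make the single-entry-point idea concrete, here is a minimal sketch of what an agent record and a dispatcher could look like. It is not the contents of lib/agents/dispatcher.ts: the type names, field names, the intent-to-agent routing table, and the drop reasons are all assumptions made for illustration.

```typescript
// Hypothetical shapes: field names are illustrative, not the real lib/agents/ schema.
interface AgentArtifacts {
  slug: string;
  persona: string; // roughly 400 words of voice and responsibility
  toolRegistry: string[]; // APIs the agent is allowed to call
  runbookPath: string; // escalation rules, prohibitions, kill switch
  canary: { flagEnabled: boolean; phoneAllowlist: string[] };
  redTeamProbeIds: string[]; // this agent's slice of the probe matrix
}

interface FleetRequest {
  intent: string; // assumed routing key
  phone: string; // checked against the canary allowlist
}

type DispatchResult =
  | { outcome: "routed"; agentSlug: string }
  | { outcome: "dropped"; reason: string };

interface AuditEntry {
  intent: string;
  result: DispatchResult;
}

// Single entry point: every request passes through dispatch(), so one log covers the fleet.
function createDispatcher(
  agents: AgentArtifacts[],
  routes: Record<string, string | undefined>, // intent -> agent slug (assumed)
) {
  const auditLog: AuditEntry[] = [];

  function decide(req: FleetRequest): DispatchResult {
    const slug = routes[req.intent];
    const agent: AgentArtifacts | undefined = agents.filter((a) => a.slug === slug)[0];
    if (agent === undefined) {
      return { outcome: "dropped", reason: "no agent for intent" };
    }
    if (!agent.canary.flagEnabled) {
      return { outcome: "dropped", reason: "feature flag off" };
    }
    if (agent.canary.phoneAllowlist.indexOf(req.phone) === -1) {
      return { outcome: "dropped", reason: "not on canary allowlist" };
    }
    return { outcome: "routed", agentSlug: agent.slug };
  }

  function dispatch(req: FleetRequest): DispatchResult {
    const result = decide(req);
    auditLog.push({ intent: req.intent, result }); // routed or dropped, it is logged
    return result;
  }

  return { dispatch, auditLog };
}
```

Because agents never receive a reference to each other, only to the tools in their registry, the one audit log stays complete in this sketch: a request is either routed once or dropped once, and both outcomes are recorded.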
Six agents in the fleet, as examples
Growth-Sales is the one we shipped first, in Wave 185. Its job is to draft outbound messages to contractors who have expressed intent (tool usage, an inbound inquiry, or a partner referral). It is allowed to call our CRM, read the contractor's public licensing record, and draft a message for human review. It is not allowed to send. The feature flag is AGENT_GROWTH_SALES_ENABLED, dry-run mode is controlled by AGENT_GROWTH_SALES_DRY_RUN, and canary is restricted to a specific test-phone allowlist. Prometheus metrics track dispatch count, dry-run count, human-override count, and red-team failure count. Grafana alerts fire if any of those crosses its threshold.
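The way the two flags and the allowlist combine can be sketched as a small pure function. Only the two flag names come from the description above. The "true" string encoding, the order of the checks, the mode names, and the function name resolveGrowthSalesMode are assumptions made for illustration.

```typescript
// Assumption: flags arrive as environment-style strings, and "true" means on.
type FlagEnv = Record<string, string | undefined>;

type GrowthSalesMode = "off" | "dry-run" | "draft-for-review";

function isOn(env: FlagEnv, flag: string): boolean {
  return env[flag] === "true";
}

// Decide what Growth-Sales may do for one request. There is deliberately no
// "send" mode: the most the agent can do is produce a draft for human review.
function resolveGrowthSalesMode(
  env: FlagEnv,
  phone: string,
  testPhoneAllowlist: string[],
): GrowthSalesMode {
  if (!isOn(env, "AGENT_GROWTH_SALES_ENABLED")) return "off";
  if (testPhoneAllowlist.indexOf(phone) === -1) return "off"; // canary traffic only
  if (isOn(env, "AGENT_GROWTH_SALES_DRY_RUN")) return "dry-run";
  return "draft-for-review";
}
```

Passing the flag values in as an argument, rather than reading them from the process inside the function, keeps the gate easy to exercise in tests with every flag combination.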
Content-Moderation is the second live agent, from Wave 194. It reviews new /for-pros applications, new homeowner scope descriptions, and reviews-to-be-published for policy violations (offensive content, impersonation attempts, attempts to bypass our verification rail with a claimed-elsewhere license). It drafts a decision, never executes one. A human operator approves or rejects before anything ships to production.
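One way to picture "drafts a decision, never executes one" is to make the final decision impossible to construct without a human review attached. The sketch below is an illustration under that assumption. The type names, the two-value verdict, and finalizeDecision are hypothetical, not the production schema.

```typescript
// Hypothetical shapes for the draft-then-approve flow; names are illustrative.
type ModerationVerdict = "allow" | "reject";

interface DraftDecision {
  itemId: string;
  proposed: ModerationVerdict; // what the agent recommends
  rationale: string;
}

interface OperatorReview {
  operatorId: string;
  verdict: ModerationVerdict; // the human's call, which may differ from the draft
}

interface FinalDecision {
  itemId: string;
  verdict: ModerationVerdict;
  decidedBy: string; // always a human operator, never the agent
  agentOverridden: boolean;
}

// The agent can only produce a DraftDecision. A FinalDecision requires an
// OperatorReview, so nothing ships on the agent's recommendation alone.
function finalizeDecision(draft: DraftDecision, review: OperatorReview): FinalDecision {
  return {
    itemId: draft.itemId,
    verdict: review.verdict,
    decidedBy: review.operatorId,
    agentOverridden: review.verdict !== draft.proposed,
  };
}
```

The agentOverridden field is the kind of signal a human-override counter could be built on.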
The remaining 36 agents are staged, not live. They include Regulatory-Watch (a watcher for new board-side regulation), Scope-Clarifier (an assistant that disambiguates a homeowner's free-text scope into our structured scope-snap schema), License-Lapse-Watcher (monitors the registry rail and alerts us when a contractor's license approaches expiration), Dispute-Triage (routes homeowner-contractor disputes to the right escalation path), and 32 more. Each has a specific narrow job. None of them are allowed to do each other's work.
Why 38 and not one big model
Because the right agent count for any real operation is the number of distinct judgments the operation needs, not the number of big models that can theoretically cover the space. A general-purpose assistant can, in principle, do any of these 38 jobs. In practice, each job has a different failure mode, different adversarial surface, different escalation path, different metric for success, and different canary allowlist. Collapsing them into one agent loses the ability to canary them independently, fail them independently, and audit them independently.
The 152-probe red-team matrix, from Wave 200.C, is structured by agent. Each agent has between three and seven probes tuned to its domain. The Growth-Sales agent has probes that test whether it will draft a message to a contractor whose license was just suspended (it must not). The Content-Moderation agent has probes that test whether a sufficiently clever impersonation attempt gets flagged. The matrix runs on every commit that touches an agent file; a red-team regression stops that commit from being pushed at the pre-push check.
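The selection step, from changed files to the probes that must run, can be sketched as follows. The probe shape and function names are hypothetical, and the real matrix format is not shown in this post. Running every probe when a shared file directly under lib/agents/ changes (dispatcher.ts, for example) is an assumption about how such a gate might reasonably behave, not a description of the actual hook.

```typescript
// Hypothetical probe shape; the real matrix format is not shown in this post.
interface RedTeamProbe {
  id: string;
  agentSlug: string;
}

// Extract agent slugs from changed paths of the form lib/agents/<slug>/...
function touchedAgentSlugs(changedFiles: string[]): string[] {
  const slugs: string[] = [];
  for (const file of changedFiles) {
    const match = /^lib\/agents\/([^/]+)\/.+/.exec(file);
    const slug = match === null ? undefined : match[1];
    if (slug !== undefined && slugs.indexOf(slug) === -1) {
      slugs.push(slug);
    }
  }
  return slugs;
}

// Assumption: a change to a shared file directly under lib/agents/ runs every
// probe, since it can affect any agent.
function selectProbes(matrix: RedTeamProbe[], changedFiles: string[]): RedTeamProbe[] {
  const sharedChange = changedFiles.some((f) => /^lib\/agents\/[^/]+$/.test(f));
  if (sharedChange) return matrix.slice();
  const slugs = touchedAgentSlugs(changedFiles);
  return matrix.filter((p) => slugs.indexOf(p.agentSlug) !== -1);
}

// Any failing probe is a regression; a pre-push check would exit non-zero on true.
function shouldBlockPush(results: { probeId: string; passed: boolean }[]): boolean {
  return results.some((r) => !r.passed);
}
```

In this sketch a commit that touches no agent files selects no probes, so the gate only costs time on changes that can alter agent behavior.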
What this buys us in practice
Three things. First, we can ship agent changes with real confidence because the canary-plus-red-team gate catches the obvious regressions before they reach real traffic. Second, the operator runbook at docs/agents/{agent-slug}.md — 38 of them — means anyone stepping in can see what the agent is allowed to do, what it must not do, and how to kill it, without reading source code. Third, the Fleet Control Room dashboard gives us a single pane of glass: which agents are canary-live, which are staged, which have red-team regressions, which are rate-limited.
What Angi and Thumbtack cannot do
They can. The agents are engineering, not IP. What they cannot do is publish the fleet map and the red-team results. If Thumbtack runs a triage agent on homeowner-submitted scope, they have not said so; if they run one and it drops a percentage of inbound that a homeowner would consider serious, there is no external accountability. We publish which agents are live, which are canary, and which are staged. When we promote an agent, the promotion is dated. The /roadmap page keeps the record.
The fleet is the answer to a question nobody else in the category has publicly answered: "who, or what, is actually making the call when your platform decides what to do with my request?" On AskBaily, in 38 separate cases, the answer is documented. That is not an AI-engineering flourish. That is accountability, rendered as engineering.
Sources & references
Commit attestation
- 6856cd146c36c4c9ae5ff01b32254169cad27046
- 818a8c2f999c0782d203330dcb09b3be60113fa7
- 06842b723e98e5ff09724e2e69535db7b0e19a26
- 3eb234de857528f0109baf78b832b56d35638e7f
- 310dc2dd5ea155ba2be59408d5c04a5d4dd79cd7
Waves
- 185, 194, 200
Author
- jason
Commit SHAs are from the AskBaily private repository. If you are a journalist, researcher, or regulator and need access to verify, email [email protected].
Frequently asked
- How many of the 38 agents are live in production today?
  Two: Growth-Sales (Wave 185) and Content-Moderation (Wave 194), both in canary with feature flags and test-phone allowlists. The other 36 are staged with runbooks and red-team suites ready, awaiting canary promotion.
- Can one agent call another?
  No. All routing goes through the fleet-dispatcher, which enforces the canary gate and logs the dispatch. There is no agent-to-agent direct call path in the codebase.
- What happens when an agent fails a red-team probe?
  The commit is stopped from being pushed at the pre-push check. The agent cannot be promoted past canary until the regression is fixed and the suite is green again.