Scaling AI Operations for 1M+ Weekly Executions
February 10, 2026 · 1 min read

When I started Right Aim, its six AI agents executed a handful of tasks per day. Today they collectively process over a million operations weekly. This article breaks down what changed, and what had to change, to make that scale work.
The Queue Architecture
Every agent in the team operates through n8n workflows. At low scale, simple webhook triggers suffice. But once you cross the 10,000 daily execution threshold, you need proper queue management. We use n8n's built-in queue mode backed by Redis, with separate queues per agent priority tier.
Aston (Planning Partner) gets priority 1: his decisions cascade to every other agent. Bob (Factory Manager) and Leah (Growth Operator) share priority 2. Metrick, Mira, and Rex operate on priority 3 with burst capacity.
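In production this tiering lives in n8n's Redis-backed queues, but the dispatch behavior is easy to illustrate with a minimal in-process sketch. The agent names and tiers come from above; the `TieredQueue` class and task names are illustrative, not Right Aim's actual code.

```python
import heapq
import itertools

# Priority tiers as described above: lower number = dispatched first.
AGENT_PRIORITY = {
    "aston": 1,                          # Planning Partner: decisions cascade downstream
    "bob": 2, "leah": 2,                 # Factory Manager, Growth Operator
    "metrick": 3, "mira": 3, "rex": 3,   # burst-capacity tier
}

class TieredQueue:
    """Min-heap dispatcher. A monotonic counter breaks ties so that
    tasks within the same tier are served in FIFO order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, agent: str, task: str) -> None:
        tier = AGENT_PRIORITY[agent]
        heapq.heappush(self._heap, (tier, next(self._counter), agent, task))

    def next_task(self):
        tier, _, agent, task = heapq.heappop(self._heap)
        return agent, task

q = TieredQueue()
q.submit("rex", "scan-logs")        # arrives first, tier 3
q.submit("aston", "replan-sprint")  # arrives later, tier 1
print(q.next_task())  # Aston's task dispatches first despite arriving later
```

The same ordering rule applies regardless of the backing store: a tier-1 submission always jumps ahead of queued tier-2 and tier-3 work.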
Cost Control at Scale
The single biggest cost driver is LLM token consumption. I track every token through Metrick's cost infrastructure, which provides real-time visibility into spend per agent, per workflow, per day. The key insight: 80% of the token spend comes from 20% of the workflows, specifically content generation and code review.
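Surfacing that 80/20 split is a simple aggregation once per-workflow token counts exist. A minimal sketch, assuming you can export spend per workflow; the workflow names and token figures below are made up for illustration.

```python
# Illustrative per-workflow token spend (not real Right Aim numbers).
spend = {
    "content-generation": 48_000_000,
    "code-review": 32_000_000,
    "data-gathering": 2_500_000,
    "report-summaries": 2_500_000,
    "log-triage": 2_500_000,
    "lead-scoring": 2_500_000,
    "invoice-sync": 2_500_000,
    "backlog-grooming": 2_500_000,
    "alert-routing": 2_500_000,
    "seo-audit": 2_500_000,
}

def top_share(spend: dict, fraction: float = 0.2) -> float:
    """Share of total tokens consumed by the top `fraction` of workflows."""
    totals = sorted(spend.values(), reverse=True)
    k = max(1, round(len(totals) * fraction))
    return sum(totals[:k]) / sum(totals)

print(f"{top_share(spend):.0%} of spend comes from the top 20% of workflows")
```

With these numbers the top two workflows (content generation and code review) account for 80% of spend, which is exactly the pattern worth checking for before optimizing anything else.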
I implemented a tiered model strategy: Haiku for data gathering (0.25x cost), Sonnet for code comprehension (0.60x), and Opus only for complex reasoning (1.00x). This alone cut the monthly token cost by 40%.
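The routing logic behind that strategy fits in a few lines. This is a hedged sketch, not the actual n8n implementation: the task labels and the example task mix are hypothetical, while the cost multipliers mirror the ones in the text.

```python
# Relative per-token cost multipliers from the tiered strategy above.
MODEL_COST = {"haiku": 0.25, "sonnet": 0.60, "opus": 1.00}

# Hypothetical mapping of task types to model tiers.
TASK_TIER = {
    "data-gathering": "haiku",
    "code-comprehension": "sonnet",
    "complex-reasoning": "opus",
}

def pick_model(task_type: str) -> str:
    # Default to the most capable (and most expensive) model when unsure.
    return TASK_TIER.get(task_type, "opus")

def relative_cost(task_mix: dict) -> float:
    """Blended cost of a task mix relative to running everything on Opus."""
    total = sum(task_mix.values())
    return sum(MODEL_COST[pick_model(t)] * n for t, n in task_mix.items()) / total

# Example mix: mostly cheap gathering, some comprehension, some deep reasoning.
mix = {"data-gathering": 40, "code-comprehension": 30, "complex-reasoning": 30}
print(f"blended cost: {relative_cost(mix):.2f}x vs. all-Opus")
```

For this example mix the blended cost lands around 0.58x of an all-Opus baseline, i.e. roughly the ~40% saving described above. The important design choice is defaulting unknown task types to the strongest model: misrouting a hard task to a cheap model costs more in retries than it saves in tokens.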
What I Learned
Scale reveals architecture debt faster than anything else. The patterns that work at 100 executions per day break spectacularly at 10,000. Separate your hot path from your cold path. Instrument everything from day one. Design for graceful degradation.