The real cost of enterprise AI — models, infrastructure, operations
What enterprise AI actually costs at scale. Model inference, infrastructure, engineering, ongoing operations — with specific numbers from production deployments.
Beyond the model subscription
When leaders budget for AI, they often focus on the model API costs — "we'll spend $X/month on Claude or GPT." That framing dramatically understates total cost. For production enterprise AI, model inference is typically 20-30% of total ongoing cost; engineering and operations are the bigger line items.
This post breaks down the real cost of enterprise AI based on our experience deploying ten production systems.
The cost categories
1. Model inference (20-30% of ongoing cost)
Per-token pricing from major providers as of Q1 2026:
| Model | Input $/M tokens | Output $/M tokens |
|---|---|---|
| Claude Opus 4 | $15 | $75 |
| Claude Sonnet 4.6 | $3 | $15 |
| Claude Haiku 4.5 | $0.80 | $4 |
| GPT-4o | $3 | $12 |
| GPT-4o-mini | $0.15 | $0.60 |
| Gemini 2.5 Pro | $1.25 | $10 |
| Open models (self-hosted) | $0.20-0.50 | $0.20-0.50 (compute) |
For an agent processing 10,000 queries/day, assuming roughly 2,000 input tokens and 200 output tokens per query over a 30-day month:
- Claude Opus 4: ~$13,500/month
- Claude Sonnet 4.6: ~$2,700/month
- Claude Haiku 4.5: ~$720/month
- GPT-4o-mini: ~$126/month
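The arithmetic behind these figures can be sketched as follows. This is a minimal sketch, not the authors' calculator: the token split (2,000 input, 200 output per query) and the 30-day month are assumptions chosen because they reproduce the three Claude figures, and the function and variable names are hypothetical.

```python
# Prices in USD per million tokens (input, output), copied from the table above.
PRICES = {
    "Claude Opus 4": (15.00, 75.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (0.80, 4.00),
    "GPT-4o-mini": (0.15, 0.60),
}


def monthly_inference_cost(price_in, price_out, queries_per_day=10_000,
                           input_tokens=2_000, output_tokens=200, days=30):
    """Monthly inference cost in USD for a fixed per-query token profile.

    The default token counts and the 30-day month are assumptions.
    """
    queries = queries_per_day * days
    millions_in = queries * input_tokens / 1_000_000
    millions_out = queries * output_tokens / 1_000_000
    return millions_in * price_in + millions_out * price_out


for model, (price_in, price_out) in PRICES.items():
    cost = monthly_inference_cost(price_in, price_out)
    print(f"{model}: ${cost:,.0f}/month")
```

Under these assumptions the Claude models come out at $13,500, $2,700, and $720 per month, and GPT-4o-mini at about $126. Doubling the output length to 400 tokens raises the Opus figure to $18,000, which shows how sensitive the bill is to output tokens at flagship prices.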
Model choice alone swings the monthly bill by roughly two orders of magnitude. For high-volume agents, route by query complexity: flagship models for hard queries, smaller models for easy ones. Typical savings are 60-80% versus using a flagship model for everything.
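To see where a 60-80% saving can come from, here is a rough sketch of the blended cost when only a share of queries goes to the flagship model. The routing shares are hypothetical, and the sketch assumes every query has the same token profile and that the router itself costs nothing, which a real deployment would need to check.

```python
def blended_monthly_cost(flagship_cost, small_cost, flagship_share):
    """Monthly cost when `flagship_share` of queries go to the flagship model."""
    if not 0.0 <= flagship_share <= 1.0:
        raise ValueError("flagship_share must be between 0 and 1")
    return flagship_share * flagship_cost + (1.0 - flagship_share) * small_cost


def savings_vs_flagship(flagship_cost, small_cost, flagship_share):
    """Fractional saving compared with sending every query to the flagship."""
    blended = blended_monthly_cost(flagship_cost, small_cost, flagship_share)
    return 1.0 - blended / flagship_cost


OPUS_MONTHLY = 13_500.0   # all-Opus figure from the list above
HAIKU_MONTHLY = 720.0     # all-Haiku figure from the list above

# Hypothetical routing shares; measure your own traffic before relying on these.
for share in (0.10, 0.20, 0.30):
    cost = blended_monthly_cost(OPUS_MONTHLY, HAIKU_MONTHLY, share)
    saving = savings_vs_flagship(OPUS_MONTHLY, HAIKU_MONTHLY, share)
    print(f"{share:.0%} to flagship: ${cost:,.0f}/month, saving {saving:.0%}")
```

With 20% of queries routed to the flagship, the blended cost is $3,276/month, a saving of about 76%. The saving falls quickly as the flagship share rises, so the accuracy of the complexity classifier matters as much as the price gap.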
2. Engineering (60%+ of total investment)
Initial build engineering cost for production agents:
- Simple agent (narrow scope, shallow integration): $40-80K
- Standard agent (production-grade with observability, evaluation, guardrails): $80-150K
- Complex agent (deep integration, custom workflows, multi-modal): $150-400K
Ongoing engineering:
- Monitoring and maintenance: 20-30 hours/month
- Tuning and improvement: 20-40 hours/month
- New feature and scope expansion: variable
Budget 30-50% of initial build cost annually for ongoing engineering.
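As a quick illustration, the 30-50% rule can be applied to the build tiers listed above. This is only a sketch of the arithmetic; the function name and the tier dictionary are hypothetical.

```python
def ongoing_engineering_budget(initial_build_usd):
    """Return the (low, high) annual budget at 30% and 50% of the build cost."""
    return initial_build_usd * 30 // 100, initial_build_usd * 50 // 100


# Initial build ranges in USD, copied from the tiers above.
BUILD_TIERS = {
    "simple": (40_000, 80_000),
    "standard": (80_000, 150_000),
    "complex": (150_000, 400_000),
}

for tier, (build_low, build_high) in BUILD_TIERS.items():
    budget_low = ongoing_engineering_budget(build_low)[0]    # 30% of cheapest build
    budget_high = ongoing_engineering_budget(build_high)[1]  # 50% of priciest build
    print(f"{tier}: ${budget_low:,}-{budget_high:,} per year")
```

For a standard agent this gives an annual range of $24,000 to $75,000.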
3. Infrastructure (10-15% of ongoing cost)
- Retrieval layer (vector database, embedding generation): $500-3,000/month
- Observability (Langfuse, Helicone, or equivalent): $300-2,000/month
- Compute (API gateways, workers, queues): $500-2,500/month
- Storage (logs, archives, data lakes): $200-1,000/month
Total: $1,500-8,500/month for production-grade infrastructure.
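The total is simply the sum of the line-item ranges. A small sketch like the following makes it easy to swap in your own quotes; the dictionary keys are hypothetical labels for the four items above.

```python
# Monthly ranges in USD (low, high), copied from the list above.
INFRA_MONTHLY_USD = {
    "retrieval": (500, 3_000),
    "observability": (300, 2_000),
    "compute": (500, 2_500),
    "storage": (200, 1_000),
}


def total_range(line_items):
    """Sum the low ends and the high ends of every line item."""
    low = sum(lo for lo, _ in line_items.values())
    high = sum(hi for _, hi in line_items.values())
    return low, high


low, high = total_range(INFRA_MONTHLY_USD)
print(f"${low:,}-{high:,}/month, ${12 * low:,}-{12 * high:,}/year")
```

Annualized, the range is $18,000 to $102,000.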
4. Observability and evaluation (5-10% of ongoing cost)
Dedicated tooling for continuous evaluation:
- Evaluation harness (custom or tools like Promptfoo, DeepEval): $200-1,500/month
- Golden set maintenance: 10-20 hours/month of SME time
- Adversarial testing infrastructure: $100-500/month
5. Operations and incident response (varies)
When something breaks (and it will):
- Incident response: 2-5 hours per incident, $200-500/hour blended rate
- Root cause analysis: 4-10 hours per incident
- Post-mortem and process improvement: 2-4 hours per incident
Budget for 5-10 incidents per year for a mature production agent.
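These ranges can be turned into an annual budget envelope. The sketch below assumes the $200-500/hour blended rate applies to all three phases (the list states it only for incident response), so treat the result as a rough bound rather than a forecast. All names are hypothetical.

```python
# Hours per incident (low, high) for each phase, copied from the list above.
HOURS_PER_INCIDENT = {
    "incident_response": (2, 5),
    "root_cause_analysis": (4, 10),
    "post_mortem": (2, 4),
}
BLENDED_RATE_USD = (200, 500)   # assumed to apply to every phase
INCIDENTS_PER_YEAR = (5, 10)


def annual_incident_cost(hours=HOURS_PER_INCIDENT, rate=BLENDED_RATE_USD,
                         incidents=INCIDENTS_PER_YEAR):
    """Return the (low, high) annual incident cost in USD."""
    low_hours = sum(lo for lo, _ in hours.values())
    high_hours = sum(hi for _, hi in hours.values())
    low = low_hours * rate[0] * incidents[0]
    high = high_hours * rate[1] * incidents[1]
    return low, high


low, high = annual_incident_cost()
print(f"Incident budget: ${low:,}-{high:,} per year")
```

Under that assumption each incident costs $1,600 to $9,500, and the annual envelope is $8,000 to $95,000. The width of the range is the point: incident cost is the least predictable line item.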
Real-world total cost example
Tier-1 customer support agent at 100K monthly tickets
Initial build (Month 0-6):
- Engineering: $180K
- Infrastructure setup: $15K
- Model inference during dev/test: $8K
- Evaluation harness: $20K
- Total one-time: $223K
Ongoing (Year 1 from Month 6):
- Model inference (routing Haiku/Sonnet by complexity): $2,400/month = $28.8K/year
- Infrastructure: $4,500/month = $54K/year
- Observability: $800/month = $9.6K/year
- Engineering (tuning, incidents): $9,000/month = $108K/year
- Total ongoing: $16,700/month ≈ $200K/year
Economic comparison against human-only:
- Human agent cost (fully loaded): $75K/year per agent
- AI agent resolving 58% of 100K tickets/month = 58K tickets/month resolved autonomously
- Equivalent human capacity: 6-8 agents
- Human cost avoided: $450-600K/year
- Net benefit: ~$250-400K/year after agent operating cost
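Payback period: roughly 7-11 months of operation, so the $223K build cost is recovered within the first year after launch. The ROI is compelling once the agent is at scale.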
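The payback estimate follows directly from the numbers in this example. Here is a minimal sketch of that calculation; the constant and function names are hypothetical, and it assumes the avoided human cost accrues evenly from the first month of operation.

```python
# One-time build costs in USD, from the "Initial build" list above.
ONE_TIME_USD = 180_000 + 15_000 + 8_000 + 20_000        # 223,000

# Monthly ongoing costs in USD, from the "Ongoing" list above.
ONGOING_MONTHLY_USD = 2_400 + 4_500 + 800 + 9_000       # 16,700
ONGOING_ANNUAL_USD = 12 * ONGOING_MONTHLY_USD           # 200,400

HUMAN_AGENT_ANNUAL_USD = 75_000  # fully loaded cost per human agent


def payback_months(human_agents_replaced):
    """Months of operation needed to recover the one-time build cost.

    Returns None if the agent never pays back at this capacity.
    """
    avoided = human_agents_replaced * HUMAN_AGENT_ANNUAL_USD
    net_annual_benefit = avoided - ONGOING_ANNUAL_USD
    if net_annual_benefit <= 0:
        return None
    return 12 * ONE_TIME_USD / net_annual_benefit


for agents in (6, 8):
    print(f"{agents} agents replaced: payback in {payback_months(agents):.1f} months")
```

With 6 to 8 human agents' worth of capacity replaced, payback takes about 10.7 and 6.7 months respectively. At 2 agents the function returns None, because the avoided cost no longer covers the operating cost; the ROI depends heavily on ticket volume.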
Hidden costs to plan for
Model vendor lock-in risk
If you build exclusively against the Claude SDK, a significant Anthropic pricing change materially affects your economics. Model-agnostic infrastructure (routing all model calls through a gateway) reduces this risk.
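One way to keep application code vendor-neutral is to hide each provider behind a small common interface. The sketch below is illustrative only: `ChatBackend`, `EchoBackend`, and `Gateway` are hypothetical names, and the backend is a stand-in that does not call any real vendor SDK.

```python
from typing import Optional, Protocol


class ChatBackend(Protocol):
    """Minimal interface every provider adapter is assumed to implement."""

    def complete(self, prompt: str) -> str: ...


class EchoBackend:
    """Stand-in adapter; a real one would wrap a vendor SDK call here."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class Gateway:
    """Routes requests to a named backend so callers never import a vendor SDK."""

    def __init__(self, backends: dict[str, ChatBackend], default: str) -> None:
        if default not in backends:
            raise KeyError(f"unknown default backend: {default}")
        self._backends = backends
        self._default = default

    def set_default(self, name: str) -> None:
        if name not in self._backends:
            raise KeyError(f"unknown backend: {name}")
        self._default = name

    def complete(self, prompt: str, backend: Optional[str] = None) -> str:
        return self._backends[backend or self._default].complete(prompt)


gateway = Gateway({"vendor_a": EchoBackend("vendor_a"),
                   "vendor_b": EchoBackend("vendor_b")}, default="vendor_a")
print(gateway.complete("Summarize this ticket"))
gateway.set_default("vendor_b")  # switching vendors is a one-line config change
print(gateway.complete("Summarize this ticket"))
```

The interface does not remove switching costs, since prompts and evaluations still need re-tuning per model, but it confines the code change to one adapter.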
Ongoing prompt tuning
Production agents need prompt tuning as source data and user patterns evolve. Budget 10-20 hours/month of prompt engineering.
Retrieval index rebuilds
As underlying data changes, retrieval indexes need periodic rebuilds. Budget engineering effort and potential compute cost for this.
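One common way to limit rebuild cost is to re-embed only documents whose content has changed. The following is a sketch of that idea under simple assumptions: documents fit in memory as strings, and the index stores a content hash per document ID. The function names and data shapes are hypothetical.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_rebuild(documents: dict[str, str], indexed_hashes: dict[str, str]):
    """Compare current documents with what the index already holds.

    Returns (to_embed, to_delete): IDs that are new or changed, and IDs that
    are in the index but no longer exist in the source data.
    """
    to_embed = [doc_id for doc_id, text in documents.items()
                if indexed_hashes.get(doc_id) != content_hash(text)]
    to_delete = [doc_id for doc_id in indexed_hashes if doc_id not in documents]
    return to_embed, to_delete


# Hypothetical state: "faq" is unchanged, "pricing" was edited, "old" was removed.
index_state = {
    "faq": content_hash("How do I reset my password?"),
    "pricing": content_hash("Plans start at $10."),
    "old": content_hash("Deprecated page."),
}
current_docs = {
    "faq": "How do I reset my password?",
    "pricing": "Plans start at $12.",
    "new": "Release notes.",
}
to_embed, to_delete = plan_rebuild(current_docs, index_state)
print("re-embed:", to_embed)
print("delete:", to_delete)
```

Only the changed and new documents incur embedding cost. A full rebuild is still needed when the embedding model or chunking strategy changes.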
Scaling the evaluation harness
As scope grows, evaluation needs more golden cases. SME time to curate evaluation cases is often overlooked in budgets.
Compliance and audit requirements
For regulated industries, compliance costs (SOC 2 Type II, HIPAA, audit prep) can add $20-80K/year on top of direct AI costs.
Cost optimization strategies
- Route by query complexity — flagship models only for hard queries. Typical 60-80% cost reduction.
- Cache aggressively — semantic caching for common queries. 20-40% cost reduction for high-overlap workloads.
- Fine-tune smaller models — for very high-volume specialized tasks, fine-tuning a smaller open model can be 10x cheaper than a flagship API, but it requires upfront engineering investment.
- Prompt compression — trim prompt templates to essentials. Sometimes 30-50% token savings.
- Embed once, retrieve many — don't re-embed documents on every query.
- Observability sampling — at high volume, sample detailed traces rather than logging everything.
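As an illustration of the caching strategy, here is a minimal response cache. Note that it is an exact-match cache on normalized text, a simpler stand-in for the semantic caching mentioned above, which would compare embeddings instead. `ResponseCache`, `normalize`, and `fake_model` are hypothetical names, and the model call is a stub.

```python
import hashlib
from typing import Callable


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so trivial variants share a key."""
    return " ".join(query.lower().split())


class ResponseCache:
    """Exact-match cache in front of a model call (not a semantic cache)."""

    def __init__(self, call_model: Callable[[str], str]) -> None:
        self._call_model = call_model
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def answer(self, query: str) -> str:
        key = hashlib.sha256(normalize(query).encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = self._call_model(query)  # the only place tokens are spent
        self._store[key] = response
        return response


def fake_model(query: str) -> str:
    """Stub standing in for a paid model API call."""
    return f"answer to: {normalize(query)}"


cache = ResponseCache(fake_model)
cache.answer("How do I reset my password?")
cache.answer("how do I   reset my password?")  # same key after normalization
cache.answer("Where is my invoice?")
print(f"hits={cache.hits}, misses={cache.misses}")
```

Here two of three queries reach the model. A production cache would also need expiry and invalidation when source data changes, which this sketch omits.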
Conclusion
Model inference is rarely the dominant cost of enterprise AI. Engineering, infrastructure, and operations together usually cost 2-4x as much as inference. Budget holistically, and plan for ongoing operations at 30-50% of the initial build cost per year.
If you're modeling ROI for a specific AI engagement and want honest input, talk to us.