The Real Cost of Enterprise AI

Beyond the model subscription

When leaders budget for AI, they often focus on model API costs: "we'll spend $X/month on Claude or GPT." That framing dramatically understates total cost. For production enterprise AI, model inference is typically 20-30% of total ongoing cost; engineering and operations are the bigger line items.

This post breaks down the real cost of enterprise AI based on our experience deploying ten production systems.

The cost categories

1. Model inference (20-30% of ongoing cost)

Per-token pricing from major providers as of Q1 2026:

Model	Input $/M tokens	Output $/M tokens
Claude Opus 4	$15	$75
Claude Sonnet 4.6	$3	$15
Claude Haiku 4.5	$0.80	$4
GPT-4o	$3	$12
GPT-4o-mini	$0.15	$0.60
Gemini 2.5 Pro	$1.25	$10
Open models (self-hosted)	$0.20-0.50	$0.20-0.50 (compute)

For an agent processing 10,000 queries/day (about 300,000 per 30-day month) at roughly 2,000 input tokens and 200 output tokens per query:

Claude Opus 4: ~$13,500/month
Claude Sonnet 4.6: ~$2,700/month
Claude Haiku 4.5: ~$720/month
GPT-4o-mini: ~$126/month
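The arithmetic behind monthly figures like these can be sketched in a few lines of Python. The prices are copied from the table above. The 30-day month and the split of roughly 2,000 input and 200 output tokens per query are assumptions, chosen because they reproduce the Claude figures. `monthly_inference_cost` is an illustrative helper, not part of any SDK.

```python
# Prices in USD per million tokens (input, output), copied from the table above.
PRICES = {
    "Claude Opus 4": (15.00, 75.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (0.80, 4.00),
    "GPT-4o-mini": (0.15, 0.60),
}


def monthly_inference_cost(model, queries_per_day, input_tokens, output_tokens, days=30):
    """Estimate monthly spend in USD for one model at a fixed per-query token profile."""
    input_price, output_price = PRICES[model]
    queries = queries_per_day * days
    input_millions = queries * input_tokens / 1_000_000
    output_millions = queries * output_tokens / 1_000_000
    return input_millions * input_price + output_millions * output_price


if __name__ == "__main__":
    # Assumed workload: 10,000 queries/day, 2,000 tokens in, 200 tokens out.
    for model in PRICES:
        cost = monthly_inference_cost(model, 10_000, 2_000, 200)
        print(f"{model}: ${cost:,.0f}/month")
```

Because output tokens cost four to five times as much as input tokens at these prices, the input/output split matters as much as the total token count; it is worth measuring on real traffic before trusting any estimate.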

Model choice alone swings inference cost by up to two orders of magnitude. For high-volume agents, route by query complexity: flagship models for hard queries, smaller models for easy ones. Typical savings: 60-80% versus using a flagship model for everything.
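As a rough illustration of routing, the sketch below sends queries that a heuristic flags as hard to a flagship model and everything else to a smaller one, then estimates the blended bill. The `is_hard` keyword heuristic and the hard-query share you pass in are assumptions for illustration, and the two monthly figures are taken from the list above. A production router would more likely use a trained classifier or a confidence signal.

```python
FLAGSHIP_MONTHLY = 13_500  # USD/month if every query went to Claude Opus 4 (figure from above)
SMALL_MONTHLY = 720        # USD/month if every query went to Claude Haiku 4.5 (figure from above)

# Hypothetical keyword heuristic; it stands in for a real complexity classifier.
HARD_MARKERS = ("why", "compare", "explain", "troubleshoot")


def is_hard(query: str) -> bool:
    """Flag long queries, or ones containing reasoning-style keywords, as hard."""
    words = [word.strip(".,?!") for word in query.lower().split()]
    return len(words) > 40 or any(marker in words for marker in HARD_MARKERS)


def choose_model(query: str) -> str:
    """Route hard queries to the flagship tier and the rest to the small tier."""
    return "flagship" if is_hard(query) else "small"


def blended_monthly_cost(hard_share: float) -> float:
    """Monthly cost when `hard_share` of traffic (0.0 to 1.0) goes to the flagship."""
    return hard_share * FLAGSHIP_MONTHLY + (1 - hard_share) * SMALL_MONTHLY


def savings_vs_flagship(hard_share: float) -> float:
    """Fractional saving relative to sending every query to the flagship."""
    return 1 - blended_monthly_cost(hard_share) / FLAGSHIP_MONTHLY
```

If 20% of traffic is hard, the blended cost is about $3,276/month, roughly 76% below the all-flagship figure and inside the 60-80% range quoted above. The saving shrinks quickly as the hard share grows, so the quality of the classifier drives the economics.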

2. Engineering (60%+ of total investment)

Initial build engineering cost for production agents:

Simple agent (narrow scope, shallow integration): $40-80K
Standard agent (production-grade with observability, evaluation, guardrails): $80-150K
Complex agent (deep integration, custom workflows, multi-modal): $150-400K

Ongoing engineering:

Monitoring and maintenance: 20-30 hours/month
Tuning and improvement: 20-40 hours/month
New feature and scope expansion: variable

Budget 30-50% of initial build cost annually for ongoing engineering.

3. Infrastructure (10-15% of ongoing cost)

Retrieval layer (vector database, embedding generation): $500-3,000/month
Observability (Langfuse, Helicone, or equivalent): $300-2,000/month
Compute (API gateways, workers, queues): $500-2,500/month
Storage (logs, archives, data lakes): $200-1,000/month

Total: $1,500-8,500/month for production-grade infrastructure.

4. Observability and evaluation (5-10% of ongoing cost)

Dedicated tooling for continuous evaluation:

Evaluation harness (custom or tools like Promptfoo, DeepEval): $200-1,500/month
Golden set maintenance: 10-20 hours/month of SME time
Adversarial testing infrastructure: $100-500/month

5. Operations and incident response (varies)

When something breaks (and it will):

Incident response: 2-5 hours per incident, $200-500/hour blended rate
Root cause analysis: 4-10 hours per incident
Post-mortem and process improvement: 2-4 hours per incident

Budget for 5-10 incidents per year for a mature production agent.
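To turn these ranges into a budget line, multiply them through. The sketch below uses only the figures in this section. It assumes the $200-500/hour blended rate applies to all three activities, and `annual_incident_cost` is an illustrative helper name. Pairing every low end with every other low end (and likewise for the highs) gives a deliberately wide envelope rather than a forecast.

```python
# (low, high) hours per incident, from the ranges above.
HOURS_PER_INCIDENT = {
    "incident_response": (2, 5),
    "root_cause_analysis": (4, 10),
    "post_mortem": (2, 4),
}
BLENDED_RATE = (200, 500)     # USD/hour; assumed to apply to all three activities
INCIDENTS_PER_YEAR = (5, 10)


def annual_incident_cost():
    """Return the (low, high) annual incident cost envelope in USD."""
    low_hours = sum(low for low, _ in HOURS_PER_INCIDENT.values())
    high_hours = sum(high for _, high in HOURS_PER_INCIDENT.values())
    low = low_hours * BLENDED_RATE[0] * INCIDENTS_PER_YEAR[0]
    high = high_hours * BLENDED_RATE[1] * INCIDENTS_PER_YEAR[1]
    return low, high
```

Under these assumptions each incident consumes 8 to 19 hours, and the annual envelope runs from about $8,000 to $95,000.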

Real-world total cost example

Tier-1 customer support agent at 100K monthly tickets

Initial build (Month 0-6):

Engineering: $180K
Infrastructure setup: $15K
Model inference during dev/test: $8K
Evaluation harness: $20K
Total one-time: $223K

Ongoing (Year 1 from Month 6):

Model inference (routing Haiku/Sonnet by complexity): $2,400/month = $28.8K/year
Infrastructure: $4,500/month = $54K/year
Observability: $800/month = $9.6K/year
Engineering (tuning, incidents): $9,000/month = $108K/year
Total ongoing: ~$200K/year

Economic comparison against human-only:

Human agent cost (fully loaded): $75K/year per agent
AI agent resolving 58% of 100K tickets/month = 58K tickets resolved autonomously
Equivalent human capacity: 6-8 human agents
Human cost avoided: $450-600K/year
Net benefit: ~$250-400K/year after agent operating cost

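Payback period: roughly 7-11 months of production operation on the $223K build ($223K divided by a net benefit of $250-400K/year). The ROI is compelling once the agent is running at scale.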
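The payback estimate can be checked directly from the figures in this example. The sketch below is only that arithmetic. `net_annual_benefit` and `payback_months` are illustrative helpers, and the calculation assumes savings accrue evenly from the first month in production.

```python
ONE_TIME_BUILD = 223_000                 # USD, total one-time cost from the example
ONGOING_PER_YEAR = 200_000               # USD, total ongoing cost from the example
HUMAN_COST_AVOIDED = (450_000, 600_000)  # USD/year, low and high estimates


def net_annual_benefit(cost_avoided: float) -> float:
    """Human cost avoided minus the agent's annual operating cost."""
    return cost_avoided - ONGOING_PER_YEAR


def payback_months(cost_avoided: float) -> float:
    """Months in production before cumulative net benefit covers the build cost."""
    benefit = net_annual_benefit(cost_avoided)
    if benefit <= 0:
        raise ValueError("the agent never pays back at this level of cost avoided")
    return ONE_TIME_BUILD / benefit * 12


if __name__ == "__main__":
    for avoided in HUMAN_COST_AVOIDED:
        print(f"${avoided:,} avoided: payback in {payback_months(avoided):.1f} months")
```

The result is sensitive to the resolution rate: if cost avoided falls toward the $200K operating cost, the payback period grows without bound, which is why the function raises an error rather than returning a number in that case.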

Hidden costs to plan for

Model vendor lock-in risk

If you build exclusively against the Claude SDK, a significant Anthropic pricing change materially affects your economics. Model-agnostic infrastructure (routing requests through a gateway) reduces this risk.
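One way to picture model-agnostic infrastructure is a thin interface that application code calls, with provider-specific adapters behind it. The sketch below is hypothetical throughout: `ChatProvider`, `EchoProvider`, and `Gateway` are illustrative names, and the stub provider returns canned text rather than calling any vendor SDK.

```python
from typing import Dict, Protocol


class ChatProvider(Protocol):
    """Minimal interface every provider adapter is assumed to implement."""

    def complete(self, prompt: str) -> str:
        ...


class EchoProvider:
    """Stand-in adapter; a real one would wrap a vendor SDK call here."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class Gateway:
    """Application code depends on this class, never on a vendor SDK directly."""

    def __init__(self, providers: Dict[str, ChatProvider], default: str) -> None:
        if default not in providers:
            raise KeyError(f"unknown provider: {default}")
        self._providers = providers
        self._active = default

    def use(self, name: str) -> None:
        """Switch the active provider, for example after a pricing change."""
        if name not in self._providers:
            raise KeyError(f"unknown provider: {name}")
        self._active = name

    def complete(self, prompt: str) -> str:
        return self._providers[self._active].complete(prompt)
```

With this shape, switching vendors is a one-line call to `use` plus a new adapter, although prompts and evaluation sets usually still need re-tuning for the new model.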

Ongoing prompt tuning

Production agents need prompt tuning as source data and user patterns evolve. Budget 10-20 hours/month of prompt engineering.

Retrieval index rebuilds

As underlying data changes, retrieval indexes need periodic rebuilds. Budget engineering effort and potential compute cost for this.

Scaling the evaluation harness

As scope grows, evaluation needs more golden cases. SME time to curate evaluation cases is often overlooked in budgets.

Compliance and audit requirements

For regulated industries, compliance costs (SOC 2 Type II, HIPAA, audit prep) can add $20-80K/year on top of direct AI costs.

Cost optimization strategies

Route by query complexity: flagship models only for hard queries. Typical 60-80% cost reduction.
Cache aggressively: semantic caching for common queries. 20-40% cost reduction for high-overlap workloads.
Fine-tune smaller models: for very high-volume specialized tasks, fine-tuning a smaller open model can be 10x cheaper than a flagship API. Engineering investment required.
Prompt compression: trim prompt templates to essentials. Sometimes 30-50% token savings.
Embed once, retrieve many: don't re-embed documents on every query.
Observability sampling: at high volume, sample detailed traces rather than logging everything.
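To make the caching item concrete, here is a toy semantic cache. It is a sketch under loud assumptions: real systems compare embedding vectors, whereas this version uses word-overlap (Jaccard) similarity as a stand-in so that it runs with no dependencies. The `SemanticCache` class and its 0.8 threshold are illustrative choices.

```python
import string
from typing import FrozenSet, List, Optional, Tuple


def _tokens(text: str) -> FrozenSet[str]:
    """Lowercase, strip punctuation, and split into a set of words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return frozenset(cleaned.split())


def _similarity(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity; a placeholder for cosine similarity over embeddings."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


class SemanticCache:
    """Return a stored answer when a new query is close enough to an old one."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self._entries: List[Tuple[FrozenSet[str], str]] = []
        self.hits = 0
        self.misses = 0

    def put(self, query: str, answer: str) -> None:
        self._entries.append((_tokens(query), answer))

    def get(self, query: str) -> Optional[str]:
        """Return the best match at or above the threshold, else None."""
        query_tokens = _tokens(query)
        best_score, best_answer = 0.0, None
        for tokens, answer in self._entries:
            score = _similarity(query_tokens, tokens)
            if score > best_score:
                best_score, best_answer = score, answer
        if best_answer is not None and best_score >= self.threshold:
            self.hits += 1
            return best_answer
        self.misses += 1
        return None
```

On a miss, the caller would invoke the model and `put` the result. The hit rate (`hits / (hits + misses)`) approximates the share of inference spend the cache avoids. A threshold set too low serves wrong answers, so cached responses deserve a place in the evaluation set.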

Conclusion

Enterprise AI costs what it costs, but model inference is not the dominant factor. Engineering, infrastructure, and operations usually outweigh model costs 2-4x. Budget holistically and plan for ongoing operations at 30-50% of initial build annually.

If you're modeling ROI for a specific AI engagement and want honest input, talk to us.