The real cost of enterprise AI — models, infrastructure, operations

What enterprise AI actually costs at scale. Model inference, infrastructure, engineering, ongoing operations — with specific numbers from production deployments.

Beyond the model subscription

When leaders budget for AI, they often focus on the model API costs — "we'll spend $X/month on Claude or GPT." That framing dramatically understates total cost. For production enterprise AI, model inference is typically 20-30% of total ongoing cost; engineering and operations are the bigger line items.

This post breaks down the real cost of enterprise AI based on our experience deploying ten production systems.

The cost categories

1. Model inference (20-30% of ongoing cost)

List prices per million tokens from major providers as of Q1 2026:

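| Model                     | Input ($/M tokens)   | Output ($/M tokens)  |
|---------------------------|----------------------|----------------------|
| Claude Opus 4             | $15                  | $75                  |
| Claude Sonnet 4.6         | $3                   | $15                  |
| Claude Haiku 4.5          | $0.80                | $4                   |
| GPT-4o                    | $3                   | $12                  |
| GPT-4o-mini               | $0.15                | $0.60                |
| Gemini 2.5 Pro            | $1.25                | $10                  |
| Open models (self-hosted) | $0.20-0.50 (compute) | $0.20-0.50 (compute) |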

For an agent processing 10,000 queries/day with 2,000 input and 2,000 output tokens per query (30-day month, list prices):

  • Claude Opus 4: ~$54,000/month
  • Claude Sonnet 4.6: ~$10,800/month
  • Claude Haiku 4.5: ~$2,880/month
  • GPT-4o-mini: ~$450/month
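
Monthly inference cost is simple arithmetic on the per-million-token prices in the table. A minimal Python sketch, assuming a 30-day month and list prices with no prompt caching or batch discounts, lets you rerun the estimate with your own volumes and token counts:

```python
# Minimal monthly inference-cost calculator.
# Assumptions: USD list prices per million tokens from the pricing table,
# a 30-day month, and no prompt-caching or batch discounts.

PRICES_PER_M = {  # model: (input $/M tokens, output $/M tokens)
    "Claude Opus 4": (15.00, 75.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (0.80, 4.00),
    "GPT-4o-mini": (0.15, 0.60),
}


def monthly_cost(model: str, queries_per_day: int, tokens_in: int,
                 tokens_out: int, days: int = 30) -> float:
    """Return the estimated monthly spend in USD for one model."""
    price_in, price_out = PRICES_PER_M[model]
    queries = queries_per_day * days
    input_cost = queries * tokens_in / 1_000_000 * price_in
    output_cost = queries * tokens_out / 1_000_000 * price_out
    return input_cost + output_cost


if __name__ == "__main__":
    for name in PRICES_PER_M:
        cost = monthly_cost(name, queries_per_day=10_000,
                            tokens_in=2_000, tokens_out=2_000)
        print(f"{name}: ${cost:,.0f}/month")
```

With 10,000 queries/day at 2,000 input and 2,000 output tokens, Opus 4 comes to $54,000/month and GPT-4o-mini to $450/month. For this workload output tokens dominate the Claude bills, because output is priced at 5x input.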

Model choice alone moves the monthly bill by two orders of magnitude. For high-volume agents, route by query complexity: flagship models for hard queries, smaller models for easy ones. Typical savings are 60-80% compared with sending everything to the flagship.
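
A minimal sketch of complexity-based routing. The keyword heuristic, threshold, and placeholder model names (`small-model`, `flagship-model`) are hypothetical; production routers more often score difficulty with a trained classifier or a cheap model call. The `blended_cost_ratio` helper shows where the savings come from:

```python
# Complexity-based routing sketch. The heuristic, threshold, and model
# names are hypothetical placeholders, not a recommended production setup.

SMALL_MODEL = "small-model"        # e.g. a Haiku-class model
FLAGSHIP_MODEL = "flagship-model"  # e.g. an Opus-class model

# Hypothetical phrases that suggest a query needs the stronger model.
HARD_SIGNALS = ("refund dispute", "legal", "contract", "escalate")


def complexity_score(query: str) -> float:
    """Crude 0..1 difficulty score from query length and keyword signals."""
    text = query.lower()
    length_part = min(len(text.split()) / 200, 0.5)  # long queries score higher
    signal_part = 0.5 if any(s in text for s in HARD_SIGNALS) else 0.0
    return length_part + signal_part


def route(query: str, threshold: float = 0.5) -> str:
    """Send hard queries to the flagship model and everything else to the small one."""
    return FLAGSHIP_MODEL if complexity_score(query) >= threshold else SMALL_MODEL


def blended_cost_ratio(flagship_share: float, small_price_ratio: float) -> float:
    """Cost of the routed mix relative to sending every query to the flagship.

    small_price_ratio is the small model's per-query cost divided by the flagship's.
    """
    return flagship_share + (1 - flagship_share) * small_price_ratio


if __name__ == "__main__":
    print(route("How do I reset my password?"))          # small-model
    print(route("Please escalate this refund dispute"))  # flagship-model
    print(f"{1 - blended_cost_ratio(0.2, 0.053):.0%} savings")
```

For example, if 20% of traffic goes to the flagship and the small model costs about 5% as much per query (roughly Haiku 4.5 versus Opus 4 at the list prices above), `blended_cost_ratio(0.2, 0.053)` is about 0.24, i.e. roughly 76% cheaper than flagship-only.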

2. Engineering (60%+ of total investment)

Initial build engineering cost for production agents:

  • Simple agent (narrow scope, shallow integration): $40-80K
  • Standard agent (production-grade with observability, evaluation, guardrails): $80-150K
  • Complex agent (deep integration, custom workflows, multi-modal): $150-400K

Ongoing engineering:

  • Monitoring and maintenance: 20-30 hours/month
  • Tuning and improvement: 20-40 hours/month
  • New features and scope expansion: variable

Budget 30-50% of initial build cost annually for ongoing engineering.
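
Applied to the build tiers above, the 30-50% rule gives an annual ongoing-engineering range per tier. A small sketch (the tier names are just labels for the three bullets):

```python
# Annual ongoing-engineering budget from the 30-50% rule of thumb,
# applied to the initial build-cost ranges listed in this section.

BUILD_COST_RANGES = {  # tier: (low $, high $)
    "simple": (40_000, 80_000),
    "standard": (80_000, 150_000),
    "complex": (150_000, 400_000),
}


def ongoing_engineering_budget(build_low: int, build_high: int,
                               min_rate: float = 0.30,
                               max_rate: float = 0.50) -> tuple[float, float]:
    """Return the (low, high) annual ongoing-engineering budget in USD."""
    return build_low * min_rate, build_high * max_rate


if __name__ == "__main__":
    for tier, (build_low, build_high) in BUILD_COST_RANGES.items():
        low_budget, high_budget = ongoing_engineering_budget(build_low, build_high)
        print(f"{tier}: ${low_budget:,.0f}-${high_budget:,.0f} per year")
```

The standard tier, for instance, works out to $24,000-$75,000 per year.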

3. Infrastructure (10-15% of ongoing cost)

  • Retrieval layer (vector database, embedding generation): $500-3,000/month
  • Observability (Langfuse, Helicone, or equivalent): $300-2,000/month
  • Compute (API gateways, workers, queues): $500-2,500/month
  • Storage (logs, archives, data lakes): $200-1,000/month

Total: $1,500-8,500/month for production-grade infrastructure.
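
Annualizing the infrastructure range makes it easier to compare with build and engineering costs. A short sketch that sums the category ranges listed above:

```python
# Sum per-category monthly infrastructure ranges into a total range.

INFRA_MONTHLY = {  # category: (low $/month, high $/month)
    "retrieval": (500, 3_000),
    "observability": (300, 2_000),
    "compute": (500, 2_500),
    "storage": (200, 1_000),
}


def total_range(items: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Add up the low ends and the high ends of every category independently."""
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    return low, high


if __name__ == "__main__":
    low, high = total_range(INFRA_MONTHLY)
    print(f"${low:,}-${high:,}/month, ${low * 12:,}-${high * 12:,}/year")
```

This prints $1,500-$8,500/month, or $18,000-$102,000/year.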

4. Observability and evaluation (5-10% of ongoing cost)

Dedicated tooling for continuous evaluation:

  • Evaluation harness (custom or tools like Promptfoo, DeepEval): $200-1,500/month
  • Golden set maintenance: 10-20 hours/month of SME time
  • Adversarial testing infrastructure: $100-500/month
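
A golden-set harness can start very small. The sketch below, with a hypothetical stub agent and two illustrative cases, checks that each answer contains the facts an SME marked as required and flags a regression when the pass rate drops below a threshold. Substring matching is a deliberately crude stand-in for the richer assertions that tools like Promptfoo or DeepEval provide; it is not their API:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldenCase:
    """One SME-curated case: a query plus facts a correct answer must contain."""
    query: str
    must_include: list[str]


def run_golden_set(agent: Callable[[str], str], cases: list[GoldenCase],
                   min_pass_rate: float = 0.9) -> tuple[float, list[str]]:
    """Run every case through the agent and return (pass rate, failed queries)."""
    failures = []
    for case in cases:
        answer = agent(case.query).lower()
        # Simple grading rule: every required fact must appear verbatim.
        if not all(fact.lower() in answer for fact in case.must_include):
            failures.append(case.query)
    pass_rate = 1 - len(failures) / len(cases)
    if pass_rate < min_pass_rate:
        print(f"Regression: pass rate {pass_rate:.0%} is below {min_pass_rate:.0%}")
    return pass_rate, failures


# Hypothetical stub agent and golden set, for illustration only.
def stub_agent(query: str) -> str:
    return "Refunds are processed within 5 business days."


GOLDEN_CASES = [
    GoldenCase("How long do refunds take?", ["5 business days"]),
    GoldenCase("What is your support phone number?", ["1-800"]),
]

if __name__ == "__main__":
    print(run_golden_set(stub_agent, GOLDEN_CASES))
```

Here the stub passes one of two cases (pass rate 0.5) and reports the phone-number query as failed. Most of the ongoing effort goes into curating the `must_include` facts, which is why SME hours appear as a budget line.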

5. Operations and incident response (varies)

When something breaks (and it will):

  • Incident response: 2-5 hours per incident, $200-500/hour blended rate
  • Root cause analysis: 4-10 hours per incident
  • Post-mortem and process improvement: 2-4 hours per incident

Budget 5-10 incidents per year for a mature production agent.
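
Multiplying those ranges out gives an annual incident budget. This sketch assumes the $200-500/hour blended rate applies to all three phases, not just incident response:

```python
# Annual incident cost = hours per incident x blended rate x incidents per year.
# Assumption: the blended rate applies to every phase listed above.

HOURS_PER_INCIDENT = {  # phase: (low hours, high hours)
    "incident response": (2, 5),
    "root cause analysis": (4, 10),
    "post-mortem": (2, 4),
}
RATE_RANGE = (200, 500)        # blended $/hour
INCIDENTS_PER_YEAR = (5, 10)   # mature production agent


def annual_incident_cost() -> tuple[int, int]:
    """Return the (low, high) annual incident cost in USD."""
    low_hours = sum(lo for lo, _ in HOURS_PER_INCIDENT.values())
    high_hours = sum(hi for _, hi in HOURS_PER_INCIDENT.values())
    low = low_hours * RATE_RANGE[0] * INCIDENTS_PER_YEAR[0]
    high = high_hours * RATE_RANGE[1] * INCIDENTS_PER_YEAR[1]
    return low, high


if __name__ == "__main__":
    low, high = annual_incident_cost()
    print(f"${low:,}-${high:,} per year")
```

That yields roughly $8,000 (5 incidents × 8 hours × $200) to $95,000 (10 incidents × 19 hours × $500) per year, a wide range driven mostly by incident count and team rate.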

Real-world total cost example

Tier-1 customer support agent at 100K monthly tickets

Initial build (Month 0-6):

  • Engineering: $180K
  • Infrastructure setup: $15K
  • Model inference during dev/test: $8K
  • Evaluation harness: $20K
  • Total one-time: $223K

Ongoing (first year after go-live at Month 6):

  • Model inference (routing Haiku/Sonnet by complexity): $2,400/month = $28.8K/year
  • Infrastructure: $4,500/month = $54K/year
  • Observability: $800/month = $9.6K/year
  • Engineering (tuning, incidents): $9,000/month = $108K/year
  • Total ongoing: ~$200K/year

Economic comparison against a human-only team:

  • Human agent cost (fully loaded): $75K/year per agent
  • The AI agent resolves 58% of the 100K monthly tickets autonomously: 58K tickets/month
  • Equivalent human capacity: 6-8 human agents
  • Human cost avoided: $450-600K/year
  • Net benefit: ~$250-400K/year after agent operating cost

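Payback period: roughly 7-11 months of operation ($223K one-time cost divided by $250-400K/year net benefit), so the build pays for itself within the first year after go-live. Compelling ROI once at scale.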
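
The example's arithmetic in one place, as a sketch that reuses the figures above. It assumes net benefit accrues evenly from go-live, with no ramp-up period and no discounting:

```python
# Total-cost and payback sketch for the tier-1 support example.
# Assumptions: benefit accrues evenly from go-live; no ramp-up, no discounting.

ONE_TIME = {  # build phase, USD
    "engineering": 180_000,
    "infrastructure setup": 15_000,
    "dev/test inference": 8_000,
    "evaluation harness": 20_000,
}
ONGOING_MONTHLY = {  # run-rate after go-live, USD/month
    "inference": 2_400,
    "infrastructure": 4_500,
    "observability": 800,
    "engineering": 9_000,
}
HUMAN_COST_PER_AGENT = 75_000  # fully loaded, USD/year
HUMANS_AVOIDED = (6, 8)        # equivalent human capacity range


def annual_ongoing() -> int:
    """Annual operating cost of the AI agent in USD."""
    return sum(ONGOING_MONTHLY.values()) * 12


def net_benefit_range() -> tuple[int, int]:
    """Human cost avoided minus the agent's annual operating cost (low, high)."""
    ongoing = annual_ongoing()
    low = HUMANS_AVOIDED[0] * HUMAN_COST_PER_AGENT - ongoing
    high = HUMANS_AVOIDED[1] * HUMAN_COST_PER_AGENT - ongoing
    return low, high


def payback_months(one_time: int, annual_net_benefit: int) -> float:
    """Months of operation needed to recover the one-time build cost."""
    return one_time / annual_net_benefit * 12


if __name__ == "__main__":
    build = sum(ONE_TIME.values())
    low, high = net_benefit_range()
    print(f"One-time: ${build:,}; ongoing: ${annual_ongoing():,}/year")
    print(f"Payback: {payback_months(build, high):.1f}-"
          f"{payback_months(build, low):.1f} months after go-live")
```

Payback comes out at about 6.7 months at the high end of the benefit range and about 10.7 months at the low end.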

Hidden costs to plan for

Model vendor lock-in risk

If you build exclusively against Anthropic's Claude SDK, a significant Anthropic pricing change materially affects your economics. Model-agnostic infrastructure (routing calls through a gateway layer) reduces this risk.
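
One way to keep vendor choice a configuration detail is a thin gateway interface that application code calls instead of a vendor SDK. In this sketch `ChatBackend`, `EchoBackend`, and `Gateway` are hypothetical names; a real backend would wrap a vendor client inside `complete`:

```python
from typing import Protocol


class ChatBackend(Protocol):
    """Anything that can turn a prompt into a completion."""

    def complete(self, prompt: str) -> str: ...


class EchoBackend:
    """Hypothetical stand-in for a vendor client (hosted API or self-hosted model)."""

    def __init__(self, name: str):
        self.name = name

    def complete(self, prompt: str) -> str:
        # A real backend would call the vendor SDK here.
        return f"[{self.name}] {prompt}"


class Gateway:
    """Maps each logical task to a backend, so switching vendors is a config change."""

    def __init__(self, routes: dict[str, ChatBackend]):
        self.routes = routes

    def complete(self, task: str, prompt: str) -> str:
        return self.routes[task].complete(prompt)


if __name__ == "__main__":
    gateway = Gateway({"triage": EchoBackend("vendor-a")})
    print(gateway.complete("triage", "Where is my order?"))
    gateway.routes["triage"] = EchoBackend("vendor-b")  # swap vendors
    print(gateway.complete("triage", "Where is my order?"))
```

Swapping vendors for a task is then a one-line change to the routes table, and the same seam is a natural place for complexity-based routing and per-call cost logging.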

Ongoing prompt tuning

Production agents need prompt tuning as source data and user patterns evolve. Budget 10-20 hours/month of prompt engineering.

Retrieval index rebuilds

As underlying data changes, retrieval indexes need periodic rebuilds. Budget engineering effort and potential compute cost for this.
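
Full rebuilds are rarely needed for content changes alone. A common approach is to store a content hash per document and re-embed only what changed, which also keeps "embed once, retrieve many" true as data evolves. A sketch with hypothetical function names:

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(docs: dict[str, str],
                 indexed_hashes: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (doc ids to embed or re-embed, doc ids to delete from the index).

    docs maps doc id -> current text; indexed_hashes maps doc id -> hash
    recorded when that document was last embedded.
    """
    to_embed = [doc_id for doc_id, text in docs.items()
                if indexed_hashes.get(doc_id) != content_hash(text)]
    to_delete = [doc_id for doc_id in indexed_hashes if doc_id not in docs]
    return to_embed, to_delete


if __name__ == "__main__":
    indexed = {"a": content_hash("old A"), "b": content_hash("B"), "c": content_hash("C")}
    current = {"a": "new A", "b": "B", "d": "D"}
    print(plan_reindex(current, indexed))
```

Here only `a` (changed) and `d` (new) are re-embedded and `c` is dropped. Changing the embedding model is the exception: vectors from different models are not comparable, so that change requires re-embedding the whole corpus.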

Scaling the evaluation harness

As scope grows, evaluation needs more golden cases. SME time to curate evaluation cases is often overlooked in budgets.

Compliance and audit requirements

For regulated industries, compliance costs (SOC 2 Type II, HIPAA, audit prep) can add $20-80K/year on top of direct AI costs.

Cost optimization strategies

  1. Route by query complexity — flagship models only for hard queries. Typical 60-80% cost reduction.
  2. Cache aggressively — semantic caching for common queries. 20-40% cost reduction for high-overlap workloads.
  3. Fine-tune smaller models — for very high-volume specialized tasks, fine-tuning a smaller open model can be 10x cheaper than calling a flagship API. Engineering investment required.
  4. Prompt compression — trim prompt templates to essentials. Sometimes 30-50% token savings.
  5. Embed once, retrieve many — don't re-embed documents on every query.
  6. Observability sampling — at high volume, sample detailed traces rather than logging everything.
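
To make semantic caching concrete: the cache stores the embedding of each answered query and returns the stored answer when a new query is similar enough. In this sketch `toy_embed` is a bag-of-words stand-in for a real embedding model, the linear scan stands in for a vector index, and the 0.9 threshold is a hypothetical value you would tune on real traffic, since a loose threshold returns confidently wrong answers:

```python
import math
from typing import Callable, Optional

# Hypothetical tiny vocabulary for the toy embedding below.
VOCAB = ["how", "do", "i", "reset", "my", "password", "refund", "order", "status"]


def toy_embed(text: str) -> list[float]:
    """Bag-of-words counts over VOCAB; stand-in for a real embedding model."""
    words = text.lower().replace("?", "").split()
    return [float(words.count(w)) for w in VOCAB]


class SemanticCache:
    """Return a cached answer when a new query is close enough to a past one."""

    def __init__(self, embed: Callable[[str], list[float]], threshold: float = 0.9):
        self.embed = embed
        self.threshold = threshold
        self.entries: list[tuple[list[float], str]] = []

    @staticmethod
    def _cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def get(self, query: str) -> Optional[str]:
        """Best-matching cached answer above the threshold, else None (cache miss)."""
        vec = self.embed(query)
        best_score, best_answer = 0.0, None
        for cached_vec, answer in self.entries:
            score = self._cosine(vec, cached_vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, query: str, answer: str) -> None:
        self.entries.append((self.embed(query), answer))


if __name__ == "__main__":
    cache = SemanticCache(toy_embed)
    cache.put("How do I reset my password?", "Use the Forgot password link.")
    print(cache.get("how do i reset my password"))
    print(cache.get("What is my order status?"))
```

The rephrased password question hits the cache, while the order-status question misses and would go to the model as usual.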

Conclusion

Enterprise AI costs what it costs — but model inference is not the dominant factor. Engineering, infrastructure, and operations usually outweigh model costs 2-4x. Budget holistically and plan for ongoing operations at 30-50% of initial build annually.

If you're modeling ROI for a specific AI engagement and want honest input, talk to us.


NETLINKS AI Team

NETLINKS is a US-headquartered enterprise technology partner — Odoo ERP, custom software, agentic AI, IT staff augmentation, and cloud managed services. Writing grounded in 50+ Odoo implementations, certified Odoo partner since 2012, and enterprise delivery since 2005.
