Inference Economics for OpenAI and Anthropic: Full Cost & Income Breakdown (Free Tier + API Business)

A detailed serving-cost model, followed by P&L estimates for paid and free chat usage and for the API business (users, traffic, and per‑million‑token margins).

$ per 1M tokens: $4.22–$11.32, from the optimized floor to the baseline ceiling (H100; DeepSeek‑V3.1‑class model). Sources: NVIDIA, NextPlatform.
H100 throughput (tokens/s): 45 → 90, from the fp16 bound to fp8 + FlashAttention‑2 + vLLM batching. Sources: FlashAttention‑2, vLLM, DeepSeek V3 H100 bench.
ChatGPT usage anchors: 190.6M DAU · 2.5B prompts/day.
Free token share (OpenAI): 84.8%, derived from 1,000 tokens/prompt and the paid usage model (20% heavy / 80% average users).

A. Cost Model

Per‑GPU monthly cost and $/1M tokens (component breakdown)

Infographic: H100 monthly TCO (mid): $1,031.85 = amortization $833.33 + power $50.40 + cooling $15.12 + colocation $133.00 (sources: Brightlio, CBRE). Cost per 1M tokens spans $7.45–$11.32 at baseline and $4.22–$5.13 when optimized.

Per‑GPU Monthly Cost (low / mid / high)

Component | Low | Mid | High
Amortization | $694.44 | $833.33 | $1,111.11
Power | $50.40 | $50.40 | $50.40
Cooling (+30%) | $15.12 | $15.12 | $15.12
Colocation | $114.41 | $133.00 | $152.11
Total | $874.37 | $1,031.85 | $1,328.74
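The totals in this table are plain sums of the four components, with cooling taken as 30% of the power cost. A minimal Python sketch that reproduces them is shown here; the names `TIERS`, `COOLING_UPLIFT`, and `monthly_tco` are illustrative and not from the source.

```python
# Per-GPU monthly cost components in USD, copied from the table above.
POWER = 50.40
COOLING_UPLIFT = 0.30  # cooling is modeled as +30% of the power cost

TIERS = {
    "low":  {"amortization": 694.44,  "colocation": 114.41},
    "mid":  {"amortization": 833.33,  "colocation": 133.00},
    "high": {"amortization": 1111.11, "colocation": 152.11},
}

def monthly_tco(tier: str) -> float:
    """Sum amortization, power, cooling, and colocation for one tier."""
    parts = TIERS[tier]
    cooling = POWER * COOLING_UPLIFT
    total = parts["amortization"] + POWER + cooling + parts["colocation"]
    return round(total, 2)

for tier in TIERS:
    print(f"{tier}: ${monthly_tco(tier):,.2f}")
```

Only amortization and colocation vary across tiers; power and cooling are held constant, so the low-to-high spread comes entirely from hardware price and hosting assumptions.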

$ per 1M tokens — ranges

Scenario | Min | Mid | Max
Baseline (~45 t/s) | $7.45 | $8.79 | $11.32
Optimized (~90 t/s) | $4.22 | $4.42 | $5.13

Per‑million component contributions (mid), in $ per 1M tokens:
Baseline: amortization $7.10, power $0.43, cooling $0.13, colocation $1.13.
Optimized: amortization $3.57, power $0.22, cooling $0.06, colocation $0.57.
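The per‑million figures follow from dividing each monthly cost component by the tokens one GPU produces in a month. The sketch below assumes a 30‑day month at full utilization (an assumption, chosen because it reproduces the optimized mid column); `cost_per_million` is a hypothetical helper name.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month, GPU busy 100% of the time

# Mid-case monthly cost per H100 in USD, from the cost table.
MID_COST = {
    "amortization": 833.33,
    "power": 50.40,
    "cooling": 15.12,
    "colocation": 133.00,
}

def cost_per_million(monthly_costs: dict, tokens_per_second: float) -> dict:
    """Split $ per 1M tokens by component for one GPU at a sustained throughput."""
    millions_of_tokens = tokens_per_second * SECONDS_PER_MONTH / 1e6
    return {name: usd / millions_of_tokens for name, usd in monthly_costs.items()}

optimized = cost_per_million(MID_COST, tokens_per_second=90)
optimized_total = sum(optimized.values())

print({name: round(usd, 2) for name, usd in optimized.items()})
print(round(optimized_total, 2))
```

At 90 tokens/s this gives about $4.42 per 1M tokens, in line with the optimized mid figure. Under the same 30‑day assumption, the baseline mid of $8.79 corresponds to roughly 45.3 tokens/s rather than exactly 45, so the "~45" label appears to be rounded.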

B. OpenAI Chat P&L

Paid vs free usage — costs and revenue

Infographic: DAU ≈ 190.6M (SQ Magazine), prompts/day ≈ 2.5B (TechCrunch). 10% paid ⇒ 19.1M paid, 171.5M free. Tokens/prompt = 1,000. Paid usage: 20% heavy users (80 prompts/day) and 80% average users (5 prompts/day), which averages 20 prompts/day per paid user.

Token volumes

User group | Tokens / month
Paid users | 11.4T
Free users | 63.6T
Total | 75.0T
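These volumes can be re-derived from the usage anchors. The sketch below is one way to do it, assuming a 30‑day month (consistent with the 25B/month ≈ 833,333,333/day conversion used in the API section); the variable names are illustrative.

```python
DAU = 190.6e6             # ChatGPT daily active users
PROMPTS_PER_DAY = 2.5e9   # all users, paid and free
TOKENS_PER_PROMPT = 1_000
PAID_SHARE = 0.10
DAYS_PER_MONTH = 30       # assumption

# Paid usage model: 20% heavy users at 80 prompts/day, 80% average users at 5 prompts/day.
prompts_per_paid_user = 0.20 * 80 + 0.80 * 5

paid_users = DAU * PAID_SHARE
paid_tokens = paid_users * prompts_per_paid_user * TOKENS_PER_PROMPT * DAYS_PER_MONTH
total_tokens = PROMPTS_PER_DAY * TOKENS_PER_PROMPT * DAYS_PER_MONTH
free_tokens = total_tokens - paid_tokens  # free usage is the residual

free_share = free_tokens / total_tokens
free_to_paid = free_tokens / paid_tokens

print(f"paid {paid_tokens / 1e12:.1f}T, free {free_tokens / 1e12:.1f}T, "
      f"free share {free_share:.1%}, free:paid {free_to_paid:.2f}:1")
```

Free usage is not measured directly here: it is whatever remains of the 2.5B prompts/day after the paid usage model is subtracted, so the 84.8% free share is only as good as the 10% paid share and the 20/80 split.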

Costs — full range

Cost ($/month) | Baseline min | Baseline mid | Baseline max | Opt min | Opt mid | Opt max
Paid cost | $85,216,271 | $100,564,346 | $129,499,006 | $48,222,045 | $50,584,168 | $58,624,550
Free cost | $473,652,242 | $558,960,484 | $719,786,185 | $268,029,562 | $281,158,802 | $325,849,151
Total cost | $558,868,512 | $659,524,830 | $849,285,191 | $316,251,608 | $331,742,970 | $384,473,701

Revenue & gross profit

Item | Value
Revenue ($20 × paid users) | $381,200,000
Gross profit (baseline mid) | -$278,324,830
Gross profit (optimized mid) | $49,457,030
Break‑even sub price (baseline mid) | $34.60/mo
Break‑even sub price (optimized mid) | $17.41/mo
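Gross profit here is subscription revenue minus total (paid plus free) serving cost, and the break‑even price is that total cost spread over paid users only. A short sketch using the table's own totals follows; `pnl` is a hypothetical helper name.

```python
PRICE_PER_MONTH = 20        # USD per paid subscriber
PAID_USERS = 19_060_000     # 10% of 190.6M DAU

# Total monthly serving cost (paid + free) in USD, from the cost table.
TOTAL_COST = {
    "baseline mid": 659_524_830,
    "optimized mid": 331_742_970,
}

def pnl(total_cost: int) -> tuple:
    """Return (gross profit, break-even subscription price) for one cost scenario."""
    revenue = PRICE_PER_MONTH * PAID_USERS
    gross_profit = revenue - total_cost
    break_even_price = round(total_cost / PAID_USERS, 2)
    return gross_profit, break_even_price

for scenario, cost in TOTAL_COST.items():
    profit, price = pnl(cost)
    print(f"{scenario}: gross profit ${profit:,}, break-even ${price:.2f}/mo")
```

Because free users generate no revenue in this model, every dollar of free-tier cost lands on the paid subscribers, which is why the break‑even price moves from below $20 to well above it between the two scenarios.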

C. Anthropic Chat P&L

Paid vs free usage — scaled costs and revenue

Infographic: Claude MAU (mid) ≈ 24.4M (StatsUp, SQ Magazine). 10% paid ⇒ 2.4M paid, 22.0M free. The free:paid token ratio of 5.56:1 is borrowed from the ChatGPT derivation.

Token volumes (est.)

User group | Tokens / month
Paid users | 1.5T
Free users (est.) | 8.2T
Total | 9.6T

Costs & revenue (optimized mid)

Item | Value
Paid cost | $6,488,892
Free cost (est.) | $36,066,803
Total cost | $42,555,696
Revenue ($20 × paid users) | $48,900,000
Gross profit (opt‑mid) | $6,344,304
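The Claude estimate reuses the ChatGPT usage model and scales free usage by the borrowed 5.56:1 ratio. The sketch below makes two assumptions that are not stated in the source: the paid-user count is back-solved from the $48.9M revenue line (2.445M, which rounds to the 2.4M quoted), and each paid user generates the same 20 prompts/day × 1,000 tokens × 30 days as in the ChatGPT model.

```python
PRICE_PER_MONTH = 20
REVENUE = 48_900_000          # USD per month, from the table
TOTAL_COST = 42_555_696       # USD per month at optimized mid, from the table
FREE_TO_PAID_RATIO = 5.56     # borrowed from the ChatGPT derivation

# Assumption: same paid usage model as ChatGPT (20 prompts/day, 1,000 tokens each, 30 days).
TOKENS_PER_PAID_USER_MONTH = 20 * 1_000 * 30

paid_users = REVENUE / PRICE_PER_MONTH  # assumption: back-solved from revenue
paid_tokens = paid_users * TOKENS_PER_PAID_USER_MONTH
free_tokens = paid_tokens * FREE_TO_PAID_RATIO
total_tokens = paid_tokens + free_tokens

gross_profit = REVENUE - TOTAL_COST

print(f"paid {paid_tokens / 1e12:.1f}T, free {free_tokens / 1e12:.1f}T, "
      f"total {total_tokens / 1e12:.1f}T, gross profit ${gross_profit:,}")
```

The paid and free rows are rounded independently, which is why 1.5T and 8.2T do not add exactly to the 9.6T total shown.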

D. API Business

Users, traffic, and per‑million token margins

OpenAI: "Three million developers build with OpenAI" (official API overview); reported ~2.2B API requests/day (SQ Magazine).
Anthropic: reported ~25B API calls/month, or ≈ 833,333,333/day (SQ Magazine).
Pricing for both providers is taken from their official pricing pages.

API “DAU” (active orgs/keys per day) — scenarios

Scenario | Per‑org calls/day | OpenAI API DAU (orgs) | Anthropic API DAU (orgs)
Small apps | 1,000 | 2,200,000 | 833,333
Mid apps | 10,000 | 220,000 | 83,333
Large apps | 100,000 | 22,000 | 8,333

These are org‑level DAU estimates, not end users. A mid‑case of ~10k calls/org/day ⇒ ~220k active orgs/day at OpenAI, consistent with a subset of ~3M total developers grouped into teams/apps.
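The scenario table is simply reported daily request volume divided by an assumed number of calls per organization. A minimal sketch follows; the function name `active_orgs` is illustrative, and integer division is used to match the table's truncated values.

```python
OPENAI_REQUESTS_PER_DAY = 2_200_000_000
ANTHROPIC_REQUESTS_PER_DAY = 25_000_000_000 // 30  # 25B calls/month over a 30-day month

# Assumed calls per active organization per day, per scenario.
SCENARIOS = {
    "Small apps": 1_000,
    "Mid apps": 10_000,
    "Large apps": 100_000,
}

def active_orgs(requests_per_day: int, calls_per_org: int) -> int:
    """Estimate daily active orgs implied by total traffic and per-org volume."""
    return requests_per_day // calls_per_org

for name, calls in SCENARIOS.items():
    print(name,
          active_orgs(OPENAI_REQUESTS_PER_DAY, calls),
          active_orgs(ANTHROPIC_REQUESTS_PER_DAY, calls))
```

The per-org call volumes are scenario inputs rather than measurements, so the output is best read as an order-of-magnitude range for active organizations.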

Per‑million pricing → margins

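Blended price per 1M total tokens: P = p_in·(1−s_out) + p_out·s_out, where p_in and p_out are the list prices per 1M input and output tokens and s_out is the output share of total tokens. We show a 50/50 mix (s_out = 0.5) and an output‑heavy 30/70 in:out mix (s_out = 0.7). Margins subtract our serving COGS per 1M tokens (H100 TCO) from P, ranging from best case (optimized min, $4.22) to worst case (baseline max, $11.32).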

Model & mix (in:out) | Blended price / 1M | Margin band ($/M) | Margin band (%)
OpenAI GPT‑5, 50/50 | $5.62 | $1.41 → -$5.70 | 25% → -101%
OpenAI GPT‑5, 30/70 | $7.38 | $3.16 → -$3.95 | 43% → -54%
OpenAI GPT‑4.1, 50/50 | $7.50 | $3.28 → -$3.82 | 44% → -51%
OpenAI GPT‑4.1, 30/70 | $9.30 | $5.08 → -$2.02 | 55% → -22%
Anthropic Claude Sonnet 4 (≤200k), 50/50 | $9.00 | $4.78 → -$2.32 | 53% → -26%
Anthropic Claude Sonnet 4 (≤200k), 30/70 | $11.40 | $7.18 → $0.08 | 63% → 1%
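The blended-price formula and margin band can be sketched as below. The per-token list prices are not given in this document; the $3 input / $15 output pair used for Claude Sonnet 4 is an assumption, chosen because it reproduces the table's $9.00 and $11.40 blended prices. `blended_price` and `margin_band` are hypothetical helper names.

```python
COGS_BEST = 4.22    # optimized min, $ per 1M tokens
COGS_WORST = 11.32  # baseline max, $ per 1M tokens

def blended_price(p_in: float, p_out: float, s_out: float) -> float:
    """P = p_in * (1 - s_out) + p_out * s_out, in $ per 1M total tokens."""
    return p_in * (1 - s_out) + p_out * s_out

def margin_band(price: float) -> list:
    """Return [(margin $, margin %)] for the best-case and worst-case COGS."""
    return [(price - cogs, (price - cogs) / price * 100)
            for cogs in (COGS_BEST, COGS_WORST)]

# Assumed list prices for Claude Sonnet 4 (<=200k context): $3 in, $15 out per 1M tokens.
even_mix = blended_price(3.00, 15.00, s_out=0.5)
output_heavy = blended_price(3.00, 15.00, s_out=0.7)

for label, price in (("50/50", even_mix), ("30/70", output_heavy)):
    (best_usd, best_pct), (worst_usd, worst_pct) = margin_band(price)
    print(f"{label}: ${price:.2f} blended, "
          f"margin ${best_usd:.2f} ({best_pct:.0f}%) to ${worst_usd:.2f} ({worst_pct:.0f}%)")
```

Because output tokens carry the higher list price, shifting the mix toward output raises the blended price while the modeled COGS per token stays flat, which is why the 30/70 rows show wider margins than the 50/50 rows.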

E. Verdict

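Can $20 paid subs subsidize free chat usage?

On these estimates, only with optimized serving. At the optimized mid cost, OpenAI's $381.2M in monthly subscription revenue covers $331.7M of paid plus free serving cost (gross profit $49.5M, break‑even price $17.41/mo), and Anthropic's estimate shows a $6.3M gross profit. At the baseline mid cost, OpenAI runs a $278.3M monthly gross loss and would need to charge $34.60/mo to break even.

Figures are serving COGS only (GPUs, power, cooling, colocation); they exclude staff, R&D, networking/egress, training amortization, and idle capacity.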

References
  1. DeepSeek‑V3.1 (Hugging Face); NextPlatform: active params ≈37B
  2. NVIDIA H100; FlashAttention‑2; vLLM docs; DeepSeek V3 H100 bench; SGLang; PagedAttention
  3. SQ Magazine: ChatGPT DAU/WAU; TechCrunch: 2.5B prompts/day
  4. StatsUp: Claude MAU; SQ Magazine: Claude stats
  5. Brightlio: colocation pricing; CBRE: Data center trends
  6. Anthropic pricing; OpenAI pricing; OpenAI API overview; OpenAI API requests/day; Claude API calls/month