
Free AI Token Cost Calculator

The AI Token Cost Calculator estimates how much your application will spend on LLM API calls. Enter your tokens per request and request volume to compare costs across every major AI provider side-by-side.

16 models · 5 providers · Per-request math · No signup


How it works

Estimate your AI bill in three steps

1. Describe your workload

Enter expected input tokens per request, output tokens per request, and how many requests you expect each day. Or pick a preset.

2. Compare every model

See per-request, per-day, and per-month cost across 16 models from OpenAI, Anthropic, Google, Meta, and DeepSeek. The cheapest is highlighted.

3. Pick the best fit

Filter by provider tier, balance cost against capability, and copy the winning model into your stack with confidence.

Calculator

Describe your workload

Input tokens: prompt + system + retrieved context

Output tokens: tokens the model generates

Requests per day: monthly cost = daily × 30

Workload presets

Filter providers

Estimated monthly cost: for 800 input / 400 output tokens per request at 5,000 requests/day, Llama 3.1 8B is the cheapest at $32.40/mo. The most expensive, Claude Opus 4.5, comes to $6,300.00/mo (194.4× more).
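The headline numbers can be reproduced from the per-request pricing formula. A minimal sketch in Python (the helper name is illustrative):

```python
# Reproduce the estimate: 800 input / 400 output tokens per request
# at 5,000 requests/day, using each model's published per-1M rates.
def monthly_cost(in_tok, out_tok, req_per_day, in_price, out_price):
    per_request = (in_tok * in_price + out_tok * out_price) / 1_000_000
    return per_request * req_per_day * 30  # monthly cost = daily x 30

llama_8b = monthly_cost(800, 400, 5000, 0.18, 0.18)   # Llama 3.1 8B
opus_45 = monthly_cost(800, 400, 5000, 15.00, 75.00)  # Claude Opus 4.5
print(f"Llama 3.1 8B: ${llama_8b:,.2f}/mo")    # $32.40/mo
print(f"Claude Opus 4.5: ${opus_45:,.2f}/mo")  # $6,300.00/mo
print(f"Ratio: {opus_45 / llama_8b:.1f}x")     # 194.4x
```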

Results

Cost breakdown by model

Showing 16 of 16 models, cheapest first

Meta · Efficient

Llama 3.1 8B

Cheapest

Tiny, ultra-cheap open-weight

$0.0002 / request · $1.08 / day · $32.40 / month
Context: 128K · $0.18 in / $0.18 out per 1M

Google · Efficient

Gemini 2.0 Flash

Cheapest tier from Google

$0.0002 / request · $1.20 / day · $36.00 / month
Context: 1M · $0.10 in / $0.40 out per 1M

OpenAI · Efficient

GPT-4o Mini

Fast, cheap, great for high-volume

$0.0004 / request · $1.80 / day · $54.00 / month
Context: 128K · $0.15 in / $0.60 out per 1M

DeepSeek · Balanced

DeepSeek V3

Strong general model at low cost

$0.0007 / request · $3.28 / day · $98.40 / month
Context: 64K · $0.27 in / $1.10 out per 1M

Meta · Balanced

Llama 3.3 70B

Open-weight, hosted via Together/Groq

$0.0007 / request · $3.60 / day · $108.00 / month
Context: 128K · $0.60 in / $0.60 out per 1M

Google · Efficient

Gemini 2.5 Flash

Fast with large context

$0.0012 / request · $6.20 / day · $186.00 / month
Context: 1M · $0.30 in / $2.50 out per 1M

DeepSeek · Frontier

DeepSeek R1

Open-weight reasoning model

$0.0013 / request · $6.58 / day · $197.40 / month
Context: 64K · $0.55 in / $2.19 out per 1M

OpenAI · Balanced

o3-mini

Efficient reasoning model

$0.0026 / request · $13.20 / day · $396.00 / month
Context: 200K · $1.10 in / $4.40 out per 1M

Anthropic · Efficient

Claude Haiku 4.5

Fast and inexpensive

$0.0028 / request · $14.00 / day · $420.00 / month
Context: 200K · $1.00 in / $5.00 out per 1M

Meta · Frontier

Llama 3.1 405B

Largest open-weight Meta model

$0.0042 / request · $21.00 / day · $630.00 / month
Context: 128K · $3.50 in / $3.50 out per 1M

OpenAI · Balanced

GPT-4.1

Long-context refresh of GPT-4o

$0.0048 / request · $24.00 / day · $720.00 / month
Context: 1M · $2.00 in / $8.00 out per 1M

Google · Balanced

Gemini 2.5 Pro

Massive context window

$0.0050 / request · $25.00 / day · $750.00 / month
Context: 2M · $1.25 in / $10.00 out per 1M

OpenAI · Balanced

GPT-4o

OpenAI's general-purpose flagship

$0.0060 / request · $30.00 / day · $900.00 / month
Context: 128K · $2.50 in / $10.00 out per 1M

Anthropic · Balanced

Claude Sonnet 4.5

Strong coding and tool use

$0.0084 / request · $42.00 / day · $1,260.00 / month
Context: 200K · $3.00 in / $15.00 out per 1M

OpenAI · Frontier

o1

Reasoning model for hard problems

$0.0360 / request · $180.00 / day · $5,400.00 / month
Context: 200K · $15.00 in / $60.00 out per 1M

Anthropic · Frontier

Claude Opus 4.5

Anthropic's top-tier model

$0.0420 / request · $210.00 / day · $6,300.00 / month
Context: 200K · $15.00 in / $75.00 out per 1M

FAQ

Frequently Asked Questions

How is LLM API cost calculated?

API providers charge separately for input tokens (your prompt and context) and output tokens (the model's response). Cost per request = (input_tokens × input_price_per_1M + output_tokens × output_price_per_1M) ÷ 1,000,000. This calculator does the math for you across every major model so you can compare side by side.
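As a quick sketch, the same formula in Python (the function name is illustrative):

```python
def cost_per_request(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Cost per request in USD, given per-million-token prices."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000

# e.g. GPT-4o ($2.50 in / $10.00 out per 1M) on an 800-in / 400-out request:
print(f"${cost_per_request(800, 400, 2.50, 10.00):.4f}")  # $0.0060
```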

What is a token?

A token is roughly 0.75 of an English word, or about 4 characters. So 1,000 tokens is about 750 words. The exact count varies by model and language. For precise counting, use a tokenizer like tiktoken (OpenAI) or the LLM Token Counter at /tools/llm-token-counter.
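For back-of-envelope planning, the 4-characters-per-token rule of thumb is easy to script; this is only a rough heuristic for English text, not a real tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4-characters-per-token heuristic
    for English text. Use a real tokenizer (e.g. tiktoken) for billing."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("Hello, how can I help you today?"))  # ~8 tokens
```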

Which LLM is cheapest in 2026?

For most general workloads, GPT-4o Mini, Gemini 2.0 Flash, Claude Haiku 4.5, DeepSeek V3, and Llama 3.1 8B sit at the cheap end, all at or under $1 per million input tokens. Use this calculator with your real workload to find the actual winner: per-million headline prices can mislead because input/output token ratios differ across applications.

Why is output more expensive than input?

Output tokens are generated one at a time and require more compute per token than processing input. Most providers price output 3-5 times higher than input. That is why workloads with long answers (agents, summarizers) cost much more than workloads with long prompts and short answers (classifiers, RAG with one-line responses).
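The effect of workload shape is easy to see with the same per-request formula, here using GPT-4o's published rates as the example model:

```python
# Same total tokens (4,400), same model (GPT-4o: $2.50 in / $10.00 out per 1M),
# very different bills depending on the input/output split.
def per_request(in_tok, out_tok, in_price=2.50, out_price=10.00):
    return (in_tok * in_price + out_tok * out_price) / 1_000_000

classifier = per_request(4000, 400)  # long prompt, short answer
summarizer = per_request(400, 4000)  # short prompt, long answer
print(f"classifier: ${classifier:.4f}")  # $0.0140
print(f"summarizer: ${summarizer:.4f}")  # $0.0410
```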

How accurate are these numbers?

Pricing is sourced from each provider's published rate cards. Actual bills can differ for these reasons: cached input tokens (50-90 percent discount on supporting providers), batch API discounts (typically 50 percent), volume tiers, and price changes. Use this as a planning estimate, not a billing guarantee. Always confirm with each provider's pricing page before committing to a vendor.
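To see how much those adjustments can move an estimate, here is an illustrative sketch; the 50 percent batch and 75 percent cache discounts and the input share of cost are assumptions, and real figures vary by provider:

```python
def adjusted_cost(base_monthly, cached_input_share=0.0, cache_discount=0.75,
                  input_share_of_cost=0.5, batch=False):
    """Illustrative only: apply assumed cache and batch discounts to a base estimate."""
    cost = base_monthly
    # Cached input tokens are billed at a discount on supporting providers.
    cost -= base_monthly * input_share_of_cost * cached_input_share * cache_discount
    if batch:
        cost *= 0.5  # assumed batch API discount
    return cost

# A $900/mo estimate with 80% of input cached, run through the batch API:
print(f"${adjusted_cost(900.00, cached_input_share=0.8, batch=True):.2f}")  # $315.00
```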

How do I estimate tokens per request for my app?

For a chat app, count the system prompt plus the user message plus retrieved context as input, and the model's reply as output. Typical patterns: support chatbot 500-1500 in / 200-600 out; RAG 2000-6000 in / 300-800 out; coding agent 4000-15000 in / 1000-5000 out; classification 100-400 in / 5-50 out. When in doubt, log a few real requests in development and average them.
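The logging-and-averaging step can be as simple as this sketch (the token counts below are hypothetical samples):

```python
# Log real (input, output) token counts during development, then average
# them to get workload numbers to enter into the calculator.
logged = [(620, 310), (900, 420), (760, 390), (920, 480)]  # hypothetical samples

avg_in = sum(i for i, _ in logged) / len(logged)
avg_out = sum(o for _, o in logged) / len(logged)
print(f"avg input: {avg_in:.0f} tokens, avg output: {avg_out:.0f} tokens")
# avg input: 800 tokens, avg output: 400 tokens
```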

Related tools

Keep exploring