Meta · Efficient
Llama 3.1 8B
Tiny, ultra-cheap open-weight
- $0.0002 / request
- $1.08 / day
- $32.40 / month
The AI Token Cost Calculator estimates how much your application will spend on LLM API calls. Enter your tokens per request and request volume to compare costs across every major AI provider side-by-side.
How it works
Enter expected input tokens per request, output tokens per request, and how many requests you expect each day. Or pick a preset.
See per-request, per-day, and per-month cost across 16 models from OpenAI, Anthropic, Google, Meta, and DeepSeek. The cheapest is highlighted.
Filter by provider tier, balance cost against capability, and copy the winning model into your stack with confidence.
Calculator
- Input tokens: prompt + system + retrieved context
- Output tokens: tokens the model generates
- Monthly cost: daily cost × 30
Workload presets
Filter providers
Estimated monthly cost: For 800 input / 400 output tokens per request at 5,000 requests/day, Llama 3.1 8B is the cheapest at $32.40/mo. The most expensive, Claude Opus 4.5, comes to $6,300.00/mo (194.4× more).
Results
Showing 16 of 16 models, cheapest first
- Meta · Efficient: Tiny, ultra-cheap open-weight
- Google · Efficient: Cheapest tier from Google
- OpenAI · Efficient: Fast, cheap, great for high-volume
- DeepSeek · Balanced: Strong general model at low cost
- Meta · Balanced: Open-weight, hosted via Together/Groq
- Google · Efficient: Fast with large context
- DeepSeek · Frontier: Open-weight reasoning model
- OpenAI · Balanced: Efficient reasoning model
- Anthropic · Efficient: Fast and inexpensive
- Meta · Frontier: Largest open-weight Meta model
- OpenAI · Balanced: Long-context refresh of GPT-4o
- Google · Balanced: Massive context window
- OpenAI · Balanced: OpenAI's general-purpose flagship
- Anthropic · Balanced: Strong coding and tool use
- OpenAI · Frontier: Reasoning model for hard problems
- Anthropic · Frontier: Anthropic's top-tier model
FAQ
API providers charge separately for input tokens (your prompt and context) and output tokens (the model's response). With prices quoted per million tokens: cost_per_request = (input_tokens × input_price + output_tokens × output_price) / 1,000,000. This calculator does the math for you across every major model so you can compare side by side.
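The formula above is easy to script. A minimal Python sketch follows; the rates in the table are illustrative values inferred from this page's example, not live pricing, so confirm against each provider's current rate card:

```python
# Illustrative per-million-token rates (inferred from this page's example
# numbers, not a live price feed; verify before budgeting).
PRICE_PER_MILLION = {
    "llama-3.1-8b":    {"input": 0.18,  "output": 0.18},
    "claude-opus-4.5": {"input": 15.00, "output": 75.00},
}

def request_cost(input_tokens: int, output_tokens: int, prices: dict) -> float:
    """Cost of a single request in dollars, given per-million-token prices."""
    return (input_tokens * prices["input"]
            + output_tokens * prices["output"]) / 1_000_000

def monthly_cost(input_tokens: int, output_tokens: int,
                 requests_per_day: int, prices: dict) -> float:
    """Monthly cost using the calculator's 30-day month convention."""
    return request_cost(input_tokens, output_tokens, prices) * requests_per_day * 30
```

With these assumed rates, plugging in the page's example (800 input / 400 output tokens at 5,000 requests/day) reproduces $32.40/mo for Llama 3.1 8B and $6,300/mo for Claude Opus 4.5.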
A token is roughly 0.75 of an English word, or about 4 characters. So 1,000 tokens is about 750 words. The exact count varies by model and language. For precise counting, use a tokenizer like tiktoken (OpenAI) or the LLM Token Counter at /tools/llm-token-counter.
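The rule of thumb above can be sketched as a rough estimator; this is a planning heuristic only, and real tokenizers such as tiktoken will give different counts:

```python
def estimate_tokens(text: str) -> int:
    # Heuristic from the FAQ: roughly 4 characters per English token.
    # Model- and language-dependent; use a real tokenizer for exact counts.
    return max(1, round(len(text) / 4))
```

For example, `estimate_tokens("Hello world")` returns 3 (11 characters / 4, rounded).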
For most general workloads, GPT-4o-mini, Gemini 2.0 Flash, Claude Haiku 4.5, DeepSeek V3, and Llama 3.1 8B sit at the cheap end, all under $1 per million input tokens. Use this calculator with your real workload to find the actual winner: per-million headline prices can mislead because input/output token ratios differ across applications.
Output tokens are generated one at a time and require more compute per token than processing input. Most providers price output 3-5 times higher than input. That is why workloads with long answers (agents, summarizers) cost much more than workloads with long prompts and short answers (classifiers, RAG with one-line responses).
Pricing is sourced from each provider's published rate cards. Actual bills can differ for several reasons: cached input tokens (50-90 percent discount on providers that support caching), batch API discounts (typically 50 percent), volume tiers, and price changes. Use this as a planning estimate, not a billing guarantee, and always confirm with each provider's pricing page before committing to a vendor.
For a chat app, count the system prompt plus the user message plus retrieved context as input, and the model's reply as output. Typical patterns: support chatbot 500-1500 in / 200-600 out; RAG 2000-6000 in / 300-800 out; coding agent 4000-15000 in / 1000-5000 out; classification 100-400 in / 5-50 out. When in doubt, log a few real requests in development and average them.
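The preset ranges above translate directly into the cost formula. A self-contained worked example for one preset, using midpoints and assumed rates (the $0.15/$0.60 per-million figures are GPT-4o-mini-class assumptions, not guaranteed pricing):

```python
# Worked example: RAG preset at its midpoint (4,000 input / 550 output tokens)
# at 2,000 requests/day, on assumed $0.15/$0.60 per-million rates.
input_tokens, output_tokens = 4000, 550
in_price, out_price = 0.15, 0.60   # dollars per million tokens (assumed)
requests_per_day = 2000

per_request = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
per_month = per_request * requests_per_day * 30
print(f"${per_request:.6f}/request  ->  ${per_month:.2f}/month")
```

That works out to about $0.00093 per request, or roughly $55.80 per month, which is why logging a few real requests to pin down your actual token counts pays off quickly.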
Related tools
- If you already know your total monthly token volume, use this to compare rates across providers without per-request math.
- Paste a real prompt to see exactly how many tokens it uses. Pair with this calculator for an accurate workload estimate.
- Compare the API bill from this tool against the labor hours and revenue an AI feature can unlock.
- Browse 20+ popular AI tools across six categories with pricing, features, and standout capabilities.