How much does an AI API cost? Separate token fees from function fees

May 22, 2026

When people look at an AI API price list for the first time, it is tempting to focus on a single number: the price per million tokens. But pricing on the mainstream platforms has long since moved beyond that.

In addition to input, cached input, and output, OpenAI also lists tool-type fees such as web search and containers; Anthropic also separates standard token pricing, prompt caching, web search, code execution, etc.; Google Gemini also lists different fees for input, output, context caching, storage, and Grounding with Google Search.

So the right question is not "which model is cheapest per million tokens?" Start by separating two things:

The token cost of the model itself

The function costs incurred once you enable additional features

Seen this way, you will not mistake a model that looks cheap for one that is necessarily the cheapest in actual use.

First, what is a token fee?

A token fee, in its simplest form, is the cost of how much the model reads in and how much it writes out.

For example, OpenAI's price page lists input, cached input, and output separately; Anthropic separates Base Input Tokens, Cache Writes, Cache Hits & Refreshes, and Output Tokens; Gemini lists input and output per 1M tokens, and context caching is itself priced in two parts, a token price and a storage price.

This layer essentially measures the model's own workload in processing content.

For example, if you do these tasks:

The most basic billing usually starts here.

The three most common token fee fields

Input

The content you give the model, such as the prompt, system instructions, conversation history, and attached file text.

Cached input / Cache hits

If the platform supports caching, reused input may be billed at a lower rate. OpenAI's cached input is priced significantly below regular input; Anthropic's cache hits are likewise priced significantly below base input; Gemini prices context caching and storage separately.

Output

The content the model returns to you. This is the cost source beginners most often underestimate. OpenAI's GPT-5.4, mini, and nano all price output above input; the same goes for Anthropic's Sonnet 4.5 and Haiku 4.5; and across Gemini's model price lists, output is also generally priced above input.
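As a rough illustration of how the three fields combine into one token bill, here is a minimal sketch. The function name `token_cost` and every price in it are hypothetical placeholders, not figures from any provider's price list.

```python
def token_cost(input_tokens, cached_tokens, output_tokens,
               input_price, cached_price, output_price):
    """Return the token bill in dollars.

    Prices are dollars per 1,000,000 tokens. `input_tokens` counts only the
    non-cached part of the prompt; cached tokens are billed at their own rate.
    """
    per_million = 1_000_000
    return (input_tokens * input_price
            + cached_tokens * cached_price
            + output_tokens * output_price) / per_million


# Hypothetical prices (NOT real list prices): $2.00 input, $0.20 cached, $8.00 output.
cost = token_cost(input_tokens=400_000, cached_tokens=600_000, output_tokens=100_000,
                  input_price=2.00, cached_price=0.20, output_price=8.00)
print(f"${cost:.2f}")  # prints $1.72
```

With these made-up numbers, 100,000 output tokens cost as much as 400,000 uncached input tokens, which is why output is easy to underestimate.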

What is a function fee?

A function fee is the extra cost you pay to have the model do more than read and write text.

These fees are not necessarily billed by token. Common units include:

per search

per GB/day

per container

per 1,000 grounded prompts

In other words, the function fee does not answer "how much does it cost for the model to read and write text?" It answers:

Have you asked the model to search, fetch data, run tools, cache content, start containers, or do grounding?
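Because each function fee has its own billing unit, one way to keep them straight is to record the unit next to the price. The sketch below assumes a small hypothetical `FunctionFee` record; the unit prices and quantities are placeholders chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class FunctionFee:
    name: str          # e.g. "web search"
    unit: str          # billing unit, e.g. "per 1,000 calls"
    unit_price: float  # dollars per billing unit
    units_used: float  # how many billing units the workload consumed

    def cost(self) -> float:
        return self.unit_price * self.units_used


# All prices and quantities below are hypothetical placeholders.
fees = [
    FunctionFee("web search", "per 1,000 calls", 10.00, 2.5),  # 2,500 calls
    FunctionFee("storage", "per GB/day", 0.10, 30.0),          # 1 GB kept for 30 days
    FunctionFee("container", "per container", 0.03, 40.0),     # 40 containers started
]

for fee in fees:
    print(f"{fee.name:<12} {fee.unit:<16} ${fee.cost():.2f}")

function_total = sum(fee.cost() for fee in fees)
print(f"function fees total: ${function_total:.2f}")
```

None of these lines depends on a token count, which is the point: the function layer is metered separately from the model layer.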

How to read OpenAI's function fees

OpenAI's price page is a typical example.

In addition to input / cached input / output for the GPT models themselves, it also lists:

Web search

The OpenAI price page lists $10 / 1k calls and states that search content tokens are free. In other words, web search is billed by the number of calls rather than as a token fee.

Containers

OpenAI also lists container costs independently, such as the price of a 1 GB container, and session pricing methods at different points in time. This is not a general token fee, but an execution environment fee.

Batch API

Although this is not a tool fee, it is a pricing mechanism at the service layer. OpenAI's Batch API offers a 50% reduction on input and output. The same token volume therefore costs a different amount depending on whether you use batch.

So when you look at OpenAI's prices, do not look only at the model's unit price; also check whether you have turned on additional layers such as search, containers, or batch.
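The effect of these layers can be sketched in a few lines. The two constants below are the figures quoted in this article ($10 per 1k web search calls, 50% batch reduction) and may not match the current price page; `openai_style_bill` is a hypothetical helper, and the token cost passed in is a made-up number.

```python
WEB_SEARCH_PER_1K_CALLS = 10.00  # dollars per 1,000 calls, as quoted in the article
BATCH_DISCOUNT = 0.50            # 50% off input and output, as quoted in the article


def openai_style_bill(token_cost, search_calls=0, use_batch=False):
    """Combine a token bill with a per-call web search fee.

    `token_cost` is the undiscounted input + output cost in dollars.
    Search content tokens are treated as free, so searches add only the call fee.
    """
    if use_batch:
        token_cost = token_cost * (1 - BATCH_DISCOUNT)
    search_fee = search_calls / 1000 * WEB_SEARCH_PER_1K_CALLS
    return token_cost + search_fee


# Same hypothetical $20 of tokens, three different bills.
plain = openai_style_bill(20.0)
batched = openai_style_bill(20.0, use_batch=True)
with_search = openai_style_bill(20.0, search_calls=3000)
print(plain, batched, with_search)
```

The sketch deliberately keeps batch and search in separate calls, since whether the two can be combined in one workload is not something this article establishes.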

How to read Anthropic's function fees

Anthropic’s pricing logic is also very suitable as an example, because it clearly separates “token cost” and “tool cost”.

Prompt caching

The Claude API pricing lists Base Input, Cache Writes, and Cache Hits & Refreshes separately. Prompt caching is therefore not a vague concept but a cost item that is formally part of the billing structure.

Web search

Anthropic states that web search usage is charged in addition to token usage: the price is $10 per 1,000 searches, and the search result content is also counted toward standard token cost. This matters because it shows directly that a single request can carry both a token bill and a tool bill.
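To make the "two bills for one request" idea concrete, the following sketch splits the cost that web search adds to a request into its tool part and its token part. The $10 per 1,000 searches figure is the one quoted above; the function name and the input token price in the usage example are hypothetical.

```python
SEARCH_PRICE_PER_1K = 10.00  # dollars per 1,000 searches, as quoted in the article


def web_search_request_cost(searches, result_tokens, input_price_per_million):
    """Return (tool_bill, token_bill) in dollars for the search part of a request.

    `result_tokens` is the number of tokens of search result content that gets
    billed as ordinary input tokens.
    """
    tool_bill = searches / 1000 * SEARCH_PRICE_PER_1K
    token_bill = result_tokens / 1_000_000 * input_price_per_million
    return tool_bill, token_bill


# Hypothetical request: 3 searches returning 20,000 tokens, input at $3.00 per 1M tokens.
tool_bill, token_bill = web_search_request_cost(3, 20_000, 3.00)
print(f"tool: ${tool_bill:.2f}, tokens: ${token_bill:.2f}")  # prints tool: $0.03, tokens: $0.06
```

In this made-up case the token side of the search is larger than the per-search fee, so neither part can be ignored when estimating.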

Code execution

Anthropic also prices the code execution tool separately.

When code execution is used together with web search / web fetch, there is no additional code execution charge; otherwise it is billed by execution time

Each organization gets 1,550 free hours per month

Usage beyond that is $0.05 per hour, per container

This shows that Claude's API cost is not just input/output; it may also include execution fees at the tool layer.
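The free-hours rule above is a simple threshold calculation. This is a minimal sketch using the two figures quoted in the article; it assumes all container time in the month is added up in hours, and it ignores the web search / web fetch exemption and any minimum billing increments, which the article does not detail.

```python
FREE_HOURS_PER_MONTH = 1550  # per organization, as quoted in the article
PRICE_PER_HOUR = 0.05        # dollars per hour, per container, beyond the free hours


def code_execution_overage(container_hours):
    """Return the monthly code execution charge in dollars.

    `container_hours` is the total container time used in the month
    (hypothetical input; how the platform meters it is not covered here).
    """
    billable_hours = max(0.0, container_hours - FREE_HOURS_PER_MONTH)
    return billable_hours * PRICE_PER_HOUR


print(f"${code_execution_overage(1000):.2f}")  # within the free hours: prints $0.00
print(f"${code_execution_overage(2000):.2f}")  # 450 billable hours: prints $22.50
```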

How to read Gemini's function fees

The same is true of Google Gemini: the bill is not just tokens.

Context caching and storage

The Gemini pricing page lists the context caching price and the storage price separately. For several models, in addition to the context caching price, there is a storage price such as $1.00 / 1,000,000 tokens per hour. This means you are paying not only for the cached content itself but possibly also for how long the cache is retained.
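Since storage is priced per token and per hour, retention time multiplies the cost. A minimal sketch, using the $1.00 per 1,000,000 tokens per hour example quoted above as the default; the function name and the workload numbers are hypothetical.

```python
def cache_storage_cost(cached_tokens, hours, price_per_million_token_hours=1.00):
    """Return the storage charge in dollars for keeping a cache alive.

    The default price is the example figure quoted in the article; actual
    storage prices vary by model.
    """
    return cached_tokens / 1_000_000 * hours * price_per_million_token_hours


# Hypothetical workload: a 500,000-token cache kept for 1 hour versus 24 hours.
print(cache_storage_cost(500_000, hours=1))   # 0.5
print(cache_storage_cost(500_000, hours=24))  # 12.0
```

The cached content is identical in both calls; only the retention time changes the bill.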

Grounding with Google Search

The Gemini pricing page lists Grounding with Google Search in many places, usually with a free quota first, followed by $35 / 1,000 grounded prompts. This is not general token pricing, but an independent fee for the search grounding function.
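A free quota followed by a per-1,000 price can be estimated as below. The $35 per 1,000 grounded prompts figure is the one quoted above; the size of the free quota is left as a parameter because the article does not state it, and the value used in the example is purely hypothetical.

```python
def grounding_cost(grounded_prompts, free_prompts, price_per_1k=35.00):
    """Return the grounding charge in dollars for one billing period.

    `free_prompts` is the free quota for the same period (an assumption you
    must fill in from the current pricing page).
    """
    billable = max(0, grounded_prompts - free_prompts)
    return billable / 1000 * price_per_1k


# Hypothetical: 5,000 grounded prompts against an assumed free quota of 1,500.
print(grounding_cost(5000, free_prompts=1500))  # 122.5
```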

Example of Gemini 2.5 Flash-Lite

Gemini 2.5 Flash-Lite is officially described as the smallest and most cost effective model, built for at scale usage. It seems very cheap, but if your process also uses context caching, storage, and grounding, the final bill will not be determined only by the model input / output.

Why must these two costs be looked at separately?

Because they answer two completely different questions.

Token cost answers:

How much does it cost for the model itself to process this content?

Function cost answers:

Have you asked the model to search, fetch data, run tools, store a cache, or start containers?

If you mix these two layers, it's easy to draw the wrong conclusions.

For example, a model's input/output may be very cheap, but your process makes heavy use of web search, grounding, file search, code execution, or containers. In the end, the main driver of the bill may not be the model at all but the function layer.

Conversely, some tasks need nothing more than plain text generation; if you fold a pile of tool costs into the comparison for those tasks, your judgment will be distorted as well.
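The first case can be shown with made-up numbers. In the sketch below, `monthly_bill` is a hypothetical helper, the token prices are invented, and the search price defaults to the $10 per 1,000 figure quoted earlier in the article.

```python
def monthly_bill(requests, tokens_in, tokens_out, in_price, out_price,
                 searches_per_request=0, search_price_per_1k=10.00):
    """Return (token_bill, function_bill) in dollars.

    Token prices are dollars per 1,000,000 tokens; token counts are per request.
    """
    token_bill = requests * (tokens_in * in_price + tokens_out * out_price) / 1_000_000
    function_bill = requests * searches_per_request / 1000 * search_price_per_1k
    return token_bill, function_bill


# Hypothetical cheap model: $0.10 input, $0.40 output, one search on every request.
token_bill, function_bill = monthly_bill(
    requests=100_000, tokens_in=2_000, tokens_out=500,
    in_price=0.10, out_price=0.40, searches_per_request=1)
print(f"tokens: ${token_bill:.2f}, functions: ${function_bill:.2f}")
```

Under these assumptions the token bill is about $40 while the search bill is about $1,000, so switching to an even cheaper model would barely change the total.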

Three platforms, three typical judgment methods

If you look at OpenAI

The items most easily overlooked are web search, containers, and batch, which sit alongside the model's unit price; cached input, on the other hand, can lower input costs. So when comparing OpenAI prices, look beyond model input/output and check how heavily you use tools and whether caching applies.

If you look at Anthropic

The focus is usually on which items only add tokens and which carry an additional fee.

For example, web search costs $10 per 1,000 searches and also adds tokens; code execution may be billed by container-hour. If you only look at the input/output prices of Sonnet or Haiku, your assessment will be incomplete.

If you look at Gemini

The most easily overlooked point is that Gemini does not only count tokens; it also charges for context caching, storage, and grounding separately. A Gemini bill is therefore likely to include:

Model token cost

Cache cost

Cache storage cost

Grounding cost

How can beginners compare prices without being misled?

The most practical approach is to do it in two steps.

The first step is to calculate the pure model cost

That is, first estimate how much this task will consume:

input tokens

output tokens

cached tokens

batch discounted token cost

First calculate "how much it will cost if you only run the model."

The second step is to add the function costs back one by one

Is there web search?

Is there grounding?

Is there file search / retrieval?

Is there code execution?

Are there containers?

Is there context caching storage?
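The two steps can be sketched as a small estimator: step one supplies the pure model cost, step two walks the checklist and adds whatever each function costs. Every number below is a hypothetical placeholder, and `estimate_bill` is an illustrative helper rather than any platform's calculator.

```python
def estimate_bill(model_cost, function_costs):
    """Combine step 1 (pure model cost) with step 2 (function costs).

    `function_costs` maps each checklist item to its estimated cost in dollars;
    use 0.0 for functions the workflow does not enable.
    """
    functions_total = sum(function_costs.values())
    return {
        "model": model_cost,
        "functions": functions_total,
        "total": model_cost + functions_total,
    }


# Step 1: hypothetical token bill after caching and batch discounts.
model_cost = 40.0

# Step 2: hypothetical answers to the checklist, in dollars.
checklist = {
    "web search": 30.0,
    "grounding": 0.0,
    "file search / retrieval": 12.5,
    "code execution": 0.0,
    "containers": 4.0,
    "context caching storage": 0.0,
}

bill = estimate_bill(model_cost, checklist)
print(bill)  # {'model': 40.0, 'functions': 46.5, 'total': 86.5}
```

Keeping the two layers as separate keys makes it easy to see which layer a price change or a design change would actually affect.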

This step is what many people really miss. Because once the application changes from a simple chat to a search assistant, RAG, agent workflow or multi-modal process, the functional cost is often no longer a small number.

In which tasks are function costs most often overlooked?

The first type: search applications

You think you are just asking the model questions, but in fact web search or grounding is triggered every time. OpenAI, Anthropic, and Gemini all clearly price such functions independently.

The second type: knowledge base / RAG processes

This type of task often uses file search, context caching, cache storage or a large number of repeated prompts. OpenAI has tool and container layer fees, Gemini has context caching and storage, and Anthropic has prompt caching.

The third type: tool-based agents

As long as the model starts to help you run tools, open containers, execute programs, and edit files, the cost structure will be completely different from that of simple generation. Anthropic's code execution is a typical example.

The price of AI API cannot only depend on Token. A more accurate view is:

First look at the input / output / cache cost of the model itself, and then look at what additional search, cache, storage, tools, containers or grounding functions you use.

The real bill is usually composed of these two layers.

So the next time you compare models, do not just ask "how much per million tokens?" Also ask:

Is this price a pure model fee, or does it also include function fees?

If you look at it this way, you will not mistake a cheap-looking model for one that is necessarily the most cost-effective in actual use.

Is it enough to look at the price per million tokens?

No. In addition to token fees, mainstream platforms now often charge separately for search, caching, storage, tools, containers, or grounding.

Which is more important, token fee or function fee?

Both are important. For pure text generation tasks, the token fee is usually the core; but as long as your process uses search, tool use, grounding, and code execution, the function fee may quickly become larger.

Is OpenAI’s web search counted as token or function fee?

The OpenAI price page directly lists it as $10 / 1k calls, and search content tokens are free, so it is a function fee, not a general input / output token fee.

What about Anthropic’s web search?

Anthropic’s web search is $10 per 1,000 searches, and the search result content is also included in the standard token cost, so it has both a function fee and a token fee.

Why is it easy to miss Gemini’s context caching?

Because Gemini not only counts the context caching token price, but also the storage price. It's easy to underestimate the overall bill if you only look at input/output.

Who is most likely to overlook function fees?

Most often, people building search assistants, RAG systems, knowledge bases, tool agents, and multi-step workflows. These systems usually do more than model generation; they also involve search, caching, tool execution, and calls to external data.

Data source and credibility statement

This article is compiled and written based on the official pricing pages and official documents of mainstream AI platforms, focusing on the following sources:

OpenAI | API Pricing

Anthropic | Pricing

This article belongs to the category "AI Token Fees".

© 2026 AI Token. All rights reserved.
