How many tokens are used for a Chinese character? Comparison of the differences between ChatGPT, Claude, and Gemini

When many people start to calculate the cost of AI Tokens, the most common question that arises is not how many tokens are there in the entire article, but a more intuitive sentence: How many tokens will be used for a Chinese character?

Let’s talk about the conclusion directly: there is no fixed formula of “each Chinese character = several tokens” that is common to all three platforms. Because ChatGPT, Claude, and Gemini each have different tokenization rules, and the officials are more inclined to provide token counting tools or APIs, rather than directly promising "how many tokens must be per Chinese character".

OpenAI clearly states that non-English languages often have a higher token-to-character ratio; Gemini gives an official rough estimate of "about 1 token = 4 characters"; Claude's official focus is to provide a Count Tokens API that counts tokens first, and reminds that the result should be regarded as an estimate.

Look at the table first: How many Tokens are used for a Chinese character?

I divided the table below into two columns: one column is the extent to which the official statement has been made, and the other column is the conservative range that can be used for cost estimation. The "conservative range" is a practical estimate based on the direction of official documents, not an official guaranteed value.

Officially confirmed statement

The range that can be grasped when used to estimate Chinese

ChatGPT / OpenAI

English is about 1 token ≈ 4 characters; non-English usually has a higher token-to-character ratio; there are official Tokenizer and token counting that can be measured.

About 0.8~2.0 token / 1 Chinese character

Do not use the algorithm of 4 characters = 1 token in Chinese. OpenAI officials have made it clear that non-English characters are often higher. When making a budget, it will be more conservative to capture 1 word ≈ 1 token.

Claude / Anthropic

The official does not directly give a fixed formula for "how many tokens per Chinese character"; a Count Tokens API is provided, and the result is an estimate. Support system, tools, images, PDF.

Claude The safest way is not to memorize the formula, but to directly run the official count tokens first. When making a rough estimate, you can first use a conservative interval similar to OpenAI.

Gemini / Google

The official statement is clearest: For Gemini models, a token is equivalent to about 4 characters. There is also Count Tokens API.

About 0.25~1.0 token / 1 Chinese character

If you completely follow the official rough estimate, it will be close to 4 characters ≈ 1 token; however, it is still recommended to use the Count Tokens API for official pricing, and do not rely solely on the number of words.

Why did I give a range, not a single number?

Because the most error-prone part of this theme is treating token as the number of words.

OpenAI officials make it very clear that a token can be as short as one character or as long as a whole word; and non-English languages often have a relatively high token-to-character ratio. This means that when the same Chinese sentence is encountered on different platforms, different models, and different formats, the results may be different.

Claude here is another reminder: it does not directly give you a fixed formula of "how many tokens per Chinese character", but directly provides the Count Tokens API, and clearly states that the result should be regarded as an estimate. This actually tells you that the official itself does not regard this matter as a dead formula.

Gemini official is relatively generous, directly stating that 1 token is approximately equal to 4 characters, but it also provides count_tokens and usage_metadata, indicating that when it comes to really looking at the cost, the official standard is still "actual count", not just mental calculations.

If you just want to quickly capture costs, what is the most practical way to estimate it?

ChatGPT: First grab 1 Chinese character for about 1 token, up or down

If you are just doing SEO article, customer service draft, summary content estimation, ChatGPT / OpenAI I would recommend grabbing first:

Loose estimation method: 1 word ≈ 0.8 tokens

Conservative estimation method: 1 word ≈ 1~1.5 tokens

Extremely conservative estimation method: 1 word ≈ 2 tokens

The reason for this is that OpenAI officially states that non-English numbers are usually higher, so Chinese is not suitable for calculating according to the English formula.

Claude: There is no fixed formula for each character. The most stable method is to count first

Claude Here I do not recommend that you memorize "how many tokens for each Chinese character". If you only need to grasp the budget in the early stage, you can first use a conservative range similar to ChatGPT:

about 0.8~2.0 token/word

But as long as you enter the official launch and formal cost control, the most stable way is to directly use Claude's official token counting. Because the official said it themselves, the result is an estimate, and it supports system, tools, images, and PDF, which means that many additional structures will also affect the result. ] token

But in practice, I would not recommend that you really only catch this low, because prompt also has factors such as format, system, context, API packaging, etc. So in terms of planning, you can do this:

Ideal rough estimate: 0.25~0.5 token/word

Safer estimate: 0.5~1 token/word

This will be safer than memorizing "four Chinese characters for one token".

What really affects "how many tokens per Chinese character" is not just the text itself

The most important part of this article is actually not the table, but the following thing:

The tokens you actually pay for in the end are usually not just the Chinese characters in the text.

System prompt will also be counted

OpenAI's token counting API and Claude's count tokens are calculated using a structure close to the formal request, which means that system prompt will inherently affect tokens.

Gemini’s official usage_metadata and OpenAI’s conversation token counting both indicate that context is not provided for free. The longer your conversation, the higher the accumulated tokens are usually.

Tools, images, and PDFs will also be counted

Claude officially states that token counting supports tools, images, and PDFs; Gemini also states that all input/output, including non-text content, will be tokenized.

So if you just ask "How many tokens does a Chinese character cost?", the answer can only help you make a very rough text budget at best. Once you enter the actual use of the API, what you really should look at is the entire request.

If you are doing AI Token cost control, what is the most recommended method?

The most practical approach is not to pursue a magic formula, but to divide the estimation method into two levels.

The first level: Use the interval for early planning

You can first capture it like this:

1 Chinese character first capture 1 token

1 Chinese character first capture 0.5 token

This set of numbers is not about pursuing absolute accuracy, but it is less likely to miss when planning the upper limit of the budget. This is a conservative estimate based on the directions confirmed by three official documents.

Second level: Use official count tokens before official launch

To be really accurate, you should:

OpenAI uses official tokenizer / token counting.

Claude uses the Count Tokens API.

Gemini uses count_tokens.

This is much more reliable than converting by word count.

Conclusion: The 3 most memorable sentences from this article

First, there is no cross-platform fixed number of tokens for a Chinese character

ChatGPT, Claude, and Gemini will not share the same set of "tokens per word" standards.

Second, OpenAI and Claude are not suitable for memorizing every word formula

OpenAI only explicitly talks about the higher ratio of English rough estimation to non-English; Claude takes the count tokens route more directly.

Third, Gemini is most suitable for making a rough estimate of characters first, but officially it still requires count

Google officially gives 1 token ≈ 4 characters, but it is still recommended to look at the Count Tokens API for the official cost.

Is one Chinese character necessarily equal to 1 token?

第二，OpenAI 和 Claude 比較不適合硬背每字公式

OpenAI 只明講英文粗估與非英文較高比率；Claude 則更直接走 count tokens 路線。

第三，Gemini 最適合先做字元粗估，但正式還是要 count

Google 官方確實給了 1 token ≈ 4 characters，但正式成本仍建議看 Count Tokens API。

一個中文字一定等於 1 token 嗎？

uncertain. Different platforms, different models, and different request structures may result in different results. OpenAI officially states that non-English languages usually have a higher token-to-character ratio; both Claude and Gemini also provide official token counting mechanisms instead of guaranteeing a fixed value per character.

Which company is most suitable for using "Estimation Token per Word"?

Gemini is best for making a rough estimate first, because Google officially writes that 1 token is approximately equal to 4 characters. However, count_tokens is still used for formal calculations.

Why is the Chinese token of ChatGPT often higher than expected?

Because OpenAI officials have made it clear that non-English texts usually have a higher token-to-character ratio, so Chinese often cannot fit the English 4 characters = 1 token.

Claude why not just give each Chinese character how many tokens?

Because Anthropic’s official approach is to provide Count Tokens API directly, and it is stated clearly that the result should be regarded as an estimate. This means that the official itself does not encourage you to memorize tokens as a fixed number of words formula.

Data source and credibility statement

This article mainly refers to the OpenAI official Token description, Claude official Token Counting document and Gemini official Token description, as the main source of information for sorting out the topic "Approximately how many Tokens are used for a Chinese character". Because the three platforms have different ways of segmenting tokens, and the official does not provide a cross-platform and fixed formula of "each Chinese character must be equal to several tokens", so this article will separate the official part and the range that can be used for estimation in practice to avoid mistaking the estimated value for a fixed rule.

If you want to understand the differences between models, platforms and costs faster, you can also go back to AI Token to see the complete summary.

This article belongs to the category "AI Token Computing".

This category mainly organizes how AI Token is calculated, the difference between input and output, token consumption logic of different models or platforms, misunderstandings about word count and token conversion, backend usage interpretation and cost control concepts. It helps users who are new to AI API not only know that token will affect the price, but also better understand why the same Chinese content may calculate different tokens on different platforms.

How many tokens will be consumed for the same content in ChatGPT, Claude, and Gemini? Comparison of the differences between the three major platforms

How to view GPT Token billing? It’s enough for novices to understand the key points first

How to check Gemini Token billing? Focused collection of Google model fees

API Token
Gemini Token
Claude Token
Token conversion
ChatGPT Token
Chinese Token

AI Token Organizes the basic concepts, calculation methods, API fees and model comparisons of AI Token (word elements), and covers common models such as ChatGPT, Gemini, Claude, etc. to help you establish clear understanding and judgment faster.

How many tokens are used for a Chinese character? Comparison of the differences between ChatGPT, Claude, and Gemini