How many tokens are used for a Chinese character? Comparison of the differences between ChatGPT, Claude, and Gemini
When many people start to calculate the cost of AI Tokens, the most common question that arises is not how many tokens are there in the entire article, but a more intuitive sentence: How many tokens will be used for a Chinese character?
Let’s talk about the conclusion directly: there is no fixed formula of “each Chinese character = several tokens” that is common to all three platforms. Because ChatGPT, Claude, and Gemini each have different tokenization rules, and the officials are more inclined to provide token counting tools or APIs, rather than directly promising "how many tokens must be per Chinese character".
OpenAI clearly states that non-English languages often have a higher token-to-character ratio; Gemini gives an official rough estimate of "about 1 token = 4 characters"; Claude's official focus is to provide a Count Tokens API that counts tokens first, and reminds that the result should be regarded as an estimate.
Look at the table first: How many Tokens are used for a Chinese character?
I divided the table below into two columns: one column is the extent to which the official statement has been made, and the other column is the conservative range that can be used for cost estimation. The "conservative range" is a practical estimate based on the direction of official documents, not an official guaranteed value.
Officially confirmed statement
The range that can be grasped when used to estimate Chinese
ChatGPT / OpenAI
English is about 1 token ≈ 4 characters; non-English usually has a higher token-to-character ratio; there are official Tokenizer and token counting that can be measured.
About 0.8~2.0 token / 1 Chinese character
Do not use the algorithm of 4 characters = 1 token in Chinese. OpenAI officials have made it clear that non-English characters are often higher. When making a budget, it will be more conservative to capture 1 word ≈ 1 token.
Claude / Anthropic
The official does not directly give a fixed formula for "how many tokens per Chinese character"; a Count Tokens API is provided, and the result is an estimate. Support system, tools, images, PDF.
Claude The safest way is not to memorize the formula, but to directly run the official count tokens first. When making a rough estimate, you can first use a conservative interval similar to OpenAI.
Gemini / Google
The official statement is clearest: For Gemini models, a token is equivalent to about 4 characters. There is also Count Tokens API.
About 0.25~1.0 token / 1 Chinese character
If you completely follow the official rough estimate, it will be close to 4 characters ≈ 1 token; however, it is still recommended to use the Count Tokens API for official pricing, and do not rely solely on the number of words.
Why did I give a range, not a single number?
Because the most error-prone part of this theme is treating token as the number of words.
OpenAI officials make it very clear that a token can be as short as one character or as long as a whole word; and non-English languages often have a relatively high token-to-character ratio. This means that when the same Chinese sentence is encountered on different platforms, different models, and different formats, the results may be different.
Claude here is another reminder: it does not directly give you a fixed formula of "how many tokens per Chinese character", but directly provides the Count Tokens API, and clearly states that the result should be regarded as an estimate. This actually tells you that the official itself does not regard this matter as a dead formula.
Gemini official is relatively generous, directly stating that 1 token is approximately equal to 4 characters, but it also provides count_tokens and usage_metadata, indicating that when it comes to really looking at the cost, the official standard is still "actual count", not just mental calculations.
If you just want to quickly capture costs, what is the most practical way to estimate it?
ChatGPT: First grab 1 Chinese character for about 1 token, up or down
If you are just doing SEO article, customer service draft, summary content estimation, ChatGPT / OpenAI I would recommend grabbing first:
Loose estimation method: 1 word ≈ 0.8 tokens
Conservative estimation method: 1 word ≈ 1~1.5 tokens
Extremely conservative estimation method: 1 word ≈ 2 tokens
The reason for this is that OpenAI officially states that non-English numbers are usually higher, so Chinese is not suitable for calculating according to the English formula.
Claude: There is no fixed formula for each character. The most stable method is to count first
Claude Here I do not recommend that you memorize "how many tokens for each Chinese character". If you only need to grasp the budget in the early stage, you can first use a conservative range similar to ChatGPT:
about 0.8~2.0 token/word
But as long as you enter the official launch and formal cost control, the most stable way is to directly use Claude's official token counting. Because the official said it themselves, the result is an estimate, and it supports system, tools, images, and PDF, which means that many additional structures will also affect the result. ] token
But in practice, I would not recommend that you really only catch this low, because prompt also has factors such as format, system, context, API packaging, etc. So in terms of planning, you can do this:
Ideal rough estimate: 0.25~0.5 token/word
Safer estimate: 0.5~1 token/word
This will be safer than memorizing "four Chinese characters for one token".
What really affects "how many tokens per Chinese character" is not just the text itself
The most important part of this article is actually not the table, but the following thing:
The tokens you actually pay for in the end are usually not just the Chinese characters in the text.
System prompt will also be counted
OpenAI's token counting API and Claude's count tokens are calculated using a structure close to the formal request, which means that system prompt will inherently affect tokens.
Gemini’s official usage_metadata and OpenAI’s conversation token counting both indicate that context is not provided for free. The longer your conversation, the higher the accumulated tokens are usually.
Tools, images, and PDFs will also be counted
Claude officially states that token counting supports tools, images, and PDFs; Gemini also states that all input/output, including non-text content, will be tokenized.
So if you just ask "How many tokens does a Chinese character cost?", the answer can only help you make a very rough text budget at best. Once you enter the actual use of the API, what you really should look at is the entire request.
If you are doing AI Token cost control, what is the most recommended method?
The most practical approach is not to pursue a magic formula, but to divide the estimation method into two levels.
The first level: Use the interval for early planning
You can first capture it like this:
1 Chinese character first capture 1 token
1 Chinese character first capture 0.5 token
This set of numbers is not about pursuing absolute accuracy, but it is less likely to miss when planning the upper limit of the budget. This is a conservative estimate based on the directions confirmed by three official documents.
Second level: Use official count tokens before official launch
To be really accurate, you should:
OpenAI uses official tokenizer / token counting.
Claude uses the Count Tokens API.
Gemini uses count_tokens.
This is much more reliable than converting by word count.
Conclusion: The 3 most memorable sentences from this article
First, there is no cross-platform fixed number of tokens for a Chinese character
ChatGPT, Claude, and Gemini will not share the same set of "tokens per word" standards.
Second, OpenAI and Claude are not suitable for memorizing every word formula
OpenAI only explicitly talks about the higher ratio of English rough estimation to non-English; Claude takes the count tokens route more directly.
Third, Gemini is most suitable for making a rough estimate of characters first, but officially it still requires count
Google officially gives 1 token ≈ 4 characters, but it is still recommended to look at the Count Tokens API for the official cost.
Is one Chinese character necessarily equal to 1 token?
第二,OpenAI 和 Claude 比較不適合硬背每字公式
OpenAI 只明講英文粗估與非英文較高比率;Claude 則更直接走 count tokens 路線。
第三,Gemini 最適合先做字元粗估,但正式還是要 count
Google 官方確實給了 1 token ≈ 4 characters,但正式成本仍建議看 Count Tokens API。
一個中文字一定等於 1 token 嗎?
uncertain. Different platforms, different models, and different request structures may result in different results. OpenAI officially states that non-English languages usually have a higher token-to-character ratio; both Claude and Gemini also provide official token counting mechanisms instead of guaranteeing a fixed value per character.
Which company is most suitable for using "Estimation Token per Word"?
Gemini is best for making a rough estimate first, because Google officially writes that 1 token is approximately equal to 4 characters. However, count_tokens is still used for formal calculations.
Why is the Chinese token of ChatGPT often higher than expected?
Because OpenAI officials have made it clear that non-English texts usually have a higher token-to-character ratio, so Chinese often cannot fit the English 4 characters = 1 token.
Claude why not just give each Chinese character how many tokens?
Because Anthropic’s official approach is to provide Count Tokens API directly, and it is stated clearly that the result should be regarded as an estimate. This means that the official itself does not encourage you to memorize tokens as a fixed number of words formula.
Data source and credibility statement
This article mainly refers to the OpenAI official Token description, Claude official Token Counting document and Gemini official Token description, as the main source of information for sorting out the topic "Approximately how many Tokens are used for a Chinese character". Because the three platforms have different ways of segmenting tokens, and the official does not provide a cross-platform and fixed formula of "each Chinese character must be equal to several tokens", so this article will separate the official part and the range that can be used for estimation in practice to avoid mistaking the estimated value for a fixed rule.
If you want to understand the differences between models, platforms and costs faster, you can also go back to AI Token to see the complete summary.
This article belongs to the category "AI Token Computing".
This category mainly organizes how AI Token is calculated, the difference between input and output, token consumption logic of different models or platforms, misunderstandings about word count and token conversion, backend usage interpretation and cost control concepts. It helps users who are new to AI API not only know that token will affect the price, but also better understand why the same Chinese content may calculate different tokens on different platforms.
How many tokens will be consumed for the same content in ChatGPT, Claude, and Gemini? Comparison of the differences between the three major platforms
How to view GPT Token billing? It’s enough for novices to understand the key points first
How to check Gemini Token billing? Focused collection of Google model fees
- API Token
- Gemini Token
- Claude Token
- Token conversion
- ChatGPT Token
- Chinese Token
AI Token Organizes the basic concepts, calculation methods, API fees and model comparisons of AI Token (word elements), and covers common models such as ChatGPT, Gemini, Claude, etc. to help you establish clear understanding and judgment faster.
Function
Model comparison
Usage context
AI Token Calculator
Learn
Getting Started
Article area
Other information
About us
Privacy Policy
© 2026 AI Token. All rights reserved.