How do you check Claude API usage? Start with Usage, Billing, and these 4 fields

To check Claude API usage, first look at the Usage page to see how much you have used, then look at the Billing page to confirm how many credits are left. The most important fields for a single request are input tokens, output tokens, cache creation input tokens, and cache read input tokens.

May 22, 2026

Anthropic's official Claude Console cost and usage reporting document states that the Usage page shows a detailed usage breakdown by model, date and time, and API key. The Billing page is for tracking credits and auto-reload. API usage runs on prepaid credits and is not part of the chat subscription.

After many people start using the Claude API, the first thing they get stuck on is not "how to call the API" but "what do the numbers in the console mean?" They can see token, usage, billing, and even input, output, cache read, and cache creation, but it is hard to judge at a glance how much a given request cost, whether that is normal, and where there is room for optimization.

First things first: a Claude subscription fee is not a Claude API fee

This point must be made clear up front. Claude's chat subscription plans and the Claude Console/API are separate products and are charged separately. Anthropic officially states that the API and Workbench are currently billed through prepaid usage credits. You must buy credits before you can use them, and once the credits run out you can no longer call the API or use Workbench.

This distinction matters because many people bring the wrong mental model to the console numbers. If you think of the Claude API as "I've already paid for Claude Pro or Team, so running a little more shouldn't hurt", it is easy to underestimate the actual cost. The correct understanding is this: the chat subscription covers your experience in the Claude app, while the API covers the execution cost when you connect Claude to your website, system, workflow, or automated scripts. The two must not be mixed up.

Where can you check Claude API usage? Two places matter most

The first place: Usage page

Anthropic's official Cost and Usage Reporting document is clear on this. The Usage page provides a detailed breakdown of API usage and can be filtered by model, date and time, and API key. You can also click on the bar chart to drill down to hour and minute granularity, and the page supports CSV export. This is usually the first place to find out who is consuming tokens, which model is driving the spend, and which time period had a surge.
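Once you have a CSV export, the "which key, which model" question can be answered with a few lines of Python. The following is a minimal sketch only: the column names (`api_key`, `model`, `input_tokens`, `output_tokens`) and the sample rows are hypothetical, so adjust them to match the header of the file you actually export.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export: real column names may differ, so check your CSV header.
SAMPLE_CSV = """api_key,model,input_tokens,output_tokens
key-a,model-x,1000,200
key-b,model-x,500,900
key-a,model-y,300,100
"""

def tokens_by(rows, column):
    """Sum input + output tokens for each distinct value of `column`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[column]] += int(row["input_tokens"]) + int(row["output_tokens"])
    return dict(totals)

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(tokens_by(rows, "api_key"))  # {'key-a': 1600, 'key-b': 1400}
print(tokens_by(rows, "model"))    # {'model-x': 2600, 'model-y': 400}
```

To use it on a real export, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle.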

Second place: Billing page

The Billing page is about money, not traffic. Anthropic officially states that API usage is deducted from prepaid usage credits. You can check the credit balance on the Billing page and set auto-reload to top up automatically when the balance falls below a certain threshold. If you only look at Usage and not Billing, you will know how much you have used but not how much is left.

The most practical review habit

Check Usage daily or weekly to see who is using the API, which model is consuming the most, and which API key is costing the most. To stay on budget and avoid service interruptions, check Billing to confirm the credit balance and auto-reload settings. That way you see both the traffic and the cash.
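Combining the two pages gives a rough "runway" figure: the balance from Billing divided by the average daily spend seen in Usage. The sketch below is only an illustration; the function names, the dollar amounts, and the threshold are made up, and the spend figures would have to come from your own records.

```python
def days_of_runway(balance, recent_daily_spend):
    """Estimate how many days the balance lasts at the recent average spend."""
    if not recent_daily_spend:
        return float("inf")
    average = sum(recent_daily_spend) / len(recent_daily_spend)
    if average == 0:
        return float("inf")
    return balance / average

def needs_top_up(balance, threshold):
    """Mirror the idea of an auto-reload threshold: True when balance is below it."""
    return balance < threshold

# Hypothetical numbers: a 50.00 credit balance and three days of spend.
print(days_of_runway(50.0, [4.0, 6.0, 5.0]))  # 10.0
print(needs_top_up(50.0, threshold=20.0))     # False
```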

What does the Usage page actually show? Not just the total number of tokens

Anthropic's official description of the Usage page is clear: it does not just display one total, but lets you slice the data along different dimensions. The Usage page offers:

Breakdown by date/time

Breakdown by API key

Input / output token chart

Total input / output token statistics

Requests blocked by rate limits

Visual comparison of ITPM / OTPM

In other words, the Usage page helps you answer not just "how many tokens have I used in total" but more practical questions. Which API key is driving the spend? Is one model particularly heavy? Was there a sudden spike in one period, or is one workflow consuming volume all the time? None of these judgments can rest on a single total token number.

Many novices panic when they see a large number and assume they have overused the API. But a large token count does not necessarily mean the cost is unreasonable. You also need to check whether the tokens are input or output, whether there are cache reads, whether a more expensive model was used, and whether there are additional tool use costs. The Usage page is the entry point, not the whole answer. The real answer usually lies in the usage field of a single request.

To understand the cost of a single request, first understand the 4 core fields

If you want to understand why a Claude API request is expensive or cheap, the first thing you should focus on is these 4 numbers:

1. input tokens

This field is not simply "everything you sent in the request." Anthropic's official Token counting document states that token counting covers system prompts, tools, images, and PDFs, and it also notes that the count should be treated as an estimate. In practice, if you use prompt caching, looking at input tokens alone makes it easy to misjudge your actual input volume.

2. output tokens

This one is more intuitive: it is the number of output tokens the model actually generated. Many projects end up expensive not because the prompt is too long, but because the output limit is set too high, the answers are too lengthy, or the task itself requires long output. This matters especially in Claude pricing, because the unit price of output is usually higher than that of input.

3. cache creation input tokens

This represents how many input tokens this request wrote to the cache to create a new cache entry. It is not free, and writing to the cache for the first time does not mean you save money immediately.

4. cache read input tokens

This represents how many tokens this request read back from an existing cache entry. This is usually a good sign, because caching only starts to save money once reads become stable. If you have a fixed system prompt, long background information, long file context, or a long conversation prefix, this field is usually the key to judging whether the optimization is working.
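To make the four fields concrete, here is a small sketch that pulls them out of a usage object. The underscore spellings (`input_tokens`, `output_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`) are how the Messages API response names these fields, to the best of my knowledge; the sample numbers are invented, and `read_usage` is a helper name made up for this example.

```python
USAGE_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)

def read_usage(usage):
    """Return the four core counters, treating missing or null values as 0."""
    return {name: int(usage.get(name) or 0) for name in USAGE_FIELDS}

# Invented sample: a request that reused a long cached prefix.
sample_usage = {
    "input_tokens": 120,
    "output_tokens": 850,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 4200,
}

for name, value in read_usage(sample_usage).items():
    print(f"{name:>28}: {value}")
```

Logging these four numbers per request is usually enough to start spotting which requests are expensive and why.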

The most common mistake: assuming input tokens equal the total input

This point deserves its own section. The input tokens figure you see in the console is not necessarily the complete input volume of your request, and it is especially easy to misjudge when caching is involved. This is why many people ask:

"I obviously sent a long system prompt and file, how come there are so few input tokens?"

The answer is usually not that you sent too little, but that the fixed leading content was cached, so it is reported under the cache fields rather than under the input number you intuitively expect. The practical approach is not to focus on input tokens alone, but to read output, cache creation, and cache read together.

Only then will you know whether your caching strategy is actually saving money.
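The "why are there so few input tokens?" puzzle goes away once the three input-side fields are added up. The sketch below assumes, as Anthropic's prompt caching documentation describes to my understanding, that the full input for a request is `input_tokens` plus `cache_creation_input_tokens` plus `cache_read_input_tokens`. The helper names and the sample numbers are made up for illustration.

```python
def total_input_tokens(usage):
    """Add up uncached input, cache writes, and cache reads."""
    return (
        usage.get("input_tokens", 0)
        + usage.get("cache_creation_input_tokens", 0)
        + usage.get("cache_read_input_tokens", 0)
    )

def cached_share(usage):
    """Fraction of the total input that was served from the cache."""
    total = total_input_tokens(usage)
    return usage.get("cache_read_input_tokens", 0) / total if total else 0.0

# Invented sample: the long prefix was cached, so input_tokens looks tiny.
usage = {
    "input_tokens": 120,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 4200,
}
print(total_input_tokens(usage))     # 4320
print(f"{cached_share(usage):.1%}")  # 97.2%
```

In this invented case the request really carried 4,320 input tokens; only 120 of them show up under input tokens.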

How do you calculate the cost? Start with the model's unit price, then multiply by each token type

The core cost of the Claude API is still the model's unit price × the corresponding token usage. Anthropic's official pricing page lists the input and output unit prices of each model. The price gap between models is obvious, and the unit price of output is usually higher than that of input.

The most common misunderstanding here is looking only at the input unit price and ignoring the higher output unit price. If your use case regularly asks the model for large amounts of content, such as batch production of product copy, output tokens are often the largest part of the bill.

So when you look at the Claude API console, do not just ask "how much did I send in" but also "how much did the model produce". Many cost overruns happen not because the prompt is too long, but because max tokens is set too high or the output scope is not controlled, so the output quietly grows.
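The "unit price × token usage" rule can be written out as a short calculation. This is a sketch under stated assumptions: the prices below are placeholders chosen for easy arithmetic, not Anthropic's actual rates, and `Price` and `request_cost` are names invented for this example. Take real prices from the official pricing page.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Price:
    """Unit prices in dollars per million tokens (placeholder values only)."""
    input_per_mtok: float
    output_per_mtok: float

def request_cost(input_tokens, output_tokens, price):
    """Cost of one uncached request: each token type times its own unit price."""
    return (
        input_tokens * price.input_per_mtok
        + output_tokens * price.output_per_mtok
    ) / 1_000_000

# Placeholder prices: output is 5x input here purely for illustration.
placeholder = Price(input_per_mtok=3.0, output_per_mtok=15.0)

cost = request_cost(input_tokens=2000, output_tokens=1000, price=placeholder)
print(f"${cost:.4f}")  # $0.0210
```

With these placeholder prices, 1,000 output tokens cost more than 2,000 input tokens, which is the pattern the section warns about.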

When caching is used, the cost is not as simple as input + output

If you use prompt caching, you need to read the numbers with one more layer of logic. The real question at this point is not "Is the cache enabled?" but:

Is there a lot of cache creation?

Is there a stable cache read hit?

Is there enough actual reuse?

For novices, the easiest approach is not to design a complex caching scheme first, but to look at the console first. If cache creation input tokens often has a value but cache read input tokens rarely does, your cache strategy has probably not delivered any real reuse. On the other hand, if reads occur steadily and the number is significantly higher than the uncached input, your cache is usually starting to pay off.
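The check described above can be turned into a small script over a batch of per-request usage records. Everything here is a sketch: the write and read multipliers (1.25 and 0.10 of the base input price) are assumptions used for illustration and should be verified against the official pricing page, the base price is a placeholder, and the "reads at least match writes" rule in `cache_verdict` is an arbitrary heuristic, not an official threshold.

```python
def input_cost_with_cache(usage, base_per_mtok, write_mult=1.25, read_mult=0.10):
    """Input-side cost when cache writes and reads are priced differently."""
    weighted = (
        usage["input_tokens"]
        + usage["cache_creation_input_tokens"] * write_mult
        + usage["cache_read_input_tokens"] * read_mult
    )
    return weighted * base_per_mtok / 1_000_000

def input_cost_without_cache(usage, base_per_mtok):
    """Input-side cost if every token had been billed at the base price."""
    total = (
        usage["input_tokens"]
        + usage["cache_creation_input_tokens"]
        + usage["cache_read_input_tokens"]
    )
    return total * base_per_mtok / 1_000_000

def cache_verdict(requests):
    """Rough heuristic: is the cache being reused, or only written?"""
    created = sum(r["cache_creation_input_tokens"] for r in requests)
    read = sum(r["cache_read_input_tokens"] for r in requests)
    if created == 0 and read == 0:
        return "no caching observed"
    if read < created:
        return "mostly writes, little reuse so far"
    return "reads dominate, cache is being reused"

# Invented sample: one request writes a 4000-token prefix, two later ones read it.
requests = [
    {"input_tokens": 100, "cache_creation_input_tokens": 4000, "cache_read_input_tokens": 0},
    {"input_tokens": 100, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 4000},
    {"input_tokens": 100, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 4000},
]
BASE = 3.0  # placeholder base input price, dollars per million tokens

with_cache = sum(input_cost_with_cache(r, BASE) for r in requests)
without_cache = sum(input_cost_without_cache(r, BASE) for r in requests)
print(cache_verdict(requests))
print(f"with cache: ${with_cache:.4f}, without: ${without_cache:.4f}")
```

Under these assumed multipliers, the first request alone costs more with caching than without; the saving only appears once the later reads arrive.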

Besides token, what else affects Claude API fees?

First: server-side tools

Server-side tools such as web search are one example. Once you enter a tool scenario, the cost is no longer just input and output tokens; there may also be an additional charge for using the tool itself.

Second: tool use itself also makes the request larger

Using tools is not just "having the model do one more thing." The whole request structure grows, including the tools parameters, tool use blocks, and tool result blocks, all of which can increase token usage. Many people think they only added one tool, when in fact the entire cost structure of the request has changed.

Third: a rate limit problem is not necessarily a cost problem

In addition to tokens, Anthropic's Usage page also shows rate-limited requests and an ITPM / OTPM comparison. This means that when a request fails, it does not necessarily mean you have run out of money; you may simply have sent too much within a unit of time.

This information is valuable because it helps you distinguish a credit problem from a rate limit problem.

If the two are confused, the optimization direction will be completely different.
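A first pass at separating the two cases can be done from HTTP status codes in your own request logs. This sketch assumes that rate-limited requests come back with status 429, which is the standard "Too Many Requests" code; the bucket names and the sample log are invented, and anything that is not a 429 still needs its error message read before you conclude it is a billing issue.

```python
from collections import Counter

def classify_failure(status_code):
    """Sort a failed request into a coarse bucket by HTTP status code."""
    if status_code == 429:
        return "rate_limit"          # sent too much within the time window
    if status_code >= 500:
        return "server_side"         # not caused by your balance or your pace
    return "check_request_or_billing"  # read the error message to decide

# Invented sample of failed-request status codes taken from an application log.
failed_statuses = [429, 429, 500, 400, 429]
print(Counter(classify_failure(code) for code in failed_statuses))
```

If the tally is dominated by `rate_limit`, slowing down or spreading requests out is the fix, and topping up credits will not help.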

Will there be any charges for failed requests?

Anthropic’s official Help Center clearly states that failed requests will not be charged, only successful API calls and completed tasks will be charged. This is important for troubleshooting costs, because if you see a lot of error logs, it does not necessarily mean that credits are also being lost.

But this does not mean you can ignore failed requests. In business terms they still carry a cost, for example:

You retry to recover, and the total number of successful (and therefore billed) requests eventually increases

So "failed requests are not charged" is good news, but it does not mean a high failure rate is acceptable.
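One common way to keep retries from multiplying is to cap the number of attempts and wait longer between each one. The following is a generic sketch of exponential backoff, not anything taken from Anthropic's SDK: `TransientError`, `call_with_retry`, and the `flaky` stand-in function are all hypothetical names for this example.

```python
import time

class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure such as a rate limit."""

def call_with_retry(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn, retrying on TransientError with exponentially growing waits.

    Returns (result, attempts_used). Re-raises after the final attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(), attempt
        except TransientError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...

# Simulated call that fails twice, then succeeds.
calls = []

def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise TransientError("simulated rate limit")
    return "ok"

# Pass a no-op sleep so the demo finishes instantly.
result, attempts = call_with_retry(flaky, sleep=lambda seconds: None)
print(result, attempts)  # ok 3
```

Recording `attempts` alongside each request makes it easy to see how much of your traffic is retries rather than new work.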

Three habits that novices should establish first when looking at the Claude API backend

The first habit: Separate Usage and Billing

Usage solves "what is the traffic like", and Billing solves "the quota and cash flow status". If you only look at one of them, your judgment will easily be incomplete. Anthropic's official documentation has made this very clear.

The second habit: whenever you check a single request, look beyond input tokens to:

output tokens

cache creation input tokens

cache read input tokens

Especially when you have a cache, looking at the input tokens alone will almost certainly lead to a misjudgment.

The third habit: Split the cost judgment into three layers

Break the cost question into three layers:

Model layer: Which Claude model did you choose

Output layer: Is the output too long

Structural layer: Are there cache, tools, thinking or other additional costs

People who can really understand Claude API costs usually don't just look at the token, but can look at these three layers together.

For Claude API usage, you must first look at the Usage page to understand the model, time, and API key usage, and then look at the Billing page to confirm credits and auto-reload. The most important fields for a single request are input tokens, output tokens, cache creation input tokens, and cache read input tokens.

When you read these fields together with model unit price, tool usage, and rate limits, the numbers in the Claude Console that once seemed messy become a dashboard you can use to control cost, risk, and performance.

What is the difference between the Usage page and the Billing page of Claude API?

The Usage page mainly looks at the breakdown of API usage, such as token and rate limit data by model, date, API key; the Billing page looks at prepaid credits, balance and auto-reload settings. The former focuses on traffic analysis, while the latter focuses on payment and quota management.

Why do the input tokens of Claude API seem to be less than what I sent in?

Because the content you actually send is not reported only in the input tokens field. This is especially easy to misjudge when you use caching, so you need to look at cache creation and cache read together.

I already have a Claude chat subscription. Do I still need to pay API fees separately?

Yes. Anthropic officially states clearly that chat subscription and Console/API are separate products, and API and Workbench usage are billed through prepaid usage credits.

Will there be any charges for failed requests from Claude API?

No. Officially, only successful API calls and completed tasks will be billed.

Does Claude API caching really save money?

Yes, but only if you reuse the same prefix stably. What is really valuable is not just seeing cache creation, but the steady appearance of subsequent cache reads.

Claude API usage is high, does it necessarily mean that the model is too expensive?

Not necessarily. It may also be that the output is too long, the tools structure is too fat, the cache is not hitting at all, or it is just a rate limit problem, not purely a model price problem.

If you want to return to the main page of AI Token usage tutorials, you can read this article first: AI Token Tutorial for Lazy People: From Getting Started, Calculation to Saving Costs, Understand at Once

Data Source and Credibility Statement

This article is mainly based on Anthropic official documents and official support center information, including Cost and Usage Reporting in the Claude Console, How do I pay for my Claude API usage?, Token counting, and Anthropic's official pricing information. Since the interface, model prices, and fields of the Claude Console may still change, the actual screens and latest rates are subject to the official Anthropic console and official documents. The focus of this article is to help novices and enterprise users build a correct interpretation framework, not to replace the official pricing page.
