Why Dify token statistics differ from the cloud provider’s

Dify_Forum_Helper · February 13, 2026, 10:23am

Here is an easy-to-understand summary of this thread, so that newcomers can quickly grasp the situation.

Overview in One Sentence

Dify’s token / cost statistics are estimates based on the tokenizer and price list you configure;
cloud provider consoles report usage according to their actual billing rules.
Add differences in statistical scope, time window, and whether retries or caching are included,
and it is expected that the two sets of numbers will not match exactly. As long as they are of the same magnitude, there is nothing to worry about.

I. Why are Dify and Cloud Provider Numbers Inconsistent?

This can be understood in three layers.

1. Differences in “How Statistics Are Calculated” (Methodology Differences)

Dify: Counts tokens itself + calculates costs itself

In “Model Providers / Custom Model,” you select:
- Which tokenizer (OpenAiTokenizer / QwenTokenizer / AnthropicTokenizer…)
- The unit price per million tokens (input / output / cached input / cache write…)
Dify then computes:

Tokens counted by the local tokenizer × the unit price you entered

The result is an estimated usage and an estimated cost, not a bill.
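The calculation above can be sketched in a few lines of Python. This is only an illustration of “tokens × unit price per million,” not Dify’s actual implementation; the `Pricing` class, the `estimate_cost` function, and all prices below are hypothetical.

```python
from dataclasses import dataclass

PER_MILLION = 1_000_000


@dataclass
class Pricing:
    """Unit prices per million tokens, as you would enter them by hand."""
    input_per_m: float
    output_per_m: float
    cached_input_per_m: float = 0.0


def estimate_cost(input_tokens, output_tokens, pricing, cached_input_tokens=0):
    """Estimated cost = locally counted tokens x configured unit price."""
    uncached = input_tokens - cached_input_tokens
    total = (
        uncached * pricing.input_per_m
        + cached_input_tokens * pricing.cached_input_per_m
        + output_tokens * pricing.output_per_m
    )
    return total / PER_MILLION


# Hypothetical prices, for illustration only.
pricing = Pricing(input_per_m=2.0, output_per_m=8.0, cached_input_per_m=0.5)
cost = estimate_cost(input_tokens=1_200, output_tokens=300, pricing=pricing)
print(f"{cost:.6f}")  # 0.004800
```

If either the token count or any unit price differs from what the provider uses, the estimate drifts by the same proportion.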

Cloud Providers: Bill according to their internal actual logic

They use their own tokenizer / compression strategies / cache billing, etc.:
- For the same piece of text, the token count might be slightly different from Dify’s implementation;
- Some providers even bill by character count or request count, which is not a 1:1 relationship with tokens.

→ Even if you select the “correct” tokenizer in Dify and copy the prices directly from the official website, it’s essentially still an “as close as possible” estimate.

2. Differences in “What Is Counted” (Scope Differences)

Cloud providers bill for all of the following:
- system / user / tool messages and conversation history,
- model output,
- any retried requests.
Depending on the view and version, Dify might:
- show tokens only for main model calls;
- omit certain intermediate nodes (tool nodes, sub-workflows, RAG retrieval, embedding, rerank) from the report you are currently viewing;
- count failed requests, retries, or interrupted streams differently.

Common phenomena therefore include:

Cloud provider usage > Dify: the cloud provider includes retries, failures, and the full context;
Occasionally Dify > cloud provider: for example, Dify estimates tokens from the full prompt, but the upstream request hit a cache, so the amount actually billed is lower (possibly even 0).
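To make the scope difference concrete, here is a small Python sketch. The call records and field names (`kind`, `tokens`, `retry`) are made up for illustration and do not reflect how Dify or any provider stores usage; the point is only that the same traffic gives different totals depending on what is counted.

```python
# Hypothetical usage records for one user request.
calls = [
    {"kind": "llm", "tokens": 1500, "retry": False},
    {"kind": "llm", "tokens": 1500, "retry": True},  # retried after a timeout
    {"kind": "embedding", "tokens": 400, "retry": False},
    {"kind": "rerank", "tokens": 250, "retry": False},
]


def total_tokens(records, kinds=None, include_retries=True):
    """Sum tokens, optionally restricted to certain call kinds and to non-retries."""
    total = 0
    for record in records:
        if kinds is not None and record["kind"] not in kinds:
            continue
        if not include_retries and record["retry"]:
            continue
        total += record["tokens"]
    return total


# Provider-style scope: everything that reached the API is counted.
provider_view = total_tokens(calls)
# A report that only shows successful main model calls.
report_view = total_tokens(calls, kinds={"llm"}, include_retries=False)

print(provider_view, report_view)  # 3650 1500
```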

3. Differences in “When and What to Look At” (Time Window & Dimension Differences)

Different Time Ranges
- Dify allows selecting “Last 24 hours / Custom Date”;
- Cloud providers often aggregate “by UTC day / month.”
Different Scope Dimensions
- Dify: a specific app / workflow / tenant;
- Cloud provider: the entire account / project / API Key, and other services might be using the same key.

→ If the time period, model, and key scope are not perfectly aligned, it’s inherently difficult to match them one-to-one.
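As an example of the time window problem, the sketch below shows how a single local calendar day maps onto two UTC days. It assumes a fixed UTC+8 offset purely for illustration; `utc_days_covered` is a hypothetical helper, not part of Dify or any provider SDK.

```python
from datetime import datetime, timedelta, timezone


def utc_days_covered(start_local, end_local):
    """Return the UTC calendar dates touched by the half-open window [start, end)."""
    start_utc = start_local.astimezone(timezone.utc)
    end_utc = end_local.astimezone(timezone.utc)
    # Step back one microsecond so an end exactly at midnight does not add a day.
    last_day = (end_utc - timedelta(microseconds=1)).date()
    days = []
    day = start_utc.date()
    while day <= last_day:
        days.append(day.isoformat())
        day += timedelta(days=1)
    return days


# Assumed local timezone: fixed UTC+8 offset.
utc_plus_8 = timezone(timedelta(hours=8))
start = datetime(2026, 2, 12, 0, 0, tzinfo=utc_plus_8)
end = start + timedelta(days=1)

days = utc_days_covered(start, end)
print(days)  # ['2026-02-11', '2026-02-12']
```

So if the provider aggregates by UTC day, one local day in Dify has to be compared against parts of two provider days, which is why daily totals rarely line up.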

II. What Do the Two Settings in Custom Model Actually Mean?

The two core configurations in your screenshot:

Tokenizer Dropdown
Determines: Which set of tokenization rules Dify uses to “simulate” the provider’s billing methodology.
- OpenAI series → Select OpenAiTokenizer
- Claude → Select AnthropicTokenizer
- Tongyi Qianwen (Qwen) → Select QwenTokenizer
- For others compatible with the OpenAI protocol, you can approximate with OpenAiTokenizer first.
Pricing (input / output / cached / cache write cost)
Determines: How these tokens are converted into cost.
- Corresponds to the cloud provider’s documentation for: prompt unit price, completion unit price, cache-related unit prices.

If either of these two settings does not exactly match the cloud provider’s rules:

Dify’s token / cost statistics will deviate to some degree;
this is expected. It is not a calculation error, just the result of different rules and methodologies.

III. What Are Langfuse / LangSmith and Other Monitoring Integrations For?

Reference: Enhance LLM Application observability on Dify with LangSmith and Langfuse - Dify Blog

This topic also mentioned Dify’s integration with LLM observability platforms like Langfuse / LangSmith, which address the third dimension of the problem:

It’s not just about “how many tokens were used / how much money was spent,”
but also about figuring out “which call, which prompt segment, which workflow path led to these expenses.”

These platforms generally provide:

Trace: The entire call chain, input/output of each node, and time taken;
More granular token / cost: Can be viewed by single call, specific prompt version, or specific branch;
Quality evaluation: Replay, scoring, A/B testing, RAG evaluation, etc.

So, the three perspectives can be divided as follows:

Cloud Provider Console:
- The true bill & cost settlement.
Dify Backend (tokenizer + pricing):
- For “in-application operations”:
  - Which app / workflow consumes the most;
  - Setting quotas and rate limits for users / keys;
  - Estimating costs for product-level decisions.
Langfuse / LangSmith and other monitoring:
- For “engineering & operations & optimization”:
  - Precisely identifying which step is wasting tokens;
  - Debugging prompts, routing, RAG strategies.

IV. Practical Advice for Newcomers

If you want Dify’s statistics to be “as close as possible” to the cloud provider’s:

In Custom Model:
- For the tokenizer, try to select the one that matches the actual model;
- Copy the prices item by item from the cloud provider’s official documentation (distinguishing input / output / cached, etc.).
Perform a “small sample reconciliation”:
- Pick one simple application and call it a few dozen times within a single day;
- Compare Dify’s usage with the cloud provider’s usage for exactly that period;
- Check whether the deviation is within an acceptable range (e.g., 5–10%).
Agree on a unified methodology:
- “Finance relies on the cloud provider’s bill;
  Product/operations look at Dify backend usage;
  Debugging & optimization look at Langfuse / LangSmith and other monitoring.”

This way, everyone will understand “why the numbers are different” and know which layer of statistics to trust for their respective needs.
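The small sample reconciliation above comes down to one percentage. A minimal Python sketch, assuming you have read the two token totals off the Dify report and the provider console by hand (the numbers and the `deviation_pct` helper are hypothetical):

```python
def deviation_pct(dify_value, provider_value):
    """Signed deviation of Dify's figure from the provider's, in percent."""
    if provider_value == 0:
        raise ValueError("provider value must be non-zero to compute a percentage")
    return (dify_value - provider_value) / provider_value * 100


# Hypothetical totals for the same app, model, key, and time window.
dify_tokens = 96_500
provider_tokens = 101_200
tolerance_pct = 10.0  # upper end of the 5-10% range suggested above

dev = deviation_pct(dify_tokens, provider_tokens)
within_tolerance = abs(dev) <= tolerance_pct
print(f"deviation: {dev:.2f}%, acceptable: {within_tolerance}")
```

A negative value means Dify reports less than the provider, which is the more common direction when retries or intermediate nodes are missing from the Dify view.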

If you later have specific cloud provider + model configurations (e.g., a certain Qwen / Claude / GPT model), you can also paste how you’ve configured it in Dify now, and I can help you adjust the tokenizer & pricing to a more reasonable set based on the official price list.
