Okay, here is an “easy-to-understand” summary of this post, so that newcomers can quickly grasp the situation.
Overview in One Sentence
The Token / Cost statistics in the Dify backend are estimates based on the tokenizer and price list you configured;
cloud provider consoles calculate from their actual billing rules.
Add differences in “statistical scope,” “time window,” and “whether retries/caching are included,”
and the numbers on the two sides will rarely match exactly. As long as they are of similar magnitude, nothing is wrong.
I. Why are Dify and Cloud Provider Numbers Inconsistent?
This can be understood in three layers.
1. Differences in “How Statistics Are Calculated” (Methodology Differences)
- Dify: Counts tokens itself + calculates costs itself
- In “Model Providers / Custom Model,” you select:
- Which tokenizer (OpenAiTokenizer / QwenTokenizer / AnthropicTokenizer…)
- The unit price per million tokens (input / output / cached input / cache write…)
- Dify will then use:
Tokens counted by the local tokenizer × the unit price you entered
to derive an “estimated usage & estimated cost.”
- Cloud Providers: Bill according to their internal actual logic
- They use their own tokenizer / compression strategies / cache billing, etc.:
- For the same piece of text, the token count might be slightly different from Dify’s implementation;
- Some providers even bill by character count or request count, which is not a 1:1 relationship with tokens.
→ Even if you select the “correct” tokenizer in Dify and copy the prices directly from the official website, it’s essentially still an “as close as possible” estimate.
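As a rough illustration of the “tokens × unit price” estimate described above, here is a minimal Python sketch. The names `PricePerMillion` and `estimate_cost`, and all prices and token counts, are made up for illustration; this is not Dify's actual implementation, only the arithmetic such an estimate boils down to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PricePerMillion:
    """Hypothetical unit prices, in currency units per one million tokens."""
    input: float
    output: float
    cached_input: float = 0.0


def estimate_cost(prices, input_tokens, output_tokens, cached_input_tokens=0):
    """Estimated cost = token count x unit price, summed over token categories."""
    total = (
        input_tokens * prices.input
        + output_tokens * prices.output
        + cached_input_tokens * prices.cached_input
    )
    return total / 1_000_000  # prices are quoted per million tokens


# Assumed example prices, not taken from any real price list.
prices = PricePerMillion(input=2.0, output=8.0, cached_input=0.5)

# Token counts as a local tokenizer might report them.
local_estimate = estimate_cost(prices, input_tokens=12_000, output_tokens=3_000)

# The provider's own tokenizer may count the same text slightly differently.
provider_side = estimate_cost(prices, input_tokens=12_400, output_tokens=3_050)

print(f"local estimate: {local_estimate:.4f}")
print(f"provider side:  {provider_side:.4f}")
```

Even with identical prices, a few hundred tokens of counting difference is enough to move the cost figure, which is why the local number is best read as an approximation.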
2. Differences in “What Is Counted” (Scope Differences)
- Cloud providers bill for all of the following:
- system / user / tools / historical conversations,
- model output,
- potential retry requests.
- Different Dify views/versions might:
- Only show tokens for main model calls;
- Leave certain intermediate nodes (tool nodes, sub-workflows, RAG retrieval, embedding, rerank) out of the report you’re currently viewing;
- Have different counting strategies for failed requests, retries, or streaming interruptions.
Common phenomena therefore include:
- Cloud provider usage > Dify: The cloud provider includes retries, failures, and the full context;
- Occasionally Dify > Cloud provider: For example, if tokens are estimated from the full prompt but the upstream request hit a cache, the provider may bill those tokens at a discount, or possibly not at all.
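To make the scope difference concrete, here is a small Python sketch. The `Attempt` record and both counting functions are hypothetical; whether a failed or retried request is actually billed depends on the provider, and which calls a given Dify view includes depends on the version and report. The sketch only shows how two counting scopes diverge on the same traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """One request sent upstream (hypothetical record, for illustration only)."""
    tokens: int
    succeeded: bool


def provider_style_total(attempts):
    """Wide scope: every attempt that reached the provider, failures included."""
    return sum(a.tokens for a in attempts)


def app_view_total(attempts):
    """Narrow scope: only attempts that succeeded (one possible reporting choice)."""
    return sum(a.tokens for a in attempts if a.succeeded)


attempts = [
    Attempt(tokens=1200, succeeded=False),  # first try failed mid-stream
    Attempt(tokens=1200, succeeded=True),   # retry of the same request
    Attempt(tokens=800, succeeded=True),    # unrelated call
]

print(provider_style_total(attempts))  # 3200
print(app_view_total(attempts))        # 2000
```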
3. Differences in “When and What to Look At” (Time Window & Dimension Differences)
- Different Time Ranges
- Dify allows selecting “Last 24 hours / Custom Date”;
- Cloud providers often aggregate “by UTC day / month.”
- Different Scope Dimensions
- Dify: a specific app / workflow / tenant;
- Cloud provider: the entire account / project / API Key, and other services might be using the same key.
→ If the time period, model, and key scope are not perfectly aligned, it’s inherently difficult to match them one-to-one.
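The time-window mismatch is easy to check before comparing numbers. The sketch below assumes a “last 24 hours” window viewed from a UTC+8 timezone and lists which UTC calendar days it touches; the function name `utc_days_covered` and the sample timestamps are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def utc_days_covered(window_start, window_end):
    """Return the UTC calendar dates touched by a timezone-aware window.

    The window is treated as [window_start, window_end), so an end that falls
    exactly on UTC midnight does not pull in the following day.
    """
    start_utc = window_start.astimezone(timezone.utc)
    end_utc = window_end.astimezone(timezone.utc) - timedelta(microseconds=1)
    days = []
    day = start_utc.date()
    while day <= end_utc.date():
        days.append(day.isoformat())
        day += timedelta(days=1)
    return days


local_tz = timezone(timedelta(hours=8))  # assumed viewer timezone: UTC+8
window_end = datetime(2024, 5, 2, 10, 0, tzinfo=local_tz)
window_start = window_end - timedelta(hours=24)

days = utc_days_covered(window_start, window_end)
print(days)  # ['2024-05-01', '2024-05-02']
```

A single 24-hour window here spans two UTC days, so comparing it against one daily total on the provider side would not be a like-for-like comparison.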
II. What Do the Two Settings in Custom Model Actually Control?
The two core configurations in your screenshot:
- Tokenizer Dropdown
Determines: Which set of tokenization rules Dify uses to “simulate” the provider’s billing methodology.
- OpenAI series → Select OpenAiTokenizer
- Claude → Select AnthropicTokenizer
- Tongyi Qianwen (Qwen) → Select QwenTokenizer
- For others compatible with the OpenAI protocol, you can approximate with OpenAiTokenizer first.
- Pricing (input / output / cached / cache write cost)
Determines: How these tokens are converted into money.
- Corresponds to the cloud provider’s documentation for: prompt unit price, completion unit price, cache-related unit prices.
If either of these two parts doesn’t exactly match the cloud provider’s rules:
- Dify’s token / cost statistics will show varying degrees of deviation;
- This is normal: not a calculation error, but the result of “different rules and methodologies.”
III. What Are Langfuse / LangSmith and Other Monitoring Integrations For?
Reference: Enhance LLM Application observability on Dify with LangSmith and Langfuse - Dify Blog
This topic also mentioned Dify’s integration with LLM observability platforms like Langfuse / LangSmith, which address the third dimension of the problem:
It’s not just about “how many tokens were used / how much money was spent,”
but also about figuring out “which call, which prompt segment, which workflow path led to these expenses.”
These platforms generally provide:
- Trace: The entire call chain, input/output of each node, and time taken;
- More granular token / cost: Can be viewed by single call, specific prompt version, or specific branch;
- Quality evaluation: Playback, scoring, A/B testing, RAG evaluation, etc.
So, the three perspectives can be divided as follows:
- Cloud Provider Console:
- The true bill & cost settlement.
- Dify Backend (tokenizer + pricing):
- For “in-application operations”:
- Which app / workflow consumes the most;
- Setting quotas and rate limits for users / keys;
- Estimating costs for product-level decisions.
- Langfuse / LangSmith and other monitoring:
- For “engineering & operations & optimization”:
- Precisely identifying which step is wasting tokens;
- Debugging prompts, routing, RAG strategies.
IV. Practical Advice for Newcomers
If you want Dify’s statistics to be “as close as possible” to the cloud provider’s:
- In Custom Model:
- For the tokenizer, select the one that matches the actual model as closely as possible;
- Copy the prices item by item from the cloud provider’s official documentation (distinguishing input / output / cached, etc.).
- Perform a “small sample reconciliation”:
- Pick one simple application and call it a few dozen times within a day;
- Compare Dify’s usage with the cloud provider’s during this period;
- See if the deviation is within an acceptable range (e.g., 5–10%).
- Agree on a unified methodology:
- “Finance relies on the cloud provider’s bill;
Product/operations look at Dify backend usage;
Debugging & optimization look at Langfuse / LangSmith and other monitoring.”
This way, everyone will understand “why the numbers are different” and know which layer of statistics to trust for their respective needs.
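The “small sample reconciliation” step comes down to a relative-deviation check. Here is a minimal Python sketch, with hypothetical function names and made-up token totals; the 5–10% tolerance is the range suggested above, not an official threshold.

```python
def relative_deviation(dify_value, provider_value):
    """Signed deviation of the Dify figure relative to the provider figure."""
    if provider_value == 0:
        raise ValueError("provider value is zero; relative deviation is undefined")
    return (dify_value - provider_value) / provider_value


def within_tolerance(dify_value, provider_value, tolerance=0.10):
    """True if the two figures differ by no more than `tolerance` (0.10 = 10%)."""
    return abs(relative_deviation(dify_value, provider_value)) <= tolerance


# Made-up totals for the same app, model, key, and time window.
dify_tokens = 184_000
provider_tokens = 196_500

deviation = relative_deviation(dify_tokens, provider_tokens)
print(f"deviation: {deviation:+.1%}")
print("within 10%:", within_tolerance(dify_tokens, provider_tokens, 0.10))
print("within 5%: ", within_tolerance(dify_tokens, provider_tokens, 0.05))
```

A negative deviation means Dify reports less than the provider, which matches the more common case where the provider also counts retries, failures, and the full context.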
If you later have a specific cloud provider + model combination (e.g., a certain Qwen / Claude / GPT model), you can paste your current Dify configuration, and I can help you adjust the tokenizer & pricing to better match the official price list.