Different similarity scores for identical text chunks in Knowledge Base

Sohei · February 5, 2026, 6:34am

I have a question about Dify’s Knowledge Base vector search behavior.

Issue

- In the search test, I use the query “mac”
- The search results include multiple chunks with exactly the same text
- However, the similarity scores (SCORE) for these chunks are different

Example:

Identical text:

“Macの場合はマウスの支給をしません。各自で調達してください。”
(“For Mac, a mouse is not provided. Please prepare one yourself.”)
But the scores differ, for example:
- SCORE: 0.26
- SCORE: 0.19

(See attached screenshot)

Questions

I would like to understand why this happens.

Is it expected that identical text chunks can have different similarity scores due to:
- Being stored in different documents
- Different chunk IDs or ingestion order
- Differences in metadata (document title, folder, description, etc.)
- Different surrounding context when the text was chunked
Or could this be related to:
- Embedding timing / re-embedding behavior
- Vector database implementation details

Assumptions / Environment

- The chunk text itself is exactly the same string
- Search type is vector search (not keyword search)
- No similarity threshold is explicitly configured

Purpose

I want to clarify whether:

“Identical text should generally result in identical similarity scores”, or
“Some level of score variance for identical text is expected behavior in Dify”

If anyone has experienced similar behavior or knows the underlying design/specification, I would really appreciate your insights.

kurokobo · February 5, 2026, 1:43pm

Hi, if the text inside the chunks is the same, it’s reasonable to expect that the two will have the same vector.
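To illustrate that expectation, here is a minimal sketch in plain Python. It assumes the embedding model is deterministic (same input text gives the same vector) and that the score is a cosine similarity; the vectors are made-up placeholders, not real Titan embeddings, and this is not Dify's actual scoring code.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Placeholder vectors (assumption: illustrative values only).
query_vec = [0.2, 0.1, 0.7]
chunk_a = [0.5, 0.3, 0.4]
chunk_b = list(chunk_a)  # identical text -> identical vector, if embedding is deterministic

score_a = cosine_similarity(query_vec, chunk_a)
score_b = cosine_similarity(query_vec, chunk_b)

# Same query, same stored vector: the scores must match exactly.
print(score_a == score_b)
```

If two chunks with the same text score differently, then under these assumptions the stored vectors (or the text that was actually embedded) must differ somewhere.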

I’d like to know a bit about your environment. Are you using Dify Cloud, or are you self-hosting Dify? Which embedding model are you using? If you’re self-hosting, which Dify version and which vector database are you using?

It also seems like the other post is behaving in a counter-intuitive way.

Since I can’t reproduce the issue in my environment, there might be some kind of inconsistency in the data within your vector database.
Could you try creating a new knowledge base from scratch, upload the same document, and see if the issue still occurs?

Sohei · February 10, 2026, 1:36am

Hi, thanks for your detailed response.

I agree that if the text inside the chunks is exactly the same, the resulting vectors should also be the same.

Here is a bit more context about my environment:

- I am self-hosting Dify.
- Dify version: 1.11.4
- Embedding model: amazon.titan-embed-text-v2:0
- Vector database: Weaviate (default configuration)

As you suggested, I will try the following to verify the behavior:

- Create a new knowledge base from scratch
- Upload the same document
- Check whether the issue still occurs

If the issue does not occur in the new knowledge base, it may indicate some inconsistency in the original environment.
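For the comparison itself, a small helper along these lines could make the check less manual. This is only a sketch: the `results` list is a hypothetical structure filled in by hand from the search test screen (it is not a Dify API response format), and `find_score_mismatches` is a name I made up for illustration. The third entry is an invented filler row.

```python
from collections import defaultdict

def find_score_mismatches(results, tolerance=1e-6):
    """Group results by exact chunk text and return the texts whose
    scores differ by more than `tolerance`."""
    scores_by_text = defaultdict(list)
    for item in results:
        # Exact string comparison on purpose: chunks that only look the same
        # (hidden whitespace, different Unicode forms) end up in separate groups.
        scores_by_text[item["text"]].append(item["score"])
    return {
        text: scores
        for text, scores in scores_by_text.items()
        if max(scores) - min(scores) > tolerance
    }

# Hypothetical, hand-copied search test results.
results = [
    {"text": "Macの場合はマウスの支給をしません。各自で調達してください。", "score": 0.26},
    {"text": "Macの場合はマウスの支給をしません。各自で調達してください。", "score": 0.19},
    {"text": "Some other chunk", "score": 0.15},  # filler row for illustration
]

mismatches = find_score_mismatches(results)
for text, scores in mismatches.items():
    print(text, scores)
```

If the chunks that look identical land in separate groups here, the strings are not actually byte-identical, which would by itself explain different vectors.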

Thanks again for the suggestion — I’ll report back once I’ve tested this.
