Different similarity scores for identical text chunks in Knowledge Base

Hi, if the text in two chunks is identical, it's reasonable to expect them to have the same vector, and therefore the same similarity score for a given query.
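To illustrate that expectation, here is a minimal sketch in plain Python. It assumes the score is a cosine similarity between the query vector and each chunk vector; your actual retrieval settings (hybrid search, reranking, and so on) may score differently. All vectors and names below are made up for illustration and are not taken from Dify.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings; real ones have hundreds or thousands of dimensions.
query = [0.2, 0.7, 0.1]
chunk_a = [0.3, 0.6, 0.2]

# Same text + same deterministic embedding model -> same stored vector.
chunk_b = list(chunk_a)

# Assumed failure case: the stored vector for the "same" text is stale or
# was produced by a different model, so it no longer matches chunk_a.
chunk_b_stale = [0.3, 0.5, 0.2]

score_a = cosine_similarity(query, chunk_a)
score_b = cosine_similarity(query, chunk_b)
score_stale = cosine_similarity(query, chunk_b_stale)

print(score_a == score_b)      # identical vectors give identical scores
print(score_a == score_stale)  # a mismatched stored vector gives a different score
```

So if two chunks with identical text come back with different scores, the vectors stored for them are probably not identical, which is why I'm asking about your setup below.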

I’d like to know a bit about your environment. Are you using Dify Cloud, or are you self-hosting Dify? Which embedding model are you using? If you’re self-hosting, which Dify version and which vector database are you using?

The behavior described in the other post also seems counter-intuitive.

Since I can’t reproduce the issue in my environment, there might be some kind of inconsistency in the data within your vector database.
Could you try creating a new knowledge base from scratch, uploading the same document, and checking whether the issue still occurs?
