I’m encountering an issue where valid (active) chunks are pushed out of the topK retrieval results in Dify.
Specifically, chunks that were deleted or disabled via the UI can still have high similarity scores and occupy slots in the topK results. As a result, active chunks are pushed out of the topK and are not retrieved at all.
What’s confusing is that these chunks were already removed from the UI, so from a user perspective they no longer exist. This makes me wonder whether UI deletion is only a logical deletion, and whether the vectors of those chunks still remain in the vector database and participate in ranking.
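To make the concern concrete, here is a toy sketch of the behavior I suspect. This is not Dify's actual code: the `Chunk` class, the `enabled` flag, the scores, and both function names are made up for illustration. It only contrasts filtering inactive chunks after the topK cut with filtering them before it.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: str
    score: float   # similarity to the query (made-up values)
    enabled: bool  # False = deleted or disabled in the UI (hypothetical flag)


def top_k_then_filter(chunks, k):
    """Rank every stored vector, cut to k, then drop inactive chunks."""
    ranked = sorted(chunks, key=lambda c: c.score, reverse=True)[:k]
    return [c.chunk_id for c in ranked if c.enabled]


def filter_then_top_k(chunks, k):
    """Drop inactive chunks first, so only active ones compete for the k slots."""
    active = [c for c in chunks if c.enabled]
    ranked = sorted(active, key=lambda c: c.score, reverse=True)[:k]
    return [c.chunk_id for c in ranked]


chunks = [
    Chunk("deleted-1", 0.95, False),
    Chunk("deleted-2", 0.93, False),
    Chunk("active-1", 0.90, True),
    Chunk("active-2", 0.88, True),
]

print(top_k_then_filter(chunks, k=2))  # [] (both slots taken by inactive chunks)
print(filter_then_top_k(chunks, k=2))  # ['active-1', 'active-2']
```

What I observe looks like the first function: the inactive chunks win the topK slots and the active ones never surface. I don't know which order Dify actually applies, which is part of what I'm asking.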
This leads to my main questions:
- Is it expected that UI-deleted or inactive chunks can still affect topK ranking?
- Does preventing this require physical deletion of vectors at the system level?
- What is the recommended approach in Dify to ensure that only active chunks compete for topK results in production RAG systems?
Any clarification on how topK ranking, deletion, and vector lifecycle are supposed to work together would be appreciated.