Knowledge metadata search bug

[BUG?] Metadata Filter in Knowledge Search Node Not Working Since v1.13.0

In a workflow that was working correctly until about a week ago (around the v1.13.0 release), the metadata filter suddenly stopped working.

Environment

  • Dify Cloud

Current Status

  • Metadata product_name = "ProductA" is set in the knowledge base.
  • product_name = [Variable: user_input] is set in the metadata filter of the Knowledge Search Node.
  • Even when executed with user_input = "ProductA", { "result": [] } is returned.
  • The value in question does not seem to be passed to the knowledge base correctly, and it is possible that the search is not using the metadata at all.
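To make the suspected failure mode concrete, here is a minimal sketch of what a metadata filter with a variable is expected to do. This is not Dify's code: the in-memory `CHUNKS` list, the `{{user_input}}` placeholder syntax, and the `resolve` and `search` helpers are all hypothetical and only illustrate one way the empty result could arise.

```python
import re

# Hypothetical in-memory knowledge base: each chunk carries metadata.
CHUNKS = [
    {"text": "ProductA setup guide", "metadata": {"product_name": "ProductA"}},
    {"text": "ProductB setup guide", "metadata": {"product_name": "ProductB"}},
]

# Hypothetical placeholder syntax, used only for this illustration.
VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def resolve(value, inputs):
    """Replace {{name}} placeholders with values from the workflow inputs."""
    return VARIABLE.sub(lambda m: str(inputs.get(m.group(1), m.group(0))), value)


def search(chunks, field, value):
    """Keep only chunks whose metadata field equals the filter value."""
    return [c for c in chunks if c["metadata"].get(field) == value]


inputs = {"user_input": "ProductA"}
condition_value = "{{user_input}}"

# Expected behaviour: resolve the variable first, then filter.
resolved_hits = search(CHUNKS, "product_name", resolve(condition_value, inputs))

# Possible failure mode: the placeholder is compared literally, so nothing matches.
unresolved_hits = search(CHUNKS, "product_name", condition_value)

print(len(resolved_hits), len(unresolved_hits))
```

If the variable were never substituted before the comparison, the filter would behave like the second call and return an empty list, which matches the { "result": [] } symptom. Whether this is the actual cause in v1.13.0 is an assumption.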

Steps Taken

  • Confirmed field name match.
  • Even after changing the Variable to a Constant (fixed value "ProductA"), [] is still returned.
  • Rolling the workflow back to last month’s version does not help either.

v1.11.3 included the fix "fix DatasetRetrieval._process_metadata_filter_func miss in operator" (#30199), but a regression may have been introduced in a later update.

Is anyone else experiencing a similar issue?

I am experiencing the same issue. Filtering metadata by variables does not work for me. Constant values work as expected.

When the part shown in the attached photo is set to a variable, the search returns nothing; when it is a constant, the search works without issues. Until recently, I was able to search using variables, so if there is a way to filter on metadata using variables, please let me know.

I am also struggling with the exact same symptoms. Was it working before?

This appears to be a bug and has already been reported on GitHub.
I plan to fix it myself, but not before tomorrow at the earliest, and the people on the inside (the project/company) may well be faster.

It used to work. It feels like it suddenly stopped working with variables since March…

Thank you. I will wait for it to improve.

Thank you so much. I hope it can be fixed soon :flexed_biceps:

It seems like it was quickly addressed before I even started.
It will be fixed in the next release, so please wait for the release :relieved_face:

Thank you very much!
That was helpful!

Is there any alternative solution? The problem is as follows:

I have run into an issue in Dify. A file I uploaded originally, kb2-test (text), is used at the KB2 node with the input b9101, which is the file name identified by the previous node. The KB2 node was able to read that file and answer accordingly, which shows that the file was retrieved. However, files uploaded later cannot be retrieved at this node, even after they have been bound to it.

The system clearly retrieves the original kb2-test file from the file name passed to KB2 (even after I deleted b9101 from within the file, it still retrieves the content). I do not know why the KB2 files uploaded later cannot be retrieved.

I am certain that kb2-test is retrieved whether or not a metadata filter is added (the metadata was added after upload, since there is no option to set metadata filter parameters during upload); that is, the KB2 node retrieves the file correctly from the input b9101. Files uploaded later, however, cannot be retrieved, whether or not metadata is set after upload.

I even changed the file name (to b1234), and the system can still retrieve it. It is therefore clear that the system retrieves kb2-test based on its content, not on the file name and not on a file-name field inside the document.

My questions are:

  1. Can Dify use a metadata filter (which I believe is a basic function)? In particular, can metadata be set during upload and then used for filtering in the workflow? I did not see any option to add metadata when uploading knowledge files; there is only “sync from Notion”, “access to API” on the embedding page, and “create knowledge from pipeline”. What are these functions?

  2. If metadata cannot be used, why can the system not directly invoke files by file name? If Dify can only perform retrieval based on the semantic content within documents, is that not absurd?

  3. Is there a better method to achieve either:

    • (1) invoking files via metadata filter, or

    • (2) invoking files by file name?

If the system can only perform retrieval based on the semantic meaning of all document contents, it is like a reader going into a library to find a book, and the librarian has to pull out all books and search through them. Is that not absurd?
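To illustrate the difference being asked about, here is a minimal sketch of "narrow by metadata first, then rank by content" versus content-only retrieval. It does not describe how Dify works internally: the `Doc` class, the `file_code` metadata field, the b7700 value, and the word-overlap score are all made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    name: str
    text: str
    metadata: dict = field(default_factory=dict)


def overlap_score(query, text):
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(docs, query, metadata_filter=None, top_k=3):
    """Narrow by metadata first (if given), then rank the rest by content."""
    if metadata_filter:
        docs = [
            d for d in docs
            if all(d.metadata.get(k) == v for k, v in metadata_filter.items())
        ]
    ranked = sorted(docs, key=lambda d: overlap_score(query, d.text), reverse=True)
    return ranked[:top_k]


# Hypothetical documents; "file_code" is an invented metadata field.
docs = [
    Doc("kb2-test", "b9101 pump maintenance schedule", {"file_code": "b9101"}),
    Doc("kb2-new", "b7700 pump maintenance schedule", {"file_code": "b7700"}),
]

# Content-only retrieval considers every document.
content_only = retrieve(docs, "pump maintenance schedule")

# With a metadata filter, only the matching document is ranked at all.
filtered = retrieve(docs, "pump maintenance schedule", {"file_code": "b7700"})

print([d.name for d in content_only], [d.name for d in filtered])
```

In the library analogy, the metadata filter is the catalogue lookup that picks the shelf, and the content ranking only runs on what is left.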
