Problem Background
I am building a custom code-parsing tool on top of a Dify workflow, targeting the Zig language. To get more precise RAG results, I have disabled the system’s automatic chunking and implemented “parent-child chunking” manually in a code node (Node.js/Python):
- Parent Chunk: A complete function implementation or type definition.
- Child Chunk: Fine-grained semantic units derived from the parent chunk (e.g., comments, function signatures).
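To make the parent/child split concrete, here is a minimal Python sketch of how one Zig parent chunk might be broken into child chunks. The function name `split_parent` and the regex are illustrative assumptions, not my actual implementation; the regex is naive and will not handle nested parentheses in parameter lists.

```python
import re


def split_parent(parent: str) -> dict:
    """Derive child chunks (comments, signature) from one Zig parent chunk.

    Hypothetical helper: the splitting rules here are assumptions for illustration.
    """
    lines = parent.splitlines()
    # Zig comments start with '//' (doc comments with '///'); collect both.
    comments = [ln.strip() for ln in lines if ln.strip().startswith("//")]
    # Treat everything from an optional 'pub' + 'fn' up to the first '{' as the signature.
    match = re.search(r"((?:pub\s+)?fn\s+\w+\s*\([^)]*\)[^{]*)\{", parent)

    children = []
    if comments:
        children.append("\n".join(comments))
    if match:
        children.append(match.group(1).strip())
    return {"parent_content": parent, "child_contents": children}
```

Each returned dict has the same `parent_content` / `child_contents` keys as the entries in the attachment at the end of this post.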
Core Pain Point
The biggest obstacle is that the JSON object output by the code node is either not recognized by the knowledge base node or triggers the error “Output parent_child_structure is missing”. I have tried mimicking the output format of tool nodes, but because no official documentation defines the schema for the (multimodal) parent_child_structure type, variable mapping fails frequently.
Actions Taken
- Data Structure Restructuring: I’ve tried returning a plain array, as well as an Object containing parent_mode and parent_child_chunks.
- Output Variable Definition: In the code node’s “Output Variables,” I manually declared result as type Object, but the variable selector in the downstream knowledge base node still fails to correctly parse its internal sub-properties.
- Environment Check: Confirmed that the Embedding model is functioning normally, and child_contents are all non-empty string arrays.
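For reference, the code node side of these attempts can be sketched roughly as below. This assumes, as I understand it, that a Dify Python code node calls a `main` function and reads the returned dict by the declared output variable names. The input variable `chunks` is hypothetical, and the inner shape is only what I have been trying, not a confirmed schema.

```python
def main(chunks: list) -> dict:
    """Wrap pre-built parent-child chunks under the declared output variable `result`.

    `chunks` is a hypothetical input: a list of dicts with
    `parent_content` (str) and `child_contents` (list of str).
    """
    cleaned = []
    for chunk in chunks:
        # Drop empty or non-string children so every child is a non-empty string.
        children = [
            c for c in chunk.get("child_contents", [])
            if isinstance(c, str) and c.strip()
        ]
        # Skip entries that have no parent text or no usable children.
        if chunk.get("parent_content") and children:
            cleaned.append(
                {"parent_content": chunk["parent_content"], "child_contents": children}
            )
    # The key "result" must match the name declared under "Output Variables".
    return {
        "result": {
            "parent_mode": "paragraph",
            "parent_child_chunks": cleaned,
        }
    }
```

Even with the output declared as Object and returned this way, the downstream variable selector does not expose the inner fields.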
Questions for Guidance
- Official Schema Definition: What is the complete JSON Schema for the strongly typed variable parent_child_structure? Besides parent_mode and parent_child_chunks, are there hidden metadata fields or specific $schema identifier requirements?
- Variable Recognition Logic: Why is the Object output by the code node often filtered out (not displayed) in the knowledge base node’s variable selector? Is there a specific variable naming convention or “Output Variable” declaration method that must be followed?
- Best Practices for Manual Chunking: If I want to bypass Dify’s default cleaning logic and directly store preprocessed parent-child chunks into the knowledge base, aside from the “code node → knowledge base node” path, is there a more mature API or plugin approach?
Attachment: Current Output Format Reference
{
  "parent_child_structure": {
    "parent_mode": "paragraph",
    "parent_child_chunks": [
      {
        "parent_content": "pub fn main() void { ... }",
        "child_contents": ["pub fn main()", "void { ... }"]
      }
    ]
  }
}
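To rule out mistakes on my side, a local sanity check along these lines can be run against the payload above before it leaves the code node. The function name `validate_payload` is hypothetical, and the rules only encode the shape shown in this post. Passing the check says nothing about whether Dify accepts the structure, since the official schema is exactly what I am missing.

```python
def validate_payload(payload: dict) -> list:
    """Return a list of problems found in the payload; an empty list means none found.

    The expected shape is assumed from the attachment above, not from an official schema.
    """
    body = payload.get("parent_child_structure")
    if not isinstance(body, dict):
        return ["top-level key 'parent_child_structure' is missing or not an object"]

    problems = []
    if not isinstance(body.get("parent_mode"), str):
        problems.append("'parent_mode' should be a string")

    chunks = body.get("parent_child_chunks")
    if not isinstance(chunks, list) or not chunks:
        problems.append("'parent_child_chunks' should be a non-empty list")
        return problems

    for i, chunk in enumerate(chunks):
        if not isinstance(chunk, dict):
            problems.append(f"chunk {i} is not an object")
            continue
        parent = chunk.get("parent_content")
        if not isinstance(parent, str) or not parent.strip():
            problems.append(f"chunk {i}: 'parent_content' is empty or not a string")
        children = chunk.get("child_contents")
        if not isinstance(children, list) or not children:
            problems.append(f"chunk {i}: 'child_contents' should be a non-empty list")
        elif not all(isinstance(c, str) and c.strip() for c in children):
            problems.append(f"chunk {i}: every child should be a non-empty string")
    return problems
```

My current output passes this check with no problems reported, so the failure appears to be in how the knowledge base node recognizes the variable type rather than in the data itself.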
I would greatly appreciate guidance from official documentation or experienced users—thank you very much!