Why not use an LLM to auto-match Pipeline templates and auto-configure Knowledge Base parameters?

Hi Dify team,

First of all, great work on the new RAG Pipeline feature — the graph-based workflow engine is a huge improvement over the old hardcoded ChatAppRunner flow. I’ve been reading through the source code and have two questions/suggestions regarding the Pipeline UX:

  1. Why not use an LLM to auto-match Pipeline templates?
    Currently, when creating a Knowledge Base, users must manually browse and select a Pipeline template (PipelineBuiltInTemplate / PipelineCustomizedTemplate). I’m curious about the design rationale for requiring manual selection rather than having an LLM automatically analyze the uploaded document(s) and recommend or select the most appropriate template.

I can think of a few possible reasons:

- Irreversibility risk: Template selection directly affects how documents are chunked, embedded, and indexed. If the LLM picks the wrong template, the user would need to delete and re-process the entire dataset — a costly mistake.
- Hallucination concerns: LLMs may confidently choose an incorrect template, especially for ambiguous file types or domain-specific documents.
- Business context gap: Choosing the right template isn’t just about “what type of document is this” — it also involves decisions about chunk structure, embedding strategy, and downstream retrieval needs, which are business-level decisions that the LLM cannot infer from file content alone.
- Additional cost & latency: Every document upload would require an extra LLM inference call, adding token costs and processing delay before indexing even begins.
Could the team confirm whether these are the actual considerations? Are there plans to offer an optional “smart template suggestion” feature in the future (perhaps as a recommendation rather than an auto-selection)?

  2. Why not use an LLM for one-click parameter auto-configuration?
    Beyond template selection, each Pipeline workflow contains multiple node-level parameters (chunk size, overlap, separator, embedding model, retrieval strategy, etc.). For users who are not RAG experts, configuring these parameters is a significant barrier.

Has the team considered an “LLM-assisted auto-configuration” mode where:

- The user uploads a sample document
- An LLM analyzes the document structure (e.g., long-form text vs. structured table vs. code)
- The system automatically suggests or fills in optimal parameters for each node
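A minimal sketch of what such a suggestion step could look like — the prompt, defaults, `suggest_parameters` helper, and clamping ranges are all assumptions on my part, not anything in the Dify codebase:

```python
import json

# Invented sketch: ask the LLM for chunking parameters, then validate and
# clamp its answer so a bad suggestion cannot break indexing.
PROMPT = (
    "You are configuring a RAG ingestion pipeline. Given the document "
    "sample below, reply with JSON containing chunk_size (int), "
    "chunk_overlap (int), and separator (str).\n\n{sample}"
)

DEFAULTS = {"chunk_size": 500, "chunk_overlap": 50, "separator": "\n\n"}

def suggest_parameters(sample: str, llm_call) -> dict:
    """Return LLM-suggested parameters, falling back to safe defaults."""
    try:
        suggested = json.loads(llm_call(PROMPT.format(sample=sample[:2000])))
        merged = {**DEFAULTS, **{k: suggested[k] for k in DEFAULTS if k in suggested}}
        # Clamp so an out-of-range suggestion stays safe and predictable.
        merged["chunk_size"] = max(100, min(int(merged["chunk_size"]), 2000))
        merged["chunk_overlap"] = max(0, min(int(merged["chunk_overlap"]),
                                             merged["chunk_size"] // 2))
        return merged
    except (ValueError, TypeError, KeyError):
        return DEFAULTS  # unparseable LLM output: keep the defaults
```

Validating and clamping the output this way would also soften the determinism concern below, since the suggestion can never leave a known-safe range.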
I understand the concerns might include:

- Determinism: Parameter configuration requires precise, deterministic values — LLMs may produce inconsistent suggestions across runs
- Accountability: If auto-configured parameters lead to poor retrieval quality, it’s harder for users to debug
- Cost-effectiveness: A rule-based heuristic (e.g., “if document is PDF with tables → use smaller chunk size”) might achieve 90% of the benefit at zero LLM cost
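For comparison, the rule-based route could be as small as the following sketch — the rules, file extensions, and thresholds here are illustrative guesses, not a tested policy:

```python
# Invented sketch of the heuristic alternative: no LLM call, just cheap
# signals from the filename and a text sample.
def heuristic_parameters(filename: str, text: str) -> dict:
    params = {"chunk_size": 500, "chunk_overlap": 50, "separator": "\n\n"}
    if filename.endswith((".csv", ".xlsx")) or text.count("|") > len(text) / 40:
        # Table-heavy input: smaller chunks, split per line to keep rows intact.
        params.update(chunk_size=300, chunk_overlap=0, separator="\n")
    elif filename.endswith((".py", ".js", ".java", ".ts")):
        # Source code: larger chunks with more overlap to preserve context.
        params.update(chunk_size=800, chunk_overlap=100)
    return params
```

Rules like these run in microseconds, cost nothing per upload, and are trivially debuggable when retrieval quality is poor — which speaks directly to the accountability concern above.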

Thanks for your time!