Loop Node in Practice: Processing Ultra-Long Text with Rolling Summaries (Including Pitfalls and Solution Comparison)
Dify 1.13 Local Deployment | DSL 0.6.0 | 2026-03
Scenario
I have an AI assistant (based on OpenClaw) that generates a large volume of conversations with me daily (multiple sessions, up to 235k characters in a single day). To let it retain memory of our conversations, I set up a scheduled task to refine and archive them every night. The previous archiving approach used a cron job that dispatched a sub-agent to read the whole day's conversations at once and summarize them. However, I frequently ran into two problems:
- LLM Context Overflow: The daily conversation volume often reaches 50,000 to 100,000 characters, exceeding the effective processing capacity of most models.
- Unstable Execution: A single LLM call processing too much content is prone to timeouts or missing information. The action chain was also too long: each run had to filter the day's conversations for new content, filter out invalid command data, index and refine, write to a memory file, send archive success/failure notifications, and additionally extract to-dos from the conversations and write them to a separate to-do record.
Almost every week there were one or two days when conversations were especially numerous and important, yet nothing got remembered. So I wanted to build a workflow in Dify that could digest large texts in chunks. The same workflow should also apply to high-quality digestion and refinement of other ultra-long texts, such as meeting minutes, project transaction logs, and knowledge-domain content (and similar business scenarios where information is generated sequentially over time), producing human-friendly summary minutes. (It is not suitable for texts like distributed code repositories, which require frequent definition lookup, context indexing, and jump-to-reference calls.)
Before the Solution: Why Dify?
The AI assistant's first suggestion was actually to write scripts directly, which would have solved the problem quickly and crudely. But I didn't want a pile of scripts scattered under the AI assistant: they are troublesome to manage and too inflexible, and any upgrade or environment change could break them, possibly silently and unnoticed. The previous long-action-chain agent was unstable and consumed a lot of tokens; it barely ran stably even with a GPT 5.2-level base model, and once conversations grew long, every cost-effective model would fail.
Ultimately, I chose Dify as the tool for implementation: 1. Good readability and maintainability. Dify gives me an interface where I can clearly see how each job works, and errors can be quickly localized and resolved. 2. Stable and token-efficient. After replacing most fixed actions with Dify nodes, the entire archiving process became fast and stable. The base model focuses purely on the information-refinement stage; models at the Kimi2 to MinMax2.5 level handle it perfectly, and the refinement quality is also very good (moreover, this part of the prompt can be optimized independently, avoiding the previous bottleneck of optimizing one monolithic prompt).
Solution Analysis: Why Loop and Not Iteration?
After I was ready to start implementing my idea with Dify, I tasked my AI assistant with researching how Dify could achieve my objectives.
There were roughly two approaches:
- Pre-process long text by splitting it into fixed-size segments—input them into Dify for segment-by-segment refinement—and finally integrate and re-refine them;
- Use a stream-like processing method, continuously inputting long text into Dify, where Dify performs iterative refinement and compression. (I originally intended to consider repeated calls using the agent mode, but the AI assistant suggested Loop to me.)
After the AI assistant combined my use case with its analysis and research on Dify, it proposed the following solution, which was then validated through practical interaction.
The following content is the AI assistant’s organized output:
(The “I” below refers to my AI assistant; this article’s main content is organized from an AI’s perspective)
Dify has two types of loop-like containers: Iteration and Loop.
First, let’s look at the essential differences between the two:
| | Iteration | Loop |
|---|---|---|
| Semantics | ForEach — Iterates through an array, each element processed independently | While — Conditional loop, with shared mutable state |
| Relationship between elements | Unaware of each other’s existence | Each round can see the output of the previous round |
| Parallelism | Supports parallel execution | Strictly serial |
Key Decision Point: If you need to “read a segment, summarize it, and then understand the next segment with the previous summary”—this is state accumulation, which only Loop can do.
Iteration can only perform Map-Reduce (each segment summarized independently, then merged at the end), but cross-segment context is lost during the merging phase.
My Choice: Loop for rolling summary, where each round’s input = previous summary + current new segment, and the context window size is constant, not growing with the iteration count.
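The rolling-summary pattern the Loop implements can be sketched in plain Python (`call_llm` here is a hypothetical stand-in for the LLM node, not a Dify API):

```python
def rolling_summary(chunks, call_llm):
    """Rolling summary: each round's input is the previous summary plus
    the current segment, so the prompt size stays bounded regardless of
    the total input length."""
    summary = ""
    for chunk in chunks:
        # Each round sees only: accumulated summary + one new segment.
        summary = call_llm(existing_summary=summary, new_chunk=chunk)
    return summary
```

This is exactly the state accumulation that Iteration cannot express: each round reads and overwrites a shared variable.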
Pitfalls Encountered (Key)
I (referring to the AI assistant, same below) conducted multiple controlled experiments to understand the behavior of the Loop node and discovered several key points not clearly documented.
Pitfall 1: loop-end Node = break Statement
This was the biggest pitfall. My initial loop body structure was:
loop-start → Code → LLM → Assigner → loop-end
Result: No matter what conditions were set, it always exited after only 1 iteration.
I conducted 3 sets of controlled experiments to exclude variables (pure counting, adding break conditions, adding LLM nodes), and all ran only 1 iteration. Finally, through a test workflow manually built in the Dify UI (with only loop-start → Code, no loop-end), I discovered the pattern.
Conclusion: loop-end is not a marker for “end of this round, return to loop head”. It is a break statement—execution exits the loop directly when loop-end is reached.
Correct approach: The last node of the loop body (e.g., the Assigner) should be a dead end with no outgoing edges: connect it neither to loop-end nor to anything else. The loop engine then automatically returns to loop-start to begin the next round.
loop-start → Code → LLM → Assigner ← Ends here, no outgoing edge
The loop engine automatically returns to loop-start
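In ordinary Python terms, wiring the body's last node into loop-end behaves like an unconditional `break` at the end of the body; a minimal illustration of the observed behavior:

```python
def run_loop(loop_count, connect_to_loop_end):
    """Reaching loop-end exits the loop immediately, like `break`,
    rather than marking 'end of this round'."""
    rounds_executed = 0
    for _ in range(loop_count):
        rounds_executed += 1  # the loop body runs
        if connect_to_loop_end:
            break  # an edge into loop-end acts like this break
        # dead-end body: the engine returns to loop-start automatically
    return rounds_executed
```

With `connect_to_loop_end=True` the loop always runs exactly one round, matching the single-iteration symptom in the experiments above.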
Pitfall 2: The semantic of break_conditions is “exit when satisfied”
Not “continue when satisfied”.
For example, is_done ≥ 1 means: exit the loop when is_done >= 1, not “continue the loop when is_done >= 1”.
Pitfall 3: Comparison Operators Must Use Unicode
Dify DSL’s comparison_operator does not accept ASCII notation:
- `>=` will report an error; it must be written as `≥`
- `<=` will report an error; it must be written as `≤`
- `!=` will report an error; it must be written as `≠`
Pitfall 4: break_conditions Must Reference the Output of the Code Node
The variable_selector of break_conditions cannot directly reference loop variables; it must reference the output field of a Code node within the loop body.
(Note: The analysis conclusion in this section is not entirely accurate; the more correct way is to check the source code and documentation’s introduction to loops. However, here I maintain the AI’s original style of exploration, summary, and output, without modification. According to the AI’s guidance, it is at least “usable,” but these are more akin to empirical summaries than official standards.)
Workflow Architecture
The final working workflow looks like this:
Start(conversation_text, date)
│
↓
Code [Segment Slicing]
Cut by paragraph boundaries, each segment ≤12k characters
Output: chunks[], chunk_count
│
↓
Loop (loop_count=25, break: is_done ≥ 1)
loop_variables: index=0, running_summary="", done=0
│
├─ Code [Get Current Segment]
│ Input: chunks, index
│ Output: current_chunk, new_index, is_done
│
├─ LLM [Rolling Refinement]
│ system: Refinement rules (filter noise, three-layer structure, integrate themes)
│ user: [Rule re-injection] + date + running_summary + current_chunk
│ Output: text (updated complete summary)
│
└─ Assigner [Update Loop Variables]
index ← new_index
running_summary ← LLM.text
done ← is_done
(dead end, no outgoing edges)
│
↓
End → summary = running_summary
Several key design points:
- Chunking within Dify: Code node cuts by paragraph boundaries, not truncating in the middle of a paragraph.
- loop_count = 25: Covers approximately 300k characters (25 × 12k), enough to handle the historically largest daily volume.
- Break condition: The Code node calculates `is_done = (index + 1 >= chunk_count)`, so the exit timing is determined by the data volume.
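For reference, the two Code nodes might look roughly like this (a sketch: field names follow the diagram above, but this is not the workflow's exact node code, and the fallback hard-split for oversized paragraphs is my assumption):

```python
def split_chunks(conversation_text: str, max_len: int = 12000) -> dict:
    """Segment Slicing node: cut at paragraph boundaries (blank lines),
    keeping each chunk <= max_len characters. A single paragraph longer
    than max_len is hard-split as a fallback."""
    chunks, current = [], ""
    for para in conversation_text.split("\n\n"):
        if len(para) > max_len:
            if current:
                chunks.append(current)
                current = ""
            for i in range(0, len(para), max_len):
                chunks.append(para[i:i + max_len])
            continue
        candidate = para if not current else current + "\n\n" + para
        if len(candidate) <= max_len:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return {"chunks": chunks, "chunk_count": len(chunks)}

def get_segment(chunks: list, index: int) -> dict:
    """Get Current Segment node: emit this round's chunk plus the
    is_done flag consumed by the break condition."""
    return {
        "current_chunk": chunks[index],
        "new_index": index + 1,
        "is_done": 1 if index + 1 >= len(chunks) else 0,
    }
```

Because `is_done` is a Code-node output, it satisfies Pitfall 4: the break condition references it directly rather than a loop variable.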
Test Results
Basic Functionality Verification
| Test | Input Size | Iterations | Time Taken | Result |
|---|---|---|---|---|
| Single large session | 83k chars | 7 rounds | 49.5s | succeeded |
| Full-day bundle (6 sessions) | 92k chars | 8 rounds | 158.5s | succeeded |
| Historical max daily (estimated) | 235k chars | ~20 rounds | - | covered by loop_count=25 |
Solution Comparison: Per-session Processing vs. Full-day Bundle One-time Processing
Two solutions were run using data from the same day (6 sessions, 92k characters), and then LLM was used for structured quality assessment:
Solution A: Each session processed individually through the workflow, producing 6 independent summaries (10,097 characters)
Solution B: Full-day bundle processed once into Loop (initial version 3,541 characters, optimized to 8,748 characters)
| Evaluation Dimension | Solution A (Per-session) | Solution B (Full-day bundle) |
|---|---|---|
| Fidelity | 7/10 — Factually accurate but distorted | 8.5/10 — Faithful reconstruction |
| Depth | 6/10 — Overly structured, fabricating non-existent frameworks | 8/10 — Retains original reasoning process |
| Meaning Recognition | 5/10 — Systematic attribution errors | 9/10 — Accurately identifies key moments |
Unexpected Finding: Solution A had more words but worse quality. The reason is that the full context of a single session gave the LLM too much “creative space”—it would invent non-existent analytical frameworks and attribute AI’s contributions to the user. Conversely, Solution B’s rolling summary, because it only saw a small segment each round, was forced to faithfully reproduce rather than reorganize, resulting in better quality.
Key Optimization: Re-injecting Rules Each Round
The initial version of Solution B had a problem: the round-by-round summary length collapsed mid-run.
Initial version: 1502 → 2573 → 4585 → 6803 → [810] → 1468 → 2099 → 3541
↑
Here it dropped sharply from 6803 to 810
The 5th round processed several very short automated sessions (pure tool calls, essentially no conversational content), and the LLM rewrote and compressed the previously accumulated 6,803-character summary down to 810 characters. The system prompt did say "when a round is pure noise, retain the existing summary as is," but it was ineffective: in long-context scenarios, LLM compliance with system instructions can degrade.
(—Here, the AI fell into an endless loop of thinking inertia, so the human author intervened and prompted the AI with the repeated prompt solution.)
Solution: Re-inject key rules at the beginning of the user message for each LLM round:
【Mandatory Rules for This Round】
R1 Noise Judgment: If the new content in this round consists entirely of system logs/tool calls/startup greetings,
the "Existing Summary" must be output verbatim, without reduction, reorganization, or rewriting.
R2 Faithful Reproduction: Only record facts genuinely present in the original text. Prohibit constructing frameworks not found in the original.
R3 Integrate, Don't Segment: When the same topic reappears, integrate it into existing paragraphs.
R4 Three-Layer Structure: Each topic includes "What was done," "Judgment and Decision-Making," and "Reusable Patterns."
【End of Rules, Data for This Round Below】
Archive Date: {date}
---
Existing Summary: {running_summary}
---
New Conversation Content for This Round: {current_chunk}
Effect:
Optimized: 1502 → 2573 → 4585 → 6580 → [8059] → 8150 → 8559 → 8748
↑
No longer collapsing, normal growth
Principle: The system prompt can be “forgotten” in long contexts, but content at the beginning of the user message is in the most recent position of the model’s attention, leading to significantly higher compliance. Repeated injection in each round is equivalent to refreshing instruction weights in the most recent context of each LLM call.
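Assembling that per-round user message can be sketched as follows (the rules text is abbreviated here, and `build_user_message` is an illustrative helper, not a Dify API):

```python
RULES = (
    "【Mandatory Rules for This Round】\n"
    "R1 Noise Judgment / R2 Faithful Reproduction / "
    "R3 Integrate, Don't Segment / R4 Three-Layer Structure\n"
    "【End of Rules, Data for This Round Below】"
)

def build_user_message(date: str, running_summary: str, current_chunk: str) -> str:
    # Re-injecting the rules at the top of every round keeps them in the
    # most recent attention window of each LLM call.
    return (
        f"{RULES}\n"
        f"Archive Date: {date}\n---\n"
        f"Existing Summary: {running_summary}\n---\n"
        f"New Conversation Content for This Round: {current_chunk}"
    )
```

In the Dify LLM node this is simply the user-prompt template with the rules block pasted above the variable references.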
Summary
The Loop node is fully capable of handling windowed refinement scenarios for large texts. Key learnings:
- Loop-end is a break, not an end-of-iteration — This is the easiest pitfall to encounter; the loop body should have a dead-end structure.
- Loop is suitable for state accumulation, Iteration for independent batch processing — Rolling summaries are better suited for Loop than Iteration.
- Re-inject rules each round — to address the issue of LLM instruction compliance degrading in multi-round iterations.
- Segment-by-segment processing is not necessarily worse quality than one-time processing — Rolling summaries are actually more faithful than full context because the LLM has no room to “elaborate excessively”.
Actual test data: 92k characters (6 sessions merged), 8 iterations, completed in approximately 330 seconds, resulting in an 8,748-character structured archived summary. It has been integrated into the daily automated cron task and is running stably. (Compared to the original sub-agent solution which took 600+ seconds and often failed due to timeouts.)
——————————
The practical work above was largely explored and executed autonomously by the AI; I only provided prompts and direction confirmation at critical points, and the final output quality met my requirements. In the short term I am running two solutions in parallel (the original: the sub-agent handles everything; the new one: the sub-agent only calls the Dify workflow and sends notifications). Daily archives are saved as Date Archive and Date Archive_dify and run simultaneously for a while; longer term I will review stability and quality and either pick one or redefine the division of labor.