Markdown converter can't get file path, output variable unavailable?

zhaoliang · January 21, 2026, 9:15am

In the output variables of the Markdown converter, the three corresponding keys are text, files, and json. Among them, files is an array, containing basic file information and the URL.

In the next node, when I use files as context, an error occurs, as follows:

Run failed: Invalid context structure: dify_model_identity='__dify__file__' id=None tenant_id='ccd7d781-3643-4d21-a0e9-72ad023' type=<FileType.DOCUMENT: 'document'> transfer_method=<FileTransferMethod.TOOL_FILE: 'tool_file'> remote_url=None related_id='9eb8576-436c-b83e-529cba96c66c' filename='20260121_170237.xlsx' extension='.xlsx' mime_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' size=9845

Why can’t I extract the URL I want, and Dify also reports an error??? Please, experts, advise!

lysonober · January 21, 2026, 6:16pm

Looking at the screenshot, it seems you want to extract content from xlsx and then feed it to an LLM for analysis? Don’t use ‘context’ to reference files; for the correct method, refer to: Document Extractor - Dify Docs

@Dify_Forum_Helper, please provide some information, combining basic knowledge and story points.

Additionally, the Marketplace has some interesting Excel-related plugins that are worth exploring later. @Dify_Forum_Helper, you can search for those as well.

Dify_Forum_Helper · January 21, 2026, 6:17pm

There are indeed two easily confused points here: “file objects” and “text content usable by LLM”.

1. Why does using `files` as “context” cause an error?

The output of the Markdown converter is roughly structured like this:

{
  "text": "...",
  "files": [
    {
      "dify_model_identity": "__dify__file__",
      "type": "document",
      "filename": "20260121_170237.xlsx",
      "extension": ".xlsx",
      "mime_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
      "size": 9845,
      "url": "https://...signed_link..."
    }
  ],
  "json": [...]
}

Each item in files is essentially a Dify internal file object / handle, containing information like __dify__file__, FileType.DOCUMENT, etc.
The “context” of an LLM node expects string text or specific structured text, not this kind of “file object”.

So, when you bind the entire files array directly to the “context” in the next LLM node, the LLM node receives a “file object array”, whose structure does not match what it expects, leading to the error you saw:

Run failed: Invalid context structure: dify_model_identity='__dify__file__' ... type=<FileType.DOCUMENT: 'document'> ...

It’s not that the url cannot be retrieved, but rather: this entire object should not be passed as context to the LLM at all.

Analogy: You are currently passing “an Excel file handle + metadata” to the LLM, not “table content”, so the model naturally “doesn’t understand”.

2. If you want the LLM to analyze xlsx content, what is the correct approach?

Based on the screenshot/description, your goal should be:
To have the LLM read the table content from an xlsx file and then perform analysis.
In this case, do not directly reference files using “context”. Instead, you should:

Use a “Document Extractor node” to parse the file content
- Input: The file variable output by the upstream node (can be an array) – for example, the files from the Markdown converter, or files uploaded via the Start node.
- Output: Plain text (e.g., converting Excel to Markdown table text).

In the LLM node, use the text output by the Document Extractor node as prompt / context
For example, in the system prompt or user prompt, write something like:

Below is the Excel content uploaded by the user (converted to a Markdown table):
{{ doc_extractor.text }}
Please answer the user's questions based on the table above...

This way, the LLM receives plain text tables, and the Invalid context structure error will no longer be triggered.

3. What if you just want to get the file’s URL?

If you merely want to get a field like files[0].url, rather than directly “feeding” this object to the LLM, you can:

Parse it in a code node (Python / JS):

file_url = inputs["markdown_node"]["files"][0]["url"]

Then output this file_url for subsequent nodes to use (e.g., calling your own service, etc.).

However, this is also not suitable for direct use as LLM “context”, because the URL points to a binary Excel file, which the LLM itself still cannot read. It can only be converted to text by an intermediate “parsing node”.
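
If the goal is a full download link, the extraction step above can be sketched as a small code node. This is only a sketch under assumptions: that the code node calls a `main` function with its input variables as arguments and returns a dict, that each file arrives as a dict with a `url` key as in the JSON above, and that `base_url` (a hypothetical input, for example your Dify host) is only needed when `url` is relative:

```python
from urllib.parse import urljoin


def main(files: list, base_url: str) -> dict:
    """Return the absolute download URL of the first file, or "" if none."""
    if not files:
        return {"file_url": ""}
    # Each item is assumed to be a dict with a "url" key, as in the JSON above.
    raw_url = files[0].get("url") or ""
    # urljoin leaves an already-absolute URL untouched and resolves a
    # relative one (e.g. "/files/...") against base_url.
    return {"file_url": urljoin(base_url, raw_url) if raw_url else ""}
```

Using `urljoin` rather than string concatenation avoids doubled or missing slashes, and it keeps the signed query string intact.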

4. Extended usage for Excel

In addition to the general method of using a “Document Extractor node”, Dify’s Marketplace also offers some plugins for Excel (e.g., more flexible reading, writing, querying by sheet name/cell range).
If you want to perform more complex Excel processing later, you can consider:

Using Excel-related plugins for structured processing
Or combining multiple steps: Plugin → Document Extractor → LLM

Summary

files is an array of file objects and cannot be used directly as LLM context, which is why it reports Invalid context structure.
To have the LLM read xlsx content:
Markdown / Start node (produces files) → Document Extractor node → Outputs text → Used as context / Prompt for the LLM node.
To simply get the URL: Use a code node to extract it from files[i].url, do not directly pass the entire files to the LLM.

If you wish, you can paste a brief structure of your current workflow (each node type + key connections), and I can help you write a specific connection and variable reference example for “uploading Excel to LLM analysis”.

zhaoliang · January 23, 2026, 3:34am

Thanks for the answer!

My requirement is just to extract the file URL returned by the Markdown converter, to concatenate it into a complete file download URL (absolute path).

Currently, I have obtained files[i].url using a code block, so the problem is solved.

Another question arises: in the upstream LLM node, I requested the LLM to reply with a standard Markdown table, with the goal of having the next Markdown converter node generate Excel.

By checking the data format returned by the LLM, I believe there’s no issue. For example, the text in the image below conforms to Markdown syntax, but the Markdown converter did not generate corresponding Excel sheets based on the ## heading syntax. How should I handle this?

@Dify_Forum_Helper

Dify_Forum_Helper · January 23, 2026, 3:36am

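Conclusion first: your current approach (having the LLM output Markdown with ## headings and expecting the Markdown converter to split Sheets automatically by ##) is not guaranteed to work in Dify. The Markdown converter currently has no "officially agreed" rule that ## always becomes an Excel worksheet name.

In other words:

It can indeed convert Markdown tables to Excel;
But the "multiple Sheets + Sheet naming logic" is currently something of a black box, not an externally documented, configurable capability, so seeing "the Markdown looks fine, but it was not split into Sheets by ##" is consistent with the current implementation.

Below are the ways you can handle this, in several parts.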

1. Understanding the Current Behavior

Based on the structure you provided:

{
  "text": "## XXX\n\n| 列1 | 列2 |\n| --- | --- |\n| ... | ... |"
}

Your expectation is:

## XXX → An Excel Sheet named XXX appears
## YYY → A new Sheet is created

However, the Markdown converter currently behaves more like this:

It treats the entire Markdown section as a single document;
The focus is on the ‘table itself’ rather than ‘heading level → Sheet’.

Therefore, common phenomena are:

It might only generate one Sheet;
Or it uses a default Sheet name (e.g., Sheet1) instead of the text within ##.

This is not because your Markdown is incorrect, but rather because the current product design does not treat ## as a formal protocol for ‘Sheet boundary + Sheet name’.

2. Solutions for Achieving ‘Split Sheets by Heading’

If you have a strong requirement to ‘split Sheets by ##’, you can consider bypassing the Markdown converter’s default rules by using a ‘code node + Excel library’ approach to explicitly write your desired structure into multiple Sheets.

Approach A: LLM Outputs Structured JSON, Then Code Generates Excel

In the LLM node, do not directly have it output Markdown; instead, have it output structured JSON, for example:

{
  "sheets": [
    {
      "name": "SheetA",
      "table": [
        ["列1", "列2"],
        ["a1", "a2"],
        ["b1", "b2"]
      ]
    },
    {
      "name": "SheetB",
      "table": [
        ["列1", "列2"],
        ["x1", "x2"]
      ]
    }
  ]
}

Use a code node (Python recommended) to parse this JSON, and use libraries like openpyxl or pandas to create the Excel file yourself, with full control over multiple tables and Sheets.

Pseudocode example:

import io
import json

from openpyxl import Workbook


def build_excel_bytes(llm_text: str) -> bytes:
    """Build a multi-sheet .xlsx in memory from the JSON structure above."""
    data = json.loads(llm_text)  # Assumes the LLM output is exactly the JSON above

    wb = Workbook()
    wb.remove(wb.active)  # Delete the default empty sheet

    for sheet in data["sheets"]:
        # Excel sheet names are limited to 31 characters
        ws = wb.create_sheet(title=sheet["name"][:31])
        for row in sheet["table"]:
            ws.append(row)

    # Save to memory instead of disk
    buffer = io.BytesIO()
    wb.save(buffer)
    return buffer.getvalue()


# `inputs` / `outputs` are placeholders for however your code node receives
# variables and returns files; adapt them to your runtime.
outputs["excel_file"] = {
    "type": "document",
    "filename": "result.xlsx",
    "content": build_excel_bytes(inputs["llm"]["text"]),
}

Subsequently, you can provide this excel_file as a regular file for users to download, or pass it to other nodes.
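
Truncating to 31 characters is not the only constraint on LLM-generated sheet names: Excel also rejects the characters `\ / ? * [ ] :` and requires names to be unique within a workbook (compared case-insensitively). A minimal sketch of a clean-up step that could run before `create_sheet`; the helper name `safe_sheet_names` and the `_` replacement are my own choices, not part of Dify or openpyxl:

```python
import re

# Characters Excel does not allow in worksheet names.
INVALID_SHEET_CHARS = re.compile(r"[\[\]:*?/\\]")
MAX_SHEET_NAME_LEN = 31


def safe_sheet_names(names: list) -> list:
    """Make sheet names valid for Excel and unique within one workbook."""
    result = []
    seen = set()
    for index, name in enumerate(names, start=1):
        cleaned = INVALID_SHEET_CHARS.sub("_", str(name)).strip()
        # Fall back to a positional name when nothing usable is left.
        cleaned = cleaned[:MAX_SHEET_NAME_LEN] or f"Sheet{index}"
        candidate = cleaned
        suffix = 2
        # Excel compares sheet names case-insensitively.
        while candidate.lower() in seen:
            tail = f"_{suffix}"
            candidate = cleaned[: MAX_SHEET_NAME_LEN - len(tail)] + tail
            suffix += 1
        seen.add(candidate.lower())
        result.append(candidate)
    return result
```

For example, `safe_sheet_names([s["name"] for s in data["sheets"]])` would give one usable title per sheet, in the same order as the input.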

Advantages:

Completely independent of the Markdown converter’s internal rules;
Sheet names, number of Sheets, and content of each Sheet are all under your control;
The LLM’s task is also clearer: it’s only responsible for ‘structured planning,’ not Excel details.

Approach B: Continue Having LLM Output Markdown, But Use Code to Split and Write Excel

If your current LLM Prompt is already fixed to output Markdown, and you prefer Markdown’s readability, you can:

Still require the LLM to use a similar structure:

## SheetA

| 列1 | 列2 |
| --- | --- |
| a1  | a2  |
| b1  | b2  |

## SheetB

| 列1 | 列2 |
| --- | --- |
| x1  | x2  |

Add a code node downstream to do two things:

Use regular expressions / Markdown parsing libraries to split the text into blocks by ## headings;
Extract the first table in each block as a 2D array, then write multiple Excel Sheets in a similar way to the above.

For example, a simple regex approach (pseudocode):

import re

md = inputs["llm"]["text"]

# Split into sheet blocks
blocks = re.split(r'^##\s+', md, flags=re.MULTILINE)
# blocks[0] might be content before the first ##, can be ignored
sheet_blocks = blocks[1:]

sheets = []
for block in sheet_blocks:
    # The first line until the newline is the sheet name
    lines = block.splitlines()
    sheet_name = lines[0].strip()
    sheet_body = "\n".join(lines[1:])

    # Then parse the first markdown table in sheet_body into a 2D array
    # (You can write your own parser or use an existing library)
    table = parse_markdown_table(sheet_body)

    sheets.append({"name": sheet_name, "table": table})

# Then reuse the openpyxl writing logic from above

Ultimately, this code node will generate the Excel file, rather than relying on the Markdown converter to automatically infer Sheets.
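
The `parse_markdown_table` helper above is left undefined, so here is one possible stdlib-only sketch of it together with the heading split. It assumes simple pipe tables where every row starts with `|`, and it does not handle escaped pipes (`\|`) inside cells; `split_sheets` is a hypothetical name for the splitting step:

```python
import re


def parse_markdown_table(body: str) -> list:
    """Return the first pipe table in body as a list of rows (lists of str)."""
    rows = []
    for line in body.splitlines():
        stripped = line.strip()
        if not stripped.startswith("|"):
            if rows:
                break  # the first table has ended
            continue
        cells = [cell.strip() for cell in stripped.strip("|").split("|")]
        # Skip the separator row such as | --- | :---: |
        if all(re.fullmatch(r":?-+:?", cell) for cell in cells):
            continue
        rows.append(cells)
    return rows


def split_sheets(md: str) -> list:
    """Split Markdown on level-2 headings into {"name", "table"} dicts."""
    sheets = []
    blocks = re.split(r"^##[ \t]+", md, flags=re.MULTILINE)
    for block in blocks[1:]:  # blocks[0] is whatever precedes the first heading
        name, _, body = block.partition("\n")
        sheets.append({"name": name.strip(), "table": parse_markdown_table(body)})
    return sheets
```

The result has the same `[{name, table}, ...]` shape as the JSON in Approach A, so the same Excel-writing loop can consume it. Explanatory text the LLM adds between the heading and the table is simply skipped.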

3. What If You Absolutely Must Continue to Rely on the Markdown Converter?

Based on currently available information, there is no authoritative documentation describing:

The mapping rules between headings like ##, ###, etc., and Sheets;
Or a switch that can enable ‘splitting Sheets by heading’.

Therefore:

Even if you currently discover a way that ‘seems to split Sheets by heading,’ it might be an implementation detail and not necessarily stable in future versions;
For production scenarios, it’s still recommended to use the aforementioned ‘LLM + code-generated Excel’ approach to avoid relying on undocumented behavior.

4. Practical Advice (Minimal Change Version)

On your existing workflow, if you don’t want to make major changes, you can fine-tune it like this:

Upstream LLM:

Continue to output ‘Markdown with ##’;

But additionally, in the Prompt, ask the model to output a JSON structure as well, for example:

First, provide the Markdown for readability;
Then, provide a JSON code block with the following structure:
```json
{ "sheets": [ { "name": "...",
                "table": [["表头1","表头2"],["...","..."]] } ] }
```

Downstream, add a new code node:
- Specifically read the JSON section, ignoring the Markdown section;
- Generate the file using the JSON → multi-Sheet Excel logic;
- No longer use the Markdown converter to convert to Excel.

This way:

Humans can still view elegant Markdown in chat records;
When the machine generates Excel, it follows a ‘structured + controllable’ path.
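
Reading "the JSON section" out of a mixed Markdown + JSON reply could look like the sketch below. It assumes the model wraps the JSON in a fenced block tagged `json`, as the prompt above asks, and that the JSON block comes last; `extract_sheets` is a hypothetical helper name:

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way so the example stays readable
JSON_BLOCK = re.compile(FENCE + r"json\s*(.*?)" + FENCE, re.DOTALL)


def extract_sheets(llm_text: str) -> list:
    """Return the "sheets" list from the last fenced JSON block in llm_text."""
    matches = JSON_BLOCK.findall(llm_text)
    if not matches:
        raise ValueError("no JSON code block found in the LLM output")
    # The prompt asks for Markdown first and JSON second, so take the last block.
    data = json.loads(matches[-1])
    sheets = data.get("sheets") if isinstance(data, dict) else None
    if not isinstance(sheets, list):
        raise ValueError('JSON block has no "sheets" list')
    return sheets
```

Raising an error when the block is missing or malformed makes a bad LLM reply fail loudly at this node instead of producing an empty workbook.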

Summary

The core issue you’re facing is: The Markdown converter does not have a reliable, publicly documented rule for ‘## → Excel sheet’, so it won’t create multiple Sheets in the way you expect.
To stably control Sheet names and structure, it’s recommended to use:
- ‘LLM outputs structured JSON → code node generates Excel’; or
- ‘LLM outputs Markdown with ## → code node splits by ## → writes Excel itself’.
This way, you don’t rely on undocumented internal implementation details, and future version upgrades will be safer.

If it’s convenient for you, you can paste a small, complete example of your current LLM output (after anonymization), and I can directly help you write a corresponding code node example to convert this output into a multi-Sheet Excel file.

zhaoliang · January 23, 2026, 3:41am

I am considering switching to a code block to parse JSON or Markdown myself to generate Excel.

If the Markdown converter cannot ensure support for such usage, then I think the description information here in Dify should at least be modified to avoid causing confusion for users.

@Dify_Forum_Helper

Dify_Forum_Helper · January 23, 2026, 3:42am

Your follow-up feedback is very important, and the text in your screenshot can indeed easily be misread as a firm commitment:

“All heading text (e.g., #, ##) will be used sequentially as worksheet names.”

Combining this with your actual test results just now, we can say two things quite clearly:

From the perspective of “actual product behavior”
- The Markdown converter currently has no problem converting Markdown tables to Excel.
- However, “splitting sheets by heading and strictly using heading text as sheet names” is not a reliable, stable contract in the current implementation (especially when using LLM to automatically generate Markdown, the model can easily produce extra blank lines, explanatory text, code blocks, etc., all of which can affect parsing).
- The actual effect is often:
  - Sometimes only one sheet is generated;
  - Or the sheet name is inconsistent with the heading;
  - Combinations of multi-level headings/multiple tables may not be split as you expect.
From the perspective of “consistency between text and actual behavior”
- The tone of the description you saw is “will definitely”:
  
  “All heading text (e.g., #, ##) will be used sequentially as worksheet names.”
- But the current implementation is closer to “in some scenarios/under limited rules, it attempts to use headings as worksheet names,” rather than “unconditionally guaranteed.”
- For scenarios like yours, where LLMs automatically generate Markdown, this description indeed makes people mistakenly believe that “as long as there’s ##, you’ll 100% get a corresponding Sheet.”

Therefore, your current plan to “switch to using a code block to parse JSON or Markdown yourself to generate Excel” is a more robust choice from an engineering practice perspective—especially when you want precise control over multiple sheets and naming.

How it can be implemented & How I would suggest the official team handles it

Your own implementation choice (a reasonable path)
- Use LLM to output structured JSON / or a pre-defined Markdown structure;
- In the code node, you can:
  - Parse out the sheet list ([{name, table}, ...]);
  - Manually write Excel using libraries like openpyxl / pandas, with full control over multiple sheets and naming.
- This approach avoids relying on any “undocumented details,” which is best for production stability.
A suggestion regarding the text / product (I will summarize it internally)
- There is indeed a discrepancy between the current text and the truly guaranteed behavior.
- A safer wording should be similar to:
  - “When the Markdown structure meets specific rules, it will attempt to use heading text as the worksheet name”
    or
  - Simply not promise automatic sheet splitting by heading, but rather emphasize:
    “Currently, it primarily supports converting Markdown tables to Excel worksheets; if precise control over multiple worksheets and naming is required, it is recommended to use a code node or plugin to generate Excel yourself.”
- Your feedback in the post is inherently very valuable:
  - It helps the product/documentation team realize that “the promise made in the text > the behavior the code can actually guarantee”;
  - And it can prevent more users from falling into the same trap later.

Summary to help you confirm the current situation

Your choice to “generate Excel yourself through code” is correct, especially in scenarios where you need precise control over sheet names/quantities.
Based on your current experience, this UI description of the Markdown converter does indeed cause misunderstanding, and your suggestion (to adjust or weaken this text) is a very reasonable product improvement point.

If you later need “from the current LLM output format (Markdown or JSON) → specific Python examples” to generate multi-sheet Excel in the code node, you can paste a small snippet of the de-identified actual output, and I can directly write a ready-to-use code node example according to your format.
