There are indeed two things that are easy to confuse here: "file objects" and "text content the LLM can actually consume".
1. Why does using files as “context” cause an error?
The output of the Markdown converter is roughly structured like this:
```json
{
  "text": "...",
  "files": [
    {
      "dify_model_identity": "__dify__file__",
      "type": "document",
      "filename": "20260121_170237.xlsx",
      "extension": ".xlsx",
      "mime_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
      "size": 9845,
      "url": "https://...signed_link..."
    }
  ],
  "json": [...]
}
```
- Each item in `files` is essentially a Dify internal file object / handle, containing information such as `__dify__file__`, `FileType.DOCUMENT`, etc.
- The "context" input of an LLM node expects string text (or specific structured text), not this kind of "file object".
So, when you bind the entire `files` array directly to "context" in the next LLM node, the LLM node receives an array of file objects whose structure does not match what it expects, leading to the error you saw:

```
Run failed: Invalid context structure: dify_model_identity='__dify__file__' ... type=<FileType.DOCUMENT: 'document'> ...
```
It's not that the `url` cannot be retrieved; rather, this entire object simply should not be passed to the LLM as context at all.
Analogy: You are currently passing “an Excel file handle + metadata” to the LLM, not “table content”, so the model naturally “doesn’t understand”.
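To make the mismatch concrete, here is a minimal illustration (the values are taken from the example structure above; the variable names are only for illustration):

```python
# What the LLM "context" input expects: plain text it can read directly.
context = "| product | qty |\n| --- | --- |\n| apples | 3 |"

# What the Markdown converter's `files` variable actually holds: a list of
# Dify file-object dicts that describe the file, not its contents.
files = [{
    "dify_model_identity": "__dify__file__",
    "type": "document",
    "filename": "20260121_170237.xlsx",
    "url": "https://...signed_link...",
}]
```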
2. If you want the LLM to analyze xlsx content, what is the correct approach?
Based on the screenshot/description, your goal should be:
To have the LLM read the table content from an xlsx file and then perform analysis.
In this case, do not directly reference files using “context”. Instead, you should:
- Use a "Document Extractor" node to parse the file content
  - Input: the file variable output by the upstream node (can be an array), for example the `files` from the Markdown converter, or files uploaded via the Start node.
  - Output: plain text (e.g., the Excel content converted to Markdown table text).
- In the LLM node, use the text output by the Document Extractor node as the prompt / context
For example, in the system prompt or user prompt, write something like:
```
Below is the Excel content uploaded by the user (converted to a Markdown table):

{{ doc_extractor.text }}

Please answer the user's questions based on the table above...
```
This way, the LLM receives plain-text tables, and the `Invalid context structure` error is no longer triggered.
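If you want a feel for what that conversion step does, here is a standalone Python sketch of the same idea (this is not Dify's internal implementation; it assumes pandas, openpyxl and tabulate are installed):

```python
import pandas as pd

# Read the first sheet of the workbook (pandas uses openpyxl for .xlsx files).
df = pd.read_excel("20260121_170237.xlsx")

# Convert it to a Markdown table that an LLM can consume as plain text.
# DataFrame.to_markdown() requires the "tabulate" package.
markdown_table = df.to_markdown(index=False)
print(markdown_table)
```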
3. What if you just want to get the file’s URL?
If you merely want to get a field like `files[0].url`, rather than "feeding" this object to the LLM directly, you can:

- Parse it in a code node (Python / JS), for example:

  ```python
  file_url = inputs["markdown_node"]["files"][0]["url"]
  ```

- Then output this `file_url` for subsequent nodes to use (e.g., calling your own service, etc.). A fuller code-node example follows below.
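For reference, a rough sketch of what such a code node could look like, assuming the `files` array from the Markdown converter is mapped to an input variable named `files` in the code node's settings (adjust the variable names to your actual workflow):

```python
def main(files: list) -> dict:
    # "files" is assumed to be the Markdown converter's files array,
    # mapped as an input variable of this code node.
    file_url = files[0]["url"] if files else ""
    # Expose the URL as an output variable for downstream nodes to reference.
    return {"file_url": file_url}
```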
However, this is also not suitable for direct use as LLM “context”, because the URL points to a binary Excel file, which the LLM itself still cannot read. It can only be converted to text by an intermediate “parsing node”.
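If you do hand the URL off to your own service, that service has to do the parsing itself. A minimal sketch of that step, assuming the signed link is still valid when fetched and that requests and pandas are available:

```python
import io

import pandas as pd
import requests

# Placeholder: the signed link extracted from files[0]["url"] upstream.
file_url = "https://...signed_link..."

# Download the binary .xlsx and convert it to text an LLM can actually read.
resp = requests.get(file_url, timeout=30)
resp.raise_for_status()
df = pd.read_excel(io.BytesIO(resp.content))
text_for_llm = df.to_markdown(index=False)
```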
4. Extended usage for Excel
In addition to the general method of using a “Document Extractor node”, Dify’s Marketplace also offers some plugins for Excel (e.g., more flexible reading, writing, querying by sheet name/cell range).
If you want to perform more complex Excel processing later, you can consider:
- Using Excel-related plugins for structured processing
- Or combining multiple steps: Plugin → Document Extractor → LLM
Summary
- `files` is an array of file objects and cannot be used directly as LLM context, which is why you get the `Invalid context structure` error.
- To have the LLM read xlsx content: Markdown / Start node (produces `files`) → Document Extractor node → outputs text → used as context / prompt in the LLM node.
- To simply get the URL: use a code node to extract it from `files[i].url`; do not pass the entire `files` to the LLM.
If you wish, you can paste a brief structure of your current workflow (each node type + key connections), and I can help you write a specific connection and variable reference example for “uploading Excel to LLM analysis”.