At the moment, the situation can be viewed on two levels: whether the model itself supports video input, and whether the Dify frontend/API accepts video file types.

First, regarding whether file upload is supported: it depends on whether the app's `features` configuration has the video file type enabled under `files`.
Second, for GLM, according to the official manual, the sample code is as follows:

```python
# Video understanding example: pass a video URL
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="YOUR API KEY")  # Fill in your own API key
response = client.chat.completions.create(
    model="glm-4v-plus-0111",  # Fill in the model name to call
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video_url",
                    "video_url": {
                        "url": "https://sfile.chatglm.cn/testpath/video/xxxxx.mp4"
                    }
                },
                {
                    "type": "text",
                    "text": "Please describe this video in detail."
                }
            ]
        }
    ]
)
print(response.choices[0].message)
```
This shows that for GLM models, users only need to pass a valid video URL for the model to analyze the video.
If using the Qwen model series, according to the official Qwen API documentation:

Qwen-VL analyzes content by extracting a sequence of frames from the video. The frame-extraction frequency determines the granularity of the analysis. Different SDKs have different default frame-extraction frequencies, and the model supports controlling the frequency via the `fps` parameter (extract one frame every 1/`fps` seconds; range [0.1, 10], default 2.0). It is recommended to set a higher `fps` for fast-motion scenes and a lower `fps` for static or long videos.
```python
import os

import dashscope

# The following is the Singapore region base_url. If using the Virginia region model,
# change base_url to https://dashscope-us.aliyuncs.com/api/v1
# If using the Beijing region model, change base_url to https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [
    {
        "role": "user",
        "content": [
            # fps controls the frame-extraction frequency: one frame is extracted every 1/fps seconds.
            # Full usage: https://www.alibabacloud.com/help/en/model-studio/use-qwen-by-calling-api?#2ed5ee7377fum
            {"video": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4", "fps": 2},
            {"text": "What is this video about?"}
        ]
    }
]
response = dashscope.MultiModalConversation.call(
    # API keys differ by region. Get an API key:
    # https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If you don't have environment variables configured, replace the next line with: api_key="sk-xxx"
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen3-vl-plus',
    messages=messages
)
print(response.output.choices[0].message.content[0]["text"])
```
Another form, based on Alibaba's official manual:

```python
import os

# dashscope version must be >= 1.20.10
import dashscope

dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [
    {
        "role": "user",
        "content": [
            # If the model belongs to the Qwen2.5-VL series and an image list is provided,
            # fps indicates that the image list was extracted from the original video every 1/fps seconds.
            {
                "video": [
                    "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                    "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                    "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                    "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
                ],
                "fps": 2
            },
            {"text": "Describe the detailed process shown in this video."}
        ]
    }
]
response = dashscope.MultiModalConversation.call(
    # If you don't have environment variables configured, replace the next line with: api_key="sk-xxx"
    # API keys differ for the Singapore/Virginia and Beijing regions. Get an API key:
    # https://www.alibabacloud.com/help/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model='qwen2.5-vl-72b-instruct',  # Example model; replace as needed. Model list:
                                      # https://www.alibabacloud.com/help/en/model-studio/models
    messages=messages
)
print(response["output"]["choices"][0]["message"].content[0]["text"])
```
From the screenshot, the key point is not whether the user is using GLM or Qwen, but that the user is using the SiliconFlow plugin. According to SiliconFlow's official documentation, its vision handling works like this:

2. Usage

For VLM models, you can call the /chat/completions endpoint and construct message content that includes an image URL or a base64-encoded image. Use the `detail` parameter to control the image preprocessing mode.

2.1 Detail parameter

SiliconFlow provides three `detail` options: `low`, `high`, and `auto`. For currently supported models, if `detail` is omitted or set to `high`, high-resolution mode is used; if set to `low` or `auto`, low-resolution mode is used.
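As a sketch of what such a request looks like, the snippet below builds an OpenAI-compatible /chat/completions message body with an image URL and the `detail` parameter. The model id and image URL here are placeholders, not verified values from SiliconFlow's model list:

```python
import json

# Build a /chat/completions request body containing an image URL plus the
# `detail` parameter, in the OpenAI-compatible message format that the
# SiliconFlow docs describe. Model id and image URL are placeholders.
payload = {
    "model": "Qwen/Qwen2-VL-72B-Instruct",  # placeholder VLM model id
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/sample.jpg",
                        "detail": "low",  # "low", "high", or "auto"
                    },
                },
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
}
print(json.dumps(payload, indent=2))
# Send it with, e.g.:
# requests.post("https://api.siliconflow.cn/v1/chat/completions",
#               headers={"Authorization": f"Bearer {API_KEY}"},
#               json=payload)
```

Note that the content list carries only `image_url` and `text` parts; there is no `video_url` part in this format, which is consistent with the billing rules below covering images only.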
4. Billing for visual inputs

Visual inputs such as images are converted into tokens and billed together with text as part of the context. Different models convert visual content differently; below is the current conversion rule.

4.1 Qwen series

Rules:

Qwen supports a maximum resolution of 3584 × 3584 = 12,845,056 pixels and a minimum resolution of 56 × 56 = 3,136 pixels. Each image is first resized so that both sides are multiples of 28, i.e. (h × 28) × (w × 28). If the result falls outside the min/max pixel range, it is further scaled proportionally into that range.

When `detail=low`, all images are resized to 448 × 448, which maps to 256 tokens.

When `detail=high`, the image is scaled proportionally: first round width/height up to the nearest multiple of 28, then scale proportionally into the pixel range (3136, 12845056) while keeping both sides as multiples of 28.

Examples:

For images sized 224 × 448, 1024 × 1024, and 3172 × 4096, choosing `detail=low` always costs 256 tokens.

For 224 × 448 with `detail=high`: it is within the pixel range and both sides are multiples of 28, so the cost is (224/28) × (448/28) = 8 × 16 = 128 tokens.

For 1024 × 1024 with `detail=high`: round up to 1036 × 1036 (the nearest multiples of 28), which is within range, so the cost is (1036/28) × (1036/28) = 1369 tokens.

For 3172 × 4096 with `detail=high`: round up to 3192 × 4116, which exceeds the maximum pixel count, then scale proportionally down to 3136 × 4060, so the cost is (3136/28) × (4060/28) = 16240 tokens.

The official API docs do not mention video handling, which indicates that the core issue is not Dify itself, but that SiliconFlow does not support video processing. This also explains why the user mentioned doing manual frame extraction.
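As a side note, the token-billing rules quoted above can be sketched in code. This is an illustrative reimplementation of the stated rules, not an official SDK function; it reproduces the worked examples from the docs:

```python
import math

MAX_PIXELS = 3584 * 3584  # 12,845,056
MIN_PIXELS = 56 * 56      # 3,136

def qwen_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate the token cost of one image under the Qwen billing rules
    quoted above. Illustrative only, not an official SDK function."""
    if detail == "low":
        return 256  # all images are resized to 448 x 448 -> 256 tokens
    # Round both sides up to the nearest multiple of 28.
    w = math.ceil(width / 28) * 28
    h = math.ceil(height / 28) * 28
    # If outside the pixel range, scale proportionally back into it,
    # keeping both sides as multiples of 28.
    if w * h > MAX_PIXELS:
        scale = math.sqrt(MAX_PIXELS / (w * h))
        w = math.floor(w * scale / 28) * 28
        h = math.floor(h * scale / 28) * 28
    elif w * h < MIN_PIXELS:
        scale = math.sqrt(MIN_PIXELS / (w * h))
        w = math.ceil(w * scale / 28) * 28
        h = math.ceil(h * scale / 28) * 28
    # One token per 28 x 28 patch.
    return (w // 28) * (h // 28)

print(qwen_image_tokens(224, 448))    # -> 128
print(qwen_image_tokens(1024, 1024))  # -> 1369
print(qwen_image_tokens(3172, 4096))  # -> 16240
```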
Solution

- Wait for SiliconFlow to officially support video/stream inputs, and notify the Dify maintainers (and the SiliconFlow plugin author) to update the plugin accordingly.
- Alternatively, switch to the Dify Tongyi plugin and use an Alibaba Cloud Bailian (Model Studio) API key.
