Where is the Doc Extractor?

r3-yamauchi · December 17, 2025, 12:39am

I am trying to build a multimodal RAG using nova-2-multimodal-embeddings-v1:0.
When I processed a PDF file containing both images and text with Dify Extractor, I was only able to obtain textual information.

Based on the following documentation, I believe that I need to use Doc Extractor instead of Dify Extractor:
https://docs.dify.ai/en/use-dify/knowledge/knowledge-pipeline/knowledge-pipeline-orchestration#doc-extractor

However, I cannot find Doc Extractor in my environment at https://cloud.dify.ai/.
Where is this Data Processing Tool located?

ninesunsabiu · December 17, 2025, 3:01am

r3-yamauchi · December 17, 2025, 5:23am

Thank you. I was able to find it by changing the language setting to English.

However, even with Doc Extractor, only the text in the PDF appears to be extracted.
I don’t know how to turn a document that contains a mix of images and text into a knowledge base while preserving its meaning.

With Amazon Bedrock Knowledge Bases, you can specify a “foundation model as a parser” in the “parsing strategy,” but…

ninesunsabiu · December 17, 2025, 2:46pm

According to the documentation, the Dify Extractor plugin is what you need. I haven’t used it yet, but I’ll give it a try later.
