Where is the Document Extractor?

r3-yamauchi · December 17, 2025, 12:39 AM

I am trying to build a multimodal RAG using nova-2-multimodal-embeddings-v1:0.
When I processed a PDF file containing both images and text with Dify Extractor, only the text was extracted.

Based on the following documentation, I believe I need to use Doc Extractor rather than Dify Extractor:
https://docs.dify.ai/en/use-dify/knowledge/knowledge-pipeline/knowledge-pipeline-orchestration#doc-extractor

However, I cannot find Doc Extractor in the https://cloud.dify.ai/ environment.
Where is this data processing tool located?

ninesunsabiu · December 17, 2025, 3:01 AM

r3-yamauchi · December 17, 2025, 5:23 AM

Thank you. I was able to find it by changing the language setting to English.

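However, even with this tool, it seems that only the text contained in the PDF can be extracted. I don't know how to convert a document that mixes images and text into a knowledge base while preserving its meaning.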

In Amazon Bedrock Knowledge Bases, you can specify a foundation model as the parser under "Parsing strategy", but…

ninesunsabiu · December 17, 2025, 2:46 PM

According to the documentation, the Dify extractor plugin is required. I haven't used it yet, but I'll try it later.
