Does Dify have a workflow or plugin that can convert Word files (doc and docx) to PDF, or parse Word files (doc and docx) directly into page-by-page output such as "Page 1: Content. Page 2: Content."?
Yes, there is. File conversion can be done with Markdown Exporter; for content parsing, use the built-in Doc Extractor node. If your PDF is image-based (scanned), you can try the Unstructured Plugin, or search for OCR plugins such as OCR recognition service. Related post: Uploaded a file and a question together in a workflow, but the LLM says during its reasoning that it received no uploaded document - #12 by lysonober
@Dify_Forum_Helper, please summarize the Q&A.
Okay, here is the Q&A summary for this topic:
Question
The user asks if there is a Dify workflow or plugin that can achieve the following functions:
- Convert Word documents (doc/docx) to PDF.
- Directly parse Word document content, preferably extracting it page by page (e.g., “Page 1: Content”).
Answer
- Direct Content Parsing (Core Solution):
  - Use the built-in Doc Extractor node in the Dify workflow. This is the most direct method for parsing document content.
- File Format Conversion:
  - You can use the Markdown Exporter plugin from the Dify plugin marketplace to handle file conversion requirements.
- Handling Image/Scanned PDFs:
  - If the parsed file is an image-based PDF, it is recommended to use the Unstructured Plugin.
  - Alternatively, search for OCR-related plugins in the plugin marketplace, such as OCR recognition service.
Related Resources:
- Plugin Marketplace: Search for “OCR” or check the “Tools” category.
- Built-in Nodes: Look for Doc Extractor in the workflow editor.
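For the "Page 1: Content. Page 2: Content." output format, one possible approach is to post-process the extracted text in a Code node. The following is only a minimal sketch under stated assumptions: it assumes the extracted text separates pages with a form-feed character (`\f`), which the thread does not confirm for Doc Extractor output, and it assumes the Code node calls a `main` function that returns a dict. The helper name `label_pages` is hypothetical. Note that Word files do not store fixed page boundaries, so page markers may only be available after converting to PDF first.

```python
def label_pages(text: str, page_break: str = "\f") -> str:
    """Split text on a page-break marker and prefix each page with 'Page N: '.

    Assumption: `page_break` (form feed by default) marks page boundaries
    in the extracted text. Adjust it to whatever marker your extractor emits.
    """
    # Split on the marker and trim surrounding whitespace from each page.
    pages = [page.strip() for page in text.split(page_break)]
    # Drop pages that are empty after trimming.
    pages = [page for page in pages if page]
    # Number the remaining pages starting at 1.
    return "\n".join(
        f"Page {number}: {page}" for number, page in enumerate(pages, start=1)
    )


def main(text: str) -> dict:
    # Entry point in the shape assumed for a Code node:
    # takes the extracted text and returns a dict of output variables.
    return {"result": label_pages(text)}
```

If the extracted text contains no page-break marker at all, this sketch returns the whole document as a single "Page 1" entry, which is a quick way to check whether page information survived extraction.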
Thank you very much for your answer. It's very detailed, and I'll give it a try.
You can try PaddleOCR as well.