Question
The user asks if there is a Dify workflow or plugin that can achieve the following functions:
- Convert Word documents (doc/docx) to PDF.
- Directly parse Word document content, preferably extracting it page by page (e.g., “Page 1: Content”).
Answer
- Direct Content Parsing (Core Solution):
  - Use the built-in Doc Extractor node in the Dify workflow. This is the most direct way to parse document content.
- File Format Conversion:
  - Use the Markdown Exporter plugin from the Dify plugin marketplace to handle file conversion.
- Handling Image/Scanned PDFs:
  - If the file to parse is an image-based (scanned) PDF, the Unstructured plugin is recommended.
  - Alternatively, search the plugin marketplace for OCR-related plugins, such as an OCR recognition service.
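
For the "Page 1: Content" output requested in the question, a Code node placed after the Doc Extractor could label the extracted text. The following is only a minimal sketch: it assumes the extracted text reaches the Code node as a single string and that page boundaries appear as form-feed characters ("\f"). Neither is confirmed for Doc Extractor output, and Word files in particular may carry no page markers at all. The helper name label_pages and the output key result are hypothetical.

```python
from typing import List


def label_pages(text: str, page_sep: str = "\f") -> List[str]:
    """Split text on a page separator and prefix each non-empty page.

    ASSUMPTION: pages are delimited by `page_sep` (form feed by default).
    Blank pages are dropped but still counted, so later pages keep
    their original page numbers.
    """
    pages = [page.strip() for page in text.split(page_sep)]
    return [f"Page {n}: {page}" for n, page in enumerate(pages, start=1) if page]


def main(text: str) -> dict:
    # Entry point in the shape a Dify Code node expects: a function named
    # `main` that returns a dict of output variables.
    # The key "result" is a hypothetical output variable name.
    return {"result": "\n".join(label_pages(text))}
```

If the extracted text contains no page separators, the whole document comes back as a single "Page 1" entry, which is a quick way to check whether the assumption holds for a given file.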
Related Resources:
- Plugin Marketplace: Search for “OCR” or check the “Tools” category.
- Built-in Nodes: Look for Doc Extractor in the workflow editor.