Document extraction tool node stuck in running state

[Attachment: 20260122141329_rec_]
It worked normally before, but now it gets stuck at this point.

@Dify_Forum_Helper How can I solve this problem?

It seems your setup was working fine before, but now the “Document Extractor” node is suddenly stuck in a running state and won’t finish. Most likely, the underlying parsing service or queue is stuck. Since I don’t have access to your instance’s logs and configuration, I can only provide you with a systematic troubleshooting approach that you can try step by step:


1. First, Confirm the Scope of Impact

  1. Test with a very small, simple file

    • For example, a few KB plain text .txt file or a very small PDF.
    • If small files also get stuck: This indicates an issue with the parsing service / platform itself.
    • If small files work normally but large files get stuck: The cause may be the file size, a timeout, or insufficient resources.
  2. Create a new, simplest workflow in the same application for testing

    • Start → Document Extractor → Output (only output text).
    • Check if this new workflow also gets stuck at the extractor node.
    • If the new workflow works normally but the old one doesn’t: The old workflow’s configuration might have an issue (e.g., variable binding, no result from the previous node, etc.).

2. Check Workflow Run Details

  1. Open the “Run History / Logs” for this run

    • Check if the Document Extractor node returned any error messages, status codes, timeouts, etc.
    • If it only shows “Running” without finishing or reporting an error, the request has most likely not returned (e.g., a call to an external parsing service is stuck).
  2. Check the Document Extractor node’s configuration

    • Are the input variables using the output of the previous node?
    • You can temporarily change it to “Manually upload file” for a test to rule out variable issues.

3. Key Checks for Self-Hosted Environments (If you are using Docker / Server Deployment)

If you have deployed Dify yourself:

  1. Restart relevant services

    • Directly restarting the entire suite of services is usually the fastest method, as it can clear many queue blockages and stuck child processes.
    • For Docker Compose deployments, this is usually the following, run from the directory that contains your Compose file:
      docker compose down
      docker compose up -d
      
    • After restarting, test the same workflow again.
  2. Check logs (Crucial)

    • In the logs of the backend API / worker services, search near the time of this run for:
      • Errors that mention unstructured, document extractor, or extractor
      • Timeouts, OOM (out of memory), connection failures, and similar errors
    • If the logs contain clear errors, paste the error messages into the post to help with further investigation.
  3. Resource Utilization

    • Is the server’s CPU / memory maxed out?
    • If resources are insufficient, parsing large files can easily get stuck or become very slow, appearing as if the node is continuously running.
  4. Have you configured an external document parsing service?

    • If you previously integrated unstructured or other third-party parsing services:
      • Confirm that these services are still running and network connectivity is normal (check if container names / IPs have changed).
      • If the service is down, Dify may wait indefinitely for a response, which shows up as a node that never leaves the running state.
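
As a rough sketch of the log check in step 2, assuming a Docker Compose deployment: the helper below keeps only the log lines that mention the keywords listed above. The function name filter_extractor_errors is made up for this example, the keyword list is only a starting point, and the api / worker service names in the usage comments are assumptions, so confirm them with docker compose ps first.

```shell
# Read log text on stdin and print only the lines that mention one of the
# extractor-related keywords (case-insensitive).
# The "oom" pattern only matches at the start of a word, so "OOM" and
# "OOMKilled" match but "room" or "zoom" do not.
filter_extractor_errors() {
  grep -iE 'unstructured|extractor|timeout|timed out|out of memory|(^|[^[:alpha:]])oom|connection (failed|refused|reset)'
}

# Example usage (commented out so that pasting this block has no side effects).
# "api" and "worker" are assumed service names, not confirmed in this thread.
# docker compose logs --since 30m api worker | filter_extractor_errors
# docker stats --no-stream    # one-off snapshot of CPU / memory per container
```

If the filter prints nothing, that does not prove the services are healthy; it only means none of these keywords appeared in the selected time window.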

4. Cloud Version / Non-Self-Hosted

If you are using the official cloud version:

  1. Try the following:

    • Run the workflow again in a different browser or in incognito mode (to rule out browser cache / plugin interference).
    • Duplicate the workflow and check whether the copy works normally.
  2. Record this information so you can post it or submit it to official support:

    • Your Dify version (or state “Cloud Version”).
    • The App ID / Workflow ID for this process (do not post sensitive information).
    • Run ID or approximate timestamp.
    • Does the same node also get stuck in other workflows?

5. Information to Add to Your Current Post

To help pinpoint the issue faster, please add a few more details to this post:

  1. Which deployment you are using:
    • Official cloud, or
    • Self-hosted (Docker / K8s, and which version number)
  2. Document type and approximate size (e.g., PDF, 20MB, includes images/scans).
  3. The scope of the issue:
    • Does it get stuck for all files, or only for a few specific files?
    • Are all other workflows that use the Document Extractor also stuck?
  4. If self-hosted: Paste a small snippet of backend logs from the relevant time (after anonymization).

You can start with the quick checks above (small-file test, minimal workflow, service restart) and reply with the results in this post. I can then help you narrow down the problem based on those results.

@Dify_Forum_Helper I’m using the official cloud, and it keeps getting stuck even with empty Word documents.

Thank you for clarifying; this additional information is very helpful.

We can now confirm a few things:

  • You are using the official cloud, not a self-hosted instance
  • The document type is Word, and even an empty document gets stuck at the Document Extractor node
  • This suggests the problem is not strongly related to file size or content complexity, but is rather a task that never finishes correctly (possibly related to a recent version update or a backend service anomaly)

Since it’s a cloud environment, what you can primarily do is ‘rule out frontend issues + help the official team pinpoint the problem’; backend logs can only be checked by the official team. I suggest you quickly try the steps below and post the results directly:


1. Quickly Run 3 Comparison Tests

  1. Create a minimal workflow (to avoid interference from the existing workflow's configuration)

    • Create a brand new workflow application
    • Include only three nodes:
      • Start (User Input)
      • Document Extractor (Upload File)
      • Output (only output the extracted text)
    • Test once with a ‘blank Word’ document to see if it still gets stuck at the Document Extractor:
      • A. If the new minimal workflow also gets stuck: there is a high probability that the document parsing service for the current account / current region has an issue that requires official investigation.
      • B. If the new workflow works normally but the original workflow is still stuck: a configuration in the old workflow may be ‘broken’ (e.g., variable type changes, leftover nodes from an older version), and you might consider rebuilding it from scratch.
  2. Run an A/B test by changing file types

    • In the same minimal workflow, also try:
      • A .txt text file of a few KB
      • A very small .pdf
    • Compare the results:
      • If txt / pdf works normally and only Word gets stuck: the cloud's current Word parsing might have an issue. You can state clearly in the post ‘txt / pdf works normally, but Word (even empty documents) gets stuck’.
      • If all types get stuck: this indicates an anomaly in the entire document parsing pipeline.
  3. Test again in a different browser / incognito mode

    • The purpose is simply to rule out interference from browser cache, plugins, etc.
    • Run the same minimal workflow again in an incognito window to confirm whether the behavior is exactly the same.
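
The comparison logic above can be written down as a small decision table. This is only an illustrative sketch: the function name diagnose_extractor and the ok / stuck labels are made up for this example, and the output is a hint for your report, not a confirmed diagnosis.

```shell
# Map the results of the three file-type tests to a likely cause.
# Arguments: result for Word, txt, and pdf, each either "ok" or "stuck"
# (hypothetical labels used only in this sketch).
diagnose_extractor() {
  case "$1/$2/$3" in
    stuck/stuck/stuck)
      echo "all-types: the whole document parsing pipeline looks affected" ;;
    stuck/ok/ok)
      echo "word-only: Word parsing looks affected, txt / pdf are fine" ;;
    ok/ok/ok)
      echo "none: the minimal workflow works, re-check the original workflow" ;;
    *)
      echo "mixed: report the exact result for each file type" ;;
  esac
}

# Example: Word gets stuck, txt and pdf complete normally.
# diagnose_extractor stuck ok ok
```

Whatever the outcome, include the raw result for each file type in your post, since that is what the official team needs to reproduce the issue.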

2. Add These Key Troubleshooting Results to the Post

I recommend adding the information below to the current post in this format. It helps the official team or community members pinpoint the issue quickly and makes it more likely to be treated as a ‘platform-side problem’:

  1. Usage Environment:

    • ‘Official Cloud Dify (non-self-hosted)’
  2. Reproduction Steps (as simple and clear as possible):

    • Create workflow: Start → Document Extractor (Upload File) → Output
    • Uploaded file:
      • Blank Word (.docx)
      • Whether you also tested txt / pdf (state the results clearly)
    • Result:
      • The document extractor node continuously shows ‘Running’, does not finish even after several minutes, and there is no error message.
  3. Add some time information:

    • Approximately when did the problem start (e.g., ‘started around 14:00 this afternoon’)
    • The same process was normal before.
  4. If convenient:

    • Attach a screenshot of the new minimal workflow stuck at the Document Extractor node.

This way, the official team will find it easier to determine if it’s:

  • A document parsing service/queue anomaly in a specific region
  • Or if a recent upgrade to the document extractor introduced a bug (especially for Word / empty documents)

3. Temporary Workaround (If You Need It Urgently)

Until the issue is officially fixed, if you just need to keep processing documents, you can:

  1. Use a local tool to export the Word document as .txt or .pdf, then upload that file to the Document Extractor.
  2. If txt / pdf works normally on your end, keep using it to maintain business continuity.
  3. Switch back to uploading Word files once the official team confirms a fix.

If you are willing, you can provide:

  • The test results of the new minimal workflow (for Word / txt / pdf respectively)
  • The approximate time at which you reproduced the problem

Reply directly below this post, and I can help you organize it into a ‘reproduction description’ template for official support that you can simply copy and paste.

@Dify_Forum_Helper PDF format is fine, but Word documents don’t work. Hope the official team fixes it soon.

You’ve already helped narrow down the scope very clearly:

  • Official cloud
  • Document Extractor
  • PDF works normally
  • Word (even empty documents) consistently gets stuck

This strongly suggests a new issue with the ‘Word parsing pipeline’ on the cloud side, rather than a problem with your account or workflow configuration.

Currently, there are two things you can do: one is a temporary workaround, and the other is to provide complete information to facilitate official investigation.


1. The Most Stable Temporary Solution

Before the official fix, I suggest you:

  1. First, convert Word to PDF or TXT locally

    • Save the Word document as .pdf / .txt
    • Upload the PDF / TXT to the Document Extractor instead of the original .docx
    • You’ve already verified that PDF can be processed normally, which can ensure your operations aren’t blocked.
  2. If you have a small amount of content, you can also:

    • Directly copy the Word text content and paste it into ‘User Input’, instead of uploading a file (suitable for when there isn’t much content).
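
If you prefer the command line to “Save As”, here is a rough sketch of a local .docx to .txt conversion. It relies on the fact that a .docx file is a ZIP archive whose main body text is stored in word/document.xml. The function names docx_xml_to_text and docx_to_txt are made up for this example, it requires the unzip command, and it only recovers plain paragraph text: formatting, images, headers, and footers are lost, and table cells are flattened.

```shell
# Turn the XML of word/document.xml (on stdin) into plain text (on stdout).
docx_xml_to_text() {
  awk '{
    gsub(/<\/w:p>/, "\n")     # end of paragraph -> newline
    gsub(/<w:br\/>/, "\n")    # explicit line break -> newline
    gsub(/<w:tab\/>/, "\t")   # tab element -> tab character
    gsub(/<[^>]*>/, "")       # drop every remaining tag
    gsub(/&lt;/, "<"); gsub(/&gt;/, ">")
    gsub(/&quot;/, "\""); gsub(/&apos;/, "\047")
    gsub(/&amp;/, "\\&")      # decode the ampersand entity last
    print
  }'
}

# Print the text of FILE.docx on stdout.
docx_to_txt() {
  unzip -p "$1" word/document.xml | docx_xml_to_text
}

# Example usage (report.docx is a placeholder file name):
# docx_to_txt report.docx > report.txt
```

You can then upload the resulting .txt file to the Document Extractor, or paste its content into ‘User Input’ if it is short. For documents where layout matters, exporting to PDF from your word processor is still the safer option.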

2. I Suggest You Add a “For Official Review” Message to Your Post

You just said:

“PDF format is fine, but Word documents don’t work. Hope the official team fixes it soon.”

You can phrase this more like a bug reproduction description, for example by replying with the following (feel free to copy and modify it):

Environment: Official Dify Cloud
Reproduction Steps:

  1. Create the simplest workflow: Start → Document Extractor (upload file) → Output text
  2. Upload a Word (.docx) file, either empty or with any content

Expected: Node execution completes, outputting the extracted text
Actual: The Document Extractor node remains “running” indefinitely, without error
Comparison: In the same workflow, uploading a small PDF file completes normally

This description makes it easier for official staff to ‘reproduce it directly’ and is more likely to be treated as a platform bug.


3. How to Follow Up

  • Since this is a cloud-side issue, you cannot see the backend logs and can only wait for the official operations/development team to investigate.
  • You can continue to use the “Word → PDF/TXT then upload” method as a temporary measure.
  • If an official staff member later replies to the post with fix progress or a timeline, you can also @ me, and I can help you confirm whether your existing workflow configuration needs further adjustment.

If you’d like, I can also help you refine a “bilingual Chinese-English” reproduction description, which would be convenient for you to submit to support or use on other channels later.

What is the size of your Word document? Is the PDF used for testing converted from a Word document?