Knowledge database ingestion errors

When I upload documents (with the "High Quality" option checked, so they are sent to the vector database for embedding), I see the following log:

worker-1  | 2026-02-02 06:36:37.363 INFO [MainThread] [strategy.py:161]  - Task tasks.batch_clean_document_task.batch_clean_document_task[1db775c8-3563-44f8-97cc-3e87a320905c] received
worker-1  | 2026-02-02 06:36:37.385 INFO [Dummy-4896] [trace.py:128] 05fe11d48b515a4d87099be1d6831c9a - Task schedule.workflow_schedule_task.poll_workflow_schedules[16015282-0b0e-4136-ac8c-1ebcb866899d] succeeded in 0.03709391795564443s: None
worker-1  | 2026-02-02 06:36:37.386 INFO [Dummy-4897] [batch_clean_document_task.py:29] f2b1ea40dba5564fae6244aa07e98468 - Start batch clean documents when documents deleted
worker-1  | 2026-02-02 06:36:37.387 ERROR [Dummy-4897] [batch_clean_document_task.py:90] f2b1ea40dba5564fae6244aa07e98468 - Cleaned documents when documents deleted failed
worker-1  | Traceback (most recent call last):
worker-1  |   File "/app/api/tasks/batch_clean_document_task.py", line 38, in batch_clean_document_task
worker-1  |     raise Exception("Document has no dataset")
worker-1  | Exception: Document has no dataset
worker-1  | 2026-02-02 06:36:37.407 INFO [Dummy-4897] [trace.py:128] f2b1ea40dba5564fae6244aa07e98468 - Task tasks.batch_clean_document_task.batch_clean_document_task[1db775c8-3563-44f8-97cc-3e87a320905c] succeeded in 0.0212203289847821s: None
worker-1  | 2026-02-02 06:36:37.747 INFO [MainThread] [strategy.py:161]  - Task schedule.check_upgradable_plugin_task.check_upgradable_plugin_task[bc716ba9-f333-4ae8-b09c-359a474ea359] received
worker-1  | 2026-02-02 06:36:37.748 WARNING [Dummy-4898] [log.py:232] 1431caf35b645ad6990f827746a667e4 - Start check upgradable plugin.
worker-1  | 2026-02-02 06:36:37.748 WARNING [Dummy-4898] [log.py:232] 1431caf35b645ad6990f827746a667e4 - Now seconds of day: 23767.74876189232
worker-1  | 2026-02-02 06:36:37.752 WARNING [Dummy-4898] [log.py:232] 1431caf35b645ad6990f827746a667e4 - Total strategies: 0
worker-1  | 2026-02-02 06:36:37.752 WARNING [Dummy-4898] [log.py:232] 1431caf35b645ad6990f827746a667e4 - Checked upgradable plugin success latency: 0.003996776009444147
worker-1  | 2026-02-02 06:36:37.772 INFO [Dummy-4898] [trace.py:128] 1431caf35b645ad6990f827746a667e4 - Task schedule.check_upgradable_plugin_task.check_upgradable_plugin_task[bc716ba9-f333-4ae8-b09c-359a474ea359] succeeded in 0.023636055004317313s: None
worker-1  | 2026-02-02 06:36:39.749 INFO [MainThread] [strategy.py:161]  - Task tasks.document_indexing_task.priority_document_indexing_task[7fcd950d-d867-4bf8-9f59-ae27d602c98b] received
worker-1  | 2026-02-02 06:36:39.750 INFO [Dummy-4899] [document_indexing_task.py:173] a30e62a6aea75c8d944147278c72cc6e - priority document indexing task received: 91c6a589-acea-47d0-bc8b-9098cb994971 - e93b6695-d31b-4de3-863e-ac0ada45e13a - ['9b99a16f-8d9b-4236-a318-c1b65ac413da', 'a51ca586-83df-423d-aac2-1b5b05477ae4', 'cde11e47-4bc8-42b8-9d64-0b01105cea50', '3d6f67a6-8964-480c-9c10-8e0b090e9ecb', '75991a21-6cf1-41d8-b6d4-21c0423a1291', '6f4ee1b6-5ec6-4daf-90bc-3109b492497b', 'f1f87fd0-b356-4fdd-9af0-ec2a648e3394', 'ee27d1e7-367c-4d22-a97e-d0eb9e79aa40', 'e0c780f7-f52a-45a2-91f2-6f4b053c42b8', '206dc7f2-15af-4b78-bb8e-4151acdcdbef']
worker-1  | 2026-02-02 06:36:39.751 INFO [Dummy-4899] [document_indexing_task.py:51] a30e62a6aea75c8d944147278c72cc6e - Dataset is not found: e93b6695-d31b-4de3-863e-ac0ada45e13a
worker-1  | 2026-02-02 06:36:39.752 INFO [Dummy-4899] [document_indexing_task.py:131] a30e62a6aea75c8d944147278c72cc6e - document indexing tenant isolation queue 91c6a589-acea-47d0-bc8b-9098cb994971 next tasks: []
worker-1  | 2026-02-02 06:36:39.771 INFO [Dummy-4899] [trace.py:128] a30e62a6aea75c8d944147278c72cc6e - Task tasks.document_indexing_task.priority_document_indexing_task[7fcd950d-d867-4bf8-9f59-ae27d602c98b] succeeded in 0.021346897003240883s: None
worker-1  | 2026-02-02 06:36:41.751 INFO [MainThread] [strategy.py:161]  - Task schedule.workflow_schedule_task.poll_workflow_schedules[478e2abd-1cae-47f5-ae26-533cf9110d7b] received

As you can see, the embedding itself is doing fine; then there is one error saying the dataset is not found, after which it proceeds to the next embedding.

Once the whole process is done, one of the documents gets stuck in pending status with 0 bytes.

Why is this happening? I am using Dify community v1.11.3. Thanks.

Perhaps knowledge base usage was too high at the time, causing a backlog of tasks. Is this still a problem now?

Hi @Sherry_M, I received an official response from the Dify support team:

The error you’re seeing is not related to your knowledge ingestion process. The message “Document has no dataset” comes from batch_clean_document_task, which is a background cleanup task. This task runs when previously deleted documents still have residual data that needs to be removed.
In this case, the associated dataset no longer exists, so the cleanup task reports an error. This does not affect your knowledge ingestion or embedding process in any way—as you observed, embeddings continue to proceed normally after this message. You can safely ignore it.
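For anyone curious why the worker keeps going after the error: the traceback above shows the task raising when the dataset lookup comes back empty, and the task wrapper then catches and logs it, which is why Celery still reports the task as "succeeded". Here is a minimal sketch of that failure mode; the function names `find_dataset` and `batch_clean_documents` are hypothetical, for illustration only, not Dify's actual code.

```python
def find_dataset(datasets, dataset_id):
    """Return the dataset record, or None if it was already deleted."""
    return datasets.get(dataset_id)

def batch_clean_documents(datasets, dataset_id, document_ids):
    """Sketch of a cleanup task for documents of a (possibly deleted) dataset."""
    dataset = find_dataset(datasets, dataset_id)
    if dataset is None:
        # Mirrors the raise at batch_clean_document_task.py:38 in the
        # traceback above; the surrounding task catches this, logs
        # "Cleaned documents when documents deleted failed", and moves on.
        raise Exception("Document has no dataset")
    # Otherwise, remove residual data (segments, vectors, files) per document.
    return [f"cleaned:{doc_id}" for doc_id in document_ids]

datasets = {"ds-1": {"name": "manuals"}}
print(batch_clean_documents(datasets, "ds-1", ["doc-a"]))  # ['cleaned:doc-a']
try:
    batch_clean_documents(datasets, "ds-missing", ["doc-b"])
except Exception as exc:
    print(exc)  # Document has no dataset
```

This matches what the support team describes: the exception is purely a cleanup-path error for an already deleted dataset, not a failure in the ingestion pipeline.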

I will mark this thread as resolved. Thanks.