I am trying to implement a tool/plugin that uses DuckDB to execute SQL queries (SELECT statements) on text data.
When I ran python -m main in my local environment to debug the plugin, the query tool worked, as shown in the attached image.
However, when I packaged this plugin and uploaded/installed it on my local OSS version or on https://cloud.dify.ai/, the workflow terminates after the DuckDB node takes about five minutes to process.
I suspect that the large binary size of DuckDB, or a similar factor, may be contributing to this issue.
Could you tell me about the limitations on Python libraries that can be used inside a tool/plugin (sandbox)?
In our cloud version, most libraries are not installed by default. For security reasons, actions such as database connections and HTTP connections are also disabled by default. Please keep in mind that we use the default ssrf_proxy template.
Thank you for your comment.
Is my understanding correct that the cloud version has limitations on which Python libraries can be used, and that plugins cannot bypass those restrictions?
If I need to build a workflow that uses libraries unavailable to plugins, would it be best practice to implement an API-based extension outside of Dify—for example, by using Cloudflare Workers—instead of using a plugin?
In the self-hosted (non-cloud) version of Dify, is it possible to avoid these restrictions by modifying the ssrf_proxy settings or through other configuration changes?
Is my understanding correct that the cloud version has limitations on which Python libraries can be used, and that plugins cannot bypass those restrictions?
We do not restrict specific libraries in our cloud version. The restrictions are configured in squid.conf and in the runner (which is an AWS Lambda container and may have its own restrictions as well).
If I need to build a workflow that uses libraries unavailable to plugins, would it be best practice to implement an API-based extension outside of Dify—for example, by using Cloudflare Workers—instead of using a plugin?
Unfortunately, yes.
In the self-hosted (non-cloud) version of Dify, is it possible to avoid these restrictions by modifying the ssrf_proxy settings or through other configuration changes?
Yes. Most libraries need specific ports to connect to other services such as databases, Elasticsearch, etc.
DuckDB (https://duckdb.org/) is merely an engine for executing queries (SELECT statements) against text data, so it does not need to connect to any external services nor communicate over specific ports.
Regarding the sandbox, you can read the docs above.
The sandbox already allows these syscalls: 0,1,3,8,9,10,11,12,13,14,15,16,16,24,25,35,39,60,96,102,105,106,110,131,186,201,202,217,228,230,231,233,234,257,262,270,273,291,318,334. You need to work out which extra syscall numbers from the previous step are not in that list. You can use a simple script or ask an LLM to do that. In this case, they are 5, 17, 28, 63, 204, 237, 281, 435.
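The comparison described above amounts to a set difference. Here is a minimal Python sketch of such a script. The allowed list is copied from this reply. The observed list is only a hypothetical sample, since the full output of the previous step is not shown in this thread, and the function name extra_syscalls is made up for illustration.

```python
# Syscall numbers the sandbox already allows (copied from the reply above).
ALLOWED = {
    0, 1, 3, 8, 9, 10, 11, 12, 13, 14, 15, 16, 24, 25, 35, 39, 60, 96,
    102, 105, 106, 110, 131, 186, 201, 202, 217, 228, 230, 231, 233, 234,
    257, 262, 270, 273, 291, 318, 334,
}


def extra_syscalls(observed, allowed=ALLOWED):
    """Return the sorted syscall numbers in `observed` that are not allowed."""
    return sorted(set(observed) - set(allowed))


# ASSUMPTION: a hypothetical sample of syscall numbers collected from the
# previous step. Replace it with the numbers your own run reports.
observed = [0, 1, 3, 5, 9, 17, 28, 63, 204, 237, 281, 435]

if __name__ == "__main__":
    print(extra_syscalls(observed))
```

With this sample input, the script reports exactly the eight extra numbers listed above.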
Thank you.
No errors such as “xxx.so: cannot open shared object file: No such file or directory” or “Operation not permitted” occur; it just waits for five minutes and then exits, so I’m not sure whether it’s actually violating any system call restrictions.
However, since the issue only happens after packaging — the debug run works fine but the packaged version makes me wait five minutes — I’ll look into whether there might be another cause.
Maybe it’s related to PLUGIN_MAX_EXECUTION_TIMEOUT, but that is set to 600, which does not match the five minutes you are seeing. I’m not sure what this value is in our cloud environment, but it’s possible that if the results are not returned in time, the execution is terminated by this setting.