How can we pass “instructions” to the TTS model (like gpt-4o-mini-tts) in Dify?

Question:

Hi everyone,

I’m using the Text-to-Speech (TTS) block in Dify with the gpt-4o-mini-tts model from OpenAI.
In the OpenAI API or Python SDK, we can include an additional field called instructions to control the tone, style, or mood of the generated audio. For example:

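from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Stream the generated speech straight to a file
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",
    input="Today is a wonderful day to build something people love!",
    instructions="Speak in a cheerful and positive tone.",
) as response:
    response.stream_to_file("speech.mp3")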

However, in Dify’s TTS block settings, I don’t see any option to add instructions — only model, voice, and input text fields.

Is there a way in Dify to pass the same kind of instructions parameter (e.g., “Speak in a calm and professional tone”) to the TTS model?
Or do we need to use a workaround, like embedding the tone directly in the input text or using a custom HTTP block to call the OpenAI TTS endpoint?

Would really appreciate any guidance or examples from others who’ve implemented tone/style control in TTS within Dify workflows.

@Kirtan_Bhad
Unfortunately, this is currently a known limitation of the built-in TTS node.

As a workaround, you can use the Podcast Generator plugin, which supports the Instructions feature (I added it for exactly this purpose :) ). Although the plugin is designed to generate two-speaker, conversation-style audio, you can get a single voice by passing in a one-line script and filling the Voice 2 parameters with dummy values.

Of course, you can also use the HTTP node to call the OpenAI API directly.
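For reference, here is a minimal sketch of the request the HTTP node would need to send, written in Python with only the standard library. The endpoint, headers, and JSON fields follow OpenAI's speech API; the helper names `build_tts_request` and `save_speech` are hypothetical, and how you map these values onto the HTTP node's fields in Dify is left as an assumption, since the node's settings are not shown in this thread.

```python
import json
import urllib.request

OPENAI_SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_tts_request(api_key, text, instructions, voice="coral"):
    """Build the POST request for the speech endpoint (hypothetical helper)."""
    payload = {
        "model": "gpt-4o-mini-tts",
        "voice": voice,
        "input": text,
        # The field the built-in TTS node does not expose.
        "instructions": instructions,
    }
    return urllib.request.Request(
        OPENAI_SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_speech(request, out_path):
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

To try it outside Dify, build a request with your API key, the text, and an instruction such as "Speak in a calm and professional tone.", then pass it to `save_speech` with an output path like `speech.mp3`. In the HTTP node, the same URL, the two headers, and the JSON body are what matter.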

Hope this helps.
