Question:
Hi everyone,
I’m using the Text-to-Speech (TTS) block in Dify with the gpt-4o-mini-tts model from OpenAI.
When calling the OpenAI API directly or through the Python SDK, we can pass an extra instructions parameter to control the tone, style, or mood of the generated audio. For example:
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Stream the generated speech to a file
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",
    input="Today is a wonderful day to build something people love!",
    instructions="Speak in a cheerful and positive tone.",
) as response:
    response.stream_to_file("speech.mp3")
However, in Dify’s TTS block settings, I don’t see any option to add instructions — only model, voice, and input text fields.
Is there a way in Dify to pass the same kind of instructions parameter (e.g., “Speak in a calm and professional tone”) to the TTS model?
Or do we need to use a workaround, like embedding the tone directly in the input text or using a custom HTTP block to call the OpenAI TTS endpoint?
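For context, this is roughly the raw REST call that an HTTP block workaround would have to reproduce. It is just a minimal sketch using the requests library; the endpoint, headers, and body fields are from OpenAI's /v1/audio/speech API, while the output file name and the calm/professional instruction are only placeholders:

import os
import requests

# A Dify HTTP Request block would POST the same JSON body with the same headers.
resp = requests.post(
    "https://api.openai.com/v1/audio/speech",
    headers={
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-4o-mini-tts",
        "voice": "coral",
        "input": "Today is a wonderful day to build something people love!",
        "instructions": "Speak in a calm and professional tone.",
        "response_format": "mp3",
    },
)
resp.raise_for_status()

# The response body is the raw audio bytes
with open("speech.mp3", "wb") as f:
    f.write(resp.content)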
Would really appreciate any guidance or examples from others who’ve implemented tone/style control in TTS within Dify workflows.