Google has released Gemini 3.1 Flash TTS, a new text-to-speech model available through the standard Gemini API using the model ID gemini-3.1-flash-tts-preview. The model is designed to convert text to audio output and can be directed through natural language prompts, allowing users to guide its behavior beyond simple text-to-speech conversion.
The model's distinguishing feature is prompt-based direction of speech synthesis. Unlike traditional TTS systems that simply convert written text to audio with fixed parameters, Gemini 3.1 Flash TTS appears to accept instructions that influence how the speech is generated, offering greater flexibility in voice output characteristics.
The release extends the Gemini family into specialized audio generation. Its availability through the standard Gemini API suggests Google is positioning prompt-directed TTS as a core capability for developers, potentially enabling new applications in accessibility, content creation, and interactive systems where nuanced voice control is needed.
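To make the idea of prompt-directed speech concrete, here is a minimal sketch of what a request might look like. Only the model ID comes from the article. The request body (`responseModalities`, `speechConfig`), the voice name `Kore`, and the 24 kHz 16-bit mono PCM response format are assumptions carried over from the shape documented for earlier Gemini TTS preview models, and may not match this model exactly. The helper names `build_tts_request`, `pcm_to_wav`, and `synthesize` are hypothetical.

```python
import base64
import io
import json
import os
import urllib.request
import wave

MODEL_ID = "gemini-3.1-flash-tts-preview"  # model ID given in the article
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"  # Gemini API REST root


def build_tts_request(text, direction="", voice_name="Kore"):
    """Return (url, body) for a generateContent call that requests audio.

    Assumption: the body mirrors the one used by earlier Gemini TTS previews.
    The natural-language direction is simply prepended to the text to speak.
    """
    direction = direction.strip()
    prompt = f"{direction}: {text}" if direction else text
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                # "Kore" is an assumed voice name, not taken from the article.
                "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice_name}}
            },
        },
    }
    return f"{API_ROOT}/models/{MODEL_ID}:generateContent", body


def pcm_to_wav(pcm, rate=24000, channels=1, sample_width=2):
    """Wrap raw PCM bytes in a WAV container (assumed 24 kHz, 16-bit, mono)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def synthesize(text, direction="", api_key=None):
    """Send the request and return WAV bytes. Needs network access and an API key."""
    url, body = build_tts_request(text, direction)
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": api_key or os.environ["GEMINI_API_KEY"],
        },
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Assumed response shape: base64-encoded PCM in the first part's inlineData.
    encoded = payload["candidates"][0]["content"]["parts"][0]["inlineData"]["data"]
    return pcm_to_wav(base64.b64decode(encoded))
```

Under these assumptions, calling `synthesize("Welcome back.", direction="Say warmly, at a relaxed pace")` would return WAV bytes that can be written straight to a file. The direction travels in the same text prompt as the words to be spoken, which is what distinguishes this style of TTS from systems configured only through fixed parameters.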
Key Takeaways
- Google has released Gemini 3.1 Flash TTS, a new text-to-speech model available through the standard Gemini API using the model ID gemini-3.1-flash-tts-preview.
- The model is designed to convert text to audio output and can be directed through natural language prompts, allowing users to guide its behavior beyond simple text-to-speech conversion.
- The model represents an advancement in controllable audio generation by enabling prompt-based direction of speech synthesis.
- Unlike traditional TTS systems that simply convert written text to audio with fixed parameters, Gemini 3.1 Flash TTS appears to accept instructions that influence how the speech is generated.
Read the full article on Simon Willison