French TTS Voices
French text-to-speech voices with phrase-level prosody
French phonology and prosody
Vowels that travel through the nose
French has nasal vowels[1], /ɛ̃/, /ɑ̃/, and /ɔ̃/, produced with airflow through both the mouth and the nasal cavity. English has no equivalent phonemes. The words "vin," "vent," and "bon" each carry a distinct nasal vowel, and denasalizing it can change the meaning. Combined with tenser articulation and more extreme lip rounding[2] on vowels like /y/ in "tu," French demands a vowel space that English-trained models don't map. Synthesizing these sounds accurately calls for models that run where the audio is rendered, not audio piped across providers that can flatten the nasal-oral distinction in transit.
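One way to make the nasal-oral distinction checkable is to compare the nasal vowels in a target transcription against those in whatever a pipeline produces. The sketch below is a minimal illustration, assuming IPA strings in which nasality is written with the combining tilde (U+0303). The function names and the sample transcriptions are hypothetical and not part of any particular TTS API.

```python
import unicodedata

COMBINING_TILDE = "\u0303"  # IPA nasalization diacritic


def nasal_vowels(ipa: str) -> list[str]:
    """Return each nasalized symbol found in an IPA string, in order."""
    # NFD splits any precomposed letter (such as "õ") into base + tilde,
    # so every nasal vowel is seen as two code points.
    text = unicodedata.normalize("NFD", ipa)
    return [
        text[i - 1] + ch
        for i, ch in enumerate(text)
        if ch == COMBINING_TILDE and i > 0
    ]


def denasalize(ipa: str) -> str:
    """Strip nasalization, simulating a pipeline that flattens it."""
    text = unicodedata.normalize("NFD", ipa)
    return text.replace(COMBINING_TILDE, "")


def nasality_preserved(expected: str, actual: str) -> bool:
    """True when both transcriptions contain the same nasal vowels."""
    return nasal_vowels(expected) == nasal_vowels(actual)


# Illustrative broad transcriptions (assumed, not taken from a lexicon file).
LEXICON = {
    "vin": "v\u025b\u0303",   # /vɛ̃/
    "vent": "v\u0251\u0303",  # /vɑ̃/
    "bon": "b\u0254\u0303",   # /bɔ̃/
}

if __name__ == "__main__":
    for word, ipa in LEXICON.items():
        flattened = denasalize(ipa)
        print(word, nasal_vowels(ipa), nasality_preserved(ipa, flattened))
```

A check like this only inspects transcriptions. It says nothing about the acoustic output, which would need a separate listening test or acoustic measurement.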
Rhythm without a downbeat
English is stress-timed[1]: strong and weak syllables alternate, and unstressed vowels collapse toward schwa[2]. French runs closer to syllable-timed[3], distributing duration more evenly across every syllable. Where English "I don't want to GO" hammers one word and swallows the rest, French "Je ne veux pas y aller" keeps each syllable roughly equal in weight[4]. A TTS system built on English stress-timed assumptions will impose strong-weak patterning that sounds immediately wrong. Keeping rhythm this even is easier when inference is co-located with audio processing, with no network hops to introduce timing artifacts.
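The difference in timing can be put into numbers with the normalized Pairwise Variability Index (nPVI), a rhythm metric from the phonetics literature that averages the relative difference between successive durations. The sketch below applies it to whole-syllable durations for simplicity (published studies usually measure vocalic intervals). The millisecond values are invented for illustration and are not measurements.

```python
def npvi(durations: list[float]) -> float:
    """Normalized Pairwise Variability Index over successive durations.

    Each adjacent pair contributes |a - b| divided by the pair's mean,
    and the average is scaled by 100. Higher values mean more
    alternation between long and short units.
    """
    if len(durations) < 2:
        raise ValueError("need at least two durations")
    terms = [
        abs(a - b) / ((a + b) / 2)
        for a, b in zip(durations, durations[1:])
    ]
    return 100 * sum(terms) / len(terms)


# Hypothetical syllable durations in milliseconds (assumed values).
# "I don't want to GO": short function words, one long stressed syllable.
english_like = [90.0, 60.0, 110.0, 70.0, 260.0]
# "Je ne veux pas y aller": near-equal syllables, longer phrase-final one.
french_like = [150.0, 140.0, 155.0, 145.0, 150.0, 150.0, 190.0]

if __name__ == "__main__":
    print(f"English-like nPVI: {npvi(english_like):.1f}")
    print(f"French-like nPVI:  {npvi(french_like):.1f}")
```

With these assumed durations the English-like sequence scores much higher than the French-like one, which is the pattern the stress-timed versus syllable-timed contrast predicts. A synthesized French utterance whose score drifts toward the English-like range would be a sign of imposed strong-weak patterning.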
Stress locked to the phrase edge
In English, stress is lexical: its position varies from word to word and can distinguish words[1] ("REcord" the noun vs. "reCORD" the verb). French stress is predictable and phrase-final[2], landing on the last full syllable of each prosodic group. It marks boundaries, not meanings. French vowels also maintain their quality in unstressed positions[3] rather than reducing: an /o/ stays /o/ regardless of where stress falls. Voice infrastructure that handles French needs to track phrase-level grouping and place prominence at the edge, running synthesis and telephony in one stack so prosodic boundaries survive intact.
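The phrase-final rule is simple enough to sketch in code. The example below assumes the text has already been split into prosodic groups and syllabified into IPA strings (both hard problems that are out of scope here), and it treats a "full" syllable as one that does not contain schwa. The function names and the sample syllabifications are hypothetical.

```python
SCHWA = "\u0259"        # ə
STRESS_MARK = "\u02c8"  # ˈ, the IPA primary stress mark


def stressed_index(group: list[str]) -> int:
    """Index of the last full (non-schwa) syllable in a prosodic group."""
    if not group:
        raise ValueError("prosodic group is empty")
    # Walk backwards from the phrase edge, skipping schwa syllables.
    for i in range(len(group) - 1, -1, -1):
        if SCHWA not in group[i]:
            return i
    # Fallback for a group made only of schwa syllables: use the last one.
    return len(group) - 1


def mark_prominence(groups: list[list[str]]) -> list[list[str]]:
    """Prefix the stressed syllable of every group with the stress mark."""
    marked = []
    for group in groups:
        target = stressed_index(group)
        marked.append([
            STRESS_MARK + syl if i == target else syl
            for i, syl in enumerate(group)
        ])
    return marked


if __name__ == "__main__":
    # Assumed syllabification of "la petite table" with a final schwa,
    # followed by "y aller" as a second group.
    groups = [["la", "p\u0259", "tit", "ta", "bl\u0259"], ["i", "a", "le"]]
    for group in mark_prominence(groups):
        print(".".join(group))
```

Because prominence depends only on the group boundary, any stage that merges or splits groups changes where stress lands. That is why the grouping has to be carried through to synthesis unchanged.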