Text-to-Speech Studio

Clone voices and synthesize natural speech with AI

Select one of the preset voices for quick speech generation:

English Voices

  • Michael
  • James
  • Mark
  • Sarah
  • Kate
  • Emma

Enter the text to be voiced (up to 20,000 characters). The result is the generated speech audio.

What's new

Improved quality, reduced cost
2026-01-18
#tts
  • Faster generation
  • Fewer artifacts
  • Reduced cost
  • Improved tag handling and updated documentation

Description

Tag Usage Guide

⏸️ Pause Tags System

NEW: Intelligent pause insertion for precise speech rhythm control!

Syntax

You can use pause tags anywhere in the text. Various formats are supported:

  • Seconds: [pause:1.5], [pause:2s], [pause:3] (a number without a unit is read as seconds)
  • Milliseconds: [pause:500ms], [pause:1200ms], [pause:800ms]

Usage Examples

Welcome to our show! [pause:1s] Today we will discuss interesting topics.
[Alice] I am so excited! [pause:500ms] This will be great.
[pause:2] Let's move on to the main part.
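
To preview how a script will be timed before submitting it, the pause syntax above can be parsed locally. The following is a minimal Python sketch, not the service's own parser: the names PAUSE_TAG, pause_to_ms, and split_pauses are hypothetical, and the only rules assumed are the ones listed under Syntax (a bare number or an "s" suffix means seconds, an "ms" suffix means milliseconds).

```python
import re
from typing import List, Optional, Union

# Hypothetical helper names; the pattern covers only the formats listed above:
# [pause:1.5], [pause:2s], [pause:3], [pause:500ms].
PAUSE_TAG = re.compile(r"\[pause:(\d+(?:\.\d+)?)(ms|s)?\]")


def pause_to_ms(value: str, unit: Optional[str]) -> int:
    """Convert the numeric part of a pause tag to whole milliseconds."""
    number = float(value)
    # Assumption taken from the syntax list: no unit or "s" means seconds.
    milliseconds = number if unit == "ms" else number * 1000
    return round(milliseconds)


def split_pauses(text: str) -> List[Union[str, int]]:
    """Split text into spoken chunks (str) and pause lengths in ms (int)."""
    parts: List[Union[str, int]] = []
    pos = 0
    for match in PAUSE_TAG.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            parts.append(chunk)
        parts.append(pause_to_ms(match.group(1), match.group(2)))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        parts.append(tail)
    return parts


if __name__ == "__main__":
    print(split_pauses("Welcome to our show! [pause:1s] Today we will discuss interesting topics."))
```

Summing the integer entries of the result gives the total silence a script adds, which can help when estimating the length of the generated audio.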

🎭 Paralinguistic Tags

The service has built-in support for tags that add non-verbal sounds (breathing, laughter, and so on). The model processes these tags directly during speech generation, so the sounds blend naturally into the speech.

Single Tags (Sound Insertion)

These tags insert a sound at the location where they are placed in the text.

Tag                Effect                  Example
<breath>           Breath                  I'm tired <breath> let's rest
<quick_breath>     Quick breath            Running <quick_breath> almost there
<laughter>         Laughter                That's hilarious <laughter>!
<cough>            Cough                   Excuse me <cough> sorry
<sigh>             Sigh                    Fine <sigh> I'll do it
<gasp>             Gasp (fright/surprise)  Oh no <gasp> what happened?
<noise>            Background noise        Walking <noise> through the forest
<hissing>          Hissing                 The snake <hissing> slithered away
<vocalized-noise>  Vocalized noise         Hmm <vocalized-noise> interesting
<lipsmack>         Lip smack               Tasty <lipsmack> food
<mn>               Humming "mm"            I think <mn> maybe
<clucking>         Clucking                Disapproving <clucking>
<accent>           Accent/Emphasis         Very <accent> important

Wrapper Tags (Emotional Coloring)

Tag                        Effect                    Example
<laughing>text</laughing>  Speak text with laughter  <laughing>so funny</laughing>!
<strong>text</strong>      Emphasize text            <strong>very important</strong>
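
Because a misspelled tag may be read aloud or ignored, it can be useful to check a script against the tag names documented in this guide before sending it. The sketch below is a local pre-check in Python, not part of the service: check_tags and the constant names are hypothetical, and the tag sets simply copy the single and wrapper tags listed in this guide.

```python
import re
from typing import List

# Tag names copied from this guide; the service may accept others.
SINGLE_TAGS = {
    "breath", "quick_breath", "laughter", "cough", "sigh", "gasp", "noise",
    "hissing", "vocalized-noise", "lipsmack", "mn", "clucking", "accent",
}
WRAPPER_TAGS = {"laughing", "strong"}

# Matches <name> and </name>; square-bracket tags are not touched.
TAG = re.compile(r"<(/?)([A-Za-z_-]+)>")


def check_tags(text: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems: List[str] = []
    open_wrappers: List[str] = []
    for match in TAG.finditer(text):
        closing = match.group(1) == "/"
        name = match.group(2)
        if name in SINGLE_TAGS:
            if closing:
                problems.append(f"</{name}> is a single tag and has no closing form")
        elif name in WRAPPER_TAGS:
            if not closing:
                open_wrappers.append(name)
            elif open_wrappers and open_wrappers[-1] == name:
                open_wrappers.pop()
            else:
                problems.append(f"</{name}> has no matching <{name}>")
        else:
            problems.append(f"unknown tag {match.group(0)}")
    # Any wrapper still on the stack was opened but never closed.
    problems.extend(f"<{name}> is never closed" for name in open_wrappers)
    return problems


if __name__ == "__main__":
    print(check_tags("Hello <giggle> there, <strong>very important"))
```

The stack-based check treats wrappers as properly nested; whether the service itself allows nested wrappers is not stated in this guide.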

Language and Character Switching

Language switching is supported for EN, ZH, JA, KO, DE, ES, FR, IT, and RU. The recommended syntax is a square-bracket tag that combines a language code with a character name; the name may be left empty:

[en:Alice] Hello world
[ru:Bob] Привет мир
[zh:] 你好世界
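
For multi-speaker scripts, it can help to split the text into per-speaker segments locally, for example to review which lines belong to which character. The following Python sketch is an illustration only: Segment, SPEAKER_TAG, and split_segments are hypothetical names, and the tag grammar is inferred from the examples in this guide ([Alice], [en:Alice], [zh:]) rather than from a published specification.

```python
import re
from typing import List, NamedTuple, Optional

# Language codes listed in this guide, lowercased.
SUPPORTED_LANGS = {"en", "zh", "ja", "ko", "de", "es", "fr", "it", "ru"}

# Matches [Alice], [en:Alice] and [zh:]. It does not match [pause:...],
# because the optional prefix must be exactly two letters and the name
# part may not contain a colon.
SPEAKER_TAG = re.compile(r"\[(?:([A-Za-z]{2}):)?([^\[\]:]*)\]")


class Segment(NamedTuple):
    lang: Optional[str]     # language code from the tag, if any
    speaker: Optional[str]  # character name from the tag, if any
    text: str               # text that follows the tag


def split_segments(script: str) -> List[Segment]:
    """Split a script into segments, one per language/character tag."""
    segments: List[Segment] = []
    lang: Optional[str] = None
    speaker: Optional[str] = None
    pos = 0
    for match in SPEAKER_TAG.finditer(script):
        code, name = match.group(1), match.group(2).strip()
        if code is None and not name:
            continue  # "[]" carries no information; leave it in the text
        if code is not None and code.lower() not in SUPPORTED_LANGS:
            raise ValueError(f"unsupported language code: {code}")
        chunk = script[pos:match.start()].strip()
        if chunk:
            segments.append(Segment(lang, speaker, chunk))
        # Assumption: each tag fully replaces the previous language and speaker.
        lang = code.lower() if code is not None else None
        speaker = name or None
        pos = match.end()
    tail = script[pos:].strip()
    if tail:
        segments.append(Segment(lang, speaker, tail))
    return segments


if __name__ == "__main__":
    for segment in split_segments("[en:Alice] Hello world\n[ru:Bob] Привет мир\n[zh:] 你好世界"):
        print(segment)
```

Whether a tag without a language code keeps the previous language on the service side is not documented here, so the sketch reports only what each tag states explicitly.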

Recommendations

  1. Format: Write paralinguistic tags in angle brackets, for example <breath>, so they do not conflict with character names, which use square brackets.
  2. Naturalness: Insert tags where the sound would occur in natural speech (a pause before an answer, a sigh when tired).
  3. Moderation: Use no more than 1-2 tags per sentence.
  4. Limitations: The strength of an effect (laughter volume, sigh duration) is controlled by the model itself and cannot be adjusted through parameters.
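
The moderation guideline can be checked automatically before a script is submitted. The Python sketch below is a rough local lint under stated assumptions: overloaded_sentences and the constants are hypothetical names, sentences are split naively on ".", "!" and "?", and a wrapper pair such as <strong>...</strong> is counted once via its opening tag.

```python
import re
from typing import List, Tuple

MAX_TAGS_PER_SENTENCE = 2  # upper end of the "1-2 tags per sentence" guideline

# Square-bracket tags (pauses, characters) are removed first so that a
# value like [pause:1.5] is not mistaken for a sentence boundary.
SQUARE_TAG = re.compile(r"\[[^\[\]]*\]")
# Opening or single angle-bracket tags; closing tags (</...>) do not match.
OPENING_TAG = re.compile(r"<[A-Za-z_-]+>")
# Naive sentence boundary: whitespace that follows ., ! or ?
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def overloaded_sentences(text: str, limit: int = MAX_TAGS_PER_SENTENCE) -> List[Tuple[str, int]]:
    """Return (sentence, tag_count) for every sentence exceeding the limit."""
    cleaned = SQUARE_TAG.sub(" ", text).strip()
    flagged: List[Tuple[str, int]] = []
    for sentence in SENTENCE_END.split(cleaned):
        count = len(OPENING_TAG.findall(sentence))
        if count > limit:
            flagged.append((sentence, count))
    return flagged


if __name__ == "__main__":
    script = "Fine <sigh> I'll do it. Oh no <gasp> what <breath> happened <cough>?"
    for sentence, count in overloaded_sentences(script):
        print(f"{count} tags: {sentence}")
```

Abbreviations and decimal numbers outside of tags will confuse the naive sentence splitter, so the output is best treated as a hint rather than a strict rule.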