audio, tts
Text-to-Speech Studio
Clone voices and synthesize natural speech with AI

What's new
Improved quality, reduced cost
2026-01-18
#tts
- Faster generation
- Fewer artifacts
- Reduced cost
- Improved tag handling and updated documentation
Description
Tag Usage Guide
⏸️ Pause Tags System
NEW: Intelligent pause insertion for precise speech rhythm control!
Syntax
You can use pause tags anywhere in the text. Various formats are supported:
- Seconds: [pause:1.5], [pause:2s], [pause:3]
- Milliseconds: [pause:500ms], [pause:1200ms], [pause:800ms]
Usage Examples
Welcome to our show! [pause:1s] Today we will discuss interesting topics.
[Alice] I am so excited! [pause:500ms] This will be great.
[pause:2] Let's move on to the main part.
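To preview how a script will be segmented before sending it to the service, the pause syntax can be parsed locally. The sketch below is illustrative only: the regular expression is inferred from the examples above (a number, optionally followed by `s` or `ms`, with a bare number read as seconds), and the names `PAUSE_RE`, `pause_seconds`, and `split_on_pauses` are hypothetical helpers, not part of the service.

```python
import re
from typing import List, Optional, Union

# Assumed grammar, inferred only from the documented examples:
# a number, optionally followed by "s" or "ms"; a bare number means seconds.
PAUSE_RE = re.compile(r"\[pause:(\d+(?:\.\d+)?)(ms|s)?\]")


def pause_seconds(value: str, unit: Optional[str]) -> float:
    """Convert one matched pause value to seconds."""
    number = float(value)
    return number / 1000.0 if unit == "ms" else number


def split_on_pauses(text: str) -> List[Union[str, float]]:
    """Split text into spoken chunks (str) and pause durations in seconds (float)."""
    parts: List[Union[str, float]] = []
    pos = 0
    for match in PAUSE_RE.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            parts.append(chunk)
        parts.append(pause_seconds(match.group(1), match.group(2)))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        parts.append(tail)
    return parts
```

Calling `split_on_pauses` on the first usage example yields the opening sentence, a one-second pause, and the remaining sentence, in that order. Character tags such as `[Alice]` are left inside the text chunks untouched.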
🎭 Paralinguistic Tags
The service has built-in support for tags to add non-verbal sounds (breathing, laughter, etc.). These tags are processed directly by the model during speech generation, ensuring a natural sound.
Single Tags (Sound Insertion)
These tags insert a sound at the location where they are placed in the text.
| Tag | Effect | Example |
|---|---|---|
| <breath> | Breath | I'm tired <breath> let's rest |
| <quick_breath> | Quick breath | Running <quick_breath> almost there |
| <laughter> | Laughter | That's hilarious <laughter>! |
| <cough> | Cough | Excuse me <cough> sorry |
| <sigh> | Sigh | Fine <sigh> I'll do it |
| <gasp> | Gasp (fright/surprise) | Oh no <gasp> what happened? |
| <noise> | Background noise | Walking <noise> through the forest |
| <hissing> | Hissing | The snake <hissing> slithered away |
| <vocalized-noise> | Vocalized noise | Hmm <vocalized-noise> interesting |
| <lipsmack> | Lip smack | Tasty <lipsmack> food |
| <mn> | Humming "mm" | I think <mn> maybe |
| <clucking> | Clucking | Disapproving <clucking> |
| <accent> | Accent/Emphasis | Very <accent> important |
Wrapper Tags (Emotional Coloring)
| Tag | Effect | Example |
|---|---|---|
| <laughing>text</laughing> | Speak text with laughter | <laughing>so funny</laughing>! |
| <strong>text</strong> | Emphasize text | <strong>very important</strong> |
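Because a mistyped tag could easily go unnoticed, it may help to check a script against the tag names listed in the two tables before submitting it. The following is a minimal sketch, not the service's own validation: `check_tags` is a hypothetical helper, and the assumption that wrapper tags may nest (handled here with a stack) is not confirmed by the documentation.

```python
import re
from typing import List

# Tag names copied from the two tables above.
SINGLE_TAGS = {
    "breath", "quick_breath", "laughter", "cough", "sigh", "gasp", "noise",
    "hissing", "vocalized-noise", "lipsmack", "mn", "clucking", "accent",
}
WRAPPER_TAGS = {"laughing", "strong"}

# Matches <name> and </name>; names may contain letters, "_" and "-".
TAG_RE = re.compile(r"<(/?)([A-Za-z_-]+)>")


def check_tags(text: str) -> List[str]:
    """Return a list of problems found; an empty list means no problems."""
    problems: List[str] = []
    open_wrappers: List[str] = []  # stack of wrapper tags still open
    for closing, name in TAG_RE.findall(text):
        if name in WRAPPER_TAGS:
            if not closing:
                open_wrappers.append(name)
            elif open_wrappers and open_wrappers[-1] == name:
                open_wrappers.pop()
            else:
                problems.append(f"unexpected </{name}>")
        elif name in SINGLE_TAGS:
            if closing:
                problems.append(f"<{name}> is a single tag and takes no closing form")
        else:
            problems.append(f"unknown tag <{closing}{name}>")
    problems.extend(f"unclosed <{name}>" for name in open_wrappers)
    return problems
```

An empty result only means the tags are spelled as documented and wrappers are balanced; it says nothing about how the model will render them.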
Language and Character Switching
The language can be switched per line among EN, ZH, JA, KO, DE, ES, FR, IT, and RU, optionally together with a character name. The standard square-bracket syntax is recommended:
[en:Alice] Hello world
[ru:Bob] Привет мир
[zh:] 你好世界
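The bracket prefix can be read as an optional language code, a colon, and an optional character name. The sketch below parses that shape from the start of a line. It is an interpretation of the examples rather than the service's grammar: `parse_line` and `Line` are hypothetical names, accepting upper-case codes is an assumption, and a prefix whose first part is not one of the nine listed codes (for example a pause tag) is deliberately left untouched.

```python
import re
from typing import NamedTuple, Optional

# Language codes listed in the section above, lower-cased as in the examples.
LANGUAGES = {"en", "zh", "ja", "ko", "de", "es", "fr", "it", "ru"}

# [first] or [first:second] at the start of the line, followed by optional spaces.
PREFIX_RE = re.compile(r"^\[([^\[\]:]*)(?::([^\[\]]*))?\]\s*")


class Line(NamedTuple):
    language: Optional[str]
    character: Optional[str]
    text: str


def parse_line(line: str) -> Line:
    """Split a leading [lang:Name], [lang:] or [Name] prefix from the spoken text."""
    match = PREFIX_RE.match(line)
    if not match:
        return Line(None, None, line)
    first, second = match.group(1), match.group(2)
    rest = line[match.end():]
    if second is None:
        # No colon: treat the bracket content as a character name, e.g. [Alice].
        return Line(None, first or None, rest)
    if first.lower() not in LANGUAGES:
        # Not a known language code (e.g. [pause:2]); leave the line as it is.
        return Line(None, None, line)
    return Line(first.lower(), second or None, rest)
```

With this reading, `[zh:]` selects a language without naming a character, and `[Alice]` names a character without changing the language.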
Recommendations
- Format: Use angle brackets, as in <breath>, to avoid conflicts with character names, which use square brackets.
- Naturalness: Insert tags where they would occur in live speech (a pause before an answer, a sigh when tired).
- Moderation: 1-2 tags per sentence is recommended.
- Limitations: The strength of effects (laughter volume, sigh duration) is regulated by the model itself and cannot be changed by parameters.
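The moderation guideline can be checked roughly before submitting a script. The sketch below flags sentences that carry more than two paralinguistic tags. It rests on several assumptions: sentences are split naively on `.`, `!`, or `?` followed by whitespace, a wrapper pair counts as one tag, pause tags are not counted, and `overloaded_sentences` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Angle-bracket tags only; square-bracket pause and character tags are ignored.
TAG_RE = re.compile(r"</?[A-Za-z_-]+>")
# Naive sentence boundary: whitespace that follows ".", "!" or "?".
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")


def overloaded_sentences(text: str, max_tags: int = 2) -> List[Tuple[str, int]]:
    """Return (sentence, tag_count) for each sentence with more than max_tags tags."""
    flagged: List[Tuple[str, int]] = []
    for sentence in SENTENCE_END_RE.split(text.strip()):
        # Count single tags and opening wrapper tags; skip closing tags
        # so that a <laughing>...</laughing> pair counts once.
        tags = [t for t in TAG_RE.findall(sentence) if not t.startswith("</")]
        if len(tags) > max_tags:
            flagged.append((sentence, len(tags)))
    return flagged
```

The threshold of two follows the recommendation above; it is a style guideline, so a flagged sentence is a prompt to review it rather than an error.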