Text to audio (podcast)
Escriba doesn’t just turn documents into text for an LLM — it can also turn the result back into sound. From any conversion, open the Audio / Podcast option to generate an MP3 and listen to your document.
Two modes
- Narration: a single voice reads the document end to end.
- Podcast: an AI writes a short two-host dialogue about the document (host + expert), and Escriba voices it with two alternating voices and stitches it together. Podcast mode needs an AI provider configured (it writes the script); narration does not.
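The podcast flow can be pictured as a small pipeline step: the AI-written script is split into speaker turns, and each turn is paired with one of two voices before synthesis. The following is a minimal Python sketch of that idea, not Escriba's actual code. It assumes a script format with `HOST:` and `EXPERT:` line prefixes, and the function names and voice names (`voice_a`, `voice_b`) are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str  # "host" or "expert"
    text: str


def parse_script(script: str) -> list[Turn]:
    """Split a 'HOST: ...' / 'EXPERT: ...' script into speaker turns.

    The line-prefix format is an assumption made for this sketch.
    """
    turns: list[Turn] = []
    for raw in script.splitlines():
        # Split on the first colon only, so colons inside the text survive.
        label, sep, text = raw.partition(":")
        speaker = label.strip().lower()
        text = text.strip()
        # Skip blank lines and anything that is not a recognised turn.
        if sep and speaker in ("host", "expert") and text:
            turns.append(Turn(speaker, text))
    return turns


def assign_voices(
    turns: list[Turn], host_voice: str, expert_voice: str
) -> list[tuple[str, str]]:
    """Pair each turn with the voice that should read it, in script order."""
    voices = {"host": host_voice, "expert": expert_voice}
    return [(voices[t.speaker], t.text) for t in turns]


script = """HOST: Welcome. Today we look at the quarterly report.
EXPERT: Thanks. Revenue is the headline number.
HOST: What changed since last quarter?"""

# "voice_a" and "voice_b" are placeholder names, not real Escriba voices.
segments = assign_voices(parse_script(script), "voice_a", "voice_b")
for voice, text in segments:
    print(f"[{voice}] {text}")
```

In a real pipeline each `(voice, text)` pair would be synthesized separately and the resulting clips joined in order to produce the final MP3.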
Two engines
- Local (Piper): the default. Voices run on your server, fully offline, so the text never leaves your machine. Escriba ships 14 voices across Spanish, English, Portuguese, French, Italian, German and Chinese.
- Cloud (OpenAI): optional, higher quality. Uses your OpenAI API key; the text is sent to OpenAI only when you pick a cloud voice. Great for languages without a local voice (e.g. Japanese).
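The trade-off between the two engines can be summarized as a simple decision rule: prefer a local voice when the language has one, and fall back to the cloud only when a key is available. The sketch below is illustrative only. In Escriba the engine follows from the voice you pick, so the `suggest_engine` helper and the ISO 639-1 language codes are assumptions made for this example.

```python
from typing import Optional

# ISO 639-1 codes for the languages the page lists as having local voices.
# The codes are an assumption; the page names the languages only.
LOCAL_LANGUAGES = {"es", "en", "pt", "fr", "it", "de", "zh"}


def suggest_engine(language: str, openai_api_key: Optional[str]) -> str:
    """Suggest "local" or "cloud" for a language code (hypothetical helper)."""
    lang = language.strip().lower()
    if lang in LOCAL_LANGUAGES:
        # A local voice exists, so the text can stay on the server.
        return "local"
    if openai_api_key:
        # No local voice: the cloud engine is the only option, and it
        # means the text is sent to OpenAI.
        return "cloud"
    raise ValueError(
        f"No local voice for {language!r} and no OpenAI API key configured"
    )


print(suggest_engine("es", None))       # Spanish has a local voice
print(suggest_engine("ja", "sk-demo"))  # Japanese needs the cloud engine
```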
Controls
Like a studio panel, you choose:
- Voice — language + speaker (local or cloud).
- Pitch — low / medium / high.
- Speed — slow / normal / fast.
- Volume — low / medium / high.
A built-in player lets you preview the audio before downloading the MP3.
Who can use it & limits
Audio generation is available to the ANGEL and DIOS levels (like audio/video and OCR), and to HUMANO only when HUMAN_TTS=true. Because Piper synthesizes on the CPU, a per-request character limit protects the server. The limit is configurable per role:
| Setting | Default | Meaning |
|---|---|---|
| `GOD_TTS_CHARS` | 0 | DIOS: no limit |
| `ANGEL_TTS_CHARS` | 100000 | ANGEL: max characters per MP3 |
| `HUMAN_TTS_CHARS` | 20000 | HUMANO: max characters per MP3 (only if `HUMAN_TTS=true`) |
| `TTS_TIMEOUT` | 600 | Max seconds per synthesis |
| `TTS_OPENAI_MODEL` | `tts-1` | Cloud model (`tts-1` or `tts-1-hd`) |
See Configuration for the full list. A very long document with a local voice can take a while to synthesize — that’s the CPU, not a bug.
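To make the per-role limits concrete, here is a minimal Python sketch of how a request could be checked against them. It is not Escriba's implementation. The setting names and defaults come from the table, while the function names, the idea that `HUMAN_TTS` is off unless set to `true`, and the treatment of `0` as "no limit" for every role (the table states it only for DIOS) are assumptions.

```python
from typing import Mapping, Optional

# Defaults copied from the settings table (as strings, like env values).
DEFAULTS = {
    "GOD_TTS_CHARS": "0",
    "ANGEL_TTS_CHARS": "100000",
    "HUMAN_TTS_CHARS": "20000",
}
ROLE_SETTING = {
    "DIOS": "GOD_TTS_CHARS",
    "ANGEL": "ANGEL_TTS_CHARS",
    "HUMANO": "HUMAN_TTS_CHARS",
}


def char_limit(role: str, env: Mapping[str, str]) -> Optional[int]:
    """Return the limit for a role: 0 means unlimited, None means no access."""
    role = role.upper()
    setting = ROLE_SETTING.get(role)
    if setting is None:
        return None
    # Assumption: HUMAN_TTS is off unless explicitly set to "true".
    if role == "HUMANO" and env.get("HUMAN_TTS", "false").lower() != "true":
        return None
    return int(env.get(setting, DEFAULTS[setting]))


def check_request(role: str, text: str, env: Mapping[str, str]) -> None:
    """Raise if the role may not use audio or the text exceeds its limit."""
    limit = char_limit(role, env)
    if limit is None:
        raise PermissionError(f"Audio generation is not enabled for {role}")
    # Assumption: a limit of 0 disables the check for any role.
    if limit > 0 and len(text) > limit:
        raise ValueError(f"Text has {len(text)} characters; limit is {limit}")


# With default settings, ANGEL can synthesize up to 100000 characters.
check_request("ANGEL", "x" * 100000, {})
```

Passing the environment as a plain mapping keeps the sketch easy to try; `os.environ` could be passed in the same way.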