Text to audio (podcast)

Escriba doesn’t just turn documents into text for an LLM — it can also turn the result back into sound. From any conversion, open the Audio / Podcast option to generate an MP3 and listen to your document.

Two modes

Narration — a single voice reads the document end to end.
Podcast — an AI writes a short two-host dialogue about the document (host + expert), and Escriba voices it with two alternating voices and stitches it together. Podcast mode needs an AI provider configured (it writes the script); narration does not.
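The docs don't describe how the script becomes audio. Here is a minimal sketch that assumes the AI returns one `SPEAKER: text` line per turn. Each turn is mapped to the voice assigned to its speaker, which gives an ordered list of synthesis jobs whose clips would then be joined into the MP3. The script format, the `plan_segments` function, and the `voice_a`/`voice_b` IDs are all hypothetical and are not Escriba's actual code.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical voice IDs; Escriba's real voice names are not documented here.
VOICES: Dict[str, str] = {"HOST": "voice_a", "EXPERT": "voice_b"}


@dataclass
class Segment:
    voice: str  # voice that should read this turn
    text: str   # the words spoken in this turn


def plan_segments(script: str, voices: Dict[str, str] = VOICES) -> List[Segment]:
    """Turn an assumed 'SPEAKER: text' script into ordered synthesis jobs."""
    segments: List[Segment] = []
    for raw in script.splitlines():
        speaker, sep, text = raw.partition(":")
        speaker = speaker.strip().upper()
        # Skip blank lines and anything that is not a labeled turn.
        if not sep or speaker not in voices or not text.strip():
            continue
        segments.append(Segment(voices[speaker], text.strip()))
    return segments


script = "HOST: Welcome to the show.\nEXPERT: Thanks for having me.\nHOST: Let's dive in."
for seg in plan_segments(script):
    print(seg.voice, "->", seg.text)
```

The speakers take turns, so the voices alternate without any extra logic. Each segment would be synthesized on its own, and the clips would be concatenated in order.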

Two engines

Local (Piper) — the default. Voices run on your server, fully offline — the text never leaves your machine. Escriba ships 14 voices across Spanish, English, Portuguese, French, Italian, German and Chinese.
Cloud (OpenAI) — optional, higher quality. Uses your OpenAI API key; the text is sent to OpenAI only when you pick a cloud voice. Great for languages without a local voice (e.g. Japanese).

Controls

As on a studio panel, you choose:

Voice — language + speaker (local or cloud).
Pitch — low / medium / high.
Speed — slow / normal / fast.
Volume — low / medium / high.
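The docs don't say how the three presets are applied to the audio. One plausible approach is to translate them into an ffmpeg audio filter chain. This is an assumption, not Escriba's confirmed method. In the chain, `asetrate` plus `aresample` shifts pitch, `atempo` sets the speaking rate, and `volume` scales loudness. The factor values, the `ffmpeg_filter` name, and the 22050 Hz default sample rate are hypothetical.

```python
# Hypothetical preset factors; Escriba's real values are not documented.
PITCH = {"low": 0.9, "medium": 1.0, "high": 1.1}
SPEED = {"slow": 0.85, "normal": 1.0, "fast": 1.2}
VOLUME = {"low": 0.7, "medium": 1.0, "high": 1.4}


def ffmpeg_filter(pitch: str, speed: str, volume: str, sample_rate: int = 22050) -> str:
    """Build an ffmpeg -af chain for the chosen presets (sketch, not Escriba's code)."""
    p, s, v = PITCH[pitch], SPEED[speed], VOLUME[volume]
    # asetrate relabels the sample rate, which raises pitch AND tempo by p;
    # aresample converts back to the original rate so players handle it normally.
    # atempo then undoes the tempo change (1/p) and applies the speed preset (s).
    tempo = s / p
    # Older ffmpeg builds only accept atempo values between 0.5 and 2.0.
    if not 0.5 <= tempo <= 2.0:
        raise ValueError(f"atempo factor {tempo} is out of range")
    return (
        f"asetrate={round(sample_rate * p)},aresample={sample_rate},"
        f"atempo={tempo:.4f},volume={v}"
    )


print(ffmpeg_filter("high", "fast", "medium"))
```

The resulting string could be passed to ffmpeg as `ffmpeg -i narration.wav -af "<filter>" narration.mp3`. Shifting pitch this way leaves the speaking rate under the separate control of the speed preset.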

A built-in player lets you preview the audio before downloading the MP3.

Who can use it & limits

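Audio generation is available to the ANGEL and DIOS levels, the same levels that can use audio/video conversion and OCR. HUMANO can also be enabled with `HUMAN_TTS=true`. Because Piper synthesizes on the CPU, each request is capped at a maximum number of characters to protect the server. The cap is configurable per role: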

Setting	Default	Meaning
`GOD_TTS_CHARS`	`0`	DIOS: no limit
`ANGEL_TTS_CHARS`	`100000`	ANGEL: max characters per MP3
`HUMAN_TTS_CHARS`	`20000`	HUMANO: max characters per MP3 (applies only if `HUMAN_TTS=true`)
`TTS_TIMEOUT`	`600`	Max seconds per synthesis
`TTS_OPENAI_MODEL`	`tts-1`	Cloud model (`tts-1` or `tts-1-hd`)

See Configuration for the full list. A very long document with a local voice can take a while to synthesize — that’s the CPU, not a bug.
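Here is a minimal sketch of how the limits in the table could be checked before synthesis starts. The defaults and environment variable names come from the table. The following parts are assumptions, not Escriba's actual code: treating `0` as "no limit" for every role, the error types, and the function names `tts_char_limit` and `check_request`.

```python
import os
from typing import Mapping, Optional

# Role -> (environment variable, default) as listed in the settings table.
DEFAULTS = {
    "DIOS": ("GOD_TTS_CHARS", 0),
    "ANGEL": ("ANGEL_TTS_CHARS", 100_000),
    "HUMANO": ("HUMAN_TTS_CHARS", 20_000),
}


def tts_char_limit(role: str, env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Max characters per MP3 for `role`; None means unlimited (assumed for 0)."""
    if role not in DEFAULTS:
        raise PermissionError(f"role {role!r} cannot generate audio")
    # HUMANO only gets audio when the operator opts in with HUMAN_TTS=true.
    if role == "HUMANO" and env.get("HUMAN_TTS", "").lower() != "true":
        raise PermissionError("HUMANO needs HUMAN_TTS=true")
    var, default = DEFAULTS[role]
    limit = int(env.get(var, default))
    return None if limit == 0 else limit


def check_request(role: str, text: str, env: Mapping[str, str] = os.environ) -> None:
    """Reject a request whose text is longer than the role's limit."""
    limit = tts_char_limit(role, env)
    if limit is not None and len(text) > limit:
        raise ValueError(f"{len(text)} characters exceeds the {role} limit of {limit}")
```

Checking the length up front rejects an oversized document immediately. Otherwise the request would occupy the CPU until `TTS_TIMEOUT` runs out.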