Converting documents
Escriba converts almost anything into Markdown. Detection is automatic — you rarely need to tell it what kind of file you dropped.
What you can convert
- Documents — PDF, Word, Excel, PowerPoint, HTML, CSV, EPUB, ZIP and more.
- Images — automatic OCR (Tesseract); optional AI description.
- Audio & video — local, offline transcription with Whisper (mp3, wav, mp4, mov, mkv…).
- URLs & YouTube — convert a web page, or fetch a YouTube transcript.
Smart OCR
Text inside images is recognized automatically. Scanned and rotated PDFs are detected, OCR’d and auto-straightened on the fly. If a PDF looks scanned and your access level allows OCR, Escriba applies it without you asking.
You can also force OCR from the advanced options — useful for PDFs with broken accents (e.g. exported from LaTeX). Forcing OCR uses the document language you choose, so set it for best results.
Page selection
For long PDFs, convert only the pages you need. Next to each queued PDF there’s a page picker that shows the document’s page count and lets you choose:
- The whole document (default).
- A range — e.g. pages 5 to 67.
- Individual pages or ranges — e.g. 1, 6, 9, or a mix like 1, 2, 5-67.
There’s no syntax to memorize: the picker is built for it. The selection is made per file, so different PDFs in the same batch can use different pages.
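If you are curious how a mixed selection such as 1, 2, 5-67 expands into individual pages, the Python sketch below shows one way to do it. It is an illustration only: the function name parse_page_selection and its validation rules are assumptions made for this example, not Escriba's actual code. In the app, the picker does this for you.

```python
def parse_page_selection(selection: str, page_count: int) -> list[int]:
    """Expand a selection such as "1, 2, 5-67" into sorted, unique page numbers.

    Hypothetical helper for illustration; not part of Escriba.
    """
    pages: set[int] = set()
    for part in selection.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,2"
        if "-" in part:
            # A range like "5-67": both ends are included.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            # A single page like "9".
            start = end = int(part)
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"invalid page selection: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


# A 10-page document with a mix of single pages and a range.
print(parse_page_selection("1, 2, 5-7", 10))  # prints [1, 2, 5, 6, 7]
```

Pages that fall outside the document, or ranges written backwards, raise an error in this sketch rather than being silently dropped.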
Advanced options
Open the advanced panel to fine-tune a conversion:
- Document language — improves audio transcription and forced OCR.
- Force OCR — for scanned PDFs or broken accents.
- Advanced PDF extraction — an opt-in OpenDataLoader engine for complex layouts: better reading order and heading hierarchy, with automatic fallback to the default extractor. Slower, but sharper on tricky documents.
- Anonymization — strip or replace personal data; see Anonymization.
- AI provider — optional. The default is No AI (local text / OCR only).
Edit before you export or voice it
The result isn’t read-only. Hit Edit to open it in a full-screen Markdown editor with a live preview, tidy it up — drop boilerplate, fix a heading, trim noise — and Save. Your edits become the result: everything downstream (export, audio, copy and download) uses the cleaned text. Nothing is sent anywhere; it’s all in your browser until you act.
Batches
Add several files at once (your access level sets how many). Convert them all, then download everything as a .zip. Uploaded files are deleted right after conversion — nothing is stored.