
The Escriba Suite

The Escriba Suite is a family of self-hosted, open-source tools for one job: turning any source into clean, private, model-ready data. Each tool is excellent on its own — but they were designed as a single instrument.

Escriba is the hub: the universal translator that takes a document and produces clean, anonymized Markdown ready for any LLM. Around it orbit four specialists. Each is the best at one job (capturing the web, anonymizing text, extracting document data, checking financial statements), and each can pass its result back to the hub.

Escriba — the hub

Documents, audio, video and pages → clean Markdown, with PII redaction, OCR, transcription, export to 10+ formats and a podcast mode.

Fisherboy — web capture

Any URL → Markdown or structured data, with tiered anti-blocking, file/video/gallery downloads and a section spider.

Anonimal — privacy engine

The serious anonymization layer: local ML/NER plus regex, opaque or reversible. The satellites plug into it.

Fulgoria — document data

Bank statements, PDFs and images → CSV rows plus a reusable template, all in the browser.

Selega — financial control

Financial statements validated with 14 live numeric cross-checks, plus a proposed legalization outcome.

  • Self-hosted. A single Docker image (or a small compose file) you run on your own hardware. Your files never touch a third-party cloud.
  • Open source. MIT or Apache-2.0. Yours to read, fork and deploy.
  • Private by design. Nothing is stored after the job is done; the sensitive work happens locally.
  • One look, seven languages. The same interface — English, Español, Français, Português, Italiano, 中文, 日本語 — auto-detected and switchable.

The suite feels like one product because every app honours two simple contracts.

Every app uses the same design language: light theme by default with a dark mode, the same typography (Inter Variable + JetBrains Mono), line icons rather than emojis, and the same components — each app carrying its own accent colour so you always know where you are.

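| Tool      | Accent                | Role              |
|-----------|-----------------------|-------------------|
| Escriba   | Coral #e06a3a         | The hub           |
| Fisherboy | Ocean teal #0f8f6a    | Web capture       |
| Anonimal  | Mask indigo #4a4e7c   | Privacy engine    |
| Fulgoria  | Violet #6c5cf0        | Document data     |
| Selega    | Bordó #a8324a         | Financial control |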

Any satellite can hand its result to the hub without a round-trip to a server. It writes the captured content to the browser’s sessionStorage under the key escriba.handoff, using a small JSON contract, then opens Escriba, which reads the entry and creates a ready-to-use item.

```json
{
  "from": "fisherboy",
  "version": 1,
  "title": "Captured page",
  "source": "https://example.com/article",
  "mime": "text/markdown",
  "content": "# Clean markdown…",
  "alt": { "csv": "…optional…" },
  "ts": 1719000000000
}
```
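The handoff can be sketched in a few lines of TypeScript. This is a minimal illustration, not Escriba's code: the `Handoff` interface mirrors the JSON above, while `writeHandoff`, `readHandoff` and `KeyValueStore` are hypothetical names. Clearing the key after reading and accepting only `version` 1 are assumptions as well. Taking the store as a parameter keeps the sketch testable outside a browser.

```typescript
// Shape of the handoff record, mirroring the JSON contract above.
interface Handoff {
  from: string;
  version: number;
  title: string;
  source: string;
  mime: string;
  content: string;
  alt?: Record<string, string>; // optional alternative renderings, e.g. CSV
  ts: number; // epoch milliseconds
}

// Hypothetical minimal subset of the Web Storage API used here;
// window.sessionStorage satisfies it structurally.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const HANDOFF_KEY = "escriba.handoff";

// Satellite side: serialize the result under the shared key.
function writeHandoff(store: KeyValueStore, h: Handoff): void {
  store.setItem(HANDOFF_KEY, JSON.stringify(h));
}

// Hub side: read the entry once, then remove it so a reload does not
// import the same item twice (assumed behaviour, not documented).
function readHandoff(store: KeyValueStore): Handoff | null {
  const raw = store.getItem(HANDOFF_KEY);
  if (raw === null) return null;
  store.removeItem(HANDOFF_KEY);
  try {
    const h = JSON.parse(raw) as Handoff;
    // Accept only the contract version this sketch understands (assumed: 1).
    return h.version === 1 && typeof h.content === "string" ? h : null;
  } catch {
    return null; // malformed JSON (or a null payload): ignore the handoff
  }
}
```

In a browser, `window.sessionStorage` can be passed directly as the store. Because sessionStorage belongs to one origin in one tab, the satellite would write the entry and then open Escriba on that same origin.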

Anonimal owns serious anonymization in the suite: the full ML/NER + Privacy-Filter engine. So that they stay usable on their own, the satellites also ship a minimal built-in regex redactor and never depend on another service to run standalone.

When you point a satellite at Anonimal (via an ANONIMAL_URL environment variable), it switches to the full engine, which also detects names. If Anonimal is then unavailable, the satellite fails closed: it refuses the job rather than silently downgrading to regex. Privacy never degrades by accident.
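The fail-closed rule amounts to a three-way decision, sketched below under stated assumptions. `resolveEngine` and `regexRedact` are hypothetical names. Reachability is passed in as a boolean instead of being probed over the network. The single e-mail pattern stands in for whatever the satellites' built-in regex actually covers.

```typescript
type Engine = "regex" | "anonimal";

// Decide which engine a satellite may use for a job (hypothetical helper):
// - no ANONIMAL_URL: standalone mode, built-in regex only;
// - ANONIMAL_URL set and reachable: the full ML/NER engine;
// - ANONIMAL_URL set but unreachable: throw, never fall back to regex.
function resolveEngine(anonimalUrl: string | undefined, anonimalReachable: boolean): Engine {
  if (!anonimalUrl) return "regex";
  if (anonimalReachable) return "anonimal";
  throw new Error(`Anonimal is configured at ${anonimalUrl} but unavailable; refusing to downgrade to regex`);
}

// Illustrative standalone fallback: masks e-mail addresses only.
// A regex like this cannot find names, which is why the full engine matters.
const EMAIL = /[\w.+-]+@[\w-]+(\.[\w-]+)+/g;

function regexRedact(text: string): string {
  return text.replace(EMAIL, "[EMAIL]");
}
```

On Node, a satellite would pass `process.env.ANONIMAL_URL` as the first argument. Throwing instead of returning `"regex"` is the point of the design: a misconfigured or crashed Anonimal surfaces as an error, not as weaker output.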

Each tool runs independently, but the intended topology for the full experience is one domain behind a reverse proxy, so every app is same-origin. That makes moving between apps feel seamless and lets the sessionStorage handoff work across the suite, since sessionStorage is only shared within a single origin (same scheme, host and port).
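Why one origin matters can be shown in a few lines. This sketch assumes a path-prefix layout behind the proxy. The host `suite.example.com`, the prefixes and the upstream addresses are invented for illustration and are not the suite's defaults. `upstreamFor` models the proxy's routing decision. `sameOrigin` compares origins with the standard `URL` class, and the origin is the unit that sessionStorage is scoped to.

```typescript
// Hypothetical public origin shared by every app.
const SUITE_ORIGIN = "https://suite.example.com";

// Hypothetical path prefixes mapped to upstream containers.
const routes: Record<string, string> = {
  "/escriba/": "http://escriba:8080",
  "/fisherboy/": "http://fisherboy:8080",
  "/anonimal/": "http://anonimal:8080",
  "/fulgoria/": "http://fulgoria:8080",
  "/selega/": "http://selega:8080",
};

// The upstream a proxy would forward a request path to, if any.
function upstreamFor(path: string): string | undefined {
  const prefix = Object.keys(routes).find((p) => path.startsWith(p));
  return prefix === undefined ? undefined : routes[prefix];
}

// Two pages can share sessionStorage only if their origins
// (scheme + host + port) are identical.
function sameOrigin(a: string, b: string): boolean {
  return new URL(a).origin === new URL(b).origin;
}
```

With this layout, `/fisherboy/` and `/escriba/` resolve to one origin, so a handoff written by Fisherboy is visible to Escriba. Separate subdomains such as `fisherboy.example.com` and `escriba.example.com` would not share it.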