Security

Escriba is built to handle confidential documents. The guiding principle: control stays with the human. Your files are processed on your own server, and you decide what, if anything, ever reaches an LLM.

Private by design

Nothing is stored. Uploaded files are deleted right after conversion.
No third-party cloud. Conversion, OCR, transcription and anonymization run locally on your host.
The restore map stays local. The token→original map that pseudonymization uses to restore text never leaves your browser.

Hardened by default

Fail-closed anonymization — if anonymization is requested and the Anonimal service is unreachable, the request errors out; raw text is never emitted as a fallback.
Anti-SSRF — URL fetching blocks internal IPs and redirects; local-file and file:// access is restricted to DIOS only.
XSS sanitization — the preview is sanitized with DOMPurify; a strict Content-Security-Policy and security headers are set.
Rate limiting & lockout — per-role request limits, shared across workers via the embedded Redis, plus login lockout on repeated failures.
Non-root container — runs as an unprivileged user with no-new-privileges.
Safe regex — user-supplied anonymization rules run on RE2 (linear time), which is immune to ReDoS.
DoS guards — uploads are size-capped via streaming; the page selector is capped to prevent range-bomb expansion.
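
To make the Anti-SSRF item above concrete, here is a minimal Python sketch of that kind of check. It is an illustration under stated assumptions, not Escriba's actual code: the names `is_public_ip`, `check_fetch_url` and `ALLOWED_SCHEMES` are hypothetical, and treating "internal" as "anything that is not globally routable" is one reasonable reading of the rule.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Assumption: only plain web schemes may be fetched; file:// is rejected here.
ALLOWED_SCHEMES = {"http", "https"}


def is_public_ip(address: str) -> bool:
    """Return True only for globally routable addresses."""
    ip = ipaddress.ip_address(address)
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) before judging it.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return ip.is_global


def check_fetch_url(url: str, resolve=socket.getaddrinfo) -> None:
    """Raise ValueError unless the URL passes this sketch's rules."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"scheme not allowed: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("URL has no host")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    # Check every address the name resolves to, not just the first one.
    for info in resolve(parts.hostname, port, type=socket.SOCK_STREAM):
        address = info[4][0]
        if not is_public_ip(address):
            raise ValueError(f"host resolves to a non-public address: {address}")
```

A check like this only helps if the HTTP client is also told not to follow redirects automatically, since a redirect target would otherwise bypass it. A stricter design would also connect to the exact address that was vetted, so a second DNS lookup cannot return something different.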

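The page-selector cap listed under DoS guards can be sketched in the same spirit. The selector syntax (`"1-3,7"`), the `MAX_PAGES` value and the function name below are assumptions made for illustration; the point is only that the size of a range is checked arithmetically before anything is expanded.

```python
MAX_PAGES = 500  # hypothetical cap, not a documented Escriba setting


def parse_page_selector(selector: str, max_pages: int = MAX_PAGES) -> list[int]:
    """Expand a selector such as "1-3,7" into page numbers, refusing oversized input."""
    pages: list[int] = []
    for part in selector.split(","):
        part = part.strip()
        if not part:
            continue
        start_text, sep, end_text = part.partition("-")
        start = int(start_text)  # raises ValueError on non-numeric input
        end = int(end_text) if sep else start
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        # Check the size before materializing the range, so a selector like
        # "1-999999999" is rejected without allocating anything.
        if len(pages) + (end - start + 1) > max_pages:
            raise ValueError(f"selector expands to more than {max_pages} pages")
        pages.extend(range(start, end + 1))
    return pages
```

With these assumptions, `parse_page_selector("1-3,7")` returns `[1, 2, 3, 7]`, while `parse_page_selector("1-999999999")` raises `ValueError` immediately.
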
Audited

The codebase went through a strict multi-perspective audit and a red-team penetration test; every finding was fixed and the fix verified. Hardening highlights include a random per-install hashing key, sanitized X-Forwarded-For handling (trusted proxies only), session revocation, scrubbed PDF metadata on redaction, and no-cache headers on static assets.
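
As an illustration of what trusting X-Forwarded-For only from trusted proxies can look like, here is a small Python sketch. It is not taken from the Escriba codebase: `TRUSTED_PROXIES`, `client_ip` and the right-to-left walk are assumptions about one common way to implement such a rule.

```python
import ipaddress
from typing import Optional

# Hypothetical trusted proxy networks; a real deployment would configure these.
TRUSTED_PROXIES = [ipaddress.ip_network("10.0.0.0/8")]


def _is_trusted(address: str, trusted) -> bool:
    """True if the address parses and falls inside a trusted proxy network."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False
    return any(ip in net for net in trusted)


def client_ip(peer_address: str, forwarded_for: Optional[str],
              trusted=TRUSTED_PROXIES) -> str:
    """Pick the client address, believing the header only via trusted proxies."""
    # A connection from an untrusted peer: ignore the header entirely.
    if not forwarded_for or not _is_trusted(peer_address, trusted):
        return peer_address
    hops = [hop.strip() for hop in forwarded_for.split(",") if hop.strip()]
    # Walk right to left: the rightmost entries were appended by our own proxies.
    candidate = peer_address
    for hop in reversed(hops):
        try:
            ipaddress.ip_address(hop)
        except ValueError:
            break  # malformed entry: keep the last address we could vouch for
        candidate = hop
        if not _is_trusted(hop, trusted):
            break  # first untrusted hop is the client as seen by our proxies
    return candidate
```

Under these assumptions, a header sent straight from an untrusted peer is ignored, and entries that a client prepends to spoof its address are never reached, because the walk stops at the first hop that is not a trusted proxy.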