Install Anonimal

Anonimal runs locally (self-hosted), so your data never leaves your infrastructure. The images are published to the GitHub Container Registry (ghcr.io), and the service listens on port 8000 inside the container.

Choose an image

There are two images. Choose based on detection coverage versus footprint.

Image	Tag	Size	Detects	Use when
full (ML)	`:latest`, `:<ver>`	~6-7 GB image, ~3 GB RAM	structured + names / addresses (OPF, the OpenAI Privacy Filter model)	maximum coverage; replacing the ecosystem’s Anonimal
lite (regex)	`:lite`, `:<ver>-lite`	tens of MB	structured only (email, phone, card, DNI, CUIT, CBU, secrets)	lightweight, no ML; misses free-form names and addresses

Requirements

Docker (the only hard requirement).
lite: a few tens of MB of disk and minimal RAM. Starts instantly.
full: ~6-7 GB of disk and ~3 GB of RAM for the resident model. Inference is CPU-bound by default (see OPF_DEVICE). Give the container roughly 6 GB of RAM to leave headroom beyond the model itself.

# lite: starts instantly, no model
docker run -d --name anonimal -p 8920:8000 \
  ghcr.io/diegoparras/anonimal-svc:lite

# full: loads the model into RAM on first boot, so give it a moment
# (both commands use the name "anonimal" and host port 8920: run one or the other)
docker run -d --name anonimal -p 8920:8000 \
  ghcr.io/diegoparras/anonimal-svc:latest
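To have Docker enforce the roughly 6 GB budget for the full image, rather than relying on free memory on the host, the standard `--memory` flag of `docker run` sets a hard limit. A minimal sketch: the 6g value mirrors the guideline above and is not a measured minimum. If the limit is too tight, the kernel OOM-kills the container while the model loads.

```shell
# Remove any earlier container with the same name (ignore the error if none exists).
docker rm -f anonimal 2>/dev/null

# Full image with a hard memory cap; exceeding it gets the container OOM-killed.
docker run -d --name anonimal -p 8920:8000 \
  --memory 6g \
  ghcr.io/diegoparras/anonimal-svc:latest

# Show the limit Docker applied, in bytes.
docker inspect --format '{{.HostConfig.Memory}}' anonimal
```

`docker stats --no-stream anonimal` then shows current memory usage against that limit.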

First boot

Check health, then anonymize a sample. The web UI is served at the same address.

curl -s localhost:8920/health
curl -s localhost:8920/anonymize -H "Content-Type: application/json" \
  -d '{"text":"email juan@acme.com, CUIT 20-12345678-6","mode":"pseudo"}'

On the full image, /health returns immediately while the model loads in the background. ml.ready flips to true once the checkpoint is warm. Until then, the default auto engine falls back to regex-only detection. The UI lives at http://localhost:8920.
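A startup script may need to wait until the model is warm before sending text that depends on name detection. The following is a minimal polling sketch. `wait_for_ml_ready` is a hypothetical helper. It assumes the /health JSON contains `"ready": true` under `ml` (as the `ml.ready` field name suggests) and no other `ready` key. It matches with `grep` to avoid a `jq` dependency; if `jq` is installed, `jq -e '.ml.ready == true'` is the stricter check. On the lite image it simply times out.

```shell
# Hypothetical helper: poll /health until the ML engine reports ready, or give up.
# Assumes the health JSON contains "ready":true under "ml" once the model is warm.
wait_for_ml_ready() {
  local url="${1:-http://localhost:8920/health}"
  local timeout="${2:-300}"   # seconds before giving up
  local waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # Drop all whitespace so compact and pretty-printed JSON match the same pattern.
    if curl -s "$url" | tr -d '[:space:]' | grep -q '"ready":true'; then
      echo "ML engine ready after ${waited}s"
      return 0
    fi
    sleep 2
    waited=$((waited + 2))
  done
  echo "ML engine not ready after ${timeout}s" >&2
  return 1
}
```

For example, `wait_for_ml_ready http://localhost:8920/health 600 && echo "go"` allows up to ten minutes for the first boot.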

Configuration: regex vs ML engine

The ANONIMAL_ENGINE variable selects the detection engine:

auto (default): use ML when it is ready, otherwise fall back to lite.
lite: regex only (structured data + LATAM IDs). Always available.
ml: force the OpenAI Privacy Filter (OPF) engine. Requests return 503 when it is not available (e.g. on the lite image).

OPF_DEVICE (full image only) switches the ML engine between cpu and cuda.
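Both variables are passed with `-e` at `docker run` time. Below is a sketch of two setups on the full image. It assumes the image's runtime can use CUDA, as the `OPF_DEVICE=cuda` option implies. `--gpus all` is a standard Docker flag that also requires the NVIDIA Container Toolkit on the host. The container names and host port 8921 are arbitrary, and port 8920 assumes the earlier container has been removed.

```shell
# Force the ML engine on an NVIDIA GPU. With ANONIMAL_ENGINE=ml, requests return 503
# when the engine is unavailable instead of falling back to regex as auto does.
docker run -d --name anonimal-gpu -p 8920:8000 \
  --gpus all \
  -e ANONIMAL_ENGINE=ml \
  -e OPF_DEVICE=cuda \
  ghcr.io/diegoparras/anonimal-svc:latest

# Regex-only detection (the lite engine) on the full image.
docker run -d --name anonimal-regex -p 8921:8000 \
  -e ANONIMAL_ENGINE=lite \
  ghcr.io/diegoparras/anonimal-svc:latest
```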

Environment variables

Variable	Default	Purpose
`ANONIMAL_ENGINE`	`auto`	Engine selection: `auto` · `lite` · `ml`.
`ANONIMAL_MODE`	`pseudo`	Default replacement mode for the API / UI.
`ANONIMAL_TOKEN`	(empty)	Service token. If set, every request must carry it (`Authorization: Bearer <token>` or `X-Anonimal-Token: <token>`).
`ANON_HASH_KEY`	(random per process)	Key for `hash` mode; set it for stable pseudonyms across restarts.
`ANONIMAL_MAX_CHARS`	`500000`	Maximum input length in characters; longer input is rejected with `413`.
`ANONIMAL_MAX_PDF_BYTES`	`26214400` (25 MiB)	Maximum PDF size, in bytes, for redaction.
`OPF_DEVICE`	`cpu`	`cpu` or `cuda` (full image only).
`OPF_CHECKPOINT`	(default)	Path to a custom OPF checkpoint (full image only).
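`ANON_HASH_KEY` defaults to a random per-process value, so hashes change on every restart unless you pin it. One option is to generate the secrets once into an env file and pass that file with Docker's `--env-file` flag. This is a minimal sketch. The file name `anonimal.env` and the helper `rand_hex32` are hypothetical. Hex encoding is an assumption, because the table does not specify a key format.

```shell
# Hypothetical helper: 32 random bytes from /dev/urandom as 64 lowercase hex characters.
rand_hex32() {
  od -An -N32 -tx1 /dev/urandom | tr -d ' \n'
}

# Write the secrets once. The subshell keeps the restrictive umask local,
# so the file is created readable by its owner only.
(
  umask 077
  cat > anonimal.env <<EOF
ANONIMAL_TOKEN=$(rand_hex32)
ANON_HASH_KEY=$(rand_hex32)
ANONIMAL_MODE=pseudo
ANONIMAL_MAX_CHARS=500000
EOF
)
```

Start the container with `docker run -d --name anonimal -p 8920:8000 --env-file anonimal.env ghcr.io/diegoparras/anonimal-svc:latest`. Keep the file out of version control, and reuse it so restarts keep the same key. The last two lines only restate the defaults, as places to tune.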

Security

Anonimal is designed to run locally. If you expose it on a network, set ANONIMAL_TOKEN so that every request must present it, and put a TLS reverse proxy in front. The image runs as a non-root user and enforces size caps (ANONIMAL_MAX_CHARS, ANONIMAL_MAX_PDF_BYTES). Inside the ecosystem, keep Anonimal on the internal network with no public domain, and let Escriba reach it by its internal hostname.
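With `ANONIMAL_TOKEN` set, every call must send the token. The following is a minimal client sketch for a caller on the same Docker network. `anonimal_anonymize` is a hypothetical helper and `ANONIMAL_URL` is a hypothetical variable. The default `http://anonimal:8000` assumes the container is named `anonimal` and is reached on its internal port. The helper reads the JSON body from stdin, so the shell never has to escape the text being anonymized.

```shell
# Hypothetical helper: POST a JSON body (read from stdin) to /anonymize with the service token.
anonimal_anonymize() {
  if [ -z "${ANONIMAL_TOKEN:-}" ]; then
    echo "ANONIMAL_TOKEN is not set" >&2
    return 1
  fi
  # "X-Anonimal-Token: <token>" is the documented alternative to the Bearer header.
  curl -sS "${ANONIMAL_URL:-http://anonimal:8000}/anonymize" \
    -H "Authorization: Bearer ${ANONIMAL_TOKEN}" \
    -H "Content-Type: application/json" \
    --data-binary @-
}
```

For example: `echo '{"text":"email juan@acme.com","mode":"pseudo"}' | anonimal_anonymize`. To keep the service unpublished, a user-defined network is enough. Run `docker network create anonimal-net`, then start the container with `--network anonimal-net` and no `-p` flag. Other containers on `anonimal-net` then resolve it by the name `anonimal`.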
