
Install Anonimal

Anonimal runs locally on infrastructure you host, so your data never leaves it. The image is published to GitHub Container Registry and listens on port 8000 inside the container.

There are two images. Pick by coverage versus weight.

| Image | Tag | Size | Detects | When |
| --- | --- | --- | --- | --- |
| full (ML) | :latest, :<ver> | ~6-7 GB image + ~3 GB RAM | structured + names / addresses (OPF) | maximum coverage; replacing the ecosystem’s Anonimal |
| lite (regex) | :lite, :<ver>-lite | tens of MB | structured only (email, phone, card, DNI, CUIT, CBU, secrets) | lightweight, no ML; does not see free-form names |

  • Docker (the only hard requirement).
  • lite: a few tens of MB of disk and minimal RAM. Starts instantly.
  • full: ~6-7 GB of disk and ~3 GB of RAM for the resident model; it is CPU-bound. Give the container roughly 6 GB of RAM.

```shell
# Starts instantly, no model
docker run -d --name anonimal -p 8920:8000 \
  ghcr.io/diegoparras/anonimal-svc:lite
```
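
The command above starts the lite image. For the full image, a minimal sketch could look like the following. It assumes the full image is published in the same repository under the :latest tag listed in the table, and it uses docker's standard --memory flag to apply the roughly 6 GB suggested in the requirements.

```shell
# Assumption: same repository as the lite image, with the :latest tag.
# --memory caps the container at about 6 GB for the resident model.
docker run -d --name anonimal -p 8920:8000 --memory 6g \
  ghcr.io/diegoparras/anonimal-svc:latest
```

Expect the first pull to take a while, given the ~6-7 GB image size.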

Check health, then anonymize a sample. The web UI is served at the same address.

```shell
curl -s localhost:8920/health
curl -s localhost:8920/anonymize -H "Content-Type: application/json" \
  -d '{"text":"email juan@acme.com, CUIT 20-12345678-6","mode":"pseudo"}'
```

On the full image, /health returns immediately while the model loads in the background; ml.ready flips to true once the checkpoint is warm. The UI lives at http://localhost:8920.
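
Because /health answers before the model is warm, a script that needs ML detection may want to wait for ml.ready first. The sketch below is one way to do that. It assumes ml.ready appears in the /health body as a "ready": true field, which this page does not spell out. The helper names ml_ready and wait_for_ml are hypothetical.

```shell
# Succeeds when the /health body on stdin reports a ready model.
# Assumption: ml.ready is serialized as a "ready": true field. This is a crude
# text match; prefer a real JSON parser (for example jq) if one is installed.
ml_ready() {
  grep -Eq '"ready"[[:space:]]*:[[:space:]]*true'
}

# Poll /health every 5 seconds until the model is warm or the timeout expires.
# $1 = health URL (default: the port mapping used above), $2 = timeout in seconds.
wait_for_ml() {
  url="${1:-http://localhost:8920/health}"
  timeout="${2:-300}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if curl -fs "$url" | ml_ready; then
      return 0
    fi
    sleep 5
    elapsed=$((elapsed + 5))
  done
  return 1
}
```

A typical call would be `wait_for_ml && echo "model ready"`, run after starting the full image.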

The ANONIMAL_ENGINE variable selects the detection engine:

  • auto (default) — use ML if it is ready, otherwise fall back to lite.
  • lite — regex only (structured + LATAM IDs). Always available.
  • ml — force the OpenAI Privacy Filter engine (returns 503 if it is not available, e.g. on the lite image).

OPF_DEVICE (full image only) switches the ML engine between cpu and cuda.
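
Since forcing ml on the lite image returns 503, it can help to choose the image tag and the engine together. The sketch below is illustrative only. tag_for_engine and start_with_engine are hypothetical helpers, and mapping auto to the full image is a choice made here, not something the source prescribes.

```shell
# Pick an image tag that can actually serve the requested engine.
tag_for_engine() {
  case "$1" in
    ml|auto) echo "latest" ;;  # ml needs the full image; auto only gains ML there
    lite)    echo "lite" ;;    # regex only, no model required
    *)       echo "unknown engine: $1" >&2; return 1 ;;
  esac
}

# Start the container with ANONIMAL_ENGINE set and a matching image tag.
start_with_engine() {
  engine="${1:-auto}"
  tag="$(tag_for_engine "$engine")" || return 1
  docker run -d --name anonimal -p 8920:8000 \
    -e ANONIMAL_ENGINE="$engine" \
    "ghcr.io/diegoparras/anonimal-svc:$tag"
}
```

For example, `start_with_engine lite` starts the small image with the regex engine pinned. To use OPF_DEVICE=cuda the container would also need access to a GPU, which this page does not cover.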

| Variable | Default | Purpose |
| --- | --- | --- |
| ANONIMAL_ENGINE | auto | Engine selection: auto · lite · ml. |
| ANONIMAL_MODE | pseudo | Default replacement mode for the API / UI. |
| ANONIMAL_TOKEN | (empty) | Service token. If set, every request must carry it (Authorization: Bearer or X-Anonimal-Token). |
| ANON_HASH_KEY | (random per process) | Key for hash mode; set it for stable pseudonyms across restarts. |
| ANONIMAL_MAX_CHARS | 500000 | Max input length (over this returns 413). |
| ANONIMAL_MAX_PDF_BYTES | 26214400 (25 MB) | Max PDF size for redaction. |
| OPF_DEVICE | cpu | cpu or cuda (full image only). |
| OPF_CHECKPOINT | (default) | Path to a custom OPF checkpoint (full image only). |
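
The two error codes named on this page (413 for oversized input, 503 for an unavailable ML engine) can be told apart on the client side. The sketch below uses curl's --write-out option to capture the status code. explain_status and anonymize_status are hypothetical helpers, and treating 200 as success is an assumption.

```shell
# Map the status codes mentioned on this page to a short explanation.
explain_status() {
  case "$1" in
    200) echo "ok" ;;  # assumption: success is reported as 200
    413) echo "input exceeds ANONIMAL_MAX_CHARS" ;;
    503) echo "ML engine not available" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# POST a JSON body to /anonymize and print only the HTTP status code.
anonymize_status() {
  curl -s -o /dev/null -w '%{http_code}' localhost:8920/anonymize \
    -H "Content-Type: application/json" \
    -d "$1"
}
```

With the service running, `explain_status "$(anonymize_status '{"text":"hola","mode":"pseudo"}')"` prints the explanation for whatever status the request returned.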

Anonimal is designed to run locally. If you expose it on a network, set ANONIMAL_TOKEN (which is then required on every request) and put a TLS reverse proxy in front. The image runs as a non-root user and enforces size caps (ANONIMAL_MAX_CHARS, ANONIMAL_MAX_PDF_BYTES). Inside the ecosystem, keep Anonimal on the internal network with no public domain and let Escriba reach it by internal hostname.
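
A hardened start could be sketched as follows. It assumes ANONIMAL_TOKEN and ANON_HASH_KEY accept arbitrary strings. The file name anonimal.env and the helper names are hypothetical. Publishing the port on 127.0.0.1 only is a suggestion made here, so that the reverse proxy is the sole entry point.

```shell
# Generate 32 random bytes as 64 hex characters using common Unix tools.
random_hex() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Write the secrets to an env file readable only by the current user.
umask 077
{
  printf 'ANONIMAL_TOKEN=%s\n' "$(random_hex)"
  printf 'ANON_HASH_KEY=%s\n' "$(random_hex)"
} > anonimal.env

# Start the service with those variables, published on loopback only.
start_anonimal() {
  docker run -d --name anonimal -p 127.0.0.1:8920:8000 \
    --env-file anonimal.env \
    ghcr.io/diegoparras/anonimal-svc:lite
}

# Send an authenticated request; the token is read back from the env file.
anonymize_auth() {
  . ./anonimal.env
  curl -s localhost:8920/anonymize \
    -H "Authorization: Bearer $ANONIMAL_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$1"
}
```

After `start_anonimal`, a call such as `anonymize_auth '{"text":"email juan@acme.com","mode":"pseudo"}'` carries the token in the Authorization header. Keeping ANON_HASH_KEY in the file also keeps hash-mode pseudonyms stable across restarts.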
