Install Anonimal
Anonimal runs locally and self-hosted: data never leaves your infrastructure. The image is published to GitHub Container Registry, and the service listens on port 8000 inside the container.
Choose an image
There are two images. Pick by coverage versus weight.
| Image | Tag | Size | Detects | When to use |
|---|---|---|---|---|
| full (ML) | :latest, :<ver> | ~6-7 GB image + ~3 GB RAM | structured + names / addresses (OPF) | maximum coverage; replacing the ecosystem’s Anonimal |
| lite (regex) | :lite, :<ver>-lite | tens of MB | structured only (email, phone, card, DNI, CUIT, CBU, secrets) | lightweight, no ML; does not see free-form names |
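
The tag scheme in the table can be sketched as a small helper that turns a variant and an optional version into an image reference. The function name `image_ref` and the version string used in the usage note are illustrative only; the repository path is the one used in the run commands on this page.

```shell
# Print the image reference for a variant (full|lite) and an optional version.
# Usage: image_ref <full|lite> [version]
image_ref() {
  repo="ghcr.io/diegoparras/anonimal-svc"
  variant="$1"
  version="${2:-}"
  case "$variant" in
    full)
      # full: :latest, or :<ver> when a version is given
      echo "${repo}:${version:-latest}"
      ;;
    lite)
      # lite: :lite, or :<ver>-lite when a version is given
      if [ -n "$version" ]; then
        echo "${repo}:${version}-lite"
      else
        echo "${repo}:lite"
      fi
      ;;
    *)
      echo "unknown variant: $variant (expected full or lite)" >&2
      return 1
      ;;
  esac
}
```

For example, `docker pull "$(image_ref lite)"` would pull the regex-only image, and `image_ref lite 1.2.3` (a made-up version) would print the pinned lite tag.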
Requirements
- Docker (the only hard requirement).
- lite: a few tens of MB of disk and minimal RAM. Starts instantly.
- full: ~6-7 GB of disk and ~3 GB of RAM for the resident model; it is CPU-bound. Give the container roughly 6 GB of RAM.
Docker run
Run one of the following, depending on the image you chose:
```shell
# lite: starts instantly, no model
docker run -d --name anonimal -p 8920:8000 \
  ghcr.io/diegoparras/anonimal-svc:lite

# full: loads the model into RAM on first boot, so give it a moment
docker run -d --name anonimal -p 8920:8000 \
  ghcr.io/diegoparras/anonimal-svc:latest
```
First boot
Check health, then anonymize a sample. The web UI is served at the same address.
```shell
curl -s localhost:8920/health
curl -s localhost:8920/anonymize -H "Content-Type: application/json" \
  -d '{"text":"email juan@acme.com, CUIT 20-12345678-6","mode":"pseudo"}'
```
On the full image, /health returns immediately while the model loads in the
background; ml.ready flips to true once the checkpoint is warm. The UI lives at
http://localhost:8920.
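
Because /health answers before the model is warm, a script that depends on the ML engine may want to wait for ml.ready first. The sketch below polls /health until that flag shows up as true. It assumes the health response is JSON in which the flag appears as a `"ready": true` pair (the exact response shape is not documented on this page), and it uses a deliberately crude text match rather than a JSON parser. The function names and the default of 120 attempts are illustrative.

```shell
# Assumption: /health returns JSON in which ml.ready appears as "ready": true.
# Crude text match on stdin, not a JSON parser; adjust to the real response.
is_ml_ready() {
  grep -Eq '"ready"[[:space:]]*:[[:space:]]*true'
}

# Poll /health once per second until the flag is true or attempts run out.
# Usage: wait_for_ml [base_url] [max_attempts]
wait_for_ml() {
  base_url="${1:-http://localhost:8920}"
  attempts="${2:-120}"
  while [ "$attempts" -gt 0 ]; do
    # A failed curl yields empty input, which is_ml_ready treats as not ready.
    if curl -s "$base_url/health" | is_ml_ready; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 1
  done
  return 1
}
```

Calling `wait_for_ml` before the first /anonymize request returns 0 once the flag is seen and 1 if the attempts are exhausted.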
Configuration: regex vs ML engine
The ANONIMAL_ENGINE variable selects the detection engine:
- auto (default): use ML if it is ready, otherwise fall back to lite.
- lite: regex only (structured + LATAM IDs). Always available.
- ml: force the OpenAI Privacy Filter engine (returns 503 if it is not available, e.g. on the lite image).
OPF_DEVICE (full image only) switches the ML engine between cpu and cuda.
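
One way to pass these two variables is with `-e` flags on `docker run`. The sketch below wraps that in a helper that rejects anything other than the three documented engine values before starting the container. The function names `valid_engine` and `start_anonimal` are illustrative, and the container name and port mapping simply mirror the run commands on this page.

```shell
# Accept only the three documented ANONIMAL_ENGINE values.
valid_engine() {
  case "$1" in
    auto|lite|ml) return 0 ;;
    *) return 1 ;;
  esac
}

# Start the full image with an explicit engine and device.
# Usage: start_anonimal [engine] [device]
start_anonimal() {
  engine="${1:-auto}"
  device="${2:-cpu}"
  if ! valid_engine "$engine"; then
    echo "unknown engine: $engine (expected auto, lite or ml)" >&2
    return 2
  fi
  docker run -d --name anonimal -p 8920:8000 \
    -e ANONIMAL_ENGINE="$engine" \
    -e OPF_DEVICE="$device" \
    ghcr.io/diegoparras/anonimal-svc:latest
}
```

`start_anonimal ml cpu` would force the ML engine on CPU. Passing cuda as the device would presumably also require giving the container GPU access, which this sketch does not set up.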
Environment variables
| Variable | Default | Purpose |
|---|---|---|
| ANONIMAL_ENGINE | auto | Engine selection: auto · lite · ml. |
| ANONIMAL_MODE | pseudo | Default replacement mode for the API / UI. |
| ANONIMAL_TOKEN | (empty) | Service token. If set, every request must carry it (Authorization: Bearer or X-Anonimal-Token). |
| ANON_HASH_KEY | (random per process) | Key for hash mode; set it for stable pseudonyms across restarts. |
| ANONIMAL_MAX_CHARS | 500000 | Max input length (over this returns 413). |
| ANONIMAL_MAX_PDF_BYTES | 26214400 (25 MB) | Max PDF size for redaction. |
| OPF_DEVICE | cpu | cpu or cuda (full image only). |
| OPF_CHECKPOINT | (default) | Path to a custom OPF checkpoint (full image only). |
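
Since ANON_HASH_KEY is random per process unless set, a deployment that needs stable pseudonyms has to supply its own value. The sketch below writes an env file containing a generated service token and hash key. The file name `anonimal.env`, the helper names, and the choice of 32 random bytes encoded as hex are assumptions for illustration; this page does not state what format the service expects for either value.

```shell
# Generate a random hex string from N bytes of /dev/urandom (default 32).
random_hex() {
  head -c "${1:-32}" /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Write an env file with a service token and a fixed hash key.
# Assumption: hex strings are acceptable values for both variables.
# Usage: write_env_file [path]
write_env_file() {
  env_file="${1:-anonimal.env}"
  (
    # Restrict permissions for a newly created file; it holds secrets.
    umask 077
    {
      echo "ANONIMAL_ENGINE=auto"
      echo "ANONIMAL_MODE=pseudo"
      echo "ANONIMAL_TOKEN=$(random_hex 32)"
      echo "ANON_HASH_KEY=$(random_hex 32)"
    } > "$env_file"
  )
}

# Then pass the file to the container, for example:
#   docker run -d --name anonimal -p 8920:8000 \
#     --env-file anonimal.env ghcr.io/diegoparras/anonimal-svc:lite
```

Keeping the same file across restarts keeps the hash key, and therefore the pseudonyms produced in hash mode, stable.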
Security
Designed to run locally. If you expose it on a network, set ANONIMAL_TOKEN
(required on every request) and put a TLS reverse proxy in front. The image runs
as a non-root user and enforces size caps (ANONIMAL_MAX_CHARS,
ANONIMAL_MAX_PDF_BYTES). Inside the ecosystem, keep Anonimal on the internal
network with no public domain and let Escriba reach it by internal hostname.
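
Once a token is set, clients have to send it with every request. The sketch below builds the Authorization: Bearer header from an ANONIMAL_TOKEN variable in the client's environment and adds it to an /anonymize call only when a token is present. The helper names are illustrative, and the base URL argument stands in for whatever address your reverse proxy or internal hostname exposes.

```shell
# Print the auth header for the current ANONIMAL_TOKEN, or nothing if unset.
auth_header() {
  if [ -n "${ANONIMAL_TOKEN:-}" ]; then
    printf 'Authorization: Bearer %s' "$ANONIMAL_TOKEN"
  fi
}

# POST a JSON body to /anonymize, adding the token header only when one is set.
# Usage: anonymize_request <json_body> [base_url]
anonymize_request() {
  body="$1"
  base_url="${2:-http://localhost:8920}"
  header="$(auth_header)"
  if [ -n "$header" ]; then
    curl -s "$base_url/anonymize" \
      -H "Content-Type: application/json" \
      -H "$header" \
      -d "$body"
  else
    curl -s "$base_url/anonymize" \
      -H "Content-Type: application/json" \
      -d "$body"
  fi
}
```

With the token exported in the shell, `anonymize_request '{"text":"email juan@acme.com","mode":"pseudo"}'` sends the same sample request as in First boot, this time authenticated.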