Anonimal detects PII in text and replaces it according to a chosen mode. A detector only finds spans; the replacement is decided separately. Everything runs locally — the original data never leaves the machine.
Detection depends on the active engine.
Structured data (both engines)
Emails, phone numbers, credit cards (validated with the Luhn check), URLs, IPv4 addresses and common secrets.
LATAM identifiers (both engines)
Argentine DNI, CUIT / CUIL (with check digit) and CBU bank numbers.
Free-form PII (ML engine only)
People’s names and addresses in running prose, via NER — the part regex cannot see.
Custom rules
User-supplied rules: always-hide / never-hide whitelists and your own
{regex, placeholder} patterns.
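The two checksum validations mentioned above can be sketched as follows. This is an illustration, not Anonimal's own code: the function names are hypothetical, and the CUIT check assumes the commonly documented modulo-11 scheme with weights 5, 4, 3, 2, 7, 6, 5, 4, 3, 2.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


# Assumed weights for the first ten digits of a CUIT / CUIL.
CUIT_WEIGHTS = (5, 4, 3, 2, 7, 6, 5, 4, 3, 2)


def cuit_valid(value: str) -> bool:
    """Return True if the 11th digit matches the modulo-11 check digit."""
    digits = [int(ch) for ch in value if ch.isdigit()]
    if len(digits) != 11:
        return False
    total = sum(w * d for w, d in zip(CUIT_WEIGHTS, digits[:10]))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    # A computed value of 10 has no single-digit form; this sketch rejects it.
    return check == digits[10]
```

Running a checksum after the regex match is what keeps arbitrary 16-digit or 11-digit numbers from being flagged as cards or tax identifiers.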
A detector exposes detect(text) → [Span]; overlaps are resolved by the longest
span (ties broken by label priority).
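A minimal sketch of that overlap rule, assuming a greedy pass over the candidates. The `Span` fields, the `LABEL_PRIORITY` table and the function name are hypothetical; the real priority order is not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    label: str
    start: int
    end: int  # exclusive


# Hypothetical priority table: a lower number wins a tie on length.
LABEL_PRIORITY = {"EMAIL": 0, "CARD": 1, "PHONE": 2, "URL": 3}


def resolve_overlaps(spans):
    """Keep the longest span of each overlapping group; ties go to label priority."""
    def rank(s):
        return (-(s.end - s.start), LABEL_PRIORITY.get(s.label, 99), s.start)

    kept = []
    for cand in sorted(spans, key=rank):
        # Accept the candidate only if it does not touch any span already kept.
        if all(cand.end <= k.start or cand.start >= k.end for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda s: s.start)
```

For example, an email detected inside a longer URL is dropped in favour of the URL, and two detectors claiming the same range are settled by the priority table.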
The lite engine is the anonimal_lite library. Select the engine with ANONIMAL_ENGINE: auto (ML if ready, else lite), lite or ml. Requests may override the default per call with an engine field.
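The selection logic can be pictured roughly like this. The function name, its parameters and the fallback to auto when the variable is unset are assumptions made for the example; only the ANONIMAL_ENGINE variable and the three values come from the text above.

```python
import os

VALID_ENGINES = {"auto", "lite", "ml"}


def resolve_engine(requested=None, ml_ready=False, env=None):
    """Pick the engine for one call: the request field beats the environment default."""
    env = os.environ if env is None else env
    # Assumption: an unset ANONIMAL_ENGINE behaves like "auto".
    choice = requested or env.get("ANONIMAL_ENGINE", "auto")
    if choice not in VALID_ENGINES:
        raise ValueError(f"unknown engine: {choice!r}")
    if choice == "auto":
        return "ml" if ml_ready else "lite"
    # An explicit "ml" is returned as-is; the service answers 503 when ML is unavailable.
    return choice
```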
Four modes are opaque (one-way) and one is reversible. A single anonymizer is used per document, so the same value always gets the same replacement (consistency).
| Mode | Result | Reversible |
|---|---|---|
| typed | [EMAIL] (placeholder by category) | no |
| anon | «REDACTADO» (single opaque token) | no |
| pseudo | EMAIL_1 (stable numbered pseudonym) | yes (returns a map) |
| mask | j***@***.com / ****-****-****-1234 (type-aware) | no |
| hash | EMAIL_a1b2c3d4e5 (deterministic HMAC) | no |
typed, anon, mask and hash produce non-reversible output. Use them when
you only need to share or store text safely. The hash mode is deterministic:
set ANON_HASH_KEY so the same value hashes identically across restarts (stable
linkage without storing a map).
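To illustrate why a fixed key gives stable linkage, here is a sketch of a keyed hash token. The digest algorithm (SHA-256), the 10-character truncation and the `hash_token` name are assumptions chosen to match the shape of the EMAIL_a1b2c3d4e5 example, not a description of Anonimal's internals.

```python
import hashlib
import hmac
import os


def hash_token(label: str, value: str, key: bytes, length: int = 10) -> str:
    """Build a LABEL_<hex> token from an HMAC of the original value."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{label}_{digest[:length]}"


# Reading the key from ANON_HASH_KEY keeps tokens identical across restarts.
# The fallback value is for local experiments only.
key = os.environ.get("ANON_HASH_KEY", "dev-only-key").encode("utf-8")
token = hash_token("EMAIL", "juan@acme.com", key)
```

With the same key, the same input always yields the same token, so two documents can be joined on it without anyone holding a token → original map. A different key produces unrelated tokens.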
pseudo is the reversible mode. It replaces each value with a stable token
(EMAIL_1, PERSON_2, …) and returns a token → original map. The workflow:
1. POST /anonymize with mode: "pseudo" → get anonymized output plus map.
2. Send output to the LLM (the original PII never reaches it).
3. POST /deanonymize with the LLM’s answer and the same map → the original values are re-hydrated back into the text.

Anonimal preserves the format of files that are already text: txt, md,
log, srt, html, CSV (cells anonymized, columns intact) and JSON
(string values anonymized, keys never touched; output stays valid JSON). A single
anonymizer per file means a consistent map for the whole document.
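The JSON behaviour described above (string values rewritten, keys and non-string values left alone) can be sketched with a recursive walk. `anonymize_json` and the `anonymize_text` callback are hypothetical names; the callback stands in for whatever anonymizer is shared across the file.

```python
import json


def anonymize_json(raw: str, anonymize_text) -> str:
    """Apply `anonymize_text` to every string value; keys, numbers and booleans stay as-is."""
    def walk(node):
        if isinstance(node, str):
            return anonymize_text(node)
        if isinstance(node, list):
            return [walk(item) for item in node]
        if isinstance(node, dict):
            # Only the values are visited, so keys are never rewritten.
            return {key: walk(value) for key, value in node.items()}
        return node  # int, float, bool, None

    # Re-serializing the walked structure guarantees the output is valid JSON.
    return json.dumps(walk(json.loads(raw)), ensure_ascii=False)
```

Passing one callback for the whole document is what gives every occurrence of a value the same replacement.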
Converting Word / Excel / images / audio / URLs is not Anonimal’s job — that
belongs to Escriba, Extracta and Fisherboy, which feed already-converted text in.
Anonimal does, however, offer real PDF redaction (/redact_pdf): genuine
black-out of detected spans plus metadata removal.
All endpoints except /health are gated by require_auth (see
authentication). Base URL is your deployment, e.g.
http://localhost:8920.
| Method | Path | What it does |
|---|---|---|
| GET | /health | Status + ML engine availability. Always open. |
| POST | /detect | {text} → detected spans. |
| POST | /anonymize | {text, mode, engine?} → {output, map, summary}. |
| POST | /deanonymize | {text, map} → original text. |
| POST | /anonymize_file | File upload + mode → anonymized content (same format). |
| POST | /redact_pdf | PDF → redacted PDF (black-out + metadata wiped). |
POST /anonymize

Request:

    {
      "text": "email juan@acme.com, CUIT 20-12345678-6",
      "mode": "pseudo",
      "engine": "auto",
      "rules": null
    }

Response:

    {
      "engine": "lite",
      "mode": "pseudo",
      "output": "email EMAIL_1, CUIT ID_1",
      "spans": [
        { "label": "EMAIL", "start": 6, "end": 19, "text": "juan@acme.com" }
      ],
      "map": { "EMAIL_1": "juan@acme.com", "ID_1": "20-12345678-6" },
      "reversible": true,
      "summary": { "EMAIL": 1, "ID": 1 }
    }

The map is only populated for pseudo; reversible reflects that.

POST /deanonymize

    { "text": "reply to EMAIL_1", "map": { "EMAIL_1": "juan@acme.com" } }

→ { "output": "reply to juan@acme.com" }. A missing or empty map returns 422.
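The pseudo round trip behind these two endpoints can be approximated locally. The sketch below is not the service's implementation: `pseudonymize` and `rehydrate` are hypothetical helpers, and spans are passed as plain (label, start, end) tuples.

```python
import re


def pseudonymize(text, spans):
    """Replace each span with LABEL_n and return (output, token -> original map)."""
    tokens = {}    # (label, original) -> token, so repeated values reuse one token
    counters = {}  # label -> last number handed out
    mapping = {}
    out, cursor = [], 0
    for label, start, end in sorted(spans, key=lambda s: s[1]):
        original = text[start:end]
        key = (label, original)
        if key not in tokens:
            counters[label] = counters.get(label, 0) + 1
            tokens[key] = f"{label}_{counters[label]}"
            mapping[tokens[key]] = original
        out.append(text[cursor:start])
        out.append(tokens[key])
        cursor = end
    out.append(text[cursor:])
    return "".join(out), mapping


def rehydrate(text, mapping):
    """Put the original values back; an empty map is rejected (the API answers 422)."""
    if not mapping:
        raise ValueError("map is required")
    # Longest tokens first so EMAIL_1 never matches inside EMAIL_10.
    ordered = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered))
    return pattern.sub(lambda m: mapping[m.group(0)], text)
```

Because the map lives only on the caller's side, whatever sits between the two calls (for instance an LLM) only ever sees the tokens.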
Legacy /anonymize

Calling POST /anonymize without a mode returns the legacy contract used by
the embedded Anonimal — {text, detected_spans, redacted_text, summary} with a
placeholder per span. This lets Escriba and Fisherboy point their ANONIMAL_URL
at the new service without changing a line of code.
GET /health

Returns status, the default engine and mode, and an ml block with
available, ready and error. Used by the container healthcheck.
Error responses: 401 (missing/invalid token or session), 413 (text or PDF over the size cap),
422 (invalid mode / missing map / invalid rules_json), 503 (ML engine or PDF
support unavailable).
Anonimal accepts two independent credentials on the API:
- Service token: set ANONIMAL_TOKEN. Every request must then carry it, either as Authorization: Bearer <token> or as the X-Anonimal-Token header. This is how Escriba and Fisherboy authenticate over the internal network.
- Session cookie: with ANONIMAL_AUTH_ENABLED=true, a signed cookie from the /login page also satisfies the API gate (for the web UI).

If neither is configured, the API is open (it assumes localhost). /health is always reachable for healthchecks.
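A client that authenticates with the service token might build its requests like this. The helper name and its parameters are hypothetical; only the two header forms and the example base URL come from this page.

```python
import json
import urllib.request


def build_request(base_url, path, payload, token=None, use_bearer=True):
    """Prepare a JSON POST carrying the token in one of the two accepted headers."""
    headers = {"Content-Type": "application/json"}
    if token:
        if use_bearer:
            headers["Authorization"] = f"Bearer {token}"
        else:
            headers["X-Anonimal-Token"] = token
    data = json.dumps(payload).encode("utf-8")
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


req = build_request(
    "http://localhost:8920",
    "/anonymize",
    {"text": "email juan@acme.com", "mode": "pseudo"},
    token="example-token",  # placeholder value, not a real credential
)
```

The request is only constructed here; sending it with `urllib.request.urlopen(req)` requires a running deployment.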
Anonimal is the single owner of anonymization in the Escriba Suite; the satellites delegate to it.
- A product with ANONIMAL_URL set calls Anonimal over HTTP (full ML coverage), authenticating with X-Anonimal-Token.
- Without ANONIMAL_URL, a product falls back to the bundled anonimal_lite (regex only, pure stdlib), so it can still anonymize standalone.

Install the library:

    pip install "anonimal-lite @ git+https://github.com/diegoparras/anonimal.git@v0.4.0"

Then use it directly:

    from anonimal_lite import LiteEngine, Anonymizer, deanonymize

    text = "email juan@acme.com, CUIT 20-12345678-6"  # sample input
    eng = LiteEngine()  # regex-only detector
    out = Anonymizer("pseudo").process(text, eng.detect(text))

There are two flows into Anonimal: a human path (Extracta/Fisherboy hand off
to Escriba via sessionStorage['escriba.handoff'], and Escriba’s “Anonymize”
button calls the API) and an automatic path (an unattended worker calls the
API directly). Either way, Anonimal stays the only place anonymization happens.
/detect, /anonymize (field rules) and /anonymize_file (rules_json)
accept a rules object: always (always hide), never (never hide) and
patterns ({regex, placeholder}). Patterns are a superset of Escriba’s rules,
with optional RE2 to guard against ReDoS.
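One way to picture how a rules object combines with detector output is sketched below. The semantics are assumptions made for illustration: always entries are treated as literal strings tagged with a hypothetical CUSTOM label, never entries suppress any span whose text matches exactly, and patterns use the standard library `re` module rather than RE2.

```python
import re


def apply_rules(text, spans, rules=None):
    """Merge detector spans (label, start, end) with user rules and return sorted spans."""
    rules = rules or {}
    result = list(spans)

    # always: every literal occurrence becomes a span.
    for literal in rules.get("always", []):
        if literal:
            for m in re.finditer(re.escape(literal), text):
                result.append(("CUSTOM", m.start(), m.end()))

    # patterns: each regex match is labelled with its placeholder.
    for rule in rules.get("patterns", []):
        for m in re.finditer(rule["regex"], text):
            result.append((rule["placeholder"], m.start(), m.end()))

    # never: drop any span whose text is on the whitelist.
    never = set(rules.get("never", []))
    result = [s for s in result if text[s[1]:s[2]] not in never]
    return sorted(result, key=lambda s: s[1])
```

User-supplied regexes run on untrusted input, which is why a linear-time engine such as RE2 is a sensible guard; the stdlib `re` used in this sketch offers no such protection.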