Any web page, ready for your AI.
Fisherboy is the web-extraction satellite of the Escriba family. Point it at any page and get back clean Markdown or structured JSON, pruned of navigation and boilerplate, anonymized before it leaves, and ready to feed to an LLM. It fights harder only when a site fights back, escalating from a plain HTTP request all the way to a real browser, and it can capture the hidden JSON/XHR responses that single-page apps already load.
Fisherboy is self-hostable as a single Docker image. It runs standalone with its own web UI, or headless behind Escriba as a REST + MCP service.
Drive it from curl, n8n, Claude Code or Escriba over REST or MCP.
Page to Markdown or JSON
Clean fit_markdown (Crawl4AI) with a Trafilatura fallback, or structured extraction to a JSON Schema via an LLM.
Tiered anti-blocking fetch
Escalates only when blocked: tier 0 static HTTP, tier 1 TLS fingerprint, tier 2 stealth browser, tier 3 real browser. The winning tier is cached per domain.
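The escalation with a per-domain cache can be sketched roughly as follows. This is a minimal illustration, not Fisherboy's actual code: the `TieredFetcher` class, the `Fetcher` signature (return `None` when blocked) and the stub tiers are all hypothetical names introduced for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple
from urllib.parse import urlparse

# Hypothetical fetcher signature: returns the page body, or None when blocked.
Fetcher = Callable[[str], Optional[str]]


class TieredFetcher:
    """Try tiers in order and remember the winning tier per domain."""

    def __init__(self, tiers: List[Fetcher]) -> None:
        self.tiers = tiers
        self.winning_tier: Dict[str, int] = {}  # domain -> tier index

    def fetch(self, url: str) -> Tuple[int, str]:
        domain = urlparse(url).netloc
        # Start at the cached tier for this domain, or at tier 0 if unknown.
        start = self.winning_tier.get(domain, 0)
        for tier in range(start, len(self.tiers)):
            body = self.tiers[tier](url)
            if body is not None:
                self.winning_tier[domain] = tier
                return tier, body
        raise RuntimeError(f"all tiers blocked for {url}")


# Stub tiers standing in for real HTTP and TLS-fingerprint clients.
calls: List[int] = []


def static_http(url: str) -> Optional[str]:
    calls.append(0)
    return None  # pretend tier 0 is blocked


def tls_fingerprint(url: str) -> Optional[str]:
    calls.append(1)
    return "<html>ok</html>"


fetcher = TieredFetcher([static_http, tls_fingerprint])
first = fetcher.fetch("https://example.com/a")   # tries tier 0, then tier 1
second = fetcher.fetch("https://example.com/b")  # starts directly at tier 1
```

In this sketch the second request skips tier 0 because the domain's winning tier was cached. A real implementation would probably also expire or re-test cached tiers.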
Hidden API capture
Instead of fighting rendered HTML, watch the XHR/fetch JSON the page already loads — the most reliable way to scrape SPAs and dynamic grids.
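As a rough sketch of the idea, assume a browser layer has already recorded the network responses for a page. The `CapturedResponse` record and `extract_api_payloads` function below are hypothetical names for illustration. The code keeps only XHR/fetch responses with a JSON content type and parses their bodies.

```python
import json
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class CapturedResponse:
    """Hypothetical record of one network response seen while loading a page."""
    url: str
    resource_type: str  # e.g. "document", "xhr", "fetch", "image"
    content_type: str
    body: str


def extract_api_payloads(responses: List[CapturedResponse]) -> List[Tuple[str, Any]]:
    """Return (url, parsed JSON) for every XHR/fetch response carrying JSON."""
    payloads = []
    for r in responses:
        if r.resource_type not in ("xhr", "fetch"):
            continue  # skip documents, images, scripts, ...
        if "json" not in r.content_type.lower():
            continue
        try:
            payloads.append((r.url, json.loads(r.body)))
        except json.JSONDecodeError:
            continue  # content type claimed JSON, but the body was not valid
    return payloads


captured = [
    CapturedResponse("https://example.com/", "document", "text/html", "<html></html>"),
    CapturedResponse("https://example.com/api/items", "xhr",
                     "application/json; charset=utf-8", '{"items": [1, 2]}'),
    CapturedResponse("https://example.com/api/broken", "fetch", "application/json", "oops"),
]
payloads = extract_api_payloads(captured)
```

Only the `/api/items` response survives the filter here. The rendered HTML is ignored entirely, which is the point of the approach.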
Spider & tarantula crawl
Follow internal links into a tree, sweep pagination, and capture each node’s content plus API into a data tree.
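A minimal sketch of the link-following part, assuming a caller-supplied `get_links` function that returns the hrefs found on a page. The function name, the `max_depth` parameter and the in-memory `site` map are hypothetical; pagination sweeps and API capture are left out.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List
from urllib.parse import urljoin, urlparse


def crawl_tree(start_url: str,
               get_links: Callable[[str], Iterable[str]],
               max_depth: int = 2) -> Dict[str, List[str]]:
    """Breadth-first crawl of same-host links, returned as parent -> children."""
    host = urlparse(start_url).netloc
    tree: Dict[str, List[str]] = {start_url: []}
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # keep the node, but do not expand it further
        for href in get_links(url):
            child = urljoin(url, href)  # resolve relative links
            if urlparse(child).netloc != host or child in seen:
                continue  # external link, or already placed in the tree
            seen.add(child)
            tree[url].append(child)
            tree[child] = []
            queue.append((child, depth + 1))
    return tree


# In-memory stand-in for a real site: page URL -> hrefs found on that page.
site = {
    "https://example.com/": ["/a", "/b", "https://other.example/x"],
    "https://example.com/a": ["/b", "/c"],
    "https://example.com/b": [],
    "https://example.com/c": ["/"],
}
tree = crawl_tree("https://example.com/", lambda u: site.get(u, []))
```

Each URL appears once, under the first parent that linked to it, so cycles such as `/c` linking back to `/` cannot loop the crawl.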
Download everything
Files, video (yt-dlp), galleries (gallery-dl) and platform comments — beyond just the page text.
PII anonymization
Three privacy modes — opaque, reversible and direct — bounded by role and fail-closed, with full NER via Anonimal or a built-in regex fallback.
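The page does not define the three modes, so the sketch below rests on assumptions: "opaque" replaces values with a fixed placeholder, "reversible" uses numbered tokens plus a mapping that can restore them, and "direct" passes text through unchanged. The email-only regex stands in for the regex fallback, and an unknown mode raises instead of leaking text, as one reading of "fail-closed". All names are hypothetical.

```python
import re
from typing import Dict, Tuple

# Deliberately simple email pattern, for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def anonymize(text: str, mode: str) -> Tuple[str, Dict[str, str]]:
    """Return (text, token -> original mapping); the mapping is empty unless reversible."""
    if mode == "direct":  # assumed meaning: no anonymization
        return text, {}
    if mode not in ("opaque", "reversible"):
        # Fail closed: refuse to return text for an unrecognized mode.
        raise ValueError(f"unknown privacy mode: {mode!r}")

    mapping: Dict[str, str] = {}

    def replace(match: "re.Match[str]") -> str:
        value = match.group(0)
        if mode == "opaque":
            return "[EMAIL]"
        # Reversible: reuse the token if this value was already seen.
        for token, original in mapping.items():
            if original == value:
                return token
        token = f"[EMAIL_{len(mapping) + 1}]"
        mapping[token] = value
        return token

    return EMAIL.sub(replace, text), mapping


def restore(text: str, mapping: Dict[str, str]) -> str:
    """Undo a reversible anonymization using its mapping."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


sample = "Mail ana@example.com or bob@example.org, then ana@example.com"
masked, mapping = anonymize(sample, "reversible")
opaque, _ = anonymize(sample, "opaque")
```

Keeping the mapping outside the masked text means the text can be sent to an LLM while the originals stay local.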
Proxies & cookies
Paste a proxy in any format and test your exit IP; paste cookies or read them from your local browser for pages behind a login.
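Accepting "any format" might look roughly like the normalizer below. The set of accepted shapes (`host:port`, `host:port:user:pass`, `user:pass@host:port`, with or without a scheme) and the `normalize_proxy` name are assumptions for illustration. IPv6 hosts are not handled.

```python
def normalize_proxy(raw: str, default_scheme: str = "http") -> str:
    """Normalize a pasted proxy string to scheme://[user:pass@]host:port."""
    raw = raw.strip()
    if "://" in raw:
        scheme, rest = raw.split("://", 1)
    else:
        scheme, rest = default_scheme, raw

    if "@" in rest:
        # user:pass@host:port
        creds, hostport = rest.rsplit("@", 1)
        user, _, password = creds.partition(":")
        host, _, port = hostport.rpartition(":")
    else:
        parts = rest.split(":")
        if len(parts) == 2:    # host:port
            host, port = parts
            user = password = ""
        elif len(parts) == 4:  # host:port:user:pass
            host, port, user, password = parts
        else:
            raise ValueError(f"unrecognized proxy format: {raw!r}")

    if not host or not port.isdigit():
        raise ValueError(f"unrecognized proxy format: {raw!r}")

    auth = f"{user}:{password}@" if user else ""
    return f"{scheme}://{auth}{host}:{port}"


plain = normalize_proxy("1.2.3.4:8080")
colon_style = normalize_proxy("1.2.3.4:8080:user:pw")
url_style = normalize_proxy("socks5://user:pw@proxy.example:1080")
```

Once every input is reduced to one canonical URL, the exit-IP test and the fetch tiers can share a single proxy representation.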
Role-based access
Three levels — dios / angel / humano — each with its own password and capability limits, enforced on REST and MCP.
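One way to picture a shared capability check is sketched below. The role names come from the text above, but their ordering (dios above angel above humano) and the capability names are assumptions made up for the example. A single function like this could sit in front of both the REST handlers and the MCP tools.

```python
from enum import IntEnum


class Role(IntEnum):
    # Assumed ordering: a higher value means more privileges.
    HUMANO = 1
    ANGEL = 2
    DIOS = 3


# Hypothetical capabilities and the minimum role each one requires.
REQUIRED_ROLE = {
    "extract": Role.HUMANO,
    "crawl": Role.ANGEL,
    "deanonymize": Role.DIOS,
}


def is_allowed(role: Role, capability: str) -> bool:
    """Return True only if the role meets the capability's minimum level."""
    required = REQUIRED_ROLE.get(capability)
    if required is None:
        return False  # unknown capability: deny rather than guess
    return role >= required
```

Denying unknown capabilities keeps the check fail-closed, in line with how the privacy modes are described.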
REST + MCP
Drive it from curl, n8n, Claude Code or Escriba. The same pipeline is exposed as MCP tools.