API — Document Analyzer

Overview

Every page that renders a scan is backed by a small JSON-over-HTTP API. The same endpoints the web UI calls are available to scripts: look up a sample by hash, search the corpus, submit a file for scanning, and poll the resulting job. Responses are application/json.

Replace https://analyzer.sh in the examples below with this deployment’s base URL. All endpoints are relative to it.

Authentication & access

Authenticate with a bearer token (API key). Create one under your account → API access; the secret is shown once, so copy it then. Send it on every request as an Authorization header:

Authorization: Bearer anlz_<your-token>

A token carries your account’s full access. You can hold several at once and create, refresh (rotate the secret), or revoke them at any time from the account page. Treat a token like a password; only its hash is stored server-side, so a lost token cannot be recovered — refresh or revoke it instead.

Authenticated (bearer token): Requests carrying a valid token get the full, un-redacted payload and the higher (signed-in) rate limits. Because a bearer token is an explicit credential that a browser never attaches on its own, token requests are exempt from the same-origin / CSRF gate below. This is the path for scripted use.
Anonymous: Most read endpoints also work with no token, but the response is redacted: filenames, embedded URLs, email headers, matched byte snippets, and the internal unknown_exploit_score are stripped, and result sets are capped (see search limits). The unknown: search operator is refused for anonymous callers.
Browser sessions (same-origin): The website itself authenticates with a session cookie rather than a token. For those cookie-based requests, state-changing endpoints (file upload, rescan, flag, tagging) require a same-origin call: the Origin/Referer header (or Sec-Fetch-Site: same-origin) must match the site host, and mutating /api/ calls also need the X-CSRF-Token header. None of this applies when you authenticate with a bearer token.
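As an illustration of the bearer path, here is a minimal Python sketch that attaches the token to a request using only the standard library. The helper name build_request and the BASE_URL constant are hypothetical, and the base URL is the same placeholder used in the curl examples.

```python
import urllib.request

# Placeholder base URL from this page; substitute your deployment's base URL.
BASE_URL = "https://analyzer.sh"


def build_request(path, token=None):
    """Build a request for `path`; add the bearer header only when a token is given.

    Hypothetical helper for illustration. Without a token the request is
    anonymous and the server returns the redacted payload.
    """
    headers = {"Accept": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(BASE_URL + path, headers=headers)
```

Passing the result to urllib.request.urlopen sends the request. Keeping the token in an environment variable or a secrets store rather than in the script fits the advice to treat it like a password.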

Rate limits

Limits are enforced per client IP with a token bucket. When a limit is exceeded, the endpoint returns 429 with a Retry-After header and a JSON {"error": …} body. The expensive read endpoints additionally offer anonymous callers a proof-of-work challenge instead of a hard block.

Default: 20 requests / 60 s — lookups, uploads, rescans.
Job polling: 150 requests / 60 s — /api/scan/job/<job_id>, so you can poll tightly.
Browse: 20 requests / 60 s for anonymous corpus browsing (tag / type / by-hash listings).
Expensive: 20 requests / 60 s + proof-of-work for search, similarity, and URL aggregation.
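A script can respect these limits by waiting for the Retry-After interval whenever it receives a 429. The sketch below is one possible approach, not a prescribed client. It assumes Retry-After is given in seconds and falls back to a fixed delay otherwise; the function names and the retry count are hypothetical.

```python
import time
import urllib.error
import urllib.request


def retry_delay(headers, default=5.0):
    """Return the number of seconds to wait, read from a Retry-After header.

    Assumes the header carries a number of seconds. If it is missing or in
    HTTP-date form, the (arbitrary) default is used instead.
    """
    value = headers.get("Retry-After")
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        return default


def get_with_retry(request, max_attempts=5,
                   opener=urllib.request.urlopen, sleep=time.sleep):
    """Send `request`, sleeping and retrying only when the server answers 429."""
    for attempt in range(max_attempts):
        try:
            with opener(request) as resp:
                return resp.read()
        except urllib.error.HTTPError as exc:
            # Any other error, or a 429 on the final attempt, is re-raised.
            if exc.code != 429 or attempt == max_attempts - 1:
                raise
            sleep(retry_delay(exc.headers))
```

The opener and sleep parameters exist only so the loop can be exercised without network access or real delays.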

Look up a scan by hash

The primary read endpoint. Accepts a hex MD5 (32), SHA-1 (40), or SHA-256 (64).

GET /api/scan/<hash>

# anonymous (redacted payload):
curl -s https://analyzer.sh/api/scan/<sha256>

# authenticated (full payload):
curl -s -H 'Authorization: Bearer anlz_<token>' \
     https://analyzer.sh/api/scan/<sha256>

Returns 200 with the scan payload, 404 if the hash has never been scanned, or 400 if the value isn’t a valid hash. The payload mirrors the data shown on the /scan/<sha256> page:

{
  "sha256": "…",
  "file_type": "PDF",
  "verdict": "malicious",        // clean | suspicious | malicious
  "score": 180,                  // summed severity weights; can exceed 100
  "heuristics": [
    {
      "rule_id": "PDF_JAVASCRIPT",
      "name": "…",
      "severity": "high",
      "description": "…"
    }
  ],
  "scan_urls": [ "…" ],   // embedded URLs (authenticated only)
  "first_scanned_at": "2026-06-01T12:00:00Z",
  "size": 51234
}

For a human-readable, server-rendered report use GET /scan/<sha256> instead (MD5/SHA-1 and upper-case hashes 301-redirect to the canonical lower-case SHA-256 URL).
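Validating the hash client-side avoids spending a rate-limited request on a guaranteed 400. The following sketch classifies a value by the three accepted hex lengths and builds the lookup path. The helper names are hypothetical, and lower-casing the value before sending it is an assumption made for consistency with the canonical lower-case URLs.

```python
import re

# Hex digest lengths accepted by the lookup endpoint, per this page.
HASH_KINDS = {32: "md5", 40: "sha1", 64: "sha256"}
_HEX = re.compile(r"[0-9a-fA-F]+")


def hash_kind(value):
    """Return 'md5', 'sha1', or 'sha256' for a hex digest of matching length, else None."""
    value = value.strip()
    if not _HEX.fullmatch(value):
        return None
    return HASH_KINDS.get(len(value))


def scan_path(value):
    """Build the /api/scan/<hash> path, rejecting values the server would answer with 400."""
    if hash_kind(value) is None:
        raise ValueError("expected a hex MD5, SHA-1, or SHA-256 digest")
    # Lower-casing is an assumption, not a documented requirement of the API.
    return "/api/scan/" + value.strip().lower()
```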

Search the corpus

Run the same query language as the website search box (see the operator reference) and get JSON back.

GET /api/search?q=<query>&offset=0&limit=50

curl -s -H 'Authorization: Bearer anlz_<token>' \
     'https://analyzer.sh/api/search?q=type:PDF+sig:PDF_JAVASCRIPT&limit=50'

q: Required. The query string (URL-encoded), max 512 chars.
limit: Page size, default 50, max 50.
offset: Row offset for pagination.

Anonymous searches are capped at 100 matches total and the response carries truncated / anonymous_limit flags; sign in for full, paginated results. The unknown: operator returns 403 for anonymous callers. This is an expensive endpoint (rate-limited + proof-of-work for anonymous use).
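The parameters above can be assembled with the standard library so that the query is URL-encoded correctly. This is a sketch under two assumptions that this page does not confirm: the 512-character limit applies to the raw (unencoded) query, and a page shorter than limit marks the end of the results. The function names are hypothetical.

```python
from urllib.parse import urlencode

MAX_QUERY_CHARS = 512  # documented maximum length of q
MAX_LIMIT = 50         # documented maximum page size


def search_url(base_url, query, offset=0, limit=MAX_LIMIT):
    """Build a /api/search URL with q URL-encoded and limit clamped to the maximum."""
    if not query or len(query) > MAX_QUERY_CHARS:
        raise ValueError("q is required and limited to 512 characters")
    params = {"q": query, "offset": offset, "limit": min(limit, MAX_LIMIT)}
    return f"{base_url}/api/search?{urlencode(params)}"


def next_offset(offset, rows_returned, limit=MAX_LIMIT):
    """Return the offset of the next page, or None when the last page was short.

    Treating a short page as the end of the result set is an assumption.
    """
    if rows_returned < limit:
        return None
    return offset + limit
```

Note that urlencode escapes the colon in operators such as type: as %3A, which is equivalent to the literal colon used in the curl example.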

Submit a file for scanning

Upload happens through POST /analyze as multipart/form-data with a file field. Authenticate with a bearer token — that satisfies the endpoint’s anti-CSRF gate, so no Origin/Referer juggling is needed.

POST /analyze    (multipart/form-data; field: file, optional force_rescan=1)

curl -s -H 'Authorization: Bearer anlz_<token>' \
     -F file=@sample.doc \
     https://analyzer.sh/analyze

Already scanned → 200 with the full result payload immediately.
New file → 202 with {"job_id": …, "sha256": …, "state": "queued"}; scanning runs asynchronously.
Errors: 400 (missing / empty / unsupported / oversized file), 429 (too many concurrent scans, with Retry-After), 503 (queue full or backend unavailable).

Maximum upload size is 200 MB. Accepted extensions: .7z, .doc, .docm, .docx, .hta, .htm, .html, .hwp, .hwpx, .jpe, .jpeg, .jpg, .js, .jse, .mht, .mhtml, .pdf, .ppt, .pptm, .pptx, .ps1, .psd, .psm1, .rtf, .vbe, .vbs, .wri, .wsf, .wsh, .xls, .xlsm, .xlsx, .zip. Email (.eml/.msg) is not accepted.
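Checking a file locally before uploading avoids a round trip that would end in 400. The sketch below applies the extension list and size limit stated above, then encodes the multipart/form-data body by hand with the standard library. It assumes that 200 MB means 200 × 1024 × 1024 bytes, which this page does not specify. The function names are hypothetical.

```python
import os
import uuid

# Extensions listed on this page as accepted by /analyze.
ACCEPTED_EXTENSIONS = {
    ".7z", ".doc", ".docm", ".docx", ".hta", ".htm", ".html", ".hwp", ".hwpx",
    ".jpe", ".jpeg", ".jpg", ".js", ".jse", ".mht", ".mhtml", ".pdf", ".ppt",
    ".pptm", ".pptx", ".ps1", ".psd", ".psm1", ".rtf", ".vbe", ".vbs", ".wri",
    ".wsf", ".wsh", ".xls", ".xlsm", ".xlsx", ".zip",
}
MAX_UPLOAD_BYTES = 200 * 1024 * 1024  # assumption: "200 MB" counted in MiB


def check_upload(filename, size):
    """Return None if the file looks acceptable, else a short reason string."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ACCEPTED_EXTENSIONS:
        return f"unsupported extension: {ext or '(none)'}"
    if size == 0:
        return "empty file"
    if size > MAX_UPLOAD_BYTES:
        return "file exceeds 200 MB"
    return None


def encode_multipart(filename, data, force_rescan=False):
    """Encode the `file` field (and optional force_rescan=1) as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    body = b""
    if force_rescan:
        head = (f"--{boundary}\r\n"
                'Content-Disposition: form-data; name="force_rescan"\r\n\r\n1\r\n')
        body += head.encode("ascii")
    name = os.path.basename(filename)
    head = (f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{name}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n")
    body += head.encode("utf-8") + data + f"\r\n--{boundary}--\r\n".encode("ascii")
    return body, f"multipart/form-data; boundary={boundary}"
```

To send the upload, pass the body as the data of a POST request to /analyze, with the returned value as the Content-Type header alongside the Authorization header.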

Poll an async scan job

When /analyze returns 202, poll the job until it finishes:

GET /api/scan/job/<job_id>

# while queued:
{ "state": "queued", "position": 3 }

# on failure:
{ "state": "error", "error": "…" }

# when finished: the full result payload, with state set to "done":
{ "state": "done", "sha256": "…", "verdict": "…", … }

job_id is a 32-char hex string. This endpoint has a generous poll limit (150/60 s). Once you have the SHA-256 you can also fetch the result later via GET /api/scan/<sha256>.
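A polling loop only needs to distinguish the three states shown above. In the sketch below, fetch_job is a hypothetical callable that performs GET /api/scan/job/<job_id> and returns the decoded JSON. Any state other than "done" or "error" is treated as still pending, which is an assumption. The default one-second interval is an arbitrary choice that stays well inside the 150 requests per 60 s limit.

```python
import time


def wait_for_job(fetch_job, job_id, interval=1.0, timeout=300.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until the job reports "done" or "error", or until `timeout` seconds pass.

    `fetch_job(job_id)` must return the job's JSON as a dict. The interval and
    timeout defaults are illustrative, not values prescribed by the API.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_job(job_id)
        state = job.get("state")
        if state == "done":
            return job  # the full result payload
        if state == "error":
            raise RuntimeError(job.get("error", "scan failed"))
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {state!r} after {timeout} s")
        sleep(interval)
```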

Status codes

200: Success; body is the JSON payload.
202: Scan accepted and queued; poll the returned job_id.
400: Malformed request (bad hash, missing/oversized/unsupported file, invalid parameter).
401 / 403: Authentication required, or operation not permitted for your access level (e.g. unknown: while anonymous, or a cross-origin write).
404: No scan found for that hash.
429: Rate limited; honour Retry-After.
503: Scan queue full or a backend is temporarily unavailable.