Queue a handful of URLs or a hundred thousand. Walk away. Your endpoint receives a single signed POST when the work completes — no polling, no held connections, no retry plumbing on your side.
Webhooks deliver the results of async jobs from the scraping API. Full reference lives in the API docs.
We sign every delivery with an HMAC-SHA256 over the timestamp and the raw body, keyed by your account's webhook secret. Your endpoint verifies the signature before trusting the payload. Without the secret, forging a valid signature is infeasible, and the signed timestamp lets you detect and reject replays.
Each delivery carries the job_id. Treat it as an idempotency key: record it before processing and ignore a repeat. Because a retried delivery is byte-identical to the first, your handler can dedupe safely without losing data.
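The dedupe step described above could look roughly like the following. This is a minimal sketch, not the service's own code: the SQLite table `seen_jobs` and the helper names `open_store`, `claim_job`, and `handle_delivery` are hypothetical, and it assumes the payload has already passed signature verification.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open a SQLite database holding the job_ids already received."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS seen_jobs (job_id TEXT PRIMARY KEY)")
    return conn

def claim_job(conn, job_id):
    """Record job_id. Return True on first receipt, False for a repeat."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO seen_jobs (job_id) VALUES (?)", (job_id,)
        )
    # rowcount is 1 when the row was inserted, 0 when the key already existed.
    return cur.rowcount == 1

def handle_delivery(conn, payload, process):
    """Run process() on a verified payload at most once per job_id."""
    if not claim_job(conn, payload["job_id"]):
        return "duplicate"
    process(payload["result"])
    return "processed"
```

The primary-key constraint makes the check and the record a single atomic statement, so two concurrent deliveries of the same job cannot both pass. Because the job_id is recorded before processing, a crash mid-processing leaves the job marked as seen; in that case the result can still be read from GET /v1/jobs/{job_id}.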
Send a single signed test payload to your endpoint with the test-webhook call. It uses the exact production signing scheme and retry policy, so you confirm your receiver works before any real job depends on it.
Pass a webhook_url to any async endpoint and we POST the finished result back to you.
# Queue an async batch and walk away. No held connection.
curl -X POST https://api.ollagraph.com/v1/scrape/batch/async \
  -H "Authorization: Bearer $OLLAGRAPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://a.example.com", "https://b.example.com"],
    "format": "markdown",
    "webhook_url": "https://yourapp.com/hooks/ollagraph"
  }'
# Response: { "status": "queued", "job_id": "..." }
# We POST the finished result to your webhook_url when the work completes.

// What lands on your endpoint when the job finishes.
// Method: POST
// Content-Type: application/json
// Header: X-Ollagraph-Signature: t=<unix_ts>,v1=<hex_hmac_sha256>
{
  "job_id": "job_3f9a...",
  "status": "completed",
  "result": {
    // The same block you would get from the sync endpoint:
    // markdown / html / text / links, plus per-URL status.
  }
}
// The result block mirrors the corresponding sync response, so your
// handler parses it exactly like a direct /v1/scrape/batch call.

Recompute the HMAC over the raw bytes and compare in constant time. Never trust an unverified body.
import crypto from 'crypto';
import express from 'express';

const app = express();
app.use(express.raw({ type: 'application/json' })); // raw bytes for HMAC

app.post('/hooks/ollagraph', (req, res) => {
  const header = req.headers['x-ollagraph-signature'] || '';
  // Header format: "t=<unix_ts>,v1=<hex_hmac_sha256>"
  const parts = Object.fromEntries(header.split(',').map(p => p.trim().split('=')));
  const ts = parts.t, sig = parts.v1;

  // Reject anything older than five minutes (defeats replay).
  // The isFinite check also rejects a non-numeric timestamp, which would
  // otherwise slip past the comparison as NaN.
  const age = Math.abs(Date.now() / 1000 - Number(ts));
  if (!ts || !Number.isFinite(age) || age > 300) {
    return res.status(401).send('stale or missing timestamp');
  }

  // Signed body is: <ts>.<exact_json_bytes>
  const expected = crypto
    .createHmac('sha256', process.env.OLLAGRAPH_WEBHOOK_SECRET)
    .update(`${ts}.`)
    .update(req.body) // raw bytes, not re-serialized JSON
    .digest('hex');

  // timingSafeEqual throws when the lengths differ, so compare lengths first.
  const given = Buffer.from(sig || '');
  const wanted = Buffer.from(expected);
  if (given.length !== wanted.length || !crypto.timingSafeEqual(given, wanted)) {
    return res.status(401).send('invalid signature');
  }

  // Acknowledge fast, then process out of band (see the idempotency note).
  const { job_id, result } = JSON.parse(req.body);
  res.sendStatus(200);
});

import hmac, hashlib, os, time
from flask import Flask, request, abort

app = Flask(__name__)
SECRET = os.environ["OLLAGRAPH_WEBHOOK_SECRET"].encode()

@app.post("/hooks/ollagraph")
def callback():
    header = request.headers.get("x-ollagraph-signature", "")
    # Header format: "t=<unix_ts>,v1=<hex_hmac_sha256>"
    parts = dict(p.strip().split("=", 1) for p in header.split(",") if "=" in p)
    ts, sig = parts.get("t"), parts.get("v1")

    # Reject a missing, non-numeric, or stale (older than five minutes) timestamp.
    if not ts or not ts.isdecimal() or abs(time.time() - int(ts)) > 300:
        abort(401)

    body = request.get_data()  # raw bytes
    # Signed body is: <ts>.<exact_json_bytes>
    expected = hmac.new(SECRET, f"{ts}.".encode() + body, hashlib.sha256).hexdigest()
    # Compare as bytes: compare_digest raises TypeError on non-ASCII str input.
    if not sig or not hmac.compare_digest(sig.encode(), expected.encode()):
        abort(401)

    payload = request.get_json()
    # process payload["result"] ... then return fast
    return "", 200

Full-site crawls deliver one webhook when done. Or skip webhooks entirely and read the job directly.
# Crawl an entire site, get a single signed POST when the crawl finishes.
curl -X POST https://api.ollagraph.com/v1/crawl \
  -H "Authorization: Bearer $OLLAGRAPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com",
    "max_pages": 500,
    "webhook_url": "https://yourapp.com/hooks/ollagraph"
  }'
# Prefer to poll instead? Omit webhook_url and read GET /v1/jobs/{job_id}.

See the broader async surface on the scraping API and the crawl API.
Three async endpoints accept a webhook_url: /v1/scrape/async for a single page, /v1/scrape/batch/async for a list of URLs, and /v1/crawl for a full site. Each returns a job_id immediately and POSTs the finished result to your webhook_url when the work completes.
Your endpoint receives a POST with a JSON body containing the job_id, the job status, and a result block. The result mirrors what you would get from the corresponding synchronous endpoint, so your handler parses the markdown, html, text, or links exactly as it would for a direct call. The request also carries the signature header used for verification.
Each request carries an X-Ollagraph-Signature header in the form t=<unix_ts>,v1=<hex>. The v1 value is an HMAC-SHA256 of the bytes <ts>.<raw_body> keyed by your account's webhook secret. The verification samples above show the full algorithm. Compare in constant time and reject timestamps older than five minutes to defeat replay attacks.
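To unit-test a receiver without waiting for a live delivery, it can help to produce a header locally in the format just described. The following is a minimal sketch using only the Python standard library; `sign` and `verify` are hypothetical helper names, and the secret and body in the usage note are made-up test values.

```python
import hashlib
import hmac
import time

def sign(secret: bytes, body: bytes, ts: int) -> str:
    """Build an X-Ollagraph-Signature value: HMAC-SHA256 over b"<ts>." + body."""
    mac = hmac.new(secret, f"{ts}.".encode() + body, hashlib.sha256).hexdigest()
    return f"t={ts},v1={mac}"

def verify(secret: bytes, body: bytes, header: str, now=None, max_age: int = 300) -> bool:
    """Return True only for a fresh, correctly signed delivery."""
    parts = dict(p.strip().split("=", 1) for p in header.split(",") if "=" in p)
    ts, sig = parts.get("t", ""), parts.get("v1", "")
    if not ts.isdecimal():
        return False  # missing or non-numeric timestamp
    now = time.time() if now is None else now
    if abs(now - int(ts)) > max_age:
        return False  # outside the five-minute window
    expected = hmac.new(secret, f"{ts}.".encode() + body, hashlib.sha256).hexdigest()
    # Constant-time comparison on bytes.
    return hmac.compare_digest(sig.encode(), expected.encode())
```

For example, `verify(b"test-secret", body, sign(b"test-secret", body, int(time.time())))` should return True, while changing a single byte of `body` or using a timestamp more than 300 seconds old should return False.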
To find your webhook secret, call GET /v1/me with your API key and read the webhook_secret field. Rotate it at any time via POST /v1/me/webhook-secret/rotate. The previous secret stops working immediately, so drain in-flight jobs before rotating or their deliveries will fail verification.
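The two calls above might be wrapped as follows. This is a sketch under assumptions: the base URL and bearer-token header are taken from the curl examples on this page, `webhook_secret` is assumed to be a top-level field of the GET /v1/me response, and the shape of the rotation response is not documented here, so it is returned undecoded beyond JSON parsing. The function names and the `opener` parameter are hypothetical.

```python
import json
import urllib.request

API = "https://api.ollagraph.com"  # base URL as used in the curl examples

def build_request(method, path, api_key):
    """Create an authenticated request without sending it."""
    return urllib.request.Request(
        API + path,
        method=method,
        headers={"Authorization": f"Bearer {api_key}"},
    )

def fetch_webhook_secret(api_key, opener=urllib.request.urlopen):
    """Read the webhook_secret field from GET /v1/me."""
    with opener(build_request("GET", "/v1/me", api_key)) as resp:
        return json.load(resp)["webhook_secret"]

def rotate_webhook_secret(api_key, opener=urllib.request.urlopen):
    """Call POST /v1/me/webhook-secret/rotate and return the decoded JSON reply."""
    with opener(build_request("POST", "/v1/me/webhook-secret/rotate", api_key)) as resp:
        return json.load(resp)
```

The `opener` argument exists only so the functions can be exercised in tests without network access. Since the old secret stops working immediately, call `rotate_webhook_secret` only once in-flight jobs have drained.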
To test your receiver, call POST /v1/me/webhooks/test with your webhook_url. We send a single signed payload using the exact production signing scheme, retry behavior, and timeout, so the result reflects what a real delivery looks like. Use it to confirm your receiver verifies the signature correctly before you queue real jobs.
If your endpoint returns a non-2xx response or the connection errors, we retry with backoff. Because a retried delivery is byte-identical to the original, the job_id is your idempotency key: record it on first receipt and skip duplicates. Acknowledge quickly with a 200 and do heavy processing out of band so a slow handler does not trigger an unnecessary retry.
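One way to acknowledge quickly and defer the heavy work is to hand the verified payload to a background worker. The sketch below uses an in-process queue and thread from the standard library; the names `jobs`, `worker`, `start_worker`, and `receive` are hypothetical, and a production receiver would more likely use a durable queue so that pending work survives a restart.

```python
import queue
import threading

jobs = queue.Queue()

def worker(process):
    """Drain the queue forever, handing each payload to process()."""
    while True:
        payload = jobs.get()
        try:
            process(payload)
        except Exception as exc:  # keep the worker alive after a bad payload
            print(f"job {payload.get('job_id')} failed: {exc}")
        finally:
            jobs.task_done()

def start_worker(process):
    """Start one daemon thread that runs worker(process)."""
    thread = threading.Thread(target=worker, args=(process,), daemon=True)
    thread.start()
    return thread

def receive(payload):
    """Call after signature verification: enqueue, then answer 200 at once."""
    jobs.put(payload)
    return 200
```

The handler's response time then depends only on the enqueue, not on how long `process` takes, so a slow job cannot cause a retry by itself.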
Webhook URLs must be HTTPS. Plain HTTP endpoints are rejected at job-creation time. This protects the payload and the signature in transit.
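A client-side pre-check can catch a plain-HTTP URL before the job is submitted. This is a small sketch with a hypothetical helper name; it only tests for the https scheme and a hostname, and the server's own validation rules may be stricter.

```python
from urllib.parse import urlsplit

def is_acceptable_webhook_url(url: str) -> bool:
    """Return True when url uses the https scheme and names a host."""
    parts = urlsplit(url)
    return parts.scheme == "https" and bool(parts.hostname)
```

For instance, `https://yourapp.com/hooks/ollagraph` passes, while the same address with `http://` or with no scheme at all does not.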
If you would rather not run a webhook receiver, omit webhook_url and poll instead. Every async job is readable at GET /v1/jobs/{job_id}, which returns the current status and, once finished, the same result block a webhook would deliver. Webhooks are recommended for production because they remove the constant polling overhead, but polling is always available as a fallback.
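A polling fallback could be sketched as below. The endpoint path and bearer header come from this page; everything else is an assumption. In particular, only the "queued" and "completed" status values appear in the examples here, so the set of terminal statuses is a parameter you would extend with whatever failure states the API reference defines. The helper names and the backoff numbers are illustrative.

```python
import json
import time
import urllib.request

API = "https://api.ollagraph.com"  # base URL as used in the curl examples

def get_job(job_id, api_key):
    """Read one job from GET /v1/jobs/{job_id}."""
    req = urllib.request.Request(
        f"{API}/v1/jobs/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def wait_for_job(job_id, fetch, terminal=("completed",),
                 first_delay=1.0, max_delay=30.0, timeout=600.0,
                 sleep=time.sleep):
    """Poll fetch(job_id) until its status is terminal, doubling the delay each time."""
    delay, waited = first_delay, 0.0
    while True:
        job = fetch(job_id)
        if job.get("status") in terminal:
            return job
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} not finished after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # exponential backoff with a cap
```

A typical call would be `wait_for_job(job_id, lambda j: get_job(j, api_key))`. The `fetch` and `sleep` parameters are injectable so the loop can be tested without network access or real waiting.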
1,000 free credits, one bearer token, signed webhooks on every plan including the free tier.