All bundles Scraping bundle · 13 endpoints

Scraping

13 endpoints for single, batch, async, smart, and llm-ready scraping, plus extraction, crawl, and captcha solve.

For AI agents, data engineers, and RAG pipelines fetching the live web.

Endpoints in this bundle

Each endpoint is independently callable. Bundle membership is for discovery only — you do not need to opt in.

Method	Path	Credits	Summary
POST	/v1/scrape	1	Single-page fetch.
POST	/v1/scrape/batch	1	Batch of URLs, synchronous return.
POST	/v1/scrape/async	1	Single fetch, returns job_id.
POST	/v1/scrape/batch/async	1	Batched fetch, returns job_id.
POST	/v1/scrape/smart	1	Auto-escalate through the render cascade.
POST	/v1/scrape/llm-ready	1	Fetch and token-chunk for RAG ingest.
POST	/v1/extract/clean	1	Boilerplate strip and reading-text extraction.
POST	/v1/extract/structured	1	Schema-driven extract via JSON schema input.
POST	/v1/extract/tables	1	Table rows from HTML.
POST	/v1/extract/contacts	1	Emails, phones, and socials extraction.
POST	/v1/crawl	1	Recursive crawl with depth and budget caps.
POST	/v1/captcha/solve	1	reCAPTCHA v2 / v3 solve.
POST	/v1/captcha/auto	1	Automatic captcha solve flow for sites behind a challenge.

Recipe

RAG ingest of a site section

Call /v1/crawl with the seed URL, max_depth=2, and a same-host filter to discover candidate pages.
For each discovered URL, call /v1/scrape/llm-ready to fetch and token-chunk in one step.
Embed the chunks and write to your vector store. We do not persist any scraped content — your store is the only record.
On pages that come back bot-blocked, retry through /v1/scrape/smart, which auto-escalates to the render backend and then to residential proxy if needed.

Sample code

Try a request

Pick a language. Click to expand the snippet.

curl

curl -X POST https://api.ollagraph.com/v1/scrape/smart \
  -H "Authorization: Bearer $OLLAGRAPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/article","render_js":true}'

python

import httpx, os

r = httpx.post(
    "https://api.ollagraph.com/v1/scrape/smart",
    headers={"Authorization": f"Bearer {os.environ['OLLAGRAPH_API_KEY']}"},
    json={"url": "https://example.com/article", "render_js": True},
    timeout=60.0,
)
print(r.json())

node

const res = await fetch("https://api.ollagraph.com/v1/scrape/smart", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.OLLAGRAPH_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ url: "https://example.com/article", render_js: true }),
});
console.log(await res.json());

FAQ

Scraping bundle FAQ

What's the difference between /scrape and /scrape/smart?

/scrape is a direct fetch. /scrape/smart auto-escalates: direct first, then internal render backend, then a residential proxy on 401/403/429. Same response shape from all tiers.

How do I get residential proxy?

Pass `use_residential_proxy: true` on /scrape or /scrape/smart. It adds a +3 credit surcharge on top of the base 1 credit per call.

Do you store the HTML I scrape?

No. We never persist scraped content — only the URL, status, and credit-event metadata for your usage log.

Ship with the Scraping bundle.

1,000 credits on signup. No card. Every endpoint in this bundle is live from minute one.

Try this bundle View on docs