Question 1

What does /v1/crawl actually do?

Accepted Answer

It walks the link graph of a site starting from one seed URL, returns clean content for every page it discovers, and respects depth, page-budget, concurrency, and robots controls along the way. It is asynchronous: the call returns a job_id immediately and the result arrives by webhook or by polling the jobs endpoint.
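As a rough illustration of the asynchronous call, here is a minimal Python sketch that builds a POST to /v1/crawl and reads the job_id from the response. Only the /v1/crawl path, respect_robots, webhook_url, and job_id come from the answers on this page. The base URL, the Bearer authentication scheme, and the field names url, max_pages, and max_depth are assumptions made for the example, so check them against the API reference before relying on them.

```python
import json
import urllib.request

API_BASE = "https://api.example.com"  # hypothetical base URL, not taken from this page


def build_crawl_request(seed_url, api_key, max_pages=500, max_depth=3, webhook_url=None):
    """Build (but do not send) a POST /v1/crawl request."""
    payload = {
        "url": seed_url,          # assumed field name for the seed URL
        "max_pages": max_pages,   # assumed field name; 500 is the documented default
        "max_depth": max_depth,   # assumed field name; 3 is the documented default
        "respect_robots": True,   # parameter named in the robots.txt answer
    }
    if webhook_url is not None:
        payload["webhook_url"] = webhook_url
    return urllib.request.Request(
        f"{API_BASE}/v1/crawl",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


def start_crawl(request, opener=urllib.request.urlopen):
    """Send the request and return the job_id from the JSON response."""
    with opener(request) as response:
        return json.load(response)["job_id"]
```

The opener argument exists only so the network call can be swapped out in tests; in normal use, start_crawl(build_crawl_request(seed, key)) is enough.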

Question 2

How big can a crawl be?

Accepted Answer

The defaults are 500 pages and depth 3. You can raise both; production customers regularly crawl tens of thousands of pages per job. For very large or scheduled crawls, contact us about enterprise capacity through the enterprise page.

Question 3

Does the crawler respect robots.txt?

Accepted Answer

Yes, by default. Each crawl honors the robots.txt of the target site. You can override that with respect_robots set to false, but only do so where you have explicit permission to crawl, such as your own site or a partner's. Check what a site declares first with the robots endpoint.
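To see what "honoring robots.txt" means for a given set of URLs, you can evaluate a robots.txt body locally. The sketch below uses Python's standard urllib.robotparser as a stand-in. It does not call the robots endpoint mentioned above, and the user agent string is a placeholder, since this page does not say which user agent the crawler sends.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt, user_agent, urls):
    """Return the subset of urls that robots_txt permits for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text already fetched elsewhere
    return [u for u in urls if parser.can_fetch(user_agent, u)]


robots = "User-agent: *\nDisallow: /private/\n"
candidates = [
    "https://example.com/docs/intro",
    "https://example.com/private/report",
]
# "my-crawler" is a placeholder user agent for illustration only.
print(allowed_urls(robots, "my-crawler", candidates))
```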

Question 4

How do I seed a crawl from a sitemap?

Accepted Answer

Call the sitemap endpoint to pull the site's declared URLs, then start the crawl from the seed URL. Reading the sitemap and robots policy up front lets you scope the run to the paths you actually want, rather than leaving the crawler to discover them by following links.
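The scoping step can be sketched as follows, assuming you already have the sitemap XML as text. The parsing follows the standard sitemaps.org format; how the sitemap endpoint itself returns URLs is not described on this page, so this example works on raw XML instead. The function names are made up for the illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Extract every <loc> value from a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def scope_to_prefix(urls, path_prefix):
    """Keep only URLs whose path starts with path_prefix, e.g. '/docs/'."""
    return [u for u in urls if urlparse(u).path.startswith(path_prefix)]
```

Filtering the declared URLs this way shows how many pages fall under a path before you spend a page budget on it.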

Question 5

How does webhook delivery work?

Accepted Answer

Provide a webhook_url with the request. When the job completes, we send a single POST to your URL with the full result body and the job id. The job runs detached so your application is never holding a long-lived connection open while the crawl runs.
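On the receiving side, a webhook target only needs to accept one POST and read the JSON body. Here is a minimal sketch using Python's standard http.server. The payload is assumed to be a JSON object carrying the job id under a job_id key; the exact body shape, any signature headers, and the retry behavior are not specified on this page.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_webhook(body):
    """Decode a webhook POST body and return (job_id, payload)."""
    payload = json.loads(body)
    return payload["job_id"], payload  # "job_id" key name is an assumption


class CrawlWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            job_id, payload = parse_webhook(self.rfile.read(length))
        except (ValueError, KeyError, TypeError):
            # Malformed JSON or a body without a job_id.
            self.send_response(400)
            self.end_headers()
            return
        print(f"crawl job {job_id} finished")  # hand the payload to your own code here
        self.send_response(204)
        self.end_headers()


def serve(port=8000):
    """Run the receiver until interrupted."""
    HTTPServer(("0.0.0.0", port), CrawlWebhookHandler).serve_forever()
```

Calling serve() starts the listener; the webhook_url you pass with the crawl request would then point at this host and port.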

Question 6

What if I do not want to set up a webhook?

Accepted Answer

Skip the webhook_url and poll instead. The crawl call returns a job_id; pass it to GET /v1/jobs/{job_id}, which returns the job status and, once that status is completed, the full result. This is the simplest pattern for a one-off run from a notebook or a terminal.
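A polling loop for this pattern might look like the sketch below. The fetch_job argument is a function you supply that performs GET /v1/jobs/{job_id} and returns the decoded JSON. The status field name and the completed value follow the answer above, while the failed status, the interval, and the timeout are assumptions for the example.

```python
import time


def wait_for_job(job_id, fetch_job, interval=2.0, timeout=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_job(job_id) until the job reports status 'completed'."""
    deadline = clock() + timeout
    while True:
        job = fetch_job(job_id)
        status = job["status"]
        if status == "completed":
            return job
        if status == "failed":  # assumed terminal status, not documented here
            raise RuntimeError(f"crawl job {job_id} failed")
        if clock() >= deadline:
            raise TimeoutError(f"crawl job {job_id} not completed after {timeout}s")
        sleep(interval)  # any other status is treated as still in progress
```

The sleep and clock parameters are there so the loop can be exercised without real waiting; leave them at their defaults in a notebook or terminal.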

Question 7

How is a crawl billed?

Accepted Answer

Crawling is metered per call like the rest of the API surface: one credit per call. Failed calls are auto-refunded, so a crawl that errors out never costs you a credit. New accounts start with 1,000 free credits, so you can evaluate the service before paying.

Crawl whole sites. In one job.

Depth and budget caps

Robots-aware by default

Sitemap unrolling

Webhook delivery

Examples that work today.

Crawler questions