AUTOMATION

Drive a browser like an agent.

Open a persistent session and steer it in plain English — navigate, act, observe, extract — or send a deterministic JSON action macro when you already know the page. Multi-step journeys across login walls and dynamic apps, without a single line of Playwright.


For single-page extraction see the scraping API; for raw headless-session control see the browser API.

Natural-language actions

Tell the session what to do in plain English — goto, act, observe, extract. The model resolves each instruction to a concrete element and action (click, type, select, scroll), so a copy change on the target site does not break your script. No selectors to babysit.

Persistent agent sessions

Open a session once and drive it across many calls. Cookies, storage, and the logged-in context stay alive between steps, so multi-page journeys — log in, navigate, act, extract — run as one coherent flow instead of disconnected requests.

JSON action macros too

When you already know the page, skip the model and send a deterministic JSON array of click / type / wait / scroll steps to the scraping endpoint. Same hosted browser, fully reproducible, zero per-step model latency. Mix and match per task.

No Playwright fleet to run

You write instructions, not browser code. There is no headless Chrome to deploy, no Puppeteer or Playwright scripts to maintain, and no stealth tuning: the hosted browser handles fingerprinting and rendering. Use the managed model (no key required), or bring your own.

Drive it four ways.

Open a session, act in plain English, observe, extract — or fall back to deterministic macros and raw session control.

Open a session: POST /v1/stagehand
# Open a persistent, LLM-driven browser session.
# Free to open — you only pay per action (goto/act/observe/extract).
curl -X POST https://api.ollagraph.com/v1/stagehand \
  -H "Authorization: Bearer $OLLAGRAPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{}'
# -> { "stagehand_session_id": "sh_...", "llm_provider": "...",
#      "model_name": "...", "expires_at": "..." }
# Bring your own model instead (held in memory for the session only):
#   {"llm_provider": "openai", "model_name": "...", "llm_api_key": "sk-..."}
Navigate & act: POST /v1/stagehand/{session_id}/goto, then /act
# Drive the session in plain English — no selectors to maintain.
SID=sh_...
# 1) Navigate
curl -X POST https://api.ollagraph.com/v1/stagehand/$SID/goto \
  -H "Authorization: Bearer $OLLAGRAPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/search"}'

# 2) Act — the model resolves the instruction to a real element + action
curl -X POST https://api.ollagraph.com/v1/stagehand/$SID/act \
  -H "Authorization: Bearer $OLLAGRAPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"instruction": "type the phrase wireless headphones into the search box and press Enter"}'
# -> { "ok": true, "success": true, "message": "<selector used>", "url_after": "..." }
Observe & extract: POST /v1/stagehand/{session_id}/observe, then /extract
# Observe what is actionable, then extract structured data.
SID=sh_...
# List candidate actions on the current page (optionally focus with an instruction)
curl -X POST https://api.ollagraph.com/v1/stagehand/$SID/observe \
  -H "Authorization: Bearer $OLLAGRAPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"instruction": "find the add-to-cart buttons"}'
# -> { "ok": true, "candidates": [{"description": "...", "method": "click", "selector": "..."}] }

# Pull structured data, coerced to your JSON Schema
curl -X POST https://api.ollagraph.com/v1/stagehand/$SID/extract \
  -H "Authorization: Bearer $OLLAGRAPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "instruction": "extract the product name and price",
    "schema": {"type": "object", "properties": {
      "name": {"type": "string"}, "price": {"type": "number"}}}
  }'
Deterministic JSON macro: POST /v1/scrape
# Prefer deterministic JSON macros? Drive a multi-step flow in one shot.
curl -X POST https://api.ollagraph.com/v1/scrape \
  -H "Authorization: Bearer $OLLAGRAPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/login",
    "stealth": true,
    "format": "markdown",
    "actions": [
      {"type": "type", "selector": "#email", "text": "you@example.com"},
      {"type": "type", "selector": "#password", "text": "..."},
      {"type": "click", "selector": "button[type=submit]"},
      {"type": "wait", "ms": 2500},
      {"type": "click", "selector": ".load-more"},
      {"type": "wait", "ms": 1500}
    ]
  }'
Raw persistent session: POST /v1/session/{session_id}/render
# Raw persistent session: render + run a script across calls, keeping cookies.
SID=$(curl -s -X POST https://api.ollagraph.com/v1/session \
  -H "Authorization: Bearer $OLLAGRAPH_API_KEY" | jq -r .session_id)

curl -X POST https://api.ollagraph.com/v1/session/$SID/render \
  -H "Authorization: Bearer $OLLAGRAPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/dashboard"}'

# Reuse the same authenticated context as many times as you need, then close it.
curl -X DELETE https://api.ollagraph.com/v1/session/$SID \
  -H "Authorization: Bearer $OLLAGRAPH_API_KEY"

Flows that ship today.

Patterns customers run in production — each is a handful of readable calls.

Search, then extract results

goto the search page, act to type a query and submit, observe the result cards, then extract a structured list — name, price, URL — coerced to your JSON Schema. One session, four readable calls.
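As a minimal sketch, the four calls can be wrapped in one shell function. `sh_post`, `json_escape`, and `search_and_extract` are hypothetical helper names, the result schema is illustrative, and the function assumes a session you have already opened and a single-line query. It does not inspect the `ok` field of each response; intermediate responses go to stderr so you can check them if a step misbehaves.

```shell
# Sketch: the search-then-extract flow as one reusable shell function.
# Assumes an already-open session ID and a single-line query.
API=https://api.ollagraph.com/v1/stagehand

# POST a JSON body to one verb (goto/act/observe/extract) of a session.
sh_post() {  # sh_post SID VERB JSON_BODY
  curl -s -X POST "$API/$1/$2" \
    -H "Authorization: Bearer $OLLAGRAPH_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$3"
}

# Escape backslashes and double quotes so the text can sit inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Intermediate responses go to stderr for inspection; stdout carries only
# the final extract result.
search_and_extract() {  # search_and_extract SID SEARCH_URL QUERY
  sh_post "$1" goto "{\"url\": \"$(json_escape "$2")\"}" >&2
  sh_post "$1" act "{\"instruction\": \"type the phrase $(json_escape "$3") into the search box and press Enter\"}" >&2
  sh_post "$1" observe '{"instruction": "find the result cards"}' >&2
  sh_post "$1" extract '{
    "instruction": "extract every result with its name, price, and URL",
    "schema": {"type": "object", "properties": {"results": {"type": "array",
      "items": {"type": "object", "properties": {
        "name": {"type": "string"}, "price": {"type": "number"},
        "url": {"type": "string"}}}}}}
  }'
}
# Usage: search_and_extract "$SID" https://example.com/search "wireless headphones"
```

Keeping stdout limited to the extract result means the function can be piped straight into jq or a file.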

Multi-step authenticated journey

Open a session, act through the login form, navigate to the protected page, and extract — all on one persistent context. The cookies set during login carry forward to every later step.
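A sketch of the same journey, assuming an already-open session. `authenticated_extract`, `APP_EMAIL`, and `APP_PASSWORD` are hypothetical names, and the close path in the usage comment mirrors the raw-session `DELETE /v1/session/{id}`; it is an assumption, not an endpoint shown on this page. With act, the credentials travel inside the instruction text; if you would rather keep them out of the prompt, the deterministic JSON macro on the scraping endpoint takes them as plain `type` steps.

```shell
# Sketch: log in once, then read a protected page on the same session.
# HYPOTHETICAL names: authenticated_extract, APP_EMAIL, APP_PASSWORD.
# Assumes the URLs and credential values contain no double quotes or backslashes.
API=https://api.ollagraph.com/v1/stagehand

# POST a JSON body to one verb (goto/act/observe/extract) of a session.
sh_post() {  # sh_post SID VERB JSON_BODY
  curl -s -X POST "$API/$1/$2" \
    -H "Authorization: Bearer $OLLAGRAPH_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$3"
}

authenticated_extract() {  # authenticated_extract SID LOGIN_URL PROTECTED_URL
  sh_post "$1" goto "{\"url\": \"$2\"}" >&2
  # The cookies this login sets stay on the session for every later call.
  sh_post "$1" act "{\"instruction\": \"log in with email $APP_EMAIL and password $APP_PASSWORD\"}" >&2
  sh_post "$1" goto "{\"url\": \"$3\"}" >&2
  sh_post "$1" extract '{"instruction": "extract the account name and current plan"}'
}

# Usage (the DELETE path is an assumption mirroring /v1/session/{id}):
#   SID=$(curl -s -X POST "$API" -H "Authorization: Bearer $OLLAGRAPH_API_KEY" \
#           -H "Content-Type: application/json" -d '{}' | jq -r .stagehand_session_id)
#   authenticated_extract "$SID" https://example.com/login https://example.com/account
#   curl -s -X DELETE "$API/$SID" -H "Authorization: Bearer $OLLAGRAPH_API_KEY"
```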

Deterministic macro for known pages

When the DOM is stable, send a JSON action macro to the scraping endpoint: type credentials, click submit, wait, click load-more, return markdown. No model in the loop, fully reproducible.

Walk an infinite-scroll feed

Loop act instructions like “scroll down and load more results” until the feed stops growing, then extract everything rendered so far. The session keeps the full page state between scrolls.
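One way to express that loop, as a sketch: scroll, re-count, and stop when the count stops rising or a round cap is hit. It assumes jq is installed and that extract returns the schema-shaped result under a `data` field; this page does not show the extract response shape, so adjust the jq path to what you actually receive. The helper names are hypothetical. Every round costs two metered actions (one act, one extract), which is why the cap matters.

```shell
# Sketch: scroll an infinite feed until it stops growing, then report its size.
# ASSUMPTIONS (not documented on this page): the extract response returns the
# schema-shaped result under "data", and jq is installed.
API=https://api.ollagraph.com/v1/stagehand

# POST a JSON body to one verb (goto/act/observe/extract) of a session.
sh_post() {  # sh_post SID VERB JSON_BODY
  curl -s -X POST "$API/$1/$2" \
    -H "Authorization: Bearer $OLLAGRAPH_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$3"
}

# Print how many result cards are rendered right now (one metered extract).
feed_size() {  # feed_size SID
  sh_post "$1" extract '{"instruction": "count the result cards currently rendered",
    "schema": {"type": "object", "properties": {"count": {"type": "integer"}}}}' |
    jq -r '.data.count // 0'
}

# Scroll at most MAX_ROUNDS times; stop early once a scroll adds no new cards.
# Prints the last observed feed size.
scroll_until_stable() {  # scroll_until_stable SID [MAX_ROUNDS]
  local sid=$1 max=${2:-20} round=0 prev cur
  prev=$(feed_size "$sid")
  while [ "$round" -lt "$max" ]; do
    sh_post "$sid" act '{"instruction": "scroll down and load more results"}' >&2
    cur=$(feed_size "$sid")
    [ "$cur" -le "$prev" ] && break   # no growth: the feed is exhausted
    prev=$cur
    round=$((round + 1))
  done
  echo "$prev"
}
# Usage: scroll_until_stable "$SID" 20, then call extract for the full list.
```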

Discover before you act

Not sure what is on the page? Call observe to list every actionable element with a description, method, and selector — then decide which act instructions to issue next. Great for building resilient agent loops.
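A minimal sketch of observe-then-act: issue the act only when observe reports at least one candidate. `act_if_present` and `has_candidates` are hypothetical helpers; the `candidates` array matches the observe response shown earlier, and jq is assumed to be installed.

```shell
# Sketch: observe first, and act only if something matching was found.
# HYPOTHETICAL helpers: act_if_present, has_candidates. Requires jq.
API=https://api.ollagraph.com/v1/stagehand

# POST a JSON body to one verb (goto/act/observe/extract) of a session.
sh_post() {  # sh_post SID VERB JSON_BODY
  curl -s -X POST "$API/$1/$2" \
    -H "Authorization: Bearer $OLLAGRAPH_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$3"
}

# Succeeds when an observe response (on stdin) lists at least one candidate.
has_candidates() {
  jq -e '(.candidates // []) | length > 0' >/dev/null
}

act_if_present() {  # act_if_present SID OBSERVE_INSTRUCTION ACT_INSTRUCTION
  if sh_post "$1" observe "{\"instruction\": \"$2\"}" | has_candidates; then
    sh_post "$1" act "{\"instruction\": \"$3\"}"
  else
    echo "nothing matched: $2" >&2
    return 1
  fi
}
# Usage:
#   act_if_present "$SID" "find the add-to-cart buttons" "click the first add-to-cart button"
```

The non-zero return when nothing matches lets an outer loop branch to a different instruction instead of spending an act call on an element that is not there.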

Raw persistent browser control

Need lower-level control? Open a session, render URLs and run scripts inside it across calls while cookies persist, then delete it when done. See the browser API for the full session surface.

Working through a login wall? See the behind-a-login recipe, browse all recipes, or wire results to your stack with webhooks.

Automation questions

What is the difference between this, the scraping API, and the browser API?

This automation page is about multi-step, agent-style flows: a persistent session you drive with natural-language instructions (goto, act, observe, extract) or deterministic JSON action macros. The scraping API is single-shot — send a URL, optionally a JSON action macro, get clean data back in one call. The browser API exposes the raw persistent-session surface: open a session, render and run scripts inside it across calls, then close it. Use automation when the task is a journey across several pages; use scraping when it is one page; use the browser API when you want low-level session control.

Do I need to write CSS selectors?

No. With act and extract, you describe what you want in plain English and the model resolves it to a precise element and action under the hood. The act response even returns the selector it used (in its message field), so you can inspect or pin it later. If you prefer to be explicit, the JSON action macro on the scraping endpoint takes selectors directly.

Which model powers the natural-language actions?

By default a managed model handles act, observe, and extract with no key required from you. If you would rather use your own provider, pass your provider, model name, and key when you open the session, or point at any OpenAI-compatible base URL. Bring-your-own keys are held in memory for the life of the session only and are never logged or persisted.

Are sessions persistent across calls?

Yes. Opening a session is free; you are billed per action (goto, act, observe, extract, screenshot). The session keeps cookies, storage, and the logged-in context alive between calls, so a login on step one carries through to an extract on step five. Each session has an idle timeout that you can set when you open it, and a keepalive call resets the timer.

How do I keep a session from being reaped?

Set an idle timeout when you open the session, and send a keepalive call to reset the idle timer during long pauses. When you are finished, delete the session to free its browser context immediately rather than waiting for the timeout.
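A sketch for long pauses, assuming you already set an idle timeout longer than the ping interval. The keepalive path here is a placeholder: this page says a keepalive call exists but does not show its endpoint, so confirm it in the API reference. `keepalive` and `pause_with_keepalive` are hypothetical helpers. Because keepalive is free, pinging every minute costs no credits.

```shell
# Sketch: hold a session open through a long pause with periodic keepalives.
# HYPOTHETICAL: the /keepalive path is a placeholder, not a documented endpoint.
API=https://api.ollagraph.com/v1/stagehand

keepalive() {  # keepalive SID
  curl -s -X POST "$API/$1/keepalive" \
    -H "Authorization: Bearer $OLLAGRAPH_API_KEY" >/dev/null
}

# Wait PAUSE_SECONDS in chunks of at most INTERVAL_SECONDS, pinging after
# each chunk, so the idle timer never runs much longer than INTERVAL_SECONDS.
pause_with_keepalive() {  # pause_with_keepalive SID PAUSE_SECONDS INTERVAL_SECONDS
  local left=$2 step
  [ "$3" -gt 0 ] || return 1   # a zero interval would loop forever
  while [ "$left" -gt 0 ]; do
    step=$3
    [ "$step" -gt "$left" ] && step=$left
    sleep "$step"
    keepalive "$1"
    left=$((left - step))
  done
}
# Usage: pause_with_keepalive "$SID" 900 60   # 15-minute pause, ping every minute
```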

Can I get reproducible, model-free automation?

Yes. Send a JSON action macro to the scraping endpoint: an ordered array of click, type, wait, and scroll steps that runs exactly the same way every time, with no model in the loop and no per-step inference latency. Use natural-language actions when the page changes often; use macros when the DOM is stable and you want determinism.

What does observe return, and when should I use it?

Observe lists the actionable elements on the current page — each with a human-readable description, the method (such as click or type), and a selector. It is the discovery step in an agent loop: observe first to see what is possible, then issue the act instructions that make sense. You can pass an optional instruction to focus the search, or omit it to list everything.

Can I extract data against a fixed shape?

Yes. Extract takes a plain-English instruction and an optional JSON Schema. When you pass a schema, the result is coerced and validated against it, so you get typed fields like name and price instead of free-form text. Omit the schema for quick free-form extraction.

How is this billed?

Opening a session is free. Each goto, act, observe, extract, and screenshot counts as a single metered action; keepalive and closing the session are free. JSON action macros on the scraping endpoint are billed as the scrape call that carries them. Failed calls are auto-refunded, so a timeout or a missed element never costs you. See pricing for the current credit packs and the free monthly grant.
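A small worked example of the rules above: count the metered calls in a planned flow before you run it. The call labels (`open`, `close`, and so on) are this sketch's own shorthand, not API names. Because failed calls are refunded, the count is an upper bound on the metered actions the flow spends.

```shell
# Count the metered actions in a planned sequence of calls.
# Labels are this sketch's shorthand; pricing rules are from the answer above.
count_billable() {  # count_billable CALL...
  local n=0 call
  for call in "$@"; do
    case "$call" in
      goto|act|observe|extract|screenshot) n=$((n + 1)) ;;  # metered
      open|keepalive|close) ;;                               # free
      *) echo "unknown call: $call" >&2; return 1 ;;
    esac
  done
  echo "$n"
}

# The search-then-extract flow, opened and closed around four actions:
count_billable open goto act observe extract close   # prints 4
```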

Skip the headless-browser fleet.

1,000 free credits, one bearer token, failed calls auto-refund. Sessions are free to open — you only pay per action.

Start free See all 145 endpoints