What auditing a site for AI search actually means in 2026
Auditing your site for AI search means checking whether answer engines — ChatGPT, Perplexity, Google AI Overviews, and Claude — can complete four steps on each of your pages: crawl it without being blocked, fetch the real content without a browser, parse the structure cleanly, and trust the page enough to cite it. If any of those four steps breaks, your page can rank fine in classic search and still be invisible in AI answers.
That opening paragraph is the AEO (answer engine optimization) answer. The rest of this page is the working recipe: a pipeline that walks all four steps end to end, with real curl commands against the Ollagraph AEO toolkit, so you can see exactly what is blocking AI crawlers and fix it in priority order.
The problem you are actually trying to solve
You don't want an AEO score. You want to be the page the model quotes. There is a difference, and the difference is a chain of small failures that classic SEO tools were never built to catch.
The reader of this page usually falls into one of three buckets. A content or SEO lead watching organic clicks erode as AI Overviews answer the query in-page. A founder or marketer who keeps seeing a competitor cited in ChatGPT and wants to know why it isn't them. Or an agency running AEO as a service across a roster of client sites and needing a repeatable, defensible audit rather than a vibe check.
All three have the same hidden requirement: the audit has to be mechanical and reproducible. "Make it more AI-friendly" is not a deliverable. A ranked list of concrete blockers — this crawler is blocked in robots.txt, this content is JavaScript-only, this page has no last-updated date, these headings skip a level — is a deliverable. That list is what this recipe produces.
The four-stage AEO pipeline
An answer engine has to succeed at four sequential stages before it can cite you. We'll audit them in order, because a failure at an early stage makes the later ones moot. No point scoring your schema if the crawler is blocked at the door.
Stage 1 — Crawl: is the bot even allowed in?
The cheapest reason to be absent from AI answers is the most common: your robots.txt blocks the crawler, often by accident, often inherited from a template someone copied years ago. Start the audit here.
The AI bot allowlist endpoint checks whether your robots.txt currently allows or blocks each of the major named AI crawler tokens, including GPTBot, ClaudeBot, PerplexityBot, and Google-Extended (a robots.txt control token rather than a separate crawler). It takes a bare domain.
export OLLAGRAPH_API_KEY="osk_xxxxxxxxxxxx"
curl -X POST https://api.ollagraph.com/v1/aeo/ai-bot-allowlist \
  -H "Authorization: Bearer $OLLAGRAPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "ollagraph.com"
  }'
The response is a per-crawler verdict: which named AI bots your site lets in and which it turns away. Blocking a crawler is a legitimate choice — protecting proprietary content is a real reason — but it should be a deliberate one. The audit's job is to make sure no crawler you want is blocked by accident.
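To make the per-crawler verdict concrete, here is a minimal local sketch of the same kind of check using Python's standard `urllib.robotparser`. The sample robots.txt and the short bot list are illustrative assumptions, and this is not a description of how the Ollagraph endpoint works internally.

```python
from urllib.robotparser import RobotFileParser

# Illustrative user-agent tokens; extend with the crawlers you care about.
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

# Hypothetical robots.txt that blocks one AI crawler and allows everyone else.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def ai_bot_verdicts(robots_txt: str, path: str = "/") -> dict[str, bool]:
    """Return {bot_name: allowed} for the given robots.txt text and URL path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in AI_BOTS}
```

Calling `ai_bot_verdicts(SAMPLE_ROBOTS)` marks GPTBot as blocked and the other listed tokens as allowed, which is the accidental-block pattern this stage is meant to surface.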
While you're inspecting crawl-time signals, audit your llms.txt. This is the emerging convention of a plain-text file at your domain root that points AI systems to your most citable pages — a sitemap written for language models. The llms-txt audit endpoint fetches the file, validates its structure, scores its section coverage, and checks its links for rot.
curl -X POST https://api.ollagraph.com/v1/aeo/llms-txt-audit \
  -H "Authorization: Bearer $OLLAGRAPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "ollagraph.com"
  }'
If the file doesn't exist yet, the audit tells you so. An llms.txt file isn't a hard ranking factor today, but it is cheap to maintain and signals intent, so it is worth having once the rest of the pipeline is clean.
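For a feel of what a structural check on llms.txt might involve, the sketch below parses a file laid out the way the public llms.txt proposal describes: one `# ` title, `## ` sections, and markdown link bullets under each section. That layout, the sample file, and the function name are assumptions for illustration; link-rot checking is left out because it needs network access.

```python
import re

# Matches markdown bullets of the form "- [name](url)" with optional trailing notes.
LINK_RE = re.compile(r"^\s*[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)")

# Hypothetical llms.txt content used only for demonstration.
SAMPLE_LLMS_TXT = """\
# Example Co
> Short summary of the site.

## Docs
- [Quickstart](https://example.com/docs/quickstart): first steps
- [API reference](https://example.com/docs/api)

## Optional
"""


def audit_llms_txt(text: str) -> dict:
    """Report the title, the links found per section, and any sections with no links."""
    title = None
    sections: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("# ") and title is None:
            title = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                sections[current].append(match.group(2))
    empty = [name for name, links in sections.items() if not links]
    return {"title": title, "sections": sections, "empty_sections": empty}
```

A missing title or a section with no links is the kind of finding a structure-and-coverage score would penalize.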
Stage 2 — Fetch: does the bot get your real content?
This is where most modern sites silently lose. A page built on a client-rendered framework looks perfect in a browser — but many AI crawlers fetch raw HTML and never run your JavaScript. What renders for a human can be an empty shell to a bot, and you'd never know from looking at the page.
The LLM fetch simulator fetches one URL as each of eleven named AI crawlers, plus a browser baseline, and reports what each one actually received: status code, content length, and a preview of the visible text. It also flags two failure modes that matter enormously for AEO: cloaking (serving bots something different from humans) and JS-only content (where the bot's payload is effectively empty because the script that builds the real content never ran).
curl -X POST https://api.ollagraph.com/v1/aeo/llm-fetch-simulator \
  -H "Authorization: Bearer $OLLAGRAPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://ollagraph.com/"
  }'
Read the result by comparing each crawler's content length against the browser baseline. If the browser sees ten thousand characters of text and a given AI crawler sees two hundred, you have found your problem: that crawler is getting an unhydrated shell, and any content you care about needs to be present in the server-rendered HTML. This single probe is often the difference between a site that AI can cite and one it can't.
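The length comparison described above can be sketched locally. The example below measures visible text in two HTML payloads and flags the bot payload when it falls under a fraction of the browser baseline. The 10% threshold, the helper names, and the sample markup are assumptions chosen for illustration, not values taken from the fetch simulator.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text that sits outside <script> and <style> elements."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_length(html: str) -> int:
    """Number of characters of visible text in the HTML."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks))


def looks_js_only(bot_html: str, browser_html: str, ratio: float = 0.1) -> bool:
    """True when the bot sees less than `ratio` of the browser's visible text."""
    baseline = visible_length(browser_html)
    return baseline > 0 and visible_length(bot_html) < ratio * baseline


# Hypothetical payloads: an unhydrated shell versus the rendered page.
SHELL = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
RENDERED = "<html><body><h1>Pricing</h1><p>" + "Plan details. " * 50 + "</p></body></html>"
```

Here `looks_js_only(SHELL, RENDERED)` is true because the shell carries no visible text at all, while comparing the rendered page with itself is not flagged.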
Stage 3 — Parse: can the bot understand the structure?
Once a crawler has your real content, it has to make sense of it. Answer engines reward extractable structure — machine-readable schema, a clean heading outline, and content shaped the way snippets are shaped. Three probes cover this stage.
First, schema coverage. This endpoint parses every JSON-LD block, plus Microdata and OpenGraph, on a page and scores it against the schema.org types that matter most for AEO. If your content is hydrated client-side, pass use_residential_proxy so the fetch renders JavaScript and sees schema that only appears after hydration.
curl -X POST https://api.ollagraph.com/v1/aeo/schema-coverage \
  -H "Authorization: Bearer $OLLAGRAPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://ollagraph.com/architecture/"
  }'
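As a rough local illustration of the JSON-LD part of this check, the sketch below pulls every `application/ld+json` block out of a page and lists the declared `@type` values. The wanted-types set is an assumption for the example (the endpoint's real scoring list is not stated here), and Microdata and OpenGraph are not covered.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of each <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self.in_jsonld = True
            self.buffer = []

    def handle_data(self, data):
        if self.in_jsonld:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_jsonld:
            self.in_jsonld = False
            self.blocks.append("".join(self.buffer))


def jsonld_types(html: str) -> set[str]:
    """Return every @type declared in the page's JSON-LD blocks."""
    collector = JsonLdCollector()
    collector.feed(html)
    found: set[str] = set()
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole audit
        if isinstance(data, dict):
            data = data.get("@graph", [data])
        for node in data if isinstance(data, list) else []:
            if isinstance(node, dict):
                declared = node.get("@type", [])
                found.update([declared] if isinstance(declared, str) else declared)
    return found


def missing_types(html: str, wanted: set[str]) -> set[str]:
    """Types from `wanted` (an illustrative list) that the page does not declare."""
    return wanted - jsonld_types(html)
```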
Second, heading hierarchy. AI Overviews and featured snippets lean heavily on a page's heading outline, and they especially favor question-style headings that mirror how people phrase queries. This probe audits your H1–H6 structure, surfacing a missing H1, multiple H1s, hierarchy skips like an H2 jumping straight to an H4, and a count of question-style headings.
curl -X POST https://api.ollagraph.com/v1/aeo/heading-hierarchy-score \
  -H "Authorization: Bearer $OLLAGRAPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://ollagraph.com/capabilities"
  }'
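The checks this probe describes are simple enough to sketch by hand. The example below collects H1 to H6 headings and reports the H1 count, any level skips, and how many headings end in a question mark. Treating a trailing question mark as "question-style" is a simplifying assumption, and the function names are hypothetical.

```python
import re
from html.parser import HTMLParser

HEADING_RE = re.compile(r"^h([1-6])$")


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for every h1-h6 element in document order."""

    def __init__(self):
        super().__init__()
        self.level = None
        self.text = []
        self.headings = []

    def handle_starttag(self, tag, attrs):
        match = HEADING_RE.match(tag)
        if match:
            self.level = int(match.group(1))
            self.text = []

    def handle_data(self, data):
        if self.level is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if self.level is not None and tag == f"h{self.level}":
            self.headings.append((self.level, "".join(self.text).strip()))
            self.level = None


def audit_headings(html: str) -> dict:
    """Summarize H1 count, hierarchy skips, and question-style headings."""
    collector = HeadingCollector()
    collector.feed(html)
    levels = [level for level, _ in collector.headings]
    # A skip is any step down the outline that jumps more than one level.
    skips = [(a, b) for a, b in zip(levels, levels[1:]) if b > a + 1]
    questions = sum(text.endswith("?") for _, text in collector.headings)
    return {"h1_count": levels.count(1), "skips": skips, "question_headings": questions}
```

An `h1_count` of zero or more than one, or a non-empty `skips` list, corresponds to the issues named above.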
Third, snippet-format detection. Google pulls featured snippets in four shapes — paragraph, list, table, and definition — and a page that offers content in the right shape for its query is far likelier to be lifted. This endpoint classifies your content into those four shapes, counts each, gives example snippets, and predicts the best-fit format.
curl -X POST https://api.ollagraph.com/v1/aeo/snippet-format-detect \
  -H "Authorization: Bearer $OLLAGRAPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://ollagraph.com/docs"
  }'
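A crude way to picture the four shapes is to count the HTML elements that usually carry them. The sketch below does only that, and the tag-to-shape mapping is an assumption for illustration; a real classifier would presumably also look at the text itself.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumed mapping from HTML tags to the four snippet shapes named above.
SHAPE_BY_TAG = {
    "p": "paragraph",
    "ul": "list",
    "ol": "list",
    "table": "table",
    "dl": "definition",
}


class ShapeCounter(HTMLParser):
    """Count opening tags that correspond to a snippet shape."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        shape = SHAPE_BY_TAG.get(tag)
        if shape:
            self.counts[shape] += 1


def snippet_shapes(html: str) -> Counter:
    """Return a Counter of snippet shapes found in the HTML."""
    counter = ShapeCounter()
    counter.feed(html)
    return counter.counts
```

`snippet_shapes(html).most_common(1)` gives the dominant shape on the page, which is a rough stand-in for a best-fit prediction.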
If you're auditing a page you're still drafting and don't want to deploy it first, both the heading and snippet probes also accept raw HTML in place of a URL — handy for wiring an AEO check into a CI step before a page ever goes live.
Stage 4 — Cite: does the bot trust the page enough to quote it?
The final stage is the hardest to fake and the most durable once earned. Answer engines preferentially cite pages that carry E-E-A-T-style trust signals (experience, expertise, authoritativeness, trustworthiness): specific numbers rather than vague claims, named entities, authoritative outbound links, a visible author byline, a last-updated date, and enough depth to be substantive. Two probes cover this.
Citation readiness scores a page mechanically against exactly those trust signals. There is no model call, so the check is fast and deterministic.
curl -X POST https://api.ollagraph.com/v1/aeo/citation-readiness \
  -H "Authorization: Bearer $OLLAGRAPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://ollagraph.com/architecture/"
  }'
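To show what "mechanical" scoring can look like, here is a deliberately rough sketch that counts a few of the signals listed above with regular expressions. The patterns, the field names, and the sample page are all assumptions; regex over HTML is fragile, and this is not the endpoint's scoring method.

```python
import re
from urllib.parse import urlparse


def trust_signals(html: str, own_host: str) -> dict:
    """Count a handful of citation-readiness signals in raw HTML (heuristic only)."""
    text = re.sub(r"<[^>]+>", " ", html)  # strip tags to approximate visible text
    links = re.findall(r'href="(https?://[^"]+)"', html)
    outbound = [url for url in links if urlparse(url).hostname != own_host]
    return {
        "numbers": len(re.findall(r"\b\d[\d,.]*%?", text)),
        "outbound_links": len(outbound),
        "has_author": bool(re.search(r'rel="author"|name="author"', html)),
        "has_date": bool(re.search(r"<time\b|dateModified", html)),
        "word_count": len(text.split()),
    }


# Hypothetical page fragment used only for demonstration.
SAMPLE_PAGE = (
    '<meta name="author" content="A. Writer">'
    "<p>Latency fell 42% across 3 regions.</p>"
    '<a href="https://www.rfc-editor.org/rfc/rfc9309">RFC 9309</a>'
    '<a href="https://example.com/about">About</a>'
    '<time datetime="2026-01-05">Jan 5</time>'
)
```

Because every field is a count or a boolean derived from the markup, the same page always yields the same result, which is what makes this kind of check deterministic.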
Freshness signal is the trust dimension teams most often neglect. Answer engines weight recency, and a page that looks abandoned gets passed over. This probe reads every dated signal on the page — schema.org dateModified, the OpenGraph modified-time tag, the HTTP Last-Modified header, any visible "Updated" text, and the copyright year — scores the page's overall freshness, names the most recent dated signal, and flags inconsistencies between them. A page claiming a 2026 update in its byline while its schema says 2023 sends a confused signal, and this catches it.
curl -X POST https://api.ollagraph.com/v1/aeo/freshness-signal \
  -H "Authorization: Bearer $OLLAGRAPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://ollagraph.com/blog/"
  }'
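The consistency check is easy to picture once the dated signals have been extracted. The sketch below takes already-parsed dates keyed by signal name, names the most recent one, and flags the page when the oldest and newest signals are far apart. The signal names and the 90-day tolerance are assumptions for illustration.

```python
from datetime import date


def freshness_report(signals: dict[str, date], max_spread_days: int = 90) -> dict:
    """Name the newest dated signal and flag large disagreement between signals."""
    newest = max(signals, key=signals.get)
    oldest = min(signals, key=signals.get)
    spread = (signals[newest] - signals[oldest]).days
    return {
        "most_recent_signal": newest,
        "most_recent_date": signals[newest],
        "spread_days": spread,
        "inconsistent": spread > max_spread_days,
    }


# Hypothetical signals: the visible text says 2026 while the schema still says 2023.
SAMPLE_SIGNALS = {
    "schema_dateModified": date(2023, 3, 1),
    "http_last_modified": date(2026, 1, 8),
    "visible_updated_text": date(2026, 1, 10),
}
```

With the sample above, the report names `visible_updated_text` as the most recent signal and marks the page inconsistent, matching the 2026-versus-2023 case described in the text.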
The one-call shortcut: full page audit
Running eight probes by hand is the right way to understand the pipeline, but in production you'll reach for the orchestrator. The page-audit endpoint runs the component probes in parallel against a single URL and returns one consolidated report: a headline score, category breakdowns, the top issues it found, and a ranked list of recommendations.
curl -X POST https://api.ollagraph.com/v1/aeo/page-audit \
  -H "Authorization: Bearer $OLLAGRAPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://ollagraph.com/architecture/"
  }'
This is a premium orchestrator — it fans out many probes at once — so it costs a few credits per call rather than the inexpensive single-signal rate. If the fetch fails for any reason, the call is automatically refunded, so a flaky target page never costs you anything. The output is the deliverable: a scored report of what's blocking AI from crawling, fetching, parsing, and citing the page, ordered so you fix the highest-impact problems first. Use the individual probes from the four stages above when you want to drill into one dimension; use page-audit when you want the whole picture in one call.
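If you would rather script the call than paste curl, the same request can be built with Python's standard library. The sketch below mirrors the curl command above (URL, headers, JSON body); the helper names are hypothetical, and nothing is assumed about the response beyond it being JSON.

```python
import json
import urllib.request

API_BASE = "https://api.ollagraph.com/v1/aeo"


def build_request(endpoint: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build the POST request that the curl examples send."""
    return urllib.request.Request(
        f"{API_BASE}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def call_endpoint(endpoint: str, payload: dict, api_key: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON body of the response."""
    request = build_request(endpoint, payload, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example (requires a real key and network access):
#   import os
#   report = call_endpoint(
#       "page-audit",
#       {"url": "https://ollagraph.com/architecture/"},
#       os.environ["OLLAGRAPH_API_KEY"],
#   )
```

Keeping request construction separate from sending makes the helper easy to unit test without touching the network.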
Benchmarking against the pages that beat you
An absolute score is useful; a relative one is persuasive. The most common question in an AEO engagement isn't "how good is my page" but "why does that competitor keep getting cited instead of me." The competitor-diff endpoint answers it directly.
It accepts two to five URLs. The first is treated as your page; the rest are competitors. It runs a full audit on each in parallel and returns head-to-head rankings plus exactly where your page wins and loses against the set.
curl -X POST https://api.ollagraph.com/v1/aeo/competitor-diff \
  -H "Authorization: Bearer $OLLAGRAPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://ollagraph.com/docs/intel-geoip/",
      "https://competitor-one.example.com/feature",
      "https://competitor-two.example.com/feature"
    ]
  }'
Like page-audit, this is a premium orchestrator billed at a few credits regardless of how many URLs you compare, with the same automatic refund on failure. The output reframes the audit as a gap analysis: not "your schema score is 60" but "the page outranking you has FAQ schema and a visible last-updated date that you're missing." That is the slide that gets a content sprint approved.
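The gap-analysis framing can be sketched in a few lines. The first helper builds the request body with your page first and enforces the two-to-five URL limit stated above; the second ranks the signals where the best competitor beats you. The per-signal score dictionaries are a hypothetical shape invented for this example, not the endpoint's documented response.

```python
def competitor_diff_payload(own_url: str, competitor_urls: list[str]) -> dict:
    """Build the request body: your page first, then one to four competitors."""
    urls = [own_url, *competitor_urls]
    if not 2 <= len(urls) <= 5:
        raise ValueError("competitor-diff expects 2 to 5 URLs in total")
    return {"urls": urls}


def signal_gaps(own: dict[str, float], rivals: dict[str, dict[str, float]]) -> dict[str, float]:
    """For each signal you lose on, how far the best rival is ahead (largest gap first)."""
    gaps = {}
    for signal, mine in own.items():
        best = max((scores.get(signal, 0.0) for scores in rivals.values()), default=0.0)
        if best > mine:
            gaps[signal] = best - mine
    return dict(sorted(gaps.items(), key=lambda item: item[1], reverse=True))
```

Sorting by gap size turns the comparison into an ordered list of what to fix first, which is the form a content sprint needs.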
A realistic scenario
Consider a B2B SaaS company whose documentation used to win the long-tail "how do I do X with Y" queries and is now watching those answers get synthesized into AI Overviews — with a competitor cited as the source. Organic clicks to the docs are down, and nobody can say precisely why.
The team runs page-audit on their ten most important doc pages and the fetch simulator on the docs homepage. The simulator is the smoking gun: their docs are built on a client-rendered framework, and three of the named AI crawlers receive a near-empty shell because the content never hydrates without JavaScript. The browser baseline sees the full article; the bots see almost nothing.
They ship server-side rendering for the doc routes, add Article and FAQ schema flagged by the schema-coverage probe, and rewrite a dozen section headings into the question form the heading-hierarchy probe rewards. Two weeks later, competitor-diff against the pages that had been beating them shows the gap closed on every signal except domain authority. The audit didn't just produce a number — it produced an ordered punch list, and the punch list produced citations.
Wiring the audit into your workflow
A one-time audit is a snapshot; AEO is a moving target. Three patterns turn this recipe into an ongoing practice.
First, audit in CI. Because the heading and snippet probes accept raw HTML, you can run an AEO check on a page's built output before it ever deploys, failing the build if a page drops below a heading-hierarchy threshold. Catching a skipped H-level in a pull request is cheaper than catching it in a quarterly review.
Second, schedule a recurring sweep. Run page-audit across your top pages on a cadence — monthly as a default, weekly if you publish often — and trip an alert when a score regresses. The freshness probe makes this especially valuable: it surfaces pages that have quietly gone stale and need a refresh to stay citable. You can fan these scheduled jobs out through webhooks so results land in your own systems.
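The alerting half of a recurring sweep reduces to comparing two sets of scores. The sketch below assumes you store one headline score per URL from each run (a hypothetical storage shape) and reports pages whose score dropped by more than a tolerance, worst first.

```python
def regressions(
    previous: dict[str, float],
    current: dict[str, float],
    tolerance: float = 5.0,
) -> list[tuple[str, float]]:
    """Return (url, drop) pairs for pages whose score fell by more than `tolerance`."""
    drops = []
    for url, old_score in previous.items():
        new_score = current.get(url)
        # Pages missing from the current run are skipped rather than treated as drops.
        if new_score is not None and old_score - new_score > tolerance:
            drops.append((url, old_score - new_score))
    return sorted(drops, key=lambda item: item[1], reverse=True)
```

A non-empty result is the condition to alert on; the tolerance keeps small run-to-run fluctuations from paging anyone.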
Third, pair AEO with classic technical SEO. The two disciplines overlap but don't replace each other — render budget, canonical tags, and internal linking still matter for the blue links even as AEO governs the answers. Run both audits on the same pages and reconcile the findings. Our writeup on AEO versus SEO lays out where the two diverge and where they reinforce each other.
Why run this on Ollagraph
The signals in this recipe aren't secret — a determined engineer could script most of them. The reason teams reach for the AEO toolkit instead is the same reason they reach for any managed API: every probe is a single call behind one bearer token, the crawler-simulation and rendering tiers are handled server-side so you don't stand up a browser farm to see what GPTBot sees, and the pricing is transparent with failed calls auto-refunded. Where a generic crawler hands you HTML and asks you to judge it, these endpoints hand you a verdict.
For agencies, that packaging is the product. A reproducible, per-signal audit you can run across a client roster — and re-run to prove the lift after a content sprint — is what turns AEO from a buzzword into a billable, defensible service. That workflow is the focus of the AEO agencies guide.
What to do next
Sign up for a key, paste the page-audit command above against one of your real URLs, and read the ranked recommendations it returns. Then walk the four stages in order on your single most important page — crawl, fetch, parse, cite — and fix the earliest failure first. Most teams find their biggest blocker in the fetch simulator within the first ten minutes.
Read the docs, explore the full AEO toolkit, and browse the other recipes when you're ready to ship.