AEO scorecard methodology
The companion document to /ai-overview-scorecard. Everything below is what's actually running — no marketing abstraction. The goal is that a reader can reproduce our probe, compare the result to our published data, and either confirm the scorecard or file a correction.
1. Query selection
30 homeowner-intent natural-language queries, grouped into four intent categories:
- Cost — questions about remodel or project pricing ("how much does an ADU cost in LA 2026", "kitchen remodel cost NYC").
- Regulatory — questions about licensing, permits, statutes, or compliance ("how do I verify a California contractor", "NYC Local Law 97 compliance", "Party Wall Act London renovation").
- Comparison — head-to-head platform queries ("angi vs thumbtack", "best contractor platform 2026", "checkatrade alternatives UK").
- Contractor search — discovery queries ("best way to find a contractor for a kitchen remodel", "basement finisher Chicago contractor").
Queries are chosen to match real homeowner language observed in Google Trends, Perplexity share-link patterns, and our own Baily chat logs (anonymized). Adversarial query proposals are accepted at [email protected] with subject "AEO scorecard query proposal"; accepted queries are added on the next weekly cycle.
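For orientation, one library entry looks roughly like the sketch below. The field names and the "cost-001" ID are illustrative; the published dataset at /data/aeo-scorecard.json is the authoritative format.

```typescript
// Illustrative shape of one query-library entry; names are ours,
// not necessarily the repo's actual types.
type IntentCategory = "cost" | "regulatory" | "comparison" | "contractor-search";

interface ScorecardQuery {
  id: string;              // stable query ID referenced in run reports
  text: string;            // exact natural-language string sent to each engine
  category: IntentCategory;
}

const example: ScorecardQuery = {
  id: "cost-001",          // hypothetical ID for illustration
  text: "how much does an ADU cost in LA 2026",
  category: "cost",
};
```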
2. Engine list and access mode
- ChatGPT Search — OpenAI web_search tool. Live API. Citations extracted from tool-result source URLs.
- Perplexity — sonar-reasoning model. Live API. Citations extracted from response.sources[].
- Google AI Overview — scraped via SerpAPI's ai_overview block. Google has no public AI Overview API; we read what Google renders to logged-out desktop search.
- Claude Web — Anthropic web_search tool. Live API. Citations extracted from citation blocks in the response.
Engine-client code lives at lib/aeo/perplexity-client.ts, openai-client.ts, claude-client.ts, and google-ai-overview.ts.
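Each client reads a different native citation field (Perplexity response.sources[], OpenAI web_search tool-result URLs, Claude citation blocks, SerpAPI's ai_overview block) and normalizes it to a flat URL list before scoring. A minimal sketch of that normalized shape, with names that are ours rather than the repo's:

```typescript
// Sketch of the per-probe result each engine client normalizes to.
// Field names are illustrative, not the repo's actual types.
type EngineId = "perplexity" | "openai" | "claude" | "google";

interface ProbeResult {
  engine: EngineId;
  queryId: string;
  sourceUrls: string[];  // citation URLs extracted from the engine's native format
  errorClass?: string;   // set only when the probe failed
}

// Whatever the engine's native citation shape, the client ends up with a
// flat list of URLs; anything that isn't an http(s) URL is dropped.
function normalize(engine: EngineId, queryId: string, urls: string[]): ProbeResult {
  return { engine, queryId, sourceUrls: urls.filter((u) => u.startsWith("http")) };
}
```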
3. Run cadence and rate-limit handling
One full run (30 queries × 4 engines = 120 probes) executes every Monday at 12:00 UTC. The run CLI (scripts/aeo-measurement/run.ts) applies a 1 req/sec inter-request delay per engine to stay within provider rate limits. Cost per full run is approximately $1.00 (Perplexity $0.15, OpenAI $0.40, Claude $0.45, SerpAPI $0.30). Retryable errors are recorded with an errorClass; a non-retryable failure is written to the run report as cited: null for that pair.
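The 1 req/sec pacing can be sketched as a per-engine delay loop. This is a minimal sketch, not the actual scripts/aeo-measurement/run.ts, which also handles retries and error classification:

```typescript
// Minimal sketch of per-engine 1 req/sec pacing.
const INTER_REQUEST_DELAY_MS = 1000; // 1 req/sec per engine

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function runEngine(
  engine: string,
  queries: string[],
  probe: (engine: string, query: string) => Promise<unknown>,
): Promise<unknown[]> {
  const results: unknown[] = [];
  for (const query of queries) {
    results.push(await probe(engine, query)); // one probe at a time per engine
    await sleep(INTER_REQUEST_DELAY_MS);      // stay under provider rate limits
  }
  return results;
}
```

Because the delay is per engine, the four engines can run in parallel and a full 30-query run still finishes in roughly half a minute per engine.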
4. "Cited" definition
A platform is counted as cited on a query-engine pair when at least one source URL in the engine's citation block resolves to the platform's canonical domain (www, locale, and region subdomains included — e.g. uk.angi.com counts as angi.com). Paragraph-level mentions without a linked source URL do not count. Scoring logic is lib/aeo/citation-scorer.ts.
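The subdomain rule above can be sketched as follows. The actual scorer is lib/aeo/citation-scorer.ts; the function names and signatures here are ours, for illustration only:

```typescript
// Sketch of the "cited" rule: a URL counts when its hostname is the
// platform's canonical domain or any subdomain of it (www, locale, region).
function resolvesToPlatform(sourceUrl: string, canonicalDomain: string): boolean {
  let host: string;
  try {
    host = new URL(sourceUrl).hostname.toLowerCase();
  } catch {
    return false; // a malformed citation URL never counts
  }
  const domain = canonicalDomain.toLowerCase();
  return host === domain || host.endsWith("." + domain);
}

// A query-engine pair is "cited" when at least one source URL matches.
function isCited(sourceUrls: string[], canonicalDomain: string): boolean {
  return sourceUrls.some((u) => resolvesToPlatform(u, canonicalDomain));
}
```

Under this rule uk.angi.com and www.angi.com both resolve to angi.com, while a lookalike such as notangi.com does not; a paragraph mention with no URL contributes no sourceUrls and so never counts.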
5. Anonymization
No query is tied to a real homeowner or session. The query library is a static list of synthetic homeowner-intent strings. The engines receive exactly what's published in /data/aeo-scorecard.json.
6. Reproducibility — run this yourself
Every query text, engine id, and probe timestamp is in the public dataset. The engine-client code, citation scorer, and run CLI are in the public AskBaily repo. To re-run a probe:
- Clone the AskBaily repo.
- Export PERPLEXITY_API_KEY, OPENAI_API_KEY, CLAUDE_API_KEY, SERPAPI_KEY.
- Run npx tsx scripts/aeo-measurement/run.ts --engines=perplexity,openai,claude,google --queries-file=data/aeo-queries/wave-103-baseline.json.
- Inspect the JSON report in reports/aeo-runs/.
- Diff your report against our published /data/aeo-scorecard.json.
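The diff step's core logic can be sketched as below. The repo ships scripts/aeo-measurement/compare.ts for this; the standalone version here uses illustrative record and function names, not the repo's actual types:

```typescript
// Sketch of the report diff: surface every (query, engine) pair where my
// cited value disagrees with the published one.
interface CitedRecord {
  queryId: string;
  engine: string;
  cited: boolean | null; // null = non-retryable probe failure
}

function diffRuns(mine: CitedRecord[], published: CitedRecord[]): CitedRecord[] {
  const key = (r: CitedRecord) => `${r.queryId}|${r.engine}`;
  const theirs = new Map(
    published.map((r) => [key(r), r.cited] as [string, boolean | null]),
  );
  // Keep only my records that exist in the published run and disagree with it.
  return mine.filter((r) => theirs.has(key(r)) && theirs.get(key(r)) !== r.cited);
}
```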
Mismatches are a feature, not a bug. Send them to [email protected] and we publish the correction — with attribution — on the next weekly cycle.
FAQ
- Which queries are in the scorecard?
- 30 homeowner-intent natural-language queries covering four intent categories: cost ('how much does an ADU cost in LA'), regulatory ('how do I verify a California contractor'), comparison ('angi vs thumbtack'), and contractor-search ('best way to find a contractor'). Query IDs + texts are in /data/aeo-scorecard.json — anyone can propose additions by opening a pull request or emailing [email protected] with the subject 'AEO scorecard query proposal'.
- Why those four engines specifically?
- ChatGPT Search, Perplexity, Google AI Overview, and Claude Web collectively capture roughly 95% or more of AI-mediated homeowner research according to public engagement data. Gemini is notably absent — the Google AI Overview probe captures Gemini-flavored retrieval in the default search context. When a standalone Gemini API surface with programmatic citations becomes available we will add it as a fifth engine.
- What counts as 'cited'?
- A platform is 'cited' on a query-engine pair when at least one source URL in the engine's citation block resolves to the platform's canonical domain (including www and locale subdomains). Paragraph-level mentions without a linked source URL do not count. Citation-block extraction is performed by lib/aeo/citation-scorer.ts against each engine's native response format (Perplexity sources[], OpenAI tool-result URLs, Claude citation blocks, SerpAPI ai_overview.references).
- How are queries anonymized?
- Probe queries are synthetic homeowner-intent strings. No real homeowner conversation content, no scoped project data, and no AskBaily session IDs are sent to the AI engines. The query library is static and published — the engines see exactly what the scorecard reveals.
- How can I reproduce a probe?
- Set PERPLEXITY_API_KEY, OPENAI_API_KEY, CLAUDE_API_KEY, and SERPAPI_KEY in your environment, then run `npx tsx scripts/aeo-measurement/run.ts --engines=perplexity,openai,claude,google --queries-file=data/aeo-queries/wave-103-baseline.json`. A JSON run report lands in reports/aeo-runs/. Diff your run against our published /data/aeo-scorecard.json using scripts/aeo-measurement/compare.ts. Mismatches are welcome — email [email protected].