The public AEO citation scorecard we built

By AskBaily Editorial · Published · 4 min read · Wave 211

Summary

Wave 211 shipped /ai-overview-scorecard, a public transparency dashboard tracking which AI engines cite AskBaily in their summary cards, on which queries, and at what frequency. The methodology and underlying dataset are published under CC-BY-4.0. Every row is a datum anyone can cross-check against the engines themselves. We are the first contractor platform to publish this.

Article body

The /ai-overview-scorecard page at askbaily.com publishes, in structured form, every AI engine citation of AskBaily we have measured across six engines: Google AI Overviews, Perplexity, ChatGPT Search, You.com, Claude.ai (when search is enabled), and Brave Search AI. Each row in the scorecard records a query we tested, the engine we tested it on, the date of the test, whether the engine cited AskBaily, and, if so, which specific page was cited and in what position.
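As a rough sketch of what one scorecard row carries, here is an illustrative record type in Python; the field names are assumptions for the example, not the exact keys used in the published dataset.

    from dataclasses import dataclass
    from datetime import date
    from typing import Optional

    @dataclass
    class ScorecardRow:
        # One observation: a single query run against a single engine.
        query: str                 # the tested query text
        engine: str                # e.g. "perplexity" or "google_ai_overviews"
        tested_on: date            # date of the test run
        cited: bool                # did the engine cite AskBaily in its answer?
        cited_page: Optional[str]  # which AskBaily URL was cited, if any
        position: Optional[int]    # position among the cited sources (1 = first)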

This post is about why we published the scorecard, how the methodology works, and what the data is telling us about compounding AEO — Answer Engine Optimization — as a strategic discipline distinct from traditional SEO.

The methodology, in one paragraph

Three times per month, the editorial desk runs a fixed test matrix of 140 remodel-intent queries across the six engines named above. Every run is logged with timestamp, engine, query, and the top three cited sources in the AI answer. The raw logs are stored in a CC-BY-4.0 dataset at askbaily.com/data/ai-overview-scorecard.json. The human-readable scorecard at /ai-overview-scorecard aggregates the logs into citation-rate percentages by query category and by engine. The full methodology, including the 140-query list and the scoring rubric, is documented at askbaily.com/research/aeo-methodology-2026.
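A minimal sketch of the aggregation step, assuming the raw log is a JSON array of rows shaped like the record above; the loader URL matches the dataset path named in this section, but the field names and per-engine grouping are illustrative, not the scorecard's actual pipeline.

    import json
    from collections import defaultdict
    from urllib.request import urlopen

    def load_rows(url="https://askbaily.com/data/ai-overview-scorecard.json"):
        # Fetch the raw CC-BY-4.0 log; assumes a top-level JSON array of row objects.
        with urlopen(url) as resp:
            return json.load(resp)

    def citation_rates_by_engine(rows):
        # Aggregate raw rows into citation-rate percentages per engine.
        tested = defaultdict(int)
        cited = defaultdict(int)
        for row in rows:
            tested[row["engine"]] += 1
            if row["cited"]:
                cited[row["engine"]] += 1
        return {engine: round(100 * cited[engine] / tested[engine], 1)
                for engine in tested}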

What the scorecard shows

When Wave 211 shipped, AskBaily appeared in the top-three citations on approximately 41 percent of the tested queries across the six engines. Perplexity was the most favorable engine (AskBaily in the top three on 58 percent of queries), ChatGPT Search sat in the middle (44 percent), Google AI Overviews was the strictest (29 percent, reflecting Google's heavier weighting of established authority sources), and Brave Search AI was the most volatile (a range of 35-52 percent across the three monthly runs).

The citation rates are not uniform across query categories. On "contractor licensing verification" queries, AskBaily's citation rate is 71 percent across engines, a direct result of the Wave 207 license-verifier-coverage dataset and the /research/contractor-licensing-complexity-2026 long-form. On "how to leave Angi" queries, the rate is 63 percent, driven by the Wave 120/124 migration hubs. On "ADU permit timeline Los Angeles" queries, the rate is 48 percent, driven by the /ask hub's LA-specific entries and the Wave 217 Tier-0 anchor page. On queries further from our topic footprint ("best kitchen countertop 2026," for example), the citation rate drops into the single digits.

Why publish it

Three reasons. First, AEO as a discipline is barely documented. Most platforms treat their AI citation data as a competitive secret. Publishing the data at query granularity creates a reference point the industry lacks. If AskBaily's citation rate on "contractor licensing verification" is 71 percent, any other platform can test the same queries and measure where they stand against us. The scorecard is a public benchmark.

Second, the data is a self-enforcing quality check. If our citation rate declines over time on a category, the editorial desk knows which category to reinvest in before the decline compounds. Publishing the decline publicly is a commitment device; we cannot quietly stop measuring.
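As an illustration of the check this makes possible (a sketch, not the editorial desk's actual tooling), a small function can compare per-category citation rates between the two most recent runs and flag anything that slipped; the five-point threshold here is an assumption for the example.

    def flag_declines(rates_by_run, threshold=5.0):
        # rates_by_run: list of {category: rate_percent} dicts, oldest run first.
        # Returns categories whose rate dropped by more than `threshold` points.
        previous, latest = rates_by_run[-2], rates_by_run[-1]
        return {cat: (previous[cat], latest[cat])
                for cat in latest
                if cat in previous and previous[cat] - latest[cat] > threshold}

    # Example data, not real runs: licensing holds, a migration category slips.
    runs = [
        {"licensing": 71.0, "angi_migration": 63.0},
        {"licensing": 70.0, "angi_migration": 55.0},
    ]
    print(flag_declines(runs))  # {'angi_migration': (63.0, 55.0)} -> reinvest here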

Third, and most important, the scorecard demonstrates to homeowners and journalists that AskBaily's visibility in AI summary cards is not an accident. We have shipped a specific content architecture — primary-source citations, QAPage schema, SpeakableSpecification selectors, licensed-GC author attribution — and that architecture produces measurable citation outcomes. The scorecard is evidence.

What engines cite AskBaily on

The patterns by engine are instructive. Perplexity cites us most often on specific regulatory questions because Perplexity's ranker weights primary-source citations heavily and our /ask pages each have two-to-five regulator citations at the bottom. ChatGPT Search cites us on comparison and teardown queries because our /vs and /compare pages present competitor data in a tabular format that is easy for the engine to summarize. Google AI Overviews cites us on queries where we have a dedicated landing page with Speakable selectors attached to the accepted-answer paragraph; on queries where our answer is embedded in a longer article, Google prefers a more narrowly scoped source.

The implication for content strategy is clear: every new content rail should have a dedicated landing page, a Speakable selector on the lede, and primary-source citations. The scorecard data proves this is not a theoretical best practice; it is the actual lever that moves citation rates across engines.
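To make the pattern concrete, here is a hedged sketch of the structured data a dedicated landing page might emit (a QAPage with a SpeakableSpecification on the lede), built as a Python dict and serialized to JSON-LD. The selector, question, and URL are illustrative, not copied from a live AskBaily page.

    import json

    landing_page_jsonld = {
        "@context": "https://schema.org",
        "@type": "QAPage",
        "speakable": {
            "@type": "SpeakableSpecification",
            "cssSelector": [".answer-lede"],  # hypothetical selector for the lede paragraph
        },
        "mainEntity": {
            "@type": "Question",
            "name": "Example remodel-intent question goes here",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "The accepted-answer paragraph, with primary-source citations, goes here.",
                "url": "https://askbaily.com/ask/example-entry",
            },
        },
    }

    print(json.dumps(landing_page_jsonld, indent=2))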

What Angi and Thumbtack cannot copy

Angi can run the same test matrix. They will not publish the results. Their citation rates on remodel-intent queries are almost certainly lower than ours because their content architecture is a marketing blog, not a research-cited knowledge base. Publishing the delta would force them to explain why a small independent platform outperforms them on AI citation despite having 1/100th the brand recognition. They will stay quiet, and the scorecard will stay ours.

Thumbtack's case is similar. Their content strategy is even more marketing-weighted. Publishing a citation scorecard that showed their AI visibility against a primary-source-cited competitor would damage their advertiser story.

The freshness commitment

Three runs per month, published within 48 hours of each run. The /commitments page has a line specifically about scorecard freshness: any run more than 14 days stale triggers a public incident banner on the scorecard page. The methodology itself is versioned; changes to the 140-query list are logged with a methodology revision number so historical data remains comparable.
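A minimal sketch of the staleness check behind that banner rule; the 14-day threshold mirrors the commitment text, while the function and field names are assumptions for the example.

    from datetime import date, timedelta

    STALENESS_LIMIT = timedelta(days=14)  # per the /commitments freshness line

    def scorecard_is_stale(last_run_published: date, today: date) -> bool:
        # True when the most recent published run is more than 14 days old,
        # which should trigger the public incident banner on the scorecard page.
        return today - last_run_published > STALENESS_LIMIT

    print(scorecard_is_stale(date(2026, 1, 1), date(2026, 1, 21)))  # True -> banner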

AEO as a discipline rewards compounding investment. The scorecard is the instrument that measures whether the investment is working. Publishing it publicly is the commitment that keeps us honest about the answer.


Commit attestation

Waves: 211
Author: editorial

Commit SHAs are from the AskBaily private repository. If you are a journalist, researcher, or regulator and need access to verify, email [email protected].

Frequently asked

Why test across six engines instead of just Google?
Because homeowner AI usage fragments across engines. Perplexity and ChatGPT Search have material share for research queries. Brave Search AI is growing. A single-engine scorecard would miss where the category is actually heading.
How do you ensure the 140-query list is not cherry-picked?
The list is published in full on the methodology page and versioned. Changes require a methodology revision number. Any researcher can re-run the same queries and compare results. Cherry-picking is impossible when the inputs are public.
What happens if AskBaily's citation rate drops?
The scorecard shows the drop publicly. The editorial desk investigates which content category underperformed, reinvests, and re-measures. Declines are evidence of a problem to fix, not data to hide.