Publishing a 118-jurisdiction CC-BY-4.0 contractor licensing dataset

By AskBaily Editorial · Published · 4 min read · Wave 207

Summary

Wave 207 published a machine-readable dataset of residential contractor licensing authorities across 118 jurisdictions — 50 US states plus DC, 10 Canadian provinces, and 57 additional territorial and municipal bodies where state licensing is delegated downward. The dataset is CC-BY-4.0. The license choice is the strategic decision, not the dataset.

Article body

The 118-jurisdiction dataset at askbaily.com/data/license-verifier-coverage.json is, as far as we can find, the only complete public map of residential contractor licensing authority in North America. This is not a product announcement. It is a research artifact, permissively licensed, designed to be cited by anyone — including competitors — and to stay fresh on a schedule the editorial desk commits to publicly.

The shape of the file

Every entry has the same nine fields: jurisdiction identifier (ISO 3166-2 for states and provinces, custom slugs for delegated authorities), board name, board canonical URL, homeowner-facing search URL, licensing regime classification, coverage status on AskBaily's verifier rail, last-verified ISO date, statutory citation, and a notes field for edge cases. The file runs to 1,908 lines of JSON. It is human-readable, machine-parseable, and CC-BY-4.0 licensed.
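As an illustration of the nine-field shape, here is a minimal Python sketch of one record and a strict parser. The key names (jurisdiction_id, board_url, and so on) are assumptions made for this example; the article lists the fields but not their JSON keys, and the sample values are placeholders rather than real dataset content.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass(frozen=True)
class LicensingRecord:
    """One dataset entry. Key names are hypothetical, not the published schema."""
    jurisdiction_id: str      # ISO 3166-2 code or a custom slug
    board_name: str
    board_url: str            # board canonical URL
    search_url: str           # homeowner-facing search URL
    regime: str               # licensing regime classification
    coverage_status: str      # status on the verifier rail
    last_verified: date       # parsed from an ISO date string
    statutory_citation: str
    notes: str                # edge cases; may be empty


FIELD_NAMES = tuple(f.name for f in fields(LicensingRecord))


def parse_record(raw: dict) -> LicensingRecord:
    """Build a record from a decoded JSON object, rejecting missing keys."""
    missing = [name for name in FIELD_NAMES if name not in raw]
    if missing:
        raise ValueError(f"record is missing fields: {missing}")
    values = {name: raw[name] for name in FIELD_NAMES}
    values["last_verified"] = date.fromisoformat(raw["last_verified"])
    return LicensingRecord(**values)


# Placeholder values only; this is not a real entry from the dataset.
sample = {
    "jurisdiction_id": "XX-EX",
    "board_name": "Example Licensing Board",
    "board_url": "https://example.org/board",
    "search_url": "https://example.org/board/search",
    "regime": "state-license-required",
    "coverage_status": "covered",
    "last_verified": "2026-01-15",
    "statutory_citation": "Example Code § 0000",
    "notes": "",
}
record = parse_record(sample)
```

Parsing the date up front means a malformed last-verified value fails at load time instead of surfacing later in a freshness check.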

The licensing regime classification is the dataset's analytical spine. Not every US state has a residential contractor license modeled on California's Contractors State License Board (CSLB). The five classification buckets are: state-license-required (California, Oregon, Nevada, Arizona, and 27 others), state-registration-required (Massachusetts HIC, Maryland MHIC, and a handful of others), municipal-license-required (Kansas, Wyoming, Maine, New Hampshire, Vermont, Pennsylvania), province-license-required (Quebec's RBQ; Ontario is recorded as having no province-wide license, with regional variations), and no-residential-license (a small set of jurisdictions where the residential GC is unregulated at any level).
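The five buckets map naturally onto an enumeration. The sketch below uses the bucket labels exactly as written above, but the "regime" key and the tally_regimes helper are assumptions for illustration, not part of the published file.

```python
from collections import Counter
from enum import Enum


class Regime(str, Enum):
    """The five classification buckets named in the article."""
    STATE_LICENSE = "state-license-required"
    STATE_REGISTRATION = "state-registration-required"
    MUNICIPAL_LICENSE = "municipal-license-required"
    PROVINCE_LICENSE = "province-license-required"
    NO_RESIDENTIAL_LICENSE = "no-residential-license"


def tally_regimes(records):
    """Count records per bucket; an unknown label raises ValueError."""
    return Counter(Regime(r["regime"]) for r in records)


# Placeholder records; only the hypothetical "regime" key matters here.
demo = [
    {"regime": "state-license-required"},
    {"regime": "state-license-required"},
    {"regime": "municipal-license-required"},
]
counts = tally_regimes(demo)
```

Converting each label through the enum turns a typo in the classification field into an immediate error rather than a silent sixth bucket.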

Why 118

The count reflects a structural reality about how contractor licensing fragments in North America. There are 50 states, 10 Canadian provinces, and DC: 61 jurisdictions if the dataset stopped at the top level. But California's seven contractor-licensed urban counties add seven; New York's city- and county-level HIC registrations for NYC, Nassau County, Westchester, Rockland, Suffolk, Buffalo, and Rochester add seven more. Texas's municipal licenses in Houston, Dallas, San Antonio, Austin, and Fort Worth add five. Pennsylvania's city-level registrations in Philadelphia, Pittsburgh, and Allentown add three. Those four examples account for 22 authorities; the rest of the delegated tier is spread across other jurisdictions in the dataset. The aggregate is 57 delegated-downward authorities on top of the 61 top-level jurisdictions, for a total of 118 distinct regulators that a homeowner or a contractor might interact with.
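The arithmetic behind the headline number can be checked in a few lines. Every figure below comes from the paragraph above; only the dictionary labels are made up for the example.

```python
# Top-level jurisdictions named in the article.
top_level = {"US states": 50, "DC": 1, "Canadian provinces": 10}

# Delegated authorities the article enumerates by name.
named_delegated = {
    "California counties": 7,
    "New York cities and counties": 7,
    "Texas cities": 5,
    "Pennsylvania cities": 3,
}

DELEGATED_TOTAL = 57  # stated aggregate for the delegated tier

top_level_total = sum(top_level.values())               # 61
named_total = sum(named_delegated.values())             # 22
unnamed_delegated = DELEGATED_TOTAL - named_total       # 35 not listed by name
grand_total = top_level_total + DELEGATED_TOTAL         # 118

print(top_level_total, named_total, unnamed_delegated, grand_total)
# prints: 61 22 35 118
```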

The completeness matters because the major marketplaces all present "licensed contractor" as a uniform badge when, in the underlying reality, "licensed" means 118 different things. A Houston homeowner checking a Houston contractor's license is checking a Harris County registration number. A San Francisco homeowner checking a California contractor's license is checking a CSLB number. The badges look the same on the marketplaces. The regulatory substance is completely different.

Why CC-BY-4.0

Four reasons. First, the compilation is not proprietary. Every board publishes its own existence; we did the assembly. We do not believe assembling public data produces a defensible IP claim, and we would rather publish the assembly than pretend otherwise. Second, search engines and AI engines cite CC-BY data more readily. Perplexity, ChatGPT Search, and You.com all prefer permissively licensed reference data for inline citations, and that citation pathway is strategically valuable for us even when the citation credits "AskBaily." Third, a permissively licensed dataset creates a shared artifact the industry can rally around instead of each platform maintaining a private copy that drifts. Fourth, CC-BY demands attribution; a competitor copying the dataset without attribution is breaking the license, which creates a clear legal handle if we ever need to object.

How the dataset is kept fresh

Quarterly re-verification. The editorial desk runs a scripted HEAD-check on every board URL (the canonical and the homeowner-facing search) and a light-touch manual review of 10 random records per quarter. Records whose board URL has changed are updated and the last-verified date is stamped. Records that fail the HEAD-check are flagged for manual review within one week.

The commitment is written on /commitments: we refresh the dataset at least quarterly, and any record older than six months surfaces a stale-data banner on the research page. Transparency about freshness is as important as the data itself. A dataset that claims completeness and has not been touched in two years is worse than no dataset at all.
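The stale-data rule reduces to a date comparison. The sketch below assumes "older than six months" means more than six calendar months since the last-verified date, with the day clamped at month end; the article does not spell out the exact rule, and the function names are hypothetical.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the month's last day."""
    month_index = d.year * 12 + (d.month - 1) + months
    year, month_zero_based = divmod(month_index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def is_stale(last_verified: date, today: date, max_age_months: int = 6) -> bool:
    """True once `today` is past last_verified plus the allowed age."""
    return today > add_months(last_verified, max_age_months)


print(is_stale(date(2026, 1, 15), date(2026, 7, 15)))  # prints: False
print(is_stale(date(2026, 1, 15), date(2026, 7, 16)))  # prints: True
```

Counting calendar months instead of a fixed number of days keeps the banner threshold aligned with how a reader would interpret "six months" on the commitments page.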

What the research page does with it

The human-readable companion at askbaily.com/research/contractor-licensing-complexity-2026 is marked up with the Schema.org Dataset type. That exposes the data to Google Dataset Search independently of our marketing pages, which means researchers and journalists looking for residential contractor licensing data can find our work via a dedicated research engine.
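For readers unfamiliar with the markup, this is roughly what a Schema.org Dataset description looks like when generated as JSON-LD. The property names (name, description, url, license, distribution, contentUrl, encodingFormat) are standard Schema.org vocabulary; the name and description strings, the https scheme on the URLs, and the helper function are assumptions, and the page's real markup may differ.

```python
import json


def dataset_jsonld(name, description, page_url, data_url, license_url):
    """Build a Schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": page_url,
        "license": license_url,
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": "application/json",
                "contentUrl": data_url,
            }
        ],
    }


markup = dataset_jsonld(
    name="Residential contractor licensing authorities (illustrative title)",
    description="Licensing authorities across 118 jurisdictions.",
    page_url="https://askbaily.com/research/contractor-licensing-complexity-2026",
    data_url="https://askbaily.com/data/license-verifier-coverage.json",
    license_url="https://creativecommons.org/licenses/by/4.0/",
)

# This string would sit inside a <script type="application/ld+json"> element.
serialized = json.dumps(markup, indent=2)
```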

The research page is 7,200 words of analysis: how fragmentation hurts homeowners, why the marketplaces exploit the confusion, which state policy models produce better consumer outcomes, and what a unified national standard would look like if the industry wanted one. The dataset powers the analysis; the analysis drives inbound citations.

What Angi and Thumbtack cannot copy

This is the part where the strategic move reveals itself. Publishing the dataset is legally simple and technically trivial. The major marketplaces all have data this good or better internally — their backend verification rails could not exist otherwise. They will not publish it because publishing it would force them to answer the obvious follow-up: "Of your listed pros, what percentage have active, non-lapsed licenses in the jurisdiction they operate in, right now, at the 118-jurisdiction granularity this dataset establishes?"

We can answer that question for our pool. They cannot answer it honestly for theirs. The dataset's existence is the neutral evidence that lets any journalist, any state regulator, or any homeowner cross-check our claims against their claims. It is the handle anyone else in the industry can grab to improve on our work or refute it. That is what a research-grade artifact looks like, and that is why the editorial desk invested 11 weeks in assembling it.

Commit attestation

Waves: 207
Author: editorial

Commit SHAs are from the AskBaily private repository. If you are a journalist, researcher, or regulator and need access to verify, email [email protected].

Frequently asked

Why 118 jurisdictions instead of just 61 top-level states and provinces?
Because residential contractor licensing is delegated downward in many states. Pennsylvania, Kansas, Wyoming, and Maine delegate to municipalities. California, New York, and Texas add county- or city-level registrations on top of state licensing. The 118-count reflects the regulators a homeowner actually interacts with.
Can competitors reuse the dataset?
Yes. CC-BY-4.0 permits commercial reuse with attribution. We would rather competitors cite a shared artifact than each maintain a private copy that drifts. The attribution requirement is the only constraint.
How often does the editorial desk refresh the dataset?
At least quarterly, per the /commitments page. Records older than six months surface a stale-data banner. Any record that fails the board URL check is flagged for manual review within one week.