How we publish a CC-BY-4.0 dataset of 115 contractor licensing boards

By AskBaily Editorial · Published · 4 min read · Wave 207

Summary

Wave 207 published a machine-readable dataset of every residential contractor licensing board in the United States and Canada: 115 jurisdictions, 1,908 lines of JSON, licensed under Creative Commons Attribution 4.0. As far as we can find, it is the first complete, open, citable map of contractor licensing. Anyone can audit it.

Article body

The dataset at askbaily.com/data/license-verifier-coverage.json is the single artifact from Wave 207. It is a 1,908-line JSON file, CC-BY-4.0, covering 115 jurisdictions: 50 US states, the District of Columbia, all 54 US territorial subdivisions with distinct contractor licensing authority, and 10 Canadian provinces with residential contractor regulators. Every entry names the board, links to its canonical search tool, classifies its licensing regime, records whether AskBaily currently verifies against it live, and cites the statute or regulation that establishes the board's authority.

The point is not that AskBaily covers 115 jurisdictions today. It does not. The point is that every other marketplace that claims verified contractors should have been able to publish this dataset any time in the last 15 years and has never done so. We published it because the absence of a public map is the reason homeowners have no way to verify anything.

What is in the file

Each entry is a structured record with the same shape. The top-level fields are the jurisdiction identifier (ISO 3166-2 for states and provinces, custom slugs for territorial authorities), the board name, the board's canonical public URL, the homeowner-facing search URL, the licensing regime classification, and the coverage status — one of live, cached, coverage-pending, or no-residential-license-required.
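To make the record shape concrete, here is a minimal Python sketch. The field names, the example URLs, and the Oregon coverage status are assumptions for illustration; the published file may name or order its keys differently. Only the four coverage statuses are taken from the description above.

```python
from collections import Counter
from dataclasses import dataclass

# The four coverage statuses named in the text.
COVERAGE_STATUSES = {
    "live",
    "cached",
    "coverage-pending",
    "no-residential-license-required",
}


@dataclass(frozen=True)
class BoardRecord:
    # Field names are hypothetical; the published JSON keys may differ.
    jurisdiction: str      # ISO 3166-2 code (e.g. "US-OR") or a custom slug
    board_name: str
    board_url: str         # board's canonical public URL
    search_url: str        # homeowner-facing search URL
    regime: str            # licensing regime classification
    coverage_status: str

    def __post_init__(self) -> None:
        # Reject any status outside the documented set.
        if self.coverage_status not in COVERAGE_STATUSES:
            raise ValueError(f"unknown coverage status: {self.coverage_status!r}")


def coverage_summary(records: list[BoardRecord]) -> dict[str, int]:
    """Count records per coverage status."""
    return dict(Counter(r.coverage_status for r in records))


# Illustrative values only; the URLs are placeholders, not the board's real links.
example = BoardRecord(
    jurisdiction="US-OR",
    board_name="Oregon Construction Contractors Board",
    board_url="https://example.org/board",
    search_url="https://example.org/search",
    regime="state-license-required",
    coverage_status="live",
)
print(coverage_summary([example]))
```

A tally like this is one way an outside reader could check how many boards are covered live today.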

The licensing regime classification is the analytical spine of the dataset. Not every US state has a residential contractor license in the sense of California's Contractors State License Board (CSLB). California, Oregon, Washington, New York, Nevada, Arizona, and a dozen others require a state-issued contractor license for any residential work above a minimal dollar threshold. Kansas, Wyoming, Maine, New Hampshire, Vermont, Pennsylvania, and a number of others delegate all licensing to the municipality or county. Some states (Massachusetts is the clearest example) license some trades at the state level, such as electrical and plumbing, and leave general contracting to the city or county.

We classified each jurisdiction into one of five buckets: state-license-required, state-registration-required, municipal-license-required, no-residential-license, and province-license-required. Every record has the classification, the statutory basis in a citations array, and a free-text notes field for edge cases. For Massachusetts, for example, the notes field explains that the Home Improvement Contractor registration covers consumer protection but does not authorize trades that require separate master licenses.
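The five buckets lend themselves to a simple consistency check. The sketch below assumes the record keys are named regime, citations, and notes, and that a valid citations array is non-empty; both are guesses about the file, not a description of AskBaily's own validation.

```python
from enum import Enum


class Regime(Enum):
    """The five licensing regime buckets named in the text."""
    STATE_LICENSE = "state-license-required"
    STATE_REGISTRATION = "state-registration-required"
    MUNICIPAL_LICENSE = "municipal-license-required"
    NO_RESIDENTIAL_LICENSE = "no-residential-license"
    PROVINCE_LICENSE = "province-license-required"


def check_classification(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    # Key names ("regime", "citations", "notes") are hypothetical.
    try:
        Regime(record.get("regime"))
    except ValueError:
        problems.append(f"unknown regime: {record.get('regime')!r}")
    citations = record.get("citations")
    if not isinstance(citations, list) or not citations:
        problems.append("citations must be a non-empty list")
    if not isinstance(record.get("notes", ""), str):
        problems.append("notes must be a string")
    return problems


# Illustrative record; the citation text is a placeholder, not a real statute.
sample = {
    "regime": "state-registration-required",
    "citations": ["<statute or regulation citation>"],
    "notes": "Registration covers consumer protection only.",
}
print(check_classification(sample))  # []
```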

Why CC-BY-4.0

The license is Creative Commons Attribution 4.0. That means anyone can reuse the dataset, including competitors, as long as they attribute AskBaily. We chose CC-BY over a more restrictive license for two reasons.

First, the data is not proprietary. The boards publish their own existence; all we did was compile the list. If somebody else wants to maintain a similar dataset, we would rather they do it than pretend the compilation itself is secret. The value we produce is upstream, in the live-verification rail, not in downstream dataset ownership.

Second, search engines and AI engines cite CC-BY sources more readily. Perplexity, You.com, and ChatGPT Search all prefer permissively licensed reference data for inline citations. If a homeowner asks one of these engines "how do I check if my contractor is licensed in Oregon," we want the model to cite the Oregon CCB directly and cite our dataset as the machine-readable map. That is a better outcome than the engine citing Angi's marketing page.

How the dataset was built

The 115 entries were compiled from four primary sources: each state's business-regulation department, each state's attorney general consumer-protection office (where available), the National Association of State Contractors' Licensing Agencies cross-index (where accurate), and direct contact with board staff in the seven jurisdictions where the published material was contradictory. All four sources were verified against each other; where they disagreed, we documented the disagreement in the notes field and chose the board's own published language as authoritative.
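The tie-breaking rule described here (prefer the board's own wording, record the disagreement in the notes) can be sketched in a few lines. The source labels and the reconcile helper are hypothetical; the actual research was done by hand, not by this code.

```python
def reconcile(claims: dict[str, str], board_source: str = "board") -> tuple[str, str]:
    """Pick the board's own wording and describe any dissenting sources.

    `claims` maps a source label to what that source says.
    Returns (chosen_value, note); the note is empty when all sources agree.
    """
    chosen = claims[board_source]
    # Collect every source whose wording differs from the board's.
    dissent = {src: val for src, val in claims.items() if val != chosen}
    if not dissent:
        return chosen, ""
    details = "; ".join(f"{src}: {val}" for src, val in sorted(dissent.items()))
    return chosen, "Sources disagree; board wording used. " + details


# Hypothetical labels and values for illustration.
value, note = reconcile({
    "board": "registration required",
    "attorney-general": "registration required",
    "cross-index": "license required",
})
print(value)
print(note)
```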

The research took two researchers approximately 11 weeks of asynchronous work. The raw notes and primary-source links are preserved in a separate internal log; the dataset is the cleaned, machine-readable distillation. Future freshness updates are scheduled quarterly. We publish the "last-verified" timestamp per record; stale records degrade to a warning after six months.
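The six-month staleness rule is easy to check against the per-record timestamp. This sketch assumes the timestamp is a plain ISO date string and approximates six months as 183 days; the exact cutoff the research page uses is not stated.

```python
from datetime import date, timedelta

# Assumption: "six months" is approximated as 183 days.
STALE_AFTER = timedelta(days=183)


def is_stale(last_verified: str, today: date) -> bool:
    """True when an ISO-format last-verified date is more than ~six months old."""
    verified = date.fromisoformat(last_verified)
    return today - verified > STALE_AFTER


print(is_stale("2025-01-01", date(2025, 8, 1)))  # True: 212 days old
print(is_stale("2025-06-01", date(2025, 8, 1)))  # False: 61 days old
```

Passing today in explicitly, rather than calling date.today() inside the function, keeps the check reproducible when auditing an older snapshot of the file.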

What the research page does with it

The dataset powers a human-readable page at askbaily.com/research/contractor-licensing-complexity-2026. That page is a long-form analysis of how licensing fragmentation hurts homeowners: specifically, how the absence of a unified national standard means that "licensed contractor" means materially different things in California than in Pennsylvania, and why the major marketplaces have exploited that confusion. The research page is marked up with the Schema.org Dataset type, which makes it eligible for Google Dataset Search and lets the dataset appear in research-engine results independently of AskBaily's marketing pages.
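For readers unfamiliar with Schema.org Dataset markup, the sketch below builds a minimal JSON-LD object of that type. The name and description strings, the https scheme on the download URL, and the choice of properties are illustrative assumptions; this is not a copy of the markup on the research page.

```python
import json


def dataset_jsonld() -> dict:
    """Build a minimal Schema.org Dataset description as a JSON-LD dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        # Name and description are placeholders written for this example.
        "name": "Contractor licensing board coverage",
        "description": "Residential contractor licensing boards across "
                       "115 US and Canadian jurisdictions.",
        "license": "https://creativecommons.org/licenses/by/4.0/",
        "creator": {"@type": "Organization", "name": "AskBaily"},
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": "application/json",
                # Path is from the article; the https scheme is assumed.
                "contentUrl": "https://askbaily.com/data/license-verifier-coverage.json",
            }
        ],
    }


# The serialized object would typically sit in a script tag of type application/ld+json.
print(json.dumps(dataset_jsonld(), indent=2))
```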

What Angi and Thumbtack cannot copy

This one is subtle. Publishing the dataset is legally simple and technically trivial. Angi has the data; they have to, or their backend verification logic could not work at all. They will not publish it because publishing it would force them to answer the follow-up question: "Of your listed pros, what percentage have active, non-lapsed licenses in the jurisdiction they operate in, right now?"

We can answer that question for our pool. They cannot answer it for theirs without material revenue damage. The dataset is the neutral artifact that lets any journalist, any homeowner, any state regulator cross-check our claims against our competitors' claims, without taking our word for anything. The dataset is the handle by which anyone else in the industry can grab our work and improve it, or refute it. That is what a research-grade artifact looks like, and it is why the blog exists.

Commit attestation

Tests green: 215
Files changed: 10
Lines added: 2,700
Waves: 207
Author: editorial

Commit SHAs are from the AskBaily private repository. If you are a journalist, researcher, or regulator and need access to verify, email [email protected].

Frequently asked

Is every board in the dataset actually verified live by AskBaily?
No. Each record has a coverage status: live, cached, coverage-pending, or no-residential-license-required. We are transparent about which boards are covered today and which are not.

How do you keep the dataset fresh?
Quarterly re-verification by the research desk. Each record carries a last-verified ISO date. Records older than six months render a stale-data warning on the research page.

Can competitors use the dataset?
Yes. CC-BY-4.0 permits commercial reuse with attribution. We would rather they cite a shared artifact than compete on data ownership we do not consider proprietary.