How Accurate Are The Cook Political Report's Race Ratings?

April 10, 2024

Ethan Jasny is completing his junior year at Harvard College, studying Government and Statistics. In the summer of 2023, he interned at the Cook Political Report. On campus, he is a member and former chair of the Harvard Public Opinion Project, which designs and analyzes the biannual Harvard Youth Poll of young Americans aged 18-29.

Since its founding in 1984, The Cook Political Report has forecast at least 10,374 elections, including House, Senate, gubernatorial and presidential races. In advance of our 40th anniversary, we set out to compile a full archive of our race ratings for the first time in our history.

Much has changed about American politics since our first ratings were released in 1984—back when Ronald Reagan won the state of Massachusetts and Barack Obama was a year out of college. But our simple, seven-tiered rating scheme (ranging from Solid D to Solid R) has remained the same. Throughout our history, these ratings have maintained a consistent level of accuracy, even as the Cook Political Report team has changed over time.

We gathered the final rating for every on-cycle race we’ve ever forecast, as well as some special Senate and governor elections. Typically, these ratings were published in late October or early November, though in The Cook Political Report’s nascent years, the final ratings were sometimes published earlier in the cycle. The chart below displays the proportion of races that went for Democrats or Republicans, grouped by rating.

Additionally, independents won seven races rated as Toss Ups. And a total of 15 other races were rated as either Lean Independent, Likely Independent, or Solid Independent (Bernie Sanders is responsible for 10 of them); all of these predictions proved correct.

What Do Lean, Likely, and Solid Mean, Anyway?

As the table below demonstrates, even a “Lean” rating historically implies a high degree of certainty. With Democratic and Republican ratings combined, “Lean” ratings are correct around 94% of the time, “Likely” ratings are correct around 97% of the time, and “Solid” ratings are virtually 100% accurate.

Notably, Republicans seem to slightly outperform expectations relative to Democrats. Across all offices, a Lean R rating was more likely to be correct than a Lean D rating, and a Likely R rating was more likely to be correct than a Likely D rating. One possible explanation for this is that the Republican waves of 1994 and 2010 were larger than the Democratic waves of 2006 and 2018. The bigger the Republican wave, the more missed Lean D and Likely D ratings you’d expect to see, and vice versa.

Meanwhile, Toss Up victories are closely divided, though Republicans again seem to have a slight edge — particularly in presidential and House races.
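For readers who want to reproduce this kind of tally, here is a minimal Python sketch of how accuracy by rating tier could be computed. It is not the code behind the figures above: the record layout, the function name `accuracy_by_tier`, and the handful of toy races are all hypothetical and stand in for the real archive.

```python
from collections import defaultdict

# Hypothetical toy records: (final rating, party that won). Not real archive data.
RACES = [
    ("Solid D", "D"),
    ("Likely D", "D"),
    ("Lean D", "D"),
    ("Lean D", "R"),   # a missed Lean D call
    ("Toss Up", "R"),
    ("Lean R", "R"),
    ("Likely R", "R"),
    ("Solid R", "R"),
]


def accuracy_by_tier(races):
    """Return {tier: share of correct calls} for Lean, Likely and Solid ratings.

    A call counts as correct when the favored party (the last letter of the
    rating) matches the winner. Toss Ups favor no one, so they are skipped.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rating, winner in races:
        if rating == "Toss Up":
            continue
        tier, favored = rating.rsplit(" ", 1)  # "Lean D" -> ("Lean", "D")
        total[tier] += 1
        if favored == winner:
            correct[tier] += 1
    return {tier: correct[tier] / total[tier] for tier in total}


if __name__ == "__main__":
    for tier, share in accuracy_by_tier(RACES).items():
        print(f"{tier}: {share:.0%} correct")
```

Grouping by tier rather than by full rating mirrors the combined figures quoted above; grouping on the full rating string instead would give separate Lean D and Lean R accuracy.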

Of course, whether a rating is “correct” in the binary sense is not the full picture. In theory, we could rate every even remotely competitive race a Toss Up; we would then almost never make a wrong call, and Toss Up victories would still be roughly divided 50/50 in the long run.

One way of examining how well-calibrated our ratings are is by looking at the typical margin by which races with a certain rating are decided. Across all ratings, the average “Lean” race is decided by a margin of around 10.6 percentage points (in terms of the two-party vote), the average “Likely” race is decided by around 18.2 percentage points, and the average “Solid” race is decided by around 36.8 percentage points. Again, Republicans seem to have a slight advantage, winning the average Toss Up by 0.7 percentage points.

Note that races in our archive that were uncontested or that featured prominent third candidates are excluded for the purpose of calculating average two-party margins.
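As an illustration of the two-party margin, the sketch below assumes the conventional definition: the Democratic vote minus the Republican vote, divided by the sum of the two, expressed in percentage points. The function names and vote counts are hypothetical, and only the uncontested-race exclusion is modeled; screening out races with prominent third candidates would need a separate, unspecified rule.

```python
def two_party_margin(dem_votes, rep_votes):
    """Democratic minus Republican share of the two-party vote, in points.

    Positive values mean a Democratic win, negative values a Republican win.
    Votes for any other candidate are ignored by construction.
    """
    two_party_total = dem_votes + rep_votes
    if two_party_total == 0:
        raise ValueError("no two-party votes cast")
    return 100.0 * (dem_votes - rep_votes) / two_party_total


def average_winning_margin(races):
    """Mean absolute two-party margin over (dem_votes, rep_votes) pairs.

    Races where either major party received no votes are treated as
    uncontested and skipped. Returns None if no contested race remains.
    """
    margins = [abs(two_party_margin(d, r)) for d, r in races if d > 0 and r > 0]
    return sum(margins) / len(margins) if margins else None


if __name__ == "__main__":
    # Hypothetical vote counts, for illustration only.
    toy_races = [(55, 45), (40, 60), (100, 0)]
    print(average_winning_margin(toy_races))
```

A party's average Toss Up edge, like the one cited above, would come from averaging the signed margins rather than their absolute values.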

As the chart below shows, there is some variation depending on the type of race; presidential races, for example, are decided by tighter margins in general. And Democrats often run up huge numbers in uncompetitive House seats. But overall, the typical two-party margins for a particular rating are mostly consistent across the type of race and the favored party.

The average margins of victory associated with our ratings have also remained remarkably consistent over time. For example, Republicans won the average Lean R race by around the same margin in 2020 as they did in The Cook Political Report’s inaugural year, which is also nearly the same as the average margin across all Lean R races in our archive.

The chart below demonstrates how our ratings swing with the political currents of the given election cycle. But it does not point to any clear long-term trends that would suggest that the meanings of our ratings have changed over the past 40 years.

Another way to assess the accuracy of our predictions is to calculate the proportion of Toss Ups that actually come down to the wire. If our ratings were too cautious, we might expect to see a large fraction of Toss Ups decided by relatively wide margins.

This is one area where our predictions seem to be becoming better calibrated: The proportion of Toss Ups decided by under 10 points of two-party vote share has gradually increased over the past 40 years. In 2022, nearly 98% of Toss Ups were decided by under 10 points.
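A rough sketch of that calculation follows, assuming "decided by under 10 points" means the absolute two-party margin is below 10 percentage points. The function name, the threshold parameter and the sample margins are hypothetical and are not drawn from the archive.

```python
def share_of_close_races(margins, threshold=10.0):
    """Fraction of races whose absolute margin (in points) is below threshold.

    `margins` may be signed (D minus R) or unsigned; only the size matters.
    """
    if not margins:
        raise ValueError("need at least one race")
    close = sum(1 for m in margins if abs(m) < threshold)
    return close / len(margins)


if __name__ == "__main__":
    # Hypothetical Toss Up margins for a single cycle, in points.
    toy_margins = [1.5, -3.0, 12.0, -0.4]
    print(f"{share_of_close_races(toy_margins):.0%} decided by under 10 points")
```

Running this once per election cycle and comparing the shares over time would give the kind of trend described above.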

Whether this is the result of us becoming better at making predictions or the result of elections becoming more predictable is another question altogether. But there is some evidence for the former conclusion. The Cook Political Report used to have a general policy that required unindicted incumbents to be rated no worse than Toss Up. This rule evolved over time, and exceptions were made for particularly weak incumbents: there are examples throughout the archives of seats with Democratic incumbents being rated as Lean R, and vice versa.

The change in how we handled incumbents reflects the shifting nature of congressional politics in the 1990s. “Incumbents tended to survive seemingly impossible situations more” in the early days of CPR, Charlie Cook recalls. But after the Republican Revolution of 1994, incumbents had greater difficulty withstanding countervailing national forces, as House races became less individualized and more polarized along party lines.

Still, some of the Toss Up elections that were decided by the widest margins came in races in the early 1990s in which vulnerable incumbents were blown out. But for the old incumbency guidelines, these races might have been rated as Lean D/R. This change in approach could thus be one of the reasons why our Toss Up ratings have become more precise.

House Outliers

Of the over seven thousand “Solid” races in our archive, only two were called incorrectly. Both were bizarre House elections: Louisiana’s 2nd District in 2008 and Utah’s 3rd District in 1990.

In the case of LA-02, nine-term incumbent Rep. William Jefferson was defeated in his D+27 New Orleans seat by Republican Joseph Cao in a December general election. Jefferson had been embroiled in a corruption scandal for several years, and in June 2007, a federal grand jury indicted him on 16 charges of corruption. Ultimately, exhausted by the recent presidential race and Jefferson’s scandals, Black voters did not turn out for the congressman. According to our own estimates from the time, turnout fell 72% in precincts where Obama had won over 80% of the vote in November, but only 59% in precincts where he had won less than 80%.

The scandal that sank heavily favored Republican nominee Karl Snow in Utah’s 3rd District in 1990 was not a high-profile bribery indictment but rather a newspaper ad. Two days before the 1990 election, the Republican Party bought a large ad in the Utah County Journal, which was delivered to nearly every home in the district. The ad depicted Snow standing alongside his large family with the caption “Karl Snow and his family.” It then showed a headshot of his Democratic opponent, the bachelor Bill Orton, with the caption “Bill Orton and his family.” The ad concluded, “Some candidates want you to believe that their personal values don’t matter… Families do matter! Vote Republican.”

The party received significant backlash for going negative, and Snow was forced to apologize. In the end, Orton won by over 20 points, carrying many largely rural counties.

Orton has become forever linked to The Cook Political Report in more ways than one.

Charlie Cook remembers Orton taking him to speak to a group of Utah Democrats at either the 1996 or 2000 Democratic National Convention. “I’m gonna tell you something Charlie doesn’t even know,” Cook recalls Orton saying at the event. “My wife and I wouldn’t be married but for Charlie.”

Orton explained that in September 1993, he asked out Jacquelyn Massey, an associate director of the National Association of Federal Credit Unions and a former House Banking Committee staffer. Massey was unsure about the prospect of dating a sitting member of Congress, but after looking up Orton’s profile in the Cook Political Report, she decided to give it a go. They were married the following July.

Republican attacks on Orton’s bachelorhood may have doomed the Cook Political Report to its one and only missed Solid R rating. But ironically, thanks to the Cook Political Report, Orton was no longer a bachelor.

While these are the only two instances of solid seat upsets, there have been other close calls. In 2014, for example, Republican Scott Milne came within 2,500 votes of incumbent Vermont Gov. Peter Shumlin, a race we had kept in the Solid D column. In 2022, Democrat Adam Frisch came just over 500 votes away from unseating Rep. Lauren Boebert and joining Cao and Orton in the club of solid seat slayers.

Statewide Outliers

Our Senate and governors editors (Jessica Taylor, and before her, Jennifer Duffy) are on something of a hot streak, having not miscalled a single race since 2002.

In retrospect, perhaps the most notable miscalled race in Cook Political Report history occurred during our inaugural year; Jefferson County Judge Mitch McConnell upset incumbent Sen. Walter Huddleston in the Likely D-rated 1984 Kentucky Senate race.

As Charlie Cook wrote after the election, “Huddleston never trailed in the polls and up until the closing week or so before the election, held a relatively large, though somewhat soft, lead.” But a few tactical errors from the Huddleston campaign combined with Reagan’s huge numbers at the top of the ticket were enough to earn McConnell a roughly 5,000-vote victory. “The one thing that kept the McConnell challenge distinct from the other GOP challengers throughout the campaign was money,” Cook noted. “McConnell continually turned in the highest receipts in his FEC reports of any of the Republican challengers.”

That race was also memorable for a now-famous TV spot Roger Ailes cut for the McConnell campaign. The ad showed a pack of bloodhounds unsuccessfully searching for Huddleston, who was off giving speeches in Los Angeles and Puerto Rico instead of casting votes on the Senate floor. “Maybe we ought to let him make speeches and switch to Mitch for senator,” it concluded.

Electoral College Outliers

Finally, our Electoral College miscalls demonstrate the influence of a single bad cycle. Half of all our missed presidential calls occurred in 2016 — Michigan, Pennsylvania, and Wisconsin were all rated Lean D but won by Trump.

