
March Madness Regression – Building on Success

[Image: Basketball Pie Chart]

Last year, I published the first iteration of my March Madness regression model. I had predicted a good-but-not-great performance. Somehow, I ended up with a bracket in the 98.3rd percentile.

This was a brilliant result. Of the six “upsets” I had predicted, three came to pass, which I’m going to call a solid performance. But the crowning achievement of the bracket was picking Baylor over Gonzaga in the final.

I will say that the regression was dangerously close to picking Illinois over Michigan in the final, which would’ve made the bracket way less impressive, somewhere around the 75th percentile. This is still good, sure, but it’s not going to win you any pools.

Updating the Model

Considering the model’s elite performance last year, I decided not to overhaul it. I figured the best thing I could do was incorporate last year’s tournament results into the training data and run it back.

I was aware that linear regression will exploit every correlation in the training data, no matter how spurious, so I expected the extra observations (and the degrees of freedom they add) to bring the R-squared down a touch and to make past predictions look a little weaker. This comes with the territory of decreasing overfitting.
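To put a rough number on how much fit a regression can manufacture out of nothing, here is a back-of-the-envelope sketch in Python. It uses the textbook result that an ordinary least squares fit with an intercept and p pure-noise predictors on n rows (Gaussian errors) has an expected in-sample R-squared of p / (n - 1). The row counts are assumptions, not figures from the model: 64 teams per tournament, six tournaments (2014 to 2019) behind last year’s fit and seven behind this year’s. The function name is made up for the example.

```python
def expected_noise_r2(n_rows: int, n_predictors: int) -> float:
    """Expected in-sample R-squared when every predictor is pure noise.

    For OLS with an intercept and Gaussian errors, R-squared under the null
    follows a Beta(p/2, (n-p-1)/2) distribution, whose mean is p / (n - 1).
    """
    return n_predictors / (n_rows - 1)


# 24 predictors: the rows of the coefficient table, excluding the intercept.
N_PREDICTORS = 24

# ASSUMPTION: 64 teams per tournament; 6 tournaments last year, 7 this year.
for label, n_rows in [("2021 fit", 6 * 64), ("2022 fit", 7 * 64)]:
    r2 = expected_noise_r2(n_rows, N_PREDICTORS)
    print(f"{label}: expected R-squared from noise alone = {r2:.3f}")
```

Under these assumptions the share of R-squared attributable to chance drops slightly with the extra season, which is the direction described above. The real change in R-squared also depends on how well the 2021 results fit the old pattern.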

In this table, we can see how the coefficients in the model changed from last year to this year:

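| Variable | 2021 Coef. | 2022 Coef. | 2022 Prob. |
|---|---|---|---|
| Intercept | -7.271 | -0.223 | 0.9777 |
| SeedLog | -0.646 | -0.628 | 3.22E-11 |
| SOS | 0.063 | 0.028 | 0.2645 |
| RPI | -7.130 | -2.523 | 0.5179 |
| W | 0.075 | 0.004 | 0.8727 |
| L | 0.074 | 0.046 | 0.2163 |
| Pts | -0.194 | -0.078 | 0.4541 |
| OpPts | 0.109 | 0.074 | 0.3725 |
| FGP | 36.190 | 24.440 | 0.0554 |
| OpFGP | -11.913 | -15.140 | 0.1086 |
| X3P | 0.316 | 0.212 | 0.0973 |
| X3PP | -4.370 | -3.173 | 0.312 |
| Op3P | -0.215 | -0.190 | 0.0955 |
| Op3PP | 2.355 | 1.545 | 0.6579 |
| FT | 0.069 | 0.025 | 0.6488 |
| FTP | 2.845 | 1.163 | 0.583 |
| OReb | 0.258 | 0.176 | 0.0943 |
| DReb | 0.139 | 0.012 | 0.9016 |
| OpReb | -0.019 | -0.029 | 0.6847 |
| Ast | -0.072 | -0.074 | 0.061 |
| TO | -0.262 | -0.181 | 0.115 |
| OpTO | 0.307 | 0.246 | 0.0517 |
| Blk | 0.016 | 0.001 | 0.9904 |
| Stl | -0.047 | -0.062 | 0.479 |
| PF | -0.083 | -0.114 | 0.0258 |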

The majority of the coefficients shrink toward zero, which means that a change in that variable now has a less pronounced effect on the predicted number of tournament wins. The few exceptions, which grew in magnitude, are Opponent Field Goal Percentage, Opponent Rebounds, Assists, Steals, and Personal Fouls.
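That list of exceptions can be checked directly from the table by comparing absolute values. A small Python sketch, with the coefficient pairs copied from the table (the dictionary and function names are made up for illustration, and the intercept is left out):

```python
# (2021 coefficient, 2022 coefficient), copied from the table above.
COEFS = {
    "SeedLog": (-0.646, -0.628), "SOS": (0.063, 0.028),
    "RPI": (-7.130, -2.523), "W": (0.075, 0.004),
    "L": (0.074, 0.046), "Pts": (-0.194, -0.078),
    "OpPts": (0.109, 0.074), "FGP": (36.190, 24.440),
    "OpFGP": (-11.913, -15.140), "X3P": (0.316, 0.212),
    "X3PP": (-4.370, -3.173), "Op3P": (-0.215, -0.190),
    "Op3PP": (2.355, 1.545), "FT": (0.069, 0.025),
    "FTP": (2.845, 1.163), "OReb": (0.258, 0.176),
    "DReb": (0.139, 0.012), "OpReb": (-0.019, -0.029),
    "Ast": (-0.072, -0.074), "TO": (-0.262, -0.181),
    "OpTO": (0.307, 0.246), "Blk": (0.016, 0.001),
    "Stl": (-0.047, -0.062), "PF": (-0.083, -0.114),
}


def grew_in_magnitude(coefs):
    """Names of variables whose coefficient moved away from zero."""
    return [name for name, (old, new) in coefs.items() if abs(new) > abs(old)]


print(grew_in_magnitude(COEFS))
```

Comparing absolute values rather than raw values matters here: a coefficient that goes from -11.913 to -15.140 is "decreasing" numerically but has a larger effect on the prediction.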

Past Tournament Observations

Last year’s model only predicted 37 total wins for the field, out of the 63 that a 64-team bracket has to produce. This may have been due in part to a weak field, but the more likely cause is that the model gave teams a boost for the number of games they played, and last year many teams had shortened schedules due to COVID protocols.

The model adjusted this year, giving much less weight to the number of games played. Along with other rebalancing, this means the model now sees 2021 Baylor as the strongest team in the sample, predicting 3.66 wins where last year’s model predicted 2.98.

Next, let’s take a look at some outliers. This plot shows how teams fared in the tournament compared to how they were expected to fare:

[Figure: Predicted Wins vs. Tournament Wins, 2022 model]

Overachievers

These are the teams that exceeded their predicted wins the most.

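| Year | Team | Pred. Wins | Tourn. Wins |
|---|---|---|---|
| 2014 | UConn | 0.99 | 6 |
| 2014 | Kentucky | 1.08 | 5 |
| 2016 | Villanova | 2.14 | 6 |
| 2021 | UCLA | 0.37 | 4 |
| 2018 | Loyola Chicago | 0.69 | 4 |
| 2016 | Syracuse | 0.74 | 4 |
| 2017 | North Carolina | 2.95 | 6 |
| 2015 | Michigan St. | 0.97 | 4 |
| 2021 | Oregon St. | 0.00 | 3 |
| 2017 | South Carolina | 1.07 | 4 |
| 2019 | Virginia | 3.13 | 6 |
| 2017 | Xavier | 0.16 | 3 |
| 2018 | Michigan | 2.25 | 5 |
| 2015 | Duke | 3.36 | 6 |
| 2019 | Texas Tech | 2.39 | 5 |
| 2018 | Villanova | 3.41 | 6 |
| 2014 | Dayton | 0.42 | 3 |
| 2021 | Baylor | 3.67 | 6 |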

As you can see, every tournament champion makes this list. I’m going to make the claim that the average champion isn’t that much better than the average Elite 8 team, and yet they end up twice as far in the tournament. This means that the winner is almost always going to be a significant outlier.

We also see that the 2014 final was even nuttier than previously believed. The model doesn’t see UConn and Kentucky as anything other than very normal 7 and 8 seeds, but the runs they each went on were historic.
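The ranking in the overachievers table is just a sort on tournament wins minus predicted wins. A minimal Python sketch, using a handful of rows copied from that table in scrambled order (the variable and function names are made up for illustration):

```python
# (year, team, predicted wins, tournament wins), copied from the table above.
RESULTS = [
    (2021, "Baylor", 3.67, 6),
    (2014, "Kentucky", 1.08, 5),
    (2021, "UCLA", 0.37, 4),
    (2014, "UConn", 0.99, 6),
    (2016, "Villanova", 2.14, 6),
]


def surplus(row):
    """Tournament wins minus predicted wins; positive means overachieving."""
    _, _, predicted, actual = row
    return actual - predicted


# Largest surplus first, matching the order of the overachievers table.
ranked = sorted(RESULTS, key=surplus, reverse=True)
for row in ranked:
    year, team, _, _ = row
    print(f"{year} {team}: {surplus(row):+.2f}")
```

Sorting the same quantity in ascending order gives the most disappointing teams instead.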

Underachievers

These teams had the most disappointing performances in the Big Dance.

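| Year | Team | Pred. Wins | Tourn. Wins |
|---|---|---|---|
| 2018 | Virginia | 3.05 | 0 |
| 2017 | Villanova | 3.28 | 1 |
| 2015 | Iowa St. | 2.01 | 0 |
| 2015 | Villanova | 2.99 | 1 |
| 2016 | West Virginia | 1.95 | 0 |
| 2014 | Duke | 1.93 | 0 |
| 2021 | Ohio St. | 1.84 | 0 |
| 2016 | Michigan St. | 1.81 | 0 |
| 2018 | Wichita St. | 1.78 | 0 |
| 2021 | Illinois | 2.68 | 1 |
| 2021 | Tennessee | 1.66 | 0 |
| 2018 | Cincinnati | 2.64 | 1 |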

It’s no surprise that Virginia’s first-round loss to UMBC ranks as the most disastrous. 2018 and 2021 each had three heavyweights go down way earlier than expected.

One thing to note is that there aren’t many repeats on either list. It seems like a windfall or catastrophe of this magnitude is a very rare thing for a team. Each list only has one repeat member, but unbelievably, this member is the same for both…

Villanova

They were a 1 or 2 seed for four straight years. Twice they won the championship, twice they didn’t escape the first weekend. A stretch that volatile has to be unprecedented, and there’s nothing close to it in our sample.

Predicting This Year

Applying our model to the field this year, we can see the number of wins the model predicts. To pick our bracket, we simply advance whichever team in each matchup has more predicted wins. This table is what the model gives us:

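| Seed | Team | Pred. Wins |
|---|---|---|
| 1 | Baylor | 3.26 |
| 1 | Gonzaga | 3.20 |
| 1 | Kansas | 2.99 |
| 1 | Arizona | 2.90 |
| 2 | Kentucky | 2.64 |
| 2 | Duke | 2.62 |
| 3 | Texas Tech | 2.48 |
| 2 | Auburn | 2.40 |
| 2 | Villanova | 2.28 |
| 5 | Iowa | 2.05 |
| 5 | Houston | 2.02 |
| 3 | Tennessee | 1.79 |
| 4 | UCLA | 1.67 |
| 4 | Illinois | 1.66 |
| 3 | Purdue | 1.64 |
| 5 | UConn | 1.49 |
| 6 | Texas | 1.47 |
| 4 | Arkansas | 1.47 |
| 6 | LSU | 1.42 |
| 7 | Murray St. | 1.29 |
| 6 | Alabama | 1.18 |
| 12 | UAB | 1.18 |
| 7 | Ohio St. | 1.15 |
| 8 | Seton Hall | 1.15 |
| 5 | Saint Mary’s (CA) | 1.08 |
| 11 | Virginia Tech | 0.98 |
| 3 | Wisconsin | 0.97 |
| 8 | Boise St. | 0.84 |
| 10 | Miami (FL) | 0.79 |
| 6 | Colorado St. | 0.77 |
| 9 | Creighton | 0.75 |
| 9 | Marquette | 0.75 |
| 4 | Providence | 0.69 |
| 10 | San Francisco | 0.69 |
| 7 | Southern California | 0.69 |
| 9 | TCU | 0.62 |
| 11 | Iowa St. | 0.58 |
| 13 | Chattanooga | 0.53 |
| 13 | Akron | 0.52 |
| 15 | Delaware | 0.51 |
| 8 | San Diego St. | 0.51 |
| 12 | Indiana | 0.51 |
| 10 | Davidson | 0.50 |
| 10 | Loyola Chicago | 0.49 |
| 13 | Vermont | 0.43 |
| 9 | Memphis | 0.42 |
| 16 | Wright St. | 0.40 |
| 11 | Michigan | 0.39 |
| 11 | Notre Dame | 0.38 |
| 15 | Jacksonville St. | 0.35 |
| 16 | Georgia St. | 0.35 |
| 14 | Colgate | 0.34 |
| 8 | North Carolina | 0.34 |
| 12 | Richmond | 0.32 |
| 7 | Michigan St. | 0.31 |
| 12 | New Mexico St. | 0.20 |
| 13 | South Dakota St. | 0.14 |
| 14 | Montana St. | 0.09 |
| 14 | Longwood | 0.02 |
| 16 | Norfolk St. | 0.00 |
| 16 | Texas Southern | -0.12 |
| 15 | Cal St. Fullerton | -0.19 |
| 15 | Saint Peter’s | -0.22 |
| 14 | Yale | -0.57 |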

This result is a little less chalky than last year. I mean, it’s still really chalky, but not as bad. Seed is our most important variable, and because it enters the model as a log, a huge boost goes to the top seeds, especially the 1 seeds.
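To see why the log favors the top of the bracket, it helps to look at the seeding term on its own. The sketch below multiplies the 2022 SeedLog coefficient from the coefficient table (-0.628) by the log of the seed. The base of the log is not stated in this post, so the natural log here is an assumption, and the function name is made up for the example.

```python
import math

SEED_LOG_COEF = -0.628  # 2022 SeedLog coefficient from the coefficient table


def seed_penalty(seed: int) -> float:
    """Predicted wins lost to seeding alone, relative to a 1 seed.

    ASSUMPTION: SeedLog is the natural log of the seed.
    """
    return -SEED_LOG_COEF * math.log(seed)


for seed in (1, 2, 4, 8, 16):
    print(f"{seed:>2} seed: {seed_penalty(seed):.2f} wins behind a 1 seed")
```

Whatever the base, each doubling of the seed costs the same amount, so the step from a 1 seed to a 2 seed is as large as the step from an 8 seed to a 16 seed. Only the size of that step depends on the base.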

We can again summarize the bracket with the upsets and the final.

9 Creighton over 8 San Diego St.

9 Marquette over 8 North Carolina

10 Davidson over 7 Michigan St.

10 Miami (FL) over 7 Southern California

5 Houston over 4 Illinois

5 Iowa over 4 Providence

5 UConn over 4 Arkansas

6 LSU over 3 Wisconsin

1 Baylor over 1 Kansas in the championship game

This year, eight upsets are predicted instead of last year’s six, which I’ll call an improvement. Compared to last year, the model still really likes Baylor and Creighton and it really does not like North Carolina and Arkansas. It flipped on Wisconsin though.
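The upset list falls straight out of the pick rule: advance the team with more predicted wins, and call it an upset when that team has the worse seed. A minimal Python sketch, using three of the matchups listed above and values copied from the predictions table (the names `PREDICTED` and `pick` are made up for illustration, and the tie-break is a placeholder since the post does not say how ties are handled):

```python
# team -> (seed, predicted wins), copied from the predictions table.
PREDICTED = {
    "Creighton": (9, 0.75),
    "San Diego St.": (8, 0.51),
    "UConn": (5, 1.49),
    "Arkansas": (4, 1.47),
    "Baylor": (1, 3.26),
    "Kansas": (1, 2.99),
}


def pick(team_a: str, team_b: str):
    """Return (winner, is_upset) for one matchup.

    The team with more predicted wins advances. On a tie, team_a advances
    (a hypothetical tie-break, not taken from the post).
    """
    winner, loser = sorted(
        (team_a, team_b), key=lambda team: PREDICTED[team][1], reverse=True
    )
    # A larger seed number is a worse seed, so this flags the underdog winning.
    is_upset = PREDICTED[winner][0] > PREDICTED[loser][0]
    return winner, is_upset


for a, b in [("San Diego St.", "Creighton"), ("Arkansas", "UConn"), ("Baylor", "Kansas")]:
    winner, is_upset = pick(a, b)
    print(f"{a} vs. {b}: {winner}{' (upset)' if is_upset else ''}")
```

Running the same function round by round, feeding each winner into the next matchup, fills out the rest of the bracket.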

Michigan St. and North Carolina are the most popular 7 and 8 seeds, respectively, to make a deep run this year, but my model sees them as the weakest teams on their seed lines. Maybe they’re slotted where they are due to name recognition and my model sees through it, or maybe there are intangibles at play that my model is blind to.

Its selection of Baylor to repeat as champions is also interesting, as they are the least popular 1 seed. These value plays could help separate the bracket from the pack a bit on what is otherwise a very boring bracket.

It didn’t have to be this boring, however. Had a couple of teams landed in different parts of the bracket, the model would see 11 seed Virginia Tech make a run to the Sweet 16 over Colorado St. and Wisconsin, and 12 seed UAB do the same over Saint Mary’s (CA) and Providence. These are significant claims for the model to make since it’s so reliant on seeding, and I wish we would have seen them. Instead, each is matched up with a strong first-round opponent that the model prefers, so no big upset picks. Fooey.

Last year I was very tempered in my expectations for the bracket, but I was pleasantly surprised. This year, my confidence has grown, so it’s only fair that I be brought back down to Earth. But whatever happens, I will be sure to blame it on the math.