Previewing the 2017 Men's NCAA Basketball Tournament

March Madness officially tips off tomorrow with the First Four games in Dayton before the round of 64 begins on Thursday. In this post, we’ll look at each team’s chance of advancing and winning the national title. We’ll also look at who was help and hurt most by how the committee seeded the tournament.

As always, the code and data for this post are available on my Github page.

The Ratings

The team ratings come from my sports analytics website, Hawklytics. Using the composite ratings, we can calculate the probability of any team beating another using the Log-5 formula. Using those probabilities, we can calculate the probability of any team advancing to each round.

library(dplyr)
library(ggplot2)

bracket2017 <- readRDS("datafiles/2017bracket.rds")

Round by Round Probabilities

bracket2017 %>%
  select(Seed, School, Region, Round_3:Champion) %>%
  knitr::kable(digits = 3, col.names = gsub("_", " ", colnames(.)),
    align = "c", booktabs = TRUE)
Seed School Region Round 3 Sweet 16 Elite 8 Final 4 Final Champion
1 Gonzaga West 0.976 0.825 0.539 0.399 0.245 0.158
1 North Carolina South 0.976 0.811 0.606 0.371 0.226 0.119
1 Villanova East 0.977 0.712 0.418 0.277 0.163 0.101
2 Louisville Midwest 0.962 0.636 0.423 0.257 0.143 0.072
4 West Virginia West 0.909 0.651 0.311 0.208 0.110 0.061
2 Kentucky South 0.952 0.571 0.375 0.210 0.119 0.058
1 Kansas Midwest 0.958 0.703 0.415 0.224 0.114 0.052
5 Virginia East 0.875 0.521 0.267 0.165 0.090 0.051
2 Duke East 0.942 0.665 0.375 0.165 0.078 0.040
4 Florida East 0.866 0.433 0.201 0.114 0.057 0.030
10 Wichita State South 0.762 0.368 0.228 0.119 0.063 0.028
4 Purdue Midwest 0.825 0.493 0.261 0.130 0.061 0.025
3 Baylor East 0.888 0.510 0.274 0.111 0.049 0.023
3 Oregon Midwest 0.910 0.584 0.261 0.130 0.058 0.023
3 Florida State West 0.896 0.627 0.344 0.125 0.050 0.022
6 Southern Methodist East 0.787 0.417 0.219 0.086 0.037 0.017
7 Saint Mary’s (CA) West 0.736 0.431 0.250 0.091 0.037 0.016
5 Iowa State Midwest 0.770 0.401 0.197 0.091 0.039 0.015
4 Butler South 0.884 0.583 0.220 0.091 0.037 0.013
3 UCLA South 0.907 0.485 0.190 0.081 0.034 0.012
2 Arizona West 0.925 0.471 0.248 0.079 0.028 0.011
6 Cincinnati South 0.649 0.360 0.147 0.065 0.029 0.010
8 Wisconsin East 0.732 0.244 0.098 0.047 0.020 0.009
7 Michigan Midwest 0.536 0.202 0.106 0.049 0.020 0.007
5 Notre Dame West 0.763 0.287 0.094 0.047 0.017 0.007
6 Creighton Midwest 0.634 0.282 0.100 0.040 0.014 0.004
10 Oklahoma State Midwest 0.464 0.160 0.078 0.034 0.013 0.004
8 Miami (FL) Midwest 0.573 0.181 0.069 0.023 0.007 0.002
5 Minnesota South 0.602 0.258 0.071 0.022 0.006 0.002
7 South Carolina East 0.503 0.165 0.059 0.015 0.004 0.001
10 Marquette East 0.497 0.162 0.057 0.014 0.004 0.001
11 Xavier West 0.544 0.200 0.073 0.016 0.004 0.001
8 Arkansas South 0.532 0.104 0.042 0.011 0.003 0.001
11 Kansas State South 0.190 0.080 0.023 0.007 0.002 0.001
9 Vanderbilt West 0.507 0.088 0.025 0.009 0.002 0.001
9 Michigan State Midwest 0.427 0.111 0.035 0.010 0.002 0.000
8 Northwestern West 0.493 0.084 0.023 0.008 0.002 0.000
6 Maryland West 0.456 0.151 0.050 0.009 0.002 0.000
11 Rhode Island Midwest 0.366 0.120 0.029 0.008 0.002 0.000
11a Wake Forest South 0.162 0.065 0.017 0.005 0.002 0.000
9 Seton Hall South 0.468 0.083 0.031 0.007 0.002 0.000
7 Dayton South 0.238 0.057 0.019 0.005 0.001 0.000
10 Virginia Commonwealth West 0.264 0.092 0.032 0.006 0.001 0.000
12 Middle Tennessee South 0.398 0.135 0.028 0.006 0.001 0.000
9 Virginia Tech East 0.268 0.043 0.009 0.002 0.000 0.000
12 Nevada Midwest 0.230 0.061 0.014 0.003 0.001 0.000
11 Providence East 0.119 0.032 0.008 0.001 0.000 0.000
12 Princeton West 0.237 0.042 0.006 0.001 0.000 0.000
13 Vermont Midwest 0.175 0.045 0.009 0.002 0.000 0.000
11a Southern California East 0.094 0.023 0.005 0.001 0.000 0.000
12 North Carolina-Wilmington East 0.125 0.025 0.004 0.001 0.000 0.000
13 East Tennessee State East 0.134 0.021 0.003 0.000 0.000 0.000
13 Bucknell West 0.091 0.020 0.002 0.000 0.000 0.000
14 New Mexico State East 0.112 0.018 0.003 0.000 0.000 0.000
14 Florida Gulf Coast West 0.104 0.022 0.003 0.000 0.000 0.000
13 Winthrop South 0.116 0.024 0.002 0.000 0.000 0.000
14 Iona Midwest 0.090 0.014 0.001 0.000 0.000 0.000
14 Kent State South 0.093 0.010 0.001 0.000 0.000 0.000
15 Troy East 0.058 0.007 0.001 0.000 0.000 0.000
15 Northern Kentucky South 0.048 0.004 0.000 0.000 0.000 0.000
15 North Dakota West 0.075 0.006 0.001 0.000 0.000 0.000
16 North Carolina Central Midwest 0.031 0.004 0.000 0.000 0.000 0.000
15 Jacksonville State Midwest 0.038 0.003 0.000 0.000 0.000 0.000
16 South Dakota State West 0.024 0.003 0.000 0.000 0.000 0.000
16 Texas Southern South 0.024 0.003 0.000 0.000 0.000 0.000
16a New Orleans East 0.013 0.001 0.000 0.000 0.000 0.000
16a UC-Davis Midwest 0.011 0.001 0.000 0.000 0.000 0.000
16 Mount St. Mary’s East 0.010 0.001 0.000 0.000 0.000 0.000

Gonzaga comes in as the favorite using my ratings with a 15.8% chance winning the title, followed by North Carolina and overall number 1 seed Villanova. Kansas, the other number 1 seed is the 7th most likely team to cut down the nets with a 5.2% chance.

Region Difficulty

To see who got help and hurt by their seeding, we can first look at the talent level in each region. To do this, I’ll take the average offensive and defensive rating of all the teams in a region, and then calculate a net rating. To keep the extra teams playing in the First Four from bringing down the region average, I’ll keep only favorite from each of the play in games.

bracket2017 %>%
  filter(!grepl("a", Seed)) %>%
  group_by(Region) %>%
  summarize(Offense = mean(Offense), Defense = mean(Defense)) %>%
  mutate(Net = Offense - Defense) %>%
  arrange(desc(Net)) %>%
  knitr::kable(digits = 2, align = "c", booktabs = TRUE)
Region Offense Defense Net
East 113.21 93.71 19.50
Midwest 113.43 94.62 18.81
West 112.41 94.76 17.66
South 112.24 94.95 17.29

Villanova and Kansas, although the top two seeds in the tournament, weren’t given any favors, as they ended up in the two toughest regions in field. North Carolina, on the other hand, has the easiest region using these ratings.

We can also look at the direct impact of the seeding process. Using the consensus bracket from the Bracket Matrix, we can get a good gauge of where teams should have been seeded. I have calculated the probability of each team making the Final Four using the consensus bracket so we can compare these numbers to the probabilities from the real bracket.

seeding <- bracket2017 %>%
  select(Seed, School, Region, Con_Final4, Final_4) %>%
  mutate(Change = Final_4 - Con_Final4) %>%
  select(-(Con_Final4:Final_4)) %>%
  arrange(desc(Change))

knitr::kable(head(seeding), digits = 3, align = "c", booktabs = TRUE)
Seed School Region Change
1 North Carolina South 0.073
1 Kansas Midwest 0.064
1 Gonzaga West 0.046
4 West Virginia West 0.036
4 Butler South 0.033
2 Louisville Midwest 0.033

Although Kansas is in the second hardest region, they are 6.4% more likely to make the Final Four under the real bracket compared to the consensus bracket. This is because the bottom half of the Midwest region is much stronger relative to other regions, while the top half is slightly easier. Thus, the region as a whole is strong, but Kansas would only have to beat one of the teams from the bottom half in order to advance to the Final Four. Unsurprisingly, North Carolina, who is in the easiest region, benefits the most from the real seeding. They are 7.3% more likely to make the Final Four than when using the consensus bracket.

knitr::kable(tail(seeding), digits = 3, align = "c", booktabs = TRUE)
Seed School Region Change
8 Wisconsin East -0.020
3 Baylor East -0.038
2 Duke East -0.054
4 Florida East -0.056
2 Kentucky South -0.059
1 Villanova East -0.129

On the other end of the spectrum, Villanova was hurt the most by the real bracket by far. They are 12.9% less likely to make the Final Four than if the consensus bracket were used. In fact, this list is dominated by good teams who were all placed in the East region, and therefore have to fight each other to get out. The exception is Kentucky, who is 5.9% less likely to make the Final Four after drawing potential matchups with Wichita State, UCLA, and North Carolina.

Session info

devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.4.0 (2017-04-21)
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  tz       America/Chicago             
#>  date     2017-05-25
#> Packages -----------------------------------------------------------------
#>  package    * version    date       source                            
#>  assertthat   0.2.0      2017-04-11 CRAN (R 3.4.0)                    
#>  backports    1.0.5      2017-01-18 CRAN (R 3.4.0)                    
#>  base       * 3.4.0      2017-04-21 local                             
#>  bindr        0.1        2016-11-13 cran (@0.1)                       
#>  bindrcpp   * 0.1        2016-12-11 cran (@0.1)                       
#>  blogdown     0.0.41     2017-05-23 Github (rstudio/blogdown@a367835) 
#>  bookdown     0.4        2017-05-23 Github (rstudio/bookdown@b9f0e40) 
#>  codetools    0.2-15     2016-10-05 CRAN (R 3.4.0)                    
#>  colorspace   1.3-2      2016-12-14 CRAN (R 3.4.0)                    
#>  compiler     3.4.0      2017-04-21 local                             
#>  datasets   * 3.4.0      2017-04-21 local                             
#>  devtools     1.13.1     2017-05-13 CRAN (R 3.4.0)                    
#>  digest       0.6.12     2017-01-27 CRAN (R 3.4.0)                    
#>  dplyr      * 0.6.0      2017-05-23 Github (tidyverse/dplyr@c7ca374)  
#>  evaluate     0.10       2016-10-11 CRAN (R 3.4.0)                    
#>  ggplot2    * 2.2.1      2016-12-30 CRAN (R 3.4.0)                    
#>  glue         1.0.0      2017-04-17 cran (@1.0.0)                     
#>  graphics   * 3.4.0      2017-04-21 local                             
#>  grDevices  * 3.4.0      2017-04-21 local                             
#>  grid         3.4.0      2017-04-21 local                             
#>  gtable       0.2.0      2016-02-26 CRAN (R 3.4.0)                    
#>  highr        0.6        2016-05-09 CRAN (R 3.4.0)                    
#>  htmltools    0.3.6      2017-04-28 CRAN (R 3.4.0)                    
#>  knitr      * 1.16       2017-05-18 CRAN (R 3.4.0)                    
#>  lazyeval     0.2.0      2016-06-12 CRAN (R 3.4.0)                    
#>  magrittr     1.5        2014-11-22 CRAN (R 3.4.0)                    
#>  memoise      1.1.0      2017-04-21 CRAN (R 3.4.0)                    
#>  methods      3.4.0      2017-04-21 local                             
#>  munsell      0.4.3      2016-02-13 CRAN (R 3.4.0)                    
#>  plyr         1.8.4      2016-06-08 CRAN (R 3.4.0)                    
#>  R6           2.2.1      2017-05-10 cran (@2.2.1)                     
#>  Rcpp         0.12.11    2017-05-22 CRAN (R 3.4.0)                    
#>  rlang        0.1.1.9000 2017-05-23 Github (hadley/rlang@7c2f7e8)     
#>  rmarkdown    1.5.9000   2017-05-23 Github (rstudio/rmarkdown@54bf8fc)
#>  rprojroot    1.2        2017-01-16 CRAN (R 3.4.0)                    
#>  scales       0.4.1      2016-11-09 CRAN (R 3.4.0)                    
#>  stats      * 3.4.0      2017-04-21 local                             
#>  stringi      1.1.5      2017-04-07 CRAN (R 3.4.0)                    
#>  stringr      1.2.0      2017-02-18 CRAN (R 3.4.0)                    
#>  tibble       1.3.1      2017-05-17 CRAN (R 3.4.0)                    
#>  tools        3.4.0      2017-04-21 local                             
#>  utils      * 3.4.0      2017-04-21 local                             
#>  withr        1.0.2      2016-06-20 CRAN (R 3.4.0)                    
#>  yaml         2.1.14     2016-11-12 CRAN (R 3.4.0)