6 Making Predictions
To make predictions, I will use the retained iterations from the gri_stanfit
object. Alternatively, I could use the means of the posterior distributions to calculate a single lambda for each score within each game. These lambda values could then be used to generate a sample of scores which could be used to predict the outcome of the game. However, this approach would ignore the uncertainty in the parameter estimates. A better solution would be to calculate lambda values at each iteration of the chain, using the current estimates of the parameters. Thus, at each iteration, it is possible to simulate a score for each team at each iteration, creating a posterior distribution for the score of each game.
6.1 Predict individual games
First, I will extract the parameters we need from the fitted model, and load in the team codes associated with each team.
model_params <- rstan::extract(gri_stanfit, pars = c("mu", "eta", "alpha",
"delta", "sigma_g"))
load("_data/team_counts.rda")
I then use the predict_games
function (see Appendix D.1) to predict the outcome of a game between any two teams included in the model. For example, we can predict the winner of a game between Barcelona and Real Madrid played in Barcelona.
library(dplyr)
library(ggplot2)
library(purrr)
prediction <- predict_game(home = "Barcelona", away = "Real Madrid",
neutral = FALSE, visualize = TRUE, team_codes = team_counts,
chains = model_params)
prediction$predictions %>%
select(Club = club, `Expected Goals` = expected_goals,
`Win Probability` = prob_win, `Tie Probability` = prob_tie) %>%
knitr::kable(caption = "Prediction for Real Madrid at Barcelona",
align = "c", digits = 3)
Club | Expected Goals | Win Probability | Tie Probability |
---|---|---|---|
Barcelona | 2.76 | 0.591 | 0.171 |
Real Madrid | 1.74 | 0.238 | 0.171 |
Because I specified visualize = TRUE
in the call to predict_game
we can use the multiplot
function (Appendix D.2) to visualize the range of possible outcomes from the posteriors.
library(grid)
multiplot(plotlist = prediction$plots, cols = 2)
6.2 Predict domestic leagues
To predict entire leagues, I follow the same general process, simulating an outcome for each retained iteration of the chain. The difference for leagues is that instead of simulating a single game at each iteration, we simulate the remainder of the league season, and calculate the league winner. This is all done by the predict_league
function (Appendix D.3).
In order to simulate these outcome, I’ll first need to load in the full data set that includes future games
library(lubridate)
library(rvest)
library(tidyr)
library(scales)
load("_data/full_data.rda")
load("_data/club_rankings.rda")
Then, I can use the predict_league
function to get championship probabilities for each league.
6.2.1 English Premier League
predict_league(league = "Premier League", games = full_data,
chains = model_params, team_codes = team_counts) %>%
left_join(select(club_rankings, club, exp_offense, exp_defense),
by = "club") %>%
arrange(desc(champ_pct)) %>%
mutate(champ_pct = percent(ifelse(is.na(champ_pct), 0, champ_pct))) %>%
select(Club = club, Offense = exp_offense, Defense = exp_defense,
`Expected Points` = sim_points, `Championship Probability` = champ_pct) %>%
knitr::kable(caption = "Premier League Championship Probabilities",
align = "c", digits = 2)
Club | Offense | Defense | Expected Points | Championship Probability |
---|---|---|---|---|
Chelsea | 1.93 | 0.78 | 90.2 | 92.5% |
Tottenham Hotspur | 1.81 | 0.72 | 84.6 | 7.5% |
Liverpool | 1.73 | 0.81 | 75.2 | 0.0% |
Manchester City | 1.80 | 0.81 | 74.7 | 0.0% |
Manchester United | 1.54 | 0.69 | 71.1 | 0.0% |
Arsenal | 1.90 | 0.91 | 69.5 | 0.0% |
Everton | 1.41 | 0.90 | 62.4 | 0.0% |
West Bromwich Albion | 1.09 | 0.98 | 47.9 | 0.0% |
Southampton | 1.08 | 0.88 | 46.9 | 0.0% |
AFC Bournemouth | 1.25 | 1.19 | 45.4 | 0.0% |
Leicester City | 1.24 | 1.06 | 45.0 | 0.0% |
Stoke City | 1.02 | 1.04 | 43.0 | 0.0% |
Watford | 1.00 | 1.10 | 42.9 | 0.0% |
Burnley | 1.00 | 0.95 | 43.4 | 0.0% |
West Ham United | 1.13 | 1.20 | 41.7 | 0.0% |
Crystal Palace | 1.17 | 1.10 | 40.8 | 0.0% |
Hull City | 1.13 | 1.13 | 37.7 | 0.0% |
Swansea City | 1.23 | 1.20 | 36.2 | 0.0% |
Middlesbrough | 0.99 | 0.92 | 30.6 | 0.0% |
Sunderland | 0.81 | 1.15 | 23.8 | 0.0% |
6.2.2 French Ligue 1
predict_league(league = "Ligue 1", games = full_data,
chains = model_params, team_codes = team_counts) %>%
left_join(select(club_rankings, club, exp_offense, exp_defense),
by = "club") %>%
arrange(desc(champ_pct)) %>%
mutate(champ_pct = percent(ifelse(is.na(champ_pct), 0, champ_pct))) %>%
select(Club = club, Offense = exp_offense, Defense = exp_defense,
`Expected Points` = sim_points, `Championship Probability` = champ_pct) %>%
knitr::kable(caption = "Ligue 1 Championship Probabilities",
align = "c", digits = 2)
Club | Offense | Defense | Expected Points | Championship Probability |
---|---|---|---|---|
AS Monaco | 2.07 | 0.92 | 91.8 | 95.6% |
Paris Saint-Germain | 1.99 | 0.73 | 87.3 | 4.4% |
Nice | 1.28 | 0.92 | 81.1 | 0.0% |
Lyon | 1.62 | 0.99 | 62.8 | 0.0% |
Bordeaux | 1.16 | 1.03 | 60.1 | 0.0% |
Marseille | 1.31 | 0.97 | 60.0 | 0.0% |
St Etienne | 0.91 | 0.85 | 53.3 | 0.0% |
Nantes | 0.88 | 1.09 | 51.1 | 0.0% |
Stade Rennes | 0.85 | 1.07 | 47.5 | 0.0% |
Guingamp | 1.06 | 1.12 | 48.9 | 0.0% |
Lille | 0.93 | 0.98 | 47.0 | 0.0% |
Toulouse | 0.99 | 0.91 | 47.3 | 0.0% |
Montpellier | 1.14 | 1.32 | 42.3 | 0.0% |
Angers | 1.00 | 1.07 | 42.8 | 0.0% |
Metz | 0.89 | 1.39 | 41.8 | 0.0% |
Lorient | 1.08 | 1.38 | 38.0 | 0.0% |
Dijon FCO | 1.08 | 1.18 | 36.8 | 0.0% |
Caen | 0.92 | 1.41 | 35.4 | 0.0% |
AS Nancy Lorraine | 0.80 | 0.99 | 35.0 | 0.0% |
Bastia | 0.77 | 1.16 | 33.3 | 0.0% |
6.2.3 German Bundesliga
predict_league(league = "Bundesliga", games = full_data,
chains = model_params, team_codes = team_counts) %>%
left_join(select(club_rankings, club, exp_offense, exp_defense),
by = "club") %>%
arrange(desc(champ_pct)) %>%
mutate(champ_pct = percent(ifelse(is.na(champ_pct), 0, champ_pct))) %>%
select(Club = club, Offense = exp_offense, Defense = exp_defense,
`Expected Points` = sim_points, `Championship Probability` = champ_pct) %>%
knitr::kable(caption = "Bundesliga Championship Probabilities",
align = "c", digits = 2)
Club | Offense | Defense | Expected Points | Championship Probability |
---|---|---|---|---|
Bayern Munich | 2.13 | 0.70 | 80.1 | 100% |
RB Leipzig | 1.43 | 0.87 | 66.8 | 0% |
TSG Hoffenheim | 1.48 | 0.89 | 62.6 | 0% |
Borussia Dortmund | 1.83 | 0.91 | 63.0 | 0% |
Hertha Berlin | 1.10 | 0.91 | 50.6 | 0% |
Werder Bremen | 1.35 | 1.20 | 47.9 | 0% |
SC Freiburg | 1.09 | 1.25 | 47.0 | 0% |
FC Cologne | 1.18 | 0.95 | 46.6 | 0% |
Borussia Monchengladbach | 1.30 | 0.99 | 47.6 | 0% |
Schalke 04 | 1.37 | 0.88 | 46.6 | 0% |
Eintracht Frankfurt | 0.95 | 0.87 | 45.2 | 0% |
Bayer Leverkusen | 1.26 | 1.04 | 40.2 | 0% |
FC Augsburg | 0.99 | 1.15 | 37.3 | 0% |
Mainz | 1.17 | 1.15 | 36.9 | 0% |
VfL Wolfsburg | 0.92 | 1.07 | 36.5 | 0% |
Hamburg SV | 0.96 | 1.26 | 36.4 | 0% |
FC Ingolstadt 04 | 0.95 | 1.16 | 32.4 | 0% |
SV Darmstadt 98 | 0.84 | 1.29 | 25.9 | 0% |
6.2.4 Italian Serie A
predict_league(league = "Serie A", games = full_data,
chains = model_params, team_codes = team_counts) %>%
left_join(select(club_rankings, club, exp_offense, exp_defense),
by = "club") %>%
arrange(desc(champ_pct)) %>%
mutate(champ_pct = percent(ifelse(is.na(champ_pct), 0, champ_pct))) %>%
select(Club = club, Offense = exp_offense, Defense = exp_defense,
`Expected Points` = sim_points, `Championship Probability` = champ_pct) %>%
knitr::kable(caption = "Serie A Championship Probabilities",
align = "c", digits = 2)
Club | Offense | Defense | Expected Points | Championship Probability |
---|---|---|---|---|
Juventus | 1.66 | 0.64 | 92.0 | 99.2% |
AS Roma | 1.75 | 0.87 | 82.3 | 0.7% |
Napoli | 1.76 | 0.92 | 81.7 | 0.1% |
Lazio | 1.49 | 0.92 | 74.0 | 0.0% |
Atalanta | 1.37 | 0.93 | 71.0 | 0.0% |
AC Milan | 1.24 | 0.95 | 64.5 | 0.0% |
Internazionale | 1.44 | 1.13 | 62.2 | 0.0% |
Fiorentina | 1.36 | 1.06 | 60.9 | 0.0% |
Torino | 1.51 | 1.14 | 54.4 | 0.0% |
Sampdoria | 1.05 | 0.99 | 50.8 | 0.0% |
Udinese | 1.08 | 1.09 | 48.1 | 0.0% |
Cagliari | 1.20 | 1.32 | 45.6 | 0.0% |
Chievo Verona | 1.01 | 1.09 | 45.7 | 0.0% |
Sassuolo | 1.15 | 1.16 | 43.9 | 0.0% |
Bologna | 1.00 | 1.06 | 43.0 | 0.0% |
Genoa | 0.97 | 1.29 | 34.0 | 0.0% |
Empoli | 0.75 | 1.12 | 33.4 | 0.0% |
Crotone | 0.83 | 1.15 | 28.9 | 0.0% |
Palermo | 0.83 | 1.37 | 23.7 | 0.0% |
US Pescara | 0.89 | 1.44 | 18.5 | 0.0% |
6.2.5 Spanish La Liga
predict_league(league = "La Liga", games = full_data,
chains = model_params, team_codes = team_counts) %>%
left_join(select(club_rankings, club, exp_offense, exp_defense),
by = "club") %>%
arrange(desc(champ_pct)) %>%
mutate(champ_pct = percent(ifelse(is.na(champ_pct), 0, champ_pct))) %>%
select(Club = club, Offense = exp_offense, Defense = exp_defense,
`Expected Points` = sim_points, `Championship Probability` = champ_pct) %>%
knitr::kable(caption = "La Liga Championship Probabilities",
align = "c", digits = 2)
Club | Offense | Defense | Expected Points | Championship Probability |
---|---|---|---|---|
Barcelona | 2.30 | 0.83 | 87.9 | 65.6% |
Real Madrid | 2.18 | 0.94 | 87.6 | 34.4% |
Atletico Madrid | 1.43 | 0.67 | 77.0 | 0.0% |
Sevilla FC | 1.33 | 0.94 | 72.8 | 0.0% |
Villarreal | 1.15 | 0.82 | 67.1 | 0.0% |
Athletic Bilbao | 1.21 | 0.93 | 66.1 | 0.0% |
Real Sociedad | 1.29 | 1.04 | 64.9 | 0.0% |
Eibar | 1.26 | 1.03 | 57.2 | 0.0% |
Espanyol | 1.09 | 0.98 | 54.8 | 0.0% |
Alavés | 0.97 | 0.86 | 52.4 | 0.0% |
Celta Vigo | 1.28 | 1.03 | 48.2 | 0.0% |
Málaga | 1.08 | 1.15 | 45.1 | 0.0% |
Valencia | 1.22 | 1.25 | 44.4 | 0.0% |
Las Palmas | 1.27 | 1.32 | 42.3 | 0.0% |
Real Betis | 0.94 | 1.17 | 40.2 | 0.0% |
Deportivo La Coruña | 1.02 | 1.19 | 35.6 | 0.0% |
Leganes | 0.83 | 1.12 | 33.5 | 0.0% |
Sporting Gijón | 0.96 | 1.35 | 27.6 | 0.0% |
Granada | 0.78 | 1.40 | 22.6 | 0.0% |
Osasuna | 0.91 | 1.51 | 21.8 | 0.0% |
6.3 UEFA Champions League
Simulating the UEFA Champions League is very similar to the process used for simulating the domestic leagues. At each retained iteration of the MCMC chain, I simulate the remainder of the Champions League matches. Because there isn’t a true bracket, and the opponents are drawn randomly before each round, I first define the current match-ups.
matchups <- list(
c("Real Madrid", "Atletico Madrid"),
c("AS Monaco", "Juventus")
)
I can then use the predict_ucl
function (Appendix D.4) to calculate the probability of each team advancing to each subsequent round.
predict_ucl(matchups = matchups, games = full_data, chains = model_params,
team_codes = team_counts) %>%
left_join(select(club_rankings, club, exp_offense, exp_defense),
by = c("Club" = "club")) %>%
arrange(desc(Champion)) %>%
mutate(Quarterfinals = percent(Quarterfinals),
Semifinals = percent(Semifinals), Final = percent(Final),
Champion = percent(Champion)) %>%
select(Club, Offense = exp_offense, Defense = exp_defense, Quarterfinals,
Semifinals, Final, Champion) %>%
knitr::kable(caption = "UEFA Champions League Probabilities",
align = "c", digits = 2)
Club | Offense | Defense | Quarterfinals | Semifinals | Final | Champion |
---|---|---|---|---|---|---|
Juventus | 1.66 | 0.64 | 100% | 100% | 90.5% | 48.7% |
Real Madrid | 2.18 | 0.94 | 100% | 100% | 94.7% | 43.6% |
AS Monaco | 2.07 | 0.92 | 100% | 100% | 9.5% | 5.0% |
Atletico Madrid | 1.43 | 0.67 | 100% | 100% | 5.3% | 2.7% |