6 Making Predictions

To make predictions, I will use the retained iterations from the gri_stanfit object. Alternatively, I could use the means of the posterior distributions to calculate a single lambda for each score within each game. These lambda values could then be used to generate a sample of scores which could be used to predict the outcome of the game. However, this approach would ignore the uncertainty in the parameter estimates. A better solution would be to calculate lambda values at each iteration of the chain, using the current estimates of the parameters. Thus, at each iteration, it is possible to simulate a score for each team at each iteration, creating a posterior distribution for the score of each game.

6.1 Predict individual games

First, I will extract the parameters we need from the fitted model, and load in the team codes associated with each team.

model_params <- rstan::extract(gri_stanfit, pars = c("mu", "eta", "alpha",
  "delta", "sigma_g"))
load("_data/team_counts.rda")

I then use the predict_games function (see Appendix D.1) to predict the outcome of a game between any two teams included in the model. For example, we can predict the winner of a game between Barcelona and Real Madrid played in Barcelona.

library(dplyr)
library(ggplot2)
library(purrr)
prediction <- predict_game(home = "Barcelona", away = "Real Madrid",
  neutral = FALSE, visualize = TRUE, team_codes = team_counts,
  chains = model_params)

prediction$predictions %>%
  select(Club = club, `Expected Goals` = expected_goals,
    `Win Probability` = prob_win, `Tie Probability` = prob_tie) %>%
  knitr::kable(caption = "Prediction for Real Madrid at Barcelona",
    align = "c", digits = 3)
Table 6.1: Prediction for Real Madrid at Barcelona
Club Expected Goals Win Probability Tie Probability
Barcelona 2.76 0.591 0.171
Real Madrid 1.74 0.238 0.171

Because I specified visualize = TRUE in the call to predict_game we can use the multiplot function (Appendix D.2) to visualize the range of possible outcomes from the posteriors.

library(grid)
multiplot(plotlist = prediction$plots, cols = 2)
Visualizations for Real Madrid at Barcelona

Figure 6.1: Visualizations for Real Madrid at Barcelona

6.2 Predict domestic leagues

To predict entire leagues, I follow the same general process, simulating an outcome for each retained iteration of the chain. The difference for leagues is that instead of simulating a single game at each iteration, we simulate the remainder of the league season, and calculate the league winner. This is all done by the predict_league function (Appendix D.3).

In order to simulate these outcome, I’ll first need to load in the full data set that includes future games

library(lubridate)
library(rvest)
library(tidyr)
library(scales)

load("_data/full_data.rda")
load("_data/club_rankings.rda")

Then, I can use the predict_league function to get championship probabilities for each league.

6.2.1 English Premier League

predict_league(league = "Premier League", games = full_data,
  chains = model_params, team_codes = team_counts) %>%
  left_join(select(club_rankings, club, exp_offense, exp_defense),
    by = "club") %>%
  arrange(desc(champ_pct)) %>%
  mutate(champ_pct = percent(ifelse(is.na(champ_pct), 0, champ_pct))) %>%
  select(Club = club, Offense = exp_offense, Defense = exp_defense,
    `Expected Points` = sim_points, `Championship Probability` = champ_pct) %>%
  knitr::kable(caption = "Premier League Championship Probabilities",
    align = "c", digits = 2)
Table 6.2: Premier League Championship Probabilities
Club Offense Defense Expected Points Championship Probability
Chelsea 1.93 0.78 90.2 92.5%
Tottenham Hotspur 1.81 0.72 84.6 7.5%
Liverpool 1.73 0.81 75.2 0.0%
Manchester City 1.80 0.81 74.7 0.0%
Manchester United 1.54 0.69 71.1 0.0%
Arsenal 1.90 0.91 69.5 0.0%
Everton 1.41 0.90 62.4 0.0%
West Bromwich Albion 1.09 0.98 47.9 0.0%
Southampton 1.08 0.88 46.9 0.0%
AFC Bournemouth 1.25 1.19 45.4 0.0%
Leicester City 1.24 1.06 45.0 0.0%
Stoke City 1.02 1.04 43.0 0.0%
Watford 1.00 1.10 42.9 0.0%
Burnley 1.00 0.95 43.4 0.0%
West Ham United 1.13 1.20 41.7 0.0%
Crystal Palace 1.17 1.10 40.8 0.0%
Hull City 1.13 1.13 37.7 0.0%
Swansea City 1.23 1.20 36.2 0.0%
Middlesbrough 0.99 0.92 30.6 0.0%
Sunderland 0.81 1.15 23.8 0.0%

6.2.2 French Ligue 1

predict_league(league = "Ligue 1", games = full_data,
  chains = model_params, team_codes = team_counts) %>%
  left_join(select(club_rankings, club, exp_offense, exp_defense),
    by = "club") %>%
  arrange(desc(champ_pct)) %>%
  mutate(champ_pct = percent(ifelse(is.na(champ_pct), 0, champ_pct))) %>%
  select(Club = club, Offense = exp_offense, Defense = exp_defense,
    `Expected Points` = sim_points, `Championship Probability` = champ_pct) %>%
  knitr::kable(caption = "Ligue 1 Championship Probabilities",
    align = "c", digits = 2)
Table 6.3: Ligue 1 Championship Probabilities
Club Offense Defense Expected Points Championship Probability
AS Monaco 2.07 0.92 91.8 95.6%
Paris Saint-Germain 1.99 0.73 87.3 4.4%
Nice 1.28 0.92 81.1 0.0%
Lyon 1.62 0.99 62.8 0.0%
Bordeaux 1.16 1.03 60.1 0.0%
Marseille 1.31 0.97 60.0 0.0%
St Etienne 0.91 0.85 53.3 0.0%
Nantes 0.88 1.09 51.1 0.0%
Stade Rennes 0.85 1.07 47.5 0.0%
Guingamp 1.06 1.12 48.9 0.0%
Lille 0.93 0.98 47.0 0.0%
Toulouse 0.99 0.91 47.3 0.0%
Montpellier 1.14 1.32 42.3 0.0%
Angers 1.00 1.07 42.8 0.0%
Metz 0.89 1.39 41.8 0.0%
Lorient 1.08 1.38 38.0 0.0%
Dijon FCO 1.08 1.18 36.8 0.0%
Caen 0.92 1.41 35.4 0.0%
AS Nancy Lorraine 0.80 0.99 35.0 0.0%
Bastia 0.77 1.16 33.3 0.0%

6.2.3 German Bundesliga

predict_league(league = "Bundesliga", games = full_data,
  chains = model_params, team_codes = team_counts) %>%
  left_join(select(club_rankings, club, exp_offense, exp_defense),
    by = "club") %>%
  arrange(desc(champ_pct)) %>%
  mutate(champ_pct = percent(ifelse(is.na(champ_pct), 0, champ_pct))) %>%
  select(Club = club, Offense = exp_offense, Defense = exp_defense,
    `Expected Points` = sim_points, `Championship Probability` = champ_pct) %>%
  knitr::kable(caption = "Bundesliga Championship Probabilities",
    align = "c", digits = 2)
Table 6.4: Bundesliga Championship Probabilities
Club Offense Defense Expected Points Championship Probability
Bayern Munich 2.13 0.70 80.1 100%
RB Leipzig 1.43 0.87 66.8 0%
TSG Hoffenheim 1.48 0.89 62.6 0%
Borussia Dortmund 1.83 0.91 63.0 0%
Hertha Berlin 1.10 0.91 50.6 0%
Werder Bremen 1.35 1.20 47.9 0%
SC Freiburg 1.09 1.25 47.0 0%
FC Cologne 1.18 0.95 46.6 0%
Borussia Monchengladbach 1.30 0.99 47.6 0%
Schalke 04 1.37 0.88 46.6 0%
Eintracht Frankfurt 0.95 0.87 45.2 0%
Bayer Leverkusen 1.26 1.04 40.2 0%
FC Augsburg 0.99 1.15 37.3 0%
Mainz 1.17 1.15 36.9 0%
VfL Wolfsburg 0.92 1.07 36.5 0%
Hamburg SV 0.96 1.26 36.4 0%
FC Ingolstadt 04 0.95 1.16 32.4 0%
SV Darmstadt 98 0.84 1.29 25.9 0%

6.2.4 Italian Serie A

predict_league(league = "Serie A", games = full_data,
  chains = model_params, team_codes = team_counts) %>%
  left_join(select(club_rankings, club, exp_offense, exp_defense),
    by = "club") %>%
  arrange(desc(champ_pct)) %>%
  mutate(champ_pct = percent(ifelse(is.na(champ_pct), 0, champ_pct))) %>%
  select(Club = club, Offense = exp_offense, Defense = exp_defense,
    `Expected Points` = sim_points, `Championship Probability` = champ_pct) %>%
  knitr::kable(caption = "Serie A Championship Probabilities",
    align = "c", digits = 2)
Table 6.5: Serie A Championship Probabilities
Club Offense Defense Expected Points Championship Probability
Juventus 1.66 0.64 92.0 99.2%
AS Roma 1.75 0.87 82.3 0.7%
Napoli 1.76 0.92 81.7 0.1%
Lazio 1.49 0.92 74.0 0.0%
Atalanta 1.37 0.93 71.0 0.0%
AC Milan 1.24 0.95 64.5 0.0%
Internazionale 1.44 1.13 62.2 0.0%
Fiorentina 1.36 1.06 60.9 0.0%
Torino 1.51 1.14 54.4 0.0%
Sampdoria 1.05 0.99 50.8 0.0%
Udinese 1.08 1.09 48.1 0.0%
Cagliari 1.20 1.32 45.6 0.0%
Chievo Verona 1.01 1.09 45.7 0.0%
Sassuolo 1.15 1.16 43.9 0.0%
Bologna 1.00 1.06 43.0 0.0%
Genoa 0.97 1.29 34.0 0.0%
Empoli 0.75 1.12 33.4 0.0%
Crotone 0.83 1.15 28.9 0.0%
Palermo 0.83 1.37 23.7 0.0%
US Pescara 0.89 1.44 18.5 0.0%

6.2.5 Spanish La Liga

predict_league(league = "La Liga", games = full_data,
  chains = model_params, team_codes = team_counts) %>%
  left_join(select(club_rankings, club, exp_offense, exp_defense),
    by = "club") %>%
  arrange(desc(champ_pct)) %>%
  mutate(champ_pct = percent(ifelse(is.na(champ_pct), 0, champ_pct))) %>%
  select(Club = club, Offense = exp_offense, Defense = exp_defense,
    `Expected Points` = sim_points, `Championship Probability` = champ_pct) %>%
  knitr::kable(caption = "La Liga Championship Probabilities",
    align = "c", digits = 2)
Table 6.6: La Liga Championship Probabilities
Club Offense Defense Expected Points Championship Probability
Barcelona 2.30 0.83 87.9 65.6%
Real Madrid 2.18 0.94 87.6 34.4%
Atletico Madrid 1.43 0.67 77.0 0.0%
Sevilla FC 1.33 0.94 72.8 0.0%
Villarreal 1.15 0.82 67.1 0.0%
Athletic Bilbao 1.21 0.93 66.1 0.0%
Real Sociedad 1.29 1.04 64.9 0.0%
Eibar 1.26 1.03 57.2 0.0%
Espanyol 1.09 0.98 54.8 0.0%
Alavés 0.97 0.86 52.4 0.0%
Celta Vigo 1.28 1.03 48.2 0.0%
Málaga 1.08 1.15 45.1 0.0%
Valencia 1.22 1.25 44.4 0.0%
Las Palmas 1.27 1.32 42.3 0.0%
Real Betis 0.94 1.17 40.2 0.0%
Deportivo La Coruña 1.02 1.19 35.6 0.0%
Leganes 0.83 1.12 33.5 0.0%
Sporting Gijón 0.96 1.35 27.6 0.0%
Granada 0.78 1.40 22.6 0.0%
Osasuna 0.91 1.51 21.8 0.0%

6.3 UEFA Champions League

Simulating the UEFA Champions League is very similar to the process used for simulating the domestic leagues. At each retained iteration of the MCMC chain, I simulate the remainder of the Champions League matches. Because there isn’t a true bracket, and the opponents are drawn randomly before each round, I first define the current match-ups.

matchups <- list(
  c("Real Madrid", "Atletico Madrid"),
  c("AS Monaco", "Juventus")
)

I can then use the predict_ucl function (Appendix D.4) to calculate the probability of each team advancing to each subsequent round.

predict_ucl(matchups = matchups, games = full_data, chains = model_params,
  team_codes = team_counts) %>%
  left_join(select(club_rankings, club, exp_offense, exp_defense),
    by = c("Club" = "club")) %>%
  arrange(desc(Champion)) %>%
  mutate(Quarterfinals = percent(Quarterfinals),
    Semifinals = percent(Semifinals), Final = percent(Final),
    Champion = percent(Champion)) %>%
  select(Club, Offense = exp_offense, Defense = exp_defense, Quarterfinals,
    Semifinals, Final, Champion) %>%
  knitr::kable(caption = "UEFA Champions League Probabilities",
    align = "c", digits = 2)
Table 6.7: UEFA Champions League Probabilities
Club Offense Defense Quarterfinals Semifinals Final Champion
Juventus 1.66 0.64 100% 100% 90.5% 48.7%
Real Madrid 2.18 0.94 100% 100% 94.7% 43.6%
AS Monaco 2.07 0.92 100% 100% 9.5% 5.0%
Atletico Madrid 1.43 0.67 100% 100% 5.3% 2.7%