At 10h11, we have some great football enthusiasts, including our Data Science Director, William Brojanigo. The prospect of a hot summer of football made him want to use his skills to forecast team results at the 2018 World Cup.
A clever mix of science and sport!
We followed a study published on the R-bloggers website in 2016, which forecast the winning teams of the 2016 UEFA European Championship.
The principle is to use the information contained in bookmakers’ odds: William used the odds quoted by 22 bookmakers (dated 22 March 2018) for the 32 participating national teams. After removing the bookmakers’ profit margin, we averaged the odds across bookmakers into a common consensus and ranked the teams by it, then deduced the teams’ abilities using the Bradley-Terry model for paired comparisons.
The average winning probability per team
For the most curious, here are more details about the steps:
• Step 1: The quoted odds are assumed to derive from “true” odds, to which the 22 bookmakers had added a profit margin of 21.61% on average. Removing that margin gave us the “true” odds, which we converted to log-odds (the logarithm of the odds, called logs_odds) for the rest of the estimation.
• Step 2: Use of the Bradley-Terry approach
Using the teams’ abilities, we simulated the tournament match by match, determining which team wins and advances, all the way to the final. To make our estimates more accurate, we ran 1 million tournament simulations.
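To make Step 1 more concrete, here is a minimal Python sketch of removing a bookmaker's margin and moving to log-odds. It assumes decimal odds and uses simple proportional normalisation, which is one common way to strip the margin and not necessarily the exact adjustment we applied. The quotes, the team names and the helper names (`remove_margin`, `log_odds`, `consensus_probability`) are hypothetical.

```python
import math

def remove_margin(decimal_odds):
    """Convert one bookmaker's decimal odds into probabilities summing to 1.

    Assumption: proportional normalisation of the inverse odds.
    """
    raw = {team: 1.0 / o for team, o in decimal_odds.items()}
    total = sum(raw.values())      # exceeds 1 by the bookmaker's margin
    margin = total - 1.0
    probs = {team: r / total for team, r in raw.items()}
    return probs, margin

def log_odds(p):
    """Logarithm of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

def consensus_probability(probs_across_bookmakers):
    """Average one team's log-odds over bookmakers, then map back to a probability."""
    avg = sum(log_odds(p) for p in probs_across_bookmakers) / len(probs_across_bookmakers)
    return 1.0 / (1.0 + math.exp(-avg))

# Hypothetical quotes from a single bookmaker for a three-team market.
quotes = {"A": 2.0, "B": 3.0, "C": 4.0}
probs, margin = remove_margin(quotes)
logs_odds = {team: log_odds(p) for team, p in probs.items()}
```

Averaging on the log-odds scale rather than on raw probabilities keeps the consensus from being dominated by the few bookmakers with extreme quotes.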
What is the Bradley-Terry approach?
The Bradley-Terry approach is quite similar to the Elo rating approach – popular in sports circles – and is used to describe possible outcomes when two elements of a pair are compared.
In this case, it is based on the probability that team A will defeat team B given the capabilities and strengths attributed to both teams.
To summarize the formula:
Probability (Team A beats Team B) = (Team A skills) / (Team A skills + Team B skills)
Graph showing the probabilities of a team A against a team B
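The formula translates directly into code. The short Python sketch below is only an illustration: the function name `bt_win_probability` and the skill values are hypothetical, and skills are assumed to be strictly positive.

```python
def bt_win_probability(skill_a, skill_b):
    """Bradley-Terry probability that team A beats team B."""
    if skill_a <= 0 or skill_b <= 0:
        raise ValueError("skills must be strictly positive")
    return skill_a / (skill_a + skill_b)

# A team three times as strong as its opponent wins three matches out of four.
p_ab = bt_win_probability(3.0, 1.0)
p_ba = bt_win_probability(1.0, 3.0)
```

Note that only the ratio of the two skills matters: doubling both leaves the probability unchanged.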
The application:
The approach proposed for estimating team skills is the “randomized block design”. In this methodology, there are two factors to consider:
• An experimental factor: fixed effects will be used.
→ This factor will be the bookmaker, because we have to determine whether there are systematic differences between the bookmakers’ prognoses.
• A blocking factor: random effects will be used.
→ This factor will be the team, because it is a known source of variability.
The odds will therefore be modelled with a team-specific random effect and a bookmaker-specific fixed effect.
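To give an idea of what this decomposition separates, here is a deliberately simplified Python sketch. It is not a fitted mixed-effects model: it only computes plain two-way means on a balanced table of log-odds (every bookmaker quotes every team), whereas a real random-effects fit would typically also shrink the team effects toward zero. The table values and the helper name `two_way_effects` are hypothetical.

```python
from statistics import mean

def two_way_effects(table):
    """Split table[bookmaker][team] log-odds into grand mean + bookmaker + team effects.

    Assumption: balanced design, i.e. every bookmaker quotes every team.
    """
    bookmakers = list(table)
    teams = list(table[bookmakers[0]])
    grand = mean(table[b][t] for b in bookmakers for t in teams)
    # Systematic offset of each bookmaker relative to the overall level.
    bookmaker_effect = {
        b: mean(table[b][t] for t in teams) - grand for b in bookmakers
    }
    # Level of each team once averaged over all bookmakers.
    team_effect = {
        t: mean(table[b][t] for b in bookmakers) - grand for t in teams
    }
    return grand, bookmaker_effect, team_effect

# Hypothetical log-odds: bookie_2 is uniformly more pessimistic than bookie_1.
table = {
    "bookie_1": {"A": -1.0, "B": -2.0},
    "bookie_2": {"A": -1.2, "B": -2.2},
}
grand, bookmaker_effect, team_effect = two_way_effects(table)
```

In this toy table the bookmaker effects capture the uniform 0.2 gap between the two bookmakers, while the team effects capture that team A is rated above team B by both.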
And the results?
After running 1 million tournament simulations, we estimated the 3 teams with the best chances of winning the 2018 World Cup:
• Germany with 15.62%, big winner of 2014
• Brazil with 15.05%, leader in the number of wins
• France with 12.42%, big favourite at UEFA Euro 2016
We were also able to identify the most likely final:
• Germany vs Brazil (6.21%)
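Figures like these come from counting outcomes over many simulated tournaments. The Python sketch below shows the idea on a toy example and makes several simplifying assumptions: a four-team single-elimination bracket with no group stage and no draws, hypothetical skill values, and far fewer runs than our 1 million. All function names are illustrative.

```python
import random
from collections import Counter

def bt_win_probability(skill_a, skill_b):
    """Bradley-Terry probability that team A beats team B."""
    return skill_a / (skill_a + skill_b)

def play_knockout(bracket, skills, rng):
    """Play one single-elimination bracket; return (winner, finalists).

    Assumption: the number of teams is a power of two.
    """
    teams = list(bracket)
    finalists = None
    while len(teams) > 1:
        if len(teams) == 2:
            finalists = tuple(sorted(teams))
        next_round = []
        # Neighbours in the list meet each other in each round.
        for a, b in zip(teams[0::2], teams[1::2]):
            p = bt_win_probability(skills[a], skills[b])
            next_round.append(a if rng.random() < p else b)
        teams = next_round
    return teams[0], finalists

def simulate(bracket, skills, runs, seed=0):
    """Repeat the tournament and count champions and finals."""
    rng = random.Random(seed)
    winners, finals = Counter(), Counter()
    for _ in range(runs):
        winner, finalists = play_knockout(bracket, skills, rng)
        winners[winner] += 1
        finals[finalists] += 1
    win_share = {team: winners[team] / runs for team in bracket}
    return win_share, finals.most_common(1)[0][0]

# Hypothetical skills; A meets B and C meets D in the semi-finals.
skills = {"A": 4.0, "B": 1.0, "C": 3.0, "D": 1.0}
bracket = ["A", "B", "C", "D"]
win_share, likely_final = simulate(bracket, skills, runs=20_000)
```

The share of runs won by each team estimates its chance of taking the title, and the most frequent pair of finalists gives the most likely final. More runs reduce the noise in both estimates.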
Probabilities by group
Zeileis, Achim and Leitner, Christoph and Hornik, Kurt (2016) Predictive Bookmaker Consensus Model for the UEFA Euro 2016. Working Papers in Economics and Statistics, 2016-15.
Leitner, Christoph and Zeileis, Achim and Hornik, Kurt (2008) Who is Going to Win the EURO 2008? A Statistical Investigation of Bookmakers Odds. Research Report Series / Department of Statistics and Mathematics, 65. Department of Statistics and Mathematics, WU Vienna University of Economics and Business, Vienna.