The Soccernomics predictions model

31, 01, 19
by Stefan Szymanski

This season Soccernomics has been posting weekly predictions for games across 25 European leagues – you can see the predictions here. This is the work of Guy Wilkinson, who completed his PhD at the University of Michigan in 2017 and is now an assistant professor at the University of Stirling.

This post is about the methodology behind the research, and how it relates to the evolving field of soccer analytics. If you are interested in the full details of how Guy’s predictions are generated, you can read the relevant chapter from his PhD here.

The basic idea of the model is that every player on the field contributes to the result, and that therefore an index of individual ability should be constructed by crediting players based on the results of each game played. That’s like saying that a drug should be rated according to the outcome for a patient’s health when they take the drug. The efficacy of a drug is measured by the sum of outcomes for individual patients who take it; the efficacy of a player is the sum of outcomes for games in which they played.

Once you know the efficacy of each player, you can predict the outcome of the next game they are expected to play in: if the sum of efficacies for team A is greater than that of team B, then team A is predicted to win.
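The prediction rule can be sketched in a few lines of Python. This is only an illustration, not Guy's code: the player names, the ratings and the `draw_margin` threshold are all hypothetical, and the post does not say how draws are predicted, so the draw rule here is an assumption.

```python
def team_strength(lineup, ratings):
    """Sum the estimated ratings of the players in a line-up."""
    # Assumption: a player with no estimate is treated as average (0.0).
    return sum(ratings.get(player, 0.0) for player in lineup)


def predict(lineup_a, lineup_b, ratings, draw_margin=0.0):
    """Return 'A', 'B' or 'draw' from the difference in summed ratings."""
    diff = team_strength(lineup_a, ratings) - team_strength(lineup_b, ratings)
    if diff > draw_margin:
        return "A"
    if diff < -draw_margin:
        return "B"
    return "draw"


# Hypothetical ratings, for illustration only.
ratings = {"p1": 0.5, "p2": 0.25, "p3": 0.25, "p4": -0.25}
print(predict(["p1", "p2"], ["p3", "p4"], ratings))
```

With `draw_margin=0.0` a draw is predicted only when the two sums are exactly equal; a wider margin would predict draws whenever the teams look closely matched.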

It’s a simple enough idea; the difficulty lies in:

  1. collecting the data on games, players and results
  2. estimating the model

The first step is an exercise in data-scraping – all this material is now on the web. The second step is difficult because of the scale of the problem. To estimate the model which best describes the contribution of each player, you need to run an optimization routine. These are easy with present-day computing power if you have a few thousand players and a few hundred games, but Guy was working with over 66,000 games and 133,000 team line-ups for these games, covering the 25 leagues over the last decade.
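To give a feel for what an optimization routine of this kind does, here is a toy sketch in Python. It is not the estimator used in the thesis. It assumes, purely for illustration, that the outcome of a game is the goal difference, that a team's strength is the sum of its players' ratings, and that ratings are fitted by plain gradient descent on a ridge-penalised squared error. The function name, the parameters and the three games are hypothetical.

```python
def fit_ratings(games, n_iter=2000, lr=0.05, ridge=0.01):
    """Fit one rating per player by gradient descent on squared error.

    games: list of (home_lineup, away_lineup, goal_difference) tuples,
    where each line-up is a list of player names.
    Predicted goal difference = sum(home ratings) - sum(away ratings).
    """
    players = sorted({p for home, away, _ in games for p in home + away})
    ratings = {p: 0.0 for p in players}
    n_games = len(games)
    for _ in range(n_iter):
        # Start each gradient from the ridge penalty, which pulls ratings to 0.
        grad = {p: ridge * ratings[p] for p in players}
        for home, away, goal_diff in games:
            pred = sum(ratings[p] for p in home) - sum(ratings[p] for p in away)
            err = pred - goal_diff
            for p in home:
                grad[p] += err / n_games
            for p in away:
                grad[p] -= err / n_games
        for p in players:
            ratings[p] -= lr * grad[p]
    return ratings


# Three made-up two-a-side games: (home line-up, away line-up, goal difference).
games = [
    (["a", "b"], ["c", "d"], 2),
    (["a", "c"], ["b", "d"], 1),
    (["a", "d"], ["b", "c"], 1),
]
fitted = fit_ratings(games)
print({player: round(value, 2) for player, value in fitted.items()})
```

In this toy data player `a` is on the winning side every time and comes out with the highest rating. The real problem has the same shape but vastly more games and players, which is where the computational difficulty comes from.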

The details of the estimation method are to be found in the paper (also a shout-out to Professor Eric Schwartz, who gave Guy a lot of help with the process). Once the individual player estimates are generated, the forecasts can be produced, using the assumption that next week’s line-up will be the same as this week’s.

How good are the results? You can see for yourself, but the percentage of correctly predicted results is typically in the range of 40-50%. To put this in context, if you picked the results at random you would get the correct result (win, draw, loss) 33% of the time, while the bookmakers, whose predictions are the best available (otherwise they would go out of business), tend to get it right about 55% of the time. So the model is somewhere in the middle.
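The yardstick used here, the share of results called correctly, is easy to compute. The sketch below uses ten invented results purely to show the calculation; only the two benchmark figures (one in three for random picks, about 55% for bookmakers) come from the paragraph above.

```python
def hit_rate(predicted, actual):
    """Share of games where the predicted result matches the actual one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("need at least one game")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(predicted)


RANDOM_BENCHMARK = 1 / 3      # three possible results, picked at random
BOOKMAKER_BENCHMARK = 0.55    # approximate figure quoted in the text

# Hypothetical results: H = home win, D = draw, A = away win.
predicted = ["H", "H", "D", "A", "H", "A", "H", "D", "H", "A"]
actual = ["H", "D", "D", "H", "H", "A", "A", "A", "H", "D"]

rate = hit_rate(predicted, actual)
print(f"model {rate:.0%}, random {RANDOM_BENCHMARK:.0%}, "
      f"bookmakers {BOOKMAKER_BENCHMARK:.0%}")
```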

Clearly the model could get better, but it would be competitive with most alternative models. A model that gets the results right 50% of the time instead of 45% of the time is better, but in betting markets I doubt this margin would be enough to make large profits, taking account of the bookmaker’s margin and after tax.
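A rough way to see why the bookmaker's margin matters: decimal odds imply a probability of 1/odds for each result, and the implied probabilities for win, draw and loss add up to more than 1. The excess is the margin. The odds below are invented for illustration, and tax is left out of the calculation.

```python
def overround(decimal_odds):
    """Bookmaker margin: sum of implied probabilities minus 1."""
    return sum(1.0 / odds for odds in decimal_odds) - 1.0


def expected_return(win_probability, odds):
    """Expected profit per unit staked at the given decimal odds."""
    return win_probability * odds - 1.0


# Hypothetical decimal odds for home win, draw, away win.
odds = [1.9, 3.5, 4.0]
print(f"margin: {overround(odds):.1%}")

# A bettor who is right half the time on bets priced at 1.9 still loses money.
print(f"expected return per unit staked: {expected_return(0.5, 1.9):+.2f}")
```

To break even at a given price the bettor's true probability of being right has to exceed 1/odds, so a few extra points of overall accuracy do not translate directly into profit.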

It’s also not difficult to see how it could be improved. At the moment it relies only on the names of the players on the field. It doesn’t distinguish any individual characteristics, and it doesn’t weight more recent games more highly. It doesn’t attempt to predict which players will appear in the next game. And there are a whole host of other technicalities in the estimation procedure which might yield small improvements.

But this is a very different type of modeling from what we see in the soccer analytics world today. That work seems largely focused on trying to make predictions based on individual actions on the field. The question that seldom seems to be answered in that work is “how well does the modeling predict the outcome of games”. That, I think, has to be the ultimate yardstick for the usefulness of soccer analytics.

 

 

 
