As the curtain falls on another Premier League season – leaving only sporadic finals and tired internationals, followed by the football-free wasteland of summer sport that comes with a year ending in an odd number – it's a good moment to reflect on and review prediction models.
This is the first part of my analysis, where I'll look at the performance of my pre-season prediction model. I use a simple shot-based model to assess teams – it's explained here – but to summarise, it works out each team's attacking and defensive strength by applying factors to goals, shots and shots on target, producing a rating I call "adjusted goals":
Adjusted goals (for and against) = 45.0% × Goals + 2.8% × Shots + 8.4% × Shots on target
This isn't as sophisticated as Expected Goals models, but it's relatively simple to calculate and the data needed is easy to access (e.g. from the excellent football-data.co.uk). It also performs fairly well for future predictions.
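As a quick illustration, the calculation is trivial to implement – a minimal Python sketch (the function name and per-game inputs are my own; only the coefficients come from the formula above):

```python
# A minimal sketch of the "adjusted goals" calculation. The function name
# and per-game framing are illustrative; the coefficients are from the
# formula above, and the raw data can come from football-data.co.uk.

def adjusted_goals(goals: float, shots: float, shots_on_target: float) -> float:
    """Blend goals, shots and shots on target into a single rating."""
    return 0.450 * goals + 0.028 * shots + 0.084 * shots_on_target

# e.g. a team averaging 1.5 goals, 14 shots and 5 on target per game:
adjusted_goals(1.5, 14, 5)  # ≈ 1.49 adjusted goals per game
```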
I used each team's adjusted goals rating at the beginning of the season to simulate the final Premier League table, with the promoted teams' ratings adjusted to reflect the step up to a stronger league.
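In outline, the simulation works along these lines – a simplified sketch, assuming Poisson goal expectations per fixture, with illustrative ratings and a notional home-advantage factor rather than the exact numbers used:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Illustrative ratings (attack/defence in adjusted goals per game) for a
# cut-down league; the real model covers all 20 Premier League teams.
teams = {
    "Spurs":   {"att": 1.9, "def": 0.8},
    "Chelsea": {"att": 1.7, "def": 1.0},
    "Hull":    {"att": 1.0, "def": 1.6},
}
LEAGUE_AVG = 1.35  # notional league-average goals per team per game
HOME_ADV = 1.25    # notional home-advantage multiplier

def simulate_season(n_sims: int = 10_000) -> dict:
    """Monte Carlo the double round-robin; return average points per team."""
    points = dict.fromkeys(teams, 0)
    for _ in range(n_sims):
        for home, away in itertools.permutations(teams, 2):
            # Scale each side's attack by the opponent's defence.
            mu_home = HOME_ADV * teams[home]["att"] * teams[away]["def"] / LEAGUE_AVG
            mu_away = teams[away]["att"] * teams[home]["def"] / LEAGUE_AVG
            gh, ga = rng.poisson(mu_home), rng.poisson(mu_away)
            if gh > ga:
                points[home] += 3
            elif ga > gh:
                points[away] += 3
            else:
                points[home] += 1
                points[away] += 1
    return {team: total / n_sims for team, total in points.items()}

print(simulate_season())
```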
I wrote up my pre-season predictions here. For the first time, I also adjusted for player changes and other known factors that aren't reflected in retrospective data – such as whether a team is playing in European competition.
Here’s how the model fared.
I've compared the model's performance against Sporting Index pre-season mid-points, and also against each team's points total from last season (other than for promoted teams).
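For anyone wanting to replicate the comparison, mean absolute points error is a simple yardstick – a sketch, with illustrative variable names and no real data:

```python
import numpy as np

def mean_abs_points_error(predicted: dict, actual: dict) -> float:
    """Average absolute error between predicted and final points totals."""
    return float(np.mean([abs(predicted[t] - actual[t]) for t in predicted]))

# Usage (dictionaries map team name -> points; names are placeholders,
# not the real 2016/17 figures):
# mean_abs_points_error(model_points, final_points)
# mean_abs_points_error(sporting_index_midpoints, final_points)
# mean_abs_points_error(last_season_points, final_points)  # naive baseline
```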
Overall the model performed well, with one exception – Chelsea. Chelsea have proved perplexing for data-based models over the last couple of seasons, with their points total swinging from 87 in 2014/15 down to 50 in 2015/16 and back up to 93 in 2016/17 – without much player turnover.
I did adjust Chelsea's rating, to account for the fact they weren't playing in Europe and for player and management changes – but it clearly wasn't enough. The market (Sporting Index) priced in a much greater improvement. The key question is what enhancements would allow the model to capture this kind of variation. Possibilities include making some allowance for teams' wage bills or taking more account of new players.
For example, how important was N'Golo Kante to Chelsea's 43-point increase? Could this have been anticipated pre-season? He was certainly highly influential in Leicester's title success the previous season, and the data shows he made the most tackles in the league – but the difficulty is objectively assessing how that would translate to an already strong Chelsea midfield. Would Idrissa Gueye or Lucas Leiva (the second and third most prolific tacklers in 2015/16) have had a similar effect on Chelsea's capabilities?
Sometimes good subjective judgement – e.g. knowledge of the state of mind of Chelsea's players and management in 2015/16, or of the capabilities of Conte and Kante in 2016/17 – can supplement a data-based model.
But the model did identify strong value in Spurs, both to win the league and to finish in the top 4. On my Adjusted Goals rating, Spurs have outperformed the rest of the league over the last two seasons – and this season they were significantly stronger than everyone else in both attack and defence. I think this was genuine value, with the market giving too much weight to the spending power of their big-six rivals.
[Chart: Adjusted Goals Rating 2016/17]
Assessing Tottenham for next season will be tricky. Based on this season I rate them as clearly the best team – but player departures may change that, and it's hard to know how to account for them playing their home matches at Wembley.
The model also identified some value in Liverpool for the top 4, which they achieved, though the value wasn't as clear as it was for Spurs. I made four pre-season bets based on the model – Spurs and Liverpool for the title, and Spurs and Liverpool for the top 4. The last two came off.
Otherwise, the model did fairly well. It didn't expect Sunderland and Middlesbrough to be quite as bad as they were – but neither did the market. The model also underestimated West Brom.
The pre-season allowances, made on top of last season's data-driven "Adjusted Goals" ratings and squad changes, were:

| Team | Addition to rating | Reason |
| --- | --- | --- |
| Chelsea | +0.35 | No European competition & some allowance for possible anomaly last season |
| Hull | -0.30 | Current uncertainty |
| Leicester | -0.20 | Champions League |
| Liverpool | +0.20 | No European competition |
| Southampton | -0.20 | Europa League |
| West Ham | -0.20 | Europa League |
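In code, applying these allowances is straightforward – a sketch, where the adjustment values mirror the table above and the base ratings dictionary is illustrative:

```python
# The allowance values mirror the table above; the base ratings passed in
# would be the data-driven "adjusted goals" ratings for each team.
MANUAL_ADJUSTMENTS = {
    "Chelsea": 0.35, "Hull": -0.3, "Leicester": -0.2,
    "Liverpool": 0.2, "Southampton": -0.2, "West Ham": -0.2,
}

def apply_adjustments(base_ratings: dict) -> dict:
    """Add each team's allowance to its data-driven rating."""
    return {team: rating + MANUAL_ADJUSTMENTS.get(team, 0.0)
            for team, rating in base_ratings.items()}
```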
Interestingly, it was right to make an adjustment in every case – but none of the adjustments were large enough!
So one of the key lessons from the analysis is that external factors not reflected in retrospective data are significant – but difficult to allow for effectively. Ideally I need to find a more objective way to build these into the model.
The next part will assess individual match predictions for 2016/17.