Leicester City’s season-start title odds of 5000 to 1 have been liberally used by the media to illustrate the magnitude of their success. But do these odds represent the genuine likelihood of their achievement? Certainly, the very fact that bookies were prepared to offer this price shows that they rated Leicester’s chances as no better than 5000 to 1 (given that they usually incorporate a margin). However, in reality, these odds were probably set at the “it will never happen” default price, for supporters wanting a fun punt on their team, and so weren’t subject to rigorous risk analysis.
So how do we assess the likelihood of a long-shot like Leicester City winning the league? The usual starting point for predicting future likelihood is past performance – this applies just as much to sporting markets as it does to financial markets. And past performance has been extremely reliable for predicting the Premier League (which I’ll discuss later).
But there’s an inherent problem with using limited past data to understand the chances of a long-shot like Leicester because, by their nature, long-shots don’t happen very often. Say Leicester’s chances really were 5000 to 1; we only have 23 full seasons of Premier League data, and even if we extended back to the old First Division that only gives us 127 seasons in total to analyse. That is nowhere near enough to confirm a 100 to 1 likelihood, let alone 5000 to 1.
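To make the sample-size point concrete, here is a minimal sketch of how likely it is to see no occurrence at all of a long-shot event over a given number of seasons. It assumes, purely for illustration, that each season is an independent trial with the same fixed probability; the function name `prob_no_occurrence` is hypothetical.

```python
def prob_no_occurrence(odds_against: float, seasons: int) -> float:
    """Chance of seeing zero occurrences in `seasons` trials.

    Assumption: each season is independent and the event has the same
    probability every season, namely 1 / (odds_against + 1).
    """
    p = 1.0 / (odds_against + 1.0)
    return (1.0 - p) ** seasons


if __name__ == "__main__":
    # 23 = Premier League seasons, 127 = total seasons quoted in the text.
    for odds in (100, 1000, 5000):
        for n in (23, 127):
            prob = prob_no_occurrence(odds, n)
            print(f"{odds} to 1 over {n} seasons: P(never happens) = {prob:.3f}")
```

Under these assumptions, even a 100 to 1 event has roughly a 28% chance of never showing up in 127 seasons, so its absence from the record tells us very little, and a 5000 to 1 event would be expected to leave almost no trace at all.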
Another problem with using past data is that prior conditions are never quite the same as conditions now – for example, tactics, fitness levels, medical skills and finances all change over time. This means that we can’t necessarily expect everything to be repeatable. The football dynamics of the Premier League era are generally assumed to be distinct from those of the old First Division due to the financial paradigm shift. And it’s established Premier League wisdom that only the elite rich clubs can win the title (or even compete for a top-4 place), backed up by a body of research demonstrating a high correlation between wages and performance.
And a glance at the Premier League tables since its inauguration in 1992 confirms how predictable it’s been, particularly in recent years. Prior to this season, 22 of 23 titles had been shared between 4 elite clubs: Manchester United, Chelsea, Arsenal and Manchester City. The 23rd was won by Blackburn Rovers, in the third PL season – seemingly benefitting from Jack Walker’s first mover advantage at bankrolling a football team. Even traditional giants such as Liverpool, Spurs and Everton haven’t won the Premier League title. All this helps form expectations (and betting odds).
Looking at past Premier League results in more detail, the correlation between consecutive seasons’ points is striking. The graph shows that in recent years, the correlation between teams’ points in consecutive seasons has been between 80% and 90% – it’s been incredibly predictable. But look at this season – it’s plunged to 51%, even lower than in the League’s early years.
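For readers who want to reproduce this kind of measure, here is a rough sketch of a season-to-season points correlation. It is not necessarily how the figures above were calculated: it assumes a plain Pearson correlation over teams that appear in both seasons (so promoted and relegated clubs are dropped), and the team names and points totals are made-up placeholders, not real results.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def season_correlation(prev_points, curr_points):
    """Correlate points across two seasons.

    Assumption: only teams present in both seasons are compared.
    Both arguments map team name -> final points total.
    """
    teams = sorted(set(prev_points) & set(curr_points))
    return pearson([prev_points[t] for t in teams],
                   [curr_points[t] for t in teams])


if __name__ == "__main__":
    # Hypothetical points totals, for illustration only.
    last_season = {"Team A": 87, "Team B": 79, "Team C": 64, "Team D": 41, "Team E": 33}
    this_season = {"Team A": 66, "Team B": 71, "Team C": 50, "Team D": 81, "Team F": 39}
    print(f"correlation = {season_correlation(last_season, this_season):.2f}")
```

A value close to 1 means last season’s table is a strong guide to this season’s; a value near 0 means it tells you almost nothing.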
This all illustrates how easy it is to fall into the trap of thinking that things will stay the same. This applies to modelers as well as bookmakers. There are many fantastic football modelers who regularly publish their results on Twitter, using a variety of methods to predict future outcomes. But their models are usually calibrated using previous years’ data. I use a model which relies heavily on long-term past performance to evaluate teams. In my model at the start of the season, only 5 of 5000 simulations resulted in Leicester winning the league, so roughly 1000 to 1, not 5000 to 1 – but still a ridiculously long shot. And even in January, when Leicester were second (only 2 points behind Arsenal), most models (including mine) had Leicester at around 4% to win the league [as demonstrated in @cchappas’s insightful comparison of modelers’ results http://statsbomb.com/2016/01/a-compilation-of-epl-model-predictions-after-round-2038/].
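As a side note on reading simulation counts like “5 of 5000”, the sketch below converts a count into odds and attaches a Wilson score interval to show how much the estimate could move through simulation noise alone. The helper names are hypothetical, the 95% level is an arbitrary choice, and the interval says nothing about whether the underlying model is right.

```python
from math import sqrt


def odds_against(wins: int, sims: int) -> float:
    """Express wins out of sims as 'X to 1' odds against."""
    p = wins / sims
    return (1.0 - p) / p


def wilson_interval(wins: int, sims: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% interval (an assumption
    made here for illustration).
    """
    p = wins / sims
    denom = 1.0 + z * z / sims
    centre = (p + z * z / (2 * sims)) / denom
    half = z * sqrt(p * (1.0 - p) / sims + z * z / (4 * sims * sims)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    wins, sims = 5, 5000
    low, high = wilson_interval(wins, sims)
    print(f"point estimate: about {odds_against(wins, sims):.0f} to 1")
    print(f"95% interval for the win probability: {low:.5f} to {high:.5f}")
```

With only 5 title wins in 5000 runs the interval is wide, somewhere between about 1 in 2,300 and 1 in 430, so “1000 to 1” is best read as an order of magnitude rather than a precise price.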
In the past, only the elite teams had managed to stay the Premier League course – so why would Leicester be able to compete with the big boys over the remainder of the season? So, back to the question: were Leicester’s chances genuinely 5000 to 1 (or even 1000 to 1)? I don’t think they were that unlikely (easy to say in hindsight!), although they were definitely an enormous long shot. But I believe that, more importantly, there are lessons to learn from Leicester’s victory that can be applied not just to sports prediction, but also to other predictive disciplines and risk assessment (such as financial companies reserving for “unlikely” events). These are:
- Genuinely unlikely events are difficult to quantify using limited data.
- In a changing world, don’t assume that things will stay the same just because they have over the last few years. Conversely, extending the data set (e.g. to include old First Division data) might improve the analysis.
- Small changes to the way things work might make a big difference to a particular outcome. Examples of changing circumstances that may (or may not) have increased Leicester’s (and other smaller teams’) chances are:
  - Improved PL prize money, meaning that even smaller teams can afford and retain top quality players
  - Increased use of analytics to inform performance and recruitment
  - Improved medical understanding and technology
- A combination of seemingly fortuitous factors working together can have a significant effect on chances – e.g. lack of injuries, refereeing decisions, under-performance of other teams.
So will I change my model to take account of this season? No, not yet. As the chart above shows, this season has been peculiar compared with recent seasons. A few teams, including Leicester (and also Chelsea and Spurs), are unrecognizable from last season. But looking back at the metrics, even in hindsight, I still can’t find anything that identifies Leicester’s long-term over-performance or Chelsea’s long-term under-performance. I think (for the reasons above) that there’s a reasonable likelihood that Premier League clubs outside the elite will continue to close the gap – but it’ll take a few years to properly test this view.