The perils of football rating systems

For those who like football and are of a nerdy persuasion, building a football ratings model is a noble pursuit. Intuitively, football (like many other sports) looks like a numerical puzzle that can easily be solved: certain patterns, like the percentage of home wins and goals per game, repeat year after year, and the same teams always seem to finish in similar positions.

Added to this, there are plenty of practical applications for rating systems, ranging from simply being able to say “I told you so” to fantasy football competitions and assessing value in the betting market. The first two uses are pretty benign, but the third can be dangerous and is subject to plenty of perils and pitfalls if used unwarily.

Football is not so easily tamed. Firstly, many outcomes are genuinely random (even if there appears to be an emerging pattern). Secondly, many factors that seem to affect a football match are really difficult to quantify – such as how teams play in different game states, which I discussed here. Some of the pitfalls of using rating systems to assess value in the market are:

Simple or complex?

Rating systems can range from the simple (using a few easily available variables) to the highly complex, utilising the masses of data now available about every action carried out on a football pitch.

For my projections and ratings I use a simple model based on goals, shots on target and shots (discussed here). This does correlate well from season to season, but takes no account of factors such as injuries, suspensions or tactical style. Complex models can allow for these factors, but the more complex a model the more difficult it is to understand the results it produces – particularly if it’s using factors that are difficult to quantify.

Short term or long term?

Many ratings systems simply take the average of values over a number of matches. But what number? Going back over the last 6-8 matches has the advantage of allowing for current form, but can be quite volatile and heavily dependent on luck. It also has the undesirable property that a team’s rating can be heavily influenced by a single result within the short period. For example, if a 5-0 win drops out of the averaging window from one week to the next, the team’s rating is likely to go down even if they won their latest match.
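
To make that last point concrete, here is a minimal sketch of a 6-match rolling average of goal difference. The results are made up for illustration, and the helper name rolling_average is my own; it is not the method behind any particular rating system.

```python
def rolling_average(values, window):
    """Mean of the most recent `window` values, computed at each match
    once at least `window` matches have been played."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical goal differences for one team, oldest match first.
# The first match is a 5-0 win; the latest match is a 1-0 win.
goal_diffs = [5, 0, -1, 1, 0, 0, 1]

ratings = rolling_average(goal_diffs, window=6)
before, after = ratings[-2], ratings[-1]

print(f"Rating before latest match: {before:.2f}")
print(f"Rating after latest match:  {after:.2f}")
```

The team wins its latest match, yet the rating falls, because the 5-0 win has dropped out of the window and been replaced by a 1-0 win.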

I tend to prefer averaging over the long term (for my adjusted-goals rating, over the current and previous season). This is less volatile but reacts slowly to genuine changes in form. However, in many leagues teams’ long-term performance stays relatively stable: for most of the top leagues (e.g. the English Premier League), teams’ performance correlates well from one season to the next. Using a long-term average also takes better advantage of contrarian betting, i.e. betting against the weight of money in the market (which may artificially increase the odds, due to the market overreacting to short-term factors). When back-testing ratings systems I’ve found that long-term ratings usually outperform short-term ones, but I haven’t found a way of objectively identifying genuine short-term performance improvements.

Be wary of being too good

It’s reasonably easy to build a rating system that gets close to the market. And it’s tempting to believe that, because the model allows for numerous factors and gets close to the market for most outcomes, any value it identifies is due to the model outsmarting the market. That’s generally not the case. In fact it’s probably doing the unhappy opposite: identifying where the model has not adjusted enough, i.e. simply finding poor value.
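
As a rough illustration of what “value” means here, the sketch below converts decimal odds into implied probabilities and compares them with a model’s probabilities. The odds and model numbers are invented, the function names are my own, and removing the bookmaker’s margin by proportional scaling is just one simple assumption among several possible methods.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds to probabilities, scaling them to sum to 1.

    Proportional scaling is an assumed, simple way of removing the
    bookmaker's margin; other methods exist.
    """
    raw = [1 / o for o in decimal_odds]
    total = sum(raw)
    return [p / total for p in raw]


def expected_value(model_prob, decimal_odds):
    """Expected profit per unit staked if the model probability were true."""
    return model_prob * decimal_odds - 1


# Hypothetical home / draw / away odds and model probabilities.
odds = [2.10, 3.40, 3.60]
model = [0.50, 0.26, 0.24]

market = implied_probabilities(odds)
for label, m, p, o in zip(["home", "draw", "away"], market, model, odds):
    print(f"{label}: market {m:.3f}, model {p:.3f}, EV {expected_value(p, o):+.3f}")
```

When the model sits this close to the market, an apparent edge of a few percent is well within the range a small modelling error could produce, which is exactly why it deserves suspicion.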

Converting to match probabilities

So you’ve got your team ratings spot on. How do you convert them to match probabilities? There are a number of ways to achieve this, e.g. regression analysis on home wins, draws and away wins, or applying a Poisson distribution to individual scores. But the danger is that any inaccuracy in this translation will be exploited by the market. For example, overestimating home-field advantage would have caused significant losses for EPL matches so far this season, even if the actual team ratings were a true reflection of ability.
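
The Poisson route can be sketched as follows. This is a minimal version that assumes each team’s goals follow independent Poisson distributions, which is a simplification; the expected-goals figures are hypothetical, and how you derive them from team ratings (including the home advantage) is exactly the step where errors creep in.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return lam ** k * exp(-lam) / factorial(k)


def match_probabilities(home_goals_exp, away_goals_exp, max_goals=10):
    """Home win, draw and away win probabilities from expected goals.

    Assumes the two teams' scores are independent. Scorelines above
    max_goals are ignored and the result is rescaled to sum to 1.
    """
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_goals_exp) * poisson_pmf(a, away_goals_exp)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    total = home + draw + away
    return home / total, draw / total, away / total


# Hypothetical expected goals; the home figure includes home advantage.
base = match_probabilities(1.5, 1.2)
# The same match with the home advantage overestimated by 0.2 goals.
inflated = match_probabilities(1.7, 1.2)

print("home/draw/away:", [round(p, 3) for p in base])
print("with inflated home advantage:", [round(p, 3) for p in inflated])
```

Comparing the two outputs shows how a modest error in the home-advantage assumption shifts all three probabilities, even though the underlying team ratings are unchanged.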

The market changes

In the rare event that a rating system outperforms the market, the advantage is unlikely to last. With so much information and competition, the market is near perfect, so any edge tends to disappear quickly. This also needs to be borne in mind when analysing a model’s effectiveness using past data.


Having said all this, it might be tempting to think that building a football ratings model is pointless. Surely, in a world where so much information is available and data analytics is big business, any attempt to build something that might outperform what’s already out there is futile?

But that’s definitely not the case. Firstly, building a ratings model is fun (for nerdy types like me). Secondly, any model, even if far from perfect, helps understanding and learning. And thirdly, as Leicester, and particularly Chelsea, have shown us this season, the market and most existing ratings models can get things very, very wrong.
