Too much information!

A key part of predicting a match outcome is deciding which information is relevant and which isn’t. Nowadays there are abundant sources of football data, which can help model the likelihood of a particular event more accurately. The situation is certainly vastly better than 20 or 30 years ago, when the league table was pretty much all you had to go on. However, too much information can often be a hindrance, because it makes it difficult to focus on what’s important.

It’s common for bookmakers, or football stats sites, to tweet snippets of “helpful” information about a team’s recent record. For example, team A has conceded first in 8 of its last 10 matches, or 11 out of 15 of team B’s home matches have seen more than 2 goals. Is this information helpful or relevant? It might be, but if I were to use it for betting, there are 3 questions I’d need to answer first:

  1. Are there reasons for the recent form trend – e.g. tactics, opponent difficulty, injuries?
  2. If so, will these factors still apply for the next match?
  3. Do the match odds properly take account of these factors?
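The third question is essentially a check for value: does your own estimate of the probability beat the one implied by the price? Below is a minimal sketch. It assumes decimal odds and an estimate you have already reached through the research above. The odds and the 45% figure are purely illustrative, and the implied probability ignores the bookmaker's margin.

```python
def implied_probability(decimal_odds: float) -> float:
    """Probability implied by decimal odds (ignores the bookmaker's margin)."""
    return 1.0 / decimal_odds


def expected_value(stake: float, decimal_odds: float, my_probability: float) -> float:
    """Expected profit of a bet, given your own estimate of the win probability."""
    profit_if_win = stake * (decimal_odds - 1.0)
    return my_probability * profit_if_win - (1.0 - my_probability) * stake


# Illustrative numbers only: decimal odds of 2.50 on "over 2.5 goals" and a
# hypothetical 45% estimate reached after researching the trend.
odds, my_p = 2.50, 0.45
print(f"implied probability: {implied_probability(odds):.3f}")
print(f"expected profit per unit staked: {expected_value(1.0, odds, my_p):+.3f}")
```

Here the price implies 40%. If the 45% estimate is sound, the bet has positive expected value. If the trend is already priced in, the two figures converge and the value disappears.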

Answering these takes significant research, or expertise in the particular teams, but without the answers there’s no way of assessing value. This article is a quick review of where the data I’ve looked at for football analysis is available, and how useful I think each source is.

  1. Football-data.co.uk

This is still my favourite site for football analysis, and the source of data for all my team ratings. It contains CSV files of matches for 11 different European countries, covering complete league seasons going back to 1993. Each match has plentiful data, such as goals, shots, shots on target and half-time score, as well as odds from numerous bookmakers. This makes it a great source for analysing the factors that indicate a team’s strength and comparing them against the market.
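Because the files are plain CSV, a few lines of Python are enough to turn them into a team-strength indicator. Below is a minimal sketch using only the standard library.

- The rows are made up, but they follow the site's column layout: HomeTeam, AwayTeam, and HST/AST for home/away shots on target, as described in the site's notes file (worth checking for the season you download).
- The file name E0.csv in the comment is an assumption.
- The shots-on-target ratio is just one illustrative indicator.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the football-data.co.uk column layout.
# For a real season, replace io.StringIO(SAMPLE) with open("E0.csv", newline="")
# (file name assumed; check the site for the league you want).
SAMPLE = """HomeTeam,AwayTeam,FTHG,FTAG,HST,AST
Team A,Team B,2,1,6,3
Team B,Team C,0,0,2,4
Team C,Team A,1,3,3,7
"""


def shots_on_target_ratio(rows):
    """Share of all shots on target in a team's matches that the team itself took."""
    sot_for = defaultdict(int)
    sot_against = defaultdict(int)
    for r in rows:
        home, away = r["HomeTeam"], r["AwayTeam"]
        hst, ast = int(r["HST"]), int(r["AST"])
        sot_for[home] += hst
        sot_against[home] += ast
        sot_for[away] += ast
        sot_against[away] += hst
    ratios = {}
    for team in sot_for:
        total = sot_for[team] + sot_against[team]
        if total:  # skip teams with no shots on target either way
            ratios[team] = sot_for[team] / total
    return ratios


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for team, ratio in sorted(shots_on_target_ratio(rows).items()):
    print(f"{team}: {ratio:.2f}")
```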

Good points

  • It’s free!
  • Detailed historical and current match data
  • Available in a format for easy analysis
  • Simple to navigate and advertising not overly intrusive

Shortfalls

  • No player data
  • Only includes league data

  2. soccerstats.com

Another good source of free match and team data, which allows simple analysis of scores and trends across many leagues.

Good points

  • Free
  • Detailed historical and current match data
  • Simple to navigate and no advertising

Shortfalls

  • No player data
  • Data not as easy to extract for analysis, but the site summarises it well

  3. whoscored.com

This is the most comprehensive free football data site, containing detailed team and player information. It’s not the easiest to navigate, but the “detailed” player tab allows analysis of items such as shot zones and pass types. Individual match reports are good, with impressive visualisations. It also provides good match previews and reports on injured players.

Good points

  • Free
  • Detailed player data
  • Detailed historical and current match data
  • Superb match reports and previews

Shortfalls

  • It’s not easy to quickly get to the information you need
  • Intrusive advertising
  • Data not easily accessible for analysis

  4. fantasyfootballscout.co.uk

As the name suggests, this site focuses on fantasy football, but the members’ site (currently £12.50 for the season) provides vast amounts of player and team data. It covers only the Premier League. Its big advantage over whoscored.com is that you can create your own data tables, and it includes more data points. It also publishes articles with great insight into team and player form and tactics, far better than anything I’ve seen on betting-focused sites.

Good points

  • Detailed player data
  • Ability to create bespoke data tables
  • Good tactical insight

Shortfalls

  • Paid for (although relatively cheap)
  • Premier League only

  5. footballformlabs.com

This is a subscription database (currently £200 p.a. with an offer code), but I’ve recently taken up a 14-day free trial. The site covers many leagues and gives access to what must be an extensive database of historical match results. It lets the user run queries on how teams play in different situations, e.g. at home against bottom-6 opposition.
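A query like this can be approximated from free results data with a little work. Below is a minimal sketch, assuming you have a list of results and a league table. The `Result` type, the `home_ppg_vs_bottom` helper, the team names and the scores are all hypothetical. The example uses a tiny 8-team league, so it looks at the bottom 3 rather than the bottom 6.

```python
from dataclasses import dataclass


@dataclass
class Result:
    home: str
    away: str
    home_goals: int
    away_goals: int


def points(goals_for: int, goals_against: int) -> int:
    """Standard 3-1-0 league points."""
    if goals_for > goals_against:
        return 3
    return 1 if goals_for == goals_against else 0


def home_ppg_vs_bottom(team, results, positions, n_bottom=6):
    """Home points per game for `team` against the bottom `n_bottom` sides.

    `positions` maps team name -> league position (1 = top).
    Returns None if there are no qualifying games.
    """
    cutoff = len(positions) - n_bottom  # positions greater than this are "bottom n"
    games = [r for r in results if r.home == team and positions[r.away] > cutoff]
    if not games:
        return None
    return sum(points(r.home_goals, r.away_goals) for r in games) / len(games)


# Hypothetical 8-team league and results.
positions = {t: i for i, t in enumerate("ABCDEFGH", start=1)}
results = [
    Result("A", "F", 2, 0),  # bottom-3 opponent: win
    Result("A", "G", 1, 1),  # bottom-3 opponent: draw
    Result("A", "B", 0, 1),  # top side, excluded
    Result("A", "H", 0, 1),  # bottom-3 opponent: loss
]
print(home_ppg_vs_bottom("A", results, positions, n_bottom=3))
```

The sketch uses the current table to define the bottom sides. This is a simplification, because an opponent's position at the time of the match may have been quite different.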

Much of this information is already accessible on Football-data.co.uk (after a bit of work) or www.soccerstats.com, but there are a couple of features I don’t think are available elsewhere. The first is player analysis, which allows queries about how a team has performed with or without certain players in the side. This lets you assess the historical impact of different line-ups and formations, which is useful. However, the player data only includes information such as points per game when a player has and hasn’t played, not individual statistics such as goals, shots and tackles.
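The with/without comparison is easy to reproduce once you have line-up data. Below is a minimal sketch, assuming a hypothetical list of (points won, starting players) pairs for one team. The player names and results are invented. The function returns the sample sizes alongside the averages, because a handful of games without a player is rarely enough to draw firm conclusions.

```python
def ppg_with_without(matches, player):
    """Points per game with and without `player` in the starting line-up.

    `matches` is a list of (points_won, starters) pairs for one team, where
    `starters` is a set of player names (hypothetical data structure).
    Returns (ppg_with, ppg_without, games_with, games_without); an average is
    None when there are no games in that group.
    """
    with_p = [pts for pts, starters in matches if player in starters]
    without_p = [pts for pts, starters in matches if player not in starters]

    def avg(xs):
        return sum(xs) / len(xs) if xs else None

    return avg(with_p), avg(without_p), len(with_p), len(without_p)


# Hypothetical season extract: points won and starting players.
matches = [
    (3, {"Keeper", "Striker", "Playmaker"}),
    (1, {"Keeper", "Striker"}),
    (3, {"Keeper", "Playmaker"}),
    (0, {"Keeper", "Striker"}),
    (3, {"Keeper", "Striker", "Playmaker"}),
]
w, wo, n_w, n_wo = ppg_with_without(matches, "Playmaker")
print(f"with: {w:.2f} ppg over {n_w} games; without: {wo:.2f} ppg over {n_wo} games")
```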

The second is in-play analysis, which allows sophisticated analysis of match outcomes from a specific match position, e.g. a top-6 team losing 1-0 to a bottom-6 team after 30 minutes. This could be helpful for in-play betting.
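Queries like this need goal times as well as final scores. Below is a minimal sketch under these assumptions:

- Match records are hypothetical and take the form (home, away, goals).
- `goals` is a list of (minute, side) events, where side is "H" or "A".
- The teams and goals are made up.
- The scenario is restricted to a top-6 home team trailing a bottom-6 visitor 0-1 after 30 minutes.

```python
from collections import Counter


def score_at(goals, minute):
    """Score (home, away) at `minute`, given (minute, side) goal events."""
    home = sum(1 for m, side in goals if side == "H" and m <= minute)
    away = sum(1 for m, side in goals if side == "A" and m <= minute)
    return home, away


def outcomes_from_state(matches, minute, state, include=lambda home, away: True):
    """Count final results (H/D/A) of matches whose score at `minute` equals `state`."""
    counts = Counter()
    for home, away, goals in matches:
        if include(home, away) and score_at(goals, minute) == state:
            h, a = score_at(goals, float("inf"))  # final score: all goals
            counts["H" if h > a else "D" if h == a else "A"] += 1
    return counts


# Hypothetical league groupings and matches.
top6 = {"Big FC"}
bottom6 = {"Small Utd", "Minnows"}
matches = [
    ("Big FC", "Small Utd", [(12, "A"), (55, "H"), (80, "H")]),  # 0-1 at 30', won 2-1
    ("Big FC", "Minnows", [(25, "A")]),                          # 0-1 at 30', lost 0-1
    ("Big FC", "Small Utd", [(5, "H")]),                         # 1-0 at 30', excluded
    ("Small Utd", "Minnows", [(20, "A")]),                       # home side not top 6
]
result = outcomes_from_state(
    matches, 30, (0, 1), include=lambda h, a: h in top6 and a in bottom6
)
print(dict(result))
```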

This site is expensive, but I can see it being helpful for someone with the time to carry out the additional research needed to validate the trends and analysis it identifies.

Good points

  • In-play queries
  • Player queries, allowing assessment of the impact of different players being included or missing

Shortfalls

  • Expensive
  • League matches only
  • Much of the data is available free elsewhere

Other sites

Other sources worth mentioning include www.transfermarkt.co.uk, which has good player information that is often more easily accessible than on whoscored.com. Squawka.com is similar to Whoscored.com, but I never use it because I find Whoscored comprehensively better. Finally, oddsportal.com is another good source of historical odds.

Final thoughts

Frustratingly, no single source provides everything needed for robust analysis of match outcome likelihoods. However, with a bit of work, the sources above can support a decent assessment. Sadly, not even Whoscored.com provides the full set of data that’s captured within the industry. So, for example, it’s impossible to build anything other than rudimentary expected goals models.
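With only shots and shots on target, about the best you can do is a crude proxy. Multiply a team's shots on target by the league-wide goals-per-shot-on-target rate. Below is a minimal sketch, assuming football-data.co.uk-style columns (FTHG/FTAG for full-time goals, HST/AST for shots on target) and made-up rows. It is not a true expected goals model, which would also need shot location and type.

```python
def conversion_rate(rows):
    """League-wide goals per shot on target.

    Goal totals include own goals and penalties, so this is only approximate.
    """
    goals = sum(int(r["FTHG"]) + int(r["FTAG"]) for r in rows)
    sot = sum(int(r["HST"]) + int(r["AST"]) for r in rows)
    return goals / sot


def crude_xg(shots_on_target, rate):
    """Shots-on-target proxy for expected goals (not a location-based xG model)."""
    return shots_on_target * rate


# Made-up rows in the football-data.co.uk column layout.
rows = [
    {"FTHG": "2", "FTAG": "1", "HST": "6", "AST": "3"},
    {"FTHG": "0", "FTAG": "0", "HST": "2", "AST": "4"},
    {"FTHG": "1", "FTAG": "3", "HST": "3", "AST": "7"},
]
rate = conversion_rate(rows)
for r in rows:
    print(f"home proxy xG {crude_xg(int(r['HST']), rate):.2f} vs goals {r['FTHG']}, "
          f"away proxy xG {crude_xg(int(r['AST']), rate):.2f} vs goals {r['FTAG']}")
```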

This makes it increasingly difficult to outperform the predictive models maintained by industry participants such as bookmakers, who have not only significant data analysis resources but also full proprietary data.
