I have seen the World Football Elo rating system referred to a couple of times during the World Cup. People seem to think that it is much better than the FIFA ranking system, although I am not sure that is a high bar. Nevertheless, I became curious, so I decided to look it up. That was easier said than done, and this post became much longer than planned, but here is the result.
According to the football Elo ratings website and Wikipedia, the Elo rating system was developed by Arpad Elo to rank chess players, and adapted to football by Bob Runyan in 1997. The website gives clear formulas for computing the actual ratings, but says nothing about the motivation behind the choice of parameter values. A key idea is that it is a zero-sum game: the points one team gains are the points the other loses, and they are distributed on the basis of both the actual result and the strength difference between the teams. Here is how it works:
- Start from some rating R0. After a match, the new ratings are calculated as:
R = R0 + K G (W - We)
- Results in more important matches count more, parameterized by K:
K is the weight constant for the tournament played:
60 for World Cup finals;
50 for continental championship finals and major intercontinental tournaments;
40 for World Cup and continental qualifiers and major tournaments;
30 for all other tournaments;
20 for friendly matches.
K governs both the weights given to different matches, and how much the outcome of the last match counts relative to the previous rating.
- Goal difference also counts, and is taken into account in the following way:
G = 1 if the match is a draw or is won by one goal
G = 3/2 if the match is won by two goals
G = (11+N)/8 if the match is won by goal difference N and N is three or more
- “W is the result of the game (1 for a win, 0.5 for a draw, and 0 for a loss).”
- “We is the expected result (win expectancy), either from the chart or the following formula: We = 1 / (10^(-dr/400) + 1), dr equals the difference in ratings plus 100 points for a team playing at home.”
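To make the pieces above concrete, here is a minimal Python sketch of a single rating update, written directly from the formulas quoted above. The function names are my own and not from the Elo ratings website. I also assume that the home advantage is passed as +100 for the home team, -100 for the away team and 0 on neutral ground, so that the two win expectancies sum to one and the update stays zero-sum.

```python
def goal_factor(goal_diff: int) -> float:
    """G: multiplier based on the absolute goal difference."""
    n = abs(goal_diff)
    if n <= 1:
        return 1.0          # draw or one-goal win
    if n == 2:
        return 1.5          # two-goal win
    return (11 + n) / 8     # three goals or more


def win_expectancy(rating: float, opponent_rating: float,
                   home_bonus: float = 0.0) -> float:
    """We = 1 / (10^(-dr/400) + 1), with dr including any home bonus."""
    dr = rating + home_bonus - opponent_rating
    return 1.0 / (10 ** (-dr / 400) + 1.0)


def update_rating(rating: float, opponent_rating: float,
                  goals_for: int, goals_against: int,
                  k: float, home_bonus: float = 0.0) -> float:
    """R = R0 + K * G * (W - We) for one team after one match."""
    if goals_for > goals_against:
        w = 1.0
    elif goals_for == goals_against:
        w = 0.5
    else:
        w = 0.0
    g = goal_factor(goals_for - goals_against)
    we = win_expectancy(rating, opponent_rating, home_bonus)
    return rating + k * g * (w - we)


# Hypothetical example: two teams rated 1800 meet on neutral ground
# in a World Cup finals match (K = 60); the first team wins 1-0.
winner = update_rating(1800, 1800, 1, 0, k=60)
loser = update_rating(1800, 1800, 0, 1, k=60)
print(winner, loser)
```

With equal ratings and no home advantage the win expectancy is 0.5, so the winner gains 60 × 1 × 0.5 = 30 points and the loser gives up the same 30.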
The last one gave me a bit of a hard time. After consulting the book Who’s #1?: The Science of Rating and Ranking (2012) by Langville and Meyer and the paper The predictive power of ranking systems in association football (2013) by Lasek, Szlávik and Bhulai, I learnt the following:
We is the expected win probability and results from assumptions about the performance distributions of the two teams, which themselves depend on the prior ratings. To calculate this probability one needs to assume performance distributions for the two teams. It appears that, to make it a little simpler, what is often assumed is a distribution of the performance differential directly. The chess system for some reason assumes that the win probability is a logistic function (to the base 10) of the difference in ratings. Since that function is 1 / (10^(-x) + 1), with x = dr, one can see that we are approaching the expression we are aiming at. But where does the constant 400 come from? Also from the chess rating system. The value of that constant governs the variance of the distribution. If it is “really small” there is little variance and the player/team with the higher rating will almost always win, while if it is “really big” the lower rated team will be lucky and win more often. (In the limit as it approaches infinity, any match will be a 50-50 toss-up, no matter the rating difference.) The home team advantage of 100 rating points looks a little like it is taken out of thin air.
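To get a feeling for what the constant does, here is a small Python sketch that evaluates the win expectancy for a fixed rating difference under a few different scale constants. The function name and the alternative values 100 and 1600 are my own choices for illustration; only 400 is actually used by the system.

```python
def win_expectancy(dr: float, scale: float = 400.0) -> float:
    """Base-10 logistic win expectancy with a configurable scale constant."""
    return 1.0 / (10 ** (-dr / scale) + 1.0)


# Win expectancy of a team rated 200 points above its opponent,
# for a small, the standard, and a large scale constant.
for scale in (100, 400, 1600):
    print(scale, round(win_expectancy(200, scale), 3))
```

For a 200-point advantage this gives roughly 0.99 with a scale of 100, 0.76 with the standard 400, and 0.57 with 1600, which matches the intuition above: the larger the constant, the closer every match gets to a coin flip.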
Langville and Meyer praise the Elo system for the flexibility it gives in the choice of the K parameter and the scale (variance) parameter in the win probability distribution, since it means users can tailor the system to their particular purpose. Given that, it does seem a little peculiar that the football system has simply copied most of the chess system.
Without any documentation, the parameter values seem like a haphazard mix of some things kept from the chess rating system and some just made up. That is not to criticize the creator Runyan or people who use it, and apparently the system is performing quite well, but it would be nice to know if there was some specific thought behind the choices made.
For those interested to learn more, I can recommend the two references above.