
Blind betting on the World Cup? No!

By Zimo Wang, 25th Dec 2022
Translated by Chenyang Lou, 21st Oct 2023

Everyone was excited about the just-concluded World Cup and cheered for their favorite team. For some people, betting is also an indispensable part of the World Cup, and it involves a surprising amount of knowledge. While we root for a team, we can also pick up some rational thinking to use when the next tournament comes around in four years.

In this article, we focus on some basic probability models to see the mathematics behind football matches. While gathering information, we also found some problems with the tournament rules; if time permits, we will discuss them in a later article.

Note: If it's too long to read, you can skip directly to Section 3 or to the conclusion at the end of the article.

Section 1 Binomial Distribution

To discuss the probability of winning or losing a game, we can start with the simplest model, the binomial distribution. Its simplest form is as follows:

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k = 0, 1, \dots, n$$

Let's define these letters first. A probability experiment cannot be done only once, otherwise the conclusion drawn would be unreliable. Therefore, we repeat the experiment n times, under identical conditions and without the trials interfering with each other. We call an event that may occur in a trial A, and the probability of A occurring in any single trial is p. X is the total number of times A occurs in the n trials, so X can take any integer value from 0 to n inclusive.

Here is a simple example: flipping a coin. Suppose we flip a coin 10 times, so n=10. We record the number of times the coin lands heads up, so event A = heads up. For a fair coin, the probability of heads on each flip is 50%, so p=50%. Suppose that after 10 flips we have seen heads 5 times (an idealized outcome, of course); then X=5. In general, X=k is the event that heads comes up exactly k times in the 10 flips. We can draw the distribution for this problem.

Figure 1-Probability distribution of the number of heads in 10 coin tosses

This graph shows the probability P(X=k) on the y-axis and k on the x-axis. (Strictly speaking, for a discrete variable this is a probability mass rather than a density, but you can simply read it as a probability.) Recall that we observed X=5, which corresponds to k=5. From the graph, we can see that if we flip a coin ten times, getting 5 heads (k=5) is the most likely outcome, which is consistent with common sense. What is the probability of getting all heads in ten flips? From the graph, we can see that it is very small; it corresponds to k=10. We can also calculate the probability for k=10 using the formula:

$$P(X = 10) = \binom{10}{10} (0.5)^{10} (1 - 0.5)^{0} = \frac{1}{1024} \approx 0.00098$$

The probability of this event is indeed quite small. To summarize, the binomial distribution tells us the probability of an event A occurring a certain number of times in n independent trials. Its shape is determined by only two parameters: n, the total number of coin flips, and p, the probability that the coin lands heads up on each flip. Therefore, we can write this distribution as:

$$X \sim B(n, p)$$

The “B” in the formula stands for “Binomial”; the distribution gets its name because each of its probabilities is one term in the binomial-theorem expansion of (p + (1 - p))^n.
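As a quick check of the coin example, here is a minimal Python sketch (not part of the original article) that evaluates the binomial formula for n=10 and p=0.5. The function name `binomial_pmf` is only an illustrative choice.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k occurrences of A in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5  # ten flips of a fair coin, as in the example
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# The most likely number of heads is the k with the largest probability.
most_likely_k = max(range(n + 1), key=lambda k: pmf[k])

for k, prob in enumerate(pmf):
    print(f"k={k:2d}  P(X=k)={prob:.4f}")
print("most likely k:", most_likely_k)
print("P(all heads):", pmf[10])
```

The list `pmf` holds the heights of the bars in a plot like Figure 1, so the same values could be passed to any plotting library.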

Section 2 Poisson Distribution

The Poisson distribution is built on the binomial distribution above, and it is a standard tool for modelling scores in ball games. First, imagine that we are still flipping a coin, but many more times, say 1000. We normalize these 1000 flips so that each one takes up a step of 1/n: the first flip happens at 1/1000, the second at 2/1000, and so on until the last at 1000/1000. We assume that the probability of heads on each flip is λ/n, so the probability of tails is 1-λ/n. After n trials (here n=1000), we get the binomial distribution:

$$P(X = k) = \binom{n}{k} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n - k}$$

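As the number of trials grows, that is n→∞, the steps become finer and the experiment more accurate, so we take the limit of the binomial probability:

$$\lim_{n \to \infty} \binom{n}{k} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n - k} = \lim_{n \to \infty} \frac{\lambda^k}{k!} \cdot \frac{n!}{(n - k)!\, n^k} \cdot \left(1 - \frac{\lambda}{n}\right)^{n - k}$$

This equation regroups the factors so that the limit is easier to take. As n→∞, the middle and right factors on the right-hand side become, respectively:

$$\frac{n!}{(n - k)!\, n^k} \to 1, \qquad \left(1 - \frac{\lambda}{n}\right)^{n - k} \to e^{-\lambda}$$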

Therefore, the binomial distribution approaches the Poisson distribution:

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$

Recall that when we built this binomial distribution, we set the probability of each trial to p=λ/n, so λ=pn, which is a pivotal conclusion: λ is the expected number of occurrences. With that, we can look at the probability graph for 1000 coin flips, where p=0.5 and n=1000, so μ (the expected value) = λ = 0.5*1000 = 500. (The Poisson approximation is most accurate when p is small; with p=0.5 it gets the mean right but is wider than the exact binomial.)

Figure 2-Poisson distribution with 1000 coin flips

As common sense suggests, the most likely result is about 500 heads in 1000 flips.
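To see the limit at work numerically, the following sketch (an illustration, not from the original article) compares the binomial probabilities with p=λ/n against the Poisson probabilities as n grows. The value λ=2 and the range of k are arbitrary assumptions chosen to keep p small.

```python
from math import comb, exp, factorial


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of exactly k events with rate lam."""
    return lam**k * exp(-lam) / factorial(k)


lam = 2.0             # assumed rate, chosen only for illustration
k_values = range(11)  # compare the two distributions for k = 0..10


def max_gap(n: int) -> float:
    """Largest pointwise difference between B(n, lam/n) and Poisson(lam)."""
    return max(
        abs(binomial_pmf(k, n, lam / n) - poisson_pmf(k, lam))
        for k in k_values
    )


gaps = {n: max_gap(n) for n in (10, 100, 1000, 10000)}
for n, gap in gaps.items():
    print(f"n={n:6d}  max |binomial - Poisson| = {gap:.2e}")
```

The gap shrinks roughly in proportion to 1/n, which is the numerical counterpart of the limit taken in this section.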

Section 3 Playing Soccer!

3.1 Simplest Poisson distribution

Finally, we reach the football section. To simulate football matches, we use the simplest Poisson distribution. First, write out a Poisson distribution:

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$

This distribution describes the number of goals a team scores in a match. Specifically, suppose team i plays team j, and team i scores a total of k goals in the match. For example, in the final between Argentina and France, if Argentina is team i, then k=2 (in regular time). The only parameter of the Poisson distribution is λ. The paper by Chater et al. (2021) provides the following formula for it:

$$\lambda_i = \alpha \cdot \frac{r_i}{r_i + r_j}$$

where α is a scaling constant that we do not need to worry about too much. The focus is on the two values r_i and r_j, which represent the strength of team i and team j. The paper uses Elo ratings for these, which are essentially a measure of each team's strength; Chater et al. highlight that they did not use the FIFA rankings because those are not that accurate (FIFA lol). The fraction r_i/(r_i+r_j) means that if team i is very strong, r_i is high, so the fraction is larger, λ is larger, and team i is more likely to score. The formula is also nuanced because it takes the opponent's strength into account: each team gets its share of the two teams' combined strength. It also follows that the two teams' λ values sum to:

$$\lambda_i + \lambda_j = \alpha \cdot \frac{r_i}{r_i + r_j} + \alpha \cdot \frac{r_j}{r_i + r_j} = \alpha$$

Therefore, α can be understood as the expected total number of goals in a game, and the Poisson model splits it between the two teams using their relative strength as weights. For example, if games averaged seven goals in total (say, a 3:4 result), α would be 7, and each team's expected score rises with its relative strength. Chater et al. hypothesized that α would not fluctuate much and would be relatively stable in the first two rounds of the group stage, because teams would not yet adopt extreme tactics. Their fitted value was α=2.5156 for the group stage. With this value, we can obtain the Poisson distribution of each team's score in any game as long as we know the teams' Elo ratings. Since this α is for the group stage, we can use it to plot distributions for group-stage matches. As an example, take Argentina's upset loss in the group stage, where Saudi Arabia beat Argentina 2-1. Argentina's and Saudi Arabia's Elo ratings were 2143 and 1643 respectively. Each team's λ is α times its share of the combined rating, which gives λ≈1.42 for Argentina and λ≈1.09 for Saudi Arabia, so we can calculate the Poisson distribution of goals scored by each team separately.

Figure 3-Poisson distribution for Argentina (left) Saudi Arabia (right) at the group stage

Therefore, we see that the probability of Saudi Arabia scoring 2 goals is small, only about 20%. According to the model, the most likely score for this game is 1:1, followed by 1:0 to Argentina, not 1:2. This only shows that football is always more fun than probability.
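The numbers in this subsection can be reproduced with a short sketch. It assumes that λ for each team is α times that team's share of the combined Elo rating, as described above, and uses the ratings and α quoted in the text. The function and variable names are illustrative only.

```python
from math import exp, factorial

ALPHA_GROUP_STAGE = 2.5156  # fitted group-stage value quoted in the text


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of scoring exactly k goals with rate lam."""
    return lam**k * exp(-lam) / factorial(k)


def goal_rates(r_i: float, r_j: float, alpha: float = ALPHA_GROUP_STAGE):
    """Split alpha between two teams in proportion to their ratings."""
    total = r_i + r_j
    return alpha * r_i / total, alpha * r_j / total


# Elo ratings quoted in the text: Argentina 2143, Saudi Arabia 1643.
lam_arg, lam_ksa = goal_rates(2143, 1643)

p_ksa_two = poisson_pmf(2, lam_ksa)  # chance Saudi Arabia scores exactly 2
arg_mode = max(range(8), key=lambda k: poisson_pmf(k, lam_arg))
ksa_mode = max(range(8), key=lambda k: poisson_pmf(k, lam_ksa))

print(f"lambda Argentina    = {lam_arg:.3f}")
print(f"lambda Saudi Arabia = {lam_ksa:.3f}")
print(f"P(Saudi Arabia scores 2) = {p_ksa_two:.3f}")
print(f"most likely goals: Argentina {arg_mode}, Saudi Arabia {ksa_mode}")
```

Evaluating `poisson_pmf` for k = 0, 1, 2, ... with each team's λ gives the bar heights for a plot like Figure 3.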

3.2 Adjusted Poisson distribution

But wait, do you really believe that under normal circumstances, without any upset, Argentina would only draw 1-1 with Saudi Arabia? That would be too unfair to Argentina! So we can make a simple adjustment to the calculation of λ. The Elo index of the teams in this World Cup ranges from about 1500 to just over 2100, so the relative differences are not that big, and we need to amplify them before computing λ. The authors of the paper considered this point too and provide a method for adjusting λ based on the Elo index:

where g is an exponent used as a correction constant, equal to 3.7581. min(r) is the lowest Elo index among these teams (the host Qatar, with 1578), while max(r) is the highest (Argentina, with 2143). Then we calculate the adjusted Elo index for Argentina and Saudi Arabia:

With these two values, we can recalculate their λ. The gap between the adjusted Elo indices is already quite large, so the resulting Poisson distributions are also significantly different.

Figure 4-Poisson distribution for Argentina (left) Saudi Arabia (right) at the group stage after adjustment

Ah! That’s more like it! From the probability distributions, the most likely score for this game is Argentina 2:0 Saudi Arabia. Moreover, the probability of Saudi Arabia scoring 0 goals is about 75%, and the probability of them scoring 2 goals is less than 5%. So we can only say that a probability is just a probability and cannot determine the outcome of a game!

Section 4 Conclusion

Finally, we can briefly summarize how to predict the score of a match using the method above:

1. Find the Elo index of both sides.

2. Obtain the relative Elo index by using the adjustment formula.

3. Substitute to obtain λ and draw a Poisson distribution graph.

4. Read off the scoring probabilities of each team.
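The steps above can be sketched as a single function. This is a rough illustration under two assumptions that go beyond the article: the two teams' goal counts are treated as independent, and the score grid is truncated at 10 goals per team. The ratings passed in can be raw or adjusted Elo values; the example call uses the raw ratings from Section 3.1. All names are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of scoring exactly k goals with rate lam."""
    return lam**k * exp(-lam) / factorial(k)


def predict_match(r_i: float, r_j: float, alpha: float = 2.5156,
                  max_goals: int = 10) -> dict:
    """Score probabilities for team i against team j.

    Assumes independent goal counts, which the article does not state.
    """
    lam_i = alpha * r_i / (r_i + r_j)
    lam_j = alpha * r_j / (r_i + r_j)
    # Probability of every scoreline (a, b) up to max_goals each.
    grid = {
        (a, b): poisson_pmf(a, lam_i) * poisson_pmf(b, lam_j)
        for a in range(max_goals + 1)
        for b in range(max_goals + 1)
    }
    return {
        "most_likely_score": max(grid, key=grid.get),
        "win": sum(p for (a, b), p in grid.items() if a > b),
        "draw": sum(p for (a, b), p in grid.items() if a == b),
        "loss": sum(p for (a, b), p in grid.items() if a < b),
    }


# Raw Elo ratings from the text: Argentina 2143, Saudi Arabia 1643.
result = predict_match(2143, 1643)
print(result)
```

Summing the grid by outcome turns the two per-team distributions into win, draw, and loss probabilities, which is usually what a bettor wants to compare.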

Of course, the method above only considers the relative strength of the teams and ignores situational factors such as tactics. So betting carries risk, and sometimes you just have to go with your gut!

PS: This article does not discriminate against any team or insult any team. Thank you for reading!

References

1. Chater, M., Arrondel, L., Gayant, J. P., & Laslier, J. F. (2021). Fixing match-fixing: Optimal schedules to promote competitiveness. European Journal of Operational Research, 294(2), 673-683.

2. World Football Elo Ratings: 2022 World Cup. https://www.eloratings.net/2022_World_Cup.