: When Skill Isn’t Enough
You’re watching your team dominate possession, double the number of shots… and still lose. Is it just bad luck?
Fans blame referees. Players blame “off days.” Coaches mention “momentum.” But what if we told you that randomness—not talent or tactics—might be a major hidden variable in sports outcomes?
This post dives deep into how luck influences sports, how we can attempt to quantify randomness using data, and how data science helps us separate skill from chance.
So, as always, here’s a quick summary of what we’ll go through today:
- Defining luck in sports
- Measuring luck
- Case study
- Famous randomness moments
- What if we could remove luck?
- Final Thoughts
Defining Luck in Sports
This might be controversial: different people define it differently, and many interpretations are equally valid. Here’s mine: luck in sports is about variance and uncertainty.
In other terms, we could say luck is all the variance in outcomes not explained by skill.
Now, for fellow data scientists, another way of putting it: luck is the residual noise our models can’t explain or predict (where the model might, for example, predict the outcome of a football match). Here are some examples:
- An empty-goal shot hitting the post instead of going in.
- A tennis net cord that changes the ball direction.
- A controversial VAR decision.
- A coin toss win in cricket or American football.
Luck is everywhere, I’m not discovering anything new here. But can we measure it?
Measuring Luck
We could measure luck in many ways, but we’ll visit three going from basic to advanced.
Regression Residuals
We usually focus on modeling the expected outcome of an event: how many goals a team will score, what the point difference between two NBA teams will be…
No perfect model exists, and aiming for 100% accuracy is unrealistic; we all know that. But it’s precisely that gap, the difference between our model’s predictions and reality, that we call the regression residuals.
Let’s see a very simple example: we want to predict the final score of a football (soccer) match. We use metrics like xG, possession %, home advantage, player metrics… And our model predicts the home team will score 3.1 goals and the visitor’s scoreboard will show a 1.2 (obviously, we’d have to round them because goals are integers in real matches).
Yet the final result is 1-0 (instead of 3.1-1.2 or the rounded 3-1). This noise, the difference between the outcome and our prediction, is the luck component we’re talking about.
The goal will always be for our models to reduce this luck component (error), but we could also use it to rank teams by overperformance vs expected, thus seeing which teams are more affected by luck (based on our model).
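To make this concrete, here’s a minimal sketch of ranking teams by residual. The predicted and observed goals-per-game figures below are made up for illustration; in practice the predictions would come from your model.

```python
# Toy sketch: rank teams by how far actual goals diverge from model
# predictions. All numbers here are invented for illustration.

predicted = {"Team A": 1.9, "Team B": 1.4, "Team C": 1.1}  # model's expected goals/game
actual    = {"Team A": 1.4, "Team B": 1.6, "Team C": 1.1}  # observed goals/game

# Residual = observed minus predicted; positive means overperformance.
residuals = {team: round(actual[team] - predicted[team], 2) for team in predicted}

# Sort from luckiest (biggest overperformance) to unluckiest.
ranking = sorted(residuals.items(), key=lambda kv: kv[1], reverse=True)
for team, res in ranking:
    print(f"{team}: {res:+.2f}")
```

The same ranking, aggregated over a season, is what “overperformance vs expected” tables are built from.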
Monte Carlo Method
Of course, MC had to appear in this post. I already have a post digging deeper into it (well, more specifically into Markov Chain Monte Carlo), but I’ll introduce it anyway.
The Monte Carlo method consists of repeated random sampling to obtain numerical results: in our case, the likelihood of a range of outcomes occurring.
Basically, it’s used to estimate or approximate the possible outcomes or distribution of an uncertain event.
To continue with our sports examples, let’s say a basketball player makes 75% of his free throws. With this percentage, we could simulate 10,000 seasons, assuming the player’s skill level stays constant, and generate outcomes stochastically.
With the results, we can compare the skill-based expectations against the simulated distributions. If the player’s actual FT% lies outside the 95% simulation interval, that’s probably luck (good or bad, depending on which tail it falls in).
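Here’s a minimal sketch of that simulation. The season length (500 attempts) and the observed 81% season are assumptions added for illustration:

```python
import random

# Monte Carlo sketch: does an observed free-throw season look like luck?
random.seed(42)

TRUE_FT_SKILL = 0.75       # assumed "true" free-throw skill
ATTEMPTS_PER_SEASON = 500  # assumed attempts in one season
N_SIMULATIONS = 10_000

# Simulate 10,000 seasons at constant skill and record each season's FT%.
season_pcts = []
for _ in range(N_SIMULATIONS):
    makes = sum(random.random() < TRUE_FT_SKILL for _ in range(ATTEMPTS_PER_SEASON))
    season_pcts.append(makes / ATTEMPTS_PER_SEASON)

season_pcts.sort()
lo = season_pcts[int(0.025 * N_SIMULATIONS)]  # 2.5th percentile
hi = season_pcts[int(0.975 * N_SIMULATIONS)]  # 97.5th percentile

observed = 0.81  # hypothetical season the player actually shot
print(f"95% simulation range: [{lo:.3f}, {hi:.3f}], observed: {observed}")
print("Outside the range -> likely luck" if not lo <= observed <= hi else "Within normal variance")
```

With 500 attempts at a true 75% skill, the 95% range sits roughly between 71% and 79%, so an 81% season falls outside it.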
Bayesian Inference
By far my favorite way to measure luck because of Bayesian models’ ability to separate underlying skill from noisy performance.
Suppose you’re in a football scouting team, and you’re checking a very young striker from the best team in the local Norwegian league. You’re particularly interested in his goal conversion, because that’s what your team needs, and you see that he scored 9 goals in the last 10 games. Is he elite? Or lucky?
With a Bayesian prior (e.g., average conversion rate = 15%), we update our belief after each match and we end up having a posterior distribution showing whether his performance is sustainably above average or a fluke.
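A minimal sketch of that update, using a conjugate Beta prior with mean 15%. The shot count (40 shots behind the 9 goals) is a hypothetical, since the scenario only gives the goal tally:

```python
# Bayesian sketch: is the striker elite or lucky?
# A Beta prior encodes the belief "average conversion ~15%"; the shot
# counts below are assumed for illustration.

# Prior: Beta(a, b) with mean a / (a + b) = 0.15 (e.g. a=3, b=17).
a, b = 3.0, 17.0

# Hypothetical last-10-games data: 9 goals from 40 shots.
goals, shots = 9, 40

# Conjugate update: posterior is Beta(a + goals, b + misses).
a_post = a + goals
b_post = b + (shots - goals)
posterior_mean = a_post / (a_post + b_post)

print(f"Prior mean: {a / (a + b):.3f}")
print(f"Posterior mean: {posterior_mean:.3f}")
```

The posterior mean lands at 20%: well above average, but pulled noticeably back toward the prior, which is exactly the skepticism a small sample deserves.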
If you’d like to get into the topic of Bayesian Inference, I wrote a post trying to predict last season’s Champions League using these methods: https://towardsdatascience.com/using-bayesian-modeling-to-predict-the-champions-league-8ebb069006ba/
Case Study
Let’s get our hands dirty.
The scenario is as follows: we have a round-robin season between 6 teams where each team plays every other twice (home and away). Each match generates expected goals (xG) for both teams, and the actual goals are sampled from a Poisson distribution around the xG:
| Home | Away | xG Home | xG Away | Goals Home | Goals Away |
|---|---|---|---|---|---|
| Team A | Team B | 1.65 | 1.36 | 2 | 0 |
| Team B | Team A | 1.87 | 1.73 | 0 | 2 |
| Team A | Team C | 1.36 | 1.16 | 1 | 1 |
| Team C | Team A | 1.00 | 1.59 | 0 | 1 |
| Team A | Team D | 1.31 | 1.38 | 2 | 1 |
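A sketch of how such results could be generated: sample each side’s goals from a Poisson distribution whose mean is its xG. Since Python’s standard library has no Poisson sampler, a small one (Knuth’s algorithm) is included:

```python
import math
import random

# Sample actual goals from a Poisson distribution centered on each side's xG.
rng = random.Random(0)

def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's algorithm: multiply uniforms until the product drops below exp(-lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_match(xg_home: float, xg_away: float, rng: random.Random):
    return sample_poisson(xg_home, rng), sample_poisson(xg_away, rng)

goals_home, goals_away = simulate_match(1.65, 1.36, rng)  # Team A vs Team B
print(f"Simulated score: {goals_home}-{goals_away}")
```

In a real project you’d likely reach for `numpy.random.poisson` instead; the point is only that the scorelines are stochastic draws around the xG values.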
Picking up where we left off in the previous section, let’s estimate each team’s true goal-scoring ability and see how much their actual performance diverges from it, which we’ll interpret as luck or variance.
We’ll use a Bayesian Poisson model:
- Let λₜ be the latent goal-scoring rate for each team.
- Then our prior is λₜ ∼ Gamma(α,β)
- And we assume Goals ∼ Poisson(λₜ), updating our beliefs about λₜ using the actual goals scored across matches:
λₜ | data ∼ Gamma(α+total goals, β+total matches)
Right, now we need to decide our values for α and β:
- My initial belief (before looking at any data) is that most teams score around 2 goals per match. I also know that in a Gamma distribution, the mean is α/β.
- But I’m not very confident about it, so I want the standard deviation to be relatively high, certainly above 1 goal. In a Gamma distribution, the standard deviation is √α / β.
Solving the simple equations these constraints imply, we find that α = 2 and β = 1 are reasonable prior assumptions (mean = 2/1 = 2 goals, standard deviation = √2/1 ≈ 1.41 goals).
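A quick sanity check of those prior values:

```python
import math

# For a Gamma(alpha, beta) distribution (rate parameterization):
# mean = alpha / beta, std = sqrt(alpha) / beta.
alpha, beta = 2.0, 1.0

prior_mean = alpha / beta            # want ~2 goals per match
prior_std = math.sqrt(alpha) / beta  # want comfortably above 1 goal

print(f"Prior mean: {prior_mean:.2f}, prior std: {prior_std:.2f}")
```

Both conditions hold: a mean of 2 goals and a standard deviation of about 1.41.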
With that, if we run our model, we get the following results:
| Team | Games Played | Total Goals | Posterior Mean (λ) | Posterior Std | Observed Mean | Luck (Obs − Post) |
|---|---|---|---|---|---|---|
| Team A | 10 | 14 | 1.45 | 0.36 | 1.40 | −0.05 |
| Team D | 10 | 13 | 1.36 | 0.35 | 1.30 | −0.06 |
| Team E | 10 | 12 | 1.27 | 0.34 | 1.20 | −0.07 |
| Team F | 10 | 10 | 1.09 | 0.31 | 1.00 | −0.09 |
| Team B | 10 | 9 | 1.00 | 0.30 | 0.90 | −0.10 |
| Team C | 10 | 9 | 1.00 | 0.30 | 0.90 | −0.10 |
How do we interpret them?
- All teams slightly underperformed their posterior expectations — common in short seasons due to variance.
- Team B and Team C had the biggest negative “luck” gap: their actual scoring was 0.10 goals per game lower than the Bayesian estimate.
- Team A was closest to its predicted strength — the most “neutral luck” team.
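For the curious, the table can be reproduced directly from the Gamma-Poisson update, using each team's total goals over the 10 games:

```python
import math

# Gamma(2, 1) prior updated with each team's total goals over 10 games;
# "luck" = observed goals/game minus posterior mean.
ALPHA, BETA = 2.0, 1.0

totals = {"Team A": 14, "Team B": 9, "Team C": 9,
          "Team D": 13, "Team E": 12, "Team F": 10}
GAMES = 10

for team, goals in sorted(totals.items(), key=lambda kv: -kv[1]):
    a_post = ALPHA + goals           # alpha + total goals
    b_post = BETA + GAMES            # beta + total matches
    post_mean = a_post / b_post
    post_std = math.sqrt(a_post) / b_post
    observed = goals / GAMES
    luck = observed - post_mean
    print(f"{team}: posterior {post_mean:.2f} ± {post_std:.2f}, "
          f"observed {observed:.2f}, luck {luck:+.2f}")
```

Note that the “luck” column is always slightly negative: the prior mean of 2 goals pulls every posterior a bit above the observed rate, since all six teams scored below the prior expectation.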
This was a toy example with synthetic data, but I bet you can already sense its power.
Let’s now check some historical randomness moments in the world of sports.
Famous Randomness Moments
Any NBA fan remembers the 2016 Finals. Game 7, Cleveland at Golden State, tied at 89 with less than a minute left. Kyrie Irving faces Stephen Curry and hits a memorable, clutch three. The Cavaliers win the Finals.
Was this skill or luck? Kyrie is a top player, and certainly a good shooter. But with that defender, the clock, the scoreboard pressure… we simply can’t know which it was.
Moving now to football, consider the 2019 Champions League semifinals: Liverpool vs Barcelona. This one is personally painful. Barça won the first leg at home 3-0, but lost 4-0 at Anfield in the second leg, sending the Reds to the final.
Liverpool’s overperformance? Or a statistical anomaly?
One last example: NFL overtime coin tosses. A whole playoff game can hinge on a 50/50 scenario where the coin (pure luck) holds enormous sway over the outcome.
What if we could remove luck?
Can we remove luck? The answer is a clear NO.
Yet why do so many of us try? For professionals, it’s clear: uncertainty affects performance. The more control we have over everything, the more we can optimize our methods and strategies.
More certainty (less luck) means more money.
And rightly so: luck isn’t removable, but we can diminish it. That’s why we build complex xG models, or betting models grounded in probabilistic reasoning.
But sports are meant to be unpredictable. That’s what makes them thrilling for the spectator. Most wouldn’t watch a game if we already knew the result.
Final Thoughts
Today we talked about the role of luck in sports, and it’s a massive one. Understanding it could help fans avoid overreacting. But it could also improve scouting and team management, or inform smarter betting and fantasy league decisions.
All in all, we must know that the best team doesn’t always win, but data can tell us how often they should have.