If you hope to grasp why modern soccer looks the way it does, or the long strides we’ve made recently in understanding how it actually functions, it helps to know about what’s been happening at one of the world’s oldest universities, in Belgium.
That’s where you’ll find the Sports Analytics Lab at the Catholic University of Leuven, headed up by Jesse Davis, a Wisconsinite computer science professor. Davis grew up going to basketball and football games at the University of Wisconsin-Madison and didn’t discover soccer until college, during the 2002 World Cup. When he was hired in Leuven in 2010 to research machine learning, data mining and artificial intelligence, a band of sports-besotted colleagues brought him back to soccer.
Before long, Davis was supervising a stable of post-docs, PhD and master’s students working on soccer data. The richness and complexity of that data made it well suited to AI research. The work they produced, and made available to anyone through open-source analytics tools, substantially advanced the science behind the sport and changed the way some clubs thought about playing.
It may also serve as an example of how funding university research can benefit the public, including the businesses working within the field being studied; a potential parable for the value of academia at a time when it is being squeezed from all sides.
In the early days of the analytics movement in sports, it was broadly believed that soccer didn’t lend itself very well to advanced statistical analysis because it was too fluid. Unlike baseball, basketball or gridiron football, it couldn’t easily be broken down into a series of discrete actions that could be counted and assigned some sort of value. Its most measurable actions, shots and therefore goals, make up a tiny fraction of the events in a given game. That makes it hard to quantify each player’s contributions, especially in the many positions whose players rarely shoot at all.
But while soccer was slow to adopt analytics, it got there eventually. Most big clubs now have an extensive data department, and there’s a disproportionately large genre of (eminently readable) books on this fairly esoteric subject.
The Sports Analytics Lab published findings on the optimal areas for taking long shots, along with papers asking whether, in some situations, it’s more efficient to boot the ball long and out of bounds than to build out of the back. Some of those papers carried inscrutably academic-y titles like “A Bayesian Approach to In-Game Win Probability” or “Analyzing Learned Markov Decision Processes Using Model Checking for Providing Tactical Advice in Professional Soccer.”
Wisely, they also published a blog that broke all of it down in layperson’s terms.
This fresh research led to collaborations with data analysts at clubs such as Red Bull Leipzig and Club Brugge, as well as the German and United States federations. The lab also worked with its local pro club, Oud-Heverlee Leuven, and the Belgian federation.
But what’s curious is that a decade and a half on, Davis and his team, which numbers about 10 at any given time, are still doing industry-leading and paradigm-altering research, like their recent work fine-tuning how ball possession is valued.
Now that the sport, at the top end, has fully embraced analytics and baked it into everything it does, you would expect it to outpace and then sideline the outsiders, as has happened in other sports. But it hasn’t.
“Elite sport, and not just soccer, has an intense focus on what comes next,” says Davis. “This is particularly true because careers are so fleeting both for players and staff. Consequently, the fact that you may not be around tomorrow does not foster the desire to take risks on projects that, A, may or may not work out or, B, will yield something useful but not in the next six-to-nine months.”
There is innovative work being done within soccer clubs that the outside world doesn’t get to see, because what would be the point of sharing all that hard-won insight? The incentives of professional sports strain against the scientific process, which values taking risks and tinkering endlessly with the design of experiments, any of which might yield nothing of use. What’s more, that kind of work requires highly skilled practitioners, who can be tricky and pricey to recruit. The payoff of that investment may be limited. And if it arrives at all, the output of that work may not help a team win games, especially in the short term.
Meanwhile, most of the low-hanging soccer analytics fruit, like shot value or which types of passes produce the most danger, has already been picked. What remains are far more complicated problems, such as how to make sense of tracking data.
Researchers may find, for instance, that while expected goals models have become pretty good at quantifying and tabulating the chances a team created over the course of a game, they do not work well for putting a number on a particular striker’s finishing ability, because of biases in the training data.
Yes. Sure. Great. But now what? What are Brentford (or Bryan Mbeumo’s potential new club, Manchester United) supposed to do with the knowledge that Mbeumo’s Premier League-leading xG overperformance of +7.7 (his expected goals, based on the quality of his scoring chances, totaled 12.3, but he actually scored 20 times this past season) doesn’t actually suggest that he was the best or most efficient finisher in the Premier League?
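The overperformance figure is simple subtraction: goals scored minus expected goals. A minimal Python sketch, using only the season totals quoted above (the function name `xg_overperformance` is hypothetical, chosen for illustration):

```python
def xg_overperformance(goals: int, expected_goals: float) -> float:
    """Return goals scored minus expected goals (xG), rounded to one decimal place."""
    # Rounding hides floating-point noise (20 - 12.3 is not exactly 7.7 in binary).
    return round(goals - expected_goals, 1)


# Mbeumo's reported season totals: 20 goals from 12.3 xG.
print(xg_overperformance(20, 12.3))  # prints 7.7
```

A positive result only says a player scored more than the quality of his chances predicted; as the researchers' work suggests, that number alone does not establish superior finishing skill.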
What’s more, when a club does turn up a useful tidbit, it has to find a way not only to implement that finding but also to track it over the long term. That means building some sort of system to accommodate it, which entails data engineering and software programming. On the club side, this kind of work can take up much, or most, of an analytics department’s labor.
“For some of the deep learning models to work with tracking data takes months to code for exceptional programmers,” says Davis. “Building and maintaining this is a big upfront cost that does not yield immediate wins. This is followed by a cost to maintain the infrastructure.”
Academics, on the other hand, face less time pressure and can move on to a new idea if a project doesn’t work out or there is simply no more knowledge to be gained from it. “I don’t have to worry about setting up data pipelines, building interactive dashboards, processing things in real time, etc.,” says Davis.
The research itself is the point. The understanding that issues from it is the end, not the means. And then everybody else benefits from this intellectual progress.
There may be a useful lesson in this for how a federal government, say, may consider the value of investing in scientific inquiry.
-
Leander Schaerlaeckens is at work on a book about the United States men’s national soccer team, out in 2026. He teaches at Marist University.