Infobesity: The Future of Sports Analytics

By the start of this upcoming NBA season, up to 20 teams will be employing wearable GPS devices created by the company Catapult, which will track player biometric data in real time. The championship Warriors were one of a number of teams to have used such devices already last season. The device, or variants on the theme, has already been employed in other major American sports, not to mention internationally (Catapult got started catering to Australian Rules football, presumably measuring statistically just how fucking weird that sport is).

These devices weigh only 3.5 ounces and measure dozens of aspects of player performance: speed, acceleration, force, even whether individual limbs are performing better or worse than the others. It is likely that the future will see increasing use of these devices (and related information-gathering technology like the now ubiquitous SportVU cameras), both for monitoring player health and safety and for trying to derive more accurate statistical measures of efficiency and performance.

An utterly unprecedented volume of data will suddenly be dumped on the individuals that teams have tasked with trying to understand it. It is the same problem increasingly faced by medical researchers, given the growing breadth and availability of electronic health record databases, and by data scientists mining the unimaginable depths of the Internet.

Putting aside concerns about the Orwellian nature of such constant monitoring and the privacy concerns that go along with the maintenance of these databases, there will also be predictable pushback from (for lack of a better term) “old school” sports fans. If you follow sports at all, you are probably familiar with the argument: the idea that all these fancy analytics and statistical measures of efficiency are just useless numbers, and no substitute for actually watching the game and forming your opinions from that.

To a certain extent, this argument is valid. There is no substitute for watching the game, both from an enjoyment perspective and from a practical one. After all, this miss:

isn’t “the same” as this one (the first one in this bizarrely spiteful compilation):

Similarly, while they are both worth the same number of points, should we really treat this:

the same as this:

This line of argument is further fueled by those situations where metrics like super-duper-advanced-adjusted-plus-minus-per-game-per-36-minutes-4.2 say that Robin Lopez is better than LeBron James. “Ridiculous!” many sports fans will fume. And I agree.

If anything, though, these are not arguments against the idea of analytics; they are arguments for better and more varied analytics. The problem, I think, is twofold: the “black box” nature of many of these scores (how is PER actually calculated?) and their interpretation. Our natural inclination is to want to look at a set of numbers and call whoever has the highest number “the best”, without taking into account the context those numbers were calculated in or the natural variation that necessarily exists across that range of numbers.

Really, the idea of statistical analysis is pretty simple and intuitive. The fact is, no matter how much we enjoy watching and forming opinions on games, it simply isn’t possible for us to process all of that information. Sure, we can watch a game and figure out with a good degree of certainty and accuracy that LeBron James is the best basketball player in the world and that Andrea Bargnani is just awful.

Those are the extreme cases. The extreme cases are easy. But what if you are trying to figure out whether or not Kyle O’Quinn is better than Evan Turner?

Think about it another way, using a super contrived and idiotic example: say you are trying to figure out whether electrical engineers are taller than philosophy majors. You could line up every single engineer and every single philosophy major in a row and look at them and try to figure it out just by looking … but that’s too much information. You can’t reasonably figure it out based on thousands of people, whose individual heights will vary widely. You will have no problem with the extreme cases, like Manute Bol the Electrical Engineer or Wee Man the Socrates Fan.

So how would you, intuitively, go about figuring it out? Likely, your reaction would be to simply figure out the average height of the engineers and compare it to the average height of the philosophy majors. Congratulations! You have just conducted a statistical analysis. Seriously. That is all it is, at heart. Same thing with all analytics and metrics and analyses of sports (or anything else). It is just the acknowledgement of the idea that there is too much information out there for us to be able to understand, so we create easy ways to summarize that data for simple comparisons.
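For the skeptics, here is roughly what that "statistical analysis" looks like when written down. This is a minimal sketch: the heights are made-up numbers for illustration, and the `mean_difference` helper is a hypothetical name, not part of any library.

```python
from statistics import mean

# Hypothetical heights in inches. These are invented for illustration,
# not real measurements of anyone.
engineers = [68.0, 70.5, 66.0, 72.0, 69.5]
philosophers = [67.0, 71.0, 65.5, 69.0, 68.5]


def mean_difference(group_a, group_b):
    """Return the average of group_a minus the average of group_b."""
    return mean(group_a) - mean(group_b)


diff = mean_difference(engineers, philosophers)
print(f"Engineers average {mean(engineers):.1f} in")
print(f"Philosophy majors average {mean(philosophers):.1f} in")
print(f"Difference: {diff:+.1f} in")
```

Two lists of numbers go in and one number comes out. Everything fancier is a refinement of that step.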

Sure, the more esoteric analytical methods can be difficult to follow, but they are based on this exact method: comparing the average tendency of large groups of individuals. There are little tweaks here and there (for example, taking into account the variability of the data so you put less weight on a difference between, say, 12.2 and 12.5 than you would between 12.2 and 14.2; or, to go back to basketball, taking into account that 3 point shots are worth more than 2 point shots, etc.), but ultimately they all reflect this same basic and intuitive purpose.
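One common way to "take into account the variability" is to divide the difference in averages by its standard error, which is the idea behind Welch's t statistic. The sketch below uses that statistic as one example of the tweak, not as the method behind any particular basketball metric. The three samples are invented so that their averages land on 12.2, 12.5 and 14.2.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic: (mean(a) - mean(b)) divided by its standard error."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Invented samples with the same spread but different averages.
baseline = [11.2, 12.2, 13.2]   # average 12.2
close = [11.5, 12.5, 13.5]      # average 12.5
far = [13.2, 14.2, 15.2]        # average 14.2

t_close = welch_t(close, baseline)
t_far = welch_t(far, baseline)
print(f"12.5 vs 12.2: t = {t_close:.2f}")
print(f"14.2 vs 12.2: t = {t_far:.2f}")
```

With the same spread in every sample, the 0.3 gap produces a much smaller statistic than the 2.0 gap. That is the formal version of not getting excited about 12.2 versus 12.5.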

It’s the same thing that even these old school, anti-analytics sports fans already do, anyway. They say that Kevin Durant is better than Thabo Sefolosha because one of them scores 30 points per game and the other scores much, much less. But, as was said, just looking at points per game without watching the games doesn’t give you the full story. And that’s exactly the purpose of advanced analytics. The purpose of PER, Win Shares, Real-Adjusted Plus-Minus, and all of that other nonsense is simply to try and take into account the known structure and reality of the game when calculating these averages. Effective field goal percentage takes into account the fact that different shots are worth more points than others. Eventually, the SportVU cameras will enable us to calculate field goal percentage and adjust it directly for how close the individual is to the basket (closer shots are more likely to go in for everyone, so being marginally better at long range shots is worth more than a comparable close range improvement, for example).
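Effective field goal percentage is a good example of how un-mysterious these adjustments can be. The standard formula is (FGM + 0.5 × 3PM) / FGA, which gives every made three an extra half of a made shot. Here is a short sketch; the function name and the two stat lines are made up for illustration.

```python
def effective_fg_pct(fg_made, three_made, fg_attempted):
    """Effective field goal percentage: (FGM + 0.5 * 3PM) / FGA."""
    if fg_attempted <= 0:
        raise ValueError("fg_attempted must be positive")
    return (fg_made + 0.5 * three_made) / fg_attempted


# Two invented stat lines, ten attempts each.
# Player A makes 5 of 10, all two-pointers (10 points).
# Player B makes 4 of 10, all three-pointers (12 points).
player_a = effective_fg_pct(fg_made=5, three_made=0, fg_attempted=10)
player_b = effective_fg_pct(fg_made=4, three_made=4, fg_attempted=10)
print(f"Player A eFG%: {player_a:.3f}")
print(f"Player B eFG%: {player_b:.3f}")
```

Plain field goal percentage says Player A (50%) shot better than Player B (40%). The adjusted number flips the order, which matches who actually put more points on the board.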

This is exactly where this vaguely dystopian world of constant player tracking data is headed. It is an attempt to saturate us with information until we are able to figure out which bits are and are not salient. The problem, of course, is making this determination. There are no easy answers because, ultimately, our understanding of how aspects of a game like basketball are reflected in these various oblique physical measurements is incredibly poor. And, in a way, this quest for data has taken us in the opposite direction: the purpose of statistics is to reduce the amount of information into something palatable and manageable, but now we are increasing by untold orders of magnitude the amount of information that we need to sift through.

In the long run, though, it will only increase our ability to consume the game from every possible angle. And though it may come to dominate the way we think about basketball and the way teams are managed by owners and executives, it will never be a substitute for actually watching the games. Because there is no way to quantify just how entertaining this is:
