In case you missed the first 3 parts of “Playoffs are a Crapshoot”, here they are:
In the previous installment, we estimated the Bradley-Terry rating for the playoff teams. While these are the best estimates we can make using only the 2019 head-to-head results, they overstate the strength of the strongest teams. This is because of the phenomenon (often discussed here with respect to player results) of regression to the mean.
Baseball win records are the result of skill and luck. Before we can intelligently discuss regression to the mean, we need to be somewhat more careful about what we mean by this.
Many years ago, I was hired by the government to help them quantify luck versus skill in online poker. The research in that case was interesting, but even more interesting was just asking people I knew what they thought about the question. Almost uniformly, when asked, people thought that poker was 80-90 percent skill, although they weren’t able to really justify this feeling in any obvious way.
Justice Scalia, in my favorite dissenting Supreme Court opinion of all time, PGA Tour v. Martin, expressed the distinction of luck vs. skill in professional golf this way:
In …[the majority decision], the Court first finds that the effects of the change are “mitigated” by the fact that in the game of golf, weather, a “lucky bounce,” and “pure chance” provide different conditions for each competitor and individual ability may not “be the sole determinant of the outcome.” I guess that is why those who follow professional golfing consider Jack Nicklaus the luckiest golfer of all time, only to be challenged of late by the phenomenal luck of Tiger Woods. The Court’s empiricism is unpersuasive.
It may seem counterintuitive that there is anything such as luck at all. What does it even mean? The easiest way (for me) to think about it is as follows. Imagine we cloned the Braves (or any other team) 29 times. Now have those 30 teams play a 162-game schedule against each other. This is baseball, so in every game one of these teams will win. Because the teams are equally skillful by construction, the winner must be determined by something else. This is what we call luck, and we don’t care what it actually is. But in every game, there is an even chance that either team wins because of it. (I’m ignoring home field advantage here… just imagine that we flip a coin before the game to decide who the home team is.) When we do this, we certainly don’t expect each team to go exactly 81-81, even though no team is better than any other team. When we flip a coin 162 times, sometimes we’ll get 90 heads. Sometimes we’ll get 75. We can in fact quantify exactly how likely each level of wins is. The graph of those wins looks like this:
About 19 percent of teams will win 80-82 games, and about 5 percent of teams will win 70 or fewer, with an equal number winning 92 or more. The fundamental insight is this: if actual records exhibited this distribution, they would be indistinguishable from a situation in which every game was random – we could just as well flip a coin as play the game for all that “skill” means.
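For anyone who wants to check the coin-flip numbers, here is a minimal sketch of the calculation, assuming each of the 162 games is an independent 50/50 flip. The function names are mine, chosen only for illustration.

```python
from math import comb

GAMES = 162  # length of an MLB regular season


def win_probability(wins: int, games: int = GAMES) -> float:
    """Chance that a pure coin-flip team finishes with exactly `wins` wins."""
    return comb(games, wins) / 2 ** games


def range_probability(low: int, high: int, games: int = GAMES) -> float:
    """Chance of finishing with between `low` and `high` wins, inclusive."""
    return sum(win_probability(w, games) for w in range(low, high + 1))


if __name__ == "__main__":
    print(f"80-82 wins:  {range_probability(80, 82):.1%}")
    print(f"70 or fewer: {range_probability(0, 70):.1%}")
    print(f"92 or more:  {range_probability(92, GAMES):.1%}")
```

Because the coin is fair, the distribution is symmetric around 81 wins, which is why the "70 or fewer" and "92 or more" tails come out the same.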
Let’s be very clear, though, about what this doesn’t mean. It doesn’t mean that baseball players aren’t highly skilled. It simply means that they are pretty equally skilled. This is the big lesson I learned in the online poker exercise. People want to win when they play online poker, and there are various rooms with people at various skill levels. Players sort themselves into these rooms: if someone has too much success, the others move to different rooms, and everyone shifts around until they are playing people of roughly their own skill level. At that point, the observed results are essentially random.
Chess is thought of as a game with no luck components at all. But what if players of equal skill play one another? Who wins? Do they draw every game? No! I am a terrible chess player, but I can get my winning percentage up to 50 percent by finding someone as bad as I am. But there are limits to this self selection. At the highest level, some chess players are more skillful than everyone else. That doesn’t mean they win every game. (The reasons for this are not philosophically clear, but a simple one is that skill is variable, depending on rest, health, concentration level and any number of other factors.)
At the highest level, better teams win more games than the coin-flip model predicts, and worse teams lose more. This is really not saying anything more than that there are games in which the probability of a particular team winning is more than 50%, and that this edge is due to some underlying talent of the team (as opposed to something like home team advantage). The higher this probability, and the more often it happens, the less the overall distribution will look like the coin-flip distribution above.
As it turns out we can measure the actual dispersion of MLB results. Last year, the distribution of wins was considerably flatter than the random distribution shown above. This is because some teams are considerably better than others. And the difference in the shapes allows us to actually quantify the relative contributions of skill and luck.
If we use the last 4 years as indicative of the overall dispersion of wins, we find that the standard deviation of overall team performance is 13.2 games. (This is actually higher than the dispersion observed in the last twenty years.) This means, among other things, that about 68 percent of teams in any given year will win between 68 and 94 games, and 95 percent of teams will win between 55 and 107 games. If it weren’t for last year’s results, in which 4 teams had over 100 wins and 4 teams had over 100 losses, this figure would have been considerably lower.
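Those ranges come from treating win totals as roughly bell-shaped. A rough sketch of the arithmetic, assuming a normal distribution centered on 81 wins (both the normal shape and the 81-win mean are my simplifying assumptions here) with the 13.2-game standard deviation:

```python
from statistics import NormalDist

MEAN_WINS = 81.0  # assumed league-average win total over 162 games
SD_WINS = 13.2    # observed spread of team win totals quoted in the text

wins = NormalDist(mu=MEAN_WINS, sigma=SD_WINS)


def central_range(coverage: float) -> tuple[float, float]:
    """Symmetric range of win totals that contains `coverage` of teams."""
    tail = (1.0 - coverage) / 2.0
    return wins.inv_cdf(tail), wins.inv_cdf(1.0 - tail)


for coverage in (0.68, 0.95):
    low, high = central_range(coverage)
    print(f"{coverage:.0%} of teams: {low:.1f} to {high:.1f} wins")
```

Real win totals are whole numbers and only approximately normal, so the endpoints should be read as rounded guides rather than exact cutoffs.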
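In any case, we know that over a 162-game season, the standard deviation induced by pure luck (the coin-flip variation) is about 6.4 games. If luck is independent of skill, it is the variances (the squared standard deviations) that add, not the standard deviations themselves. The observed variance is 13.2², or about 174, and the luck variance is about 40.5, which leaves about 134 for skill, or a standard deviation of roughly 11.6 games. Thus, in the last four years, skill has accounted for roughly three-quarters of the variance in season records, and luck for the remaining quarter.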
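Since the decomposition leans on luck being independent of skill, it is worth spelling out the arithmetic: under independence, variances (squared standard deviations) add, so skill is what remains after the coin-flip variance is taken out of the observed variance. A minimal sketch, using the 13.2-game figure from above and my own variable names:

```python
from math import sqrt

GAMES = 162
OBSERVED_SD = 13.2  # spread of actual team win totals, from the text

# Luck alone: a fair coin flipped 162 times has variance n * p * (1 - p).
luck_variance = GAMES * 0.5 * 0.5
observed_variance = OBSERVED_SD ** 2

# Assuming luck and skill are independent, variances add:
#   observed_variance = luck_variance + skill_variance
skill_variance = observed_variance - luck_variance

luck_sd = sqrt(luck_variance)
skill_sd = sqrt(skill_variance)
skill_share = skill_variance / observed_variance

print(f"luck SD:  {luck_sd:.1f} games")
print(f"skill SD: {skill_sd:.1f} games")
print(f"share of variance due to skill: {skill_share:.0%}")
```

If luck and skill were correlated, a covariance term would enter the sum and this simple subtraction would no longer hold.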
Now that we have decomposed baseball into luck and skill, we are ready to adjust for regression to the mean. That’s in the next installment.