Walk Like a Sabermetrician: Playoff Probabilities

I am not a fan of the two wildcard playoff format, but my protestations were not considered and so it will soon be upon us. One thing I have long meant to look at is how the second wildcard would affect the probability of teams of a given presumed strength winning in the playoffs. I’ve never gotten around to it, so I’m pretty much forced to look into it now or forever hold my peace.

The model that I will use to discuss this is admittedly simplified. It makes assumptions that are clearly simpler than reality:

* Teams have a constant strength from game to game. Even the strictest believer in baseball games as an expression of random chance would disagree with this, since the identity of the starting pitcher obviously matters.

* Game outcomes are completely independent of one another. While game outcomes are largely independent, the argument for independence is weakened in the playoffs where decisions about how to manage the game (particularly pitcher usage) are clearly influenced by the status of the series.

* Home field advantage is uniform for all teams.

With these assumptions (along with others that have gone unstated), it is easy to construct a model of a playoff series. If one ignores home field advantage, the binomial distribution makes it very easy. That is the way I’ve approached these problems in the past, but I’ve decided to consider home field advantage and make the computations a tad more arduous this time. (Of course, as you incorporate HFA into the analysis of a playoff series, you realize how inconsequential it is, barring some intangible psychological force.)
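As a rough illustration of the kind of calculation involved (a minimal Python sketch under the assumptions listed above, not the spreadsheet itself), the probability of winning a series can be built up game by game. The function name `series_win_prob`, its parameters, and the `"HHAAAHH"` string encoding of the home field pattern are all hypothetical choices made for this example.

```python
from functools import lru_cache


def series_win_prob(p_home, p_away, home_pattern):
    """Probability that team A wins a best-of-n series.

    p_home       -- P(A wins a single game played in A's park)
    p_away       -- P(A wins a single game played in the opponent's park)
    home_pattern -- one character per scheduled game, 'H' when A is at
                    home and 'A' when A is on the road, e.g. "HHAAAHH"
                    (hypothetical encoding chosen for this sketch)

    Assumes constant team strength and independent games.
    """
    n_games = len(home_pattern)
    wins_needed = n_games // 2 + 1

    @lru_cache(maxsize=None)
    def prob(wins, losses):
        # The series ends as soon as either side has enough wins.
        if wins == wins_needed:
            return 1.0
        if losses == wins_needed:
            return 0.0
        # The site of the next game depends only on how many were played.
        at_home = home_pattern[wins + losses] == "H"
        p = p_home if at_home else p_away
        return p * prob(wins + 1, losses) + (1.0 - p) * prob(wins, losses + 1)

    return prob(0, 0)


if __name__ == "__main__":
    # With identical home and road probabilities this reduces to the
    # binomial-style result that ignores home field advantage.
    print(round(series_win_prob(0.6, 0.6, "HHAAAHH"), 6))
```

When `p_home` and `p_away` are equal, the site of each game stops mattering and the result matches the no-HFA calculation; a team that wins 60% of its games takes a best-of-seven about 71% of the time.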

I have set up an (excessively) clumsy spreadsheet to do the math. The spreadsheet allows you to enter the playoff teams in order of seeding (i.e. A1 is the AL #1 seed down to N5, the NL’s second wildcard, with the National League assumed to have home field advantage--if the AL does, just enter the AL teams as N1-5 and the NL teams as A1-5) and enter a strength rating for each team (in the form of a win ratio as I use here). It then calculates the probability of each potential playoff series and their outcomes. The probability of each series outcome is figured on the other tabs, and if you are so inclined you could alter the home field pattern for each round. You can also enter the average W% for the home team. I’ve set this to .573, which is the World Series average for 1922-2008. The regular season average is usually around .540, so I think this is a fairly generous assumption in terms of strength of HFA.
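For readers who want to see how a win ratio and a home team W% could be turned into a single-game probability, here is one possible sketch in Python. It is an assumption on my part for illustration: it combines the two teams' win ratios as odds and scales those odds by the home team's own odds, which may or may not match what the spreadsheet does. The names `win_ratio` and `game_prob` are hypothetical.

```python
def win_ratio(wpct):
    """Convert a winning percentage into a win ratio, W% / (1 - W%)."""
    return wpct / (1.0 - wpct)


def game_prob(ratio_a, ratio_b, home_wpct=0.573, a_is_home=True):
    """P(team A beats team B in one game).

    ASSUMPTION (not taken from the spreadsheet): neutral-site odds are
    ratio_a / ratio_b, and home field multiplies the home team's odds
    by home_wpct / (1 - home_wpct).
    """
    odds = ratio_a / ratio_b                 # neutral-site odds for A
    hfa = home_wpct / (1.0 - home_wpct)      # odds boost for the home team
    odds = odds * hfa if a_is_home else odds / hfa
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Two evenly matched teams: the home side wins at the assumed home W%.
    print(round(game_prob(1.2, 1.2, 0.573, a_is_home=True), 3))
    # A .600 team has a win ratio of 1.5.
    print(round(win_ratio(0.600), 3))
```

One convenient property of this form is that two evenly matched teams reproduce the home W% exactly, so the .573 input behaves the way the description above suggests it should.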

The yellow cells are where the user should input custom data. I’ve not provided full documentation for every step as I doubt anyone will actually use this spreadsheet, but if you do and have any questions I will be happy to expound on the documentation. The spreadsheet can be accessed here (change html at the end to xls to download in Excel format).

In the post linked above, I looked at the winning percentages for all playoff teams and theoretical second wildcards for 1995-2010. I’m going to use these averages to set up a theoretical “typical” playoff scenario, and see how the probability of each team advancing to a certain round varies with and without the second wildcard team. Using the actual W%s without adjustment to represent the strength of the teams is wrong for a couple of reasons, most notably that some regression is needed to estimate true quality and that no adjustment has been made for the unbalanced schedule, which is a big concern when performing interleague comparisons (and a smaller but still present concern for intraleague comparisons). However, exaggerating the differences in quality between teams will produce a liberal estimate of the differences between playoff formats, which may not be terrible for the sake of discussion. Also note that I’ve not made any adjustment for the fact that under the old format, the wildcard could be matched up with the #2 seed if the #1 seed came from its division.

Here are those average W%s and the resulting CTR (simply W%/(1 - W%) in this case) for each seed:

To rehash that earlier post, much of my antipathy towards the second wildcard and the fetishization of division titles is on display here. The AL wildcard has typically been one of the strongest playoff teams, while the wildcard in both leagues has a better average record than the third division winner. If MLB is hellbent on allowing a fifth team into the playoffs, then I would propose making the playoff between the two qualifying teams with the worst record rather than the two that failed to win their division. This complaint is water under the bridge at this point, though.

First, let’s run through the playoffs with a standard home field advantage (I’m using .543) and just one wildcard:

Now the same scenario, but with a special increased playoff HFA of .573. The last column is the marginal number of World Series victories per 1000 seasons relative to the .543 HFA assumption:

Remember, I’ve given the National League World Series HFA, so the NL teams get more of a boost from assuming a stronger HFA than do the AL teams. The NL picks up 9.6 World Series victories per 1000 seasons as a result of this stronger HFA assumption.

More relevant to the point of this post, here is the effect of adding the second wildcard to the mix. Again, let me emphasize that I’m assuming that there is no difference in expected W% from game-to-game. This is particularly relevant for the wildcard teams as one of the purported benefits of the extra playoff is that it will put the winner at a disadvantage entering the Division Series in terms of pitcher availability, since they will clearly have an incentive to use their best available pitching in the wildcard game. I’m not saying that you shouldn’t attempt to model this and other game-to-game factors when assessing playoff probabilities--but doing so complicates the exercise considerably. Instead, think of what I’m doing here as simply an analysis of the format itself rather than the consequences of that format upon the teams. These probabilities are based on the .573 HFA assumption. The last column is the marginal number of World Series victories per 1000 seasons relative to the single wildcard:

Keep in mind that this analysis is not a true comparison of the previous format to the current one, as it doesn’t account for the wildcard being unable to face a divisional opponent in the Division Series. With that caveat, this actually makes me feel a little better about the two wildcard format, as it increases the probability of the best teams (#1 and #2 seeds, although the AL wildcard is generally in that class as well) winning the World Series. #1 seeds get an easier Division Series matchup by drawing the second wildcard roughly 40% of the time. Obviously the #5 seed benefits the most, going from out of the picture to being a long shot. Equally obviously, the first wildcards take a huge hit.
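The way the wildcard game feeds into the #1 seed's Division Series odds can be sketched as a weighted average over the two possible opponents. This is a minimal Python illustration, not the spreadsheet: the odds-based game model, the function names, the `"HHAAH"` Division Series pattern, and the win ratios in the usage example are all assumptions made up for the example.

```python
def game_prob(ratio_a, ratio_b, home_wpct, a_is_home):
    """P(A beats B in one game). ASSUMED odds-based model, for illustration."""
    hfa = home_wpct / (1.0 - home_wpct)
    odds = ratio_a / ratio_b
    odds = odds * hfa if a_is_home else odds / hfa
    return odds / (1.0 + odds)


def series_prob(ratio_a, ratio_b, home_wpct, pattern):
    """P(A wins a series); pattern marks A's home games with 'H'."""
    need = len(pattern) // 2 + 1

    def prob(wins, losses):
        if wins == need:
            return 1.0
        if losses == need:
            return 0.0
        at_home = pattern[wins + losses] == "H"
        p = game_prob(ratio_a, ratio_b, home_wpct, at_home)
        return p * prob(wins + 1, losses) + (1.0 - p) * prob(wins, losses + 1)

    return prob(0, 0)


def top_seed_ds_prob(r1, r4, r5, home_wpct=0.573, ds_pattern="HHAAH"):
    """P(#1 seed wins its Division Series) when #4 hosts #5 in a single
    wildcard game and the winner advances to face #1."""
    p4_advances = game_prob(r4, r5, home_wpct, a_is_home=True)
    vs4 = series_prob(r1, r4, home_wpct, ds_pattern)
    vs5 = series_prob(r1, r5, home_wpct, ds_pattern)
    # Weight each possible matchup by the chance that it occurs.
    return p4_advances * vs4 + (1.0 - p4_advances) * vs5


if __name__ == "__main__":
    # Hypothetical win ratios, chosen only to exercise the functions.
    print(top_seed_ds_prob(1.5, 1.3, 1.1))
```

Under this assumed model, two evenly matched wildcards would send the visiting #5 seed through 1 - .573 = .427 of the time, which is in the same neighborhood as the roughly 40% figure above.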

The interesting result is the decrease in World Series probability for the #3 seed, the reason for which is not immediately obvious. The cause is the increased likelihood of facing the #1 seed in the LCS (and the increased likelihood of facing the other league’s best teams in the World Series).

Still, the effects on the division winners’ odds are relatively small, not much different from the effect of assuming that the typical playoff HFA is .030 higher (.573 rather than .543). The brunt of the impact is felt by the first wildcard.