## Tuesday, July 21, 2009

### On the World Series Home Field Advantage

A week ago, the American League once again defeated the Neanderthal League (*) in the All-Star Game, securing home field advantage for the World Series. The "This time it counts" mantra about the game is premised on the notion that home field advantage is a significant thing to have (or at least the hope that TV viewers will believe that it is). So it is only natural to look back through history and see how home teams have fared in the World Series.

Let's start off with some theoretical calculations based on a few assumptions. Assume that the two teams are evenly matched, that there is no home field advantage, and that the outcome of each game is independent of any other. Therefore, each team has a 50% chance to win each game, and we can calculate the expected frequency of a 4, 5, 6, or 7 game series using the negative binomial distribution (I apologize for this digression as many of you know this better than I do):

P(x+r game series for one team) = C(x + r - 1, x)*(1 - p)^x*p^r

Where x = number of failures before r successes, p = probability of success, and C is the combination function

In this case, our successes are victories by the eventual series winner (always r = 4), x is losses by the eventual series loser (0-3), and p = .5.

C(x + r - 1, x) is the number of distinct sequences of wins and losses that can occur in the series. C(3, 0) is used for a four-game series, and is equal to 1--the only string of wins and losses that can produce a four-game series is WWWW. The formula for combinations is:

C(n, x) = n!/(x!(n-x)!)

So C(4, 1), the number of different combinations that can produce a five-game series, is 4!/(1!(4-1)!) = 4*3*2*1/(1*(3*2*1)) = 4. You can confirm this, as there are four possible strings (LWWWW, WLWWW, WWLWW, and WWWLW) that produce a five-game series. In fact, you can logically work out all the combinations fairly easily without the math for this application since we are only dealing with a seven-game series.

In a five-game series, the fifth game must be a win (same for the sixth and seventh games of six and seven-game series, respectively). So the victor can lose game 1, game 2, game 3, or game 4.

In a six-game series, the victor can lose games:
12, 13, 14, 15, 23, 24, 25, 34, 35, 45 = 10 combinations

And in a seven-game series:
123, 124, 125, 126, 134, 135, 136, 145, 146, 156, 234, 235, 236, 245, 246, 256, 345, 346, 356, 456 = 20 combinations
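If you would rather let a computer do the listing, here is a minimal Python sketch of the same enumeration. The function name `loss_patterns` is my own invention for illustration; the only idea it encodes is the one above, that the clinching win is always the last game, so the losses can only fall in the games before it.

```python
from itertools import combinations
from math import comb


def loss_patterns(losses, wins_needed=4):
    """List every set of game numbers the eventual winner can lose."""
    # The final game is always the clinching win, so losses can only
    # occur in the first (wins_needed + losses - 1) games.
    games_before_clincher = wins_needed + losses - 1
    return list(combinations(range(1, games_before_clincher + 1), losses))


for losses in range(4):
    patterns = loss_patterns(losses)
    # The enumerated count should agree with C(x + r - 1, x).
    assert len(patterns) == comb(4 + losses - 1, losses)
    print(4 + losses, "games:", len(patterns), "patterns")
```

Each tuple holds the game numbers lost, so `(1, 2)` corresponds to the "12" entry in the six-game list.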

Anyway, doing all the math (and then doubling since we have only considered this from the perspective of one team), the theoretical probability of a given series length is:
4 = 12.5%
5 = 25%
6 = 31.25%
7 = 31.25%
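For anyone who wants to check the arithmetic, the following is a rough Python sketch of the calculation under the same assumptions (independent games, a fixed per-game win probability). The function name `series_length_probs` and its parameters are hypothetical, not anything from a library. Rather than doubling, it adds the two teams' cases separately, which gives the same answer at p = .5 and also works when the teams are not evenly matched.

```python
from math import comb


def series_length_probs(p=0.5, wins_needed=4):
    """Probability of each series length, given one team's per-game win probability p."""
    probs = {}
    for losses in range(wins_needed):
        # Number of distinct win/loss sequences: C(x + r - 1, x).
        ways = comb(wins_needed + losses - 1, losses)
        # The team with win probability p takes the series in this many games.
        first_team = ways * (1 - p) ** losses * p ** wins_needed
        # The same calculation from the other team's perspective.
        second_team = ways * p ** losses * (1 - p) ** wins_needed
        probs[wins_needed + losses] = first_team + second_team
    return probs


for length, prob in series_length_probs().items():
    print(length, "games:", prob)
```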

So theoretically (since WS home field sites are on a 12-345-67 pattern), in 43.75% of World Series, the number of home games will be equal. 25% of the time, the team with the home field disadvantage on paper will actually play more home games, and 31.25% of the time the team with home field advantage on paper will get to benefit from it--if and only if there is a game seven.
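The bookkeeping behind those three percentages can be sketched in a few lines of Python. The names here (`HOME_GAMES`, `home_game_split`, `hfa_breakdown`) are made up for illustration, and the length probabilities are simply copied from the theoretical list above.

```python
# Games hosted by the team with on-paper home field advantage
# under the 12-345-67 pattern.
HOME_GAMES = {1, 2, 6, 7}

# Theoretical series-length probabilities from the list above.
LENGTH_PROBS = {4: 0.125, 5: 0.25, 6: 0.3125, 7: 0.3125}


def home_game_split(length):
    """Return (home games, road games) for the on-paper home team."""
    home = sum(1 for game in range(1, length + 1) if game in HOME_GAMES)
    return home, length - home


def hfa_breakdown(length_probs):
    """Total the probability of equal, reverse, and true home field splits."""
    totals = {"equal": 0.0, "reverse": 0.0, "true": 0.0}
    for length, prob in length_probs.items():
        home, road = home_game_split(length)
        if home == road:
            totals["equal"] += prob
        elif home < road:
            totals["reverse"] += prob
        else:
            totals["true"] += prob
    return totals


print(hfa_breakdown(LENGTH_PROBS))
```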

We'll get back to some theoretical stuff later, but let's look at the actual empirical World Series results. I considered all World Series from 1922-2008 (1922 is when the seven-game series returned permanently) with the following exceptions:

* 1922 and 1923--both Giants/Yankees series; in 1922 the teams shared the Polo Grounds, and in 1923 the series didn't follow the 12-345-67 pattern
* 1943-45--in the war years, a 123-4567 format was used to cut down on travel (and in 1944, the Cardinals and Browns shared Sportsman's Park, which would have made it unusual in any case)

First, let's look at the empirical proportions of series by length:

As you can see, the empirical and theoretical proportions don't actually track particularly well. I'm not going to discuss this phenomenon in depth here, but it is something to keep in mind when we delve back into theoretical stuff at the end of the post. The assumptions are all faulty to some degree or another: the teams are not evenly matched, the results of the games are not truly independent, we have not considered home field advantage, etc. On independence in particular, even if you start with the premise that it largely holds during the regular season, one could conjecture that it holds less well in a short series, as behavior will be highly influenced by the series status--teams down 3-1 behave a lot differently than teams up 3-1 or tied 2-2. This is a classic case of what Bill James called the law of competitive balance. For some more reading on this topic, check out Phil Birnbaum's post at Sabermetric Research and the Baseball Research Journal piece referenced there ("Relative Team Strengths in the World Series" by Alexander E. Cassuto and Franklin Lowenthal, BRJ #35).

Getting back to the actual data, we see what I will call a reverse home field advantage (a 5-game series, in which the "road" team actually hosts 3 games and plays two on the road) 20% of the time, no home field advantage (4 or 6 game series) 41% of the time, and a true home field advantage (7-game series) 40% of the time.

How often does the team with paper home field advantage actually win the Series? Let's break it down by series length:

This is pretty interesting, IMO. The paper home team wins 57% of the series, which seems impressive, but their strongest advantage comes when there is no home field advantage (61%), followed by reverse home fields (56%), and just 53% when there is a true home field.

Of course, the sample sizes aren't great when it's broken down like this, and it is unsurprising that the proportion of series won is less in seven games. What is interesting, though, is that the on-paper home team has such an advantage, and even in series in which they don't really benefit from it in the raw count. Are the first two games at home that much of an advantage, or is there something else going on here?

I'll leave that as a rhetorical question. There are a lot of factors in play here--the sample sizes aren't that large, we have not accounted for the quality of specific teams (which is tough to do in any case because the teams play in different leagues, which were until recently truly separate in the regular season), etc.--and I don't really want to speculate about the influence of these myriad factors.

I did take a look at the regular season W% of the World Series participants, but as I just said, that's not a particularly telling measure, as it is possible that the leagues were unbalanced in any given year and that a lower W% in one could actually be indicative of a higher-quality team. I checked it anyway, and found that, for the group of series defined throughout this post, the winners had a mean W% of .616 with a median of .616, while the losers had a mean W% of .612 with a median of .610.

Teams with on-paper home field advantage had a mean W% of .615 and a median of .611; teams without on-paper home field advantage had a mean W% of .613 and a median of .610. There's no evidence of any sort of fluky quality difference, at least to the extent that W% captures quality. In terms of W%, the World Series winners, losers, on-paper home teams, and on-paper road teams are all essentially equal.

Let's also break down the series outcomes by on-paper home field advantage coupled with which team had a superior record. These figures will exclude the 1949 and 1958 series as the participants had identical regular season records:

So the team with the worse record has actually triumphed in one more series than their higher W% opponents (for reference, the mean W% for teams with the better record is .635 with a median of .636; the mean W% for teams with the lesser record is .593 with a median of .597, again excluding 1949 and 1958). Teams with home field advantage have been very successful, but those with worse records and home field even more so than teams which had both advantages.

Let's break down the home field W% by each game in the series:

As you can see, games 1, 2, and 6, which are home games for the team with on-paper HFA, are the ones with the highest home W%. In game 7, the home field advantage is not particularly large. Those who make a big deal out of WS HFA are fond of pointing out that the home team has won the last eight game 7s, but they were just 2-6 in the previous eight, and I doubt there is anything significant going on. (Although I should point out that the period does correspond to the introduction of the designated hitter in WS play, even if I don't believe that has a significant effect (**).) Between 1952 and 1979 (which includes the 2-6 period mentioned above), road teams were 13-3 in game sevens.

One important caveat on comparing the game-by-game numbers is that as the series extends past the minimum of four games, we should expect to see less of a difference as mismatched teams are eliminated. It doesn't explain why the home field advantages are much smaller in games 3, 4, and 5, though, as there's no reason to suspect that the on-paper road teams are of substantially different quality than the on-paper home teams.

The overall World Series home W% is .573, high compared to the regular season average which is generally somewhere in the neighborhood of .540. Let's use this figure in place of a default assumption of a 50% outcome in each game to model the outcome of a series. Using the combinations detailed above, we can find the probability of any series outcome given these assumptions. For example, the probability of a 4-2 series in which the home team wins games 1, 2, 4, 5, and 6 would be .573^5*.427 (five home wins and one road win). Under these assumptions, we get these probabilities for the possible series outcomes (in this table, "home" refers to the teams with on-paper HFA and "road" to their opponents):

Even using the sample home W% of .573, we only expect the team with HFA to win 52.3% of the time. In fact, teams with HFA have won 56.8% of the series (46 of 79). What is the probability that a result this favorable could have happened by chance, assuming that 52.3% is the true probability and that each series is independent of the others? It's 12.1%. Even if we assume that there is no true home field advantage at all, and each team will win 50% of the time, there is still a 5.7% chance of a result this favorable to the home side.
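The 52.3% series figure can be reproduced, at least approximately, with a short recursion. This is only a sketch under the assumptions already stated (independent games, a single home W% that applies to both teams, the 12-345-67 site pattern). The function names are hypothetical, and this is not necessarily how I arrived at the numbers in the post, which were built up from the individual series outcomes.

```python
from functools import lru_cache

# Games hosted by the team with on-paper HFA under the 12-345-67 pattern.
HOME_GAMES = {1, 2, 6, 7}


def series_win_prob(home_wpct):
    """Probability that the team with on-paper HFA wins a best-of-seven."""

    @lru_cache(maxsize=None)
    def win_from(hfa_wins, opp_wins):
        # The series ends as soon as either team reaches four wins.
        if hfa_wins == 4:
            return 1.0
        if opp_wins == 4:
            return 0.0
        game = hfa_wins + opp_wins + 1
        # The on-paper home team wins at home_wpct in its own park
        # and at 1 - home_wpct when it is the visitor.
        p = home_wpct if game in HOME_GAMES else 1 - home_wpct
        return (p * win_from(hfa_wins + 1, opp_wins)
                + (1 - p) * win_from(hfa_wins, opp_wins + 1))

    return win_from(0, 0)


print(round(series_win_prob(0.573), 3))
```

Passing .500 returns exactly one half, which is a handy sanity check that the site pattern is not sneaking in an advantage on its own.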

How about the individual game results (home teams are 268-200, .573)? If the true home field W% was .540 as it generally is for the regular season (and given all the other necessary assumptions for use of the binomial distribution), the probability of a home record this good in 468 games is 7.1%.
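For readers who want to run this sort of check themselves, here is a minimal Python sketch of a binomial tail probability. The helper names `binom_pmf` and `upper_tail` are mine, and the `inclusive` switch is an assumption I have added: the percentages quoted above do not say whether the observed count itself is part of the tail, and that choice moves the answer noticeably, so the sketch shows both.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def upper_tail(k, n, p, inclusive=True):
    """Probability of at least k successes (or more than k if not inclusive)."""
    start = k if inclusive else k + 1
    return sum(binom_pmf(i, n, p) for i in range(start, n + 1))


# Series level: 46 wins in 79 series, at 52.3% and at 50%.
for p in (0.523, 0.5):
    print(p, upper_tail(46, 79, p), upper_tail(46, 79, p, inclusive=False))

# Game level: 268 home wins in 468 games against a .540 baseline.
print(upper_tail(268, 468, 0.54), upper_tail(268, 468, 0.54, inclusive=False))
```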

So I am decidedly uncomfortable drawing any conclusions about the strength of home field advantage (on the series or game level) in the World Series from the sample data. The actual results show a stronger home field advantage than we might have expected, but not to such an extent that we must conclude that regular season assumptions about home field advantage do not apply.

It's certainly a good thing to have home field advantage for the World Series, or any game for that matter, and I'm not going to try to argue that basing home field on which league won the All-Star Game is anything but a gimmick. However, given that the previous method of determining home field was simply to alternate it yearly between the leagues, I don't think there's any real harm being done by this approach. If you really wanted to reward the stronger league, the overall interleague record would be far more likely to successfully identify the stronger league, but I don't consider the whole matter worth getting exercised over.

I have posted a Google Spreadsheet with the sequence of games in each series if you are interested. The first group of columns, marked G1 through G7, indicates whether the eventual WS champion won the game (W) or lost (L). The second group of columns indicates whether the home team in that particular game won (H) or whether the road team won (R).

Finally, I'll close with some useless trivia. You probably know that there have been three series in which the home team won each game (1987 Twins over Cardinals, 1991 Twins over Braves, and 2001 Diamondbacks over Yankees). The most road games ever won in a series (that I considered for this study) is five, which has happened seven times--1926 Cardinals over Yankees, 1934 Cardinals over Tigers, 1952 Yankees over Dodgers, 1968 Tigers over Cardinals, 1972 A's over Reds, 1979 Pirates over Orioles, and 1996 Yankees over Braves.

P.S. After I wrote this post, but before I published it, Sky Andrecheck published a piece on the importance of World Series HFA at Baseball Analysts. It addresses an interesting question that I will paraphrase as "Since the Dodgers have such a large lead in the playoff race, is the single most important regular season game left on their schedule (with regards to winning the World Series) the All-Star Game?"

I'll let you read Andrecheck's article to find the answer, but there's one minor point which overlaps with this post worth commenting on. Andrecheck notes that the playoff HFA has been higher than the regular season historically, and reasons that this has to do with the home team being the better team more often than not. While this is true for the league playoffs, there's no reason to suspect it to be true for the World Series in which home field alternated between leagues (even if the All-Star result method of determining home field has the effect of giving on-paper home field to a better team more often than not, home field has not been decided by that rule nearly often enough to have any impact on the results, and the amount of noise involved would be incredible in any event). I don't disagree with the notion that we can't say with any certainty that the World Series HFA is of different magnitude than the regular season HFA, but the better team having more home games leaves a lot to be desired as an explanation (again, for the World Series, not the league playoffs).

He also gives the probability of the team with home field winning as 51.26%, assuming that the home W% in the World Series is 54%. I didn't provide this figure in my post, as I approached the question from the standpoint of "Even if .570 is the true HW%...", but I am in agreement with it (naturally, as it is true by definition given the assumptions we both made).

In the comments to Andrecheck's article, there was a link to Cyril Morong's look at WS HFA, published in 2006, which means that I pretty much repeated here what he had done. However, we disagree on the probability of the on-paper home field team winning the series in six games (and thus of course we also disagree on the probability of them winning the series period). I am pretty sure that this is due to a faulty six-game series sequence he used.

(*) Sorry, I can't help it. I SHOULD take the high ground, but the sniveling "It's not REAL baseball" is way too much for me to handle. I'm weak like that.

(**) There was no DH in the World Series until 1976, at which point it was introduced on an alternating year basis. So in 1976, 1978, 1980, 1982, etc. the DH was used in all World Series games, and was not used at all in 1977, 1979, 1981, 1983, etc. Starting in 1986, the home team's rules were used.

So while the run of Game Seven home wins begins with the Cardinals in 1982 and also includes the Royals in 1985, in those series the road team's rule was being used in Game 7. All of the game sevens that follow, of course, used the home team's rule.