## Monday, September 13, 2010

### The Poor Man's Guide to the Pennant Race

There are a number of different statistical tools at one's disposal to aid in handicapping a pennant race based on the position of the teams at any given time. On the most basic level, we have games behind and magic number, two gauges that are understood by even the most stat-averse baseball fan. On the other side of the spectrum, there are several outlets that offer post-season probability estimates, which vary in complexity and in the inputs they use.

Suppose that you are caught somewhere between these two approaches: you are a sabermetrically aware fan who wants some indicators that you can figure with just the standings and a spreadsheet, without resorting to Monte Carlo simulation or a comparably complex tool. Over the last year or so I have introduced a few such approaches on this blog (actually, "rearranged more traditional approaches" would be a better description). In this post, I'll summarize them and offer a quick look at the current standings.

What do these simple tools have in common other than not requiring advanced math to compute? The biggest common thread is that, because they keep it simple, they can only handle comparisons between two teams, which limits them in races that include three or more teams. Another is that they rely on what I will call Game Outcomes Outstanding (GO). A game outcome is simply a win or a loss, but the count includes your opponent's games as well as your own. For example, a team one game behind needs two positive game outcomes to move into a tie for first--a win for themselves and a loss for their opponent. Of course, if they are playing head-to-head, this could be accomplished in just one game, but for the purposes of this post, I still consider that to be two game outcomes. That's another thing the tools here have in common--they assume that teams never play head-to-head.
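To make the outcome bookkeeping concrete, here is a minimal Python sketch. It uses the standard games-behind formula (the average of the win-column and loss-column gaps) and converts a deficit into the number of favorable game outcomes needed to pull even, assuming no head-to-head games. The function names and the records in the example are hypothetical.

```python
def games_behind(w_leader: int, l_leader: int, w: int, l: int) -> float:
    """Standard GB: the average of the win-column gap and the loss-column gap."""
    return ((w_leader - w) + (l - l_leader)) / 2


def outcomes_to_tie(gb: float) -> int:
    """Each trailer win and each leader loss trims GB by 0.5, so pulling
    even takes 2 * GB favorable outcomes (assuming no head-to-head games)."""
    return round(2 * gb)


# Hypothetical standings: leader 80-60, trailer 79-61, i.e. one game behind.
gb = games_behind(80, 60, 79, 61)
print(gb, outcomes_to_tie(gb))  # 1.0 2
```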

(1) The first tool is what I called "true GB", even though I didn't like the name; I'll rename it Effective GB here. Traditional games behind only considers a team's position relative to the division leader; it doesn't take into account the other non-leading teams that are ahead of the team in question. To account for this, John Dewan proposed "GBsum", which sums up all of the deficits a team faces. If the Alphas trail the Bravos by 4 games and the Charlies by 3 games, their GBsum is 7.

I proposed Effective GB because GBsum would have us believe that being in the Alphas' position is just as bad as being seven games behind one opponent. But is that really true? On a fan's emotional level, I think most people would rather be in third place, four out of first and three out of second, than in second place, seven games out.

I would argue that there is a simple and accurate explanation for this preference. The Alphas could in theory climb into a tie for first by winning four games while the Bravos lose four and the Charlies lose three--eleven positive game outcomes. But in order to make up a seven-game deficit against one opponent, you need fourteen positive game outcomes--seven wins and seven opponent losses.

So Effective GB takes this into account by figuring the games behind the leader, which is the number of wins needed, and adding one-half of the games behind each of the other teams ahead (which accounts for the additional opponent losses needed). The math works out so that:

Effective GB = (GB + GBsum)/2
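A minimal sketch of Effective GB in Python, assuming the input is simply a list of the team's deficits to every club ahead of it (a hypothetical input format; the largest entry is taken to be the deficit to the leader):

```python
def effective_gb(deficits: list[float]) -> float:
    """Effective GB = (GB + GBsum) / 2.

    deficits: games behind each team currently ahead (hypothetical format).
    GB is the deficit to the leader, i.e. the largest entry.
    """
    if not deficits:
        return 0.0  # already in first place
    gb = max(deficits)
    gb_sum = sum(deficits)
    return (gb + gb_sum) / 2


alphas = effective_gb([4, 3])  # 4 behind the Bravos, 3 behind the Charlies
lone = effective_gb([7])       # 7 behind a single opponent
print(alphas, lone)            # 5.5 7.0
```

Doubling the two results gives 11 and 14, the positive game outcomes counted in the Alphas example.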

An annoyance of Effective GB is that it's somewhat abstract--it can't be explained as easily in English as normal GB, and it offers up "quarter-games" (your Effective GB could be 3.25, for instance), which even for people used to the idea of "half-games" are tough to swallow.

(2) The second tool is the magic percentage (M%). M% is a close cousin of the magic number (M#); it simply puts it in context by dividing M# by the total number of game outcomes outstanding for the two teams. That means that M% is the percentage of outstanding game outcomes that must go in the team's favor in order to win the title outright.

M# is a total, and like all totals, it ignores opportunities. There is a big difference between having an M# of 5 with two weeks to go in the season and having an M# of 5 with one weekend to go. In the first case, the pennant is all but secured; in the second, it's all but lost. Of course, M% as a rate leaves out important context as well. Everyone starts the season with an M% of about .500; teams that still have an M% of around .500 near the end of the season are legitimate contenders. M% is not intended to replace M#--it is intended to complement it, just as batting average complements the raw hit total.

I already said it, but it bears repeating--M% is the proportion of outstanding game outcomes that must go in a team's favor in order to win outright. For this reason, the M%s of the two teams under consideration will not add to 1 (their sum is slightly greater than 1), since there is always a possibility of a tie. The fewer game outcomes are left outstanding, the further the sum of the two will diverge from 1. You could also figure the M% needed to ensure a tie by simply subtracting one from M# before calculating M%, but I'll stick with the M# construct of outcomes needed to win.
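Here is a sketch of M# and M% computed straight from the standings. It assumes both teams play the same full schedule (162 games here) with no rainouts or ties, and uses the standard magic-number formula from the team's own perspective: season games + 1 - own wins - rival losses. The function names, the `for_tie` flag, and the records in the example are hypothetical, chosen to give an M# of 5 with two weeks and with one weekend left.

```python
SEASON_GAMES = 162  # assumption: both clubs play a full, equal schedule


def magic_number(wins: int, rival_losses: int, season_games: int = SEASON_GAMES) -> int:
    """Favorable outcomes (own wins + rival losses) needed to finish ahead outright."""
    return season_games + 1 - wins - rival_losses


def magic_pct(wins: int, losses: int, rival_wins: int, rival_losses: int,
              season_games: int = SEASON_GAMES, for_tie: bool = False) -> float:
    """M% = M# / GO, where GO counts the games left for both clubs.

    A value <= 0 means the goal is already clinched; a value above 1 means it
    is out of reach (an outright win, or a tie when for_tie=True).
    """
    m = magic_number(wins, rival_losses, season_games)
    if for_tie:
        m -= 1  # clinching at least a tie needs one fewer favorable outcome
    go = (season_games - wins - losses) + (season_games - rival_wins - rival_losses)
    if go <= 0:
        raise ValueError("no game outcomes outstanding")
    return m / go


print(round(magic_pct(0, 0, 0, 0), 3))      # Opening Day: 163/324 -> 0.503
print(round(magic_pct(95, 54, 86, 63), 3))  # M# 5, 13 games left each -> 0.192
print(round(magic_pct(90, 69, 91, 68), 3))  # M# 5, 3 games left each -> 0.833
```

Since the two teams' magic numbers always add to GO + 2, their M%s sum to (GO + 2)/GO, which drifts further above 1 as GO shrinks.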

(3) The third tool is Crude Playoff Probability (CPP). CPP takes M# and GO, makes some simplifying assumptions (most importantly: no head-to-head games; all teams and opponents are true .500 teams; only two teams are in contention; the games outstanding are equally divided between both teams), and uses the binomial distribution to estimate the probability of winning the race (CPP does consider the possibility of a tie, and assumes a 50% chance of winning the resulting playoff game).
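A minimal CPP sketch in Python under the assumptions just listed: every outstanding game outcome is a 50/50 proposition, so the number of favorable outcomes follows a binomial distribution over GO trials, and a tie is settled by a coin-flip playoff. The hypothetical `trailer_magic` helper assumes both teams have the same games left (GL each), in which case a team GB games back needs GL + GB + 1 favorable outcomes to win outright. This is a sketch of the idea, not necessarily how the original table was built.

```python
from math import comb


def cpp(magic: int, go: int) -> float:
    """Crude Playoff Probability given M# and GO (total outcomes outstanding).

    Each outcome is treated as a 50/50 coin flip, so favorable outcomes
    ~ Binomial(go, 0.5). Falling exactly one short of M# is a tie, worth half.
    """
    if magic <= 0:
        return 1.0  # already clinched
    if magic > go + 1:
        return 0.0  # cannot even force a tie

    def pmf(k: int) -> float:
        return comb(go, k) / 2 ** go

    win_outright = sum(pmf(k) for k in range(magic, go + 1))
    tie = pmf(magic - 1)
    return win_outright + 0.5 * tie


def trailer_magic(gb: int, games_left: int) -> int:
    """Same-schedule case: outcomes a trailer needs to finish ahead outright."""
    return games_left + gb + 1


# Two games back with seven games left for each team (GO = 14).
print(round(cpp(trailer_magic(2, 7), 14), 3))  # 0.151

# A small generic table: rows are games left per team, columns are GB 1..5.
for gl in (137, 56, 27, 7):
    print(gl, [round(cpp(trailer_magic(gb, gl), 2 * gl), 3) for gb in range(1, 6)])
```

The two-back, seven-games-left case lands at about 0.151, in line with the 15.1% quoted later in the post for "2 games back with a week to play."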

I wrote about CPP at length last week, so I won't rehash it in detail. Instead, I'll re-emphasize that CPP could also be called generic playoff probability. There is actually a finite number of possible games outstanding/games behind combinations, and one could produce an entire CPP table.

Here is just a small excerpt of such a table, showing the CPP from the trailing team's perspective for a number of GB/GO combinations (at roughly monthly intervals through the season). In this case, games outstanding is from the perspective of one team only--the two teams are assumed to be on the same schedule (i.e. there are never any half-games in the standings). With this simplifying assumption, the leader's M# = GO (one team) - GB + 1, and the trailing team's M# = GO (one team) + GB + 1. Remember that CPP only deals with the probability of catching one other team, not multiple teams.

This chart gets at the heart of why I designed (that is an extraordinarily haughty word choice for this situation, but I can't think of anything better) CPP. It certainly wasn't to improve on other people's published playoff odds, because CPP is quite obviously a step backwards. It was to address questions like "Would you rather be 5 games back with a month to play or 2 games back with a week to play?"--questions which require a generic method when asked without a specific scenario in mind. Answer: the latter, 15.1% to 8.9%. What about five back after April or three back entering September? Assuming your team is as good as the one it's chasing (an assumption that should be questioned if they are five back after one month), it's a toss-up: 27.2% to 29.5%. Given the limitations of CPP, none of this is intended to be authoritative, just food for thought.

Here are the standings through September 12, along with the poor man's indicators. Remember that M#, M%, and CPP for teams in third place or lower compare them only to the first place team, not to the teams in front of them; for the second place team, they relate to only the team they trail, not those behind them; and for the first place team, the only comparison is to their closest pursuer. GL is games left for the team in question; GO is total games outstanding for themselves and the first-place team (or the second-place team in the case of the first-place team):

Finally, to end on an only semi-related digression about something I surely have written about before, why does conventional wisdom say that it's better to be behind in the win column than behind in the loss column? Part of it is that being behind in the win column means that the difference can be made up by winning games that your team has yet to play, and so you "control your own destiny". From a probability perspective, though, that and a buck will get you a small fry.

The probabilistic reason why it's better to be behind in the win column is that you'd rather hold the games outstanding yourself than have your opponent hold them--assuming that both of you are better than .500 in true quality. If the two teams in a pennant race are .550, you have a 55% chance of a favorable outcome when you play but only a 45% chance when your opponent plays.
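To illustrate the win-column argument, here is a sketch that relaxes the .500 assumption: both clubs share the same true winning percentage p, their remaining games are independent (no head-to-head), both finish a 162-game schedule, and a tie goes to a coin-flip playoff. The two scenarios are hypothetical and both leave the trailer one game back: in the first it trails in the win column (84-70 against 86-70), in the second in the loss column (85-71 against 85-69).

```python
from math import comb


def binom_pmf(n: int, p: float) -> list[float]:
    """Probability of exactly k wins in n games, for k = 0..n."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]


def catch_prob(trailer: tuple[int, int], leader: tuple[int, int], p: float) -> float:
    """P(trailer finishes with more wins) + 0.5 * P(tie).

    trailer and leader are (current wins, games left). Comparing final win
    totals assumes both clubs play the same number of games in total.
    """
    (t_wins, t_left), (l_wins, l_left) = trailer, leader
    total = 0.0
    for i, pt in enumerate(binom_pmf(t_left, p)):
        for j, pl in enumerate(binom_pmf(l_left, p)):
            margin = (t_wins + i) - (l_wins + j)
            if margin > 0:
                total += pt * pl
            elif margin == 0:
                total += 0.5 * pt * pl  # coin-flip playoff
    return total


for p in (0.55, 0.45):
    behind_in_wins = catch_prob((84, 8), (86, 6), p)    # 84-70 vs 86-70
    behind_in_losses = catch_prob((85, 6), (85, 8), p)  # 85-71 vs 85-69
    print(p, round(behind_in_wins, 3), round(behind_in_losses, 3))
```

With p = .550 the behind-in-the-win-column scenario comes out ahead; with p = .450 the order flips, which is the sub-.500 case described next. At exactly .500 the two scenarios are identical.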

The flip side of this is that if you have a race in which teams are < .500, you'd rather be ahead in the win column. Force the other team to go out and win games to catch you; don't try to do it yourself. No one really cares about races between bad teams, but it's something to keep in mind if you're checking the standings hoping to see your team avoid finishing in the cellar.