Monday, May 11, 2009

On Games Behind

The standings on Baseball-Reference include a column labeled "GBsum". According to the glossary, this idea was proposed by John Dewan and B-R now tracks it. There has always been something about this that bothered me, but I never really sat down and tried to figure out what it was. It has now dawned on me, and I can put my finger on why I've never taken it seriously.

Suppose you had a division in which three teams were tied for the lead with an identical record, with a fourth team three games behind them (must be early in the season unless it's the 2005 NL West on steroids). Let's say the three teams in the lead are New York, Florida, and Atlanta, and it is Philadelphia that is three back.

Philadelphia's "GBsum" will be 9, as they are three games behind three distinct opponents. But can you explain to me what this number means, in real baseball terms?

You certainly could for standard games behind, probably more elegantly than I am about to: if Philadelphia is three games behind Atlanta, it means that it would require three sets of Philadelphia wins coupled with Atlanta losses in order to even the two teams up. You know that if the Phillies sweep their three-game weekend series and the Braves are swept, the two clubs will be tied.
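Standard games behind is half the sum of the win gap and the loss gap between two clubs. Here is a minimal Python sketch; the records (Atlanta 20-10, Philadelphia 17-13) are hypothetical, chosen only to produce a three-game gap, and the function name is mine.

```python
def games_behind(leader, trailer):
    """Traditional games behind: half of (win gap + loss gap).

    `leader` and `trailer` are (wins, losses) tuples.
    """
    lead_w, lead_l = leader
    trail_w, trail_l = trailer
    return ((lead_w - trail_w) + (trail_l - lead_l)) / 2


# Hypothetical records: Atlanta 20-10, Philadelphia 17-13.
atl, phi = (20, 10), (17, 13)
print(games_behind(atl, phi))  # 3.0

# The Phillies sweep three games while the Braves are swept in theirs.
atl_after, phi_after = (20, 13), (20, 13)
print(games_behind(atl_after, phi_after))  # 0.0
```

Three Phillies wins paired with three Braves losses close the six-outcome gap, which is exactly what "three games behind" promises.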

But what does the GBsum of 9 mean? What would have to happen so that the Phillies would be tied with the triumvirate, and does the figure of 9 reflect this?

In one sense, it does--it will take 9 pairs of Phillie wins coupled with opponent losses to put the Phillies in a first-place tie.

But let's look at a different imaginary division. Chicago leads Houston by nine games, with no clubs in between. Thus the Astros' "GBsum" is 9, same as the Phillies, and it will take 9 pairs of Astro wins + Cub losses to even things up.
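Both imaginary divisions can be checked with a short sketch. I'm assuming GBsum simply adds up the deficit to every club ahead (this is my reading of the column, not Baseball-Reference's actual code), and the records are hypothetical, picked only so the two divisions match the text.

```python
def games_behind(leader, trailer):
    """Traditional games behind between two (wins, losses) records."""
    return ((leader[0] - trailer[0]) + (trailer[1] - leader[1])) / 2


def gb_sum(standings, team):
    """Assumed GBsum: games behind every club ahead of `team`, added up.

    Clubs tied with or below `team` contribute nothing.
    """
    record = standings[team]
    return sum(
        max(0.0, games_behind(other, record))
        for name, other in standings.items()
        if name != team
    )


# Hypothetical records for the two imaginary divisions.
nl_east = {"NYN": (20, 10), "FLA": (20, 10), "ATL": (20, 10), "PHI": (17, 13)}
nl_central = {"CHN": (25, 15), "HOU": (16, 24)}

print(gb_sum(nl_east, "PHI"))     # 9.0
print(gb_sum(nl_central, "HOU"))  # 9.0
```

The two clubs come out identical, which is the result the rest of the post argues against.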

So, if the GBsum figure is an improvement over the traditional games behind, one should conclude that the Phillies and Astros are equally far away from first place, right? Before you answer that, ask yourself if it makes common sense to you. If the baseball gods came to you and allowed you to plop your team down in the place of one of those two, which would you pick?

I think that most people would intuitively choose to place their teams in the division in which they are three games behind three teams, rather than nine games behind one team (ignoring for the moment the proximity of the teams below the Phillies and Astros in their divisions, which I have not defined). And I think they're right.

In order for the Phillies to pull into the lead, we need nine pairs of opposite outcomes:
3 PHI wins + 3 ATL losses
3 PHI wins + 3 NYN losses
3 PHI wins + 3 FLA losses

In order for the Astros to pull even, we need nine pairs of opposite outcomes:
9 HOU wins + 9 CHN losses

In the Phillies' case, though, any win they earn on their own cuts into all three deficits simultaneously. They don't need nine wins to achieve their goal, provided their division opponents are losing; they need three. And thus there are really only twelve required game outcomes (in using the term "game outcomes", I am blithely ignoring head-to-head games, in which a PHI win is by definition a FLA loss if the two are playing; if the loose terminology bothers you, just assume that all of these teams are playing interleague series at the moment):
3 PHI wins + 3 ATL losses + 3 NYN losses + 3 FLA losses

In order for the Astros to pull even, they need eighteen:
9 HOU wins + 9 CHN losses

So I would argue that, even if you want to break away from traditional games behind the leader and consider distance behind non-leaders, the pertinent numbers are 12 and 18 in this case, not 9 and 9. The Phillies are in better position than the Astros.
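The counting argument above can be sketched in Python. This encodes my reading of it, not a standard definition: the team's own wins are counted once (equal to its deficit to the leader, since each win cuts into every deficit), and each club ahead must lose as many games as the team trails it by. The records are hypothetical.

```python
def games_behind(leader, trailer):
    """Traditional games behind between two (wins, losses) records."""
    return ((leader[0] - trailer[0]) + (trailer[1] - leader[1])) / 2


def required_outcomes(standings, team):
    """Game outcomes needed to pull even with every club ahead."""
    record = standings[team]
    deficits = []
    for name, other in standings.items():
        if name == team:
            continue
        gap = games_behind(other, record)
        if gap > 0:  # only clubs ahead of `team` count
            deficits.append(gap)
    own_wins = max(deficits, default=0.0)  # shared across all deficits
    opponent_losses = sum(deficits)        # one set of losses per club ahead
    return own_wins + opponent_losses


nl_east = {"NYN": (20, 10), "FLA": (20, 10), "ATL": (20, 10), "PHI": (17, 13)}
nl_central = {"CHN": (25, 15), "HOU": (16, 24)}

print(required_outcomes(nl_east, "PHI"))     # 12.0
print(required_outcomes(nl_central, "HOU"))  # 18.0
```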

In order to convert those figures to something resembling a traditional games behind format (they are half-games from the traditional GB perspective), simply divide by two. The "true GB" is six for the Phillies and nine for the Astros. Writing the procedure for calculating "true GB" formally:

"true GB" = games behind leader + (1/2)(games behind others)

So for the Phillies, this is 3 + (1/2)(3 + 3) = 6. I believe that this is a much better gauge of a team's position in the standings than is "GBsum".

Admittedly, "true GB" does have the drawback of often being expressed in confusing quarter-games. For example, Toronto leads Boston by one game and New York by 5.5, meaning New York is 4.5 behind Boston. New York could tie Toronto with 5 wins + 6 Blue Jay losses or 6 wins + 5 Blue Jay losses. The formula assumes the average, 5.5 wins; assuming those are achieved, New York also needs 4.5 Boston losses to pull even with the Red Sox. 5.5 wins + 5.5 losses + 4.5 losses = 15.5 game outcomes, which when divided by 2 gives an odd 7.75.
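The "true GB" formula, applied to both the Phillies and this quarter-game example, might look like the sketch below. The records are hypothetical, chosen so that Toronto leads Boston by 1 and New York by 5.5; `true_gb` is my name for it, not anything Baseball-Reference publishes.

```python
def games_behind(leader, trailer):
    """Traditional games behind between two (wins, losses) records."""
    return ((leader[0] - trailer[0]) + (trailer[1] - leader[1])) / 2


def true_gb(standings, team):
    """'true GB' = games behind leader + (1/2)(games behind others)."""
    record = standings[team]
    ahead = sorted(
        (gap for gap in (games_behind(other, record)
                         for name, other in standings.items() if name != team)
         if gap > 0),
        reverse=True,
    )
    if not ahead:
        return 0.0
    leader_gap, others = ahead[0], ahead[1:]
    return leader_gap + 0.5 * sum(others)


nl_east = {"NYN": (20, 10), "FLA": (20, 10), "ATL": (20, 10), "PHI": (17, 13)}
al_east = {"TOR": (20, 10), "BOS": (19, 11), "NYA": (15, 16)}

print(true_gb(nl_east, "PHI"))  # 6.0
print(true_gb(al_east, "NYA"))  # 7.75
```

With co-leaders, one of them plays the role of "leader" and the rest count as "others"; the result is the same whichever is picked.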

Another caveat is that you can't just interpret "true GB" in the same manner you do actual games behind. The Indians' "true GB" of 14.5 does not mean their situation is anywhere near as dire as a team that is 14.5 games behind. Our idea of the plight of a 14.5 GB team naturally includes other teams in front, since rarely is the second place team in a division that far back.

Please note that I am not holding "true GB" up as anything essential that should be carried in the standings in your local newspaper. I just prefer it to the "GBsum" column. I'm not sure an alternative figure including the other teams is necessary, as each comparison is a pennant race in its own right (you can look at it from the perspective that the Phillies have to beat the Braves in a distinct race, and beat the Mets in a distinct race, and...), and anyone looking at the traditional standings can figure the margin between two teams for themselves. And if one does want to get more involved in putting a number on the race, I'd rather go all the way and try to estimate probabilities like BP and other sites are already doing.

Here are the standings as of this moment, with all three flavors of GB on display (if you saw this post in the first few hours it was up, I mis-figured the NL West):

You can see that there is general agreement between GBsum and "true GB", and if you'd like to posit that as a demonstration of "true GB"'s limited utility, I won't put up much of a fight. Unfortunately for me, any way you slice it except traditional games behind, the Indians are in the deepest hole in the American League.

As an aside, one often hears "games in the loss column" cited as being important, even more so than standard games behind. A 10-5 team is one game ahead of an opponent with a 9-6 record or an opponent with a 10-7 record, but this viewpoint holds that, as the leader, you'd rather be in the latter position, since there you are two games ahead in the loss column.

This is correct, of course; if you have two teams equal in the standings, but with different win-loss records, the one with the advantage in the loss column has the better winning percentage...assuming that the teams are above .500.

If the teams were below .500, then the one with the advantage in the loss column would actually have a lower W%. This makes sense if you think about it. If a good team has played one fewer game, that unplayed game is more likely to be a win; for a bad team, it is more likely to be a loss.
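The loss-column point can be checked with a few hypothetical records: clubs that are level in the standings but have played different numbers of games.

```python
def games_behind(leader, trailer):
    """Traditional games behind between two (wins, losses) records."""
    return ((leader[0] - trailer[0]) + (trailer[1] - leader[1])) / 2


def win_pct(record):
    wins, losses = record
    return wins / (wins + losses)


# Both rivals trail a 10-5 leader by one game, but the 10-7 club
# also trails by two in the loss column.
print(games_behind((10, 5), (9, 6)), games_behind((10, 5), (10, 7)))  # 1.0 1.0

# Level in the standings, one fewer game played: above .500 the club
# with fewer losses has the higher W%, below .500 it has the lower one.
print(win_pct((10, 5)) > win_pct((11, 6)))  # True  (.667 vs .647)
print(win_pct((5, 10)) > win_pct((6, 11)))  # False (.333 vs .353)
```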

Now this "advantage" to being ahead in the win column rather than in the loss column never really comes into play in real MLB situations, since sub-.500 teams are generally not fighting each other for playoff spots (1994 AL West notwithstanding). Even if they were, the teams would likely be sufficiently close to .500 to make it a wash anyway (of course, it is "true" quality, not observed W%, that should determine which column it is preferable to lead in).

And in a race to the bottom, like who will get the #1 draft pick, the goal becomes to finish last, not first, and so the advantage flips back to the team "ahead" in the loss column, although in this case being "ahead" means having more losses. Perhaps this principle could come into play in relegation for international soccer or something; it has to be a case in which true sub-.500 teams are competing for the best record amongst themselves, something that rarely or never happens in MLB (for anything important, at least; fourth place in the NL Central isn't what I have in mind).


  1. Here's a possible refinement to "true GB" that would probably render it a little more logical and eliminate "quarter-games". However, it's harder to figure, and I'm not trying to push either of these for inclusion in your morning newspaper or anything.

    Let me illustrate with the AL Central, which is a good example at the moment since Detroit and Kansas City are tied for the lead. For each team behind them, we will figure the number of wins and opponent losses they need to reach identical records as the leaders.

    MIN needs 2 wins to catch DET and 3 to catch KC, so they need 3. They also need 3 KC losses and 4 DET losses, a total of 7. Thus, they are (3+7)/2 = 5 pseudo GB.

    CHA needs 3 wins to catch DET and 4 to catch KC, so they need 4. They also need 3 DET losses and 2 KC losses, a total of 5, for (4+5)/2 = 4.5 pseudo GB. Despite the fact that the Sox and Twins are tied in GB, GBsum, and "true GB", they have different figures here. I'm not sure this is a good thing, as the Twins actually have a higher W%, but oh well.

    CLE's maximum needed wins versus their opponents is 7, and they also need 8 DET losses, 7 KC losses, 4 MIN losses, and 5 CHA losses, a total of 24. (7+24)/2 = 15.5

    If you need a formula here, it would be:

    pseudo GB = (1/2)*(MAX(win differences) + SUM(loss differences))

  2. As "AaronGNP" commenting on Tango's site points out, "true GB" is just the simple average of GB and GBsum. I can't believe I didn't realize/point this out, as the formula makes it clear:

    "true GB" = games behind leader + (1/2)(games behind others)

    can be rewritten as GB + (1/2)(GBsum-GB) = (1/2)GB + (1/2)(GBsum)

  3. Total Geekspeak garbage. Even the regular GB line in most listed standings is worthless. Earl Weaver could have told you 40 yrs ago the loss column is the only thing that matters because you can never make up a loss.
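The refinements in the first two comments can be compared side by side with a short sketch. It works from the per-club win and loss differences quoted in the AL Central example (no full records are given there), and `summarize` is a hypothetical name of my own. "true GB" is computed as the average of GB and GBsum, per the second comment, and pseudo GB as half of the maximum win difference plus the summed loss differences, per the first.

```python
def summarize(win_diffs, loss_diffs):
    """Return (GB, GBsum, true GB, pseudo GB) for one trailing team.

    win_diffs[t]  -- wins the team needs to match club t's win total
    loss_diffs[t] -- losses club t needs to match the team's loss total
    Both dicts cover only the clubs ahead of the team.
    """
    gaps = [(win_diffs[t] + loss_diffs[t]) / 2 for t in win_diffs]  # GB vs each club
    gb, gb_total = max(gaps), sum(gaps)
    true_gb = (gb + gb_total) / 2  # average of GB and GBsum
    pseudo_gb = (max(win_diffs.values()) + sum(loss_diffs.values())) / 2
    return gb, gb_total, true_gb, pseudo_gb


# Differences from the AL Central example in the first comment.
print(summarize({"DET": 2, "KC": 3}, {"DET": 4, "KC": 3}))  # MIN: (3.0, 6.0, 4.5, 5.0)
print(summarize({"DET": 3, "KC": 4}, {"DET": 3, "KC": 2}))  # CHA: (3.0, 6.0, 4.5, 4.5)
```

The Twins and White Sox match on the first three figures and split only on pseudo GB, as the first comment notes.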

