Walk Like a Sabermetrician: Run Distribution and W%, 2012

A couple of caveats apply to everything that follows in this post. The first is that there are no park adjustments anywhere. There's obviously a difference between scoring 5 runs at Petco and scoring 5 runs at Coors, but if you're using discrete data there's not much that can be done about it unless you want to use a different distribution for every possible context. Similarly, it's necessary to acknowledge that games do not always consist of nine innings; again, it's tough to do anything about this while maintaining your sanity.

All of the conversions of runs to wins are based only on 2012 data. Ideally, I would use an appropriate distribution for runs per game based on average R/G, but I've taken the lazy way out and used the empirical data for 2012 only.

The first breakout is record in blowouts versus non-blowouts. I define a blowout as a margin of five or more runs. This is not really a satisfactory definition of a blowout, as many five-run games are quite competitive--“blowout” is just a convenient label to use, and expresses the point succinctly. I use these two categories with wide ranges rather than more narrow groupings like one-run games because the frequency and results of one-run games are highly biased by the home field advantage. Drawing the focus back a little allows us to identify close games and not so close games with a margin built in to allow a greater chance of capturing the true nature of the game in question rather than a disguised situational effect.
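The split described above can be sketched in a few lines of Python. This is only an illustration of the five-run rule: the input layout `(team, runs_scored, runs_allowed)`, the function names, and the sample games for the made-up team "AAA" are all assumptions, not data from this post.

```python
from collections import defaultdict

BLOWOUT_MARGIN = 5  # a margin of five or more runs counts as a blowout


def blowout_split(games):
    """Tally (wins, losses) in blowouts and non-blowouts for each team.

    games: iterable of (team, runs_scored, runs_allowed) tuples
    (a hypothetical layout chosen for this sketch).
    """
    rec = defaultdict(lambda: {"blowout": [0, 0], "non_blowout": [0, 0]})
    for team, rs, ra in games:
        key = "blowout" if abs(rs - ra) >= BLOWOUT_MARGIN else "non_blowout"
        rec[team][key][0 if rs > ra else 1] += 1  # index 0 = wins, 1 = losses
    return {t: {k: tuple(v) for k, v in s.items()} for t, s in rec.items()}


def w_pct(wins, losses):
    """Winning percentage, or NaN when there are no games."""
    total = wins + losses
    return wins / total if total else float("nan")


# Made-up games for a fictional team, purely to exercise the functions.
games = [("AAA", 7, 1), ("AAA", 3, 2), ("AAA", 2, 8), ("AAA", 4, 5)]
split = blowout_split(games)
diff = w_pct(*split["AAA"]["blowout"]) - w_pct(*split["AAA"]["non_blowout"])
```

Here `diff` is the blowout W% minus the non-blowout W%, which is one plausible reading of the differential used to sort the chart in this section.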

In 2012, 74.9% of games were non-blowouts (and thus 25.1% were blowouts). Here are the teams sorted by non-blowout record:

Records in blowouts:

This chart is sorted by differential between blowout and non-blowout W% and also displays blowout/non-blowout percentage:

As you can see, the Phillies had the highest percentage of non-blowouts (and also went exactly .500 in both categories) while the Angels had the highest percentage of blowouts. This is the second consecutive season in which Cleveland has had the most extreme W% differential (in either direction). Coincidentally, both pennant winners posted the same differential, -.012, meaning each was slightly better in non-blowouts.

A more interesting way to consider game-level results is to look at how teams perform when scoring or allowing a given number of runs. For the majors as a whole, here are the counts of games in which teams scored X runs:

The “marg” column shows the marginal W% for each additional run scored. In 2012, the fourth run was both the most marginally valuable and the cutoff point between winning and losing (on average).
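For readers who want to rebuild a table like this from game logs, here is a minimal sketch of the W%-by-runs-scored and marginal calculations. The input layout `(runs_scored, won)` and the function names are assumptions for illustration, and the sample results are toy values, not 2012 data.

```python
from collections import Counter


def win_pct_by_runs(results):
    """Return {runs_scored: W%} from (runs_scored, won) pairs, one per team-game."""
    games, wins = Counter(), Counter()
    for runs, won in results:
        games[runs] += 1
        wins[runs] += int(won)
    return {r: wins[r] / games[r] for r in sorted(games)}


def marginal_win_pct(wpct):
    """Marginal W% of the Xth run: W% at X runs minus W% at X - 1 runs."""
    return {r: wpct[r] - wpct[r - 1] for r in wpct if r - 1 in wpct}


# Toy results, invented only to show the mechanics.
results = [(0, False), (1, False), (1, True), (2, True), (2, True)]
wpct = win_pct_by_runs(results)
marg = marginal_win_pct(wpct)
```

With real data, the run total carrying the largest value in `marg` would be the most marginally valuable run, and the first run total with a W% above .500 would be the cutoff between winning and losing on average.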

I use these figures to calculate a measure I call game Offensive W% (or Defensive W% as the case may be), which was suggested by Bill James in an old Abstract. It is a crude way to use each team’s actual runs per game distribution to estimate what their W% should have been by using the overall empirical W% by runs scored for the majors in the particular season.
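In its simplest form, the idea can be sketched as averaging the league-wide W% at each of a team's single-game run totals. This is a simplified reading, not necessarily the exact calculation used for the figures in this post: the function names are hypothetical, the league table below is a placeholder rather than the 2012 data, and the defensive version assumes the league W% when allowing X runs is one minus the W% when scoring X runs. Games of 11 or more runs are lumped into one bucket, as described later in the post.

```python
def game_offensive_wpct(runs_scored_by_game, league_wpct, cap=11):
    """Average the league W% at each of the team's single-game run totals."""
    total = sum(league_wpct[min(r, cap)] for r in runs_scored_by_game)
    return total / len(runs_scored_by_game)


def game_defensive_wpct(runs_allowed_by_game, league_wpct, cap=11):
    """Same idea for runs allowed, using 1 - W% when scoring that many runs."""
    total = sum(1.0 - league_wpct[min(r, cap)] for r in runs_allowed_by_game)
    return total / len(runs_allowed_by_game)


# Placeholder league W% for 0..11+ runs scored (illustrative, NOT 2012 figures).
league_wpct = [0.0, 0.1, 0.25, 0.4, 0.55, 0.65, 0.75, 0.8, 0.85, 0.9, 0.93, 0.97]

g_ow = game_offensive_wpct([0, 4, 15], league_wpct)  # the 15-run game uses the 11+ bucket
g_dw = game_defensive_wpct([0, 4, 15], league_wpct)
```

Because every game counts once regardless of margin, a team that spreads its runs across many moderate totals can score better here than a team with the same R/G and a few lopsided outbursts.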

Theoretical run per game distribution was a major topic on this blog in 2012, and so I will digress for a moment and talk about what I found. The major takeaway is that a zero-modified negative binomial distribution provides a pretty good model of runs per game (I called my specific implementation of that model Enby so that I didn’t have to write “zero-modified negative binomial” a hundred times, but that’s what it is. This is important to point out 1) so that I don’t give the impression that I created a unique distribution out of thin air and 2) to assure you that said distribution is a real thing that you could read about in a textbook).

However, the Enby distribution is not ready to be used to estimate winning percentages. In order to use Enby, you have to estimate the three parameters of the negative binomial distribution at a given R/G mean. I do this by estimating the variance of runs scored and fudging (there is no direct way to solve for these parameters, at least that is published in math journals that I can make heads or tails of). The estimate of variance is quite crude, although it appears to work fine for modeling the run distribution of a team independently. But as Tango Tiger has shown in his work with the Tango Distribution (which considers the runs per inning distribution), the distribution must be modified when two teams are involved (as is the case when considering W%, as it simultaneously involves the runs scored and allowed distribution). I have not yet been able to apply a similar corrector in Enby, although I have an idea of how to do so which is on my to-do list. Perhaps by the time I look at the 2013 data, I’ll have a theoretical distribution to use. Here are three reasons why theoretical would be superior to empirical for this application:

1. The empirical distribution is subject to sample size fluctuations. In 2012, teams that scored 11 runs won 96.9% of the time while teams that scored 12 runs won 95.9% of the time. Does that mean that scoring 11 runs is preferable to scoring 12 runs? Of course not--it's a small sample size fluke (there were 65 games in which 11 runs were scored and 49 games in which 12 runs were scored). Additionally, the marginal values don’t necessarily make sense even when W% increases from one runs scored level to another--for instance, the marginal value of a ninth run is implied to be .030 wins while the marginal value of a tenth run is implied to be .063 wins. (In figuring the gEW% family of measures below, I lumped all games with 11+ runs into one bucket, which smooths any illogical jumps in the win function, but leaves the inconsistent marginal values unaddressed and fails to make any differentiation between scoring 20 runs and scoring 11).

2. Using the empirical distribution forces one to use integer values for runs scored per game. Obviously the number of runs a team scores in a game is restricted to integer values, but not allowing theoretical fractional runs makes it very difficult to apply any sort of park adjustment to the team frequency of runs scored. (Enby doesn’t allow for fractional runs either, which makes sense given that runs are indeed discrete, but you can park adjust Enby by park adjusting the baseline).

3. Related to #2 (really its root cause, although the park issue is important enough from the standpoint of using the results to evaluate teams that I wanted to single it out), when using the empirical data there is always a tradeoff that must be made between increasing the sample size and losing context. One could use multiple years of data to generate a smoother curve of marginal win probabilities, but in doing so one would lose centering at the season’s actual run scoring rate. On the other hand, one could split the data into AL and NL and more closely match context, but you would lose sample size and introduce quirks into the data.

Before leaving the topic of the Enby distribution, I was curious to see how it performed in estimating the major league run distribution for 2012. The major league average was 4.324 R/G, which corresponds to Enby distribution parameters of (r = 4.323, B = 1.0116, z = .0594). This graph truncates scoring at 15 runs per game to keep things manageable, and there’s very little probability in the far right tail:

From my (admittedly biased) vantage point, Enby does a fairly credible job of estimating the run scoring distribution. Enby is too low on zero and one run and too high on 2-4 runs, which is fairly common and thus an area for potential improvement to the model.
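For anyone who wants to reproduce the Enby curve, here is a minimal sketch of a zero-modified negative binomial using the parameters quoted above. It assumes the textbook (r, B) parameterization of the negative binomial, where the mean is r times B, and it assumes z is the probability of scoring zero runs; the function names are hypothetical. Under those assumptions the distribution's mean comes out at about 4.32 runs, in line with the quoted major league average, which suggests the reading is reasonable.

```python
from math import exp, lgamma, log


def neg_binomial_pmf(k, r, B):
    """Negative binomial P(X = k) with real-valued r and mean r * B."""
    log_p = (lgamma(r + k) - lgamma(r) - lgamma(k + 1)
             + k * log(B) - (r + k) * log(1 + B))
    return exp(log_p)


def enby_pmf(k, r, B, z):
    """Zero-modified negative binomial: P(0) is set to z, and the
    remaining probabilities are rescaled so that everything sums to one."""
    if k == 0:
        return z
    q0 = neg_binomial_pmf(0, r, B)
    return (1 - z) * neg_binomial_pmf(k, r, B) / (1 - q0)


# Parameters quoted in the post for the 2012 major league average.
R_PARAM, B_PARAM, Z_PARAM = 4.323, 1.0116, 0.0594

# Sixty terms is far enough into the tail that the truncation is negligible.
probs = [enby_pmf(k, R_PARAM, B_PARAM, Z_PARAM) for k in range(60)]
mean = sum(k * p for k, p in enumerate(probs))
```

Comparing `probs` against the empirical frequency of each run total is the comparison the graph makes.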

I will not go into the full details of how gOW%, gDW%, and gEW% (which combines both into one measure of team quality) are calculated here; they were provided in an earlier post. The “use” column here is the coefficient applied to each game to calculate gOW% while the “invuse” is the coefficient used for gDW%. For comparison, I have looked at OW%, DW%, and EW% (Pythagenpat record) for each team; none of these have been adjusted for park to maintain consistency with the g-family of measures which are not park-adjusted.
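As a reference point for the comparison measures, here is a minimal sketch of a Pythagenpat calculation. The exponent constant of 0.29 is an assumption (values around 0.28 to 0.29 are commonly used), and treating OW% and DW% as a team's own rate paired with the league average on the other side is a conventional reading rather than something spelled out in this post. The function names are hypothetical.

```python
def pythagenpat(runs_per_game, runs_allowed_per_game, z=0.29):
    """Pythagenpat W% with exponent x = (R/G + RA/G) ** z."""
    x = (runs_per_game + runs_allowed_per_game) ** z
    scored = runs_per_game ** x
    allowed = runs_allowed_per_game ** x
    return scored / (scored + allowed)


def offensive_wpct(team_rg, league_rg, z=0.29):
    """OW%: the team's offense paired with a league-average defense."""
    return pythagenpat(team_rg, league_rg, z)


def defensive_wpct(team_ra_g, league_rg, z=0.29):
    """DW%: a league-average offense paired with the team's defense."""
    return pythagenpat(league_rg, team_ra_g, z)


# Cincinnati's 4.13 R/G against the 4.324 major league average, both from the post.
cin_ow = offensive_wpct(4.13, 4.324)
```

Under these assumptions `cin_ow` lands at roughly .479, close to the OW% quoted for Cincinnati later in the post, though that agreement does not confirm the exact inputs used there.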

For most teams, gOW% and OW% are very similar. Teams whose gOW% is higher than OW% distributed their runs more efficiently (at least to the extent that the methodology captures reality); the reverse is true for teams with gOW% lower than OW%. The teams that had differences of +/- 2 wins between the two metrics were (all of these are the g-type less the regular estimate):

Positive: CIN, DET, KC, MIA
Negative: BOS, TEX, ARI, OAK

Last year, the Red Sox’ gOW% was 6.2 wins lower than their OW%, which is by far the largest gap I’ve seen since I started tracking this. Boston once again led the majors in this department, but only with a 2.5 win discrepancy. Of course, last year their gOW% was a still-excellent .572, while this year it was down to a near average .507.

As I’ve noted in an earlier post, Cincinnati’s offense was much worse than one would have expected given the names in the lineup and their recent performances. Historically bad leadoff hitters certainly didn’t help, but on the bright side, the Reds distributed their runs as efficiently as any team in MLB. CIN had a .479 OW% (which would be a little lower, .470, if I were park-adjusting), but their .498 gOW% was essentially league average. To see how this came about, the graph below considers Cincinnati’s runs scored distribution, the league average for 2012, and the Enby distribution expectation for a team averaging 4.15 runs per game (CIN actually averaged 4.13). The graph is cut off at 15 runs; the Reds’ highest single-game total was 12:

The Reds were shut out much less frequently than an average team (or the expectation for a team with their average R/G), but they gave up much of this advantage by scoring exactly one run more frequently than expected. In total, CIN scored one or fewer runs 16.7% of the time, compared to a ML average of 17.4% and Enby expectation of 17.8%. They also scored exactly two runs less often than expected. Where Cincinnati made hay was in games of moderate runs scored--the Reds exceeded expectations for 3, 4, 5, and 6 runs scored. As you can see if you look at the chart from earlier in the post, the most valuable marginal runs in 2012 were 2-4, so the Reds did a decent job of clustering their runs in the sweet spot where an extra run can have a significant impact on your win expectancy.

From the defensive side, the biggest differences between gDW% and DW% were:

Positive: TEX, CHN, BAL
Negative: MIN, WAS, TB, NYA, CIN

The Reds and the Rangers managed to offset favorable/unfavorable offensive results with the opposite for defense. For the Twins to have the largest negative discrepancy was just cruel, considering that only COL (.386) and CLE (.411) had worse DW%s than Minnesota’s .418. In gDW%, Minnesota’s .400 was better only than Colorado’s .394, a gap that would be wiped out by any reasonable park adjustment.

gOW% and gDW% are combined via Pythagenpat math into gEW%, which can be compared to a team’s standard Pythagenpat record:

Positive: DET, CHN, KC, NYN, BAL, MIA
Negative: MIN, ARI, OAK, WAS, STL

The table below is sorted by gEW%: