Walk Like a Sabermetrician: Run Distribution and W%, 2011

A couple of caveats apply to everything that follows in this post. The first is that there are no park adjustments anywhere. There's obviously a difference between scoring 5 runs at Petco and scoring 5 runs at Coors, but if you're using discrete data there's not much that can be done about it unless you want to use a different distribution for every possible context. Similarly, it's necessary to acknowledge that games do not always consist of nine innings; again, it's tough to do anything about this while maintaining your sanity.

All of the conversions of runs to wins are based only on 2011 data. Ideally, I would use an appropriate distribution for runs per game based on average R/G, but I've taken the lazy way out and used the empirical data for 2011 only.

This post also contains little in the way of "analysis" and a lot of tables. This is probably a good thing for you as the reader, but I felt obliged to warn you anyway. I’ve cut out a lot of what I listed last year simply because I don’t have that much free time right now. The data was not particularly useful in any event: knowing how many runs teams scored and allowed in their wins and losses, or what percentage of their games fell into arbitrarily defined classes, might offer some trivia but is not exactly essential material.

The first breakout is record in blowouts versus non-blowouts. I define a blowout as a margin of five or more runs. This is not really a satisfactory definition of a blowout, as many five-run games are quite competitive; “blowout” is just a convenient label that expresses the point succinctly. I use these two wide categories rather than narrower groupings like one-run games because the frequency and results of one-run games are highly biased by home field advantage. Drawing the focus back a little lets us separate close games from not-so-close games, with enough margin built in that the categories are more likely to capture the true nature of the game in question rather than a disguised situational effect.
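The split described above can be sketched in a few lines. This is only an illustration: the `Game` tuple, the function names, and the sample scores are hypothetical and are not taken from the 2011 data.

```python
from collections import namedtuple

# Hypothetical record type: one team's runs scored and allowed in a game.
Game = namedtuple("Game", ["runs_for", "runs_against"])

BLOWOUT_MARGIN = 5  # a margin of five or more runs, per the definition above


def split_record(games, margin=BLOWOUT_MARGIN):
    """Return {'blowout': (W, L), 'non_blowout': (W, L)} for one team's games."""
    record = {"blowout": [0, 0], "non_blowout": [0, 0]}
    for g in games:
        diff = g.runs_for - g.runs_against
        if diff == 0:
            continue  # completed games should not be tied; skip defensively
        key = "blowout" if abs(diff) >= margin else "non_blowout"
        record[key][0 if diff > 0 else 1] += 1
    return {k: tuple(v) for k, v in record.items()}


def win_pct(wins, losses):
    """W% for a (wins, losses) pair; NaN if there are no games."""
    total = wins + losses
    return wins / total if total else float("nan")


# Made-up scores purely for illustration.
sample = [Game(7, 1), Game(3, 2), Game(2, 8), Game(4, 5), Game(10, 5)]
rec = split_record(sample)
blowout_less_non = win_pct(*rec["blowout"]) - win_pct(*rec["non_blowout"])
```

The last line is the blowout W% less non-blowout W% difference for the sample team.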

In 2011, 75.8% of games were non-blowouts and 24.2% were blowouts. The teams sorted by non-blowout record:

The standard deviation of W% in non-blowouts was .064, which as expected is less than the standard deviation for blowouts (.114) and all games (.070).

Records in blowouts:

Obviously the sample size on these games is pretty small, but Kansas City and Oakland at .500 in blowouts caught my eye.

This chart shows blowout W% less non-blowout W%, along with the percentage of games that were blowouts and non-blowouts for each team:

This is the second year in a row in which San Diego has ranked high in terms of difference between blowout and non-blowout record. Usually teams with large differences are the better teams; that description may have fit the Padres in 2010 but not in 2011. Cleveland was the most extreme team in either direction in the majors. Florida played in the smallest proportion of blowouts while Texas played in the most.

A more interesting way to consider game-level results is to look at how teams perform when scoring or allowing a given number of runs. For the majors as a whole, here are the counts of games in which teams scored X runs:

The “marg” column shows the marginal W% for each additional run scored. The second and third run were both worth about .15 wins on average in 2011, while scoring four runs was the cutoff point between winning and losing (on average, of course).
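For anyone rebuilding the “marg” column from raw game counts, here is a minimal sketch. It assumes the marginal value is simply the W% when scoring X runs less the W% when scoring X - 1 runs; the dictionaries and their counts are invented for illustration and are not the 2011 figures.

```python
def marginal_win_pct(wins_by_runs, games_by_runs):
    """Return rows of (runs, W%, marginal W% vs. the previous run total).

    Both arguments map a run total to a count of games. Run totals are
    assumed to be contiguous (0, 1, 2, ...) with at least one game each.
    """
    rows = []
    prev = None
    for runs in sorted(games_by_runs):
        wpct = wins_by_runs.get(runs, 0) / games_by_runs[runs]
        marg = None if prev is None else wpct - prev
        rows.append((runs, wpct, marg))
        prev = wpct
    return rows


# Invented counts, not the actual 2011 distribution.
games = {0: 100, 1: 200, 2: 250}
wins = {0: 0, 1: 20, 2: 60}
table = marginal_win_pct(wins, games)
```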

I use these figures to calculate a measure I call game Offensive W% (or Defensive W% as the case may be), which was suggested by Bill James in an old Abstract. It is a crude way to use each team’s actual runs per game distribution to estimate what their W% should have been by using the overall empirical W% by runs scored for the majors in the particular season.
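In its simplest form, the idea is a weighted average: weight the league's W% at each run total by how often the team scored that many runs. The sketch below assumes exactly that and nothing more; the actual coefficients used for the measure may treat some run totals differently, and the function names and numbers here are hypothetical.

```python
def game_offensive_wpct(team_games_by_runs_scored, league_wpct_by_runs):
    """Average the league W% when scoring X runs over a team's scoring games."""
    total_games = sum(team_games_by_runs_scored.values())
    weighted = sum(
        n * league_wpct_by_runs[runs]
        for runs, n in team_games_by_runs_scored.items()
    )
    return weighted / total_games


def game_defensive_wpct(team_games_by_runs_allowed, league_wpct_by_runs):
    """Same idea for defense: allowing X runs wins at 1 - (league W% scoring X)."""
    total_games = sum(team_games_by_runs_allowed.values())
    weighted = sum(
        n * (1 - league_wpct_by_runs[runs])
        for runs, n in team_games_by_runs_allowed.items()
    )
    return weighted / total_games


# Invented league W% by runs scored and an invented ten-game team sample.
league = {0: 0.0, 1: 0.10, 2: 0.25, 3: 0.40}
team = {0: 1, 1: 2, 2: 3, 3: 4}
gow = game_offensive_wpct(team, league)
gdw = game_defensive_wpct(team, league)
```

Because every game has one winner and one loser, the league W% when allowing X runs is just the complement of the league W% when scoring X runs, which is why the defensive version needs no separate table.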

Using the empirical distribution rather than a theoretical distribution has the upside of being simple (modeling the runs per game distribution is fairly messy), but the benefits are outnumbered by the drawbacks. A non-comprehensive list of said drawbacks:

1. The empirical distribution is subject to sample size fluctuations. In 2011, at least, each additional run increased W%. This is often not the case given the low frequency of high scoring games. Even so, the marginal values don’t necessarily make sense--for instance, the marginal value of a tenth run is implied to be .006 wins while the marginal value of an eleventh run is implied to be .040.

2. Using the empirical distribution forces one to use integer values for runs scored per game. Obviously the number of runs a team scores in a game is restricted to integer values, but not allowing theoretical fractional runs makes it very difficult to apply any sort of park adjustment to the team frequency of runs scored.

3. Related to #2 (really its root cause, although the park issue is important enough from the standpoint of using the results to evaluate teams that I wanted to single it out), when using the empirical data there is always a tradeoff that must be made between increasing the sample size and losing context. One could use multiple years of data to generate a smoother curve of marginal win probabilities, but in doing so one would lose centering at the season’s actual run scoring rate. On the other hand, one could split the data into AL and NL and more closely match context, but one would lose sample size and introduce quirks into the data.

I will not go into the full details of how gOW%, gDW%, and gEW% (which combines both into one measure of team quality) are calculated here, but full details were disclosed in this post. The “use” column here is the coefficient applied to each game to calculate gOW% while the “invuse” is the coefficient used for gDW%. For comparison, I have looked at OW%, DW%, and EW% (Pythagenpat record) for each team; none of these have been adjusted for park to maintain consistency with the g-family of measures which are not park-adjusted.
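For reference, the run-based comparison measures can be sketched as follows. This assumes the usual Pythagenpat form, in which the exponent is total runs per game raised to a small power; the value 0.29 is an assumption (values around 0.28 to 0.29 are common), and the function names are hypothetical. OW% is taken here as a team's scoring rate matched against a league-average opponent, and DW% as a league-average offense matched against the team's runs allowed.

```python
def pythagenpat(runs_per_game, runs_allowed_per_game, z=0.29):
    """Pythagenpat W%: R^x / (R^x + RA^x), with x = (R/G + RA/G) ** z."""
    x = (runs_per_game + runs_allowed_per_game) ** z
    scored = runs_per_game ** x
    allowed = runs_allowed_per_game ** x
    return scored / (scored + allowed)


def offensive_wpct(team_runs_per_game, league_runs_per_game, z=0.29):
    """OW%: the team's offense paired with a league-average defense."""
    return pythagenpat(team_runs_per_game, league_runs_per_game, z)


def defensive_wpct(team_runs_allowed_per_game, league_runs_per_game, z=0.29):
    """DW%: a league-average offense paired with the team's defense."""
    return pythagenpat(league_runs_per_game, team_runs_allowed_per_game, z)


# Illustrative rates only, not actual 2011 team figures.
ow = offensive_wpct(5.0, 4.3)
dw = defensive_wpct(4.0, 4.3)
```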

For most teams, gOW% and OW% are very similar. Teams whose gOW% is higher than OW% distributed their runs more efficiently (at least to the extent that the methodology captures reality); the reverse is true for teams with gOW% lower than OW%. The teams whose two metrics differed by at least two wins in either direction were (all of these are the g-type less the regular estimate):

Positive: BAL, PIT, ATL, FLA, HOU, SEA
Negative: BOS, NYA, TEX, COL

You'll note that the positive differences tended to belong to bad offenses; this is a natural result of the nature of the game, and is reflected in the marginal value of each run as discussed above. In the four years that I’ve been looking at these figures, I can’t recall a difference as large as the Red Sox’ deviation in 2011--a standard OW% of .610 and a gOW% of .572, a 6.2 win difference. Boston led the majors in OW%; their gOW% was still excellent and good enough for third in the majors, but they did not spread their runs across games in an efficient fashion. The Sox scored ten or more runs 25 times; Toronto was second with 19 and the major league average was 9. Boston scored 36% of their runs in that 15% subset of games; the major league average was 15%, and next on the list was Texas at 28%.
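The conversion from a W% gap to wins is just the gap times the length of the schedule. A quick check of the Boston figure, assuming a 162-game season and using the rounded percentages quoted above (the function name is hypothetical):

```python
SEASON_GAMES = 162  # assumed full-season schedule length


def wpct_gap_in_wins(wpct_a, wpct_b, games=SEASON_GAMES):
    """Express the difference between two winning percentages in wins."""
    return (wpct_a - wpct_b) * games


# Standard OW% of .610 versus gOW% of .572, as quoted in the text.
boston_gap = wpct_gap_in_wins(0.610, 0.572)
```

Rounded to one decimal, this reproduces the 6.2-win difference.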

Differences for gDW%:

Positive: DET, BAL
Negative: PHI, SD, TB

I combine gOW% and gDW% through some Pythagorean math to produce gEW%, which can then be compared to a team’s standard Pythagorean record (EW%). Of course, it could also be compared to actual W%, but I think the comparison to a method that also uses runs is more interesting than a comparison to the actual win totals:
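The exact combination is not spelled out here, so the following is one plausible sketch rather than the formula actually used. It relies on the fact that, with a fixed Pythagorean exponent, an offensive W% and a defensive W% each measured against a league-average opponent combine by multiplying their odds. The function name is hypothetical.

```python
def combine_wpct(offensive_wpct, defensive_wpct):
    """Combine an offensive and a defensive W% by multiplying their odds.

    Assumption: both inputs are measured against a league-average opponent
    and higher is better for each, so odds = w / (1 - w) multiply together.
    """
    good = offensive_wpct * defensive_wpct
    bad = (1 - offensive_wpct) * (1 - defensive_wpct)
    return good / (good + bad)


# Illustrative inputs only.
average_team = combine_wpct(0.5, 0.5)
good_bats_bad_arms = combine_wpct(0.6, 0.4)
good_bats_average_arms = combine_wpct(0.6, 0.5)
```

An offense and defense that are equally far from average in opposite directions cancel out to .500, and an average defense leaves the offensive W% unchanged.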

Positive: BAL, PIT, CHA, DET, MIN, HOU, OAK, FLA
Negative: BOS, PHI, COL, NYA, SD, TB, LA, KC

There are so many large differences that I’m a little worried that I may have made a spreadsheet error somewhere along the way, although I have double-checked and can’t find anything. Below is a table with all of the metrics discussed in this post for each team, sorted by gEW%: