Monday, January 19, 2015

Run Distribution & W%, 2014

A couple of caveats apply to everything that follows in this post. The first is that there are no park adjustments anywhere. There's obviously a difference between scoring 5 runs at Petco and scoring 5 runs at Coors, but if you're using discrete data there's not much that can be done about it unless you want to use a different distribution for every possible context. Similarly, it's necessary to acknowledge that games do not always consist of nine innings; again, it's tough to do anything about this while maintaining your sanity.

All of the conversions of runs to wins are based only on 2014 data. Ideally, I would use an appropriate distribution for runs per game based on average R/G, but I've taken the lazy way out and used the empirical data for 2014 only. (I have a methodology I could use to estimate win probabilities at each level of scoring that take context into account, but I have not yet finished the full write-up it needs on this blog, and I am not comfortable using it without explanation.)

The first breakout is record in blowouts versus non-blowouts. I define a blowout as a margin of five or more runs. This is not really a satisfactory definition of a blowout, as many five-run games are quite competitive--“blowout” is just a convenient label to use, and expresses the point succinctly. I use these two categories with wide ranges rather than more narrow groupings like one-run games because the frequency and results of one-run games are highly biased by the home field advantage. Drawing the focus back a little allows us to identify close games and not so close games with a margin built in to allow a greater chance of capturing the true nature of the game in question rather than a disguised situational effect.
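As a minimal sketch of the split described above, the following classifies a team's games by final margin and compares W% across the two groups. The game list is made up for illustration, and the function names (`split_record`, `win_pct`) are hypothetical; only the five-run threshold comes from the post.

```python
# Hypothetical final scores as (team runs, opponent runs); not real 2014 results.
games = [(7, 1), (3, 2), (4, 9), (5, 4), (2, 8), (6, 5)]

BLOWOUT_MARGIN = 5  # the post's threshold: a margin of five or more runs


def split_record(games, margin=BLOWOUT_MARGIN):
    """Return {'blowout': (W, L), 'non_blowout': (W, L)} for one team's games."""
    record = {"blowout": [0, 0], "non_blowout": [0, 0]}
    for scored, allowed in games:
        kind = "blowout" if abs(scored - allowed) >= margin else "non_blowout"
        record[kind][0 if scored > allowed else 1] += 1
    return {kind: tuple(wl) for kind, wl in record.items()}


def win_pct(wins, losses):
    """Winning percentage, or NaN when no games fall in the category."""
    return wins / (wins + losses) if wins + losses else float("nan")


rec = split_record(games)
# Blowout W% minus non-blowout W%, the differential discussed in the post.
blowout_diff = win_pct(*rec["blowout"]) - win_pct(*rec["non_blowout"])
```

Changing `margin` makes it easy to see how sensitive the split is to the admittedly arbitrary five-run cutoff.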

In 2014, 74.5% of major league games were non-blowouts while the complement, 25.5%, were blowouts. Team record in non-blowouts:



It must have been a banner year for MASN, as both the Nationals and the Orioles won a large number of competitive games, just the kind of fan-friendly programming any RSN would love to have. Arizona was second last in non-blowouts in addition to dead last in blowouts:



For each team, the difference between blowout and non-blowout W%, as well as the percentage of each type of game:



Typically the teams that exhibit positive blowout differentials are good teams in general, and this year that is mostly the case, but Colorado is a notable exception with the highest difference. Not surprisingly, they also played the highest percentage of blowout games in the majors as the run environment in which they play is a major factor. The Rockies’ blowout difference is also correlated to some degree with their home field advantage--more of their blowouts are at home, where all teams have a better record, but they have exhibited particularly large home field advantages. This year the home/road split was extreme as Colorado’s home record was similar to the overall record of a wildcard team (.556) and their road record that of a ’62 Mets or ’03 Tigers type disaster (.259).

I did not look at the home/road blowout differentials for all teams, but of the 52 blowouts Colorado participated in, 38 (73%) came at home and 14 on the road. The Rockies were 22-16 (.579) in home blowouts but just 4-10 (.286) in road blowouts.

A more interesting way to consider game-level results is to look at how teams perform when scoring or allowing a given number of runs. For the majors as a whole, here are the counts of games in which teams scored X runs:



The “marg” column shows the marginal W% for each additional run scored. In 2014, the fourth run was both the run with the greatest marginal impact on the chance of winning and the level of scoring for which a team was more likely to win than lose.
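The “marg” figures can be reproduced from game-level data along these lines. This is a sketch on a handful of invented records, not the 2014 totals, and the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (runs scored, won?) records; not the 2014 league totals.
results = [(0, False), (1, False), (1, False), (2, False), (2, True),
           (3, False), (3, True), (4, True), (4, True), (4, False),
           (5, True), (5, True)]


def win_pct_by_runs(results):
    """Map runs scored -> (games, W%) from (runs, won) records."""
    tally = defaultdict(lambda: [0, 0])  # runs -> [games, wins]
    for runs, won in results:
        tally[runs][0] += 1
        tally[runs][1] += int(won)
    return {r: (g, w / g) for r, (g, w) in sorted(tally.items())}


def marginal(table):
    """W% gained by scoring r runs rather than r - 1 (the "marg" column)."""
    return {r: table[r][1] - table[r - 1][1] for r in table if r - 1 in table}


table = win_pct_by_runs(results)
marg = marginal(table)
# The run with the largest marginal impact in this toy data set.
biggest_jump = max(marg, key=marg.get)
```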

I use these figures to calculate a measure I call game Offensive W% (or Defensive W% as the case may be), which was suggested by Bill James in an old Abstract. It is a crude way to use each team’s actual runs per game distribution to estimate what their W% should have been by using the overall empirical W% by runs scored for the majors in the particular season.
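In code, the idea amounts to averaging the league-wide W% at each of a team's game run totals. The sketch below assumes that reading; the league table is illustrative rather than the 2014 values, and the function names are hypothetical.

```python
# Hypothetical league W% by runs scored; illustrative numbers, not 2014 data.
league_wpct = {0: 0.00, 1: 0.08, 2: 0.22, 3: 0.35, 4: 0.52, 5: 0.64, 6: 0.75}


def game_offensive_wpct(runs_by_game, league_wpct):
    """Average the league W% associated with each game's run total."""
    top = max(league_wpct)  # anything above the table uses the top level
    coefs = [league_wpct[min(r, top)] for r in runs_by_game]
    return sum(coefs) / len(coefs)


def game_defensive_wpct(runs_allowed_by_game, league_wpct):
    """Same idea for the defense, using the complement of each coefficient."""
    top = max(league_wpct)
    coefs = [1 - league_wpct[min(r, top)] for r in runs_allowed_by_game]
    return sum(coefs) / len(coefs)


# Two hypothetical offenses that both average 4.0 R/G.
gow_steady = game_offensive_wpct([4, 4, 4, 4], league_wpct)
gow_feast = game_offensive_wpct([0, 0, 6, 10], league_wpct)
```

The two toy offenses score the same number of runs, but the steady one grades out better because runs beyond the sixth or so buy very little additional win probability.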

A theoretical distribution would be much preferable to the empirical distribution for this exercise, but as I mentioned earlier I haven’t yet gotten around to writing up the requisite methodological explanation, so I’ve defaulted to the 2014 empirical data. Some of the drawbacks of this approach are:

1. The empirical distribution is subject to sample size fluctuations. In 2014, teams that scored 9 runs won 94.2% of the time while teams that scored 10 runs won 92.5% of the time. Does that mean that scoring 9 runs is preferable to scoring 10 runs? Of course not--it's a quirk in the data. Additionally, the marginal values don’t necessarily make sense even when W% increases from one runs scored level to another. (In figuring the gEW% family of measures below, I lumped all games with between 10 and 14 runs scored/allowed into one bucket, which smooths any illogical jumps in the win function but leaves the inconsistent marginal values unaddressed and fails to differentiate between levels of scoring in that range. The values actually used are displayed in the “use” column, and the “invuse” column holds the complements of these figures--i.e. those used to credit wins to the defense. I've used 1.0 for 15+ runs, which is a horrible idea theoretically. In 2014, teams were 20-0 when scoring 15 or more runs.)

2. Using the empirical distribution forces one to use integer values for runs scored per game. Obviously the number of runs a team scores in a game is restricted to integer values, but not allowing theoretical fractional runs makes it very difficult to apply any sort of park adjustment to the team frequency of runs scored.

3. Related to #2 (really its root cause, although the park issue is important enough from the standpoint of using the results to evaluate teams that I wanted to single it out), when using the empirical data there is always a tradeoff that must be made between increasing the sample size and losing context. One could use multiple years of data to generate a smoother curve of marginal win probabilities, but in doing so one would lose centering at the season’s actual run scoring rate. On the other hand, one could split the data into AL and NL and more closely match context, but one would lose sample size and introduce more quirks into the data.
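The bucketing described in the first drawback (one pooled level for 10 to 14 runs, and a flat 1.0 for 15 or more) might be sketched as follows. The records are invented, and `bucket` and `build_use_table` are hypothetical names rather than anything from the original spreadsheet.

```python
def bucket(runs, pool=(10, 14), cap=15):
    """Collapse 10-14 runs into one level and 15+ into another."""
    if pool[0] <= runs <= pool[1]:
        return pool[0]
    return min(runs, cap)


def build_use_table(records):
    """records: (runs scored, won) pairs -> {level: (use, invuse)}."""
    games, wins = {}, {}
    for runs, won in records:
        level = bucket(runs)
        games[level] = games.get(level, 0) + 1
        wins[level] = wins.get(level, 0) + int(won)
    use = {level: wins[level] / games[level] for level in games}
    use[15] = 1.0  # the post's stated (and admittedly crude) choice for 15+
    # invuse is the complement, used to credit wins to the defense.
    return {level: (u, 1 - u) for level, u in sorted(use.items())}


# Hypothetical records, not the 2014 league data.
records = [(9, True), (9, True), (9, False), (10, True),
           (11, True), (12, False), (13, True), (16, True)]
table = build_use_table(records)
use_12, invuse_12 = table[bucket(12)]
```

Pooling trades resolution for stability: every game in the 10 to 14 range gets the same coefficient, so a quirk like the 9-run versus 10-run reversal is papered over rather than modeled.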

I keep promising that I will use my theoretical distribution (Enby, which you can read about here) to replace the empirical approach, but that would require me to finish writing my full explanation of the method and associated applications and I keep putting that off. I will use Enby for a couple graphs here but not beyond that.

First, a comparison of the actual distribution of runs per game in the majors to that predicted by the Enby distribution for the 2014 major league average of 4.066 runs per game (Enby distribution parameters are B = 1.059, r = 3.870, z = .0687):



Enby fares pretty well at estimating the actual frequencies, most notably overstating the probability of two or three runs and understating the probability of four runs.
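This post does not restate Enby's functional form, so the sketch below rests on an assumption: that Enby is a zero-modified negative binomial, with z the probability of a shutout and B and r the usual negative binomial parameters. Under that assumed form, the quoted parameters return a mean close to the quoted 4.066 R/G, which is some reassurance but not confirmation.

```python
import math


def enby_pmf(k, B, r, z):
    """P(k runs) under a zero-modified negative binomial (assumed form)."""
    def nb(j):
        # Negative binomial probability of j, computed in logs for stability.
        log_p = (math.lgamma(r + j) - math.lgamma(r) - math.lgamma(j + 1)
                 + j * math.log(B) - (r + j) * math.log(1 + B))
        return math.exp(log_p)

    if k == 0:
        return z  # the zero point is set directly
    # Rescale the positive outcomes so the whole distribution sums to one.
    return (1 - z) * nb(k) / (1 - nb(0))


# Parameters quoted in the post for the 2014 major league average.
B, r, z = 1.059, 3.870, 0.0687
probs = [enby_pmf(k, B, r, z) for k in range(60)]
mean_runs = sum(k * p for k, p in enumerate(probs))
```

Truncating at 60 runs is harmless here because the tail probabilities are vanishingly small by that point.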

I will not go into the full details of how gOW%, gDW%, and gEW% (which combines both into one measure of team quality) are calculated in this post, but full details were provided here (***). The “use” column here is the coefficient applied to each game to calculate gOW% while the “invuse” is the coefficient used for gDW%. For comparison, I have looked at OW%, DW%, and EW% (Pythagenpat record) for each team; none of these have been adjusted for park to maintain consistency with the g-family of measures which are not park-adjusted.
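For reference, a Pythagenpat estimate can be sketched as below. The exponent constant is an assumption: 0.29 is a commonly used value, but this post does not state which one is applied, and the inputs are a hypothetical team rather than a 2014 club.

```python
def pythagenpat(runs_per_game, runs_allowed_per_game, z=0.29):
    """Expected W% with exponent x = (R/G + RA/G) ** z; z = 0.29 is assumed."""
    x = (runs_per_game + runs_allowed_per_game) ** z
    rs = runs_per_game ** x
    ra = runs_allowed_per_game ** x
    return rs / (rs + ra)


def wins_per_162(wpct):
    """Convert a W% into wins over a 162-game schedule."""
    return 162 * wpct


# Hypothetical team scoring 4.5 and allowing 4.0 runs per game.
ew = pythagenpat(4.5, 4.0)
ew_wins = wins_per_162(ew)
```

Differences such as “gOW% less OW%” are then naturally expressed in wins by multiplying the W% gap by 162.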

For most teams, gOW% and OW% are very similar. Teams whose gOW% is higher than OW% distributed their runs more efficiently (at least to the extent that the methodology captures reality); the reverse is true for teams with gOW% lower than OW%. The teams that had differences of +/- 2 wins between the two metrics were (all of these are the g-type less the regular estimate):

Positive: STL, NYA
Negative: OAK, TEX, COL

The Rockies’ -3.6 win difference between gOW% and OW% was the largest absolute offensive or defensive difference in the majors, so looking at their runs scored distribution may help in visualizing how a team can vary from expectation. Colorado scored 4.660 R/G, which results in an Enby distribution with parameters B = 1.125, r = 4.168, z = .0493:



The purple line is Colorado’s actual distribution, the red line is the major league average, and the blue line is their Enby expectation. The Rockies were held to three runs or fewer more often than Enby would expect. Major league teams had a combined .231 W% when scoring three or fewer runs, and that doesn’t even account for the park effect which would make their expected W% even lower (of course, the park effect is also a potential contributing factor to Colorado’s inefficient run distribution itself). The spike at 10 runs stands out--the Rockies scored exactly ten runs in twelve games, twice as many as second-place Oakland. Colorado’s 20 games with 10+ runs also led the majors (the A’s again were second with seventeen such games, while the average team had just 8.3 double digit tallies).

Teams with differences of +/- 2 wins between gDW% and standard DW%:

Positive: TEX
Negative: NYN, OAK, MIA

Texas’ efficient distribution of runs allowed offset their inefficient distribution of runs scored, while Oakland was poor in both categories, which will be further illustrated by comparing EW% to gEW%:

Positive: STL, CHN, NYA, HOU
Negative: SEA, COL, MIA, OAK

The A’s EW% was 4.9 wins better than their gEW%, which in turn was 5.8 wins better than their actual W%.

Last year, EW% was actually a better predictor of actual W% than was gEW%. This is unusual since gEW% knows the distribution of runs scored and runs allowed, while EW% just knows the average runs scored and allowed. gEW% doesn’t know the joint distribution of runs scored and allowed, so oddities in how they are paired in individual games can nullify the advantage that should come from knowing the distribution of each. A simplified example of how this could happen is a team that over 162 games has an apparent tendency to “waste” outstanding offensive and defensive performances by pairing them (e.g. winning a game 12-0) or get clunkers out of the way at the same time (that same game, but from the perspective of the losing team).
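A toy illustration of the pairing problem: the sketch below holds a team's runs scored and runs allowed distributions fixed and tries every way of matching them into games. The four-game sample is invented, and the values are chosen so that no pairing produces a tie.

```python
from itertools import permutations


def actual_wpct(scored, allowed):
    """W% when scored[i] and allowed[i] occur in the same game."""
    wins = sum(s > a for s, a in zip(scored, allowed))
    return wins / len(scored)


scored = [8, 1, 6, 3]    # hypothetical runs scored in four games
allowed = [0, 5, 4, 2]   # hypothetical runs allowed in four games

# Every way of pairing the same two marginal distributions into games.
outcomes = sorted({actual_wpct(scored, p) for p in permutations(allowed)})
as_played = actual_wpct(scored, allowed)
```

The marginal distributions never change, yet the record ranges from 2-2 to 4-0 depending on the pairing. A measure that sees only the two distributions separately cannot tell these seasons apart.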

In 2014, gEW% outperformed EW% as is normally the case, with a 2.85 to 3.80 advantage in RMSE when predicting actual W%. Still, gEW% was a better predictor than EW% for only seventeen of the thirty teams, but it had only six errors of +/- two wins compared to sixteen for EW%.

Below are the various W% measures for each team, sorted by gEW%:

Wednesday, January 07, 2015

Crude Team Ratings, 2014

For the last several years I have published a set of team ratings that I call "Crude Team Ratings". The name was chosen to reflect the nature of the ratings--they have a number of limitations, of which I documented several when I introduced the methodology.

I explain how CTR is figured in the linked post, but in short:

1) Start with a win ratio figure for each team. It could be actual win ratio, or an estimated win ratio.

2) Figure the average win ratio of the team’s opponents.

3) Adjust for strength of schedule, resulting in a new set of ratings.

4) Begin the process again. Repeat until the ratings stabilize.
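The four steps above can be sketched as a simple fixed-point iteration. The details here are assumptions, since only the outline appears in this post: the adjustment multiplies the raw win ratio by the schedule-weighted average opponent rating, and ratings are rescaled so the average team is 100. The three-team league is invented.

```python
def crude_team_ratings(win_ratio, schedule, tol=1e-9, max_iter=1000):
    """win_ratio: team -> W/L. schedule: team -> {opponent: games played}."""
    teams = list(win_ratio)
    rating = {t: 100.0 for t in teams}  # start everyone at average
    for _ in range(max_iter):
        new = {}
        for t in teams:
            games = sum(schedule[t].values())
            # Step 2: schedule-weighted average rating of the opponents.
            sos = sum(rating[o] * n for o, n in schedule[t].items()) / games
            # Step 3 (assumed form): scale the raw win ratio by that strength.
            new[t] = win_ratio[t] * sos
        # Rescale so the average team stays at 100 (an assumed convention).
        scale = 100.0 * len(teams) / sum(new.values())
        new = {t: v * scale for t, v in new.items()}
        # Step 4: stop once the ratings no longer move.
        if max(abs(new[t] - rating[t]) for t in teams) < tol:
            return new
        rating = new
    return rating


# Hypothetical three-team league with a balanced schedule.
win_ratio = {"A": 1.5, "B": 1.0, "C": 0.667}
schedule = {"A": {"B": 10, "C": 10},
            "B": {"A": 10, "C": 10},
            "C": {"A": 10, "B": 10}}
ratings = crude_team_ratings(win_ratio, schedule)
```

With an unbalanced schedule, a team that piles up wins against weak opponents is pulled down and a team with a hard slate is pulled up, which is the effect the SOS column is meant to capture.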

First, CTR based on actual wins and losses. In the table, “aW%” is the winning percentage equivalent implied by the CTR and “SOS” is the measure of strength of schedule--the average CTR of a team’s opponents. The rank columns provide each team’s rank in CTR and SOS:



I lost a non-negligible number of Twitter followers by complaining about the playoff results this year. As you can see, the eventual world champs had just the fourteenth most impressive win-loss record when taking quality of opposition into account. The #7 Mariners, #10 Indians, #11 Yankees, and #12 Blue Jays all were at least two games better than the Giants over the course of the season (at least based on this crude method of adjusting win-loss records). Note that this is not an argument about “luck”, such as when a team plays better or worse than one would expect from their component statistics; this is about the actual win-loss record considering opponents’ records.

San Francisco played the second-worst schedule in the majors (90 SOS); of the teams that ranked ahead of them in CTR but failed to make the playoffs, Toronto had the strongest SOS (107, ranking seventh). Based on the Log5 interpretation of CTR described in the methodology post, this suggests that Toronto’s average opponent would play .543 baseball against San Francisco’s average opponent. The magnitude of this difference can be put into (potentially misleading) context by noting that the long-term home-field W% of major league teams is around .543. Thus the Giants could be seen as having played an entire 162 game schedule at home relative to the Blue Jays playing an even mix of home and road games. Another way to look at it is that Toronto’s average opponent was roughly equivalent to St. Louis or Pittsburgh while San Francisco’s average opponent was roughly equivalent to Milwaukee or Atlanta.
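Because the ratings are on a win-ratio scale, the Log5 expectation reduces to one rating divided by the sum of the two. The short sketch below applies that to the two SOS figures quoted above; the function name is hypothetical.

```python
def expected_wpct(ctr_a, ctr_b):
    """Log5-style expected W% for A against B from win-ratio-scale ratings."""
    return ctr_a / (ctr_a + ctr_b)


# SOS figures quoted in the post: Toronto 107, San Francisco 90.
tor_vs_sf_opponents = expected_wpct(107, 90)  # about .543
```

Two equally rated teams come out at exactly .500, as they should.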

On the other hand, the disparity between the best teams as judged by CTR and those that actually made the playoffs is solely a function of the AL/NL disparity--the five playoff teams in each league were the top five teams by CTR. The AL/NL disparity is alive and well, though, as seen by the average rating by league/division (actually calculated as the geometric average of the CTR of the respective clubs):



While this is not the AL’s largest advantage within the five seasons I’ve published these ratings, it is the first time that every AL division is ranked ahead of every NL division. Typically there has been a weak AL division or strong NL division that prevented this, but not in 2014. Match up the AL’s worst division and the NL’s best division (both the Central) and you can see why:



The two teams that battled to the end for the AL Central crown stood out, with the NL Central’s two combatants unable to distinguish themselves from Cleveland, who hung around the periphery of the AL Central race throughout September but was never able to make a charge. In all cases the Xth place team from the ALC ranks ahead of the Xth place team from the NLC. In fact, the same holds true for the other two geographic division pairings:



This would also hold for any AL/NL division comparison rather than just the arbitrary geographic comparisons, except for the NL East v. AL Central, where the NL-best Nationals rank ahead of the Tigers 129 to 123.

The AL’s overall CTR edge of 106-89 implies that the average AL team would have a .544 record against the average NL team, similar to the gap between SF and TOR opponents described above. This is very close to the AL’s actual interleague record (140-117, .545).

All the results discussed so far are based on actual wins and losses. I also use various estimated W%s to calculate CTRs, and will present those results with little comment. First, CTR based on gEW%, which considers independently each team’s distribution of runs scored and allowed per game:



Well, I will point out that by gCTR, the world champions are the epitome of average. Next is CTR based on EW% (Pythagenpat):



And based on PW% (Pythagenpat using Runs Created/Runs Created Allowed):



Last year I started including actual W-L CTR including the results of the playoffs. There are a number of reasons why one may want to exclude the playoffs (the different nature of the game in terms of roster construction and strategy, particularly as it relates to pitcher workloads; the uneven nature of the opportunity to play in postseason and pad a team’s rating; etc.), but in general the playoffs provide us with additional data regarding team quality, and it would be prudent to heed this information in evaluating teams. The chart presents each team’s CTR including the playoffs (pCTR), their rank in that category, their regular season-only CTR (rsCTR), and is sorted by pCTR - rsCTR:



Last year there was not a lot of movement between the two sets of ratings, since the top regular season teams also won their league’s pennants. It should be no surprise that both wildcard pennant winners in 2014 were able to significantly improve their standings in the ratings when the postseason is taken into account. Still, San Francisco ranks just ninth, trailing Seattle, who didn’t even make the playoffs, and Kansas City is a distant third behind the two teams they beat in the AL playoffs, Los Angeles and Baltimore.