Monday, January 30, 2017

Run Distribution and W%, 2016

Every year I state that by the time this post rolls around next year, I hope to have a fully functional Enby distribution to allow the metrics herein to be more flexible (e.g. not based solely on empirical data, able to handle park effects, etc.). And every year I fail to do so. “Wait ‘til next year”...the Indians taking over the longest World Series title drought in spectacular fashion has now given me an excuse to apply this to any baseball-related shortcoming on my part. This time, it really should be next year; what kept me from finishing up over the last twelve months was only partly distraction and largely perfectionism over a minor portion of the Enby methodology, which I have now convinced myself is folly.

Anyway, there are some elements of Enby in this post, as I’ve written enough about the model to feel comfortable using bits and pieces. But I’d like to overhaul the calculation of gOW% and gDW% that are used at the end based on Enby, and I’m not ready to do that just yet given the deficiency of the material I’ve published on Enby.

Self-indulgence, aggrandizement, and deprecation aside, I need to caveat that this post in no way accounts for park effects. But that won’t come into play as I first look at team record in blowouts and non-blowouts, with a blowout defined as 5+ runs. Obviously some five-run games are not truly blowouts, and some are; one could probably use WPA to make a better definition of blowout based on some sort of average win probability, or the win probability at a given moment or moments in the game. I should also note that Baseball-Reference uses this same definition of blowout. I am not sure when they started publishing it; they may well have pre-dated my usage of five runs as the delineator. However, I did not adopt that as my standard because of Baseball-Reference; I adopted it because it made the most sense to me, being unaware of any B-R standard.

73.0% of major league games in 2016 were non-blowouts (and thus 27.0% were blowouts). The leading records in non-blowouts:



Texas was by far the best in close-ish games; their extraordinary record in one-run games (which of course are a subset of non-blowouts) was well documented. The Blue Jays have made it to consecutive ALCS, but their non-blowout regular season record in 2015-16 is just 116-115. Also, if you audit this you may note that the total comes to 1771-1773, which is obviously wrong; I used Baseball Prospectus' data.

Records in blowouts:



It should be no surprise that the Cubs were the best in blowouts. Toronto was nearly as good last year, 37-12, for a two-year blowout record of 66-27 (.710).

The largest differences (blowout - non-blowout W%) and percentage of blowouts and non-blowouts for each team:



It is rare to see a playoff team with as large a negative differential as Texas had. Colorado played the highest percentage of blowouts and San Diego the lowest, which shouldn’t come as a surprise given that scoring environment has a large influence. Outside of Colorado, though, the Cubs and the Indians played the highest percentage of blowout games, with the latter not sporting as high of a W% but having the second-most blowout wins.

A more interesting way to consider game-level results is to look at how teams perform when scoring or allowing a given number of runs. For the majors as a whole, here are the counts of games in which teams scored X runs:



The “marg” column shows the marginal W% for each additional run scored. In 2016, the third run was the run with the greatest marginal impact on the chance of winning, while it took a fifth run to make a team more likely to win than lose. 2016 was the first time since 2008 that teams scoring four runs had a losing record, a product of the resurgence in run scoring levels.
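To make the mechanics of that column concrete, here is a minimal sketch of how one might tabulate W% and marginal W% by runs scored from team-game results (the input format here is hypothetical, not the actual Baseball Prospectus layout):

```python
from collections import defaultdict

def w_pct_by_runs_scored(team_games):
    """team_games: iterable of (runs_scored, won) pairs, one per team-game.
    Returns {runs: (games, W%, marginal W%)}; the marginal value is the
    change in W% versus the next-lowest run total, like the "marg" column."""
    wins, games = defaultdict(int), defaultdict(int)
    for runs, won in team_games:
        games[runs] += 1
        wins[runs] += 1 if won else 0

    table, prev_wpct = {}, 0.0
    for runs in sorted(games):
        wpct = wins[runs] / games[runs]
        table[runs] = (games[runs], round(wpct, 3), round(wpct - prev_wpct, 3))
        prev_wpct = wpct
    return table

# Tiny made-up sample, just to show the shape of the output:
sample = [(0, False), (3, True), (3, False), (5, True), (12, True)]
print(w_pct_by_runs_scored(sample))
```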

I use these figures to calculate a measure I call game Offensive W% (or Defensive W% as the case may be), which was suggested by Bill James in an old Abstract. It is a crude way to use each team’s actual runs per game distribution to estimate what their W% should have been by using the overall empirical W% by runs scored for the majors in the particular season.

The theoretical distribution from Enby discussed earlier would be much preferable to the empirical distribution for this exercise, but I’ve defaulted to the 2016 empirical data. Some of the drawbacks of this approach are:

1. The empirical distribution is subject to sample size fluctuations. In 2016, all 58 times that a team scored twelve runs in a game, they won; meanwhile, teams that scored thirteen runs were 46-1. Does that mean that scoring 12 runs is preferable to scoring 13 runs? Of course not--it's a quirk in the data. Additionally, the marginal values don’t necessarily make sense even when W% increases from one runs-scored level to another. (In figuring the gEW% family of measures below, I lumped games with 12+ runs together, which smooths out any illogical jumps in the win function, but leaves the inconsistent marginal values unaddressed and fails to make any differentiation between scoring in that range. The values actually used are displayed in the “use” column, and the “invuse” column is the complement of these figures--i.e. those used to credit wins to the defense.)

2. Using the empirical distribution forces one to use integer values for runs scored per game. Obviously the number of runs a team scores in a game is restricted to integer values, but not allowing theoretical fractional runs makes it very difficult to apply any sort of park adjustment to the team frequency of runs scored.

3. Related to #2 (really its root cause, although the park issue is important enough from the standpoint of using the results to evaluate teams that I wanted to single it out), when using the empirical data there is always a tradeoff that must be made between increasing the sample size and losing context. One could use multiple years of data to generate a smoother curve of marginal win probabilities, but in doing so one would lose centering at the season’s actual run scoring rate. On the other hand, one could split the data into AL and NL and more closely match context, but would lose sample size and introduce more quirks into the data.

I keep promising that I will use Enby to replace the empirical approach, but for now I will use Enby only for a couple of graphs and nothing more.

First, a comparison of the actual distribution of runs per game in the majors to that predicted by the Enby distribution for the 2016 major league average of 4.479 runs per game (Enby distribution parameters are B = 1.1052, r = 4.082, z = .0545):



This is pretty typical of the kind of fit you will see from Enby for a given season: a few points where there’s a noticeable difference (in this case the even tallies of two, four, and six runs on the high side, and one and seven on the low side), but generally acquitting itself as a decent model of the run distribution.
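For those who want to see how such frequencies can be generated, here is a minimal sketch under the simplifying assumption that Enby behaves as a zero-modified negative binomial--r and B as the negative binomial shape and scale parameters, with z taken as the probability of being shut out. The full Enby fitting procedure is more involved than this; treat it as an approximation:

```python
from math import lgamma, exp, log

def enby_pmf(k, r, B, z):
    """Probability of scoring exactly k runs, sketched as a zero-modified
    negative binomial: P(0) is set to z, and the negative binomial
    probabilities for k >= 1 are rescaled so the whole thing sums to 1."""
    def neg_bin(j):
        # negative binomial with shape r and scale B (mean = r * B)
        return exp(lgamma(r + j) - lgamma(r) - lgamma(j + 1)
                   + j * log(B / (1 + B)) + r * log(1 / (1 + B)))
    if k == 0:
        return z
    return neg_bin(k) * (1 - z) / (1 - neg_bin(0))

# 2016 MLB parameters quoted above (4.479 R/G):
print([round(enby_pmf(k, r=4.082, B=1.1052, z=0.0545), 4) for k in range(13)])
```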

I will not go into the full details of how gOW%, gDW%, and gEW% (which combines both into one measure of team quality) are calculated in this post, but full details were provided here and the paragraph below gives a quick explanation. The “use” column here is the coefficient applied to each game to calculate gOW% while the “invuse” is the coefficient used for gDW%. For comparison, I have looked at OW%, DW%, and EW% (Pythagenpat record) for each team; none of these have been adjusted for park to maintain consistency with the g-family of measures which are not park-adjusted.

A team’s gOW% is the sumproduct of their frequency of scoring x runs, where x runs from 0 to 22, and the empirical W% of teams in 2016 when they scored x runs. For example, Philadelphia was shut out 11 times; they would not be expected to win any of those games (nor did they, we can be certain). They scored one run 23 times; an average team in 2016 had a .089 W% when scoring one run, so they could have been expected to win 2.04 of the 23 games given average defense. They scored two runs 22 times; an average team had a .228 W% when scoring two, so they could have been expected to win 5.02 of those games given average defense. Sum up the estimated wins for each value of x and divide by the team’s total number of games and you have gOW%.
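In code, the calculation looks like this (a minimal sketch; the dictionaries below are fragments built from the Philadelphia figures in the text, not complete season data):

```python
def g_ow_pct(team_run_freq, league_w_pct_by_runs):
    """team_run_freq: {runs scored: number of games with that total}
    league_w_pct_by_runs: {runs scored: empirical league W% at that total}
    Returns gOW%: expected wins given this run distribution and a
    league-average defense, divided by games played."""
    games = sum(team_run_freq.values())
    expected_wins = sum(
        count * league_w_pct_by_runs.get(runs, 1.0)  # totals past the table treated as sure wins
        for runs, count in team_run_freq.items())
    return expected_wins / games

# Fragment of the Philadelphia example (first three run totals only):
phi = {0: 11, 1: 23, 2: 22}
lg = {0: 0.000, 1: 0.089, 2: 0.228}
print(round(g_ow_pct(phi, lg), 3))   # partial-season illustration only
```

gDW% is the mirror image: the same sumproduct applied to the runs-allowed distribution, using the complements (the “invuse” coefficients) of the league W% figures.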

It is thus an estimate of what W% a team with the given team’s empirical distribution of runs scored and a league average defense would have. It is analogous to James’ original construct of OW% except looking at the empirical distribution of runs scored rather than the average runs scored per game. (To avoid any confusion, James in 1986 also proposed constructing an OW% in the manner in which I calculate gOW%).

For most teams, gOW% and OW% are very similar. Teams whose gOW% is higher than OW% distributed their runs more efficiently (at least to the extent that the methodology captures reality); the reverse is true for teams with gOW% lower than OW%. The teams that had differences of +/- 2 wins between the two metrics were (all of these are the g-type less the regular estimate):

Positive: MIA, PHI, ATL, KC
Negative: LA, SEA

The Marlins offense had the largest difference (3.55 wins) of any unit between its g-type W% and its standard counterpart (gOW% versus OW%), so I like to include a run distribution chart to hopefully aid in understanding what this means. Miami scored 4.167 R/G, so their Enby parameters (r = 3.923, B = 1.0706, z = .0649) produce these estimated frequencies:



Miami scored 0-3 runs in 47.8% of their games, compared to an expected 47.9%. But by scoring 0-2 runs 3% less often than expected and scoring exactly three 3% more often, they picked up 1.3 more expected wins from those games than Enby expected. They added an additional 1.2 wins from 4-6 runs, and lost 1.1 from 7+ runs. (Note that the total doesn’t add up to the difference between their gOW% and OW%, nor should it--the comparisons I was making were between what the empirical 2016 major league W%s for each x runs scored predicted using their actual run distribution and their Enby run distribution. If I had my act together and were using Enby to estimate the expected W% at each x runs scored, then we would expect a comparison like the preceding to be fairly consistent with a comparison of gOW% to OW%.)

Teams with differences of +/- 2 wins between gDW% and standard DW%:

Positive: CIN, COL, ARI
Negative: NYN, MIL, MIA, TB, NYA

The Marlins were the only team to appear on both the offense and defense list, their defense giving back 2.75 wins when looking at their run distribution rather than run average.

Teams with differences of +/- 2 wins between gEW% and standard EW%:

Positive: PHI, TEX, CIN, KC
Negative: LA, SEA, NYN, MIL, NYA, BOS

The Royals finally showed up on these lists, but turning a .475 EW% into a .488 gEW% is not enough pixie dust to make the playoffs.

Below is a full chart with the various actual and estimated W%s:

Monday, January 23, 2017

Crude Team Ratings, 2016

For the last several years I have published a set of team ratings that I call "Crude Team Ratings". The name was chosen to reflect the nature of the ratings--they have a number of limitations, of which I documented several when I introduced the methodology.

I explain how CTR is figured in the linked post, but in short:

1) Start with a win ratio figure for each team. It could be actual win ratio, or an estimated win ratio.

2) Figure the average win ratio of the team’s opponents.

3) Adjust for strength of schedule, resulting in a new set of ratings.

4) Begin the process again. Repeat until the ratings stabilize.

The resulting rating, CTR, is an adjusted win/loss ratio rescaled so that the majors’ arithmetic average is 100. The ratings can be used to directly estimate W% against a given opponent (without home field advantage for either side); a team with a CTR of 120 should win 60% of games against a team with a CTR of 80 (120/(120 + 80)).
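To make those steps concrete, here is a minimal sketch of the iteration, under the simplifying assumption that the strength-of-schedule adjustment is a straight arithmetic average of opponents' current ratings (the exact averaging and convergence details in the full method may differ):

```python
def crude_team_ratings(win_ratio, schedule, iterations=100):
    """win_ratio: {team: W/L ratio, actual or estimated}
    schedule: {team: list of opponents faced, one entry per game}
    Returns ratings rescaled so the major league arithmetic average is 100."""
    ratings = dict(win_ratio)                                    # 1) seed with win ratio
    for _ in range(iterations):                                  # 4) repeat until stable
        new = {}
        for team, opponents in schedule.items():
            sos = sum(ratings[o] for o in opponents) / len(opponents)   # 2) opponents' average
            new[team] = win_ratio[team] * sos                           # 3) adjust for schedule
        scale = 100 * len(new) / sum(new.values())               # rescale the average to 100
        ratings = {t: r * scale for t, r in new.items()}
    return ratings

# Head-to-head estimate from the text: a 120 CTR team against an 80 CTR team
print(120 / (120 + 80))   # 0.60
```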

First, CTR based on actual wins and losses. In the table, “aW%” is the winning percentage equivalent implied by the CTR and “SOS” is the measure of strength of schedule--the average CTR of a team’s opponents. The rank columns provide each team’s rank in CTR and SOS:



Last year, the top ten teams in CTR were the playoff participants. That was not remotely the case this year, thanks to a resurgent gap in league strength. The top five teams in the AL made the playoffs, and the NL was very close, with St. Louis slipping just ahead of New York and San Francisco (by a margin of .7 wins if you compare aW%), but the Giants ranked only fifteenth in the majors in CTR. The Mariners, Tigers, Yankees, and Astros all rated stronger than the Dodgers, the NL’s actual #3 seed and #3 CTR finisher.

The Dodgers had the second-softest schedule in MLB, ahead of only the Cubs. (The natural tendency is for strong teams in weak divisions to have the lowest SOS, since they don’t play themselves. The flip side is also true--I was quite sure, even before checking, that Tampa Bay had the toughest schedule.) The Dodgers’ average opponent was about as good as the Pirates or the Marlins; the Mariners’ average opponent was rated stronger than the Cardinals.

At this point you probably want to see just how big of a gap there was between the AL and NL in average rating. Originally I gave the arithmetic average CTR for each division, but that’s mathematically wrong--you can’t average ratios like that. Then I switched to geometric averages, but really what I should have done all along is just give the arithmetic average aW% for each division/league. aW% converts CTR back to an “equivalent” W-L record, such that the average across the major leagues will be .500. I do this by taking CTR/(100 + CTR) for each team, then applying a small fudge factor to force the average to .500. In order to maintain some basis for comparison to prior years, I’ve provided the geometric average CTR alongside the arithmetic average aW%, as well as the equivalent CTR, found by solving for CTR in the equation:

aW% = CTR/(100 + CTR)*F, where F is the fudge factor (it was 1.0012 for 2016, lest you be concerned that a massive behind-the-scenes adjustment is taking place). Solving for CTR gives CTR = 100*aW%/(F - aW%).
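A small sketch of that conversion and its inverse (how the fudge factor is set is not spelled out above; here I simply scale so the league average lands exactly on .500):

```python
def a_w_pct(ctrs):
    """Convert CTRs to aW%, with fudge factor F forcing the average to .500."""
    raw = [c / (100 + c) for c in ctrs]
    F = 0.500 / (sum(raw) / len(raw))        # ~1.0012 for the 2016 ratings
    return [r * F for r in raw], F

def equivalent_ctr(aw, F):
    """Invert aW% = CTR/(100 + CTR) * F to recover the equivalent CTR."""
    return 100 * aw / (F - aw)

aw, F = a_w_pct([120, 100, 80])              # hypothetical three-team league
print([round(x, 3) for x in aw], round(F, 4))
print(round(equivalent_ctr(aw[0], F), 1))    # recovers ~120
```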



Every AL division was better than every NL division, a contrast from 2015, in which the two worst divisions were the NL East and West but the NL Central was the best division. Whether you use the geometric or backdoor-arithmetic average CTRs to calculate it, the average AL team’s expected W% versus an average NL team is .545. The easiest SOS in the AL belonged to the Indians, as is to be expected for the strongest team in the weakest division; it was still one point higher than that of the toughest NL schedule (the Reds, the weakest team in the strongest division).

I also figure CTRs based on various alternate W% estimates. The first is based on game-Expected W%, which you can read about here. It uses each team’s game-by-game distribution of runs scored and allowed, but treats the two as independent:



Next is Expected W%, that is to say Pythagenpat based on actual runs scored and allowed:
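As a refresher on the underlying estimate, a minimal sketch of Pythagenpat (the RPG^.29 exponent below is one common choice; the exact exponent can vary slightly):

```python
def pythagenpat_w_pct(runs_scored, runs_allowed, games, power=0.29):
    """Pythagorean W% with an exponent that floats with scoring context:
    x = (total runs per game) ** power."""
    x = ((runs_scored + runs_allowed) / games) ** power
    return runs_scored ** x / (runs_scored ** x + runs_allowed ** x)

# Hypothetical team: 750 runs scored, 700 allowed over 162 games
print(round(pythagenpat_w_pct(750, 700, 162), 3))
```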



Finally, CTR based on Predicted W% (Pythagenpat based on runs created and allowed, actually Base Runs):



A few seasons ago I started including a CTR version based on actual wins and losses, but including the postseason. I am not crazy about this set of ratings, but I can’t quite articulate why.

On the one hand, adding in the playoffs is a no-brainer. The extra games are additional datapoints regarding team quality. If we have confidence in the rating system (and I won’t hold it against you if you don’t), then the unbalanced nature of the schedule for these additional games shouldn’t be too much of a concern. Yes, you’re playing stronger opponents, but the system understands that and will reward you (or at least not penalize you) for it.

On the other hand, there is a natural tendency among people who analyze baseball statistics to throw out the postseason, due to concerns about unequal opportunity (since most of the league doesn’t participate) and due to historical precedent. Unequal opportunity is a legitimate concern when evaluating individuals--particularly for counting or pseudo-counting metrics like those that use a replacement level baseline--but much less of a concern with teams. Even though the playoff participants may not be the ten most deserving teams by a strict, metric-based definition of “deserving”, there’s no question that teams are largely responsible for their own postseason fate to a much, much greater extent than any individual player is. And the argument from tradition is fine if the issue at hand is the record for team wins or individual home runs or the like, but not particularly applicable when we are simply using the games that have been played as datapoints by which to gauge team quality.

Additionally, the fact that playoff series are not played to their conclusion could be seen as introducing bias. If the Red Sox get swept by the Indians, they not only get three losses added to their ledger, they lose the opportunity to offset that damage. The number of games that are added to a team’s record, even within a playoff round, is directly related to their performance in the very small sample of games.

Suppose that after every month of the regular season, the bottom four teams in the league-wide standings were dropped from the schedule. So after April, the 7-17 Twins’ record is frozen in place. Do you think this would improve our estimates of team strength? And I don’t just mean because of the smaller sample--obviously their record as used in the ratings could be regressed more heavily than that of teams that played more games. But it would freeze our on-field observations of the Twins, and the overall effect would be to make the dropped teams look worse than their “true” strength.

I doubt that poorly reasoned argument swayed even one person, so the ratings including playoff performance are:



The teams sorted by difference between playoff CTR (pCTR) and regular season CTR (rsCTR):



It’s not uncommon for the pennant winners to be the big gainers, but the Cubs and Indians made a lot of hay this year, as the Cubs managed to pull every other team in the NL Central up one point in the ratings. The Rangers did the reverse with the AL West by getting swept out of the proceedings. They still had a better ranking than the team that knocked them out, as did Washington.

Tuesday, January 10, 2017

Hitting by Position, 2016

Of all the annual repeat posts I write, this is the one which most interests me--I have always been fascinated by patterns of offensive production by fielding position, particularly trends over baseball history and cases in which teams have unusual distributions of offense by position. I also contend that offensive positional adjustments, when carefully crafted and appropriately applied, remain a viable and somewhat more objective competitor to the defensive positional adjustments often in use, although this post does not really address those broad philosophical questions.

The first obvious thing to look at is the positional totals for 2016, with the data coming from Baseball-Reference.com. “MLB” is the overall total for MLB, which is not the same as the sum of all the positions here, as pinch-hitters and runners are not included in those. “POS” is the MLB totals minus the pitcher totals, yielding the composite performance by non-pitchers. “PADJ” is the position adjustment, which is the position RG divided by the overall major league average (this is a departure from past posts; I’ll discuss this a little at the end). “LPADJ” is the long-term positional adjustment that I use, based on 2002-2011 data. The rows “79” and “3D” are the combined corner outfield and 1B/DH totals, respectively:



Obviously when looking at a single season of data it’s imperative not to draw any sweeping conclusions. That doesn’t make it any less jarring to see that second basemen outhit every position save the corner infield spots, or that left fielders created runs at the league average rate. The utter collapse of corner outfield offense left them, even pooled, ahead only of catcher and shortstop. Pitchers also added another point of relative RG, marking two years in a row of improvement (such as it is) over their first negative run output in 2014.

It takes historical background to fully appreciate how unusual the second base and corner outfield performances were. 109 for second base is the position’s best showing since 1924, when it was 110 thanks largely to Rogers Hornsby, Eddie Collins and Frankie Frisch. Second base had not hit for the league average since 1949. (I should note that the historical figures I’m citing are not directly comparable--they are based on each player’s primary position and include all of their PA, regardless of whether they were actually playing the position at the time or not, unlike the Baseball-Reference positional figures used for 2016.) Corner outfield was even more extreme at 103, the nadir for the 116 seasons starting with 1901 (the previous low was 107 in 1992).

If the historical perspective is of interest, you may want to check out Corrine Landrey’s article in The Hardball Times Baseball Annual. She includes some charts showing OPS+ by position in the DH era and theorizes that an influx of star young players, still playing on the right side of the defensive spectrum, has led to the positional shakeup. While I cautioned above about over-generalizing from one year of data, it has been apparent over the last several years that the spread between positions has declined. Landrey’s explanation is as viable as any I’ve seen for these seasons’ results.

Moving on to more granular levels of performance, I always start by looking at the NL pitching staffs and their RAA. I need to stress that the runs created method I’m using here does not take sacrifices into account, which usually is not a big deal but can be significant for pitchers. Note that all team figures from this point forward in the post are park-adjusted. The RAA figures for each position are baselined against the overall major league average RG for the position, except for left field and right field, which are pooled.
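A minimal sketch of how a positional RAA figure can be computed from that description; the scaling from a per-game rate gap to runs (outs divided by 25.5 as a games proxy) is an assumption of this sketch rather than a documented detail:

```python
def positional_raa(team_pos_rg, mlb_pos_rg, outs, outs_per_game=25.5):
    """RAA for a team's position: the gap between the team's park-adjusted RG
    at the position and the MLB average RG for that position (LF/RF pooled),
    scaled to runs.  The outs-to-games conversion here is an assumption."""
    games = outs / outs_per_game
    return (team_pos_rg - mlb_pos_rg) * games

# Hypothetical position: 5.2 RG against a 4.6 positional average over 470 outs
print(round(positional_raa(5.2, 4.6, 470), 1))   # about +11 runs
```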



This is the second consecutive year that the Giants led the league in RAA, and of course they employ the active pitcher most known for his batting. But as usual the spread from top to bottom is in the neighborhood of twenty runs.

I don’t run a full chart of the leading positions since you will very easily be able to go down the list and identify the individual primarily responsible for the team’s performance and you won’t be shocked by any of them, but the teams with the highest RAA at each spot were:

C--WAS, 1B--CIN, 2B--WAS, 3B--TOR, SS--LA, LF--PIT, CF--LAA, RF--BOS, DH--BOS

More interesting are the worst performing positions; the player listed is the one who started the most games at that position for the team:



I have as little use for batting average as anyone, but I still find the Angels’ .209 left field average to be the single most entertaining number on that chart (remember, that’s park-adjusted; it was .204 raw). The least entertaining thing, for me at least, was the Indians’ production at catcher, which was tolerable when Roberto Perez was drawing walks but intolerable when Terry Francona was pinch-running for him in Game 7.

I like to attempt to measure each team’s offensive profile by position relative to a typical profile. I’ve found it frustrating as a fan when my team’s offensive production has come disproportionately from “defensive” positions rather than offensive positions (“Why can’t we just find a corner outfielder who can hit?”). The best way I’ve yet come up with to measure this is to look at the correlation between RG at each position and the long-term positional adjustment. A positive correlation indicates a “traditional” distribution of offense by position--more production from the positions on the right side of the defensive spectrum. (To calculate this, I use the long-term positional adjustments that pool 1B/DH as well as LF/RF, and because of the DH I split it out by league):
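Here is a minimal sketch of that correlation; the LPADJ values below are illustrative placeholders rather than the actual long-term adjustments, and the team RG line is hypothetical:

```python
from statistics import correlation   # available in Python 3.10+

# Positions with 1B/DH and LF/RF pooled, as described above
positions = ["C", "1B/DH", "2B", "3B", "SS", "LF/RF", "CF"]
lpadj     = [0.89, 1.14, 0.96, 1.03, 0.89, 1.06, 1.02]   # illustrative placeholders
team_rg   = [3.8,  5.6,  4.2,  4.9,  4.0,  5.1,  4.6]    # hypothetical team

# Positive correlation = a "traditional" profile (production concentrated at
# the bat-first positions); negative = the reverse.
print(round(correlation(lpadj, team_rg), 2))
```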



As you can see, there are good offenses with high correlations, good offenses with low correlations, and every other combination. I have often used this space to bemoan the Indians’ continual struggle to get adequate production from first base, contributing to their usual finish in the bottom third or so of correlation. This year, they rank in the middle of the pack, and while it is likely a coincidence that they had a good season, it’s worth noting that Mike Napoli was only average for a first baseman. Even that is much better than some of their previous showings.

Houston’s two best hitting positions (not relative to positional averages, but in terms of RG) were second base and shortstop. In fact, the Astros’ positions in descending order of RG were 4, 6, 9, 2, 5, 3, D, 7, 8. That’s how you get a fairly strong negative correlation between RG and PADJ.

The following charts, broken out by division, display RAA for each position, with teams sorted by the sum of positional RAA. Positions with negative RAA are in red, and positions that are +/-20 RAA are bolded:



Boston had the AL’s most productive outfield, while Toronto was just an average offense after bashing their way to a league leading 118 total RAA in 2015. It remains jarring to see New York at the bottom of an offense list, even just for a division, and their corner infielders were the worst in the majors.



Other than catcher, Cleveland was solid everywhere, with no bold positions--and in this division, that’s enough to lead in RAA and power a cruise to the division title. Detroit had the AL’s top corner infield RAA (no thanks to third base). Kansas City--where to begin with the sweet, sweet schadenfreude? Esky Magic? No, already covered at length in the leadoff hitters post. Maybe the fact that they had the worst middle infield production in MLB? Or that the bros at the corners chipped in another -19 RAA to also give them the worst infield? The fact that they were dead last in the majors in total RAA? It’s just too much.



The pathetic production of the Los Angeles left fielders was discussed above. The Mike Trout-led center fielders were brilliant, the best single position in the majors. And so, even with a whopping -31 runs from left field, the Angels had the third-most productive outfield in MLB. Houston’s middle infielders, also mentioned above, were the best in the majors. Oakland’s outfield RAA was last in the AL.



Washington overcame the NL’s least productive corner infielders, largely because they had the NL’s most productive middle infielders. Miami had a similar but even more extreme juxtaposition: the NL’s worst infield and the majors’ best outfield, and that despite a subpar season from Giancarlo Stanton, with right field the least productive of their three outfield spots. Atlanta had the NL’s worst-hitting middle infield, and Philadelphia the majors’ worst outfield despite Odubel Herrera making a fool of me.



Chicago was tops in the majors in corner infield RAA and total infield RAA. No other team in this division achieved any superlatives, but thanks to Joey Votto and a half-season of Jonathan Lucroy, every team was in the black for total RAA, even if we were to add in Cincinnati’s NL-trailing -9 RAA from pitchers.



No position-grouping superlatives in this division, but it feels like more should be said about Corey Seager. It seems like a rookie shortstop who hit as he did and fielded adequately enough to be a serious MVP candidate, for a playoff team, in a huge market, for one of the five or so most venerated franchises, should have gotten a lot more attention than he did. Is it the notion that a move to third base is inevitable? Is he, like the superstar down the road, just considered too boring of a personality?

The full spreadsheet is available here.