Tuesday, March 14, 2017

Win Value of Pitcher Adjusted Run Averages

The most common class of metrics used in sabermetrics for cross-era comparisons use relative measures of actual or estimated runs per out or sother similar denominator. These include ERA+ for pitchers and OPS+ or wRC+ for batters (OPS+ being an estimate of relative runs per out, wRC+ using plate appearances in the denominator but accounting for the impact of avoiding outs). While these metrics provide an estimate of runs relative to the league average, they implicitly assume that the resulting relative scoring level is equally valuable across all run environments.

This is in fact not the case, as it is well-established that the relationship between run ratio and winning percentage depends on the overall level of run scoring. A team with a run ratio of 1.25 will have a different expected winning percentage if they play in a 9 RPG environment than if they play in a 10 RPG environment. Metrics like ERA+ and OPS+ do not translate relative runs into relative wins, but presumably the users of such metrics are ultimately interested in what they tell us about player contribution to wins.

There are two key points that should be acknowledged upfront. One is that the difference in win value based on scoring level is usually quite small. If it wasn’t, winning percentage estimators that don’t take scoring level into account would not be able to accurately estimate W% across the spectrum of major league teams. While methods that do consider scoring level are more accurate estimators of W% than similar methods that don’t, a method like fixed exponent Pythagorean can still produce useful estimates despite maintaining a fixed relationship between runs and wins.

The second is that players are not teams. The natural temptation (and one I will knowingly succumb to in what follows) is to simply plug the player’s run ratio into the formula and convert to a W%. This approach ignores the fact that an individual player’s run rate does not lead directly to wins, as the performance of his teammates must be included as well. Pitchers are close, because while they are in the game they are the team (more accurately, their runs allowed figures reflect the totality of the defense, which includes contributions from the fielders), but even ignoring fielding, non-complete games include innings pitched by teammates as well.

For the moment I will set that aside and instead pretend (in the tradition of Bill James’ Offensive Winning %) that a player or pitcher’s run ratio can or should be converted directly to wins, without weighting the rest of the team. This makes the figures that follow something of a freak show stat, but the approach could be applied directly to team run ratios as well. Individuals are generally more interesting and obviously more extreme, which means that the impact of considering run environment will be overstated.

I will focus on pitchers for this example and will use Bob Gibson’s 1968 season as an example. Gibson allowed 49 runs in 304.2 innings, which works out to a run average of 1.45 (there will be some rounding discrepancies in the figures). In 1968 the NL average RA was 3.42, so Gibson’s adjusted RA (aRA for the sake of this post) is RA/LgRA = .423 (ideally you would park-adjust as well, but I am ignoring park factors for this post). As an aside, please resist the temptation to instead cite his RA+ of 236 instead. Please.

.423 is a run ratio; Gibson allowed runs at 42.3% of the league average. Since wins are the ultimate unit of measurement, it is tempting to convert this run ratio to a win ratio. We could simply square it, which reflects a Pythagorean relationship. Ideally, though, we should consider the run environment. The 1968 NL was an extremely low scoring league. Pythagenpat suggests that the ideal exponent is around 1.746. Let’s define the Pythagenpat exponent to use as:

x = (2*LgRA)^.29

Note that this simply uses the league scoring level to convert to wins; it does not take into account Gibson’s own performance. That would be an additional enhancement, but it would also strongly increase the distortion that comes from viewing a player as his own team, albeit less so for pitchers and especially those who basically were pitching nine innings/start as in the case of Gibson.

So we could calculate a loss ratio as aRA^x, or .223 for Gibson. This means that a team with Gibson’s aRA in this environment would be expected to have .223 losses for every win (basic ratio transformations apply; the reciprocal would be the win ratio, the loss ratio divided by (1 + itself) would be a losing %, the complement of that W%, etc.)

At this point, many people would like to convert it to a W% and stop there, but I’d like to preserve the scale of a run average while reflecting the win impact. In order to do so, I need to select a Pythagorean exponent corresponding to a reference run environment to convert Gibson’s loss ratio back to an equivalent aRA for that run environment. For 1901-2015, the major league average RA was 4.427, which I’ll use as the reference environment, which corresponds to a 1.882 Pythagenpat exponent (there are actually 8.94 IP/G over this span, so the actual RPG is 8.937 which would be a 1.887 exponent--I'll stick with RA rather than RPG for this example since we are already using it to calculate aRA).

If we call that 1.882 exponent r, then the loss ratio can be converted back to an equivalent aRA by raising it to the (1/r) power. Of course, the loss ratio is just an interim step, and this is equivalent to:

aRA^(x*(1/r)) = aRA^(x/r) = waRA

waRA (excuse the acronyms, which I don’t intend to survive beyond this post) is win-Adjusted Run Average. For Gibson, it works out to .450, which illustrates how small the impact is. Pitching in one of the most extreme run environments in history, Gibsons aRA is only 6.4% higher after adjusting for win impact.

In 1994, Greg Maddux allowed 44 runs in 202 innings for a run average of 1.96. Pitching in a league with a RA of 4.65, his aRA was .421, basically equal to Gibson. But his waRA was better, at .416, since the same run ratio leads to more wins in a higher scoring environment.

It is my guess that consumers of sabermetrics will generally find this result unsatisfactory. There seems to be a commonly-held belief that it is easier to achieve a high ERA+ in a higher run scoring environment, but the result of this approach is the opposite--as RPG increases, the win impact of the same aRA increases as well. Of course, this approach says nothing about how “easy” it is to achieve a given aRA--it converts aRA to an win-value equivalent aRA in a reference run environment. It is possible that it could be simultaneously “easier” to achieve a low aRA in a higher scoring environment and that the value of a low aRA be enhanced in a higher scoring environment. I am making no claim regarding the impressiveness or aesthetic value, etc. of any pitcher’s performance, only attempting to frame it in terms of win value.

Of course, the comparison between Gibson and Maddux need not stop there. I do believe that waRA shows us that Maddux’ rate of allowing runs was more valuable in context than Gibson’s, but there is more to value than the rate of allowing runs. Of course we could calculate a baselined metric like WAR to value the two seasons, but even if we limit ourselves to looking at rates, there is an additional consideration that can be added.

So far, I’ve simply used the league average to represent the run environment, but a pitcher has a large impact on the run environment through his own performance. If we want to take this into account, it would be inappropriate to simply use LgRA + pitcher’s RA as the new RPG to plug into Pythagenpat; we definitely need to consider the extent to which the pitcher’s teammates influence the run environment, since ultimately Gibson’s performance was converted into wins in the context of games played by the Cardinals, not a hypothetical all-Gibson team. So I will calculate a new RPG instead by assuming that the 18 innings in a game (to be more precise for a given context, two times the league average IP/G) is filled in by the pitcher’s RA for his IP/G, and the league’s RA for the remainder.

In the 1968 NL, the average IP/G was 9.03 and Gibson’s 304.2 IP were over 34 appearances (8.96 IP/G), so the new RPG is 8.96*1.45/9 + (2*9.03 - 8.96)* 3.42/9 = 4.90 (rather than 6.84 previously). This converts to a Pythagenpat exponent of 1.59, and an pwaRA (personal win-Adjusted Run Average?) of .485. To spell that all out in a formula:

px = ((IP/G)*RA/9 + (2*Lg(IP/G) - IP/G)*LgRA/9) ^ .29
pwaRA = aRA^(px/r)

Note that adjusting for the pitcher’s impact on the scoring context reduces the win impact of effective pitchers, because as discussed earlier, lowering the RPG lowers the Pythagenpat exponent and makes the same run ratio convert to fewer wins. In fact, considering the pitcher’s effect on the run environment in which he operates actually brings most starting pitchers’ pwaRA closer to league average than their aRA is.

pwaRA is divorced from any real sort of baseball meaning, though, because pitchers aren’t by themselves a team. Suppose we calculated pwaRA for two teammates in a 4.5 RA league. The starter pitches 6 innings and allows 2 runs; the reliever pitches 3 innings and allows 1. Both pitchers have a RA of 3.00, and thus identical aRA (.667) or waRA (.665). Furthermore, their team also has a RA of 3.00 for this game, and whether figured as a whole or as the weighted average of the two individuals, the team also has the same aRA and waRA.

However, if we calculate the starter’s pwaRA, we get .675, while the reliever is at .667. Meanwhile, the team has a pwaRA of .679, which makes this all seem quite counterintuitive. But since all three entities have the same RA, the lower the run environment, the less win value it has on a per inning basis.

I hope this post serves as a demonstration of the difficulty of divorcing a pitcher’s value from the number of innings he pitched. Of course, the effects discussed here are very small, much smaller than the impact of other related differences, like the inherent statistical advantage of pitchers over shorter stints, attempts to model differences in replacement level between starters and relievers, and attempts to detect/value any beneficial side effects of starters working deep into games.

One of my long-standing interests has been the proper rate stat to use to express a batter’s run contribution (I have been promising myself for almost as long as this blog has been existence that I will write a series of posts explaining the various options for such a metric and the rationale for each, yet have failed to do so). I’ve never had the same pull to the question for pitchers, in part because the building block seems obvious: runs/out (which depending on how one defines terms can manifest itself as RA, ERA, component ERA, FIP-type metrics, etc.)

But while there are a few adjustments that can theoretically made between a hitter’s overall performance expressed as a rate and a final value metric (like WAR), the adjustments (such as the hitter’s impact on his team’s run scoring beyond what the metric captures itself, and the secondary effect that follows on the run/win conversion) are quite minor in scale compared to similar adjustments for pitchers. While the pitcher (along with his fielders) can be thought as embodying the entire team while he is the game, that also means that said unit’s impact on the run/win conversion is significant. And while there are certainly cases of batters whose rates may be deceiving because of how they are deployed by their managers (particularly platooning), the additional playing time over which a rate is spread increases value in a WAR-like metric without any special adjustment. Pitchers’ roles and secondary effects thereof (like any potential value generated by “eating” innings) have a more significant (and more difficult to model) impact on value than the comparable effects for position players.

Monday, February 13, 2017

Rebuilding a Strip Mall

"Rebuilding", as commonly thrown around in sports discussions, is an interesting term. It inherently implies that something had been built on the same spot previously. It does not, however, give an indication whether what was built there was a blanket fort or the Taj Mahal, a strip mall or the Sears Tower. If one rebuilds on the site of a strip mall, does "re-" imply they are building another strip mall, or might they be building something else?

The baseball program that Greg Beals has presided over for six seasons at The Ohio State University has been much more of a strip mall than a Sears Tower. After his most successful season, which saw OSU tie for third in the Big Ten regular season, win the Big Ten Tournament, and qualify for their first NCAA regional since 2009, Beals is now faced with a rebuilding project in the classic sports sense. Of the nine players with the most PA in 2016, OSU must replace seven, so it would be fair to say that there will be seven new regulars. OSU must also replace two of its three weekend starters; the bullpen is the only area of the roster not decimated by graduation and the draft.

Note: The discussion of potential player roles that follows is my own opinion, informed by my own knowledge of the players and close watching of the program and information released by the SID, particularly the season preview posted here.

Sophomore Jacob Barnwell will almost certainly be the primary catcher; he played sparingly last season (just 29 PA). This is one of the few open positions not due to loss, but rather to a position switch which will be discussed in a moment. Classmate Andrew Fishel (8 PA) will serve as his backup.

First base/DH will be shared by senior Zach Ratcliff, who has flashed power at times during his career but has never earned consistent playing time, and Boo Coolen, a junior Hawaii native who played at Cypress CC in California. Junior Noah McGowan, a transfer from McLennan CC in Texas, would appear to have the inside track at the keystone; his JUCO numbers are impressive but come with obvious caveats. Sophomore Brady Cherry, who got off to a torrid start in 2016 but then cooled precipitously (final line .218/.307/.411 in 143 PA) is likely to play third and bat in the middle of the order. At shortstop, senior captain Jalen Washington moves out from behind the plate to captain the infield; he spent his first two years as a Buckeye as a utility infielder, so it was the move to catcher, not to shortstop that really stands out. Unfortunately, Washington didn’t offer much with the bat as a junior (.249/.331/.343 in 261 PA). Other infield contenders include true freshman shortstop Noah West, redshirt freshman middle infielder Casey Demko, true freshman Conor Pohl at the corners, and redshirt sophomore Nate Romans and redshirt freshman Matt Carpenter in utility roles.

The one thing that appears clear in the outfielder is that junior Tre’ Gantt will take over as center fielder; he struggled offensively last season (.255/.311/.314 in 158 PA). True freshman Dominic Canzone may step in right away in right field, while left field/DH might be split between a pair of transfers. Tyler Cowles, a junior Columbus native who hit well at Sinclair CC in Georgia will attempt to join Coolen and satisfy the Beals’ desperate need for bats with experience and power. Other outfielders include senior former pitcher Shea Murray and little-used redshirt sophomore Ridge Winand.

The pitching staff is slightly more intact, but not much so. Redshirt junior captain Adam Niemeyer will likely be the #1 starter as the only returning weekend starter; his 2016 campaign can be fairly described as average. Sophomore Ryan Feltner was the #4 starter last year and so is a safe bet to pitch on the weekend; his 5.67 eRA was not encouraging but 8 K/3.9 W suggest some raw, harness-able ability. The third spot will apparently go to an erstwhile reliever. Junior Yianni Pavlopoulos was a surprising choice as closer last year, but pitched very well (10.3 K/3.3 W, 3.72 eRA), while senior Jake Post returns from a season wiped out by injury. Neither pitcher has been the picture of health throughout their careers, but Pavlopoulos seems the more likely choice to start. Junior Austin Woodby (7.75 eRA in 39 innings) and sophomore lefty Connor Curlis (six relief innings) will jockey for weekday assignments along with junior JUCO transfer Reece Calvert (a teammate of McGowan) and three true freshmen: lefty Michael McDonough and righties Collin Lollar and Gavin Lyon.

The bullpen will be well-stocked, even assuming Pavlopoulos takes a spot in the rotation. Junior sidearmer Seth Kinker was a workhorse (team-high 38 appearances) and behind departed ace Tanner Tully was arguably Ohio’s most valuable pitcher in 2016. Senior Jake Post will return from a season lost to injury looking to return to a setup role, and junior sidearmer Kyle Michalik pitched well in middle relief last season. These four form a formidable bullpen that will almost certainly be augmented by a lefty specialist, a favorite of Beals. He’ll choose from senior Joe Stoll (twelve unsuccessful appearances), true freshman Andrew Magno, and the favorite in my book is Curlis should be not best Woodby for a starting spot. It appears that sophomore JUCO transfer Thomas Waning (also a sidearmer; one of the few positives about Beals as a coach is his affinity for sidearmers). Other right-handed options for the pen will include junior Dustin Jourdan (a third JUCO transfer from McLennan), sophomore Kent Axcell (making the jump from the club team), and true freshman Jake Vance.

The non-conference schedule is again rather unambitious. The season opens the weekend of February 17 in central Florida with neutral site games against Kansas State (two), Delaware, and Pitt. Two games each against Utah and Oregon State in Arizona will follow as part of the Big Ten/Pac 12 challenge. The Bucks will then play true road series in successive weekends against Campbell and Florida Gulf Coast, then play midweek neutral site games in Port Charlotte, FL against Lehigh and Bucknell. The home schedule opens March 17 with a weekend series against Xavier (the Sunday finale being played in at XU), and the next two weekends see the Buckeyes open Big Ten play by hosting Minnesota and Purdue.

Subsequent weekend series are at Penn State, at Michigan State, home against UNC-Greensboro, home against Nebraska, at the forces of evil, at Iowa, and home against Indiana. Midweek opponents are Youngstown State, OU, Kent State, Cincinnati, Eastern Michigan, Northern Kentucky, Texas Tech (two), Bowling Green, Ball State, and Toledo, all at home, giving OSU 28 scheduled home dates.

Should OSU finish in the top eight in the Big Ten, the Big Ten Tournament is shifting from the recent minor league/MLB/CWS venues (including Huntington Park in Columbus, Target Field, and TD Ameritrade Park in Omaha) to campus sites, although scheduled in advance instead of at the home park of the regular season champ as was the case for many years in the past. This year’s tournament will be in Bloomington, and it speaks to both the volume of players lost and Beals’ uninspiring record that participation in this event should not be taken for granted.

Thursday, February 09, 2017

Simple Extra Inning Game Length Probabilities

With the recent news that MLB will be testing starting an extra inning with a runner on second in the low minors, it might be worthwhile to crunch some numbers and estimate the impact on the average length of extra innings game under various base/out situations to start innings. I used empirical data on the probability of scoring X runs in an inning given the base/out situation based on a nifty calculator created by Greg Stoll. Stoll’s description says it is based on MLB games from 1957-2015, including postseason.

Obviously using empirical data doesn’t allow you to vary the run environment…the expected runs for the rest of the inning with no outs, bases empty is .466 so the average R/G here is around 4.2. It also doesn’t account for any behavioral changes due to game situation, as strategy can obviously differ when it is an extra innings situation as opposed to a more mundane point in the game. Plus any quirks in the data are not smoothed over. Still, I think it is a fun exercise to quickly estimate the outcome of various extra inning setups.

These results will be presented in terms of average number of extra innings and probability of Y extra innings assuming that the rule takes effect in the tenth inning (i.e. each extra inning is played under the same rules).

If you know the probability of scoring X runs, assume the two teams are of equal quality, and assume independence between their runs scored (all significant assumptions), then it is very simple to calculate the probabilities of various outcomes in extra innings. If Pa(x) is the probability that team A scores x runs in an inning, and Pb(x) is the probability that team B scores x runs in an inning, then the probability that team A outscores team B in the inning (i.e. wins the game this inning) is:

P(A > B) = Pa(1)*Pb(0) + Pa(2)*[Pb(0) + Pb(1)] + Pa(3)*[Pb(0) + Pb(1) + Pb(2)] + ….

Since we’ve assumed the teams are of equal quality, the probability for team B is the same, just switching the Pas and Pbs. We can calculate the probability of them scoring the same number of runs (i.e. the probability the game extends an additional inning) by taking 1 – P(A > B) – P(B > A) = 1 – 2*P(A >B) since the teams are even, or directly as:

P(A = B) = Pa(0)*Pb(0) + Pa(1)*Pb(1) + Pa(2)*Pb(2) + … = Pa(0)^2 + Pa(1)^2 + Pa(2)^2 + … since the teams are even

I called this P. The probability that game continues past the tenth is equal to P. The probability that the game terminates after the tenth is 1-P. The probability that the game continues past the eleventh is P^2; the probability that the game terminates after the eleventh is P*(1 – P). Continue recursively from here. The average length of the game is 10*P(terminates after 10) + 11*P(terminates after 11) + …

I used Stoll’s data to estimate a few probabilities of game length for a rule that would start each extra innings with the teams in each of the 24 base/out situations. For a given inning-initial base/out situation, P(10) is the probability that the game is over after 10 innings, P(11) the probability it is over after 11 or fewer extra innings, etc. “average” is the average number of innings in an extra inning game played under that rule, and R/I is the average scored in the remainder of the inning from Stoll’s data for teams in that base/out situation.

It will come as no surprise that generally the higher the R/I, the lower the probability of the game continuing is. In a low scoring environment, the teams are more likely to each score zero or one run; as the scoring environment increases, so does the variance (I should have calculated the variance of runs per inning from Stoll’s data to really drive this point home, but I didn’t think of it until after I’d made the tables), and differences in inning run totals between the two teams are what ends extra inning games.

The highlighted roles are bases empty, nobody out (i.e. the status quo); runner at second, nobody out (the proposed MLB rule); runners at first and second, nobody out (the international rule, starting from the eleventh inning; this chart assumes all innings starting with the tenth are played under the same rules, so it doesn’t let you compare these two rules directly); and bases loaded, nobody out, which maximizes the run environment and minimizes the duration of extra innings (making games beyond 12 innings as theoretically rare as games beyond 15 innings are under traditional rules). Of course, these higher scoring innings would take longer to play, so simply looking at the duration of game doesn’t fully address the alleged problems that tinkering with the rules would be intended to solve.

I did separately calculate these probabilities for the international rule--play the tenth inning under standard rules, then start subsequent innings with runners on first and second. It produces longer games than starting with a runner at second in the tenth, which is not surprising.

Monday, January 30, 2017

Run Distribution and W%, 2016

Every year I state that by the time this post rolls around next year, I hope to have a fully functional Enby distribution to allow the metrics herein to be more flexible (e.g. not based solely on empirical data, able to handle park effects, etc.) And every year during the year I fail to do so. “Wait ‘til next year”...the Indians taking over the longest World Series title drought in spectacular fashion has now given me an excuse to apply this to any baseball-related shortcoming on my part. This time, it really should be next year; what kept me from finishing up over the last twelve months was only partly distraction but largely perfectionism on a minor portion of the Enby methodology that I think I now have convinced myself is folly.

Anyway, there are some elements of Enby in this post, as I’ve written enough about the model to feel comfortable using bits and pieces. But I’d like to overhaul the calculation of gOW% and gDW% that are used at the end based on Enby, and I’m not ready to do that just yet given the deficiency of the material I’ve published on Enby.

Self-indulgence, aggrandizement, and deprecation aside, I need to caveat that this post in no way accounts for park effects. But that won’t come in to play as I first look at team record in blowouts and non-blowouts, with a blowout defined as 5+ runs. Obviously some five run games are not truly blowouts, and some are; one could probably use WPA to make a better definition of blowout based on some sort of average win probability, or the win probability at a given moment or moments in the game. I should also note that Baseball-Reference uses this same definition of blowout. I am not sure when they started publishing it; they may well have pre-dated by usage of five runs as the delineator. However, I did not adopt that as my standard because of Baseball-Reference, I adopted it because it made the most sense to me being unaware of any B-R standard.

73.0% of major league games in 2015 were non-blowouts (of course 27.0% were). The leading records in non-blowouts:

Texas was much the best in close-ish games; their extraordinary record in one-run games which of course are a subset of non-blowouts was well documented. The Blue Jays have made it to consecutive ALCS, but their non-blowout regular season record in 2015-16 is just 116-115. Also, if you audit this you may note that the total comes to 1771-1773, which is obviously wrong. I used Baseball Prospectus' data.

Records in blowouts:

It should be no surprise that the Cubs were the best in blowouts. Toronto was nearly as good last year, 37-12, for a two-year blowout record of 66-27 (.710).

The largest differences (blowout - non-blowout W%) and percentage of blowouts and non-blowouts for each team:

It is rare to see a playoff team with such a large negative differential as Texas had. Colorado played the highest percentage of blowouts and San Diego the lowest, which shouldn’t come as a surprise given that scoring environment has a large influence. Outside of Colorado, though, the Cubs and the Indians played the highest percentage of blowout games, with the latter not sporting as a high of a W% but having the second most blowout wins.

A more interesting way to consider game-level results is to look at how teams perform when scoring or allowing a given number of runs. For the majors as a whole, here are the counts of games in which teams scored X runs:

The “marg” column shows the marginal W% for each additional run scored. In 2015, the third run was both the run with the greatest marginal impact on the chance of winning, while it took a fifth run to make a team more likely to win than lose. 2016 was the first time since 2008 that teams scoring four runs had a losing record, a product of the resurgence in run scoring levels.

I use these figures to calculate a measure I call game Offensive W% (or Defensive W% as the case may be), which was suggested by Bill James in an old Abstract. It is a crude way to use each team’s actual runs per game distribution to estimate what their W% should have been by using the overall empirical W% by runs scored for the majors in the particular season.

The theoretical distribution from Enby discussed earlier would be much preferable to the empirical distribution for this exercise, but I’ve defaulted to the 2016 empirical data. Some of the drawbacks of this approach are:

1. The empirical distribution is subject to sample size fluctuations. In 2016, all 58 times that a team scored twelve runs in a game, they won; meanwhile, teams that scored thirteen runs were 46-1. Does that mean that scoring 12 runs is preferable to scoring 13 runs? Of course not--it's a quirk in the data. Additionally, the marginal values don’t necessary make sense even when W% increases from one runs scored level to another (In figuring the gEW% family of measures below, I lumped games with 12+ runs together, which smoothes any illogical jumps in the win function, but leaves the inconsistent marginal values unaddressed and fails to make any differentiation between scoring in that range. The values actually used are displayed in the “use” column, and the invuse” column is the complements of these figures--i.e. those used to credit wins to the defense.)

2. Using the empirical distribution forces one to use integer values for runs scored per game. Obviously the number of runs a team scores in a game is restricted to integer values, but not allowing theoretical fractional runs makes it very difficult to apply any sort of park adjustment to the team frequency of runs scored.

3. Related to #2 (really its root cause, although the park issue is important enough from the standpoint of using the results to evaluate teams that I wanted to single it out), when using the empirical data there is always a tradeoff that must be made between increasing the sample size and losing context. One could use multiple years of data to generate a smoother curve of marginal win probabilities, but in doing so one would lose centering at the season’s actual run scoring rate. On the other hand, one could split the data into AL and NL and more closely match context, but you would lose sample size and introduce more quirks into the data.

I keep promising that I will use Enby to replace the empirical approach, but for now I will use Enby for a couple graphs but nothing more.

First, a comparison of the actual distribution of runs per game in the majors to that predicted by the Enby distribution for the 2016 major league average of 4.479 runs per game (Enby distribution parameters are B = 1.1052, r = 4.082, z = .0545):

This is pretty typical of the kind of fit you will see from Enby for a given season: a few important points where there’s a noticeable difference (in this case even tallies two, four, six on the high side and 1 and 7 on the low side), but generally acquitting itself as a decent model of the run distribution.

I will not go into the full details of how gOW%, gDW%, and gEW% (which combines both into one measure of team quality) are calculated in this post, but full details were provided here and the paragraph below gives a quick explanation. The “use” column here is the coefficient applied to each game to calculate gOW% while the “invuse” is the coefficient used for gDW%. For comparison, I have looked at OW%, DW%, and EW% (Pythagenpat record) for each team; none of these have been adjusted for park to maintain consistency with the g-family of measures which are not park-adjusted.

A team’s gOW% is the sumproduct of their frequency of scoring x runs, where x runs from 0 to 22, and the empirical W% of teams in 2015 when they scored x runs. For example, Philadelphia was shutout 11 times; they would not be expected to win any of those games (nor would they, we can be certain). They scored one run 23 times; an average team in 2016 had a .089 W% when scoring one run, so they could have been expected to win 2.04of the 23 games given average defense. They scored two runs 22 times; an average team had a .228 W% when scoring two, so they could have been expected to win 5.02 of those games given average defense. Sum up the estimated wins for each value of x and divide by the team’s total number of games and you have gOW%.

It is thus an estimate of what W% a team with the given team’s empirical distribution of runs scored and a league average defense would have. It is analogous to James’ original construct of OW% except looking at the empirical distribution of runs scored rather than the average runs scored per game. (To avoid any confusion, James in 1986 also proposed constructing an OW% in the manner in which I calculate gOW%).

For most teams, gOW% and OW% are very similar. Teams whose gOW% is higher than OW% distributed their runs more efficiently (at least to the extent that the methodology captures reality); the reverse is true for teams with gOW% lower than OW%. The teams that had differences of +/- 2 wins between the two metrics were (all of these are the g-type less the regular estimate):

Positive: MIA, PHI, ATL, KC
Negative: LA, SEA

The Marlins offense had the largest difference (3.55) between their corresponding g-type W% and their OW%/DW%, so I like to include a run distribution chart to hopefully ease in understanding what this means. Miami scored 4.167 R/G, so their Enby parameters (r = 3.923, B = 1.0706, z = .0649) produce these estimated frequencies:

Miami scored 0-3 runs in 47.8% of their games compared to an expected 47.9%. But by scoring 0-2 runs 3% less often then expected and scoring three 3% more often, they had 1.3 more expected wins from such games than Enby expected. They added an additional 1.2 wins from 4-6 runs, and lost 1.1 from 7+ runs. (Note that the total doesn’t add up to the difference between their gOW% and OW%, nor should it--the comparisons I was making were between what the empirical 2016 major league W%s for each x runs scored predicted using their actual run distribution and their Enby run distribution. If I had my act together and was using Enby to estimate the expected W% at each x runs scored, then we would expect a comparison like the preceding to be fairly consistent with a comparison of gOW% to OW%).

Teams with differences of +/- 2 wins between gDW% and standard DW%:

Positive: CIN, COL, ARI
Negative: NYN, MIL, MIA, TB, NYA

The Marlins were the only team to appear on both the offense and defense list, their defense giving back 2.75 wins when looking at their run distribution rather than run average.

Teams with differences of +/- 2 wins between gEW% and standard EW%:

Positive: PHI, TEX, CIN, KC
Negative: LA, SEA, NYN, MIL, NYA, BOS

The Royals finally showed up on these lists, but turning a .475 EW% into a .488 gEW% is not enough pixie dust to make the playoffs.

Below is a full chart with the various actual and estimated W%s:

Monday, January 23, 2017

Crude Team Ratings, 2016

For the last several years I have published a set of team ratings that I call "Crude Team Ratings". The name was chosen to reflect the nature of the ratings--they have a number of limitations, of which I documented several when I introduced the methodology.

I explain how CTR is figured in the linked post, but in short:

1) Start with a win ratio figure for each team. It could be actual win ratio, or an estimated win ratio.

2) Figure the average win ratio of the team’s opponents.

3) Adjust for strength of schedule, resulting in a new set of ratings.

4) Begin the process again. Repeat until the ratings stabilize.

The resulting rating, CTR, is an adjusted win/loss ratio rescaled so that the majors’ arithmetic average is 100. The ratings can be used to directly estimate W% against a given opponent (without home field advantage for either side); a team with a CTR of 120 should win 60% of games against a team with a CTR of 80 (120/(120 + 80)).

First, CTR based on actual wins and losses. In the table, “aW%” is the winning percentage equivalent implied by the CTR and “SOS” is the measure of strength of schedule--the average CTR of a team’s opponents. The rank columns provide each team’s rank in CTR and SOS:

Last year, the top ten teams in CTR were the playoff participants. That was not remotely the case this year thanks to a resurgent gap in league strength. While the top five teams in the AL made the playoffs and the NL was very close, St. Louis slipping just ahead of New York and San Francisco (by a margin of .7 wins if you compare aW%), the Giants ranked only fifteenth in the majors in CTR. Any of the Mariners, Tigers, Yankees, or Astros were considered stronger than the actual NL #3 seed and CTR finisher the Dodgers.

The Dodgers had the second-softest schedule in MLB, ahead of only the Cubs. (The natural tendency is for strong teams in weak divisions to have the lowest SOS, since they don’t play themselves. The flip is also true--I was quite sure without checking to verify that Tampa Bay had the toughest schedule). The Dodgers average opponent was about as good as the Pirates or the Marlins; the Mariners average opponent was rated stronger than the Cardinals.

At this point you probably want to see just how big of a gap there was between the AL and NL in average rating. Originally I gave the arithmetic average CTR for each divison, but that’s mathematically wrong--you can’t average ratios like that. Then I switched to geometric averages, but really what I should have done all along is just give the arithemetic average aW% for each division/league. aW% converts CTR back to an “equivalent” W-L record, such that the average across the major leagues will be .50000. I do this by taking CTR/(100 + CTR) for each team, then applying a small fudge factor to force the average to .500. In order to maintain some basis for comparison to prior years, I’ve provided the geometric average CTR alongside the arithmetric average aW%, and the equivalent CTR by solving for CTR in the equation:

aW% = CTR/(100 + CTR)*F, where F is the fudge factor (it was 1.0012 for 2016 lest you be concerned there is a massive behind-the-scenes adjustment taking place).

Every AL division was better than every AL division, a contrast from 2015 in which the two worst divisions were the NL East and West, but the NL Central was the best division. Whether you use the geometric or backdoor-arithmetric average CTRs to calculate it, the average AL team’s expected W% versus an average NL team is .545. The easiest SOS in the AL was the Indians, as to be expected as the strongest team in the weakest division; it was still one point higher than that of the toughest NL schedule (the Reds, the weakest team in the strongest division).

I also figure CTRs based on various alternate W% estimates. The first is based on game-Expected W%, which you can read about here. It uses each team’s game-by-game distribution of runs scored and allowed, but treats the two as independent:

Next is Expected W%, that is to say Pythagenpat based on actual runs scored and allowed:

Finally, CTR based on Predicted W% (Pythagenpat based on runs created and allowed, actually Base Runs):

A few seasons ago I started including a CTR version based on actual wins and losses, but including the postseason. I am not crazy about this set of ratings, but I can’t quite articulate why.

On the one hand, adding in the playoffs is a no-brainer. The extra games are additional datapoints regarding team quality. If we have confidence in the rating system (and I won’t hold it against you if you don’t), then the unbalanced nature of the schedule for these additional games shouldn’t be too much of a concern. Yes, you’re playing stronger opponents, but the system understands that and will reward you (or at least not penalize you) for it.

On the other hand, there is a natural tendency among people who analyze baseball statistics to throw out the postseason, due to concerns about unequal opportunity (since most of the league doesn’t participant) and due to historical precedent. Unequal opportunity is a legitimate concern when evaluating individuals--particularly for counting or pseudo-counting metrics like those that use a replacement level baseline--but much less of a concern with teams. Even though the playoff participants may not be the ten most deserving teams by a strict, metric-based definition of “deserving”, there’s no question that teams are largely responsible for their own postseason fate to a much, much greater extent than any individual player is. And the argument from tradition is fine if the issue at hand is the record for team wins or individual home runs or the like, but not particularly applicable when we are simply using the games that have been played as datapoints by which to gauge team quality.

Additionally, the fact that playoff series are not played to their conclusion could be seen as introducing bias. If the Red Sox get swept by the Indians, they not only get three losses added to their ledger, they lose the opportunity to offset that damage. The number of games that are added to a team’s record, even within a playoff round, is directly related to their performance in the very small sample of games.

Suppose that after every month of the regular season, the bottom four teams in the league-wide standings were dropped from the schedule. So after April, the 7-17 Twins record is frozen in place. Do you think this would improve our estimates of team strength? And I don’t just mean from the smaller sample, obviously their record as used in the ratings could be more heavily regressed than teams that played more games. But it would freeze our on-field observations of the Twins, and the overall effect would be to make the dropped teams look worse than their “true” strength.

I doubt that poorly reasoned argument swayed even one person, so the ratings including playoff performance are:

The teams sorted by difference between playoff CTR (pCTR) and regular season CTR (rsCTR):

It’s not uncommon for the pennant winners to be the big gainers, but the Cubs and Indians made a lot of hay this year, as the Cubs managed to pull every other team in the NL Central up one point in the ratings. The Rangers did the reverse with the AL West by getting swept out of the proceedings. They still had a better ranking than the team that knocked them out, as did Washington.

Tuesday, January 10, 2017

Hitting by Position, 2016

Of all the annual repeat posts I write, this is the one which most interests me--I have always been fascinated by patterns of offensive production by fielding position, particularly trends over baseball history and cases in which teams have unusual distributions of offense by position. I also contend that offensive positional adjustments, when carefully crafted and appropriately applied, remain a viable and somewhat more objective competitor to the defensive positional adjustments often in use, although this post does not really address those broad philosophical questions.

The first obvious thing to look at is the positional totals for 2016, with the data coming from Baseball-Reference.com. "MLB” is the overall total for MLB, which is not the same as the sum of all the positions here, as pinch-hitters and runners are not included in those. “POS” is the MLB totals minus the pitcher totals, yielding the composite performance by non-pitchers. “PADJ” is the position adjustment, which is the position RG divided by the overall major league average (this is a departure from past posts; I’ll discuss this a little at the end). “LPADJ” is the long-term positional adjustment that I use, based on 2002-2011 data. The rows “79” and “3D” are the combined corner outfield and 1B/DH totals, respectively:

Obviously when looking at a single season of data it’s imperative not to draw any sweeping conclusions. That doesn’t make it any less jarring to see that second basemen outhit every position save the corner infield spots, or that left fielders created runs at the league average rate. The utter collapse of corner outfield offense left them, even pooled, ahead only of catcher and shortstop. Pitchers also added another point of relative RG, marking two years in a row of improvement (such as it is) over their first negative run output in 2014.

It takes historical background to fully appreciate how much the second base and corner outfield performances stack up. 109 for second base is the position’s best showing since 1924, which was 110 thanks largely to Rogers Hornsby, Eddie Collins and Frankie Frisch. Second base had not hit for the league average since 1949. (I should note that the historical figures I’m citing are not directly comparable - they based on each player’s primary position and include all of their PA, regardless of whether they were actually playing the position at the time or not, unlike the Baseball-Reference positional figures used for 2016). Corner outfield was even more extreme at 103, the nadir for the 116 seasons starting with 1901 (the previous low was 107 in 1992).

If the historical perspective is of interest, you may want to check out Corrine Landrey’s article in The Hardball Time Baseball Annual. She includes some charts showing OPS+ by position in the DH-era and theorizes that an influx of star young players, still playing on the right-side of the defensive spectrum, has led to the positional shakeup. While I cautioned above about over-generalizing from one year of data, it has been apparent over the last several years that the spread between positions has declined. Landrey’s explanation is as viable as any I’ve seen to explain these season’s results.

Moving on to looking at more granular levels of performance, I always start by looking at the NL pitching staffs and their RAA. I need to stress that the runs created method I’m using here does not take into account sacrifices, which usually is not a big deal but can be significant for pitchers. Note that all team figures from this point forward in the post are park-adjusted. The RAA figures for each position are baselined against the overall major league average RG for the position, except for left field and right field which are pooled.

This is the second consecutive year that the Giants led the league in RAA, and of course they employ the active pitcher most known for his batting. But as usual the spread from top to bottom is in the neighborhood of twenty runs.

I don’t run a full chart of the leading positions since you will very easily be able to go down the list and identify the individual primarily responsible for the team’s performance and you won’t be shocked by any of them, but the teams with the highest RAA at each spot were:


More interesting are the worst performing positions; the player listed is the one who started the most games at that position for the team:

I am have as little use for batting average as anyone, but I still find the Angels .209 left field average to be the single most entertaining number on that chart (remember, that’s park-adjusted; it was .204 raw). The least entertaining thing for me at least was the Indians’ production at catcher, which was tolerable when Roberto Perez was drawing walks but intolerable when Terry Francona was pinch-running for him in Game 7.

I like to attempt to measure each team’s offensive profile by position relative to a typical profile. I’ve found it frustrating as a fan when my team’s offensive production has come disproportionately from “defensive” positions rather than offensive positions (“Why can’t we just find a corner outfielder who can hit?”) The best way I’ve yet been able to come up with to measure this is to look at the correlation between RG at each position and the long-term positional adjustment. A positive correlation indicates a “traditional” distribution of offense by position--more production from the positions on the right side of the defensive spectrum. (To calculate this, I use the long-term positional adjustments that pool 1B/DH as well as LF/RF, and because of the DH I split it out by league):

As you can see, there are good offenses with high correlations, good offenses with low correlations, and every other combination. I have often used this space to bemoan the Indians continual struggle to get adequate production from first base, contributing to their usual finish in the bottom third or so of correlation. This year, they rank in the middle of the pack, and while it is likely a coincidence that they had a good season, it’s worth noting that Mike Napoli only was average for a first baseman. Even that is much better than some of their previous showings.

Houston’s two best hitting positions (not relative to positional averages, but in terms of RG) were second base and shortstop. In fact the Astros positions in descending order of RG was 4, 6, 9, 2, 5, 3, D, 7, 8. That’s how you get a fairly strong negative correlation between RG and PADJ.

The following charts, broken out by division, display RAA for each position, with teams sorted by the sum of positional RAA. Positions with negative RAA are in red, and positions that are +/-20 RAA are bolded:

Boston had the AL’s most productive outfield, while Toronto was just an average offense after bashing their way to a league leading 118 total RAA in 2015. It remains jarring to see New York at the bottom of an offense list, even just for a division, and their corner infielders were the worst in the majors.

Other than catcher, Cleveland was solid everywhere, with no bold positions--and in this division, that’s enough to lead in RAA and power a cruise to the division title. Detroit had the AL’s top corner infield RAA (no thanks to third base). Kansas City, where to begin with the sweet, sweet schadenfreude? Eksy Magic? No, already covered at length in the leadoff hitters post. Maybe the fact that they had the worst middle infield production in MLB? Or that the bros at the corners chipped in another -19 RAA to also give them the worst infield? The fact that they were dead last in the majors in total RAA? It’s just too much.

The pathetic production of the Los Angeles left fielders was discussed above. The Mike Trout-led center fielders were brilliant, the best single position in the majors. And so, even with a whopping -31 runs from left field, the Angels had the third-most productive outfield in MLB. Houston’s middle infielders, also mentioned above, were the best in the majors. Oakland’s outfield RAA was last in the AL.

Washington overcame the NL’s least productive corner infielders, largely because they had the NL’s most productive middle infielders. Miami had a similar but even more extreme juxtaposition, the NL’s worst infield and the majors’ best outfield, and that with a subpar season from Giancarlo Stanton as right field was the least productive of the three spots. Atlanta had the NL’s worst-hitting middle infield, and Philadelphia the majors’ worst outfield despite Odubel Herrera making a fool of me.

Chicago was tops in the majors in corner infield RAA and total infield RAA. No other teams in this division achieved any superlatives but thanks to Joey Votto and a half-season of Jonathon Lucroy, every team was in the black for total RAA, even if we were to add in Cincinnati’s NL-trailing -9 RAA from pitchers.

No position grouping superlatives in this division, but it feels like more should be said about Corey Seager. It seems like a rookie shortstop hitting as he did, fielding adequately enough to be a serious MVP candidate for a playoff team in a huge market for one of the five or so most venerated franchises should have gotten a lot more attention than it did. Is it the notion that a move to third base is inevitable? Is he, like the superstar down the road, just considered too boring of a personality?

The full spreadsheet is available here.

Monday, December 12, 2016

Hitting by Lineup Position, 2016

I devoted a whole post to leadoff hitters, whether justified or not, so it's only fair to have a post about hitting by batting order position in general. I certainly consider this piece to be more trivia than sabermetrics, since there’s no analytic content.

The data in this post was taken from Baseball-Reference. The figures are park-adjusted. RC is ERP, including SB and CS, as used in my end of season stat posts. The weights used are constant across lineup positions; there was no attempt to apply specific weights to each position, although they are out there and would certainly make this a little bit more interesting:

The seven year run of NL #3 hitters as the best position in baseball was snapped, albeit by an insignificant .01 RG by AL #3 hitters. Since Mike Trout’s previous career high in PA out of the #3 spot was 336 in 2015 and he racked up 533 this year, I’m going to give full credit to Trout; as we will see in a moment, the Angels’ #3 hitters were the best single lineup spot in baseball. #2 hitters did not outperform #5 in both circuits as they did last year, just the AL. However, the NL made up for hit by having their leadoff hitters create runs at almost the exact same rate as their #5s.

Next are the team leaders and trailers in RG at each lineup position. The player listed is the one who appeared in the most games in that spot (which can be misleading, especially for spots low in the batting order where many players cycle through):

A couple things that stood out to me was St. Louis’ dominance at the bottom of the order and the way in which catchers named Perez managed to sabotage lineup spots for two teams. Apologies to Carlos Beltran (the real culprits for the poor showing of Texas #3 hitters were Adrian Beltre, Prince Fielder, and Nomar Mazara) and Luis Valbuena (Carlos Gomez and Marwin Gonzalez).

The case of San Diego’s cleanup hitters deserves special attention. Yangervis Solarte was actually pretty good when batting cleanup, as his .289/.346/.485 line in 289 PA compares favorably to the NL average for cleanup hitters. The rest of the Padres who appeared in that spot combined for 399 PA with a dreadful .187/.282/.336 line. Just to give you a quick idea of how bad this is, the 618 OPS would have been the eleventh-worst among any non-NL #9 lineup spot in the majors, leading only 6 AL #9s, 2 #2s, a #7, and the horrible Oakland #2s. It was also worse than the Cardinals’ #9 hitters.

The next list is the ten best positions in terms of runs above average relative to average for their particular league spot (so AL leadoff spots are compared to the AL average leadoff performance, etc.):

And the ten worst:

Joe Mauer himself wasn’t that bad, with a 799 OPS when hitting third. That’s still well-below the AL average, but not bottom ten in RAA bad without help from his friends.

The last set of charts show each team’s RG rank within their league at each lineup spot. The top three are bolded and the bottom three displayed in red to provide quick visual identification of excellent and poor production:

The full spreadsheet is available here.

Monday, December 05, 2016

Leadoff Hitters, 2016

I will try to make this as clear as possible: the statistics are based on the players that hit in the #1 slot in the batting order, whether they were actually leading off an inning or not. It includes the performance of all players who batted in that spot, including substitutes like pinch-hitters.

Listed in parentheses after a team are all players that started in twenty or more games in the leadoff slot--while you may see a listing like "COL (Blackmon)" this does not mean that the statistic is only based solely on Blackmon's performance; it is the total of all Colorado batters in the #1 spot, of which Blackmon was the only one to start in that spot in twenty or more games. I will list the top and bottom three teams in each category (plus the top/bottom team from each league if they don't make the ML top/bottom three); complete data is available in a spreadsheet linked at the end of the article. There are also no park factors applied anywhere in this article.

That's as clear as I can make it, and I hope it will suffice. I always feel obligated to point out that as a sabermetrician, I think that the importance of the batting order is often overstated, and that the best leadoff hitters would generally be the best cleanup hitters, the best #9 hitters, etc. However, since the leadoff spot gets a lot of attention, and teams pay particular attention to the spot, it is instructive to look at how each team fared there.

The conventional wisdom is that the primary job of the leadoff hitter is to get on base, and most simply, score runs. It should go without saying on this blog that runs scored are heavily dependent on the performance of one’s teammates, but when writing on the internet it’s usually best to assume nothing. So let's start by looking at runs scored per 25.5 outs (AB - H + CS):

1. HOU (Springer/Altuve), 6.9
2. COL (Blackmon), 6.7
3. DET (Kinsler), 6.6
Leadoff average, 5.2
ML average, 4.5
28. SF (Span), 4.4
29. KC (Escobar/Dyson/Merrifield), 4.1
30. OAK (Crisp/Burns), 3.4

Again, no park adjustments were applied, so the Rockies performance was good but it wasn’t really “best in the NL good”. I’m also going to have a hard time resisting just writing “Esky Magic” every time the Royals appear on a trailers list.

The most basic team independent category that we could look at is OBA (figured as (H + W + HB)/(AB + W + HB)):

1. CHN (Fowler/Zobrist), .383
2. HOU (Springer/Altuve), .375
3. STL (Carpenter), .370
Leadoff average, .341
ML average, .324
28. WAS (Turner/Revere/Taylor), .305
29. KC (Escobar/Dyson/Merrifield), .298
30. OAK (Crisp/Burns), .290

Esky Magic. And once again Billy Burns chipping in to Oakland’s anemic showing and of course Kansas City just had to have Billy Burns.

The next statistic is what I call Runners On Base Average. The genesis for ROBA is the A factor of Base Runs. It measures the number of times a batter reaches base per PA--excluding homers, since a batter that hits a home run never actually runs the bases. It also subtracts caught stealing here because the BsR version I often use does as well, but BsR versions based on initial baserunners rather than final baserunners do not. Here ROBA = (H + W + HB - HR - CS)/(AB + W + HB).

This metric has caused some confusion, so I’ll expound. ROBA, like several other methods that follow, is not really a quality metric, it is a descriptive metric. A high ROBA is a good thing, but it's not necessarily better than a slightly lower ROBA plus a higher home run rate (which would produce a higher OBA and more runs). Listing ROBA is not in any way, shape or form a statement that hitting home runs is bad for a leadoff hitter. It is simply a recognition of the fact that a batter that hits a home run is not a baserunner. Base Runs is an excellent model of offense and ROBA is one of its components, and thus it holds some interest in describing how a team scored its runs, rather than how many it scored:

1. CHN (Fowler/Zobrist), .351
2. MIA (Gordon/Suzuki/Dietrich/Realmuto), .335
3. ATL (Inciarte/Peterson/Markakis), .331
4. HOU (Springer/Altuve), .331
Leadoff average, .305
ML average, .287
28. TEX (Choo/Odor/DeShields/Profar), .264
29. WAS (Turner/Revere/Taylor), .260
30. MIN (Dozier/Nunez), .256

Kansas City leadoff hitters finished tied for last in the majors with five home runs (with Miami), so Esky Magic was only good for 23rd place. Twins leadoff hitters, thanks primarily to Dozier, led the majors with 39 homers. So only after around 25.6% of leadoff hitter plate appearances did they actually wind up with a runner on base. Their .320 OBA was well-below average too, but again ROBA describes how an offense plays out--other considerations are necessary to determine how good it was.

I also include what I've called Literal OBA--this is just ROBA with HR subtracted from the denominator so that a homer does not lower LOBA, it simply has no effect. It “literally” (not really, thanks to errors, out stretching, caught stealing after subsequent plate appearances, etc.) is the proportion of plate appearances in which the batter becomes a baserunner able to be advanced by his teammates. You don't really need ROBA and LOBA (or either, for that matter), but this might save some poor message board out there twenty posts, by not implying that I think home runs are bad, so here goes. LOBA = (H + W + HB - HR - CS)/(AB + W + HB - HR):

1. CHN (Fowler/Zobrist), .360
2. HOU (Springer/Altuve), .344
3. STL (Carpenter), .342
Leadoff average, .313
ML average, .297
28. OAK (Crisp/Burns), .273
29. MIN (Dozier/Nunez), .270
30. WAS (Turner/Revere/Taylor), .268

The next two categories are most definitely categories of shape, not value. The first is the ratio of runs scored to RBI. Leadoff hitters as a group score many more runs than they drive in, partly due to their skills and partly due to lineup dynamics. Those with low ratios don’t fit the traditional leadoff profile as closely as those with high ratios (at least in the way their seasons played out):

1. MIA (Gordon/Suzuki/Dietrich/Realmuto), 2.6
2. SD (Jankowski/Jay), 2.3
3. ATL (Inciarte/Peterson/Markakis), 2.0
6. LAA (Escobar/Calhoun), 1.9
Leadoff average, 1.5
ML average, 1.0
26. STL (Carpenter), 1.3
28. BOS (Betts/Pedroia), 1.2
29. OAK (Crisp/Burns), 1.2
30. MIN (Dozier/Nunez), 1.1

This speaks more to me than the measure, but the most interesting thing I learned from that list was that Travis Jankowski was San Diego’s primary leadoff hitter (71 games). Looking at the rest of the list, I think I could have guessed most team’s in two or three, I never would have gotten the Padres.

A similar gauge, but one that doesn't rely on the teammate-dependent R and RBI totals, is Bill James' Run Element Ratio. RER was described by James as the ratio between those things that were especially helpful at the beginning of an inning (walks and stolen bases) to those that were especially helpful at the end of an inning (extra bases). It is a ratio of "setup" events to "cleanup" events. Singles aren't included because they often function in both roles.

Of course, there are RBI walks and doubles are a great way to start an inning, but RER classifies events based on when they have the highest relative value, at least from a simple analysis:

1. MIA (Gordon/Suzuki/Dietrich/Realmuto), 1.8
2. ATL (Inciarte/Peterson/Markakis), 1.4
3. PHI (Herrera/Hernandez), 1.4
6. NYA (Ellsbury/Gardner), 1.2
Leadoff average, .8
ML average, .7
26. COL (Blackmon), .5
28. TB (Forsythe/Guyer), .5
29. DET (Kinsler), .5
30. BAL (Jones/Rickard), .4

The Orioles certainly had a non-traditional leadoff profile thanks mostly to Jones; their five stolen base attempts was the fewest of any team, they were tied for third with 30 homers, and they drew 20 less walks than an average team out of the leadoff spot.

Since stealing bases is part of the traditional skill set for a leadoff hitter, I've included the ranking for what some analysts call net steals, SB - 2*CS. I'm not going to worry about the precise breakeven rate, which is probably closer to 75% than 67%, but is also variable based on situation. The ML and leadoff averages in this case are per team lineup slot:

1. WAS (Turner/Revere/Taylor), 30
2. MIL (Villar/Santana), 27
3. MIA (Gordon/Suzuki/Dietrich/Realmuto), 22
4. CLE (Santana/Davis), 20
Leadoff average, 6
ML average, 2
28. TB (Forsythe/Guyer), -11
29. SEA (Aoki/Martin), -13
30. PHI (Herrera/Hernandez), -16

The Indians are a good example of why I list all players who had at least twenty starts in the leadoff spot; AL steal leader Rajai Davis’ 69 games leading off led to them leading the AL in net steals.

Shifting back to quality measures, first up is one that David Smyth proposed when I first wrote this annual leadoff review. Since the optimal weight for OBA in a x*OBA + SLG metric is generally something like 1.7, David suggested figuring 2*OBA + SLG for leadoff hitters, as a way to give a little extra boost to OBA while not distorting things too much, or even suffering an accuracy decline from standard OPS. Since this is a unitless measure anyway, I multiply it by .7 to approximate the standard OPS scale and call it 2OPS:

1. COL (Blackmon), 880
2. BOS (Betts/Pedroia), 872
3. HOU (Springer/Altuve), 865
Leadoff average, 775
ML average, 745
28. SF (Span), 722
29. OAK (Crisp/Burns), 654
30. KC (Escobar/Dyson/Merrifield), 650

Esky Magic.

Along the same lines, one can also evaluate leadoff hitters in the same way I'd go about evaluating any hitter, and just use Runs Created per Game with standard weights (this will include SB and CS, which are ignored by 2OPS):

1. COL (Blackmon), 6.4
2. BOS (Betts/Pedroia), 6.3
3. HOU (Springer/Altuve), 6.2
Leadoff average, 4.9
ML average, 4.5
28. SF (Span), 4.1
29. KC (Escobar/Dyson/Merrifield), 3.4
30. OAK (Crisp/Burns), 3.3

Esky Magic.

The same six teams make up the leaders and trailers, which shouldn’t be a big surprise.

Allow me to close with a crude theoretical measure of linear weights supposing that the player always led off an inning (that is, batted in the bases empty, no outs state). There are weights out there (see The Book) for the leadoff slot in its average situation, but this variation is much easier to calculate (although also based on a silly and impossible premise).

The weights I used were based on the 2010 run expectancy table from Baseball Prospectus. Ideally I would have used multiple seasons but this is a seat-of-the-pants metric. The 2010 post goes into the detail of how this measure is figured; this year, I’ll just tell you that the out coefficient was -.224, the CS coefficient was -.591, and for other details refer you to that post. I then restate it per the number of PA for an average leadoff spot (746 in 2014):

1. HOU (Springer/Altuve), 30
2. COL (Blackmon), 28
3. CHN (Fowler/Zobrist), 27
Leadoff average, 7
ML average, 0
28. SF (Span), -8
29. KC (Escobar/Dyson/Merrifield), -19
30. OAK (Crisp/Burns), -21

Esky Magic. Lest anyone think I am being unduly critical of Escobar's performance (he did after all start only half (actually 82) of KC's games as the leadoff hitter), note that Escobar when in the #1 spot hit .242/.272/.289. The rest of the Royals combined for .274/.317/.378, which would only rank second worst in the majors in 2OPS. So the Royals team performance was terrible, but Escobar was dreadful. Just the worst.

The spreadsheet with full data is available here.