Tuesday, August 22, 2017

Enby Distribution, pt. 4: Revisiting W%

In my first series about runs per game distributions, I wrote about how to use estimates of the probability of scoring k runs (however these probabilities were estimated, Enby distribution or an alternative approach) to estimate a team’s winning percentage. I’m going to circle back to that here, and most of the content is a repeat of the earlier post.

However, I think this is an important enough topic to rehash. In fact, a winning percentage estimator strikes me as the most logical application for a runs per game distribution, albeit one that is not particularly helpful to everyday sabermetric practice. After all, multiple formulas to estimate W% as a function of runs scored and runs allowed have been developed, and most of them work quite well when working with normal major league teams--well enough to make it difficult to imagine that there is any appreciable gain in accuracy to be had. Better yet, these W% estimators are fairly simple--even the most complex versions in common use, Pythagenport/pat, can be quickly tapped out on a calculator in about thirty seconds.

Given that there are powerful, relatively simple W% models already in use, why even bother to examine a model based on the estimated scoring distribution? There are three obvious reasons that come to my mind. The first is that such a model serves as a check on the others. Depending on how much confidence one has in the underlying run distribution model, it is possible that the resulting W% estimator will produce a better estimate, at least at the extremes. We know of course that some of the easier models don’t hold up well in extreme situations--linear estimators will return negative or greater than one figures at some point, and fixed Pythagorean exponents will fray at some point. While we know that Pythagenpat works at the known point of 1 RPG and appears to work well at other extreme values, it doesn’t hurt to have another way of estimating W% in those extremes to see if Pythagenpat is corroborated, or whether the models disagree. This can also serve as a check on Enby--if the results vary too much from what we expect, it may imply that Enby does not hold up well at extremes itself.

A second reason is that it’s plain fun if you like esoteric sabermetrics (and if you’re reading this blog, it’s a good bet that you do). I’ve never needed an excuse to mess around with alternative methods, particularly when it comes to W% estimators, which along with run estimators are my own personal favorite sabermetric tools.

But the third reason is the one that I want to focus on here, which is that a W% estimator based on an underlying estimate of the run distribution is from one perspective the simplest possible estimator. This may seem to be an absurd statement given all of the steps that are necessary to compute Enby estimates, let alone plugging these into a W% formula. But from a first principles standpoint, the distribution-based W% estimator is the simplest to explain, because it is defined by the laws of the game itself.

If you score no runs, you don’t win. If you score one run, you win if you allow zero runs. If you score two runs, you win if you allow either zero or one run, and on it goes ad infinitum. If at the end of nine innings you have scored and allowed an equal number of runs, you play on until there is an inning in which the two teams score an unequal number of runs. This fundamental identity is what all of the other W% estimators attempt to approximate; their shortcuts sweep these mechanics under the rug. The distribution-based approach is computationally dense but conceptually easy (and correct). Of course, to bring points one and three together, the definition may be correct, but the resulting estimates are useless if the underlying model (Enby in this case) does not work.
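The rules above translate almost directly into code. What follows is a minimal Python sketch, assuming each team's run distribution is supplied as a list in which index k holds the probability of scoring exactly k runs. The function name `win_probability`, the toy distributions, and the tie rule are illustrative assumptions: ties after nine innings are simply split in proportion to each team's share of the decided games, which is a shortcut rather than a model of extra innings, and the two teams are treated as scoring independently.

```python
def win_probability(p_a, p_b):
    """Estimate Team A's W% from two per-game run distributions.

    p_a[k] and p_b[k] are the probabilities that each team scores exactly
    k runs in regulation. Both lists are assumed to sum to (about) 1.
    """
    win = loss = tie = 0.0
    for runs_a, prob_a in enumerate(p_a):
        for runs_b, prob_b in enumerate(p_b):
            joint = prob_a * prob_b  # assumes the two teams score independently
            if runs_a > runs_b:
                win += joint
            elif runs_a < runs_b:
                loss += joint
            else:
                tie += joint
    # Assumption: regulation ties are split in proportion to decided games,
    # rather than modeling extra innings one inning at a time.
    decided = win + loss
    if decided == 0:
        return 0.5
    return win + tie * win / decided


# Toy distributions (made up for illustration, not Enby output).
team_a = [0.10, 0.20, 0.30, 0.25, 0.15]
team_b = [0.20, 0.30, 0.25, 0.15, 0.10]
print(round(win_probability(team_a, team_b), 3))
```

Any per-game distribution can be plugged in here, Enby or otherwise; the W% logic does not care where the probabilities came from.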

In order to produce our W% estimate, we first need to use Enby to estimate the scoring distribution for the two teams. This is not as simple as using the Enby parameters we have already developed based on the Tango Distribution with c = .767. Tango has found that his method produces more accurate results for two teams when c is set equal to .852 instead.

In the previous post, I walked through the computations for the Enby distribution with any c value, so this is an easy substitution to make. But why is it necessary? I don’t have a truly satisfactory answer to that question--it's trite to just assert that it works better for head-to-head matchups because of the covariance between runs scored and runs allowed, even if that is in fact the right answer.

How will modifying the control value alter the Enby distribution? All of the parameters will be affected, because all depend on the control value in one way or another. First, B and r (the latter as it is initially figured before zero modification):

VAR = RG^2/9 + (2/c - 1)*RG
r = RG^2/(VAR - RG)
B = VAR/RG - 1

When c is larger, the variance of runs scored will be smaller. We can see this by examining the equations for variance with c = .767 and .852:

VAR (.767) = RG^2/9 + 1.608*RG
VAR (.852) = RG^2/9 + 1.347*RG

This results in a larger value for r and a smaller value for B, but these parameters don’t have an intuitive baseball explanation, unlike variance. It’s difficult to explain (for me at least) why variance of a single team’s runs scored should be lower when considering a head-to-head matchup, but that’s the way it works out.
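To see the size of the shift, the three formulas can be evaluated side by side. This is a small Python sketch; the function name `enby_initial_parameters` is hypothetical, and the 4.5 RG input is just the average team used elsewhere in this post.

```python
def enby_initial_parameters(rg, c):
    """Return (VAR, r, B) before zero modification for mean runs/game rg."""
    var = rg ** 2 / 9 + (2 / c - 1) * rg
    r = rg ** 2 / (var - rg)
    b = var / rg - 1
    return var, r, b


for c in (0.767, 0.852):
    var, r, b = enby_initial_parameters(4.5, c)
    print(f"c = {c}: VAR = {var:.3f}, r = {r:.3f}, B = {b:.3f}")
```

Note that r*B equals RG for either choice of c, since the negative binomial mean is matched exactly before zero modification.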

It should be noted that if the sole purpose of this exercise is to estimate W%, we don’t have to care whether the actual probability of each team scoring k runs is correct. All we need to do is have an accurate estimate of how often Team A’s runs scored are greater than Team B’s.

By increasing c, we also reduce the probability of a shutout, as can be seen from the formula for z:

z = (RI/(RI + c*RI^2))^9
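A quick numerical look at the shutout estimate under each control value, as a Python sketch. The function name `shutout_probability` is hypothetical, RI is taken as RG/9 as elsewhere in this series, and the formula is undefined at RG = 0.

```python
def shutout_probability(rg, c):
    """Tango Distribution estimate of being shut out over nine innings."""
    ri = rg / 9  # runs per inning, assuming exactly nine innings
    return (ri / (ri + c * ri ** 2)) ** 9


for rg in (2.25, 4.5, 9.0):
    z_one_team = shutout_probability(rg, 0.767)
    z_two_team = shutout_probability(rg, 0.852)
    print(f"RG = {rg}: z(.767) = {z_one_team:.4f}, z(.852) = {z_two_team:.4f}")
```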

Originally, I had intended to display some graphs showing the behavior of the three parameters by RG with each choice of c, but these turned out to be not of any particular interest. I ran similar graphs earlier in the series with parameters based on the earlier variance model, and the shapes of the resulting functions are quite similar. The only real visual difference when c varies is what appear to be linear shifts for r and B (the B shift is linear, the r not quite).

What might be more interesting is looking at how c shapes the estimated run distribution for a team with a given RG. I’ll look at three teams--one average (4.5 RG), one extremely low-scoring (2.25 RG), and one extremely high-scoring (9 RG). First, the 4.5 RG team:

As you may recall from earlier, Enby consistently overestimates the frequency with which a normal major league team will score 2-4 runs. Using the .852 c value exacerbates this issue; in fact, the main thing to take away from this set of graphs is that the higher c value clusters more probability around the mean, while the lower c value leaves more probability for the tails.

The 2.25 RG team:

And the 9 RG team:

Thursday, August 10, 2017

Bottoming Out

On June 5, OSU Athletic Director Gene Smith unceremoniously fired Thad Matta, the winningest men’s basketball coach in the history of the school. He did so months after the normal time to fire coaches had passed, and he did so in a way that ensured that the end of Matta’s tenure would be the dominant story in college basketball over the next week. Matta won four regular season Big Ten championships, went to two Final Fours, and was as close to universally respected and beloved by his former players as you will ever find in college basketball. He did all of this while dealing with a debilitating condition that made routine tasks like walking and taking off his shoes a major challenge; it was a side effect of a surgery performed at the university’s own hospital. OSU was coming off a pair of seasons without making the NCAA Tournament, but basketball is a sport in which a roster can get turned around in a hurry, and this author feels that Matta had more than earned another year or two in which to have the opportunity to do just that. Gene Smith felt otherwise.

On May 20, the OSU baseball team lost to Indiana 4-3 at home. This brought an end to a season in which they went 22-34, the school’s worst record since going 6-12 in 1974. They went 8-16 in the Big Ten, the worst showing since going 4-12 in 1987. The season brought Greg Beals’ seven-year record at OSU to 225-167 (.574) and his Big Ten record to 85-83 (.506). Setting aside 2008-2014, a seven-year stretch in which OSU had a .564 W% (since four of the seasons were coached by Beals), the seven-year record is OSU’s worst since 1986-1992. The seven-year stretch in the Big Ten is the worst since 1984-1990 (.486). The Buckeyes finished eleventh in the Big Ten, which in fairness wasn’t possible until the addition of Nebraska, but since the Big Ten eliminated divisions in 1988, the lowest previous conference standing had been seventh (out of 10 in 2010, out of 11 in 2014, out of 13 in 2015).

The OSU season is hardly worth recapping in detail, except to point out that baseball is such that Oregon State could go 56-6 on the year yet have one of those losses come to the Buckeyes (February 24, 6-1; the Beavers won a rematch 5-1 two days later). The other noteworthy statistical oddity is that in eight Big Ten series, Ohio won just one (2-1 at Penn State). They were swept once (home against Minnesota) and the other six were all 2-1 for the opposition. The top eight teams in the conference qualify for the tournament; OSU finished four games out of the running, eliminated even before the final weekend.

The Buckeyes’ .393 overall W% and .412 EW% were both eleventh of thirteen Big Ten teams (the forces of darkness led at .724 and .748 respectively), and their .463 PW% was eighth (again, the forces of darkness led with .699). OSU was twelfth with 5.07 R/G and tenth with 6.05 RA/G, although Bill Davis Stadium is a pitcher’s park and those are unadjusted figures. OSU’s .659 DER was last in the conference.

None of this was surprising; OSU lost a tremendous amount of production from 2016, which was Beals’ most successful team, notching his only championship (Big Ten Tournament) and NCAA appearance. With individual exceptions, outside of the 2016 draft class, Beals has failed to recruit and develop talent, often patching his roster with copious amounts of JUCO transfers rather than underclassmen developed in the program. Never was this more acute than in 2017. None of this is meant to be an indictment of the players, who did the best they could to represent their school. It is not their fault that the coach put them in situations that they couldn’t handle or weren’t ready for.

Sophomore catcher Jacob Barnwell had a solid season, hitting .254/.367/.343 for only -1 RAA; his classmate and backup Andrew Fishel only got 50 PA but posted a .400 OBA. First base/DH was a real problem position, as senior Zach Ratcliff was -8 RAA and JUCO transfer junior Bo Coolen chipped in -6; both had secondary averages well below the team average. Noah McGowan, another JUCO transfer, started at second (and got time in left as well), with -3 RAA in 162 PA before getting injured. True freshman Noah West followed him into the lineup, but a lack of offense (.213/.278/.303 in 105 PA) gave classmate Connor Pohl a shot. Pohl is 6’5” and his future likely lies at third, but his bat gave a boost to the struggling offense (.325/.386/.450 in 89 PA).

Senior Jalen Washington manned shortstop and acquitted himself fine defensively and at the plate (.266/.309/.468), and was selected by San Diego in the 28th round. Sophomore third baseman Brady Cherry did not build on the power potential his freshman year seemed to show, hitting four homers in 82 more PA than he had when he hit four in 2016. His overall performance (.260/.333/.410) was about average (-2 RAA).

Outfield was definitely the bright spot for the offense, despite getting little production out of JUCO transfer Tyler Cowles (.190/.309/.314 in 129 PA). Senior Shea Murray emerged from a pitching career marred by injuries to provide adequate production and earn the left field job (.252/.331/.449, 0 RAA) and was drafted in the 18th round by Pittsburgh, albeit as a pitcher. Junior center fielder Tre’ Gantt was the team MVP, hitting .314/.426/.426, leading the team with 18 RAA, and was drafted in the 29th round by Cleveland. True freshman right fielder Dominic Canzone was also a key contributor, challenging for the Big Ten batting average lead (.343/.398/.458 for 8 RAA).

On the mound, OSU never even came close to establishing a starting rotation due to injuries and ineffectiveness. Nine pitchers started a game, and only one of them had greater than 50% of his appearances as a starter. That was senior Jake Post, who went 1-7 over 13 starts with a 6.41 eRA. Sophomore lefty Connor Curlis was most effective, starting eight times for +3 RAA with 8.3/2.7 K/W. He tied for team innings lead with classmate Ryan Feltner, who was -13 RAA with a 6.71 eRA. Junior Yianni Pavloupous, the closer a year ago, was -10 RAA over 40 innings between both roles. Junior Adam Niemeyer missed time with injuries, appearing in just ten games (five starts) for -3 RAA over 34 innings. Freshman Jake Vance was rushed into action and allowed 20 runs and walks in 26 innings (-4 RAA). And JUCO transfer Reece Calvert gave up a shocking 39 runs in 39 innings.

I thought the bullpen would be the strength of the team before the season. In the case of Seth Kinker, I was right. The junior slinger was terrific, pitching 58 innings (21 relief appearances, 3 starts) and leading the team by a huge margin with 13 RAA (8.4/2.0 K/W). But the rest of the bullpen was less effective. Junior Kyle Michalik missed much of the season with injuries and wasn’t that effective when on the mound (6.85 RA and just 4.8 K/9 over 22 innings). Senior Joe Stoll did fine in the LOOGY role, something Beals has brought to OSU, with 3 RAA in 23 innings over 25 appearances. Junior Austin Woodby had a 6.00 RA over 33 innings but deserved better with a 4.79 eRA and 5.5/1.8 K/W. The only other reliever to work more than ten innings was freshman sidearmer Thomas Waning (3 runs, 11 K, 4 W over 12 innings). Again, it’s hard to describe the roles because almost everyone was forced to both start and relieve.

It’s too early to hazard a prognosis for 2018, but given the lack of promising performances from young players, it’s hard to be optimistic. What remains to be seen is whether Smith’s ruthlessness can be transferred from coaches who do not deserve it to those who have earned it in spades. No, baseball is not a revenue sport, and no, baseball is not bringing the athletic department broad media exposure. But when properly curated, the OSU baseball program is a top-tier Big Ten program, with the potential to make runs in the NCAA Tournament, and bring in more revenue than most of the “other” 34 programs that are not football or men’s basketball. Neglected in the hands of a failed coach, it is capable of putting up a .333 W% in conference play. Smith, not Beals, is the man who will most directly impact the future success of the program.

Wednesday, July 12, 2017

Enby Distribution, pt. 3: Enby Distribution Calculator

At this point, I want to re-explain how to use the Enby distribution, step-by-step. While I already did this in part 6 of the original series, I now have the new variance estimator as found by Alan Jordan to plug in, and so to avoid any confusion and to make this easy if anyone ever wants to implement it themselves, I will recount it all in one location. I will also re-introduce a spreadsheet that you can use to estimate the probability of scoring X runs based on the Enby distribution.

Step 1: Estimate the variance of runs scored per game (VAR) as a function of mean runs/game (RG):

VAR = RG^2/9 + (2/c - 1)*RG
where c is the control value from the Tango Distribution. For normal applications, we’ll assume that c = .767.

Step 2: Use the mean and variance to estimate the parameters (r and B) of the negative binomial distribution:

r = RG^2/(VAR - RG)
B = VAR/RG - 1

B will be retained as a parameter for the Enby distribution.

Step 3: Find the probability of zero runs scored as estimated by the negative binomial distribution (we’ll call this value a):

a = (1 + B)^(-r)

Step 4: Use the Tango Distribution to estimate the probability of being shutout. This will become the Enby distribution parameter z:

z = (RI/(RI + c*RI^2))^9
where RI is runs/inning, which we’ll estimate as RG/9.

Step 5: Use trial and error to estimate a new value of r given the modified value at zero. B and z will stay constant, but r must be chosen so as to ensure that the correct mean RG is returned by the Enby distribution. Use the following formula to estimate the probability of k runs scored per game using the non-modified negative binomial distribution:

q(0) = a
q(k) = (r)(r + 1)(r + 2)(r + 3)…(r + k - 1)*B^k/(k!*(1 + B)^(r + k)) for k >=1

Then modify by taking:

p(0) = z
p(k) = (1 - z)*q(k)/(1 - a) for k >=1

The mean is calculated as:

mean = sum (from k = 1 to infinity) of (k*p(k)) = p(1) + 2*p(2) + 3*p(3) + ...

Now you have the parameters r, B, and z and the probability of scoring k runs in a game.
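For anyone who would rather script this than use a spreadsheet, Steps 1-5 can be sketched in Python as below. This is one possible implementation under stated assumptions, not the calculator itself: the function names are hypothetical, bisection stands in for manual trial and error, a is recomputed as (1 + B)^(-r) for each trial r so that the probabilities still sum to one, and the mean is computed in closed form as (1 - z)*r*B/(1 - a), which follows from the negative binomial mean r*B, rather than by summing k*p(k). The search assumes the target RG is attainable for the given B and z.

```python
def enby_parameters(rg, c=0.767, tol=1e-10):
    """Return (r, B, z) for mean runs/game rg, following Steps 1-5."""
    var = rg ** 2 / 9 + (2 / c - 1) * rg   # Step 1: variance estimate
    b = var / rg - 1                       # Step 2: B is retained as is
    ri = rg / 9                            # runs per inning
    z = (ri / (ri + c * ri ** 2)) ** 9     # Step 4: shutout probability

    def enby_mean(r):
        # Step 3 is folded in here: a = (1 + B)^(-r) for the trial r.
        a = (1 + b) ** (-r)
        return (1 - z) * r * b / (1 - a)

    # Step 5: bisection on r until the Enby mean matches rg.
    lo, hi = 1e-9, 1.0
    while enby_mean(hi) < rg:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if enby_mean(mid) < rg:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2, b, z


def enby_pmf(rg, c=0.767, max_runs=30):
    """Return [p(0), p(1), ..., p(max_runs)] for the Enby distribution."""
    r, b, z = enby_parameters(rg, c)
    a = (1 + b) ** (-r)
    q = a                                  # q(0) = a
    probs = [z]                            # p(0) = z
    for k in range(1, max_runs + 1):
        q *= (r + k - 1) / k * b / (1 + b)  # q(k) built from q(k - 1)
        probs.append((1 - z) * q / (1 - a))
    return probs


r, b, z = enby_parameters(4.4)
print(f"r = {r:.3f}, B = {b:.4f}, z = {z:.4f}")
for runs, prob in enumerate(enby_pmf(4.4, max_runs=10)):
    print(runs, round(prob, 4))
```

The recursion for q(k) is just the ratio of successive terms in the product formula above, which avoids computing large factorials directly.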

I previously published a spreadsheet that provided the approximate Enby distribution parameters at each .05 increment of RG between 3 and 7. The link below will take you to an updated version of this calculator. It is updated in two ways: first, the Tango Distribution estimate of variance developed by Alan Jordan is used as in the example above. Secondly, I have added lines for RG levels between 0-3 and 7-15 RG (at intervals of .25). Previously, you could enter in any value between 3-7 RG and the calculator would round it to nearest .05; now I’m going to make you enter a legitimate value yourself or accept whatever vlookup() gives you.

P(x) is the probability of scoring x runs in a game, P(<= x) is the probability of scoring that many or fewer, and P(> x) is the probability of scoring more than x runs.
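The three columns are simple to derive from any list of P(x) values. Here is a short Python sketch; the function name `calculator_columns` and the toy probabilities are made up for illustration and are not taken from the spreadsheet.

```python
from itertools import accumulate


def calculator_columns(pmf):
    """Return rows of (x, P(x), P(<= x), P(> x)) from a list of P(x) values."""
    cumulative = list(accumulate(pmf))  # running total gives P(<= x)
    return [
        (x, p, cum, 1 - cum)
        for x, (p, cum) in enumerate(zip(pmf, cumulative))
    ]


# Toy probabilities (not Enby output) for 0, 1, 2, and 3 runs.
toy_pmf = [0.1, 0.2, 0.3, 0.4]
for x, p, at_most, more_than in calculator_columns(toy_pmf):
    print(x, round(p, 3), round(at_most, 3), round(more_than, 3))
```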

Enby Calculator

Tuesday, June 20, 2017

Enby Distribution, pt. 2: Revamping the Variance Estimate

All models are approximations of reality, but some are more useful than others. The notion of being able to estimate the runs per game distribution cleanly in one algorithm (rather than patching together runs per inning distributions or using simulators) is one that can be quite useful in estimating winning percentage or trying to distinguish between the effectiveness of team offenses beyond simply noting their runs scored totals. I’d argue that a runs per game distribution is a fundamentally useful tool in classical sabermetrics.

However, while such a model would be useful, Enby as currently constructed falls well short of being an ideal tool. There are a few major issues:

1) It is not mathematically feasible to solve directly for the parameters of a zero-modified negative binomial distribution, which forces me to use trial and error to estimate Enby coefficients. In doing so, the distribution is no longer able to exactly match the expected mean and variance--instead, I have chosen to match the mean precisely, and hope that the variance is not too badly distorted.

2) The variance that we should expect for runs per game at any given level of average R/G is itself unknown. I developed a simple formula to estimate variance based on some actual team data, but that formula is far from perfect and there’s no particular reason to expect it to perform well outside of the R/G range represented by the data from which it was developed.

3) An issue with run distribution models found by Tom Tango in the course of his research on runs per inning distribution is that the optimal fit for a single team’s distribution may not return optimal results in situations in which two teams are examined simultaneously (such as using the distribution to model winning percentage). One explanation for this phenomenon is the covariance between runs scored and runs allowed in a given game, due to either environmental or strategic causes.

I have recently attempted to improve the Enby distribution by focusing on these obvious flaws. Unfortunately, my findings were not as useful as I had hoped they would be, but I would argue (hope?) that they represent at least small progress in this endeavor.

During the course of writing the original series on this topic, I was made aware of work being done by Alan Jordan, who was developing a spreadsheet that used the Tango Distribution to estimate scoring distributions and winning percentage. One of the underpinnings was that he found (or found work by Darren Glass and Phillip Lowry that demonstrated) that the variance of runs scored per inning as predicted by the Tango Distribution could be calculated as follows (where RI = runs per inning and c is the Tango Distribution constant):

Variance (inning) = RI*(2/c + RI - 1) = RI^2 + (2/c - 1)*RI

Assuming independence of runs per inning (this is a necessary assumption to use the Tango Distribution to estimate runs per game), the variance of runs per game will simply be nine times the variance of runs per inning (assuming of course that there are precisely nine innings per game, as I did in estimating the z parameter of Enby from the Tango Distribution). If we attempt to simplify this further by assuming that RI = RG/9, where RG = runs per game:

Variance (game) = 9*(RI^2 + (2/c - 1)*RI) = 9*((RG/9)^2 + (2/c - 1)*RG/9) = RG^2/9 + (2/c - 1)*RG
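The algebra can be checked numerically. The Python sketch below (function names are hypothetical) computes the per-inning variance, multiplies by nine, and compares the result with the simplified per-game form at a few RG levels.

```python
def inning_variance(ri, c=0.767):
    """Variance of runs per inning implied by the Tango Distribution."""
    return ri ** 2 + (2 / c - 1) * ri


def game_variance(rg, c=0.767):
    """Simplified per-game variance, assuming RI = RG/9 and nine innings."""
    return rg ** 2 / 9 + (2 / c - 1) * rg


for rg in (2.25, 4.5, 9.0):
    nine_innings = 9 * inning_variance(rg / 9)  # nine independent innings
    print(rg, round(nine_innings, 4), round(game_variance(rg), 4))
```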

The traditional value of c used to estimate runs per inning for one team is .767, so if we substitute that for c, we wind up with:

Variance (game) = 1.608*RG + .111*RG^2

When I worked on this problem previously, I did not have any theoretical basis for an estimator of variance as a function of RG, so I experimented with a few possibilities and found what appeared to be a workable correlation between mean RG and the ratio of variance to mean. I used linear regression on a set of actual team data (1981-1996) and wound up with an equation that could be written as:

Variance (game) = 1.43*RG + .1345*RG^2

Note the similarities between this equation and the equation based on the Tango Distribution - they both take the form of a quadratic equation less the constant (I purposefully avoided constants in developing my variance estimator so as to avoid unreasonable results at zero and near-zero RG). The coefficients are somewhat different, but the form of the equation is identical.

On one hand, this is wonderful for me, because it vindicates my intuition that this was a reasonable way to estimate variance. On the other hand, this is very disappointing, because I had hoped that Jordan’s insight would allow me to significantly improve the variance estimate. Instead, any gains to be had here are limited to improving the equation by using a more theoretical basis to estimate its coefficients, but there is no change in the form of this equation.

In fact, any revision to the estimator will reduce accuracy over the 1981-96 sample that I am using, since the linear regression already found optimal coefficients for this particular dataset. This by no means should be taken as a claim on my part that the regression-based equation should be used rather than the more theoretically-grounded Tango Distribution estimate, simply an observation that any improvement will not show up given the confines of the data I have at hand.

What about data from out of that set? I have easy access to the four seasons from 2009-2012. In these seasons, major league teams have averaged 4.401 runs per game and the variance of runs scored per game is 9.373. My equation estimates the variance should be 8.90, while the Tango-based formula estimates 9.23. In this case, we could get a near-precise match by using c = .757.
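The comparison is easy to reproduce. The Python sketch below uses the 4.401 RG and 9.373 variance figures quoted above; the function names are hypothetical, and `implied_c` simply inverts the Tango-based variance formula to find the control value that would match a given variance.

```python
def tango_variance(rg, c=0.767):
    """Tango-based estimate: RG^2/9 + (2/c - 1)*RG."""
    return rg ** 2 / 9 + (2 / c - 1) * rg


def regression_variance(rg):
    """Original regression-based estimate from 1981-1996 team data."""
    return 1.43 * rg + 0.1345 * rg ** 2


def implied_c(rg, var):
    """Solve VAR = RG^2/9 + (2/c - 1)*RG for c."""
    return 2 / ((var - rg ** 2 / 9) / rg + 1)


rg, observed_var = 4.401, 9.373
print(round(regression_variance(rg), 2))
print(round(tango_variance(rg), 2))
print(round(implied_c(rg, observed_var), 3))
```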

While we know how accurate each estimator is with respect to variance for this case, what happens when we put Enby to use to estimate the run distribution? The Enby parameters for 4.40 RG using my original equation are (B = 1.0218, r = 4.353, z = .0569). If we instead use the Tango estimated variance of 9.23, the parameters become (B = 1.0970, r = 4.041, z = .0569). With that, we can calculate the estimated frequencies of X runs scored using each estimator and compare to the empirical frequencies from 2009-2012:

Eyeballing this, the Tango-based formula is closer for one run, but exacerbates the recurring issue of over-estimating the likelihood of two or three runs. It makes up for this by providing a better estimate at four and five runs, but a worse estimate at six. After that the two are similar, although the Tango estimate provides for more probability in the tail of the distribution, which in this case is consistent with empirical results.

For now, I will move on to another topic, but I will eventually be coming back to this form of the Tango-based variance estimate, re-estimating the parameters for 3-7 RG, and providing an updated Enby calculator, as I do feel that there are distinct advantages to using the theoretical coefficients of the variance estimator rather than my empirical coefficients.

Tuesday, May 09, 2017

Enby Distribution, pt. 1: Pioneers

A few years ago, I attempted to demonstrate that one could do a decent job of estimating the distribution of runs scored per game by using the negative binomial distribution, particularly a zero-modified version given the propensity of an unadulterated negative binomial distribution to underestimate the probability of a shutout. I dubbed this modified distribution Enby.

I’m going to be re-introducing this distribution and adopting a modification to the key formula in this series, but I wanted to start by acknowledging that I am not the first sabermetrician to apply the negative binomial distribution to the matter of the runs per game distribution. To my knowledge, a zero-modified negative binomial distribution had not been implemented prior to Enby, and while the zero-modification is a significant improvement to the model, it would be disingenuous not to acknowledge and provide an overview of the two previous efforts using the negative binomial distribution of which I am aware.

I acknowledged one of these in the original iteration of this series, but inadvertently overlooked the first. In the early issues of Bill James’ Baseball Analyst newsletter, Dallas Adams published a series of articles on run distributions, ultimately developing an unwieldy formula I discussed in the linked post. What I overlooked was an article in the August 1983 edition in which the author noted that while the Poisson distribution worked for hockey, it would not work for baseball because the variance of runs per game is not equal to the mean, but rather is twice the mean. But a "modified Poisson" distribution provided a solution.

The author of the piece? Pete Palmer. Palmer is often overlooked to an undue extent when sabermetric history is recounted. While one could never omit Palmer from such a discussion, his importance is often downplayed. But the sheer volume of methods that he developed or refined is such that I have no qualms about naming him the most important technical sabermetrician by a wide margin. Park factors, run to win converters, linear weights, relative statistics, OPS for better or worse, the construct of an overall metric by adding together runs above average in various discrete components of the game...these were all either pioneered or greatly improved by Palmer. And while it is not nearly as widespread in use as his other innovations, you can add using the negative binomial distribution for the runs per game distribution to the list.

Palmer says that he learned about this “modified Poisson” in a book called Facts From Figures by Moroney. The relevant formulas were:

Mean (u) = p/c
Variance (v) = u + u/c
p(0) = (c/(1 + c))^p
p(1) = p(0)*p/(1 + c)
p(2) = p(1)*(p + 1)/(2*(1 + c))
p(3) = p(2)*(p + 2)/(3*(1 + c))
p(n) = p(0)*p*(p + 1)*(p + 2)*...*(p + n - 1)/(n!*(1 + c)^n)

The text that I used renders the negative binomial distribution as:

p(k) = (1 + B)^(-r) for k = 0
p(k) = (r)(r + 1)(r + 2)(r + 3)…(r + k - 1)*B^k/(k!*(1 + B)^(r + k)) for k >=1
mean (u) = r*B
variance(v) = r*B*(1 + B)

You may be forgiven for not immediately recognizing these two as equivalent; I did not at first glance. But if you recognize that r = p and B = 1/c, then you will find that the mean and variance equations are equivalent and that the formulas for each n or k depending on the nomenclature used are equivalent as well.
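If the algebra is not convincing, the equivalence can be checked numerically. The Python sketch below builds both versions and compares them term by term; the function names and the example values p = 4 and c = 1 are arbitrary choices for illustration (c = 1 happens to give a variance of exactly twice the mean).

```python
from math import factorial, isclose


def palmer_pmf(p, c, n_max):
    """Recursive 'modified Poisson' form, as quoted from Palmer's article."""
    probs = [(c / (1 + c)) ** p]
    for n in range(1, n_max + 1):
        probs.append(probs[-1] * (p + n - 1) / (n * (1 + c)))
    return probs


def negative_binomial_pmf(r, b, k_max):
    """Negative binomial in the (r, B) form used in this series."""
    probs = []
    for k in range(k_max + 1):
        rising = 1.0
        for i in range(k):
            rising *= r + i  # r*(r + 1)*...*(r + k - 1)
        probs.append(rising * b ** k / (factorial(k) * (1 + b) ** (r + k)))
    return probs


p, c = 4.0, 1.0          # arbitrary example values
r, b = p, 1 / c          # the substitution r = p, B = 1/c
palmer = palmer_pmf(p, c, 15)
textbook = negative_binomial_pmf(r, b, 15)
print(all(isclose(x, y) for x, y in zip(palmer, textbook)))
print(p / c, r * b)                      # means
print(p / c + p / c / c, r * b * (1 + b))  # variances
```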

So Palmer was positing the negative binomial distribution to model runs scored. He noted that the variance of runs per game is about two times the mean, which is true. In my original Enby implementation, I estimated variance as 1.430*mean + .1345*mean^2, which for the typical mean value of around 4.5 R/G works out to an estimated variance of 9.159, which is 2.04 times the mean. Of course, the model can be made more accurate by allowing the ratio of variance/mean to vary from two.

The second use of the negative binomial distribution to model runs per game of which I am aware was implemented by Phil Melita. Mr. Melita used it to estimate winning percentage and sent me a copy of his paper (over a decade ago, which is profoundly disturbing in the existential sense). Unfortunately, I am not aware of the paper ever being published so I hesitate to share too much from the copy in my possession.

Melita’s focus was on estimating W%, but he did use negative binomial to look at the run distribution in isolation as well. Unfortunately, I had forgotten his article when I started messing around with various distributions that could be used to model runs per game; when I tried negative binomial and got promising results, I realized that I had seen it before.

So as I begin this update of what I call Enby, I want to be very clear that I am not claiming to have “discovered” the application of the negative binomial distribution in this context. To my knowledge using zero-modification is a new (to sabermetrics) application of the negative binomial, but obviously is a relatively minor twist on the more important task of finding a suitable distribution to use. So if you find that my work in this series has any value at all, remember that Pete Palmer and Phil Melita deserve much of the credit for first applying the negative binomial distribution to runs scored per game.

Saturday, April 01, 2017

2017 Predictions

All the usual disclaimers. This is not serious business.


1. Boston
2. Toronto (wildcard)
3. New York
4. Baltimore
5. Tampa Bay

I have noted the last couple years that I always pick the Red Sox--last year was one of the years where that was the right call. Boston has question marks, and they have less talent on hand to fill holes than in past years, but no one else in the division is making a concerted push with the Blue Jays retrenching and the Yankees in transition. While much has been made of the NL featuring more of a clear dichotomy between contenders and rebuilders, the AL features three strong division favorites and a void for wildcard contention that Toronto may well once again fill. New York looks like a .500 team to me, and one with as strong a recent history of overperforming projections/Pythagorean as darlings like Baltimore and Kansas City, but one that gets far less press for it. (I guess the mighty Yankees aren’t a good sell as a team being unfairly dismissed by the statheads). The Orioles offense has to take a step back at some point with only Machado and Schoop being young, and if that happens the rotation can’t carry them. It’s not that I think the Rays are bad; this whole division is filled with potential wildcard contenders.


1. Cleveland
2. Detroit
3. Kansas City
4. Minnesota
5. Chicago

I have a general policy of trying to pick against the Indians when reasonable, out of irrational superstition and an attempt to counteract any unconscious fan-infused optimism. Last year I felt they were definitely the best team in this division on paper but picked against them regardless. But the gap is just too big to ignore this season, so I warily pick them in front. There are reasons to be pessimistic--while they didn’t get “every break in the world last season” as Chris Russo says in a commercial that hopefully will be off the air soon, it’s easy to overstate the impact of their pitching injuries since the division was basically wrapped up before the wheels came off the rotation. Consider the volatility of bullpens, the extra workload for the pitchers who were available in October, the fact that the two that weren’t aren’t the best health bets in the world, and you can paint a bleaker picture than the triumphalism that appears to be the consensus. On the other hand, Michael Brantley, the catchers, the fact that the offense didn’t score more runs than RC called for last year. I see them as the fourth-strongest team out of the six consensus division favorites. Detroit is the team best-positioned to challenge them; I used the phrase “dead cat bounce” last year and it remains appropriate. The less said about Kansas City the better, but as much fun as it was to watch the magic dissipate last season, the death throes of this infuriating team could be even better. The Twins have famously gone from worst to first in their franchise history; given the weakness of the division and some young players who may be much better than they’ve shown so far, it’s not that far-fetched, but it’s also more likely that they lose 95 again. The White Sox rebuilding might succeed in helping them compete down the road and finally ridding the world of the disease that is Hawk Harrelson.


1. Houston
2. Seattle (wildcard)
3. Los Angeles
4. Texas
5. Oakland

Houston looks really good to me; if their rotation holds together (or if they patch any holes with the long awaited Jose Quintana acquisition), I see them as an elite team. Maybe the third time is the charm picking Seattle to win the wildcard. Truth be told, I find it hard to distinguish between most AL teams including the middle three in this division. Picking the Angels ahead of the Rangers is more a way to go on record disbelieving that the latter can do it again than an endorsement of the former, but even with a shaky rotation the Angels should be respectable. My Texas pick will probably look terrible when Nomar Mazara breaks out, Yu Darvish returns healthy, and Josh Hamilton rises from the dead or something. Oakland’s outlook for this year looks bleak, but am I crazy to have read their chapter in Baseball Prospectus and thought there were a number of really interesting prospects who could have a sneaky contender season in 2018? Probably.


1. Washington
2. New York (wildcard)
3. Miami
4. Atlanta
5. Philadelphia

It’s very tempting to pick New York over Washington, based on superficial things like the Nationals’ sad-Giants even-year pattern and their cashing in most of their trade chits for Adam Eaton, but there remains a significant on-paper gap between the two, especially since the Mets stood pat from a major league roster perspective. This might be the best division race out there in a season in which there are six fairly obvious favorites. Sadly, Miami is about one 5 WAR player away from being right in the mix…I wonder where one might have found such a player? Atlanta seems like a better bet than Philadelphia in both the present and future tense, but having a great deal of confidence in the ordering of the two seems foolhardy.


1. Chicago
2. Pittsburgh
3. St. Louis
4. Milwaukee
5. Cincinnati

The Cubs’ starting pitching depth is a little shaky? Kyle Schwarber doesn’t have a position and people might be a little too enthusiastic about him? Hector Rondon struggled late in the season and Wade Davis’ health is not a sure thing? These are the straws that one must grasp at to figure out how Chicago might be defeated. You also have to figure out whether Pittsburgh can get enough production from its non-outfielders while also having some good fortune with their pitching. Or whether St. Louis’ offense is good enough. Or whether Milwaukee or Cincinnati might have a time machine that could jump their rebuild forward a few years. You know, the normal questions you ask about a division.


1. Los Angeles
2. San Francisco
3. Arizona
4. Colorado
5. San Diego

Last year I picked the Giants over the Dodgers despite the numbers suggesting otherwise because of injury concerns. I won’t make that mistake again, as it looks as if LA could once again juggle their rotation and use their resources to patch over any holes. The Giants are strong themselves, but while the two appear close in run prevention, the Dodgers have the edge offensively. The Diamondbacks should have a bounce back season, but one that would still probably break Tony LaRussa’s heart if he still cared. The Rockies seem like they should project better than they do, with more promise on the mound than they usually have. The Padres are the consensus worst team in baseball from all of the projection systems, which can be summed up with two words: Jered Weaver.


Los Angeles over Houston

Just about every projection system out there has the Dodgers ever so slightly ahead of the Cubs. That of course does not mean they are all right--perhaps there is some blind spot about these teams that player projection systems and/or collation of said projections into team win estimates share in common. On the other hand, none of these systems dislike the Cubs--everyone projects them to win a lot of games. I was leaning towards picking LA even before I saw that it was bordering on a consensus, because the two teams look fairly even to me but the Dodgers have more depth on hand, particularly in the starting pitching department (the natural rebuttal is that the Dodgers are likely to need that depth, while the Cubs have four pretty reliable starters). The Dodgers bullpen looks better, and their offense is nothing to sneeze at.

AL Rookie of the Year: LF Andrew Benintendi, BOS
AL Cy Young: Chris Sale, BOS
AL MVP: CF George Springer, HOU
NL Rookie of the Year: SS Dansby Swanson, ATL
NL Cy Young: Stephen Strasburg, WAS
NL MVP: 1B Anthony Rizzo, CHN

Tuesday, March 14, 2017

Win Value of Pitcher Adjusted Run Averages

The most common class of metrics used in sabermetrics for cross-era comparisons uses relative measures of actual or estimated runs per out or some other similar denominator. These include ERA+ for pitchers and OPS+ or wRC+ for batters (OPS+ being an estimate of relative runs per out, wRC+ using plate appearances in the denominator but accounting for the impact of avoiding outs). While these metrics provide an estimate of runs relative to the league average, they implicitly assume that the resulting relative scoring level is equally valuable across all run environments.

This is in fact not the case, as it is well-established that the relationship between run ratio and winning percentage depends on the overall level of run scoring. A team with a run ratio of 1.25 will have a different expected winning percentage if they play in a 9 RPG environment than if they play in a 10 RPG environment. Metrics like ERA+ and OPS+ do not translate relative runs into relative wins, but presumably the users of such metrics are ultimately interested in what they tell us about player contribution to wins.
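To put rough numbers on that 1.25 run ratio example, here is a minimal Python sketch. It assumes the Pythagenpat form with exponent RPG^.29 (the same .29 used later in this post); the function name pythagenpat_wpct is just my label for illustration.

```python
def pythagenpat_wpct(run_ratio, rpg, z=0.29):
    """Estimated W% for a team with run ratio R/RA playing at a given RPG.

    Assumes the Pythagenpat exponent x = RPG^z with z = .29.
    """
    x = rpg ** z                    # exponent grows with the scoring level
    win_ratio = run_ratio ** x      # expected wins per loss
    return win_ratio / (1 + win_ratio)


# Same run ratio, two different run environments
for rpg in (9, 10):
    print(rpg, round(pythagenpat_wpct(1.25, rpg), 3))
```

Under those assumptions the same 1.25 run ratio comes out to about .604 at 9 RPG and about .607 at 10 RPG, a real but small difference.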

There are two key points that should be acknowledged upfront. One is that the difference in win value based on scoring level is usually quite small. If it wasn’t, winning percentage estimators that don’t take scoring level into account would not be able to accurately estimate W% across the spectrum of major league teams. While methods that do consider scoring level are more accurate estimators of W% than similar methods that don’t, a method like fixed exponent Pythagorean can still produce useful estimates despite maintaining a fixed relationship between runs and wins.

The second is that players are not teams. The natural temptation (and one I will knowingly succumb to in what follows) is to simply plug the player’s run ratio into the formula and convert to a W%. This approach ignores the fact that an individual player’s run rate does not lead directly to wins, as the performance of his teammates must be included as well. Pitchers are close, because while they are in the game they are the team (more accurately, their runs allowed figures reflect the totality of the defense, which includes contributions from the fielders), but even ignoring fielding, non-complete games include innings pitched by teammates as well.

For the moment I will set that aside and instead pretend (in the tradition of Bill James’ Offensive Winning %) that a player or pitcher’s run ratio can or should be converted directly to wins, without weighting the rest of the team. This makes the figures that follow something of a freak show stat, but the approach could be applied directly to team run ratios as well. Individuals are generally more interesting and obviously more extreme, which means that the impact of considering run environment will be overstated.

I will focus on pitchers and use Bob Gibson’s 1968 season as an example. Gibson allowed 49 runs in 304.2 innings, which works out to a run average of 1.45 (there will be some rounding discrepancies in the figures). In 1968 the NL average RA was 3.42, so Gibson’s adjusted RA (aRA for the sake of this post) is RA/LgRA = .423 (ideally you would park-adjust as well, but I am ignoring park factors for this post). As an aside, please resist the temptation to cite his RA+ of 236 instead. Please.

.423 is a run ratio; Gibson allowed runs at 42.3% of the league average. Since wins are the ultimate unit of measurement, it is tempting to convert this run ratio to a win ratio. We could simply square it, which reflects a Pythagorean relationship. Ideally, though, we should consider the run environment. The 1968 NL was an extremely low scoring league. Pythagenpat suggests that the ideal exponent is around 1.746. Let’s define the Pythagenpat exponent to use as:

x = (2*LgRA)^.29

Note that this simply uses the league scoring level to convert to wins; it does not take into account Gibson’s own performance. That would be an additional enhancement, but it would also strongly increase the distortion that comes from viewing a player as his own team, albeit less so for pitchers and especially those who basically were pitching nine innings/start as in the case of Gibson.

So we could calculate a loss ratio as aRA^x, or .223 for Gibson. This means that a team with Gibson’s aRA in this environment would be expected to have .223 losses for every win (basic ratio transformations apply; the reciprocal would be the win ratio, the loss ratio divided by (1 + itself) would be a losing %, the complement of that W%, etc.)
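The chain of ratio transformations for Gibson can be sketched in a few lines of Python. The variable names are mine, and the only inputs are the figures quoted above (49 runs, 304.2 innings read as 304 and two-thirds, and a 3.42 league RA).

```python
runs = 49
innings = 304 + 2 / 3        # "304.2" in baseball notation is 304 and 2/3 innings
lg_ra = 3.42                 # 1968 NL average RA

ra = runs * 9 / innings      # run average, about 1.45
ara = ra / lg_ra             # adjusted RA, about .423
x = (2 * lg_ra) ** 0.29      # Pythagenpat exponent from the league scoring level

loss_ratio = ara ** x                        # losses per win
win_ratio = 1 / loss_ratio                   # wins per loss
losing_pct = loss_ratio / (1 + loss_ratio)   # losses / (wins + losses)
w_pct = 1 - losing_pct                       # complement of the losing %

print(round(ara, 3), round(x, 3), round(loss_ratio, 3), round(w_pct, 3))
```

The loss ratio is the only piece the rest of the post carries forward; the W% is shown just to complete the set of transformations.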

At this point, many people would like to convert it to a W% and stop there, but I’d like to preserve the scale of a run average while reflecting the win impact. In order to do so, I need to select a Pythagorean exponent corresponding to a reference run environment to convert Gibson’s loss ratio back to an equivalent aRA for that run environment. For 1901-2015, the major league average RA was 4.427, which I’ll use as the reference environment, which corresponds to a 1.882 Pythagenpat exponent (there are actually 8.94 IP/G over this span, so the actual RPG is 8.937 which would be a 1.887 exponent--I'll stick with RA rather than RPG for this example since we are already using it to calculate aRA).

If we call that 1.882 exponent r, then the loss ratio can be converted back to an equivalent aRA by raising it to the (1/r) power. Of course, the loss ratio is just an interim step, and this is equivalent to:

aRA^(x*(1/r)) = aRA^(x/r) = waRA

waRA (excuse the acronyms, which I don’t intend to survive beyond this post) is win-Adjusted Run Average. For Gibson, it works out to .450, which illustrates how small the impact is. Pitching in one of the most extreme run environments in history, Gibson’s aRA is only 6.4% higher after adjusting for win impact.

In 1994, Greg Maddux allowed 44 runs in 202 innings for a run average of 1.96. Pitching in a league with a RA of 4.65, his aRA was .421, basically equal to Gibson. But his waRA was better, at .416, since the same run ratio leads to more wins in a higher scoring environment.
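A minimal Python sketch of the waRA calculation, applied to both seasons. The function names and the REF_RA constant are my own labels; the 4.427 reference RA and the .29 exponent are the values used in this post, and no park adjustment is made.

```python
REF_RA = 4.427   # 1901-2015 major league average RA, the reference environment


def pythagenpat_exponent(rpg, z=0.29):
    """Pythagenpat exponent for a given runs per game level."""
    return rpg ** z


def wara(ra, lg_ra, ref_ra=REF_RA):
    """win-Adjusted Run Average: aRA^(x/r)."""
    ara = ra / lg_ra
    x = pythagenpat_exponent(2 * lg_ra)    # exponent for the pitcher's league
    r = pythagenpat_exponent(2 * ref_ra)   # exponent for the reference environment
    return ara ** (x / r)


gibson_wara = wara(49 * 9 / (304 + 2 / 3), 3.42)   # 1968 NL
maddux_wara = wara(44 * 9 / 202, 4.65)             # 1994 NL

print(round(gibson_wara, 3), round(maddux_wara, 3))
```

Because x/r is below one for the 1968 NL and above one for the 1994 NL, the two nearly identical aRAs move in opposite directions.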

It is my guess that consumers of sabermetrics will generally find this result unsatisfactory. There seems to be a commonly-held belief that it is easier to achieve a high ERA+ in a higher run scoring environment, but the result of this approach is the opposite--as RPG increases, the win impact of the same aRA increases as well. Of course, this approach says nothing about how “easy” it is to achieve a given aRA--it converts aRA to a win-value equivalent aRA in a reference run environment. It is possible that it could be simultaneously “easier” to achieve a low aRA in a higher scoring environment and that the value of a low aRA be enhanced in a higher scoring environment. I am making no claim regarding the impressiveness or aesthetic value, etc. of any pitcher’s performance, only attempting to frame it in terms of win value.

Of course, the comparison between Gibson and Maddux need not stop there. I do believe that waRA shows us that Maddux’ rate of allowing runs was more valuable in context than Gibson’s, but there is more to value than the rate of allowing runs. Of course we could calculate a baselined metric like WAR to value the two seasons, but even if we limit ourselves to looking at rates, there is an additional consideration that can be added.

So far, I’ve simply used the league average to represent the run environment, but a pitcher has a large impact on the run environment through his own performance. If we want to take this into account, it would be inappropriate to simply use LgRA + pitcher’s RA as the new RPG to plug into Pythagenpat; we definitely need to consider the extent to which the pitcher’s teammates influence the run environment, since ultimately Gibson’s performance was converted into wins in the context of games played by the Cardinals, not a hypothetical all-Gibson team. So I will calculate a new RPG instead by assuming that the 18 innings in a game (to be more precise for a given context, two times the league average IP/G) is filled in by the pitcher’s RA for his IP/G, and the league’s RA for the remainder.

In the 1968 NL, the average IP/G was 9.03 and Gibson’s 304.2 IP were over 34 appearances (8.96 IP/G), so the new RPG is 8.96*1.45/9 + (2*9.03 - 8.96)*3.42/9 = 4.90 (rather than 6.84 previously). This converts to a Pythagenpat exponent of 1.59, and a pwaRA (personal win-Adjusted Run Average?) of .485. To spell that all out in a formula:

px = ((IP/G)*RA/9 + (2*Lg(IP/G) - IP/G)*LgRA/9) ^ .29
pwaRA = aRA^(px/r)
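The two formulas translate directly into code. This is a sketch only: the function names are hypothetical, the reference RA defaults to the 4.427 figure used earlier, and Gibson's inputs are the ones quoted above.

```python
def personal_rpg(ra, ip_per_g, lg_ra, lg_ip_per_g):
    """RPG when the pitcher's own innings are filled at his RA
    and the rest of the game's innings at the league RA."""
    own = ip_per_g * ra / 9
    rest = (2 * lg_ip_per_g - ip_per_g) * lg_ra / 9
    return own + rest


def pwara(ra, ip_per_g, lg_ra, lg_ip_per_g, ref_ra=4.427, z=0.29):
    """personal win-Adjusted Run Average: aRA^(px/r)."""
    px = personal_rpg(ra, ip_per_g, lg_ra, lg_ip_per_g) ** z
    r = (2 * ref_ra) ** z
    return (ra / lg_ra) ** (px / r)


# Bob Gibson, 1968: 49 runs in 304 2/3 innings over 34 appearances
gibson_ip = 304 + 2 / 3
gibson_ra = 49 * 9 / gibson_ip
gibson_rpg = personal_rpg(gibson_ra, gibson_ip / 34, 3.42, 9.03)
gibson_pwara = pwara(gibson_ra, gibson_ip / 34, 3.42, 9.03)

print(round(gibson_rpg, 2), round(gibson_pwara, 3))
```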

Note that adjusting for the pitcher’s impact on the scoring context reduces the win impact of effective pitchers, because as discussed earlier, lowering the RPG lowers the Pythagenpat exponent and makes the same run ratio convert to fewer wins. In fact, considering the pitcher’s effect on the run environment in which he operates actually brings most starting pitchers’ pwaRA closer to league average than their aRA is.

pwaRA is divorced from any real sort of baseball meaning, though, because pitchers aren’t by themselves a team. Suppose we calculated pwaRA for two teammates in a 4.5 RA league. The starter pitches 6 innings and allows 2 runs; the reliever pitches 3 innings and allows 1. Both pitchers have a RA of 3.00, and thus identical aRA (.667) or waRA (.665). Furthermore, their team also has a RA of 3.00 for this game, and whether figured as a whole or as the weighted average of the two individuals, the team also has the same aRA and waRA.

However, if we calculate the starter’s pwaRA, we get .675, while the reliever is at .670. Meanwhile, the team has a pwaRA of .679, which makes this all seem quite counterintuitive. But since all three entities have the same RA, the lower the run environment, the less win value it has on a per inning basis.
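The teammates example can be reproduced with the same pwaRA formula. In this sketch I am assuming a league IP/G of exactly 9 (the example does not state one) and the 4.427 reference RA; the function name is mine.

```python
def pwara(ra, ip_per_g, lg_ra, lg_ip_per_g=9.0, ref_ra=4.427, z=0.29):
    """personal win-Adjusted Run Average, with league IP/G assumed to be 9."""
    rpg = ip_per_g * ra / 9 + (2 * lg_ip_per_g - ip_per_g) * lg_ra / 9
    px = rpg ** z                # exponent for the pitcher's own run environment
    r = (2 * ref_ra) ** z        # exponent for the reference environment
    return (ra / lg_ra) ** (px / r)


LG_RA = 4.5
starter = pwara(3.0, 6, LG_RA)    # 6 IP, 2 R
reliever = pwara(3.0, 3, LG_RA)   # 3 IP, 1 R
team = pwara(3.0, 9, LG_RA)       # 9 IP, 3 R

for name, value in (("starter", starter), ("reliever", reliever), ("team", team)):
    print(name, round(value, 3))
```

The more innings an entity covers at a 3.00 RA, the lower the implied RPG (8.5 for the reliever, 8 for the starter, 7.5 for the team), so the exponent falls and the pwaRA drifts further from the shared aRA.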

I hope this post serves as a demonstration of the difficulty of divorcing a pitcher’s value from the number of innings he pitched. Of course, the effects discussed here are very small, much smaller than the impact of other related differences, like the inherent statistical advantage of pitchers over shorter stints, attempts to model differences in replacement level between starters and relievers, and attempts to detect/value any beneficial side effects of starters working deep into games.

One of my long-standing interests has been the proper rate stat to use to express a batter’s run contribution (I have been promising myself for almost as long as this blog has been in existence that I will write a series of posts explaining the various options for such a metric and the rationale for each, yet have failed to do so). I’ve never had the same pull to the question for pitchers, in part because the building block seems obvious: runs/out (which depending on how one defines terms can manifest itself as RA, ERA, component ERA, FIP-type metrics, etc.)

But while there are a few adjustments that can theoretically be made between a hitter’s overall performance expressed as a rate and a final value metric (like WAR), the adjustments (such as the hitter’s impact on his team’s run scoring beyond what the metric captures itself, and the secondary effect that follows on the run/win conversion) are quite minor in scale compared to similar adjustments for pitchers. While the pitcher (along with his fielders) can be thought of as embodying the entire team while he is in the game, that also means that said unit’s impact on the run/win conversion is significant. And while there are certainly cases of batters whose rates may be deceiving because of how they are deployed by their managers (particularly platooning), the additional playing time over which a rate is spread increases value in a WAR-like metric without any special adjustment. Pitchers’ roles and secondary effects thereof (like any potential value generated by “eating” innings) have a more significant (and more difficult to model) impact on value than the comparable effects for position players.