Walk Like a Sabermetrician: May 2018

Now that you presumably have some confidence in Cigol’s ability to do something fairly easy by the standards of classical sabermetrics, you may have some more interest in what Cigol says about a much harder question--how does W% vary by runs scored and runs allowed in extreme situations? This is the area in which Cigol (whether powered by Enby or any other run distribution model) has the potential to enhance our understanding of the relationship between runs and wins. Unfortunately, it is difficult to tell whether these results are reasonable, since we don’t have empirical data regarding extreme teams. If Cigol deviates from Pythagenpat, we won’t know which one to trust. Throughout this post, I am going to discuss these issues as if Cigol is in fact the “true” or “correct” estimate. This is simply for the sake of discussion--it would be unwieldy to have to issue a disclaimer every time we compare Cigol and Pythagenpat. Please note that I am not asserting that this is demonstrably the case.

For a first look at how the two compare at the extreme, let’s assume that a team’s runs scored are fixed at an average 4.5, and look at their estimated W% at each interval of .5 in runs allowed from 1-15 RA/G using Cigol and Pythagenpat with three different exponents (.27, .28, and .29; I’ve always called this Pythagenpat constant z and will stick with that notation here, hoping that it will not be confused with the Enby z parameter):

Just eyeballing the data, two things are evident. The first is that Pythagenpat with any of the exponent choices is a fairly decent match at any RA value. The largest differences come at the extremes, as you’d expect, but the maximum difference is .013 between the Cigol and z = .27 estimate for the 4.5 R/15 RA team. This is a difference of a little over 2 wins over the course of a 162 game schedule, which isn’t terrible since it represents close to the maximum discrepancy. While I have not figured Enby parameters past 15 RG, at some point the differences would begin to decline as both Cigol and Pythagenpat estimates converge at a 1.000 W%. For comparison, a Pythagorean fixed exponent of 1.83 predicts a W% of .099 for the 4.5/15 team, almost 8 wins/162 off of the Cigol estimate.

The second thing that becomes apparent is that Cigol implies that as scoring increases, the Pythagenpat z constant is not fixed. For the lowest RPGs on the table (1-3 RA/G, which when combined with the 4.5 R/G is 5.5-7.5 RPG), .27 performs the best relative to Cigol. Once we cross 3.5 RA/G, .28 performs best, and maintains that advantage from 3.5-8 RA/G (8-12.5 RPG). Past that point (>8.5 RA/G, >13 RPG), .29 is the top-performer. This explains why studies have tended to peg z somewhere in the .28-.29 range, as such a value represents the best fit at normal major league scoring levels.

A nice way to see the relationship is to plot the difference (Pythagenpat - Cigol) relative to RA/G for each exponent:

The point at which all converge is 4.5 RA/G, where R = RA and all estimators predict .500. As you can see, the differences converge as we approach either a .000 or 1.000 W%, since there is a hard cap on the linear difference at those points.

This exercise gives us some direction on where to go, but it is not comprehensive enough to draw any conclusions. In order to do that, we need a more comprehensive set of data than simply fixing R/G at 4.5. To do so, I figured the Cigol W% for each interval of .25 runs scored and runs allowed between 1-15 RPG (removing all points at which R = RA). This yields 3,192 R/RA pairs, many of which are so extreme as to be absurd, which is the point.

In order to make sense of this data, we will need to simplify the scope of what we are considering, so let’s start by trying to ascertain the relationship between runs and wins if we assume that a linear model should be used. Basically, the idea here is that we should be able to determine a runs per win (RPW) factor such that:

W% = (R - RA)/RPW + .5

From this, we can calculate RPW given W%, R, and RA as:

RPW = (R - RA)/(W% - .5)

In its most simple form, this type of equation assumes a fixed number of runs per win; for standard scoring contexts, 10 is a nice, round number that does the job and of course has become famous as a sabermetric rule of thumb. But it has long been known that RPW varies with the scoring context, and usually sabermetricians have attempted to express this by making RPW a function of RPG. So let’s graph our data in that manner:

As you can see, RPW is not even close to being a linear function of RPG when extreme teams are considered. The bulk of the observations scattered around a nice, linear-looking function, but the outliers are such that the linear function will fail horrifically at the extremes. And when I say extremes, I really mean extremes. For instance, a 15 R/1 RA team is at 16 RPG, but would need much more than 16 marginal runs for a marginal win--Cigol estimates that such a team would need 28.11 marginal runs (as would it’s 1/15 counterpart). This should make sense to you logically--the team’s W% is already so high, and so many of the games blowouts, that you need to scatter a large number of runs around to move the win needle. This point represents the maximum RPW for the points I’ve included--the minimum is 3.69 at 1.25/1.

This is not to say that a linear model cannot be used to estimate W%; it is simply the case that one linear model cannot be used to estimate W% over a wide range of possible scoring contexts and/or disparities in team strength. Let’s suppose that we limit the scope of our data in each of these manners. First, let’s consider only cases in which a team’s runs are between 3-7 and its runs allowed are between 3-7. This essentially captures the range of teams in modern major league baseball and limits the sample to 272 data points:

I’ve taken the liberty of including a linear regression line, which now has the slope we’d expect (recall that Tango’s formula for RPW is .75*RPG + 3, and that this is consistent with Pythagenpat). The line is shifted up more than the best fit using normal teams or centering Pythagenpat at 9 RPG indicates, as there are still some extreme combinations here (for example, a 7 R/3 RA team is expected by Cigol to play .815 ball, well beyond anything we’ll ever see in modern MLB).

We can also try limiting the data in another way--only looking at cases in which the resulting records are feasible in modern MLB. For simplicity, I’ll define this as cases in which the Cigol W% is between .300 and .700 (yes, I realize the 2001 Mariners and 2003 Tigers fall outside of this range in terms of actual W%, but in fact it’s probably too wide of a band if we consider only expected W% based on R and RA). Here are the results from our Cigol data points, including all intervals of R and RA between 1-15 (this leaves us with 1,126 cases):

Once again, the slope of the line is the ballpark of what we observe with normal teams, but the intercept is still off, shifting the line up to get closer to the extreme cases. If we make both adjustments simultaneously (look only at cases between 3-7 R, 3-7 RA, and .3-.7 Cigol W%), we are left with 202 data points and this graph:

Closer still, with the slope now essentially exactly where we expect it to be, but the intercept still shifting the line upwards. Why is this happening? We know that it’s not because of a breakdown of Cigol when estimating W% for normal teams--as we saw in the previous post, Cigol is of comparable accuracy to Pythagenpat and RPW = .75*RPG + 3 with normal teams. What’s happening is that we are not biasing our sample with near-.500 team as happens when we observe real major league data. All of our hypothetical teams have a run differential of at least +/- .25. In 1996-2006, about one quarter of teams had run differentials of less than +/- .25.
The standard deviation of W% for 1996-2006 was .073; the standard deviation of Cigol W% for this data is .111. This illustrates the point that I and other sabermetricians who seek theoretical soundness make repeatedly--using normal major league full season data, the variance is small enough that any halfway intelligible model will come close to predicting whatever it is your predicting. Anything that centers estimated W% at .500 and allows it to vary as run differential varies from zero will work just fine. But if you run into a sample that includes a lot of unusual cases, or you start looking at smaller sample sizes, or a higher variance league, or try to extrapolate results to individual player data, then many formulas that work just fine normally will begin to break down.

A linear conversion between runs and wins breaks down in extreme cases for a few main reasons, including no bounds as is the case for real world W% [0,1] and the declining value of marginal runs on not one but two determinants--scoring context and differential between the two teams. There are some things we could attempt to do to salvage it, such as introducing run differential as a variable. If we did this, we could allow RPW to increase not only as RPG increases, but also as absolute value of RD increases.

Let’s use the pared down in both dimensions data set to find a RPW estimator using both RPG and abs(RD) as predictors. I simply ran a multiple regression and got this equation:

RPW = .732*RPG + .204*abs(R - RA) + 3.081

If we assume that a team has R = RA, then this equation is a very good match for our expected .75*RPG + 3, as it would reduce to .732*RPG + 3.081. This is encouraging, since it should work with normal teams and offers the prospect for better performance with extreme teams.

Remember, though, that “extreme” teams in the context of this dataset is a lot more restrictive than extreme teams in the broader set--we've limited the data to only 3-7 R, 3-7 RA, and .3-.7 Cigol W%. If we step outside of that range, the equation will break down again. For example, a 10 R/5 RA has a RPW of 15.081 according to this equation, which suggests a .832 W% versus the .819 expected by Cigol. While this is not a catastrophic error (and much better than the .851 suggested by .75*RPG + 3), don’t lose sight of the fact that the W% function is non-linear.

If we use this equation on the rounded to nearest .05 1996-2006 major league data discussed in the last post, the RMSE times 162 is 3.858--just a tad worse than the RPW version that does not account for RD, but still comparable to (in fact, slightly lower RMSE than) the heavy hitters Pythagenpat and Cigol. It produces a very good match for Cigol over this dataset, in fact closer to Cigol than is Pythagenpat with z = .28.

A similar equation to this one was previously developed by Tango Tiger (which is where I got the idea to use abs(R - RA) as the second variable; there might be some other ways one could construct the equation and achieve a similar outcome) and posted on FanHome in 2001:

RPW = .756*RPG + .403*abs(R - RA) + 2.645

In this version, the lower intercept is offset by the higher coefficient on RD.

We can also attempt to improve the RPW estimate by using a non-linear equation. The best fit comes from a power regression, and again I will limit this to the 3-7 RPG, .300-.700 Cigol W% set of teams to produce this estimate:

RPW = 2.171*RPG^.691

This may look familiar, because as I have demonstrated in the past, the Pythagenpat implied RPW at a given RPG for a .500 team is 2*RPG^(1 - z). Here the implied z value of .309 is higher than we typically see (.27 - .29), but the form is essentially the same.

Any linear approximation might work well near the RPG/team quality level where it was constructed, but will falter outside of that range. We could develop an equation based on teams similar to the 10/5 example that would work well for them, but we’d necessarily lose accuracy when looking at normal teams. Non-linear W% functions allow us to capture a wider range of contexts with one particular equation. We can push the envelope a little bit by using a non-linear estimate of RPW, but we’d still have to be very careful as we varied the scoring context and skill difference between the teams.

Assuming we are not just satisfied with an equation to use for normal teams, all of this caution is a lot to go through to salvage a functional form that still allows for sub-zero or greater than one W% estimates. Instead, it makes more sense to attempt to construct a W% estimate that bounds W% between 0 and 1 and builds-in non-linearity. This of course is why Bill James and many sabermetricians who have followed have turned to the Pythagorean family of estimators.