Wednesday, March 03, 2021

Rob Manfred: Run Killer

There are many “crimes against baseball” that one could charge Rob Manfred with, if one were inclined to use hyperbolic language and pretend that the commissioner had the sole authority to decide matters (I tend to do neither, but am guilty of seeking a more eye-catching post title):

* Attacking the best player in his sport for not going along with whatever horrible promotional scheme the commissioner had dreamed up

* Making a general mess of negotiations with the MLBPA

* Teaming up with authoritarian governments ranging from cities in Arizona to Leviathan itself to attempt to delay or prevent baseball from being played

* Claiming to be open to every harebrained scheme to rein in shifts, home runs, strikeouts, or whatever the current groupthink of the aesthetically-offended crowd finds most troublesome

From my selfish perspective as a sabermetrician, though, I will argue that the greatest crime of all is that he has rendered team runs scored and allowed totals unusable. The extra innings rule, which I doubt will ever go away even if seven-inning doubleheaders do, makes anything using actual runs scored incomparable with historical standards (in the sense of parameters of metrics rather than context). An RMSE test of a run estimator against team runs scored? Can’t use it. Pythagenpat? Nope. Relief pitcher’s run average? Use with extreme caution.

Of course, I am not seriously suggesting that the ease with which existing metrics can be used should be a consideration in determining the rules of the game. But if you use these metrics, it is necessary to recognize that they are very much compromised by the rule.

So how can we adjust for it? I will start with a plea that the keepers of the statistical record (which in our day means sites like Baseball-Reference and Fangraphs) compile a split of runs scored and allowed in regulation and extra innings, as well as team innings pitched/batted in regulation and extra innings, and display this data prominently. Having it will allow for adjustments that can at least partially correct for the distortion, and more importantly will increase awareness of the compromised nature of the raw data.

I want to acknowledge a deeper problem that also exists, and then not dwell on it too much even though it is quite important and renders the simple fixes I’m going to offer inaccurate. This is a problem that Tom Tango pointed out some time ago, particularly as it related to run expectancy tables – innings that are terminated due to walkoffs. In such innings, there are often significant potential runs left stranded on base, and so including these innings will understate the final number of runs one could expect. Tango corrected for this by removing these potential game-ending innings from RE calculations. It’s even more of a problem when it comes to extra innings, since rather than just being 1/18 of the half-innings of a regulation game, they represent 1/2 of the half-innings of an extra inning game. This means that when we look at just extra innings, the potential runs lost upon termination of the game make up a significant portion of the total runs.

I gathered the 2020 data on runs scored by inning from Baseball-Reference, and divided each inning into regulation and extras. I did not, however, do this correctly, as the seven-inning doubleheader rule complicates matters. The eighth and ninth innings of a standard nine-inning game are played under very different circumstances than the eighth and ninth innings of a seven-inning doubleheader. I have ignored these games here, and treated all eighth and ninth innings as belonging to standard games, but this is a distortion. I didn’t feel like combing through box scores to dig out the real data as I’m writing this post for illustrative and not analytical purposes, but it buttresses my plea for the keepers of the data to do this. This is not solely out of my laziness (although I really don’t want to have to compile it myself), but also a recognition of the reality that many casual consumers of statistics will not even be cognizant of the problem if it is not made clear in the presentation of data.

Forging ahead despite these two serious data issues that remain unresolved (counting eighth and ninth innings of seven-inning doubleheaders as regulation innings rather than extra innings, and ignoring the potential runs lost due to walkoffs), I used the team data on runs by inning from Baseball-Reference to get totals for innings played and runs scored between regulation and extra innings. Note that these are innings played, not innings pitched, which understates the true nature of the problem: while almost all of the regulation innings include three outs (with the exception being bottoms of the ninth terminated on walkoffs), a much greater proportion of the extra innings do not.

Still:



Expressed on the intuitive scale of runs per 9 innings, regulation innings yielded 4.80 runs, while extra innings were good for a whopping 8.40, a rate 75% higher. And no wonder, as Baseball Prospectus’ RE table for 2019 shows .5439 for (---, 0 out) and 1.1465 for (-x-, 0), a rate 111% higher. That we don’t see that big of a difference is due in an indeterminate amount to sample size and environmental differences (e.g. a high-leverage reliever is likely pitching in an extra inning situation, unless they have all been in the game already), but probably more significantly to the lost potential runs.

Considering all runs scored and innings, there were .5378 runs/inning or 4.84 R/9 in the majors in 2020, so even a crude calculation suggests a distortion of around 1% embedded in the raw data due to extra innings. Of course, the impact can vary significantly at the team level since the team-level proportion of extra innings will vary (1.25% of MLB innings played were extras, ranging from a low of 0.40% for Cincinnati to a high of 3.44% for Houston).

How to correct for this? If the walkoff problem didn’t exist, I would suggest a relatively simple approach. After separating each team’s data into regulation and extra innings, calculate each team’s “pre-Manfred runs” as:

PMR = Runs in Regulation Innings + Runs in Extra Innings – park adjusted RE for (-x-,0)*Extra Innings

= Runs - park adjusted RE for (-x-,0)*Extra Innings
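
For illustration, here is a minimal Python sketch of the PMR calculation, using the unadjusted 1.1465 run expectancy and a hypothetical team line (the park adjustment discussed below is omitted):

def pre_manfred_runs(runs, extra_innings, re_start=1.1465):
    # re_start is the run expectancy for (runner on second, nobody out);
    # ideally it would be park-adjusted as discussed below
    return runs - re_start * extra_innings

# hypothetical team: 300 runs scored, 8 extra innings batted
print(pre_manfred_runs(300, 8))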

You could address the walkoff problem by adding in the park adjusted RE for any innings that terminated, but this gets tricky for two reasons:

1) it means that the simple data dividing runs and innings into “regulation” and “extra” is inadequate for the task; I doubt “potential runs lost at time of game termination” would ever find its way into a standard table of team offensive statistics

2) it overcorrects to the extent that the legacy statistics we have always used ignore the loss of those potential runs as well. Of course, the issue is more pronounced with extra innings, as walkoff-terminated innings represent a huge proportion of extra innings rather than a small one of regulation innings (and because the nature of Manfred extra innings increases the proportion of walkoffs within the subset of extra innings, since run expectancy is 111% higher at the start of a Manfred extra inning than at the start of a standard inning).

Also note that when I say park-adjusted, I mean that the run expectancy would have to be park-adjusted not in order to normalize across environments, but rather to transform a neutral environment RE table to the specific park. I wouldn’t want to use “just” 1.1465 for Coors Field, but rather a higher value, so that the PMR estimate can still be used in conjunction with our Coors Field park adjustment just as the Rockies’ raw runs total would have been pre-2020. Another complication is that the standard runs park factor would likely overstate the park impact because of the issue of lost potential runs (they too would increase in expected value as the park factor increased).

The manner in which I attempted to adjust in my 2020 End of Season statistics was to restate everything for a team on a per nine inning basis, and then use the R/9 and RA/9 figures in conjunction with standard methodology. But this is also unsatisfactory – for instance, a Pythagorean estimate ceases to be an estimate of the team’s actual W%, but rather a theoretical estimate of what their W% would be if they played a full slate of nine inning games. The extra innings aren’t really a problem here, but the seven-inning doubleheaders are. As long as these accursed games exist, in order to develop a true Pythagorean estimate of team wins, one would have to estimate the exponent that would hold for a seven-inning game (Tango came up with a Pythagorean exponent of 1.57 through an empirical analysis; my theoretical approach would be to use the Enby distribution to develop theoretical W%s for seven-inning games for a representative variety of underlying team strengths in terms of runs and runs allowed per inning, then use this to determine the best Pythagenpat z value). One would then use runs and runs allowed per inning rates to estimate separate W%s for seven- and nine-inning games, and weight these by the proportion of a team’s games that were scheduled for seven and nine innings.
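
As a rough sketch of that blended estimate, assuming hypothetical per-inning rates, Tango’s empirical 1.57 exponent for seven-inning games, and z = .282 for nine-inning games (the Enby-based derivation of a seven-inning z is not attempted here):

def pyth_w_pct(r, ra, x):
    return r**x / (r**x + ra**x)

r_inn, ra_inn = 0.55, 0.50    # hypothetical runs scored/allowed per inning
z = 0.282

# nine-inning games: Pythagenpat exponent from that game length's RPG
w9 = pyth_w_pct(r_inn * 9, ra_inn * 9, ((r_inn + ra_inn) * 9) ** z)
# seven-inning games: Tango's empirical exponent
w7 = pyth_w_pct(r_inn * 7, ra_inn * 7, 1.57)

share7 = 10 / 60              # hypothetical: 10 of 60 scheduled games were seven innings
print(share7 * w7 + (1 - share7) * w9)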

I also took the unfortunate step of ignoring actual runs everywhere (as I mentioned in passing earlier, Manfred extra innings wreak havoc on relievers’ run averages), since the league averages are polluted by Manfred extra innings. Again, I am not advocating that sabermetric expediency drive the construction of the rules of baseball, but it is a happy coincidence that sabermetric expediency tracks in this case with aesthetic considerations. I should include a caveat about aesthetic considerations being in the eyes of the beholder, but the groupthink crowd that is now in the ascendancy rarely sees the need to do so. No surprise, as many also subscribe to the totalitarian thinking that is ascendant in the broader society. They’ll tell you all about it, and about what a terrible person you are if you dissent, for $25.19.



Wednesday, February 17, 2021

Akousmatikoi Win Estimators, pt. 5: Notes on Linear RPW Estimators

I had intended the last installment to be the end of this series, but Tom Tango left a comment on pt. 3 that led me down a rabbit hole. It’s of the frustrating variety, as I can’t figure out how to dig back to the surface and exploring it hasn’t led me to learn anything useful or interesting about baseball. Nevertheless, I find it interesting as a purely mathematical exercise and worth a brief post.

Tango pointed out that he had proposed some time ago the simple formula:

RPW = .75*RPG + 3 (Tango’s version was originally expressed as 1.5*RPG + 3 because he was defining RPG as the average for one team; I’ll keep with my definition here for consistency with the rest of the series)

I was aware of this formula and have mentioned it on this blog before, but it slipped my mind when writing these posts. You may recall from pt.3 that I offered the formula:

RPW = .777*RPG + 2.694

A brief reminder of how this was derived – I started by differentiating the Pythagenpat formula for a fixed z value of .282 with respect to run differential, and then plugging in the appropriate values for a .500 team to get RPW = 2*RPG^(1 – .282). Then I differentiated this formula with respect to RPG, and found the y = mx + b formula that would follow if you assumed a .500 team with the average of RPG and RPW of the 1961 – 2019 major leagues.

Of course, these formulas both take the form y = mx + b, where y is the estimated RPW and x is the team’s RPG. My formula has a higher slope, but a lower intercept. At 9 RPG for a team with a run differential of one per game, mine would estimate 97.72 wins for a team and Tango’s 97.62. This doesn’t seem like a lot, and in the grand scheme of things it isn’t, but if this kind of difference didn’t interest me then this blog wouldn’t exist.

Using the 1961-2019 data, and scaling the RMSE to 162 games, Tango’s formula has a RMSE of 4.0348, and mine 4.0370. Pythagenpat itself (z = 2.82) checks in at 4.0345, which is interesting – my RPW formula performs worse than Tango’s, but is derived directly from Pythagenpat, which performs better. Also interesting – that with real major league teams, Tango’s formula is about as accurate as you can get despite being very simple (relative to full-blown Pythagenpat) and having rounded coefficients.

Note, I’m emphasizing RMSE with real teams in this discussion because if you want theoretical accuracy over a wide range of possible team R/RA combinations, you’d just use Pythagenpat and be done with it. If you’re using a simplification that isn’t as accurate as an equally simple formula for the application you’ll most use it for, what’s the point?

My first thought as to why Tango’s formula had a lower RMSE than mine was that I had over-flattened the whole thing and was thus missing something. This series starts from the premise that Pythagenpat is the right model for win estimation, and then simplifies from there, often centering at the point of a team that scores and allows the same number of runs, in an average scoring context. But the teams in the sample data, while by definition centered there, vary in both axes (R/RA and RPG). Perhaps the linear approximation to the Pythagorean RPW for a .500 team misses some subtle change in the slope or intercept caused by this variation, and you could do better by running a regression on all the individual datapoints rather than using the single point estimate to derive the formula.

So I calculated the actual Pythagenpat RPW for all teams (i.e. the value for RPW which, when applied, will estimate that the team’s W% is equal to its Pythagenpat W%), which from pt. 3 is:

RPW = (R – RA)/(R^x/(R^x + RA^x) - .5)

Where x is the Pythagenpat exponent corresponding to each team’s RPG

This is undefined when R = RA, but also from pt. 3, we can fill this gap with the calculus-derived formula for a team with R = RA:

RPW = 2*RPG/x
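
In code, the per-team calculation might look like this sketch (z = .282 assumed), with the R = RA gap handled by the calculus-derived value:

def pythagenpat_rpw(r, ra, z=0.282):
    # r and ra are per-game averages; the exponent comes from the team's RPG
    rpg = r + ra
    x = rpg ** z
    if r == ra:
        return 2 * rpg / x       # calculus-derived limit when R = RA
    w = r**x / (r**x + ra**x)
    return (r - ra) / (w - 0.5)

print(pythagenpat_rpw(5.0, 4.0))   # hypothetical team scoring 5 and allowing 4 per game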

Having calculated the actual Pythagenpat RPW for all teams, we can run a linear regression with RPG as the independent variable to get an alternative formula, which winds up being:

RPW = .7818*RPG + 2.6823

Which is reasonably close to my formula (and thus an argument in favor of “centering” being a reasonable approach), but takes the slope higher and the intercept lower – in other words, moving away from Tango rather than closing the gap as we might have hoped/expected. This formula has a RMSE of 4.0364, still worse than Tango’s although better than mine.

At this point, the logical question is how far can we push the slope down and the intercept up to minimize RMSE? According to Excel solver, quite far:

RPW = .6528*RPG + 3.8760

This is a huge difference even from Tango’s formula, with the slope 13% lower and the intercept 29% higher. RMSE = 4.0334, ever so slightly lower than even Pythagenpat.
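
The solver step is easy to reproduce; here is a sketch using scipy, where r, ra, and w stand in for the per-game runs scored, runs allowed, and actual W% of the 1961-2019 teams (the rows shown are just hypothetical placeholders):

import numpy as np
from scipy.optimize import minimize

r = np.array([4.8, 4.2, 5.1, 3.9])
ra = np.array([4.3, 4.6, 4.4, 4.5])
w = np.array([0.540, 0.470, 0.560, 0.450])

def rmse_162(params):
    # RMSE of the W% estimate from RPW = m*RPG + b, scaled to 162 games
    m, b = params
    est = (r - ra) / (m * (r + ra) + b) + 0.5
    return 162 * np.sqrt(np.mean((w - est) ** 2))

best = minimize(rmse_162, x0=[0.75, 3.0], method="Nelder-Mead")
print(best.x)   # with the full dataset this lands near (.6528, 3.8760)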

Why can we improve the accuracy of our W% estimate (at least working with this sample of the last sixty years of MLB), even while getting farther away from the RPW relationship suggested by Pythagenpat? Unfortunately, I don’t have a satisfying answer to that question. It’s tempting to say that we are losing something by eliminating the team’s quality (e.g. the difference and/or ratio between their runs and runs allowed), which Pythagenpat considers in addition to the level of run scoring (RPG). Of course, the best-fit cares not about quality either, and I don’t have a compelling explanation for why lowering the slope and raising the intercept would be related to that.

Wednesday, February 03, 2021

Akousmatikoi Win Estimators, pt. 4: Best Fits and Accuracy

Herein I’ll be using the expansion era (1961 – 2019) data for all major league teams to calculate the RMSE of the various Akousmatikoi win estimators we’ve discussed. This exercise is not intended to prove which metric is “better” or “more accurate” than the others, but rather is intended to give you a feel for the differences between the various approaches when used for major league teams. What I am calling the Akousmatikoi family of win estimators is built on the conceit that Pythagenpat is the “best” win estimator, and uses it as a jumping-off point to develop alternate/simplified methods that can be tied back to the parent method. As such, the contention here would be that if you want the “best” answer, you should use Pythagenpat. But if you are just looking at the standings and can apply something quick like the Kross method or 9.56 runs per win, how far off will you be for a normal team?

This is also not a “fair” accuracy test, in that we would develop the equation based on one set of data and test it on another; all of the approaches will be calibrated and tested on the 1961 – 2019 data. This will not favor one approach or another as they all will have the benefit of the same data set. I will also be including the best fits for a few of the approaches, which I think is interesting because in several cases I’ve developed the alternate to Pythagenpat shown in this series by taking the tangent line at the point where R = RA and applying it broadly. While this should work well enough as our real teams will be centered around this point, in some cases the best fit may be a little different, which might be interesting. In those cases I would probably recommend using the best fit, since the point of any of the simpler methods would be to use with real teams; there’s no need to get hung up on theoretical centering at .500.

If this is not a “fair” accuracy test, then what exactly is the point? The point is to provide information that can be used to inform the decision of which shortcut to Pythagenpat you choose to use. There is no right or wrong answer. For example, the Kross formulas are about as simple as it gets. Are they accurate enough as a win estimator to use for quick and dirty estimates? That depends on how dirty you’d like it to be (i.e. your own determination of what level of error is acceptable), and on your own tradeoff between simplicity and accuracy.

In another sense, though, it is a fair test, because each method is operating under the same constraints (although some will benefit from having best fits determined directly while others won’t have that luxury). In deciding which model to use, it does make sense to have a final check when all are calibrated on all of the available data.

I will not only calculate the RMSE of the estimates compared to W%, I will also show it compared to Pythagenpat. This is not meant to imply that Pythagenpat is correct and any deviation from it is wrong, but since the presentation in this series has relied on each method’s own relationship to Pythagenpat, I think it’s of interest to identify which approaches do the best job of approximating Pythagenpat. And if you do choose to start with Pythagenpat as your win estimator of choice when attempting to be as accurate as possible, it might follow that one of the criteria you’d consider when deciding which quicker method to use is how well it tracks Pythagenpat.

I will divide the methods to be tested as follows:

Pythagorean

These will all take the form W% = R^x/(R^x + RA^x).

Pyth1: this will be Pythagenpat, where x = RPG^.282 (best-fit)

Pyth2: Fixed exponent Pythagorean where x = 2

Pyth3: Fixed exponent Pythagorean where x = 1.847 (best-fit)

Port: Davenport/Woolner’s Pythagenport, where x = 1.5*log(RPG) + .45

Cig1: formula I derived from Cigol, where x = 1.03841*RPG^.265 + .00114*RD^2

Cig2: formula I derived from Cigol to follow the Pythagenpat form, where z = .27348 + .00025*RPG + .00020*(R - RA)^2

Ratio-Based

Kross: these are the nifty equations developed by Bill Kross; when R > RA, W% = 1 – RA/(2*R); when R <= RA, W% = R/(2*RA)

Ratio: this is the general case that resolves to Kross’ equation when x = 2, although it’s not really the general case at all since I’m using the Pythagorean best-fit of x = a = 1.847 to get these equations:

when RR >= 1, (a*RR – a + 1)/(a*RR – a + 2) = (1.847*RR - .847)/(1.847*RR + .153)

when RR < 1, 1/(a/RR – a + 1)/(1/(a/RR – a + 1) + 1) = 1/(1.847/RR - .847)/(1/(1.847/RR - .847) + 1)

Differential-Based

FixRPW: if you force an equation of the form W% = ((R-RA)/G)/RPW + .5, the best fit is when RPW = 9.71, which is equivalent to .103*((R – RA)/G) + .5

PythRPW: RPW = 2*RPG^.718, the Pythagenpat result

LinRPW: RPW = .777*RPG + 2.694, the tangent line to the Pythagenpat RPW at the average RPG/RPW for the period

BVL: W% = .9125*(R – RA)/(R + RA) + .5; the form proposed by Ben Vollmayr-Lee, although I’m using the best fit for this dataset and also rounding the intercept to .5 (it actually comes out to .49978)

BVLPyth: W% = .923*(R – RA)/(R + RA) + .5; the same equation, but using the Pythagorean best-fit exponent rather than the empirical best-fit

The RMSE shown here is actually the overall RMSE of the W% estimate, scaled to 162 games. So for each team the error is (W% - estimator)^2; the final value shown is 162*sqrt(average error):


Pythagenport has a slight lead over Pythagenpat, and they are followed very closely by the estimates based on Cigol and runs per win formulations that take RPG into consideration. In the Akousmatikoi family, that is the accuracy separator (such as it is; the overall range of RMSE values is narrow) – considering the specific run environment for the team either through a Pythagorean approach (as in the case of Pythagenport, Pythagenpat, and their Cigol knockoffs), or a two-operation (multiplication and addition) or power RPW function (as in the case of LinRPW and PythRPW). The next little cluster of RMSE includes a fixed Pythagorean exponent less than 2, the Ben V-L formulas (which take the team’s RPG into account but only with a simple multiplicative function, not a y = mx + b form), and the intrepid Kross formulas. A fixed RPW linear approach is next, and then, surprisingly, the Kross formulas actually outperform their antecedents (in the Akousmatikoi conceit, not in reality): standard Pythagorean and “Ratio”, which uses a non-2 Pythagorean exponent.
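
For anyone wanting to reproduce the test, a minimal sketch of how a couple of the estimators and the scaled RMSE could be computed (hypothetical rows stand in for the full 1961-2019 dataset):

import numpy as np

r = np.array([4.8, 4.2, 5.1])        # per-game runs scored (hypothetical)
ra = np.array([4.3, 4.6, 4.4])       # per-game runs allowed (hypothetical)
w = np.array([0.540, 0.470, 0.560])  # actual W% (hypothetical)

def pythagenpat(r, ra, z=0.282):
    x = (r + ra) ** z
    return r**x / (r**x + ra**x)

def kross(r, ra):
    return np.where(r > ra, 1 - ra / (2 * r), r / (2 * ra))

def rmse_162(est):
    return 162 * np.sqrt(np.mean((w - est) ** 2))

for name, est in [("Pyth1", pythagenpat(r, ra)), ("Kross", kross(r, ra))]:
    print(name, rmse_162(est))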

This finding is surprising, and suggests that the Ratio approach should be discarded, as it’s arguably the most complicated to calculate of all the options we’ve looked at (despite being, at least from one perspective, a mathematical “simplification” of the Pythagorean relationship). But why does it perform worse than the Kross method, which ties to x = 2, while the ratio approach ties to x = 1.847, which has a lower RMSE than using x = 2?

To answer that question, I started by looking at the most extreme teams in terms of run ratio in the period, and found what I consider to be a satisfactory answer. The team with the highest run ratio in the expansion era is the 1969 Orioles (779/517 = 1.51). Incidentally, they do not have the highest Pythagenpat W% in the era, a distinction that goes to the 2001 Mariners by a hair’s breadth over the 1998 Yankees; those teams had run ratios of 1.48 and 1.47 respectively, but they had RPGs 20% and 25% higher respectively, which made their run ratios convert to higher win ratios.

The Orioles’ EW% using a fixed Pythagorean exponent of 1.847 is .681. This is the calculation that “Ratio” is supposed to flatten, but it predicts a W% of just .659. I think that since this formula is developed by differentiating win ratio, and then using the estimated win ratio to calculate an estimated W%, the linear approximation does poorly. Win ratios have a much wider range than winning percentages; if we consider .300 - .700 a reasonable range for the expected (as opposed to actual) W%s of major league teams, this is a win ratio range of .429 to 2.333. Drawing the tangent line for the point where RR = WR = 1 leaves a lot of room outside of this range where teams will fall.

The Kross formula performs better because even though (in the Akousmatikoi sense) it starts from a less accurate proposition that x = 2, it will produce a wider range of win ratio estimates. The Kross estimated win ratio for a team with a run ratio of 1.51 is 2*1.51 – 1 = 2.02, while the other approach estimates 1.847*1.51 - .847 = 1.94.
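
These figures are easy to check; a few lines reproducing the comparison:

rr = 779 / 517                        # 1969 Orioles run ratio, ~1.51

w_pyth = rr**1.847 / (rr**1.847 + 1)  # fixed-exponent Pythagorean, ~.681

wr_ratio = 1.847 * rr - 0.847         # "Ratio" tangent-line win ratio, ~1.94
w_ratio = wr_ratio / (wr_ratio + 1)   # ~.659

wr_kross = 2 * rr - 1                 # Kross win ratio (x = 2), ~2.02
w_kross = wr_kross / (wr_kross + 1)

print(w_pyth, w_ratio, w_kross)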

The other RMSE comparison I want to make is to Pythagenpat. Again, I am not trying to say that Pythagenpat is the standard by which all win estimators should be judged. However, it (or something similar like Pythagenport) is the most accurate version of the Pythagorean relationship that has yet been published, and since this series is an examination of alternative win estimators mathematically related to Pythagorean methods, I think it is worthwhile to see which of these alternatives hew most closely to the starting point:


The fact that the lowest RMSE is for the first Cigol estimate tells us only what we can see by observing it – that it is essentially the same formula with added terms to attempt (perhaps in vain) to increase accuracy at the extremes (this formula sets the Pythagenpat exponent to .27348 + .00025*RPG + .00020*(R - RA)^2 rather than .282). That next in line are two more close cousins, Pythagenport and the other Cigol estimate, is comforting but also uninteresting.

You’ll notice that the ranking of estimators in terms of agreement with Pythagenpat closely resembles their ranking in accuracy predicting W%, so the first grouping of methods that most closely track Pythagenpat while actually being simpler to compute are the two RPW estimates that use a “complex” function – either the power relationship or the y = mx + b form.

The next cluster is an optimized fixed Pythagorean exponent and the Ben V-L approaches, which are equivalent to RPW as a multiplier of RPG (no y-intercept term). This implies that if you want to imitate Pythagenpat for normal teams, it’s more important to consider the impact of scoring level on the runs to wins conversion than it is to consider the non-linearity of the runs to wins conversion. The remarkable Kross formulas are next, with the others (a fixed RPW value, Pythagorean with x = 2, and the worthless “Ratio” approach) lagging the field.

I don’t have any grand conclusion to draw from this series, which is appropriate since, as I’ve acknowledged previously, there really is nothing new here. It has served as a good reminder for me of how various win estimators are connected, and hopefully has collected in one place observations of the connections that were previously published but strewn across multiple sources.

Trivia to close: I feel like I should have been aware of this previously, but did you know that (at least using a Pythagenpat z constant of .282), the 2019 Tigers had the worst EW% of the expansion era? They did not have the worst run ratio, a distinction that fell to the expansion 1969 Padres, but as we saw with the 1969 Orioles on the other end of the spectrum, the low RPG made that run ratio translate into a better win ratio than a couple of teams in higher scoring environments.

Four teams had sub-.310 EW%s (an arbitrary cutoff, chosen because I think these four are interesting):

1. At .307, the expansion 1962 Mets, widely famous as the worst modern team and with the worst actual W% of the bunch at .250, are not a surprise.

2. At .305, the aforementioned 1969 Padres, a team I had never thought of as being historically bad for an expansion team. They actually went 52-110, a full twelve games better than the Mets, outplaying their Pythagenpat by two and a half games, whereas the ‘62 Mets underplayed theirs by nine. That explains it.

3. At .2998, the 2003 Tigers, who at 43-119 just missed matching the Mets’ record for most modern losses, although the Mets only played 160 games (40-120). This team is widely acknowledged as one of the worst of all-time, but they underplayed their Pythagenpat by five and a half games.

4. At .2997, the 2019 Tigers, who only underplayed their Pythagenpat by one and a quarter games, going 47-114 and escaping historical notice. A big help was that they weren’t alone languishing at the bottom, as the phenomenon of “tanking” has been widely called out, and a number of teams over the last decade have put up truly terrible W-L records, including three others which lost 105 or more in 2019. The 2018 Orioles also served to take the heat off, as their 47-115 record was worse (they underplayed their Pythagenpat expectation by seven and a half games – they were only the thirteenth-worst of the expansion era at .337).

Wednesday, January 20, 2021

Akousmatikoi Win Estimators, pt. 3: Differential-Based Simplifications

Simplifying the Pythagorean estimate by focusing on run differential is not as intuitive as using run ratio, since of course Pythagorean constructs are based on the latter rather than the former. The upfront calculus is messier, the relationships harder to explain – I’ve covered all this before, and so I went back to my previous work rather than go through the hassle of re-deriving it. However, while the calculus is messier, the end result is simpler, and gives you relationships that you might actually choose to use in place of the full Pythagorean treatment if you want something quick and simple to punch into a calculator.

The easiest way I’ve found to demonstrate this approach (which is not to say that a simpler derivation doesn’t exist) is to use the following definitions. To make this easier to follow, I’m going to define R as R/G and RA as RA/G:

RR = R/RA

RD = R - RA

RPG = R + RA

Given these relationships, we can relate run ratio and run differential using RPG:

RR = (RD + RPG)/(RPG – RD)

If you need a proof of that, replace RD and RPG with the equations above and you will see that:

RR = (R – RA + R + RA)/(R + RA – (R – RA)) = (2*R)/(2*RA) = R/RA

In the last installment, we differentiated Pythagorean win ratio with respect to run ratio; here, I want to differentiate Pythagorean winning % with respect to run ratio, which will look slightly messier. Starting from the Pythagorean relationship:

W% = RR^x/(RR^x + 1)

we differentiate to get:

dW%/dRR = ((RR^x + 1)*(x*RR^(x – 1)) – RR^x*(x*RR^(x – 1)))/(RR^x + 1)^2

= (x*RR^(x – 1))*((RR^x + 1) – RR^x)/(RR^x + 1)^2

dW%/dRR = x*RR^(x – 1)/(RR^x + 1)^2

That’s well and good, but it doesn’t tell us anything about the relationship between Pythagorean W% and run differential. To bridge that gap, we can differentiate run ratio with respect to run differential and multiply this result by dW%/dRR, which we just derived:

(dW%/dRR)*(dRR/dRD) = dW%/dRD

Since we know that RR = (RD + RPG)/(RPG – RD), we get:

dRR/dRD = ((RPG – RD)*1 – (RD + RPG)*(-1))/(RPG – RD)^2

= 2*RPG/(RPG – RD)^2

If you slogged through any of my previous treatments of this topic, I must apologize – I missed some simplifications of both of these formulas before. The final math worked out the same, but it was needlessly difficult to follow. In any event, we now have:

dW%/dRD = (x*RR^(x – 1)/(RR^x + 1)^2) * (2*RPG/(RPG – RD)^2)

= 2*RPG*x*RR^(x -1)/((RR^x + 1)^2*(RPG – RD)^2)

This ends up being expressed in terms of marginal wins per marginal run. The classic sabermetric presentation is marginal runs per marginal win (Runs Per Win, a la the rule of thumb that 10 runs = 1 win). So we can take the reciprocal to get this formula for Runs Per Win from Pythagorean:

Pythagorean RPW = (RR^x + 1)^2*(RPG – RD)^2/(2*RPG*x*RR^(x - 1))

Before moving forward, one thing I should note is that this function does not allow us to match the Pythagenpat W% at a given point for a set of inputs. For example, if you plug in 5 runs scored and 4 runs allowed, you will get a dW%/dRD of .1071. You might then reasonably assume that if you take the team’s run differential of 1 times .1071 plus a y-intercept (which by definition would be .5 since Pythagorean will estimate a .500 W% when R = RA), you will get a restatement of the team’s Pythagorean W%. But in fact you will get .6071, while Pythagorean would estimate 5^2/(5^2 + 4^2) = .6098. The differences will be more extreme if you put in more extreme teams.

Alas, I do not have a simple mathematical explanation for why this is the case. However, I will note that we don’t need calculus to calculate the actual Runs Per Win value from Pythagorean for any given set of R, RA, and x that we input. We can simply calculate this by noting that:

W% = RD/RPW + .5

Plugging in Pythagorean relationships and solving for RPW:

R^x/(R^x + RA^x) = RD/RPW + .5

R^x/(R^x + RA^x) - .5 = RD/RPW

RPW = (R – RA)/(R^x/(R^x + RA^x) - .5)

For our 5 R/4 RA team, this results in (5 – 4)/(5^2/(5^2 + 4^2) - .5) = 9.1111 RPW or .1098 wins/run, which of course is the right answer. In terms of simplifying the Pythagorean relationship, though, this is useless – all we’ve done is rearrange terms to calculate runs per win for a given set of inputs. How we could use this to produce a flatter win estimator is to eliminate the use of a team’s R and RA figures and instead replace with a function that only considers the scoring level (i.e. RPG).
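
A few lines verifying the example (x = 2):

r, ra, x = 5, 4, 2
w = r**x / (r**x + ra**x)       # .6098
rpw = (r - ra) / (w - 0.5)      # 9.1111 runs per win
print(rpw, 1 / rpw)             # and .1098 wins per run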

This is what the rule of thumb that 10 runs = 1 win does, substituting a general rule for specifics about the team’s actual location on a run/win curve with respect to the marginal value of an additional run scored or allowed. As such, since it’s establishing a rule that will be applied to all teams, it makes sense to center it at the point which will be closest to an average team – at the point where R = RA.

In other words, we will be developing an RPW equation that can be applied generally, but will be defined based on the relationship at the point where R = RA for a given RPG. Using our formula above for RPW based on rearrangement of terms in the Pythagenpat relationship, we can substitute R = RA wherever we see one of those terms and...reduce the equation to 0/0, as the numerator R – RA equals 0 when R = RA, and the denominator R^x/(R^x + RA^x) - .5 also equals 0 when R = RA.

However, this is where the equation for RPW derived using calculus can step in, and tell us what the theoretical RPW value is at that point. Recall from above that:

RPW = (RR^x + 1)^2*(RPG – RD)^2/(2*RPG*x*RR^(x – 1))

If we assume that R = RA, then RR = 1 and RD = 0, and this simplifies nicely to:

RPW = (1^x + 1)^2*(RPG – 0)^2/(2*RPG*x*1^(x – 1))

= 2^2*RPG^2/(2*RPG*x) = 4*RPG^2/(2*RPG*x) = 2*RPG/x

The first immediate implication is that for our special Pythagorean case where x = 2, RPW = RPG. Since the general case is:

W% = (R/G – RA/G)/RPW + .5

RPW = RPG is equivalent to saying that (after all of the game denominators cancel out):

W% = (R – RA)/(R + RA) + .5

What if x is a constant other than 2, like the value of x = 1.847 that minimizes RMSE for expansion-era major league teams? Then RPW = 2*RPG/1.847 = 1.083*RPG, and we could say that:

W% = (R/G – RA/G)/(1.083*(R/G + RA/G)) + .5

= (1/1.083)*(R/G – RA/G)/(R/G + RA/G) + .5

= .923*(R – RA)/(R + RA) + .5

More generally:

W% = (x/2)*(R – RA)/(R + RA) + .5

This form is one that was proposed by Ben Vollmayr-Lee as .91*(R – RA)/(R + RA) + .5 (I’ve rewritten his formula to match the format I’m using), which would imply a Pythagorean x = 1.82. I would suggest that the Kross equations and the Vollmayr-Lee equation are the ultimate in terms of simplified win estimators from the Akousmatikoi family (again, Kross and Vollmayr-Lee did not start from Pythagorean as we have; by including these estimators in the Akousmatikoi family, I only mean to suggest that they are mathematically related to Pythagorean, not that their creators didn’t independently discover them).

Remember that for the expansion era, the average RPG is 8.83, which would imply that the long-term RPW value is approximately 1.083*8.83 = 9.56; close enough to ten that you can see why we might have a rule of thumb, although ten runs would imply a 4.5% higher scoring context (10/1.083 = 9.23) than observed in the expansion era.

We could also use a hybrid approach, in which we allow each team’s RPW according to the formula that applies when R = RA to vary based on their RPG, but not on how that RPG breaks down into runs scored and allowed. In order to do this, we’d return to RPW = 2*RPG/x, but instead of setting x equal to a constant, use a custom value for x. Of course, my suggested value would be the Pythagenpat estimate of x, namely:

x = RPG^z, where z = .282 for now (value that minimizes RMSE for the expansion era)

Substituting this equation for x, we find a general case for a variable z that:

RPW = 2*RPG/(RPG^z) = 2*RPG^(1 – z)

Or for the specific case that z = .282:

RPW = 2*RPG^.718

We could further flatten this equation by approximating it with a linear function. Recall from the last section that we can write a tangent line in the form:

y – y1 = m(x – x1) where x1 and y1 are the x and y values for the point in question, and m is the slope of the curve at x1.

To apply this approach to develop a linear approximation of the above equation, we first need the slope of the RPW function 2*RPG^(1 – z). Differentiating with respect to RPG yields 2*(1 – z)*RPG^(-z).

Let’s center this at the point corresponding to our expansion-era averages, so x = 1.847 (for the eagle-eyed readers or those checking my math (always welcomed!), I’m choosing to use the value that minimizes RMSE to be consistent with earlier applications rather than the value of 1.848 that corresponds to 8.83 RPG using the equation directly). In this case x1 will be 8.83 RPG, and y1 = 2*8.83^.718 = 9.555. At 8.83 RPG, m will be 2*(1 - .282)*8.83^(-.282) = .777, so we have:

RPW – 9.555 = .777*(RPG – 8.83)

which simplifies to:

RPW = .777*RPG + 2.694
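
The arithmetic behind those coefficients, as a quick sketch:

z, rpg0 = 0.282, 8.83
y1 = 2 * rpg0 ** (1 - z)          # ~9.555, the RPW at the centering point
m = 2 * (1 - z) * rpg0 ** (-z)    # ~.777, slope of the tangent line
b = y1 - m * rpg0                 # ~2.694, the implied intercept
print(m, b)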

We’ve now developed two RPW estimates, using only RPG as an independent variable, one with a y-intercept and one without, by trying to flatten the Pythagorean relationships wherever possible. Which is more accurate? One would assume that it’s the version with the y-intercept, but even if it is, how much more accurate for normal teams, and how does this tangent line based approach compare with the best fit for an equation of the form RPW = m*RPG + b? Those are questions we’ll explore in the final installment.

References

Ben Vollmayr-Lee’s article on win estimation formulas:

http://www.eg.bucknell.edu/~bvollmay/baseball/pythagoras.html

Ralph Caola published multiple articles on using differentiation with the Pythagorean formula, as well as a (to the best of my knowledge) unpublished article on “double the edge” that he shared with me.

His articles can be found in the 11/2003, 2/2004, and 5/2004 issues of By the Numbers.

https://sabr.org/research/statistical-analysis-research-committee-newsletters/

Kevin D. Dayaratna and Steven J. Miller explored the relationship that RPW = 2*RPG/x in the 5/2012 issue of BTN. I had known and used that one for a long time, thanks originally to a post by David Glass on rec.sport.baseball. Unfortunately a quick search did not yield a live link to Glass’ post.

Wednesday, January 06, 2021

Akousmatikoi Win Estimators, pt. 2: Ratio-Based Simplifications

We will begin our endeavor to simplify/”flatten” the Pythagorean relationship by looking at approaches that maintain the use of run ratio as the chief independent variable in the W% estimate. Before jumping into that, I should note that we could think of the first flattening as being moving from a variable exponent like Pythagenport/pat to a fixed exponent. However, since the latter came first historically, and is easier to explain conceptually, I didn’t approach it in that manner.

We could also make flattening the Pythagenpat exponent itself the first step. My definition of “flatten” for the sake of this discussion is to replace exponents with multiplication where possible. We could start by trying to convert z = RPG^.282 into a linear formula. I’ve skipped this step because we would still be left with exponents when we go to calculate the winning percentage. While simplifying the equations will generally cost us some theoretical and a tiny bit of empirical accuracy, it will gain us ease of calculation. Replacing RPG^.282 with a linear equation wouldn’t really make the calculation any easier, but more importantly I don’t think it would result in an interesting alternative methodology to estimate W%. It would just result in a very slightly easier to calculate, less accurate Pythagenpat equation.

I previously wrote the general Pythagorean relationship as:

W% = R^x/(R^x + RA^x)

but note that we could equivalently define win ratio (W/L = WR) as:

WR = RR^x where RR = run ratio = R/RA

I will alternate between these two ways of writing the equation depending on whichever is most convenient for what we’re trying to do. In this case, I want to see what happens if we get rid of the exponent. The approach I will take is to replace the current function with a simplified function that produces the same result for a particular point. Of course we cannot replace the function with another that will produce the same results at all points, or even expect to find one that would produce the same results at multiple points. But we will be able to find a function that produces the same result at a given point.

Mathematically, this will be the tangent line to the curve at that point. At that point, the tangent line intersects the curve and has the same slope as the curve. We will determine the slope by differentiating the function, and we will then determine the tangent line using the point-slope equation for the line as a starting point (to me, this is the most intuitive way to write the equation of a line, and if necessary we can simplify later). The point-slope equation of a line is:

y – y1 = m(x – x1)

where x1 and y1 are the x and y values for the point in question, and m is the slope of the curve at x1.

I’m going to switch to referring to the Pythagorean exponent as “a”, so that it doesn’t get confused with x, our independent variable (which is run ratio). So if we want the tangent line for the equation WR = RR^a, we first differentiate with respect to run ratio to get:

dWR/dRR = a*RR^(a – 1)

Now we just need to determine x1 and y1. Since we are going to be applying simplified win estimation formulas across the entire spectrum of possible team performance, it makes the most sense to look at a team with R = RA, which we expect to have a .500 W%. Picking the average will likely result in the most accurate simplified equation over the entire spectrum of teams.

Of course, by simplifying the equation, we will lose accuracy (at least when the result of our simplified equation is compared to the “parent” equation – we hope in this case that the Pythagorean form is more accurate, or else the entire premise of Akousmatikoi win estimators is moot). However, the simplified equation will match the parent equation precisely at the chosen point, and will produce very similar results near the chosen point, so picking a point in the center of the distribution should maximize accuracy.

So, if R = RA, then RR equals one, and so our slope is simply equal to a, which is the Pythagorean exponent. Our x value is RR, which is 1, and our y is the WR corresponding to a RR of 1, which is 1 for any value of a as WR = RR^a. So in point-slope form:

y – 1 = a*(x – 1)

which can simplify to

y – 1 = a*x – a

y = a*x – a + 1

Remembering what y and x represent in this case:

WR = a*RR – a + 1

For a fixed Pythagorean exponent a = 2:

WR = 2*RR – 2 + 1 = 2*RR - 1

This relationship suggests that if a team scores 10% more runs than it allows, it should win 20% more games than it loses. In the 1984 Baseball Abstract, Bill James wrote:

Another method that I have never tested but which I suspect would work as well as the others would be just to “double the edge”; that is, if a team scores 10% more runs than their opponents, they should win 20% more games than their opponents. If they score 1% more runs, they should win 2% more games. That method would probably work as well or better than the Pythagorean approach.

To my knowledge that’s the extent of James’ writings on this subject, so I can’t say whether he either explicitly or implicitly inferred “double the edge” from the Pythagorean formula, or whether he came across it some other way. Either way, it can be directly related back to his own Pythagorean method.

If WR = a*RR – a + 1, and we already know that by definition W% = WR/(WR + 1), then we can convert this into a W% estimate as:

W% = (a*RR – a + 1)/(a*RR – a + 1 + 1) = (a*RR – a + 1)/(a*RR – a + 2)

For the special case of a = 2, this becomes:

W% = (2*RR – 2 + 1)/(2*RR – 2 + 2) = (2*RR – 1)/(2*RR) = 1 – 1/(2*RR) = 1 – 1/(2*R/RA)

= 1 - RA/(2*R)

This special case was noted by Bill Kross, and got a brief callout in The Hidden Game of Baseball. Kross also noticed that this method would not produce the same result for teams that had inverse runs and runs allowed. A team that scores 5 and allows 4 runs would have an estimated W% of 1 - 4/(2*5) = .600, but a team that scores 4 and allows 5 would have an estimated W% of 1 – 5/(2*4) = .375.

So Kross proposed that for the case in which runs scored < runs allowed, the W% would be estimated as R/(2*RA), which would produce 4/(2*5) = .400 for the team scoring 4/allowing 5. Not only is it satisfying to get a consistent result for the two sides of the same coin, this modification significantly improves the accuracy when empirically comparing estimated to actual W%s.

Expressing this inversion in terms of the general case above, in a case where R < RA, the estimated WR would be:

WR = 1/(a*1/RR – a + 1) = 1/(a/RR – a + 1)

and the W% would be:

W% = 1/(a/RR – a + 1)/(1/(a/RR – a + 1) + 1)

There are some ways to make that look nicer, but I don’t think any of them are sufficiently nice to bother with here. For the specific case when a = 2, Ralph Caola has suggested this formula as a clean way to boil the Kross equations down to one line:

W% = (R - RA)/(R + RA + ABS(R - RA)) + .5
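
A quick sketch confirming that Caola’s one-line form agrees with the two-branch Kross equations for the 5/4 and 4/5 teams above:

def kross(r, ra):
    return 1 - ra / (2 * r) if r > ra else r / (2 * ra)

def caola(r, ra):
    return (r - ra) / (r + ra + abs(r - ra)) + 0.5

for r, ra in [(5, 4), (4, 5)]:
    print(kross(r, ra), caola(r, ra))   # .600 and .400 from both forms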

You might be reading this and objecting “I thought you were going to simplify the Pythagorean relationship, but nothing about the equation with all of those reciprocals above looks simpler”. That is true – other than the special case when a = 2 and the Kross equations apply, this is not an easier way to calculate an estimated winning percentage provided you have a modern calculator or computer. However, it is “simpler” mathematically in the sense that we have eliminated exponents. Of course, in so doing we have lost some accuracy, particularly for extreme cases. Next time, instead of starting with run ratio, we’ll start with run differential and see what shakes out of Pythagorean and how it compares to methods that have been developed independently of Pythagorean.

Wednesday, December 16, 2020

Akousmatikoi Win Estimators, pt. 1: Pythagorean

This series will be a brief review of the Pythagorean methodology for estimating team winning percentage from runs scored and runs allowed, and will examine a number of alternative winning percentage estimators that can be derived from the standard Pythagorean approach. I call it a “review” because I will not be presenting any new methods – in fact, not only was everything I plan to cover discovered and published by other sabermetricians, but it is all material that I have already written about in one form or another. When recently posting old articles from my Tripod site, I saw how poorly organized the section on win estimators was, and decided that I should try to write a cleaner version that focuses on the relationship between the Pythagorean approach and other mathematical forms for win estimators. This series will start from the assumption that Pythagorean is a useful model; I don’t think this is a controversial claim but a full treatment would need to establish that before jumping into mathematical offshoots.

By christening his win estimator the “Pythagorean Theorem” due to the three squared terms in the formula reminding him of the three squared terms Pythagoras discovered defined the dimensions of right triangles, Bill James made it irresistible for future writers to double down with even more ridiculous names. I am sure any students of Greek philosophy are cursing me, but I am calling this the “Akousmatikoi” family of win estimators because Wikipedia informs me that Akousmatikoi was a philosophical school that was a branch of the larger school of Pythagoreanism based on the teachings of Pythagoras. A rival branch, the Mathematikoi school, was more focused on the intellectual and mathematical aspects of Pythagorean thought, which would make it a better name for my purposes, but even I think that sounds too ridiculous. I’ve also jumbled the analogy, as James’ Pythagorean theorem is the starting point for the Akousmatikoi family of estimators, but Pythagoras begat this school of philosophy, not the other way around. Of course, James’ Pythagorean theorem really has nothing to do with Pythagoras to begin with, so don’t think too hard about this.

Before I get started, I want to make certain that I am very clear that I’m introducing nothing new and that while I will derive a number of methods from Pythagorean, the people who originally discovered and published these methods used their own thought processes and ingenuity to do so. They did not simply derive them from Pythagorean. I will try to namecheck them throughout the series, but will also do it here in case I slip up – among the sabermetricians who developed the methods that I will treat as Pythagorean offshoots independently are Bill Kross, Ralph Caola, and Ben Vollmayr-Lee.

I also want to briefly address the win estimators that are in common use that are not part of what I am calling the Akousmatikoi family. The chief one that I use is Cigol, which is my implementation of a methodology that starts with an assumed run distribution per game and calculates W% from there (I say “calculates” rather than “estimates” because given the assumptions about per game and per inning run distribution functions, it is a logical mathematical derivation, not an estimate. Of course, the assumptions are just that). Cigol is very consistent with the results of Pythagenpat for teams across a wide range of scoring environments, but is its own animal. There are also approaches based on regression that offer non-Akousmatikoi paths to win estimates. If you regress on run differential or run ratio, your results will look similar to Akousmatikoi, but if you take the path of Arnold Soolman’s pioneering work and regress runs and runs allowed separately, or you use logistic regression or another non-linear methodology, your results won’t be as easily relatable to the Akousmatikoi methods.

It all starts with Pythagorean, which Bill James originally formulated as:

W% = R^2/(R^2 + RA^2)

The presence of three squared terms reminded James of the real Pythagorean theorem for the lengths of the sides of a right triangle (A^2 = B^2 + C^2) and gave us the charmingly wacky name for this method of win estimation. James would later complicate matters by noting that a lower exponent resulted in a slight increase in accuracy:

W% = R^1.83/(R^1.83 + RA^1.83)

Later research by Clay Davenport and Keith Woolner would demonstrate that a custom exponent, varying by run environment, would result in better accuracy in extreme situations. Pete Palmer had long before demonstrated that his linear methods increased in accuracy when considering run environment; “Pythagenport” brought this insight to Pythagorean, which we’ll now more generally express as:

W% = R^x/(R^x + RA^x)

Where Pythagenport estimates x = 1.5*log(RPG) + .45, where RPG = (R + RA)/G

Davenport and Woolner stated that the accuracy of Pythagenport was untested for RPG less than 4. A couple years later, David Smyth had the insight that 1 RPG was a situation that could only occur if the score of each game was 1-0, and that such a team’s W% would by definition be equal to R/(R + RA). This implies that the Pythagorean exponent must be 1 when RPG = 1. Based on this insight, Smyth and I independently developed a modified exponent which was constructed as:

x = RPG^z

where z is a constant generally in the range of .27 - .29 (I originally published as .29 and have tended to use this value out of habit, although if you forced me to pick one value and stick to it I’d probably choose .28)

This approach produced very similar results to Pythagenport for the RPG ranges tested by Davenport and Woolner, and returned the correct result for the known case at RPG = 1. It has come to be called “Pythagenpat”.
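
For reference, minimal implementations of the two exponent formulas (z = .282 is assumed here, and the 5 R/4 RA team is hypothetical):

import math

def pythagenport_x(rpg):
    return 1.5 * math.log10(rpg) + 0.45

def pythagenpat_x(rpg, z=0.282):
    return rpg ** z

def w_pct(r, ra, x):
    return r**x / (r**x + ra**x)

rpg = 5.0 + 4.0
print(w_pct(5.0, 4.0, pythagenport_x(rpg)), w_pct(5.0, 4.0, pythagenpat_x(rpg)))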

Using Cigol, I tried to develop a refined formula for the Pythagorean exponent using data for truly extreme teams. I loosened the restriction requiring x = 1 when RPG = 1 to be able to consider a wider range of models, but I wasn’t able to come up with a version that produced superior accuracy to the standard Pythagenpat construction on a large dataset of actual major league team-seasons. My favorites of the versions I came up with are below, which I won’t dwell on any longer but will revisit briefly at the end of the series. The first is a Pythagenpat exponent that produces a Pythagorean exponent of 1 at 1 RPG; the second is a Pythagorean exponent that does not adhere to that restriction.

z = .27348 + .00025*RPG + .00020*(R - RA)^2

x = 1.03841*RPG^.265 + .00114*RD^2

There are several properties of a Pythagorean construct that make it better suited as a starting point (standing in for the “true” W% function, if there could ever be such a thing) than some of the other methods we’ll look at. I have previously proposed a list of three ideal properties of a W% estimator:

1. The estimate should fall in the range [0,1]
2. The formula should recognize that the marginal value of runs is variable.
3. The formula should recognize that as more runs are scored, the number of marginal runs needed to earn a win increases.

As we move throughout this series, we will make changes to simplify the Pythagenpat function in some ways; in my notes I called it “flattening”, but that’s not a technical term. Basically, where we see exponents, we will try to convert into multiplication, or we will try to use run differential in place of run ratio. As we “flatten” the functions out, we will progressively lose some of these ideal properties, with the (usual) benefit of having simpler functions.

Throughout this series I will make sporadic use of the team seasonal data for the expansion era (1961 – 2019), so at this point I want to use this dataset to define the Pythagorean constants that we’ll use going forward. Rather than using any formulaic approach, I am going to fix x and z for this period by minimizing the RMSE of the W% estimates for the teams in the dataset. I will also use the fixed Pythagorean exponent of 2 throughout the series as it is easy to calculate, reasonably accurate, widely used, and mathematically will produce some pleasing results for the other Akousmatikoi estimators.

Using this data, the average RPG is 8.83, the value for x that minimizes RMSE is 1.847, and the z value that minimizes RMSE is .282. Note that if we used the average RPG to estimate the average Pythagorean exponent, we’d get 1.848 (8.83^.282), which doesn’t prove anything but at least it’s not way off.
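
The fitting step itself is mechanical; a sketch of how the best-fit z could be found by minimizing RMSE, with hypothetical rows standing in for the real dataset:

import numpy as np
from scipy.optimize import minimize_scalar

r = np.array([4.8, 4.2, 5.1])
ra = np.array([4.3, 4.6, 4.4])
w = np.array([0.540, 0.470, 0.560])

def rmse(z):
    x = (r + ra) ** z
    est = r**x / (r**x + ra**x)
    return np.sqrt(np.mean((w - est) ** 2))

print(minimize_scalar(rmse, bounds=(0.2, 0.35), method="bounded").x)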

Thursday, December 03, 2020

Palmerian Park Factors

The first sabermetrician to publish extensive work on park effects was Pete Palmer. His park factors appeared in The Hidden Game of Baseball and Total Baseball and as such became the most-widely used factors in the field. They continue in this role thanks to their usage at Baseball-Reference.com.

Broadly speaking, all ratio-based run park factors are calculated in the same manner. The starting point is the ratio of runs at home to runs on the road. There are a number of different possible variations; some methods use runs scored by both teams, while others (including Palmer’s original methodology in The Hidden Game) use only the runs scored by one of the teams (home or road). The opportunity factor can also vary slightly; many people just use games, which is less precise than innings or outs. The variations are mostly technical rather than philosophical in nature, so they rarely get a lot of attention. Park factor calculations are accepted at face value to an extent that other classes of sabermetric tools (like run estimators) are not.

Among park factors in use, the actual computations in Palmer’s are the most distinctive, so I thought it would be worthwhile to walk through his methodology (as stated in Total Baseball V) and discuss its properties.

In the course of this discussion, I will primarily focus on the actual calculations and not the inputs. Where Palmer has made choices about what inputs to use, how to weight multiple seasons, and the like, I will not dwell, as I’m more interested in the aspects of the approach that can be applied to one’s own choice of inputs. Palmer uses separate park factors for batting and pitching (more on this later); I’ll focus on the batting ones here.

Palmer generally uses three years of data, unweighted, as the basis for the factors. There are some rules about which years to use when teams change parks, but those are not relevant to this discussion. The real meat of the method starts by finding the total runs scored and allowed per game at home and on the road, which I’ll call RPG(H) and RPG(R).

i (initial factor) = RPG(H)/RPG(R)

I will be using the 2010 Colorado Rockies as the example team here, considering just one year of data to keep things simple. Colorado played 81 games both home and away, scoring 479 and allowing 379 runs at home and scoring 291 and allowing 338 on the road. Thus, their RPG(H) = (479 + 379)/81 = 10.593 and RPG(R) = (291 + 338)/81 = 7.765. That makes i = 10.593/7.765 = 1.364 (I am rounding to three places throughout the post, which will cause some rounding discrepancies with the spreadsheet from which I am reporting the results).

The next step is to adjust the initial factor for the number of innings actually played rather than just using games as the denominator. This step can be ignored if you begin with innings or outs as the denominator rather than using games. The Innings Pitched corrector is:

IPC = (18.5 - Home W%)/(18.5 - (1 - Road W%))

Palmer explains that 18.5 is the average number of innings batted per game if the home team always bats in the ninth inning. Teams that win a higher percentage of games at home bat in fewer innings due to skipping the bottom of the ninth. The IPC seems to assume that in all games won by the home team, they do not bat in the bottom of the ninth.

Colorado was 52-29 (.642) at home and 31-50 on the road (.383), so their IPC is:

IPC = (18.5 - .642)/(18.5 - (1 - .383)) = .999

The initial factor is divided by the IPC to produce what the explanation refers to as Run Factor:

RF = i/IPC

For the Rockies, RF = 1.364/.999 = 1.366
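
If you want to script the steps so far, a minimal sketch (using the Colorado figures from above; the function names are mine) might look like this:

def initial_factor(r_home, ra_home, r_road, ra_road, g_home, g_road):
    # i = RPG(H)/RPG(R), using both teams' runs per game
    rpg_h = (r_home + ra_home) / g_home
    rpg_r = (r_road + ra_road) / g_road
    return rpg_h / rpg_r

def innings_corrector(home_wpct, road_wpct):
    # IPC = (18.5 - Home W%)/(18.5 - (1 - Road W%))
    return (18.5 - home_wpct) / (18.5 - (1 - road_wpct))

# 2010 Rockies
i = initial_factor(479, 379, 291, 338, 81, 81)   # ~1.364
ipc = innings_corrector(52 / 81, 31 / 81)        # ~0.999
rf = i / ipc                                     # ~1.366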

The next step is the Other Parks Corrector (OPC). The OPC “[corrects] for the fact that the other road parks’ total difference from the league average is offset by the park rating of the club that is being rated.” The glossary explanation may be confusing, but the thought process behind it is pretty straightforward--a team’s own park makes up part of the league road average, but none of the team’s own road games are played there. Without accounting for this, all parks would be estimated to be more extreme than they are in reality.

The OPC is figured in the same manner as I do it in my park factors; I borrowed it from Craig Wright’s work without realizing that Palmer had done the same mathematical operation, but its derivation is fairly obvious and the equivalent appears in multiple park factor approaches. Let T equal the number of teams in the league:

OPC = T/(T - 1 + RF)

Basically, OPC assumes a balanced schedule, so that half of each team’s games are played in its own park (hence the use of RF) and half in the other parks, of which there are T - 1. For a sixteen-team league (and thus Colorado 2010):

OPC = 16/(16 - 1 + 1.366) = .978

The next step is to multiply RF by OPC, producing scoring factor:

SF = RF*OPC

For Colorado, SF = 1.366*.978 = 1.335
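
Continuing the sketch from above (again, my own function name, with the Rockies’ run factor hardcoded):

def other_parks_corrector(rf, t):
    # OPC = T/(T - 1 + RF), assuming a balanced schedule of T teams
    return t / (t - 1 + rf)

rf = 1.366                            # 2010 Rockies run factor from above
opc = other_parks_corrector(rf, 16)   # ~0.978
sf = rf * opc                         # ~1.335, the scoring factor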

If all you were interested in was an adjustment factor for the park’s effect on scoring, rather than specific adjustments for the team’s batters and pitchers, this would be your stopping point. The scoring factor is the final park factor in that case, and with the exception of the Innings Pitched Corrector, it is equivalent to the approach used by Craig Wright, myself, and many others. (My park factors are then averaged with one to account for the fact that only half of the games for a given team are at home, but that only obscures the fact that the underlying approach is identical, and Palmer accounts for that consideration later in his process).

It is when the other factors are adjusted for that the math gets a little more involved. The first step is to calculate SF1, which is an adjustment to scoring factor:

SF1 = 1 - (SF - 1)/(T - 1)

For the 2010 Rockies:

SF1 = 1 - (1.335 - 1)/(16 - 1) = .978

While I am writing this explanation, I must stress that it is a walkthrough of another person’s method. I cannot fully explain the thought process behind it and justify every step. I decided to include that disclaimer at this point because I don’t understand what the purpose of SF1 is, as it is mathematically equivalent to OPC (substituting SF = RF*OPC = RF*T/(T - 1 + RF) into the definition gives SF1 = (T - SF)/(T - 1) = T/(T - 1 + RF), which is just OPC again). Why it needed to be defined again and in a more obtuse way is beyond me.

In any event, the purpose of SF1 is to serve as a road park factor. If a team plays a balanced schedule and we take as a given that the overall league park factor should be equal to one, but we determine that its own park has a PF greater than one (favors hitters), then it must be the case that the road parks they play in have a composite PF less than one. That’s the function of the OPC, and of SF1. For the rest of this post, I will refer to OPC rather than SF1 when it is used in formulas.

Palmer uses separate factors for batters and pitchers to account for the fact that a player does not have to face his teammates. By doing so, Palmer’s park factors make their name something of a misnomer, as they adjust for things other than the park. (One could certainly make the case that the park factor name is constantly misapplied in sabermetrics, as we can never truly isolate the effect of the park, and the sample data we have is affected by personnel and other decisions. Palmer takes it a step further, though, by accounting for things that have nothing to do with the park.) The park effect is generally stronger than the effect of not facing one’s own teammates, since a team plays in its park half the time but only misses out on facing its own pitchers about 1/T of the time, assuming a balanced schedule.

One can argue about the advisability of adjusting for the teammate factor at all, and if so, it certainly is debatable whether it should be included in the park factor or spun off as a separate adjustment. I would find the latter to be a much better choice that would result in a much more meaningful set of factors (both for the park and for teammates), but Palmer chose the former.

The separate factors are calculated through an iterative process. One must know the R/G scored and allowed at home and away for the team (I’ll call these RG(H), RG(R), RAG(H), and RAG(R) for R/G at home, R/G on the road, RA/G at home, and RA/G on the road respectively). Additionally, one must know the average RPG for the entire league (which I’ll call 2*N, since I often use N to denote league average runs/game for one team). For the Rockies, the data above gives RG(H) = 5.913, RG(R) = 3.593, RAG(H) = 4.679, and RAG(R) = 4.173; the league average was N = 4.33.

The iterative process is used to calculate a team batter rating (TBR) and a team pitcher rating (TPR, not to be confused with Total Player Rating, another Palmer acronym). These steps are necessary since the strength of each unit cannot be determined wholly independently of the other. Just as the total impact of the park goes beyond the effect on home games and into road games (necessitating the OPC, though not with enough magnitude to swamp the home factor), so the ratings of the two units depend on one another.

The first step of the process assumes that the pitching staff is average (TPR = 1). Then the TBR can be calculated as:

TBR = [RG(R)/OPC + RG(H)/SF]*[1 + (TPR - 1)/(T - 1)]/(2*N)

For Colorado:

TBR = [3.593/.978 + 5.913/1.335]*[1 + (1 - 1)/(16 - 1)]/(2*4.33) = .936

The first bracketed portion of the equation adds the team’s park-adjusted R/G at home and on the road, using the home (SF) or road (OPC) park factor as appropriate. The second set of brackets multiplies the first by the adjustment for not facing the team’s own pitching staff. The difference between the team’s pitching and average (TPR - 1) is divided by (T - 1) since, with a truly balanced schedule, the team would only face those pitchers in 1/(T - 1) of its games anyway. If the team’s pitchers are above average (TPR < 1), then the correction will decrease the estimate of team batting strength, since the pitchers the batters actually faced were collectively a bit weaker than the league average.

Then the entire quantity is divided by double the league average of runs scored per game by a single team. This may seem confusing at first glance, but it is only because the first bracketed portion did not weight the home and road R/G by 50% each. The formula could be written as the equivalent:

TBR = [.5*RG(R)/OPC + .5*RG(H)/SF]*[1 + (TPR - 1)/(T - 1)]/N

In this case, it is much easier to see that the first bracket is the average runs scored per game by the team, park-adjusted. Dividing this by the league average results in a very straightforward rating of runs scored relative to the league average. Both TBR and TPR are runs per game relative to the league average, but in both cases they are constructed as team figure/league figure. This means that the higher the TBR, the better, with the opposite being true for TPR.

The pitcher rating can then be estimated using the actual TBR just calculated, rather than assuming that the batters are average:

TPR = [RAG(R)/OPC + RAG(H)/SF]*[1 + (TBR - 1)/(T - 1)]/(2*N)

For the Rockies, this yields:

TPR = [4.173/.978 + 4.679/1.335]*[1 + (.936 - 1)/(16 - 1)]/(2*4.33) = .894

The first estimate is that the Rockies batters, playing in a neutral park and against a truly balanced schedule, would score 93.6% of the league average R/G. Rockies pitchers would be expected to allow 89.4% of the league average R/G. At first when I was performing the sample calculations, I thought I might have made a mistake since the Colorado offense was evaluated as so far below average, but I was forgetting that they would appear quite poor using a one-year factor for Coors Field. My instinct as to what Colorado’s TBR should look like was informed by my knowledge of the five-year PF that I use.

Now the process is repeated for three more iterations, each time using the most recently calculated value for TBR or TPR as appropriate. The second iteration calculations are:

TBR = [3.593/.978 + 5.913/1.335]*[1 + (.894 - 1)/(16 - 1)]/(2*4.33) = .929

TPR = [4.173/.978 + 4.679/1.335]*[1 + (.929 - 1)/(16 - 1)]/(2*4.33) = .893

Repeating the loop two more times should ensure pretty stable values. Theoretically, there’s nothing to stop you from setting up an infinite loop. I won’t insult your intelligence by spelling out the second pair of iterations suggested by Palmer, but the final results for Colorado are a TBR of .929 and a TPR of .893.
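
Here is a sketch of the iterative process as I read it, with my own function and variable names; plugging in the Colorado inputs from above reproduces the published figures to within rounding:

def team_ratings(rg_h, rg_r, rag_h, rag_r, sf, opc, n, t, iterations=4):
    # start by assuming the pitching staff is average (TPR = 1)
    tbr, tpr = 1.0, 1.0
    for _ in range(iterations):
        tbr = (rg_r / opc + rg_h / sf) * (1 + (tpr - 1) / (t - 1)) / (2 * n)
        tpr = (rag_r / opc + rag_h / sf) * (1 + (tbr - 1) / (t - 1)) / (2 * n)
    return tbr, tpr

# 2010 Rockies: R/G and RA/G at home and on the road, SF and OPC from above
tbr, tpr = team_ratings(5.913, 3.593, 4.679, 4.173,
                        sf=1.335, opc=0.978, n=4.33, t=16)
# tbr ~ .929, tpr ~ .893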

So far, all we’ve done is figure teammate-corrected run ratings for team offense and defense. Our estimate of the park factor remains stuck back at the scoring factor. All that’s left now is correcting the scoring factor for 1) the teammate factor and 2) the fact that a team plays half its games at home and half on the road. This will require two separate formulas--a batter’s PF (BPF) and a pitcher’s PF (PPF), which are twins in the same way that TBR and TPR are:

BPF = (SF + OPC)/[2*(1 + (TPR - 1)/(T - 1))]

PPF = (SF + OPC)/[2*(1 + (TBR - 1)/(T - 1))]

For Colorado:

BPF = (1.335 + .978)/[2*(1 + (.893 - 1)/(16 - 1))] = 1.165

PPF = (1.335 + .978)/[2*(1 + (.929 - 1)/(16 - 1))] = 1.162

The logic behind the two formulas should be fairly obvious at this point. Again, the multiplication by two in the denominator arises because the two factors in the numerator were not weighted at 50% each. You can write the formulas as:

BPF = (.5*SF + .5*OPC)/[1 + (TPR - 1)/(T - 1)]

PPF = (.5*SF + .5*OPC)/[1 + (TBR - 1)/(T - 1)]

Here, you can see more clearly that the numerator averages the home park factor (SF) and the road park factor (OPC). The numerator could be your final park factor if you wanted to account for road park but didn’t care about teammate effects. The denominator is where the teammate effect is taken into account, and in the same manner as in TBR and TPR. If TPR > 1, BPF goes down because the batters did not benefit from facing the team’s poor pitching. If TBR > 1, PPF goes down because the pitchers benefitted from not facing their batting teammates.
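
To round out the sketch, the final step might look like this (my function name; the Colorado values are hardcoded from above):

def park_factors(sf, opc, tbr, tpr, t):
    # average the home (SF) and road (OPC) factors, then adjust for teammates
    bpf = (sf + opc) / (2 * (1 + (tpr - 1) / (t - 1)))
    ppf = (sf + opc) / (2 * (1 + (tbr - 1) / (t - 1)))
    return bpf, ppf

bpf, ppf = park_factors(1.335, 0.978, tbr=0.929, tpr=0.893, t=16)
# bpf ~ 1.165, ppf ~ 1.162 for the 2010 Rockies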

If one does not care to incorporate the teammate effect, then Palmer’s park factors are pretty much the same as any other intelligently designed park factors. This should not come as a surprise, because as with many implementations of classical sabermetrics, Palmer’s influence is towering. The iterative process used to generate the batting and pitching ratings is pretty clever, and incorporates a real albeit small effect that many of us (myself included) often gloss over.

Wednesday, November 04, 2020

Musings on Positional Adjustments

This is an old post that I never published. It is sort of an attempt to justify why I use offensive positional adjustments, which is an even more dated position today than it was when I wrote it. In re-reading it, though, I thought my comments about zero-level defense were at least somewhat pertinent (if not particularly insightful) given Bill James' current effort at developing "Runs Saved Against Zero".

This post is not intended to be a comprehensive discussion of the issue of position adjustments; it will just quickly sketch out a system to classify adjustments and then I’ll offer a few of my opinions on them. There is a lot more that could be said and most of it could and has been said more eloquently by others.

The most important technical distinction between position adjustments (which I’ll shorten to PADJ sometimes) is which type of metric is used to set them--offensive or defensive. This distinction is well-known and gets a lot of attention. One that is talked about less is the difference between explicit and implicit position adjustments, and while people who get their hands dirty with various rating systems are well aware of implicit position adjustments, the average reader presented with a metric might gloss over them.

Explicit position adjustments are obvious and are acknowledged as being position adjustments. The first well-known example of their usage was in Pete Palmer’s linear weights system. They have also been used in VORP, just about every implementation of WAR, and many other metrics.

Implicit position adjustments usually crop up in the work of Bill James, although there are other metrics out there that utilize them. An implicit position adjustment is not really implicit in the truest sense of the word--they are obviously position adjustments if you look at them and consider what their function is. James likes to hide them in his fielding systems.

James’ metrics have always attempted to measure absolute wins and losses. I’ve always maintained that this is a fool’s errand, and that absolute wins and losses only make sense on the team level, not the player level. Most sabermetricians are in general agreement on this, and construct systems that are built to yield results against some baseline.

This is especially true for defensive metrics, whether for pitching or fielding. Absolute metrics (such as runs created) are tempting to apply to individual batters because there is a theoretical minimum on the number of runs a player can create (zero, of course), and such a performance represents the worst possible performance. There is no such cap on the poor performance of a defense; a team could allow an infinite number of runs. The only real cap on the poor performance of an individual fielder is the number of balls that are hit to the locations on the field that fall under his responsibility.

As such, it is impossible to develop a true zero baseline metric to evaluate pitchers or fielders (one can certainly argue that it’s impossible for batters as well, but the existence of the theoretical floor makes it undeniably more tempting). You have to start by comparing to a non-zero baseline (average being the most straightforward), but the problem is compounded for fielders by the fact that it’s also impossible to directly compare fielders at different positions. The fielding standards, whether in fielding average, range factor, or more complex methods, vary wildly from one position to another. While all fielders have the same objective (record outs and prevent the opponent from scoring), the primary ways in which fielders at different positions contribute to the common goal are very different.

That pretty much leaves comparing a player to the average fielder at his position as the only viable starting point for the developer of a fielding metric. As is, the results are not satisfactory for inclusion in a total value metric, because they implicitly assume that an average fielder at any position is equal in value to an average fielder at any other position.

There is no one with any degree of baseball knowledge that believes this to be true. Everyone agrees that an average shortstop is harder to find than an average first baseman--that is, the pool of available players that can adequately field shortstop is much smaller than the pool of adequate available first basemen. This basic truth is sometimes obfuscated by silly hypotheticals (i.e. “if you didn’t have a catcher, every pitch with a runner on base would be a passed ball” and “without a first baseman, it would be nearly impossible to convert a groundball into an out”), but serious people agree on this.

So what is one to do about this problem? You have to do something--you cannot have a functioning estimate of total value that pretends first basemen and shortstops are equal in fielding value. The easiest answer is a position adjustment.

While James attempts to express all of his value metrics relative to an absolute baseline, he of course can’t pull off a clean implementation. His solution, in both his early 1980s Defensive Winning Percentage and his more recent Win Shares, is to develop a fielding winning percentage for each position and convert this to wins and losses (the terminology and procedure is a little different in Win Shares but that’s a long story).

To make the conversion from a rate to a total of wins and losses, James multiplies by a number of games for which each position is assumed to be responsible. Positions on the left side of the defensive spectrum are assigned less responsibility than those on the right side…and thus this is an implicit position adjustment.

In pointing this out, I don’t mean to suggest that James is in any way dishonest in describing his systems--the assigned games are clearly defined in the system and aren’t hidden. The characterization I’ve offered of these adjustments as “implicit” is therefore not really accurate. The real difference between James-style position adjustments and the ones I’ve defined as “explicit” is that explicit adjustments either add or subtract a set number of runs dependent upon a player’s position or apply a different baseline to their rate in converting to a value stat.

The other major characteristic that defines a position adjustment’s type is whether it is an offensive PADJ or a defensive PADJ. The categories are not black and white--many positional adjustments incorporate subjective weighting of various factors, which could include offensive performance by players at a position, the range of offensive performance, the performance of fielders at multiple positions, comparisons of average salary as a stand-in for how teams value players, subjective corrections that the developer feels better matches the results of the system to common sense--but usually the primary basis can be identified as either offensive or defensive.

Offensive position adjustments have fallen out of favor recently, although there are still some people using them (including me). The offensive PADJ originated with Pete Palmer, who used it as part of his linear weights system. The other most prominent use came in Keith Woolner’s VORP.

Defensive positional adjustments are a more recent phenomenon, but are key to both the Fangraphs and Chone WAR methodology. Tango Tiger was the driving force behind their development, and Chone has also done his own research to establish the adjustments for his WAR.

Before deciding how to construct a position adjustment, it’s a good idea to take a step back and figure out why you need a position adjustment at all. Taking the reason behind your metric for granted is a path to just slapping numbers around indiscriminately and failing to model baseball reality. From my perspective, the only real reason that a PADJ is necessary is that it is essentially impossible to measure a player’s fielding value independent of his position. Therefore, one has to have a way of comparing the value of fielding performances across positions--a position adjustment.

A common misperception regarding all position adjustments among people not that well-versed in sabermetrics is that they provide a bonus “just for playing the position”. While I suppose that might be technically true in the sense of calculation, the underlying need for such an adjustment is discussed above. If one does not believe in applying a positional adjustment, and accepts the use of defensive metrics baselined to an average fielder at the position, then they must conclude that, as a group, the most valuable players are those at left-side of the spectrum positions. Or, in other words, that the overall average value of players at a given position is strictly a function of their aggregate offensive production.

It is possible to complicate the question of position adjustments by talking about baselines (particularly replacement level) and other considerations, but at the heart of the issue is the need to compare the value of a shortstop who is -5 fielding runs relative to an average shortstop, a third baseman who is +10 runs relative to an average third baseman, and an average first baseman.

Such a viewpoint suggests that a defensive PADJ is the way to go, since the sole reason for needing the adjustment is consideration of defense. So while the overwhelming positive of a defensive PADJ is that it is defined in the terms that necessitate the entire endeavor, it also carries a few negatives.

One is the difficulty of accurately evaluating fielding value, even within the constraints of one’s own position. While it is quite possible that any biases or methodological errors will balance out when aggregated over a large population of players, it would nonetheless be more comforting to begin from metrics in which one had a great deal of confidence.

Another key issue is the pool of players who log time at multiple positions: while it is relatively large when comparing similar position groups (particularly outfielders, but also middle infielders, corner infielders, etc.), there is a much smaller available sample of players who play very different positions, at least in the same or adjacent seasons. And catcher? Forget about it--Bill James even left catcher off of the defensive spectrum due to the difficulty of comparing it directly to the positions whose occupants stand in fair territory.

Players that move positions introduce all kinds of selective sampling issues as well. Consider the problem of comparing positions where left-handers are de facto ineligible, and the fact that players moved off a position are more likely to have been stopgaps. For a more complete discussion of these issues (and an all-around good discussion of PADJ issues), see Colin Wyers’ article at Baseball Prospectus.

Thus, to avoid strange conclusions, defensive position adjustments are always going to require a little subjective massaging. That’s not necessarily a bad thing--the construction of any metric requires subjective decisions made on the part of the developer--but it makes them inherently high maintenance.

Of course, offensive position adjustments are best employed with a measure of caution as well. The pros of offensive adjustments are that they are very easy to work with. Offensive statistics are more reliable than fielding statistics, require much more basic data to calculate, and are available throughout the entire history of the game. Rather than having to compare performance of players across positions, one can at least start by simply looking at the average performance of all players at a particular position.

An offensive PADJ implicitly assumes that teams allocate their talent in such a manner that the average player at any position is equal to the average player at any other position--alternatively stated, that the offensive value gap between positional averages is equal to the defensive value gap. This is certainly never 100% truly the case for any sample period, particularly for single years. Offensive PADJs based on one year of data or other short stretches should be viewed with a great deal of skepticism.
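
As an illustration only, here is one minimal way an offensive PADJ could be computed; the data layout (position, runs created, outs for each player-season) and the choice of runs per out as the rate are my assumptions, not Palmer’s or Woolner’s exact procedure:

from collections import defaultdict

def offensive_padj(player_seasons):
    # player_seasons: iterable of (position, runs_created, outs) tuples,
    # ideally pooled over several seasons to smooth out single-year noise
    pos_runs, pos_outs = defaultdict(float), defaultdict(float)
    lg_runs, lg_outs = 0.0, 0.0
    for pos, rc, outs in player_seasons:
        pos_runs[pos] += rc
        pos_outs[pos] += outs
        lg_runs += rc
        lg_outs += outs
    lg_rate = lg_runs / lg_outs
    # PADJ = positional average runs per out relative to the league average
    return {pos: (pos_runs[pos] / pos_outs[pos]) / lg_rate for pos in pos_runs}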

Another problem lurking is what Wyers, in the linked article, refers to as the “Mays problem”--the existence of supremely talented players that excel at both hitting and fielding. Such players might be superstars at any position (ignoring handedness and other impediments), even first base, thanks to their hitting alone but are able to handle the defensive rigors of right-side defensive spectrum positions. While more ordinary players offer a package of offensive and defensive skills that limits their possible fielding positions commensurate to their offensive production, these players are playable anywhere. There are also potential issues with the very worst players at a position.

The Mays problem skews offensive positional averages, so Wyers proposes using an alternative offensive PADJ that adjusts the overall positional average for the gap between the upper and lower median of observed performance at the position. This approach (and other similar algorithms that could be offered) is novel but involves subjective choices similar to those necessitated by defensive PADJs.

The offensive PADJ will surely fail at lower levels of baseball thanks to the Mays problem--the best high school players, for instance, are often the cleanup hitter, ace pitcher, and center fielder or shortstop when not pitching. Such all-around stars are also more common in college ball or in earlier, less developed major leagues than they are in the modern major leagues with their high overall quality of play and relatively strong competitive balance. An offensive PADJ approach will surely break down at those low levels without serious alterations.

There are other relevant issues to discuss with respect to position adjustments, such as their relationship to replacement level and the manner in which they are applied (that is, whether they should be used to change the baseline to which performance is compared or assigned as a lump sum based on playing time; the possible answers to this question are also closely tied to one’s choice of offensive or defensive adjustment), but those will have to wait for some other time.