Saturday, March 06, 2021

dWhat!%

It’s understandable that the editing process for Baseball Prospectus 2021 overlooked something trivial like explaining what a metric in the team prospectus box means. After all, it must have been exhausting work to ensure that each of the many political non-sequiturs in the book was on message (Status: success! You can give this book to your children to read with confidence that they are in a safe space, with no deviation from the blessed orthodoxy). The vital imperative of ideological conformity handled, they would have needed next to run a fine-tooth comb over any reference to the aesthetics of present day MLB on-field play to ensure the proper level of smug conflation of one’s own preferences with the perfect ideal. Another success. Finally, they could turn their attention to making sure there were the requisite number of sneering statements about the fact that there even was an MLB season in 2020.  As always, left unaddressed was how a publication that exists (in theory at least – reading the 2021 annual, this may be a fatally flawed assumption on my part) to analyze professional baseball could continue to exist if professional baseball ceased to exist, but who knows? When you toe the line so perfectly, maybe you can figure out a way to get in some of that sweet $1.9 trillion.

So it is entirely understandable that a publication rooted in statistical analysis could completely overlook such a triviality as explaining a metric that none of its writers ever bother to refer to anyway. The metric in question is called “dWin%”. It didn’t replace any team metric that was listed in the 2020 edition – it literally fills in a blank space in the right data column. A search of the term “dWin%” and “Deserved Winning Percentage” on the BP website doesn’t yield any obvious (non-paywalled, at least) relevant hits. So the best I can do is make an educated guess about what this metric is.

I gave away my guess by searching for “Deserved Winning Percentage”. BP has adopted a family of metrics with the “Deserved” prefix which utilize Jonathan Judge’s mixed model methodology to adjust for all manner of effects (going well beyond the staples of traditional sabermetrics like league run environment and park). The team prospectus box lists “DRC+” and “DRA-“, which are the DRC metric for hitters and DRA for pitchers indexed to the league average. So it’s only natural to assume that dWin% is some type of combination of these two to yield a team’s “deserved” winning percentage.

It’s also natural to assume that there would be a relationship between DRC+, DRA-, and dWin%. If the first two are in essence run ratios (with myriad adjustments, of course, but essentially an estimate of percentage difference between a team’s deserved rate of runs scored or allowed and the league average), then it’s only natural to assume that there would be some close relationship between them and dWin%. If we were in the realm of actual runs scored and allowed, or runs created/runs created allowed, we could confidently state that one powerful way to state the relationship would be a Pythagorean approach. Namely, the square of the ratio of DRC+ to DRA- should be close to the ratio of dWin% to its complement.
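The posited Pythagorean combination can be sketched in a few lines. To be clear, this is a guess at an undocumented metric, not BP’s actual formula; the function name and the fixed exponent of 2 are my own assumptions:

```python
def implied_win_pct(drc_plus, dra_minus, exponent=2.0):
    """Treat the ratio of DRC+ to DRA- as a deserved run ratio and
    convert it to a winning percentage with a fixed-exponent
    Pythagorean formula. Purely a guess at what dWin% might be."""
    run_ratio = drc_plus / dra_minus
    return run_ratio ** exponent / (run_ratio ** exponent + 1)

# A team 10% better than average on offense and 5% better than
# average at run prevention comes out around .573:
print(round(implied_win_pct(110, 95), 3))
```

If dWin% were built this way, a league full of teams averaging 100 in both components would necessarily average .500, which matters for the discussion below.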

There are two obvious caveats to throw on this conclusion:

1) While the statistical introduction does not specifically refer to DRA- (it refers just to DRA, which was listed for teams rather than DRA- in the 2020 edition), it’s reasonable to assume that DRA- is the indexed version of DRA. DRA is a pitching metric, which would attempt to state a pitcher’s deserved runs allowed after removing the impact of the defense that supports him. This means that comparing the ratio of DRC+ and DRA- on the team level is likely ignoring fielding, and thus the relationship I’ve posited above would be incomplete. To be clear, this is not a fault on BP’s part, except to the extent that we are left to speculate about the meaning of these metrics; there’s certainly nothing wrong with having a measure that attempts to isolate the performance of a team’s pitching staff.

2) It is possible that there is something else going on besides fielding in the process of developing the Deserved family of metrics that would invalidate this manner of combining the offensive and pitching components. Without being privy to the full nature of the adjustments made in these metrics, it’s hard to speculate on what if anything that might be, but I would be remiss in not raising the possibility that there’s something going on behind the curtain or that I have simply overlooked.

I’m not going to run a chart of all of the team values, because that would be infringing on BP’s property rights, and given the first paragraph of this post that would be practically unwise even if it were not morally objectionable. A few summary points provide defensible ground:

1) the average of the team DRC+s listed in the annual is 99.3 and the average of DRA-s is 99.5. Given that the figures are rounded to the nearest whole number (e.g. 99 = 99%), this is encouraging as we would expect the league average to be 100.

2) the average of the team dWin%s is .464. Less encouraging. As I was reading through the book, there were two team figures that really caught my eye and led me to this more formal examination. The first was Philadelphia, which had a dWin% of .580, ranking second in MLB. Their DRA- was 83, also second.

The Deserved family of metrics has always produced some eyebrow-raising results, which are difficult to evaluate objectively given the somewhat black box nature of the metrics and the complexity of the mathematical approach involved (I will be the first to admit that “mixed models” of the kind described are beyond my own mathematical toolkit). So it’s dangerous to focus too much on any particular result, as it may just be a vehicle by which to expose one’s own ignorance. As a second-generation sabermetrician, this is a particular nightmare, becoming the sportswriter you laughed at as a twelve-year-old for dismissing RC/27 as impossibly complex and unintelligible.

Still, it is quite remarkable that the team which allowed the second-most park-adjusted runs per inning in the majors might actually have turned in the second-best performance. In fairness, it was a sixty-game season, so the deviation between underlying quality of performance and actual outcome could be enormous, and the East could have been the toughest of the three sub-leagues, especially in terms of balance as the Dodgers tip the scales West. Most significantly, it is just a pitching metric, and the Phillies defense was dreadful at turning balls in play into outs – they were last in the majors in DER at .619. Boston was at .623 and the next worst team was Washington at .642. Further, the East subleague combined for a .657 DER (the fourth-worst DER belonged to the Mets, and Toronto and Miami made it six of the bottom ten) compared to .685 for the Central and .684 for the West. It’s still hard to believe that the Phillies’ pitchers deserved to have the second-fewest runs allowed in the majors, but easy to buy that they performed much, much better than their runs allowed would suggest.

However, every factor that would explain how their pitching was actually second-best does nothing to explain how their overall deserved team performance was also second-best. Adjusting away terrible defensive support doesn’t mean that the team’s poor runs allowed weren’t deserved, it just means that the blame should be pinned on the fielders and not the pitchers. Again, it’s hard to pinpoint any exact criticism given the nature of the metrics, but this one is tough to accept at face value.

It also seems that if one had conviction in the result, it would show up in the narrative somewhere. There’s always been a disconnect between what BP statistics say and what their authors write, which owes partly to the ensemble approach to writing and presumably partly to the timing (the authors of team chapters probably start very soon after the season and without the benefit of the full spread of data that will appear in the book). Still, it seems as if this disconnect has increased with the advent of the deserved metrics, which often tell a very different story than even the mainstream traditional sabermetric tools (e.g. an EqA or a FIP, to refer to metrics previously embraced by BP). But I can assure you that if I believed the Phillies underlying performance as a team was actually second only to the Dodgers, I’d work that into any retrospective of their 2020 performance and forecast of their 2021.

The second team that caught my eye was the A’s, who posted a 103 DRC+, 98 DRA-, and .499 dWin%. The obvious disconnect between an above-average offense, above-average pitching, but sub-.500 deserved W% could be explained by defense. What can’t be explained is how a .499 dWin% ranks ninth in the majors, at least until you line up the thirty teams and see that the average is .464. While we can charitably assume that a combination of our own ignorance and the proprietary nature of the calculations can explain many odd results from the deserved stats, I don’t know what can satisfactorily explain a W% metric that averages to .464 for the whole league.

The hope is that this is simply some scalar error, a fudge factor not applied somewhere. There is some evidence that this is the case – if you take the ratio of DRC+ to DRA- and plot it against the ratio of dWin% to (1 – dWin%), you get a correlation of +0.974 and a pretty straight line, as you would expect given what should be in the vicinity of a Pythagorean relationship. It might even work out as you’d expect if dWin% is baking in fielding.
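The check described above is easy to reproduce with stdlib Python. The sample values below are made up for illustration; the real inputs would be the thirty team rows from the annual:

```python
def pearson(xs, ys):
    """Plain Pearson correlation, no external libraries."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def odds_ratio_check(drc_plus, dra_minus, dwin):
    """Correlate DRC+/DRA- against dWin%/(1 - dWin%) across teams."""
    x = [c / a for c, a in zip(drc_plus, dra_minus)]
    y = [w / (1 - w) for w in dwin]
    return pearson(x, y)

# Hypothetical three-team example, not the published figures:
print(round(odds_ratio_check([110, 95, 100], [90, 105, 99],
                             [.560, .420, .470]), 3))
```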

Still, it’s disappointing that the question has to be asked.

Wednesday, March 03, 2021

Rob Manfred: Run Killer

There are many “crimes against baseball” that one could charge Rob Manfred with, if one were inclined to use hyperbolic language and pretend that the commissioner had the sole authority to decide matters (I tend to do neither but am guilty of seeking a more eye-catching post title):

* Attacking the best player in his sport for not going along with whatever horrible promotional scheme the commissioner had dreamed up

* Making a general mess of negotiations with the MLBPA

* Teaming up with authoritarian governments ranging from cities in Arizona to Leviathan itself to attempt to delay or prevent baseball from being played

* Claiming to be open to every harebrained scheme to rein in shifts, home runs, strikeouts, or whatever the current groupthink of the aesthetically-offended crowd finds most troublesome

From my selfish perspective as a sabermetrician, though, I will argue that the greatest crime of all is that he has rendered team runs scored and allowed totals unusable. The extra innings rule, which I doubt will ever go away even if seven-inning doubleheaders do, makes anything using actual runs scored incomparable with historical standards (in the sense of parameters of metrics rather than context). An RMSE test of a run estimator against team runs scored? Can’t use it. Pythagenpat? Nope. Relief pitcher’s run average? Use with extreme caution.

Of course, I am not seriously suggesting that the ease with which existing metrics can be used should be a consideration in determining the rules of the game. But if you use these metrics, it is necessary to recognize that they are very much compromised by the rule.

So how can we adjust for it? I will start with a plea that the keepers of the statistical record (which in our day means sites like Baseball-Reference and Fangraphs) compile a split of runs scored and allowed in regulation and extra innings, as well as team innings pitched/batted in regulation and extra innings, and display this data prominently. Having it will allow for adjustments to be made that can at least partially correct, and more importantly increase awareness of the compromised nature of the raw data.

I want to acknowledge a deeper problem that also exists, and then not dwell on it too much even though it is quite important and renders the simple fixes I’m going to offer inaccurate. This is a problem that Tom Tango pointed out some time ago, particularly as it related to run expectancy tables – innings that are terminated due to walkoffs. In such innings, there are often significant potential runs left stranded on base, and so including these innings will understate the final number of runs one could expect. Tango corrected for this by removing these potential game-ending innings from RE calculations. It’s even more of a problem when it comes to extra innings, since rather than just being 1/18 of the half-innings of a regulation game, they represent 1/2 of the half innings of an extra inning game. This means that when we look at just extra innings, the number of potential runs lost upon termination of the game make up a significant portion of the total runs.

I gathered the 2020 data on runs scored by inning from Baseball-Reference, and divided each inning into regulation and extras. I did not, however, do this correctly, as the seven-inning doubleheader rule complicates matters. The eighth and ninth innings of a standard nine-inning game are played under very different circumstances than the eighth and ninth innings of a seven-inning doubleheader. I have ignored this wrinkle here, treating all eighth and ninth innings as belonging to standard games, but this is a distortion. I didn’t feel like combing through box scores to dig out the real data as I’m writing this post for illustrative and not analytical purposes, but it buttresses my plea for the keepers of the data to do this. This is not solely out of my laziness (although I really don’t want to have to compile it myself), but also a recognition of the reality that many casual consumers of statistics will not even be cognizant of the problem if it is not made clear in the presentation of data.

Forging ahead despite these two serious data issues that remain unresolved (counting eighth and ninth innings of seven-inning doubleheaders as regulation innings rather than extra innings, and ignoring the potential runs lost due to walkoffs), I used the team data on runs by inning from Baseball-Reference to get totals for innings played and runs scored between regulation and extra innings. Note that these are innings played, not innings pitched, which understates the true nature of the problem: while almost all of the regulation innings include three outs (the exception being bottoms of the ninth terminated on walkoffs), a much greater proportion of the extra innings do not.

Still:



Expressed on the intuitive scale of runs per 9 innings, regulation innings yielded 4.80 runs, while extra innings were good for a whopping 8.40, a rate 75% higher. And no wonder, as Baseball Prospectus’ RE table for 2019 shows .5439 for (---, 0 out) and 1.1465 for (-x-, 0 out), a rate 111% higher. That we don’t see that big of a difference is due in an indeterminate amount to sample size and environmental differences (e.g. a high-leverage reliever is likely pitching in an extra inning situation, unless they have all been in the game already) but probably more significantly to the lost potential runs.

Considering all runs scored and innings, there were .5378 runs/inning or 4.84 R/9 in the majors in 2020, so even a crude calculation suggests a distortion of around 1% embedded in the raw data due to extra innings. Of course, the impact can vary significantly at the team level since the team-level proportion of extra innings will vary (1.25% of MLB innings played were extras, ranging from a low of 0.40% for Cincinnati to 3.44% for Houston).
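The arithmetic behind these percentages is trivial, but worth laying out explicitly (all figures are the ones quoted above):

```python
# Per-nine-inning scoring rates from the regulation/extra split
reg_r9, extra_r9 = 4.80, 8.40
print(round(extra_r9 / reg_r9 - 1, 2))     # 0.75 -> extras 75% higher

# BP 2019 run expectancies: bases empty 0 out vs. runner on 2nd 0 out
re_empty, re_second = .5439, 1.1465
print(round(re_second / re_empty - 1, 2))  # 1.11 -> 111% higher

# League-wide distortion from pooling regulation and extra innings
print(round(4.84 / 4.80 - 1, 3))           # about .008, i.e. roughly 1%
```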

How to correct for this? If the walkoff problem didn’t exist, I would suggest a relatively simple approach. After separating each team’s data into regulation and extra innings, calculate each team’s “pre-Manfred runs” as:

PMR = Runs in Regulation Innings + Runs in Extra Innings – park adjusted RE for (-x-,0)*Extra Innings

= Runs - park adjusted RE for (-x-,0)*Extra Innings
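A minimal sketch of the PMR calculation (the function name is mine; the 1.1465 default is the neutral (-x-, 0) run expectancy from BP’s 2019 table, standing in for a properly park-adjusted value):

```python
def pre_manfred_runs(total_runs, extra_innings,
                     re_start=1.1465, park_factor=1.0):
    """Strip out the expected runs gifted by the placed runner at the
    start of each extra inning. park_factor should rescale the neutral
    RE to the team's specific park."""
    return total_runs - park_factor * re_start * extra_innings

# A team with 300 runs scored, 10 of its offensive innings being extras:
print(round(pre_manfred_runs(300, 10), 1))   # 288.5
```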

You could address the walkoff problem by adding in the park adjusted RE for any innings that terminated, but this gets tricky for two reasons:

1) it means that the simple data dividing runs and innings into “regulation” and “extra” is inadequate for the task; I doubt “potential runs lost at time of game termination” would ever find their way into a standard table of team offensive statistics

2) it overcorrects to the extent that the legacy statistics we have always used ignore the loss of those potential runs as well. Of course, the issue is more pronounced with extra innings, as walkoff-terminated innings represent a huge proportion of extra innings rather than a small one of regulation innings (and because the nature of Manfred extra innings increases the proportion of walkoffs within the subset of extra innings, since run expectancy is 111% higher at the start of a Manfred extra inning than at the start of a standard inning).

Also note that when I say park-adjusted, I mean that the run expectancy would have to be park-adjusted not in order to normalize across environments, but rather to transform a neutral environment RE table to the specific park. I wouldn’t want to use “just” 1.1465 for Coors Field, but rather a higher value so that the PMR estimate can still be used in conjunction with our Coors Field park adjustment as the Rockies raw runs total would have been pre-2020. Another complication is that the standard runs park factor would likely overstate the park impact because of the issue of lost potential runs (they too would increase in expected value as the park factor increased).

The manner in which I attempted to adjust in my 2020 End of Season statistics was to restate everything for a team on a per nine inning basis, and then use the R/9 and RA/9 figures in conjunction with standard methodology. But this is also unsatisfactory – for instance, a Pythagorean estimate ceases to be an estimate of the team’s actual W%, becoming instead a theoretical estimate of what their W% would be if they played a full slate of nine inning games. The extra innings aren’t really a problem here, but the seven-inning doubleheaders are. As long as these accursed games exist, developing a true Pythagorean estimate of team wins would require three steps: estimate the exponent that would hold for a seven-inning game (Tango came up with a Pythagorean exponent of 1.57 through an empirical analysis; my theoretical approach would be to use the Enby distribution to develop theoretical W%s for seven-inning games for a representative variety of underlying team strengths in terms of runs and runs allowed per inning, then use this to determine the best Pythagenpat z value); use runs and runs allowed per inning rates to estimate separate W%s for seven- and nine-inning games; then weight these by the proportion of a team’s games that were scheduled for seven and nine innings.
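The weighting scheme just described might look like the sketch below. Note the big caveat: it reuses z = .282 for seven-inning games, which is exactly the parameter the Enby exercise would need to establish, so treat the seven-inning branch as a placeholder:

```python
def game_wpct(r_inn, ra_inn, innings, z=0.282):
    """Pythagenpat W% for a game of a given length, built from
    per-inning scoring rates."""
    r, ra = r_inn * innings, ra_inn * innings
    x = (r + ra) ** z        # r + ra is RPG at this game length
    return r ** x / (r ** x + ra ** x)

def blended_wpct(r_inn, ra_inn, frac_seven):
    """Weight the seven- and nine-inning estimates by the share of
    games scheduled at each length."""
    return (frac_seven * game_wpct(r_inn, ra_inn, 7)
            + (1 - frac_seven) * game_wpct(r_inn, ra_inn, 9))

# A team outscoring opponents .52 to .48 per inning, with 20% of its
# schedule as seven-inning games:
print(round(blended_wpct(.52, .48, .20), 3))
```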

I also took the unfortunate step of ignoring actual runs everywhere (as I mentioned in passing earlier, Manfred extra innings wreak havoc on relievers’ run averages), since the league averages are polluted by Manfred extra innings. Again, I am not advocating that sabermetric expediency drive the construction of the rules of baseball, but it is a happy coincidence that sabermetric expediency tracks in this case with aesthetic considerations. I should include a caveat about aesthetic considerations being in the eyes of the beholder, but the groupthink crowd that is now in the ascendancy rarely sees the need to do so. No surprise, as many also subscribe to the totalitarian thinking that is ascendant in the broader society. They’ll tell you all about it, and about what a terrible person you are if you dissent, for $25.19.



Wednesday, February 17, 2021

Akousmatikoi Win Estimators, pt. 5: Notes on Linear RPW Estimators

I had intended the last installment to be the end of this series, but Tom Tango left a comment on pt. 3 that led me down a rabbit hole. It’s of the frustrating variety, as I can’t figure out how to dig back to the surface and exploring it hasn’t led me to learn anything useful or interesting about baseball. Nevertheless, I find it interesting as a purely mathematical exercise and worth a brief post.

Tango pointed out that he had proposed some time ago the simple formula:

RPW = .75*RPG + 3 (Tango’s version was originally expressed as 1.5*RPG + 3 because he was defining RPG as the average for one team; I’ll keep with my definition here for consistency with the rest of the series)

I was aware of this formula and have mentioned it on this blog before, but it slipped my mind when writing these posts. You may recall from pt.3 that I offered the formula:

RPW = .777*RPG + 2.694

A brief reminder of how this was derived – I started by differentiating the Pythagenpat formula for a fixed z value of .282 with respect to run differential, and then plugging in the appropriate values for a .500 team to get RPW = 2*RPG^(1 – .282). Then I differentiated this formula with respect to RPG, and found the y = mx + b formula that would follow if you assumed a .500 team with the average of RPG and RPW of the 1961 – 2019 major leagues.
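That derivation can be verified numerically. The 8.83 RPG used below is my own approximation of the 1961-2019 average (the exact figure isn’t restated in this post), so the constants come out within rounding of those quoted:

```python
z = 0.282
avg_rpg = 8.83          # approximate 1961-2019 average (my estimate)

# For a .500 team, RPW = 2*RPG/x with x = RPG^z, i.e. 2*RPG^(1-z)
rpw = 2 * avg_rpg ** (1 - z)

# Differentiate with respect to RPG: d(RPW)/d(RPG) = 2*(1-z)*RPG^(-z)
slope = 2 * (1 - z) * avg_rpg ** (-z)
intercept = rpw - slope * avg_rpg

print(round(slope, 3), round(intercept, 3))   # close to .777 and 2.694
```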

Of course, these formulas both take the form y = mx + b, where y is the estimated RPW and x is the team’s RPG. My formula has a higher slope, but a lower intercept. At 9 RPG for a team with a run differential of one per game, mine would estimate 97.72 wins for a team and Tango’s 97.62. This doesn’t seem like a lot, and in the grand scheme of things it isn’t, but if this kind of difference didn’t interest me then this blog wouldn’t exist.

Using the 1961-2019 data, and scaling the RMSE to 162 games, Tango’s formula has a RMSE of 4.0348, and mine 4.0370. Pythagenpat itself (z = 2.82) checks in at 4.0345, which is interesting – my RPW formula performs worse than Tango’s, but is derived directly from Pythagenpat, which performs better. Also interesting – that with real major league teams, Tango’s formula is about as accurate as you can get despite being very simple (relative to full-blown Pythagenpat) and having rounded coefficients.

Note, I’m emphasizing RMSE with real teams in this discussion because if you want theoretical accuracy over a wide range of possible team R/RA combinations, you’d just use Pythagenpat and be done with it. If you’re using a simplification that isn’t as accurate as an equally simple formula for the application you’ll most use it for, what’s the point?

My first thought as to why Tango’s formula had a lower RMSE than mine was that I had over-flattened the whole thing and was thus missing something. This series starts from the premise that Pythagenpat is the right model for win estimation, and then simplifies from there, often centering at the point of a team that scores and allows the same number of runs, in an average scoring context. But the teams in the sample data, while by definition centered there, vary in both axes (R/RA and RPG). Perhaps the linear approximation to the Pythagorean RPW for a .500 team misses some subtle change in the slope or intercept caused by this variation, and you could do better by running a regression on all the individual datapoints rather than using the single point estimate to derive the formula.

So I calculated the actual Pythagenpat RPW for all teams (i.e. the value for RPW which when applied will estimate that the team’s W% will be equal to its Pythagenpat W%), which from pt.3 is:

RPW = ((R – RA)/G)/(R^x/(R^x + RA^x) - .5)

Where x is the Pythagenpat exponent corresponding to each team’s RPG

This is undefined when R = RA, but also from pt. 3, we can fill this gap with the calculus-derived formula for a team with R = RA:

RPW = 2*RPG/x
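Both branches fit naturally in one function (assuming, consistent with the rest of the series, that the inputs here are per-game rates):

```python
def pythagenpat_rpw(r, ra, z=0.282):
    """Actual Pythagenpat runs per win; r and ra are per-game rates.
    The undefined R = RA case is filled with the calculus-derived
    limit 2*RPG/x."""
    rpg = r + ra
    x = rpg ** z
    if r == ra:
        return 2 * rpg / x
    wpct = r ** x / (r ** x + ra ** x)
    return (r - ra) / (wpct - 0.5)

# The two branches agree closely on either side of the gap:
print(round(pythagenpat_rpw(4.51, 4.49), 3))
print(round(pythagenpat_rpw(4.50, 4.50), 3))
```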

Having calculated the actual Pythagenpat RPW for all teams, we can run a linear regression with RPG as the independent variable to get an alternative formula, which winds up being:

RPW = .7818*RPG + 2.6823

Which is reasonably close to my formula (and thus an argument in favor of “centering” being a reasonable approach), but takes the slope higher and the intercept lower – in other words, moving away from Tango rather than closing the gap as we might have hoped/expected. This formula has a RMSE of 4.0364, still worse than Tango’s although better than mine.

At this point, the logical question is how far can we push the slope down and the intercept up to minimize RMSE? According to Excel solver, quite far:

RPW = .6528*RPG + 3.8760

This is a huge difference even from Tango’s formula, with the slope 13% lower and the intercept 29% higher. RMSE = 4.0334, ever so slightly lower than even Pythagenpat.
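A stdlib stand-in for the Solver step might look like the grid search below. The ranges and step sizes are arbitrary choices of mine, and without the actual 1961-2019 team data this is only a sketch of the procedure, not a reproduction of the .6528/3.8760 result:

```python
def rmse_162(teams, m, b):
    """teams: list of (R/G, RA/G, W%) tuples; returns the RMSE of the
    linear-RPW estimate W% = (R/G - RA/G)/(m*RPG + b) + .5, scaled to
    162 games."""
    errs = []
    for r, ra, w in teams:
        est = (r - ra) / (m * (r + ra) + b) + 0.5
        errs.append((w - est) ** 2)
    return 162 * (sum(errs) / len(errs)) ** 0.5

def best_fit(teams):
    """Coarse grid search over slope and intercept, standing in for
    Excel Solver."""
    return min(((rmse_162(teams, m / 1000, b / 1000), m / 1000, b / 1000)
                for m in range(500, 1001, 5)
                for b in range(2000, 4501, 25)))[1:]
```

Applied to teams whose W% exactly follows a known slope and intercept, the search recovers that pair, which is a useful sanity check before pointing it at real data.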

Why can we improve the accuracy of our W% estimate (at least working with this sample of the last sixty years of MLB), even while getting farther away from the RPW relationship suggested by Pythagenpat? Unfortunately, I don’t have a satisfying answer to that question. It’s tempting to say that we are losing something by eliminating the team’s quality (e.g. the difference and/or ratio between their runs and runs allowed), which Pythagenpat considers in addition to the level of run scoring (RPG). Of course, the best-fit cares not about quality either, and I don’t have a compelling explanation for why lowering the slope and raising the intercept would be related to that.

Wednesday, February 03, 2021

Akousmatikoi Win Estimators, pt. 4: Best Fits and Accuracy

Herein I’ll be using the expansion era (1961 – 2019) data for all major league teams to calculate the RMSE of the various Akousmatikoi win estimators we’ve discussed. This exercise is not intended to prove which metric is “better” or “more accurate” than the others, but rather is intended to give you a feel for the differences between the various approaches when used for major league teams. What I am calling the Akousmatikoi family of win estimators is built on the conceit that Pythagenpat is the “best” win estimator, and uses it as a jumping-off point to develop alternate/simplified methods that can be tied back to the parent method. As such, the contention here would be that if you want the “best” answer, you should use Pythagenpat. But if you are just looking at the standings and can apply something quick like the Kross method or 9.56 runs per win, how far off will you be for a normal team?

This is also not a “fair” accuracy test, in that we would develop the equation based on one set of data and test it on another; all of the approaches will be calibrated and tested on the 1961 – 2019 data. This will not favor one approach or another as they all will have the benefit of the same data set. I will also be including the best fits for a few of the approaches, which I think is interesting because in several cases I’ve developed the alternate to Pythagenpat shown in this series by taking the tangent line at the point where R = RA and applying it broadly. While this should work well enough as our real teams will be centered around this point, in some cases the best fit may be a little different, which might be interesting. In those cases I would probably recommend using the best fit, since the point of any of the simpler methods would be to use with real teams; there’s no need to get hung up on theoretical centering at .500.

If this is not a “fair” accuracy test, then what exactly is the point? The point is to provide information that can be used to inform the decision of which shortcut to Pythagenpat you choose to use. There is no right or wrong answer. For example, the Kross formulas are about as simple as it gets. Are they accurate enough as a win estimator to use for quick and dirty estimates? That depends on how dirty you’d like it to be (i.e. your own determination of what level of error is acceptable), and on your own tradeoff between simplicity and accuracy.

In another sense, though, it is a fair test, because each method is operating under the same constraints (although some will benefit from having best fits determined directly while others won’t operate under that luxury). In deciding which model to use, it does make sense to have a final check when all are calibrated on all of the available data.

I will not only calculate the RMSE of the estimates compared to W%, I will also show it compared to Pythagenpat. This is not meant to imply that Pythagenpat is correct and any deviation from it is wrong, but since the presentation in this series has relied on each method’s own relationship to Pythagenpat, I think it’s of interest to identify which approaches do the best job of approximating Pythagenpat. And if you do choose to start with Pythagenpat as your win estimator of choice when attempting to be as accurate as possible, it might follow that one of the criteria you’d consider when deciding which quicker method to use is how well it tracks Pythagenpat.

I will divide the methods to be tested as follows:

Pythagorean

These will all take the form W% = R^x/(R^x + RA^x).

Pyth1: this will be Pythagenpat, where x = RPG^.282 (best-fit)

Pyth2: Fixed exponent Pythagorean where x = 2

Pyth3: Fixed exponent Pythagorean where x = 1.847 (best-fit)

Port: Davenport/Woolner’s Pythagenport, where x = 1.5*log(RPG) + .45

Cig1: formula I derived from Cigol, where x = 1.03841*RPG^.265 + .00114*RD^2

Cig2: formula I derived from Cigol to follow the Pythagenpat form, where z = .27348 + .00025*RPG + .00020*(R - RA)^2

Ratio-Based

Kross: these are the nifty equations developed by Bill Kross; when R > RA, W% = 1 – RA/(2*R); when R <= RA, W% = R/(2*RA)

Ratio: this is the general case that resolves to Kross’ equation when x = 2, although it’s not really the general case at all since I’m using the Pythagorean best-fit of x = a = 1.847 to get these equations:

when RR >= 1, (a*RR – a + 1)/(a*RR – a + 2) = (1.847*RR - .847)/(1.847*RR + .153)

when RR < 1, 1/(a/RR – a + 1)/(1/(a/RR – a + 1) + 1) = 1/(1.847/RR - .847)/(1/(1.847/RR - .847) + 1)

Differential-Based

FixRPW: if you force an equation of the form W% = ((R-RA)/G)/RPW + .5, the best fit is when RPW = 9.71, which is equivalent to .103*((R – RA)/G) + .5

PythRPW: RPW = 2*RPG^.718, the Pythagenpat result

LinRPW: RPW = .777*RPG + 2.694, the tangent line to the Pythagenpat RPW at the average RPG/RPW for the period

BVL: W% = .9125*(R – RA)/(R + RA) + .5; the form proposed by Ben Vollmayr-Lee, although I’m using the best fit for this dataset and also rounding the intercept to .5 (it actually comes out to .49978)

BVLPyth: W% = .923*(R – RA)/(R + RA) + .5; the same equation, but using the Pythagorean best-fit exponent rather than the empirical best-fit
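For concreteness, here are several of the estimators above as functions (R and RA are season totals, G games, with the coefficients as listed; this is my own transcription of the formulas, not code from the series):

```python
import math

def pyth(r, ra, x):
    return r ** x / (r ** x + ra ** x)

def pythagenpat(r, ra, g, z=0.282):     # Pyth1
    return pyth(r, ra, ((r + ra) / g) ** z)

def pythagenport(r, ra, g):             # Port
    return pyth(r, ra, 1.5 * math.log10((r + ra) / g) + 0.45)

def kross(r, ra):                       # Kross
    return 1 - ra / (2 * r) if r > ra else r / (2 * ra)

def fix_rpw(r, ra, g):                  # FixRPW
    return (r - ra) / g / 9.71 + 0.5

def bvl(r, ra, m=0.9125):               # BVL
    return m * (r - ra) / (r + ra) + 0.5

# 1969 Orioles (779 R, 517 RA, 162 G), discussed below:
for est in (pythagenpat(779, 517, 162), pythagenport(779, 517, 162),
            kross(779, 517), fix_rpw(779, 517, 162), bvl(779, 517)):
    print(round(est, 3))
```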

The RMSE shown here is actually the overall RMSE of the W% estimate, scaled to 162 games. So for each team the error is (W% - estimator)^2; the final value shown is 162*sqrt(average error):


Pythagenport has a slight lead over Pythagenpat, and they are followed very closely by the estimates based on Cigol and runs per win formulations that take RPG into consideration. In the Akousmatikoi family, that is the accuracy separator (such as it is; the overall range of RMSE values is narrow) – considering the specific run environment for the team either through a Pythagorean approach (as in the case of Pythagenport, Pythagenpat, and their Cigol knockoffs), or a two-operation (multiplication and addition) or power RPW function (as in the case of LinRPW and PythRPW). The next little cluster of RMSE includes a fixed Pythagorean exponent less than 2, the BVL formulas (which take the team’s RPG into account but only with a simple multiplicative function, not a y = mx + b form), and the intrepid Kross formulas. A fixed RPW linear approach is next, and then, surprisingly, the Kross formulas actually outperform their antecedents (in the Akousmatikoi conceit, not reality): the standard Pythagorean and “Ratio”, which uses a non-2 Pythagorean exponent.

This finding is surprising, and suggests that the Ratio approach should be discarded, as it’s arguably the most complicated to calculate of all the options we’ve looked at (despite being, at least from one perspective, a mathematical “simplification” of the Pythagorean relationship). But why does it perform worse than the Kross method, which ties to x = 2, while the ratio approach ties to x = 1.847, which has a lower RMSE than using x = 2?

To answer that question, I started by looking at the most extreme teams in terms of run ratio in the period, and found what I consider to be a satisfactory answer. The team with the highest run ratio in the expansion era is the 1969 Orioles (779/517 = 1.51). Incidentally, they do not have the highest Pythagenpat W% in the era, a distinction that goes to the 2001 Mariners by a hair’s breadth over the 1998 Yankees; those teams had run ratios of 1.48 and 1.47 respectively, but they had RPGs 20% and 25% higher respectively, which made their run ratios convert to higher win ratios.

The Orioles’ EW% using a fixed Pythagorean exponent of 1.847 is .681. This is the calculation that “Ratio” is supposed to flatten, but it predicts a W% of just .659. I think that since this formula is developed by differentiating win ratio, and then using the estimated win ratio to calculate an estimated W%, the linear approximation does poorly. Win ratios have a much wider range than winning percentages; if we consider .300 - .700 a reasonable range for the expected (as opposed to actual) W%s of major league teams, this is a win ratio range of .429 to 2.333. Drawing the tangent line at the point where RR = WR = 1 leaves a lot of room outside of this range where teams will fall.

The Kross formula performs better because even though (in the Akousmatikoi sense) it starts from a less accurate proposition that x = 2, it will produce a wider range of win ratio estimates. The Kross estimated win ratio for a team with a run ratio of 1.51 is 2*1.51 – 1 = 2.02, while the other approach estimates 1.847*1.51 - .847 = 1.94.
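
Both estimates are tangent lines to WR = RR^x at RR = 1, differing only in the exponent, so the comparison can be sketched with one small function (the function name is mine, not established notation):

```python
def wr_tangent(rr, x):
    # Tangent-line approximation of WR = RR^x at RR = 1: WR = x*RR - x + 1
    return x * rr - x + 1

# 1969 Orioles, run ratio 1.51
kross = wr_tangent(1.51, 2)      # Kross: 2*1.51 - 1
ratio = wr_tangent(1.51, 1.847)  # "Ratio": 1.847*1.51 - .847
print(round(kross, 2), round(ratio, 2))  # 2.02 1.94
```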

The other RMSE comparison I want to make is to Pythagenpat. Again, I am not trying to say that Pythagenpat is the standard by which all win estimators should be judged. However, it (or something similar like Pythagenport) is the most accurate version of the Pythagorean relationship that has yet been published, and since this series is an examination of alternative win estimators mathematically related to Pythagorean methods, I think it is worthwhile to see which of these alternatives hew most closely to the starting point:


The fact that the lowest RMSE is for the first Cigol estimate tells us only what we can see by observing it – that it is essentially the same formula with added terms to attempt (perhaps in vain) to increase accuracy at the extremes (this formula sets the Pythagenpat exponent to .27348 + .00025*RPG + .00020*(R - RA)^2 rather than .282). That next in line are two more close cousins, Pythagenport and the other Cigol estimate, is comforting but also uninteresting.

You’ll notice that the ranking of estimators in terms of agreement with Pythagenpat closely resembles their ranking in accuracy predicting W%, so the first grouping of methods that closely track Pythagenpat while actually being simpler to compute consists of the two RPW estimates that use a “complex” function – either the power relationship or the y = mx + b form.

The next cluster is an optimized fixed Pythagorean exponent and the Ben V-L approaches, which are equivalent to RPW as a multiplier of RPG (no y-intercept term). This implies that if you want to imitate Pythagenpat for normal teams, it’s more important to consider the impact of scoring level on the runs-to-wins conversion than it is to consider the non-linearity of the runs-to-wins conversion. The remarkable Kross formulas are next, with the others (a fixed RPW value, Pythagorean with x = 2, and the worthless “Ratio” approach) lagging the field.

I don’t have any grand conclusion to draw from this series, which is appropriate since, as I’ve acknowledged previously, there really is nothing new here. It has served as a good reminder for me of how various win estimators are connected, and hopefully has collected in one place observations of the connections that were previously published but strewn across multiple sources.

Trivia to close: I feel like I should have been aware of this previously, but did you know that (at least using a Pythagenpat z constant = .282), the 2019 Tigers had the worst EW% of the expansion era? They did not have the worst run ratio, a distinction that fell to the expansion 1969 Padres, but as we saw with the 1969 Orioles on the other end of the spectrum, the low RPG made that run ratio translate into a better win ratio than a couple of teams in higher scoring environments.

Four teams had sub-.310 EW%s (an arbitrary cutoff, as I think these four are interesting):

1. At .307, the expansion 1962 Mets, widely famous as the worst modern team and with the worst actual W% of the bunch at .250, are not a surprise.

2. At .305, the aforementioned 1969 Padres, a team I had never thought of as being historically bad for an expansion team. They actually went 52-110, a full twelve games better than the Mets, outplaying their Pythagenpat by two and a half games, whereas the ‘62 Mets underplayed theirs by nine. That explains it.

3. At .2998, the 2003 Tigers, who at 43-119 just missed matching the Mets record for most modern losses, although the Mets only played 160 games (40-120). This team is widely acknowledged as one of the worst of all-time, but they underplayed their Pythagenpat by five and a half games.

4. At .2997, the 2019 Tigers, who only underplayed their Pythagenpat by one and a quarter games, going 47-114 and escaping historical notice. A big help was that they weren’t alone languishing at the bottom, as the phenomenon of “tanking” has been widely called out, and a number of teams over the last decade have put up truly terrible W-L records, including three others which lost 105 or more in 2019. The 2018 Orioles also served to take the heat off, as their 47-115 record was worse (they underplayed their Pythagenpat expectation by seven and a half games – they were only the thirteenth-worst of the expansion era at .337).

Wednesday, January 20, 2021

Akousmatikoi Win Estimators, pt. 3: Differential-Based Simplifications

Simplifying the Pythagorean estimate by focusing on run differential is not as intuitive as using run ratio, since of course Pythagorean constructs are based on the latter rather than the former. The upfront calculus is messier and the relationships are harder to explain – I’ve covered all this before, and so I went back to my previous work rather than go through the hassle of re-deriving it. However, while the calculus is messier, the end result is simpler, and gives you relationships that you might actually choose to use in place of the full Pythagorean treatment if you want something quick and simple to punch into a calculator.

The easiest way I’ve found to demonstrate this approach (which is not to say that a simpler derivation doesn’t exist) is to use the following definitions. To make this easier to follow, I’m going to define R as R/G and RA as RA/G:

RR = R/RA

RD = R - RA

RPG = R + RA

Given these relationships, we can relate run ratio and run differential using RPG:

RR = (RD + RPG)/(RPG – RD)

If you need a proof of that, replace RD and RPG with the equations above and you will see that:

RR = (R – RA + R + RA)/(R + RA – (R – RA)) = (2*R)/(2*RA) = R/RA

In the last installment, we differentiated Pythagorean win ratio with respect to run ratio; here, I want to differentiate Pythagorean winning % with respect to run ratio, which will look slightly messier. Starting from the Pythagorean relationship:

W% = RR^x/(RR^x + 1)

we differentiate to get:

dW%/dRR = ((RR^x + 1)*(x*RR^(x – 1)) – RR^x*(x*RR^(x – 1)))/(RR^x + 1)^2

= (x*RR^(x – 1))*((RR^x + 1) – RR^x)/(RR^x + 1)^2

dW%/dRR = x*RR^(x – 1)/(RR^x + 1)^2
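
Since this derivative carries the rest of the derivation, a quick sanity check is worthwhile; here is a finite-difference verification at an arbitrary point (RR = 1.25, x = 2):

```python
def wpct(rr, x):
    # Pythagorean W% as a function of run ratio: RR^x/(RR^x + 1)
    return rr ** x / (rr ** x + 1)

rr, x, h = 1.25, 2.0, 1e-6
analytic = x * rr ** (x - 1) / (rr ** x + 1) ** 2
numeric = (wpct(rr + h, x) - wpct(rr - h, x)) / (2 * h)  # central difference
print(abs(analytic - numeric) < 1e-6)  # the two agree
```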

That’s well and good, but it doesn’t tell us anything about the relationship between Pythagorean W% and run differential. To bridge that gap, we can differentiate run ratio with respect to run differential and multiply this result by the dW%/dRR we just derived:

(dW%/dRR)*(dRR/dRD) = dW%/dRD

Since we know that RR = (RD + RPG)/(RPG – RD), we get:

dRR/dRD = ((RPG – RD)*1 – (RD + RPG)*(-1))/(RPG – RD)^2

= 2*RPG/(RPG – RD)^2

If you slogged through any of my previous treatments of this topic, I must apologize – I missed some simplifications of both of these formulas before. The final math worked out the same, but it was needlessly difficult to follow. In any event, we now have:

dW%/dRD = (x*RR^(x – 1)/(RR^x + 1)^2) * (2*RPG/(RPG – RD)^2)

= 2*RPG*x*RR^(x – 1)/((RR^x + 1)^2*(RPG – RD)^2)

This ends up being expressed in terms of marginal wins per margin run. The classic sabermetric presentation is marginal runs per margin win (Runs Per Win, ala the rule of thumb that 10 runs = 1 win). So we can take the reciprocal to get this formula for Runs Per Win from Pythagorean:

Pythagorean RPW = (RR^x + 1)^2*(RPG – RD)^2/(2*RPG*x*RR^(x - 1))

Before moving forward, one thing I should note is that this function does not allow us to match the Pythagenpat W% at a given point for a set of inputs. For example, if you plug in 5 runs scored and 4 runs allowed, you will get a dW%/dRD of .1071. You might then reasonably assume that if you take the team’s run differential of 1 times .1071 plus a y-intercept (which by definition would be .5 since Pythagorean will estimate a .500 W% when R = RA), you will get a restatement of the team’s Pythagorean W%. But in fact you will get .6071, while Pythagorean would estimate 5^2/(5^2 + 4^2) = .6098. The differences will be more extreme if you put in more extreme teams.
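
These numbers can be verified directly; here is a small sketch using the formulas above:

```python
# 5 R, 4 RA per game, fixed exponent x = 2
r, ra, x = 5.0, 4.0, 2.0
rr, rd, rpg = r / ra, r - ra, r + ra

# slope dW%/dRD from the derivative above
slope = 2 * rpg * x * rr ** (x - 1) / ((rr ** x + 1) ** 2 * (rpg - rd) ** 2)
linear = 0.5 + rd * slope            # tangent-line estimate
pyth = r ** x / (r ** x + ra ** x)   # full Pythagorean estimate
print(round(slope, 4), round(linear, 4), round(pyth, 4))  # 0.1071 0.6071 0.6098
```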

Alas, I do not have a simple mathematical explanation for why this is the case. However, I will note that we don’t need calculus to calculate the actual Runs Per Win value from Pythagorean for any given set of R, RA, and x that we input. We can simply calculate this by noting that:

W% = RD/RPW + .5

Plugging in Pythagorean relationships and solving for RPW:

R^x/(R^x + RA^x) = RD/RPW + .5

R^x/(R^x + RA^x) - .5 = RD/RPW

RPW = (R – RA)/(R^x/(R^x + RA^x) - .5)

For our 5 R/4 RA team, this results in (5 – 4)/(5^2/(5^2 + 4^2) - .5) = 9.1111 RPW or .1098 wins/run, which of course is the right answer. In terms of simplifying the Pythagorean relationship, though, this is useless – all we’ve done is rearrange terms to calculate runs per win for a given set of inputs. The way we could use this to produce a flatter win estimator is to eliminate the use of a team’s R and RA figures and instead replace them with a function that only considers the scoring level (i.e. RPG).
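
The exact RPW for this team can be confirmed in a couple of lines:

```python
r, ra, x = 5.0, 4.0, 2.0
# RPW = (R - RA)/(R^x/(R^x + RA^x) - .5), per the rearrangement above
rpw = (r - ra) / (r ** x / (r ** x + ra ** x) - 0.5)
print(round(rpw, 4), round(1 / rpw, 4))  # 9.1111 0.1098
```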

This is what the rule of thumb that 10 runs = 1 win does, substituting a general rule for specifics about the team’s actual location on a run/win curve with respect to the marginal value of an additional run scored or allowed. As such, since it’s establishing a rule that will be applied to all teams, it makes sense to center it at the point which will be closest to an average team – at the point where R = RA.

In other words, we will be developing a RPW equation that can be applied generally, but will be defined based on the relationship at the point where R = RA for a given RPG. Using our formula above for RPW based on rearrangement of terms in the Pythagenpat relationship, we can substitute R = RA wherever we see one of those terms and...reduce the equation to 0/0, as the denominator R – RA equals 0 when R = RA, and the numerator R^x/(R^x + RA^x) - .5 = 0 when R = RA.

However, this is where the equation for RPW derived using calculus can step in, and tell us what the theoretical RPW value is at that point. Recall from above that:

RPW = (RR^x + 1)^2*(RPG – RD)^2/(2*RPG*x*RR^(x – 1))

If we assume that R = RA, then RR = 1 and RD = 0, and this simplifies nicely to:

RPW = (1^x + 1)^2*(RPG – 0)^2/(2*RPG*x*1^(x – 1))

= 2^2*RPG^2/(2*RPG*x) = 4*RPG^2/(2*RPG*x) = 2*RPG/x

The first immediate implication is that for our special Pythagorean case where x = 2, RPW = RPG. Since the general case is:

W% = (R/G – RA/G)/RPW + .5

RPW = RPG is equivalent to saying that (after all of the game denominators cancel out):

W% = (R – RA)/(R + RA) + .5

What if x is a constant other than 2, like the value of x = 1.847 that minimizes RMSE for expansion-era major league teams? Then RPW = 2*RPG/1.847 = 1.083*RPG, and we could say that:

W% = (R/G – RA/G)/(1.083*(R/G + RA/G)) + .5

= (1/1.083)*(R/G – RA/G)/(R/G + RA/G) + .5

= .923*(R – RA)/(R + RA) + .5

More generally:

W% = (x/2)*(R – RA)/(R + RA) + .5
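
As a sketch of how much accuracy the flattened form gives up for a fairly normal team, here it is side by side with fixed-exponent Pythagorean for a hypothetical team scoring 5.0 and allowing 4.5 runs per game:

```python
def w_flat(r, ra, x):
    # Flattened estimator: W% = (x/2)*(R - RA)/(R + RA) + .5
    return (x / 2) * (r - ra) / (r + ra) + 0.5

def w_pyth(r, ra, x):
    # Fixed-exponent Pythagorean: W% = R^x/(R^x + RA^x)
    return r ** x / (r ** x + ra ** x)

for x in (2.0, 1.847):
    print(x, round(w_flat(5.0, 4.5, x), 4), round(w_pyth(5.0, 4.5, x), 4))
```

For teams this close to .500, the two agree to about three decimal places; the gap widens for more extreme teams.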

This form is one that was proposed by Ben Vollmayr-Lee as .91*(R – RA)/(R + RA) + .5 (I’ve rewritten his formula to match the format I’m using), which would imply a Pythagorean x = 1.82. I would suggest that the Kross equations and the Vollmayr-Lee equation are the ultimate in terms of simplified win estimators from the Akousmatikoi family (again, Kross and Vollmayr-Lee did not start from Pythagorean as we have; by including these estimators in the Akousmatikoi family, I only mean to suggest that they are mathematically related to Pythagorean, not that their creators didn’t independently discover them).

Remember that for the expansion era, the average RPG is 8.83, which would imply that the long-term RPW value is approximately 1.083*8.83 = 9.56; close enough to ten that you can see why we might have a rule of thumb, although ten runs would imply a 4.5% higher scoring context (10/1.083 = 9.23) than observed in the expansion era.

We could also use a hybrid approach, in which we allow each team’s RPW according to the formula that applies when R = RA to vary based on their RPG, but not on how that RPG breaks down into runs scored and allowed. In order to do this, we’d return to RPW = 2*RPG/x, but instead of setting x equal to a constant, use a custom value for x. Of course, my suggested value would be the Pythagenpat estimate of x, namely:

x = RPG^z, where z = .282 for now (value that minimizes RMSE for the expansion era)

Substituting this equation for x, we find a general case for a variable z that:

RPW = 2*RPG/(RPG^z) = 2*RPG^(1 – z)

Or for the specific case that z = .282:

RPW = 2*RPG^.718

We could further flatten this equation by approximating it with a linear function. Recall from the last section that we can write a tangent line in the form:

y – y1 = m(x – x1) where x1 and y1 are the x and y values for the point in question, and m is the slope of the curve at x1.

To apply this approach to develop a linear approximation of the above equation, we first need the slope of the RPW function 2*RPG^(1 – z). Differentiating with respect to RPG yields 2*(1 – z)*RPG^(-z).

Let’s center this at the point corresponding to our expansion-era averages, so x = 1.847 (for the eagle-eyed readers or those checking my math (always welcome!), I’m choosing to use the value that minimizes RMSE to be consistent with earlier applications, rather than the value of 1.848 that corresponds to 8.83 RPG using the equation directly). In this case x1 will be 8.83 RPG, and y1 = 2*8.83^.718 = 9.555. At 8.83 RPG, m will be 2*(1 - .282)*8.83^(-.282) = .777, so we have:

RPW – 9.555 = .777*(RPG – 8.83)

which simplifies to:

RPW = .777*RPG + 2.694
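
Here is a quick look, at a few illustrative RPG values, at how closely the tangent line tracks the power form across a plausible range of scoring levels:

```python
for rpg in (7.0, 8.83, 11.0):
    power = 2 * rpg ** 0.718       # RPW = 2*RPG^.718
    linear = 0.777 * rpg + 2.694   # tangent-line approximation at 8.83 RPG
    print(rpg, round(power, 2), round(linear, 2))
```

The two agree almost exactly at the centering point of 8.83 RPG (not precisely, since .777 and 2.694 are rounded) and drift apart only slowly as RPG moves away from it.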

We’ve now developed two RPW estimates, using only RPG as the independent variable, one with a y-intercept and one without, by trying to flatten the Pythagorean relationships wherever possible. Which is more accurate? One would assume that it’s the version with the y-intercept, but even if it is, how much more accurate for normal teams, and how does this tangent-line-based approach compare with the best fit for an equation of the form RPW = m*RPG + b? Those are questions we’ll explore in the final installment.

References

Ben Vollmayr-Lee’s article on win estimation formulas:

http://www.eg.bucknell.edu/~bvollmay/baseball/pythagoras.html

Ralph Caola published multiple articles on using differentiation with the Pythagorean formula, as well as an unpublished (to the best of my knowledge) article on “double the edge” that he shared with me.

His articles can be found in the 11/2003, 2/2004, and 5/2004 issues of By the Numbers.

https://sabr.org/research/statistical-analysis-research-committee-newsletters/

Kevin D. Dayaratna and Steven J. Miller explored the relationship that RPW = 2*RPG/x in the 5/2012 issue of BTN. I had known and used that one for a long time, thanks originally to a post by David Glass on rec.sport.baseball. Unfortunately a quick search did not yield a live link to Glass’ post.

Wednesday, January 06, 2021

Akousmatikoi Win Estimators, pt. 2: Ratio-Based Simplifications

We will begin our endeavor to simplify/”flatten” the Pythagenpat exponent by looking at approaches that maintain the use of run ratio as the chief independent variable in the W% estimate. Before jumping into that, I should note that we could think of the first flattening as being moving from a variable exponent like Pythagenport/pat to a fixed exponent. However, since the latter came first historically, and is easier to explain conceptually, I didn’t approach it in that manner.

We could also make flattening the Pythagenpat exponent itself the first step. My definition of “flatten” for the sake of this discussion is to replace exponents with multiplication where possible. We could start by trying to convert z = RPG^.282 into a linear formula. I’ve skipped this step because we would still be left with exponents when we go to calculate the winning percentage. While simplifying the equations will generally cost us some theoretical and a tiny bit of empirical accuracy, it will gain us ease of calculation. Replacing RPG^.282 with a linear equation wouldn’t really make the calculation any easier, but more importantly I don’t think it would result in an interesting alternative methodology to estimate W%. It would just result in a very slightly easier to calculate, less accurate Pythagenpat equation.

I previously wrote the general Pythagorean relationship as:

W% = R^x/(R^x + RA^x)

but note that we could equivalently define win ratio (W/L = WR) as:

WR = RR^x where RR = run ratio = R/RA

I will alternate between these two ways of writing the equation depending on whichever is most convenient for what we’re trying to do. In this case, I want to see what happens if we get rid of the exponent. The approach I will take is to replace the current function with a simplified function that produces the same result for a particular point. Of course we cannot replace the function with another that will produce the same results at all points, or even expect to find one that would produce the same results at multiple points. But we will be able to find a function that produces the same result at a given point.

Mathematically, this will be the tangent line to the curve at that point. At that point, the tangent line intersects the curve and has the same slope as the curve. We will determine the slope by differentiating the function, and we will then determine the tangent line using the point-slope equation for the line as a starting point (to me, this is the most intuitive way to write the equation of a line, and if necessary we can simplify later). The point-slope equation of a line is:

y – y1 = m(x – x1)

where x1 and y1 are the x and y values for the point in question, and m is the slope of the curve at x1.

I’m going to switch to referring to the Pythagorean exponent as “a”, so that it doesn’t get confused with x, our independent variable (which is run ratio). So if we want the tangent line for the equation WR = RR^a, we first differentiate with respect to run ratio to get:

dWR/dRR = a*RR^(a – 1)

Now we just need to determine x1 and y1. Since we are going to be applying simplified win estimation formulas across the entire spectrum of possible team performance, it makes the most sense to look at a team with R = RA, that we expect to have a .500 W%. Picking the average will likely result in the most accurate simplified equation over the entire spectrum of teams.

Of course, by simplifying the equation, we will lose accuracy (at least when the result of our simplified equation is compared to the “parent” equation – we hope in this case that the Pythagorean form is more accurate, or else the entire premise of Akousmatikoi win estimators is moot). However, the simplified equation will match the parent equation precisely at the chosen point, and will produce very similar results near the chosen point, so picking a point in the center of the distribution should maximize accuracy.

So, if R = RA, then RR equals one, and so our slope is simply equal to a, which is the Pythagorean exponent. Our x value is RR, which is 1, and our y is the WR corresponding to an RR of 1, which is 1 for any value of a, as WR = RR^a. So in point-slope form:

y – 1 = a*(x – 1)

which can simplify to

y – 1 = a*x – a

y = a*x – a + 1

Remembering what y and x represent in this case:

WR = a*RR – a + 1

For a fixed Pythagorean exponent a = 2:

WR = 2*RR – 2 + 1 = 2*RR - 1

This relationship suggests that if a team scores 10% more runs than it allows, it should win 20% more games than it loses. In the 1984 Baseball Abstract, Bill James wrote:

Another method that I have never tested but which I suspect would work as well as the others would be just to “double the edge”; that is, if a team scores 10% more runs than their opponents, they should win 20% more games than their opponents. If they score 1% more runs, they should win 2% more games. That method would probably work as well or better than the Pythagorean approach.

To my knowledge that’s the extent of James’ writings on this subject, so I can’t say whether he either explicitly or implicitly inferred “double the edge” from the Pythagorean formula, or whether he came across it some other way. Either way, it can be directly related back to his own Pythagorean method.

If WR = a*RR – a + 1, and we already know that by definition W% = WR/(WR + 1), then we can convert this into a W% estimate as:

W% = (a*RR – a + 1)/(a*RR – a + 1 + 1) = (a*RR – a + 1)/(a*RR – a + 2)

For the special case of a = 2, this becomes:

W% = (2*RR – 2 + 1)/(2*RR – 2 + 2) = (2*RR – 1)/(2*RR) = 1 – 1/(2*RR) = 1 – 1/(2*R/RA)

= 1 - RA/(2*R)

This special case was noted by Bill Kross, and got a brief callout in The Hidden Game of Baseball. Kross also noticed that this method would not produce the same result for teams that had inverse runs and runs allowed. A team that scores 5 and allows 4 runs would have an estimated W% of 1 - 4/(2*5) = .600, but a team that scores 4 and allows 5 would have an estimated W% of 1 – 5/(2*4) = .375.

So Kross proposed that for the case in which runs scored < runs allowed, the W% would be estimated as R/(2*RA), which would produce 4/(2*5) = .400 for the team scoring 4/allowing 5. Not only is it satisfying to get a consistent result for the two sides of the same coin, this modification significantly improves the accuracy when empirically comparing estimated to actual W%s.

Expressing this inversion in terms of the general case above, in a case where R < RA, the estimated WR would be:

WR = 1/(a*1/RR – a + 1) = 1/(a/RR – a + 1)

and the W% would be:

W% = 1/(a/RR – a + 1)/(1/(a/RR – a + 1) + 1)

There are some ways to make that look nicer, but I don’t think any of them are sufficiently nice to bother with here. For the specific case when a = 2, Ralph Caola has suggested this formula as a clean way to boil the Kross equations down to one line:

W% = (R - RA)/(R + RA + ABS(R - RA)) + .5
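
A minimal sketch confirming that Caola’s one-line form reproduces both branches of the Kross method for the 5/4 example:

```python
def w_kross(r, ra):
    # W% = (R - RA)/(R + RA + |R - RA|) + .5
    return (r - ra) / (r + ra + abs(r - ra)) + 0.5

print(w_kross(5, 4), w_kross(4, 5))  # .600 and .400 -- symmetric, as desired
```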

You might be reading this and objecting “I thought you were going to simplify the Pythagorean relationship, but nothing about the equation with all of those reciprocals above looks simpler”. That is true – other than the special case when a = 2 and the Kross equations apply, this is not an easier way to calculate an estimated winning percentage, provided you have a modern calculator or computer. However, it is “simpler” mathematically in the sense that we have eliminated exponents. Of course, in so doing we have lost some accuracy, particularly for extreme cases. Next time, instead of starting with run ratio, we’ll start with run differential and see what shakes out of Pythagorean and how it compares to methods that have been developed independently of Pythagorean.

Wednesday, December 16, 2020

Akousmatikoi Win Estimators, pt. 1: Pythagorean

This series will be a brief review of the Pythagorean methodology for estimating team winning percentage from runs scored and runs allowed, and will examine a number of alternative winning percentage estimators that can be derived from the standard Pythagorean approach. I call it a “review” because I will not be presenting any new methods – in fact, not only was everything I plan to cover discovered and published by other sabermetricians, but it is all material that I have already written about in one form or another. When recently posting old articles from my Tripod site, I saw how poorly organized the section on win estimators was, and decided that I should try to write a cleaner version that focuses on the relationship between the Pythagorean approach and other mathematical forms for win estimators. This series will start from the assumption that Pythagorean is a useful model; I don’t think this is a controversial claim but a full treatment would need to establish that before jumping into mathematical offshoots.

By christening his win estimator the “Pythagorean Theorem” due to the three squared terms in the formula reminding him of the three squared terms Pythagoras discovered defined the dimensions of right triangles, Bill James made it irresistible for future writers to double down with even more ridiculous names. I am sure any students of Greek philosophy are cursing me, but I am calling this the “Akousmatikoi” family of win estimators because Wikipedia informs me that Akousmatikoi was a philosophical school that was a branch of the larger school of Pythagoreanism based on the teachings of Pythagoras. A rival branch, the Mathematikoi school, was more focused on the intellectual and mathematical aspects of Pythagorean thought, which would make it a better name for my purposes, but even I think that sounds too ridiculous. I’ve also jumbled the analogy: James’ Pythagorean theorem is the starting point for the Akousmatikoi family of estimators, whereas Pythagoras begat this school of philosophy, not the other way around. Of course, James’ Pythagorean theorem really has nothing to do with Pythagoras to begin with, so don’t think too hard about this.

Before I get started, I want to make certain that I am very clear that I’m introducing nothing new and that while I will derive a number of methods from Pythagorean, the people who originally discovered and published these methods used their own thought processes and ingenuity to do so. They did not simply derive them from Pythagorean. I will try to namecheck them throughout the series, but will also do it here in case I slip up – among the sabermetricians who developed the methods that I will treat as Pythagorean offshoots independently are Bill Kross, Ralph Caola, and Ben Vollmayr-Lee.

I also want to briefly address the win estimators that are in common use that are not part of what I am calling the Akousmatikoi family. The chief one that I use is Cigol, which is my implementation of a methodology that starts with an assumed run distribution per game and calculates W% from there (I say “calculates” rather than “estimates” because given the assumptions about per game and per inning run distribution functions, it is a logical mathematical derivation, not an estimate. Of course, the assumptions are just that). Cigol is very consistent with the results of Pythagenpat for teams across a wide range of scoring environments, but is its own animal. There are also approaches based on regression that offer non-Akousmatikoi paths to win estimates. If you regress on run differential or run ratio, your results will look similar to Akousmatikoi, but if you take the path of Arnold Soolman’s pioneering work and regress runs and runs allowed separately, or you use logistic regression or another non-linear methodology, your results won’t be as easily relatable to the Akousmatikoi methods.

It all starts with Pythagorean, which Bill James originally formulated as:

W% = R^2/(R^2 + RA^2)

The presence of three squared terms reminded James of the real Pythagorean theorem for the lengths of the side of right triangle (A^2 = B^2 + C^2) and gave us the charmingly wacky name for this method of win estimation. James would later complicate matters by noting that a lower exponent resulted in a slight increase in accuracy:

W% = R^1.83/(R^1.83 + RA^1.83)

Later research by Clay Davenport and Keith Woolner would demonstrate that a custom exponent, varying by run environment, would result in better accuracy in extreme situations. Pete Palmer had long before demonstrated that his linear methods increased in accuracy when considering run environment; “Pythagenport” brought this insight to Pythagorean, which we’ll now more generally express as:

W% = R^x/(R^x + RA^x)

Where Pythagenport estimates x = 1.5*log(RPG) + .45, where RPG = (R + RA)/G

Davenport and Woolner stated that the accuracy of Pythagenport was untested for RPG less than 4. A couple years later, David Smyth had the insight that 1 RPG was a situation that could only occur if the score of each game was 1-0, and that such a team’s W% would by definition be equal to R/(R + RA). This implies that the Pythagorean exponent must be 1 when RPG = 1. Based on this insight, Smyth and I independently developed a modified exponent which was constructed as:

x = RPG^z

where z is a constant generally in the range of .27 - .29 (I originally published as .29 and have tended to use this value out of habit, although if you forced me to pick one value and stick to it I’d probably choose .28)

This approach produced very similar results to Pythagenport for the RPG ranges tested by Davenport and Woolner, and returned the correct result for the known case at RPG = 1. It has come to be called “Pythagenpat”.
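
A minimal Pythagenpat sketch based on the definition above (using z = .282, the value selected below for the expansion era):

```python
def pythagenpat(r, ra, g, z=0.282):
    # exponent x = RPG^z, then standard Pythagorean with that exponent
    x = ((r + ra) / g) ** z
    return r ** x / (r ** x + ra ** x)

# e.g. the 1969 Orioles (779 R, 517 RA, 162 G) come out around .676
print(round(pythagenpat(779, 517, 162), 3))
```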

Using Cigol, I tried to develop a refined formula for the Pythagorean exponent using data for truly extreme teams. I loosened the restriction requiring x = 1 when RPG = 1 in order to consider a wider range of models, but I wasn’t able to come up with a version that produced superior accuracy to the standard Pythagenpat construction on a large dataset of actual major league team-seasons. My favorites among the versions I came up with are below; I won’t dwell on them any longer, but will revisit them briefly at the end of the series. The first is a Pythagenpat exponent that produces a Pythagorean exponent of 1 at 1 RPG; the second is a Pythagorean exponent that does not adhere to that restriction.

z = .27348 + .00025*RPG + .00020*(R - RA)^2

x = 1.03841*RPG^.265 + .00114*RD^2

There are several properties of a Pythagorean construct that make it better suited as a starting point (standing in for the “true” W% function, if there could ever be such a thing) than some of the other methods we’ll look at. I have previously proposed a list of three ideal properties of a W% estimator:

1. The estimate should fall in the range [0,1]
2. The formula should recognize that the marginal value of runs is variable.
3. The formula should recognize that as more runs are scored, the number of marginal runs needed to earn a win increases.

As we move throughout this series, we will make changes to simplify the Pythagenpat function in some ways; in my notes I called it “flattening”, but that’s not a technical term. Basically, where we see exponents, we will try to convert into multiplication, or we will try to use run differential in place of run ratio. As we “flatten” the functions out, we will progressively lose some of these ideal properties, with the (usual) benefit of having simpler functions.

Throughout this series I will make sporadic use of the team seasonal data for the expansion era (1961 – 2019), so at this point I want to use this dataset to define the Pythagorean constants that we’ll use going forward. Rather than using any formulaic approach, I am going to fix x and z for this period by minimizing the RMSE of the W% estimates for the teams in the dataset. I will also use the fixed Pythagorean exponent of 2 throughout the series as it is easy to calculate, reasonably accurate, widely used, and mathematically will produce some pleasing results for the other Akousmatikoi estimators.

Using this data, the average RPG is 8.83, the value for x that minimizes RMSE is 1.847, and the z value that minimizes RMSE is .282. Note that if we used the average RPG to estimate the average Pythagorean exponent, we’d get 1.848 (8.83^.282), which doesn’t prove anything but at least it’s not way off.
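That arithmetic is quick to check in Python (the constants are the fitted values from the text; the RMSE minimization itself is not reproduced here):

```python
rpg = 8.83    # average RPG, expansion-era dataset (1961-2019)
z = .282      # Pythagenpat exponent that minimizes RMSE
x = rpg ** z  # implied average Pythagorean exponent
print(round(x, 3))  # 1.848, vs. the directly-fitted x = 1.847
```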

Thursday, December 03, 2020

Palmerian Park Factors

The first sabermetrician to publish extensive work on park effects was Pete Palmer. His park factors appeared in The Hidden Game of Baseball and Total Baseball and as such became the most widely used factors in the field. They continue in this role thanks to their usage at Baseball-Reference.com.

Broadly speaking, all ratio-based run park factors are calculated in the same manner. The starting point is the ratio of runs at home to runs on the road. There are a number of different possible variations; some methods use runs scored by both teams, while others (including Palmer’s original methodology in The Hidden Game) use only the runs scored by one of the teams (home or road). The opportunity factor can also vary slightly; many people just use games, which is less precise than innings or outs. The variations are mostly technical rather than philosophical in nature, so they rarely get a lot of attention. Park factor calculations are accepted at face value to an extent that other classes of sabermetric tools (like run estimators) are not.

Among park factors in use, the actual computations in Palmer’s are the most distinctive, so I thought it would be worthwhile to walk through his methodology (as stated in Total Baseball V) and discuss its properties.

In the course of this discussion, I will primarily focus on the actual calculations and not the inputs. Where Palmer has made choices about what inputs to use, how to weight multiple seasons, and the like, I will not dwell, as I’m more interested in the aspects of the approach that can be applied to one’s own choice of inputs. Palmer uses separate park factors for batting and pitching (more on this later); I’ll focus on the batting ones here.

Palmer generally uses three years of data, unweighted, as the basis for the factors. There are some rules about which years to use when teams change parks, but those are not relevant to this discussion. The real meat of the method starts by finding the total runs scored and allowed per game at home and on the road, which I’ll call RPG(H) and RPG(R).

i (initial factor) = RPG(H)/RPG(R)

I will be using the 2010 Colorado Rockies as the example team here, considering just one year of data to keep things simple. Colorado played 81 games both home and away, scoring 479 and allowing 379 runs at home and scoring 291 and allowing 338 on the road. Thus, their RPG(H) = (479 + 379)/81 = 10.593 and RPG(R) = (291 + 338)/81 = 7.765. That makes i = 10.593/7.765 = 1.364 (I am rounding to three places throughout the post, which will cause some rounding discrepancies with the spreadsheet from which I am reporting the results).
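The same arithmetic in Python, using the Rockies’ totals from above (variable names are mine):

```python
games = 81
runs_home, runs_allowed_home = 479, 379   # Rockies at Coors Field, 2010
runs_road, runs_allowed_road = 291, 338   # Rockies on the road, 2010

rpg_home = (runs_home + runs_allowed_home) / games  # 10.593
rpg_road = (runs_road + runs_allowed_road) / games  # 7.765
i = rpg_home / rpg_road                             # initial factor, 1.364
```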

The next step is to adjust the initial factor for the number of innings actually played rather than just using games as the denominator. This step can be ignored if you begin with innings or outs as the denominator rather than using games. The Innings Pitched corrector is:

IPC = (18.5 - Home W%)/(18.5 - (1 - Road W%))

Palmer explains that 18.5 is the average number of innings batted per game if the home team always bats in the ninth inning. Teams that win a higher percentage of games at home bat in fewer innings due to skipping the bottom of the ninth. The IPC seems to assume that in all games won by the home team, they do not bat in the bottom of the ninth.

Colorado was 52-29 (.642) at home and 31-50 on the road (.383), so their IPC is:

IPC = (18.5 - .642)/(18.5 - (1 - .383)) = .999

The initial factor is divided by the IPC to produce what the explanation refers to as Run Factor:

RF = i/IPC

For the Rockies, RF = 1.364/.999 = 1.366
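These two steps can be sketched in Python, continuing the Rockies example (small rounding discrepancies with the text are expected, per the caveat above):

```python
home_wpct = 52 / 81   # Rockies went 52-29 at home
road_wpct = 31 / 81   # and 31-50 on the road
i = 1.364             # initial factor from the previous step

# Innings Pitched corrector: adjusts for skipped bottom-of-the-ninth innings
ipc = (18.5 - home_wpct) / (18.5 - (1 - road_wpct))  # ~.999
rf = i / ipc                                         # Run Factor, ~1.366
```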

The next step is the Other Parks Corrector (OPC). The OPC “[corrects] for the fact that the other road parks’ total difference from the league average is offset by the park rating of the club that is being rated.” The glossary explanation may be confusing, but the thought process behind it is pretty straightforward--a team’s own park makes up part of the league road average, but none of the team’s own road games are played there. Without accounting for this, all parks would be estimated to be more extreme than they are in reality.

The OPC is figured in the same manner as I do it in my park factors; I borrowed it from Craig Wright’s work without realizing that Palmer had done the same mathematical operation, but its derivation is fairly obvious and the equivalent appears in multiple park factor approaches. Let T equal the number of teams in the league:

OPC = T/(T - 1 + RF)

Basically, OPC assumes a balanced schedule, so each team’s schedule is made up half of games in its own park (hence the use of RF) and half in the other parks, of which there are T - 1. For a sixteen team league (and thus Colorado 2010):

OPC = 16/(16 - 1 + 1.366) = .978

The next step is to multiply RF by OPC, producing scoring factor:

SF = RF*OPC

For Colorado, SF = 1.366*.978 = 1.335
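In Python, continuing with the rounded values from the text:

```python
T = 16        # teams in the 2010 National League
rf = 1.366    # Run Factor from the previous step

# Other Parks Corrector: a team's own park is part of the league road
# average, but the team never plays road games there
opc = T / (T - 1 + rf)   # ~.978
sf = rf * opc            # scoring factor, ~1.335
```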

If all you were interested in was an adjustment factor for the park’s effect on scoring, rather than specific adjustments for the team’s batters and pitchers, this would be your stopping point. The scoring factor is the final park factor in that case, and with the exception of the Innings Pitched Corrector, it is equivalent to the approach used by Craig Wright, myself, and many others. (My park factors are then averaged with one to account for the fact that only half of the games for a given team are at home, but that only obscures the fact that the underlying approach is identical, and Palmer accounts for that consideration later in his process).

It is when the other factors are adjusted for that the math gets a little more involved. The first step is to calculate SF1, which is an adjustment to scoring factor:

SF1 = 1 - (SF - 1)/(T - 1)

For the 2010 Rockies:

SF1 = 1 - (1.335 - 1)/(16 - 1) = .978

While I am writing this explanation, I must stress that it is a walkthrough of another person’s method. I cannot fully explain the thought process behind it and justify every step. I decided to include that disclaimer at this point because I don’t understand what the purpose of SF1 is, as it is mathematically equivalent to OPC. Why it needed to be defined again and in a more obtuse way is beyond me.
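The equivalence is easy to verify: substituting SF = RF*OPC into the SF1 formula and simplifying collapses it back to T/(T - 1 + RF), which is OPC. A quick numeric confirmation in Python:

```python
def opc(rf, T):
    # Other Parks Corrector
    return T / (T - 1 + rf)

def sf1(rf, T):
    # SF1 as defined by Palmer, built from the scoring factor
    sf = rf * opc(rf, T)
    return 1 - (sf - 1) / (T - 1)

# identical for any run factor and league size
for rf in (0.8, 1.0, 1.366, 1.5):
    assert abs(sf1(rf, 16) - opc(rf, 16)) < 1e-12
```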

In any event, the purpose of SF1 is to serve as a road park factor. If a team plays a balanced schedule and we take as a given that the overall league park factor should be equal to one, but we determine that its own park has a PF greater than one (favors hitters), then it must be the case that the road parks they play in have a composite PF less than one. That’s the function of the OPC, and of SF1. For the rest of this post, I will refer to OPC rather than SF1 when it is used in formulas.

Palmer uses separate factors for batters and pitchers to account for the fact that a player does not have to face his teammates. By doing so, Palmer’s park factors make their name something of a misnomer as they adjust for things other than the park. (One could certainly make the case that the park factor name is constantly misapplied in sabermetrics, as we can never truly isolate the effect of the park, and the sample data we have is affected by personnel and other decisions. Palmer takes it a step further, though, by accounting for things that have nothing to do with the park.) The park effect is generally stronger than the effect of not facing one’s own teammates, since a team plays in its park half the time but only misses out on facing its own pitchers in 1/T of its games, assuming a balanced schedule.

One can argue about the advisability of adjusting for the teammate factor at all, and if so it certainly is debatable whether it should be included in the park factor or spun off as a separate adjustment. I would find the latter to be a much better choice that would result in a much more meaningful set of factors (both for the park and teammates), but Palmer chose the former.

The separate factors are calculated through an iterative process. One must know the R/G scored and allowed at home and away for the team (I’ll call these RG(H), RG(R), RAG(H), and RAG(R) for R/G at home, R/G on the road, RA/G at home, and RA/G on the road respectively). Additionally, one must know the average RPG for the entire league (which I’ll call 2*N, since I often use N to denote league average runs/game for one team). For the Rockies, we can determine from the data above that RG(H) = 5.913, RG(R) = 3.593, RAG(H) = 4.679, RAG(R) = 4.173, and N = 4.33.

The iterative process is used to calculate a team batter rating (TBR) and a team pitcher rating (TPR, not to be confused with Total Player Rating, another Palmer acronym). These steps are necessary since the strength of each unit cannot be determined wholly independently of the other. Just as the total impact of the park goes beyond the effect on home games and into road games (necessitating the OPC, but not being of strong enough magnitude to swamp the home factor), so the influences of the two units on one another are codependent.

The first step of the process assumes that the pitching staff is average (TPR = 1). Then the TBR can be calculated as:

TBR = [RG(R)/OPC + RG(H)/SF]*[1 + (TPR - 1)/(T - 1)]/(2*N)

For Colorado:

TBR = [3.593/.978 + 5.913/1.335]*[1 + (1 - 1)/(16 - 1)]/(2*4.33) = .936

The first bracketed portion of the equation adds the team’s adjusted (for park) R/G at home and on the road, using the home (SF) or road (OPC) park factor as appropriate. The second set of brackets multiplies the first by the adjustment for not facing the team’s pitching staff. The difference between the team pitching and average (TPR - 1) is divided by (T - 1) since, playing a true balanced schedule, the team would only face those pitchers in 1/(T - 1) of its games anyway. If the team’s pitchers are above average (TPR < 1), then the correction will decrease the estimate of team batting strength, since the batters had the advantage of never facing them.

Then the entire quantity is divided by double the league average of runs scored per game by a single team. This may seem confusing at first glance, but it is only because the first bracketed portion did not weight the home and road R/G by 50% each. The formula could be written as the equivalent:

TBR = [.5*RG(R)/OPC + .5*RG(H)/SF]*[1 + (TPR - 1)/(T - 1)]/N

In this case, it is much easier to see that the first bracket is the average runs scored per game by the team, park-adjusted. Dividing this by the league average results in a very straightforward rating of runs scored relative to the league average. Both TBR and TPR are runs per game relative to the league average, but in both cases they are constructed as team figure/league figure. This means that the higher the TBR, the better, with the opposite being true for TPR.

The pitcher rating can then be estimated using the actual TBR just calculated, rather than assuming that the batters are average:

TPR = [RAG(R)/OPC + RAG(H)/SF]*[1 + (TBR - 1)/(T - 1)]/(2*N)

For the Rockies, this yields:

TPR = [4.173/.978 + 4.679/1.335]*[1 + (.936 - 1)/(16 - 1)]/(2*4.33) = .894

The first estimate is that the Rockies batters, playing in a neutral park and against a truly balanced schedule, would score 93.6% of the league average R/G. Rockies pitchers would be expected to allow 89.4% of the league average R/G. At first, when I was performing the sample calculations, I thought I might have made a mistake since the Colorado offense was evaluated as so far below average, but I was forgetting that they would appear quite poor using a one-year factor for Coors Field. My instinct as to what Colorado’s TBR should look like was informed by my knowledge of the five-year PF that I use.

Now the process is repeated for three more iterations, each time using the most recently calculated value for TBR or TPR as appropriate. The second iteration calculations are:

TBR = [3.593/.978 + 5.913/1.335]*[1 + (.894 - 1)/(16 - 1)]/(2*4.33) = .929

TPR = [4.173/.978 + 4.679/1.335]*[1 + (.929 - 1)/(16 - 1)]/(2*4.33) = .893

Repeating the loop two more times should ensure pretty stable values. Theoretically, there’s nothing to stop you from setting up an infinite loop. I won’t insult your intelligence by spelling out the second pair of iterations suggested by Palmer, but the final results for Colorado are a TBR of .929 and a TPR of .893.
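The whole loop is compact enough to sketch in Python. With the rounded inputs used above, four iterations converge to a TBR of .929 and a TPR of .893 (function and variable names are mine):

```python
def team_ratings(rg_h, rg_r, rag_h, rag_r, sf, opc, N, T, n_iter=4):
    # Iteratively estimate the team batter rating (TBR) and team
    # pitcher rating (TPR), each a park-adjusted R/G relative to the
    # league average N, corrected for not facing one's own teammates.
    tbr, tpr = 1.0, 1.0   # start by assuming both units are average
    for _ in range(n_iter):
        tbr = (rg_r / opc + rg_h / sf) * (1 + (tpr - 1) / (T - 1)) / (2 * N)
        tpr = (rag_r / opc + rag_h / sf) * (1 + (tbr - 1) / (T - 1)) / (2 * N)
    return tbr, tpr

# 2010 Rockies: R/G and RA/G at home and on the road
tbr, tpr = team_ratings(5.913, 3.593, 4.679, 4.173,
                        sf=1.335, opc=.978, N=4.33, T=16)
```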

So far, all we’ve done is figured teammate-corrected runs ratings for team offense and defense. Our estimate of the park factor remains stuck back at scoring factor. All that’s left now is correcting scoring factor for 1) the teammate factor and 2) the fact that a team plays half its games at home and half on the road. This will require two separate formulas--a batter’s PF (BPF) and a pitcher’s PF (PPF), which are twins in the same way that TBR and TPR are:

BPF = (SF + OPC)/[2*(1 + (TPR - 1)/(T - 1))]

PPF = (SF + OPC)/[2*(1 + (TBR - 1)/(T - 1))]

For Colorado:

BPF = (1.335 + .978)/[2*(1 + (.893 - 1)/(16 - 1))] = 1.165

PPF = (1.335 + .978)/[2*(1 + (.929 - 1)/(16 - 1))] = 1.162

The logic behind the two formulas should be fairly obvious at this point. Again, the multiplication by two in the denominator arises because the two factors in the numerator were not weighted at 50% each. You can write the formulas as:

BPF = (.5*SF + .5*OPC)/[1 + (TPR - 1)/(T - 1)]

PPF = (.5*SF + .5*OPC)/[1 + (TBR - 1)/(T - 1)]

Here, you can see more clearly that the numerator averages the home park factor (SF) and the road park factor (OPC). The numerator could be your final park factor if you wanted to account for road park but didn’t care about teammate effects. The denominator is where the teammate effect is taken into account, and in the same manner as in TBR and TPR. If TPR > 1, BPF goes down because the batters did not benefit from facing the team’s poor pitching. If TBR > 1, PPF goes down because the pitchers benefitted from not facing their batting teammates.
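The final step sketched in Python, plugging in the Rockies’ values from above:

```python
T = 16
sf, opc = 1.335, .978   # scoring factor and other parks corrector
tbr, tpr = .929, .893   # final team batter and pitcher ratings

# batter's PF divides out pitching strength; pitcher's PF, batting strength
bpf = (sf + opc) / (2 * (1 + (tpr - 1) / (T - 1)))   # ~1.165
ppf = (sf + opc) / (2 * (1 + (tbr - 1) / (T - 1)))   # ~1.162
```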

If one does not care to incorporate the teammate effect, then Palmer’s park factors are pretty much the same as any other intelligently designed park factors. This should not come as a surprise, because as with many implementations of classical sabermetrics, Palmer’s influence is towering. The iterative process used to generate the batting and pitching ratings is pretty clever, and incorporates a real albeit small effect that many of us (myself included) often gloss over.