Wednesday, January 26, 2022

Pythagenpat Using Run Rates

The widespread implementation of seven-inning games during the 2020 season forced a re-examination of some of the standard sabermetric tools. One example is Pythagorean records. It would be foolish to expect that the same run ratio achieved over nine innings would lead to the same expected winning percentage as if it had been achieved over seven innings. Thus, it was also foolish to take a team’s composite runs scored and allowed for the season, which reflected that team’s unique mix of seven-inning and nine-inning games, and expect the standard Pythagorean approach to produce the best estimate of its overall winning percentage.

The approach that one should select to deal with this issue depends on what the desired final estimate is. If one wanted to estimate how many games a team should have won over the course of that season, one reasonable approach would be to develop a proper Pythagorean or Pythagenpat exponent for seven-inning games, calculate a team’s estimated winning percentage in seven-inning games using that value and in nine-inning games using the standard approach, and then weight the results by the percentage of seven-inning and nine-inning games for the team (defining this in terms of the scheduled length of the game, not the actual number of innings played, in the case of extra-inning seven-inning games).

Tom Tango studied games that were tied entering the third inning to simulate a seven-inning game, and found that a Pythagorean exponent of 1.57 was appropriate. Of course, that’s a fixed exponent rather than a Pythagenpat exponent, but you could use the same approach to develop an equivalent Pythagenpat formula and then apply it as described above. 
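As a rough illustration of the weighting approach, here is a minimal Python sketch. It assumes Tango’s fixed 1.57 exponent for seven-inning games and standard Pythagenpat (x = RPG^.29) for nine-inning games; the function names and the team figures in the usage line are hypothetical, and a proper seven-inning Pythagenpat formula could be swapped in for the fixed exponent.

```python
def pythagorean_wpct(r: float, ra: float, exponent: float) -> float:
    """Pythagorean W% with a fixed exponent."""
    return r**exponent / (r**exponent + ra**exponent)


def pythagenpat_wpct(r: float, ra: float, games: int, z: float = 0.29) -> float:
    """Pythagenpat W%: exponent x = RPG^z, where RPG = (R + RA) / G."""
    x = ((r + ra) / games) ** z
    return pythagorean_wpct(r, ra, x)


def blended_wpct(r7, ra7, g7, r9, ra9, g9, exp7=1.57, z9=0.29):
    """Estimate seven- and nine-inning W% separately, then weight by games.

    g7 and g9 count games by *scheduled* length, so an extra-inning
    seven-inning game still belongs in g7. Assumes g7 and g9 are both > 0.
    """
    w7 = pythagorean_wpct(r7, ra7, exp7)    # fixed exponent (1.57, per Tango)
    w9 = pythagenpat_wpct(r9, ra9, g9, z9)  # standard nine-inning Pythagenpat
    return (g7 * w7 + g9 * w9) / (g7 + g9)


# Hypothetical team: 12 scheduled seven-inning games, 48 nine-inning games
print(round(blended_wpct(60, 50, 12, 200, 190, 48), 3))
```

Because each piece is estimated with its own exponent before weighting, the seven-inning run ratio is never forced through a nine-inning conversion.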

I decided that I was more interested in attempting to estimate what the team’s W% would have been under standard conditions (i.e. nine-inning games as the default, as we normally view a major league season). Thus I was interested in what a team’s W% “should have been” had they played in a normal season. This allowed me to skip the step of dealing with seven-inning games, and instead think about the best way to fit their 2020 data into the standard formulas. Of course, the silly runs scored in extra-inning games are a problem, but I chose to ignore them for the sake of expediency (and in hopes that this all would be a temporary problem) and plugged the team’s runs scored and allowed per nine innings into Pythagenpat.

In thinking about this, I was reminded of a related issue that I have been aware of for a long time, which is the reduced accuracy of Pythagorean estimates (and really all R/RA estimates of W%) as it pertains to home and away games. If you look at 2010-2019 major league data and use Pythagenpat with x = RPG^.29, the RMSE of estimated team W% multiplied by 162 is 3.977 (for the sake of convenience I’ll just call this RMSE going forward, but this can be thought of as the standard error per 162 games). If you just look at away games, the RMSE is 6.537, and for home games it is 6.992. 
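For reference, a minimal Python sketch of that error measure: Pythagenpat with x = RPG^.29, and the RMSE of the W% error scaled to 162 games. The input format (tuples of runs, runs allowed, games, and wins per team-season) and the function names are assumptions for illustration.

```python
import math


def pythagenpat_wpct(r, ra, games, z=0.29):
    """Pythagenpat W%: Pythagorean exponent x = RPG^z, with RPG = (R + RA) / G."""
    x = ((r + ra) / games) ** z
    return r**x / (r**x + ra**x)


def rmse_per_162(teams, z=0.29):
    """RMSE of (actual W% - estimated W%) * 162 over team-seasons.

    teams: iterable of (runs, runs_allowed, games, wins) tuples.
    """
    squared_errors = [
        ((wins / games - pythagenpat_wpct(r, ra, games, z)) * 162) ** 2
        for r, ra, games, wins in teams
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

The same function applies to home-only or road-only splits by passing each team’s home (or road) runs, runs allowed, games, and wins.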

It should not surprise us that the error is larger, as we have just halved the number of games for each observation, and we should generally expect larger deviations from expectation over small samples. However, it’s not just the magnitude of the error that matters. Over this span, home teams averaged a .535 W% and road teams (of course) the complement of .465. But the Pythagenpat record of home teams was .514, and for road teams .486. One’s first inclination upon seeing this might be to say “Aha! Evidence of home field advantage manifesting itself. Home teams exceed their Pythagenpat record by .021 wins due to [insert explanation...strategic advantage of batting in the bottom of the ninth, crowd support allowing them to will runs when needed, etc.]”

One sabermetrician who encountered this phenomenon and developed a more likely (and indeed, obvious upon reflection) explanation for it was Thomas Tress. His article “Bias Against the Home Team in the Pythagorean Theorem” was published in the May 2004 By The Numbers. Tress provided the obvious explanation that home teams often don’t bat in the bottom of the ninth, which means that they often have fewer opportunities to score runs than they do to allow runs. Tress offers a correction in the form of a scalar multiplier that can be applied to a home team’s runs (and, of course, to the road team’s runs allowed).

Tress’ approach is a solid one, but it addresses only the home/road Pythagorean conundrum that we encountered on a detour, rather than my primary concern about game length (this is not a criticism, as it was not intended to address that). The issues are related because the home team not batting in the bottom of the ninth is one way in which game lengths vary from the standard nine innings that most metrics inherently assume (or, more precisely, they assume the average number of innings in the data used to calibrate them, which we’ll get to in due course).

I should point out that there is another issue pertaining to home teams that also distorts Pythagorean records: truncated bottom of the ninths (or tenths, elevenths, etc.). Foregone bottom of the ninths are more obviously troublesome, but truncated bottom of the ninths (in which a walkoff victory is achieved before three outs are recorded) also leave home teams’ run totals lower than they would otherwise be, as run expectancy is left on the table when the game ends. I will not be correcting for that here; it is a lesser problem than foregone bottom of the ninths for the sake of Pythagorean records, and there’s no easy fix (one could add to a home team’s runs scored and an away team’s runs allowed the run expectancy that existed at the end of the game, but this is not a correction that can quickly be made with a conventional dataset). You can avoid this problem by using runs created rather than actual runs, as the potential runs are still reflected in the calculation, but that changes the whole nature of the Pythagorean record by opening up a second dimension of luck (“sequencing” of offensive events rather than simply “timing” of runs).

Ignoring the truncated innings issue, there is an obvious approach that should help address both the home field issue and the question of shortened games, which is using a rate of runs scored and allowed that considers outs/innings rather than raw totals or rates (most commonly runs/game) that don’t take into account outs/innings. Since Pythagenpat is built around runs per game determining the exponent, I will take the approach of using runs/9 innings.

Before jumping into the Pythagenpat implications, two points on this definition:

1. It’s easy to know a team’s defensive innings, as it’s just their innings pitched. For offenses, you can use Plate Appearances – Runs – Left on Base (at least for non-Manfred innings), although it’s easier if you can just get opponents’ innings pitched, or opponents’ putouts, since PO/3 = IP by definition.

2. I am using 9 innings because it is the regulation game length, but it corresponds to a slightly longer game than what we actually saw in 2010-2019. For those seasons, the average outs/game was 26.82, which is equivalent to 8.94 innings/game.
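A minimal Python sketch of the out and inning counts from point 1, assuming season totals are at hand; the team numbers in the usage lines are made up, and the function names are hypothetical.

```python
def offensive_outs(pa: int, runs: int, lob: int) -> int:
    """Outs made by an offense, using PA - R - LOB.

    Every batter who comes to the plate either scores, is left on base, or is
    put out. This breaks down for innings that start with a placed ("Manfred")
    runner, who can score or be stranded without having a plate appearance.
    """
    return pa - runs - lob


def innings_from_putouts(putouts: int) -> float:
    """Opponents' putouts / 3 = innings, since every out is credited as a putout."""
    return putouts / 3


# Hypothetical team season
outs = offensive_outs(pa=6200, runs=750, lob=1150)
print(outs, outs / 3)  # outs made and the equivalent innings
print(26.82 / 3)       # 2010-2019 average outs/game expressed as innings
```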

I’m using 2010-2019 data for this post not because I think ten years (300 team seasons) is an appropriate sample, given that conditions of the game have not changed over the last century to an extent that should significantly influence Pythagorean records. The more mundane explanation is that data on actual team outs, home and away, is not easily accessible, and the easiest way I know to get it is through Retrosheet’s Game Logs, which are an absolutely fantastic resource. But I didn’t want to spend a significant amount of time parsing them, which is why I limited my sample to ten years. 

My first step was to optimize standard Pythagenpat to work with this dataset, so that any RMSE comparisons we make after building a rate-based Pythagenpat formula are on a level playing field. However, I was quite surprised by what I found - the Pythagenpat exponent that minimizes RMSE for the 2010-2019 majors is .264 (in other words, the Pythagorean exponent x = RPG^.264).
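A minimal sketch of that optimization in Python, assuming a simple grid search over z in steps of .001 (a numerical optimizer would work as well). The function names are hypothetical, and any teams you feed it would come from your own game log or Lahman extract.

```python
def pythagenpat_wpct(r, ra, games, z):
    """Pythagenpat W% with exponent x = RPG^z."""
    x = ((r + ra) / games) ** z
    return r**x / (r**x + ra**x)


def rmse_per_162(teams, z):
    """teams: list of (runs, runs_allowed, games, wins) tuples."""
    total = sum(
        ((wins / g - pythagenpat_wpct(r, ra, g, z)) * 162) ** 2
        for r, ra, g, wins in teams
    )
    return (total / len(teams)) ** 0.5


def best_z(teams, lo=0.20, hi=0.35, step=0.001):
    """Grid-search the Pythagenpat exponent z that minimizes RMSE per 162."""
    n_steps = round((hi - lo) / step)
    candidates = [lo + i * step for i in range(n_steps + 1)]
    return min(candidates, key=lambda z: rmse_per_162(teams, z))
```

A grid this fine is cheap for a few hundred team-seasons and makes it easy to see how flat the RMSE curve is around the minimum.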

Typically, a value in the range .28 - .29 minimizes RMSE. I was so surprised by .264 that I thought for a moment I might have made an error compiling the data from the game logs, so I checked the Lahman database at the season level to be sure. The data was accurate; this set of 300 teams happens to have a lower Pythagenpat exponent than I am conditioned to seeing. 

For the purpose of a proof of concept of using rates, this is not really an issue; however, I certainly question whether the best fit values I’ve found for the rate approach should be broadly applied across all league-seasons. I will leave it up to anyone who ultimately decides to implement these concepts to decide whether a larger sample is required to calibrate the exponents.

With that being said, the differences in RMSE using the lower Pythagenpat exponent are not earth-shattering. Using .264, the RMSE is 3.923 for all games, 7.015 for home games, and 6.543 for away games, so the home/road RMSEs are actually higher than those for the standard exponent. I provide these values for reference only, as the real point of this exercise is to look at what happens for a rate-based Pythagenpat.

First, let’s define some basic terms:

R/9 = Runs/Actual Outs * 27

RA/9 = Runs Allowed/Innings Pitched * 9

RPG9 = R/9 + RA/9

x = RPG9^z (z will be our Pythagenpat exponent and x the resulting Pythagorean exponent for a given RPG9)

W% = (R/9)^x/((R/9)^x + (RA/9)^x)
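Those definitions translate directly into code. Here is a minimal Python sketch; the function name is hypothetical, z defaults to the .244 value that fit this dataset, and innings pitched must be true innings (1432.2 in box-score notation is 1432 2/3), not baseball notation.

```python
def rate_pythagenpat_wpct(runs, outs_made, runs_allowed, innings_pitched, z=0.244):
    """Pythagenpat driven by runs per nine innings rather than runs per game.

    outs_made: the team's actual offensive outs.
    innings_pitched: true innings (thirds as fractions), not baseball notation.
    """
    r9 = runs / outs_made * 27                # R/9 = Runs/Actual Outs * 27
    ra9 = runs_allowed / innings_pitched * 9  # RA/9 = Runs Allowed/IP * 9
    rpg9 = r9 + ra9                           # RPG9 = R/9 + RA/9
    x = rpg9 ** z                             # Pythagorean exponent for this RPG9
    return r9**x / (r9**x + ra9**x)


# Same raw runs scored and allowed, but fewer offensive outs than defensive outs
print(rate_pythagenpat_wpct(700, 4320, 700, 1450))
```

In this made-up example the raw totals are even, so standard Pythagenpat would say .500. Because the team made fewer outs on offense (1440 innings) than it recorded on defense (1450), as a good team that foregoes bottom of the ninths at home would, its R/9 exceeds its RA/9 and the rate version credits it with a W% above .500.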

The value of z that minimized RMSE for this dataset is .244. That RMSE is 3.771, which is a significant improvement over the optimized Pythagenpat that does not use rates. This is encouraging, since if there were no advantage to be had, this whole exercise would be a waste of time. I also think it’s intuitive that considering rates rather than just raw run totals would allow us to improve our winning percentage estimate. After all, the only differences between raw runs and rates for a team season will arise due to how the team performs in individual games. 

To wit, we can define opportunities to score runs in terms of outs, since outs are the correct denominator for a team-level evaluation of runs scored/allowed on a rate basis. A perfectly average team would expect to have an equal number of opportunities for its offense and defense, but a good team will allow its opponents’ offense more opportunities (since it will forego more bottom of the ninths at home and play more bottom of the ninths on the road), and a bad team will get more opportunities for its own offense. These differences don’t arise randomly but as a result of team performance. So we should expect an improvement in the accuracy of our winning percentage estimate when we allow for these corrections, but it should be slight, since foregone bottom of the ninths have a ceiling in theory and a lower ceiling in practice (even very bad teams often forego bottom of the ninths, and even very good teams frequently lose at home or at least need a walkoff to win).

Better yet, the RMSEs for home games (5.779) and road games (5.215) show larger reductions, which we might have expected, as the impact of foregone bottom of the ninths will not be as smooth across teams when considering home and road separately. When using this rate approach, the expected W% for all home teams in the dataset is .536, compared to the actual home W% of .535. So there is no evidence of any home field advantage in converting runs/runs allowed to wins that does not get wiped away by taking opportunities to score/allow runs into account, contrary to what one might conclude from a naïve Pythagenpat analysis.

A further note is that if you calculate a team’s total expected wins as a weighted average of their home and road rate Pythagenpats, the RMSE is a little better (3.754) than just looking at the combined rate. This also should not surprise, as we have sneaked in more data about how a team actually distributed its runs scored and allowed across games by slicing the data into two pieces instead of one. If we calculated a Pythagenpat record for every game and then aggregated, we should expect to maximize accuracy, but at that point we are kind of losing the point of a Pythagorean approach (we can make the RMSE zero if in that case we replace Pythagenpat with a rule that if R > RA, we should assign a value of 1 expected win and if R < RA we should assign a value of 0 expected wins).
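A minimal sketch of that weighted combination, assuming each team’s home and road R/9 and RA/9 have already been computed as defined above; the tuple layout, function names, and numbers are hypothetical.

```python
def rate_pythagenpat_wpct(r9, ra9, z=0.244):
    """Rate Pythagenpat from R/9 and RA/9."""
    x = (r9 + ra9) ** z
    return r9**x / (r9**x + ra9**x)


def expected_wins_home_road(home, road, z=0.244):
    """Sum of games * expected W% over the home and road splits.

    home, road: (games, r9, ra9) tuples for the team's home and road games.
    Dividing by total games gives the games-weighted average W%.
    """
    return sum(games * rate_pythagenpat_wpct(r9, ra9, z)
               for games, r9, ra9 in (home, road))


wins = expected_wins_home_road(home=(81, 4.9, 4.1), road=(81, 4.5, 4.4))
print(wins, wins / 162)
```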

Again, I would consider this a demonstration of concept rather than a suggestion that this be implemented with a rate Pythagenpat exponent of .244. My hunch is that the best value to use over a broad range of team-seasons is higher than .244. Also, I think that for just looking at total season records, a standard approach is sufficient. If you are ever working with a situation in which you can expect to see significant discrepancies between the number of foregone bottom of the ninths for a team and its opponents (as is definitely the case when considering home and away games separately, and may be the case to a much lesser extent for extremely good or extremely bad teams), then you may want to consider calculating Pythagenpat using run rates rather than raw totals.

Wednesday, January 05, 2022

Rate Stat Series, pt. 16: Summary

This series spans fifteen posts, over thirty tables, and over 25,000 words. I don’t really expect anyone to slog through all that. So here I want to express the key points of the series as succinctly and with as little math as possible. In doing so, it will become apparent that I haven’t broken any new ground in this series, which is even more reason not to slog through the rest.

1. The proper denominator for a rate stat (where “rate stat” is defined as a measure of overall offensive productivity expressed in units of runs or wins, rather than the rate of any given event or subset of events) for a team is outs. This is obviously true if you take a moment to examine it, and is one of the core fundamental insights of sabermetrics. Because when a pitcher is in the game, he functions as his own team, outs are also the proper denominator for any overall pitching rate stat.

2. The number of plate appearances any team gets is a function of their rate of making outs (if we ignore enough statistical categories, this boils down to their On Base Average). On the team level, plate appearances are an inappropriate rate stat denominator as it is illogical to penalize a team for avoiding outs more effectively than another.

3. At the individual batter level, neither outs nor plate appearances are a satisfactory denominator if an estimate of absolute runs created is used as the numerator of the rate stat. Beyond their primary contributions to their team through their direct actions at the plate and on the bases, batters make a secondary contribution by avoiding outs, thus generating additional plate appearances for their teammates. But individual batters don’t operate in a vacuum. An individual contributes to his team’s plate appearance total, but doesn’t individually define it, as he only makes up one-ninth of the lineup. Using outs as a denominator treats an individual as if he alone defines his team. Using plate appearances, on the other hand, does not value the secondary contribution that a batter makes by generating additional opportunities for his teammates, absent some adjustment.

4. There are three frameworks through which we can evaluate an individual’s offense. The first, which I do not advocate at all, is to treat the player as a team, plugging the individual’s stats into a dynamic run estimator like Runs Created or Base Runs. The second is to use linear weights to evaluate either absolute runs created (as, for example, Estimated Runs Produced or Extrapolated Runs do) or runs above average (à la Pete Palmer’s Batting Runs). The third is to construct a theoretical team, using a dynamic run estimator to estimate the runs created by a hypothetical team that consists of the batter in question plus eight other (typically league average) players.

5. The selection of approach to run estimation should not be divorced from the choice of rate stat. The assumptions inherent in each of the approaches to run estimation suggest similar, consistently reasoned assumptions that would make sense to use in developing a rate stat. While it is possible and justifiable to mix certain elements across the frameworks, my point of view is that it makes more sense to keep the “frameworks” pure, and utilize the rate stat that makes the most sense to pair with the chosen run estimator.

6. Using linear weights runs above average (RAA) rather than absolute linear weights runs created as the numerator does enable the use of plate appearances as the denominator, because the RAA estimate already incorporates the batter’s secondary contribution. However, RAA/PA may not be everyone’s ideal choice for a rate stat, because…

7. Some rates can be compared (while maintaining meaningful units) differentially (i.e. subtracting the values for two players makes sense); others are ratio comparable (i.e. dividing the values for two players makes sense); some are neither differentially nor ratio comparable, and some are both. I prefer metrics that can be compared either way, but RAA/PA is only differentially comparable. FanHome poster Sibelius developed an adjustment called R+/PA that, depending on how you look at it, either adds the league average R/PA to RAA/PA or makes an adjustment to absolute runs created before dividing by PA; either way, it allows ratio comparisons for the rate stat.

8. wOBA, which is now in wide use thanks to its popularization by Tom Tango and Fangraphs, is a variant of the RAA/PA family as well, although it doesn’t maintain direct differential or ratio comparability.

9. Despite the issues with R/O as a rate stat for an individual, using it to calculate RAA will produce the same result for the RAA total as R+/PA, assuming that the inputs are consistently defined. R/O causes very minor distortion when used to compare normal players and would cause considerably more distortion with extreme players, but it remains a useful shortcut rate stat. There are many worse choices one could make in devising an individual rate stat than using R/O. R/O remains the correct rate stat for a team; the RAA/PA family of metrics is inappropriate for the same reason R/PA is inappropriate for a team, in addition to some issues that would arise if attempting to define terms like “R+” for a team, as their actual runs scored or estimated runs created is already based on the number of plate appearances that they actually generated.

10. One can argue that batters also make tertiary contributions to their team through their impact on the run values of all of their teammates’ actions. The impact is very small for most hitters, dwarfed by their primary and secondary contributions, and if attempting to quantify them one must be careful to ensure that it’s not just measurement error. Attempting to capture these impacts lends itself to use of a theoretical team approach, which uses a dynamic run estimator to model a batter’s impact on a team.

11. The theoretical team approach gives rise to a rate stat that David Smyth called R+/O+, which is expressed on an R/O scale but produces the same RAA given the same inputs. It can be applied to the linear weights framework as well, and offers an option if one prefers to express results on the R/O scale rather than R/PA, and thus have the same scale for the individual and team rate stat.

12. If you wish to compare rates across run environments, differentials between the individual and the league usually aren’t sufficient, as higher run environments make equal differences less valuable in terms of wins. If you assume a fixed Pythagorean exponent for your win conversion, the case can be made that ratios do capture the win difference, but as soon as you introduce a run environment-dependent Pythagorean exponent that better models reality, this assumption fails. It is also necessary to consider that simply comparing the individual to the league average may not properly capture how the individual’s run contribution translates into his team’s wins. There is also a potential complication from how differences in league PA/G impact rates denominated in PA. All of this is to say that there is no simple solution to converting run rate stats to their win-equivalents, and care should be taken in doing so, especially considering that the impact may be relatively small in many cases.
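To illustrate the ratio point in item 12, here is a minimal Python sketch. Two hypothetical teams share the same 1.1 ratio of runs to runs allowed per game in different run environments. A fixed exponent of 2 (an assumption for illustration) gives them identical W%, while Pythagenpat with z = .29 does not.

```python
def fixed_exponent_wpct(r, ra, x=2.0):
    """Pythagorean W% with an exponent that ignores the run environment."""
    return r**x / (r**x + ra**x)


def pythagenpat_wpct(r, ra, z=0.29):
    """Pythagenpat W% from runs and runs allowed per game (x = RPG^z)."""
    x = (r + ra) ** z
    return r**x / (r**x + ra**x)


# Same 1.1 run ratio in a low and a high run environment (runs per game)
for r, ra in [(3.3, 3.0), (5.5, 5.0)]:
    print(f"R/G {r}, RA/G {ra}: fixed {fixed_exponent_wpct(r, ra):.4f}, "
          f"Pythagenpat {pythagenpat_wpct(r, ra):.4f}")
```

The fixed-exponent column is identical for both teams, so the ratio carries all of the information; the Pythagenpat column is not, because the exponent grows with RPG.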