Wednesday, January 26, 2022

Pythagenpat Using Run Rates

The widespread implementation of seven-inning games during the 2020 season forced a re-examination of some of the standard sabermetric tools. One example is Pythagorean records. It would be foolish to expect that a run ratio achieved over seven innings would lead to the same expected winning percentage as the same run ratio achieved over nine innings. Thus, it would also be foolish to simply take a team’s composite runs scored and allowed for the season, which came from a mix of seven-inning and nine-inning games unique to that team, and expect the standard Pythagorean approach to produce the best estimate of its overall winning percentage.

The approach that one should select to deal with this issue depends on what the desired final estimate is. If one wanted to estimate how many games a team should have won over the course of that season, one reasonable approach would be to develop a proper Pythagorean or Pythagenpat exponent for seven-inning games, calculate the team’s estimated winning percentage in seven-inning games using that value and in nine-inning games using the standard approach, and then weight the results by the team’s share of seven-inning and nine-inning games (defined by the scheduled length of the game, not the actual number of innings played, so that a seven-inning game that went to extra innings still counts as a seven-inning game).

Tom Tango studied games that were tied entering the third inning to simulate a seven-inning game, and found a Pythagorean exponent of 1.57 was appropriate. Of course that’s a fixed exponent rather than a Pythagenpat exponent, but you could use the same approach to develop an equivalent Pythagenpat formula, and then apply it as described above.
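As a rough sketch of the weighting described above, here is how the blend might look in Python. The function names and the sample totals are made up for illustration, and using x = RPG^.29 for the nine-inning games is my assumption about what "the standard approach" would be; the 1.57 is Tango's fixed exponent. It assumes a team has played at least one game of each length and has nonzero runs in each split.

```python
def pythagorean_wpct(runs, runs_allowed, exponent):
    """Pythagorean W% for a caller-supplied exponent."""
    rs = runs ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def blended_wpct(seven, nine, seven_exponent=1.57, nine_z=0.29):
    """Weight seven- and nine-inning estimates by scheduled games.

    `seven` and `nine` are (runs, runs_allowed, games) tuples, split by the
    scheduled length of the game (hypothetical input layout).
    """
    r7, ra7, g7 = seven
    r9, ra9, g9 = nine
    # Seven-inning games: fixed exponent (1.57 from Tango's study).
    w7 = pythagorean_wpct(r7, ra7, seven_exponent)
    # Nine-inning games: Pythagenpat, exponent x = RPG^z.
    x9 = ((r9 + ra9) / g9) ** nine_z
    w9 = pythagorean_wpct(r9, ra9, x9)
    # Weight each estimate by the share of games of that scheduled length.
    return (g7 * w7 + g9 * w9) / (g7 + g9)


# Invented 2020-style line: 8 seven-inning games and 52 nine-inning games.
estimate = blended_wpct(seven=(35, 28, 8), nine=(240, 230, 52))
```

Swapping the fixed 1.57 for a fitted seven-inning Pythagenpat exponent would only change how `w7` is computed.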

I decided that I was more interested in attempting to estimate what the team’s W% would have been under standard conditions (i.e. nine-inning games as the default, as we normally view a major league season). Thus I was interested in what a team’s W% “should have been” had they played in a normal season. This allowed me to skip the step of dealing with seven-inning games, and instead think about the best way to fit their 2020 data into the standard formulas. Of course, the silly runs scored in extra inning games are a problem, but I chose to ignore them for the sake of expediency (and in hopes that this all would be a temporary problem) and use the team’s runs scored and allowed per nine innings to plug into Pythagenpat.

In thinking about this, I was reminded of a related issue that I have been aware of for a long time, which is the reduced accuracy of Pythagorean estimates (and really all R/RA estimates of W%) as pertains to home and away games. If you look at 2010-2019 major league data and use Pythagenpat with x = RPG^.29, the RMSE of estimated team W% multiplied by 162 is 3.977 (for the sake of convenience I’ll just call this RMSE going forward, but this can be thought of as the standard error per 162 games). If you just look at away games, the RMSE is 6.537, and for home games it is 6.992.
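For anyone who wants to reproduce that error measure, here is a minimal Python sketch of it. The function names and the tuple layout are my own choices for illustration; the only things taken from the text are x = RPG^.29 and the scaling of the W% error by 162. Note that a home-only or away-only sample still gets multiplied by 162, even though each team only played 81 such games.

```python
import math


def pythagenpat_wpct(runs, runs_allowed, games, z=0.29):
    """Pythagenpat W%: Pythagorean with exponent x = RPG^z."""
    rpg = (runs + runs_allowed) / games
    x = rpg ** z
    return runs ** x / (runs ** x + runs_allowed ** x)


def rmse_per_162(teams, z=0.29):
    """RMSE of (actual W% - estimated W%) * 162.

    `teams` is a list of (wins, games, runs, runs_allowed) tuples.
    """
    squared_errors = [
        (162 * (wins / games - pythagenpat_wpct(runs, runs_allowed, games, z))) ** 2
        for wins, games, runs, runs_allowed in teams
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Two invented team-seasons, just to show the call.
sample = [(92, 162, 800, 700), (75, 162, 650, 720)]
error = rmse_per_162(sample)
```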

It should not surprise us that the error is larger, as we have just halved the number of games for each observation, and we should generally expect larger deviations from expectation over small samples. However, it’s not just the magnitude of the error that matters. Over this span, home teams averaged a .535 W% and road teams (of course) the complement of .465. But the Pythagenpat record of home teams was .514, and for road teams .486. One’s first inclination upon seeing this might be to say “Aha! Evidence of home field advantage manifesting itself. Home teams exceed their Pythagenpat W% by .021 due to [insert explanation...strategic advantage of batting in the bottom of the ninth, crowd support allowing them to will runs when needed, etc.]”

One sabermetrician who encountered this phenomenon and developed a more likely (and indeed, obvious upon reflection) explanation for it was Thomas Tress. His article “Bias Against the Home Team in the Pythagorean Theorem” was published in the May 2004 By The Numbers. Tress pointed out that home teams often don’t bat in the bottom of the ninth, which means that they often have fewer opportunities to score runs than they do to allow runs. He offers a scalar multiplier that can be applied to a home team’s runs (and of course also to the road team’s runs allowed) as a correction.

Tress’ approach is a solid one, but it addresses only the home/road Pythagorean conundrum that we entered on a detour, rather than my primary concern about length of game (this is not a criticism as it was not intended to). The issues are related because the home team not batting in the bottom of the ninth is one way in which game lengths vary from the standard nine innings that are inherently assumed in most metrics (or, more precisely, they assume the average number of innings in the data which was used to calibrate them, which we’ll get to in due course).

I should point out that there is another issue that pertains to home teams that also distorts Pythagorean records, which is truncated bottom of the ninths (or tenths, elevenths, etc.). Foregone bottom of the ninths are more obviously troublesome, but truncated bottom of the ninths (in which a walkoff victory is achieved before three outs are recorded) also leave home teams’ run totals lower than they would otherwise be, as run expectancy is left on the table when the game ends. I will not be correcting for that here; it is a lesser problem than foregone bottom of the ninths for the sake of Pythagorean records, and there’s no easy fix (one could add to a home team’s runs scored and an away team’s runs allowed the run expectancy that existed at the end of the game, but this is not a correction that can quickly be made with a conventional dataset). You can avoid this problem by using runs created rather than actual runs, as the potential runs are still reflected in the calculation, but that changes the whole nature of the Pythagorean record by opening up a second dimension of luck (“sequencing” of offensive events rather than simply “timing” of runs).

Ignoring the truncated innings issue, there is an obvious approach that should help address both the home field issue and the question of shortened games, which is using a rate of runs scored and allowed that considers outs/innings rather than raw totals or rates (most commonly runs/game) that don’t take into account outs/innings. Since Pythagenpat is built around runs per game determining the exponent, I will take the approach of using runs/9 innings.

Before jumping into the Pythagenpat implications, two points on this definition:

1. It’s easy to know a team’s defensive innings, as it’s just their innings pitched. For offenses, you can use Plate Appearances – Runs – Left on Base (at least for non-Manfred innings), although it’s easier if you can just get opponents’ innings pitched, or opponents’ putouts, since PO/3 = IP by definition.

2. I am using 9 innings because it is the regulation game length, but it actually corresponds to a slightly longer game than what we actually saw in 2010-2019. For those seasons, the average outs/game was 26.82, which is equivalent to 8.94 innings/game.
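The two points above boil down to a little arithmetic, sketched here in Python. The function names and the sample box score line are invented for illustration; the formulas are just the ones stated in the list.

```python
def offensive_outs(plate_appearances, runs, left_on_base):
    """Batting outs from box score totals: PA - R - LOB.

    Per the caveat above, this only holds when every runner came from a
    plate appearance, so extra innings with an automatic runner break it.
    """
    return plate_appearances - runs - left_on_base


def innings_from_putouts(putouts):
    """PO / 3 = IP by definition (returned as true decimal innings)."""
    return putouts / 3


def runs_per_nine(runs, outs):
    """Runs per 27 outs, i.e. per nine innings."""
    return runs / outs * 27


# A made-up road team that batted in all nine innings: 38 PA, 4 R, 7 LOB.
outs = offensive_outs(38, 4, 7)          # 27 outs
# The 2010-2019 average of 26.82 outs/game, expressed in innings.
avg_innings = innings_from_putouts(26.82)  # about 8.94
```

Working in outs rather than innings also sidesteps the box score convention of writing a third of an inning as ".1".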

I’m using 2010-2019 data for this post, but not because I think ten years (300 team seasons) is the appropriate sample; conditions of the game have not changed in the last century to an extent that should significantly influence Pythagorean records, so a much longer period could be used. The more mundane explanation is that data on actual team outs, home and away, is not easily accessible, and the easiest way I know to get it is through Retrosheet’s Game Logs, which are an absolutely fantastic resource. But I didn’t want to spend a significant amount of time parsing them, which is why I limited my sample to ten years.

My first step was to optimize standard Pythagenpat to work with this dataset, so that any RMSE comparisons we make after building a rate-based Pythagenpat formula are on a level playing field. However, I was quite surprised by what I found - the Pythagenpat exponent that minimizes RMSE for the 2010-2019 majors is .264 (in other words, the Pythagorean exponent x = RPG^.264).

Typically, a value in the range .28 - .29 minimizes RMSE. I was so surprised by .264 that I thought for a moment I might have made an error compiling the data from the game logs, so I checked the Lahman database at the season level to be sure. The data was accurate – this set of 300 teams happens to actually have a lower Pythagenpat exponent than I am conditioned to seeing.

For the purpose of a proof of concept of using rates, this is not really an issue; however, I certainly question whether the best fit values I’ve found for the rate approach should be broadly applied across all league-seasons. I will leave it up to anyone who ultimately decides to implement these concepts to decide whether a larger sample is required to calibrate the exponents.

With that being said, the differences in RMSE using the lower Pythagenpat exponent are not earth-shattering. Using .264, the RMSE for all games is 3.923, with 7.015 for home games and 6.543 for away games, with the home/road RMSEs actually higher than those for the standard exponent. I provide these values for reference only as the real point of this exercise is to look at what happens for a rate-based Pythagenpat.
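The exponent fitting described above can be approximated with a plain grid search. This is a sketch under my own assumptions rather than a record of how the numbers in this post were produced: the function names, the search range, and the .001 step are all arbitrary choices, and the data is expected as (wins, games, runs, runs_allowed) tuples.

```python
import math


def rmse_for_z(teams, z):
    """RMSE per 162 games of Pythagenpat with x = RPG^z."""
    total = 0.0
    for wins, games, runs, runs_allowed in teams:
        x = ((runs + runs_allowed) / games) ** z
        estimate = runs ** x / (runs ** x + runs_allowed ** x)
        total += (162 * (wins / games - estimate)) ** 2
    return math.sqrt(total / len(teams))


def best_z(teams, low=0.200, high=0.350, step=0.001):
    """Return the z on an evenly spaced grid that minimizes RMSE."""
    n_steps = int(round((high - low) / step))
    candidates = [low + i * step for i in range(n_steps + 1)]
    return min(candidates, key=lambda z: rmse_for_z(teams, z))
```

Calling `best_z` on the full set of team-seasons, and then separately on the home and road splits, gives the kind of comparison made in the preceding paragraphs. A proper optimizer would do the same job, but the RMSE curve is smooth enough that a grid is adequate.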

First, let’s define some basic terms:

R/9 = Runs/Actual Outs * 27

RA/9 = Runs Allowed/Innings Pitched * 9

RPG9 = R/9 + RA/9

x = RPG9^z (z will be our Pythagenpat exponent and x the resulting Pythagorean exponent for a given RPG9)

W% = (R/9)^x/((R/9)^x + (RA/9)^x)
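Put together, those definitions might look like this in Python. The function name and the example totals are invented; the formulas follow the definitions above term for term. One assumption to flag: innings pitched must be passed as true decimal innings (outs divided by three), not the ".1/.2" box score notation.

```python
def rate_pythagenpat_wpct(runs, batting_outs, runs_allowed, innings_pitched, z):
    """Rate-based Pythagenpat W%, following the definitions above."""
    r9 = runs / batting_outs * 27             # R/9 = Runs / Actual Outs * 27
    ra9 = runs_allowed / innings_pitched * 9  # RA/9 = RA / IP * 9
    rpg9 = r9 + ra9                           # RPG9 = R/9 + RA/9
    x = rpg9 ** z                             # x = RPG9^z
    return r9 ** x / (r9 ** x + ra9 ** x)     # W%


# Invented home split: the offense made only 2050 outs because it skipped
# many bottom of the ninths, while the defense recorded a full 729 innings.
home_estimate = rate_pythagenpat_wpct(380, 2050, 360, 729.0, z=0.244)
```

In that made-up example the raw run ratio is 380/360, but the rate-based ratio is larger because the offense needed fewer outs to score its runs, which is exactly the effect the rate approach is meant to capture.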

The value of z that minimized RMSE for this dataset is .244. That RMSE is 3.771, which is a significant improvement over the optimized Pythagenpat that does not use rates. This is encouraging, as if there was no advantage to be had this whole exercise would be a waste of time. I also think it’s intuitive that considering rates rather than just raw run totals would allow us to improve our winning percentage estimate. After all, the only differences between raw runs and rates for a team season will arise due to how the team performs in individual games. 

To wit, we can define opportunities to score runs in terms of outs, since outs are the correct denominator for a team-level evaluation of runs scored/allowed on a rate basis. A perfectly average team would expect to have an equal number of opportunities for their offense and defense, but a good team will allow its opponents’ offense more opportunities (since they will forego more bottom of the ninths at home and play more bottom of the ninths on the road), and a bad team will get more opportunities for its own offense. These differences don’t arise randomly, but due to team performance. So we should expect a slight improvement in accuracy of our winning percentage estimate when we allow these corrections, but it should be slight since foregone bottom of the ninths have a ceiling in theory and a lower ceiling in practice (even very bad teams often forego bottom of the ninths and even very good teams frequently lose at home or at least need a walkoff to win).

Better yet, the reductions in RMSE for home games (5.779) and road (5.215) are larger, which we might have expected as the impact of foregone bottom of the ninths will not be as smooth across teams when considering home and road separately. When using this rate approach, the expected W% for all home teams in the dataset is .536, compared to the actual home W% of .535. So there is no evidence of any home field advantage in converting runs/runs allowed to wins that does not get wiped away by taking opportunities to score/allow runs into account, contrary to what one might conclude from a naïve Pythagenpat analysis.

A further note is that if you calculate a team’s total expected wins as a weighted average of their home and road rate Pythagenpats, the RMSE is a little better (3.754) than just looking at the combined rate. This also should not surprise, as we have sneaked in more data about how a team actually distributed its runs scored and allowed across games by slicing the data into two pieces instead of one. If we calculated a Pythagenpat record for every game and then aggregated, we should expect to maximize accuracy, but at that point we are kind of losing the point of a Pythagorean approach (we can make the RMSE zero if in that case we replace Pythagenpat with a rule that if R > RA, we should assign a value of 1 expected win and if R < RA we should assign a value of 0 expected wins).
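The weighted average of home and road estimates mentioned above can be sketched as follows. The dictionary keys and function names are hypothetical, and here both offense and defense are expressed in outs (fielding outs divided by three being innings pitched), which is equivalent to the R/9 and RA/9 definitions given earlier.

```python
def rate_wpct(runs, batting_outs, runs_allowed, fielding_outs, z):
    """Rate-based Pythagenpat W% with both sides measured per 27 outs."""
    r9 = runs / batting_outs * 27
    ra9 = runs_allowed / fielding_outs * 27
    x = (r9 + ra9) ** z
    return r9 ** x / (r9 ** x + ra9 ** x)


def split_expected_wins(home, road, z):
    """Expected wins as games-weighted home and road estimates.

    `home` and `road` are dicts with keys: games, runs, batting_outs,
    runs_allowed, fielding_outs (a made-up layout for this sketch).
    """
    total = 0.0
    for split in (home, road):
        wpct = rate_wpct(
            split["runs"],
            split["batting_outs"],
            split["runs_allowed"],
            split["fielding_outs"],
            z,
        )
        total += split["games"] * wpct
    return total
```

Dividing the result by total games gives the weighted-average W%. Slicing further (by month, by opponent, by game) follows the same pattern, with the caveat already noted that the finer the slices, the less this resembles a Pythagorean estimate at all.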

Again, I would consider this a demonstration of concept rather than a suggestion that this be implemented with a rate Pythagenpat exponent of .244. My hunch is that the best value to use over a broad range of team-seasons is higher than .244. Also, I think that for just looking at total season records, a standard approach is sufficient. If you ever are working with a situation in which you can expect to see significant discrepancies between the number of foregone bottom of the ninths for a team and its opponents (as is definitely the case when considering home and away games separately, and may be the case to a much lesser extent for extremely good or extremely bad teams), then you may want to consider calculating Pythagenpat using run rates rather than raw totals.
