## Tuesday, February 15, 2022

### The End

This will be the final post on this blog - the archives will remain up for as long as Google will allow.

I do not currently have any new content to share, but in the future you can find this blog on Substack at https://walksaber.substack.com/?r=19pmi9

## Wednesday, January 26, 2022

### Pythagenpat Using Run Rates

The widespread implementation of seven-inning games during the 2020 season forced a re-examination of some of the standard sabermetric tools. One example is Pythagorean records. It would be foolish to expect that the same run ratio achieved over nine innings would lead to the same expected winning percentage as if it had been achieved over seven innings. Thus, simply taking a team’s composite runs scored and allowed for the season, which consisted of a mix of seven-inning and nine-inning games unique to that team, and expecting the standard Pythagorean approach to produce the best estimate of their overall winning percentage was also foolish.

The approach that one should select to deal with this issue depends on what the desired final estimate is. If one wanted to estimate how many games a team should have won over the course of that season, one reasonable approach would be to develop a proper Pythagorean or Pythagenpat exponent for seven-inning games, calculate a team’s estimated winning percentage in seven-inning games using that value and in nine-inning games using the standard approach, and then weight the results by the percentage of seven-inning and nine-inning games for the team (defining this in terms of the scheduled length of the game rather than the actual number of innings played, in the case of extra-inning seven-inning games).

Tom Tango studied games that were tied entering the third inning to simulate a seven-inning game, and found that a Pythagorean exponent of 1.57 was appropriate. Of course, that’s a fixed exponent rather than a Pythagenpat exponent, but you could use the same approach to develop an equivalent Pythagenpat formula and then apply it as described above.
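A minimal sketch of that weighting, assuming per-game run averages split by scheduled game length. Since no seven-inning Pythagenpat formula is derived here, it plugs in Tango’s fixed 1.57 exponent for seven-inning games as a stand-in, and uses Pythagenpat with z = .29 for nine-inning games. The function names and the sample team are hypothetical.

```python
def pyth_wpct(r, ra, x):
    """Pythagorean W% for exponent x (r and ra may be totals or per-game rates)."""
    return r**x / (r**x + ra**x)


def blended_wpct(r7, ra7, g7, r9, ra9, g9, x7=1.57, z9=0.29):
    """Weight seven- and nine-inning estimates by games of each *scheduled* length.

    r7, ra7: runs scored/allowed per game in scheduled seven-inning games
    r9, ra9: runs scored/allowed per game in scheduled nine-inning games
    g7, g9:  games of each scheduled length (extra innings don't change the category)
    x7:      fixed seven-inning exponent (Tango's 1.57, used here as an assumption)
    z9:      Pythagenpat exponent for nine-inning games
    """
    w7 = pyth_wpct(r7, ra7, x7)
    x9 = (r9 + ra9) ** z9  # Pythagenpat: x = RPG^z
    w9 = pyth_wpct(r9, ra9, x9)
    return (w7 * g7 + w9 * g9) / (g7 + g9)


# Hypothetical 60-game team: 16 scheduled seven-inning games, 44 nine-inning games
print(round(blended_wpct(3.9, 3.4, 16, 5.1, 4.6, 44), 3))
```

Swapping in a seven-inning Pythagenpat formula would only change how `w7` is computed; the weighting step stays the same.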

I decided that I was more interested in attempting to estimate what the team’s W% would have been under standard conditions (i.e. nine-inning games as the default, as we normally view a major league season). Thus I was interested in what a team’s W% “should have been” had they played a normal season. This allowed me to skip the step of dealing with seven-inning games and instead think about the best way to fit their 2020 data into the standard formulas. Of course, the silly runs scored in extra-inning games are a problem, but I chose to ignore them for the sake of expediency (and in the hope that this would all be a temporary problem) and to plug the team’s runs scored and allowed per nine innings into Pythagenpat.

In thinking about this, I was reminded of a related issue that I have been aware of for a long time, which is the reduced accuracy of Pythagorean estimates (and really all R/RA estimates of W%) for home and away games considered separately. If you look at 2010-2019 major league data and use Pythagenpat with x = RPG^.29, the RMSE of estimated team W% multiplied by 162 is 3.977 (for convenience I’ll just call this RMSE going forward, but it can be thought of as the standard error per 162 games). If you just look at away games, the RMSE is 6.537, and for home games it is 6.992.
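The Pythagenpat estimate and the RMSE-per-162 measure can be sketched as follows. This assumes each observation (a team season, or a team’s home or road games) is a hypothetical tuple of runs, runs allowed, games, and wins; following the text, the W% error is multiplied by 162 even when a split covers only about 81 games.

```python
import math


def pythagenpat_wpct(r, ra, g, z=0.29):
    """Pythagenpat W%: exponent x = RPG^z, where RPG = (R + RA) / G."""
    x = ((r + ra) / g) ** z
    return r**x / (r**x + ra**x)


def rmse_per_162(observations, z=0.29):
    """RMSE of (actual W% - estimated W%) * 162 over (R, RA, G, W) tuples."""
    errors = [(w / g - pythagenpat_wpct(r, ra, g, z)) * 162 for r, ra, g, w in observations]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Two hypothetical teams with identical runs and runs allowed, one 10 wins
# above its estimate and one 10 below: the RMSE per 162 is 10.
print(rmse_per_162([(700, 700, 162, 91), (700, 700, 162, 71)]))
```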

It should not surprise us that the error is larger, as we have just halved the number of games for each observation, and we should generally expect larger deviations from expectation over small samples. However, it’s not just the magnitude of the error that matters. Over this span, home teams averaged a .535 W% and road teams (of course) the complement of .465. But the Pythagenpat record of home teams was .514, and for road teams .486. One’s first inclination upon seeing this might be to say “Aha! Evidence of home field advantage manifesting itself. Home teams exceed their Pythagenpat record by .021 wins due to [insert explanation...strategic advantage of batting in the bottom of the ninth, crowd support allowing them to will runs when needed, etc.]”

One sabermetrician who encountered this phenomenon and developed a more likely (and indeed, obvious upon reflection) explanation for it was Thomas Tress. His article “Bias Against the Home Team in the Pythagorean Theorem” was published in the May 2004 By The Numbers. Tress pointed out that home teams often don’t bat in the bottom of the ninth, which means that they often have fewer opportunities to score runs than to allow runs. Tress offers a correction in the form of a scalar multiplier that can be applied to a home team’s runs (and of course also to the road team’s runs allowed).

Tress’ approach is a solid one, but it addresses only the home/road Pythagorean conundrum that we entered on a detour, rather than my primary concern about game length (this is not a criticism, as it was not intended to address that). The issues are related because the home team not batting in the bottom of the ninth is one way in which game lengths vary from the standard nine innings that are inherently assumed in most metrics (or, more precisely, they assume the average number of innings in the data which was used to calibrate them, which we’ll get to in due course).

I should point out that there is another issue pertaining to home teams that also distorts Pythagorean records, which is truncated bottom of the ninths (or tenths, elevenths, etc.). Foregone bottom of the ninths are more obviously troublesome, but truncated bottom of the ninths (in which a walkoff victory is achieved before three outs are recorded) also leave home teams’ run totals lower than they would otherwise be, as run expectancy is left on the table when the game ends. I will not be correcting for that here; it is a lesser problem than foregone bottom of the ninths for the sake of Pythagorean records, and there’s no easy fix (one could add to a home team’s runs scored and an away team’s runs allowed the run expectancy that existed at the end of the game, but this is not a correction that can quickly be made with a conventional dataset). You can avoid this problem by using runs created rather than actual runs, as the potential runs are still reflected in the calculation, but that changes the whole nature of the Pythagorean record by opening up a second dimension of luck (“sequencing” of offensive events rather than simply “timing” of runs).

Ignoring the truncated innings issue, there is an obvious approach that should help address both the home field issue and the question of shortened games: using rates of runs scored and allowed that consider outs/innings, rather than raw totals or rates that don’t take outs/innings into account (most commonly runs/game). Since Pythagenpat uses runs per game to determine the exponent, I will use runs/9 innings.

Before jumping into the Pythagenpat implications, two points on this definition:

1. It’s easy to know a team’s defensive innings, as it’s just their innings pitched. For offenses, you can use Plate Appearances – Runs – Left on Base (at least for non-Manfred innings), although it’s easier if you can just get opponents’ innings pitched, or opponents’ putouts, since PO/3 = IP by definition.

2. I am using 9 innings because it is the regulation game length, but it actually corresponds to a slightly longer game than what we actually saw in 2010-2019. For those seasons, the average outs/game was 26.82, which is equivalent to 8.94 innings/game.
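A small sketch of these two definitions, with hypothetical function names. It assumes innings are true decimals (so 1450 and two-thirds innings is 1450.667, not the box-score notation 1450.2), and that the PA - R - LOB identity is applied only to innings without a placed runner.

```python
def offensive_outs(pa, runs, lob):
    """Outs made by an offense: each PA ends with the batter scoring, being stranded, or being put out."""
    return pa - runs - lob


def innings(outs):
    """Innings as a true decimal (opponents' PO/3 gives the same figure)."""
    return outs / 3


# A hypothetical nine-inning game: 38 PA, 4 runs, 7 left on base
outs = offensive_outs(38, 4, 7)
print(outs, innings(outs))        # 27 9.0
print(round(innings(26.82), 2))   # 2010-2019 average outs/game -> 8.94 innings
```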

I’m using 2010-2019 data for this post not because I think ten years (300 team seasons) is the appropriate sample; conditions of the game have not changed in the last century to an extent that should significantly influence Pythagorean records, so a much longer sample would be defensible. The more mundane explanation is that data on actual team outs, home and away, is not easily accessible, and the easiest way I know to get it is through Retrosheet’s Game Logs, which are an absolutely fantastic resource. But I didn’t want to spend a significant amount of time parsing them, which is why I limited my sample to ten years.

My first step was to optimize standard Pythagenpat to work with this dataset, so that any RMSE comparisons we make after building a rate-based Pythagenpat formula are on a level playing field. However, I was quite surprised by what I found - the Pythagenpat exponent that minimizes RMSE for the 2010-2019 majors is .264 (in other words, the Pythagorean exponent x = RPG^.264).
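The post doesn’t say how the optimum was found; one simple way is a grid search over z, sketched here under that assumption. The check at the bottom builds hypothetical teams whose W% is exactly Pythagenpat with z = .27 and confirms that the search recovers it. With real data, you would pass actual (R, RA, G, W) tuples instead.

```python
import math


def pythagenpat_wpct(r, ra, g, z):
    x = ((r + ra) / g) ** z
    return r**x / (r**x + ra**x)


def rmse_per_162(teams, z):
    errors = [(w / g - pythagenpat_wpct(r, ra, g, z)) * 162 for r, ra, g, w in teams]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def best_z(teams, lo=0.20, hi=0.35, step=0.001):
    """Return the grid value of z in [lo, hi] that minimizes RMSE per 162."""
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    return min(grid, key=lambda z: rmse_per_162(teams, z))


# Synthetic, hypothetical teams whose W% follows Pythagenpat with z = .27 exactly
teams = []
for r, ra in [(850, 650), (600, 780), (720, 700), (900, 820), (640, 610)]:
    w = 162 * pythagenpat_wpct(r, ra, 162, 0.27)
    teams.append((r, ra, 162, w))
print(round(best_z(teams), 3))  # 0.27
```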

Typically, a value in the range .28 - .29 minimizes RMSE. I was so surprised by .264 that I thought for a moment I might have made an error compiling the data from the game logs, so I checked the Lahman database at the season level to be sure. The data was accurate – this set of 300 teams happens to have a lower Pythagenpat exponent than I am conditioned to seeing.

For the purpose of a proof of concept of using rates, this is not really an issue; however, I certainly question whether the best fit values I’ve found for the rate approach should be broadly applied across all league-seasons. I will leave it up to anyone who ultimately decides to implement these concepts to decide whether a larger sample is required to calibrate the exponents.

With that being said, the differences in RMSE using the lower Pythagenpat exponent are not earth-shattering. Using .264, the RMSE for all games is 3.923, with 7.015 for home games and 6.543 for away games, with the home/road RMSEs actually higher than those for the standard exponent. I provide these values for reference only as the real point of this exercise is to look at what happens for a rate-based Pythagenpat.

First, let’s define some basic terms:

R/9 = Runs/Actual Outs * 27

RA/9 = Runs Allowed/Innings Pitched * 9

RPG9 = R/9 + RA/9

x = RPG9^z (z will be our Pythagenpat exponent and x the resulting Pythagorean exponent for a given RPG9)

W% = (R/9)^x/((R/9)^x + (RA/9)^x)
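These definitions translate directly into a short function, sketched here with the best-fit z = .244 reported below as the default. The inputs are hypothetical, and innings pitched must be a true decimal (a box-score “1450.2” would first need converting to 1450 + 2/3).

```python
def rate_pythagenpat(runs, outs_made, runs_allowed, innings_pitched, z=0.244):
    """Pythagenpat on runs per nine innings rather than raw run totals."""
    r9 = runs / outs_made * 27                # R/9
    ra9 = runs_allowed / innings_pitched * 9  # RA/9
    x = (r9 + ra9) ** z                       # x = RPG9^z
    return r9**x / (r9**x + ra9**x)


# Hypothetical team with equal runs scored and allowed, but whose offense made
# fewer outs (4200) than its defense recorded (1450 IP = 4350 outs), as happens
# when bottom of the ninths are foregone: the estimate rises above .500.
print(round(rate_pythagenpat(700, 4200, 700, 1450), 3))
```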

The value of z that minimized RMSE for this dataset is .244. That RMSE is 3.771, which is a significant improvement over the optimized Pythagenpat that does not use rates. This is encouraging, as if there were no advantage to be had, this whole exercise would be a waste of time. I also think it’s intuitive that considering rates rather than just raw run totals would allow us to improve our winning percentage estimate. After all, the only differences between raw runs and rates for a team season will arise due to how the team performs in individual games.

To wit, we can define opportunities to score runs in terms of outs, since outs are the correct denominator for a team-level evaluation of runs scored/allowed on a rate basis. A perfectly average team would expect to have an equal number of opportunities for their offense and defense, but a good team will allow its opponents’ offense more opportunities (since they will forego more bottom of the ninths at home and play more bottom of the ninths on the road), and a bad team will get more opportunities for its own offense. These differences don’t arise randomly, but due to team performance. So we should expect an improvement in the accuracy of our winning percentage estimate when we allow for these corrections, but it should be slight, since foregone bottom of the ninths have a ceiling in theory and a lower ceiling in practice (even very bad teams often forego bottom of the ninths, and even very good teams frequently lose at home or at least need a walkoff to win).

Better yet, the reductions in RMSE for home games (5.779) and road games (5.215) are larger, which we might have expected as the impact of foregone bottom of the ninths will not be as smooth across teams when considering home and road separately. When using this rate approach, the expected W% for all home teams in the dataset is .536, compared to the actual home W% of .535. So there is no evidence of any home field advantage in converting runs/runs allowed to wins that does not get wiped away by taking opportunities to score/allow runs into account, contrary to what one might conclude from a naïve Pythagenpat analysis.

A further note is that if you calculate a team’s total expected wins as a weighted average of their home and road rate Pythagenpats, the RMSE is a little better (3.754) than just looking at the combined rate. This also should not surprise, as we have sneaked in more data about how a team actually distributed its runs scored and allowed across games by slicing the data into two pieces instead of one. If we calculated a Pythagenpat record for every game and then aggregated, we should expect to maximize accuracy, but at that point we are kind of losing the point of a Pythagorean approach (we can make the RMSE zero if in that case we replace Pythagenpat with a rule that if R > RA, we should assign a value of 1 expected win and if R < RA we should assign a value of 0 expected wins).
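A sketch of that home/road weighting, assuming each split carries its own runs, outs made, runs allowed, innings pitched, and games. The `Split` container and the sample numbers are hypothetical conveniences, not anything from the source data.

```python
from dataclasses import dataclass


@dataclass
class Split:
    runs: float
    outs_made: float
    runs_allowed: float
    innings_pitched: float  # true decimal innings
    games: int


def rate_pythagenpat(s: Split, z: float = 0.244) -> float:
    r9 = s.runs / s.outs_made * 27
    ra9 = s.runs_allowed / s.innings_pitched * 9
    x = (r9 + ra9) ** z
    return r9**x / (r9**x + ra9**x)


def expected_wins(home: Split, road: Split, z: float = 0.244) -> float:
    """Each split's rate-Pythagenpat W% times its games, summed (a games-weighted average)."""
    return sum(rate_pythagenpat(s, z) * s.games for s in (home, road))


home = Split(runs=380, outs_made=2080, runs_allowed=340, innings_pitched=715, games=81)
road = Split(runs=350, outs_made=2180, runs_allowed=370, innings_pitched=705, games=81)
print(round(expected_wins(home, road), 1))
```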

Again, I would consider this a demonstration of concept rather than a suggestion that this be implemented with a rate Pythagenpat exponent of .244. My hunch is that the best value to use over a broad range of team-seasons is higher than .244. Also, I think that for just looking at total season records, a standard approach is sufficient. If you are ever working with a situation in which you can expect to see significant discrepancies between the number of foregone bottom of the ninths for a team and its opponents (as is definitely the case when considering home and away games separately, and may be the case to a much lesser extent for extremely good or extremely bad teams), then you may want to consider calculating Pythagenpat using run rates rather than raw totals.

## Wednesday, January 05, 2022

### Rate Stat Series, pt. 16: Summary

This series spans fifteen posts, over thirty tables, and over 25,000 words. I don’t really expect anyone to slog through all that. So here I want to express the key points of the series as succinctly and with as little math as possible. In doing so, it will become apparent that I haven’t broken any new ground in this series, which is even more reason not to slog through the rest.

1. The proper denominator for a rate stat (where “rate stat” is defined as a measure of overall offensive productivity expressed in units of runs or wins, rather than the rate of any given event or subset of events) for a team is outs. This is obviously true if you take a moment to examine it, and it is one of the fundamental insights of sabermetrics. Because a pitcher functions as his own team while he is in the game, outs are also the proper denominator for any overall pitching rate stat.

2. The number of plate appearances any team gets is a function of their rate of making outs (if we ignore enough statistical categories, this boils down to their On Base Average). On the team level, plate appearances are an inappropriate rate stat denominator as it is illogical to penalize a team for avoiding outs more effectively than another.

3. At the individual batter level, neither outs nor plate appearances are a satisfactory denominator if an estimate of absolute runs created is used as the numerator of the rate stat. Beyond their primary contributions to their team through their direct actions at the plate and on the bases, batters make a secondary contribution by avoiding outs, thus generating additional plate appearances for their teammates. But individual batters don’t operate in a vacuum. An individual contributes to his team’s plate appearance total, but doesn’t individually define it, as he only makes up one-ninth of the lineup. Using outs as a denominator treats an individual as if he alone defines his team. Using plate appearances, on the other hand, does not value the secondary contribution that a batter makes by generating additional opportunities for his teammates, absent some adjustment.

4. There are three frameworks through which we can evaluate an individual’s offense. The first, which I do not advocate at all, is to treat the player as a team, plugging the individual’s stats into a dynamic run estimator like Runs Created or Base Runs. The second is to use linear weights to evaluate either absolute runs created (as, for example, Estimated Runs Produced or Extrapolated Runs do) or runs above average (à la Pete Palmer’s Batting Runs). The third is to construct a theoretical team, using a dynamic run estimator to estimate the runs created by a hypothetical team that consists of the batter in question plus eight other (typically league average) players.

5. The selection of approach to run estimation should not be divorced from the choice of rate stat. The assumptions inherent in each of the approaches to run estimation suggest similar, consistently reasoned assumptions that would make sense to use in developing a rate stat. While it is possible and justifiable to mix certain elements across frameworks, my point of view is that it makes more sense to keep the “frameworks” pure, and utilize the rate stat that makes the most sense to pair with the chosen run estimator.

6. Using linear weights runs above average (RAA) rather than absolute linear weights runs created as the numerator does enable the use of plate appearances as the denominator, because the RAA estimate already incorporates the batter’s secondary contribution. However, RAA/PA may not be everyone’s ideal choice for a rate stat, because…

7. Some rates can be compared (while maintaining meaningful units) differentially (i.e. subtracting the values for two players makes sense); others are ratio comparable (i.e. dividing the values for two players makes sense); some are neither differentially nor ratio comparable, and some are both. I prefer metrics that can be compared either way, but RAA/PA is only differentially comparable. FanHome poster Sibelius developed an adjustment called R+/PA that, depending on how you look at it, either adds the league average R/PA to RAA/PA or makes an adjustment to absolute runs created before dividing by PA, and that allows ratio comparisons for the rate stat.

8. wOBA, which is now in wide use thanks to its popularization by Tom Tango and Fangraphs, is a variant of the RAA/PA family as well, although it doesn’t maintain direct differential or ratio comparability.

9. Despite the issues with R/O as a rate stat for an individual, using it to calculate RAA will produce the same result for the RAA total as R+/PA, assuming that the inputs are consistently defined. R/O causes very minor distortion when used to compare normal players, and would cause more significant distortion with extreme players, but it remains a useful shortcut rate stat. There are many worse choices one could make in devising an individual rate stat than using R/O. R/O remains the correct rate stat for a team; the RAA/PA family of metrics is inappropriate for the same reason R/PA is inappropriate for a team, in addition to some issues that would arise if attempting to define terms like “R+” for a team, as a team’s actual runs scored or estimated runs created are already based on the number of plate appearances that it actually generated.

10. One can argue that batters also make tertiary contributions to their team through their impact on the run values of all of their teammates’ actions. The impact is very small for most hitters, dwarfed by their primary and secondary contributions, and if attempting to quantify it one must be careful to ensure that it’s not just measurement error. Attempting to capture these impacts lends itself to use of a theoretical team approach, which uses a dynamic run estimator to model a batter’s impact on a team.

11. The theoretical team approach gives rise to a rate stat that David Smyth called R+/O+, which is expressed on an R/O scale but produces the same RAA given the same inputs. It can be applied to the linear weights framework as well, and offers an option if one prefers to express results on the R/O scale rather than R/PA, and thus have the same scale for the individual and team rate stat.

12. If you wish to compare rates across run environments, differentials between the individual and the league usually aren’t sufficient as higher run environments make equal differences less valuable in terms of wins. If you assume a fixed Pythagorean exponent for your win conversion, the case can be made that ratios do capture the win difference, but as soon as you introduce a run environment-dependent Pythagorean exponent that better models reality, this assumption fails. It is also necessary to consider that simply comparing the individual to the league average may not properly capture the dynamic of how the individual’s run contribution contributes to his team’s wins. There is also a potential complication from how differences in league PA/G impact rates denominated in PA. All of this is to say that there is no simple solution to converting run rate stats to their win-equivalents, and care should be taken in doing so, especially considering that the impact may be relatively small for many cases.
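The RAA/PA and R+/PA relationships from points 6 and 7 can be sketched as below; the function names and sample values are hypothetical. The sketch shows why R+/PA supports ratio comparison: adding the league R/PA puts every hitter back on an absolute runs-per-PA scale, so a league-average hitter (0 RAA) lands at exactly 100 on a relative index.

```python
def raa_per_pa(raa, pa):
    return raa / pa


def r_plus_per_pa(raa, pa, lg_r_per_pa):
    """R+/PA, viewed as league R/PA added to RAA/PA."""
    return raa / pa + lg_r_per_pa


def relative(rate, lg_rate):
    """Index to league average = 100."""
    return 100 * (rate / lg_rate)


lg = 0.12                          # hypothetical league runs per PA
great = r_plus_per_pa(60, 650, lg)  # hypothetical hitter, 60 RAA in 650 PA
good = r_plus_per_pa(30, 650, lg)   # hypothetical hitter, 30 RAA in 650 PA
print(round(raa_per_pa(60, 650) - raa_per_pa(30, 650), 4))  # 0.0462 (differential comparison)
print(round(great / good, 3))                                # 1.278 (ratio comparison, R+/PA only)
print(relative(r_plus_per_pa(0, 600, lg), lg))               # 100.0 for a league-average hitter
```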

## Wednesday, December 08, 2021

### Rate Stat Series, pt. 15: Mixing Frameworks

Thus far I have employed what I’ve described as a “puritanical” approach to matching run estimator, denominator, and win conversion for each of the three frameworks for evaluating an individual’s offensive performance. While I think my logic for this approach is sound, I do not think it is necessarily wrong to mix components in a different manner than I’ve described. This will be a brief discussion of which of these potential hybrids make more sense than others, and a few issues to keep in mind if you choose to do so.

For the player as a team framework, there are many places in the process in which a batter’s value (at least to the extent that we can define and model it) as a member of a real team is distorted. At the very beginning of the process, a dynamic run estimator is applied directly to individual statistics. This creates distortion. Then the rate stat is runs/out; this doesn’t create a tremendous amount of distortion, as runs/out is a defensible if not perfect choice for an individual rate stat even if you don’t use the player as a team framework. Then if we convert to win value, we create a tremendous amount of distortion by essentially multiplying the run distortion by nine – instead of just mis-estimating an individual’s run contribution, we compound the problem by assuming that the entire team hitting like him would shape the run environment to be something radically different from the league average.

This is an opportune juncture to make a point that I should have made earlier in the series – for league average hitters, the distortion will be very small, and the differences across all of the various methodologies we’ve discussed will be smaller. I have focused in this series on a small group of hitters, basically the five most productive and five least productive in each league-season. These are of course the hitters most impacted by different methodological decisions. A league average hitter would by definition have Base Runs = LW_RC = TT_BsR (at least as we’ve linked these formulas in this series), would by definition have 0 RAA or 0 WAA, would by definition have a 100 relative rate, regardless of which approach you take. This series, and more generally my sabermetric interests, tend to focus on the extreme cases, and evaluating methods with an eye to applicability to a wide range of performance levels. Additionally, extreme players are the ones people in general tend to care the most about – you can imagine people debating who had the better offensive season, Frank Robinson in 1966 or Frank Thomas in 1994. I cannot imagine many people doing the same for Earl Battey and Gary Gaetti.

We could avoid compounding the issues in the player as a team approach, but why bother? The framework is inferior to linear weights or theoretical team in every possible way, except that one could argue ease of calculation gives it an advantage over TT. The only value to be had in evaluating a player as a team is as a theoretical exercise, and if you lose commitment to that theoretical exercise, it ceases to have any value at all.

For the theoretical team framework, you could just calculate TT_BsR, and then treat it in the same way as linear weights, or calculate TT_BsRP and treat it in the same way as R+. However, it would be pretty silly to go through the additional effort needed to calculate the TT run estimates instead of their linear weight analogues only to use them in the same way. Unlike in the case of player as team, there is no argument to be made here that you would be making the results more reasonable by doing so, as the TT estimates can be combined with team-appropriate rate stat denominators and team-appropriate conversions from runs to wins. Here the result of mixing frameworks would be extra work coupled with less pure results.

The only mixture of frameworks that makes sense, then, is to mix the run estimation components of the linear weights framework with the rate stat and win conversion components of the theoretical team framework. A logical path that might defend this approach would be: It is questionable whether the tertiary offensive contributions credited by the theoretical team approach are accurate or material. Thus our best estimate of a player’s run contribution to a theoretical team remains his linear weights RC or RAA. When it comes to win estimation, we are on much firmer ground in understanding how team runs and runs allowed translate to wins than we are in measuring individual contributions to team runs. We shouldn’t refuse to use this knowledge in the name of methodological consistency, but rather we should use the best possible estimates for each component of the framework. That means using the full Pythagenpat approach coupled with linear weights run estimates.

Convinced? I’m not, but let me walk through an example of how we could apply this hybrid approach to the Franks. We can start with our wRC estimate, which we will now use in place of TT_BsRP as “R_+” going forward for this hybrid linear-theoretical team framework. Then we can use any of our TT rate stats; I’ll show R+/O+ rather than R+/PA+ here, as I think it’s the former that might serve to make this hybrid framework an attractive option. R+/O+ allows us to express the individual and team rates on the same basis and, better yet, to do so using the most fundamental of rate stats (R/O) as that basis.

O+ remains equal to PA*(1 – LgOBA), and these relative R+/O+ figures are the same as our relative R+/PA, which makes sense – the numerators are the same, and the only difference in the denominators is multiplying PA by (1 – LgOBA). So an alternative way of expressing the relationship is:

R+/O+ = (R+/PA)/(1 – LgOBA)

I will skip some steps here, since they were all covered in the last installment – no need to convert this to a W% and then convert back to a relative adjusted R+/O+.

TT_R/O = (R+/O+)*(1/9) + LgR/O*(8/9)

TT_x= ((TT_R/O)*LgO/G + LgR/G)^.29

TT_WR = ((TT_R/O)/Lg(R/O))^TT_x

RelAdj R+/O+ = (TT_WR^(1/r) - 1)*9 + 1
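The chain above can be sketched in Python as follows. Assumptions: R+ is the batter’s wRC used as R_+; LgO/G and LgR/G are the league’s outs and runs per game; z = .29 as in TT_x; and r is the reference environment’s Pythagorean exponent from earlier installments, passed in by the caller (here computed hypothetically as RefRPG^.29). All numbers in the usage lines are hypothetical.

```python
def rel_adj_r_plus_per_o_plus(r_plus, pa, lg_oba, lg_r_per_o, lg_o_per_g, lg_r_per_g, r, z=0.29):
    """Relative adjusted R+/O+ via the theoretical team (TT) win ratio (1.0 = average)."""
    o_plus = pa * (1 - lg_oba)                          # O+ = PA*(1 - LgOBA)
    rate = r_plus / o_plus                              # R+/O+ = (R+/PA)/(1 - LgOBA)
    tt_r_per_o = rate / 9 + lg_r_per_o * 8 / 9          # batter plus eight league-average hitters
    tt_x = (tt_r_per_o * lg_o_per_g + lg_r_per_g) ** z  # TT R/G plus league R/G, to the .29
    tt_wr = (tt_r_per_o / lg_r_per_o) ** tt_x           # TT win ratio
    return (tt_wr ** (1 / r) - 1) * 9 + 1               # restated as a relative rate for exponent r


# Hypothetical league and batter inputs
lg_oba, lg_r_per_pa, lg_o_per_g = 0.325, 0.118, 26.8
lg_r_per_o = lg_r_per_pa / (1 - lg_oba)
lg_r_per_g = lg_r_per_o * lg_o_per_g
ref_r = 9.0 ** 0.29  # hypothetical reference exponent: RPG of 9.0 with z = .29
print(round(rel_adj_r_plus_per_o_plus(130, 650, lg_oba, lg_r_per_o, lg_o_per_g, lg_r_per_g, ref_r), 3))
```

Because the win ratio is computed before restating it in the reference environment, the theoretical team’s W% is what gets preserved, which is the point made in the paragraphs that follow.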

It might be helpful to take a step back and look at our results for the relative adjusted metric (whether R+/PA or R+/O+) for each of the four options we’ve considered, which are:

A: linear run estimate and fixed RPG based on league average (final rate based on R+/PA)

B: linear run estimate and dynamic RPG based on player’s impact on team (final rate based on R+/PA)

C: linear run estimate and Pythagenpat theoretical team win estimate (hybrid approach discussed in this installment; final rate based on R+/O+)

D: theoretical team run estimate (full theoretical team approach; final rate based on R+/O+)

I included a third row which shows the percentage by which Thomas’ figure exceeded Robinson’s. The TT (D) approach maximizes each player’s value and also the difference between them. The hybrid approach (C) falls in between the pure linear approach (A) and the TT approach (D). One thing that I did not fully expect to see is that the linear approach that varies RPW based on the estimated RPG of a team with the player added (B) produces a lower estimate than any of the other approaches, and is the outlier of the bunch. I didn’t make enough of this in part 13, but we are actually better off assuming that the individual has no impact on RPW than adjusting RPW based on his own impact on the theoretical team’s RPG.

Note that when we translate to a reference league, we are fixing each theoretical team's RPG to the same level. In reality, the Frank Thomas TT will have a higher RPG than the Matt Walbeck TT for any given reference environment. However, this is not an issue, because the runs value we're reporting is not real. It is intended to be an equivalent run value that reflects the player's win contribution for a common frame of reference. When we use the full Pythagenpat approach, the theoretical team's winning percentage is preserved, so the runs are now an abstract quantity and do not need to tie out to any actual number of runs in this environment. In this sense, when we convert TT to a win-equivalent rate, we're doing something similar to what I said the linear weights framework was doing - we're making the run environments equal after adding the player. The difference is that in this case we are capturing the disparate impact of the players on team wins first, then restating a win ratio as an equivalent run ratio given the assumption of a stable run environment. Thus the runs are an abstraction, but the value they represent is preserved.

When we do the same with a dynamic RPW approach (i.e. when we use the Palmerian approach of adding the batter's RAA/G to the RPG and then calculating RPW), we run into difficulties because while we have fixed a WAA total, and can then translate that WAA to an equivalent for the reference environment, we have not taken into account that the batter would need to contribute more runs to actually produce the same number of wins. This is not a problem for the full Pythagenpat approach because we used a run ratio that modified the batter's abstract runs in concert with the run environment.

Now there is a way we could address this, but it results in a mess of an equation that I'm not sure has an algebraic solution (whether it does or not, I have no interest in trying to solve it). Basically, we can solve for the RAA that would be needed to preserve the batter's original WAA in the reference environment. To do this, we need to remember to apply the reference PA adjustment. For this application, I think it’s easiest to apply it directly to the batter’s original WAA, so adjWAA = WAA*Ref(PA/G)/Lg(PA/G).

We could then say that adjWAA = X/(2*(refRPG + X/G)^.71), where X = the batter's RAA and G could be the batter's actual games or some kind of PA-equivalent games, as long as we’re consistent with what was used in the original WAA calculation. I used the Goal Seek function in Excel to solve the following equations for the Franks:

for Robinson: 8.552 = X/(2*(8.83 + X/155)^.71), X = 85.555

for Thomas: 7.217 = X/(2*(8.83 + X/113)^.71), X = 71.176

With these new estimates of RAA, we can get relative adjusted R+/PA by first dividing by each batter’s PA, then adding in the reference R/PA, and then dividing by the reference R/PA.
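The Goal Seek step can be replaced by a simple bisection, since adjWAA rises monotonically with X over any range where refRPG + X/G stays positive. This is a sketch with hypothetical function names; the target passed in is assumed to already include the Ref(PA/G)/Lg(PA/G) adjustment, and the last function applies the conversion to relative adjusted R+/PA just described.

```python
def adj_waa(raa, ref_rpg, games):
    """adjWAA = X / (2 * (refRPG + X/G)^.71), i.e. RAA divided by RPW = 2*RPG^.71."""
    return raa / (2 * (ref_rpg + raa / games) ** 0.71)


def solve_raa(target_waa, ref_rpg, games, tol=1e-9):
    """Find the RAA X for which adj_waa(X) equals target_waa, by bisection."""
    lo = -0.9 * ref_rpg * games  # keeps refRPG + X/G positive
    hi = ref_rpg * games         # generous upper bound
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if adj_waa(mid, ref_rpg, games) < target_waa:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def rel_adj_r_plus_per_pa(raa, pa, ref_r_per_pa):
    """(RAA/PA + Ref R/PA) / Ref R/PA."""
    return (raa / pa + ref_r_per_pa) / ref_r_per_pa


# Thomas' equation from above: 7.217 = X/(2*(8.83 + X/113)^.71)
print(round(solve_raa(7.217, 8.83, 113), 1))  # 71.2
```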

This approach allows us to more correctly calculate a win-equivalent rate stat for the linear weights framework when allowing RPW to vary based on the batter’s impact on his team’s run environment. However, given that I don't know how to solve for X algebraically, and given my pre-existing philosophical issues with using this approach within the linear weights framework to begin with, I think that at this point there are two better choices:

1. If you want to maintain methodological purity, stick with the fixed RPW for the linear weights framework, as I advocate

2. Use the hybrid framework, which embraces the "real" run to win conversion and actually makes this math easier

With "B" out of the picture, what we see for A, C, and D makes sense in relation to each other. C produces a lower final relativity for great hitters like the Franks because it recognizes that when they inflate their team's run environments, RPW increases. D is higher than both because the initial run estimate is higher, due to using TT_BsR rather than LW, and thus giving each batter credit for their estimated tertiary contributions. And our revised linear weight approach falls somewhere in between.

Of course, it’s possible that I’m wrong about this, and the hybrid or theoretical team approaches that make use of Pythagenpat are overstating the impact of the player on the runs to win conversion. I can’t prove that I am right about this, but I would offer two rejoinders:

1. The full Pythagenpat approach produces the same W% for the theoretical team across different environments by definition, and that is the most important number at the end of the line if we are trying to build a win-equivalent rate stat.

2. We should expect the results from the linear and the Pythagenpat approaches to be similar (which they are after making proper adjustments), as our RPW formula is consistent with Pythagenpat for .500 teams. While the relationship between the RPW formula and Pythagenpat will fray as we insert more extreme teams, the theoretical team approach doesn’t produce any extremes for even great hitters like the Franks. For example, if we use the Palmer approach, Thomas’ 77.8 RAA in 113 games takes an average 1994 AL team that scores 5.23 R/G up to 5.92 R/G, which would rank second in the league, even with the Yankees. This is not an extreme team performance when it comes to applying the win estimation formulas: such a team would be expected to have a .5623 W% using the RPW formula and a .5620 W% using Pythagenpat. So we shouldn’t expect the results to diverge much when applying the two approaches to derive win-equivalent rate stats.
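The arithmetic in the second rejoinder is easy to check. The sketch below uses illustrative function names and assumes that RPG in the RPW formula means runs scored plus runs allowed per game, and that the RPW-based W% is .500 plus run differential per game divided by RPW. With the 1994 AL figures from the text, both estimates should come out around .562, within a few ten-thousandths of the values quoted.

```python
def rpw_wpct(team_rpg, opp_rpg):
    """W% from the RPW formula RPW = 2 * RPG^.71 (RPG = runs scored + allowed per game)."""
    rpw = 2 * (team_rpg + opp_rpg) ** 0.71
    return 0.5 + (team_rpg - opp_rpg) / rpw


def pythagenpat_wpct(team_rpg, opp_rpg, z=0.29):
    """W% from Pythagenpat, with exponent x = RPG^z."""
    x = (team_rpg + opp_rpg) ** z
    return team_rpg ** x / (team_rpg ** x + opp_rpg ** x)


lg_rpg = 5.23                  # 1994 AL runs per game (from the text)
team = lg_rpg + 77.8 / 113     # add Thomas' RAA per game: about 5.92 R/G
print(round(rpw_wpct(team, lg_rpg), 4), round(pythagenpat_wpct(team, lg_rpg), 4))
```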

And with that, we have come to the end of everything that I wanted to say about rate stats. I will close out the series with one final installment that attempts to briefly summarize my main points with limited math.

## Wednesday, December 01, 2021

### Hitting by Position, 2021

The first obvious thing to look at is the positional totals for 2021, with the data coming from Baseball-Reference. “MLB” is the overall total for MLB, which is not the same as the sum of all the positions here, as pinch-hitters and runners are not included in those. “POS” is the MLB totals minus the pitcher totals, yielding the composite performance by non-pitchers. “PADJ” is the position adjustment, which is the position RG divided by the total for all positions, including pitchers (but excluding pinch hitters). “LPADJ” is the long-term offensive positional adjustment, based on 2010-2019 data (see more below). The rows “79” and “3D” are the combined corner outfield and 1B/DH totals, respectively:

Having reviewed this data annually for about fifteen seasons, I would strongly caution against drawing any conclusions about shifting norms from single season results, which are better treated as curiosities (and as strong warnings against using single-year or other short time period averages in developing any kind of positional adjustment for use in a player value system). The most interesting curiosity then is shortstops outhitting left fielders and essentially matching third basemen and even DHs. The bumper crop of free agent shortstops has accentuated recognition of the current strength of the position, and their collective 2021 offensive performance lives up to the hype.

However, I think the most interesting group is the pitchers. The lowest PADJ ever recorded by pitchers was -5 in 2018; they rebounded to 0 in 2019 and now fell back to -4 after taking a year off. This on its face should not be surprising, but remember that pitcher performance was buoyed by Shohei Ohtani. What would it look like sans Ohtani?

While Ohtani had only 65 PA as a pitcher (1.48% of all pitcher PA), he accounted for 6% of their doubles, 5% of their runs, RBI, and walks, and 17% of their home runs. Ohtani’s hitting as a pitcher paled in comparison to what he did as a DH, although he still created runs at a 13% better rate than the MLB non-pitcher positional average. Without Ohtani, pitchers would have set a new low with a -6 PADJ. It’s interesting to consider that if the DH is made universal in the new CBA, future seasons’ data will likely show pitchers combining for competent offensive performances.

My next table is usually total pitcher performance for NL teams, but such a display would be incomplete without including the Angels. All team figures from this point forward in the post are park-adjusted. The RAA figures for each position are baselined against the overall major league average RG for the position, except for left field and right field which are pooled:

The teams with the highest RAA by position were:

C--SF, 1B--SF, 2B--LA, 3B--ATL, SS--SD, LF--CIN, CF--BAL, RF--PHI, DH--LAA

Those are pretty self-explanatory, although it’s fun that Shohei Ohtani is essentially responsible for two of his team’s positions ranking #1--let’s see Buster Posey or Brandon Belt do that!

I find it more entertaining to gawk at the teams that had the lowest RAA at each position (the listed player is the one who started the most games at the position, which does not always mean they were most responsible for the dreadful performance):

Hunter Dozier pulled the reverse Ohtani as the leading starter at third and right; he hit .213/.286/.390 with 3.7 RG, so it wasn’t really his fault. It’s kind of sad to see Miguel Cabrera leading the Tigers DHs to oblivion, and it kind of was his fault. Cabrera’s overall line (.256/.321/.386 for 4.2 RG) wasn’t that bad, but in 180 PA as a first baseman he posted an .844 (unadjusted) OPS, while in 335 PA as a DH his OPS was just .617.

The next table shows the correlation (r) between each team’s RG for each position (excluding pitchers) and the long-term position adjustment (using pooled 1B/DH and LF/RF). A high correlation indicates that a team’s offense tended to come from positions that you would expect it to:

I didn’t dig through years of these posts to check, but Kansas City’s negative correlation may be the lowest I’ve ever seen. The Royals’ only above-average positions were catcher, second base, and shortstop.
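The correlation figure is an ordinary Pearson r between a team’s RG at each position and the long-term adjustment for that position. A minimal sketch with made-up inputs: the RG and adjustment values below are hypothetical, ordered C, 1B, 2B, 3B, SS, LF, CF, RF, DH, with 1B/DH and LF/RF sharing pooled adjustments as described above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Undefined (division by zero) if either sequence is constant.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical team RG by position and pooled long-term adjustments
team_rg = [4.2, 5.6, 4.9, 5.1, 4.3, 5.4, 4.7, 5.0, 5.8]
lpadj = [0.89, 1.13, 0.97, 1.02, 0.94, 1.05, 1.00, 1.05, 1.13]
print(round(pearson_r(team_rg, lpadj), 3))
```

A strongly negative result, like Kansas City’s, means the team’s best production came from positions that usually hit the worst.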

The following tables, broken out by division, display RAA for each position, with teams sorted by the sum of positional RAA. Positions with negative RAA are in red, and positions that are +/-20 RAA are bolded:

A few notes:

* Only five of the fifteen AL teams had positive RAA from their position players, while each NL division had three teams with positive RAA.

* Baltimore’s infield production was the worst in the majors at -91 runs, and only Cedric Mullins’ center field prevented them from having below average performance at all positions. Texas was saved from the same fate only by their right fielders, who were just +1 run; the Rangers’ poor production is impressive for how consistently bad it was across the board.

* Their Texas neighbors were the opposite in displaying consistently good production across the board; Houston’s outfield was the best in the majors at +63 runs, while their infield was second in the AL to Toronto.

* San Francisco and Los Angeles were nearly mirror images of each other; the Dodgers narrowly edged the Giants for the top infield in MLB (+89 to +87), while their outfielders were both slightly above average (+5 to +4). LA catchers were outstanding (their 25 RAA tied for second in the majors with the Blue Jays and ChiSox), but the Giants were better at 32 RAA, giving them a three-run edge for the majors’ top total positional RAA.

A spreadsheet with full data is available here.

## Wednesday, November 17, 2021

### Crude Team Ratings, 2021

Crude Team Rating (CTR) is my name for a simple methodology of ranking teams based on their win ratio (or estimated win ratio) and their opponents’ win ratios. A full explanation of the methodology is here, but briefly:

1) Start with a win ratio figure for each team. It could be actual win ratio, or an estimated win ratio.

2) Figure the average win ratio of the team’s opponents.

3) Adjust for strength of schedule, resulting in a new set of ratings.

4) Begin the process again. Repeat until the ratings stabilize.
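The four steps above can be sketched as a small fixed-point iteration. This is a minimal sketch, not the exact method from the full explanation: the adjustment in step 3 is assumed here to be the team’s own win ratio times the games-weighted average rating of its opponents, followed by rescaling so the arithmetic mean is 100. The function name and the four-team example are hypothetical.

```python
def crude_team_ratings(win_ratio, games, max_iter=1000, tol=1e-10):
    """Iterative strength-of-schedule adjustment in the spirit of CTR (a sketch).

    win_ratio: dict team -> starting win ratio (actual W/L or an estimate)
    games: dict (team, opponent) -> games played, listed from each side
    Returns (ratings, sos), with ratings rescaled to an arithmetic mean of 100.
    """
    teams = list(win_ratio)

    def opp_average(ratings, t):
        # Games-weighted average rating of team t's opponents (its SOS)
        pairs = [(games.get((t, o), 0), ratings[o]) for o in teams if o != t]
        total = sum(g for g, _ in pairs)
        return sum(g * r for g, r in pairs) / total

    ratings = {t: 100.0 * win_ratio[t] for t in teams}
    for _ in range(max_iter):
        sos = {t: opp_average(ratings, t) for t in teams}
        # Assumed adjustment step: own win ratio times opponents' average rating
        raw = {t: win_ratio[t] * sos[t] for t in teams}
        scale = 100.0 * len(teams) / sum(raw.values())
        new = {t: raw[t] * scale for t in teams}
        done = max(abs(new[t] - ratings[t]) for t in teams) < tol
        ratings = new
        if done:
            break
    sos = {t: opp_average(ratings, t) for t in teams}
    return ratings, sos


# Hypothetical four-team league where everyone plays everyone ten times
wr = {"A": 60 / 40, "B": 50 / 50, "C": 50 / 50, "D": 40 / 60}
sched = {(t, o): 10 for t in wr for o in wr if t != o}
ratings, sos = crude_team_ratings(wr, sched)
for t in wr:
    print(t, round(ratings[t], 1), round(sos[t], 1))
```

With only two teams that play each other exclusively, this kind of iteration can oscillate rather than settle. A real league with a connected schedule of many teams does not have that problem.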

The resulting rating, CTR, is an adjusted win/loss ratio rescaled so that the majors’ arithmetic average is 100. The ratings can be used to directly estimate W% against a given opponent (without home field advantage for either side); a team with a CTR of 120 should win 60% of games against a team with a CTR of 80 (120/(120 + 80)).
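The head-to-head estimate is just one rating divided by the sum of the two. As a small sketch (the function name is hypothetical), the same formula against a CTR-100 opponent gives a W% versus an average team:

```python
def head_to_head_wpct(ctr_a, ctr_b):
    """Expected W% for team A against team B from CTRs, ignoring home field."""
    return ctr_a / (ctr_a + ctr_b)


print(head_to_head_wpct(120, 80))   # 0.6
print(head_to_head_wpct(120, 100))  # W% for a CTR-120 team vs. an average team
```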

First, CTR based on actual wins and losses. In the table, “aW%” is the winning percentage equivalent implied by the CTR and “SOS” is the measure of strength of schedule--the average CTR of a team’s opponents. The rank columns provide each team’s rank in CTR and SOS:

This was not a great year for the playoff field matching up with the teams that had the strongest W-L records in context: Toronto, Seattle, and Oakland were all significantly better than St. Louis and the world champs from Atlanta. The reason for this quickly becomes apparent when you look at the average aW%s by division (I use aW% to aggregate the performance of multiple teams rather than CTR because the latter is expressed as a win ratio; for a simple example, a 90-72 team and a 72-90 team will end up with an average win ratio of 1.025, but their composite and average winning percentages will both be .500):

The NL East, despite being described by at least one feckless prognosticator as “the toughest division in baseball”, was in fact the worst division in baseball by a large margin. Atlanta had the second-weakest SOS in MLB, turning their lackluster 88-73 record into something even less impressive in context. In defense of the Braves, they did lose some significant pieces to injury and have a multi-year track record of being a strong team, as well as looking better when the CTRs are based on expected record (i.e. Pythagenpat using actual runs/runs allowed):

Here we see the Dodgers overtake the Giants by a large margin as MLB’s top team, and it actually lines up better with the playoff participants as the Braves rank highly and the Mariners drop.

One weakness of CTR is that I use the same starting win metric to calculate both team strength and strength of schedule in one iterative process. But one could make the case that in order to best put W-L records in context, it would make more sense to use each team’s actual W-L record to determine their ranking but use expected W% or some other measure to estimate strength of schedule. Such an approach would simultaneously recognize that a team should be evaluated on the basis of their actual wins and losses (assuming the objective is to measure “championship worthiness” or some similar hard-to-define but intuitively comprehensible standard), but that just because an opponent had “good luck” or was “efficient” in converting runs to wins, they didn’t necessarily represent a stronger foe. This would give a team credit for its own “efficiency” without letting it accrue credit for its opponents’ “efficiency.”

This is what the ratings look like using predicted W% (using runs created/runs created allowed) as the starting point:

Finally, I will close by reverting to CTRs based on actual W-L, but this time taking the playoffs into account. I am not a big fan of including the playoffs - obviously they represent additional games which provide additional information about team quality, but they are played under very different circumstances than regular season games (particularly with respect to pitcher usage), and the fact that series are terminated when a team clinches biases the W-L records that emerge from series. Nonetheless, here they are, along with a column showing each team’s percentage change in CTR relative to the regular season W-L only version. Unsurprisingly, the Braves are the big winner, although they still only rank twelfth in MLB. The biggest loser is the Rays, although they still rank #3 and lead the AL. The Dodgers’ rating actually declined slightly more than the Giants’ despite winning their series; they end up with a 6-6 record weighing down their regular season, and with seven of those games coming against teams that are ranked just #12 and #13, the uptick in SOS was not enough to offset it.

## Wednesday, November 10, 2021

### Hypothetical Award Ballots, 2021

AL ROY:

1. LF Randy Arozarena, TB

2. SP Luis Garcia, HOU

3. SP Casey Mize, DET

4. SP Shane McClanahan, TB

Arozarena will likely win the award on name recognition if nothing else, but one could very easily make a case for Garcia, whom I actually have slightly ahead in RAR 37 to 35. Arozarena’s baserunning and fielding are largely a wash, but Garcia’s RAR figures using eRA and dRA are slightly lower (32 and 31). That’s enough for me to slide Arozarena ahead. Adolis Garcia is an interesting case, as his standard offensive stats will probably land him high in the voting, but his OBA was only .289, which contributed to him ranking fifth among position players in RAR. But he has excellent fielding metrics (16 DRS and 12 UZR), which get him back on my ballot. Among honorable mentions, Wander Franco had 21 RAR in just seventy games, which is by far the best rate of performance. Ryan Mountcastle’s homer totals will get him on conventional ballots, but he appears to be a slight minus as a fielder and was a below average hitter for a first baseman.

NL ROY:

1. 2B Jonathan India, CIN

2. SP Trevor Rogers, MIA

3. RF Dylan Carlson, STL

4. SP Ian Anderson, ATL

5. C Tyler Stephenson, CIN

India is the clear choice among position players and Rogers among pitchers, and I see no reason to make any adjustment to their RAR ordering. In fact, it’s pretty much RAR order all the way down.

AL Cy Young:

1. Robbie Ray, TOR

2. Gerrit Cole, NYA

3. Carlos Rodon, CHA

4. Jose Berrios, MIN/TOR

5. Nathan Eovaldi, BOS

The 2021 AL Cy Young race has to be the worst for a non-shortened season in history; while long-term trends are driving down starter workloads, let’s hope that a full previous season will make the 2022 Cy Young race at least a little less depressing. Robbie Ray is the obvious choice, leading the league in innings and ranking second to Carlos Rodon in RRA for a twelve-run RAR lead over Lance McCullers; Ray’s peripherals are less impressive, but are still solid. In addition to the pitchers on my ballot, McCullers, Lance Lynn, and Chris Bassitt could all easily be included, as the seven pitchers behind Ray could reasonably be placed in just about any order.

NL Cy Young:

1. Zack Wheeler, PHI

2. Corbin Burnes, MIL

3. Walker Buehler, LA

4. Max Scherzer, WAS/LA

5. Brandon Woodruff, MIL

The NL race is almost the opposite of the AL, with five solid candidates who could be ranked in almost any order, even for a normal season. The easiest way to explain my reasoning is to show each pitcher’s RAR by each of the three metrics:

Wheeler and Burnes get the nods for my top two spots as they were equally good in the peripheral-based metrics, which I feel is sufficient to elevate them above RAR leader Buehler. It’s worth noting that Burnes was the leader in all three of the RA metrics, but Wheeler led the league with 213 innings while Burnes was nineteenth with 167. I suspect Burnes will win the actual vote, and while it’s tempting to side with the guy with spectacular rate stats, a 46-inning gap is enormous.

AL MVP:

1. DH/SP Shohei Ohtani, LAA

2. 1B Vladimir Guerrero Jr., TOR

3. 1B Matt Olson, OAK

4. 2B Marcus Semien, TOR

5. 3B Jose Ramirez, CLE

6. SS Carlos Correa, HOU

7. RF Aaron Judge, NYA

8. SP Robbie Ray, TOR

9. RF Kyle Tucker, HOU

10. 2B Brandon Lowe, TB

A first baseman and a DH are the two AL offensive RAR leaders in a season in which no pitcher comes close to a top of the MVP ballot performance. The first baseman hits .305/.394/.589 to the DH’s .256/.372/.589, over 59 additional plate appearances. Under these circumstances, how can the first baseman possibly rank second on the ballot, and a distant second at that? When the DH also pitches 130 innings with an RRA 31% lower than league average.

This should seem like a fairly obvious conclusion, and I suspect that Ohtani will handily win the award, but whether out of the need to generate “controversial” content or some other explanation that would indict their mental faculties, talking heads have spent a great deal of time pretending that this was a reasonable debate. I thought it would have been quite fascinating to see Guerrero win the triple crown as a test case of whether twice in a decade the mystical deference to the traditional categories could deny an Angel having a transcendent season of an MVP award.

For the rest of the ballot, if you take the fielding metrics at face value, you can make the case that Marcus Semien was actually the Most Valuable Blue Jay; I do not take them at face value, and Carlos Correa is a prime example of why. He was +21 in DRS but only +3 in UZR, which is the difference between leading the league in position player bWAR and slotting seventh on my ballot (as he would fall behind Judge if I went solely on UZR).

The omission of Salvador Perez will certainly be a deviation from the actual voting. Perez’ OBA was just .315, and despite 48 homers he created “just” 99 runs. Worse yet, his defensive value was -13 runs per Baseball Prospectus. I would rank him not just behind the ten players listed, but Cedric Mullins, Bo Bichette, Xander Bogaerts, Yasmani Grandal, Rafael Devers, and a slew of starting pitchers. I don’t think he was one of the twenty most valuable players in the AL.

NL MVP:

1. RF Juan Soto, WAS

2. RF Bryce Harper, PHI

3. SS Trea Turner, WAS/LA

4. SP Zack Wheeler, PHI

5. SS Fernando Tatis, SD

6. SP Corbin Burnes, MIL

7. SP Walker Buehler, LA

8. 1B Paul Goldschmidt, STL

9. RF Tyler O’Neill, STL

10. SP Brandon Woodruff, MIL

Having not carefully examined the statistics during the season, two things surprised me about this race, which it was quickly apparent would come down to the well-matched right fielders, each of whom was among the best young players ever when they burst onto the scene, one of whom more or less inherited the other’s job, and both of whom still toil in the same division. The first was that Soto, despite his dazzling OBA, actually ranked a smidge behind Harper offensively; the second was that Soto had a significant advantage in the fielding metrics that elevated him to the top.

Taking the more straightforward comparison first, Soto and Harper had essentially the same batting average (I’m ignoring park factors as WAS and PHI helpfully both had a 101 PF, so it won’t change the comparison between the two), .313 to .309. Soto had the clear edge in W+HB rate (22.7% of PA to Harper’s 17.8%, though the pair still ranked one-two in the NL), while Harper had a sizeable edge in isolated power (.305 to .221; Harper had only six more homers than Soto, but 22 more doubles). The walks and power essentially cancel out (Harper had a .520 Secondary Average to Soto’s .514, again ranking one-two in the circuit). Each created 116 runs, but despite his OBA edge Soto made twelve more outs, as he had fifty-six more plate appearances. That leaves Harper with a narrow two RAR lead.

Fangraphs estimates that Soto’s non-steal baserunning was one run better than average, Harper’s zero. So it comes down to fielding, where Soto has +3 DRS and +2 UZR to Harper’s -6/+2. As a crude combination with regression to put the result on an equal footing with offensive value, I typically sum the two and divide by four, which leaves Soto +1 and Harper -1, to create a total value difference of two runs in favor of Soto.
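The crude fielding combination is simple enough to state as code. A sketch using the DRS and UZR figures quoted above (the function name is hypothetical); the text rounds the results to +1 and -1:

```python
def crude_fielding(drs, uzr):
    """Crude regressed fielding runs: sum DRS and UZR, then divide by four."""
    return (drs + uzr) / 4


soto = crude_fielding(3, 2)     # 1.25
harper = crude_fielding(-6, 2)  # -1.0
print(soto, harper)
```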

Obviously, this difference is so narrow that one should barely even feel the need to justify a choice to put Harper on top of their ballot. One could easily reason that the Phillies were in the race, and Harper contributed to keeping them in said race with his September/October performance (1.157 September OPS). But I have been pretty consistent in not giving any consideration to a team’s position in the standings, so my only sanity check was to take a closer look at fielding using very crude but accessible metrics. My non-scientific impression would be that Harper might be something like a B- fielder and Soto a C.

I looked at the putout rate for each, dividing putouts by an estimate of potential plays: team AB – HR – K – A + SF, multiplied by each player’s innings in the outfield divided by the team’s total innings. (This essentially defines the outfielder’s potential plays as all balls put in play, including hits, minus the plays actually made by infielders, for which assists serve as an approximation. Obviously there is much that this ignores, even among factors that might be approximated from the standard Baseball Guide data, like actual GB/FB ratio, handedness of pitchers and opposing batters, etc.) Viewed in this manner, Soto made a putout on 13.7% of potential plays to Harper’s 11.2%.

A second crude check, which may be free of unknown team-level biases but introduces its own problem in that the other players are very different, is to compare each player’s putout rate to that of his team’s other right fielders. For this, we can just look at putouts per nine innings, as we have to assume that the other team-level inputs in our putout rate (HR, K, A, SF) were uniformly distributed between Soto’s or Harper’s innings and those played by the other Nationals or Phillies right fielders. Soto recorded 2.17 PO/9 innings while other Nationals RF recorded 1.98; Harper recorded 1.64 to 1.51 for other Phillies RF. So Soto recorded 10% more putouts than his teammates, and Harper 9% more.
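Both crude checks reduce to a few lines of arithmetic. A minimal sketch with hypothetical function names, assuming “team AB, HR, K, SF” means totals against the team’s pitchers and “A” the team’s assists. The team totals in the last line are made up for illustration; the PO/9 figures are the ones quoted above.

```python
def putout_rate(po, ab, hr, k, a, sf, player_inn, team_inn):
    """Crude OF putout rate: PO / estimated potential plays.

    Potential plays = team (AB - HR - K - A + SF), pro-rated by the
    player's share of the team's defensive innings.
    """
    potential = (ab - hr - k - a + sf) * player_inn / team_inn
    return po / potential


def relative_po9(player_po9, teammates_po9):
    """How much more often the player records putouts than teammates at his position."""
    return player_po9 / teammates_po9 - 1


print(f"{relative_po9(2.17, 1.98):.0%}")  # Soto vs. other Nationals RF: 10%
print(f"{relative_po9(1.64, 1.51):.0%}")  # Harper vs. other Phillies RF: 9%
# Hypothetical team totals, for illustration only
print(round(putout_rate(300, 5500, 200, 1400, 1700, 40, 1200, 1440), 3))
```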

Is any of this remotely conclusive? Of course not, but it is sufficient to convince me that the proposition that “Juan Soto was two runs more valuable than Bryce Harper in the field” is reasonable, and that in turn is enough to make Soto seem a whisker more valuable than Harper. It’s a very close race, much more interesting than the more discussed AL race (which in truth is interesting only because of Ohtani’s remarkable season and not any comparison to other players).

I think the rest of the ballot follows RAR very closely with the pitchers mixed in. Max Scherzer ranked ahead of Brandon Woodruff on my Cy Young list, but they flip here as Woodruff was merely bad offensively (-1 run created); Scherzer didn’t reach base in 59 plate appearances (-5).

## Wednesday, October 20, 2021

### Rate Stat Series, pt. 14: Relativity for the Theoretical Team Framework

Before jumping into win-equivalent rate stats for the theoretical team framework, I think it would be helpful to re-do our theoretical team calculations on a purely rate basis. This is, after all, a rate stat series. In discussing the TT framework in pts. 9-11, I started by using the player’s PA to define the PA of the team, as Bill James chose to do with his TT Runs Created. This allowed our initial estimate of runs created or RAA to remain grounded in the player’s actual season.

An alternative (and as we will see, equivalent) approach would be to eschew all of the “8*PA” and just express everything in rates to begin with. When originally discussing TT, I didn’t show it that way, but maybe I should have. I found that my own thinking when trying to figure out the win equivalent TT rates was greatly aided by walking through this process first.

Again, everything is equivalent to what we did before – if you just divide a lot of those equations by PA, you will get to the same place a lot quicker than I’m going to. The theoretical team framework we’re working with assumes that the batter gets 1/9 of the PA for the theoretical team. It’s also mathematically true that for Base Runs:

BsR/PA = (A*B/(B+C) + D)/PA = (A/PA)*(B/PA)/(B/PA + C/PA) + D/PA

If for the sake of writing formulas we rename A/PA as ROBA (Runners On Base Average), B/PA as AF (Advancement Factor; I’ve been using this abbreviation long before it came into mainstream usage in other contexts), C/PA as OA (Out Average), and D/PA as HRPA (Home Runs/PA), we can then write:

BsR/PA = ROBA*AF/(AF + OA) + HRPA

Since it is also true that R/O = R/PA/(1 – OBA), in this case it is true that:

BsR/O = (BsR/PA)/OA

We can use these equations to calculate the Base Runs per out for a theoretical team (I’m going to skip over “reference team” notation and just assume that the reference team is a league average team):

TT_ROBA = 1/9*ROBA + 8/9*LgROBA

TT_AF = 1/9*AF + 8/9*LgAF

TT_OA = 1/9*OA + 8/9*LgOA

TT_HRPA = 1/9*HRPA + 8/9*LgHRPA

TT_BsR/PA = TT_ROBA*TT_AF/(TT_AF + TT_OA) + TT_HRPA

TT_BsR/O = (TT_ROBA*TT_AF/(TT_AF + TT_OA) + TT_HRPA)/TT_OA
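The theoretical team blend above translates directly into code. This is a minimal sketch, assuming the per-PA component rates (ROBA, AF, OA, HRPA) have already been computed from whichever Base Runs version you use. The function names are hypothetical, and the rates in the example are illustrative, not Thomas’ 1994 figures.

```python
def blend(player_rate, league_rate, share=1 / 9):
    """Theoretical-team rate: the batter gets `share` of the PA, league average the rest."""
    return share * player_rate + (1 - share) * league_rate


def bsr_per_out(roba, af, oa, hrpa):
    """Base Runs per out from per-PA rates: (ROBA*AF/(AF + OA) + HRPA) / OA."""
    return (roba * af / (af + oa) + hrpa) / oa


def tt_bsr_per_out(player, league):
    """TT_BsR/O for a team that is 1/9 `player` and 8/9 `league`.

    player, league: dicts of per-PA rates keyed 'roba', 'af', 'oa', 'hrpa'.
    """
    tt = {k: blend(player[k], league[k]) for k in ("roba", "af", "oa", "hrpa")}
    return bsr_per_out(**tt)


# Hypothetical per-PA rates for illustration only
league = {"roba": 0.30, "af": 0.25, "oa": 0.68, "hrpa": 0.025}
slugger = {"roba": 0.36, "af": 0.33, "oa": 0.58, "hrpa": 0.06}
print(round(bsr_per_out(**league), 4), round(tt_bsr_per_out(slugger, league), 4))
```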

Here’s a sample calculation for 1994 Frank Thomas:

To calculate a win-equivalent rate stat, we can use the TT_BsR/O figure as a starting point (it suggests that a theoretical team of 1/9 Thomas and 8/9 league average would score .2344 runs/out). We don’t need to go through this additional calculation, though; when we calculated R+/O+ (or R+/PA+, RAA+/O+, or RAA+/PA+), we already had everything we needed for this calculation.

You will see if you do the math that:

TT_BsR/O = (RAA+/O+)*(1/9) + LgR/O

or

TT_BsR/O = (R+/O+)*(1/9) + (LgR/O)*(8/9)

or

TT_BsR/O = (RAA+/PA+)/(1 – LgOBA)*(1/9) + LgR/O

or

TT_BsR/O = (R+/PA+)/(1 - LgOBA)*(1/9) + (LgR/O)*(8/9)

You could view this as a validation of the R+/O+ approach, as it does what it set out to do, which is to isolate the batter’s contribution to the theoretical team’s runs/out. Once we’ve established the team’s runs/out, it is pretty simple to convert to wins. I will just give formulas as I think they are pretty self-explanatory:

TT_BsR/G = TT_BsR/O*LgO/G

TT_RPG = TT_BsR/G + LgR/G

TT_x = TT_RPG^.29

TT_W% = (TT_BsR/G)^TT_x/((TT_BsR/G)^TT_x + (LgR/G)^TT_x)
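These four formulas chain together in a few lines. A sketch with a hypothetical function name and made-up league inputs (4.5 R/G and 25.5 outs per game); subtracting .5 from the result gives WAA per team game.

```python
def tt_wpct(tt_r_per_o, lg_r_per_g, lg_o_per_g, z=0.29):
    """Theoretical-team W% against a league-average opponent via Pythagenpat."""
    tt_r_per_g = tt_r_per_o * lg_o_per_g        # TT_BsR/G
    tt_rpg = tt_r_per_g + lg_r_per_g            # TT_RPG (both teams)
    x = tt_rpg ** z                             # TT_x
    return tt_r_per_g ** x / (tt_r_per_g ** x + lg_r_per_g ** x)


# Hypothetical league: 4.5 R/G and 25.5 outs/G; the TT scores .20 runs per out
w = tt_wpct(0.20, 4.5, 25.5)
print(round(w, 4), "W%;", round(w - 0.5, 4), "WAA per team game")
```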

Walking through this for the Franks, we have:

One thing to note here is that if we look at the theoretical team’s R/O (or R/G) relative to the league average, subtract one, multiply by nine, and add one back in, we will get Thomas and Robinson’s relative R+/O+. This is not a surprising result given what we saw above regarding the relationship between R+/O+ and theoretical team R/O.

We now have a W% for the theoretical team, which we could leave alone as a rate stat, but it’s not very satisfying to me to have an individual rate stat expressed as a team W%. If we subtract .5, we have WAA/Team G; we could interpret this as meaning that Thomas is estimated to add .0609 wins per game and Robinson .0546 to a theoretical team on which they get 1/9 of PA. Another option would be to convert this WAA back to a total, defining “games” as PA/Lg(PA/G), and then we could have WAA+/PA+ or WAA+/O+ as rates.

In keeping with the general format established in this series, though, my final answer for a win-equivalent rate stat for the TT framework will be to convert the winning percentage (actually, we’ll use win ratio since it makes the math easier) back to the reference environment, and calculate a relative adjusted R+/O+. Since everything will be on an outs basis (as we’re using O+), we don’t need to worry about league PA/G when calculating our relative adjusted R+/O+.

Instead of calculating TT_W%, we could have left it in the form of team win ratio:

TT_WR = ((TT_BsR/G)/(LgR/G))^(TT_x)

We can convert this back to an equivalent run ratio in the reference environment (which for this series we’ve defined as having Pythagorean exponent r = 1.881) by solving for AdjTT_RR in the equation:

TT_WR = AdjTT_RR^r

so

AdjTT_RR = TT_WR^(1/r)

We could convert this run ratio back to a team runs/game in the reference environment, and then to a team runs/out, and then use our equation for tying individual R+/O+ to theoretical team R/O to get an equivalent R+/O+ ratio. But why bother with all that, when we will just end up dividing it by the reference environment R/O to get our relative adjusted R+/O+? I noted above that there was a direct relationship between the theoretical team’s run ratio (which is equal to the theoretical team’s R/O divided by league R/O) and the batter’s relative R+/O+:

Rel R+/O+ = (TT_RR – 1)*9 + 1
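Chaining these steps gives a compact path from theoretical team runs per game to a relative adjusted R+/O+. A sketch, assuming the reference-environment conversion is win ratio = run ratio^r (so run ratio = win ratio^(1/r)) with r = 1.881, and z = .29 for the Pythagenpat exponent as above; the function name is hypothetical.

```python
def rel_adj_r_plus_per_o_plus(tt_r_per_g, lg_r_per_g, z=0.29, r=1.881):
    """Relative adjusted R+/O+ from theoretical-team and league runs per game.

    1. Win ratio in the actual environment: (TT R/G / Lg R/G)^x, x = RPG^z
    2. Equivalent run ratio in the reference environment: WR^(1/r)
    3. Rel R+/O+ = (run ratio - 1) * 9 + 1
    """
    x = (tt_r_per_g + lg_r_per_g) ** z
    tt_wr = (tt_r_per_g / lg_r_per_g) ** x
    adj_rr = tt_wr ** (1 / r)
    return (adj_rr - 1) * 9 + 1


# Hypothetical inputs: theoretical team at 4.9 R/G in a 4.5 R/G league
print(round(rel_adj_r_plus_per_o_plus(4.9, 4.5), 3))
```

When the actual-environment exponent x happens to equal r, step 2 undoes step 1 and the result collapses to the unadjusted (TT_RR – 1)*9 + 1.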

So our Relative Adjusted R+/O+ can be calculated as: