Walk Like a Sabermetrician

The End

2022-02-15T10:50:00.003-05:00

This will be the final post on this blog - the archives will remain up for as long as Google will allow.

I do not currently have any new content to share, but in the future you can find this blog on Substack at https://walksaber.substack.com/?r=19pmi9

Pythagenpat Using Run Rates

2022-01-26T08:04:00.032-05:00

The widespread implementation of seven-inning games during the 2020 season forced a re-examination of some of the standard sabermetric tools. One example is Pythagorean records. It would be foolish to expect that the same run ratio achieved over nine innings would lead to the same expected winning percentage as if it had been achieved over seven innings. Thus, simply taking a team’s composite runs scored and allowed for the season, which consisted of a distribution of seven-inning and nine-inning games unique to that team, and expecting the standard Pythagorean approach to be the best estimate of its overall winning percentage was also foolish.

The approach that one should select to deal with this issue depends on what the desired final estimate is. If one wanted to estimate how many games a team should have won over the course of that season, one reasonable approach would be to develop a proper Pythagorean or Pythagenpat exponent for seven-inning games, calculate a team’s estimated winning percentage in seven-inning games using that value and in nine-inning games using the standard approach, and then weight the results by the percentage of seven-inning and nine-inning games for the team (defining this in terms of the scheduled length of the game, not the actual number of innings played, in the case of extra-inning seven-inning games).

Tom Tango studied games that were tied entering the third inning to simulate a seven-inning game, and found a Pythagorean exponent of 1.57 was appropriate. Of course, that’s a fixed exponent rather than a Pythagenpat exponent, but you could use the same approach to develop an equivalent Pythagenpat formula and then apply it as described above.
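A minimal sketch of the weighted approach, assuming we already have a team’s runs scored and allowed split by scheduled game length. As a stand-in for a proper seven-inning Pythagenpat formula it uses Tango’s fixed exponent of 1.57; the function names and the split inputs are hypothetical.

```python
def pythag_wpct(runs, runs_allowed, exponent):
    """Generic Pythagorean W% for a given exponent."""
    return runs ** exponent / (runs ** exponent + runs_allowed ** exponent)


def blended_wpct(r7, ra7, g7, r9, ra9, g9, x7=1.57, z=0.29):
    """Weight seven- and nine-inning estimates by each game type's share.

    g7/g9 count games by *scheduled* length, so an extra-inning
    seven-inning game still counts toward g7.
    x7 = 1.57 is Tango's fixed seven-inning exponent, used here as a
    stand-in for a seven-inning Pythagenpat formula (an assumption).
    """
    w7 = pythag_wpct(r7, ra7, x7) if g7 else 0.0
    # Standard Pythagenpat for nine-inning games: x = RPG^z
    w9 = pythag_wpct(r9, ra9, ((r9 + ra9) / g9) ** z) if g9 else 0.0
    return (g7 * w7 + g9 * w9) / (g7 + g9)
```

Weighting by games rather than by innings matches the idea of estimating how many games the team should have won over that particular schedule.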

I decided that I was more interested in attempting to estimate what a team’s W% would have been under standard conditions (i.e. nine-inning games as the default, as we normally view a major league season). Thus I was interested in what a team’s W% “should have been” had they played in a normal season. This allowed me to skip the step of dealing with seven-inning games, and instead think about the best way to fit their 2020 data into the standard formulas. Of course, the silly runs scored in extra-inning games are a problem, but I chose to ignore them for the sake of expediency (and in hopes that this would all be a temporary problem) and plugged the team’s runs scored and allowed per nine innings into Pythagenpat.

In thinking about this, I was reminded of a related issue that I have been aware of for a long time, which is the reduced accuracy of Pythagorean estimates (and really all R/RA estimates of W%) as pertains to home and away games. If you look at 2010-2019 major league data and use Pythagenpat with x = RPG^.29, the RMSE of estimated team W% multiplied by 162 is 3.977 (for the sake of convenience I’ll just call this RMSE going forward, but this can be thought of as the standard error per 162 games). If you just look at away games, the RMSE is 6.537, and for home games it is 6.992.
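For reference, here is a sketch of standard Pythagenpat (x = RPG^.29) and of the RMSE-per-162 measure used throughout this post. The function names are illustrative, not from any library.

```python
import math


def pythagenpat_wpct(runs, runs_allowed, games, z=0.29):
    """Pythagenpat: x = RPG^z, where RPG = (R + RA) per game."""
    x = ((runs + runs_allowed) / games) ** z
    # Only the R/RA ratio matters once x is fixed, so raw totals work here.
    return runs ** x / (runs ** x + runs_allowed ** x)


def rmse_per_162(actual, estimated):
    """RMSE of W% errors times 162: a standard error in wins per 162 games."""
    squared = [(a - e) ** 2 for a, e in zip(actual, estimated)]
    return math.sqrt(sum(squared) / len(squared)) * 162
```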

It should not surprise us that the error is larger, as we have just halved the number of games for each observation, and we should generally expect larger deviations from expectation over small samples. However, it’s not just the magnitude of the error that matters. Over this span, home teams averaged a .535 W% and road teams (of course) the complement of .465. But the Pythagenpat record of home teams was .514, and for road teams .486. One’s first inclination upon seeing this might be to say “Aha! Evidence of home field advantage manifesting itself. Home teams exceed their Pythagenpat record by .021 wins due to [insert explanation...strategic advantage of batting in the bottom of the ninth, crowd support allowing them to will runs when needed, etc.]”

One sabermetrician who encountered this phenomenon and developed a more likely (and indeed, obvious upon reflection) explanation for it was Thomas Tress. His article “Bias Against the Home Team in the Pythagorean Theorem” was published in the May 2004 By The Numbers. Tress provided the obvious explanation that home teams often don’t bat in the bottom of the ninth, which means that they often have fewer opportunities to score runs than they do to allow runs. Tress offers a correction in the form of a scalar multiplier that can be applied to a home team’s runs (and of course also to the road team’s runs allowed).

Tress’ approach is a solid one, but it addresses only the home/road Pythagorean conundrum that we entered on a detour, rather than my primary concern about length of game (this is not a criticism, as it was not intended to do so). The issues are related because the home team not batting in the bottom of the ninth is one way in which game lengths vary from the standard nine innings that are inherently assumed in most metrics (or, more precisely, they assume the average number of innings in the data which was used to calibrate them, which we’ll get to in due course).

I should point out that there is another issue pertaining to home teams that also distorts Pythagorean records: truncated bottom of the ninths (or tenths, elevenths, etc.). Foregone bottom of the ninths are more obviously troublesome, but truncated bottom of the ninths (in which a walkoff victory is achieved before three outs are recorded) also leave home teams’ run totals lower than they would otherwise be, as run expectancy is left on the table when the game ends. I will not be correcting for that here; it is a lesser problem than foregone bottom of the ninths for the sake of Pythagorean records, and there’s no easy fix (one could add to a home team’s runs scored and an away team’s runs allowed the run expectancy that existed at the end of the game, but this is not a correction that can quickly be made with a conventional dataset). You can avoid this problem by using runs created rather than actual runs, as the potential runs are still reflected in the calculation, but that changes the whole nature of the Pythagorean record by opening up a second dimension of luck (“sequencing” of offensive events rather than simply “timing” of runs).

Ignoring the truncated innings issue, there is an obvious approach that should help address both the home field issue and the question of shortened games, which is using a rate of runs scored and allowed that considers outs/innings rather than raw totals or rates (most commonly runs/game) that don’t take into account outs/innings. Since Pythagenpat is built around runs per game determining the exponent, I will take the approach of using runs/9 innings.

Before jumping into the Pythagenpat implications, two points on this definition:

1. It’s easy to know a team’s defensive innings, as it’s just their innings pitched. For offenses, you can compute outs as Plate Appearances – Runs – Left on Base (at least for non-Manfred innings) and divide by three, although it’s easier if you can just get opponents’ innings pitched, or opponents’ putouts, since PO/3 = IP by definition.

2. I am using 9 innings because it is the regulation game length, but it actually corresponds to a slightly longer game than what we actually saw in 2010-2019. For those seasons, the average outs/game was 26.82, which is equivalent to 8.94 innings/game.
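The two ways of counting an offense’s outs and innings from point 1, plus the outs-to-innings conversion from point 2, can be written as a pair of tiny helpers. The names are hypothetical; this is a sketch, not a parser for any particular dataset.

```python
def offensive_outs(pa, runs, left_on_base):
    """Outs made by an offense: every batter either scores, is left on base,
    or is put out. Not valid for innings with a Manfred runner, who
    never came to the plate."""
    return pa - runs - left_on_base


def innings_from_putouts(opponent_putouts):
    """Opponents' putouts / 3 = the offense's innings, by definition."""
    return opponent_putouts / 3


# Point 2: 26.82 outs per game in 2010-2019 is 26.82 / 3 = 8.94 innings.
avg_innings_per_game = 26.82 / 3
```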

I’m using 2010-2019 data for this post not because I think ten years (300 team seasons) is an appropriate sample; conditions of the game have not changed over the last century to an extent that should significantly influence Pythagorean records, so a much longer span could be used. The more mundane explanation is that data on actual team outs, home and away, is not easily accessible, and the easiest way I know to get it is through Retrosheet’s Game Logs, which are an absolutely fantastic resource. But I didn’t want to spend a significant amount of time parsing them, which is why I limited my sample to ten years.

My first step was to optimize standard Pythagenpat to work with this dataset, so that any RMSE comparisons we make after building a rate-based Pythagenpat formula are on a level playing field. However, I was quite surprised by what I found - the Pythagenpat exponent that minimizes RMSE for the 2010-2019 majors is .264 (in other words, the Pythagorean exponent x = RPG^.264).

Typically, a value in the range .28 - .29 minimizes RMSE. I was so surprised by .264 that I thought for a moment I might have made an error compiling the data from the game logs, so I checked the Lahman database at the season level to be sure. The data was accurate – this set of 300 teams happens to have a lower Pythagenpat exponent than I am conditioned to seeing.

For the purpose of a proof of concept of using rates, this is not really an issue; however, I certainly question whether the best fit values I’ve found for the rate approach should be broadly applied across all league-seasons. I will leave it up to anyone who ultimately decides to implement these concepts to decide whether a larger sample is required to calibrate the exponents.

With that being said, the differences in RMSE using the lower Pythagenpat exponent are not earth-shattering. Using .264, the RMSE for all games is 3.923, with 7.015 for home games and 6.543 for away games, with the home/road RMSEs actually higher than those for the standard exponent. I provide these values for reference only as the real point of this exercise is to look at what happens for a rate-based Pythagenpat.

First, let’s define some basic terms:

R/9 = Runs/Actual Outs * 27

RA/9 = Runs Allowed/Innings Pitched * 9

RPG9 = R/9 + RA/9

x = RPG9^z (z will be our Pythagenpat exponent and x the resulting Pythagorean exponent for a given RPG9)

W% = (R/9)^x/((R/9)^x + (RA/9)^x)
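These definitions translate directly into a short function. This is a minimal sketch (the name is hypothetical); the default z of .244 is simply the 2010-2019 best fit discussed below, not a general recommendation.

```python
def rate_pythagenpat(runs, offensive_outs, runs_allowed, innings_pitched, z=0.244):
    """Pythagenpat on per-nine-inning rates rather than raw totals.

    innings_pitched must be a true decimal (outs / 3), not box-score
    notation in which 1450.1 means 1450 1/3 innings.
    """
    r9 = runs / offensive_outs * 27           # R/9
    ra9 = runs_allowed / innings_pitched * 9  # RA/9
    x = (r9 + ra9) ** z                       # x = RPG9^z
    return r9 ** x / (r9 ** x + ra9 ** x)
```

Unlike raw-total Pythagenpat, a team that scores the same number of runs in fewer outs gets a higher estimate here, which is exactly the opportunity adjustment this approach is after.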

The value of z that minimized RMSE for this dataset is .244. That RMSE is 3.771, which is a significant improvement over the optimized Pythagenpat that does not use rates. This is encouraging, as if there were no advantage to be had, this whole exercise would be a waste of time. I also think it’s intuitive that considering rates rather than just raw run totals would allow us to improve our winning percentage estimate. After all, the only differences between raw runs and rates for a team season will arise due to how the team performs in individual games.

To wit, we can define opportunities to score runs in terms of outs, since outs are the correct denominator for a team-level evaluation of runs scored/allowed on a rate basis. A perfectly average team would expect to have an equal number of opportunities for its offense and defense, but a good team will give its opponents’ offense more opportunities (since it will forego more bottom of the ninths at home and play more bottom of the ninths on the road), and a bad team will get more opportunities for its own offense. These differences don’t arise randomly, but due to team performance. So we should expect an improvement in the accuracy of our winning percentage estimate when we make these corrections, but it should be slight, since foregone bottom of the ninths have a ceiling in theory and a lower ceiling in practice (even very bad teams often forego bottom of the ninths, and even very good teams frequently lose at home or at least need a walkoff to win).

Better yet, the reductions in RMSE for home games (5.779) and road (5.215) are larger, which we might have expected as the impact of foregone bottom of the ninths will not be as smooth across teams when considering home and road separately. When using this rate approach, the expected W% for all home teams in the dataset is .536, compared to the actual home W% of .535. So there is no evidence of any home field advantage in converting runs/runs allowed to wins that does not get wiped away by taking opportunities to score/allow runs into account, contrary to what one might conclude from a naïve Pythagenpat analysis.

A further note is that if you calculate a team’s total expected wins as a weighted average of their home and road rate Pythagenpats, the RMSE is a little better (3.754) than just looking at the combined rate. This also should not surprise, as we have sneaked in more data about how a team actually distributed its runs scored and allowed across games by slicing the data into two pieces instead of one. If we calculated a Pythagenpat record for every game and then aggregated, we should expect to maximize accuracy, but at that point we are kind of losing the point of a Pythagorean approach (we can make the RMSE zero if in that case we replace Pythagenpat with a rule that if R > RA, we should assign a value of 1 expected win and if R < RA we should assign a value of 0 expected wins).
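Summing home and road expected wins is a one-liner on top of the rate formula. In this sketch each split is a plain dict with hypothetical keys R, O (offensive outs), RA, IP, and G; the rate function is repeated so the block stands alone.

```python
def rate_pythagenpat(runs, outs, runs_allowed, innings_pitched, z=0.244):
    """Rate-based Pythagenpat: R/9 from offensive outs, RA/9 from IP."""
    r9 = runs / outs * 27
    ra9 = runs_allowed / innings_pitched * 9
    x = (r9 + ra9) ** z
    return r9 ** x / (r9 ** x + ra9 ** x)


def split_expected_wins(splits, z=0.244):
    """Expected wins summed over splits (e.g. home and road), each split's
    rate W% weighted by that split's games."""
    return sum(
        s["G"] * rate_pythagenpat(s["R"], s["O"], s["RA"], s["IP"], z)
        for s in splits
    )
```

Dividing the result by total games gives the weighted-average W% described above.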

Again, I would consider this a demonstration of concept rather than a suggestion that this be implemented with a rate Pythagenpat exponent of .244. My hunch is that the best value to use over a broad range of team-seasons is higher than .244. Also, I think that for just looking at total season records, a standard approach is sufficient. If you are ever working with a situation in which you can expect to see significant discrepancies between the number of foregone bottom of the ninths for a team and its opponents (as is definitely the case when considering home and away games separately, and may be the case to a much lesser extent for extremely good or extremely bad teams), then you may want to consider calculating Pythagenpat using run rates rather than raw totals.

Rate Stat Series, pt. 16: Summary

2022-01-05T08:21:00.001-05:00

This series spans fifteen posts, over thirty tables, and over 25,000 words. I don’t really expect anyone to slog through all that. So here I want to express the key points of the series as succinctly and with as little math as possible. In doing so, it will become apparent that I haven’t broken any new ground in this series, which is even more reason not to slog through the rest.

1. The proper denominator for a rate stat (where “rate stat” is defined as a measure of overall offensive productivity expressed in units of runs or wins, rather than the rate of any given event or subset of events) for a team is outs. This is obviously true if you take a moment to examine it, and is one of the core fundamental insights of sabermetrics. Because when a pitcher is in the game, he functions as his own team, outs are also the proper denominator for any overall pitching rate stat.

2. The number of plate appearances any team gets is a function of their rate of making outs (if we ignore enough statistical categories, this boils down to their On Base Average). On the team level, plate appearances are an inappropriate rate stat denominator as it is illogical to penalize a team for avoiding outs more effectively than another.

3. At the individual batter level, neither outs nor plate appearances are a satisfactory denominator if an estimate of absolute runs created is used as the numerator of the rate stat. Beyond their primary contributions to their team through their direct actions at the plate and on the bases, batters make a secondary contribution by avoiding outs, thus generating additional plate appearances for their teammates. But individual batters don’t operate in a vacuum. An individual contributes to his team’s plate appearance total, but doesn’t individually define it as he only makes up one-ninth of the lineup. Using outs as a denominator treats an individual as if he alone defines his team. Using plate appearances, on the other hand, does not value the secondary contribution that a batter makes by generating additional opportunities for his teammates, absent some adjustment.

4. There are three frameworks through which we can evaluate an individual’s offense. The first, which I do not advocate at all, is to treat the player as a team, plugging the individual’s stats into a dynamic run estimator like Runs Created or Base Runs. The second is to use linear weights to evaluate either absolute runs created (as, for example, Estimated Runs Produced or Extrapolated Runs do) or runs above average (à la Pete Palmer’s Batting Runs). The third is to construct a theoretical team, using a dynamic run estimator to estimate the runs created by a hypothetical team that consists of the batter in question plus eight other (typically league average) players.

5. The selection of approach to run estimation should not be divorced from the choice of rate stat. The assumptions inherent in each of the approaches to run estimation suggest similar, consistently reasoned assumptions that would make sense to use in developing a rate stat. While it is possible and justifiable to mix certain elements across the frameworks, my point of view is that it makes more sense to keep the “frameworks” pure, and utilize the rate stat that makes the most sense to pair with the chosen run estimator.

6. Using linear weights runs above average (RAA) rather than absolute linear weights runs created as the numerator does enable the use of plate appearances as the denominator, because the RAA estimate already incorporates the batter’s secondary contribution. However, RAA/PA may not be everyone’s ideal choice for a rate stat, because…

7. Some rates can be compared (while maintaining meaningful units) differentially (i.e. subtracting the values for two players makes sense); others are ratio comparable (i.e. dividing the values for two players makes sense); some are neither differentially nor ratio comparable, and some are both. I prefer metrics that can be compared either way, but RAA/PA is only differentially comparable. FanHome poster Sibelius developed an adjustment called R+/PA that, depending on how you look at it, either adds the league average R/PA to RAA/PA or makes an adjustment to absolute runs created before dividing by PA; either way, it allows ratio comparisons for the rate stat.

8. wOBA, which is now in wide use thanks to its popularization by Tom Tango and Fangraphs, is a variant of the RAA/PA family as well, although it doesn’t maintain direct differential or ratio comparability.

9. Despite the issues with R/O as a rate stat for an individual, using it to calculate RAA will produce the same result for the RAA total as R+/PA, assuming that the inputs are consistently defined. R/O causes very minor distortion when used to compare normal players and would cause more distortion with extreme players, but it remains a useful shortcut rate stat. There are many worse choices one could make in devising an individual rate stat than using R/O. R/O remains the correct rate stat for a team; the RAA/PA family of metrics is inappropriate for the same reason R/PA is inappropriate for a team, in addition to some issues that would arise if attempting to define terms like “R+” for a team, as their actual runs scored or estimated runs created is already based on the number of plate appearances that they actually generated.

10. One can argue that batters also make tertiary contributions to their team through their impact on the run values of all of their teammates’ actions. The impact is very small for most hitters, dwarfed by their primary and secondary contributions, and if attempting to quantify them one must be careful to ensure that it’s not just measurement error. Attempting to capture these impacts lends itself to use of a theoretical team approach, which uses a dynamic run estimator to model a batter’s impact on a team.

11. The theoretical team approach gives rise to a rate stat that David Smyth called R+/O+, which is expressed on an R/O scale but produces the same RAA given the same inputs. It can be applied to the linear weights framework as well, and offers an option if one prefers to express results on the R/O scale rather than R/PA, and thus have the same scale for the individual and team rate stat.

12. If you wish to compare rates across run environments, differentials between the individual and the league usually aren’t sufficient as higher run environments make equal differences less valuable in terms of wins. If you assume a fixed Pythagorean exponent for your win conversion, the case can be made that ratios do capture the win difference, but as soon as you introduce a run environment-dependent Pythagorean exponent that better models reality, this assumption fails. It is also necessary to consider that simply comparing the individual to the league average may not properly capture the dynamic of how the individual’s run contribution contributes to his team’s wins. There is also a potential complication from how differences in league PA/G impact rates denominated in PA. All of this is to say that there is no simple solution to converting run rate stats to their win-equivalents, and care should be taken in doing so, especially considering that the impact may be relatively small for many cases.
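Points 2 and 7 can be illustrated with a few lines of arithmetic. This sketch makes two stated simplifications: every plate appearance either makes an out or reaches base (so PA*(1 – OBA) = outs), and the teams, batters, and 0.12 league R/PA are hypothetical numbers chosen for illustration.

```python
def pa_per_game(oba, outs_per_game=27):
    """Point 2 (simplified): PA * (1 - OBA) = outs, so PA/G = outs/G / (1 - OBA)."""
    return outs_per_game / (1 - oba)


def r_plus_per_pa(raa, pa, lg_r_per_pa):
    """Point 7: R+/PA viewed as RAA/PA plus the league R/PA."""
    return raa / pa + lg_r_per_pa


# Two hypothetical teams that both score 5 runs per 27 outs have identical
# R/O, but the one with the higher OBA posts a lower R/PA.
r_per_pa_low_oba = 5 / pa_per_game(0.30)
r_per_pa_high_oba = 5 / pa_per_game(0.35)

# Hypothetical batters in a league with 0.12 R/PA.
lg = 0.12
a = r_plus_per_pa(40, 650, lg)
b = r_plus_per_pa(-10, 500, lg)
diff = a - b    # differential comparison: identical to the RAA/PA gap
ratio = a / lg  # ratio comparison: a league average batter scores 1.0
```

The ratio is what RAA/PA alone cannot support, since an average batter’s RAA/PA is zero.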

Rate Stat Series, pt. 15: Mixing Frameworks

2021-12-08T08:24:00.000-05:00

Thus far I have employed what I’ve described as a “puritanical” approach to matching run estimator, denominator, and win conversion for each of the three frameworks for evaluating an individual’s offensive performance. While I think my logic for this approach is sound, I do not think it is necessarily wrong to mix components in a different manner than I’ve described. This will be a brief discussion of which of these potential hybrids make more sense than others, and a few issues to keep in mind if you choose to do so.

For the player as a team framework, there are many places in the process in which a batter’s value (at least to the extent that we can define and model it) as a member of a real team is distorted. At the very beginning of the process, a dynamic run estimator is applied directly to individual statistics. This creates distortion. Then the rate stat is runs/out; this doesn’t create a tremendous amount of distortion, as runs/out is a defensible if not perfect choice for an individual rate stat even if you don’t use the player as a team framework. Then if we convert to win value, we create a tremendous amount of distortion by essentially multiplying the run distortion by nine – instead of just mis-estimating an individual’s run contribution, we compound the problem by assuming that the entire team hitting like him would shape the run environment to be something radically different from the league average.

This is an opportune juncture to make a point that I should have made earlier in the series – for league average hitters, the distortion will be very small, and the differences across all of the various methodologies we’ve discussed will be smaller. I have focused in this series on a small group of hitters, basically the five most productive and five least productive in each league-season. These are of course the hitters most impacted by different methodological decisions. A league average hitter would by definition have Base Runs = LW_RC = TT_BsR (at least as we’ve linked these formulas in this series), would by definition have 0 RAA or 0 WAA, would by definition have a 100 relative rate, regardless of which approach you take. This series, and more generally my sabermetric interests, tend to focus on the extreme cases, and evaluating methods with an eye to applicability to a wide range of performance levels. Additionally, extreme players are the ones people in general tend to care the most about – you can imagine people debating who had the better offensive season, Frank Robinson in 1966 or Frank Thomas in 1994. I cannot imagine many people doing the same for Earl Battey and Gary Gaetti.

We could avoid compounding the issues in the player as a team approach, but why bother? The framework is inferior to linear weights or theoretical team in every possible way, except that one could argue ease of calculation gives it an advantage over TT. The only value to be had in evaluating a player as a team is as a theoretical exercise, and if you abandon commitment to that theoretical exercise it ceases to have any value at all.

For the theoretical team framework, you could just calculate TT_BsR, and then treat it in the same way as linear weights, or calculate TT_BsRP and treat it in the same way as R+. However, it would be pretty silly to go through the additional effort needed to calculate the TT run estimates instead of their linear weight analogues only to use them in the same way. Unlike in the case of player as team, there is no argument to be made here that you would be making the results more reasonable by doing so, as the TT estimates can be combined with team-appropriate rate stat denominators and team-appropriate conversions from runs to wins. Here the result of mixing frameworks would be extra work coupled with less pure results.

The only mixture of frameworks that makes sense, then, is to mix the run estimation components of the linear weights framework with the rate stat and win conversion components of the theoretical team framework. A logical path that might defend this approach would be: It is questionable whether the tertiary offensive contributions claimed by the theoretical team approach are accurate or material. Thus our best estimate of a player’s run contribution to a theoretical team remains his linear weights RC or RAA. When it comes to win estimation, we are on much firmer ground in understanding how team runs and runs allowed translate to wins than we are in measuring individual contributions to team runs. We shouldn’t refuse to use this knowledge in the name of methodological consistency, but rather we should use the best possible estimates for each component of the framework. That means using the full Pythagenpat approach coupled with linear weights run estimates.

Convinced? I’m not, but let me walk through an example of how we could apply this hybrid approach to the Franks. We can start with our wRC estimate, which we will now use in place of TT_BsRP as “R_+” going forward for this hybrid linear-theoretical team framework. Then we can use any of our TT rate stats - I’ll show R+/O+ rather than R+/PA+ here, as I think it’s the former that might serve to make this hybrid framework an attractive option. R+/O+ allows us to express the individual and team rate on the same basis and better yet, doing so while using the most fundamental of rate stats (R/O) as that basis.

O+ remains equal to PA*(1 – LgOBA), and these relative R+/O+ figures are the same as our relative R+/PA, which makes sense – the numerators are the same, and the only difference in the denominators is multiplying PA by (1 – LgOBA). So an alternative way of expressing the relationship is:

R+/O+ = (R+/PA)/(1 – LgOBA)

I will skip some steps here, since they were all covered in the last installment – no need to convert this to a W% and then convert back to a relative adjusted R+/O+.

TT_R/O = (R+/O+)*(1/9) + LgR/O*(8/9)

TT_x= ((TT_R/O)*LgO/G + LgR/G)^.29

TT_WR = ((TT_R/O)/Lg(R/O))^TT_x

RelAdj R+/O+ = (TT_WR^(1/r) - 1)*9 + 1
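Chained together, the hybrid steps above fit in one function. This is a sketch under one labeled assumption: r is taken to be the reference environment’s Pythagorean exponent carried over from the previous installment, and it is simply passed in rather than computed here.

```python
def rel_adj_r_plus_per_o_plus(r_plus_per_pa, lg_oba, lg_r_per_o,
                              lg_o_per_g, lg_r_per_g, r):
    """Hybrid (option C) win-equivalent relative R+/O+.

    r: reference-environment Pythagorean exponent (assumed; passed in).
    """
    r_plus_per_o_plus = r_plus_per_pa / (1 - lg_oba)          # R+/O+
    tt_r_per_o = r_plus_per_o_plus / 9 + lg_r_per_o * 8 / 9   # batter plus 8 average hitters
    tt_x = (tt_r_per_o * lg_o_per_g + lg_r_per_g) ** 0.29     # TT Pythagenpat exponent
    tt_wr = (tt_r_per_o / lg_r_per_o) ** tt_x                 # TT win ratio vs. league
    return (tt_wr ** (1 / r) - 1) * 9 + 1                     # back to the individual scale
```

One useful sanity check: if r happens to equal the theoretical team’s own exponent, the win ratio converts straight back to the team’s run ratio, and the result collapses to plain relative R+/O+.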

It might be helpful to take a step back and look at our results for the relative adjusted metric (whether R+/PA or R+/O+) for each of the four options we’ve considered, which are:

A: linear run estimate and fixed RPG based on league average (final rate based on R+/PA)

B: linear run estimate and dynamic RPG based on player’s impact on team (final rate based on R+/PA)

C: linear run estimate and Pythagenpat theoretical team win estimate (hybrid approach discussed in this installment; final rate based on R+/O+)

D: theoretical team run estimate (full theoretical team approach; final rate based on R+/O+)

I included a third row which shows the percentage by which Thomas’ figure exceeded Robinson’s. The TT (D) approach maximizes each player’s value and also the difference between them. The hybrid approach (C) falls in between the pure linear approach (A) and the TT approach (D). One thing that I did not fully expect to see is that the linear approach that varies RPW based on the estimated RPG of a team with the player added (B) produces a lower estimate than any of the other approaches, and is the outlier of the bunch. I didn’t make enough of this in part 13, but we are actually better off assuming that the individual has no impact on RPW than adjusting RPW based on his own impact on the theoretical team’s RPG.

Note that when we translate to a reference league, we are fixing each theoretical team's RPG to the same level. In reality, the Frank Thomas TT will have a higher RPG than the Matt Walbeck TT for any given reference environment. However, this is not an issue, because the runs value we're reporting is not real. It is intended to be an equivalent run value that reflects the player's win contribution for a common frame of reference. When we use the full Pythagenpat approach, the theoretical team's winning percentage is preserved, so the runs are now an abstract quantity and do not need to tie out to any actual number of runs in this environment. In this sense, when we convert TT to a win-equivalent rate, we're doing something similar to what I said the linear weights framework was doing - we're making the run environments equal after adding the player. The difference is that in this case we are capturing the disparate impact of the players on team wins first, then restating a win ratio as an equivalent run ratio given the assumption of a stable run environment. Thus the runs are an abstraction, but the value they represent is preserved.

When we do the same with a dynamic RPW approach (i.e. when we use the Palmerian approach of adding the batter's RAA/G to the RPG and then calculating RPW), we run into difficulties because while we have fixed a WAA total, and can then translate that WAA to an equivalent for the reference environment, we have not taken into account that the batter would need to contribute more runs to actually produce the same number of wins. This is not a problem for the full Pythagenpat approach because we used a run ratio that modified the batter's abstract runs in concert with the run environment.

Now there is a way we could address this, but it results in a mess of an equation that I'm not sure has an algebraic solution (whether it does or not, I have no interest in trying to solve it). Basically, we can solve for the RAA that would be needed to preserve the batter's original WAA in the reference environment. To do this, we need to remember to apply the reference PA adjustment. For this application, I think it’s easiest to apply directly to the batter’s original WAA, so WAA*Ref(PA/G)/Lg(PA/G):

We could then say that adjWAA = X/(2*(refRPG + X/G)^.71), where X = the batter's RAA and G could be the batter's actual games or some kind of PA-equivalent games, as long as we’re consistent with what was used in the original WAA calculation. I used the Goal Seek function in Excel to solve the following equations for the Franks:

for Robinson: 8.552 = X/(2*(8.83 + X/155)^.71), X = 85.555

for Thomas: 7.217 = X/(2*(8.83 + X/113)^.71), X = 71.176
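Goal Seek is one way to do this; because the WAA expression rises steadily with X (as long as refRPG + X/G stays positive), plain bisection also works. A minimal sketch with hypothetical function names, which should land close to the Thomas solution above:

```python
def waa_from_raa(raa, ref_rpg, games):
    """WAA = RAA / RPW, with RPW = 2 * (refRPG + RAA/G)^.71."""
    return raa / (2 * (ref_rpg + raa / games) ** 0.71)


def solve_raa(target_waa, ref_rpg, games, tol=1e-9):
    """Find the RAA whose WAA equals target_waa, by bisection.

    The bracket keeps ref_rpg + RAA/games positive, where waa_from_raa
    is increasing, so bisection is guaranteed to converge.
    """
    lo, hi = -0.5 * ref_rpg * games, ref_rpg * games
    if not waa_from_raa(lo, ref_rpg, games) <= target_waa <= waa_from_raa(hi, ref_rpg, games):
        raise ValueError("target WAA outside the search bracket")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if waa_from_raa(mid, ref_rpg, games) < target_waa:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


thomas_raa = solve_raa(7.217, 8.83, 113)
```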

With these new estimates of RAA, we can get relative adjusted R+/PA by first dividing by each batter’s PA, then adding in the reference R/PA, and then dividing by the reference R/PA:

This approach allows us to more correctly calculate a win-equivalent rate stat for the linear weights framework when allowing RPW to vary based on the batter’s impact on his team’s run environment. However, given that I don't know how to solve for X algebraically, and my pre-existing philosophical issues with using this approach in the linear weights framework to begin with, I think that at this point there are two better choices:

1. If you want to maintain methodological purity, stick with the fixed RPW for the linear weights framework like I advocate

2. Use the hybrid framework, which embraces the "real" run to win conversion and actually makes this math easier

With "B" out of the picture, what we see for A, C, and D makes sense in relation to each other. C produces a lower final relativity for great hitters like the Franks because it recognizes that when they inflate their team's run environments, RPW increases. D is higher than both because the initial run estimate is higher, due to using TT_BsR rather than LW, and thus giving each batter credit for their estimated tertiary contributions. And our revised linear weight approach falls somewhere in between.

Of course, it’s possible that I’m wrong about this, and the hybrid or theoretical team approaches that make use of Pythagenpat are overstating the impact of the player on the runs to win conversion. I can’t prove that I am right about this, but I would offer two rejoinders:

1. The full Pythagenpat approach produces the same W% for the theoretical team across different environments by definition, and that is the most important number at the end of the line if we are trying to build a win-equivalent rate stat.

2. We should expect the results from the linear and the Pythagenpat approaches to be similar (which they are after making proper adjustments), as our RPW formula is consistent with Pythagenpat for .500 teams. While the relationship between the RPW formula and Pythagenpat will fray as we insert more extreme teams, the theoretical team approach doesn’t produce any extremes for even great hitters like the Franks. For example, if we use the Palmer approach, Thomas’ 77.8 RAA in 113 games takes an average 1994 AL team that scores 5.23 R/G up to 5.92 R/G, which would rank second in the league, even with the Yankees. This is not an extreme team performance when it comes to applying the win estimation formulas. Such a team would be expected to have a .5623 W% using the RPW formula and a .5620 W% using Pythagenpat. So we shouldn’t expect the results to diverge too much when applying the approaches to derive win-equivalent rate stats.
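The comparison in point 2 is easy to reproduce. A minimal sketch, assuming the team's runs allowed stay at the league average of 5.23 R/G (the function names are hypothetical):

```python
def rpw_wpct(rs: float, ra: float) -> float:
    """.500 plus run differential over RPW, with RPW = 2 * RPG^.71."""
    return 0.5 + (rs - ra) / (2 * (rs + ra) ** 0.71)


def pythagenpat_wpct(rs: float, ra: float) -> float:
    """Pythagenpat W% with exponent x = RPG^.29."""
    x = (rs + ra) ** 0.29
    return rs ** x / (rs ** x + ra ** x)


lg_rg = 5.23                   # 1994 AL runs per game, from the text
team_rs = lg_rg + 77.8 / 113   # Palmer approach: add Thomas' RAA/G (about 5.92)

print(f"{rpw_wpct(team_rs, lg_rg):.4f}  {pythagenpat_wpct(team_rs, lg_rg):.4f}")
```

Both estimates land within about .0002 of the .5623 and .5620 quoted above; the small gaps presumably come from rounding in the inputs.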

And with that, we have come to the end of everything that I wanted to say about rate stats. I will close out the series with one final installment that attempts to briefly summarize my main points with limited math.

Hitting by Position, 2021

2021-12-01T10:44:00.065-05:00

The first obvious thing to look at is the positional totals for 2021, with the data coming from Baseball-Reference. “MLB” is the overall total for MLB, which is not the same as the sum of all the positions here, as pinch-hitters and pinch-runners are not included in those. “POS” is the MLB totals minus the pitcher totals, yielding the composite performance by non-pitchers. “PADJ” is the position adjustment, which is the position RG divided by the total for all positions, including pitchers (but excluding pinch hitters). “LPADJ” is the long-term offensive positional adjustment, based on 2010-2019 data (see more below). The rows “79” and “3D” are the combined corner outfield and 1B/DH totals, respectively:
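PADJ is a simple ratio. A minimal sketch, assuming the ratio is reported on a 100 scale (which is how the pitcher figures of -5, 0, and -4 discussed below read); the function name and the RG values are hypothetical:

```python
def padj(position_rg: dict[str, float], all_positions_rg: float) -> dict[str, int]:
    """Each position's RG divided by the RG of all positions combined
    (pitchers included, pinch hitters excluded), scaled by 100 and rounded."""
    return {pos: round(100 * rg / all_positions_rg) for pos, rg in position_rg.items()}


# Hypothetical RG values, for illustration only
print(padj({"C": 4.0, "SS": 4.6, "P": -0.2}, all_positions_rg=4.7))
```

A negative pitcher PADJ simply means the run estimator credits pitchers with a slightly negative RG as a group.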

Having reviewed this data annually for about fifteen seasons, I would strongly caution against drawing any conclusions about shifting norms from single-season results, which are better treated as curiosities (and as strong warnings against using single-year or other short time period averages in developing any kind of positional adjustment for use in a player value system). The most interesting curiosity, then, is shortstops outhitting left fielders and running essentially even with third basemen and even DHs. The bumper crop of free agent shortstops has accentuated recognition of the current strength of the position, and their collective 2021 offensive performance lives up to the hype.

However, I think the most interesting group is the pitchers. The lowest PADJ ever recorded by pitchers was -5 in 2018; they rebounded to 0 in 2019 and now fell back to -4 after taking a year off. This on its face should not be surprising, but remember that pitcher performance was buoyed by Shohei Ohtani. What would it look like sans Ohtani?

While Ohtani had only 65 PA as a pitcher (1.48% of all pitcher PA), he accounted for 6% of their doubles, 5% of their runs, RBI, and walks, and 17% of their home runs. Ohtani’s hitting as a pitcher paled in comparison to what he did as a DH, although he still created runs at a 13% better rate than the MLB non-pitcher positional average. Without Ohtani, pitchers would have set a new low with a -6 PADJ. It’s interesting to consider that if the DH is made universal in the new CBA, future seasons’ data will likely show pitchers combining for competent offensive performances.

My next table is usually total pitcher performance for NL teams, but such a display would be incomplete without including the Angels. All team figures from this point forward in the post are park-adjusted. The RAA figures for each position are baselined against the overall major league average RG for the position, except for left field and right field which are pooled:

The teams with the highest RAA by position were:

C--SF, 1B--SF, 2B--LA, 3B--ATL, SS--SD, LF--CIN, CF--BAL, RF--PHI, DH--LAA

Those are pretty self-explanatory, although it’s fun that Shohei Ohtani is essentially responsible for two of his team’s positions ranking #1--let’s see Buster Posey or Brandon Belt do that!

I find it more entertaining to gawk at the teams that had the lowest RAA at each position (the listed player is the one who started the most games at the position, which does not always mean they were most responsible for the dreadful performance):

Hunter Dozier pulled the reverse Ohtani as the leading starter at third and right; he hit .213/.286/.390 with 3.7 RG, so it wasn’t really his fault. It’s kind of sad to see Miguel Cabrera leading the Tigers DHs to oblivion, and it kind of was his fault. Cabrera’s overall line (.256/.321/.386 for 4.2 RG) wasn’t that bad, but in 180 PA as a first baseman he posted an .844 (unadjusted) OPS, while in 335 PA as a DH his OPS was just .617.

The next table shows the correlation (r) between each team’s RG for each position (excluding pitchers) and the long-term position adjustment (using pooled 1B/DH and LF/RF). A high correlation indicates that a team’s offense tended to come from positions that you would expect it to:
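The correlation here is an ordinary Pearson r across a team's positions. A minimal sketch with hypothetical RG and adjustment values (the long-term adjustments repeat for 1B/DH and LF/RF to mirror the pooling described above):

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (n >= 2)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for C, 1B, 2B, 3B, SS, LF, CF, RF, DH, for illustration only
team_rg = [3.9, 5.6, 4.3, 4.8, 4.2, 5.1, 4.5, 4.9, 5.3]
lpadj = [0.89, 1.14, 0.97, 1.02, 0.93, 1.08, 1.00, 1.08, 1.14]
print(round(pearson_r(team_rg, lpadj), 3))
```

A team whose best hitters play the traditionally strong offensive positions scores near +1; a negative r means the offense came disproportionately from the "wrong" spots.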

I didn’t dig through years of these posts to check, but Kansas City’s negative correlation may be the lowest I’ve ever seen. The Royals’ only above-average positions were catcher, second base, and shortstop.

The following tables, broken out by division, display RAA for each position, with teams sorted by the sum of positional RAA. Positions with negative RAA are in red, and positions with at least +/-20 RAA are bolded:

A few notes:

* Only five of the fifteen AL teams had positive RAA from their position players, while each NL division had three teams with positive RAA.

* Baltimore’s infield production was the worst in the majors at -91 runs, and only Cedric Mullins’ center field prevented them from having below average performance at all positions. Texas was saved from the same fate only by their right fielders, who were just +1 run; the Rangers’ poor production is impressive for how consistently bad it was across the board.

* Their Texas neighbors were the opposite in displaying consistently good production across the board; Houston’s outfield was the best in the majors at +63 runs, while their infield was second in the AL to Toronto.

* San Francisco and Los Angeles were nearly mirror images of each other; the Dodgers narrowly edged the Giants for the top infield in MLB (+89 to +87), while their outfielders were both slightly above average (+5 to +4). LA catchers were outstanding (their 25 RAA tied for second in the majors with the Blue Jays and ChiSox), but the Giants were better at 32 RAA, giving them a three-run edge for the majors’ top total positional RAA.

A spreadsheet with full data is available here.

Crude Team Ratings, 2021

2021-11-17T08:31:00.038-05:00

Crude Team Rating (CTR) is my name for a simple methodology of ranking teams based on their win ratio (or estimated win ratio) and their opponents’ win ratios. A full explanation of the methodology is here, but briefly:

1) Start with a win ratio figure for each team. It could be actual win ratio, or an estimated win ratio.

2) Figure the average win ratio of the team’s opponents.

3) Adjust for strength of schedule, resulting in a new set of ratings.

4) Begin the process again. Repeat until the ratings stabilize.

The resulting rating, CTR, is an adjusted win/loss ratio rescaled so that the majors’ arithmetic average is 100. The ratings can be used to directly estimate W% against a given opponent (without home field advantage for either side); a team with a CTR of 120 should win 60% of games against a team with a CTR of 80 (120/(120 + 80)).
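A minimal sketch of the four steps. The exact form of the strength-of-schedule adjustment in step 3 isn't spelled out in this summary, so the sketch assumes the simplest one: multiply each team's win ratio by the game-weighted average rating of its opponents, then rescale so the arithmetic average is 100. It also assumes a schedule like MLB's, where every team is linked through common opponents; on some contrived schedules (two teams that only play each other, for example) this plain iteration oscillates instead of settling. All names are hypothetical.

```python
from statistics import fmean


def crude_team_ratings(win_ratio, schedule, tol=1e-9, max_iter=10_000):
    """Iterate win ratios against strength of schedule until ratings stabilize.

    win_ratio: dict mapping team -> actual or estimated W/L ratio
    schedule:  dict mapping team -> list of opponents, one entry per game
    Returns (ratings, sos), with ratings rescaled to an arithmetic mean of 100.
    """
    teams = list(win_ratio)
    ratings = {t: 100.0 for t in teams}
    for _ in range(max_iter):
        # Step 2: game-weighted average rating of each team's opponents
        sos = {t: fmean(ratings[o] for o in schedule[t]) for t in teams}
        # Step 3 (assumed form): scale each win ratio by schedule strength
        raw = {t: win_ratio[t] * sos[t] for t in teams}
        scale = 100.0 / fmean(raw.values())
        new = {t: raw[t] * scale for t in teams}
        # Step 4: stop once no rating moves by more than tol
        if max(abs(new[t] - ratings[t]) for t in teams) < tol:
            sos = {t: fmean(new[o] for o in schedule[t]) for t in teams}
            return new, sos
        ratings = new
    raise RuntimeError("ratings did not stabilize")


def head_to_head(ctr_a, ctr_b):
    """Estimated W% for team A against team B, no home field advantage."""
    return ctr_a / (ctr_a + ctr_b)


# Hypothetical three-team round robin, for illustration only
wr = {"A": 2.0, "B": 1.0, "C": 0.5}
sched = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
ratings, sos = crude_team_ratings(wr, sched)
print({t: round(v, 1) for t, v in ratings.items()})
print(round(head_to_head(120, 80), 3))
```

The head-to-head function reproduces the 120 vs. 80 example: .600 for the stronger team.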

First, CTR based on actual wins and losses. In the table, “aW%” is the winning percentage equivalent implied by the CTR and “SOS” is the measure of strength of schedule--the average CTR of a team’s opponents. The rank columns provide each team’s rank in CTR and SOS:

This was not a great year for the playoff field representing the teams with the strongest W-L records in context: Toronto, Seattle, and Oakland were all significantly better than St. Louis and the world champs from Atlanta. The reason for this quickly becomes apparent when you look at the average aW%s by division (I use aW% to aggregate the performance of multiple teams rather than CTR because the latter is expressed as a win ratio; for a simple example, a 90-72 team and a 72-90 team will end up with an average win ratio of 1.025, but their composite and average winning percentages will both be .500):
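The 90-72/72-90 example can be checked directly: averaging win ratios gives 1.025, overstating a pair of teams whose average and composite W% are both .500. A minimal sketch:

```python
from statistics import fmean

records = [(90, 72), (72, 90)]  # (wins, losses)

avg_win_ratio = fmean(w / l for w, l in records)
avg_wpct = fmean(w / (w + l) for w, l in records)
composite_wpct = sum(w for w, _ in records) / sum(w + l for w, l in records)

print(avg_win_ratio, avg_wpct, composite_wpct)
```

The asymmetry comes from win ratio being unbounded above but bounded below by zero, so a good team pulls the average up more than an equally bad team pulls it down.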

The NL East, despite being described by at least one feckless prognosticator as “the toughest division in baseball”, was in fact the worst division in baseball by a large margin. Atlanta had the second-weakest SOS in MLB, turning their lackluster 88-73 record into something even less impressive in context. In defense of the Braves, they did lose some significant pieces to injury and have a multi-year track record of being a strong team, as well as looking better when the CTRs are based on expected record (i.e. Pythagenpat using actual runs/runs allowed):

Here we see the Dodgers overtake the Giants by a large margin as MLB’s top team, and it actually lines up better with the playoff participants as the Braves rank highly and the Mariners drop.

One weakness of CTR is that I use the same starting win metric to calculate both team strength and strength of schedule in one iterative process. But one could make the case that in order to best put W-L records in context, it would make more sense to use each team’s actual W-L record to determine their ranking but use expected W% or some other measure to estimate strength of schedule. Such an approach would simultaneously recognize that a team should be evaluated on the basis of their actual wins and losses (assuming the objective is to measure “championship worthiness” or some similar hard-to-define but intuitively comprehensible standard), but that just because an opponent had “good luck” or was “efficient” in converting runs to wins, it didn’t necessarily represent a stronger foe. This would give a team credit for its own “efficiency” without letting it accrue credit for its opponents’ “efficiency.”

This is what the ratings look like using predicted W% (using runs created/runs created allowed) as the starting point:

Finally, I will close by reverting to CTRs based on actual W-L, but this time taking the playoffs into account. I am not a big fan of including the playoffs - obviously they represent additional games which provide additional information about team quality, but they are played under very different circumstances than regular season games (particularly with respect to pitcher usage), and the fact that series are terminated when a team clinches biases the W-L records that emerge from series. Nonetheless, here they are, along with a column showing each team’s percentage change in CTR relative to the regular season W-L only version. Unsurprisingly, the Braves are the big winner, although they still only rank twelfth in MLB. The biggest loser is the Rays, although they still rank #3 and lead the AL. The Dodgers’ rating actually declined slightly more than the Giants’ despite winning their head-to-head series; they end up with a 6-6 postseason record weighing down their regular season, and with seven of those games coming against teams that are ranked just #12 and #13, the uptick in SOS was not enough to offset it.

Hypothetical Award Ballots, 2021

2021-11-10T09:13:00.020-05:00

AL ROY:

1. LF Randy Arozarena, TB

2. SP Luis Garcia, HOU

3. SP Casey Mize, DET

4. SP Shane McClanahan, TB

5. CF Adolis Garcia, TEX

Arozarena will likely win the award on name recognition if nothing else, but one could very easily make a case for Luis Garcia, whom I actually have slightly ahead in RAR, 37 to 35. Arozarena’s baserunning and fielding are largely a wash, but Luis Garcia’s RAR using eRA and dRA are slightly lower (32 and 31). That’s enough for me to slide Arozarena ahead. Adolis Garcia is an interesting case, as his standard offensive stats will probably land him high in the voting, but his OBA was only .289, which contributed to him ranking fifth among position players in RAR. But he has excellent fielding metrics (16 DRS and 12 UZR), which get him back on my ballot. Among honorable mentions, Wander Franco had 21 RAR in just seventy games, which is by far the best rate of performance. Ryan Mountcastle’s homer totals will get him on conventional ballots, but he appears to be a slight minus as a fielder and was a below-average hitter for a first baseman.

NL ROY:

1. 2B Jonathan India, CIN

2. SP Trevor Rogers, MIA

3. RF Dylan Carlson, STL

4. SP Ian Anderson, ATL

5. C Tyler Stephenson, CIN

India is the clear choice among position players and Rogers among pitchers, and I see no reason to make any adjustment to their RAR ordering. In fact, it’s pretty much RAR order all the way down.

AL Cy Young:

1. Robbie Ray, TOR

2. Gerrit Cole, NYA

3. Carlos Rodon, CHA

4. Jose Berrios, MIN/TOR

5. Nathan Eovaldi, BOS

The 2021 AL Cy Young race has to be the worst for a non-shortened season in history; while long-term trends are driving down starter workloads, let’s hope that a full previous season will make the 2022 Cy Young race at least a little less depressing. Robbie Ray is the obvious choice, leading the league in innings and ranking second to Carlos Rodon in RRA for a twelve-run RAR lead over Lance McCullers; Ray’s peripherals are less impressive, but are still solid. In addition to the pitchers on my ballot, McCullers, Lance Lynn, and Chris Bassitt could all easily be included as the seven pitchers behind Ray could be reasonably placed in just about any order.

NL Cy Young:

1. Zack Wheeler, PHI

2. Corbin Burnes, MIL

3. Walker Buehler, LA

4. Max Scherzer, WAS/LA

5. Brandon Woodruff, MIL

The NL race is almost the opposite of the AL, with five solid candidates who could be ranked in almost any order, even for a normal season. The easiest way to explain my reasoning is to show each pitcher’s RAR by each of the three metrics:

Wheeler and Burnes get the nods for my top two spots as they were equally good in the peripheral-based metrics, which I feel is sufficient to elevate them above RAR leader Buehler. It’s worth noting that Burnes was the leader in all three of the RA metrics, but Wheeler led the league with 213 innings while Burnes was nineteenth with 167. I suspect Burnes will win the actual vote, and while it’s tempting to side with the guy with spectacular rate stats, a 46-inning gap is enormous.

AL MVP:

1. DH/SP Shohei Ohtani, LAA

2. 1B Vladimir Guerrero, TOR

3. 1B Matt Olson, OAK

4. 2B Marcus Semien, TOR

5. 3B Jose Ramirez, CLE

6. SS Carlos Correa, HOU

7. RF Aaron Judge, NYA

8. SP Robbie Ray, TOR

9. RF Kyle Tucker, HOU

10. 2B Brandon Lowe, TB

A first baseman and a DH are the two AL offensive RAR leaders in a season in which no pitcher comes close to a top of the MVP ballot performance. The first baseman hits .305/.394/.589 to the DH’s .256/.372/.589, over 59 additional plate appearances. Under these circumstances, how can the first baseman possibly rank second on the ballot, and a distant second at that? The answer: the DH also pitches 130 innings with an RRA 31% lower than league average.

This should seem like a fairly obvious conclusion, and I suspect that Ohtani will handily win the award, but whether out of the need to generate “controversial” content or some other explanation that would indict their mental faculties, talking heads have spent a great deal of time pretending that this was a reasonable debate. I thought it would have been quite fascinating to see Guerrero win the triple crown as a test case of whether, twice in a decade, the mystical deference to the traditional categories could deny an MVP award to an Angel having a transcendent season.

For the rest of the ballot, if you take the fielding metrics at face value, you can make the case that Marcus Semien was actually the Most Valuable Blue Jay; I do not take them at face value, and Carlos Correa serves as a prime example. He was +21 in DRS but only +3 in UZR, which is the difference between leading the league in position player bWAR and slotting seventh on my ballot (as he would fall behind Judge if I went solely on UZR).

The omission of Salvador Perez will certainly be a deviation from the actual voting. Perez’ OBA was just .315, and despite 48 homers he created “just” 99 runs. Worse yet, his defensive value was -13 runs per Baseball Prospectus. I would rank him not just behind the ten players listed, but Cedric Mullins, Bo Bichette, Xander Bogaerts, Yasmani Grandal, Rafael Devers, and a slew of starting pitchers. I don’t think he was one of the twenty most valuable players in the AL.

NL MVP:

1. RF Juan Soto, WAS

2. RF Bryce Harper, PHI

3. SS Trea Turner, WAS/LA

4. SP Zack Wheeler, PHI

5. SS Fernando Tatis, SD

6. SP Corbin Burnes, MIL

7. SP Walker Buehler, LA

8. 1B Paul Goldschmidt, STL

9. RF Tyler O’Neill, STL

10. SP Brandon Woodruff, MIL

Since I had not carefully examined the statistics during the season, two things surprised me about this race, which it was quickly apparent would come down to the well-matched right fielders, each of whom was among the best young players ever when he burst on the scene, one of whom more or less inherited the other’s job, and both of whom still toil in the same division. The first was that Soto, despite his dazzling OBA, actually ranked a smidge behind Harper offensively; the second was that Soto had a significant advantage in the fielding metrics that elevated him to the top.

Taking the more straightforward comparison first, Soto and Harper had essentially the same batting average, .313 to .309 (I’m ignoring park factors as WAS and PHI helpfully both had a 101 PF, so it won’t change the comparison between the two). The pair ranked one-two in the NL in W+HB rate, with Soto holding the clear edge (22.7% of PA to Harper’s 17.8%), while Harper had a sizeable edge in isolated power (.305 to .221; Harper had only six more homers than Soto, but 22 more doubles). The walks and power essentially cancel out (Harper had a .520 Secondary Average to Soto’s .514, again ranking one-two in the circuit). Each created 116 runs, but despite his OBA edge Soto made twelve more outs, as he had fifty-six more plate appearances. That leaves Harper with a narrow two-run RAR lead.

Fangraphs estimates that Soto’s non-steal baserunning was one run better than average, Harper’s zero. So it comes down to fielding, where Soto has +3 DRS and +2 UZR to Harper’s -6/+2. As a crude combination with regression to put the result on an equal footing with offensive value, I typically sum the two and divide by four, which leaves Soto +1 and Harper -1, to create a total value difference of two runs in favor of Soto.
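The crude fielding combination is simple enough to state as code. A minimal sketch (the function name is hypothetical) reproducing the Soto and Harper figures from the text; rounding to whole runs happens afterward, which is how 1.25 becomes +1:

```python
def crude_fielding_runs(drs: float, uzr: float) -> float:
    """Sum DRS and UZR and divide by four: the average of the two metrics,
    regressed halfway toward zero."""
    return (drs + uzr) / 4


soto = crude_fielding_runs(drs=3, uzr=2)
harper = crude_fielding_runs(drs=-6, uzr=2)
print(f"Soto {soto:+.2f}, Harper {harper:+.2f}, gap {soto - harper:.2f}")
```

Dividing by four rather than two is what builds in the regression; a single metric reporting +20 would count for only +5 on its own.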

Obviously, this difference is so narrow that one should barely even feel the need to address a choice to put Harper on top of their ballot. One could easily reason that the Phillies were in the race, and Harper contributed to keeping them in said race with his September/October performance (1.157 September OPS). But I have been pretty consistent in not giving any consideration to a team’s position in the standings, so my only sanity check was to take a closer look at fielding using very crude but accessible metrics. My non-scientific impression would be that Harper might be something like a B- fielder and Soto a C.

I looked at the putout rate for each, dividing putouts by an estimate of each player’s potential plays: team AB – HR – K – A + SF, multiplied by the player’s innings in the outfield divided by the team’s total innings. (This essentially defines the outfielder’s potential plays as any balls put in play, including hits, removing plays actually made by infielders, for which assists serve as an approximation. Obviously there is much that is not considered, even among what might be approximated from the standard Baseball Guide data, like actual GB/FB ratio, handedness of pitchers and opposing batters, etc.) Viewed in this manner, Soto made a putout on 13.7% of potential plays to Harper’s 11.2%.
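A minimal sketch of that putout rate, with the team's potential plays prorated by the player's share of team innings. The function name and the team totals in the usage line are hypothetical illustration values, not the actual Nationals or Phillies figures:

```python
def of_putout_rate(putouts: int, team_ab: int, team_hr: int, team_k: int,
                   team_a: int, team_sf: int,
                   player_of_inn: float, team_inn: float) -> float:
    """Crude outfield putout rate: putouts per estimated potential play.

    Team potential plays = AB - HR - K - A + SF. The player's share is
    prorated by his outfield innings over team innings. Innings should be
    true decimals (1200 1/3 -> 1200.333...), not box-score "1200.1" notation.
    """
    team_chances = team_ab - team_hr - team_k - team_a + team_sf
    player_chances = team_chances * player_of_inn / team_inn
    return putouts / player_chances


# Hypothetical team and player totals, for illustration only
print(of_putout_rate(122, 5500, 200, 1400, 1500, 40, 720.0, 1440.0))
```

Prorating by innings assumes balls in play were spread evenly across the player's time in the field, which is one of the crude simplifications acknowledged above.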

A second crude check, which may be free of unknown team-level biases but introduces its own problems in that the other players are very different, is to compare each player’s putout rate to that of his team’s other right fielders. For this, we can just look at per 9 innings as we have to assume that the other team level inputs in our putout % (HR, K, A, SF) were uniformly distributed between Soto/Harper’s innings and those played by other Nationals/Phillies right fielders. Soto recorded 2.17 PO/9 innings while other Nationals RF recorded 1.98; Harper 1.64 to other Phillies 1.51, so Soto recorded 10% more putouts than his teammates and Harper 9%.
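The teammate comparison is just a ratio of PO/9 figures. A minimal sketch (function names are hypothetical) using the rates quoted above:

```python
def po_per_9(putouts: int, innings: float) -> float:
    """Putouts per nine defensive innings."""
    return 9 * putouts / innings


def pct_above_teammates(player_po9: float, teammates_po9: float) -> float:
    """Percentage by which a player's PO/9 exceeds his team's other RF."""
    return 100 * (player_po9 / teammates_po9 - 1)


print(round(pct_above_teammates(2.17, 1.98)))  # Soto vs. other Nationals RF
print(round(pct_above_teammates(1.64, 1.51)))  # Harper vs. other Phillies RF
```

These round to the 10% and 9% figures in the text.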

Is any of this remotely conclusive? Of course not, but it is sufficient to convince me that the proposition that “Juan Soto was two runs more valuable than Bryce Harper in the field” is reasonable, and that in turn is enough to make Soto seem a whisker more valuable than Harper. It’s a very close race, much more interesting than the more discussed AL race (which in truth is interesting only because of Ohtani’s remarkable season and not any comparison to other players).

I think the rest of the ballot follows RAR very closely with the pitchers mixed in. Max Scherzer ranked ahead of Brandon Woodruff on my Cy Young list, but they flip here as Woodruff was merely bad offensively (-1 run created); Scherzer didn’t reach base in 59 plate appearances (-5).

Rate Stat Series, pt. 14: Relativity for the Theoretical Team Framework

2021-10-20T08:09:00.001-04:00

Before jumping into win-equivalent rate stats for the theoretical team framework, I think it would be helpful to re-do our theoretical team calculations on a purely rate basis. This is, after all, a rate stat series. In discussing the TT framework in pts. 9-11, I started by using the player’s PA to define the PA of the team, as Bill James chose to do with his TT Runs Created. This allowed our initial estimate of runs created or RAA to remain grounded in the player’s actual season.

An alternative (and as we will see, equivalent) approach would be to eschew all of the “8*PA” and just express everything in rates to begin with. When originally discussing TT, I didn’t show it that way, but maybe I should have. I found that my own thinking when trying to figure out the win equivalent TT rates was greatly aided by walking through this process first.

Again, everything is equivalent to what we did before – if you just divide a lot of those equations by PA, you will get to the same place a lot quicker than I’m going to. The theoretical team framework we’re working with assumes that the batter gets 1/9 of the PA for the theoretical team. It’s also mathematically true that for Base Runs:

BsR/PA = (A*B/(B+C) + D)/PA = (A/PA)*(B/PA)/(B/PA + C/PA) + D/PA

If for the sake of writing formulas we rename A/PA as ROBA (Runners On Base Average), B/PA as AF (Advancement Factor; I’ve been using this abbreviation long before it came into mainstream usage in other contexts), C/PA as OA (Out Average), and D/PA as HRPA (Home Runs/PA), we can then write:

BsR/PA = ROBA*AF/(AF + OA) + HRPA

Since it is also true that R/O = R/PA/(1 – OBA), in this case it is true that:

BsR/O = (BsR/PA)/OA

We can use these equations to calculate the Base Runs per out for a theoretical team (I’m going to skip over “reference team” notation and just assume that the reference team is a league average team):

TT_ROBA = 1/9*ROBA + 8/9*LgROBA

TT_AF = 1/9*AF + 8/9*LgAF

TT_OA = 1/9*OA + 8/9*LgOA

TT_HRPA = 1/9*HRPA + 8/9*LgHRPA

TT_BsR/PA = TT_ROBA*TT_AF/(TT_AF + TT_OA) + TT_HRPA

TT_BsR/O = (TT_ROBA*TT_AF/(TT_AF + TT_OA) + TT_HRPA)/TT_OA
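The blending above translates directly into code. A minimal sketch (class and function names are hypothetical, and the component rates in the usage lines are made up rather than Thomas' actual 1994 figures):

```python
from dataclasses import dataclass


@dataclass
class BsRRates:
    """Base Runs components per PA: ROBA = A/PA, AF = B/PA, OA = C/PA, HRPA = D/PA."""
    roba: float
    af: float
    oa: float
    hrpa: float

    def bsr_per_pa(self) -> float:
        return self.roba * self.af / (self.af + self.oa) + self.hrpa

    def bsr_per_out(self) -> float:
        return self.bsr_per_pa() / self.oa


def theoretical_team(batter: BsRRates, league: BsRRates, share: float = 1 / 9) -> BsRRates:
    """Blend each component: the batter takes `share` of the PA, league average the rest."""
    def blend(b: float, lg: float) -> float:
        return share * b + (1 - share) * lg

    return BsRRates(
        roba=blend(batter.roba, league.roba),
        af=blend(batter.af, league.af),
        oa=blend(batter.oa, league.oa),
        hrpa=blend(batter.hrpa, league.hrpa),
    )


# Hypothetical component rates, for illustration only
league = BsRRates(roba=0.30, af=0.26, oa=0.68, hrpa=0.03)
batter = BsRRates(roba=0.38, af=0.36, oa=0.57, hrpa=0.06)
tt = theoretical_team(batter, league)
print(round(league.bsr_per_out(), 4), round(tt.bsr_per_out(), 4))
```

Blending the components first and only then applying the Base Runs formula is what lets the batter's on-base and advancement skills interact with the rest of the lineup.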

Here’s a sample calculation for 1994 Frank Thomas:

To calculate a win-equivalent rate stat, we can use the TT_BsR/O figure as a starting point (it suggests that a theoretical team of 1/9 Thomas and 8/9 league average would score .2344 runs/out). We don’t need to go through this additional calculation, though; when we calculated R+/O+ (or R+/PA+, RAA+/O+, or RAA+/PA+), we already had everything we needed for this calculation.

You will see if you do the math that:

TT_BsR/O = (RAA+/O+)*(1/9) + LgR/O

TT_BsR/O = (R+/O+)*(1/9) + (LgR/O)*(8/9)

TT_BsR/O = (RAA+/PA+)/(1 – LgOBA)*(1/9) + LgR/O

TT_BsR/O = (R+/PA+)/(1 - LgOBA)*(1/9) + (LgR/O)*(8/9)

You could view this as a validation of the R+/O+ approach, as it does what it set out to do, which is to isolate the batter’s contribution to the theoretical team’s runs/out. Once we’ve established the team’s runs/out, it is pretty simple to convert to wins. I will just give formulas as I think they are pretty self-explanatory:

TT_BsR/G = TT_BsR/O*LgO/G

TT_RPG = TT_BsR/G + LgR/G

TT_x = TT_RPG^.29

TT_W% = (TT_BsR/G)^TT_x/((TT_BsR/G)^TT_x + (LgR/G)^TT_x)
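The same four steps as a function. A minimal sketch (the name is hypothetical, and the league runs and outs per game in the usage lines are illustration values):

```python
def tt_wpct(tt_bsr_per_out: float, lg_outs_per_game: float, lg_runs_per_game: float) -> float:
    """Theoretical team W% via Pythagenpat, allowing league-average runs."""
    tt_rg = tt_bsr_per_out * lg_outs_per_game      # TT_BsR/G
    tt_rpg = tt_rg + lg_runs_per_game              # TT_RPG
    x = tt_rpg ** 0.29                             # TT_x
    return tt_rg ** x / (tt_rg ** x + lg_runs_per_game ** x)


# Hypothetical league of 4.5 R/G and 25.5 O/G, for illustration only
w = tt_wpct(0.20, 25.5, 4.5)
print(round(w, 4), round(w - 0.5, 4))  # W% and WAA per team game
```

A theoretical team that scores exactly the league R/O comes out at .500, which is the sanity check worth running first.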

Walking through this for the Franks, we have:

One thing to note here is that if we look at the theoretical team’s R/O (or R/G) relative to the league average, subtract one, multiply by nine, and add one back in, we will get Thomas and Robinson’s relative R+/O+. This is not a surprising result given what we saw above regarding the relationship between R+/O+ and theoretical team R/O.

We now have a W% for the theoretical team, which we could leave alone as a rate stat, but it’s not very satisfying to me to have an individual rate stat expressed as a team W%. If we subtract .5, we have WAA/Team G; we could interpret this as meaning that Thomas is estimated to add .0609 wins per game and Robinson .0546 to a theoretical team on which they get 1/9 of PA. Another option would be to convert this WAA back to a total, defining “games” as PA/Lg(PA/G), and then we could have WAA+/PA+ or WAA+/O+ as rates.

In keeping with the general format established in this series, though, my final answer for a win-equivalent rate stat for the TT framework will be to convert the winning percentage (actually, we’ll use win ratio since it makes the math easier) back to the reference environment, and calculate a relative adjusted R+/O+. Since everything will be on an outs basis (as we’re using O+), we don’t need to worry about league PA/G when calculating our relative adjusted R+/O+.

Instead of calculating TT_W%, we could have left it in the form of team win ratio:

TT_WR = ((TT_BsR/G)/(LgR/G))^(TT_x)

We can convert this back to an equivalent run ratio in the reference environment (which for this series we’ve defined as having Pythagorean exponent r = 1.881) by solving for AdjTT_RR in the equation:

TT_WR = AdjTT_RR^r

AdjTT_RR = TT_WR^(1/r)

We could convert this run ratio back to a team runs/game in the reference environment, and then to a team runs/out, and then use our equation for tying individual R+/O+ to theoretical team R/O to get an equivalent R+/O+ ratio. But why bother with all that, when we will just end up dividing it by the reference environment R/O to get our relative adjusted R+/O+? I noted above that there was a direct relationship between the theoretical team’s run ratio (which is equal to the theoretical team’s R/O divided by league R/O) and the batter’s relative R+/O+:

Rel R+/O+ = (TT_RR – 1)*9 + 1

So our Relative Adjusted R+/O+ can be calculated as:

RelAdj R+/O+ = (AdjTT_RR - 1)*9 + 1
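The whole chain from theoretical team R/G to relative adjusted R+/O+ fits in a few lines. A minimal sketch (names are hypothetical; r = 1.881 is the reference exponent from this series):

```python
REF_EXPONENT = 1.881  # reference-environment Pythagorean exponent used in this series


def rel_adj_r_plus_o_plus(tt_rg: float, lg_rg: float, r: float = REF_EXPONENT) -> float:
    """Relative adjusted R+/O+ from the theoretical team's runs per game."""
    x = (tt_rg + lg_rg) ** 0.29          # TT_x from TT_RPG
    tt_wr = (tt_rg / lg_rg) ** x         # TT_WR
    adj_tt_rr = tt_wr ** (1 / r)         # AdjTT_RR in the reference environment
    return (adj_tt_rr - 1) * 9 + 1


def rel_r_plus_o_plus(tt_rg: float, lg_rg: float) -> float:
    """Unadjusted relative R+/O+, for comparison."""
    return (tt_rg / lg_rg - 1) * 9 + 1


# Same 1.06 run ratio in a hypothetical low- and high-scoring league
for lg in (3.9, 5.2):
    print(lg, round(rel_r_plus_o_plus(1.06 * lg, lg), 3),
          round(rel_adj_r_plus_o_plus(1.06 * lg, lg), 3))
```

With the run ratio held fixed, the adjusted figure falls below the unadjusted one in the low-scoring league and rises above it in the high-scoring one, because the environment's Pythagenpat exponent sits below or above 1.881.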

I brought back our original relative R+/O+ (prior to going through the win-equivalent math) for comparison. Thomas gains slightly and Robinson loses more, because the value of Robinson’s relative runs is lower in his low-scoring environment. This is a similar conclusion to what we saw when comparing relative R+/PA and the relative adjusted R+/PA for Robinson and Thomas. Nominal runs are more valuable when the run scoring environment is lower, because it takes fewer marginal runs to create a marginal win. Relative runs are more valuable when the scoring environment is higher, because the win ratio expected to result from a given run ratio increases due to the higher Pythagenpat exponent.

At this point, we have exhausted my thoughts and ideas concerning the theoretical issues in designing individual batter rate stats. Next time I will discuss mixing up our rate stats and the frameworks within which I assert each should ideally be used.

End of Season Statistics, 2021

2021-10-05T12:56:00.002-04:00

While this edition of End of Season Statistics will more closely resemble the reports I published through 2019 than the 2020 edition did, there are still a number of issues created by the revised rules, particularly the extra innings rule and seven-inning doubleheaders. Seeing as both of these changes could be walked back for 2022, I have not attempted to revise my approach to take them into account – should they become permanent, then and only then will I invest time in trying to make the necessary adjustments (some of which I outlined here) to fit the data they produce within traditional sabermetric structures.

In the meantime, there will be three main consequences of the rules:

1) While I will provide relief pitcher statistics this year, I will base the value metrics on eRA rather than RA or ERA. RA is hopelessly polluted by the Manfred runners, while ERA is hopelessly polluted by virtue of being ERA. While I would prefer to base the value metric on a measure that reflects runs actually allowed, they are messy enough to begin with in the case of relievers that I am not too concerned about it.

2) When computing value metrics for starting pitchers, I will be comparing their RRA to the estimated league average RA rather than the actual one, since the latter is polluted by Manfred runners even though starters’ statistics themselves are immune from the impact.

3) Team runs per game metrics will be expressed per 9 innings (27 outs), and a per 9 innings approach will be used to calculate expected winning percentages. As such, these will not exactly be an attempt to estimate what any given team’s W% should have been, but rather a theoretical estimate of what their W% would have been had they been playing under normal rules. For actual runs and runs allowed, these will still be distorted by Manfred runners, but accounting for that is again much more trouble than a (hopefully) two-year interlude justifies.

The data comes from a number of different sources. Most of the data comes from Baseball-Reference; I will try to note exceptions as they come up.

The basic philosophy behind these stats is to use the simplest methods that have acceptable accuracy. Of course, "acceptable" is in the eye of the beholder, namely me. I use Pythagenpat not because other run/win converters, like a constant RPW or a fixed exponent, are not accurate enough for this purpose, but because it's mine and it would be kind of odd if I didn't use it.

If I seem to be a stickler for purity in my critiques of others' methods, I'd contend it is usually in a theoretical sense, not an input sense. So when I exclude hit batters, I'm not saying that hit batters are worthless or that they *should* be ignored; it's just easier not to mess with them and not that much less accurate (note: hit batters are actually included in the offensive statistics now).

I also don't really have a problem with people using sub-standard methods (say, Basic RC) as long as they acknowledge that they are sub-standard. If someone pretends that Basic RC doesn't undervalue walks or cause problems when applied to extreme individuals, I'll call them on it; if they explain its shortcomings but use it regardless, I accept that. Take these last three paragraphs as my acknowledgment that some of the statistics displayed here have shortcomings as well, and I've at least attempted to describe some of them in the discussion below.

The League spreadsheet is pretty straightforward--it includes league totals and averages for a number of categories, most or all of which are explained at appropriate junctures throughout this piece. The advent of interleague play has created two different sets of league totals--one for the offense of league teams and one for the defense of league teams. Before interleague play, these two were identical. I do not present both sets of totals (you can figure the defensive ones yourself from the team spreadsheet, if you desire), just those for the offenses. The exception is for the defense-specific statistics, like innings pitched and quality starts. The figures for those categories in the league report are for the defenses of the league's teams. However, I do include each league's breakdown of basic pitching stats between starters and relievers (denoted by "s" or "r" prefixes), and so summing those will yield the totals from the pitching side.

I added a column this year for “ActO”, which is actual (rather than estimated) outs made by the team offensively. This can be determined from the official statistics as PA – R – LOB. I have then replaced the column I usually show for league R/G (“N”) with R/9, which is actually R*27/ActO, equivalent to R*9/IP. This restates the league run average in the more familiar per-nine-innings form. I’ve done the same for “OG” (Outs/Game, counting only the outs I include in the individual hitters’ stats, AB – H + CS), “PA/G” (normally just (AB + W)/G), and “KG” and “WG” (normally just K/G and W/G); these are now “O/9”, “PA/9”, and still “KG”/“WG”, all expressed per 27 actual outs.
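A minimal Python sketch of the per-27-actual-outs conversion just described. The function names and the league totals are hypothetical, chosen only to show the arithmetic.

```python
def actual_outs(pa, r, lob):
    """Actual offensive outs from official totals: PA - R - LOB."""
    return pa - r - lob


def per_27_outs(count, act_outs):
    """Restate a counting stat per 27 actual outs (equivalent to per 9 innings)."""
    return count * 27 / act_outs


# Hypothetical league-season totals, for illustration only
pa, r, lob = 6200, 700, 1100
act_o = actual_outs(pa, r, lob)
r_per_9 = per_27_outs(r, act_o)
print(act_o, round(r_per_9, 3))
```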

The Team spreadsheet focuses on overall team performance--wins, losses, runs scored, runs allowed. The columns included are: Park Factor (PF), Winning Percentage (W%), Expected W% (EW%), Predicted W% (PW%), wins, losses, runs, runs allowed, Runs Created (RC), Runs Created Allowed (RCA), Home Winning Percentage (HW%), Road Winning Percentage (RW%) [exactly what they sound like--W% at home and on the road], R/9, RA/9, Runs Created/9 (RC/9), Runs Created Allowed/9 (RCA/9), and Runs Per Game (the average number of runs scored and allowed per game). For the offensive categories, runs/9 are based on runs per 27 actual outs; for pitching categories, they are runs/9 innings.

I based EW% and PW% on R/9 and RA/9 (and RC/9 and RCA/9) rather than the actual run totals. This means that they are not estimating what a team’s winning percentage should have been in the games as they were actually constructed, but what it would have been in nine-inning games with runs scored and allowed at the same rate per inning. EW%, which is based on actual R and RA, is also polluted by inflated runs in extra-inning games; PW%, which is based on RC and RCA, doesn’t suffer from this distortion.

The runs and Runs Created figures are unadjusted, but the per-game averages are park-adjusted, except for RPG which is also raw. Runs Created and Runs Created Allowed are both based on a simple Base Runs formula. The formula is:

A = H + W - HR - CS

B = (2TB - H - 4HR + .05W + 1.5SB)*.76

C = AB - H

D = HR

Naturally, RC = A*B/(B + C) + D.
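The Base Runs formula above translates directly into a short Python function, sketched here; feeding it the opponents' totals instead of the team's own should give RCA in the same way. The batting line below is hypothetical.

```python
def base_runs_rc(h, w, hr, cs, tb, sb, ab):
    """Team Runs Created from the simple Base Runs formula in the text."""
    a = h + w - hr - cs                                      # baserunners
    b = (2 * tb - h - 4 * hr + 0.05 * w + 1.5 * sb) * 0.76   # advancement
    c = ab - h                                               # outs
    d = hr                                                   # runs that score automatically
    return a * b / (b + c) + d


# Hypothetical team batting line
rc = base_runs_rc(h=1400, w=500, hr=180, cs=30, tb=2250, sb=80, ab=5500)
print(round(rc, 1))
```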

Park factors are based on five years of data when applicable (so 2017 - 2021), include both runs scored and allowed, and they are regressed towards average (PF = 1), with the amount of regression varying based on the number of games in total in the sample. There are factors for both runs and home runs. The initial PF (not shown) is:

iPF = (H*T/(R*(T - 1) + H) + 1)/2

where H = RPG in home games, R = RPG in road games, T = # teams in league (14 for AL and 16 for NL). Then the iPF is converted to the PF by taking x*iPF + (1-x), where x = .1364*ln(G/162) + .5866. I will expound upon how this formula was derived in a future post.

It is important to note, since there always seems to be confusion about this, that these park factors already incorporate the fact that the average player plays 50% on the road and 50% at home. That is what the adding one and dividing by 2 in the iPF is all about. So if I list Fenway Park with a 1.02 PF, that means that it actually increases RPG by 4%.
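A sketch of the park factor steps in Python, assuming the formulas exactly as stated above. The home/road RPG figures and the games count of 810 are hypothetical inputs; `home_only_effect` simply undoes the 50/50 blend to show why a 1.02 PF means a 4% effect on home RPG.

```python
import math


def initial_pf(home_rpg, road_rpg, teams):
    """iPF = (H*T/(R*(T - 1) + H) + 1)/2.

    The +1 and /2 build in the 50/50 home/road split, so the result applies
    to a team's full schedule rather than to home games alone.
    """
    return (home_rpg * teams / (road_rpg * (teams - 1) + home_rpg) + 1) / 2


def regressed_pf(ipf, games):
    """Regress iPF toward 1 with weight x = .1364*ln(G/162) + .5866."""
    x = 0.1364 * math.log(games / 162) + 0.5866
    return x * ipf + (1 - x)


def home_only_effect(pf):
    """Undo the 50/50 blend: the implied effect on RPG in home games alone."""
    return 2 * pf - 1


ipf = initial_pf(home_rpg=9.5, road_rpg=8.8, teams=14)
pf = regressed_pf(ipf, games=810)
print(round(ipf, 3), round(pf, 3), round(home_only_effect(1.02), 2))
```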

In the calculation of the PFs, I did not take out “home” games that were actually at neutral sites (of which there were a rash in 2020). The Blue Jays’ multiple homes make things very messy, so I used only their 2021 data.

There are also Team Offense and Defense spreadsheets. These include the following categories:

Team offense: Plate Appearances, Batting Average (BA), On Base Average (OBA), Slugging Average (SLG), Secondary Average (SEC), Walks and Hit Batters per At Bat (WAB), Isolated Power (ISO = SLG - BA), R/G at home (hR/G), and R/G on the road (rR/G). BA, OBA, SLG, WAB, and ISO are park-adjusted by dividing by the square root of park factor (or the equivalent; WAB = (OBA - BA)/(1 - OBA), ISO = SLG - BA, and SEC = WAB + ISO).

Team defense: Innings Pitched, BA, OBA, SLG, Innings per Start (IP/S), Starter's eRA (seRA), Reliever's eRA (reRA), Quality Start Percentage (QS%), RA/G at home (hRA/G), RA/G on the road (rRA/G), Battery Mishap Rate (BMR), Modified Fielding Average (mFA), and Defensive Efficiency Record (DER). BA, OBA, and SLG are park-adjusted by dividing by the square root of PF; seRA and reRA are divided by PF.

The three fielding metrics I've included are limited to metrics that a) I can calculate myself and b) are based on the basic available data, not specialized PBP data. The three metrics are explained in this post, but here are quick descriptions of each:

1) BMR--wild pitches and passed balls per 100 baserunners = (WP + PB)/(H + W - HR)*100

2) mFA--fielding average removing strikeouts and assists = (PO - K)/(PO - K + E)

3) DER--the Bill James classic, using only the PA-based estimate of plays made. Based on a suggestion by Terpsfan101, I've tweaked the error coefficient. Plays Made = PA - K - H - W - HR - HB - .64E and DER = PM/(PM + H - HR + .64E)
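The first two fielding metrics are simple enough to sketch in Python as written above; the team totals in the example are hypothetical.

```python
def bmr(wp, pb, h, w, hr):
    """Battery Mishap Rate: wild pitches + passed balls per 100 non-HR baserunners."""
    return (wp + pb) / (h + w - hr) * 100


def mfa(po, k, e):
    """Modified Fielding Average: putouts net of strikeouts, ignoring assists."""
    return (po - k) / (po - k + e)


# Hypothetical team fielding totals
print(round(bmr(wp=50, pb=10, h=1300, w=500, hr=180), 2))
print(round(mfa(po=4300, k=1400, e=90), 4))
```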

Next are the individual player reports. I defined a starting pitcher as one with 15 or more starts. All other pitchers are eligible to be included as a reliever. If a pitcher has 40 appearances, then they are included. Additionally, if a pitcher has 50 innings and less than 50% of his appearances are starts, he is also included as a reliever (this allows some swingmen type pitchers who wouldn’t meet either the minimum start or appearance standards to get in). This would be a good point to note that I didn't do much to adjust for the opener--I made some judgment calls (very haphazard judgment calls) on which bucket to throw some pitchers in. This is something that I should definitely give some more thought to in coming years.

For all of the player reports, ages are based on simply subtracting their year of birth from 2021. I realize that this is not compatible with how ages are usually listed and so “Age 27” doesn’t necessarily correspond to age 27 as I list it, but it makes everything a heckuva lot easier, and I am more interested in comparing the ages of the players to their contemporaries than fitting them into historical studies, and for the former application it makes very little difference. The "R" category records rookie status with a "R" for rookies and a blank for everyone else. Also, all players are counted as being on the team with whom they played/pitched (IP or PA as appropriate) the most.

For relievers, the categories listed are: Games, Innings Pitched, estimated Plate Appearances (PA), Run Average (RA), Relief Run Average (RRA), Earned Run Average (ERA), Estimated Run Average (eRA), DIPS Run Average (dRA), Strikeouts per Game (KG), Walks per Game (WG), Guess-Future (G-F), Inherited Runners per Game (IR/G), Batting Average on Balls in Play (%H), Runs Above Average (RAA), and Runs Above Replacement (RAR).

IR/G is per relief appearance (G - GS); it is an interesting thing to look at, I think, in lieu of actual leverage data. You can see which closers come in with runners on base, and which are used nearly exclusively to start innings. Of course, you can’t infer too much; there are bad relievers who come in with a lot of people on base, not because they are being used in high leverage situations, but because they are long men being used in low-leverage situations already out of hand.

For starting pitchers, the columns are: Wins, Losses, Innings Pitched, Estimated Plate Appearances (PA), RA, RRA, ERA, eRA, dRA, KG, WG, G-F, %H, Pitches/Start (P/S), Quality Start Percentage (QS%), RAA, and RAR. RA and ERA you know--R*9/IP or ER*9/IP, park-adjusted by dividing by PF. The formulas for eRA and dRA are based on the same Base Runs equation and they estimate RA, not ERA.

* eRA is based on the actual results allowed by the pitcher (hits, doubles, home runs, walks, strikeouts, etc.). It is park-adjusted by dividing by PF.

* dRA is the classic DIPS-style RA, assuming that the pitcher allows a league average %H, and that his hits in play have a league-average S/D/T split. It is park-adjusted by dividing by PF.

The formula for eRA is:

A = H + W - HR

B = (2*TB - H - 4*HR + .05*W)*.78

C = AB - H = K + (3*IP - K)*x (where x is figured as described below for PA estimation and is typically around .93) = PA (from below) - H - W

eRA = (A*B/(B + C) + HR)*9/IP
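Here is a Python sketch of the eRA calculation, with C estimated as K + (3*IP - K)*x as in the text and the result divided by PF for the park adjustment. The pitching line and x = .93 are hypothetical inputs.

```python
def e_ra(h, tb, hr, w, k, ip, x, pf=1.0):
    """Estimated Run Average from a pitcher's component line (Base Runs).

    x is the league average of (AB - H - K)/(3*IP - K), typically ~.93.
    """
    a = h + w - hr
    b = (2 * tb - h - 4 * hr + 0.05 * w) * 0.78
    c = k + (3 * ip - k) * x          # estimated AB - H
    runs = a * b / (b + c) + hr
    return runs * 9 / ip / pf         # per 9 IP, park-adjusted


# Hypothetical starter line
print(round(e_ra(h=150, tb=240, hr=20, w=50, k=180, ip=180, x=0.93), 2))
```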

To figure dRA, you first need the estimate of PA described below. Then you calculate W, K, and HR per PA (call these %W, %K, and %HR). Percentage of balls in play (BIP%) = 1 - %W - %K - %HR. This is used to calculate the DIPS-friendly estimate of %H (H per PA) as e%H = Lg%H*BIP%.

Now everything has a common denominator of PA, so we can plug into Base Runs:

A = e%H + %W

B = (2*(z*e%H + 4*%HR) - e%H - 5*%HR + .05*%W)*.78

C = 1 - e%H - %W - %HR

dRA = (A*B/(B + C) + %HR)/C*a

z is the league average of total bases per non-HR hit (TB - 4*HR)/(H - HR), and a is the league average of (AB - H) per game.
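The dRA steps can be sketched in Python with every quantity per PA, as above. I read Lg%H as the league %H (hits in play per ball in play) defined further on. The league values in the usage line (.29, 1.27, 25.5) are hypothetical placeholders, not actual 2021 figures.

```python
def d_ra(w, k, hr, pa, lg_h_rate, z, lg_outs_per_game, pf=1.0):
    """DIPS-style run average: per-PA rates plugged into Base Runs.

    lg_h_rate        : Lg%H, league non-HR hits per ball in play
    z                : league (TB - 4*HR)/(H - HR)
    lg_outs_per_game : league (AB - H) per game, the 'a' in the text
    """
    pct_w, pct_k, pct_hr = w / pa, k / pa, hr / pa
    bip = 1 - pct_w - pct_k - pct_hr     # balls in play per PA
    e_h = lg_h_rate * bip                # non-HR hits per PA, DIPS-style
    a = e_h + pct_w
    b = (2 * (z * e_h + 4 * pct_hr) - e_h - 5 * pct_hr + 0.05 * pct_w) * 0.78
    c = 1 - e_h - pct_w - pct_hr         # outs per PA
    runs_per_pa = a * b / (b + c) + pct_hr
    return runs_per_pa / c * lg_outs_per_game / pf


# Hypothetical pitcher and league inputs
print(round(d_ra(w=50, k=180, hr=20, pa=750, lg_h_rate=0.29, z=1.27, lg_outs_per_game=25.5), 2))
```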

Also shown are strikeout and walk rates, both expressed per game. By game I mean not nine innings but rather the league average of PA/G. I have always been a proponent of using PA and not IP as the denominator for non-run pitching rates, and now the use of per PA rates is widespread. Usually these are expressed as K/PA and W/PA, or equivalently, percentage of PA with a strikeout or walk. I don’t believe that any site publishes these as K and W per equivalent game as I am here. This is not better than K%--it’s simply applying a scalar multiplier. I like it because it generally follows the same scale as the familiar K/9.

To facilitate this, I’ve finally corrected a flaw in the formula I use to estimate plate appearances for pitchers. Previously, I’ve done it the lazy way by not splitting strikeouts out from other outs. I am now using this formula to estimate PA (where PA = AB + W):

PA = K + (3*IP - K)*x + H + W

Where x = league average of (AB - H - K)/(3*IP - K)

Then KG = K/PA*Lg(PA/G) and WG = W/PA*Lg(PA/G).
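The PA estimate and the "per equivalent game" rates can be sketched as follows: strikeouts and walks per PA, scaled up to a league-average game's worth of PA. The pitcher line, x = .93, and a league PA/G of 38.0 are hypothetical.

```python
def est_pa(k, ip, h, w, x):
    """Estimated PA (= AB + W): K + (3*IP - K)*x + H + W."""
    return k + (3 * ip - k) * x + h + w


def per_equiv_game(count, pa, lg_pa_per_g):
    """Scale a per-PA rate (K/PA or W/PA) to a league-average game's worth of PA."""
    return count / pa * lg_pa_per_g


pa = est_pa(k=180, ip=180, h=150, w=50, x=0.93)
kg = per_equiv_game(180, pa, lg_pa_per_g=38.0)
wg = per_equiv_game(50, pa, lg_pa_per_g=38.0)
print(round(pa, 1), round(kg, 2), round(wg, 2))
```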

G-F is a junk stat, included here out of habit because I've been including it for years. It was intended to give a quick read of a pitcher's expected performance in the next season, based on eRA and strikeout rate. Although the numbers vaguely resemble RAs, it's actually unitless. As a rule of thumb, anything under four is pretty good for a starter. G-F = 4.46 + .095(eRA) - .113(K*9/IP). It is a junk stat. JUNK STAT JUNK STAT JUNK STAT. Got it?

%H is BABIP, more or less--%H = (H - HR)/(PA - HR - K - W), where PA was estimated above. Pitches/Start includes all appearances, so I've counted relief appearances as one-half of a start (P/S = Pitches/(.5*G + .5*GS)). QS% is just QS/GS; I don't think it's particularly useful, but Doug's Stats include QS so I include it.
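A small Python sketch of %H and Pitches/Start. Note that .5*G + .5*GS is the same as GS + .5*(G - GS), i.e. each relief appearance counts as half a start. All inputs are hypothetical.

```python
def pct_h(h, hr, pa, k, w):
    """%H, roughly BABIP: (H - HR)/(PA - HR - K - W), with PA estimated."""
    return (h - hr) / (pa - hr - k - w)


def pitches_per_start(pitches, g, gs):
    """P/S with relief appearances counted as half a start."""
    return pitches / (0.5 * g + 0.5 * gs)


print(round(pct_h(150, 20, 714.8, 180, 50), 3))
print(round(pitches_per_start(3000, g=32, gs=30), 1))
```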

I've used a stat called Relief Run Average (RRA) in the past, based on Sky Andrecheck's article in the August 1999 By the Numbers; that one only used inherited runners, but I've revised it to include bequeathed runners as well, making it equally applicable to starters and relievers. One thing that's become more problematic as time goes on for calculating this expanded metric is the sketchy availability of bequeathed runner data for relievers. As a result, only bequeathed runners left by starters (and "relievers" when pitching as starters) are taken into account here. I use RRA as the building block for baselined value estimates for all pitchers. I explained RRA in this article, but the bottom line formulas are:

BRSV = BRS - BR*i*sqrt(PF)

IRSV = IR*i*sqrt(PF) - IRS

RRA = ((R - (BRSV + IRSV))*9/IP)/PF
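The three RRA formulas can be chained in Python as below. The symbol i is not defined in this excerpt; this sketch assumes it is the league rate at which inherited runners score and leaves it as a caller-supplied parameter. The inputs are hypothetical.

```python
import math


def rra(r, ip, brs, br, irs, ir, i, pf=1.0):
    """Relief Run Average following BRSV, IRSV, and RRA as written above.

    i is an assumption here: the league rate at which inherited runners score.
    """
    brsv = brs - br * i * math.sqrt(pf)   # bequeathed runners scored beyond expectation
    irsv = ir * i * math.sqrt(pf) - irs   # inherited runners stranded beyond expectation
    return ((r - (brsv + irsv)) * 9 / ip) / pf


# Hypothetical pitcher whose bequeathed/inherited runners scored exactly as expected
print(round(rra(r=60, ip=150, brs=6, br=20, irs=3, ir=10, i=0.3), 2))
```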

Given the difficulties of looking at the league average of actual runs due to Manfred rules, I decided to use eRA to calculate the baselined metrics for relievers. So they are no longer based on actual runs allowed by the pitcher, but rather on the component statistics. For starters, I will use the actual runs allowed in the form of RRA, but compared to the league average eRA. Starters’ statistics are not influenced by the Manfred runners, but the league average RA is still artificially inflated by them, so the league eRA should be a better measure of what the league average RRA would be in the absence of Manfred runners. I say “should” as this assumes that the eRA formula is properly calibrated, and it’s hard to calibrate any runs created formula when you don’t know what the league average runs should be. I remain unconvinced that most sabermetricians have fully grasped all of the implications of the Manfred runners on the 2020-2021 statistics, and if these rules are maintained going forward it will require much more effort to maintain basic sabermetric measures. In any event, the RAA/RAR formulas I’m using are:

RAA (relievers) = (.951*Lg(eRA) - eRA)*IP/9

RAA (starters) = (1.025*Lg(eRA) - RRA)*IP/9

RAR (relievers) = (1.11*Lg(eRA) - eRA)*IP/9

RAR (starters) = (1.28*Lg(eRA) - RRA)*IP/9
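A Python sketch of the pitcher baselines, following the preceding paragraph: relievers are measured by their eRA, starters by their RRA, and both are compared to the league eRA using the multipliers listed. The example inputs are hypothetical.

```python
def pitcher_value(role, era_est, rra_val, ip, lg_e_ra):
    """Return (RAA, RAR) for a pitcher.

    Relievers are measured by eRA, starters by RRA, both against league eRA.
    """
    if role == "reliever":
        measure, avg_mult, repl_mult = era_est, 0.951, 1.11
    elif role == "starter":
        measure, avg_mult, repl_mult = rra_val, 1.025, 1.28
    else:
        raise ValueError("role must be 'starter' or 'reliever'")
    raa = (avg_mult * lg_e_ra - measure) * ip / 9
    rar = (repl_mult * lg_e_ra - measure) * ip / 9
    return raa, rar


# Hypothetical starter with an RRA exactly at the starter average (1.025 * 4.00)
print(pitcher_value("starter", era_est=3.9, rra_val=4.1, ip=180, lg_e_ra=4.0))
```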

All players with 250 or more plate appearances (official, total plate appearances) are included in the Hitters spreadsheets (along with some players close to the cutoff point who I was interested in). Each is assigned one position, the one at which they appeared in the most games. The statistics presented are: Games played (G), Plate Appearances (PA), Outs (O), Batting Average (BA), On Base Average (OBA), Slugging Average (SLG), Secondary Average (SEC), Runs Created (RC), Runs Created per Game (RG), Speed Score (SS), Hitting Runs Above Average (HRAA), Runs Above Average (RAA), and Runs Above Replacement (RAR).

Starting in 2015, I'm including hit batters in all related categories for hitters, so PA is now equal to AB + W + HB. Outs are AB - H + CS. BA and SLG you know, but remember that without SF, OBA is just (H + W + HB)/(AB + W + HB). Secondary Average = (TB - H + W + HB)/AB = SLG - BA + (OBA - BA)/(1 - OBA). I have not included net steals as many people (and Bill James himself) do, but I have included HB which some do not.
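The two forms of Secondary Average agree because, with OBA lacking SF, (OBA - BA)/(1 - OBA) reduces algebraically to (W + HB)/AB. A quick Python check on a hypothetical batting line:

```python
def rates(ab, h, tb, w, hb):
    """BA, OBA (no SF), SLG, and SEC computed both ways shown in the text."""
    pa = ab + w + hb
    ba = h / ab
    oba = (h + w + hb) / pa
    slg = tb / ab
    sec_direct = (tb - h + w + hb) / ab
    sec_from_rates = slg - ba + (oba - ba) / (1 - oba)
    return ba, oba, slg, sec_direct, sec_from_rates


ba, oba, slg, sec1, sec2 = rates(ab=550, h=160, tb=270, w=60, hb=5)
print(round(sec1, 4), round(sec2, 4))
```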

BA, OBA, and SLG are park-adjusted by dividing by the square root of PF. This is an approximation, of course, but I'm satisfied that it works well (I plan to post a couple articles on this some time during the offseason). The goal here is to adjust for the win value of offensive events, not to quantify the exact park effect on the given rate. I use the BA/OBA/SLG-based formula to figure SEC, so it is park-adjusted as well.

Runs Created is actually Paul Johnson's ERP, more or less. Ideally, I would use a custom linear weights formula for the given league, but ERP is just so darn simple and close to the mark that it’s hard to pass up. I still use the term “RC” partially as a homage to Bill James (seriously, I really like and respect him even if I’ve said negative things about RC and Win Shares), and also because it is just a good term. I like the thought put in your head when you hear “creating” a run better than “producing”, “manufacturing”, “generating”, etc. to say nothing of names like “equivalent” or “extrapolated” runs. None of that is said to put down the creators of those methods--there just aren’t a lot of good, unique names available.

For 2015, I refined the formula a little bit to:

1. include hit batters at a value equal to that of a walk

2. value intentional walks at just half the value of a regular walk

3. recalibrate the multiplier based on the last ten major league seasons (2005-2014)

This revised RC = (TB + .8H + W + HB - .5IW + .7SB - CS - .3AB)*.310

RC is park adjusted by dividing by PF, making all of the value stats that follow park adjusted as well. RG, the Runs Created per Game rate, is RC/O*26. For a very long time, dating back to the Jamesian era, 25.5 has been a good approximation for the number of (AB – H + CS) per game, but it has been creeping up, and per 9 innings this year it was right around 26, so I am using that value now.
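The revised RC and the RG rate can be sketched in Python as follows; the batting line is hypothetical.

```python
def runs_created(tb, h, w, hb, iw, sb, cs, ab, pf=1.0):
    """ERP-style RC as revised for 2015, park-adjusted by dividing by PF."""
    rc = (tb + 0.8 * h + w + hb - 0.5 * iw + 0.7 * sb - cs - 0.3 * ab) * 0.310
    return rc / pf


def runs_per_game(rc, ab, h, cs):
    """RG = RC per 26 outs, where outs = AB - H + CS."""
    return rc / (ab - h + cs) * 26


rc = runs_created(tb=270, h=160, w=60, hb=5, iw=4, sb=10, cs=3, ab=550)
rg = runs_per_game(rc, ab=550, h=160, cs=3)
print(round(rc, 1), round(rg, 2))
```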

I do not believe that outs are the proper denominator for an individual rate stat, but I also do not believe that the distortions caused are that bad. (I still intend to finish my rate stat series and discuss all of the options in excruciating detail, but alas you’ll have to take my word for it now).

Several years ago I switched from using my own "Speed Unit" to a version of Bill James' Speed Score; of course, Speed Unit was inspired by Speed Score. I only use four of James' categories in figuring Speed Score. I actually like the construct of Speed Unit better as it was based on z-scores in the various categories (and amazingly a couple other sabermetricians did as well), but trying to keep the estimates of standard deviation for each of the categories appropriate was more trouble than it was worth.

Speed Score is the average of four components, which I'll call a, b, c, and d:

a = ((SB + 3)/(SB + CS + 7) - .4)*20

b = sqrt((SB + CS)/(S + W))*14.3

c = ((R - HR)/(H + W - HR) - .1)*25

d = T/(AB - HR - K)*450

James actually uses a sliding scale for the triples component, but it strikes me as needlessly complex and so I've streamlined it. He looks at two years of data, which makes sense for a gauge that is attempting to capture talent and not performance, but using multiple years of data would be contradictory to the guiding principles behind this set of reports (namely, simplicity. Or laziness. Your pick.) I also changed some of his division to mathematically equivalent multiplications.
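A Python sketch of the four-component Speed Score as written above. The text's S is read here as singles, which is an assumption, and the season line is hypothetical.

```python
import math


def speed_score(sb, cs, singles, w, r, hr, h, t, ab, k):
    """Average of the four Speed Score components a, b, c, d from the text."""
    a = ((sb + 3) / (sb + cs + 7) - 0.4) * 20          # stolen base success
    b = math.sqrt((sb + cs) / (singles + w)) * 14.3    # attempt frequency
    c = ((r - hr) / (h + w - hr) - 0.1) * 25           # runs scored per time on base
    d = t / (ab - hr - k) * 450                        # triples per non-HR, non-K AB
    return (a + b + c + d) / 4


ss = speed_score(sb=20, cs=5, singles=110, w=50, r=90, hr=15, h=160, t=5, ab=550, k=100)
print(round(ss, 1))
```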

The baselined stats are calculated in the same basic manner the pitcher stats are, using the league average RG:

HRAA = (RG – LgRG)*O/26

RAA = (RG – LgRG*PADJ)*O/26

RAR = (RG – LgRG*PADJ*.73)*O/26

PADJ is the position adjustment, based on 2010-2019 offensive data. For catchers it is .92; for 1B/DH, 1.14; for 2B, .99; for 3B, 1.07; for SS, .95; for LF/RF, 1.09; and for CF, 1.05. As positional flexibility takes hold, fielding value is better quantified, and the long-term evolution of the game continues, it's right to question whether offensive positional adjustments are even less reflective of what we are trying to account for than they were in the past. But while I do not claim that the relationship is or should be perfect, at the level of talent filtering that exists to select major leaguers, there should be an inverse relationship between offensive performance by position and the defensive responsibilities of the position. Not a perfect one, but a relationship nonetheless. An offensive positional adjustment then allows for a more objective approach to setting a position adjustment. Again, I have to clarify that I don’t think subjectivity in metric design is a bad thing - any metric, unless it’s simply expressing some fundamental baseball quantity or rate (e.g. “home runs” or “on base average”), is going to involve some subjectivity in design (e.g. linear or multiplicative run estimator, any of a myriad of different ways to design park factors, whether to include a category like sacrifice flies that is more teammate-dependent).
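The three hitter baselines, with the position adjustments listed above stored in a lookup table, can be sketched in Python like this. The example hitter is hypothetical: a shortstop hitting exactly at the league RG.

```python
PADJ = {"C": 0.92, "1B": 1.14, "DH": 1.14, "2B": 0.99, "3B": 1.07,
        "SS": 0.95, "LF": 1.09, "RF": 1.09, "CF": 1.05}


def hitter_value(rg, outs, lg_rg, pos):
    """HRAA, RAA, and RAR on the 26-outs-per-game scale used for RG."""
    padj = PADJ[pos]
    hraa = (rg - lg_rg) * outs / 26
    raa = (rg - lg_rg * padj) * outs / 26
    rar = (rg - lg_rg * padj * 0.73) * outs / 26
    return hraa, raa, rar


hraa, raa, rar = hitter_value(rg=4.5, outs=390, lg_rg=4.5, pos="SS")
print(round(hraa, 2), round(raa, 2), round(rar, 2))
```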

The replacement levels I have used here are very much in line with the values used by other sabermetricians. This is based on my own "research", my interpretation of other people's research, and a desire to not stray from consensus and make the values unhelpful to the majority of people who may encounter them.

Replacement level is certainly not settled science. There is always going to be room to disagree on what the baseline should be. Even if you agree it should be "replacement level", any estimate of where it should be set is just that--an estimate. Average is clean and fairly straightforward, even if its utility is questionable; replacement level is inherently messy. So I offer the average baseline as well.

For position players, replacement level is set at 73% of the positional average RG (since there's a history of discussing replacement level in terms of winning percentages, this is roughly equivalent to .350). For starting pitchers, it is set at 128% of the league average RA (.380), and for relievers it is set at 111% (.450).

The spreadsheets are published as Google Spreadsheets, which you can download in Excel format by changing the extension in the address from "=html" to "=xlsx", or in open format as "=ods", or in csv as "=csv". That way you can download them and manipulate things however you see fit.

Crude Playoff Odds--2021

2021-10-04T17:08:00.056-04:00

These are very simple playoff odds, based on my crude rating system for teams using an equal mix of W%, EW% (based on R/RA), PW% (based on RC/RCA), and 69 games of .500. They account for home field advantage by assuming a .500 team wins 54.2% of home games (major league average 2006-2015). They assume that a team's inherent strength is constant from game-to-game. They do not generally account for any number of factors that you would actually want to account for if you were serious about this, including but not limited to injuries, the current construction of the team rather than the aggregate seasonal performance, pitching rotations, estimated true talent of the players, etc.

The CTRs that are fed in are:

Wildcard game odds (the least useful since the pitching matchups aren’t taken into account, and that matters most when there is just one game):

DS:

LCS:

World Series:

Everything combined:

If the Dodgers win the wildcard game, they become the World Series favorites at 19.9%, with the Giants falling to 19.0%; the Rays fall to 16.1%, so the Dodgers don't have a huge impact on the AL odds (the Dodgers are given a 32.6% chance to win the pennant should they win the wildcard game). If the Cardinals win, the Giants jump to 25.9% and the Rays to 17.6% (of course the Giants benefit greatly because they have an estimated 68% chance to beat the Cardinals in the NLDS but only 50% to beat the Dodgers). Ranging into the realm of the subjective, I personally favor Houston to win the AL pennant and think Milwaukee will benefit from concentrating innings in front-line pitchers (even sans Devin Williams) and from being on the opposite side of the bracket from the NL West.

I don't have the energy for a rant about what a preposterous proposition it is to make the Dodgers play the Cardinals, or about how the much-hyped four-way battle for the AL wildcard yesterday would never happen under Rob Manfred's desired system (or how if it did it would be between teams struggling to reach .500). The playoffs simultaneously manage to be one of the best and worst things about baseball, and every expansion will serve to enhance the latter.

Rate Stat Series, pt. 13: Relativity for the Linear Weights Framework

2021-09-29T08:35:00.060-04:00

Of the three frameworks for evaluating individual offense, linear weights offers the simplest calculation of runs created or RAA, but will be the hardest to convert to a win-equivalent rate – mentally if not computationally. In order to do this, we need to consider what our metrics actually represent and make our choices accordingly. The path that I am going to suggest is not inevitable – it makes sense to me, but there are certainly valid alternative paths.

In attempting to measure the win value of a batter’s performance in the linear weights framework, we could construct a theoretical team and measure his win impact on it. In so doing, one could argue that the batter’s tertiary impact (which would be ignored under such an approach) is immaterial, perhaps even illusory, and that the process of converting runs to wins is independent from the development of the run estimate. Thus we could use a static approach for estimating runs and a dynamic team approach for converting those runs to wins.

I would argue in turn that the most consistent approach is to continue to operate under the assumption that linear weights represents a batter’s impact on a team that is average once he is added to it, and thus not allow any dynamism in the runs to wins conversion. Since under this school of thought all teams are equal, whether we add Frank Thomas or Matt Walbeck, there is no need to account for how those players change the run environment and the run/win conversion – because they both ultimately operate in the same run environment.

One could argue that I am taking a puritanical viewpoint, and that this would become especially clear in a case in which one compared the final result of the linear weights framework to the final result of the theoretical team framework. As we’ve seen, RAA is very similar between the two approaches, but the run/win conversions will diverge more if in one case we ignore the batter’s impact on the run environment. In any event, the methodology we’ll use for the theoretical team framework will be applicable to linear weights as well, if you desire to use it.

Since we will not be modeling any dynamic impact of the batter upon the team’s run environment, it is an easy choice to start with RAA and convert it to wins above average (WAA) by dividing by a runs per win (RPW) value. An example of this is the rule of thumb that 10 runs = 1 win, so 50 RAA would be worth 5 WAA.

There are any number of methods by which we could calculate RPW, and a couple of philosophical paths to doing so. On the latter, I’m assuming that we want our RPW to represent the best estimate of the number of marginal runs it would take for a .500 team (or more precisely a team with R = RA) to earn a marginal win. Since I’ve presumed that Pythagenpat is the correct run to win conversion, the most consistent choice is to use the RPW implied by Pythagenpat, which is:

RPW = 2*RPG^(1 – z) where z is the Pythagenpat exponent

so when z = .29, RPW = 2*RPG^.71
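Where the RPW formula comes from: with exponent x = RPG^z, differentiating W% = R^x/(R^x + RA^x) with respect to R at R = RA gives x/(4R) wins per run per game (the exponent's own dependence on R drops out there because ln(R/RA) = 0), so RPW = 4R/x = 2*RPG/RPG^z = 2*RPG^(1 - z). The Python sketch below checks this numerically at the 1961-2019 reference RPG of 8.83 used in this post, where the formula gives the 9.390 quoted for refRPW.

```python
def pythagenpat_wpct(r, ra, z=0.29):
    """Pythagenpat W% from R/G and RA/G: exponent x = RPG^z, RPG = R/G + RA/G."""
    x = (r + ra) ** z
    return r ** x / (r ** x + ra ** x)


def pythagenpat_rpw(rpg, z=0.29):
    """Marginal runs per win for a team with R = RA: 2*RPG^(1 - z)."""
    return 2 * rpg ** (1 - z)


# Central-difference slope of W% with respect to R/G at R = RA = 4.415
step = 1e-4
slope = (pythagenpat_wpct(4.415 + step, 4.415)
         - pythagenpat_wpct(4.415 - step, 4.415)) / (2 * step)
print(round(1 / slope, 3), round(pythagenpat_rpw(8.83), 3))
```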

For the 1966 AL, this produces 8.588 RPW and for the 1994 AL it is 10.584. So we can calculate LW_WAA = LW_RAA/RPW, and LW_WAA/PA seems like the natural choice for a rate stat:

This tightens the gap between Robinson and Thomas as compared to a RAA/PA comparison, and since we’ve converted to wins, we can look at WAA/PA without having to worry about the underlying contextual differences (note: this is actually not true, but I’m going to pretend like it is for a little bit for the sake of the flow of this discussion).

There is another step we could take, which is to recognize that the Franks do influence the context in which their wins are earned, driving up their team’s RPGs and thus RPWs and thus their own WAAs. Again, I would contend that a theoretically pure linear weights framework assumes that the team is average after the player is added. Others would contend that by making that assertion I’m elevating individual tertiary offensive contributions to a completely unwarranted level of importance, ignoring a measurable effect of individual contribution because the methodology ignores an immaterial one. This is a perfectly fair critique, and so I will also show how we can adjust for the hitter’s impact on the team RPW in this step. Pete Palmer makes this adjustment as part of converting from Batting Runs (which is what I’m calling LW_RAA) to Batting Wins (what I’m calling LW_WAA), and far be it from me to argue too vociferously against Pete Palmer when it comes to a linear weights framework.

What Palmer would have you do next (conceptually as he uses a different RPW methodology) is take the batter’s RAA, divide by his games played, and add to RPG to get the RPG for an average team with the player in question added. It’s that simple because RPG already represents average runs scored per game by both teams and RAA already captures a batter’s primary and secondary contributions to his team’s offense. One benefit or drawback of this approach, depending on one’s perspective, is that unlike the theoretical team approach it is tethered to the player’s actual plate appearances/games. Using the theoretical team approach from this series, a batter always gets 1/9 of team PA. Under this approach, a batter’s real world team PA, place in the batting order, frequency of being removed from the game, etc. will have a slight impact on our estimate of his impact on an average team. We could also eschew using real games played, and instead use something like “team PA game equivalents”. For example, in the 1994 AL the average team had 38.354 PA/G; Thomas, with 508 PA, had the equivalent of 119.21 games for an average hitter getting 1/9 of an average team’s PA (508/38.354*9). I’ve used real games played, as Palmer did, in the examples that follow.

Applying the Palmerian approach to our RPW equation:

TmRPG = LgRPG + RAA/G

TmRPW = 2*TmRPG^.71

LW_WAA = LW_RAA/TmRPW
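A Python sketch of the Palmer-style adjustment next to the static version that ignores the hitter's effect on the run environment. The RAA, games, and league RPG in the test are hypothetical; for a positive RAA the Palmer-style WAA comes out smaller, since each run is worth less in the higher-scoring team environment.

```python
def palmer_waa(lw_raa, games, lg_rpg, z=0.29):
    """WAA with RPW evaluated for an average team plus this hitter."""
    tm_rpg = lg_rpg + lw_raa / games
    tm_rpw = 2 * tm_rpg ** (1 - z)
    return lw_raa / tm_rpw


def static_waa(lw_raa, lg_rpg, z=0.29):
    """WAA with the league RPW, ignoring the hitter's effect on the run environment."""
    return lw_raa / (2 * lg_rpg ** (1 - z))


print(round(palmer_waa(50, 150, 9.0), 3), round(static_waa(50, 9.0), 3))
```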

For the Franks, we get:

The difference between Thomas and Robinson didn’t change much, but both lose WAA and WAA/PA due to their effect on the team’s run environment as each run is less valuable as more are scored.

I have used WAA/PA as a win-equivalent rate without providing any justification for doing so. In fact, there is very good theoretical reason for not doing so. One of the key underpinnings of all of our rate stats is that plate appearances are not fixed across contexts – they are a function of team OBA. Wins are fixed across contexts – always exactly one per game. Thus when we compare Robinson and Thomas, it’s not enough to simply look at WAA/PA; we also need to adjust for the league PA/G difference or else the denominator of our win-equivalent rate stat will distort the relativity we have so painstakingly tried to measure.

In the 1966 AL, teams averaged 36.606 PA/G; in 1994, it was 38.354. Imagine that we had two hitters from these leagues with identical WAA and PA. We don’t have to imagine it; in 1966 Norm Siebern had .847 WAA in 399 PA, and in 1994 Felix Jose had .845 WAA in 401 PA (I’m using Palmer-style WAA in this example). It seems that Siebern had a minuscule advantage over Jose. But while wins are fixed across contexts (one per game, regardless of the time and place), plate appearances are not. A batter using 401 PA in 1994 was taking a smaller share of the average PA than one taking 399 in 1966 (you might be yelling about the difference in total games played between the two leagues due to the strike, but remember that WAA already has taken into account the performance of an average player – whether over a 113 or 162 game team-season is irrelevant when comparing their WAA figures). In 1994, 401 PA represented 10.46 team games’ worth of PA; in 1966, 399 represented 10.90 games’ worth. In fact, Siebern’s WAA rate was not higher than Jose’s; despite having two fewer PA, Siebern took a larger share of his team’s PA to contribute his .85 WAA than Jose did.
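The "team games' worth of PA" arithmetic for Siebern and Jose, using the league PA/G and WAA figures quoted above, can be checked with a few lines of Python.

```python
def team_games_of_pa(pa, lg_pa_per_g):
    """How many average team-games' worth of PA a hitter used."""
    return pa / lg_pa_per_g


siebern_games = team_games_of_pa(399, 36.606)   # 1966 AL
jose_games = team_games_of_pa(401, 38.354)      # 1994 AL

# WAA per team-game of PA, using the Palmer-style WAA figures from the text
siebern_rate = 0.847 / siebern_games
jose_rate = 0.845 / jose_games
print(round(siebern_games, 2), round(jose_games, 2))
print(round(siebern_rate, 4), round(jose_rate, 4))
```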

If we do not make a correction of this type and just use WAA/PA, we will be suggesting that the hitters of 1966 were more productive on a win-equivalent rate basis than the hitters of 1994 (although this is difficult to prove as by definition the average player’s WAA/PA will be 0, regardless of the environment in which they played). I don’t want to get bogged down in this discussion too much, so I will point you here for a discussion focused just on this aspect of comparing across league-seasons.

There are a number of different ways you could adjust for this; the “team games of PA” approach I used would be one. The approach I will use is to pick a reference PA/G, similar to our reference Pythagenpat exponent from the last installment, and force everyone to this scale. For all seasons 1961-2019, the average PA/G is 37.359, which I will define as refPA/G. The average R/G is 4.415, so the average RPG (runs per game for both teams) is 8.83 and the refRPW is 9.390.
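The reference figures can be reproduced from the two averages, assuming the Pythagenpat runs-per-win relationship at .500, RPW = 2*RPG^(1 - z) with z = .29. That formula is my assumption here, though it does return the 9.390 quoted.

```python
REF_R_PER_G = 4.415    # 1961-2019 average runs per team game
REF_PA_PER_G = 37.359  # 1961-2019 average PA per team game
Z = 0.29               # Pythagenpat z value (assumed)

ref_rpg = 2 * REF_R_PER_G                 # runs per game, both teams
ref_rpw = 2 * ref_rpg ** (1 - Z)          # Pythagenpat runs per win at .500
ref_r_per_pa = REF_R_PER_G / REF_PA_PER_G # reference R/PA

print(f"RPG {ref_rpg:.2f}, refRPW {ref_rpw:.3f}, ref R/PA {ref_r_per_pa:.4f}")
```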

If we calculate:

adjWAA/PA = WAA/PA * Lg(PA/G)/ref(PA/G)

Then we will have restated a hitter’s WAA rate in the reference environment. This is one option for our final linear weights win stat:
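As a quick check on the Siebern/Jose example, here is a minimal sketch of adjWAA/PA applied to their raw figures. The function name is hypothetical; the inputs are the WAA, PA, and league PA/G values quoted earlier.

```python
REF_PA_PER_G = 37.359  # 1961-2019 average PA per team game

def adj_waa_per_pa(waa: float, pa: float, lg_pa_per_g: float) -> float:
    """WAA/PA restated in the reference PA/G environment."""
    return waa / pa * lg_pa_per_g / REF_PA_PER_G

siebern_raw, jose_raw = 0.847 / 399, 0.845 / 401
siebern_adj = adj_waa_per_pa(0.847, 399, 36.606)  # 1966 AL
jose_adj = adj_waa_per_pa(0.845, 401, 38.354)     # 1994 AL

# Siebern leads on raw WAA/PA; the PA/G adjustment flips the order.
print(f"raw:      Siebern {siebern_raw:.5f}, Jose {jose_raw:.5f}")
print(f"adjusted: Siebern {siebern_adj:.5f}, Jose {jose_adj:.5f}")
```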

This increases Thomas’ edge over Robinson, while giving Jose a minuscule lead over Siebern. As a final rate stat, I find it a little unsatisfying for a few reasons:

1. While the ultimate objective of an offense is to contribute to wins, runs feel like a more appropriate unit.

2. Related to #1, wins compress the scale between hitters. There’s nothing wrong with this to the extent that it forces us to recognize that small differences between estimates fall squarely within the margin of error inherent to the exercise, but it makes quoting and understanding the figures more of a challenge.

3. WAA/PA, adjusted for PA context or not, is only differentially comparable; ideally we’d like it to be ratio comparable as well.

My solution to this is to first convert adjusted WAA/PA to an adjusted RAA/PA, which takes care of objections #1 and 2, then to convert it to an adjusted R+/PA, which takes care of objection #3. At each stage we have a perfectly valid rate stat; it’s simply a matter of preference.

To do this seems simple (let’s not get too attached to this approach, which we’ll revisit in a future post):

adjRAA/PA = adjWAA/PA*refRPW (remember, by adjusting WAA/PA using the refPA/G, we’ve restated everything in the terms of the reference league)

adjR+/PA = adjRAA/PA + ref(R/PA) (reference R/PA is .1182, which can be obtained by dividing the ref R/G by the ref PA/G)

We can also compute a relative adjusted R+/PA:

reladjR+/PA = (adjR+/PA)/(ref(R/PA))

= ((RAA/PA)/TmRPW * Lg(PA/G)/ref(PA/G) * refRPW + ref(R/PA))/ref(R/PA)

= (RAA/PA)/TmRPW * Lg(PA/G)/ref(PA/G) * refRPW/ref(R/PA) + 1
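The chain of conversions (WAA/PA, then adjWAA/PA, adjRAA/PA, adjR+/PA, and finally the ratio to ref R/PA) can be sketched step by step and checked against the closed form above. The function names and the sample inputs in the test are hypothetical; the reference constants are those defined in this post.

```python
REF_PA_PER_G = 37.359
REF_RPW = 9.390
REF_R_PER_PA = 0.1182

def rel_adj_r_plus_per_pa(raa: float, pa: float, tm_rpw: float, lg_pa_per_g: float) -> float:
    """Step-by-step version of reladjR+/PA."""
    waa_per_pa = raa / pa / tm_rpw                            # runs to wins
    adj_waa_per_pa = waa_per_pa * lg_pa_per_g / REF_PA_PER_G  # PA context
    adj_raa_per_pa = adj_waa_per_pa * REF_RPW                 # back to runs
    adj_r_plus_per_pa = adj_raa_per_pa + REF_R_PER_PA         # add back ref R/PA
    return adj_r_plus_per_pa / REF_R_PER_PA

def rel_adj_closed_form(raa: float, pa: float, tm_rpw: float, lg_pa_per_g: float) -> float:
    """(RAA/PA)/TmRPW * Lg(PA/G)/ref(PA/G) * refRPW/ref(R/PA) + 1"""
    return (raa / pa) / tm_rpw * lg_pa_per_g / REF_PA_PER_G * REF_RPW / REF_R_PER_PA + 1
```

For the puritan version described next, pass the league RPW in place of the team RPW.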

I included raw R+/PA and its ratio to the league average (relative R+/PA) to compare to this final relative adjusted R+/PA. For three of the players, the differences are small; it is only Frank Robinson whose standing is significantly diminished. This may seem counterintuitive, but remember that more ordinary hitters have a much smaller impact on RPW than the Franks do. Relative to Thomas, Robinson gets more of a boost from his lower RPW (his team RPW was 19% lower than Thomas’) than Thomas does from the PA adjustment (the 1994 AL had 4.8% more PA/G than the 1966 AL).

We could also return to the puritan approach (which I actually stubbornly favor for the linear weights framework) and make these adjustments as well. The equations are the same as above except where we use TmRPW, we will instead use LgRPW – reverting to assuming that the batter has no impact on the run/win conversion.

Here the impact on the Franks is similar; both are hurt of course when we consider their impact on TmRPW. Next time, we will quit messing around with half-measures – no more mixing linear run contributions with dynamic run/win converters. We’re going full theoretical team.

Rate Stat Series, pt. 12: Relativity for the Player as a Team Framework

2021-09-15T08:07:00.083-04:00

I have now covered all of the ground I wished to cover regarding the construction of rate stats for various frameworks for evaluating individual offense, so I think this would be a good point to take stock of the conclusions I have drawn. From here on out, I will write about these tenets as if they are inviolable, which is not actually the case (“sound” and “well-reasoned to me” are as far as I’m willing to go), but I’m moving on to other topics:

* For team offense, Runs/Out is the only proper rate stat

* If evaluating individuals as teams (i.e. applying a dynamic run estimator like Runs Created or Base Runs directly to an individual’s statistics), Runs/Out is also the proper rate stat. Any other choice breaks consistency with treating the individual as a team, and consistency is the only thing going for such an approach. On a related note, don’t treat individuals as teams.

* If using a linear weights framework, the proper rate stat is RAA/PA, R+/PA, or a restatement of those (wOBA is the one commonly used). These metrics all produce consistent rank orders and can be easily restated from one another. Any of them is a valid choice at the user’s discretion, with the only objective distinction being whether you want ratio and differential comparability (R+/PA), only care about differential comparability (R+/PA or RAA/PA), or would prefer a different scale altogether at the sacrifice of direct comparisons without further modifications (wOBA).

* If using a theoretical team framework, R+/O+ and R+/PA+ are equivalent, with the user needing to decide whether they want to think about individual performance in terms of R/O or R/PA.

Throughout this series, I have not worried about context, and originally envisioned a final installment that would briefly discuss some of the issues with comparing players’ rates across league-seasons. In putting it together, I decided that it will take a few installments to do this properly. It will also take another league-season to pair with the 1994 AL in order to make cross-context comparisons.

I have chosen the 1966 AL to fill this role. In the expansion era (1961 – 2019), the AL has averaged 4.52 R/G. 1994 was third-highest at 5.23, 15.6% higher than average. The closest league to being an inverse relative to the average is the 1966 AL (3.89 R/G, 13.8% lower than average). The obvious choice would have been 1968, since it was the most extreme, but as 1994 was not the most extreme I thought a league that was similarly low scoring would be more appropriate.

The other reason I like the 1966 AL for this purpose is that, just as was the case in 1994, the superior offensive player in the league was a future Hall of Fame slugger named Frank. In making comparisons across the twenty-eight year and (more importantly) 1.34 R/G differences between these two league-seasons, the two Franks will serve as our primary reference points.

To look at the 1966 AL we will first need to define our runs created formulas. I will not repeat the explanations of how these are calculated – I used the exact same approach as for the 1994 AL, which is demonstrated primarily in parts 1, 9, and 10.

Base Runs:

A = H + W – HR = S + D + T + W

B = (2TB - H – 4HR + .05W)*.79920 = .7992S + 2.3976D + 3.996T + 2.3976HR + .0400W

C = AB – H = Outs

D = HR

BsR = (A*B)/(B + C) + D
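The 1966 Base Runs formula can be written as a small function. This is a sketch, not code from the series; the names `b_factor` and `base_runs_1966` are hypothetical, and it takes singles, doubles, triples, homers, walks, and at-bats as inputs.

```python
def b_factor(s, d, t, hr, w, mult=0.79920):
    """B component as written for the 1966 AL."""
    h = s + d + t + hr
    tb = s + 2 * d + 3 * t + 4 * hr
    return (2 * tb - h - 4 * hr + 0.05 * w) * mult

def base_runs_1966(s, d, t, hr, w, ab):
    """Base Runs for a team (or a player treated as a team)."""
    h = s + d + t + hr
    a = h + w - hr               # A = S + D + T + W
    b = b_factor(s, d, t, hr, w)
    c = ab - h                   # C = AB - H = outs
    return a * b / (b + c) + hr  # D = HR
```

A quick sanity check: a lone home run in one at-bat scores exactly one run, since A is zero and D carries the whole value.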

Linear Weights:

LW_RC = .4560S + .7751D + 1.0942T + 1.4787HR + .3044W - .0841(outs)

LW_RAA = .4560S + .7751D + 1.0942T + 1.4787HR + .3044W - .2369(outs)

wOBA = (.8888S + 1.2981D + 1.7075T + 2.2006HR + .6944W)/PA
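The three linear weights formulas are easy to express directly. In this sketch (function names hypothetical), LW_RC and LW_RAA share every event weight and differ only in the out weight. That gap, .2369 - .0841 = .1528, matches the 1966 AL R/O (3.893/25.482). The wOBA weights also appear to be the LW_RAA event weights plus .2369, rescaled by roughly 1.283.

```python
def lw_rc(s, d, t, hr, w, outs):
    """Absolute linear weights runs created, 1966 AL."""
    return 0.4560*s + 0.7751*d + 1.0942*t + 1.4787*hr + 0.3044*w - 0.0841*outs

def lw_raa(s, d, t, hr, w, outs):
    """Runs above average: same event weights, out weight of -.2369."""
    return 0.4560*s + 0.7751*d + 1.0942*t + 1.4787*hr + 0.3044*w - 0.2369*outs

def woba_1966(s, d, t, hr, w, pa):
    """wOBA on the 1966 AL scale."""
    return (0.8888*s + 1.2981*d + 1.7075*t + 2.2006*hr + 0.6944*w) / pa
```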

Theoretical Team:

TT_BsR = (A + 2.246PA)*(B + 2.346PA)/(B + C + 7.915PA) + D - .6658PA

TT_BsRP = ((A + 2.246PA)*(B + 2.346PA)/(B + C + 7.915PA) + D + .185PA)*PAR – .8508PA

TT_RAA = ((A + 2.246PA)*(B + 2.346PA)/(B + C + 7.915PA) + D + .185PA)*PAR – .9572PA

Using these equations, let’s look at the leaderboards for the key stats for each framework. First, for the player as a team:

Of course, we’ll get slightly different results from the different frameworks, but two things should be obvious: Robinson was the best offensive player in the league (his lead over Mantle in second place is bigger than the gap from Mantle to ninth-place Curt Blefary, and he was also among the leaders in PA), and the difference in league offensive levels carried through to individuals.

Next for the linear weights framework:

And finally for the theoretical team framework:

Now comes the hard part – how do we compare these performances from the 1966 AL against those from the 1994 AL? It should be implied, but in all comparisons that follow, I am only concerned about the value of a given player’s contributions relative to the context in which he played; this is not about whether/to what extent the overall quality of play differed between 1966 and 1994, or about how the existence of pitcher hitting in 1966 but not in 1994 impacted the league average, etc. I also am not going all the way in accounting for context, as I’ve still done nothing to park-adjust. Park adjustments are important, but whether you adjust or not is irrelevant to the question of which rate stat you should use. I’m also not interested in how “impressive” a given rate is due to the variation between players (e.g. z-scores) – I’m simply trying to quantify the win value of the runs a player has contributed.

The obvious first step to compare players across two league-seasons is to compare the difference or ratio of their rate to the league rate. As we’ve discussed throughout this series, some stats can be compared using both differences and ratios, but others can only be compared (at least meaningfully) with one or the other, and others still can be compared but the ratio or difference no longer has any baseball meaning unless some transformation is carried out (wOBA is the most prominent example we’ve touched on).

The table below shows the rates for our two Franks, the respective league rate, and the difference and ratio (if applicable) for each of the key rate stats we’ve looked at:

One of the other reasons I was drawn to the pairing of these two league-seasons is how close Thomas and Robinson are in the key metrics when compared to the league using a ratio. Thomas has a slight advantage in each metric, except for BsR/O, where the flaw of treating an individual as a team can be seen. Playing in a high offense context, Thomas’ crazy rates (to put it in simple terms related to but not the run unit stats used in this series, Thomas hit .353/.492/.729 to Robinson’s .316/.406/.637) result in a large estimated tertiary contribution when treating him as his own team. Expressing these metrics as ratios hammers home that even for extreme performers (defining “extreme” within the usual confines of a major league season), there isn’t much difference between using the linear weights and theoretical team frameworks, and both are reasonable choices of framework. Treating an individual as a team, not so much.

We’ve expressed each player’s rate relative to the league average, but we haven’t answered the question of which construct is appropriate – the difference or the ratio, or does the answer differ depending on the metric? And how can we make that determination? In lieu of some guiding principle, it is just a matter of personal preference. Most people are drawn to ratios, and the widespread adoption of metrics like ERA+, ERA-, OPS+, and wRC+ speaks to that preference. This case illustrates one of the reasons why ratios make sense intuitively – while the Franks are very close when comparing ratios, Thomas has a healthy lead when comparing differences as the league average R/PA and R/O for 1994 are much higher than the same for 1966. It seems intuitive that a batter will be able to exceed the league average by a larger difference when it is .2419 rather than .1528.

Still, in order to really understand how to compare players across contexts, and ensure that our intuition is grounded in reality, we need to think about how runs relate to wins. All of the metrics in that table which I consider the key metrics discussed in this series have been expressed in runs, and this is a natural starting point as the goal of an offense is to score runs. Of course the ultimate reason why an offense wants to score runs is so that they might contribute to wins.

I will break one of the rules I’ve tried to adhere to by begging the question and asserting that Pythagenpat and associated constructs are the correct way to convert runs to wins. Of course it is just a model, and while it is a good one, it is not perfect, but I think it is the best choice given its accuracy and its seemingly reasonable results for extreme situations.

Using Pythagorean also lends itself to adoption of a ratio rate stat, as one way to express the Pythagorean Theorem is that the expected ratio of wins to losses is equal to the ratio of runs to runs allowed raised to some power. If we start by thinking about the team/player as team framework for evaluating offense, it is natural to construct a win-unit rate stat by treating the team’s R/O as the numerator and the league average as the denominator. The ratio of these two, raised to some power, is then the equivalent win ratio that results. This is exactly the approach that Bill James took in his early Baseball Abstracts.

We can easily use this relationship to demonstrate that a simple difference of R/O does not capture the differences in win values between players. If player A exceeds the league average R/O by 10% and player B exceeds it by 50%, then assuming a Pythagorean exponent of 2, player A will have a “win ratio” of 1.21 and player B will have a win ratio of 2.25. Even if we convert those to winning percentages (.5475 and .6923), the difference between the two players’ R/O does not capture the win value difference.
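The two-player example works out as follows with a fixed exponent of 2. This is a minimal sketch with hypothetical function names.

```python
def win_ratio(rel_r_per_o: float, exponent: float = 2.0) -> float:
    """Pythagorean win ratio: R/O relative to league, raised to the exponent."""
    return rel_r_per_o ** exponent

def win_pct(ratio: float) -> float:
    return ratio / (ratio + 1)

for player, rel in (("A", 1.10), ("B", 1.50)):
    wr = win_ratio(rel)
    print(player, round(wr, 4), round(win_pct(wr), 4))
# A 1.21 0.5475
# B 2.25 0.6923
```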

The ratio doesn’t either, of course, since it would need to be squared, but if we assume a constant Pythagorean exponent like 2, this is simply a matter of scaling, with no impact on the rank order of players. However, if we use a custom Pythagorean exponent, this assumption breaks down, as we can illustrate by comparing the two Franks. Since we’re starting with the assumption that Pythagenpat is correct, this means that we will always need to make some kind of adjustment in order to convert our run rate into an equivalent win rate.

Since we are treating the players as teams, the consistent approach is to first calculate the RPG for each player as a team, then calculate the Pythagenpat exponent for the situation, then convert their relative BsR/O to an equivalent win value. The R/G for the 1966 AL was 3.893 with 25.482 O/G, while in the 1994 AL it was 5.226 with 25.188 O/G.

T_RPG = LgR/G + BsR/O*LgO/G

x = T_RPG^.29 (for a Pythagenpat z value of .29)

Relative BsR/O = (BsR/O)/(LgBsR/O)

Win Ratio = (Relative BsR/O)^x

Offensive Winning Percentage = Win Ratio/(Win Ratio + 1) = (BsR/O)^x/((BsR/O)^x + (LgBsR/O)^x)

Here, we conclude that just looking at Relative BsR/O understates Thomas’ superiority, as Pythagenpat estimates more wins for a given run ratio when the RPG increases (e.g. a run ratio of 8/4 will produce more wins than a run ratio of 7/3.5).

If we were actually going to use this framework, we could leave our final relative rate stat in the form of a win ratio or OW%, but I would prefer to convert it back to a run rate. Even within the conceit of the player as a team framework, it’s hard to know what to do with an OW% (or the even less familiar win ratio). We can say that Thomas’ OW% of .903 means that a team that hit like Thomas and had league average defense (runs allowed) would be expected to have a .903 W%, but even if you follow the player as team framework, you would probably like to have a result that can be more easily tied back to the player’s performance as the member of a team, not what record the Yuma Mutant Clone Franks would have. One way to return the win ratio to a more familiar scale is to convert it back to a run ratio.

Let’s define “r” to be the Pythagenpat exponent for some reference context, which I will define to be 8.83 RPG – the major league average for the expansion era (1961 – 2019). We can then easily convert our estimated win ratios for the Franks to run ratios that would produce the same estimated win ratio in the reference (8.83 RPG) environment.

The calculation is simply (Win Ratio)^(1/r). Since r = 8.83^.29 = 1.881, this becomes Win Ratio^.5316, which produces 2.614 for Robinson and 3.282 for Thomas. Following the time-honored custom of dropping the decimal place, we wind up with a win-equivalent relative BsR/O of 261 for Robinson and 328 for Thomas, compared to their initial values of 217 and 260 respectively. These both increased because both players would radically alter their run environments, increasing the win value of their relative BsR/O. I would demonstrate the lesser impact on more typical performers if I cared about this framework beyond being thorough.
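The whole player-as-team chain (T_RPG, exponent, win ratio, OW%, and back to a run ratio at 8.83 RPG) fits in one function. This sketch assumes the league BsR/O equals the league R/O, and feeds in Thomas’ rounded relative BsR/O of 2.60, so its results land within rounding of the .903 OW% and 328 quoted above rather than exactly on them. Function names are hypothetical.

```python
Z = 0.29        # Pythagenpat z value
REF_RPG = 8.83  # 1961-2019 runs per game, both teams

def pythagenpat_exponent(rpg: float, z: float = Z) -> float:
    return rpg ** z

def player_as_team(rel_bsr_per_o: float, lg_r_per_g: float, lg_o_per_g: float):
    """Return (OW%, win-equivalent relative BsR/O at REF_RPG)."""
    lg_r_per_o = lg_r_per_g / lg_o_per_g
    bsr_per_o = rel_bsr_per_o * lg_r_per_o       # assumes LgBsR/O equals LgR/O
    t_rpg = lg_r_per_g + bsr_per_o * lg_o_per_g  # T_RPG = LgR/G + BsR/O*LgO/G
    x = pythagenpat_exponent(t_rpg)
    win_ratio = rel_bsr_per_o ** x
    ow_pct = win_ratio / (win_ratio + 1)
    r = pythagenpat_exponent(REF_RPG)            # about 1.881
    return ow_pct, win_ratio ** (1 / r)

ow_pct, rel_equiv = player_as_team(2.60, 5.226, 25.188)  # Thomas, 1994 AL
print(f"OW% {ow_pct:.3f}, win-equivalent relative BsR/O {100 * rel_equiv:.0f}")
```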

While I do think that this example demonstrates that just looking at a ratio of R/O does not appropriately capture the win impact between players, the individual OW% approach goes many steps too far, but it is the logical conclusion of the player as a team methodology. If you were on that train until we got to the end of the line, I’d encourage you to consider jumping off at one of the earlier stations (I’d prefer you get on the green or red linear weights or theoretical team lines rather than ever embarking on the player as team blue line). Next time, I’ll explore those paths to win-unit rate stats using those frameworks.

Rate Stat Series, pt. 11: Rate Stats for the Theoretical Team Framework II

2021-09-01T08:36:00.037-04:00

Once PAR has been incorporated, it should be clear that a different approach will be needed as our run estimate already includes the batter’s secondary contribution – it is starting out on a similar basis to wRC. We also can fall back on our necessary test for a rate stat – it must produce the same RAA as the RAA produced by the full implementation of the framework we are looking at.

To argue why this should be the gateway criterion in a slightly different way than I did previously, consider the place of RAA in the theoretical team framework. The theoretical team framework is an attempt to value the batter by estimating the difference between the runs scored for a team on which he accumulates an even share of the plate appearances, and the runs scored for a team on which he does not play. While we have constructed an absolute estimate of runs created using this approach, it inherently is screaming out for a marginal approach – taking the difference between the team with the player and the team without the player. That is exactly what RAA is supposed to represent – and by calculating TT_RAA, we have (to the best of our ability) captured the batter’s primary, secondary, and tertiary contributions. TT_RAA is the marginal impact of the player on a team, and any rate stat that starts by using TT_BsRP should produce the same RAA figure as TT_RAA.

So let’s look at the key figures for our hitters:

Here, “RAA” is what we calculated above, based on TT_BsR/O or TT_BsR+/PA. As you can see, it does not match TT_RAA. If it did, we wouldn’t need to worry about a special set of rate stats for the TT framework – we could just piggyback on the same methodology used for linear weights. It’s clear that we need to apply something different to TT_BsRP when we use the full-blown theoretical team approach including PAR. I would also argue that this is evidence that the theoretical team approach should not be viewed simply as an alternative path to using linear weights, but rather a third unique framework for evaluating a batter’s contribution.

I would now go through a tortured explanation of the logic behind finding a denominator that achieves the desired result, but there is no need – David Smyth developed the answer and explained it clearly and concisely in a June 21, 2001 FanHome post:

On the subject of a rate stat such as R+/PA or R+/O, etc....

The whole idea behind this method is to compute the impact of a player on a theoretical team. On the team level (even a theoretical one), impact is in terms of runs and outs. The R+ generated by the procedure on this thread [NOTE: R+ is theoretically equivalent to what I am calling TT_BsRP] tells us the difference in runs between the theoretical team without the indicated batter (that is, a team of 8 average hitters), and the theoretical team with the indicated batter added. We can also compute the difference in the OUTS between these two teams, and use that total as the rate stat denominator. Call it O+.

And it's easy to calculate. You simply multiply the batter PA by the out percentage (1-OBA) of the reference team...It seems to me that the preferred rate stat for the R+ framework is not R+/PA nor R+/O; it's R+/O+.

Frank Thomas had 149.9 TT_BsRP in 508 PA. The reference team had a .3433 OBA (same as the league average). Thus, his TT_BsRP/O+ (a specific implementation of Smyth’s R+/O+) would be:

O+ = PA*(1 – LgOBA) = 508*(1 - .3433) = 333.6

TT_BsRP/O+ = 149.9/333.6 = .4494

What does this represent? It is not easy to explain it in a sentence, but I will try: Frank Thomas contributed .4494 runs to the theoretical team’s offense for each of its outs distributed to him. When Smyth said that PA*(1 – RefOBA) was equal to the difference in the team’s outs, he was referring to the theoretical team construct, in which the difference in team plate appearances is the batter’s own PA, since in the left hand term (T_BsR) the team has 9*PA, and in the right hand term (R_BsR), the team has 8*PA. The difference in outs is this difference (1*PA) times the team’s out rate.

In this sense, O+ can be thought of in terms of “freezing team outs”. We know that for a team (excluding the list of complicating circumstances like rainouts, foregone batting in the bottom of the ninth, and walkoffs), outs are a fixed quantity. Regardless of whether we add Frank Thomas or Matt Walbeck to a reference team, the team’s outs are fixed. In the TT/PAR approach, we start by capturing the secondary contribution of the batter in question through the PAR multiplier, but we never directly change the number of PA we start from. Thus, Thomas’ 508 PA and Walbeck’s 355 PA will turn into outs at the same rate for the purpose of the calculation. In reality, Thomas’ team will compile more PA thanks to his greater secondary contribution, but the equation handles this with a multiplier, freezing the original value.

You may not find that explanation convincing – I am struggling to articulate a concept that I may be fooling myself into believing I understand. You might be more convinced by the demonstration of what Thomas’ RAA is under this approach:

TT_RAA = (TT_BsRP/O+ - LgR/O)*O+

which for Thomas = (.4494 - .2075)*333.6 = 80.7 which is the same as his TT_RAA
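The O+ route for Thomas can be sketched as below. The helper name is hypothetical; since 149.9 is itself a rounded figure, the computed R+/O+ agrees with .4494 only to within rounding.

```python
def o_plus(pa: float, lg_oba: float) -> float:
    """Smyth's O+: the batter's PA times the reference team's out rate."""
    return pa * (1 - lg_oba)

pa, tt_bsrp = 508, 149.9             # Frank Thomas, 1994 AL
lg_oba, lg_r_per_o = 0.3433, 0.2075  # reference team = league average

op = o_plus(pa, lg_oba)
r_plus_per_o_plus = tt_bsrp / op
tt_raa = (r_plus_per_o_plus - lg_r_per_o) * op
print(f"O+ {op:.1f}, R+/O+ {r_plus_per_o_plus:.4f}, TT_RAA {tt_raa:.1f}")
```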

In fact, I jumped the gun by calling it TT_RAA before I proved it was equal to what we previously called TT_RAA. It is in fact:
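The equivalence follows from a few lines of algebra. This sketch assumes, as the 1.090 and 1.226 coefficients for the 1994 AL imply, that the reference team scores at the league R/PA, so that R_BsR = 8*PA*LgR/PA and TT_RAA = T_BsR*PAR - 9*PA*LgR/PA. It also uses LgR/O*(1 - LgOBA) = LgR/PA, which follows from PA/G = (O/G)/(1 - OBA):

```latex
\begin{aligned}
\left(\frac{\text{TT\_BsRP}}{O^{+}} - \text{LgR/O}\right) O^{+}
  &= \text{TT\_BsRP} - \text{LgR/O}\cdot PA\,(1 - \text{LgOBA}) \\
  &= \text{TT\_BsRP} - \text{LgR/PA}\cdot PA \\
  &= \left(\text{T\_BsR}\cdot\text{PAR} - 8\,PA\cdot\text{LgR/PA}\right) - PA\cdot\text{LgR/PA} \\
  &= \text{T\_BsR}\cdot\text{PAR} - 9\,PA\cdot\text{LgR/PA} = \text{TT\_RAA}
\end{aligned}
```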

This leaves unanswered the question of what the rate stat using TT_RAA should be. Similar to how we had R+/PA and RAA/PA when working with linear weights, we should have an appropriate rate stat for the RAA figure. By manipulating the TT_RAA equation, we can see that TT_RAA/O+ should be consistent with R+/O+:

TT_RAA = (R+/O+ - LgR/O)*O+

so divide both sides by O+ to get:

TT_RAA/O+ = R+/O+ - LgR/O

Thus TT_RAA/O+ and R+/O+ are analogous to RAA/PA and R+/PA, except that the difference is LgR/O rather than LgR/PA.

So we’re done, right? Or is there something vaguely unsatisfying about all this? After an entire series in which I argued that R/O was a team measure and not an individual one, does it bother you that we have left our rate stat for an individual in the form of R/O?

On one hand, it absolutely shouldn’t – R/O is the correct choice of rate stat for a team, and in this case we have modeled the player’s impact on the team and its R/O, and so there’s nothing wrong with expressing a final result in terms of R/O. The problem was not with R/O per se – it was with the inputs that we were putting in for individual batters. Additionally, R+/O+ allows our team and individual rate stats to converge. Team R+/O+ will reduce to team R/O if we accept the premise that the proper “reference team” for a team is itself, and that its R+ is equal to its actual runs scored, since actual runs scored obviously accounts for the team’s primary, secondary, and tertiary offensive actions. Not to mention any quaternary actions we could dream up, or the fact that using those terms doesn’t even make sense when talking about teams.

On the other hand, O+ is not in any way an intuitive metric – just re-read my tortured explanation of what it represents. A batter’s R+/O+ can be contextualized in any number of meaningful ways, just as regular old R/O can, but I’m not sure even those presentations (à la Runs Created/25.2 Outs) are truly relatable to a player’s performance except as a scaling device.

There is a very simple and equivalent alternative, though, and you may have already noticed what it might be from the formulas. We defined O+ as PA*(1 – LgOBA), which is really just plate appearances times a constant. If we just divide by that constant (1 – LgOBA), we can restate everything on the basis of PA, with no loss of ratio comparability.

Doing this, we now have:

(TT_BsRP/PA – LgR/PA)*PA = TT_RAA

and TT_BsRP/PA = R+/O+ * (1 – LgOBA)

R+/PA+ = TT_BsRP/PA

I am going to call TT_BsRP/PA “R+/PA+” even though PA+ is just equal to PA. I’m doing this primarily for ease of discussion, so that R+/PA+ will represent the theoretical team calculation, while R+/PA will represent the linear weights equivalent – and they are equivalent (which also means that R+/O+ could be applied to the linear weights framework – more on this in a later installment). My other justification is that in the theoretical team framework, the actual number of plate appearances we plug in is not of importance when dealing with a rate stat, as we are always defining the reference team’s PA to be eight times the batter’s PA. We use the batter’s PA because it allows estimates like TT_BsR to reflect his actual playing time, but if all we care about is the rate at the end, we could use 1 plate appearance or 650 or any number we wanted. Thus I prefer to think about this quantity of plate appearances as the player’s share of the reference team’s PA rather than really being tied to his own, and feel justified in distinguishing it through the abbreviation PA+.

This approach to developing a rate stat for the TT framework can be thought of as “freezing plate appearances” as compared to the “freezing outs” approach of R+/O+. I first became aware of it through a FanHome post in 2007 by David Smyth (surprise). By that time it was over five years since Smyth had published the R+/O+ methodology and I had adopted it myself, so by the time we discussed what I am calling R+/PA+ I was in the interesting position of advocating an older Smyth construct against a newer one. We quickly verified their equivalence and left it there, which is the position I’m taking now.

By definition, R+/O+ and R+/PA+ are perfectly correlated since the only difference between the two is multiplying by a constant. One allows us to express a rate stat in terms of R/O, which is consistent with how we would state a team rate stat; the other allows us to express it in terms of R/PA, which is how we would state a rate stat from the linear weights framework. Both can be compared using ratios, which will be equivalent; both can be compared using differences, with the question of what the denominator for that difference should be up to the user. Both denominators can be used with RAA as well for RAA+/O+ or RAA+/PA+ (using RAA+ to refer to RAA calculated within the full-blown theoretical team framework), and since R+/O+ and R+/PA+ can both be used to calculate RAA, they will also be consistent with RAA+ per O+ or PA+.

I’ll close with a sample calculation. Frank Thomas had 149.9 TT_BsRP in 508 PA, which is .2951 R+/PA+. The league average R/PA was .1363, so Thomas’ TT_RAA was (.2951 - .1363)*508 = 80.7, same as calculated previously using R+/O+ or directly from the original TT_RAA formula. The league OBA was .3433, so Thomas’ R+/O+ is equal to .2951/(1 - .3433) = .4494 as calculated previously.
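The sample calculation can be checked by running both routes side by side. This is a minimal sketch with a hypothetical function name; it derives LgR/O from LgR/PA and LgOBA rather than using the rounded .2075, so the two RAA figures agree to floating-point precision.

```python
def tt_rate_stats(tt_bsrp: float, pa: float, lg_r_per_pa: float, lg_oba: float):
    """R+/PA+, R+/O+, and TT_RAA computed through each denominator."""
    r_plus_per_pa_plus = tt_bsrp / pa
    o_plus = pa * (1 - lg_oba)
    r_plus_per_o_plus = tt_bsrp / o_plus
    lg_r_per_o = lg_r_per_pa / (1 - lg_oba)  # R/O = (R/PA)/(1 - OBA)
    raa_via_pa = (r_plus_per_pa_plus - lg_r_per_pa) * pa
    raa_via_o = (r_plus_per_o_plus - lg_r_per_o) * o_plus
    return r_plus_per_pa_plus, r_plus_per_o_plus, raa_via_pa, raa_via_o

r_pa, r_o, raa_pa, raa_o = tt_rate_stats(149.9, 508, 0.1363, 0.3433)  # Thomas, 1994 AL
print(f"R+/PA+ {r_pa:.4f}, R+/O+ {r_o:.4f}, RAA {raa_pa:.1f} vs {raa_o:.1f}")
```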

Rate Stat Series, pt. 10: Rate Stats for the Theoretical Team Framework I

2021-08-18T08:26:00.027-04:00

In calculating TT_BsR for a batter, we have taken into account both his primary and tertiary impact on the offense, but we have neglected to address his secondary impact – that is, the value of the additional plate appearances he generates for his team by avoiding outs. There’s a relatively simple way to apply an adjustment for this using the framework for TT_BsR we’ve already developed. David Smyth called this adjustment PAR for Plate Appearance Ratio, and it is based on the same logic about how PAs are generated that we have relied on many times.

PAR is equal to the ratio of the theoretical team’s plate appearances to the plate appearances a league average team would have had. Remember that:

PA/G = (O/G)/(1 – OBA)

O/G is a constant that we set at the league level – I will call it X in the algebra that follows. We need to know the OBA of the theoretical team; since our player in question gets 1/9 of the PA and the rest of the team is assumed to be league average, this is very simply:

T_OBA = 1/9*OBA + 8/9*LgOBA

Then T_PA/G = X/(1 - (1/9*OBA + 8/9*LgOBA)) = X/(1 – 1/9*OBA - 8/9*LgOBA), while the league PA/G will be X/(1 – LgOBA). The ratio between the two will be:

(X/(1 – 1/9*OBA – 8/9*LgOBA))/(X/(1 – LgOBA)) = (1 – LgOBA)/(1 – 1/9*OBA – 8/9*LgOBA) = PAR

Since Frank Thomas had a .4921 OBA and the league average was .3433, his PAR is:

(1 - .3433)/(1 – 1/9*.4921 – 8/9*.3433) = 1.0258
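PAR reduces to a one-line function once O/G cancels. This is a minimal sketch; the function name is hypothetical.

```python
def plate_appearance_ratio(oba: float, lg_oba: float) -> float:
    """PAR: theoretical team PA relative to a league-average team.
    O/G cancels, since PA/G = (O/G)/(1 - OBA) for both teams."""
    t_oba = oba / 9 + 8 * lg_oba / 9  # batter gets 1/9 of the PA
    return (1 - lg_oba) / (1 - t_oba)

thomas_par = plate_appearance_ratio(0.4921, 0.3433)  # Frank Thomas, 1994 AL
print(f"PAR = {thomas_par:.4f}")
```

A league-average hitter gets a PAR of 1, as he should.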

This means that a theoretical team on which the Big Hurt had an equal share of the PA would end up generating 2.58% more PA than a league average team.

In order to take Thomas’ secondary contribution into account, we can return to the definitions from the last installment and calculate:

TT_BsRP = T_BsR*PAR – R_BsR

PAR is only applied to T_BsR (the base runs estimate for the theoretical team with Thomas) because the reference team, filled with league average players, will continue to have the same number of PA as before (which we’ve set to equal eight times Thomas’ PA). Filling in those terms for the 1994 AL, the formula is:

TT_BsRP = ((A + 2.514PA)*(B + 2.722PA)/(B + C + 7.975PA) + D + .232PA)*PAR – 1.090PA

Note that we can no longer combine the D term from T_BsR with the R_BsR term as the former also needs to be inflated by PAR (Thomas’ teammates will hit more homers in those extra 2.58% PA they now enjoy).

Applying PAR increases Thomas’ TT_BsR from 132.2 to 149.9, a significant increase. This figure is more comparable to his wRC (147.1) than to other runs created estimates we’ve examined, as it’s already taken into account the value of his secondary contributions.

You may note that there is the potential of some circularity here, as we are using Thomas’ actual PA as the starting point, but Thomas’ actual PA already inherently include his real secondary contribution to the 1994 White Sox. That is to say that some of the 508 PA that Thomas actually recorded were made possible by his own generation of PA for that team. This is a good argument for using a theoretical number of PA for Thomas rather than his actual PA. Thomas recorded 508 of Chicago’s 4439 PA, or 11.44%. So we could instead use 11.44% of the league average team PA total (4366.9), in which case he would have 499.7 restated PA to plug into the Theoretical Team methodology (this is ignoring that his contribution to the White Sox also had an impact on the league average PA). Of course, in so doing we would also have to proportionally scale back his portion of the T_A, T_B, T_C, and T_D components by 499.7/508.
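The restated-PA arithmetic is short enough to show in full. This is a sketch; `scale` is a hypothetical name for the factor that would shrink his A, B, C, and D contributions.

```python
# Thomas' share of the 1994 White Sox PA, applied to the league-average team PA total
share = 508 / 4439
restated_pa = share * 4366.9
scale = restated_pa / 508  # factor applied to his A, B, C, and D components
print(f"share {share:.2%}, restated PA {restated_pa:.1f}, scale {scale:.4f}")
```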

On the other hand, the secondary contribution of a batter through generating PA is in the background of the linear framework as well (and any other framework that considers his actual PA); it’s just that the connection leaps to the mind more quickly when modeling the other aspects of a theoretical team. I’m going to ignore this going forward, as this is after all a rate stat series, while noting that we shouldn’t ignore the fact that a batter can benefit from the additional opportunities he helps to create. The fact that the quality of his teammates influences how many opportunities he gets in the real world is at some level unavoidable.

At this point, we should also express Thomas’ contribution in terms of RAA. This is a simple modification; instead of setting R_BsR equal to the league average BsR/PA times 8 times the player’s PA, we just need to multiply by 9 times the player’s PA, so that the lineup isn’t magically shortened and we instead compare T_BsR to what a team would score with an average player in our man’s place. I did not bother running this before introducing PAR, because if there’s one thing we’ve learned from this series, it is that it doesn’t make a lot of sense to talk about batter RAA without taking his out rate into account. So with PAR, for the 1994 AL we have:

TT_RAA = ((A + 2.514PA)*(B + 2.722PA)/(B + C + 7.975PA) + D + .232PA)*PAR – 1.226PA
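A sketch of the TT_RAA calculation follows, assuming PAR has already been computed elsewhere and is simply passed in (its derivation is not repeated here). The constants are the 1994 AL per-PA factors used in this series; rather than hard-coding 2.514, 2.722, 7.975, .232, and 1.226, the sketch derives them from the league rates, so tiny rounding differences are possible. The function names are hypothetical.

```python
# 1994 AL league averages of the Base Runs factors per PA (from the text)
LG_A, LG_B, LG_C, LG_D = 0.3143, 0.3402, 0.6567, 0.0290


def lg_bsr_per_pa() -> float:
    """League Base Runs per PA from the per-PA factors."""
    return LG_A * LG_B / (LG_B + LG_C) + LG_D


def tt_raa(a: float, b: float, c: float, d: float, pa: float, par: float) -> float:
    """Theoretical-team RAA: player plus eight league-average teammates,
    team BsR inflated by PAR, less nine average players' worth of runs."""
    t_a = a + 8 * LG_A * pa
    t_b = b + 8 * LG_B * pa
    t_c = c + 8 * LG_C * pa
    t_d = d + 8 * LG_D * pa
    team_bsr = t_a * t_b / (t_b + t_c) + t_d
    return team_bsr * par - 9 * lg_bsr_per_pa() * pa
```

As a sanity check, a league-average batter with PAR = 1 should come out at zero RAA, and the baseline coefficient 9 × Lg(BsR/PA) should round to the 1.226 in the formula above.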

We now have three possible theoretical team approaches, and have yet to address the question of this series: what should the rate form be? The guiding principle of this series has been that the properties of the numerator (usually a run estimate) should be logically consistent with the choice of denominator, so we should consider each of the three theoretical team approaches separately.

First is TT_BsR, which is just an estimate of the batter’s impact on the team runs scored, taking into account primary and tertiary (but not secondary) impacts. It is akin to LW_RC, with the key difference being that LW_RC does not attempt to value the batter’s tertiary impact. However, I contend that incorporating tertiary contributions does not alter the considerations when developing a rate stat. The tertiary effect is how the batter’s performance changes the underlying run environment of the team, independently of the change in plate appearances. What we are left with is an estimate of the contribution the batter made in his actual plate appearances; the only difference is that we recognize that those outcomes influenced the value of all of the other offensive events recorded by the team.

So our choices for a rate stat are the same as those for LW_RC. We can first calculate RAA (using R/O) and then take RAA/PA, or we can calculate how many additional PA the batter generated/outs he avoided, add their run value to his TT_BsR, and divide by PA (the R+/PA approach). These approaches will be equivalent if we add LgR/PA back to RAA/PA; we could convert to wOBA, we could calculate wRC along the way... all the same options apply.

The math will be the same as shown in parts 7 and 8, except we will substitute TT_BsR for LW_RC everywhere it pops up. Here is a leaderboard for some of the key metrics using LW_RC (we’ve seen this all before):

Now the same metrics, except substituting TT_BsR for LW_RC in all calculations:

This was a lot of work to get largely the same results. Maybe applying PAR will make things more interesting?

Rate Stat Series, pt. 9: Theoretical Teams

2021-08-04T08:13:00.025-04:00

We now depart the orderly, neat world of linear weights for the frontiers of offensive evaluation/rate stat development. Allow me to posit that there are three ways in which a batter impacts his team’s run scoring:

1. Through the direct, immediate consequences of his actions (e.g. he draws a walk or flies out). We could call this his primary contribution.

2. Through how those results create or fail to create additional opportunities for his teammates to bat (what I have been calling PA generation). We could call this his secondary contribution (I do so with some reservations because I do like secondary average, which uses “primary” to refer to direct contributions captured by batting average and “secondary” to refer to other direct contributions like extra bases on hits, walks, and steals).

3. Through how his impact on the team alters the value of the actions of his teammates. This tertiary effect is hard to define, but we know that the run value of any offensive event is dependent on the context in which it occurs. A walk does no good if no one else in the lineup gets on base; each out is more costly in terms of runs in a higher scoring environment. Dynamic run estimators vary the value of each event based on the frequencies of all offensive events, while linear weights keep them fixed.

I listed and labeled these three elements of offensive production in the order of their magnitude; the third is very small, small enough that it is often ignored. Crucially for this discussion, it is small enough that if we are not careful, in attempting to measure it we could introduce enough unintended distortion into the evaluation of #1 and #2 to make the exercise not just a waste of time, but actively harmful to our understanding.

So far in this series, we have looked at individual offense through two frameworks. Treating the player as a team by plugging his stats directly into a dynamic run estimator, we have captured (but distorted) #1 and #2 and given excessive weight to #3 by pretending that the 8 teammates all perform at the level of the individual in question. By using linear weights, we have treated the player as if he were part of a semi-static environment where his direct actions and PA generation have an impact on his team, but where no matter how he performs, he has no impact on the offensive environment in which the other eight batters perform.

I believe that a third framework, which captures the impact of all three ways in which a batter affects team runs scored, is theoretically superior to the other approaches. This will involve modeling a team with and without our player – constructing a “theoretical team” in which eight members of the lineup perform at a given level and our player occupies one lineup spot. However, there are cautions, which I alluded to above:

1. The math becomes more complicated. As long as increased complexity corresponds to a more sound approach from a theoretical perspective, this is not objectionable to me, but that’s a minority viewpoint.

2. The impact of #3 is very small relative to #1 and #2, and is arguably negligible, especially when we consider all of the error bars that exist around run estimation, park factors, positional adjustments, and the myriad other variables which will come into play when the estimates are put to full use as part of an overall player evaluation system.

3. If the model which you use to implement this framework is poor, the distortions created when compared to a linear weights framework will swamp the attempt to measure the minuscule impact of #3. Even if your model is good (and I will be using Base Runs, which I am quite confident is a good model), the linear weights framework is so robust that there is still some risk in abandoning it to chase very small effects.

My original series on rate stats failed on this count, as I begged the question by assuming that a theoretical team approach was correct and using that as one of the testing criteria for other metrics. Again, I believe that the framework is theoretically correct, but the implementation is trickier, and I am not so arrogant today as to believe that the model and my implementation are unquestionably superior to using a linear weights framework. To return to a bad and wildly overwrought nautical metaphor, linear weights provide a safe harbor with calm waters in which it is tempting to stay rather than venture onto the high seas, where theoretical team frameworks tempt with the promise of riches but tempests and other dangers lurk.

Before starting, I want to note a handful of people who made significant contributions to the theoretical team concept. One is David Tate, who developed Marginal Lineup Value, which used the framework of basic runs created in conjunction with a theoretical team. Keith Woolner refined and popularized MLV. In 1998, Bill James published the approach that I will use here, although of course he used runs created. Published a year later, Jim Furtado’s Extrapolated Wins methodology used a linear run estimator (his XR) but fleshed out theoretical team concepts with respect to win impact and replacement level. Furtado also, along with G. Jay Walker and Don Malcolm, took apart James' theoretical team RC to understand what was going on behind the scenes. Finally, David Smyth, the developer of Base Runs, was the first to apply a TT construct to BsR and also developed the PAR adjustment which we’ll get to eventually.

Finally, before diving into the specific implementation of TT I will use in this series, I want to note that by “theoretical team”, I am referring only to constructs that explicitly attempt to place the player on a theoretical/”reference” team, and use a dynamic run estimator to estimate his run impact. It does not refer to other approaches that may be undertaken to apply a dynamic run estimator to an individual hitter. One such example is the technique, so far as I know first used by Dick Cramer with his runs created-like run estimator, of calculating a batter’s runs created as the difference between the league with his stats and the league without them. This is a clever approach for using a dynamic run estimator in evaluating individuals, but not a TT approach. In fact, it more closely resembles the approach we used in this series to develop linear weights from Base Runs. The larger you make the pool to which the player is added, the more you dilute his impact. The differentiation approach takes this to the limit (see what I did there?) by isolating each event and calculating its value if it had no impact at all on the offensive environment.

In contrast, a TT approach uses a realistic scale between the individual and team; a typical approach is to assume that the individual gets 1/9 of team plate appearances. Using a 1/8 ratio between player and reference team does not require us to believe that the player actually had 1/9 of his team’s PA in the real world. One could use a player’s actual percentage of team PA and weight accordingly, but there is a balancing act: on one hand, we want to accurately capture the degree to which the batter impacted the team, but we also don’t want to lose sight of where his impact is actually felt. Consider a batter who plays in just one game in the season, getting four plate appearances. If you use his actual percentage of team PA (which might be something like 0.05%) to calculate his impact on the team, he will have had essentially no tertiary effect. That is a distortion of reality, though; he really had something closer to 11.1% of the team’s PA in the game in which he actually played. From the perspective of evaluating his impact on the team, the other 161 games are an accounting fiction, no more relevant to him than games between other teams played thirty years prior (in fact, we should acknowledge that runs are actually scored at the inning level, which is where we started working out the math on PA generation).

So we will assume that the reference team always has eight times as many plate appearances as the player in question (which of course is equivalent to saying the player gets 1/9 of team PA). We could get cute and recognize that based on a player’s batting order position, his expected share of PA will change, and give different players a different share (while still limiting the scope to games/innings in which the batter actually played), but 1/9 is clean and any alternative approach would leave most batters pretty close to 1/9. The concept is simple; the formula will look a little messy. We start with our Base Runs equation:

A = H + W – HR = S + D + T + W

B = (2TB - H – 4HR + .05W)*.79776 = .7978S + 2.3933D + 3.9888T + 2.3933HR + .0399W

C = AB – H = Outs

D = HR

BsR = (A*B)/(B + C) + D
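The Base Runs components above translate directly into code. This is a sketch using the series’ simplified event categories (singles, doubles, triples, home runs, walks, outs); the function and argument names are my own, not part of any published implementation.

```python
def base_runs(s: float, d: float, t: float, hr: float, w: float, outs: float):
    """Return the A, B, C, D factors and BsR for a batting line."""
    h = s + d + t + hr
    tb = s + 2 * d + 3 * t + 4 * hr
    a = h + w - hr                                   # = S + D + T + W
    b = (2 * tb - h - 4 * hr + 0.05 * w) * 0.79776
    c = outs                                         # AB - H
    d_factor = hr                                    # the D factor is home runs
    bsr = a * b / (b + c) + d_factor
    return a, b, c, d_factor, bsr
```

A line consisting of one home run and three outs should produce exactly one run, since A is zero and only the D factor contributes; a lone double or triple should reproduce the 2.3933 and 3.9888 B coefficients.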

We will start by calculating the team’s runs with the player. This will take the same form, but now our A, B, C, and D components will start with the player’s stats and add eight reference players. I will assume that the reference player is a league average performer, and thus the reference team is a league average team prior to the addition of our player. One could make the case that with respect to the tertiary effect of a player, a linear weights framework sidesteps the issue by assuming an inverse relationship between the quality of the player in question and the quality of the reference team. That is, by using static linear weights for all players regardless of their performance, a linear weights framework implicitly assumes that the team is average after the player is added. Thus Frank Thomas is added to a worse team than Matt Walbeck, such that at the end of the day the run values of all events are the same between the Thomas team and the Walbeck team.

If you are tempted to sweat the details and subtract the player’s stats from the league before determining league average, don’t. It is actually surprising how little impact the choice of reference team has on the outcome (which a cynic might note is a reason for suspecting that the tertiary effect is de minimis, but what’s the fun in that?). This is why James is able to get away with converting the player’s A, B, and C factors in Runs Created (for which he laid out 24 different versions to cover major league history) to TT RC using a single equation. It’s not technically correct, of course, but as long as the reference team is within a reasonable range of major league offense, it’s not debilitating.

Without our player, the reference team will have a number of plate appearances equal to eight times the individual’s PA, and will perform at the league average, so we can define each factor for the reference team as follows, with the calculation using the 1994 AL averages shown:

R_A = Lg(A/PA)*PA*8 = .3143*PA*8 = 2.514PA

R_B = Lg(B/PA)*PA*8 = .3402*PA*8 = 2.722PA

R_C = Lg(C/PA)*PA*8 = .6567*PA*8 = 5.254PA

R_D = Lg(D/PA)*PA*8 = .0290*PA*8 = .232PA

Then for the team with the player, the team versions of the A, B, C, and D factors are just the player’s factor plus eight times his PA times the league average of the factor/PA:

T_A = A + R_A = A + 2.514PA

T_B = B + R_B = B + 2.722PA

T_C = C + R_C = C + 5.254PA

T_D = D + R_D = D + .232PA

In order to isolate the individual’s impact, we need to calculate how many runs his new theoretical team would score and subtract the runs that the reference team would have scored with just eight reference players. The team’s BsR will be:

T_BsR = T_A*T_B/(T_B + T_C) + T_D

Some of the PA terms in the denominator can be combined, so for the 1994 AL we get:

T_BsR = (A + 2.514PA)*(B + 2.722PA)/(B + C + 7.975PA) + D + .232PA

The reference team’s runs scored will be equal to the league average BsR/PA times 8 times the player’s PA; to calculate league BsR/PA we can just plug the league average A, B, C, and D factors per PA into the BsR equation to get BsR/PA, then multiply:

R_BsR = (.3143*.3402/(.3402 + .6567) + .0290)*8*PA = 1.090PA
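The league BsR/PA figure and the resulting reference-team coefficient can be reproduced from the four per-PA factors. A minimal sketch; note that under the series’ simplified definitions A, C, and D partition plate appearances, so their per-PA rates should sum to one, and A + D (the on-base events per PA) recovers the .3433 league OBA.

```python
lg = {"A": 0.3143, "B": 0.3402, "C": 0.6567, "D": 0.0290}  # 1994 AL, per PA

# A, C, and D partition plate appearances under the simplified definitions
assert abs(lg["A"] + lg["C"] + lg["D"] - 1.0) < 1e-9

lg_bsr_per_pa = lg["A"] * lg["B"] / (lg["B"] + lg["C"]) + lg["D"]
r_bsr_coef = 8 * lg_bsr_per_pa        # reference team runs = r_bsr_coef * PA
lg_oba = lg["A"] + lg["D"]            # on-base events per PA

print(f"Lg BsR/PA = {lg_bsr_per_pa:.4f}, R_BsR = {r_bsr_coef:.3f}*PA, Lg OBA = {lg_oba:.4f}")
```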

So our estimate of the individual’s run contribution to the theoretical team, which we’ll call Theoretical Team Base Runs (TT_BsR) is just the difference:

TT_BsR = T_BsR – R_BsR

Since we have PA in each term, for the 1994 AL it simplifies to:

TT_BsR = (A + 2.514PA)*(B + 2.722PA)/(B + C + 7.975PA) + D - .8579PA
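Putting the pieces together, TT_BsR is the theoretical team’s Base Runs less the reference team’s. The sketch below works under the same 1994 AL assumptions (the function name is hypothetical) and also checks the collapsed constant: .232 – 1.090 comes to roughly –.858 per PA when derived from the rounded league rates.

```python
LG_A, LG_B, LG_C, LG_D = 0.3143, 0.3402, 0.6567, 0.0290  # 1994 AL per-PA factors
LG_BSR_PA = LG_A * LG_B / (LG_B + LG_C) + LG_D


def tt_bsr(a: float, b: float, c: float, d: float, pa: float) -> float:
    """Player's runs on a theoretical team with eight league-average teammates."""
    t_a, t_b = a + 8 * LG_A * pa, b + 8 * LG_B * pa
    t_c, t_d = c + 8 * LG_C * pa, d + 8 * LG_D * pa
    t_bsr = t_a * t_b / (t_b + t_c) + t_d
    r_bsr = 8 * LG_BSR_PA * pa                      # reference team alone
    return t_bsr - r_bsr


# Combined constant that multiplies PA outside the fraction
pa_constant = 8 * LG_D - 8 * LG_BSR_PA
print(round(pa_constant, 3))
```

A league-average batter, whose factors are just the league rates times his PA, should be credited with exactly the league average BsR/PA times his PA, which is a useful check that the reference team is set up correctly.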

If we apply Frank Thomas’ statistics directly to Base Runs, we get an estimate of 139.0. If we use Base Runs to estimate linear weight coefficients for the league, we get 131.4 (what we’ve been calling LW_RC in this series). If we use the TT approach, we get 132.2. As you can see, the TT estimate is not that much different than the full linear estimate, which does call into question the need for the TT approach. After all, Thomas is one of the most extreme hitters in the league; if he barely moves the needle, who will?

Regardless of its practical utility, I find this approach valuable as an intellectual exercise, because I believe the framework comes closest to approximating the real relationship between an individual batter and team performance. For a series ostensibly about rate stats, I’ve spent an entire post just setting up the numerator; don’t rate stats typically have a denominator as well? Seriously, though, if there’s one takeaway I would like a reader to glean from this series, it is that if you want to set up an offensive evaluation system, you need to think through all of the pieces as you develop it. Starting with a run estimator, and then slapping on a rate stat, and a baseline, and whatever bells and whistles you need, is not a sound approach. The choice of run estimator determines which denominator you should use, and the two should be compatible.

Rate Stat Series, pt. 8: Rate Stats for Linear Weights III

2021-07-21T08:23:00.035-04:00

This installment will tie up a few miscellaneous loose threads pertaining to rate stats to be used within a linear weights framework. The first relates to R+/PA, which I demonstrated is a “correct” rate stat to be used in this framework; I prefer it to RAA/PA, which produces identical linear differences between players, because it can be meaningfully compared on a ratio basis as well.

One issue with R+/PA for users is that R/PA is not a quantity that most people have an intuitive feel for. We could still use RAA/650 PA, or we could use R+/650 PA, or we could convert R+/PA to a “team game basis” (along the lines of the previously discussed runs/25.2 or runs/27 outs) by multiplying by the league PA/G, or a long-term average thereof. As long as it’s a scalar multiplier consistently applied, there’s no real distortion imposed on the results; it’s just a matter of user preference as to which scale should be used.

A thornier issue is the implications of RAA/PA or R+/PA for teams. I have encountered sabermetricians for whom I have a great deal of respect whose ideal rate stat would be equally applicable to individuals and teams. I do not think - for reasons that were discussed earlier in this series - that this is possible. Of everything I’ve asserted in this series, I’m most confident that R/O is the proper rate stat for teams. Applying it to the normal range of major league players causes distortions, which while not disastrous are too frequent for me to just ignore.

Let’s look at all of the teams in the 1994 AL through these rate stats. For this exercise, we can greatly simplify the calculation of R+/PA – we don’t need to estimate extra PA, we know exactly how many PA each team has generated. We can also calculate RAA simply by using R/O. While we could always look at the number of runs a given run estimator produces for a team, it is also cleaner to just use actual runs scored:

In this case, the rank order using R/O and R+/PA is the same. This does not have to be the case, and I’m a little disappointed it worked out this way, as it would have been nice to have a real-life example from this league-season. Again, the fundamental reason why plate appearances are not the correct rate stat denominator for teams is that team plate appearances are solely a function of team OBA, so it is of no consequence how many plate appearances a team needs to produce the same R/O rate as another team. An offense that produced four walks and three outs every inning would score 1/3 R/O, the same as a team that produced one home run and three outs every inning, but the former would score .143 R/PA (one run in seven PA) and the latter .250. In a league where the average team scored 1/6 R/O and .123 R/PA, each would get .5 RAA per inning, but the former would have a R+/PA of .194 and the latter .248.
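The two hypothetical offenses can be run through the numbers directly. This sketch treats one inning as the unit, with the league rates (1/6 R/O, .123 R/PA) taken from the example; the function name is my own. It shows identical R/O and RAA coexisting with different R/PA and R+/PA.

```python
def team_rates(runs, outs, pa, lg_r_per_o, lg_r_per_pa):
    """R/O, R/PA, RAA (vs. league R/O), and R+/PA for a team."""
    raa = runs - lg_r_per_o * outs
    r_plus_per_pa = (raa + lg_r_per_pa * pa) / pa
    return runs / outs, runs / pa, raa, r_plus_per_pa


LG_R_PER_O, LG_R_PER_PA = 1 / 6, 0.123

# Four walks (the fourth forces in a run) plus three outs: 1 run, 7 PA per inning
walk_team = team_rates(1, 3, 7, LG_R_PER_O, LG_R_PER_PA)
# One home run plus three outs: 1 run, 4 PA per inning
hr_team = team_rates(1, 3, 4, LG_R_PER_O, LG_R_PER_PA)

for name, (r_o, r_pa, raa, rplus) in (("walks", walk_team), ("homer", hr_team)):
    print(f"{name}: R/O={r_o:.3f} R/PA={r_pa:.3f} RAA={raa:.2f} R+/PA={rplus:.3f}")
```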

To make the point again, if we use actual team runs scored or an estimate of team runs created as the starting point for a team rate stat, there is no need to attempt to measure the impact of their PA generation. In fact, making an additional adjustment would be inappropriate double-counting. The team’s totals already incorporate this impact, as the team’s actual plate appearances were a function of their on base average.

What if someone decided they wanted to construct a rate stat that didn’t include any subtraction in the numerator, and instead would just be a positive quality per plate appearance? If you follow this path, you will no longer have a final result expressed in a unit of runs, but perhaps due to your desire for a different scale this is preferable. Or maybe you want a rate stat that can never assume a negative value, as any stat that has a negative linear weight coefficient for an event has the potential to do; as we’ve discussed, even our absolute runs variant of linear weights will produce negative values for pitcher-level hitters. Maybe you want to make one of these alterations and then impose a different scale for some reason.

One way you could do this is to eliminate outs from the LW_RAA/PA equation. If you just ignore them, you will distort all of the values, and be left with a rate stat that is not only unitless, but also is just plain wrong. However, recognizing that outs are simply the complement of the events with the positive coefficients (since we’ve restricted our list of events to mutually exclusive and exhaustive components of plate appearances given our simplified definitions of terms for this series), we could rectify this by just subtracting the absolute run out value from each positive event. To put this in practice, start with our LW_RAA/PA equation:

LW_RAA/PA = (.5069S + .8382D + 1.1695T + 1.4970HR + .3495W - .3150(outs))/PA

Now we will subtract -.3150 from each of the events in the numerator that comprise the complement of outs (again, in this case this is all of the other variables in the numerator) to get a rate stat that for the moment I will call U (for “unitless”, although it’s really just for ease of reference) :

U = (.8219S + 1.1533D + 1.4846T + 1.8120HR + .6646W)/PA
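The construction of U, and the identity U – LgU = LW_RAA/PA that the following paragraphs derive, can be checked numerically. This is a sketch: the dictionary keys and function names are my own, and the batting line is hypothetical, used only for illustration. Note that the home run weight works out to 1.4970 + .3150 = 1.8120.

```python
LW_RAA = {"S": 0.5069, "D": 0.8382, "T": 1.1695, "HR": 1.4970, "W": 0.3495}
OUT_RAA = -0.3150                        # RAA out coefficient, 1994 AL

# U weights: each positive event's weight minus the (negative) out weight
U_WEIGHTS = {k: v - OUT_RAA for k, v in LW_RAA.items()}


def lw_raa_per_pa(line: dict) -> float:
    pa = sum(line.values())              # events are mutually exclusive and exhaustive
    runs = sum(LW_RAA[k] * line[k] for k in LW_RAA) + OUT_RAA * line["outs"]
    return runs / pa


def u_stat(line: dict) -> float:
    pa = sum(line.values())
    return sum(U_WEIGHTS[k] * line[k] for k in U_WEIGHTS) / pa


# Hypothetical batting line, for illustration only
line = {"S": 90, "D": 30, "T": 2, "HR": 38, "W": 110, "outs": 240}
lg_u = -OUT_RAA                          # league U equals minus the out coefficient
print(round(u_stat(line) - lg_u, 6), round(lw_raa_per_pa(line), 6))
```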

This looks weird, since these weights are no longer in units of R or even R+, but what value does it take on for the league average? By definition, LW_RAA/PA for the league is zero, so this resolves to zero for the league:

LW_RAA/PA = (.5069S + .8382D + 1.1695T + 1.4970HR + .3495W - .3150(outs))/PA

The manipulation we did to get U was just to subtract -.3150*PA from everything, so:

(.5069S + .8382D + 1.1695T + 1.4970HR + .3495W - .3150(outs) - (-.3150*PA) )/PA = (0 + .3150*PA)/PA = .3150

So the league average of U is by definition equal to the negative of the outs coefficient (.3150 for the 1994 AL). So what happens if we compare a player’s U to the league average?

(.8219S + 1.1533D + 1.4846T + 1.8120HR + .6646W)/PA – LgU

multiply LgU by PA/PA to get

(.8219S + 1.1533D + 1.4846T + 1.8120HR + .6646W)/PA – (LgU*PA)/PA

= (.8219S + 1.1533D + 1.4846T + 1.8120HR + .6646W – LgU*PA)/PA

plug back in that LgU = .3150 in our case to get:

(.8219S + 1.1533D + 1.4846T + 1.8120HR + .6646W – .3150PA)/PA

If we distribute the -.3150 to all of the events already in the numerator that are plate appearances, then the leftover PA are outs, and so we are left with:

(.5069S + .8382D + 1.1695T + 1.4970HR + .3495W - .3150(outs))/PA

So U - LgU = LW_RAA/PA. Even though the coefficients have been manipulated into unitless, non-negative values, U maintains the necessary condition to be a proper rate stat for a linear weight framework because it produces the correct RAA/PA.

Why would someone want to express an equivalent to RAA/PA in these terms? This is not a question that I am qualified to answer, since I don’t personally use this form, but there’s a decent chance that you, as a consumer of baseball analysis circa 2021, have been exposed to it, whether you realize it or not.

Since U is now unitless, one might wish to manipulate it in order to put it on a scale that they find useful, perhaps the scale of a fundamental statistic. If you think about what U is, the numerator includes all on-base events, each weighted by its run value minus the RAA out coefficient. The denominator is just plate appearances. There is a fundamental statistic that we have been using throughout this series that takes a similar form, except instead of weighting the on-base events, it gives them all an equal weight of one. The fundamental form is the same, though:

U = (.8219S + 1.1533D + 1.4846T + 1.8120HR + .6646W)/PA

other stat = (S + D + T + HR + W)/PA

The other stat is of course On Base Average. If one were to tweak the numerator coefficients for each event in U, it could be put on the same scale as OBA, but instead of giving each on-base event a weight of one, it would give each on-base event a weight based on its run value. It would be a weighted On Base Average. Enter Tom Tango and wOBA, which is now one of the two most ubiquitous rate stats in general sabermetric use thanks to its prominence at Fangraphs (the other being OPS+, thanks to Baseball-Reference).

In order to convert what I have called U to wOBA, we just need to multiply it by the ratio between the league OBA and U, or equivalently the ratio between the league OBA and the negative of the LW_RAA out coefficient. For the 1994 AL, the average OBA was .3433 and U is .3150, so the ratio is 1.0896 (I will define V = LgOBA/LgU just for ease of writing formulas) which produces:

wOBA = U*V = (.8219S + 1.1533D + 1.4846T + 1.8120HR + .6646W)/PA*1.0896

wOBA = (.8956S + 1.2566D + 1.6176T + 1.9744HR + .7241W)/PA
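Scaling U into wOBA is a single multiplication. This sketch uses V = 1.0896 as stated in the text and the U weights (with the home run weight taken as 1.4970 + .3150 = 1.8120); because the inputs are rounded, individual wOBA weights may differ from the published figures in the fourth decimal place. The names are hypothetical.

```python
U_WEIGHTS = {"S": 0.8219, "D": 1.1533, "T": 1.4846, "HR": 1.8120, "W": 0.6646}
V = 1.0896                               # LgOBA / LgU, as given in the text

WOBA_WEIGHTS = {k: round(u * V, 4) for k, u in U_WEIGHTS.items()}
print(WOBA_WEIGHTS)


def woba(line: dict) -> float:
    """wOBA for a line with keys S, D, T, HR, W, outs (simplified events)."""
    pa = sum(line.values())
    return sum(WOBA_WEIGHTS[k] * line[k] for k in WOBA_WEIGHTS) / pa
```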

I think there is a lack of understanding, even among engaged consumers of sabermetric analysis, as to what wOBA actually represents and how it relates to linear weights. I admit that even though I am deeply interested in anything related to run estimators or rate stats, because I don’t actively use wOBA, I have to remind myself why it works. I have fielded multiple email questions from people who obviously think much more deeply about the construction of metrics than the average consumer, and they don’t know the connection and don’t find it intuitive.

I don’t consider this the fault of Tango, who has explained this, and Dave Studeman has written a good article explaining it as well. But for whatever reason it feels like a secret despite the best efforts of the people who are responsible for it.

So I will take a second here and spell out the relationship between the wOBA family of metrics, which includes wOBA, wRC (Weighted Runs Created, which will be equivalent to what we called R+), and wRAA (Weighted RAA, which is equivalent to LW_RAA), and R+/PA and LW_RAA/PA as we’ve discussed in this series. Keep in mind that I am speaking generally about these relationships using the metrics as I have defined them in this series, and not specifically about the implementation at Fangraphs or anywhere else, although the basic relationships hold. Some of the algebraic demonstrations can be simplified.

If you wish to convert wOBA into RAA (which Fangraphs would call wRAA), we’ve already demonstrated that we can compare U directly to league U, so when using wOBA instead of U you just need to remember to remove V:

wRAA = (wOBA – LgwOBA)/V * PA

To calculate wRC (which will be equivalent to R+), you need to add in the league average R/PA times the batter’s PA:

wRC = wRAA + LgR/PA*PA = (wOBA – LgwOBA)/V*PA + LgR/PA*PA = ((wOBA – LgwOBA)/V + LgR/PA)*PA

wOBA can be easily converted to wRAA/PA, which is equal to LW_RAA/PA:

wRAA/PA = LW_RAA/PA = (wOBA – LgwOBA)/V

and since R+/PA = LW_RAA/PA + LgR/PA, we can also write:

R+/PA = (wOBA – LgwOBA)/V + LgR/PA
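The conversions above fit in a few small functions. This is a sketch using the 1994 AL constants quoted in this post; the function names are my own. League wOBA equals league OBA by construction, since V was chosen as LgOBA/LgU.

```python
LG_WOBA = 0.3433     # league wOBA equals league OBA by construction
V = 1.0896           # LgOBA / LgU for the 1994 AL
LG_R_PER_PA = 0.1363


def wraa(woba: float, pa: float) -> float:
    """Weighted RAA, equivalent to LW_RAA."""
    return (woba - LG_WOBA) / V * pa


def wrc(woba: float, pa: float) -> float:
    """Weighted runs created, equivalent to R+."""
    return wraa(woba, pa) + LG_R_PER_PA * pa


def r_plus_per_pa(woba: float) -> float:
    return (woba - LG_WOBA) / V + LG_R_PER_PA
```

For example, a hypothetical .400 wOBA hitter with 600 PA gets the same total from `r_plus_per_pa(0.400) * 600` as from `wrc(0.400, 600)`, and a league-average wOBA yields zero wRAA.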

Let’s run a leaderboard using the “w” stats, although we’ve already seen most of these values in different guises:

Personally, I prefer R+/PA to wOBA as a rate stat, as the former is directly comparable both as a difference and a ratio, while the latter has to be manipulated in order to be compared either way; both differences and ratios of wOBA have no intrinsic meaning. However, wOBA offers the advantages of a scale similar to that of OBA, on which no negative values are possible, each event has a clean positive weight, and the “natural” denominator of plate appearances is used. Tango et al. also took advantage of its structure to more easily apply statistical techniques in The Book, so there are certainly reasons why a user might prefer it. It is also well-constructed, by which I mean that the scaling does not cause issues for derivative metrics as long as you know how to account for it.

Finally, the third loose thread I wanted to address in this post. Prior to introducing wOBA, Tango developed a rate stat version of linear weights he called Linear Weight Ratio. It was somewhat conceptually similar to wOBA in that it sought to eliminate the negative weight for outs. Instead of adding the out value to the positive events in the numerator and thus being able to dispense with the negative value of the out as is the case for wOBA, LWR was constructed by making outs the denominator. In order to make the relationship between the positive events more clear, the value of a single was fixed at 1.0, with the coefficients for the other events defined as the ratio of the LW value for the given event to the LW value for a single. If you let the LW value of a single be s, a double be d, etc., then the formula for LWR is:

LWR = (s/s*S + d/s*D + t/s*T + hr/s*HR + w/s*W)/Outs

Which for our 1994 AL weights of .5069S + .8382D + 1.1695T + 1.4970HR + .3495W becomes:

LWR = (S + 1.6536D + 2.3072T + 2.9532HR + .6895W)/Outs
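The relationships between LWR and the per-out rates can be verified numerically. This sketch assumes, as the formulas in this post do, that LW_RC and LW_RAA share the same positive event weights and differ only in the out value. Inverting rate = s*LWR + out value gives LWR = (rate – out value)/s; since x and y are negative, that amounts to adding their magnitudes back. The batting line is hypothetical and the names are my own.

```python
WEIGHTS = {"S": 0.5069, "D": 0.8382, "T": 1.1695, "HR": 1.4970, "W": 0.3495}
X_ABS_OUT = -0.1076     # absolute LW value of an out
Y_RAA_OUT = -0.3150     # RAA LW value of an out


def lwr(line: dict) -> float:
    """Linear Weight Ratio: positive events scaled so a single = 1, per out."""
    s = WEIGHTS["S"]
    return sum(WEIGHTS[k] / s * line[k] for k in WEIGHTS) / line["outs"]


def lw_per_out(line: dict, out_value: float) -> float:
    """LW runs per out for a given out value (absolute -> RC/O, RAA -> RAA/O)."""
    positive = sum(WEIGHTS[k] * line[k] for k in WEIGHTS)
    return (positive + out_value * line["outs"]) / line["outs"]


line = {"S": 90, "D": 30, "T": 2, "HR": 38, "W": 110, "outs": 240}  # hypothetical
s = WEIGHTS["S"]
for out_value in (X_ABS_OUT, Y_RAA_OUT):
    rate = lw_per_out(line, out_value)
    assert abs(rate - (s * lwr(line) + out_value)) < 1e-12   # rate = s*LWR + out value
    assert abs(lwr(line) - (rate - out_value) / s) < 1e-12   # LWR = (rate - out value)/s
```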

If we define “x” as the absolute LW value of the out (-.1076 in this case), there are some very straightforward relationships between LWR and LW_RC/O:

LW_RC/O = s*LWR + x

LWR = (LW_RC/O – x)/s

If we define “y” as the average LW value of the out (-.3150 in this case), we can define similar relationships between LWR and LW_RAA/O:

LW_RAA/O = s*LWR + y

LWR = (LW_RAA/O – y)/s

Of course, LW_RAA is the building block for our “correct” rate stat (be it LW_RAA/PA, R+/PA, or wOBA) for the linear weights framework, so you could with some additional algebra convert LWR to those, although LWR sans manipulation does not meet the differential or ratio comparability standards.

Rate Stat Series, pt. 7: Rate Stats for Linear Weights II

2021-07-07T08:58:00.041-04:00

In the last post, we ended with a dilemma: we have a metric (linear weights RAA/PA) that we believe is correct for a linear framework, but it can’t be compared using ratios. We also have an approach (linear weights RC/O) that produces the right RAA but as a rate is inconsistent with RAA/PA. I asked whether we might be able to make adjustments to one or both of these that would get us to the right place.

Let’s start with trying to manipulate R/O to get a proper rate stat. We know that using R/O for individual players (even when using a linear weights estimate for runs created) overstates their value by treating them as a team with respect to their plate appearance generation. We know that using R/PA fails miserably because it doesn’t account for PA generation at all. What if, instead of trying to manipulate the denominator to get an acceptable rate stat, we attempted to manipulate the numerator?

Linear weight runs created with the “-.1 type out value” doesn’t take into account the extra opportunities created (or not created) for a batter’s teammates by his avoidance of outs, but we could make an explicit adjustment that would take care of this. In part 2, we laid out the math to calculate plate appearances as a function of OBA:

PA/G = (LgO/G)/(1 – OBA)

To compare a player’s rate of PA generation to the league, we can calculate the PA/G he would generate as a team, less the league average. For the sake of simplicity, let’s use the variable X to represent league O/G, and the variable EPA to mean “Extra PA” relative to what would be produced by a hitter with a league average OBA. (What follows is a needless bunch of algebra, but I wanted to demonstrate that the final result is tied back to the equation that relates PA/G, O/G, and OBA):

X/(1 – OBA) – X/(1 – LgOBA) = EPA/G

where “games” are defined as X outs. Since this is the case, we can factor out X from the left side of the equation and divide both sides by X, which converts EPA/G to EPA/Out:

1/(1 – OBA) – 1/(1 – LgOBA) = EPA/O

For ease of notation, I’m going to switch to using PA/O to represent the hitter’s 1/(1 – OBA); the complement of OBA is O/PA, and its reciprocal is thus PA/O. I will leave LgOBA as a variable since it is a constant from the perspective of calculating the individual’s PA generation:

PA/O – 1/(1 – LgOBA) = EPA/O

As this is an individual metric, I’d rather express it with a denominator of PA than O, so I will multiply both sides by O/PA to get:

(PA/O – 1/(1 – LgOBA))*O/PA = EPA/PA = 1 – 1/(1 – LgOBA)*(1 – OBA) since O/PA is just 1 – OBA

= 1 – (1 – OBA)/(1 – LgOBA)

replace 1 with (1 – LgOBA)/(1 – LgOBA) to get:

(1 – LgOBA)/(1 – LgOBA) – (1 – OBA)/(1 – LgOBA) = ((1 – LgOBA) – (1 – OBA))/(1 – LgOBA)

= (OBA – LgOBA)/(1 – LgOBA) = EPA/PA

This simple equation yields the number of additional PA generated for a batter per PA, beyond what a batter with a league average OBA would have generated. Of course, if we multiply by PA, we will get the raw number of extra PA generated. As an example, Frank Thomas had a .4921 OBA in a league where the OBA was .3433, and he had 508 PA, so he created an additional 115.1 PA for his team beyond what an average hitter would have contributed:

(.4921 - .3433)/(1 - .3433)*508 = 115.1

In order to use this with our modified R/PA, we need to convert it to runs, which can be done by simply multiplying by the league average of .1363 R/PA to get 15.7. As we’ve computed previously, Thomas had 131.4 LW_RC, which is his direct run contribution as a result of his own PA, but without considering the impact he had on his team by creating additional PA (or, more precisely, on a league average team, since the entire linear weights framework we’ve been working with in this series is built from the league average). That was worth an additional 15.7 runs, so his total run contribution for the numerator was about 147.1 runs.
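The extra-PA calculation for Thomas can be reproduced in a few lines. This is a minimal sketch using only the figures quoted above (.4921 OBA, .3433 LgOBA, 508 PA, .1363 LgR/PA, 131.4 LW_RC); the function name `extra_pa` is just an illustrative label.

```python
def extra_pa(oba: float, lg_oba: float, pa: int) -> float:
    """Extra PA beyond a league-average-OBA hitter: (OBA - LgOBA)/(1 - LgOBA) * PA."""
    return (oba - lg_oba) / (1 - lg_oba) * pa


# Frank Thomas, 1994 AL, figures from the text
oba, lg_oba, pa = 0.4921, 0.3433, 508
lg_r_pa = 0.1363   # league runs per PA
lw_rc = 131.4      # Thomas' linear weights runs created

epa = extra_pa(oba, lg_oba, pa)   # extra plate appearances generated
pa_runs = epa * lg_r_pa           # those PA valued at league R/PA
total = lw_rc + pa_runs           # direct contribution plus PA generation

print(f"EPA = {epa:.1f}, runs = {pa_runs:.1f}, total = {total:.1f}")
```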

Before moving on, credit is due to the developer of this approach. This methodology, at least expressed in terms of absolute runs plus the PA impact, was first developed by a BaseballBoards.com poster with the moniker “Sibelius”; if he was not the first, he is certainly the person who introduced me to this concept. Sibelius’ original construct combined this with Jim Furtado’s XR (Extrapolated Runs), so he called it XR+ and the resulting rate XR+/PA. I will be using the more generic R+/PA to describe this metric going forward.

Additionally, Sibelius calculated the “+” portion due to PA generation in a mathematically equivalent but quicker way than I did. This later tripped me up, as I had been influenced by his ideas but approached the problem in the manner I did above, starting from extra PA and converting to runs. Thus I posted it on the board as if it were a new, alternative approach rather than simply a different way of calculating the same thing. My faux pas was quickly pointed out and I gladly concede it, although I personally still find the “proof” above to be the most intuitive way to understand the math. But Sibelius’ calculation is more direct and I will use it going forward. Instead of messing around with PA, he jumped straight to outs, calculating the number of outs that the player avoided beyond those an average OBA hitter would have made in the same number of PA. Then he multiplied by the league average R/O to get the runs from PA generation/out avoidance. Six of one, half a dozen of the other, as of course R/PA = R/O * (1 – OBA), and so Sibelius’ equation for the “+” portion would look like this:

(OBA – LgOBA)*PA*LgR/O

For Thomas, this is (.4921 - .3433)*508*.2075 = 15.7
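To confirm that Sibelius’ shortcut and the extra-PA route are the same calculation, here is a sketch that computes both from the figures in the text. It derives LgR/O from LgR/PA and LgOBA via R/O = (R/PA)/(1 – OBA); from these rounded inputs that comes out to about .2076 rather than the .2075 used above, a rounding difference that doesn’t affect the result at one decimal place.

```python
oba, lg_oba, pa = 0.4921, 0.3433, 508
lg_r_pa = 0.1363

# At the league level, R/O = (R/PA) / (1 - OBA)
lg_r_o = lg_r_pa / (1 - lg_oba)

# Route 1: extra PA generated, valued at league runs per PA
via_extra_pa = (oba - lg_oba) / (1 - lg_oba) * pa * lg_r_pa

# Route 2 (Sibelius): outs avoided relative to a league-average OBA,
# valued at league runs per out
outs_avoided = (oba - lg_oba) * pa
via_outs_avoided = outs_avoided * lg_r_o

print(f"LgR/O = {lg_r_o:.4f}")
print(f"extra-PA route: {via_extra_pa:.1f}; outs-avoided route: {via_outs_avoided:.1f}")
```

The two expressions differ only in where the 1/(1 – LgOBA) factor is attached, so they agree to floating-point precision.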

So our formula for R+/PA will be:

R+/PA = (LW_RC + (OBA – LgOBA)*PA*LgR/O)/PA

Note that this is equivalent to:

R+/PA = LW_RC/PA + (OBA - LgOBA)*LgR/O

In order to compute the associated RAA:

RAA = (R+/PA – LgR/PA)*PA

This works as of course the league average R+/PA is equal to the league average R/PA by definition.
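A small sketch of the two formulas above, applied to Thomas and to a hypothetical hitter whose LW_RC/PA and OBA both equal the league averages. The function names are illustrative; the league figures are the 1994 AL values quoted in the text.

```python
def r_plus_per_pa(lw_rc: float, oba: float, pa: int, lg_oba: float, lg_r_o: float) -> float:
    """R+/PA = LW_RC/PA + (OBA - LgOBA) * LgR/O."""
    return lw_rc / pa + (oba - lg_oba) * lg_r_o


def raa_from_r_plus(r_plus_pa: float, lg_r_pa: float, pa: int) -> float:
    """RAA = (R+/PA - LgR/PA) * PA."""
    return (r_plus_pa - lg_r_pa) * pa


LG_OBA, LG_R_PA, LG_R_O = 0.3433, 0.1363, 0.2075  # 1994 AL, from the text

# Frank Thomas, using the text's figures
thomas_rate = r_plus_per_pa(131.4, 0.4921, 508, LG_OBA, LG_R_O)
thomas_raa = raa_from_r_plus(thomas_rate, LG_R_PA, 508)

# Hypothetical league-average hitter: LW_RC/PA = LgR/PA and OBA = LgOBA
avg_pa = 600
avg_rate = r_plus_per_pa(LG_R_PA * avg_pa, LG_OBA, avg_pa, LG_OBA, LG_R_O)
avg_raa = raa_from_r_plus(avg_rate, LG_R_PA, avg_pa)

print(f"Thomas: R+/PA = {thomas_rate:.4f}, RAA = {thomas_raa:.1f}")
print(f"league-average hitter: R+/PA = {avg_rate:.4f}")
```

The average hitter’s “+” term is zero, so his R+/PA is just LgR/PA and his RAA is zero up to floating-point noise.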

Here are the top and bottom five hitters from the 1994 AL:

The first column for RAA is RAA based on R+/PA, and it is an exact match for LW_RAA. Thus R+/PA meets the necessary condition for being an acceptable rate stat. However, so did R/O. We need to test that it produces the same rank order as LW_RAA/PA beyond these players; a good place to start would be with the cohort of four players who were jumbled in order when looking at RAA/PA and R/O:

Here we get a match in rank order. Of course, looking at fourteen players and finding consistency in their ranking doesn’t prove that the two would agree for every player. We will need some other approach to demonstrate that.

Let’s leave R+/PA for now and go back to the other approach we were considering, which was making an adjustment to RAA/PA that would allow it to produce meaningful ratio comparisons. The obvious solution is to add the league average R/PA to RAA/PA to get a modified absolute R/PA. After all, RAA/PA represents an individual’s contribution above average per PA, taking the impact of PA generation into account. Adding back the league average R/PA returns this to an absolute R/PA basis, one that equals raw R/PA at the league level by definition and that, unlike LW_RC/PA, captures the value of extra PA generated by the batter. We know that by definition it will match RAA, since the difference between an individual and the league average will be equal to RAA/PA (RAA/PA + LgR/PA – LgR/PA).

Here are the top and bottom 5, with “mod R/PA” being the metric we’re discussing (LW_RAA/PA + LgR/PA):

If these figures look familiar, it’s because (with minor rounding discrepancies) they are equal to the R+/PA figures we just calculated. That’s right: all of that algebra to calculate extra plate appearances or outs avoided, multiply by the league average R/PA or R/O, and add the result back to LW_RC could have been dispensed with; we could have just added LgR/PA to LW_RAA/PA.

This is our “proof” that R+/PA as we built it from adjusted absolute runs created meets our criteria for a proper rate stat in a linear weights framework – it’s equivalent to using RAA/PA. The two parallel approaches are not in fact parallel – they are alternative ways of getting to the same place.

To demonstrate why the math works out this way, let’s return to our two linear weights equations from part 1:

LW_RC = .5069S + .8382D + 1.1695T + 1.4970HR + .3495W - .1076(outs)

LW_RAA = .5069S + .8382D + 1.1695T + 1.4970HR + .3495W - .3150(outs)
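As a check on the two equations, here is a sketch that applies both to Frank Thomas’ 1994 line. The component counts (399 AB, 141 H including 34 doubles, 1 triple and 38 home runs, 109 walks) are his published 1994 totals rather than figures from this post, but under the simple definitions used in this series they reproduce the 508 PA, 258 outs, and 131.4 LW_RC quoted earlier. The dictionary keys are labels chosen for illustration.

```python
LW_RC_WEIGHTS = {"S": 0.5069, "D": 0.8382, "T": 1.1695, "HR": 1.4970, "W": 0.3495, "O": -0.1076}
# LW_RAA differs from LW_RC only in the value of the out
LW_RAA_WEIGHTS = {**LW_RC_WEIGHTS, "O": -0.3150}


def apply_weights(line: dict, weights: dict) -> float:
    """Sum of weight * count over every event type in the weights."""
    return sum(weights[event] * line[event] for event in weights)


# Frank Thomas, 1994: 399 AB, 141 H (34 2B, 1 3B, 38 HR), 109 BB
ab, h, d, t, hr, w = 399, 141, 34, 1, 38, 109
thomas = {"S": h - d - t - hr, "D": d, "T": t, "HR": hr, "W": w, "O": ab - h}

pa = ab + w
lw_rc = apply_weights(thomas, LW_RC_WEIGHTS)
lw_raa = apply_weights(thomas, LW_RAA_WEIGHTS)

print(f"PA = {pa}, outs = {thomas['O']}, LW_RC = {lw_rc:.1f}, LW_RAA = {lw_raa:.1f}")
```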

If we think about everything on a per plate appearance basis and building R+/PA using the RAA/PA + LgR/PA path, what we’re doing is taking the second equation and adding the league average R/PA for every event that corresponds to a plate appearance, which is all of them (in a full blown implementation, where we would be considering non-PA events both in the linear weights formula and the calculation of the PA generation impact, this wouldn’t be as clean). Since the league R/PA is .1363, this results in:

R+ = .6432S + .9745D + 1.3058T + 1.6332HR + .4858W - .1788(outs)
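The shift is mechanical enough to sketch directly: add LgR/PA to each LW_RAA weight and compare with the printed R+ weights. The home run and out weights land .0001 away from the printed values, which I take to be rounding in the published four-decimal weights. Thomas’ component counts (68 singles, 34 doubles, 1 triple, 38 homers, 109 walks, 258 outs) are his published 1994 totals, which reproduce the 508 PA used throughout.

```python
LW_RAA_WEIGHTS = {"S": 0.5069, "D": 0.8382, "T": 1.1695, "HR": 1.4970, "W": 0.3495, "O": -0.3150}
LG_R_PA = 0.1363

# Every event in this simple formula is a plate appearance, so each weight gains LgR/PA
R_PLUS_WEIGHTS = {event: weight + LG_R_PA for event, weight in LW_RAA_WEIGHTS.items()}

# Weights as printed in the text, for comparison
PUBLISHED = {"S": 0.6432, "D": 0.9745, "T": 1.3058, "HR": 1.6332, "W": 0.4858, "O": -0.1788}

for event, weight in R_PLUS_WEIGHTS.items():
    print(f"{event}: {weight:.4f} (text: {PUBLISHED[event]:.4f})")

# Frank Thomas, 1994 (508 PA)
thomas = {"S": 68, "D": 34, "T": 1, "HR": 38, "W": 109, "O": 258}
r_plus = sum(R_PLUS_WEIGHTS[event] * count for event, count in thomas.items())
print(f"Thomas R+ = {r_plus:.1f}, R+/PA = {r_plus / 508:.4f}")
```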

Alternatively, from the Sibelius approach, the plus component can be manipulated to read:

(OBA – LgOBA)*PA*LgR/O = ((H+W)/PA – LgOBA)*PA*LgR/O = (H + W – LgOBA*PA)*LgR/O

Continuing to treat everything on a per PA basis, this is (H + W – LgOBA)*LgR/O

Thus any hit or walk contributes (1 – LgOBA)*LgR/O = LgR/PA = .1363, and each out “contributes” (0 – LgOBA)*LgR/O = -LgOBA*LgR/O = -.3433*.2075 = -.0712. Adding .1363 to the LW_RC value for each event results in the same weights we just found (i.e. .6432 for a single), as the weights for on base events are the same between the LW_RC and LW_RAA equations. The out is now worth -.1076 + -.0712 = -.1788 runs, also the same value. We knew from the results that the two approaches must be equivalent, but I always prefer a “proof” when possible.
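Both routes to the R+ weights can be computed side by side. This sketch uses only the weights and league figures in the text: route 1 adds LgR/PA to each LW_RAA weight, route 2 adds the per-event Sibelius term, (1 – LgOBA)*LgR/O for an on-base event and –LgOBA*LgR/O for an out, to each LW_RC weight.

```python
LW_RC_WEIGHTS = {"S": 0.5069, "D": 0.8382, "T": 1.1695, "HR": 1.4970, "W": 0.3495, "O": -0.1076}
LW_RAA_WEIGHTS = {**LW_RC_WEIGHTS, "O": -0.3150}
LG_OBA, LG_R_O, LG_R_PA = 0.3433, 0.2075, 0.1363
ON_BASE_EVENTS = {"S", "D", "T", "HR", "W"}

# Route 1: LW_RAA weight plus LgR/PA for every event
route_raa = {e: w + LG_R_PA for e, w in LW_RAA_WEIGHTS.items()}

# Route 2 (Sibelius): LW_RC weight plus (on-base indicator - LgOBA) * LgR/O
route_sibelius = {
    e: w + ((1 if e in ON_BASE_EVENTS else 0) - LG_OBA) * LG_R_O
    for e, w in LW_RC_WEIGHTS.items()
}

for e in LW_RC_WEIGHTS:
    print(f"{e}: RAA route {route_raa[e]:.4f}, Sibelius route {route_sibelius[e]:.4f}")
```

The two routes agree to within a few ten-thousandths, the residue of rounding LgR/O, LgOBA, and the weights to four places.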

At this juncture we reach a philosophical question: should we dispense with the notion of absolute runs created for individuals altogether, and replace it with a R+ approach? In other words, instead of our default way of discussing an individual’s contribution being something like: “Frank Thomas created 131 runs in 508 plate appearances while making 258 outs”, and presenting RC, the rate stat, and some metric compared to a baseline, we could cut to the chase and say “Frank Thomas contributed 147 runs in 508 plate appearances”, with the rate and baselined metric. This approach does away with the need to explicitly think about outs, since their special impact beyond plate appearances has been built into the 147 R+. It also prevents the awkward situation of having a runs created figure of 131 but no handy denominator with which to convert it to a rate stat without doing a R+ or RAA calculation first.

There is no right answer, although there certainly is a good case to be made for just reporting Thomas’ runs created as 147, since it limits the likelihood of user-initiated bad math. Call it inertial reasoning if you must, but I still like the idea that the runs created figure is the batter’s direct contribution, and the secondary impact of PA generation is captured when looking at the rate or the baselined stat.

Rate Stat Series, pt. 6: Rate Stats for Linear Weights I

2021-06-23T08:28:00.045-04:00

Last time, I attempted to demonstrate that linear weights producing a result in terms of runs above average properly account for the value of extra plate appearances created by a batter. From here on, I will talk about this contention in the same way as I would any scientifically-demonstrated fact, even though I admit that I have not provided a “proof” in the mathematical sense. The casual use of language is intended to prevent wasting space repeating myself rather than an attempt to claim a more robust result than is appropriate.

Since we have demonstrated that LW_RAA captures the player’s direct and indirect contributions to team offense (at least within the linear framework of metrics), I would contend that it follows that any linear weights-based rate stat we should propose must return the same RAA as we would get from eschewing a rate stat altogether and just applying our linear weights formula with the “-.3 type out value”. One very simple way to do that is to use RAA, rather than a measure of absolute runs created, as the numerator for the rate stat. In this case, the obvious denominator is plate appearances.

Using outs in the denominator would be appropriate for a team metric, but for an individual would overstate the value of his PA generation. For a simple demonstration of this, recall from part three the equation:

PA/G = (O/G)/(1 – OBA)

and the equivalent PA = O/(1 – OBA), given our simple definitions, which as a refresher are PA = AB + W, O = AB – H, and OBA = (H + W)/(AB + W)
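Because O = PA × (1 – OBA) under these definitions, PA = O/(1 – OBA) is an identity rather than an approximation. A small sketch confirming it with Thomas’ 1994 line (399 AB, 141 H, 109 BB; his published totals, which reproduce the 508 PA and .4921 OBA used in this series). The helper name is illustrative.

```python
def pa_outs_oba(ab: int, h: int, w: int) -> tuple:
    """The series' simple definitions: PA = AB + W, O = AB - H, OBA = (H + W)/(AB + W)."""
    pa = ab + w
    outs = ab - h
    oba = (h + w) / pa
    return pa, outs, oba


# Frank Thomas, 1994
pa, outs, oba = pa_outs_oba(399, 141, 109)
print(pa, outs, round(oba, 4), round(outs / (1 - oba), 6))
```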

This direct relationship between plate appearances, outs, and OBA means that if we were to use outs as the denominator rather than PA, we would be inflating the rates for players with higher OBAs, even though we’ve already demonstrated that the PA generation impact of higher OBAs is captured in the numerator (RAA).

Here were the top and bottom 5 performers in terms of RAA/PA, which for ease of use I have restated as RAA/650 PA (This works out to 152.5 full games for a player getting 1/9 of an average 1994 AL team’s PA; sadly, these teams didn’t get anywhere close to playing a 162 game season. There’s no special significance to 650 other than that it is a round number that is reasonably close to the number of PA a full-season batter might accumulate):
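The 152.5-game figure follows from the PA/G equation above. This sketch uses the 1994 AL outs per game (25.19, with outs defined as AB – H) and league OBA (.3433) given elsewhere in this series.

```python
LG_O_PER_G = 25.19  # 1994 AL outs (AB - H) per team game
LG_OBA = 0.3433     # 1994 AL OBA, (H + W)/(AB + W)

team_pa_per_game = LG_O_PER_G / (1 - LG_OBA)   # PA/G = (O/G)/(1 - OBA)
player_pa_per_game = team_pa_per_game / 9      # one of nine lineup slots
games_in_650_pa = 650 / player_pa_per_game

print(f"{team_pa_per_game:.2f} team PA/G, {player_pa_per_game:.2f} per slot, {games_in_650_pa:.1f} games")
```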

You may have noticed that the identity and order of these players did not change from when we used a fully dynamic approach that treated them as if they were their own teams (BsR/O). This recalls a simple truth that I am burying beneath thousands of words of minutiae: any reasonable approach will reach similar conclusions for the majority of situations. Even a poorly designed, indefensible metric like OPS will get you 98% of the way there. This series is about the tiny differences that lie beyond that point and the larger differences that arise in values as opposed to rank order.

To wit, while the rank order remains the same, the RAA values are different using linear weights, and with the exception of Griffey, less extreme. All of the other players have moved closer to average, with Frank Thomas losing a whopping 7.7 RAA due to using a linear approach rather than imagining an entire lineup of Big Hurts.

At this point, we could stop, and simply use RAA/PA or RAA/650 PA as our final linear weights-grounded rate stat. However, it lacks the very useful trait of ratio comparability that would be desirable in the ideal rate stat. While linear differences in RAA/PA can be compared (e.g. Thomas contributed an additional .081 RAA/PA beyond what Chili Davis did), the ratios are not particularly useful. Consider two players who each had 600 PA, one contributing +1 RAA and the other -1 RAA. If you can explain the practical baseball interpretation of the resulting ratio of -1, be my guest.

This happens because we have applied the high baseline of average to the metric. However, we have another version of linear weights that is based on absolute runs from which we could build a rate stat. Remember that in order to qualify for consideration, the RAA that results from that rate stat must be equivalent to the RAA from simply applying the linear weights formula and not comparing a rate to the league average.

The obvious first choice is Runs/PA, using our formula for linear weights runs created. In this case, I am not showing the leaders, but rather the same ten players in the same order. RAA in this case is (LW_RC/PA – LgR/PA)*PA:

The results are not even close to what we need, and the reason is simple: we have not accounted in any way for each batter’s PA generation. There is an easy alternative that might correct this: using outs in the denominator. This just takes us to the correct team rate stat, although the numerator is linear rather than dynamic (either in the case of using Base Runs or using actual runs scored on the team level). As such it does implicitly consider PA generation; might it produce satisfactory results for individuals? Here RAA = (LW_RC/O – LgR/O)*O:

A perfect match. Absolute runs per out produces the same RAA as the direct application of LW_RAA. R/O also has the advantages of being meaningfully comparable as both a difference and a ratio and is the same as the correct team rate stat. Everything is great, except we haven’t answered the question: Does it actually work?
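Why the match? (LW_RC/O – LgR/O)*O collapses to LW_RC – LgR/O*O, while LW_RAA differs from LW_RC only in the out weight, by .3150 – .1076 = .2074. Using the league figures in this series, that gap is LgR/PA + LgOBA*LgR/O, which equals LgR/O because LgR/PA = LgR/O*(1 – LgOBA). A sketch with Thomas’ figures from the text:

```python
lw_rc, outs = 131.4, 258          # Thomas, from the text
lg_r_o = 0.2075                   # 1994 AL runs per out
lg_r_pa, lg_oba = 0.1363, 0.3433

# RAA implied by runs per out: algebraically LW_RC - LgR/O * O
raa_r_o = (lw_rc / outs - lg_r_o) * outs

# LW_RAA: same as LW_RC except for the out weight
out_gap = 0.3150 - 0.1076
raa_lw = lw_rc - out_gap * outs

# The out-weight gap implied by the league figures
implied_gap = lg_r_pa + lg_oba * lg_r_o

print(f"R/O RAA = {raa_r_o:.2f}, LW_RAA = {raa_lw:.2f}")
print(f"out-weight gap = {out_gap:.4f}, LgR/PA + LgOBA*LgR/O = {implied_gap:.4f}")
```

So R/O reproduces LW_RAA for any player, up to the rounding of the published weights; that is exactly why matching RAA can only be a necessary condition.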

Remember, I said that matching LW_RAA was a necessary condition for our proposed alternative linear weight rate stat to meet – it is not a sufficient condition. We’ve already concluded that RAA/PA is a proper rate stat for a linear framework; in order for an alternative to be acceptable, it must produce results that are consistent with RAA/PA. How do we determine this consistency, other than matching the RAA result, when the numerators and the denominators each start from a different basis?

One simple and obvious way to determine whether they are consistent is to confirm that they result in the same rank order of players. They did for our most extreme hitters, but does that hold for all hitters?

Apparently not. I didn’t have to go too far to find this little cluster of hitters, who rank 7-10 in RAA/PA. None of them holds the same spot within the cluster under both metrics: Vaughn is first in RAA/650 but second in RC/O; Lofton is second/third; Mack is third/fourth; and Clark is fourth/first.

I slipped OBA onto the chart because it helps to explain what is going on. Will Clark’s .433 OBA ranked fifth among AL hitters with 200 PA; using outs in the denominator helps him as it implicitly assumes that he represents an entire team with a .433 OBA. While all of these hitters had excellent OBAs relative to the league, R/O goes a bit too far in valuing this. Given that R/O still produces the right RAA, any distortions have to be somewhat limited for normal players.

One can come up with extreme thought experiments, like a player with a .999 OBA all from walks versus a player with a .600 OBA, all from home runs. A team made up of the former would score what would certainly feel to the opposing pitching coach like a nearly infinite number of runs; but as a single player in a lineup, his impact would be muted. It’s not necessary to answer the thought experiment as to which would be more productive to see that R/O applied to extreme individual players would break down. Incidentally, it is exactly this type of scenario that I got into trouble trying to “prove” in my last attempt at this series – while I do think that the theoretical team methods I’ll discuss later provide reasonable estimates for this situation, relying too heavily on them for proofs is begging the question.

So, we have a rate stat (LW_RAA/PA) that works, but it lacks ratio comparability. There are (at least) two ways we could go about solving this problem:

1. We could adjust LW_RC/PA in some way to take into account the value of PA generation

2. We could manipulate LW_RAA/PA so that it’s no longer a measure of runs above average, but instead on an absolute runs basis

Next time I’ll explore these two parallel questions...if in fact they are parallel at all.

Rate Stat Series, pt. 5: Linear Weights Background

2021-06-09T08:52:00.040-04:00

Linear methods sidestep the issues that arise from applying dynamic run estimators to players by simply ignoring any non-linearity in the run scoring process altogether. While this is clearly technically incorrect, it is closer to reality than pretending that a player’s performance interacts with itself. Since an individual makes up only 1/9 of a lineup, it is much closer to reality to pretend that his performance has no impact on the run environment of his team than to pretend that it defines the run environment of his team. Linear weights also have the advantage of being easy to work with, easy to adapt to different baselines, and easy to understand and build. Their major drawback is that the weights are subject to variation due to changes in the macro run environment (as distinguished from the marginal change to the run environment attributable to an individual player).

Linear methods were pioneered by FC Lane and George Lindsey, but it was Pete Palmer who used them to develop an entire player evaluation system, publish historical results, and bring them into the position of the chief rival to Runs Created in the 1980s. Curiously (especially since Palmer is a prolific and brilliant sabermetrician whose pioneering work includes park factors, variable runs per win, using the negative binomial distribution to model team runs per game, and more), Palmer’s player evaluation system as laid out in The Hidden Game of Baseball and later Total Baseball and the ESPN Baseball Encyclopedia never bothered to convert its offensive centerpiece Linear Weights into a rate statistic.

This gap contributed to two developments that I personally consider unfortunate. First, confusion about how to convert linear weights to a rate may have hampered the adoption of the entire family of metrics, and this confusion generally persisted until the publication of The Book by Tom Tango, Mitchel Lichtman, and Andy Dolphin. Second, Palmer did offer up a rate stat, but he did not tie it to linear weights, or in its crudest form to any meaningful units at all. That’s because Normalized OPS (later called Production), which you may know as OPS+, was the rate stat coupled with linear weights batting runs.

To my knowledge, Palmer has never really explained why he didn’t derive a rate stat from linear weights to use; the explanations have instead focused on the ease and reasonable accuracy of OPS. In The Hidden Game, the discussion of linear weights transitions to OPS with “For those to whom calculation is anathema, or at least no pleasure, Batter Runs, or Linear Weights, has a ‘shadow stat’ which tracks its accuracy to a remarkable degree and is a breeze to calculate: OPS, or On Base Average Plus Slugging Percentage.”

Coincidentally, Palmer recently published an article in the Fall 2019 Baseball Research Journal titled “Why OPS Works”, which covers a lot of the history of his development of linear weights and OPS, but still doesn’t explain exactly why a linear weights rate wasn’t part of the presentation.

Without the brilliant mind of Palmer to guide us, where should we turn for a proper linear weights-based rate stat? To answer that question, I think it’s necessary to briefly examine how linear weights work. For this discussion, I am taking for granted that the empirical derivation of linear weights is representative of all linear weight formulas. This is not literally true, as shown by the fact that the linear weights I’m using in this series were derived from Base Runs, not from empirical data. If we were using an optimized Base Runs formula, the resulting weights would be very close to empirical weights derived for a similar offensive environment, but other approaches to calculating linear coefficients like multiple regression can deviate significantly from the empirical weights. Even so, the final results are similar enough that the principles hold for reasonable alternative linear weight approaches.

What follows will be elementary for those of you familiar with linear weights, but let’s walk through a sample inning featuring the star of our series, Frank Thomas. I want to use this example to illustrate two properties of linear weights when using the “-.3 type out value” (i.e. when the result is runs above average): the conservation of runs, and the constant negative value of outs. This example will simplify things slightly, as in reality not every event in the inning cleanly maps to a batting event that is included in a given linear weights formula (e.g. wild pitches, balks, extra bases on errors, etc.) It also will presume that the run expectancy table we use for the example corresponds perfectly to our linear weights, which it does not. Still, the principles are generally applicable to properly constructed linear weights methods, even if the weights were derived from other run expectancy tables or, as is the case for us in this series, by another means altogether (I’m using the intrinsic weights derived from Base Runs for the 1994 AL totals).

Baseball Prospectus has annual run expectancy tables; their table for the 1994 majors is:

On July 18, Chicago came to bat in the bottom of the seventh trailing Detroit 9-5. Their run expectancy for the inning was .5545 as Mike LaValliere stood in against Greg Cadaret. He drew a walk, which raised the Sox RE to .9543, and thus was worth .3998 runs. The rest of the inning played out as follows:

1. If we were going to develop empirical LW coefficients based on this inning, we would conclude that a home run was worth 2.658 runs on average, and thus our linear weight coefficient for a home run would be 2.658. The other events would be valued:

This is in fact how empirical LW are developed, but of course a much larger sample size (typically at least an entire league-season) is used.

2. The team’s runs above average for the inning is always conserved. We started the inning with the bases empty and nobody out for a RE of .5545. This is the same as saying that the average for an inning is .5545 runs scored. The White Sox actually scored 4 runs, and the total of the linear weight values of the plays was 3.4455 runs, which is 4 - .5545. They scored 3.4455 runs more than an average team would be expected to in an inning. The sum of the linear weight values will always match this.

Because of this, we can be assured that the run value of additional plate appearances created by the positive events of the batters has been taken into account in the linear weight values. If this were not the case, runs would not be conserved.

3. Since that is true, it is also true that the sum of the LW values of the positive events (which is 4.8128 runs) plus the sum of the LW values of the outs (-1.3673) must be equal to the runs above average for the inning (3.4455). The sum of the values of the outs will be larger in magnitude (more negative) in innings in which more potential runs were “undone” by outs, as was the case here. On the other hand, an inning in which three outs are recorded in order will result in -.5545 runs.

We can use this fact to isolate the run value of the out between the portion that is due to ending the inning (what Tom Tango has called the “inning killer” effect of the out; this is the -.5545 that is the minimum out value for an inning), and that which is due to wasting the run potential of the positive events (what’s left over, in this case, -.8128 runs).
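The bookkeeping for the July 18 inning can be sketched with the totals quoted above (the individual play values live in the missing table, so only the sums are used here). Conservation also implies that the wasted potential equals runs scored minus the value of the positive events, 4 – 4.8128.

```python
start_re = 0.5545         # RE with bases empty, nobody out (1994 table)
runs_scored = 4
positive_events = 4.8128  # sum of LW values of the non-out events
outs_total = -1.3673      # sum of LW values of the three outs

walk_value = 0.9543 - start_re                 # LaValliere's leadoff walk
raa_inning = runs_scored - start_re            # runs above an average inning
inning_killer = -start_re                      # minimum possible out value for an inning
wasted_potential = outs_total - inning_killer  # runs "undone" by the outs

print(f"walk: {walk_value:.4f}, RAA for inning: {raa_inning:.4f}")
print(f"positive + outs: {positive_events + outs_total:.4f}")
print(f"inning killer: {inning_killer:.4f}, wasted potential: {wasted_potential:.4f}")
```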

If we wish to convert our linear weights from an estimator of runs above average to an estimator of absolute runs, we need to back out the inning killer value of the out (since it will be present for every inning equally and serves to conserve total RAA) from the overall value of the out, leaving the remainder which we do not need to worry about as it would have to be debited from the value of the positive events in order to conserve runs.

So we can take .5545/3 = .1848 and add it back to the linear weight RAA out value, which for our example was -.3150. This results in an absolute out run value of -.1302. In our example we’re using -.1076; these don’t reconcile because:

1. our linear weights don’t consider all events (we’re ignoring hit batters, sacrifices, all manner of baserunning outs, etc.)

2. our linear weights weren’t empirically derived from the 1994 RE table as the .1848 adjustment was
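The conversion itself is one line of arithmetic; this sketch just spells it out with the figures from the text, alongside the -.1076 actually used in LW_RC.

```python
start_re = 0.5545          # average runs per inning from the 1994 RE table
raa_out_value = -0.3150    # "-.3 type" out value in the series' LW_RAA formula
formula_out_value = -0.1076  # out value actually used in LW_RC

inning_killer_per_out = start_re / 3                         # spread over three outs
absolute_out_value = raa_out_value + inning_killer_per_out   # add it back

print(f"{inning_killer_per_out:.4f} {absolute_out_value:.4f} (LW_RC uses {formula_out_value})")
```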

While the numbers don’t (and shouldn’t!) balance perfectly in this case, this is the theoretical bridge for converting empirical linear weights from a RAA basis to an absolute runs basis. I would also contend it serves as a demonstration by inductive reasoning that absolute linear weights do not capture the PA generation impact of avoiding outs, but RAA linear weights do.

Note that converting to the “-.1 type out value” does not eliminate the result of negative runs altogether. An offensive player who is bad enough will be credited with negative runs created (if it helps you to imagine what this level of production might look like, consider that the total offensive contributions of pitchers have hovered near zero absolute runs created in the last decade). For real major league position players, this will not happen except due to sample size. If you’d like an interpretation, I have found this helpful (I stole it from someone, probably Tom Tango, and have badly paraphrased): Since linear weights fix the values of each event for all members of the team, the level at which runs created are negative is the level at which, in order to conserve team runs, the weights of positive events cannot be reduced – the poor batter essentially undoes some of the positive contributions of his teammates.

As an aside, the first paper I’m aware of that made the connection between the two linear weight approaches in this manner (rather than simply solving algebraically for the difference between the two without providing theoretical underpinning) was published by Gary Skoog in a guest article in the 1987 Baseball Abstract. This article, titled “Measuring Runs Created: The Value Added Approach” is available at Baseball Think Factory.

Rate Stat Series, pt. 4: Players as Teams

2021-05-26T08:33:00.020-04:00

A dynamic run estimator is a run estimator that allows offensive events to interact with each other, such that the value of a given event is not fixed as would be the case in a linear weights formula (e.g. a single is worth .50 runs), but rather is dependent upon all of the other components of the batting line. Dynamic run estimators are great in theory, since the run scoring process for a team is obviously dynamic and not linear. However, there are two issues:

1. They are harder to design than linear estimators. Any idiot with a spreadsheet and a dataset can run a linear regression on runs scored and have a linear estimator when they are done. It may not be a good one, but it will be functional and will probably have a low RMSE when estimating team runs scored. To develop a dynamic model, one must consider the run scoring process and produce a simplified model, but not so simplified as to not produce reasonably accurate estimates.

This is not a series about run estimators, but the most commonly used dynamic run estimator, Bill James’ Runs Created, suffers from flaws that make it unable to handle extreme offenses. A much better model, David Smyth’s Base Runs, is powerful and will be used here.

2. They are not appropriate to apply to individual offensive statistics. Dynamic estimators always involve multiplying base runners by some factor representing advancement of baserunners (in Runs Created that’s the end of the story; Base Runs also accounts for the unique nature of home runs). This multiplication is inappropriate when applied to an individual player, as now Frank Thomas’ high OBA is multiplied directly with his high power which advances runners. In reality, there is some interaction, but Thomas’ impact is diluted by being just 1/9th of the lineup. Inputting his statistics into a dynamic run estimator produces an estimate of how many runs a team would score if each batter hit like Thomas.
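To see the interaction, here is a sketch using one commonly published simple form of Base Runs (A = H + W – HR, B = (1.4TB – .6H – 3HR + .1W)*1.02, C = AB – H, D = HR, BsR = A*B/(B + C) + D). That form is an assumption on my part and not necessarily the version behind the figures in this series; Thomas’ line is his published 1994 totals, and the weaker batting line is invented purely for contrast. An extra home run is worth more when added to Thomas’ line than to the weaker one, because his own high OBA supplies the runners it drives in.

```python
def base_runs(ab: int, h: int, d: int, t: int, hr: int, w: int) -> float:
    """One simple published form of Base Runs: A*B/(B + C) + D."""
    singles = h - d - t - hr
    tb = singles + 2 * d + 3 * t + 4 * hr
    a = h + w - hr                                      # baserunners other than HR
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * w) * 1.02  # advancement factor
    c = ab - h                                          # outs
    return a * b / (b + c) + hr                         # home runs always score


def marginal_hr(ab: int, h: int, d: int, t: int, hr: int, w: int) -> float:
    """Extra BsR from adding one home run (one more AB and H) to a line."""
    return base_runs(ab + 1, h + 1, d, t, hr + 1, w) - base_runs(ab, h, d, t, hr, w)


thomas_line = (399, 141, 34, 1, 38, 109)       # Thomas, 1994
hypothetical_line = (500, 125, 20, 2, 10, 40)  # made-up weaker hitter

print(f"BsR applied directly to Thomas' line: {base_runs(*thomas_line):.1f}")
print(f"extra HR added to Thomas: {marginal_hr(*thomas_line):.3f}")
print(f"extra HR added to weaker line: {marginal_hr(*hypothetical_line):.3f}")
```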

Due to this issue, I do not advocate applying dynamic run estimators directly to individuals, but this post will still address the rate stat implications of such applications. Later we will discuss theoretical team methods that allow the use of a dynamic run estimator while still accounting for the fact that the player is just one of nine in the lineup.

This series will now discuss what I believe to be the proper rate stats for a particular framework for evaluating individual offense. One of my objectives is that for each option of a framework for building a rate stat presented, there be at least one variation that is linearly comparable and one that is ratio comparable. I’ve defined those terms, as I use them, at length before, so here I will be brief:

* A statistic is linearly comparable if the difference between two figures is meaningful. A hitter with a .400 OBA would reach base 100 times more than a hitter with a .300 OBA over 1000 PA.

* A statistic is ratio comparable if the ratio between two figures is meaningful. Our .400 OBA player reached base 33.3% more frequently than the .300 OBA player.
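Both OBA comparisons from the list above, as a tiny sketch:

```python
high_oba, low_oba, pa = 0.400, 0.300, 1000

# Linear comparison: the difference is meaningful (extra times on base)
extra_times_on_base = (high_oba - low_oba) * pa

# Ratio comparison: the quotient is meaningful (relative frequency)
relative_frequency = high_oba / low_oba

print(f"{extra_times_on_base:.0f} more times on base; {relative_frequency - 1:.1%} more often")
```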

Ideally, our metric will facilitate both types of comparison, but if not, I will endeavor to present an alternative formulation that fills the gap. I will not propose any metrics that are neither linearly comparable nor ratio comparable because they are the scourge of sabermetrics (hello OPS).

The underlying principle of the discussion that follows for the three frameworks (treating the player as a team, a full linear model, and a theoretical team model) is that the rate stat should be consistent with the run estimator used. If the run estimator treats the player as if he is a team, then the corresponding rate stat should treat the player as if he is a team.

In this case, that makes it very simple. The proper denominator for a team rate stat is outs. If you apply Runs Created, Base Runs, or some other run estimator directly to an individual player, the proper denominator is outs.

At this point in the discussion, this may ring as a somewhat hollow declaration, as I have only indirectly made the case for why we might want to use a denominator other than outs for an individual when it is so clearly the proper choice for a team. Since I’m suggesting that outs are the proper choice for this framework, I’ll defer that case for later.

In this case, I advocate for using outs when applying a dynamic run estimator to an individual because it is the only consistent treatment. The only justification for going down this path (other than needing something quick and dirty) is a theoretical exercise – how many runs would a team that hit like Frank Thomas score? While I don’t think this theoretical result is appropriate for attempting to value Thomas’ contribution to the 1994 White Sox, it at least does have an interpretation. If you start mixing frameworks, you really have a mess on your hands. There’s no good reason (other than crude estimation) to apply a dynamic run estimator directly to an individual; there’s no sense in deviating from the corresponding rate stat in order to try to make the results more comparable to a better approach to evaluating individual offensive contribution. Just use the better approach, and if you insist on misapplying a dynamic run estimator to individual players, make outs the denominator so that at least you have a theoretically coherent suite of metrics.

I should note that Bill James in the 1980s took this entire process to its logical conclusion. After applying Runs Created to individuals, dividing by outs, and multiplying by a constant that was close to the league outs/game for the definition of outs chosen, he went a step further and used the Pythagorean theorem to estimate the winning percentage that this team would have if it allowed an average number of runs. He then converted it to wins and losses by using the number of outs the player made to define games, which caused all kinds of problems, but at least he was committed.

This will be the first of several times that I’ll run a leaderboard for the 1994 AL using a particular framework. Here we have the top 5 and bottom 5 performers with at least 200 PA in Base Runs/Out. RAA is “Runs Above Average” and is calculated simply as (BsR/O – LgR/O) * Outs. Spoiler alert: No matter how we slice it, Frank Thomas is going to come out as the leading hitter in this league, as he raked .353/.492/.729 on his way to a second consecutive MVP award.
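The RAA definition above is simple enough to sketch in a few lines of Python. This is a minimal illustration, assuming you already have each player's BsR and outs (AB – H) along with league runs and outs; the function name and the figures in the usage line are hypothetical, not taken from the 1994 AL leaderboard.

```python
def runs_above_average(bsr: float, outs: float, lg_runs: float, lg_outs: float) -> float:
    """RAA = (BsR/O - LgR/O) * Outs, with outs defined as AB - H."""
    lg_runs_per_out = lg_runs / lg_outs
    return (bsr / outs - lg_runs_per_out) * outs


# Hypothetical player: 100 BsR in 300 outs, in a league averaging .200 runs per out
example_raa = runs_above_average(100, 300, lg_runs=2000, lg_outs=10000)  # about 40
```

Algebraically this reduces to BsR – (LgR/O)*Outs: the player's runs minus what a league-average hitter would have produced while making the same number of outs.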

I am showing at least one more decimal place on each metric than I usually would just to allow for a little more precise calculation if you’re following along; it is in no way a statement about the significance of the ten-thousandths of runs per out.

Runs per out can of course be scaled; Bill James multiplied it by the league average outs/game appropriate to the categories used in the computation of outs. For instance, in this case, since we’re defining outs as AB – H, the average outs/game will be around 25.2 (for the 1994 AL it was 25.19). A more complete accounting of outs, like AB – H + CS + SH + SF + DP, would get close to 27 outs/game. While putting individual contribution on a team games basis is nonsensical on some level, since it is just a scalar multiplier it causes no real distortion and provides a scale that is easily understandable, in the same manner that ERA or K/9 are understood by everyone other than Matt Underwood and Harold Reynolds.
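Because the scaling is a single multiplication, it cannot change anyone's rank. Here is a minimal sketch, assuming outs are AB – H and using the 25.19 outs/game figure given above for the 1994 AL. The function name is mine, and the player totals in the usage lines are invented for illustration.

```python
OUTS_PER_GAME_1994_AL = 25.19  # AB - H per game, per the text


def scaled_runs_per_game(runs: float, outs: float,
                         outs_per_game: float = OUTS_PER_GAME_1994_AL) -> float:
    """Express runs per out on a per-'game' scale (runs per 25.19 outs by default)."""
    return runs / outs * outs_per_game


# Two hypothetical hitters: ordering by R/O and by scaled R/O is identical
hitter_a = scaled_runs_per_game(90, 300)  # .300 R/O -> about 7.56
hitter_b = scaled_runs_per_game(80, 320)  # .250 R/O -> about 6.30
```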

Rate Stat Series, pt. 3: Teams

2021-05-12T07:25:00.004-04:00

If I tell you that three teams in the same league-season played the same number of games (113), and that one of them scored 679 runs, another scored 670, and the third scored 633, how confident would you be in using this limited data to rank the productivity of their offenses? As usual in this series, we are ignoring park factors and other contextual factors (like quality of opposition/not having to face one’s own pitching staff); since they are from the same league-season, you don’t need to worry about whether the win value of each team’s runs was the same. Assume also that runs will be distributed across games by a known distribution like Enby, so the distribution is also not a differentiator. Assume that we don’t care about any “luck”; the actual total is what matters, not what a run estimator came up with. What else do you need to know?

I would contend that given the (admittedly restrictive) parameters I’ve placed on the exercise, you now know almost everything you need to know. In a small number of cases, and to a small extent, you are missing valuable information – but for most situations, you should need no additional information.

Now suppose I told you something similar about three players: same league-season, same number of games played (111), and three runs created estimates: one player created 106 runs, one 92, and one 88. Do you feel like you need any additional information to put these players in the proper order of offensive productivity?

I hope that your answer here is yes, and a lot of it. I’ve told you how many games each has played, but that doesn’t tell you how many opportunities they’ve had at the plate. Sure enough, in this case one of the players had substantially fewer plate appearances than the others (489, 490, 451 respectively). Given that the player who created 92 runs had 39 more plate appearances than the player who created 88, it seems likely that the latter player was actually more productive on a rate basis.
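To make the comparison concrete, here is a quick calculation dividing each player's runs created by his plate appearances, using the totals listed above (106, 92, and 88 runs created in 489, 490, and 451 PA). Runs per PA is used only to illustrate the point about unequal opportunities, not as the rate stat this series will end up recommending, and the labels A, B, and C are placeholders since the players aren't named.

```python
# (runs created, plate appearances) for the three unnamed players in the text
players = {"A": (106, 489), "B": (92, 490), "C": (88, 451)}

runs_per_pa = {name: rc / pa for name, (rc, pa) in players.items()}
ranking = sorted(runs_per_pa, key=runs_per_pa.get, reverse=True)

for name in ranking:
    print(f"{name}: {runs_per_pa[name]:.4f} RC/PA")
```

Player C, with four fewer runs created than player B, comes out ahead of him per plate appearance.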

I did not tell you how many plate appearances each of the three teams had in their 113 games; I don’t think it’s relevant to the question at hand, but the answer is 4493, 4611, and 4556 respectively. Why do we need to know plate appearances (or something) in the case of players, but not in the case of teams? Understanding this gets to the heart of the reason this series needs to exist at all, why applying the same rate stat to team offenses and player offenses may not work as intended.

In the previous installment, I asked the question: “Where do plate appearances come from?” The answer is that every inning (excluding walkoff situations) starts with three PAs guaranteed, and only by avoiding outs (reaching base and not being subsequently retired on the bases) can a team generate additional plate appearances.

From a team perspective, then, plate appearances are not an appropriate denominator for a rate stat, because differences in team plate appearances are the result of differences in performance between the teams. To return to the three teams discussed above, they are the 1994 Indians, Yankees, and White Sox respectively. The Indians had the fewest PA of the three yet scored the most runs. Does this mean that their offense, which already scored more runs than the other two clubs, was even better than the raw numbers would suggest?

An offense does not set out to maximize its plate appearances, nor does it set out to score the maximum number of runs it can in the minimum number of plate appearances. An offense sets out to maximize its total runs scored. Plate appearances are a function of the rate at which a team makes outs. At this point it might be helpful to consider the three teams:

New York’s OBA was 22 points higher than Cleveland’s and thus they generated an extra plate appearance per game. When ranking team offenses, it wouldn’t make sense to penalize the Yankees for this, which would be the case if we used R/PA. The difference in plate appearances simply reflects the different manner in which New York and Cleveland went about creating runs. For a team, plate appearances are inextricably linked with their OBA. Each inning, a team attempts to score as many runs as it possibly can before making three outs. It’s possible to score one run in a complete inning with as few as four or as many as seven plate appearances. Whether a team uses four, five, six, or seven plate appearances to score a single run is irrelevant in terms of that run’s impact on them winning or losing the game (*). Thus outs or an equivalent like innings are the correct choice for the denominator of a team rate stat.

(*) I am speaking here simply about the direct impact of the runs scored and not any downstream effects or the predictive value of team performance. Perhaps the team that uses seven PA to score one run benefits by wearing down the opposing pitcher or is more likely to have success in the future because they had four of seven batters reach base compared to one in four for the team that only needed four PA. Here we’re just focused on the win value directly attributable to the run scored and not any secondary or predictive effects.

The fact that outs are fixed for each team each inning (ignoring walkoffs) means that outs are also fixed for each team each game (ignoring walkoffs, rainouts, extra innings, and foregone bottom of the ninths). Which means that outs are also fixed for each team each season (ignoring those factors and cases in which teams don’t play out their full schedules, or have to play tiebreakers), which means that R/G and raw seasonal runs scored total are essentially equivalent to looking at R/O for a team. So for the question I asked at the beginning of the article, just knowing that the three teams had played an equal number of games, we had a pretty good idea how they would “truly” rank using R/O.

For players, this is not at all the case, since even in an equal number of games, players will get different numbers of plate appearances for a variety of reasons (batting order position, the team’s OBA (remember, higher OBA teams will generate more PA), whether or not they play the full game), a fact that is intuitive to most baseball fans. What is less intuitive, though, is that even in the same number of plate appearances, players can make very different numbers of outs. Since we’ve already accepted that team OBA defines how many plate appearances a team will generate, it isn’t much of a leap to conclude that if we have two players who create the same number of runs (using a formula that doesn’t explicitly account for their impact on the team’s OBA) in the same number of plate appearances, the player who makes fewer outs was more productive when we consider the totality of their offensive contribution. Even though the two players were equally productive in their plate appearances, the player who made fewer outs generated more plate appearances for his teammates, a second-order effect that needs to be considered when evaluating individual offensive contribution. For teams, the runs scored total already reflects this effect.

This would be an appropriate time to note that this series is focused on evaluating offenses, but of course every offensive metric can be reviewed in reverse as a defensive metric. However, since the obvious denominator for teams is outs, it is also the obvious denominator for individual pitchers. We don’t need to worry about a pitcher’s impact on his team’s plate appearances – when he is in the game, he is solely responsible (setting aside the question of how the team’s performance should be allocated between the pitcher and his fielders) for the number of plate appearances the opponent generates, and his goal is to record three outs while minimizing the number of runs he allows, regardless of how many opponents come to the plate. Outs are clearly the correct denominator for the rate stat, and innings pitched are nothing more than outs/3 (and even better, IP account for all outs, including many that don’t show up in the standard statistical categories).

In thinking about the development of early baseball statistics and the legacy of those standard statistics on how the overwhelming majority of fans thought about baseball before the sabermetric revolution took hold, it is striking that the early statisticians understood these concepts as they applied to pitchers. When pitchers were completing almost all their starts, simple averages of earned runs allowed sufficed, for the same reason that team R/G tells you most everything you need to know. As complete games became rarer, ERA took hold, properly using innings in the denominator. For most of the twentieth century, and even post-sabermetric revolution, baseball fans have been conditioned to think about innings pitched as the denominator for all manner of pitching metrics – even those like strikeout and walk frequency for which plate appearances would make a much more logical denominator. (Of course, present day sabermetrics has embraced metrics like K% and W% for pitchers, but the per inning versions remain in use as well.)

The parallel development of offensive statistics resulted in the opposite phenomenon. While early box scores tracked “hands out” (essentially outs made) for individual batters, batting average eventually became the dominant statistic. Setting aside the issues with “at bats” and how they distort people’s thinking and saddled us with the mouthful of “plate appearances” to describe the more fundamental quantity of the two, the standard batting statistics have conditioned fans to think about batting rates (walk rate, home run rate, etc.) in the correct manner (or one adjacent to being correct, depending on whether at bats or plate appearances are the denominator), but leave people struggling with how to properly express a batter’s overall productivity. Again, this is the opposite problem of how pitching statistics were traditionally constructed. One can imagine that it all might be very different had the Batting Average taken the form of a hit/out ratio rather than hits/at bats.

Rate Stat Series, pt. 2: PA Generation

2021-04-28T07:16:00.000-04:00

This is a little bit of a detour and certainly nothing new (I don’t know who originally laid out this logic/math – the earliest use I’m aware of was in 1960 by D’Esopo & Lefkowitz as part of their Scoring Index model), but I think a discussion of it is appropriate in the context of this series, and I will later make use of these formulas. It’s also ground I covered in the original series, but I think my explanation this time is slightly more coherent.

Each batting team starts each inning (excluding scenarios where a walkoff is possible) with three plate appearances guaranteed. Thus each team starts each game with twenty-seven plate appearances guaranteed (excluding scenarios where the home team forgoes batting the bottom of the ninth, rainouts, post-2020 doubleheaders, etc.). Any plate appearances beyond that must be earned by batters avoiding outs. Since it’s more natural to think of a positive outcome rather than the avoidance of a negative outcome, I will simplify and say that each extra plate appearance must be earned by a batter reaching base (and not being subsequently retired on the basepaths).

For the sake of discussion (and in keeping with the simple set of statistics being used in the metrics in this series), I’m going to ignore the existence of baserunning outs, including caught stealing, pickoffs, outs stretching, outs advancing, and runners retired on double/triple plays (although not on fielder’s choices, since the batter is charged with an out in that case). I’m going to assume that the out rate is the complement of on base average, which in this series will be defined simply as (H + W)/(AB + W). In reality, considering all the ways in which outs can be made, it would be a more involved equation (I’ve used the acronym NOA for Not Out Average and OA for the complement, Out Average) which would look something like this, although I still don’t think I’ve accounted for every possible event (you try incorporating fielders’ choices without complicating the equation significantly):

NOA = (H + W + HB + CI + ROE – CS – DP – Outs Stretching – Outs Advancing – Pickoffs – 2*TP)/(AB + W + HB + SF + SH + CI)

Alternatively, for a team when LOB data is available (and ignoring the walkoff situation), you could have OA = (Plate Appearances – Runs Scored – Left On Base)/Plate Appearances. All of this is just an attempt to calculate, as best we can from the available statistics we have restricted ourselves to, Outs/Plate Appearances. NOA or OA as appropriate could be substituted for OBA in the equations that follow as long as the appropriate corresponding adjustments are made to the numerator.
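The team version of OA works because every plate appearance ends with the batter eventually being retired, scoring, or being left on base (setting aside walkoffs). A minimal sketch of that identity; the function names are illustrative and the team totals in the usage lines are hypothetical.

```python
def team_out_average(pa: int, runs: int, lob: int) -> float:
    """OA = (PA - R - LOB) / PA: the share of PA whose batter ended up as an out.

    Assumes every PA ends with the batter retired, scoring, or stranded
    (walkoff situations ignored, as in the text).
    """
    return (pa - runs - lob) / pa


def team_not_out_average(pa: int, runs: int, lob: int) -> float:
    """NOA is the complement of OA."""
    return 1 - team_out_average(pa, runs, lob)


# Hypothetical season: 6,000 PA, 700 runs, 1,100 left on base
oa = team_out_average(6000, 700, 1100)       # 4200 / 6000 = 0.70
noa = team_not_out_average(6000, 700, 1100)  # 0.30
```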

Let’s assume for the purpose of developing an equation for team plate appearances that the OBA is constant across each of the nine batters in the lineup and doesn’t vary for any other reason (this is obviously never true, but it is a fine simplifying assumption for modeling PA generation). Then a team will start an inning with three plate appearances. For each of those three guaranteed PAs, there is a probability (equal to OBA, given our assumption) that the batter avoids an out (reaches base, given that there are no baserunning outs). This increases the expected number of plate appearances by OBA.

It doesn’t stop there, though. Each additional PA that is generated also has an OBA chance of creating an additional PA, which itself has an OBA chance of creating an additional PA. Thus, for each of the guaranteed PA, the expected number of additional team PA generated is:

OBA + OBA*OBA + OBA*OBA*OBA + … = OBA + OBA^2 + OBA^3 + … + OBA^n

which when n is infinity and OBA is between 0 and 1 (which it must be by definition) resolves to:

OBA/(1 – OBA)
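The convergence is easy to check numerically. This sketch compares a truncated sum of the series with the closed form OBA/(1 – OBA), using the 1994 AL OBA of .343 cited in the next paragraph; the function names are just illustrative.

```python
def extra_pa_partial_sum(oba: float, n_terms: int) -> float:
    """OBA + OBA^2 + ... + OBA^n: expected extra PA spawned by one guaranteed PA."""
    return sum(oba ** k for k in range(1, n_terms + 1))


def extra_pa_closed_form(oba: float) -> float:
    """Limit of the geometric series, valid for 0 <= OBA < 1."""
    return oba / (1 - oba)


league_oba = 0.343
for n in (1, 2, 5, 10, 50):
    print(n, round(extra_pa_partial_sum(league_oba, n), 4))
print("limit", round(extra_pa_closed_form(league_oba), 4))
```

With an OBA in the .300s, the partial sums are essentially at the limit within a dozen or so terms.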

The 1994 AL had an OBA of .343. Thus, each guaranteed plate appearance should have generated .343/(1 - .343) = .522 additional plate appearances. In an average inning, starting with three guaranteed PA, we would expect 3 + 3*.522 = 3*(1 + .522) = 4.566 PA, and thus in a game we would expect 9*4.566 = 41.09 PA. Note that instead of calculating the .522 additional PA, we can simplify this to 3/(1 – OBA) for an inning or 27/(1 – OBA) for a game. In reality there were 39.24 PA, so we have an unacceptable 4.7% error. What went wrong?
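The naive calculation above can be reproduced directly. This sketch takes the .343 league OBA and the 39.24 actual PA/G from the text; computing 27/(1 – OBA) without first rounding the .522 intermediate gives about 41.10 rather than 41.09, but the error is still about 4.7%.

```python
LEAGUE_OBA = 0.343          # 1994 AL, (H + W)/(AB + W), per the text
ACTUAL_PA_PER_GAME = 39.24  # actual 1994 AL PA/G, per the text

pa_per_inning = 3 / (1 - LEAGUE_OBA)       # about 4.566
naive_pa_per_game = 27 / (1 - LEAGUE_OBA)  # about 41.10
relative_error = naive_pa_per_game / ACTUAL_PA_PER_GAME - 1  # about +4.7%

print(f"{pa_per_inning:.3f} PA/inning, {naive_pa_per_game:.2f} PA/G, error {relative_error:.1%}")
```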

I’m mixing definitions of plate appearances and definitions of OBA incorrectly, and have also ignored that the three guaranteed PA are equal to the number of outs permitted in the inning. In order to estimate the number of plate appearances per inning or game consistently, we need to divide the average number of outs/game by 1 – OBA:

PA/G = (O/G)/(1 – OBA)

The definition of outs that corresponds to our simple (H + W)/(AB + W) complement of out average is AB – H. In the 1994 AL there were 25.19 outs/game using this definition, so our expected PA/G is:

25.19/(1 - .343) = 38.34

The actual average was 38.35; we’re off due to rounding as this is now just a mathematical truism since by our simplified definitions plate appearances = outs + times on base. Using this equation to estimate team PA/G from their OBA for the 1994 AL, the RMSE is .259, which is about .7% of the average PA/G. We shouldn’t expect perfect accuracy at the team level since team PA will be affected by different quantities of all the statistical categories we’re ignoring that have an impact on the actual number of PA a team generates, as well as differences in number of extra inning games, foregone bottom of the ninths, and walkoff-shortened innings.
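The corrected estimator is a one-liner. A minimal sketch, assuming outs are AB – H and OBA is (H + W)/(AB + W) as defined for this series. The first usage line plugs in the 1994 AL figures from the text (25.19 outs/game, .343 OBA); the second uses a made-up batting line to show that, under these simplified definitions, the formula returns PA = AB + W exactly.

```python
def pa_per_game(outs_per_game: float, oba: float) -> float:
    """Estimate PA/G as (O/G) / (1 - OBA)."""
    return outs_per_game / (1 - oba)


league_estimate = pa_per_game(25.19, 0.343)  # about 38.34

# Hypothetical batting line: with O = AB - H and OBA = (H + W)/(AB + W),
# O / (1 - OBA) reduces exactly to AB + W (here 550), up to float rounding.
ab, h, w = 500, 150, 50
identity_check = pa_per_game(ab - h, (h + w) / (ab + w))
```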

The key points to keep in mind as we move forward in discussing rate stats are:

1. The number of plate appearances a team will get is a function of their out rate, and simplifying terms we can very accurately estimate team PA as a function of on base average

2. Since players have an impact on the number of plate appearances their team gets, and thus the number of plate appearances they get, a proper rate stat for measuring overall offensive productivity must account for that impact

Almost Perfect

2021-04-15T08:18:00.000-04:00

In my earlier days as a baseball fan, I was really interested in no-hitters, and outside of the Indians winning the World Series, my most fervent desire as a fan was to witness one even if only on the radio. Eventually this faded, due to some combination of growing jaded about the extent to which baseball fans sometimes elevate trivial events above game outcomes, the pernicious influence of Voros McCracken on how I thought about the hits column for pitchers, and after fifteen years of intense baseball-watching finally witnessing one (I'm now up to five).

Perfect games retain a bit more of their mystique for me, due to being much more rare (someone who has watched as many games over the years as I have is bound to have seen a no-hitter, but one can't really expect to see a perfect game) and not relying on any arbitrary distinction between hits and errors (which of course doesn't affect all no-hitters). The two closest games I have taken in to being perfect games prior to last night were Mike Mussina against the Indians in 1997 and Armando Galarraga's should-have-been perfect game against the Indians in 2010. The latter game is a case in point of what I meant about fans sometimes being more interested in trivial events than game outcomes - there was more outcry in favor of replay as a result of that game than there was cumulatively from many calls that much more directly influenced which team won a given game.

Last night's effort by Carlos Rodon combined elements of both of the ninth innings of these games in the way that people who believe in hocus pocus should embrace. From Galarraga's, we took the extremely close play at first base, with Josh Naylor playing the role of Jason Donald, desperately trying to reach first after making weak contact towards first base. In this case, the play was actually much closer, but no replay was required as the call on the field was that Jose Abreu beat him to the bag by a narrow margin.

From the Mussina game, we borrowed the man, lineup slot, and fielding position to break it up. With one out in the ninth, the Indians' catcher came to the plate: Sandy Alomar singled off Mussina, while Roberto Perez was only hit on the back foot by a slider, but history repeated itself in who ended it. Of course, if Rodon had to lose the perfect game, he got a better outcome than the other two, as he at least got to keep the no-hitter.

Naturally, all of the near perfect games I've seen have been pitched against the Indians. In addition to the infinitely more important distinction of now having the longest World Series drought, after Joe Musgrove's no-hitter for the Padres, the Indians now have the longest drought between no-hitters, it having been nearly forty years since Len Barker's perfect game.

I was keeping score of the Mussina game and Rodon's effort last night, but not the Galarraga game, which I listened to on the radio while I watched some other game on TV.

Rate Stat Series, pt. 1: Introduction

2021-04-14T08:04:00.000-04:00

This blog has existed for sixteen years now, and yet with the exception of some (relatively) recent stuff I’ve written about the Enby distribution for team runs per game and the Cigol approach to estimating team winning percentage from Enby, almost all of the interesting sabermetric work appeared in the blog’s first five years, and most in the first year or two.

There are a number of reasons for that - one is that when I started, I was a college student with a lot more free time on his hands than I have with a 9-5. Related, I was also more eager to spend a lot of time staring at numbers in my free time when I didn’t spend a good portion of my day staring at numbers. Remember the Bill James line about how a column of numbers that would put an actuary to sleep can be made to dance if you put Bombo Rivera’s picture on the flip side of the card? Sometimes the numbers do indeed dance, but the actuary in question would rather watch a ballgame or read about the Battle of Gravelines than manipulate them in the evening, dancing or no.

More generally, there has been much less to investigate in the area of sabermetrics that I primarily practice, which I will call for lack of a better term “classical sabermetrics”. I would define classical sabermetrics as sabermetric study which is primarily focused on game-level (or higher, e.g. season, player career, etc.) data that relates to baseball outcomes on the field (e.g. hits, walks, runs scored, wins). Classical sabermetrics is/was the primary field of inquiry of those I have previously called first or second-generation sabermetricians.

Classical sabermetrics is not dead, but to date the last great achievement of the field was turned in by Voros McCracken when he developed DIPS. I’m not arrogant enough to declare that nothing more will ever be found in the classical field, and there is still much work to be done, but at least as far as I can see, it is highly likely that it will consist of tinkering and incrementally improving work that has already been done, and probably with little impact on the practical implementation of sabermetric ideas. For example, I still would love to find a modification to Pythagenpat that works better for 2 RPG environments, or a different run estimator construct that would preserve the good properties of Base Runs while better handling teams that hit tons of triples. All of this is quite theoretical, and of no practical value to someone who is attempting to run the Pirates.

Which increasingly is what sabermetric practitioners are attempting to do, whether directly through employment by major league teams, or indirectly through publishing post-classical sabermetric research in the public sphere. Let me be very clear: this is not in any way a lament for a simpler, purer time in the past. I think it’s wonderful that sabermetric analysis has transcended the constraints of the data used in its classical practice and is exerting an influence on the game on the field.

Notwithstanding, I am still a classical sabermetrician, not because I don’t value the insight provided by post-classical sabermetrics but because I don’t have some combination of the skillset or the way of thinking or the resources or the drive to become proficient enough in newer techniques to offer anything of value in that space. Thus it is natural that I have less to share here.

The topic that I am embarking on discussing is squarely in the realm of “quite theoretical and of no practical value to someone who is attempting to run the Pirates”. About fifteen years ago, I started writing a “Rate Stat Series”, and aborted it somewhere in the middle. I have stated several times that I intend to revisit it, but until now have not. The Rate Stat Series was and now is intended to be a discussion of how best to express a batter’s overall productivity in a single rate stat. I should note three things that it is not:

1. The discussion is strictly limited to the construction of a rate stat measuring overall offensive productivity, not a subset thereof. I am not suggesting that if you are measuring a batter’s walk rate, strikeout rate, ground-rule double rate, or any other component rate you can dream up, that you should follow the conclusions here. For most general applications, plate appearances makes perfect sense as the denominator for a rate for any of those quantities. There may be reasons to follow a sort of decision tree approach that results in different denominators for some applications (McCracken was an innovator in this approach, in DIPS and park factors). All of that is well and good and completely outside the scope of this series.

2. The premise presupposes that the unit of measurement of a batter’s productivity has already been converted to a run-basis. Thus it is not a question of OPS v. OTS v. OPS+ v. 1.8*OBA + SLG v. wOBA v. EqA v. TAv v. whatever, but rather what the denominator for a batter’s estimated run contribution should be. The obvious choices are outs and plate appearances, but there are other possibilities. Spoiler alert: My answer is “it depends”.

3. Revolutionary, groundbreaking, or any other similar adjective. I’m attempting to describe my thoughts on methods that already exist and were created by other people in a coherent, unified format.

In sitting down to write this, I realized I made two fundamental mistakes in my first attempt:

1. I was attempting to “prove” my preferences mathematically, which is not a bad thing in theory, but some of what I was doing begged the question and some of this discussion is of a theoretical nature that lends itself more to logical reasoning/“proofs” than to mathematical “proofs”. I’ve tried to anchor my conclusions in math, logic, and reason where possible, but have also embraced that some of it is subjective and must be so.

2. I posted pieces before I finished writing the whole thing, or even knowing exactly where it was going.

These are rectified in this attempt – all of my assertions are wildly unsupported and as I hit post, all planned installments exist in at least a detailed outline form. While I have attempted to avoid the two mistakes I identified in the previous series, as I look at this series in full I can see I may have just replaced them with two characteristics that will make reading this a real chore:

1. I’m overly wordy; repeating myself a lot and trying to be way too precise in my language (although I fear not as precise as the topic demands). There’s a lot of jargon in an attempt to delineate between the various concepts and methodological choices.

2. There’s way too much algebra; where possible, I didn’t want to just assert that mathematical operations resolved in a certain way and give an empirical example that backs me up, so there’s a lot of “proofs” that will be of no general interest.

Allow me to close by laying some groundwork for future posts. I am going to use the 1994 AL as a reference point, and when I use examples they will generally be drawn from this league-season. Why have I chosen the 1994 AL?

1. 1994 was the year I became a baseball fan, and I was primarily focused on the AL at that time, so it is nostalgic. I have not turned into a get off my lawn type who thinks that baseball reached its zenith in 1994 and it’s all been downhill since, but I do think that about 1994 Topps, the greatest baseball card set of all-time.

2. As the year in which the “silly ball era” really broke out, and due to the strike shortening the season, there are some fairly extreme performances that are useful when talking about the differences between rate stat approaches.

As discussed, this series starts from the premise that a batter’s contribution is measured in terms of runs, and works from there. This approach does not require the use of any particular run estimator, although one of my assertions is that the choice of run estimator and the choice of rate/denominator for the rate are logically linked. There are three types of run estimators that I will use in the series: a dynamic model, a linear model, and a hybrid theoretical team model.

In order to avoid differences in the run estimator(s) used unduly influencing differences in the resulting rate stats, I am going to anchor a set of internally consistent run estimators in the reference period of the 1994 AL. It will come as no surprise if you’ve read anything I’ve written about run estimators in the past that I am using Base Runs for this job. The point of this series is not to tell you which particular run estimator to use or how to construct it. It really doesn’t matter which version of Base Runs I use (if you are still stuck on Runs Created, there’s no judgment from this corner, at least for the duration of this discussion), or which categories I include in the formula – this is about the conceptual issues regarding the rate that you calculate after estimating the batter’s run contribution, so I am keeping it very simple, looking just at hits, walks, and at bats (thus defining outs as at bats minus hits) and ignoring steals/caught stealing, hit batters, intentional walks, sacrifices, etc. Since I’m doing this with the run estimator, I will also do it with most other statistics I cite – for example, throughout this series OBA will be (H + W)/(AB + W), and PA will just be AB + W.

A version of Base Runs I have used is below. It’s not perfect by any means; it overvalues extra base hits as we’ll see below, but again, the specific estimator is for example only in this series – the thinking behind constructing the resulting rates is what we’re after:

A = H + W – HR

B = (2TB - H – 4HR + .05W)*.78

C = AB – H

D = HR

BsR = (A*B)/(B + C) + D
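For anyone following along in code, here is a minimal Python version of this Base Runs formula. It is a sketch, assuming the stat line supplies doubles, triples, and home runs so that singles and TB can be derived. The `BattingLine` class and its field names are my own illustrative choices; `doubles` and `triples` are spelled out to avoid a clash with the D factor.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    ab: int
    h: int
    doubles: int
    triples: int
    hr: int
    w: int

    @property
    def tb(self) -> int:
        singles = self.h - self.doubles - self.triples - self.hr
        return singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr


def base_runs(x: BattingLine, b_multiplier: float = 0.78) -> float:
    a = x.h + x.w - x.hr                                         # baserunners (excluding HR)
    b = (2 * x.tb - x.h - 4 * x.hr + 0.05 * x.w) * b_multiplier  # advancement
    c = x.ab - x.h                                               # outs
    d = x.hr                                                     # automatic runs
    return a * b / (b + c) + d


# Hypothetical stat line, not an actual 1994 player
sample = BattingLine(ab=100, h=30, doubles=5, triples=1, hr=4, w=10)
print(round(base_runs(sample), 3))  # 17.287
```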

Typically, any reconciliation of Base Runs to a desired estimated number of runs scored for an entity like a league is done using the B factor, since it is already something of a balancing factor in the formula, representing the somewhat nebulous concept of “advancement” while the other components (A = baserunners, C = outs, D = automatic runs) represent much more tightly defined quantities. In order to force the Base Runs estimate for the 1994 AL to equal the actual number of runs scored, you need to replace the .78 multiplier with .79776, which can be determined by first calculating the needed B value (where R is the actual runs scored total):

Needed B = (R – D)*C/(A – R + D)

Divide this by (2TB – H – 4HR + .05W) and you get a .79776 multiplier. I usually don’t force the estimated runs equal to the actual runs, but for this series, I want to be internally consistent between all of the estimators and also be able to write formulas using league runs rather than having to worry about any discrepancies between league runs and estimated runs.
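The reconciliation step can be sketched the same way: compute the needed B from the actual runs, then divide by the unweighted advancement term. The 1994 AL league totals aren't reproduced in this post, so the usage lines below run on a made-up stat line; the round trip simply confirms that the needed-B formula inverts BsR exactly. Function names are illustrative.

```python
def base_runs(ab, h, tb, hr, w, b_multiplier):
    a = h + w - hr
    b = (2 * tb - h - 4 * hr + 0.05 * w) * b_multiplier
    c = ab - h
    return a * b / (b + c) + hr


def needed_b_multiplier(ab, h, tb, hr, w, runs):
    """Multiplier on (2TB - H - 4HR + .05W) that makes BsR equal `runs` exactly."""
    a = h + w - hr
    c = ab - h
    d = hr
    needed_b = (runs - d) * c / (a - runs + d)
    return needed_b / (2 * tb - h - 4 * hr + 0.05 * w)


# Round trip on a hypothetical line: BsR at .78, then solve back for the multiplier
totals = dict(ab=100, h=30, tb=49, hr=4, w=10)
estimated = base_runs(**totals, b_multiplier=0.78)
recovered = needed_b_multiplier(**totals, runs=estimated)  # 0.78 up to float rounding
```

If actual runs exceed the estimate, the needed multiplier rises above the starting value, which is the direction of the .78 to .79776 adjustment described above.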

So our dynamic run estimator (BsR) used throughout this series will be:

A = H + W – HR = S + D + T + W

B = (2TB - H – 4HR + .05W)*.79776 = .7978S + 2.3933D + 3.9888T + 2.3933HR + .0399W

C = AB – H = Outs

D = HR

BsR = (A*B)/(B + C) + D
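The event-by-event coefficients in the B line follow from TB = S + 2D + 3T + 4HR and H = S + D + T + HR, so 2TB – H – 4HR + .05W reduces to S + 3D + 5T + 3HR + .05W before the multiplier is applied. A small Python check of that arithmetic (the dictionary names are hypothetical; in them, D means doubles, not the D factor):

```python
MULT = 0.79776
EVENTS = ("S", "D", "T", "HR", "W")  # here D = doubles
TOTAL_BASES = {"S": 1, "D": 2, "T": 3, "HR": 4, "W": 0}
IS_HIT = {"S": 1, "D": 1, "T": 1, "HR": 1, "W": 0}
IS_HR = {"S": 0, "D": 0, "T": 0, "HR": 1, "W": 0}
IS_WALK = {"S": 0, "D": 0, "T": 0, "HR": 0, "W": 1}


def b_coefficients(mult: float = MULT) -> dict:
    """Per-event coefficient of B = (2TB - H - 4HR + .05W) * mult."""
    coefs = {}
    for e in EVENTS:
        raw = 2 * TOTAL_BASES[e] - IS_HIT[e] - 4 * IS_HR[e] + 0.05 * IS_WALK[e]
        coefs[e] = raw * mult
    return coefs
```

Rounded to four places, `b_coefficients()` gives .7978, 2.3933, 3.9888, 2.3933, and .0399, matching the B line above.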

To be consistent, I will also use the intrinsic linear weights for the 1994 AL that are derived from this BsR equation as the linear weights run estimator. The intrinsic linear weights are derived through partial differentiation of BsR with respect to each component. If we define A, B, C, and D to be the league totals of those factors, and a, b, c, and d to be the coefficients for a given event in the A, B, C, and D factors respectively, then the linear weight of a given event is calculated as:

LW = ((B + C)*(A*b + B*a) – A*B*(b + c))/(B + C)^2 + d
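Because BsR is a simple closed-form function, the partial derivative can be checked numerically. The Python sketch below (function names hypothetical) implements the LW formula and compares it with a finite-difference nudge to the league totals, where A, B, C, D are totals and a, b, c, d are the coefficients of the event being valued.

```python
def bsr(A: float, B: float, C: float, D: float) -> float:
    """Base Runs from the four factor totals."""
    return A * B / (B + C) + D


def intrinsic_lw(A: float, B: float, C: float,
                 a: float, b: float, c: float, d: float) -> float:
    """Partial derivative of BsR for an event with coefficients a, b, c, d."""
    return ((B + C) * (A * b + B * a) - A * B * (b + c)) / (B + C) ** 2 + d


def finite_difference_lw(A: float, B: float, C: float, D: float,
                         a: float, b: float, c: float, d: float,
                         eps: float = 1e-6) -> float:
    """Numerical check: change in BsR from adding eps of the event, per unit."""
    bumped = bsr(A + a * eps, B + b * eps, C + c * eps, D + d * eps)
    return (bumped - bsr(A, B, C, D)) / eps
```

For instance, a home run has a = 0, b equal to its coefficient in the B line, c = 0, and d = 1, while an out has only c = 1.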

For the 1994 AL, this results in the following equation, where the RC subscript denotes absolute runs created:

LW_RC = .5069S + .8382D + 1.1695T + 1.4970HR + .3495W - .1076(outs)
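Applying the weights is just a sum of event counts times run values. A minimal Python sketch using the LW_RC coefficients above (the function name is hypothetical); the out value is a parameter, so the same function also covers the runs above average version.

```python
LW_RC = {"S": 0.5069, "D": 0.8382, "T": 1.1695, "HR": 1.4970, "W": 0.3495}


def linear_weights_runs(counts: dict, outs: float,
                        out_value: float = -0.1076) -> float:
    """Runs from event counts (D = doubles) plus outs times the out value."""
    return sum(LW_RC[e] * counts.get(e, 0) for e in LW_RC) + out_value * outs
```

For a hypothetical line of 100 singles, 30 doubles, 3 triples, 20 homers, 60 walks, and 400 outs, this returns about 87.2 absolute runs, or about 4.25 with `out_value=-0.3150`.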

We will also need a version of LW expressed in the classic Pete Palmer style to produce runs above average rather than absolute runs. That’s just a simple algebra problem to solve for the out value needed to bring the league total to zero, which results in:

LW_RAA = .5069S + .8382D + 1.1695T + 1.4970HR + .3495W - .3150(outs)
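The algebra amounts to choosing the out value x so that the league’s non-out run values plus x times league outs sum to zero: x = -(sum of weight*count over non-out events)/outs. Since the A*B/(B + C) term is homogeneous of degree one, the intrinsic weights summed over the league reproduce the league BsR total (here forced to equal actual runs), so this is equivalent to subtracting league runs per out from the absolute out value. A Python sketch with hypothetical function names and inputs:

```python
def raa_out_value(weights: dict, counts: dict, outs: float) -> float:
    """Out value that makes the league's total linear weight runs equal zero."""
    non_out_runs = sum(weights[e] * counts[e] for e in weights)
    return -non_out_runs / outs


def rc_to_raa_out_value(rc_out_value: float, runs: float, outs: float) -> float:
    """Equivalent shortcut when the absolute weights sum to league runs."""
    return rc_out_value - runs / outs
```

With made-up weights of .5 for singles and 1.5 for homers, 10 singles, 2 homers, and 40 outs, both routes give an out value of -.2 (using an absolute out value of -.1 and 4 league runs for the shortcut).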

I am ignoring any questions about what the appropriate baseline for valuing individual offensive performance is. Regardless of where you side between replacement level, average, and other less common approaches, I hope you will agree that average is a good starting point which can usually be converted to an alternative baseline much more easily than if you start with an alternative baseline. Average is also the natural starting point for linear weights analysis since the empirical technique of calculating linear weights based on average changes in average run expectancy is by definition going to produce an estimate of runs above average.

Later we will also have some “theoretical team” run estimators built off this same foundation, but discussion of them will fit better when discussing that concept in greater detail.

I will also be ignoring park factors and the question of context in this series (at least until the very end, where I will circle back to context). Since I am narrowly focused on the construction of the final rate stat, rather than a full-blown implementation of a rating system for players, park factors can be ignored. Since I am anchoring everything in the 1994 AL, the context of the league run environment can also be ignored since it will be equal for all players once we ignore park factors.

Give Us This Day Our Daily Ball

2021-04-01T08:09:00.000-04:00

Rob Manfred, who art Commissioner

Halloweth be our game

Thy rule changes be undone, thy no longer assault fun

In 2022 as it was in 2002

Give us this day our daily ball

And reconcile with Tony Clark as we reconcile to runners on in extra innings

And lead us not into strike or lockout

And deliver us from pitchers hitting

For thine is the office and the power and the responsibility until 2024

Play ball