Wednesday, September 15, 2021

Rate Stat Series, pt. 12: Relativity for the Player as a Team Framework

I have now covered all of the ground I wished to cover regarding the construction of rate stats for various frameworks for evaluating individual offense, so I think this would be a good point to take stock of the conclusions I have drawn. From here on out, I will write about these tenets as if they are inviolable, which is not actually the case (“sound” and “well-reasoned to me” are as far as I’m willing to go), but I’m moving on to other topics:

* For team offense, Runs/Out is the only proper rate stat

* If evaluating individuals as teams (i.e. applying a dynamic run estimator like Runs Created or Base Runs directly to an individual’s statistics), Runs/Out is also the proper rate stat. Any other choice breaks consistency with treating the individual as a team, and consistency is the only thing going for such an approach. On a related note, don’t treat individuals as teams.

* If using a linear weights framework, the proper rate stat is RAA/PA, R+/PA, or restatement of those (wOBA is the one commonly used). These metrics all produce consistent rank orders and can be easily restated from one another. Any of them are valid choices at the user’s discretion, with the only objective distinction being whether you want ratio and differential comparability (R+/PA), only care about differential comparability (R+/PA or RAA/PA), or would prefer a different scale altogether at the sacrifice of direct comparisons without futher modifications (wOBA).

* If using a theoretical team framework, R+/O+ and R+/PA+ are equivalent, with the user needing to decide whether they want to think about individual performance in terms of R/O or R/PA.

Throughout this series, I have not worried about context, and originally envisioned a final installment that would briefly discuss some of the issues with comparing player’s rates across league-seasons. In putting it together, I decided that it will take a few installments to do this properly. It will also take another league-season to pair with the 1994 AL in order to make cross-context comparisons.

I have chosen the 1966 AL to fill this role. In the expansion era (1961 – 2019), the AL has averaged 4.52 R/G. 1994 was third-highest at 5.23, 15.6% higher than average. The closest league to being an inverse relative to the average is the 1966 AL (3.89 R/G, 13.8% lower than average). The obvious choice would have been 1968, since it was the most extreme, but as 1994 was not the most extreme I thought a league that was similarly low scoring would be more appropriate.

The other reason I like the 1966 AL for his purpose is that, just as was the case in 1994, the superior offensive player in the league was a future Hall of Fame slugger named Frank. In making comparisons across the twenty-eight year and (more importantly) 1.34 R/G differences between these two league-seasons, the two Franks will serve as our primary reference points.

To look at the 1966 AL we will first need to define our runs created formulas. I will not repeat the explanations of how these are calculated – I used the exact same approach as for the 1994 AL, which are demonstrated primarily in the parts 1, 9, and 10.

Base Runs:

A = H + W – HR = S + D + T + W

B = (2TB - H – 4HR + .05W)*.79920 = .7992S + 2.3976D + 3.996T + 2.3976HR + .0400W

C = AB – H = Outs

D = HR

BsR = (A*B)/(B + C) + D

Linear Weights:

LW_RC = .4560S + .7751D + 1.0942T + 1.4787HR + .3044W - .0841(outs)

LW_RAA = .4560S + .7751D + 1.0942T + 1.4787HR + .3044W - .2369(outs)

wOBA = (.8888S + 1.2981D + 1.7075T + 2.2006HR + .6944W)/PA

Theoretical Team:

TT_BsR = (A + 2.246PA)*(B + 2.346PA)/(B + C + 7.915PA) + D - .6658PA

TT_BsRP = ((A + 2.246PA)*(B + 2.346PA)/(B + C + 7.915PA) + D + .185PA)*PAR – .8519PA

TT_RAA = ((A + 2.246PA)*(B + 2.346PA)/(B + C + 7.915PA) + D + .185PA)*PAR – .9572PA

Using these equations, let’s look at the leaderboards for the key stats for each framework. First, for the player as a team:



Of course, we’ll get slightly different results from the different frameworks, but two things should be obvious: Robinson was the best offensive player in the league (his lead over Mantle in second place is bigger than the gap from Mantle to ninth-place Curt Blefary, and he was also among the leaders in PA), and the difference in league offensive levels carried through to individuals.

Next for the linear weights framework:

And finally for the theoretical team framework:



Now comes the hard part – how do we compare these performances from the 1966 AL against those from the 1994 AL? It should be implied, but in all comparisons that follow, I am only concerned about the value of a given player’s contributions relative to the context in which he played; this is not about whether/to what extent the overall quality of play differed between 1966 and 1994, or about how the existence of pitcher hitting in 1966 but not in 1994 impacted the league average, etc. I also am not going all the way in accounting for context, as I’ve still done nothing to park-adjust. Park adjustments are important, but whether you adjust or not is irrelevant to the question of which rate stat you should use. I’m also not interested in how “impressive” a given rate is due to the variation between players (e.g. z-scores) – I’m simply trying to quantify the win value of the runs a player has contributed.

The obvious first step to compare players across two league-seasons is to compare the difference or ratio of their rate to the league rate. As we’ve discussed throughout this series, some stats can be compared using both differences and ratios, but others can only be compared (at least meaningfully) with one or the others, and others still can be compared but the ratio or difference no longer has any baseball meaning unless some transformation is carried out (wOBA is the most prominent example we’ve touched on). 

The table below shows the rates for our two Franks, the respective league rate, and the difference and ratio (if applicable) for each of the key rate stats we’ve looked at:


One of the other reasons I was drawn to the pairing of these two league-seasons is how close Thomas and Robinson are in the key metrics when compared to the league using a ratio. Thomas has a slight advantage in each metric, except for BsR/O, where the flaw of treating an individual as a team can be seen. Playing in a high offense context, Thomas’ crazy rates (to put it in simple terms related to but not the run unit stats used in this series, Thomas hit .353/.492/.729 to Robinson’s .316/.406/.637) result in a large estimated tertiary contribution when treating him as his own team. Expressing these metrics as ratios hammers home that even for extreme performers (defining “extreme” within the usual confines of a major league season), there isn’t much difference between using the linear weights and theoretical team frameworks, and both are reasonable choices of framework. Treating an individual as a team, not so much.

We’ve expressed each player’s rate relative to the league average, but we haven’t answered the question of which construct is appropriate – the difference or the ratio, or does the answer differ depending on the metric? And how can we make that determination? In lieu of some guiding principle, it is just a matter of personal preference. Most people are drawn to ratios, and the widespread adoption of metrics like ERA+, ERA-, OPS+, and wRC+ speak to that preference. This case illustrates one of the reasons why ratios make sense intuitively – while the Franks are very close when comparing ratios, Thomas has a healthy lead when comparing differences as the league average R/PA and R/O for 1994 are much higher than the same for 1966. It seems intuitive that a batter will be able to exceed the league average by a larger difference when it is .2419 rather than .1528.

Still, in order to really understand how to compare players across contexts, and ensure that our intuition is grounded in reality, we need to think about how runs relate to wins. All of the metrics in that table which I consider the key metrics discussed in this series have been expressed in runs, and this is a natural starting point as the goal of an offense is to score runs. Of course the ultimate reason why an offense wants to score runs is so that they might contribute to wins.  

I will break one of the rules I’ve tried to adhere to by begging the question and asserting that Pythagenpat and associated constructs are the correct way to convert runs to wins. Of course it is just a model, and while it is a good one, it is not perfect, but I think it is the best choice given its accuracy and its seemingly reasonable results for extreme situations. 

Using Pythagorean also lends itself to adoption of a ratio rate stat, as one way to express the Pythagorean Theorem is that the expected ratio of wins to losses is equal to the ratio of runs to runs allowed raised to some power. If we start by thinking about the team/player as team framework for evaluating offense, it is natural to construct a win-unit rate stat by treating the team’s R/O as the numerator and the league average as the denominator. The ratio of these two, raised to some power, is then the equivalent win ratio that results. This is exactly the approach that Bill James took in his early Baseball Abstracts.

We can easily use this relationship to demonstrate that a simple difference of R/O does not capture the differences in win values between players. If one player exceeds the league average R/O by 10% and another 50%, then assuming a Pythagorean exponent of 2, player A will have a “win ratio” of 1.21 and player B will have a win ratio of 2.25. Even if we convert those to winning percentages (.5475 and .6923), the difference between the two player’s R/O does not capture the win value difference.

The ratio doesn’t either, of course, since it would need to be squared, but if we assume a constant Pythagorean exponent like 2, this is simply a matter of scaling, with no impact on the rank order to players. However, if we use a custom Pythagorean exponent, this assumption breaks down, as we can illustrate by comparing the two Franks. Since we’re starting with the assumption that Pythagenpat is correct, this means that we will always need to make some kind of adjustment in order to convert our run rate into an equivalent win rate.

Since we are treating the players as teams, the consistent approach is to first calculate the RPG for each player as a team, then calculate the Pythagenpat exponent for the situation, then convert their relative BsR/O to an equivalent win value. The RPG for the 1966 AL was 3.893 with 25.482 O/G, while in the 1994 AL it was 5.226 with 25.188 O/G.

T_RPG = LgR/G + BsR/O*LgO/G

x = T_RPG^.29 (for a Pythagenpat z value of .29)

Relative BsR/O = (BsR/O)/(LgBsR/O)

Win Ratio = (Relative BsR/O)^x

Offensive Winning Percentage = Win Ratio/(Win Ratio + 1) = (BsR/O)^x/((BsR/O)^x + (LgBsR/O)^x)


Here, we conclude that just looking at Relative BsR/O understates Thomas’ superiority, as Pythagenpat estimates more wins for a given run ratio when the RPG increases (e.g. a run ratio of 8/4 will produce more wins than a run ratio of 7/3.5). 

If we were actually going to use this framework, we could leave our final relative rate stat in the form of a win ratio or OW%, but I would prefer to convert it back to a run rate. Even within the conceit of the player as a team framework, it’s hard to know what to do with an OW% (or the even less familiar win ratio). We can say that Thomas’ OW% of .903 means that a team that hit like Thomas and had league average defense (runs allowed) would be expected to have a .903 W%, but even if you follow the player as team framework, you would probably like to have a result that can be more easily tied back to the player’s performance as the member of a team, not what record the Yuma Mutant Clone Franks would have. One way to return the win ratio to a more familiar scale is to convert it back to a run ratio.

Let’s define “r” to be the Pythagenpat exponent for some reference context, which I will define to be 8.83 RPG – the major league average for the expansion era (1961 – 2019). We can then easily convert our estimated win ratios for the Franks to run ratios that would produce the same estimated win ratio in the reference (8.83 RPG) environment.

The calculation is simply (Win Ratio)^(1/r). Since r = 8.83^.29 = 1.881, this becomes Win Ratio^.5316, which produces 2.614 for Robinson and 3.282 for Thomas. Following the time-honored custom of dropping the decimal place, we wind up with a win-equivalent relative BsR/O of 261 for Robinson and 328 for Thomas, compared to their initial values of 217 and 260 respectively. These both increased because both players would radically alter their run environments, increasing the win value of their relative BsR/O. I would demonstrate the lesser impact on more typical performers if I cared about this framework beyond being thorough.

While I do think that this example demonstrates that just looking at a ratio of R/O does not appropriately capture the win impact between players, the individual OW% approach goes many steps too far, but it is the logical conclusion of the player as a team methodology. If you were on that train until we got to the end of the line, I’d encourage you to consider jumping off at one of the earlier stations (I’d prefer you get on the green or red linear weights or theoretical team lines rather than ever embarking on the player as team blue line). Next time, I’ll explore those paths to win-unit rate stats using those frameworks.

Wednesday, September 01, 2021

Rate Stat Series, pt. 11: Rate Stats for the Theoretical Team Framework II

Once PAR has been incorporated, it should be clear that a different approach will be needed as our run estimate already includes the batter’s secondary contribution – it is starting out on a similar basis to wRC.  We also can fall back on our necessary test for a rate stat – it must produce the same RAA as the RAA produced by the full implementation of the framework we are looking at.

To argue why this should be the gateway criteria in a slightly different way than I did previously, consider the place of RAA in the theoretical team framework. The theoretical team framework is an attempt to value the batter by estimating the difference between the runs scored for a team on which he accumulates an even share of the plate appearances, and the runs scored for a team on which he does not play. While we have constructed an absolute estimate of runs created using this approach, it inherently is screaming out for a marginal approach – taking the difference between the team with the player and the team without the player. That is exactly what RAA is supposed to represent – and by calculating TT_RAA, we have (to the best of our ability) captured the batter’s primary, secondary, and tertiary contributions. TT_RAA is the marginal impact of the player on a team, and any rate stat that starts by using TT_BsRP should produce the same RAA figure as TT_RAA.

So let’s look at the key figures for our hitters:


Here, “RAA” is what we calculated above, based on TT_BsR/O or TT_BsR+/PA. As you can see, it does not match TT_RAA. If it did, we wouldn’t need to worry about a special set of rate stats for the TT framework – we could just piggyback on the same methodology used for linear weights. It’s clear that we need to apply something different to TT_BsRP when we use the full-blown theoretical team approach including PAR. I would also argue that this is evidence that the theoretical team approach should not be viewed simply as an alternative path to using linear weights, but rather a third unique framework for evaluating a batter’s contribution.

I would now go through a tortured explanation of the logic behind finding a denominator that achieves the desired result, but there is no need – David Smyth developed the answer and explained it clearly and concisely in a June 21, 2001 FanHome post:

On the subject of a rate stat such as R+/PA or R+/O, etc....

The whole idea behind this method is to compute the impact of a player on a theoretical team. On the team level (even a theoretical one), impact is in terms of runs and outs. The R+ generated by the procedure on this thread [NOTE: R+ is theoretically equivalent to what I am calling TT_BsRP] tells us the difference in runs between the theoretical team without the indicated batter (that is, a team of 8 average hitters), and the theoretical team with the indicated batter added. We can also compute the difference in the OUTS between these two teams, and use that total as the rate stat denominator. Call it O+.

And it's easy to calculate. You simply multiply the batter PA by the out percentage (1-OBA) of the reference team...It seems to me that the preferred rate stat for the R+ framework is not R+/PA nor R+/O; it's R+/O+. 

Frank Thomas had 149.9 TT_BsRP in 508 PA. The reference team had a .3433 OBA (same as the league average). Thus, his TT_BsRP/O+ (a specific implementation of Smyth’s R+/O+) would be:

O+ = PA*(1 – LgOBA) = 508*(1 - .3433) = 333.6

TTBsRP/O+ = TT_BsRP/O+ = 149.9/333.6 = .4494

What does this represent? It is not easy to explain it in a sentence, but I will try: Frank Thomas contributed .4494 runs to the theoretical team’s offense for each of its outs distributed to him. When Smyth said that PA*(1 – RefOBA) was equal to the difference in the team’s outs, he was referring to the theoretical team construct, in which the difference in team plate appearances is the batter’s own PA, since in the left hand term (T_BsR) the team has 9*PA, and in the right hand term (R_BsR), the team has 8*PA. The difference in outs is this difference (1*PA) times the team’s out rate.

In this sense, O+ can be thought of in terms of “freezing team outs”. We know that for a team (excluding the list of complicating circumstances like rainouts, foregone batting in the bottom of the ninth, and walkoffs), outs are fixed quality. Regardless of whether we add Frank Thomas or Matt Walbeck to a reference team, the team’s outs are fixed. In the TT/PAR approach, we start by capturing the batter in question’s secondary contribution through the PAR multiplier, but we never directly change the number of PA we start from. Thus, Thomas’ 508 PA and Walbeck’s 355 PA will turn into outs at the same rate for the purpose of the calculation. In reality, Thomas’ team will compile more PA thanks to his greater secondary contribution, but the equation handles this with a multiplier, freezing the original value.

You may not find that explanation convincing – I am struggling to articulate a concept that I may be fooling myself into believing I understand. You might be more convinced by the demonstration of what Thomas’ RAA is under this approach:

TT_RAA = (TT_BSRP/O+ - LgR/O)*O+

which for Thomas = (.4494 - .2075)*333.6 = 80.7 which is the same as his TT_RAA

In fact, I jumped the gun by calling it TT_RAA before I proved it was equal to what we previously called TT_RAA. It is in fact:


This leaves unanswered the question of what the rate stat using TT_RAA should be. Similar to how we had R+/PA and RAA/PA when working with linear weights, we should have an appropriate rate stat for the RAA figure. By manipulating the TT_RAA equation, we can see that TT_RAA/O+ should be consistent with R+/O+:

TT_RAA = (R+/O+ - LgR/O)*O+ 

so divide both sides by O+ to get:

TT_RAA/O+ = R+/O+ - LgR/O

Thus TT_RAA/O+ and R+/O+ are analogous to RAA/PA and R+/PA, except that the difference is LgR/O rather than LgR/PA.

So we’re done, right? Or is there something vaguely unsatisfying about all this, that after an entire series in which I argued that R/O was a team measure and not an individual one, does it bother you that we have left our rate stat for an individual in the form of R/O?

On one hand, it absolutely shouldn’t – R/O is the correct choice of rate stat for a team, and in this case we have modeled the player’s impact on the team and its R/O, and so there’s nothing wrong with expressing a final result in terms of R/O. The problem was not with R/O per se – it was with the inputs that we were putting in for individual batters. Additionally, R+/O+ allows our team and individual rate stats to converge. Team R+/O+ will reduce to team R/O if accept the premise that the proper “reference team” for a team is itself, and that it’s R+ is equal to its actual runs scored since actual runs scored obviously accounts for the team’s primary, secondary, and tertiary offensive actions. Not to mention any quaternary actions we could dream up or the fact that using those terms doesn’t even make sense when talking about teams. 

On the other hand, O+ is not in any way an intuitive metric – just re-read my tortured explanation of what it represents. A batter’s R+/O+ can be contextualized in any number of meaningful ways, just as regular old R/O can, but I’m not sure even those presentations (ala Runs Created/25.2 Outs) are truly relatable to a player’s performance except as a scaling device.

There is a very simple and equivalent alternative, though, and you may have already noticed what it might be from the formulas. We defined O+ as PA*(1 – LgOBA), which is really just plate appearances times a constant. If we just divide by that constant (1 – LgOBA), we can restate everything on the basis of PA, with no loss of ratio comparability. 

Doing this, we now have:

(TT_BsRP/PA – LgR/PA)*PA = TT_RAA

and TT_BsRP/PA = R+/O+ * (1 – LgOBA)

R+/PA+ = TTBsRP/PA

I am going to call TT_BsRP/PA (R+/PA+) even though PA+ is just equal to PA. I’m doing this primarily for ease of discussion, so that R+/PA+ will represent the theoretical team calculation, while R+/PA will represent the linear weights equivalent—and they are equivalent (which also means that R+/O+ could be applied to the linear weights framework – more on this in a later installment). My other justification is that in theoretical team framework, the actual number of plate appearances we plug in is not of importance when dealing with a rate stat, as we are always defining the reference team’s PA to be eight times the batter’s PA. We use the batter’s PA because it allows estimates like TT_BsR to reflect his actual playing time, but if all we care about is the rate at the end, we could use 1 plate appearance or 650 or any number we wanted. Thus I prefer to think about this quantity of plate appearances as the player’s share of the reference team’s PA rather than really being tied to his own, and feel justified in distinguishing it through the abbreviation PA+.

This approach to developing a rate stat for the TT framework can be thought of as “freezing plate appearances” as compared to the “freezing outs” approach of R+/O+. I first became aware of it through a FanHome post in 2007 by David Smyth (surprise). By that time it was over five years since Smyth had published the R+/O+ methodology and I had adopted it myself, so by the time we discussed what I am calling R+/PA+ I was in the interesting position of advocating one Smyth construct against the newer. We quickly verified their equivalence and left it there, which is the position I’m taking now. 

By definition, R+/O+ and R+/PA+ are perfectly correlated since the only difference between the two is multiplying by a constant. One allows us to express a rate stat in terms of R/O, which is consistent with how we would state a team rate stat; the other allow us to express it in terms of R/PA, which is how we would state a rate state from the linear weight framework. Both can be compared using ratios, which will be equivalent; both can be compared using differences, with the question of what the denominator for that difference should be up to the user. Both denominators can be used with RAA as well for RAA+/O+ or RAA+/PA+ (using RAA+ to refer RAA calculated within the full-blown theoretical team framework), and since R+/O+ and R+/PA+ can both be used to calculate RAA, they will also be consistent with RAA+ per O+ or PA+.

I’ll close with a sample calculation. Frank Thomas had 149.9 TT_BsRP in 508 PA, which is .2951 R+/PA+. The league average R/PA was .1363, so Thomas’ TT_RAA was (.2951 - .1363)*508 = 80.7, same as calculated previously using R+/O+ or directly from the original TT_RAA formula. The league OBA was .3433, so Thomas’ R+/O+  is equal to .2951/(1 - .3433) = .4494 as calculated previously. 

Wednesday, August 18, 2021

Rate Stat Series, pt. 10: Rate Stats for the Theoretical Team Framework I

In calculating TT_BsR for a batter, we have taken into account both his primary and tertiary impact on the offense, but we have neglected to address his secondary impact – that is, the value of the additional plate appearances he generates for his team by avoiding outs. There’s a relatively simple way to apply an adjustment for this using the framework for TT_BsR we’ve already developed. David Smyth called this adjustment PAR for Plate Appearance Ratio, and it is based on the same logic about how PAs are generated that we have relied on many times.

PAR is equal to the ratio of the theoretical team’s plate appearances to the plate appearances a league average team would have had. Remember that:

PA/G = (O/G)/(1 – OBA)

O/G is a constant that we set at the league level – I will call it X in the algebra that follows. We need to know the OBA of the theoretical team; since our player in question gets 1/9 of the PA and the rest of the team is assumed to be league average, this is very simply:

T_OBA = 1/9*OBA + 8/9*LgOBA

Then T_PA/G  = X/(1 - (1/9*OBA + 8/9*LgOBA)) = X/(1 – 1/9*OBA - 8/9*LgOBA), while the league PA/G will be X/(1 – LgOBA). The ratio between the two will be:

(X/(1 – 1/9*OBA – 8/9*LgOBA))/(X/(1 – LgOBA)) = (1 – LgOBA)/(1 – 1/9*OBA – 8/9*LgOBA) = PAR

Since Frank Thomas had a .4921 OBA and the league average was .3433, his PAR is:

(1 - .3433)/(1 – 1/9*.4921 – 8/9*.3433) = 1.0258

This means that a theoretical team on which the Big Hurt had an equal share of the PA would end up generating 2.58% more PA than a league average team. 

In order to take Thomas’ secondary contribution into account, we can return to the definitions from the last installment and calculate:

TT_BsRP = T_BsR*PAR – R_BsR

PAR is only applied to T_BsR (the base runs estimate for the theoretical team with Thomas) because the reference team, filled with league average players, will continue to have the same number of PA as before (which we’ve set to equal eight times Thomas’ PA). Filling in those terms for the 1994, the formula is:

TT_BsRP = ((A + 2.514PA)*(B + 2.722PA)/(B + C + 7.975PA) + D + .232PA)*PAR – 1.090PA

Note that we can no longer combine the D term from T_BsR with the R_BsR term as the former also needs to be inflated by PAR (Thomas’ teammates will hit more homers in those extra 2.58% PA they now enjoy).

Applying PAR increases Thomas’ TT_BsR from 132.2 to 149.9, a significant increase. This figure is more comparable to his wRC (147.1) than to other runs created estimates we’ve examined, as it’s already taken into account the value of his secondary contributions.

You may note that there is the potential of some circularity here, as we are using Thomas’ actual PA as the starting point, but Thomas’ actual PA already inherently include his real secondary contribution to the 1994 White Sox. That is to say that some of the 508 PA that Thomas actually recorded were made possible by his own generation of PA for that team. This is a good argument for using a theoretical number of PA for Thomas rather than his actual PA. Thomas recorded 508 of Chicago’s 4439 PA, or 11.44%. So we could instead use 11.44% of the league average team PA total (4366.9), in which case he would have 499.7 restated PA to plug into the Theoretical Team methodology (this is ignoring that his contribution to the White Sox also had an impact on the league average PA). Of course, in so doing we would also have to proportionally scale back his portion of the T_A, T_B, T_C, and T_D components by 499.7/508. 

On the other hand, the secondary contribution of a batter through generating PA is in the background of the linear framework as well (and any other framework that considers his actual PA), it’s just that the connection leaps to the mind more quickly when modeling the other aspects of a theoretical team. I’m going to ignore this going forward, as this is after all a rate stat series, and also note that we shouldn’t ignore the fact that a batter can benefit from the additional opportunities he helps to create. The fact that the quality of his teammates influences how many opportunities he gets in the real world is at some level unavoidable.

At this point, we should also express Thomas’ contribution in terms of RAA. This is a simple modification; instead of setting R_BsR equal to the league average BsR/PA times 8 times the player’s PA, we would just need to multiply by 9 times the player’s PA so that the lineup isn’t magically shortened and instead we compare T_BsR to what a team would score with an average player in our man’s place. I did not bother running this before introducing PAR, because if there’s one thing we’ve learned from this series is that it doesn’t make a lot of sense to talk about batter RAA without taking out rate into account. So with PAR for the 1994 AL have:

TT_RAA = ((A + 2.514PA)*(B + 2.722PA)/(B + C + 7.975PA) + D + .232PA)*PAR – 1.226PA

We now have three possible theoretical team approaches, and have yet to address the question of this series: what should the rate form be? The guiding principle of this series has been that the properties of the numerator (usually a run estimate) should be logically consistent with the choice of denominator, so we should consider each of the three theoretical team approaches separately.

First is TT_BsR, which is just an estimate of the batter’s impact on the team runs scored, taking into account primary and tertiary (but not secondary) impacts. It is akin to LW_RC, with the key difference being that LW_RC does not attempt to value the batter’s tertiary impact. However, I contend that incorporating tertiary contributions does not alter the considerations when developing a rate stat. The tertiary effect is how the batter’s performance changes the underlying run environment of the team, independently of the change in plate appearances. What we are left which is an estimate of the contribution the batter made in his actual plate appearances – the only difference is that we recognize that those outcomes influenced the value of all of the other offensive events recorded by the team.

So our choices for a rate stat are the same as those for LW_RC. We can first calculate RAA (using R/O), and then take RAA/PA, or we can calculate how many additional PA the batter generated/outs he avoided, add those to his TT_BsR, and divide by PA (the R+/PA) approach. These approaches will be equivalent if we add back in LgR/PA to RAA/PA, we could convert to wOBA, we could calculate wRC along the way...all the same options.

The math will be the same as shown in parts 7 and 8, except we will substitute TT_BsR for LW_RC everywhere it pops up. Here is a leaderboard for some of the key metrics using LW_RC (we’ve seen this all before):

Now the same metrics, except substituting TT_BsR for LW_RC in all calculations:

This was a lot of work to get largely the same results. Maybe applying PAR will make things more interesting? 

Wednesday, August 04, 2021

Rate Stat Series, pt. 9: Theoretical Teams

We now depart the orderly, neat world of linear weights for the frontiers of offensive evaluation/rate stat development. Allow me to posit that there are three ways in which a batter impacts his team’s run scoring:

1. Through the direct, immediate consequences of his actions (e.g. he draws a walk or flies out). We could call this his primary contribution.

2. Through how those results create or fail to create additional opportunities for his teammates to bat (what I have been calling PA generation). We could call this his secondary contribution (I do so with some reservations because I do like secondary average, which uses “primary” to refer to direct contributions captured batting average and “secondary” to refer to other direct contributions like extra bases on hits, walks, and steals).

3. Through how his impact on the team alters the value of the actions of his teammates. This tertiary effect is hard to define, but we know that the run value of any offensive event is dependent on the context in which it occurs. A walk does no good if no one else in the lineup gets on base; each out is more costly in terms of runs in a higher scoring environment. Dynamic run estimators vary the value of each event based on the frequencies of all offensive events, while linear weights keep them fixed.

I listed and labeled these three elements of offensive production in the order of their magnitude; the third is very small, small enough that it is often ignored. Crucially for this discussion, it is small enough that if we are not careful, in attempting to measure it we could cause more unintended distortion with respect to the evaluation of #1 and #2 so as to make the exercise not just a waste of time, but actively harmful to our understanding.

So far in this series, we have looked at individual offense through two frameworks. Treating the player as a team by plugging his stats directly into a dynamic run estimator, we have captured (but distorted) #1 and #2 and given excessive weight to #3 by pretending as if the 8 teammates all perform at the level of the individual in question. By using linear weights, we have treated the player as if he was part of a semi-static environment where his direct actions and PA generation have an impact on his team, but that no matter how he performs, it has no impact on the offensive environment in which the other eight batters perform.

I believe that a third framework, which captures the impact of all three ways in which a batter affects team runs scored, is theoretically superior to the other approaches. This will involve modeling a team with and without our player – constructing a “theoretical team” in which eight members of the lineup perform at a given level and our player occupies one lineup spot. However, there are cautions, which I alluded to above:

1. The math becomes more complicated. As long as increased complexity corresponds to a more sound approach from a theoretical perspective, this is not objectionable to me, but that’s a minority viewpoint.

2. The impact of #3 is very small relative to #1 and #2, and is arguably negligible, especially when we consider all of the error bars that exist around run estimation, park factors, positional adjustments, and the myriad other variables which will come into play when the estimates are put to full use as part of an overall player evaluation system.

3. If the model which you use to implement this framework is poor, the distortions created when compared to a linear weights framework will swamp the attempt to measure the minuscule impact of #3. Even if your model is good (and I will be using Base Runs and I am quite confident that it is a good model), the linear weights framework is so robust that there is still some risk in abandoning it to chase capturing very small effects.

My original series on rate stats failed on this count, as I begged the question by assuming that a theoretical team approach was correct and using that as one of the testing criteria for other metrics. Again, I believe that the framework is theoretically correct, but the implementation is trickier, and I am not so arrogant today to believe that the model and my implementation are unquestionably superior to using a linear weights framework. To return to a bad and wildly overwrought nautical metaphor, linear weights provide a safe harbor with calm waters in which it is tempting to stay and not venture on to high seas where theoretical team frameworks tempt with the promise of riches but tempests and other dangers lurk.

Before starting, I want to note a handful of people who made significant contributions to the theoretical team concept. One is David Tate, who developed Marginal Lineup Value, which used the framework of basic runs created in conjunction with a theoretical team. Keith Woolner refined and popularized MLV. In 1998, Bill James published the approach that I will use here, although of course he used runs created. Published a year later, Jim Furtado’s Extrapolated Wins methodology used a linear run estimator (his XR) but fleshed out theoretical team concepts with respect to win impact and replacement level. Furtado also, along with G. Jay Walker and Don Malcolm, took apart James' theoretical team RC to understand what was going on behind the scenes. Finally, David Smyth, the developer of Base Runs, was the first to apply a TT construct to BsR and also developed the PAR adjustment which we’ll get to eventually.

Finally, before diving in to the specific implementation of TT I will use in this series, I want to note that by “theoretical team”, I am referring only to constructs that explicitly attempt to place the player on a theoretical/”reference” team, and use a dynamic run estimator to estimate his run impact. It does not refer to other approaches that may be undertaken to apply a dynamic run estimator to an individual hitter. One such example is the technique, so far as I know first used by Dick Cramer with his runs created-like run estimator, of calculating a batter’s runs created as the difference between the league with his stats and he league without them. This is a clever approach for using a dynamic run estimator in evaluating individuals, but not a TT approach. In fact, it more closely resembles the approach we used in this series to develop linear weights from Base Runs. The larger you make the pool to which the player is added, the more you dilute his impact. The differentiation approach takes this to the limit (see what I did there?) by isolating each event and calculating its value if it had no impact at all on the offensive environment.

In contrast, a TT approach uses a realistic scale between the individual and team; a typical approach is to assume that the individual gets 1/9 of team plate appearances. Using a 1/8 ratio between player and reference team does not require us to believe that the player actually had 1/9 of his team’s PA in the real world. One could use a player’s actual percentage of team PA and weight accordingly, but there is a balancing act: one one hand, we want to accurately capture the degree to which the batter impacted the team, but we also don’t want to lose sight of where his impact is actually felt. Consider a batter who plays in just one game in the season, getting four plate appearances. If you use his actual percentage of team PA  (which might be something like 0.05%) to calculate his impact on the team, he will have had essentially no tertiary effect. That is a distortion of reality, though – he really had something closer to 11.1% of the team’s PA, in the game in which he actually played. From the perspective of evaluating his impact on the team, the other 161 games are an accounting fiction, no more relevant to him than to games between other teams played thirty years prior (in fact, we should acknowledge that runs are actually scored at the inning level, which is where we started working out the math on PA generation).

So we will assume that the reference team always has eight times as many plate appearances as the player in question (which of course is equivalent to saying the player gets 1/9 of team PA). We could get cute and recognize that based on a player’s batting order position, his expected share of PA will change, and give different players a different share (while still limiting the scope to games/innings in which the batter actually played), but 1/9 is clean and any alternative approach would leave most batters pretty close to 1/9. The concept is simple; the formula will look a little messy. We start with our Base Runs equation:

A = H + W – HR = S + D + T + W

B = (2TB - H – 4HR + .05W)*.79776 = .7978S + 2.3933D + 3.9888T + 2.3933HR + .0399W

C = AB – H = Outs

D = HR

BsR = (A*B)/(B + C) + D

We will start by calculating the team’s runs with the player. This will take the same form, but now our A, B, C, and D components will start with the player’s stats and add eight reference players. I will assume that the reference player is a league average performer, and thus the reference team is a league average team prior to the addition of our player. One could make the case that with respect to the tertiary effect of a player, linear weights framework sidestep the issue by assuming an inverse relationship between the quality of the player in question and the quality of the reference team. That is, by using static linear weights for all players regardless of their performance, a linear weights framework implicitly assumes that the team is average after the player is added. Thus Frank Thomas is added to a worse team than Matt Walbeck, such that at the end of the day the run values of all events are the same between the Thomas team and the Walbeck team. 

If you are tempted to sweat the details and subtract the player’s stats from the league before determining league average, don’t. It is actually surprising how little impact the choice of reference team has on the outcome (which a cynic might note is a reason for suspecting that the tertiary effect is de minimis, but what’s the fun in that?) This is why James is able to get away with using a single final formula for converting the player’s A, B, and C factors in Runs Created (for which he laid out 24 different versions to cover major league history) to TT RC by using just one equation. It’s not technically correct, of course, but as long as long as the reference team is within a reasonable range of major league offense, it’s not debilitating.

Without our player, the reference team will have a number of plate appearances equal to eight times the individual’s PA, and will perform at the league average, so we can define each factor for the reference team as follows, with the calculation using the 1994 AL averages shown:

R_A = Lg(A/PA)*PA*8 = .3143*PA*8 = 2.514PA

R_B = Lg(B/PA)*PA*8 = .3402*PA*8 = 2.722PA

R_C = Lg(C/PA)*PA*8 = .6567*PA*8 = 5.254PA

R_D = Lg(D/PA)*PA*8 = .0290*PA*8 = .232PA

Then for the team with the player, the team versions of the A, B, C, and D factors are just the player’s factor plus eight times his PA times the league average of the factor/PA:

T_A = A + R_A = A + 2.514PA

T_B = B + R_B  = B + 2.722PA

T_C = C + R_C = C + 5.254PA

T_D = D + R_D = D + .232PA

In order to isolate the individual’s impact, we need to calculate how many runs his new theoretical team would score and subtract the runs that the reference team would have scored with just eight reference players. The team’s BsR will be:

T_BsR = T_A*T_B/(T_B + T_C) + T_D

Some of the PA terms  in the denominator can be combined, so for the 1994 AL we get:

T_BsR = (A + 2.514PA)*(B + 2.722PA)/(B + C + 7.975PA) + D + .232PA

The reference team’s run scored will be equal to the league average BsR/PA times 8 times the player’s PA; to calculate league BsR/PA we can just plug the league average A, B, C, and D factors per PA into the BsR equation to get BsR/PA, then multiply:

R_BsR = (.3143*.3402/(.3402 + .6567) + .0290)*8*PA = 1.090PA

So our estimate of the individual’s run contribution to the theoretical team, which we’ll call Theoretical Team Base Runs (TT_BsR) is just the difference:

TT_BsR = T_BsR – R_BsR

Since we have PA in each term, for the 1994 AL it simplifies to:

TT_BsR = (A + 2.514PA)*(B + 2.722PA)/(B + C + 7.975PA) + D - .8579PA

If we apply Frank Thomas’ statistics directly to Base Runs, we get an estimate of 139.0. If we use Base Runs to estimate linear weight coefficients for the league, we get 131.4 (what we’ve been calling LW_RC in this series). If we use the TT approach, we get 132.2. As you can see, the TT estimate is not that much different than the full linear estimate, which does call into question the need for the TT approach. After all, Thomas is one of the most extreme hitters in the league; if he barely moves the needle, who will?

Regardless of the utility of this approach, I find it useful as an intellectual exercise because I believe the framework is the closest to approximating the real relationship between an individual batter and team performance. For a series ostensibly about rate stats, I’ve spent an entire post just setting up the numerator; don’t rate stats typically have a denominator as well? Seriously, though, if there’s one takeway I would like a reader to glean from this series, it is that if you want to set up an offensive evaluation system, you need to think through all of the pieces as you develop it. Starting with a run estimator, and then slapping on a rate state, and a baseline, and whatever bells and whistles you need, is not a sound approach. The choice of run estimator determines which denominator you should use, and the two should be compatible. 

Wednesday, July 21, 2021

Rate State Series, pt. 8: Rate Stats for Linear Weights III

This installment will tie up a few miscellaneous loose threads pertaining to rate stats to be used within a linear weights framework. The first relates to R+/PA, which I demonstrated is a “correct” rate stat to be used in this framework; I prefer it to RAA/PA, which produces identical linear differences between players, because it can be meaningfully compared on a ratio basis as well.

One issue with R+/PA for users is that R/PA is not a quantity that most people have intuitive feel for. We could still use RAA/650 PA, or we could use R+/650 PA, or we could convert R+/PA to a “team game basis” (along the lines of the previously discussed runs/25.2 or runs/27 outs) by multiplying by the league PA/G, or a long-term average thereof. As long as it’s a scalar multiplier consistently applied, there’s no real distortion imposed on the results – it’s just a matter of user preference as to which scale should be used.

A thornier issue is the implications of RAA/PA or R+/PA for teams. I have encountered sabermetricians for whom I have a great deal of respect whose ideal rate stat would be equally applicable to individuals and teams. I do not think - for reasons that were discussed earlier in this series - that this is possible. Of everything I’ve asserted in this series, I’m most confident that R/O is the proper rate stat for teams. Applying it to the normal range of major league players causes distortions, which while not disastrous are  too frequent for me to just ignore.

Let’s look at all of the teams in the 1994 AL through these rate stats. For this exercise, we can greatly simplify the calculation of R+/PA – we don’t need to estimate extra PA, we know exactly how many PA each team has generated. We can also calculate RAA simply by using R/O. While we could always look at the number of runs a given run estimator produces for a team, it is also cleaner to just use actual runs scored:


In this case, the rank order using R/O and R+/PA is the same – this does not have to be the case, and I’m a little disappointed it worked out this way as it would have been nice to have a real-life example from this league-season. Again, the fundamental reason why plate appearances are not the correct rate state denominator for teams is that since team plate appearances are solely a function of team OBA, it is of no consequence how many plate appearances they need to produce the same R/O rate as another team. An offense that every inning produced four walks and three outs would score 1/3 R/O, the same as a team that every inning produced one home run and three outs, but the former would score .167 R/PA and the latter .250. In a league where the average team scored 1/6 R/O and .123 R/PA, each would get .5 RAA per inning, but the former would have a R+/PA of .194 and the latter .248.

To make the point again, if we use actual team runs scored or an estimate of team runs created as the starting point for a team rate stat, there is no need to attempt to measure the impact of their PA generation. In fact, making an additional adjustment would be inappropriate double-counting. The team’s totals already incorporate this impact, as the team’s actual plate appearances were a function of their on base average.

What if someone decided they wanted to construct a rate stat that didn’t include any subtraction in the numerator, and instead would just be a positive quality per plate appearance? If you follow this path, you will no longer have a final result expressed in a unit of runs, but perhaps due to your desire for a different scale this is preferable. Or maybe you want a rate stat that can never assume a negative value, as any stat that has a negative linear weight coefficient for an event has the potential to do; as we’ve discussed, even our absolute runs variant of linear weights will produce negative values for pitcher-level hitters. Maybe you want to make one of these alterations and them impose a different scale for some reason.

One way you could do this is to eliminate outs from the LW_RAA/PA equation. If you just ignore them, you will distort all of the values, and be left with a rate stat that is not only unitless, but also is just plain wrong. However, recognizing that outs are simply the complement of the events with the positive coefficients (since we’ve restricted our list of events to mutually exclusive and exhaustive components of plate appearances given our simplified definitions of terms for this series), we could rectify this by just subtracting the absolute run out value from each positive event. To put this in practice, start with our LW_RAA/PA equation:

LW_RAA/PA = (.5069S + .8382D + 1.1695T + 1.4970HR + .3495W - .3150(outs))/PA

Now we will subtract -.3150 from each of the events in the numerator that comprise the complement of outs (again, in this case this is all of the other variables in the numerator) to get a rate stat that for the moment I will call U (for “unitless”, although it’s really just for ease of reference) :

U = (.8219S + 1.1533D + 1.4846T + 1.8210HR + .6646W)/PA

This looks weird, since these weights are no longer in units of R or even R+, but what value does it take on for the league average? By definition, LW_RAA/PA for the league is zero, so this resolves to zero for the league:

LW_RAA/PA = (.5069S + .8382D + 1.1695T + 1.4970HR + .3495W - .3150(outs))/PA 

The manipulation we did to get U was just to subtract -.3150*PA from everything, so:

(.5069S + .8382D + 1.1695T + 1.4970HR + .3495W - .3150(outs) - (-.3150*PA) )/PA = (0 + .3150*PA)/PA = .3150

So the league average of U is by definition equal to the negative of the outs coefficient (.3150 for the 1994 AL). So what happens if we compare a player’s U to the league average?

(.8219S + 1.1533D + 1.4846T + 1.8210HR + .6646W )/PA – LgU 

multiply U by PA/PA to get

(.8219S + 1.1533D + 1.4846T + 1.8210HR + .6646W)/PA – (LgU *PA)/PA

= (.8219S + 1.1533D + 1.4846T + 1.8210HR + .6646W – LgU*PA)/PA

plug back in that LgU = .3150 in our case to get:

(.8219S + 1.1533D + 1.4846T + 1.8210HR + .6646W – .3150PA)/PA

If we distribute the -.3150 to all of the events already in the numerator that are plate appearances, then the leftover PA are outs, and so we are left with:

(.5069S + .8382D + 1.1695T + 1.4970HR + .3495W - .3150(outs))/PA

So U - LgU = LW_RAA/PA. Even though the coefficients have been manipulated into unitless, non-negative values, U maintains the necessary condition to be a proper rate stat for a linear weight framework because it produces the correct RAA/PA.

Why would someone want to express an equivalent to RAA/PA in these terms? This is not a question that I am qualified to answer, since I don’t personally use this form – but there’s a decent chance that you, as a consumer of baseball analysis circa 2021 have been exposed to it, whether you realize it or not. 

Since U is now unitless, one might wish to manipulate it in order to put it on a scale that they find useful, perhaps the scale of a fundamental statistic. If you think about what U is, the numerator includes all on-base events, weighted by their run value – the RAA out coefficient. The denominator is just plate appearances. There is a fundamental statistic that we have been using throughout this series that takes a similar form, except instead of weighting the on-base events, it gives them all an equal weight of one. The fundamental form is the same though:

U = (.8219S + 1.1533D + 1.4846T + 1.8210HR + .6646W)/PA

other stat = (S + D + T + HR + W)/PA

The other stat is of course On Base Average. If one were to tweak the numerator coefficients for each event in U, it could be put on the same scale as OBA, but instead of giving each on-base event a weight of one, it would give each on-base event a weight based on its run value. It would be a weighted On Base Average. Enter Tom Tango and wOBA, which is now one of the two most ubiquitous rate stat in general sabermetric use thanks to its prominence at Fangraphs (with the other being OPS+ thanks to Baseball-Reference).

In order to convert what I have called U to wOBA, we just need to multiply it by the ratio between the league OBA and U, or equivalently the ratio between the league OBA and the negative of the LW_RAA out coefficient. For the 1994 AL, the average OBA was .3433 and U is .3150, so the ratio is 1.0896 (I will define V = LgOBA/LgU just for ease of writing formulas) which produces:

wOBA = U*V = (.8219S + 1.1533D + 1.4846T + 1.8210HR + .6646W)/PA*1.0896

wOBA = (.8956S + 1.2566D + 1.6176T + 1.9744HR + .7241W)/PA

I think there is a lack of understanding among even engaged consumers of sabermetric analysis as to what wOBA actually represents, and how it relates to linear weights. I admit that even as someone who is deeply interested in anything related to run estimators or rate stats, as I don’t actively use wOBA, I have to remind myself why it works. I have fielded multiple email questions from people who obviously think much more deeply about the construction of metrics than the average consumer, and they don’t know the connection and don’t find it intuitive.

I don’t consider this is the fault of Tango, who has explained this. Dave Studeman has written a good article explaining this. But for whatever reason it feels like a secret despite the best efforts of the people who are responsible for it.

So I will take a second here and spell out the relationship between the wOBA family of metrics, which includes wOBA, wRC (Weighted Runs Created, which will be equivalent to what we called R+), and wRAA (Weighted RAA, which is equivalent to LW_RAA), and R+/PA and LW_RAA/PA as we’ve discussed in this series. Keep in mind that I am speaking generally about these relationships using the metrics as I have defined them in this series, and not specifically about the implementation at Fangraphs or anywhere else, although the basic relationships hold. Some of the algebraic demonstrations can be simplified.

If you wish to convert wOBA into RAA (which Fangraphs would call wRAA), we’ve already demonstrated that we can compare U directly league U, so when using wOBA instead of U you just need to remember to remove V:

wRAA = (wOBA – LgwOBA)/V * PA

To calculate wRC (which will be equivalent to R+),  you need to add in the league average R/PA times the batter’s PA:

wRC = wRAA + LgR/PA*PA = (wOBA – LgwOBA)/V*PA + LgR/PA*PA = ((wOBA – LgwOBA)/V + LgR/PA)*PA

wOBA can be easily converted to wRAA/PA, which is equal to LW_RAA/PA:

wRAA/PA = LW_RAA/PA = (wOBA – LgwOBA)/V

and since R+/PA = LW_RAA/PA + LgR/PA, we can also write:

R+/PA = (wOBA – LgwOBA)/V + LgR/PA

Let’s run a leaderboard using the “w” stats, although we’ve already seen most of these values in different guises:

Personally, I prefer R+/PA to wOBA as a rate stat, as the former is directly comparable both as a difference and a ratio, while the latter has to be manipulated in order to be compared either way – both differences and ratios of wOBA have no intrinsic meaning. However, the advantages of having a scale similar to that of OBA, where no negative values are possible, where each event has a clean positive weight, and the “natural” denominator of plate appearances is used. Tango et. al. also took advantage of its structure to more easily apply statistical techniques in The Book, so there are certainly reasons why a user might prefer it, and it is well-constructed by which I mean that the scaling does not cause issues for derivative metrics as long as you know how to account for it.

Finally, the third loose thread I wanted to address in this post. Prior to introducing wOBA, Tango developed a rate stat version of linear weights he called Linear Weight Ratio. It was somewhat conceptually similar to wOBA in that it sought to eliminate the negative weight for outs. Instead of adding the out value to the positive events in the numerator and thus being able to dispense with the negative value of the out as is the case for wOBA, LWR was constructed by making outs the denominator. In order to make the relationship between the positive events more clear, the value of a single was fixed at 1.0, with the coefficients for the other events defined as the ratio of the LW value for the given event to the LW value for a single. If you let the LW value of a single be s, a double be d, etc., then the formula for LWR is:

LWR = (s/s*S + d/s*D + t/s*T + hr/s*HR + w/s*W)/Outs

Which for our 1994 AL weights of  .5069S + .8382D + 1.1695T + 1.4970HR + .3495W becomes:

LWR = (S + 1.6536D + 2.3072T + 2.9532HR + .6895W)/Outs

If we define “x” as the absolute LW value of the out (-.1076 in this case), there are some very straightforward relationships between LWR and LW_RC/O:

LW_RC/O = s*LWR + x

LWR = (LW_RC/O + x)/s

If we define “y” as the average LW value of the out (-.3150 in this case), we can define similar relationships between LWR and LW_RAA/O:

LW_RAA/O = s*LWR + y

LWR = (LW_RAA/O + y)/s

Of course, LW_RAA is the building block for our “correct” rate stat (be it LW_RAA/PA, R+/PA, or wOBA) for the linear weights framework, so you could with some additional algebra convert LWR to those, although LWR sans manipulation does not meet the differential or ratio comparability standards. 

Wednesday, July 07, 2021

Rate State Series, pt. 7: Rate Stats for Linear Weights II

In the last post, we ended with a dilemma: we have a metric (linear weights RAA/PA) that we believe is correct for a linear framework, but it can’t be compared using ratios. We also have an approach (linear weights RC/O) that produces the right RAA but as a rate is inconsistent with RAA/PA. I asked whether we might be able to make adjustments to one or both of these that would get us to the right place.

Let’s start with trying to manipulate R/O to get a proper rate stat. We know that using R/O for individual players (even when using a linear weights estimate for runs created) overstates their value by treating them as a team with respect to their plate appearance generation. We know that using R/PA fails miserably because it doesn’t account for PA generation at all. What if, instead of trying to manipulate the denominator to get an acceptable rate stat, we attempted to manipulate the numerator?

Linear weight runs created with the “-.1 type out value” doesn’t take into account the extra opportunities created (or not created) for a batter’s teammates by his avoidance of outs, but we could make an explicit adjustment that would take care of this. In part 2, we laid out the math to calculate plate appearances as a function of OBA:

PA/G = (LgO/G)/(1 – OBA)

To compare a player’s rate of PA generation to the league, we can calculate the PA/G he would generate as a team, less the league average. For the sake of simplicity, let’s use the variable X to represent league O/G, and the variable EPA to mean “Extra PA” relative to what would be produced by a hitter with a league average OBA. (What follows is a needless bunch of algebra, but I wanted to demonstrate that the final result is tied back to the equation that relates PA/G, O/G, and OBA):

X/(1 – OBA) – X/(1 – LgOBA) = EPA/G

where “games” are defined as X outs. Since this is the case, we can factor out X from the left side of the equation and divide both sides by X, which converts EPA/G to EPA/Out:

1/(1 – OBA) – 1/(1 – LgOBA) = EPA/O

For ease of notation, I’m going to switch to using O/PA to represent the hitter’s 1/(1 – OBA); the complement of OBA is O/PA, and it’s reciprocal is thus PA/O. I will leave LgOBA as a variable since it is a constant from the perspective of calculating the individual’s PA generation:

PA/O – 1/(1 – LgOBA) = EPA/O

As this is an individual metric, I’d rather express it with a denominator of PA than O, so I will multiply both sides by O/PA to get:

(PA/O – 1/(1 – LgOBA))*O/PA = EPA/PA = 1 – 1/(1 – LgOBA)*(1 – OBA) since O/PA is just 1 – OBA

= 1 – (1 – OBA)/(1 – LgOBA) 

replace 1 with (1 – LgOBA)/(1 – LgOBA) to get:

(1 – LgOBA)/(1 – LgOBA) – (1 – OBA)/(1 – LgOBA) = (1 – LgOBA)*(1 – OBA)/(1 – LgOBA)

= (OBA – LgOBA)/(1 – LgOBA) = EPA/PA

This simple equation yields the number of additional PA generated for a batter per PA, beyond what a batter with a league average OBA would have generated. Of course, if we multiply by PA, we will get the raw number of extra PA generated. As an example, Frank Thomas had a .4921 OBA in a league where the OBA was .3433, and he had 508 PA, so he created an additional 115.1 PA for his team beyond what an average hitter would have contributed:

(.4921 - .3433)/(1 - .3433)*508 = 115.1

In order to use this with our modified R/PA, we need to convert it to runs, which can be done by simply multiplying by the league average of .1363 R/PA to get 15.7. As we’ve computed previously, Thomas had 131.4 LW_RC, which is his direct run contribution as a result of his own PA, but without considering the impact he had on his team by creating additional PA (or, more precisely, on a league average team since the entire linear weights framework we’ve been working with this in this series is built from the league average). That was worth an additional 15.7 runs, so his total run contribution for the numerator was about 147.1 runs.

Before moving on, credit is due to the developer of this approach. This methodology, at least expressed in terms of absolute runs plus the PA impact, was first developed by a BaseballBoards.com poster with the moniker “Sibelius”; if he was not the first, he is certainly the person who introduced me to this concept. Sibelius’ original construct combined this with Jim Furtado’s XR (Extrapolated Runs), so he called it XR+ and the resulting rate XR+/PA. I will be using the more generic R+/PA to describe this metric going forward.

Additionally, Sibelius calculated the “+” portion due to PA generation in a mathematically equivalent but quicker way than I did. This later tripped me up, as I had been influenced by his ideas but approached the problem in the manner I did above, starting from extra PA and converting to runs. Thus I posted it on the board as if it was a new, alternative approach rather than simply a different way of calculating the same thing. My faux paus was quickly pointed out and I gladly concede it, although I personally still find the “proof” above to be the most straightforward way to understand the math. But Sibelius’ calculation is more straightforward and I will use it going forward. Instead of messing around with PA, he jumped straight to outs, calculating the number of outs that the player avoided beyond those an average OBA hitter would have made in the same number of PA. Then he multiplied by the league average R/O to get the runs from PA generation/out avoidance. Six of one, half a dozen of the other as of course R/PA = R/O * (1 – OBA), and so Sibelius' equation for the “+” portion would look like this:

(OBA – LgOBA)*PA*LgR/O

For Thomas, this is (.4921 - .3433)*508*.2075 = 15.7

So our formula for R+/PA will be:

R+/PA = (LW_RC + (OBA – LgOBA)*PA*LgR/O)/PA

Note that this is equivalent to:

R+/PA = LW_RC/PA + (OBA - LgOBA)*LgR/O

In order to compute the associated RAA:

RAA = (R+/PA – LgR/PA)*PA

This works as of course the league average R+/PA is equal to the league average R/PA by definition.

Here are the top and bottom five hitters from the 1994 AL:



The first column for RAA is RAA based on R+/PA, and it is an exact match for LW_RAA. Thus R+/PA meets the necessary condition for being an acceptable rate stat. However, so did R/O. We need to test that it produces the same rank order as LW_RAA/PA beyond these players; a good place to start would be with the cohort of four players who were jumbled in order when looking at RAA/PA and R/O:


Here we get a match in rank order. Of course, looking at fourteen players and finding consistency in their ranking doesn’t prove that there would never be a player for whom the two would not agree. We will need some other approach to demonstrate that.

Let’s leave R+/PA for now and go back to the other approach we were considering, which was making an adjustment to RAA/PA that would allow it to produce meaningful ratio comparisons. The obvious solution is to add the league average R/PA to RAA/PA to get a modified absolute R/PA. After all, RAA/PA  represents an individual’s contribution above average per PA, taking the impact of PA generation into account. Adding back in the league average R/PA will return this to an absolute R/PA basis, that will equal raw R/PA at the league level by definition, and that unlike LW_RC/PA will capture the value of extra PA generated by the batter. We know that by definition it will match RAA, since the difference between an individual and the league average will be equal to RAA/PA (RAA/PA + LgR/PA – LgR/PA).

Here are the top and bottom 5, with “mod R/PA” being the metric we’re discussing (LW_RAA/PA + LgR/PA):

If these figures look familiar, it’s because (with minor rounding discrepancies), they are equal to the R+/PA figures we just calculated. That’s right – all of that algebra to calculate extra plate appearances or outs avoided, and multiply by the league average R/PA or R/O, add back to LW_RC – we could have dispensed with all of it, and just added LgR/PA to LW_RAA/PA.

This is our “proof” that R+/PA as we built it from adjusted absolute runs created meets our criteria for a proper rate stat in a linear weights framework – it’s equivalent to using RAA/PA. The two parallel approaches are not in fact parallel – they are alternative ways of getting to the same place.

To demonstrate why the math works out this way, let’s return to our two linear weights equations from part 1:

LW_RC = .5069S + .8382D + 1.1695T + 1.4970HR + .3495W - .1076(outs)

LW_RAA = .5069S + .8382D + 1.1695T + 1.4970HR + .3495W - .3150(outs)

If we think about everything on a per plate appearance basis and building R+/PA using the RAA/PA + LgR/PA path, what we’re doing is taking the second equation and adding the league average R/PA for every event that corresponds to a plate appearance, which is all of them (in a full blown implementation, where we would be considering non-PA events both in the linear weights formula and the calculation of the PA generation impact, this wouldn’t be as clean). Since the league R/PA is .1363, this results in:

R+ = .6432S + .9745D + 1.3058T + 1.6332HR + .4858W - .1788(outs)

Alternatively, from the Sibelius approach, the plus component can be manipulated to read:

(OBA – LgOBA)*PA*LgR/O = ((H+W)/PA – LgOBA)*PA*LgR/O = (H + W – LgOBA*PA)*LgR/O

Continuing to treat everything on a per PA basis, this is (H + W – LgOBA)*LgR/O

Thus any hit or walk contributes (1 – LgOBA)*LgR/O = LgR/PA = .1363, and each out “contributes” (0 – LgOBA)*LgR/O = -LgOBA*LgR/O = -.3433*.2075 = -.0712. Adding .1363 to the LW_RC value for each event results in the same weights we just found (i.e. .6432 for a single), as the weights for on base events are the same between the LW_RC and LW_RAA equations. The out is now worth -.1076 + -.0712 = -.1788 runs, also the same value. We knew from the results that the two approaches must be equivalent, but I always prefer a “proof” when possible.

At this juncture we reach a philosophical question: should we dispense with the notion of absolute runs created for individuals altogether, and replace it with a R+ approach? In other words, instead of our default way of discussing an individual’s contribution being something like: “Frank Thomas created 131 runs in 508 plate appearances while making 258 outs”, and presenting RC, the rate stat, and some metric compared to a baseline, we could cut to the chase and say “Frank Thomas contributed 147 runs in 508 plate appearances”, with the rate and baselined metric. This approach does away with the need to explicitly think about outs, since their special impact beyond plate appearances has been built in to the 147 R+. It also prevents the awkward situation of having a runs created figure of 131 but no handy denominator with which to convert it to a rate stat without doing a R+ or RAA calculation first. 

There is no right answer, although there certainly is a good case to be made for just reporting Thomas’ runs created as 147, since it limits the likelihood of user-initiated bad math. Call it inertial reasoning if you must, but I still like the idea that the runs created figure is the batter’s direct contribution, and the secondary impact of PA generation is captured when looking at the rate or the baselined stat. 

Wednesday, June 23, 2021

Rate Stat Series, pt. 6: Rate Stats for Linear Weights I

Last time, I attempted to demonstrate that linear weights producing a result in terms of runs above average properly account for the value of extra plate appearances created by a batter. From here on, I will talk about this contention in the same way as I would any scientifically-demonstrated fact, even though I admit that I have not provided a “proof” in the mathematical sense. The casual use of language is intended to prevent wasting space repeating myself rather than an attempt to claim a more robust result than is appropriate.

Since we have demonstrated that LW_RAA captures the player’s direct and indirect contributions to team offense (at least within the linear framework of metrics), I would contend that it follows that any linear weights-based rate stat we should propose must return the same RAA as we would get from eschewing a rate stat altogether and just applying our linear weights formula with the “-.3 type out value”.  One very simple way to do that is to simply use RAA, rather than a measure of absolute runs created, as the numerator for the rate stat. In this case, the obvious denominator is plate appearances.

Using outs in the denominator would be appropriate for a team metric, but for an individual would overstate the value of his PA generation. For a simple demonstration of this, recall from part three the equation:

PA/G = (O/G)/(1 – OBA)

and the equivalent PA = O*(1 – OBA), given our simple definitions which as a refresher are PA = AB + W, O = AB – H, and OBA = (H + W)/(AB + W)

This direct relationship between plate appearances, outs, and OBA means that if we were to use outs as the denominator rather than PA, we would be inflating the rates for players with higher OBAs, even though we’ve already demonstrated that the PA generation impact of higher OBAs is captured in the numerator (RAA).

Here were the top and bottom 5 performers in terms of RAA/PA, which for ease of use I have restated as RAA/650 PA (This works out to 152.5 full games for a player getting 1/9 of an average 1994 AL team’s PA; sadly, these teams didn’t get anywhere close to playing a 162 game season. There’s no special significance to 650 other than that it is a round number that is reasonably close to the number of PA a full-season batter might accumulate):


You may have noticed that the identity and order of these players did not change from when we used a fully dynamic approach that treated them as if they were their own teams (BsR/O). This recalls a simple truth that I am burying beneath thousands of words of minutia – any reasonable approach will reach similar conclusions for the majority of situations. Even a poorly designed, indefensible metric like OPS will get you 98% of the way there. This series is about the tiny differences that lie beyond that point and the larger differences that arise in values as opposed to rank order.

To wit, while the rank order remains the same, the RAA values are different using linear weights, and with the exception of Griffey, less extreme. All of the other players have moved closer to average, with Frank Thomas losing a whopping 7.7 RAA due to using a linear approach rather than imagining an entire lineup of Big Hurts.

At this point, we could stop, and simply use RAA/PA or RAA/650 PA as our final linear weights-grounded rate stat. However, it lacks the very useful trait of ratio comparability that would be desirable in the ideal rate stat. While linear differences in RAA/PA can be compared (e.g. Thomas contributed an additional .081 RAA/PA beyond what Chili Davis did), the ratios are not particularly useful. Consider two players who each had 600 PA, one contributing +1 RAA and the other -1 RAA. If you can explain the practical baseball interpretation of the resulting ratio of -1, be my guest.

This happens because we have applied the high baseline of average to the metric. However, we have another version of linear weights that is based on absolute runs from which we could build a rate stat. Remember that in order to qualify for consideration, the RAA that results from that rate stat must be equivalent to the RAA from simply applying the linear weights formula and not comparing a rate to the league average.

The obvious first choice is Runs/PA, using our formula for linear weights runs created. In this case, I am not showing the leaders, but rather the same ten players in the same order. RAA in this case is (LW_RC/PA – LgR/PA)*PA:

The results are not even close to what we need, and the reason is simple: we have not accounted in any way for each batter’s PA generation. There is an easy alternative that might correct this: using outs in the denominator. This just takes us to the correct team rate stat, although the numerator is linear rather than dynamic (either in the case of using Base Runs or using actual runs scored on the team level). As such it does implicitly consider PA generation; might it produce satisfactory results for individuals? Here RAA = (LW_RC/O – LgR/O)*O:

A perfect match. Absolute runs per out produces the same RAA as the direct application of LW_RAA. R/O also has the advantages of being meaningfully comparable as both a difference and a ratio and is the same as the correct team rate stat. Everything is great, except we haven’t answered the question: Does it actually work?

Remember, I said that matching LW_RAA was a necessary condition for our proposed alternative linear weight rate stat to meet – it is not a sufficient condition. We’ve already concluded that RAA/PA is a proper rate stat for a linear framework; in order for an alternative to be acceptable, it must produce results that are consistent with RAA/PA. How do we determine this consistency, other than matching the RAA result, when the numerators and the denominators each start from a different basis? 

One simple but obvious way to determine if they are consistent is to confirm that they result in the same rank order of players. They did for our most extreme hitters, but does that hold for all hitters? 

Apparently not. I didn’t have to go to far to find this little cluster of hitters as they rank 7-10 in RAA/PA. None of them rank in the same spot as Vaughn is first in RAA/650 but second in RC/O; Lofton is second/third; Mack is third/fourth; and Clark is fourth/first. 

I slipped OBA onto the chart because it helps to explain what is going on. Will Clark’s .433 OBA ranked fifth among AL hitters with 200 PA; using outs in the denominator helps him as it implicitly assumes that he represents an entire team with a .433 OBA. While all of these hitters had excellent OBAs relative to the league, R/O goes a bit too far in valuing this. Given that R/O still produces the right RAA, any distortions have to be somewhat limited for normal players.

One can come up with extreme thought experiments, like a player with a .999 OBA all from walks versus a player with a .600 OBA, all from home runs. A team made up of the former would score what would certainly feel to the opposing pitching coach like a nearly infinite number of runs; but as a single player in a lineup, his impact would be muted. It’s not necessary to answer the thought experiment as to which would be more productive to see that R/O applied to extreme individual players would break down. Incidentally, it is exactly this type of scenario that I got into trouble trying to “prove” in my last attempt at this series – while I do think that the theoretical team methods I’ll discuss later provide reasonable estimates for this situation, relying too heavily on them for proofs is begging the question. 

So, we have a rate stat (LW_RAA/PA) that works, but it lacks ratio comparability. There are (at least) two ways we could go about solving this problem:

1. We could adjust LW_RC/PA in some way to take into account the value of PA generation

2. We could manipulate LW_RAA/PA so that it’s no longer a measure of runs above average, but instead on an absolute runs basis

Next time I’ll explore these two parallel questions...if in fact they are parallel at all.

Wednesday, June 09, 2021

Rate Stat Series, pt. 5: Linear Weights Background

Linear methods sidestep the issues that arise from applying dynamic run estimators to players by simply ignoring any non-linearity in the run scoring process altogether. While this is clearly technically incorrect, it is closer to reality than pretending that a player’s performance interacts with itself. Since an individual makes up only 1/9 of a lineup, it is much closer to reality to pretend that his performance has no impact on the run environment of his team than to pretend that it defines the run environment of his team. Linear weights also have the advantage of being easy to work with, easy to adapt to different baselines, and easy to understand and build. Their major drawback is that the weights are subject to variation due to changes in the macro run environment (as distinguished from the marginal change to the run environment attributable to an individual player). 

Linear methods were pioneered by FC Lane and George Lindsey, but it was Pete Palmer who used them to develop an entire player evaluation system, publish historical results, and bring them into the position of the chief rival to Runs Created in the 1980s. Curiously (especially since Palmer is a prolific and brilliant sabermetrician whose pioneering work includes park factors, variable runs per win, using the negative binomial distribution to model team runs per game, and more), Palmer’s player evaluation system as laid out in The Hidden Game of Baseball and later Total Baseball and the ESPN Baseball Encyclopedia never bothered to convert its offensive centerpiece Linear Weights into a rate statistic.

This gap contributed to two developments that I personally consider unfortunate. First, confusion about how to convert linear weights to a rate may have hampered the adoption of the entire family of metrics, and this confusion generally persisted until the publication of The Book by Tom Tango, Mitchel Lichtman, and Andy Dolphin. Second, Palmer did offer up a rate stat, but he did not tie it to linear weights, or in its crudest form to any meaningful units at all. That’s because Normalized OPS (later called Production), which you may know as OPS+, was the rate stat coupled with linear weights batting runs.

To my knowledge, Palmer has never really explained why he didn’t derive a rate stat from linear weights to use; the explanations have instead focused on the ease and reasonable accuracy of OPS. In The Hidden Game, the discussion of linear weights transitions to OPS with “For those to whom calculation is anathema, or at least no pleasure, Batter Runs, or Linear Weights, has a ‘shadow stat’ which tracks its accuracy to a remarkable degree and is a breeze to calculate: OPS, or On Base Average Plus Slugging Percentage.”

Coincidentally, Palmer recently published an article in the Fall 2019 Baseball Research Journal titled “Why OPS Works”, which covers a lot of the history of his development of linear weights and OPS, but still doesn’t explain exactly why a linear weights rate wasn’t part of the presentation.

Without the brilliant mind of Palmer to guide us, where should we turn for a proper linear weights-based rate stat? To answer that question, I think it’s necessary to briefly examine how linear weights work. For this discussion, I am taking for granted that the empirical derivation of linear weights is representative of all linear weight formulas. This is not literally true, as belied by the fact that the linear weights I’m using in this series were derived from Base Runs, not from empirical data. If we were using an optimized Base Runs formula, the resulting weights would be very close to empirical weights derived for a similar offensive environment, but other approaches to calculating linear coefficients like multiple regression can deviate significantly from the empirical weights. Even so, the final results are similar enough that the principles hold for reasonable alternative linear weight approaches.

What follows will be elementary for those of you familiar with linear weights, but let’s walk through a sample inning featuring the star of our series, Frank Thomas. I want to use this example to illustrate two properties of linear weights when using the “-.3 type out value” (i.e. when the result is runs above average): the conservation of runs, and the constant negative value of outs. This example will simplify things slightly, as in reality not every event in the inning cleanly maps to a batting event that is included in a given linear weights formula (e.g. wild pitches, balks, extra bases on errors, etc.) It also will presume that the run expectancy table we use for the example corresponds perfectly to our linear weights, which it does not. Still, the principles are generally applicable to properly constructed linear weights methods, even if the weights were derived from other run expectancy tables or, as is the case for us in this series, by another means altogether (I’m using the intrinsic weights derived from Base Runs for the 1994 AL totals).

Baseball Prospectus has annual run expectancy tables; their table for the 1994 majors is:


On July 18, Chicago came to bat in the bottom of the seventh trailing Detroit 9-5. Their run expectancy for the inning was .5545 as Mike LaValliere stood in against Greg Cadaret. He drew a walk, which raised the Sox RE to .9543, and thus was worth .3998 runs. The rest of the inning played out as follows:


1. If we were going to develop empirical LW coefficients based on this inning, we would conclude that a home run was worth 2.658 runs on average, and thus our linear weight coefficient for a home run would be 2.658. The other events would be valued:


This is in fact how empirical LW are developed, but of course a much larger sample size (typically at least an entire league-season) is used.

2. The team’s runs above average for the inning is always conserved. We started the inning with the bases empty and nobody out for a RE of .5545. This is the same as saying that the average for an inning is .5545 runs scored. The White Sox actually scored 4 runs, and the total of the linear weight values of the plays was 3.4455 runs, which is 4 - .5545. They scored 3.4455 runs more than an average team would be expected to in an inning. The sum of the linear weight values will always match this.

Because of this, we can be assured that the run value of additional plate appearances created by the positive events of the batters has been taken into account in the linear weight values. If this were not the case, runs would not be conserved.

3. Since that is true, it is also true that the sum of the LW values of the positive events (which is 4.8128 runs) plus the sum of the LW values of the outs (-1.3673) must be equal to the runs above average for the inning (3.4455). The sum of the values of the outs will be higher in innings in which more potential runs were “undone” by outs, as was the case here. On the other hand, an inning in which three outs are recorded in order will result in -.5545 runs.

We can use this fact to isolate the run value of the out between the portion that is due to ending the inning (what Tom Tango has called the “inning killer” effect of the out; this is the -.5545 that is the minimum out value for an inning), and that which is due to wasting the run potential of the positive events (what’s left over, in this case, -.8128 runs). 

If we wish to convert our linear weights from an estimator of runs above average to an estimator of absolute runs, we need to back out the inning killer value of the out (since it will be present for every inning equally and serves to conserve total RAA) from the overall value of the out, leaving the remainder which we do not need to worry about as it would have to be debited from the value of the positive events in order to conserve runs.

So we can take -.5545/3 = .1848 and add it back to the linear weight RAA out value, which for our example was -.3150. This results in an absolute out run value of -.1302, In our example we’re using -.1076; these don’t reconcile because:

1. our linear weights don’t consider all events (we’re ignoring hit batters, sacrifices, all manner of baserunning outs, etc.)

2. our linear weights weren’t empirically derived from the 1994 RE table as the .1848 adjustment was

While the numbers don’t (and shouldn’t!) balance perfectly in this case, this is the theoretical bridge for converting empirical linear weights from a RAA basis to an absolute runs basis. I would also contend is serves as a demonstration by inductive reasoning that absolute linear weights do not capture the PA generation impact of avoiding outs, but RAA linear weights do.

Note that converting to the “-.1 type out value” does not eliminate the result of negative runs altogether. An offensive player who is bad enough will be credited with negative runs created (if it helps you to imagine what this level of production might look like, consider that the total offensive contributions of pitchers has hovered near zero absolute runs created in the last decade). For real major league position players, this will not happen except due to sample size. If you’d like an interpretation, I have found this helpful (I stole it from someone, probably Tom Tango, and have badly paraphrased): Since linear weights fix the values of each event for all members of the team, the level at which runs created are negative is the level at which in order to conserve team runs, the weights of positive events cannot be reduced – the poor batter essentially undoes some of the positive contributions of his teammates.

As an aside, the first paper I’m aware of that made the connection between the two linear weight approaches in this manner (rather than simply solving algebraically for the difference between the two without providing theoretical underpinning) was published by Gary Skoog in a guest article in the 1987 Baseball Abstract. This article, titled “Measuring Runs Created: The Value Added Approach” is available at Baseball Think Factory.