## Wednesday, July 07, 2021

### Rate State Series, pt. 7: Rate Stats for Linear Weights II

In the last post, we ended with a dilemma: we have a metric (linear weights RAA/PA) that we believe is correct for a linear framework, but it can’t be compared using ratios. We also have an approach (linear weights RC/O) that produces the right RAA but as a rate is inconsistent with RAA/PA. I asked whether we might be able to make adjustments to one or both of these that would get us to the right place.

Let’s start with trying to manipulate R/O to get a proper rate stat. We know that using R/O for individual players (even when using a linear weights estimate for runs created) overstates their value by treating them as a team with respect to their plate appearance generation. We know that using R/PA fails miserably because it doesn’t account for PA generation at all. What if, instead of trying to manipulate the denominator to get an acceptable rate stat, we attempted to manipulate the numerator?

Linear weight runs created with the “-.1 type out value” doesn’t take into account the extra opportunities created (or not created) for a batter’s teammates by his avoidance of outs, but we could make an explicit adjustment that would take care of this. In part 2, we laid out the math to calculate plate appearances as a function of OBA:

PA/G = (LgO/G)/(1 – OBA)

To compare a player’s rate of PA generation to the league, we can calculate the PA/G he would generate as a team, less the league average. For the sake of simplicity, let’s use the variable X to represent league O/G, and the variable EPA to mean “Extra PA” relative to what would be produced by a hitter with a league average OBA. (What follows is a needless bunch of algebra, but I wanted to demonstrate that the final result is tied back to the equation that relates PA/G, O/G, and OBA):

X/(1 – OBA) – X/(1 – LgOBA) = EPA/G

where “games” are defined as X outs. Since this is the case, we can factor out X from the left side of the equation and divide both sides by X, which converts EPA/G to EPA/Out:

1/(1 – OBA) – 1/(1 – LgOBA) = EPA/O

For ease of notation, I’m going to switch to using O/PA to represent the hitter’s 1/(1 – OBA); the complement of OBA is O/PA, and it’s reciprocal is thus PA/O. I will leave LgOBA as a variable since it is a constant from the perspective of calculating the individual’s PA generation:

PA/O – 1/(1 – LgOBA) = EPA/O

As this is an individual metric, I’d rather express it with a denominator of PA than O, so I will multiply both sides by O/PA to get:

(PA/O – 1/(1 – LgOBA))*O/PA = EPA/PA = 1 – 1/(1 – LgOBA)*(1 – OBA) since O/PA is just 1 – OBA

= 1 – (1 – OBA)/(1 – LgOBA)

replace 1 with (1 – LgOBA)/(1 – LgOBA) to get:

(1 – LgOBA)/(1 – LgOBA) – (1 – OBA)/(1 – LgOBA) = (1 – LgOBA)*(1 – OBA)/(1 – LgOBA)

= (OBA – LgOBA)/(1 – LgOBA) = EPA/PA

This simple equation yields the number of additional PA generated for a batter per PA, beyond what a batter with a league average OBA would have generated. Of course, if we multiply by PA, we will get the raw number of extra PA generated. As an example, Frank Thomas had a .4921 OBA in a league where the OBA was .3433, and he had 508 PA, so he created an additional 115.1 PA for his team beyond what an average hitter would have contributed:

(.4921 - .3433)/(1 - .3433)*508 = 115.1

In order to use this with our modified R/PA, we need to convert it to runs, which can be done by simply multiplying by the league average of .1363 R/PA to get 15.7. As we’ve computed previously, Thomas had 131.4 LW_RC, which is his direct run contribution as a result of his own PA, but without considering the impact he had on his team by creating additional PA (or, more precisely, on a league average team since the entire linear weights framework we’ve been working with this in this series is built from the league average). That was worth an additional 15.7 runs, so his total run contribution for the numerator was about 147.1 runs.

Before moving on, credit is due to the developer of this approach. This methodology, at least expressed in terms of absolute runs plus the PA impact, was first developed by a BaseballBoards.com poster with the moniker “Sibelius”; if he was not the first, he is certainly the person who introduced me to this concept. Sibelius’ original construct combined this with Jim Furtado’s XR (Extrapolated Runs), so he called it XR+ and the resulting rate XR+/PA. I will be using the more generic R+/PA to describe this metric going forward.

Additionally, Sibelius calculated the “+” portion due to PA generation in a mathematically equivalent but quicker way than I did. This later tripped me up, as I had been influenced by his ideas but approached the problem in the manner I did above, starting from extra PA and converting to runs. Thus I posted it on the board as if it was a new, alternative approach rather than simply a different way of calculating the same thing. My faux paus was quickly pointed out and I gladly concede it, although I personally still find the “proof” above to be the most straightforward way to understand the math. But Sibelius’ calculation is more straightforward and I will use it going forward. Instead of messing around with PA, he jumped straight to outs, calculating the number of outs that the player avoided beyond those an average OBA hitter would have made in the same number of PA. Then he multiplied by the league average R/O to get the runs from PA generation/out avoidance. Six of one, half a dozen of the other as of course R/PA = R/O * (1 – OBA), and so Sibelius' equation for the “+” portion would look like this:

(OBA – LgOBA)*PA*LgR/O

For Thomas, this is (.4921 - .3433)*508*.2075 = 15.7

So our formula for R+/PA will be:

R+/PA = (LW_RC + (OBA – LgOBA)*PA*LgR/O)/PA

Note that this is equivalent to:

R+/PA = LW_RC/PA + (OBA - LgOBA)*LgR/O

In order to compute the associated RAA:

RAA = (R+/PA – LgR/PA)*PA

This works as of course the league average R+/PA is equal to the league average R/PA by definition.

Here are the top and bottom five hitters from the 1994 AL:

The first column for RAA is RAA based on R+/PA, and it is an exact match for LW_RAA. Thus R+/PA meets the necessary condition for being an acceptable rate stat. However, so did R/O. We need to test that it produces the same rank order as LW_RAA/PA beyond these players; a good place to start would be with the cohort of four players who were jumbled in order when looking at RAA/PA and R/O:

Here we get a match in rank order. Of course, looking at fourteen players and finding consistency in their ranking doesn’t prove that there would never be a player for whom the two would not agree. We will need some other approach to demonstrate that.

Let’s leave R+/PA for now and go back to the other approach we were considering, which was making an adjustment to RAA/PA that would allow it to produce meaningful ratio comparisons. The obvious solution is to add the league average R/PA to RAA/PA to get a modified absolute R/PA. After all, RAA/PA  represents an individual’s contribution above average per PA, taking the impact of PA generation into account. Adding back in the league average R/PA will return this to an absolute R/PA basis, that will equal raw R/PA at the league level by definition, and that unlike LW_RC/PA will capture the value of extra PA generated by the batter. We know that by definition it will match RAA, since the difference between an individual and the league average will be equal to RAA/PA (RAA/PA + LgR/PA – LgR/PA).

Here are the top and bottom 5, with “mod R/PA” being the metric we’re discussing (LW_RAA/PA + LgR/PA):

If these figures look familiar, it’s because (with minor rounding discrepancies), they are equal to the R+/PA figures we just calculated. That’s right – all of that algebra to calculate extra plate appearances or outs avoided, and multiply by the league average R/PA or R/O, add back to LW_RC – we could have dispensed with all of it, and just added LgR/PA to LW_RAA/PA.

This is our “proof” that R+/PA as we built it from adjusted absolute runs created meets our criteria for a proper rate stat in a linear weights framework – it’s equivalent to using RAA/PA. The two parallel approaches are not in fact parallel – they are alternative ways of getting to the same place.

To demonstrate why the math works out this way, let’s return to our two linear weights equations from part 1:

LW_RC = .5069S + .8382D + 1.1695T + 1.4970HR + .3495W - .1076(outs)

LW_RAA = .5069S + .8382D + 1.1695T + 1.4970HR + .3495W - .3150(outs)

If we think about everything on a per plate appearance basis and building R+/PA using the RAA/PA + LgR/PA path, what we’re doing is taking the second equation and adding the league average R/PA for every event that corresponds to a plate appearance, which is all of them (in a full blown implementation, where we would be considering non-PA events both in the linear weights formula and the calculation of the PA generation impact, this wouldn’t be as clean). Since the league R/PA is .1363, this results in:

R+ = .6432S + .9745D + 1.3058T + 1.6332HR + .4858W - .1788(outs)

Alternatively, from the Sibelius approach, the plus component can be manipulated to read:

(OBA – LgOBA)*PA*LgR/O = ((H+W)/PA – LgOBA)*PA*LgR/O = (H + W – LgOBA*PA)*LgR/O

Continuing to treat everything on a per PA basis, this is (H + W – LgOBA)*LgR/O

Thus any hit or walk contributes (1 – LgOBA)*LgR/O = LgR/PA = .1363, and each out “contributes” (0 – LgOBA)*LgR/O = -LgOBA*LgR/O = -.3433*.2075 = -.0712. Adding .1363 to the LW_RC value for each event results in the same weights we just found (i.e. .6432 for a single), as the weights for on base events are the same between the LW_RC and LW_RAA equations. The out is now worth -.1076 + -.0712 = -.1788 runs, also the same value. We knew from the results that the two approaches must be equivalent, but I always prefer a “proof” when possible.

At this juncture we reach a philosophical question: should we dispense with the notion of absolute runs created for individuals altogether, and replace it with a R+ approach? In other words, instead of our default way of discussing an individual’s contribution being something like: “Frank Thomas created 131 runs in 508 plate appearances while making 258 outs”, and presenting RC, the rate stat, and some metric compared to a baseline, we could cut to the chase and say “Frank Thomas contributed 147 runs in 508 plate appearances”, with the rate and baselined metric. This approach does away with the need to explicitly think about outs, since their special impact beyond plate appearances has been built in to the 147 R+. It also prevents the awkward situation of having a runs created figure of 131 but no handy denominator with which to convert it to a rate stat without doing a R+ or RAA calculation first.

There is no right answer, although there certainly is a good case to be made for just reporting Thomas’ runs created as 147, since it limits the likelihood of user-initiated bad math. Call it inertial reasoning if you must, but I still like the idea that the runs created figure is the batter’s direct contribution, and the secondary impact of PA generation is captured when looking at the rate or the baselined stat.