Wednesday, August 18, 2021

Rate Stat Series, pt. 10: Rate Stats for the Theoretical Team Framework I

In calculating TT_BsR for a batter, we have taken into account both his primary and tertiary impact on the offense, but we have neglected to address his secondary impact – that is, the value of the additional plate appearances he generates for his team by avoiding outs. There’s a relatively simple way to apply an adjustment for this using the framework for TT_BsR we’ve already developed. David Smyth called this adjustment PAR for Plate Appearance Ratio, and it is based on the same logic about how PAs are generated that we have relied on many times.

PAR is equal to the ratio of the theoretical team’s plate appearances to the plate appearances a league average team would have had. Remember that:

PA/G = (O/G)/(1 – OBA)

O/G is a constant that we set at the league level – I will call it X in the algebra that follows. We need to know the OBA of the theoretical team; since our player in question gets 1/9 of the PA and the rest of the team is assumed to be league average, this is very simply:

T_OBA = 1/9*OBA + 8/9*LgOBA

Then T_PA/G = X/(1 – (1/9*OBA + 8/9*LgOBA)) = X/(1 – 1/9*OBA – 8/9*LgOBA), while the league PA/G will be X/(1 – LgOBA). The ratio between the two will be:

(X/(1 – 1/9*OBA – 8/9*LgOBA))/(X/(1 – LgOBA)) = (1 – LgOBA)/(1 – 1/9*OBA – 8/9*LgOBA) = PAR

Since Frank Thomas had a .4921 OBA and the league average was .3433, his PAR is:

(1 - .3433)/(1 – 1/9*.4921 – 8/9*.3433) = 1.0258

This means that a theoretical team on which the Big Hurt had an equal share of the PA would end up generating 2.58% more PA than a league average team. 
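For anyone who wants to play with this, here is a minimal Python sketch of the adjustment (the function name is mine, not part of the original derivation); note that the O/G constant X cancels out of the ratio entirely:

def par(oba, lg_oba):
    # PAR = ratio of theoretical team PA to league average team PA;
    # the O/G constant cancels, so only the two OBAs matter
    return (1 - lg_oba) / (1 - oba / 9 - 8 * lg_oba / 9)

print(round(par(.4921, .3433), 4))   # Thomas vs. the 1994 AL: 1.0258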

In order to take Thomas’ secondary contribution into account, we can return to the definitions from the last installment and calculate:

TT_BsRP = T_BsR*PAR – R_BsR

PAR is only applied to T_BsR (the Base Runs estimate for the theoretical team with Thomas) because the reference team, filled with league average players, will continue to have the same number of PA as before (which we’ve set to equal eight times Thomas’ PA). Filling in those terms for the 1994 AL, the formula is:

TT_BsRP = ((A + 2.514PA)*(B + 2.722PA)/(B + C + 7.975PA) + D + .232PA)*PAR – 1.090PA

Note that we can no longer combine the D term from T_BsR with the R_BsR term as the former also needs to be inflated by PAR (Thomas’ teammates will hit more homers in those extra 2.58% PA they now enjoy).
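If it helps to see the whole thing in one place, here is a minimal Python sketch of TT_BsRP with the 1994 AL constants hard-coded (the function name is mine; A, B, C, D, and PA are the player’s Base Runs factors and plate appearances):

def tt_bsrp_1994al(A, B, C, D, PA, PAR):
    # Base Runs for the theoretical team (player plus eight league average hitters)
    t_bsr = (A + 2.514 * PA) * (B + 2.722 * PA) / (B + C + 7.975 * PA) + D + .232 * PA
    # PAR inflates only the team with the player; the reference team of eight
    # average hitters keeps its original PA (1.090 BsR per player PA)
    return t_bsr * PAR - 1.090 * PA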

Applying PAR increases Thomas’ TT_BsR from 132.2 to 149.9, a significant increase. This figure is more comparable to his wRC (147.1) than to other runs created estimates we’ve examined, as it’s already taken into account the value of his secondary contributions.

You may note that there is the potential of some circularity here, as we are using Thomas’ actual PA as the starting point, but Thomas’ actual PA already inherently include his real secondary contribution to the 1994 White Sox. That is to say that some of the 508 PA that Thomas actually recorded were made possible by his own generation of PA for that team. This is a good argument for using a theoretical number of PA for Thomas rather than his actual PA. Thomas recorded 508 of Chicago’s 4439 PA, or 11.44%. So we could instead use 11.44% of the league average team PA total (4366.9), in which case he would have 499.7 restated PA to plug into the Theoretical Team methodology (this is ignoring that his contribution to the White Sox also had an impact on the league average PA). Of course, in so doing we would also have to proportionally scale back his portion of the T_A, T_B, T_C, and T_D components by 499.7/508. 
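A quick Python sketch of that restated-PA arithmetic, using the figures quoted above (and my own variable names):

team_pa, player_pa, lg_avg_team_pa = 4439, 508, 4366.9
share = player_pa / team_pa              # .1144
restated_pa = share * lg_avg_team_pa     # 499.7
scale = restated_pa / player_pa          # factor to apply to his A, B, C, D components
print(round(share, 4), round(restated_pa, 1), round(scale, 4))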

On the other hand, the secondary contribution of a batter through generating PA is in the background of the linear framework as well (and any other framework that considers his actual PA); it’s just that the connection leaps to mind more quickly when modeling the other aspects of a theoretical team. I’m going to set this aside going forward, as this is after all a rate stat series, while also noting that we shouldn’t ignore the fact that a batter can benefit from the additional opportunities he helps to create. The fact that the quality of his teammates influences how many opportunities he gets in the real world is at some level unavoidable.

At this point, we should also express Thomas’ contribution in terms of RAA. This is a simple modification; instead of setting R_BsR equal to the league average BsR/PA times 8 times the player’s PA, we just need to multiply by 9 times the player’s PA, so that the lineup isn’t magically shortened and we instead compare T_BsR to what a team would score with an average player in our man’s place. I did not bother running this before introducing PAR, because if there’s one thing we’ve learned from this series, it’s that it doesn’t make a lot of sense to talk about batter RAA without taking the out rate into account. So with PAR, for the 1994 AL we have:

TT_RAA = ((A + 2.514PA)*(B + 2.722PA)/(B + C + 7.975PA) + D + .232PA)*PAR – 1.226PA
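As a sanity check on the new subtraction term (my arithmetic, using the league factors from pt. 9): league BsR/PA = .3143*.3402/(.3402 + .6567) + .0290 = .1363, and that figure (unrounded) times 9 is the 1.226 above, just as times 8 it was the 1.090 used in TT_BsRP.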

We now have three possible theoretical team approaches, and have yet to address the question of this series: what should the rate form be? The guiding principle of this series has been that the properties of the numerator (usually a run estimate) should be logically consistent with the choice of denominator, so we should consider each of the three theoretical team approaches separately.

First is TT_BsR, which is just an estimate of the batter’s impact on team runs scored, taking into account primary and tertiary (but not secondary) impacts. It is akin to LW_RC, with the key difference being that LW_RC does not attempt to value the batter’s tertiary impact. However, I contend that incorporating tertiary contributions does not alter the considerations when developing a rate stat. The tertiary effect is how the batter’s performance changes the underlying run environment of the team, independently of the change in plate appearances. What we are left with is an estimate of the contribution the batter made in his actual plate appearances – the only difference is that we recognize that those outcomes influenced the value of all of the other offensive events recorded by the team.

So our choices for a rate stat are the same as those for LW_RC. We can first calculate RAA (using R/O) and then take RAA/PA, or we can calculate how many additional PA the batter generated/outs he avoided, add those to his TT_BsR, and divide by PA (the R+/PA approach). These approaches will be equivalent if we add LgR/PA back in to RAA/PA; we could convert to wOBA, we could calculate wRC along the way...all the same options.
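A minimal Python sketch of the RAA-based route (the function names are mine; LgR/O and LgR/PA are league inputs, and the R+/PA form simply uses the equivalence noted above, RAA/PA plus LgR/PA):

def raa(tt_bsr, outs, lg_r_per_out):
    # runs above average, charging the batter's outs at the league runs-per-out rate
    return tt_bsr - outs * lg_r_per_out

def raa_per_pa(tt_bsr, outs, pa, lg_r_per_out):
    return raa(tt_bsr, outs, lg_r_per_out) / pa

def r_plus_per_pa(tt_bsr, outs, pa, lg_r_per_out, lg_r_per_pa):
    # equivalent to the "add the PA generated/outs avoided" construction
    return raa_per_pa(tt_bsr, outs, pa, lg_r_per_out) + lg_r_per_pa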

The math will be the same as shown in parts 7 and 8, except we will substitute TT_BsR for LW_RC everywhere it pops up. Here is a leaderboard for some of the key metrics using LW_RC (we’ve seen this all before):

Now the same metrics, except substituting TT_BsR for LW_RC in all calculations:

This was a lot of work to get largely the same results. Maybe applying PAR will make things more interesting? 

Wednesday, August 04, 2021

Rate Stat Series, pt. 9: Theoretical Teams

We now depart the orderly, neat world of linear weights for the frontiers of offensive evaluation/rate stat development. Allow me to posit that there are three ways in which a batter impacts his team’s run scoring:

1. Through the direct, immediate consequences of his actions (e.g. he draws a walk or flies out). We could call this his primary contribution.

2. Through how those results create or fail to create additional opportunities for his teammates to bat (what I have been calling PA generation). We could call this his secondary contribution (I do so with some reservations because I do like secondary average, which uses “primary” to refer to direct contributions captured by batting average and “secondary” to refer to other direct contributions like extra bases on hits, walks, and steals).

3. Through how his impact on the team alters the value of the actions of his teammates. This tertiary effect is hard to define, but we know that the run value of any offensive event is dependent on the context in which it occurs. A walk does no good if no one else in the lineup gets on base; each out is more costly in terms of runs in a higher scoring environment. Dynamic run estimators vary the value of each event based on the frequencies of all offensive events, while linear weights keep them fixed.

I listed and labeled these three elements of offensive production in order of their magnitude; the third is very small, small enough that it is often ignored. Crucially for this discussion, it is small enough that if we are not careful, attempting to measure it could introduce enough unintended distortion in the evaluation of #1 and #2 to make the exercise not just a waste of time, but actively harmful to our understanding.

So far in this series, we have looked at individual offense through two frameworks. Treating the player as a team by plugging his stats directly into a dynamic run estimator, we have captured (but distorted) #1 and #2 and given excessive weight to #3 by pretending that the eight teammates all perform at the level of the individual in question. By using linear weights, we have treated the player as if he were part of a semi-static environment in which his direct actions and PA generation have an impact on his team, but no matter how he performs, he has no impact on the offensive environment in which the other eight batters perform.

I believe that a third framework, which captures the impact of all three ways in which a batter affects team runs scored, is theoretically superior to the other approaches. This will involve modeling a team with and without our player – constructing a “theoretical team” in which eight members of the lineup perform at a given level and our player occupies one lineup spot. However, there are cautions, which I alluded to above:

1. The math becomes more complicated. As long as increased complexity corresponds to a more sound approach from a theoretical perspective, this is not objectionable to me, but that’s a minority viewpoint.

2. The impact of #3 is very small relative to #1 and #2, and is arguably negligible, especially when we consider all of the error bars that exist around run estimation, park factors, positional adjustments, and the myriad other variables which will come into play when the estimates are put to full use as part of an overall player evaluation system.

3. If the model which you use to implement this framework is poor, the distortions created when compared to a linear weights framework will swamp the attempt to measure the minuscule impact of #3. Even if your model is good (and I will be using Base Runs and I am quite confident that it is a good model), the linear weights framework is so robust that there is still some risk in abandoning it to chase capturing very small effects.

My original series on rate stats failed on this count, as I begged the question by assuming that a theoretical team approach was correct and using that as one of the testing criteria for other metrics. Again, I believe that the framework is theoretically correct, but the implementation is trickier, and I am not so arrogant today as to believe that the model and my implementation are unquestionably superior to using a linear weights framework. To return to a bad and wildly overwrought nautical metaphor, linear weights provide a safe harbor with calm waters in which it is tempting to stay, rather than venturing onto the high seas where theoretical team frameworks promise riches but tempests and other dangers lurk.

Before starting, I want to note a handful of people who made significant contributions to the theoretical team concept. One is David Tate, who developed Marginal Lineup Value, which used the framework of basic runs created in conjunction with a theoretical team. Keith Woolner refined and popularized MLV. In 1998, Bill James published the approach that I will use here, although of course he used runs created. Published a year later, Jim Furtado’s Extrapolated Wins methodology used a linear run estimator (his XR) but fleshed out theoretical team concepts with respect to win impact and replacement level. Furtado also, along with G. Jay Walker and Don Malcolm, took apart James' theoretical team RC to understand what was going on behind the scenes. Finally, David Smyth, the developer of Base Runs, was the first to apply a TT construct to BsR and also developed the PAR adjustment which we’ll get to eventually.

Finally, before diving into the specific implementation of TT I will use in this series, I want to note that by “theoretical team”, I am referring only to constructs that explicitly attempt to place the player on a theoretical/”reference” team, and use a dynamic run estimator to estimate his run impact. It does not refer to other approaches that may be undertaken to apply a dynamic run estimator to an individual hitter. One such example is the technique, so far as I know first used by Dick Cramer with his runs created-like run estimator, of calculating a batter’s runs created as the difference between the league with his stats and the league without them. This is a clever approach for using a dynamic run estimator in evaluating individuals, but not a TT approach. In fact, it more closely resembles the approach we used in this series to develop linear weights from Base Runs. The larger you make the pool to which the player is added, the more you dilute his impact. The differentiation approach takes this to the limit (see what I did there?) by isolating each event and calculating its value if it had no impact at all on the offensive environment.

In contrast, a TT approach uses a realistic scale between the individual and team; a typical approach is to assume that the individual gets 1/9 of team plate appearances. Using a 1/8 ratio between player and reference team does not require us to believe that the player actually had 1/9 of his team’s PA in the real world. One could use a player’s actual percentage of team PA and weight accordingly, but there is a balancing act: on one hand, we want to accurately capture the degree to which the batter impacted the team, but we also don’t want to lose sight of where his impact is actually felt. Consider a batter who plays in just one game in the season, getting four plate appearances. If you use his actual percentage of team PA (which might be something like 0.05%) to calculate his impact on the team, he will have had essentially no tertiary effect. That is a distortion of reality, though – he really had something closer to 11.1% of the team’s PA in the game in which he actually played. From the perspective of evaluating his impact on the team, the other 161 games are an accounting fiction, no more relevant to him than games between other teams played thirty years prior (in fact, we should acknowledge that runs are actually scored at the inning level, which is where we started working out the math on PA generation).

So we will assume that the reference team always has eight times as many plate appearances as the player in question (which of course is equivalent to saying the player gets 1/9 of team PA). We could get cute and recognize that based on a player’s batting order position, his expected share of PA will change, and give different players a different share (while still limiting the scope to games/innings in which the batter actually played), but 1/9 is clean and any alternative approach would leave most batters pretty close to 1/9. The concept is simple; the formula will look a little messy. We start with our Base Runs equation:

A = H + W – HR = S + D + T + W

B = (2TB - H – 4HR + .05W)*.79776 = .7978S + 2.3933D + 3.9888T + 2.3933HR + .0399W

C = AB – H = Outs

D = HR

BsR = (A*B)/(B + C) + D
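Transcribed into Python (a straightforward sketch of the version above, nothing more), taking singles, doubles, triples, homers, walks, and outs:

def base_runs(s, d, t, hr, w, outs):
    # A, B, C, D factors per the definitions above
    a = s + d + t + w
    b = .7978 * s + 2.3933 * d + 3.9888 * t + 2.3933 * hr + .0399 * w
    c = outs
    d_factor = hr
    return a * b / (b + c) + d_factor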

We will start by calculating the team’s runs with the player. This will take the same form, but now our A, B, C, and D components will start with the player’s stats and add eight reference players. I will assume that the reference player is a league average performer, and thus the reference team is a league average team prior to the addition of our player. One could make the case that with respect to the tertiary effect of a player, a linear weights framework sidesteps the issue by assuming an inverse relationship between the quality of the player in question and the quality of the reference team. That is, by using static linear weights for all players regardless of their performance, a linear weights framework implicitly assumes that the team is average after the player is added. Thus Frank Thomas is added to a worse team than Matt Walbeck, such that at the end of the day the run values of all events are the same between the Thomas team and the Walbeck team.

If you are tempted to sweat the details and subtract the player’s stats from the league before determining league average, don’t. It is actually surprising how little impact the choice of reference team has on the outcome (which a cynic might note is a reason for suspecting that the tertiary effect is de minimis, but what’s the fun in that?). This is why James is able to get away with a single final formula for converting the player’s A, B, and C factors in Runs Created (for which he laid out 24 different versions to cover major league history) to TT RC. It’s not technically correct, of course, but as long as the reference team is within a reasonable range of major league offense, it’s not debilitating.

Without our player, the reference team will have a number of plate appearances equal to eight times the individual’s PA, and will perform at the league average, so we can define each factor for the reference team as follows, with the calculation using the 1994 AL averages shown:

R_A = Lg(A/PA)*PA*8 = .3143*PA*8 = 2.514PA

R_B = Lg(B/PA)*PA*8 = .3402*PA*8 = 2.722PA

R_C = Lg(C/PA)*PA*8 = .6567*PA*8 = 5.254PA

R_D = Lg(D/PA)*PA*8 = .0290*PA*8 = .232PA

Then for the team with the player, the team versions of the A, B, C, and D factors are just the player’s factor plus eight times his PA times the league average of the factor/PA:

T_A = A + R_A = A + 2.514PA

T_B = B + R_B  = B + 2.722PA

T_C = C + R_C = C + 5.254PA

T_D = D + R_D = D + .232PA

In order to isolate the individual’s impact, we need to calculate how many runs his new theoretical team would score and subtract the runs that the reference team would have scored with just eight reference players. The team’s BsR will be:

T_BsR = T_A*T_B/(T_B + T_C) + T_D

Some of the PA terms in the denominator can be combined, so for the 1994 AL we get:

T_BsR = (A + 2.514PA)*(B + 2.722PA)/(B + C + 7.975PA) + D + .232PA

The reference team’s runs scored will be equal to the league average BsR/PA times 8 times the player’s PA; to calculate league BsR/PA we can just plug the league average A, B, C, and D factors per PA into the BsR equation, then multiply:

R_BsR = (.3143*.3402/(.3402 + .6567) + .0290)*8*PA = 1.090PA

So our estimate of the individual’s run contribution to the theoretical team, which we’ll call Theoretical Team Base Runs (TT_BsR) is just the difference:

TT_BsR = T_BsR – R_BsR

Since we have PA in each term, for the 1994 AL it simplifies to:

TT_BsR = (A + 2.514PA)*(B + 2.722PA)/(B + C + 7.975PA) + D - .8579PA
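To put the whole construction in one place, here is a minimal Python sketch (the function name and argument order are mine); the league per-PA factors for the 1994 AL are the .3143, .3402, .6567, and .0290 quoted above, which reproduce the hard-coded constants in the formula:

def tt_bsr(a, b, c, d, pa, lg_a_pa, lg_b_pa, lg_c_pa, lg_d_pa):
    # team with the player: his factors plus eight league average hitters' worth of PA
    t_a = a + lg_a_pa * 8 * pa
    t_b = b + lg_b_pa * 8 * pa
    t_c = c + lg_c_pa * 8 * pa
    t_d = d + lg_d_pa * 8 * pa
    team = t_a * t_b / (t_b + t_c) + t_d
    # reference team of eight league average hitters, with 8*PA plate appearances
    reference = (lg_a_pa * lg_b_pa / (lg_b_pa + lg_c_pa) + lg_d_pa) * 8 * pa
    return team - reference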

If we apply Frank Thomas’ statistics directly to Base Runs, we get an estimate of 139.0. If we use Base Runs to estimate linear weight coefficients for the league, we get 131.4 (what we’ve been calling LW_RC in this series). If we use the TT approach, we get 132.2. As you can see, the TT estimate is not that much different than the full linear estimate, which does call into question the need for the TT approach. After all, Thomas is one of the most extreme hitters in the league; if he barely moves the needle, who will?

Regardless of the utility of this approach, I find it useful as an intellectual exercise because I believe the framework is the closest to approximating the real relationship between an individual batter and team performance. For a series ostensibly about rate stats, I’ve spent an entire post just setting up the numerator; don’t rate stats typically have a denominator as well? Seriously, though, if there’s one takeaway I would like a reader to glean from this series, it is that if you want to set up an offensive evaluation system, you need to think through all of the pieces as you develop it. Starting with a run estimator, and then slapping on a rate stat, and a baseline, and whatever bells and whistles you need, is not a sound approach. The choice of run estimator determines which denominator you should use, and the two should be compatible.