Wednesday, August 04, 2021

Rate Stat Series, pt. 9: Theoretical Teams

We now depart the orderly, neat world of linear weights for the frontiers of offensive evaluation/rate stat development. Allow me to posit that there are three ways in which a batter impacts his team’s run scoring:

1. Through the direct, immediate consequences of his actions (e.g. he draws a walk or flies out). We could call this his primary contribution.

2. Through how those results create or fail to create additional opportunities for his teammates to bat (what I have been calling PA generation). We could call this his secondary contribution (I do so with some reservations because I do like secondary average, which uses “primary” to refer to direct contributions captured by batting average and “secondary” to refer to other direct contributions like extra bases on hits, walks, and steals).

3. Through how his impact on the team alters the value of the actions of his teammates. This tertiary effect is hard to define, but we know that the run value of any offensive event is dependent on the context in which it occurs. A walk does no good if no one else in the lineup gets on base; each out is more costly in terms of runs in a higher scoring environment. Dynamic run estimators vary the value of each event based on the frequencies of all offensive events, while linear weights keep them fixed.

I listed and labeled these three elements of offensive production in the order of their magnitude; the third is very small, small enough that it is often ignored. Crucially for this discussion, it is small enough that if we are not careful, an attempt to measure it could distort the evaluation of #1 and #2 badly enough to make the exercise not just a waste of time, but actively harmful to our understanding.

So far in this series, we have looked at individual offense through two frameworks. Treating the player as a team by plugging his stats directly into a dynamic run estimator, we have captured (but distorted) #1 and #2 and given excessive weight to #3 by pretending that his eight teammates all perform at the level of the individual in question. By using linear weights, we have treated the player as if he were part of a semi-static environment in which his direct actions and PA generation have an impact on his team, but in which his performance, whatever it is, has no impact on the offensive environment in which the other eight batters perform.

I believe that a third framework, which captures the impact of all three ways in which a batter affects team runs scored, is theoretically superior to the other approaches. This will involve modeling a team with and without our player – constructing a “theoretical team” in which eight members of the lineup perform at a given level and our player occupies one lineup spot. However, there are cautions, which I alluded to above:

1. The math becomes more complicated. As long as increased complexity corresponds to a more sound approach from a theoretical perspective, this is not objectionable to me, but that’s a minority viewpoint.

2. The impact of #3 is very small relative to #1 and #2, and is arguably negligible, especially when we consider all of the error bars that exist around run estimation, park factors, positional adjustments, and the myriad other variables which will come into play when the estimates are put to full use as part of an overall player evaluation system.

3. If the model which you use to implement this framework is poor, the distortions created when compared to a linear weights framework will swamp the attempt to measure the minuscule impact of #3. Even if your model is good (and I will be using Base Runs and I am quite confident that it is a good model), the linear weights framework is so robust that there is still some risk in abandoning it to chase capturing very small effects.

My original series on rate stats failed on this count, as I begged the question by assuming that a theoretical team approach was correct and using that as one of the testing criteria for other metrics. Again, I believe that the framework is theoretically correct, but the implementation is trickier, and I am not so arrogant today as to believe that the model and my implementation are unquestionably superior to using a linear weights framework. To return to a bad and wildly overwrought nautical metaphor, linear weights provide a safe harbor with calm waters in which it is tempting to stay rather than venture onto the high seas, where theoretical team frameworks tempt with the promise of riches but tempests and other dangers lurk.

Before starting, I want to note a handful of people who made significant contributions to the theoretical team concept. One is David Tate, who developed Marginal Lineup Value, which used the framework of basic runs created in conjunction with a theoretical team. Keith Woolner refined and popularized MLV. In 1998, Bill James published the approach that I will use here, although of course he used runs created. Published a year later, Jim Furtado’s Extrapolated Wins methodology used a linear run estimator (his XR) but fleshed out theoretical team concepts with respect to win impact and replacement level. Furtado also, along with G. Jay Walker and Don Malcolm, took apart James' theoretical team RC to understand what was going on behind the scenes. Finally, David Smyth, the developer of Base Runs, was the first to apply a TT construct to BsR and also developed the PAR adjustment which we’ll get to eventually.

Finally, before diving into the specific implementation of TT I will use in this series, I want to note that by “theoretical team”, I am referring only to constructs that explicitly attempt to place the player on a theoretical/“reference” team, and use a dynamic run estimator to estimate his run impact. It does not refer to other approaches that may be undertaken to apply a dynamic run estimator to an individual hitter. One such example is the technique, so far as I know first used by Dick Cramer with his runs created-like run estimator, of calculating a batter’s runs created as the difference between the league with his stats and the league without them. This is a clever approach for using a dynamic run estimator in evaluating individuals, but not a TT approach. In fact, it more closely resembles the approach we used in this series to develop linear weights from Base Runs. The larger you make the pool to which the player is added, the more you dilute his impact. The differentiation approach takes this to the limit (see what I did there?) by isolating each event and calculating its value if it had no impact at all on the offensive environment.

In contrast, a TT approach uses a realistic scale between the individual and team; a typical approach is to assume that the individual gets 1/9 of team plate appearances. Using a 1/8 ratio between player and reference team does not require us to believe that the player actually had 1/9 of his team’s PA in the real world. One could use a player’s actual percentage of team PA and weight accordingly, but there is a balancing act: on one hand, we want to accurately capture the degree to which the batter impacted the team, but on the other, we don’t want to lose sight of where his impact is actually felt. Consider a batter who plays in just one game in the season, getting four plate appearances. If you use his actual percentage of team PA (which might be something like 0.05%) to calculate his impact on the team, he will have had essentially no tertiary effect. That is a distortion of reality, though: he really had something closer to 11.1% of the team’s PA in the game in which he actually played. From the perspective of evaluating his impact on the team, the other 161 games are an accounting fiction, no more relevant to him than games between other teams played thirty years prior (in fact, we should acknowledge that runs are actually scored at the inning level, which is where we started working out the math on PA generation).

So we will assume that the reference team always has eight times as many plate appearances as the player in question (which of course is equivalent to saying the player gets 1/9 of team PA). We could get cute and recognize that based on a player’s batting order position, his expected share of PA will change, and give different players a different share (while still limiting the scope to games/innings in which the batter actually played), but 1/9 is clean and any alternative approach would leave most batters pretty close to 1/9. The concept is simple; the formula will look a little messy. We start with our Base Runs equation:

A = H + W – HR = S + D + T + W

B = (2TB - H – 4HR + .05W)*.79776 = .7978S + 2.3933D + 3.9888T + 2.3933HR + .0399W

C = AB – H = Outs

D = HR

BsR = (A*B)/(B + C) + D
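
As a minimal sketch, the equation above can be written as a short Python function. The function name, the argument names, and the stat line in the usage example are my own assumptions for illustration; the stat line is invented and is not any real player's.

```python
def base_runs(s, d, t, hr, w, ab):
    """Base Runs from singles, doubles, triples, home runs, walks, and at bats."""
    h = s + d + t + hr                    # hits
    tb = s + 2 * d + 3 * t + 4 * hr       # total bases
    a = h + w - hr                        # A: baserunners
    b = (2 * tb - h - 4 * hr + 0.05 * w) * 0.79776  # B: advancement
    c = ab - h                            # C: outs
    d_factor = hr                         # D: home runs
    return a * b / (b + c) + d_factor


# Hypothetical stat line (not a real player): 500 AB, 100 W, so 600 PA
example_bsr = base_runs(s=100, d=30, t=5, hr=25, w=100, ab=500)
print(round(example_bsr, 1))  # about 121.1
```

This is the "player as a team" figure: the batter's own stats plugged straight into the dynamic estimator.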

We will start by calculating the team’s runs with the player. This will take the same form, but now our A, B, C, and D components will start with the player’s stats and add eight reference players. I will assume that the reference player is a league average performer, and thus the reference team is a league average team prior to the addition of our player. One could make the case that with respect to the tertiary effect of a player, a linear weights framework sidesteps the issue by assuming an inverse relationship between the quality of the player in question and the quality of the reference team. That is, by using static linear weights for all players regardless of their performance, a linear weights framework implicitly assumes that the team is average after the player is added. Thus Frank Thomas is added to a worse team than Matt Walbeck, such that at the end of the day the run values of all events are the same between the Thomas team and the Walbeck team.

If you are tempted to sweat the details and subtract the player’s stats from the league before determining league average, don’t. It is actually surprising how little impact the choice of reference team has on the outcome (which a cynic might note is a reason for suspecting that the tertiary effect is de minimis, but what’s the fun in that?). This is why James is able to get away with using a single final formula to convert the player’s A, B, and C factors in Runs Created (for which he laid out 24 different versions to cover major league history) to TT RC. It’s not technically correct, of course, but as long as the reference team is within a reasonable range of major league offense, it’s not debilitating.

Without our player, the reference team will have a number of plate appearances equal to eight times the individual’s PA, and will perform at the league average, so we can define each factor for the reference team as follows, with the calculation using the 1994 AL averages shown:

R_A = Lg(A/PA)*PA*8 = .3143*PA*8 = 2.514PA

R_B = Lg(B/PA)*PA*8 = .3402*PA*8 = 2.722PA

R_C = Lg(C/PA)*PA*8 = .6567*PA*8 = 5.254PA

R_D = Lg(D/PA)*PA*8 = .0290*PA*8 = .232PA

Then for the team with the player, the team versions of the A, B, C, and D factors are just the player’s factor plus eight times his PA times the league average of the factor/PA:

T_A = A + R_A = A + 2.514PA

T_B = B + R_B = B + 2.722PA

T_C = C + R_C = C + 5.254PA

T_D = D + R_D = D + .232PA
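
The reference and team factors above can be sketched in a few lines of Python. The per-PA rates are the 1994 AL figures quoted in this post; the dictionary layout and function names are assumptions of mine, chosen only for illustration.

```python
# 1994 AL per-PA rates for the A, B, C, and D factors, as quoted above
LG_RATES = {"A": 0.3143, "B": 0.3402, "C": 0.6567, "D": 0.0290}
REF_BATTERS = 8  # eight reference teammates, each with the player's PA


def reference_factors(pa):
    """R_A..R_D: league rate per PA times eight times the player's PA."""
    return {k: rate * pa * REF_BATTERS for k, rate in LG_RATES.items()}


def team_factors(player, pa):
    """T_A..T_D: the player's own factors plus the reference team's."""
    ref = reference_factors(pa)
    return {k: player[k] + ref[k] for k in LG_RATES}


# The per-PA multipliers that appear in the equations above
multipliers = {k: round(v, 3) for k, v in reference_factors(1).items()}
print(multipliers)  # {'A': 2.514, 'B': 2.722, 'C': 5.254, 'D': 0.232}
```

Swapping in a different league or reference team only requires changing `LG_RATES`.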

In order to isolate the individual’s impact, we need to calculate how many runs his new theoretical team would score and subtract the runs that the reference team would have scored with just eight reference players. The team’s BsR will be:

T_BsR = T_A*T_B/(T_B + T_C) + T_D

Some of the PA terms in the denominator can be combined, so for the 1994 AL we get:

T_BsR = (A + 2.514PA)*(B + 2.722PA)/(B + C + 7.975PA) + D + .232PA

The reference team’s runs scored will be equal to the league average BsR/PA times 8 times the player’s PA; to calculate league BsR/PA we can just plug the league average A, B, C, and D factors per PA into the BsR equation to get BsR/PA, then multiply:

R_BsR = (.3143*.3402/(.3402 + .6567) + .0290)*8*PA = 1.090PA
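
The arithmetic behind the 1.090 coefficient can be checked with a quick sketch, using only the 1994 AL per-PA rates given above (the variable and function names are mine):

```python
def bsr_from_factors(a, b, c, d):
    """Base Runs from the A, B, C, and D factors."""
    return a * b / (b + c) + d


# League BsR per PA from the league-average factors per PA
lg_bsr_per_pa = bsr_from_factors(0.3143, 0.3402, 0.6567, 0.0290)

# Eight reference batters, each with the player's PA
ref_runs_per_player_pa = 8 * lg_bsr_per_pa

print(round(lg_bsr_per_pa, 4))           # 0.1363
print(round(ref_runs_per_player_pa, 3))  # 1.09
```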

So our estimate of the individual’s run contribution to the theoretical team, which we’ll call Theoretical Team Base Runs (TT_BsR) is just the difference:

TT_BsR = T_BsR – R_BsR

Since we have PA in each term, for the 1994 AL it simplifies to:

TT_BsR = (A + 2.514PA)*(B + 2.722PA)/(B + C + 7.975PA) + D - .8579PA
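
Putting the pieces together, here is a minimal sketch of the whole calculation, working from unrounded league rates instead of the rounded coefficients in the simplified formula (so results can differ from it by a small fraction of a run). The function names and the example player are assumptions for illustration; the player's factors are invented, not a real stat line.

```python
LG = (0.3143, 0.3402, 0.6567, 0.0290)  # 1994 AL A, B, C, D per PA, from this post


def bsr(a, b, c, d):
    """Base Runs from the A, B, C, and D factors."""
    return a * b / (b + c) + d


def tt_bsr(a, b, c, d, pa, lg=LG, ref_batters=8):
    """Team runs with the player minus runs of the reference batters alone."""
    n = ref_batters * pa                      # reference team PA
    ref = [rate * n for rate in lg]           # R_A, R_B, R_C, R_D
    team = [x + r for x, r in zip((a, b, c, d), ref)]  # T_A..T_D
    return bsr(*team) - bsr(*ref)


# Hypothetical player: A, B, C, D factors over 600 PA
example_tt = tt_bsr(235.0, 235.3392, 340.0, 25.0, pa=600)
print(round(example_tt, 1))  # about 117.9

# Sanity check: an exactly league-average batter gets his own BsR back
avg = [rate * 600 for rate in LG]
print(round(tt_bsr(*avg, pa=600), 3), round(bsr(*avg), 3))
```

The sanity check reflects the design of the framework: adding an average batter to an average team leaves the environment unchanged, so there is no tertiary effect to measure.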

If we apply Frank Thomas’ statistics directly to Base Runs, we get an estimate of 139.0. If we use Base Runs to estimate linear weight coefficients for the league, we get 131.4 (what we’ve been calling LW_RC in this series). If we use the TT approach, we get 132.2. As you can see, the TT estimate is not that much different than the full linear estimate, which does call into question the need for the TT approach. After all, Thomas is one of the most extreme hitters in the league; if he barely moves the needle, who will?
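
To see how these three kinds of estimate relate, the sketch below varies the number of reference batters the player is added to. It uses the 1994 AL rates from this post and an invented above-average hitter (not Thomas, whose stat line is not reproduced here); the function and variable names are my own.

```python
LG = (0.3143, 0.3402, 0.6567, 0.0290)  # 1994 AL A, B, C, D per PA, from this post


def bsr(a, b, c, d):
    """Base Runs from the A, B, C, and D factors."""
    return a * b / (b + c) + d


def marginal_runs(player, pa, k, lg=LG):
    """Runs the player adds to a pool of k reference batters, each with his PA."""
    n = k * pa
    team = [x + rate * n for x, rate in zip(player, lg)]
    # The pool alone scores at the league rate per PA
    return bsr(*team) - n * bsr(*lg)


# Hypothetical player: A, B, C, D factors over 600 PA
player = (235.0, 235.3392, 340.0, 25.0)
by_pool_size = {k: marginal_runs(player, 600, k) for k in (0, 1, 8, 100, 1000)}
for k, runs in by_pool_size.items():
    print(k, round(runs, 2))
```

With k = 0 this is the player-as-team figure, k = 8 is the theoretical team, and a very large k approaches what linear weights derived from Base Runs would give. For this made-up hitter the estimate falls from about 121.1 to about 117.9 to about 117.5, the same pattern of a large first drop and a small second one seen in the Thomas figures.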

Regardless of the utility of this approach, I find it useful as an intellectual exercise because I believe the framework is the closest to approximating the real relationship between an individual batter and team performance. For a series ostensibly about rate stats, I’ve spent an entire post just setting up the numerator; don’t rate stats typically have a denominator as well? Seriously, though, if there’s one takeaway I would like a reader to glean from this series, it is that if you want to set up an offensive evaluation system, you need to think through all of the pieces as you develop it. Starting with a run estimator, and then slapping on a rate stat, and a baseline, and whatever bells and whistles you need, is not a sound approach. The choice of run estimator determines which denominator you should use, and the two should be compatible.
