Wednesday, June 23, 2021

Rate Stat Series, pt. 6: Rate Stats for Linear Weights I

Last time, I attempted to demonstrate that linear weights producing a result in terms of runs above average properly account for the value of extra plate appearances created by a batter. From here on, I will talk about this contention in the same way as I would any scientifically demonstrated fact, even though I admit that I have not provided a “proof” in the mathematical sense. The casual use of language is intended to prevent wasting space repeating myself rather than an attempt to claim a more robust result than is appropriate.

Since we have demonstrated that LW_RAA captures the player’s direct and indirect contributions to team offense (at least within the linear framework of metrics), I would contend that it follows that any linear weights-based rate stat we propose must return the same RAA as we would get from eschewing a rate stat altogether and just applying our linear weights formula with the “-.3 type out value”. One very simple way to do that is to use RAA, rather than a measure of absolute runs created, as the numerator for the rate stat. In this case, the obvious denominator is plate appearances.

Using outs in the denominator would be appropriate for a team metric, but for an individual would overstate the value of his PA generation. For a simple demonstration of this, recall from part three the equation:

PA/G = (O/G)/(1 – OBA)

and the equivalent PA = O/(1 – OBA), given our simple definitions, which as a refresher are PA = AB + W, O = AB – H, and OBA = (H + W)/(AB + W)
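As a quick sanity check, the identity can be verified with toy numbers (the AB/H/W totals below are invented, not taken from any real player):

```python
# Toy check of the PA/outs/OBA identity using the simple definitions
# PA = AB + W, O = AB - H, OBA = (H + W)/(AB + W). Numbers are invented.
AB, H, W = 550, 180, 100

PA = AB + W            # 650 plate appearances
O = AB - H             # 370 outs
OBA = (H + W) / PA     # on base average

# PA = O/(1 - OBA): every PA that is not a time on base is an out
assert abs(PA - O / (1 - OBA)) < 1e-6
print(round(O / (1 - OBA), 6))   # 650.0
```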

This direct relationship between plate appearances, outs, and OBA means that if we were to use outs as the denominator rather than PA, we would be inflating the rates for players with higher OBAs, even though we’ve already demonstrated that the PA generation impact of higher OBAs is captured in the numerator (RAA).

Here were the top and bottom 5 performers in terms of RAA/PA, which for ease of use I have restated as RAA/650 PA (This works out to 152.5 full games for a player getting 1/9 of an average 1994 AL team’s PA; sadly, these teams didn’t get anywhere close to playing a 162 game season. There’s no special significance to 650 other than that it is a round number that is reasonably close to the number of PA a full-season batter might accumulate):

You may have noticed that the identity and order of these players did not change from when we used a fully dynamic approach that treated them as if they were their own teams (BsR/O). This recalls a simple truth that I am burying beneath thousands of words of minutiae – any reasonable approach will reach similar conclusions for the majority of situations. Even a poorly designed, indefensible metric like OPS will get you 98% of the way there. This series is about the tiny differences that lie beyond that point and the larger differences that arise in values as opposed to rank order.

To wit, while the rank order remains the same, the RAA values are different using linear weights, and with the exception of Griffey, less extreme. All of the other players have moved closer to average, with Frank Thomas losing a whopping 7.7 RAA due to using a linear approach rather than imagining an entire lineup of Big Hurts.

At this point, we could stop, and simply use RAA/PA or RAA/650 PA as our final linear weights-grounded rate stat. However, it lacks the very useful trait of ratio comparability that would be desirable in the ideal rate stat. While linear differences in RAA/PA can be compared (e.g. Thomas contributed an additional .081 RAA/PA beyond what Chili Davis did), the ratios are not particularly useful. Consider two players who each had 600 PA, one contributing +1 RAA and the other -1 RAA. If you can explain the practical baseball interpretation of the resulting ratio of -1, be my guest.
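The two-player example can be made concrete in a few lines (the PA and RAA figures are the hypothetical ones from the text):

```python
# Two hypothetical batters with 600 PA each, at +1 and -1 RAA.
# The difference of their rates is interpretable; the ratio is not.
raa_a, raa_b, pa = 1.0, -1.0, 600

diff = raa_a / pa - raa_b / pa   # ~.0033 runs/PA, a meaningful gap
ratio = raa_a / raa_b            # -1.0, with no baseball interpretation
print(round(diff, 4), ratio)     # 0.0033 -1.0
```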

This happens because we have applied the high baseline of average to the metric. However, we have another version of linear weights that is based on absolute runs from which we could build a rate stat. Remember that in order to qualify for consideration, the RAA that results from that rate stat must be equivalent to the RAA from simply applying the linear weights formula and not comparing a rate to the league average.

The obvious first choice is Runs/PA, using our formula for linear weights runs created. In this case, I am not showing the leaders, but rather the same ten players in the same order. RAA in this case is (LW_RC/PA – LgR/PA)*PA:

The results are not even close to what we need, and the reason is simple: we have not accounted in any way for each batter’s PA generation. There is an easy alternative that might correct this: using outs in the denominator. This just takes us to the correct team rate stat, although the numerator is linear rather than dynamic (either in the case of using Base Runs or using actual runs scored on the team level). As such it does implicitly consider PA generation; might it produce satisfactory results for individuals? Here RAA = (LW_RC/O – LgR/O)*O:

A perfect match. Absolute runs per out produces the same RAA as the direct application of LW_RAA. R/O also has the advantages of being meaningfully comparable as both a difference and a ratio and is the same as the correct team rate stat. Everything is great, except we haven’t answered the question: Does it actually work?
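A minimal sketch of why the match occurs, under the idealized assumption (developed later in this series’ pt. 5 discussion) that the absolute out value exceeds the RAA out value by exactly the league’s runs per out; the batter’s component totals below are invented:

```python
# Idealized assumption: abs_out = raa_out + league R/O, so subtracting
# the league R/O deduction from LW_RC/O recovers LW_RAA exactly.
lg_r_per_out = 0.1848                 # assumed league runs per out
raa_out = -0.3150                     # "-.3 type" out value
abs_out = raa_out + lg_r_per_out      # "-.1 type" out value

# A hypothetical batter: contrived so his LW_RAA works out to +5.0
outs = 370
pos_runs = 5.0 + 0.3150 * outs        # run value of positive events
lw_raa = pos_runs + raa_out * outs    # = 5.0
lw_rc = pos_runs + abs_out * outs     # absolute linear weights runs

raa_from_rate = (lw_rc / outs - lg_r_per_out) * outs
assert abs(raa_from_rate - lw_raa) < 1e-9
```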

Remember, I said that matching LW_RAA was a necessary condition for our proposed alternative linear weight rate stat to meet – it is not a sufficient condition. We’ve already concluded that RAA/PA is a proper rate stat for a linear framework; in order for an alternative to be acceptable, it must produce results that are consistent with RAA/PA. How do we determine this consistency, other than matching the RAA result, when the numerators and the denominators each start from a different basis? 

One simple but obvious way to determine if they are consistent is to confirm that they result in the same rank order of players. They did for our most extreme hitters, but does that hold for all hitters? 

Apparently not. I didn’t have to go too far to find this little cluster of hitters, as they rank 7-10 in RAA/PA. None of them rank in the same spot: Vaughn is first in RAA/650 but second in RC/O; Lofton is second/third; Mack is third/fourth; and Clark is fourth/first. 

I slipped OBA onto the chart because it helps to explain what is going on. Will Clark’s .433 OBA ranked fifth among AL hitters with 200 PA; using outs in the denominator helps him as it implicitly assumes that he represents an entire team with a .433 OBA. While all of these hitters had excellent OBAs relative to the league, R/O goes a bit too far in valuing this. Given that R/O still produces the right RAA, any distortions have to be somewhat limited for normal players.

One can come up with extreme thought experiments, like a player with a .999 OBA all from walks versus a player with a .600 OBA, all from home runs. A team made up of the former would score what would certainly feel to the opposing pitching coach like a nearly infinite number of runs; but as a single player in a lineup, his impact would be muted. It’s not necessary to answer the thought experiment as to which would be more productive to see that R/O applied to extreme individual players would break down. Incidentally, it is exactly this type of scenario that I got into trouble trying to “prove” in my last attempt at this series – while I do think that the theoretical team methods I’ll discuss later provide reasonable estimates for this situation, relying too heavily on them for proofs is begging the question. 

So, we have a rate stat (LW_RAA/PA) that works, but it lacks ratio comparability. There are (at least) two ways we could go about solving this problem:

1. We could adjust LW_RC/PA in some way to take into account the value of PA generation

2. We could manipulate LW_RAA/PA so that it’s no longer a measure of runs above average, but instead on an absolute runs basis

Next time I’ll explore these two parallel questions...if in fact they are parallel at all.

Wednesday, June 09, 2021

Rate Stat Series, pt. 5: Linear Weights Background

Linear methods sidestep the issues that arise from applying dynamic run estimators to players by simply ignoring any non-linearity in the run scoring process altogether. While this is clearly technically incorrect, it is closer to reality than pretending that a player’s performance interacts with itself. Since an individual makes up only 1/9 of a lineup, it is much closer to reality to pretend that his performance has no impact on the run environment of his team than to pretend that it defines the run environment of his team. Linear weights also have the advantage of being easy to work with, easy to adapt to different baselines, and easy to understand and build. Their major drawback is that the weights are subject to variation due to changes in the macro run environment (as distinguished from the marginal change to the run environment attributable to an individual player). 

Linear methods were pioneered by FC Lane and George Lindsey, but it was Pete Palmer who used them to develop an entire player evaluation system, publish historical results, and establish them as the chief rival to Runs Created in the 1980s. Curiously (especially since Palmer is a prolific and brilliant sabermetrician whose pioneering work includes park factors, variable runs per win, using the negative binomial distribution to model team runs per game, and more), Palmer’s player evaluation system as laid out in The Hidden Game of Baseball and later Total Baseball and the ESPN Baseball Encyclopedia never bothered to convert its offensive centerpiece Linear Weights into a rate statistic.

This gap contributed to two developments that I personally consider unfortunate. First, confusion about how to convert linear weights to a rate may have hampered the adoption of the entire family of metrics, and this confusion generally persisted until the publication of The Book by Tom Tango, Mitchel Lichtman, and Andy Dolphin. Second, Palmer did offer up a rate stat, but he did not tie it to linear weights, or in its crudest form to any meaningful units at all. That’s because Normalized OPS (later called Production), which you may know as OPS+, was the rate stat coupled with linear weights batting runs.

To my knowledge, Palmer has never really explained why he didn’t derive a rate stat from linear weights to use; the explanations have instead focused on the ease and reasonable accuracy of OPS. In The Hidden Game, the discussion of linear weights transitions to OPS with “For those to whom calculation is anathema, or at least no pleasure, Batter Runs, or Linear Weights, has a ‘shadow stat’ which tracks its accuracy to a remarkable degree and is a breeze to calculate: OPS, or On Base Average Plus Slugging Percentage.”

Coincidentally, Palmer recently published an article in the Fall 2019 Baseball Research Journal titled “Why OPS Works”, which covers a lot of the history of his development of linear weights and OPS, but still doesn’t explain exactly why a linear weights rate wasn’t part of the presentation.

Without the brilliant mind of Palmer to guide us, where should we turn for a proper linear weights-based rate stat? To answer that question, I think it’s necessary to briefly examine how linear weights work. For this discussion, I am taking for granted that the empirical derivation of linear weights is representative of all linear weight formulas. This is not literally true, as belied by the fact that the linear weights I’m using in this series were derived from Base Runs, not from empirical data. If we were using an optimized Base Runs formula, the resulting weights would be very close to empirical weights derived for a similar offensive environment, but other approaches to calculating linear coefficients like multiple regression can deviate significantly from the empirical weights. Even so, the final results are similar enough that the principles hold for reasonable alternative linear weight approaches.

What follows will be elementary for those of you familiar with linear weights, but let’s walk through a sample inning featuring the star of our series, Frank Thomas. I want to use this example to illustrate two properties of linear weights when using the “-.3 type out value” (i.e. when the result is runs above average): the conservation of runs, and the constant negative value of outs. This example will simplify things slightly, as in reality not every event in the inning cleanly maps to a batting event that is included in a given linear weights formula (e.g. wild pitches, balks, extra bases on errors, etc.) It also will presume that the run expectancy table we use for the example corresponds perfectly to our linear weights, which it does not. Still, the principles are generally applicable to properly constructed linear weights methods, even if the weights were derived from other run expectancy tables or, as is the case for us in this series, by another means altogether (I’m using the intrinsic weights derived from Base Runs for the 1994 AL totals).

Baseball Prospectus has annual run expectancy tables; their table for the 1994 majors is:

On July 18, Chicago came to bat in the bottom of the seventh trailing Detroit 9-5. Their run expectancy for the inning was .5545 as Mike LaValliere stood in against Greg Cadaret. He drew a walk, which raised the Sox RE to .9543, and thus was worth .3998 runs. The rest of the inning played out as follows:

1. If we were going to develop empirical LW coefficients based on this inning, we would conclude that a home run was worth 2.658 runs on average, and thus our linear weight coefficient for a home run would be 2.658. The other events would be valued:

This is in fact how empirical LW are developed, but of course a much larger sample size (typically at least an entire league-season) is used.
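The RE bookkeeping above can be sketched as a tiny function: the run value of an event is the change in run expectancy plus any runs that scored on the play (the .5545 and .9543 figures are from the LaValliere walk described above):

```python
# Run value of one event: change in run expectancy plus runs scored
# on the play. RE figures are the ones quoted for the sample inning.
def event_run_value(re_before, re_after, runs_scored=0):
    """Value of an event relative to run expectancy."""
    return re_after - re_before + runs_scored

walk_value = event_run_value(0.5545, 0.9543)
print(round(walk_value, 4))   # 0.3998
```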

2. The team’s runs above average for the inning is always conserved. We started the inning with the bases empty and nobody out for a RE of .5545. This is the same as saying that the average for an inning is .5545 runs scored. The White Sox actually scored 4 runs, and the total of the linear weight values of the plays was 3.4455 runs, which is 4 - .5545. They scored 3.4455 runs more than an average team would be expected to in an inning. The sum of the linear weight values will always match this.

Because of this, we can be assured that the run value of additional plate appearances created by the positive events of the batters has been taken into account in the linear weight values. If this were not the case, runs would not be conserved.

3. Since that is true, it is also true that the sum of the LW values of the positive events (which is 4.8128 runs) plus the sum of the LW values of the outs (-1.3673) must be equal to the runs above average for the inning (3.4455). The sum of the values of the outs will be higher in innings in which more potential runs were “undone” by outs, as was the case here. On the other hand, an inning in which three outs are recorded in order will result in -.5545 runs.

We can use this fact to isolate the run value of the out between the portion that is due to ending the inning (what Tom Tango has called the “inning killer” effect of the out; this is the -.5545 that is the minimum out value for an inning), and that which is due to wasting the run potential of the positive events (what’s left over, in this case, -.8128 runs). 
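The inning accounting above can be checked numerically (all figures are the ones quoted in the text for the sample inning):

```python
# Conservation check for the sample inning: 4 runs scored against a
# starting RE of .5545, positive events +4.8128, outs -1.3673.
start_re, runs = 0.5545, 4
pos, out_total = 4.8128, -1.3673

raa = runs - start_re                  # 3.4455 runs above average
assert abs((pos + out_total) - raa) < 1e-4

# Split the out value into the "inning killer" portion (-.5545) and
# the wasted run potential of the positive events (the remainder).
inning_killer = -start_re
wasted = out_total - inning_killer
print(round(wasted, 4))                # -0.8128
```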

If we wish to convert our linear weights from an estimator of runs above average to an estimator of absolute runs, we need to back out the inning killer value of the out (since it is present equally in every inning and serves to conserve total RAA) from the overall value of the out. The remainder we do not need to worry about, as it would otherwise have to be debited from the value of the positive events in order to conserve runs.

So we can take -.5545/3 = .1848 and add it back to the linear weight RAA out value, which for our example was -.3150. This results in an absolute out run value of -.1302. In our example we’re using -.1076; these don’t reconcile because:

1. our linear weights don’t consider all events (we’re ignoring hit batters, sacrifices, all manner of baserunning outs, etc.)

2. our linear weights weren’t empirically derived from the 1994 RE table as the .1848 adjustment was
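The conversion itself is two lines of arithmetic (using the RE and out values quoted above):

```python
# RAA-to-absolute out value conversion sketched with the figures
# quoted in the text (1994 bases-empty, none-out RE and this series'
# "-.3 type" out value).
re_start = 0.5545
raa_out = -0.3150

per_out_share = re_start / 3         # ~.1848, one out's share of RE
abs_out = raa_out + per_out_share    # absolute out value
print(round(abs_out, 4))             # -0.1302
```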

While the numbers don’t (and shouldn’t!) balance perfectly in this case, this is the theoretical bridge for converting empirical linear weights from a RAA basis to an absolute runs basis. I would also contend it serves as a demonstration by inductive reasoning that absolute linear weights do not capture the PA generation impact of avoiding outs, but RAA linear weights do.

Note that converting to the “-.1 type out value” does not eliminate the result of negative runs altogether. An offensive player who is bad enough will be credited with negative runs created (if it helps you to imagine what this level of production might look like, consider that the total offensive contributions of pitchers have hovered near zero absolute runs created in the last decade). For real major league position players, this will not happen except due to sample size. If you’d like an interpretation, I have found this helpful (I stole it from someone, probably Tom Tango, and have badly paraphrased): Since linear weights fix the values of each event for all members of the team, the level at which runs created are negative is the level at which, in order to conserve team runs, the weights of positive events cannot be reduced – the poor batter essentially undoes some of the positive contributions of his teammates.

As an aside, the first paper I’m aware of that made the connection between the two linear weight approaches in this manner (rather than simply solving algebraically for the difference between the two without providing theoretical underpinning) was published by Gary Skoog in a guest article in the 1987 Baseball Abstract. This article, titled “Measuring Runs Created: The Value Added Approach” is available at Baseball Think Factory.