Wednesday, May 26, 2021

Rate Stat Series, pt. 4: Players as Teams

A dynamic run estimator is one that allows offensive events to interact with each other, so that the value of a given event is not fixed as it would be in a linear weights formula (e.g. a single is worth .50 runs), but rather depends on all of the other components of the batting line. Dynamic run estimators are great in theory, since the run scoring process for a team is obviously dynamic and not linear. However, there are two issues:

1. They are harder to design than linear estimators. Any idiot with a spreadsheet and a dataset can run a linear regression on runs scored and have a linear estimator when they are done. It may not be a good one, but it will be functional and will probably have a low RMSE when estimating team runs scored. To develop a dynamic model, one must consider the run scoring process and produce a simplified model of it, but not one so simplified that it fails to produce reasonably accurate estimates.

This is not a series about run estimators, but the most commonly used dynamic run estimator, Bill James’ Runs Created, suffers from flaws that make it unable to handle extreme offenses. A much better model, David Smyth’s Base Runs, is powerful and will be used here (a simple version is sketched in code below).

2. They are not appropriate to apply to individual offensive statistics. Dynamic estimators always involve multiplying baserunners by some factor representing the advancement of those baserunners (in Runs Created that’s the end of the story; Base Runs also accounts for the unique nature of home runs). This multiplication is inappropriate when applied to an individual player, as now Frank Thomas’ high OBA is multiplied directly by his high power, which advances runners. In reality there is some interaction, but Thomas’ impact is diluted by being just 1/9th of the lineup. Inputting his statistics into a dynamic run estimator produces an estimate of how many runs a team would score if each batter hit like Thomas.

Due to this issue, I do not advocate applying dynamic run estimators directly to individuals, but this post will still address the rate stat implications of such applications. Later we will discuss theoretical team methods that allow the use of a dynamic run estimator while still accounting for the fact that the player is just one of nine in the lineup.
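For reference, here is a minimal sketch of Base Runs in Python. The structure (BsR = A*B/(B+C) + D) is Smyth’s; the particular B-factor weights below are one commonly used simple set and should be treated as an assumption, not a canonical definition:

```python
def base_runs(h, w, hr, ab, tb):
    """Simple Base Runs: BsR = A*B/(B+C) + D.

    A = baserunners (H + W - HR), B = advancement factor,
    C = outs (AB - H), D = home runs, which score automatically.
    """
    a = h + w - hr
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * w) * 1.02  # assumed simple weight set
    c = ab - h
    d = hr
    return a * b / (b + c) + d

# A "team" of 1994 Frank Thomases (141 H, 109 W, 38 HR, 399 AB, 291 TB):
# base_runs(141, 109, 38, 399, 291) ≈ 136.5 BsR in 258 (AB - H) outs
```

Note how multiplying A (baserunners) by B/(B+C) (the rate at which they score) is exactly the interaction described above: Thomas’ on-base ability compounds with his advancement ability, which is the point of the model for teams and its flaw when applied to an individual.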

This series will now discuss what I believe to be the proper rate stats for a particular framework for evaluating individual offense. One of my objectives is that for each rate stat framework presented, there be at least one variation that is linearly comparable and one that is ratio comparable. I’ve defined those terms as I use them at length before, so here I will be brief:

* A statistic is linearly comparable if the difference between two figures is meaningful. A hitter with a .400 OBA would reach base 100 times more than a hitter with a .300 OBA over 1000 PA.

* A statistic is ratio comparable if the ratio between two figures is meaningful. Our .400 OBA player reached base 33.3% more frequently than the .300 OBA player (a short illustration in code follows this list).
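As a trivial illustration of the two comparison types, using the OBA example above:

```python
oba_a, oba_b, pa = 0.400, 0.300, 1000

# Linearly comparable: the difference has a concrete meaning
extra_times_on_base = (oba_a - oba_b) * pa  # 100 more times on base per 1000 PA

# Ratio comparable: the ratio has a concrete meaning
relative_rate = oba_a / oba_b               # 1.333, i.e. 33.3% more frequently
```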

Ideally, our metric will facilitate both types of comparison, but if not, I will endeavor to present an alternative formulation that fills the gap. I will not propose any metrics that are neither linearly comparable nor ratio comparable, because they are the scourge of sabermetrics (hello OPS).

The underlying principle of the discussion that follows for the three frameworks (treating the player as a team, a full linear model, and a theoretical team model) is that the rate stat should be consistent with the run estimator used. If the run estimator treats the player as if he is a team, then the corresponding rate stat should treat the player as if he is a team.

In this case, that makes it very simple. The proper denominator for a team rate stat is outs. If you apply Runs Created, Base Runs, or some other run estimator directly to an individual player, the proper denominator is outs.

At this point in the discussion, this may ring somewhat hollow as a declaration, as I have only indirectly made the case for why we might want to use a denominator other than outs for an individual when outs are so clearly the proper choice for a team. Since I’m suggesting that outs are the proper choice for this framework, I’ll defer that case until later.

In this case, I advocate using outs when applying a dynamic run estimator directly to an individual because it is the only consistent treatment. The only justification for going down this path (other than needing something quick and dirty) is a theoretical exercise – how many runs would a team that hit like Frank Thomas score? While I don’t think this theoretical result is appropriate for attempting to value Thomas’ contribution to the 1994 White Sox, it at least has an interpretation. If you start mixing frameworks, you really have a mess on your hands. There’s no good reason (other than crude estimation) to apply a dynamic run estimator directly to an individual, and there’s no sense in deviating from the corresponding rate stat in order to make the results more comparable to a better approach to evaluating individual offensive contribution. Just use the better approach, and if you insist on misapplying a dynamic run estimator to individual players, make outs the denominator so that at least you have a theoretically coherent suite of metrics.

I should note that Bill James in the 1980s took this entire process to its logical conclusion. After applying Runs Created to individuals, dividing by outs, and multiplying by a constant that was close to the league outs/game for the definition of outs chosen, he went a step further and used the Pythagorean theorem to estimate the winning percentage that this team would have if it allowed an average number of runs. He then converted it to wins and losses by using the number of outs the player made to define games, which caused all kinds of problems, but at least he was committed.
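A sketch of that chain of calculations follows; the Pythagorean exponent, league scoring rate, and outs/game here are illustrative assumptions, not James’ exact values:

```python
def offensive_record(runs_created, outs, lg_r_per_g=4.9, outs_per_g=25.2, exp=2):
    """Player-as-team won-lost record in the spirit of 1980s Runs Created:
    Pythagorean W% against league-average run support, with the player's
    own outs defining his "games". Parameter defaults are illustrative
    assumptions."""
    games = outs / outs_per_g                # the player's outs define games
    r_per_g = runs_created / games
    wpct = r_per_g ** exp / (r_per_g ** exp + lg_r_per_g ** exp)
    return wpct * games, (1 - wpct) * games  # (wins, losses)
```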

This will be the first of several times that I’ll run a leaderboard for the 1994 AL using a particular framework. Here we have the top 5 and bottom 5 performers with at least 200 PA in Base Runs/Out. RAA is “Runs Above Average” and is calculated simply as (BsR/O – LgR/O) * Outs. Spoiler alert: No matter how we slice it, Frank Thomas is going to come out as the leading hitter in this league, as he raked .353/.492/.729 on his way to a second consecutive MVP award.


I am showing at least one more decimal place on each metric than I usually would just to allow for more precise calculation if you’re following along; it is in no way a statement about the significance of ten-thousandths of runs per out.
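If you are following along, the RAA calculation is just the formula above in code, shown here with a hypothetical league rate:

```python
def raa(bsr, outs, lg_r_per_out):
    """Runs Above Average in the player-as-team framework:
    (BsR/O - LgR/O) * Outs."""
    return (bsr / outs - lg_r_per_out) * outs

# With the earlier Base Runs sketch's Thomas-as-team line (~136.5 BsR in
# 258 outs) and a hypothetical league rate of .205 R/O:
# raa(136.5, 258, 0.205) ≈ 83.6 runs above average
```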

Runs per out can of course be scaled; Bill James multiplied it by the league average outs/game appropriate for the categories considered in the computation of outs. For instance, in this case, since we’re defining outs as AB – H, the average outs/game will be around 25.2 (for the 1994 AL it was 25.19). A more complete accounting of outs, like AB – H + CS + SH + SF + DP, would get close to 27 outs/game. While putting individual contribution on a team-games basis is nonsensical on some level, since it is just a scalar multiplier it causes no real distortion and provides a scale that is easily understandable, in the same manner that ERA or K/9 are understood by everyone other than Matt Underwood and Harold Reynolds.
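The scaling is a single multiplication; a minimal sketch, assuming the outs definitions just described:

```python
def runs_per_game(runs, outs, outs_per_g=25.19):
    """Scale runs/out to runs/game. 25.19 is the 1994 AL average with outs
    defined as AB - H; a fuller accounting (AB - H + CS + SH + SF + DP)
    would use something closer to 27."""
    return runs / outs * outs_per_g
```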

Wednesday, May 12, 2021

Rate Stat Series, pt. 3: Teams

If I tell you that three teams in the same league-season played the same number of games (113), and that one of them scored 679 runs, another scored 670, and the third scored 633, how confident would you be in using this limited data to rank the productivity of their offenses? As usual in this series, we are ignoring park factors and other contextual factors (like quality of opposition/not having to face one’s own pitching staff); since they are from the same league-season, you don’t need to worry about whether the win value of each team’s runs was the same. Assume also that runs will be distributed across games by a known distribution like Enby, so the distribution is also not a differentiator. Assume that we don’t care about any “luck”; the actual total is what matters, not what a run estimator came up with. What else do you need to know?

I would contend that given the (admittedly restrictive) parameters I’ve placed on the exercise, you now know almost everything you need to know. In a small number of cases, and to a small extent, you are missing valuable information – but for most situations, you should need no additional information.

Now suppose I told you something similar about three players: same league season, same number of games played (111), and three runs created estimates: one player created 106 runs, one 92, and one 88. Do you feel like you need any additional information to put these players in the proper order of offensive productivity?

I hope that your answer here is yes, and a lot of it. I’ve told you how many games each played, but that doesn’t tell you how many opportunities they’ve had at the plate. Sure enough, in this case one of the players had substantially fewer plate appearances than the others (489, 490, and 451 respectively). Given that the player who created 92 runs had 39 more plate appearances than the player who created 88, it seems likely that the latter player was actually more productive on a rate basis.

I did not tell you how many plate appearances each of the three teams had in their 113 games; I don’t think it’s relevant to the question at hand, but the answer is 4493, 4611, and 4556 respectively. Why do we need to know plate appearances (or something) in the case of players, but not in the case of teams? Understanding this gets to the heart of the reason this series needs to exist at all, why applying the same rate stat to team offenses and player offense may not work as intended.

In the previous installment, I asked the question: “Where do plate appearances come from?” The answer is that every inning (excluding walkoff situations) starts with three PAs guaranteed, and only by avoiding outs (reaching base and not being subsequently retired on the bases) can a team generate additional plate appearances.

From a team perspective, then, plate appearances are not an appropriate denominator for a rate stat, because differences in team plate appearances are the result of differences in performance between the teams. To return to the three teams discussed above, they are the 1994 Indians, Yankees, and White Sox respectively. The Indians had the fewest PA of the three yet scored the most runs. Does this mean that their offense, which already scored more runs than the other two clubs, was even more dominant than the raw numbers would suggest?

An offense does not set out to maximize its plate appearances, nor does it set out to score the maximum number of runs it can in the minimum number of plate appearances. An offense sets out to maximize its total runs scored. Plate appearances are a function of the rate at which a team makes outs. At this point it might be helpful to consider the three teams:

Team         G     R     PA    R/G    PA/G   R/PA
Cleveland   113   679   4493   6.01   39.8   .151
New York    113   670   4611   5.93   40.8   .145
Chicago     113   633   4556   5.60   40.3   .139

New York’s OBA was 22 points higher than Cleveland’s and thus they generated an extra plate appearance per game. When ranking team offenses, it wouldn’t make sense to penalize the Yankees for this, which would be the case if we used R/PA. The difference in plate appearances simply reflects the different manner in which New York and Cleveland went about creating runs. For a team, plate appearances are inextricably linked with their OBA. Each inning, a team attempts to score as many runs as it possibly can before making three outs. It’s possible to score one run in a complete inning with as few as four or as many as seven plate appearances. Whether a team uses four, five, six, or seven plate appearances to score a single run is irrelevant in terms of that run’s impact on them winning or losing the game (*). Thus outs or an equivalent like innings are the correct choice for the denominator of a team rate stat.

(*) I am speaking here simply about the direct impact of the runs scored and not any downstream effects or the predictive value of team performance. Perhaps the team that uses seven PA to score one run benefits by wearing down the opposing pitcher, or is more likely to have success in the future because it had four of seven batters reach base compared to one of four for the team that only needed four PA. Here we’re just focused on the win value directly attributable to the run scored and not any secondary or predictive effects.
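The four-to-seven PA range follows from a simple accounting identity: in a completed inning, every batter either makes an out (at the plate or on the bases), scores, or is left on base, so PA = outs + runs + runners left on base. With three outs and exactly one run, the only free variable is runners left on base:

```python
# In a completed inning: PA = outs + runs + runners left on base.
# With 3 outs and exactly 1 run, LOB can range from 0 to 3:
for lob in range(4):
    print(f"1 run, {lob} LOB -> {3 + 1 + lob} PA")  # prints 4, 5, 6, 7 PA
```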

The fact that outs are fixed for each team each inning (ignoring walkoffs) means that outs are also fixed for each team each game (ignoring walkoffs, rainouts, extra innings, and foregone bottoms of the ninth). Which means that outs are also fixed for each team each season (ignoring those factors and cases in which teams don’t play out their full schedules, or have to play tiebreakers), which means that R/G and the raw seasonal runs scored total are essentially equivalent to looking at R/O for a team. So for the question I asked at the beginning of the article, just knowing that the three teams had played an equal number of games, we had a pretty good idea how they would “truly” rank using R/O.

For players, this is not at all the case, since even in an equal number of games, players will get different numbers of plate appearances for a variety of reasons (batting order position, the team’s OBA (remember, higher OBA teams will generate more PA), whether or not they play the full game), a fact that is intuitive to most baseball fans. What is less intuitive, though, is that even in the same number of plate appearances, players can make very different numbers of outs. Since we’ve already accepted that team OBA determines how many plate appearances a team will generate, it isn’t much of a leap to conclude that if we have two players who create the same number of runs (using a formula that doesn’t explicitly account for their impact on the team’s OBA) in the same number of plate appearances, the player who makes fewer outs was more productive when we consider the totality of their offensive contribution. Even though the two players were equally productive in their plate appearances, the player who made fewer outs generated more plate appearances for his teammates, a second-order effect that needs to be considered when evaluating individual offensive contribution. For teams, the runs scored total already reflects this effect.
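A stylized example of this second-order effect (all numbers hypothetical):

```python
# Two hypothetical players: same runs created, same plate appearances
runs, pa = 90, 600
outs_a, outs_b = 380, 410     # player A avoided 30 more outs

r_per_pa = runs / pa          # identical for both: .150 R/PA
r_per_out_a = runs / outs_a   # ≈ .237
r_per_out_b = runs / outs_b   # ≈ .220

# R/PA rates them as equals; R/O credits player A for the roughly 30 extra
# plate appearances his teammates got because he avoided outs.
```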

This would be an appropriate time to note that this series is focused on evaluating offenses, but of course every offensive metric can be reviewed in reverse as a defensive metric. However, since the obvious denominator for teams is outs, it is also the obvious denominator for individual pitchers. We don’t need to worry about a pitcher’s impact on his team’s plate appearances – when he is in the game, he is solely responsible (setting aside the question of how the team’s performance should be allocated between the pitcher and his fielders) for the number of plate appearances the opponent generates, and his goal is to record three outs while minimizing the number of runs he allows, regardless of how many opponents come to the plate. Outs are clearly the correct denominator for the rate stat, and innings pitched are nothing more than outs/3 (and even better, IP account for all outs, including many that don’t show up in the standard statistical categories).

In thinking about the development of early baseball statistics and the legacy of those standard statistics on how the overwhelming majority of fans thought about baseball before the sabermetric revolution took hold, it is striking that the early statisticians understood these concepts as they applied to pitchers. When pitchers were completing almost all of their starts, simple averages of earned runs allowed sufficed, for the same reason that team R/G tells you most everything you need to know. As complete games became rarer, ERA took hold, properly using innings in the denominator. For most of the twentieth century, and even post-sabermetric revolution, baseball fans have been conditioned to think about innings pitched as the denominator for all manner of pitching metrics – even those, like strikeout and walk frequency, for which plate appearances would make a much more logical denominator. (Of course, present-day sabermetrics has embraced metrics like K% and W% for pitchers, but the per-inning versions remain in use as well.)

The parallel development of offensive statistics resulted in the opposite phenomenon. While early box scores tracked “hands out” (essentially outs made) for individual batters, batting average eventually became the dominant statistic. Setting aside the issues with “at bats” and how they distort people’s thinking and saddled us with the mouthful of “plate appearances” to describe the more fundamental quantity of the two, the standard batting statistics have conditioned fans to think about component batting rates (walk rate, home run rate, etc.) in the correct manner (or one adjacent to correct, depending on whether at bats or plate appearances serve as the denominator), but they leave people struggling with how to properly express a batter’s overall productivity. Again, this is the opposite of the problem created by traditional pitching statistics. One can imagine that it all might be very different had batting average taken the form of a hit/out ratio rather than hits per at bat.