Tuesday, June 16, 2020

Preoccupied With 1985: Linear Weights and the Historical Abstract

I stumbled across this unpublished post while cleaning up some files – it was not particularly timely when written about ten years ago, and is even less timely now. Unlike some other old pieces I find, though, I don’t know why I never published it, other than maybe redundancy and beating a dead horse. I still agree with the opinions I expressed, and it is well above the low bar required for inclusion on this blog.

The original edition of Bill James’ Historical Baseball Abstract, published in 1985, is my favorite baseball book, and I am far from the only well-read baseball aficionado who holds it in such high regard. It contains a very engaging walk through each decade in major league history, some interesting material on rating players (including what has to be one of the first explicit discussions of peak versus career value in those terms), ratings of the best players by position and the top 100 players overall, and career statistics for about 200 all-time greats which seem like nothing in the internet age but at the time represented the most comprehensive collation on those players.

However, there is one section of the book which does not hold up well at all. It really didn’t hold up at the time, but I wasn’t in a position to judge that. James reviews The Hidden Game of Baseball, published the previous year by John Thorn and Pete Palmer, and gives his thoughts about the Linear Weights system.

James’ lifelong aversion to linear weights is somewhat legendary among those of us who delve deeply into these issues, but the discussion in the Historical Abstract is the source of the river, at least in terms of James’ published material. For years, James’ thoughts colored the perception of linear weights by many consumers of sabermetric research. This is no longer the case, as many people interested in sabermetrics twenty-five years later have never read the original book, and linear weights have been rehabilitated and widely accepted through the work of Mitchel Lichtman, Tom Tango, and now many others.

So to go back thirty years later and rake James’ essay over the coals is admittedly unfair. You may choose to look at this as gratuitous James-bashing if you please; that is not my intent, but I won’t protest any further than this paragraph. I think that some of the arguments James advances against linear weights are still heard today in different words, and occasionally you will still see a reference to the article from an old Runs Created diehard. And if one can address the concerns of the Bill James of 1985 on linear weights, it should go a long way in addressing the concerns of other critics.

It should be noted that James on the whole is quite complimentary of The Hidden Game and its authors. I will be focusing on his critical comments on methodology, so any excerpts I use will be of the argumentative variety and, if taken without this disclaimer, could give the wrong impression of James’ view of the work as a whole.

The first substantive argument that James offers against Palmer’s linear weights (in this case, really, the discussion is focused on the Batting Runs component) is their accuracy. The formula in question is:

BR = .46S + .80D + 1.02T + 1.40HR + .33(W + HB) + .3SB - .6CS - .25(AB - H) - .5(OOB)

As you know, Palmer’s formula uses an out value that returns an estimate of runs above average rather than absolute runs scored (in which case it would be somewhere around -.1). The formula listed by Palmer fixes the out value at -.25, but it is explained that the actual value is to be calculated for each league-season. James notes this, but then ignores it in using the Batting Runs formula to estimate team runs scored. To do so, he simply adds the above result to the league average of runs scored per team for the season. He opines that the resulting estimates “[do] not, in fact, meet any reasonable standard of accuracy as a predictor of runs scored.”

And it’s true--they don’t. This is not because the BR formula does not work, but rather because James applied it incorrectly. As he explains, “For the sake of clarity, the formula as it appears above yields the number of runs that the teams should be above or below the league average; when you add in the league average, as I did here, you should get the number of runs that they score.”

This seems reasonable enough, but in fact it is an incorrect application of the formula. The correct way to use a linear weights above average formula to estimate total runs scored is to add the result to the league average runs/out multiplied by the number of outs the team actually made.
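A minimal Python sketch of the difference between the two conversions follows. The function and argument names are mine, the team line and league figures are made up for illustration, and counting team outs as AB - H is an assumption.

```python
def batting_runs(s, d, t, hr, w, hb, sb, cs, ab, h, oob=0, out_weight=-0.25):
    """Palmer's Batting Runs (runs above average) from the formula above."""
    return (0.46 * s + 0.80 * d + 1.02 * t + 1.40 * hr
            + 0.33 * (w + hb) + 0.3 * sb - 0.6 * cs
            + out_weight * (ab - h) - 0.5 * oob)


def runs_naive(br, league_runs_per_team):
    """James' approach: add RAA to the league average runs per team."""
    return br + league_runs_per_team


def runs_per_out_baseline(br, league_runs_per_out, team_outs):
    """Add RAA to what an average team would score in this team's outs."""
    return br + league_runs_per_out * team_outs


# Hypothetical team line and league context (not real data).
team = dict(s=1000, d=250, t=40, hr=150, w=550, hb=30, sb=100, cs=50,
            ab=5500, h=1440)
br = batting_runs(**team)
team_outs = team["ab"] - team["h"]   # assumption: outs counted as AB - H
lg_runs_per_team = 700.0             # hypothetical league average
lg_runs_per_out = 700.0 / 4100.0     # hypothetical: 4100 outs per average team

naive = runs_naive(br, lg_runs_per_team)
proper = runs_per_out_baseline(br, lg_runs_per_out, team_outs)
```

The two estimates differ only to the extent that the team made more or fewer outs than the league average. Folding the league runs/out into the out weight (-.25 plus roughly .17 in this made-up league) gives an absolute out value in the neighborhood of the -.1 mentioned earlier.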

This can be demonstrated pretty simply by using the same league-seasons (1983, both leagues) that James uses in the initial test in the Historical Abstract. If you use the BR formula with -.25 as the out weight and simply add the result to the league average runs scored (in each respective league), the RMSE is 29.5. Refine that a little bit by instead adding the result to the number of outs each team made multiplied by the respective league runs/out (but still using -.25 as the out weight), and the RMSE improves to 29.3. The James formula that uses the most comparable input, stolen base RC, has an RMSE of 24.4, and you can see why he thought BR was less accurate (in this limited sample; I’m certainly not advocating paying much heed to accuracy tests based on one year of data, and neither was James). But had he applied the formula properly, by figuring custom out values for each league (-.255 in the AL and -.244 in the NL) and adding the resulting RAA estimate to league runs/out times team outs, he would have gotten an RMSE of 18.7.

In fairness to James, the authors of The Hidden Game did not do a great job in explaining the intricacies of linear weight calculations. The book is largely non-technical, and nitty-gritty details are glossed over. The proper method to compute total runs scored from the RAA estimate is never exactly explained, nor is the precise way to calculate the out value specific to a league-season (while it’s a matter of simple algebra, presenting the formula explicitly would have cleared up some confusion). To do a fair accuracy test versus a method like Runs Created, which does not take into account any data on league averages, you would also need to calculate the -.1 out value over a large sample and hold it constant, which Thorn and Palmer did not do or explain. In addition, the accuracy test was not as well-designed as it could have been, although that wouldn’t have had much of an impact on the results for Batting Runs or Runs Created, but rather for rate stats converted to runs.

James then goes on to explain the advantage that Batting Runs has in terms of being able to home in on the correct value for runs scored, since it is defined to be correct on the league level. He is absolutely correct (as discussed in the preceding paragraph) that this is an unfair advantage to bestow in a run estimator accuracy test; however, it is also demonstrable that even under a fair test, Batting Runs and other similar linear weight methods acquit themselves nicely and are more accurate than comparable contemporary versions of Runs Created.
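The "simple algebra" behind a league-specific out value can be sketched in Python as below, assuming the value is chosen so that Batting Runs sum to zero across the league (which is what being correct on the league level implies). Whether the stolen base and outs on base terms belong in that sum is my assumption, the function name is mine, and the league totals are hypothetical.

```python
def league_out_weight(s, d, t, hr, w, hb, sb, cs, ab, h, oob=0):
    """Out weight that makes league-total Batting Runs equal zero."""
    non_out_runs = (0.46 * s + 0.80 * d + 1.02 * t + 1.40 * hr
                    + 0.33 * (w + hb) + 0.3 * sb - 0.6 * cs - 0.5 * oob)
    # Solve non_out_runs + out_weight * (AB - H) = 0 for out_weight.
    return -non_out_runs / (ab - h)


# Hypothetical league totals (not real data).
league = dict(s=14000, d=3400, t=500, hr=1900, w=7200, hb=350,
              sb=1500, cs=750, ab=77000, h=19800)
out_w = league_out_weight(**league)
```

With these made-up totals the result is a little below -.25, the same general range as the -.255 and -.244 figures for 1983.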

In the course of this discussion, James writes “What I would say, of course, is that while baseball changes, it changes very slowly over a long period of time; the value of an out in the American League in 1987 will be virtually identical with the value of an out in the American League in 1988.” This turned out to be an unfortunate choice of future example for James, since the AL averaged 4.90 runs/game in 1987 but just 4.36 in 1988. James’ point has merit--values should not jump around wildly for no reason other than the need to minimize RMSE--but the Batting Runs out value does not generally behave in a manner inconsistent with simply tracking changes in league scoring.

James’ big conclusion on linear weights is: “I think that the system of evaluation by linear weights is not at all accurate to begin with, does not become any more accurate with the substitution of figures derived from one season’s worth of data…Linear weights cannot possibly evaluate offense for the simplest of reasons: Offense is not linear.”

He continues “The creation of runs is not a linear activity, in which each element of the offense has a given weight regardless of the situation, but rather a geometric activity, in which the value of each element is dependent on the other elements.” James is correct that offense is not linear and that the value of any given event is dependent on the frequency of other events. But his conclusion that linear weights are incapable of evaluating offense is only supported by his faulty interpretation of the accuracy of Batting Runs. While offense is not linear, team offense is restricted to a narrow enough range that linear methods can accurately estimate team runs scored.

More importantly, James fails to recognize that while offense is dynamic, a poor dynamic estimator (such as his own Runs Created) is not necessarily (and in fact, is not) going to perform better than a linear weight method at the task of estimating runs scored. He also does not consider the problems that might be inherent in applying a dynamic run estimator directly to an individual player’s batting line, when the player is in fact a member of a team rather than his own team. Eventually, he would come to this realization and begin using a theoretical team version of Runs Created (which is one of the many reasons this criticism of his thirty-five year old essay can be viewed as unfair).

Much of the misunderstanding probably could have been avoided had Batting Runs been presented as absolute runs rather than runs above average. Palmer has never used an absolute version in any of his books, but of course many others have used absolute linear weight methods. One of the more prominent is Paul Johnson’s Estimated Runs Produced, which was brought to the public eye when none other than Bill James published Johnson's article in the 1985 Abstract annual.

Johnson’s ERP formula was dressed up in a way that made it plain to see that it was linear, but did not explicitly show the coefficient for each event as Batting Runs did. Still, it remains almost inexplicable that an analyst of James’ caliber did not see the connection between the two approaches, as he was writing two very different opinions on the merits of each nearly simultaneously.

James also applies his broad brush to Palmer’s win estimation method, saying that if you ask the Pythagorean method “If a team scores 800 runs and allows 600, how many games will they win?”, it gives you an answer (104), while “the linear weights” says “ask me after the season is over.”
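For reference, the 104 comes from the Pythagorean formula with an exponent of 2. A quick Python check follows; the function name is mine, and the 162-game schedule is an assumption, since the quoted question does not state one.

```python
def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=2):
    """Pythagorean expected wins with a fixed exponent."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


wins = pythagorean_wins(800, 600)   # 162 * 0.64 = 103.68, which rounds to 104
```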

The use of the phrase “wait until the season is over” is the kind of ill-conceived rhetoric that seems out of place in a James work but would be expected in a criticism of him by a clueless sportswriter. Any metric that compares to a baseline or includes anything other than the player’s own performance (such as a league average or a park factor) is going to see its output change as that independent input changes. That goes for many of James’ metrics as well (OW% for instance).

To the extent that the criticism has any validity, it should be used in the context of Batting Runs, since admittedly Palmer did not explain how to use linear weights to figure an absolute estimate of runs in the nature of Runs Created. To apply it to Palmer’s win estimator (RPW = 10*sqrt(runs per inning by both teams)) simply does not make sense. The win estimator does not rely on the league average; it accounts for the fact that each run is less valuable to a win as the total number of runs scored increases, but it doesn’t require the use of anything other than the actual statistics of the team and its opponents. (Of course, when applied to an individual player’s Batting Runs it does use the league average, which again is no different conceptually than many of James’ methods.) The Pythagorean formula with a fixed exponent has the benefit (compared to a linear estimator, even a dynamic one) of restricting W% to the range [0, 1], but it also treats all equal run ratios as translating to equal win ratios.
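A short Python sketch shows that the RPW estimator needs nothing beyond the team's own runs scored and allowed, and how it differs from a fixed-exponent Pythagorean formula. The function names are mine; converting run differential to wins as half the games plus differential divided by RPW, and counting nine innings per game, are assumptions on my part, since only the RPW formula itself is given above.

```python
import math


def palmer_rpw(runs_scored, runs_allowed, innings):
    """Runs per win: 10 * sqrt(runs per inning by both teams)."""
    return 10 * math.sqrt((runs_scored + runs_allowed) / innings)


def palmer_wins(runs_scored, runs_allowed, games=162, innings_per_game=9):
    """A .500 record plus run differential divided by runs per win."""
    rpw = palmer_rpw(runs_scored, runs_allowed, games * innings_per_game)
    return games / 2 + (runs_scored - runs_allowed) / rpw


def pythagorean_wpct(runs_scored, runs_allowed, exponent=2):
    """Fixed-exponent Pythagorean winning percentage."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


high = palmer_wins(800, 600)   # high-scoring context
low = palmer_wins(400, 300)    # same run ratio, half the runs
```

Here `high` comes out to about 101.4 wins and `low` to about 95.4, while `pythagorean_wpct` returns .640 for both pairs, which is the equal-run-ratio behavior described above.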

James concludes his essay by comparing the offensive production of Luke Easter in 1950 and Jimmy Wynn in 1968. His methods show Easter creating 94 runs making 402 outs and Wynn creating 91 runs making 413 outs, while Batting Runs shows Easter as +29 runs and Wynn +26.

James goes on to point out that the league Easter played in averaged 5.04 runs per game, while Wynn’s league averaged 3.43, and thus Wynn was the far superior offensive player, by a margin of +37 to +18 runs using RC. “Same problem--the linear weights method does not adapt to the needs of the analysis, and thus does not produce an accurate portrayal of the subject.”

In this case, James simply missed the disclaimer that the out weight varies with each league-season. While it makes sense to criticize the treatment of the league average as a known in testing the accuracy of a run estimator, it doesn’t make any sense at all to criticize using it when putting a batter’s season into context. Of course, James agrees that context is important, as he converts Easter and Wynn’s RC into baselined metrics in the same discussion.
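The effect of league context on the comparison can be roughly sketched in Python. This is not necessarily the calculation James performed; the function name is mine, and charging each hitter the league runs per game over 27 outs is my assumption about the baseline.

```python
def runs_above_average(runs_created, outs, league_runs_per_game,
                       outs_per_game=27):
    """Runs created minus a league-average hitter's output in the same outs."""
    return runs_created - outs * league_runs_per_game / outs_per_game


easter_1950 = runs_above_average(94, 402, 5.04)
wynn_1968 = runs_above_average(91, 413, 3.43)
```

Under that assumption Easter comes out around +19 and Wynn around +39, within a run or two of the +18 and +37 quoted above.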

When Batting Runs is allowed to calculate its out value as intended, it produces a similar verdict on the value of Easter and Wynn. In Total Baseball (using a slightly different Batting Runs formula that is very much the same in spirit), Palmer estimates Wynn at +38 and Easter at +14, essentially in agreement with James’ estimate of +37 and +18. The concept of linear weights did not fail; James’ comprehension of it did. It doesn’t matter if that happened because Palmer and Thorn’s explanation wasn’t straightforward (or comprehensive) enough, or whether James just missed the boat, or a combination of both. Whatever the reason, the essay “Finding the Hidden Game, pt. 3” is not a fair or accurate assessment of the utility of linear weight methods and stands as the only real blemish on as good a baseball book as has ever been written.
