In the conclusion (finally!) to this series, I’m going to look at a couple of other rate stats that, while not nearly as widely used as OPS, are amazingly common in their “invention”, if not in their use.
I refer to the multitude of statistics based on the number of bases a player has accounted for, divided either by his plate appearances or by his outs made. I am going to present a list of proposed stats based on this concept. Almost all of the ones I will list have been published in books or in SABR publications; it would be impossible to document all of the times that these types of metrics have been floated on message boards and other similar outlets. What is incredible to me about the recurrence of this idea is that most of the “inventors” seem to be utterly clueless that their great idea has been proposed countless times before. Barry Codell is the only one out there who can take credit for actually coming up with something new.
1) Codell published a piece on his Base-Out Percentage in a late 1970s SABR Baseball Research Journal:
BOP = (TB + W + HB + SB + SH + SF)/(AB - H + CS + SH + SF + DP)
2) Thomas Boswell, a columnist for the Washington Post, published his Total Average in Inside Sports:
TA = (TB + W + HB + SB)/(AB + W + HB + SB + CS)
3) Thomas Boswell quickly revised his TA, making it for all intents and purposes a knockoff of BOP:
TA = (TB + W + HB + SB)/(AB - H + CS + DP)
4) Breaking out of chronological order, Paul Adel posted on the internet about his Offense Ratio:
OR = (TB + W)/(AB - H)
5) Leo Leahy in his book Lumber Men introduced Bases to Out Ratio:
BTOR = (TB + W)/Outs
6) Lawrence Tenbarge wrote in the 1996 SABR Baseball Research Journal about his Earned-Base Average:
EBA = (TB + W)/PA
7) Apparently unrelated, John McCarthy had his own EBA in his book Baseball’s All-Time Dream Team:
EBA = (TB + W + SB - CS)/(AB + W)
8) Stephen Grimble, in a book called Setting the Record Straight: Baseball’s Greatest Batters, used Base Production Average as a secondary measure:
BPA = (TB + W + SB - CS)/(AB + W)
9) Bill Gilbert has posted on the internet for many years now about Bases per Plate Appearance:
BPA = (TB + W + HB + SB - CS - DP)/(AB + W + HB + SF)
10) In their 1994 book, Essential Baseball, Norm Hitzges and Dave Lawson published TOPR (thanks to "Karmadrome" for the info, and see his comment below for additional details on Hitzges and Lawson's work):
TOPR = (TB + W + HB + SH + SF + SB - CS - DP)/(AB - H + SH + SF + CS + DP)
11) In the 2000 Baseball Research Journal, Mark Kanter wrote about New Production:
NewProd = (TB + W + HB + INT)/(AB + W + HB + INT + SH + SF)
12) I wrote this article sometime in late December or early January, as I am wont to do. Two weeks ago, another one emerged from Greg Raleigh at Dugout Central, “Bases Per Out”. This is amusing in its own right since that is the site where Mike Pagliarulo throws out barbs at sabermetricians. Anyway:
BPO = (TB + W + HB + SB + SF + SH)/(AB - H + CS + SF + SH + DP)
I documented all that not because it’s important, but because I find it amusing. None of these measures (with the exception of TA, which was carried in Total Baseball, first for all players and in later editions just for season and all-time leaders) ever caught on, and so countless people “discovered” them. Boswell himself is delusional about the history of the idea, as evidenced in a 2005 chat:
“What matters is that ONE of the "new" stats created in the early '80--Total Average, Runs Created, OPS, etc--all of which were different versions of my TA (which was first) has gotten general acceptance.”
Since TA was preceded by BOP in the bases/outs ratio format, Boswell’s claim is absurd on its face. Beyond that, Bill James introduced RC in 1979 (the same year as Boswell, and presumably in the spring as well), and Pete Palmer developed OPS sometime in the 1970s; I’m not sure when he first published it, but it was before 1979. Also, RC is essentially a repackaging of Earnshaw Cook’s Scoring Index (1964), Dick Cramer’s BRA (mid-1970s), or Dick Cramer’s BWA (mid-1970s as well). Boswell didn’t even invent bases/out, let alone beat sabermetricians to developing advanced measures of offensive performance.
Is there really anything worth arguing about, though? Are there multiple people taking credit for developing New Coke? We’ll examine the two variations on the idea, one per PA and one per out. Since we are comparing with OPS, I’m only going to consider categories that OPS considers, which means we can define our two stats, which I’ll call BPA and TA, pretty easily:
BPA = (TB + W)/(AB + W)
TA = (TB + W)/(AB - H)
Just looking at them, you can see right off the bat that walks are weighted equally to singles, which we know is incorrect, and that a homer will be worth four times a single, twice a double, … Neither the faulty total base relationship nor the walk = single contention has a counterweight, which is not the case in OPS, where the faulty total base relationship present in SLG is offset by the equal treatment of all on base events in OBA, and the equality of OBA is offset by SLG.
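The two simplified definitions, and the point that each treats a walk exactly like a single, can be checked with a few lines of Python. This is only a sketch: the stat line is hypothetical, invented for illustration, and the function names are mine.

```python
def bpa(tb, w, ab):
    """Bases per plate appearance: (TB + W) / (AB + W)."""
    return (tb + w) / (ab + w)


def ta(tb, w, ab, h):
    """Simplified Total Average: (TB + W) / (AB - H)."""
    return (tb + w) / (ab - h)


# Hypothetical stat line (not real data).
ab, h, tb, w = 500, 150, 250, 50

# Add one single: one more AB, one more H, one more TB.
ta_single = ta(tb + 1, w, ab + 1, h + 1)
bpa_single = bpa(tb + 1, w, ab + 1)

# Add one walk instead: one more W, nothing else changes.
ta_walk = ta(tb, w + 1, ab, h)
bpa_walk = bpa(tb, w + 1, ab)

# Both stats move by exactly the same amount either way.
print(ta_single == ta_walk, bpa_single == bpa_walk)  # True True
```

In both stats the extra single and the extra walk add one base to the numerator and leave the denominator in the same place, so neither formula has any way to tell them apart.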
It should come as no surprise, then, that the equations to predict runs from these stats are not as accurate as those derived from OBA and SLG:
aR/P = 1.51(aBPA) - .52 (RMSE = 27.12)
aR/P = 1.19(aTA) - .20 (RMSE = 24.95)
aR/O = 1.78(aBPA) - .78 (RMSE = 31.10)
aR/O = 1.42(aTA) - .42 (RMSE = 25.66)
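As a rough illustration of how the two TA equations would be applied, here is a minimal sketch. It assumes the "a" prefix means a value divided by the league average, and that the .658, .172, and .117 figures used in the rewrites later in this piece are the league TA, runs per out, and runs per PA. The function names and the sample TA values are hypothetical.

```python
# Constants read off the article; interpreting .172 and .117 as league
# runs per out and runs per plate appearance is an assumption.
LEAGUE_TA = 0.658
LEAGUE_R_PER_OUT = 0.172
LEAGUE_R_PER_PA = 0.117


def runs_per_out_from_ta(ta):
    """aR/O = 1.42(aTA) - .42, converted back to raw runs per out."""
    adjusted_ta = ta / LEAGUE_TA
    return (1.42 * adjusted_ta - 0.42) * LEAGUE_R_PER_OUT


def runs_per_pa_from_ta(ta):
    """aR/P = 1.19(aTA) - .20, converted back to raw runs per PA."""
    adjusted_ta = ta / LEAGUE_TA
    return (1.19 * adjusted_ta - 0.20) * LEAGUE_R_PER_PA


# Hypothetical team TA values, chosen only to show the shape of the output.
for team_ta in (0.550, 0.658, 0.800):
    print(team_ta,
          round(runs_per_out_from_ta(team_ta), 3),
          round(runs_per_pa_from_ta(team_ta), 3))
```

Under those assumptions a team with exactly league-average TA comes back at the league R/O, since 1.42 - .42 = 1.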
Total Average does better than BPA at predicting both R/PA and R/O, so we’ll remove BPA from the discussion at this point. When we use TA as a run estimator to predict R/PA, or as a stand-alone rate stat (which in part one of this series I contended obligates one to consider the R/O relationship), what are we saying about the values of the offensive events?
Knowing that the TA for our data as a whole was .658, we can work on both equations, starting with R/O, which can be rewritten as:
Runs = (1.42*aTA - .42)*(AB - H)*.172 = .371*TA*(AB - H) - .072*(AB - H)
There is no need to differentiate this, since by definition TA*(AB - H) = TB + W. So we have Runs = .371(TB + W) - .072(AB - H), which can be expanded out to:
LW = .37S + .74D + 1.11T + 1.48HR + .37W - .072(AB - H)
So TA severely underweights singles, while overweighting homers and walks.
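The R/O expansion above is simple enough to reproduce directly. The sketch below assumes .658 is the league TA and .172 the league runs per out, as the rewrite implies; the variable names are mine.

```python
LEAGUE_TA = 0.658          # league TA, from the text
LEAGUE_R_PER_OUT = 0.172   # multiplier in the rewrite, assumed to be league R/O

# Runs = slope * (TB + W) - out_cost * (AB - H)
slope = 1.42 / LEAGUE_TA * LEAGUE_R_PER_OUT   # about .371 runs per base
out_cost = 0.42 * LEAGUE_R_PER_OUT            # about .072 runs per out

# Bases credited to each event by the simplified TA numerator.
bases = {"S": 1, "D": 2, "T": 3, "HR": 4, "W": 1}

weights = {event: slope * b for event, b in bases.items()}
weights["out"] = -out_cost

for event, value in weights.items():
    print(event, round(value, 3))
```

Because the run value is strictly proportional to bases, the home run is locked in at four times the single no matter what the regression coefficients are.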
Looking at the PA relationship:
Runs = (1.19*aTA - .20)*(AB + W)*.117 = .212*(TA*PA) - .0234*PA
Differentiating, we get:
LW = .212*(p*TA + PA*dTA) - .0234*p
where p = 1 for any event that uses a plate appearance, and dTA = (O*b - B*o)/O^2, with O = outs, B = bases (TB + W), o = 1 for an out (0 otherwise), and b = the total base weight of the event (1 for a walk as well). For our dataset, this results in:
LW = .43S + .74D + 1.06T + 1.37HR + .43W - .09(AB - H)
Here, the hit values are more reasonable, but the single is still undervalued, and the .06 run bump it received is also parceled out to the walk, compounding that issue.
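The differentiation for the R/PA case can be sketched as follows. I don't have the underlying dataset here, so this assumes the league PA per out can be backed out as .172/.117 (league R/O over league R/PA) and normalizes everything to one league out. Because those inputs are rounded, the results land within about .01 of the weights listed above rather than matching every digit.

```python
LEAGUE_TA = 0.658
LEAGUE_R_PER_PA = 0.117
LEAGUE_R_PER_OUT = 0.172

# Assumption: PA per out implied by the two league run rates.
PA_PER_OUT = LEAGUE_R_PER_OUT / LEAGUE_R_PER_PA

slope = 1.19 / LEAGUE_TA * LEAGUE_R_PER_PA   # about .212
const = 0.20 * LEAGUE_R_PER_PA               # .0234


def linear_weight(b, o, p=1):
    """Marginal runs for an event worth b bases, o outs, and p PA."""
    # Work per league out: O = 1, B = TA, PA = PA per out.
    # PA * dTA does not depend on the scale chosen for O.
    outs, total_bases, pa = 1.0, LEAGUE_TA, PA_PER_OUT
    d_ta = (outs * b - total_bases * o) / outs ** 2
    return slope * (p * LEAGUE_TA + pa * d_ta) - const * p


events = {"S": (1, 0), "D": (2, 0), "T": (3, 0),
          "HR": (4, 0), "W": (1, 0), "out": (0, 1)}
lw = {name: linear_weight(b, o) for name, (b, o) in events.items()}

for name, value in lw.items():
    print(name, round(value, 3))
```

Unlike the R/O version, the extra bases here are no longer exact multiples of the single, because every event also picks up the same fixed p*TA term.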
Total Average and all of its cousins are just too simplistic to provide a good model of offense. The idea of working from bases is not a bad one; while runs are the most fundamental unit to work with, bases are highly correlated with runs. But the problem is that TA focuses only on the bases that are gained by the batter, and not on the impact that the batter has on the baserunners. There is no difference between a single and a walk if the bases are empty; it is the fact that singles often advance runners by two bases, or advance runners when walks do not force them, that makes singles more valuable. TA disavows this basic knowledge, and thus the weights used do not reflect the true value of offensive events. If you could account for all of the bases that a player is responsible for (or estimate them), then you would have a much better foundation from which to evaluate players.
So please, please, stop “inventing” these statistics, unless you are twelve years old, in which case you may have a pretty good future as a sabermetrician.