Monday, February 25, 2008

Beating a Dead Horse, pt. 4

In the conclusion (finally!) to this series, I’m going to look at a couple of other rate stats that, while not nearly as common as OPS, are amazingly common in their “invention”, if not in their use.

I refer to the multitude of statistics based on the number of bases a player has accounted for, divided either by his plate appearances or by his outs made. I am going to present a list of proposed stats based on this concept. Almost all of the ones I will list have been published in books or in SABR publications; it would be impossible to document all of the times that these types of metrics have been floated on message boards and other similar outlets. What is incredible to me about the recurrence of this idea is that most of the “inventors” seem to be utterly clueless that their great idea has been proposed countless times before. Barry Codell is the only one out there who can take credit for actually coming up with something new.

1) Codell published a piece on his Base-Out Percentage in a late 1970s SABR Baseball Research Journal:

BOP = (TB + W + HB + SB + SH + SF)/(AB - H + CS + SH + SF + DP)

2) Thomas Boswell, a columnist for the Washington Post, published his Total Average in Inside Sports:

TA = (TB + W + HB + SB)/(AB + W + HB + SB + CS)

3) Thomas Boswell quickly revised his TA, making it for all intents and purposes a knockoff of BOP:

TA = (TB + W + HB + SB)/(AB - H + CS + DP)

4) Breaking out of chronological order, Paul Adel posted on the internet about his Offense Ratio:

OR = (TB + W)/(AB - H)

5) Leo Leahy in his book Lumber Men introduced Bases to Out Ratio:

BTOR = (TB + W)/Outs

6) Lawrence Tenbarge wrote in the 1996 SABR Baseball Research Journal about his Earned-Base Average:

EBA = (TB + W)/PA

7) Apparently unrelated, John McCarthy had his own EBA in his book Baseball’s All-Time Dream Team:

EBA = (TB + W + SB - CS)/(AB + W)

8) Stephen Grimble, in a book called Setting the Record Straight: Baseball’s Greatest Batters, used Base Production Average as a secondary measure:

BPA = (TB + W + SB - CS)/(AB + W)

9) Bill Gilbert has posted on the internet for many years now about Bases per Plate Appearance:

BPA = (TB + W + HB + SB - CS - DP)/(AB + W + HB + SF)

10) In their 1994 book, Essential Baseball, Norm Hitzges and Dave Lawson published TOPR (thanks to "Karmadrome" for the info, and see his comment below for additional details on Hitzges and Lawson's work):

TOPR = (TB + W + HB + SH + SF + SB - CS - DP)/(AB - H + SH + SF + CS + DP)

11) In the 2000 Baseball Research Journal, Mark Kanter wrote about New Production:

NewProd = (TB + W + HB + INT)/(AB + W + HB + INT + SH + SF)

12) I wrote this article sometime in late December or early January, as I am wont to do. Two weeks ago, another one emerged from Greg Raleigh at Dugout Central, “Bases Per Out”. This is amusing in its own right since that is the site where Mike Pagliarulo throws out barbs at sabermetricians. Anyway:

BPO = (TB + W + HB + SB + SF + SH)/(AB - H + CS + SF + SH + DP)

I documented all that not because it’s important, but because I find it amusing. None of these measures (with the exception of TA, which was carried in Total Baseball, first for all players and in later editions just for season and all-time leaders) ever caught on, and so countless people “discovered” them. Boswell himself is delusional about the history of the idea, as evidenced in a 2005 chat:

“What matters is that ONE of the "new" stats created in the early '80--Total Average, Runs Created, OPS, etc--all of which were different versions of my TA (which was first) has gotten general acceptance.”

Since BOP preceded TA in the bases/outs ratio format, Boswell’s claim is absurd on its face. Beyond that, Bill James introduced RC in 1979 (the same year as Boswell, and presumably in the spring as well), and Pete Palmer developed OPS sometime in the 1970s; I’m not sure when he first published it, but it was before 1979. Also, RC is essentially a repackaging of Earnshaw Cook’s Scoring Index (1964), Dick Cramer’s BRA (mid-1970s), or Dick Cramer’s BWA (mid-1970s as well). Boswell didn’t even invent bases/out, let alone beat sabermetricians to developing advanced measures of offensive performance.

Is there really anything worth arguing about, though? Are there multiple people taking credit for developing New Coke? We’ll examine the two variations on the idea, one per PA and one per out. Since we are comparing with OPS, I’m only going to consider categories that OPS considers, which means we can define our two stats, which I’ll call BPA and TA, pretty easily:

BPA = (TB + W)/(AB + W)
TA = (TB + W)/(AB - H)
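For anyone who wants to play along, here is a minimal Python sketch of the two definitions above. The function name `rates` and the batting line are hypothetical, made up purely for illustration, and outs are taken to be AB - H as in the formulas. The second call swaps one single for one walk to show that neither stat notices the difference.

```python
def rates(singles, doubles, triples, homers, walks, outs):
    """Return (BPA, TA) for a batting line given as event counts.

    Outs are assumed to be AB - H, matching the simplified
    definitions in the post (no SB, CS, SH, SF, DP, or HB).
    """
    hits = singles + doubles + triples + homers
    at_bats = hits + outs
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    bases = total_bases + walks          # TB + W
    bpa = bases / (at_bats + walks)      # bases per plate appearance
    ta = bases / (at_bats - hits)        # bases per out
    return bpa, ta


# Hypothetical batting line, not real data.
base_line = rates(singles=100, doubles=30, triples=5, homers=25,
                  walks=70, outs=440)

# Same line with one single turned into a walk.
swapped_line = rates(singles=99, doubles=30, triples=5, homers=25,
                     walks=71, outs=440)

print(base_line)
print(swapped_line)
```

Both calls print the same pair of numbers, which is the walk = single problem in miniature.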

Just looking at them, you can see right off the bat that walks are weighted equally to singles, which we know is incorrect, and that a homer will be worth four times a single and twice a double. Neither the faulty total base relationship nor the walk = single contention has a counterweight. That is not the case in OPS, where the faulty total base relationship present in SLG is offset by the equal treatment of all on base events in OBA, and the equality of OBA is offset by SLG.

It should come as no surprise, then, that equations to predict runs from these stats are not as accurate as those derived from OBA and SLG:

aR/P = 1.51(aBPA) - .52 (RMSE = 27.12)
aR/P = 1.19(aTA) - .20 (RMSE = 24.95)
aR/O = 1.78(aBPA) - .78 (RMSE = 31.10)
aR/O = 1.42(aTA) - .42 (RMSE = 25.66)

Total Average does better than BPA at predicting both R/PA and R/O, so we’ll remove BPA from the discussion at this point. When we use TA as a run estimator to predict R/PA, or as a stand alone rate stat (which in part one of this series I contended obligates one to consider the R/O relationship), what are we saying about the values of the offensive events?

Knowing that the TA for our data as a whole was .658, we can work on both equations, starting with R/O, which can be rewritten as:

Runs = (1.42*aTA - .42)*(AB - H)*.172 = .371*TA*(AB - H) - .072*(AB - H)

There is no need to differentiate this, since by definition TA*(AB - H) = TB + W. So we have Runs = .371(TB + W) - .072(AB - H), which can be expanded out to:

LW = .37S + .74D + 1.11T + 1.48HR + .37W - .072(AB - H)
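The arithmetic behind those coefficients can be sketched in a few lines of Python. The constants are the rounded figures quoted in this post (league TA of .658, .172 runs per out, and the 1.42/-.42 regression); the variable names are my own and hypothetical.

```python
LG_TA = 0.658          # league Total Average for the dataset
LG_R_PER_OUT = 0.172   # league runs per out
SLOPE, INTERCEPT = 1.42, -0.42   # aR/O = 1.42*aTA - .42

# Runs = (SLOPE*TA/LG_TA + INTERCEPT) * outs * LG_R_PER_OUT
#      = base_coef*(TB + W) + out_coef*(AB - H)
base_coef = SLOPE / LG_TA * LG_R_PER_OUT   # value of each base
out_coef = INTERCEPT * LG_R_PER_OUT        # value of each out

# Every event is worth base_coef times the bases it supplies.
bases_per_event = {"S": 1, "D": 2, "T": 3, "HR": 4, "W": 1}
weights = {ev: base_coef * b for ev, b in bases_per_event.items()}
weights["Out"] = out_coef

for event, value in weights.items():
    print(f"{event}: {value:.3f}")
```

Because the relationship is linear in bases and outs, no calculus is needed here; the weights fall straight out of the multiplication.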

So TA severely underweights singles, while overweighting homers and walks.

Looking at the PA relationship:

Runs = (1.19*aTA - .20)*(AB + W)*.117 = .212*(TA*PA) - .0234*PA

Differentiating, we get:

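LW = .212*(p*TA + PA*dTA) - .0234*p

Here dTA = (O*b - B*o)/O^2 is the change in TA from adding one event, with O = outs, B = bases (TB + W), o = 1 for an out (0 otherwise), b = the total base weight of the event (1 for a walk as well), and p = 1 for any event that counts as a plate appearance. For our dataset, this results in: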

LW = .43S + .74D + 1.06T + 1.37HR + .43W - .09(AB - H)
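Here is a rough Python sketch of that differentiation. It rests on assumptions the post does not spell out: the league totals are not given, so I scale everything to one league out (O = 1), back out PA per out from the quoted .172 R/O and .117 R/PA, and set B = TA*O. Every event here counts as a plate appearance, so p = 1 throughout. All names are hypothetical, and because the inputs are rounded, the results land within about .01 of the published weights rather than matching them exactly.

```python
LG_TA = 0.658          # league Total Average
LG_R_PER_PA = 0.117    # league runs per plate appearance
LG_R_PER_OUT = 0.172   # league runs per out
SLOPE, INTERCEPT = 1.19, -0.20   # aR/P = 1.19*aTA - .20

# Assumed scaling: one league out, with PA and bases to match.
O = 1.0
PA = LG_R_PER_OUT / LG_R_PER_PA   # plate appearances per out
B = LG_TA * O                     # bases (TB + W) per out

k = SLOPE / LG_TA * LG_R_PER_PA   # coefficient on TA*PA (about .212)
c = INTERCEPT * LG_R_PER_PA       # coefficient on PA (-.0234)


def linear_weight(b, o, p=1):
    """Marginal runs from one event supplying b bases and o outs."""
    d_ta = (O * b - B * o) / O ** 2      # derivative of B/O
    return k * (p * LG_TA + PA * d_ta) + c * p


events = {"S": (1, 0), "D": (2, 0), "T": (3, 0),
          "HR": (4, 0), "W": (1, 0), "Out": (0, 1)}
weights = {ev: linear_weight(b, o) for ev, (b, o) in events.items()}

for event, value in weights.items():
    print(f"{event}: {value:.3f}")
```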

Here, the hit values are more reasonable, but the single is still undervalued, and the .06 run bump it received also is parceled out to the walk, compounding that issue.

Total Average and all of its cousins are just too simplistic to provide a good model of offense. The idea of working from bases is not a bad one; while runs are the most fundamental unit to work with, bases are highly correlated with runs. But the problem is that TA focuses only on the bases that are gained by the batter, and not on the impact that the batter has on the baserunners. There is no difference between a single and a walk if the bases are empty; it is the fact that singles often advance runners by two bases, or advance runners when walks do not force them, that makes singles more valuable. TA disavows this basic knowledge, and thus the weights used do not reflect the true value of offensive events. If you could account for all of the bases that a player is responsible for (or estimate them), then you would have a much better foundation from which to evaluate him.

So please, please, stop “inventing” these statistics, unless you are twelve years old, in which case you may have a pretty good future as a sabermetrician.

5 comments:

  1. I can shed a little light on the "two guys in the 90's named Hitzges and Lawson." Norm Hitzges is a somewhat-famous sports personality from Dallas who called baseball games for ESPN for a bit. Dave Lawson is a professional stat-guy type (who, in the interest of full disclosure, is also my father).

    You're correct that TOPR is very similar to total average. The formula is:

    (TB + BB + HP + SH + SF + SB - CS - DP) / (AB - H + SH + SF + CS + DP)

    Not exactly revolutionary, but it's a pretty solid example of the type. Now, making it a little more useful is the fact that they indexed everything to league average, as in (Player's TOPR) / (League TOPR) * 100. So, a player with a 105 TOPR is 5% more productive than the average hitter in the league. Then, they broke the offense out by segment (on base component, power, walks, strikeouts, etc) and indexed that against the league average. So, that same player with the 105 TOPR might only have a 95 on base component, a 110 power, and a 150 strikeout rate.

    Nothing there is revolutionary in terms of accurately rating offense, but it's a nice, useful tool for getting snapshots of what large numbers of players have actually done relative to the league.

    Anyway, where it really gets interesting is with the pitchers. They use the exact same formula for rating the pitchers. This makes the formula completely reflective. If a team has a 102 TOPR for a season, then opposing pitchers have a 102 TPER (that's what the pitching element is called). In 1993, the idea of using the exact same method to grade hitters and pitchers was pretty unusual, so I'd call this one a genuine innovation.

    Anyway, the book these gents produced was called "Essential Baseball 1994", which, if you'll recall, was a pretty unfortunate time to try to get started in baseball publishing. I can assure you that TOPR and TPER are still alive and well.

  2. Thanks; I will update the post to include the information. I was not aware of details, and given your explanation of them, I agree that their approach had some originality to it, to an extent that most of the other methods listed here did not.

  3. Small bibliographic note:
    The first versions of Earnshaw Cook's Scoring Index [DX] go back to 1964, not 1971. He published Percentage Baseball in 1964, with a 2nd edition in 1966 [I have that edition]. In fact, as with James's Runs Created, Cook offered several slight variations of the formula, one of which is a close antecedent of basic runs created: (percentage of hits + percentage of walks) x (percentage of total bases). In essence the same calculation as basic runs created except that the denominator was PA-squared instead of PA, so it is essentially runs per PA instead of runs.

    You probably depended on Palmer's summary in Hidden Game of Baseball, p.45, in which he says, accurately but also misleadingly, that Cook's scoring index "did not appear in a form intelligible to the layman until the appearance of ... Percentage Baseball and the Computer (1971)."
    It is true that the presentation in Percentage Baseball is not reader-friendly, but the formulas do date to his earlier book.

  4. Thanks Joe. But I can't blame Palmer and Thorn for that--I myself have read both books, a few years ago now, and for some reason I was thinking that DX did not appear in the first one.

  5. I just came across another one; this one actually predates Codell and Boswell (it's PA-based, though, not outs). In the November 1977 Baseball Digest, someone named "C. Maher" (I only have the citation from an academic paper--no first name) wrote an article titled "Batting Average: A True Gauge of a Hitter's Value?". His stat was called Offensive Average:

    OA = (TB + W + SB)/(AB + W)

