Monday, February 11, 2008

Beating a Dead Horse, pt. 3

So far we have considered several different means of adding OBA and SLG together to measure offensive productivity. Now we will look at other means of combining OBA and SLG, namely multiplication.

OBA*SLG (OTS) is a relationship that has been known in sabermetrics for many years. It has been independently developed at least three times. The first was by Earnshaw Cook, who in Percentage Baseball and the Computer presented a model that was essentially OBA*SLG. Then Dick Cramer developed his Batter’s Run Average, which was published in SABR’s Baseball Research Journal at some point in the 1970s; BRA actually was defined as OBA*SLG. Cramer later developed this into Batter’s Win Average, which was based around a model of BRA*PA. The third is the most famous, the Runs Created formula of Bill James, originally written as (H + W)*TB/(AB + W). Given the assumptions about calculating OBA that I have been making throughout this series, RC is precisely equal to OBA*SLG*AB.
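The identity between Basic RC and OBA*SLG*AB can be checked with a few lines of Python. This is only a sketch: the function names and the team totals are hypothetical, and it uses the simplified OBA = (H + W)/(AB + W) assumed throughout this series.

```python
from fractions import Fraction

def oba(h, w, ab):
    # Simplified on base average used in this series: no HB or SF.
    return Fraction(h + w, ab + w)

def slg(tb, ab):
    return Fraction(tb, ab)

def basic_rc(h, w, tb, ab):
    # Bill James' original Runs Created: (H + W)*TB/(AB + W)
    return Fraction((h + w) * tb, ab + w)

# Hypothetical team totals, made up for illustration only.
h, w, tb, ab = 1450, 550, 2300, 5500

rc = basic_rc(h, w, tb, ab)
ots_times_ab = oba(h, w, ab) * slg(tb, ab) * ab

# Exact rational arithmetic, so the comparison is not affected by rounding.
print(float(rc), rc == ots_times_ab)
```

Fractions are used instead of floats so that the equality is exact rather than approximate.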

I am not going to go too in-depth in discussing the flaws of these methods, because I and many others have noted the flaws of Runs Created elsewhere. Briefly, RC attempts to model run scoring, but fails to take into account many logical properties that such a formula should have (Base Runs addresses several of these problems, but while less flawed, is also not perfect). (Basic) RC underweights walks and overweights all types of hits but particularly extra base hits. As a team scoring model, it is inappropriate for application to individual players, and because of the design shortcomings, can boomerang out of control for extreme environments.

Another OTS-based method comes from David Smyth, who has advocated the use of OTS*34 as a quick estimator for runs/game. You may look at that and wonder how multiplying by a constant helps anything.

First, we have to recognize what unit OTS is expressed in. As you can see, RC = OTS*AB, so OTS is an estimate of runs/at bat. Smyth’s formula therefore converts runs/at bat to runs/game, essentially assuming 34 at bats/game.

The average major league game does have something around 34 at bats (for the data I have used throughout this series, the average is within .15 of 34). Of course, we know that the number of at bats any team gets will depend on their own BA and OBA. By holding AB/game constant, the formula is attempting to counterbalance the fact that OBA*SLG goes overboard in predicting the run creation rate of good teams.

When you apply the R/G estimate to actual major league teams by assuming 25.2 outs/game, the RMSE is 25.25 (I used OBA*SLG*1.36*(AB - H)). This is actually more accurate than Basic RC (OBA*SLG*AB), which comes in at 26.09, and presumably the benefits are greater when applied to extreme players.
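As a rough sketch of the three estimators just compared, here they are side by side in Python. The function names and the team totals are hypothetical; the 34 and 1.36 constants are the ones quoted in the text, and OBA is again (H + W)/(AB + W).

```python
def ots(h, w, tb, ab):
    # OBA*SLG, an estimate of runs per at bat.
    oba = (h + w) / (ab + w)
    slg = tb / ab
    return oba * slg

def smyth_runs_per_game(h, w, tb, ab, ab_per_game=34.0):
    # Smyth's quick estimator: hold at bats per game constant.
    return ots(h, w, tb, ab) * ab_per_game

def smyth_season_runs(h, w, tb, ab, ab_per_out=1.36):
    # Season-level form quoted in the text: OBA*SLG*1.36*(AB - H).
    return ots(h, w, tb, ab) * ab_per_out * (ab - h)

def basic_rc(h, w, tb, ab):
    # Basic Runs Created uses the team's actual at bats.
    return ots(h, w, tb, ab) * ab

# Hypothetical team totals, made up for illustration only.
team = dict(h=1450, w=550, tb=2300, ab=5500)

print(round(smyth_runs_per_game(**team), 2))
print(round(smyth_season_runs(**team), 1))
print(round(basic_rc(**team), 1))
```

For a roughly average team like this made-up one, the two season estimates land close together; the difference between them only grows as a team's at bats per out move away from the fixed 1.36.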

Another twist on the idea of multiplying OBA and SLG is a method posted by “dq” on the Inside the Book blog. In it, OBA and SLG are each raised to powers, with the SLG power being dependent on OBA. This is designed to alleviate the issues caused by applying OBA*SLG to extremely high offense situations. OTSE, for “Onbase Times Slugging, Exponential”, is defined as:

OTSE = PA*OBA^.85*SLG^(1-OBA/2)*.652

For the data we’ve been working with here, a multiplier of .668 will be used, since our OBA does not include HB or SF. This will still throw off the entire equation a bit, though, since the OBA version we are using is not the same as the one it was designed to work with.
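To make the formula concrete, here is a minimal Python sketch of OTSE as written above. The function name, the team totals, and the choice of PA = AB + W (to match the OBA definition used in this series) are my assumptions for illustration.

```python
def otse(pa, oba, slg, multiplier=0.668):
    # OTSE = PA * OBA^.85 * SLG^(1 - OBA/2) * multiplier
    # multiplier: .652 in the original definition, .668 for the
    # simplified OBA (no HB or SF) used in this series.
    return pa * oba ** 0.85 * slg ** (1 - oba / 2) * multiplier

# Hypothetical team totals, made up for illustration only.
h, w, tb, ab = 1450, 550, 2300, 5500
pa = ab + w                  # assumed PA, consistent with OBA below
oba = (h + w) / pa
slg = tb / ab

print(round(otse(pa, oba, slg)))
print(round(otse(pa, oba, slg, multiplier=0.652)))
```

Note that the SLG exponent shrinks as OBA rises, which is what damps the estimate at very high levels of offense.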

Throughout this series, I have not discussed what happens at extreme levels of performance too much. Do not think for a minute that this is because I feel that the extremes are unimportant; I am always concerned about theoretical accuracy in addition to empirical accuracy. However, in the case of the cruder metrics being considered (OPS, OPS+, OTS, etc.), we can see their flaws even at normal levels of offensive performance. It is the better constructed metrics for which a more thorough investigation is warranted.

Of course, for a formula like OTSE, it is at the extremes where it shows its superiority. As shown in the link above, OTSE matches our expectations for linear weights in extreme environments better than the simpler OBA/SLG combinations. However, when used with average teams using OBA = (H + W)/(AB + W), it leaves a little bit to be desired on the linear weight level:

LW = .52S + .78D + 1.05T + 1.31HR + .36W - .103(AB - H)

The big issue is that homers are underweighted by around a tenth of a run. However, OTSE is a lot more robust than other OBA/SLG combinations.

With that being said, though, is it really worth it? The appeal of OBA/SLG combinations is rooted in the idea that the two stats are readily available. But when you have to resort to two non-linear operations, I think that any claim of “simplicity” is dead on arrival. If you do not consider OBA and SLG to be known, and have to figure them and then plug into OTSE, it is far, far more complex than Base Runs. Furthermore, OTSE is not applicable to individual batters for the same reasons that multiplicative run estimators like RC and BsR are not. So you may as well just figure Base Runs for the team and be done with it. I fail to see any practical application for which you would want to use OTSE.

I did not give the OTSE linear weight formula before the results that it generates, because I was hoping that someone would still be around to read those. The formula for the OTSE weights will scare everyone that’s left off:

LW = .668*(OBA*PA*.81*SLG^(-.19)*dSLG + SLG^.81*(OBA*p + PA*dOBA))
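If the calculus is off-putting, the weights can also be approximated numerically by adding one of each event to a team and seeing how much the estimate changes. The sketch below does that in Python for the OTSE definition given earlier. It is not the derivation used for the weights above, the function names and team totals are hypothetical, and PA is taken as AB + W.

```python
def otse_from_counts(s, d, t, hr, w, outs, multiplier=0.668):
    # Build OBA, SLG and PA from raw counts, then apply
    # OTSE = PA * OBA^.85 * SLG^(1 - OBA/2) * multiplier.
    h = s + d + t + hr
    ab = h + outs              # outs taken as AB - H
    pa = ab + w                # assumed PA (no HB or SF)
    tb = s + 2 * d + 3 * t + 4 * hr
    oba = (h + w) / pa
    slg = tb / ab
    return pa * oba ** 0.85 * slg ** (1 - oba / 2) * multiplier

def plus_one_weights(team):
    # Approximate each event's linear weight as the change in the
    # estimate when one more of that event is added to the team.
    base = otse_from_counts(**team)
    weights = {}
    for event in team:
        bumped = dict(team)
        bumped[event] += 1
        weights[event] = otse_from_counts(**bumped) - base
    return weights

# Hypothetical team totals, made up for illustration only.
team = {"s": 1000, "d": 280, "t": 30, "hr": 140, "w": 550, "outs": 4050}

for event, value in plus_one_weights(team).items():
    print(event, round(value, 3))
```

Because the weights depend on the team they are measured against, a different set of totals will give somewhat different values.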

5 comments:

  1. alysis of why OTS*34 works as well as it does. As Tango would say, it works 'by accident'. I've also noticed casually that the 34 number seems to be gradually decreasing, to 33 point something currently. I suppose this reflects primarily fielding improvement, and secondarily more conservative baserunning.

  2. David, I don't know if you'll see this comment, but I was wondering if you'd like to write an article about your Base Wins methodology. I think that your thoughts about the theoretical value of runs and outs being equal to the reciprocal of their frequencies per game are interesting. Most of the approaches for run-win conversions use the marginal value of runs and thus BsW is unique and worthy of more exposure.

    If you wanted to write an article, I'd be happy to post it here, or it could be for Tango's wiki, etc.

    re: OTS*34, another factor could be the increasing frequency of hit batters, if one uses the full version of OBA in the formula. The RC relationship OBA*SLG*AB does not include HB, so even though there are more at bats/game than in the past, the HB factor is working in the other direction. And sorry for taking so long to acknowledge your comment.

  3. The whole point of OTSE is that it is not linear.
    Scoring is a combination of getting on base (OBP) and advancing the batters (SLG).

    You would use it because it is most accurate.

  4. It's the most accurate only among measures in which you handcuff yourself by only using OBA and SLG. You and I will probably not agree, which is fine, but my position is the same now as it was when I wrote the above. The OTSE equation is not simple, it's not directly applicable to players, and it's not the most accurate if you're allowed to consider inputs other than OBA and SLG.

  5. It is not simple - agreed

    It's not directly applicable to players - not sure I agree; it works well at extremes, which is similar to players (a team of Pujols, for example)

    It's not most accurate - this says it was most accurate - http://www.baseball-fever.com/showthread.php?48531-Correlation-Between-Stats-and-Runs-etc/page2

