Monday, July 07, 2008

Run Estimation Stuff, pt. 3

Base Runs and Runs Created have many similarities that are easily apparent. Each has an A factor that represents baserunners and a B factor that represents advancement. Each has a C factor that in some way reflects what Bill James called “opportunity”, but James used plate appearances while Smyth uses outs.

The two fundamental differences between the models are 1) their treatment of the home run and 2) how they estimate score rate (the proportion of baserunners that score). BsR recognizes that a home run always results in at least one run, and thus splits them off into a D factor. This modification not only credits each homer with creating at least one run, but it alters the BsR A factor. Since the run scored by the batter who hits the longball has already been accounted for, home runs are removed from baserunners.

Then, since BsR estimates score rate as a proportion (B/(B + C)) rather than as a ratio (B/C), it adheres to the obvious constraint that “runs scored by baserunners <= baserunners”, which RC does not adhere to. If B>C, then RC predicts that more runs will score than there were runners on base.
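A quick numerical check makes the difference concrete. This is only a sketch: the B and C values are made up (chosen so that B > C), and the function names are mine rather than part of either model's definition.

```python
def score_rate_ratio(b: float, c: float) -> float:
    """RC-style score rate, B / C. Nothing stops this from exceeding 1."""
    return b / c


def score_rate_proportion(b: float, c: float) -> float:
    """BsR-style score rate, B / (B + C). Stays below 1 when b >= 0 and c > 0."""
    return b / (b + c)


# Hypothetical factor values, picked only so that B > C.
b, c = 30.0, 20.0
print(score_rate_ratio(b, c))       # 1.5 -> more runs than baserunners
print(score_rate_proportion(b, c))  # 0.6 -> a legitimate proportion
```

Multiply either rate by A and the ratio form hands back 1.5 runs per baserunner, which is exactly the impossibility described above.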

This is not to say that BsR is a perfect model, because of course it is not. For example, BsR does not adhere to the rule that you can leave a maximum of three on base per inning. In some extreme situations it could return a negative run estimate (although the simple versions that do not subtract anything from the B factor have a lower bound of zero). But these things are also true of Runs Created.

Another flaw of BsR is that in some extreme cases, the intrinsic weight of a triple is greater than that of a home run. This is obviously absurd, although Tango Tiger has developed a workaround for this situation. However, while RC does not have that flaw, that is not necessarily a point in RC’s favor. It’s better to have the triple weighted higher than the homer, with both somewhere in the vicinity of their actual values, than to have both wildly overvalued. That is not to say the triple problem is a good thing for BsR; it’s a clear shortcoming. It just doesn’t hurt its standing against RC, as any time BsR says that a triple is worth more runs than a home run, RC will be telling you that both are much more valuable than they actually are.

The only claim here is that BsR is a better model than RC, in almost all regards. You might be able to find an extreme situation that a particular RC version will handle better than a particular BsR version, but the vast majority of cases go to BsR and the logic of the model is a clear advantage for BsR. There is also no fundamental rule of baseball (such as the maximum three runners stranded per inning) that RC adheres to when BsR does not, at least not that I am aware of. Base Runs is the best simple, dynamic model for run scoring that we have. Of course, I grant that simulations and Markov models can be superior if designed properly. However, those take massive spreadsheets or stand-alone programs to implement. As far as formulas that can be implemented quickly on a spreadsheet (and explained in English to those unfamiliar with linear algebra or computer programming) go, Base Runs is the best model.

Anyway, what I want to do here is to examine RC and BsR head-to-head in the manner that Tango suggested here. First, let’s look at Bill James’ latest RC version, remembering that RC is always A*B/C:

A = H + W + HB - DP - CS
B = 1.125S + 1.69D + 3.02T + 3.73HR + .29(W - IW + HB) + .492(SB + SH + SF) - .04K
C = AB + W + HB + SH + SF
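For anyone who would rather check this in code than in a spreadsheet, here is a minimal sketch of the formula exactly as written above. The function name, the argument names, and the sample stat line are my own placeholders; the stat line is invented for illustration and is not real team data.

```python
def runs_created(*, ab, s, d, t, hr, w, iw=0, hb=0, sb=0, cs=0,
                 sh=0, sf=0, k=0, dp=0):
    """Runs Created as A*B/C, using the A, B and C factors listed above."""
    h = s + d + t + hr                      # hits are the sum of the hit types
    a = h + w + hb - dp - cs                # baserunners
    b = (1.125 * s + 1.69 * d + 3.02 * t + 3.73 * hr
         + 0.29 * (w - iw + hb)
         + 0.492 * (sb + sh + sf)
         - 0.04 * k)                        # advancement
    c = ab + w + hb + sh + sf               # opportunity (plate appearances)
    return a * b / c


# Invented team-season totals, for illustration only.
estimate = runs_created(ab=5500, s=1000, d=280, t=30, hr=160, w=540, iw=40,
                        hb=50, sb=90, cs=40, sh=40, sf=45, k=1000, dp=125)
print(round(estimate, 1))
```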

As will be demonstrated in the next segment, this formula is more accurate than any of the BsR versions we came up with when used on the 1990-2005 team data. I don’t put a lot of stock in this, but I feel obliged to mention it in the interests of full disclosure (lest I be accused of being a BsR “fanboy” as I have been in the past, by one of the idiots who posts at a certain site that sometimes links to stuff like this, although thankfully not as much as they did in the past).

Let’s take a look at the intrinsic weights generated by this formula for the 1960-2004 period, compared to the target weights from Ruane’s research:



I will let that stand without comment; if the arguments of others as well as my own feeble efforts have not yet convinced you of the importance of approximating empirical linear weight values, nothing I can say here will help. If you hold that view, this table should be very revealing with respect to the Runs Created model, or at the very least the manifestation of it that is currently in use.

Just as we did for BsR, we can find the B coefficients for RC necessary to match our target linear weights. They are:

B = .634S + 1.640D + 2.594T + 3.844HR + .105(W - IW) - .337IW + .189HB + .652SB + .416CS + .157(AB - H - K - DP) + .025K + .146DP + .629SH + .911SF

You can see that these weights are superficially similar to the BsR B weights; of course the home run is a big exception, and there certainly are some discrepancies, but they are in the same ballpark.

We can also “dumb down” Base Runs in order to compare it to the Runs Created model. David Smyth has recently suggested this type of approach as the most basic Base Runs model. What we will do is not give the home run the special treatment of being split off from baserunners and always credited with creating at least one run.

This is not to say that the home run should not be treated as BsR does. The home run, as opposed to other (or possible) categories that always indicate a run scored, like the sacrifice fly, is unique not as a scoring category but as an event defined by the rules of the game. A sacrifice fly, or an RBI groundout, is just an accounting category--it's a way of more specifically describing the outcome of a given event that could have also been classified an out, a groundout, etc. The home run is not just an accounting category; it is an event set aside in the rules of baseball that entitles the batter and any runner to score automatically without the threat of being put out. While there are other events that allow one to score without risk (a bases loaded walk, for instance), the home run is the only event that can do so independently of what precedes or follows it in an inning.

That digression aside, a simple version of BsR suggested by Smyth is:
A = H + W
B = (2*TB - H + HR)*.75
C = AB - H
BsR = A*B/(B + C)

Another version he offered is B = (2*TB - H)*.79.
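Both simple versions fit in a few lines. The sketch below follows the A, B and C definitions just given; the function name, the `version` switch, and the sample totals are my own scaffolding, and the totals are invented rather than taken from any real team.

```python
def base_runs_simple(*, h, w, tb, hr, ab, version=1):
    """Simple Base Runs, A*B/(B + C), with no separate D factor for home runs.

    version=1 uses B = (2*TB - H + HR)*.75
    version=2 uses B = (2*TB - H)*.79
    """
    a = h + w                               # baserunners (home runs left in)
    if version == 1:
        b = (2 * tb - h + hr) * 0.75
    elif version == 2:
        b = (2 * tb - h) * 0.79
    else:
        raise ValueError("version must be 1 or 2")
    c = ab - h                              # outs
    return a * b / (b + c)


# Invented team-season totals, for illustration only.
team = dict(h=1500, w=550, tb=2300, hr=160, ab=5500)
print(round(base_runs_simple(**team, version=1), 1))
print(round(base_runs_simple(**team, version=2), 1))
```

Because the score rate is B/(B + C), neither version can return more runs than A for non-negative inputs.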

We could optimize this equation to match target linear weights in the same way that we did in the previous installment, but that would be overkill. It defeats the point of a “simple” formula to include small fractional weights for walks and outs in the B factor.

The second version in particular does a decent job of estimating linear weight values: .52, .83, 1.14, 1.45, .36, -.11 for 1960-2004 (S, D, T, HR, W, O). To me, this is an illustration of the fact that BsR is a step forward over RC in its approach to estimating score rate, without even considering its treatment of the home run.
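One rough way to see what weights a formula implies is the "plus one" approach: add a single event to a set of totals and see how much the estimate moves. The sketch below applies that to the second simple version. Treat it as an approximation under my own assumptions: it uses a finite difference rather than an exact derivative, and the totals are invented rather than the 1960-2004 data, so it will not reproduce the figures above exactly.

```python
def bsr_v2(s, d, t, hr, w, outs):
    """Second simple version: A = H + W, B = (2*TB - H)*.79, C = AB - H."""
    h = s + d + t + hr
    tb = s + 2 * d + 3 * t + 4 * hr
    a = h + w
    b = (2 * tb - h) * 0.79
    c = outs                                # AB - H
    return a * b / (b + c)


def plus_one_weights(totals):
    """Change in estimated runs from adding one of each event to the totals."""
    base = bsr_v2(**totals)
    weights = {}
    for event in totals:
        bumped = dict(totals)
        bumped[event] += 1
        weights[event] = bsr_v2(**bumped) - base
    return weights


# Invented team-season totals, for illustration only.
totals = dict(s=1000, d=270, t=30, hr=150, w=530, outs=4100)
for event, weight in plus_one_weights(totals).items():
    print(f"{event:>4}: {weight:+.3f}")
```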

We have looked at how we can move RC closer to BsR by switching to a proportional estimate of score rate. We could also move RC closer to BsR by retaining a ratio estimate of score rate, but instead dealing with the home runs specially. This would give us a model along the lines of Runs = A*B/C + D.

In fact, this is a construct that has been used for a run estimator, namely Eric Van’s Contextual Runs. Here we will make a generic version that would match our target linear weights:

A = H + W - HR + HB
B = .349S + 1.018D + 1.652T + .915HR - .004(W - IW) - .297IW + .053HB + .434SB - .598CS + .153(AB - H - K - DP) + .065K - .730DP + .467SH + .654SF
C = AB - H + SH + SF

Van uses an initial baserunner, full out construct, but I just want to illustrate how the coefficients directly compare to the RC and BsR versions here. Obviously I am not attempting to supersede his definition of his own estimator; consider this a generic, nameless A*B/C + D model. However, it would be disingenuous to present such a model without acknowledging Van's work.
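Here is a sketch of that generic, nameless A*B/C + D model using the coefficients above. I am assuming D = HR, in line with the Base Runs D factor; the function name, argument names, and sample totals are placeholders of mine, and the totals are invented. This is not Van's Contextual Runs.

```python
# B-factor coefficients for the generic A*B/C + D model listed above.
B_WEIGHTS = {
    "s": 0.349, "d": 1.018, "t": 1.652, "hr": 0.915,
    "ubb": -0.004,        # W - IW
    "iw": -0.297, "hb": 0.053, "sb": 0.434, "cs": -0.598,
    "other_out": 0.153,   # AB - H - K - DP
    "k": 0.065, "dp": -0.730, "sh": 0.467, "sf": 0.654,
}


def generic_abcd_runs(*, ab, s, d, t, hr, w, iw=0, hb=0, sb=0, cs=0,
                      k=0, dp=0, sh=0, sf=0):
    """Runs = A*B/C + D, with D assumed to be home runs."""
    h = s + d + t + hr
    events = {
        "s": s, "d": d, "t": t, "hr": hr, "ubb": w - iw, "iw": iw,
        "hb": hb, "sb": sb, "cs": cs, "other_out": ab - h - k - dp,
        "k": k, "dp": dp, "sh": sh, "sf": sf,
    }
    a = h + w - hr + hb                     # baserunners, homers removed
    b = sum(B_WEIGHTS[name] * count for name, count in events.items())
    c = ab - h + sh + sf                    # outs
    return a * b / c + hr                   # ratio score rate, plus D


# Invented team-season totals, for illustration only.
print(round(generic_abcd_runs(ab=5500, s=1000, d=280, t=30, hr=160, w=540,
                              iw=40, hb=50, sb=90, cs=40, k=1000, dp=125,
                              sh=40, sf=45), 1))
```

With nothing but home runs and outs in the line, A is zero and the estimate collapses to the home run total, which is the point of splitting them off.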

The bottom line is that all of these models are inferior to the full Base Runs model of A*B/(B + C) + D. Is it possible that someone could develop a better estimator of score rate? Sure, it’s possible. However, it won’t be easy to find, and it is almost certain that it would be a lot more complex than B/(B + C).

When you look at the measures to which Bill James has resorted to prop up the accuracy of RC, you have to ask why not just take one more step and take HR out of A? That is the only difference between the two formulas at this point, once James abandoned using TB in the B factor (instead considering S, D, T, and HR separately) and went to two decimal place weights.

The claim that RC is simpler than BsR, computationally, now hinges on just a few additional mathematical operations (subtracting HR from A, adding them in D, adding B and C, and depending on which BsR version you use, adding and multiplying by a coefficient for a few miscellaneous events in B). If it is really simplicity that you crave, a simple linear weights formula can’t be beat. If you are willing to put up with the “complexity” of Runs Created, it seems very odd to me that you would find Base Runs to be a bridge too far.

I do not really care if Bill James wants to continue to use RC; after all, he developed it, and it has influenced the superior estimators that followed. I imagine that it’s hard to let go of your own creation that was once considered the gold standard (I wouldn’t know because I am no Bill James, not by a longshot, and have never been in that position). What I fail to understand, though, is why anyone else continues to use Runs Created (other than its most simple incarnations for quick and dirty estimates in the same vein as OPS and its cousins).

13 comments:

  1. Another great article. You do a great job explaining the technical aspects of Run Estimators in a simple and concise manner. The only reason I can come up with why some baseball fans continue to use Runs Created is that there are custom versions of the formula dating back to 1876. I don't think the reason is that RC is simpler to compute and understand. You were right when you said "If it's really simplicity that you crave, a simple linear weights formula can't be beat." Keep up the good work.

  2. Thanks. And THT deserves props for making the switch, even if it was long overdue...:-)

  3. Patriot, what is the RMSE of the "wrong" RC version and your corrected "B" weights for that RC version?

  4. As David seemed to be assuming, the "wrong" version does in fact win--23.37 to 23.77.

    The next post will have the RMSEs, FWIW, for all of the formulas I've discussed in the first three posts.

  5. I like the BaseRuns concept, but I will always prefer basic RC per out = OBP*SLG/(1-AVG) because it is both simple and elegant, plus it incorporates all three statistics that are commonly used to characterize the overall hitting performance of a player or team. Multiplying by 25 (not 27) then gives you the approximate RC per game for comparison with the league average, since the denominator of AB-H omits DP, CS, and other outs on the bases. No run estimator can be 100% accurate--there are way too many variables--and this one is certainly close enough for me.

  6. To each his own.

    However, I will point out for the benefit of other readers that RC/O is starting by assuming (for the 1961-2005 major league totals) that a single is worth .57 runs, a double .88, a triple 1.20, a homer 1.52, a walk just .25, and an out -.115.

    It will also tell you that Magglio Ordonez was 84 runs better than an average AL hitter last year, whereas I would say he was around +64 and Palmer's Batting Runs say he was around +62.

    But if having ten run discrepancies for the best hitters in a league, solely as a result of which estimator you choose, doesn't phase you, go for it.

  7. That is one of my favorite temporary insanity homophone mixups of all-time. It sounds as if you are going to be shot by a deathray from OBA*SLG/(1 - BA) or something.

    I was looking for "faze".

  8. Well, Bill James is pretty powerful--he may have a deathray at his disposal! :-)

    I take exception to the assertion that various events are each "worth" a particular number of runs. Maybe I am just getting hung up on semantics. It is certainly valid to use linear weights in run estimators, but they are another model that approximates reality--not the Truth in an absolute sense.

    Consequently, ten-run discrepancies between different estimators do not bother me. We are not actually measuring real runs anyway--we are measuring overall hitting performance with a single number, which can be characterized as the runs attributable to an individual player. As long as we use the same metric for everyone--RC, BsR, or whatever--we are still comparing apples to apples.

    Personally, I think that the most accurate way to assign runs to players would be to count the bases gained and lost by batters and runners within each inning, add up the results, and then divide the difference (BG-BL) by 4. This has the advantage of being 100% accurate at the team level, but obviously requires complete play-by-play data and some basic "scoring" rules.

    For example, if a runner advances from first to third on a single, the batter would get 2 BG (himself to first and the runner to second), while the runner would get 1 BG (for taking the extra base). Meanwhile, the pitcher would presumably be charged with all 3 BG; or maybe just the batter's 2 BG, with the defense taking the runner's 1 BG.

    Other tricky parts include what to do with errors (perhaps keep these BG in a separate category) and how to distribute BL among the three out-makers when runners are left on base. I am sure that those details could be worked out by the sabermetric community. Does this idea have merit, or am I way off base here?

  9. as long as we use the same metric for everyone--RC, BsR, or whatever--we are still comparing apples to apples.

    The problem that I have with this is that we know that the RC model is biased in favor of certain profiles. Just because everyone is evaluated with the same metric does not wash those biases away. We could rate everyone by (HB + SB)/D and it would be perfectly "fair". It just wouldn't measure anything useful.

    Of course, I'm not trying to say that RC measures nothing useful, and I conceded in the original post that it is useful for quick calculations. And the OBA*SLG = runs/at bat, OBA*SLG/(1-BA) = runs/out, and OBA*SLG*(1-OBA)/(1-BA) = runs/PA properties are really cool and convenient. But for serious evaluation of players, they leave a lot to be desired.

    The problem with the bases gained/lost line of analysis is that it treats all bases equally. But this is simply not the case. Moving from first to second with no one out benefits your team a lot more than moving from second to third with two outs.

    Run Expectancy, the foundation of Linear Weights, enables each play to be valued in a number of runs, based on actual situational data instead of assuming that all bases are created equal. I realize that RE/LW are based on average situations, and that you may be hesitant to say "this play is worth .3 runs" or whatever because of this.

    However, assuming an average situation is not a whole lot different than assuming that all bases are equally valuable.

    Anyway, just as the linear weight of a single is the average of all of the changes in RE on singles, you can find the average number of bases produced by each event. Tango Tiger figured this for 1999-2002 recently. ( Bases ).

    You can see that the average bases correlate very well to the average runs as measured by linear weights. Of the existing offensive metrics, LW is the one that comes closest to matching the results that you would get from your idealized metric.

    Of course, run expectancy can also be applied to each play individually, giving each play a different value based on the unique situation in which it occurred, as they do at Fangraphs with BRAA.

  10. Thanks for the thoughtful discussion.

    Perhaps the difference between us is captured by your comment about "serious evaluation of players". I guess that I am more interested in using approaches that are "really cool and convenient", but also accurate enough to be legitimate. It figures, since I am a structural engineer; that is pretty much what I do for a living.

    As far as the different values of bases gained and lost, my assumption is that this would tend to take care of itself in the aggregate. Assuming that the scoring rules are unambiguous, you would have an objective way of giving partial run credit to individuals in any environment without having to do statistical analysis beforehand.

    Using your example, moving from first to second with no outs has no run value if the next three hitters all strike out. Moving from second to third with two outs is very valuable if the next thing that happens is a wild pitch. Both events are equally valuable if the next hitter singles on a ground ball up the middle.

    My point here is that applying run expectancies is essentially using past data to predict future results; assigning BG and BL happens in real time and reflects actual outcomes. The close correlation of average bases and average runs suggests that counting actual bases would be a good way to attribute actual runs.

  11. Using your example, moving from first to second with no outs has no run value if the next three hitters all strike out. Moving from second to third with two outs is very valuable if the next thing that happens is a wild pitch. Both events are equally valuable if the next hitter singles on a ground ball up the middle.

    I don't want to belabor these disagreements too much, because I have absolutely no illusions that I am going to persuade you (*), but the runner would have, more often than not, scored from second on the single with 2 outs. Sure, there are infield hits, hard hit singles to left, cases in which the baserunner is Ernie Lombardi, but more often than not, he will score.

    While it is true that using RE is using past results, it also captures a level of reality that is left out by using bases. Which is not to say that the differences won't be subtle, or that the base approach won't give *similar* results.

    (*) Just to clarify, that is not a comment on you, it is a recognition that I was not on the debate team and that I'm sure you've thoughtfully considered these issues, and have for whatever reason come to a different conclusion than I have.

  12. Fair enough. My point with the last example was just that exactly one run scores either way.

    My preference for counting bases vs. run expectancy is because the former is valid in the absence of any prior knowledge or assumptions (other than all bases being equal). One argument for BaseRuns is that it works in any environment, but that is true only if you have a large enough and representative enough data set for proper calibration of the B factor.

    So far in 2008, Major League teams are scoring an average of 4.54 runs per game. Over the time period 1961-2005 (which you referenced above), the average is 4.39 runs per game. So it seems to me that you already have a 3.4% discrepancy between the basis for your formulas and something to which you presumably want to be able to apply them.

    Thanks again.

  13. Ideally, the BsR B factor does not need to be calibrated for each particular environment to produce an accurate estimate. Obviously, you can make it return the exact number of runs scored for a given dataset if you do calibrate specifically for it, but that isn't a requirement of using the model.

    As a model of the run scoring process, BsR should adapt to any conditions. It doesn't of course--after all, there is no such thing as a perfect model. However, I think that it gives reasonable results for a very wide range of conditions.

    The charts in part 4 for the '68 NL and '96 AL, while far from conclusive, demonstrate that Base Runs tracks the empirical linear weights pretty well, with only the SF being particularly troublesome. It certainly matches the empirical coefficients of each event better over disparate conditions than Runs Created does for the period in which it was designed to work.

