Thursday, February 06, 2020

Tripod: Run Estimators & Accuracy

See the first paragraph of this post for an explanation of this series.

This page covers some run estimators. It by no means includes all of them; there are dozens. I may add more descriptions at a later time. Anyway, Base Runs and Linear Weights are the most important and relevant. Equivalent Runs is often misunderstood. Appraised Runs is my twist on Mike Gimbel's method, which is funny looking and flawed, but no more so than Runs Created.

I guess I'll also use this page to make some general comments about run estimators that I may expand upon in the future. I posted these comments on Primer in response to an article by Chris Dial saying that we should use RC (or at least that it was ok as an accepted standard) and in which he mentioned something or other about it being easy for the average fan to understand:

If you want a run statistic that the general public will understand, wouldn't it be better to have one that you can explain what the structure represents?

Any baseball fan should be able to understand that runs = baserunners * % of baserunners who score + home runs. Then you can explain that baserunners and home runs are known, and that we have to estimate the % who score. The estimate we have for it may not look pretty, but it's the best we've been able to do so far, and we are still looking for a better estimator. So, you've given them:

1. an equation that they can understand and know to be true

2. an admission that we don't know everything

3. a better estimator than RC
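
To make that structure concrete, here is a minimal Python sketch. The function name and the sample numbers are made up for illustration, and the share of baserunners who score is simply passed in, since estimating it is the hard part.

```python
def runs_from_structure(baserunners, score_rate, home_runs):
    """Runs = baserunners * share of baserunners who score + home runs."""
    if not 0.0 <= score_rate <= 1.0:
        raise ValueError("score_rate must be between 0 and 1")
    return baserunners * score_rate + home_runs


# Hypothetical team: 2000 baserunners (not counting HR), 30% of whom score,
# plus 150 home runs.
print(round(runs_from_structure(2000, 0.30, 150), 1))
```

Everything except score_rate is a known count; the different non-linear estimators amount to different guesses at that one input.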

And I think the "average" fan would have a much easier time understanding that the average value of a single is 1/2 a run, the average value of a walk is 1/3 of a run, and the average value of an out is -1/10 of a run, than that complicated, fatally flawed RC equation. But to each his own I suppose.

I will also add that the statement that "all RC methods are right" is simply false IMO. It is true that there is room for different approaches. But, for instance, RC and BsR both purport to model team runs scored in a non-linear fashion. They can't both be equally right. The real answer is that neither of them is "right"; but one is more "right" than the other, and that is clearly BsR. But which is more right, BsR or LW? Depends on what you are trying to measure.

********

When I started this page, I didn't intend to include anything about the accuracy of the various methods other than mentioning it while discussing them. A RMSE test done on a large sample of normal major league teams really does not prove much. There are other concerns which are more important IMO, such as whether or not the method works at the extremes, whether or not it is equally applicable to players as teams, etc. However, I am publishing this data in response to the continuing assertion I have seen from numerous people that BsR is more accurate at the extremes but less accurate with normal teams than other methods. I don't know where this idea got started, but it is apparently prevalent among uninformed people, so I wanted to present a resource where people could go and see the data disproving this for themselves.

I used the Lahman database for all teams 1961-2002, except 1981 and 1994 for obvious reasons. I tested 10 different RC methods, with the restriction that they use only AB, H, D, T, HR, W, SB, and CS, or stats that can be derived from those. This was for three reasons: one, I personally am not particularly interested in including SH, SF, DP, etc. in RC methods if I am not going to use them on a team; two, I am lazy and that data is not available and I didn't feel like compiling it; three, some of the methods don't have published versions that include all of the categories. As it is, each method is on a fair playing field, as all of them include all of the categories allowed in this test. Here are the formulas I tested:

RC: Bill James, (H+W-CS)*(TB+.55SB)/(AB+W)

BR: Pete Palmer, .47S+.78D+1.09T+1.4HR+.33W+.3SB-.6CS-.090(AB-H)
.090 was the proper absolute out value for the teams tested

ERP: originally Paul Johnson, version used in "Linear Weights" article on this site

XR: Jim Furtado, .5S+.72D+1.04T+1.44HR+.34W+.18SB-.32CS-.096(AB-H)

EQR: Clay Davenport, as explained in "Equivalent Runs" article on this site

EQRme: my modification of EQR, using 1.9 and -.9, explained in same article
For both EQR versions, the LgRAW for the sample was .732 and the LgR/PA was .117; these were held constant

BsR: David Smyth, version published in "Base Runs" article on this site

UW: Phil Birnbaum, .46S+.8D+1.02T+1.4HR+.33W+.3SB-.5CS-(.687BA-1.188BA^2+.152ISO^2-1.288(WAB)(BA)-.049(BA)(ISO)+.271(BA)(ISO)(WAB)+.459WAB-.552WAB^2-.018)*(AB-H)
where WAB = W/AB

AR: based on Mike Gimbel concept, explained in "Appraised Runs" article on this site

Reg: multiple regression equation for the teams in the sample, .509S+.674D+1.167T+1.487HR+.335W+.211SB-.262CS-.0993(AB-H)
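
For the methods whose formulas are written out in full above (RC, BR, XR, and Reg), a rough Python sketch follows. It assumes S means singles (H - D - T - HR) and TB means total bases; the function names and the sample team line are invented for illustration and do not come from the spreadsheet.

```python
# Linear weights copied from the formulas listed above.
# "OUT" is the weight applied to AB - H.
LINEAR_WEIGHTS = {
    "BR": {"S": 0.47, "D": 0.78, "T": 1.09, "HR": 1.4, "W": 0.33,
           "SB": 0.3, "CS": -0.6, "OUT": -0.090},
    "XR": {"S": 0.5, "D": 0.72, "T": 1.04, "HR": 1.44, "W": 0.34,
           "SB": 0.18, "CS": -0.32, "OUT": -0.096},
    "Reg": {"S": 0.509, "D": 0.674, "T": 1.167, "HR": 1.487, "W": 0.335,
            "SB": 0.211, "CS": -0.262, "OUT": -0.0993},
}


def basic_rc(ab, h, d, t, hr, w, sb, cs):
    """Bill James RC as listed: (H+W-CS)*(TB+.55SB)/(AB+W)."""
    tb = h + d + 2 * t + 3 * hr  # total bases
    return (h + w - cs) * (tb + 0.55 * sb) / (ab + w)


def linear_runs(method, ab, h, d, t, hr, w, sb, cs):
    """Apply one of the linear formulas above to a team line."""
    weights = LINEAR_WEIGHTS[method]
    events = {"S": h - d - t - hr, "D": d, "T": t, "HR": hr,
              "W": w, "SB": sb, "CS": cs, "OUT": ab - h}
    return sum(weights[name] * events[name] for name in weights)


# Hypothetical team line, for illustration only.
team = dict(ab=5500, h=1450, d=280, t=30, hr=160, w=550, sb=100, cs=50)
print("RC", round(basic_rc(**team), 1))
for method in LINEAR_WEIGHTS:
    print(method, round(linear_runs(method, **team), 1))
```

The other methods (ERP, EQR, BsR, UW, AR) are left out here because their exact versions are given in the linked articles rather than on this page.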

Earlier I said that all methods were on a level playing field. This is not exactly true. EQR and BR both take into account the actual runs scored data for the sample, but only to establish constants. BsR's B component should have this advantage too, but I chose not to give it that, so that the scales would not be tipped in favor of BsR, since the whole point is to demonstrate BsR's accuracy. Also remember that the BsR equation I used is probably not the most accurate that you could design; it is one that I have used for a couple of years now and am familiar with. Obviously the Regression equation has a gigantic advantage.

Anyway, what are the RMSEs for each method?

Reg-------22.56
XR--------22.77
BsR-------22.93
AR--------23.08
EQRme-----23.12
ERP-------23.15
BR--------23.29
UW--------23.34
EQR-------23.74
RC--------25.44
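
For anyone who wants to reproduce figures like these, here is a minimal sketch of the RMSE calculation in Python. It assumes the plain definition (square root of the mean squared error over all team seasons, dividing by the number of teams); the spreadsheet may handle details differently, and the sample numbers below are invented.

```python
import math


def rmse(actual_runs, estimated_runs):
    """Root mean squared error between actual and estimated team runs."""
    if len(actual_runs) != len(estimated_runs) or not actual_runs:
        raise ValueError("need two equal-length, non-empty sequences")
    squared_errors = [(a - e) ** 2 for a, e in zip(actual_runs, estimated_runs)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical actual and estimated runs for three team seasons.
actual = [700, 800, 650]
estimated = [710, 780, 655]
print(round(rmse(actual, estimated), 2))
```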

Again, you should not use these figures as the absolute truth, because there are many other important factors to consider when choosing a run estimator. But the important things to recognize IMO are:

* all of the legitimate published formulas have very similar accuracy with real major league teams' seasonal data

* if accuracy on team seasonal data is your only concern, throw everything away and run a regression (the reluctance of people who claim to be totally concerned about seasonal accuracy to do this IMO displays that they aren't really as stuck on seasonal team accuracy as they claim to be)

* RC is way behind the other methods, although I think if it included W in the B factor as the Tech versions do it would be right in the midst of the pack

* BsR is just as accurate with actual team seasonal data as the other run estimators
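
On the regression point, here is a rough pure-Python sketch of fitting linear weights by least squares with no intercept, which is the form of the Reg equation listed earlier. This is not the procedure actually used for that equation; the function name is hypothetical, the data are tiny made-up rows, and a statistics package would be the sensible choice for real work.

```python
def fit_linear_weights(rows, runs):
    """Least-squares weights (no intercept) for runs ~ sum(weight * count).

    rows: one sequence of event counts per team; runs: actual runs per team.
    Solves the normal equations (X^T X) b = X^T y by Gaussian elimination.
    """
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, runs))]
        for i in range(k)
    ]
    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("event columns are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    weights = [0.0] * k
    for i in range(k - 1, -1, -1):
        known = sum(aug[i][j] * weights[j] for j in range(i + 1, k))
        weights[i] = (aug[i][k] - known) / aug[i][i]
    return weights


# Made-up data: columns are (hits, outs); runs follow 0.5*hits - 0.1*outs exactly.
rows = [(1400, 4000), (1500, 4100), (1300, 4050)]
runs = [0.5 * hits - 0.1 * outs for hits, outs in rows]
print([round(w, 3) for w in fit_linear_weights(rows, runs)])
```

With real team data the rows would hold S, D, T, HR, W, SB, CS, and AB-H, and the fitted weights would be tuned to that particular sample, which is exactly why the regression has such an advantage in a test like this one.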

Anyway, the spreadsheet is available here, and you can plug in other methods and see how they do. But here is the evidence; let the myths die.

Here are some other accuracy studies that you may want to look at. One is by John Jarvis. My only quibble with it is that he uses a regression to runs on each RC estimator, but it is a very interesting article that also applies the methods to defense as well, and is definitely worth reading (NOTE: sadly this link is dead)

And this is Jim Furtado's article as published in the 1999 BBBA. He uses both RMSE and regression techniques to evaluate the estimators. Just ignore his look at rate stats--it is fatally flawed by assuming there is a 1:1 relationship between rate stats and run scoring rate. That is pretty much true for OBAxSLG only and that is why it comes in so well in his survey.
