Monday, January 16, 2006

Another Superfluous Run Estimator

It is hard to muster much enthusiasm for looking at run estimators other than Linear Weights or Base Runs, unless somebody manages to improve the score rate estimator for BsR someday, or comes up with an alternative model of the scoring process that works at the extremes as well as for everyday teams.

The method I am looking at here is not one of them (LW or BsR). It is a faulty run estimator, and its flaws are often obvious. However, if we set James’ Runs Created as the minimum standard a run estimator must meet in order to be taken seriously, this one meets it. I believe that it may in fact be superior to Runs Created, but of course inferior to LW and BsR. It probably also falls below Eric Van’s Contextual Runs (Contextual Runs is sort of a poor man’s BsR. It takes out homers, but its score rate estimator does not hold up nearly as well at theoretical extremes as BsR’s does. This is probably partially because it is based on an advancement-to-out ratio instead of advancement as a percentage of advancement plus outs, as is done by David Smyth).

Anyway, here I will take another look at what I call Appraised Runs (all of the good names are already taken after all), which I developed a few years ago as an adaptation of Mike Gimbel’s Run Production Average. RPA started with a set of linear weights (Gimbel never fully explained where they came from, except that they represent “run-driving” value) which clearly did not measure the same thing as traditional LW, as they severely underrated on base events and overrated homers. He then adjusted these by a “set up” rating, which accounted for the value of the event in setting up further scoring opportunities for the team. Homers actually reduced the set up rating because they take runners off base.

The set up rating was then divided by the league average, and that was used to scale 50% of the run-driving value. That plus 50% of the run-driving total was the estimated number of runs scored. This step still does not make sense to me because the team’s runs scored should not depend on the league rate, nor should the number of runners they have on base after an event.

Gimbel also included a number of categories that aren’t widely available, such as balks and wild pitches (for offenses), and so it was difficult to test his work and apply it with limited data. So I came up with my own method that started with his basic values and then applied a set up rating that was not dependent on the league average.

My starting point was Gimbel’s run-driving values, scaled to equal total runs scored for the dataset I developed the formula on:
RD = .289S + .408D + .697T + 1.433HR + .164W
Then the set up figure, which I abbreviated UP, was found through regression and some trial and error, trying to get realistic intrinsic linear weight values:
UP = (5.7S + 8.6(D+T) + 1.44HR + 5W)/(AB+W) - .821
Then AR = UP*RD*.5 + RD*.5

Immediately you can see some of the flaws in this equation. A team that gets a very small number of runners on base will get a negative UP, so the UP-scaled half of the estimate comes out as negative runs. A team that hits 500 HRs will have an RD of 716.5, an UP of .619, and an AR of 580, when we know they will score 500 runs. It may be better at the extremes than RC, but it is not correct at the extremes and is not nearly as good as BsR. Please note, again, that I am not claiming this method should remain in use in sabermetrics. I am claiming that it is probably as good as or better than Runs Created. If you read Gimbel’s book in 1993 and did not yet know about BsR, it may well have been the best dynamic run estimator at that time.
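For anyone who wants to check these figures, here is a minimal Python sketch of the basic (no SB/CS) formula. The function name and argument names are my own for illustration. The 500 HR team is assumed to have exactly 500 AB and no walks, which is what an UP of .619 implies, and the low on base team is a made-up line, not a real club.

```python
def appraised_runs(s, d, t, hr, w, ab):
    """Basic Appraised Runs (no SB/CS), using the coefficients quoted in the post.

    Returns the run-driving total RD, the set up factor UP, and AR.
    """
    rd = 0.289 * s + 0.408 * d + 0.697 * t + 1.433 * hr + 0.164 * w
    up = (5.7 * s + 8.6 * (d + t) + 1.44 * hr + 5 * w) / (ab + w) - 0.821
    ar = 0.5 * up * rd + 0.5 * rd
    return rd, up, ar


# The all-homer team: assumed to be 500 AB, 500 HR, and nothing else.
rd_hr, up_hr, ar_hr = appraised_runs(s=0, d=0, t=0, hr=500, w=0, ab=500)
print(f"500 HR team: RD={rd_hr:.1f} UP={up_hr:.3f} AR={ar_hr:.0f}")

# A hypothetical team that almost never reaches base: 50 singles in 1000 AB.
rd_low, up_low, ar_low = appraised_runs(s=50, d=0, t=0, hr=0, w=0, ab=1000)
print(f"Low on base team: RD={rd_low:.2f} UP={up_low:.3f}")
```

The first line should reproduce the RD, UP, and AR quoted above, and the second shows UP going negative once U/(AB+W) drops below .821.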

I also came up with a version that included SB and CS:
RD = .288S + .407D + .694T + 1.428HR + .164W + .099SB - .164CS
UP = (5.7S + 8.6(D+T) + 1.44HR + 5W + 1.5SB - 3CS)/(AB+W) - .818
AR = UP*RD*.5 + RD*.5

Anyway, if you do an accuracy test, using the SB version, on all teams in the 1980s excepting 1981 (the same sample I used in the post about Mann’s RPA), the RMSE of AR is 23.09, compared with 23.64 for ERP, 22.85 for BsR, and 25.15 for RC. So it is just as accurate as any of the other methods with normal teams. If you use just the RD portion, you get an RMSE of 31.06. Obviously RD by itself is flawed, not only because it under-weights the value of on base events but also because it does not consider outs at all. This construction necessitates the use of the UP factor or some other adjustment.

There is more than a similarity of names between the Gimbel and Mann RPAs. Both start with coefficients that do not capture the importance of avoiding outs and do not properly credit getting on base, then apply adjustments in order to make the formula useful. Gimbel’s adjustments seem to be much more thought out and reasonable than Mann’s. I might be very wrong, but I get the impression that Gimbel developed his system knowing that he would need to apply adjustments and actually wanting that to be the way his estimator worked, whereas it seems that Mann kept adding stuff in order to give his estimator some semblance of accuracy.

Of course, I like to differentiate everything so that we can see the intrinsic linear weights. In order to do this, we need to differentiate UP and RD. So the derivative of UP with respect to an event will be dUP, and for RD we’ll call it dRD (which will simply be the coefficient of the event in the RD equation). We can write UP as U/P minus a constant, where U is the numerator and P is the denominator. The constant (.821, or .818 in the SB version) can be ignored, since the derivative of a constant is zero. We will call the coefficient of any event in the UP numerator u and the coefficient of any event in the UP denominator p. Then dUP = (P*u - U*p)/P^2. Knowing that value, the formula for the AR intrinsic LW is simple:
dAR = .5*(UP*dRD + RD*dUP) + .5*dRD
For example, using the non-SB version of AR, the 1979 Red Sox had an RD of 809.529, an UP of 1.17269, with U = 11663.06 and P = 5850. So the dUP for a walk was:
dUP = (5850*5 - 11663.06*1)/5850^2 = 5.139*10^(-4). We can plug into the dAR formula now:
dAR(wrt W) = .5*(1.17269*.164 + 809.529*5.139*10^(-4)) + .5*.164 = .3862
So the intrinsic LW value of a walk for the 1979 BoSox, according to AR, is .3862 runs. We can likewise calculate the intrinsic LWs for the other offensive events.
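The walk calculation above can be reproduced with a few lines of Python. This is only a sketch of the derivative formulas as written in this post; the helper names d_up and d_ar are mine, and the inputs are the 1979 Red Sox totals quoted above (non-SB version, so the constant is .821).

```python
def d_up(U, P, u, p):
    """Derivative of UP = U/P - constant with respect to one event.

    u and p are the event's coefficients in the UP numerator and denominator.
    """
    return (P * u - U * p) / P ** 2


def d_ar(UP, RD, dRD, dUP):
    """Intrinsic linear weight of an event under AR = .5*UP*RD + .5*RD."""
    return 0.5 * (UP * dRD + RD * dUP) + 0.5 * dRD


# 1979 Red Sox totals quoted in the post (non-SB version of AR).
U, P, RD = 11663.06, 5850, 809.529
UP = U / P - 0.821

# A walk has u = 5 in the numerator, p = 1 in the denominator (AB + W),
# and an RD coefficient of .164.
walk_dup = d_up(U, P, u=5, p=1)
walk_lw = d_ar(UP, RD, dRD=0.164, dUP=walk_dup)
print(f"UP={UP:.5f} dUP={walk_dup:.4e} walk LW={walk_lw:.4f}")
```

Swapping in the u, p, and dRD values for another event gives that event's intrinsic weight; for an out, u = 0, p = 1, and dRD = 0.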

I have calculated the LW for each of the three methods based on the 1980s sample, and gotten these results (displayed as S, D, T, HR, W, SB, CS, O):
RC: .58, .89, 1.21, 1.52, .35, .16, -.39, -.122
AR: .51, .80, 1.09, 1.41, .35, .08, -.44, -.105
BsR: .47, .77, 1.08, 1.45, .33, .23, -.32, -.094
The AR weights are a better match for the BsR weights than the RC weights are, although it seems that I have not properly factored in the value of the stolen base in the AR SB version. Maybe SB and CS should be given more weight in the UP factor.

If you apply the linear versions in order to do an accuracy test, the RMSE for RC is 43.65 (this is not really a reflection on RC; it is that I used the technical version without using all of the categories, so you should probably take the intrinsic LW for RC with a grain of salt too), 31.27 for AR, and 23.02 for BsR (interestingly, the BsR linear weight error is only .17 higher than the error for the multiplicative BsR equation).

I tried the derivative formulas on an extreme player, one with 500 AB, 180 H, 40 D, 60 HR, 150 W, and no SB or CS. That is a batting line of 360/508/800, which of course is ridiculously awesome. The point was to see how the intrinsic LW from AR would match up with those from RC or BsR. I used the basic versions of AR and BsR but the technical version of RC, just so that RC would include walks in the B factor. This may cause other problems, since it would be unrealistic to have no DP, SF, IW, etc. for a player like this, but I think the problem of walks not being included in advancement at all is bigger. The estimated runs created for the player by each method are 192.4 for RC, 202.3 for AR, and 191.2 for BsR. Here are the resulting intrinsic weights from the three equations (displayed as S, D, T, HR, W, O):
RC: .84, 1.35, 1.86, 2.36, .46, -.34
AR: .76, 1.25, 1.64, 1.81, .51, -.29
BsR: .66, 1.01, 1.36, 1.52, .49, -.21
As you can see, the AR weights are a better match for the BsR weights than the RC weights are, but the run estimates tell a different story: RC (192.4) and BsR (191.2) are close to each other, while AR (202.3) is about ten runs higher. AR performs better on the weights largely because, while it does not treat the HR correctly by counting it as an automatic run as BsR does, it does recognize that homers reduce the number of runners on base for the following batters (this is what the UP factor in its original form was supposed to capture--the replacement for it that I have used here is empirical and not theoretical as Gimbel’s was), and therefore does not allow the value of the HR to keep compounding as RC does.
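The AR row for this player can also be checked without the derivative formulas, by nudging each event up and down by a tiny amount and seeing how the estimate moves. The Python sketch below does that with a central difference, which is a substitute for the analytic derivative and not the method used to build the table. It assumes the player has no triples (none are listed, and the .800 slugging average is consistent with that), so the line is 80 S, 40 D, 0 T, 60 HR, 150 W, and 320 outs. The dictionary layout and function names are mine.

```python
# Coefficients of the basic (no SB/CS) AR formula from this post.
COEF_RD = {"S": 0.289, "D": 0.408, "T": 0.697, "HR": 1.433, "W": 0.164}
COEF_UP = {"S": 5.7, "D": 8.6, "T": 8.6, "HR": 1.44, "W": 5.0}
EVENTS = ["S", "D", "T", "HR", "W", "O"]


def appraised_runs(line):
    """AR for a batting line given as event counts; "O" is outs (AB - H)."""
    rd = sum(COEF_RD[e] * line[e] for e in COEF_RD)
    u = sum(COEF_UP[e] * line[e] for e in COEF_UP)
    p = sum(line[e] for e in EVENTS)  # equals AB + W
    up = u / p - 0.821
    return 0.5 * up * rd + 0.5 * rd


def intrinsic_weight(line, event, h=1e-3):
    """Central-difference estimate of the change in AR per one more `event`."""
    hi = dict(line)
    lo = dict(line)
    hi[event] += h
    lo[event] -= h
    return (appraised_runs(hi) - appraised_runs(lo)) / (2 * h)


# Assumed breakdown of the extreme player: 500 AB, 180 H, 40 D, 60 HR, 150 W.
player = {"S": 80, "D": 40, "T": 0, "HR": 60, "W": 150, "O": 320}

total = appraised_runs(player)
weights = {e: round(intrinsic_weight(player, e), 2) for e in EVENTS}
print(round(total, 1), weights)
```

The same nudging approach works for any run estimator you can write as a function of the batting line, which makes it a handy cross-check on hand-derived weights.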

Although I have not tested it, I think it is possible that AR shares with BsR one flaw that occurs at particularly high OBAs that you could call the “triple jump” flaw--the value of a triple will jump over the value of a HR at some point. This is obviously incorrect, but the distortion that comes from having the triple valued slightly higher than the home run in BsR is of a much smaller magnitude than the RC flaw of allowing a HR to continue to grow in value, up to a maximum of four runs a pop. It is unfortunate that we have this imperfection in BsR, and hopefully future innovation will allow us to eliminate it.

I am sure that one could improve the accuracy of this formula by fiddling around with the coefficients, particularly for UP. Perhaps one could develop an UP with a completely different structure that would be usable as well, or change the percentage of RD that is scaled by UP, etc. I’m not sure what the point would be though. This type of run estimator is best accepted for what it is: a clever but ultimately not very useful approximation of the scoring process that displays ingenuity on the part of its inventor and may have once been state of the art, but is no longer. As far as I’m concerned, this is exactly the same category that Runs Created falls into. Runs Created, however, shows no signs of being consigned to the dustbin any time soon.
