Monday, May 19, 2008

An Analysis of Clay Davenport’s EqR and EqA

This piece will be an analysis of Clay Davenport’s EqR and EqA, and will attempt to break down how they work and what their true nature is. As such, it is an opinion piece, and is critical in nature. It should certainly not be construed as a neutral presentation of formulas, as the last piece aspired to be.

Let’s focus on the nature of the EqR equation, which once again is:

EqR = (2*RAW/LgRAW - 1) * PA * Lg(R/PA)

where RAW = (H + TB + 1.5*(W + HB + SB) + SH + SF)/(AB + W + HB + SH + SF + SB + CS)
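As a minimal sketch of the two formulas above, here is how they might be coded. The function names and the batting line are made up for illustration, and the league values are the long-term averages (.7706 and .1219) quoted later in this piece rather than any single season's figures.

```python
def raw(h, tb, w, hb, sb, cs, sh, sf, ab):
    """RAW as defined above: numerator over (PA + SB + CS)."""
    numerator = h + tb + 1.5 * (w + hb + sb) + sh + sf
    denominator = ab + w + hb + sh + sf + sb + cs
    return numerator / denominator


def eqr(raw_value, lg_raw, pa, lg_runs_per_pa):
    """EqR = (2*RAW/LgRAW - 1) * PA * Lg(R/PA)."""
    return (2 * raw_value / lg_raw - 1) * pa * lg_runs_per_pa


# Hypothetical batting line (not a real player).
player_raw = raw(h=150, tb=240, w=60, hb=5, sb=40, cs=10, sh=5, sf=5, ab=520)
player_pa = 520 + 60 + 5 + 5 + 5  # AB + W + HB + SH + SF

print(round(player_raw, 4))
print(round(eqr(player_raw, 0.7706, player_pa, 0.1219), 1))
```

A hitter whose RAW equals the league RAW comes out at exactly PA times the league R/PA, which is a quick sanity check on any implementation.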

The formula starts by treating the league runs per plate appearance as a known constant. There is nothing inherently wrong with this, but it does give EqR an advantage in accuracy testing should other metrics not be granted the same consideration. One could also fix the values for LgRAW and Lg(R/PA) over some period of time and thus have defined numerical values as constants.

The relationship between RAW and expected runs per PA is linear, with a slope of 2 and an intercept of -1. This means that a team with a RAW 10% better than league average is expected to score 20% more runs than average. This property has sometimes led to the accuracy of EqA being underestimated in studies.

For example, in the 1999 Big Bad Baseball Annual, Jim Furtado tested the accuracy of rate stats in predicting runs by assuming that there was a 1:1 relationship between the rate in question and runs per out. This was flawed on two levels. First, most rate stats (including OPS) will test as more accurate when used to predict R/PA. Second, not all statistics have a 1:1 ratio; OPS, like RAW, is closer to 2:1. OTS performed very well in that test because it is one of the few that does have a 1:1 ratio (or close to it).

One way then to attempt to fine-tune the accuracy of the EqR equation is to run a regression of RAW/LgRAW versus (R/PA)/Lg(R/PA). Using data for all teams 1990-2005 (except 1994), the long-term average R/PA is .1219 and the long-term average RAW is .7706. Using those two values as constants in place of the actual league averages, we get this optimal equation:

EQR = (1.968*RAW/.7706 - .968)* PA* .1219

As you can see, there is very little difference from the standard slope of 2 and intercept of -1, and it is small enough to be ignored. However, with earlier versions of RAW that did not include HB, SF, and SH, the slope was closer to 1.9 and there was a measurable improvement in accuracy as a result of using the regression coefficients.
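The regression step can be sketched with an ordinary least-squares fit of (R/PA)/Lg(R/PA) on RAW/LgRAW. The helper name fit_line is my own, and the points below are synthetic values placed exactly on the standard 2x - 1 line, not the 1990-2005 team data, so the fit simply recovers a slope of 2 and an intercept of -1.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic team ratios (RAW/LgRAW) and the run ratios the 2x - 1 rule implies.
raw_ratios = [0.90, 0.95, 1.00, 1.05, 1.10]
run_ratios = [2 * x - 1 for x in raw_ratios]

slope, intercept = fit_line(raw_ratios, run_ratios)
print(round(slope, 3), round(intercept, 3))
```

With real team data in place of the synthetic lists, the fitted slope and intercept would take the place of 2 and -1 in the EqR equation.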

The most important thing to understand about the structure of Equivalent Runs is that it is essentially a linear weights formula. It is difficult for some people to recognize this from just taking a glance at the formula. However, if you really take a look at the formula, you’ll see that the only source of non-linearity is the treatment of stolen bases and caught stealing. The denominator of RAW is PA + SB + CS. If SB and CS are both zero, then the RAW denominator cancels with the multiplication by PA, and the formula is 100% linear.
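One way to see this is to test additivity: a linear formula applied to two batting lines added together should equal the sum of the formula applied to each line separately. The sketch below does that with two made-up batting lines and the long-term league averages used in this piece; the function and variable names are mine.

```python
LG_RAW = 0.7706       # long-term average RAW quoted in this piece
LG_R_PER_PA = 0.1219  # long-term average R/PA quoted in this piece


def eqr(line):
    """EqR for a batting line given as a dict of counting stats."""
    pa = line["ab"] + line["w"] + line["hb"] + line["sh"] + line["sf"]
    num = (line["h"] + line["tb"] + 1.5 * (line["w"] + line["hb"] + line["sb"])
           + line["sh"] + line["sf"])
    den = pa + line["sb"] + line["cs"]
    return (2 * (num / den) / LG_RAW - 1) * pa * LG_R_PER_PA


def additivity_gap(a, b):
    """EqR of the combined line minus the sum of the separate EqR figures."""
    combined = {key: a[key] + b[key] for key in a}
    return eqr(combined) - (eqr(a) + eqr(b))


# Hypothetical batting lines (not real players).
line_a = dict(ab=520, h=150, tb=240, w=60, hb=5, sh=5, sf=5, sb=40, cs=10)
line_b = dict(ab=540, h=140, tb=200, w=50, hb=3, sh=2, sf=5, sb=0, cs=0)
line_a_no_steals = dict(line_a, sb=0, cs=0)

print(additivity_gap(line_a_no_steals, line_b))  # zero apart from rounding error
print(additivity_gap(line_a, line_b))            # clearly nonzero
```

With the steal attempts zeroed out, the gap vanishes; with them included, the combined line and the separate lines disagree.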

There are strong points for linear run estimators, and there are also strong points for non-linear run estimators. Which class of estimator you should use depends on what you are trying to measure, and both have their uses in sabermetrics. However, it makes no sense to have a formula that is dynamic only in its treatment of stolen base attempts. Of course the value of a SB or CS is dependent on the overall context in which they occur. So is the value of a single, a homer, a walk, or any other event.

In fairness, the impact of stolen base attempts on the coefficients is fairly small. It is not a source of major distortion in the formula. Of course, the fact that they are fairly small makes the decision to treat them as denominator quantities and thus make EqR much more complex than it needs to be all the more puzzling. Such a choice can seemingly only be justified by a blind pursuit of lowering RMSE at the expense of all other attributes of a run estimator (such as logic).

What are the intrinsic linear weights used by EqR, anyway? We can use calculus to find the exact values for any given set of input stats. Let’s define some terms to make the equations easier to write and follow. L is Lg(R/PA), X = LgRAW, m = slope (we’ll use 2, but generalize it in case some other value was to be used), b = intercept (again, generalized to -1), P = PA, N = RAW numerator, D = RAW denominator. Also, let p be the derivative of PA with respect to a given event (1 for any PA event, 0 for any non-PA event like a SB or CS), n = derivative of RAW numerator with respect to a given event, and d = derivative of RAW denominator with respect to a given event. Then, we can rewrite EqR as:

EqR = (m/X)*(N/D)*L*P + b*L*P

The derivative of this with respect to a given event is:

(m*L/X)*(P*(D*n - N*d)/(D^2) + (N/D)*p) + b*L*p
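The derivative can be checked against a brute-force finite difference, as sketched below. The (n, d, p) triples are read off the RAW definition; the totals N, D, and P come from a made-up batting line with 40 SB and 10 CS, and L and X are the long-term averages used in this piece.

```python
L = 0.1219        # Lg(R/PA)
X = 0.7706        # LgRAW
m, b = 2.0, -1.0  # slope and intercept

# (n, d, p): change in RAW numerator, RAW denominator, and PA per event.
EVENTS = {
    "S": (2, 1, 1), "D": (3, 1, 1), "T": (4, 1, 1), "HR": (5, 1, 1),
    "W/HB": (1.5, 1, 1), "SB": (1.5, 1, 0), "CS": (0, 1, 0),
    "SH/SF": (1, 1, 1), "out": (0, 1, 1),
}


def eqr(N, D, P):
    """EqR written in terms of the RAW numerator, RAW denominator, and PA."""
    return (m / X) * (N / D) * L * P + b * L * P


def intrinsic_weight(event, N, D, P):
    """Analytic derivative of EqR with respect to one event."""
    n, d, p = EVENTS[event]
    return (m * L / X) * (P * (D * n - N * d) / D**2 + (N / D) * p) + b * L * p


def numeric_weight(event, N, D, P, step=1e-6):
    """Finite-difference approximation of the same derivative."""
    n, d, p = EVENTS[event]
    return (eqr(N + n * step, D + d * step, P + p * step) - eqr(N, D, P)) / step


# Hypothetical totals: N = 557.5, D = 645 (including 50 steal attempts), P = 595.
for event in EVENTS:
    print(event, round(intrinsic_weight(event, 557.5, 645.0, 595.0), 3))
```

When D equals P (no stolen base attempts), the home run weight from this function no longer depends on N at all, which is the linearity property discussed earlier.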

We can use this equation to generate the linear weights for any given set of input stats. We’ll use the composite stats for 1990-2005 (excluding 1994) to get these weights:

EqR = .501S + .810D + 1.119T + 1.428HR + .347(W + HB) + .225SB - .238CS + .193(SH + SF) - .116(AB - H)

You can see that these are fairly reasonable linear weights. You can quibble about how optimal they actually are, but for the purposes of this article, we’ll just leave it at “they are in the ballpark”--there's nothing so far off as to be seriously distorting (as is the case for Basic RC, for instance). Thus, for all of the window dressing, EqR may as well be Estimated Runs Produced, Batting Runs, or any number of other linear weight estimators. In fact, if you apply the above formula to the teams in the aforementioned sample, and compare it to the “actual” EqR figure (using the long-term averages for L and X), the largest difference for any team is four runs--the 1991 Expos and the 1992 Angels.

To get a further handle on how the non-linearity rears its head, let’s consider Roger Maris’ 1961 season. I don’t have any special reason for choosing it, by the way. We will look at his EqR total (based on our fixed values for L and X) and the intrinsic weights with a few made-up SB/CS combinations.

The first column below gives Maris’ actual statistics. The second column has the intrinsic weights and EqR total based on those. The other columns present different, hypothetical SB/CS combinations with their intrinsic weights and EqR. For example “30/10” is 30 SB and 10 CS. The EqR figures at the bottom have removed the contribution of SB and CS--I am trying to show you the effect that different SB/CS combinations had on the weights, not the value of those stolen base attempts themselves. Again, the EqR figures at the bottom are EXCLUDING SB and CS; the different values for SB and CS have changed the weights for all of the events, and those new values were used to generate EqR--without the contribution of SB and CS. The differences between the various cases are due solely to the changes in weights brought about by considering SB and CS in the formula, not the run values of the SB and CS themselves:


Let me first note that for the column with the weights for Maris' actual performance (zero stolen base attempts), the weights are exactly the same as we would find for any player or team with zero stolen base attempts, given the same values for X and L. Algebraically, the formula for the weights of plate appearance events simplifies under those conditions to:

m/X*n*L + b*L

For example, Maris' homer weight is 1.460. The value of a homer by the above formula is:

2/.7706*5*.1219 + (-1)*.1219 = 1.460
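The same arithmetic extends to every plate appearance event. This is a small sketch using the fixed values of L and X from this piece; the dictionary of numerator coefficients is read off the RAW definition, and the simplified formula applies only when there are no stolen base attempts.

```python
L = 0.1219        # Lg(R/PA)
X = 0.7706        # LgRAW
m, b = 2.0, -1.0  # slope and intercept

# n = contribution of each plate appearance event to the RAW numerator.
NUMERATOR_COEFF = {
    "S": 2, "D": 3, "T": 4, "HR": 5,
    "W/HB": 1.5, "SH/SF": 1, "out": 0,
}

# Zero-steal intrinsic weights: m/X*n*L + b*L
weights = {event: m / X * n * L + b * L for event, n in NUMERATOR_COEFF.items()}

for event, weight in weights.items():
    print(event, round(weight, 3))
```

SB and CS are left out because they are not plate appearance events, so their weights need the full derivative rather than this shortcut.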

When looking at the other SB/CS combinations, the EqR stays within a 1.5 run range for all of the combinations. However, the weights for the various hit types move pretty wildly, even for the home run, which should be fairly stable. The out is also bizarrely affected.

This could all be easily fixed by using PA in the RAW denominator and making EqR a pure linear formula. Instead, you have something that:
1) is more complex
2) causes problems at the extremes, if not in actual results then in theory
3) looks very confusing to those who don’t have the patience to sort through it

The bottom line is that EqR is not a terrible run estimator; it is a terribly constructed run estimator. If someone presents you a pre-figured list of EqR, feel free to look at it and consider the results fairly reasonable. However, if you are going to pick a run estimator to use, take a 100% linear estimator or a dynamic team modeling method (like Theoretical Team Base Runs) instead of a metric that can’t decide what it wants to be and thus is a bizarre hybrid.

Then we move on to the matter of Equivalent Average. For some further exposition on the nature of my issue with EqA, I will refer you to this post, where I discuss criteria for evaluating a statistic. The third criterion I listed is that of comparability: how can two players, teams, etc. be compared with this statistic, such that the comparison itself tells you something meaningful? With a ratio? With a difference? Both? Neither?

For example, in the case of a not-too-useful metric like Batting Average, both the ratio and the difference have inherent meaning. Comparing a .300 hitter to a .250 hitter, the ratio of 1.2 means that the first player is 1.2 times more likely to get a hit in a given at bat than is the second (leaving aside the question of how useful a subset of plate appearances At Bats actually are). The difference of .050 means that the first player gets .05 more hits for each at bat than the second. Both comparisons are meaningful, even if they aren’t particularly insightful.

Equivalent Average is based on runs per out. Runs per out is meaningful for both types of comparisons. EqA does not stop with R/O, though; first it divides it by five. Dividing by five does nothing to change the R/O ratio. It does make the differential comparison less useful, as “runs divided by five per out” is not a meaningful baseball unit. However, it is directly linearly-related to a meaningful one, and thus can still be considered meaningful (and if you disagree, you can just multiply by five).

However, to finish off EqA, the (R/O/5) is raised to the .4 power. This is done so that it approximates the scale of Batting Average. In doing so, though, both ratio and differential comparisons are destroyed. Consider player A with .4 runs per out (about 10 per game) and player B with .2 (about 5). The true ratio is 2 and the difference is .2 runs per out.

However, when we convert to EqA, player A has a .364 and player B has a .276. What does the ratio of 1.32 mean? It doesn’t mean that player A has created 32% more runs per out. It doesn’t mean anything, unless you raise it to the 2.5 power. The difference is even more hopeless.

If someone gave me a list of EqA figures, the first thing I’d do would be to raise them to the 2.5 power and multiply by five to convert them back into runs per out. I don’t really want to get into whether having the figures on a BA scale or not is helpful; after all, that is a matter of preference. However, if you are going to make the scale conversion in a non-linear way, I believe it should be incumbent on you as the developer of the metric to make clear the issues involved in player comparisons. I do not believe that the average user of EqA has any idea of how to compare the ratios or differences in meaningful baseball terms.
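The conversion in both directions is short enough to sketch in code. The function names are mine; the formulas are just the (R/O/5)^.4 definition and its inverse, applied to the two players from the example above.

```python
def eqa_from_runs_per_out(runs_per_out):
    """EqA = (R/O / 5) ** .4"""
    return (runs_per_out / 5) ** 0.4


def runs_per_out_from_eqa(eqa):
    """Undo the scaling: raise to the 2.5 power and multiply by five."""
    return 5 * eqa ** 2.5


eqa_a = eqa_from_runs_per_out(0.4)
eqa_b = eqa_from_runs_per_out(0.2)

print(round(eqa_a, 3), round(eqa_b, 3))  # .364 and .276, as in the text
print(round(eqa_a / eqa_b, 2))           # 1.32, not the true ratio of 2
print(round((eqa_a / eqa_b) ** 2.5, 2))  # raising to the 2.5 power recovers 2
print(round(runs_per_out_from_eqa(eqa_a), 3))
```

Only after converting back to runs per out do the ratio and the difference regain a baseball meaning.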
