Sunday, March 28, 2010

Esoteric Ramblings About +

Recently, Sean Forman changed the formula for ERA+ used at Baseball-Reference. It turned out to be a fleeting alteration, as for now the old ERA+ is back. There's been a lot written about the properties of the two ERA+ formulas, and I have nothing new to add, but that's never stopped me before.

Everything that follows takes ERA and ERA+ for what they are. I'm not going to get into why I prefer total runs to earned runs, or the virtues of DIPS-based stats, or any other topic. Just how best to compare a pitcher's ERA to the league average, implicitly assuming that is something one is interested in doing.

A Brief, Trivial History of ERA+

As far as I can find, the earliest reference to an adjusted ERA in the literature occurs in Pete Palmer and John Thorn's seminal Hidden Game of Baseball, circa 1984. In The Hidden Game, "Normalized ERA" is calculated as LgERA/ERA*100 (there is a park-adjusted version as well, but I'll sidestep the issue of park effects for the remainder of the post as they just serve to muddy the water even more, and the concept of comparing ERA to LgERA stands whether park is accounted for or not. I'll also ignore the scalar of 100 for the rest of the post because it is cosmetic and does nothing to alter the properties of the method, provided that it is corrected for when appropriate).

Interestingly, at no point in the book (at least that I could find through skimming) does Palmer explicitly explain why he chose to calculate LgERA/ERA rather than ERA/LgERA. There is a general explanation that normalized statistics set the average at 100 and above-average performances receive marks greater than 100, but the case of ERA is not given special attention.

In their later work, Total Baseball, the statistic was called Adjusted ERA but the ERA+ abbreviation was used (at least as of Total Baseball III, published in 1994). In the ESPN Baseball Encyclopedia, a collaboration between Palmer and Gary Gillette, it remains Adjusted ERA but with a new abbreviation (AERA); the names of most of the derived stats changed, presumably to avoid any sort of issue between the two works.

In his Historical Baseball Abstract (1985), Bill James uses what he calls "Percentage of League" as his main rate stat for pitchers. Percentage of League is RA/LgRA. Obviously, it differs from ERA+ by using total runs allowed, but more relevant for the purpose of this discussion is that it puts the individual's average in the numerator. James does not discuss the properties of one form against the other.

Properties of ERA+

Treating LgERA as a constant (and it is from the backwards-looking perspective of the historical record), the formula for ERA+ is:

LgERA/ERA = LgERA/(ER*9/IP) = (LgERA*IP)/(ER*9)

The denominator of ERA is innings pitched; the denominator of ERA+ is earned runs. This is a point that many people have overlooked (embarrassingly, I made this mistake myself in 2007, although I had come to my senses by early 2008). The practical consequence is that if you are averaging ERA+ across samples, you cannot weight it by innings pitched--you must weight it by earned runs.

An example is hardly necessary given that this point has been repeated ad nauseam, but consider a pitcher with an ERA of 3.25 in one season and 4.00 in a second season, both achieved in leagues with LgERA = 4.50 and both in 200 innings. His ERA+ is 1.385 in year one and 1.125 in year two. The natural inclination is to simply average the two and produce an ERA+ of 1.255.

However, his ERA for the two seasons combined is 3.625 against a league average of 4.50, producing an ERA+ of 1.241. The discrepancy is because the denominator of ERA+ is not innings.

In order to accurately figure ERA+ for the two seasons combined from the single-season ERA+s, one has to weight them by earned runs. In that case (we can use ERA as a stand-in for earned runs since we've specified that innings = 200 in both cases):

(3.25*1.385 + 4.00*1.125)/(3.25 + 4.00) = 1.241
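Here is a minimal Python sketch of the two-season example. It assumes 200 IP in each season and LgERA = 4.50, as stated above. `era_plus` and the other names are illustrative and do not come from any existing library. The code computes the naive average, the earned-run-weighted average, and ERA+ from the pooled earned runs and innings.

```python
def era_plus(era, lg_era):
    """ERA+ without the cosmetic x100 scalar: LgERA/ERA."""
    return lg_era / era

LG_ERA = 4.50
IP = 200.0                  # innings in each season (same for both)
seasons = [3.25, 4.00]      # single-season ERAs

# Earned runs implied by each ERA: ER = ERA * IP / 9
ers = [era * IP / 9 for era in seasons]
plus = [era_plus(era, LG_ERA) for era in seasons]

# Equal weighting (identical to IP weighting here, since IP is equal)
naive = sum(plus) / len(plus)
# Weighting by earned runs, the denominator of ERA+
er_weighted = sum(p * er for p, er in zip(plus, ers)) / sum(ers)
# ERA+ computed directly from pooled ER and IP
combined = era_plus(sum(ers) * 9 / (IP * len(seasons)), LG_ERA)

print(round(naive, 3), round(er_weighted, 3), round(combined, 3))
```

The weighted and pooled figures agree (about 1.241). The naive average (about 1.255) does not.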

Admittedly, the discrepancy in a case like this is quite small. Others might argue that weighting by earned runs is not that big of a hassle. However, once park factors are introduced into the mix, the task becomes trickier, as it's necessary to park-adjust the earned run totals being used to weight ERA+.

A common way to explain ERA+ figures in English is to say that an ERA+ of 1.07 means a pitcher is "7% better than average", or an ERA+ of .84 means that he is "16% worse than average". This usage doesn't really hold up mathematically, though. In a 4.5 LgERA context, an ERA+ of 1.07 corresponds to a 4.206 ERA. But 4.206 is 93.47% of 4.5, or 6.53% less. What ERA+ actually says, in English, is that the League ERA was 7% higher than the pitcher's ERA.
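This quick Python check of the 1.07 example assumes LgERA = 4.50. It starts from the unrounded ERA, so the last digit of each percentage can differ slightly from figures based on the rounded 4.206.

```python
LG_ERA = 4.50
era_plus = 1.07

era = LG_ERA / era_plus                 # ERA implied by a 1.07 ERA+
pct_of_league = era / LG_ERA            # pitcher's ERA as a share of LgERA
pct_less = 1 - pct_of_league            # how much lower the pitcher's ERA is
pct_league_higher = LG_ERA / era - 1    # how much higher LgERA is than the pitcher's ERA

print(f"ERA {era:.3f}: {pct_of_league:.2%} of league, {pct_less:.2%} less; "
      f"league ERA {pct_league_higher:.0%} higher")
```

Only the last quantity, the league's ERA relative to the pitcher's, comes out to 7%.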

Using "better" and "worse" rather than the mathematically definable "less" and "more" is one of the causes of all the confusion. What does it mean to be 50% better with a 1.50 ERA+? 50% fewer runs (no, that's not it--that would be an ERA+ of 2.00)? 50% more wins than average? And if one pitcher is 50% better (1.50 ERA+) and another is 50% worse (.50 ERA+), then they should add up to be average, right? As was already demonstrated, they don't.
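The 1.50/.50 case can be made concrete with a small Python sketch. It assumes, hypothetically, that both pitchers throw the same number of innings in a 4.50 LgERA context. Pooling their runs and innings gives a combined ERA of 6.00, which is an ERA+ of .75 rather than 1.00.

```python
LG_ERA = 4.50
IP = 200.0  # hypothetical: equal innings for both pitchers

def era_from_plus(era_plus, lg_era):
    """Invert ERA+ = LgERA/ERA to recover the ERA."""
    return lg_era / era_plus

eras = [era_from_plus(1.50, LG_ERA), era_from_plus(0.50, LG_ERA)]  # 3.00 and 9.00
pooled_era = sum(era * IP for era in eras) / (IP * len(eras))      # innings-weighted
pooled_plus = LG_ERA / pooled_era

print(pooled_era, pooled_plus)  # 6.0 0.75
```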

I'm repeating the takeaway from the second-to-last paragraph now, but I think it's worthwhile. ERA+ does not tell you that a pitcher's ERA was X% less or more than the league's ERA. It tells you that the league's ERA was X% less or more than the pitcher's ERA. If you want to claim that the league's ERA being X% more than an individual's ERA means that the pitcher was X% better than average, you can try, but it will require a very odd and very convoluted definition of "better" to hold up as logically consistent.

The end result is that while ERA+ ratios are meaningful, ERA+ differentials are not. Let's return to the two pitchers above, one with a 4.206 ERA and 1.07 ERA+ and the other with a 5.357 ERA and .84 ERA+ in a 4.50 LgERA context. The ratio between their ERAs is 5.357/4.206 = 1.274. The ratio between their ERA+ is also 1.274 (although we have to take Pitcher A/Pitcher B in this case because of the inversion of ERA+)--1.07/.84 = 1.274.

However, there's nothing you can do with (1.07-.84). Simple mathematical operations to compare the differences between ERA+s are impossible because there are no common denominators. (One might object at this point and say that because each batter has a different number of PA, OBA doesn't have a common denominator. That's not true, though, since anything per PA is a rate of batter performance per unit of playing time. While the PA for any two batters may not be equal, OBA remains the proportion of PA in which the batter gets on base. Using ER as a denominator, on the other hand, leaves one with a ratio rather than a rate--a ratio that, as Colin Wyers points out, is outs to runs rather than the much more useful runs per out that ERA itself returns).
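The next Python sketch uses the two pitchers above (ERA+ of 1.07 and .84, LgERA = 4.50). Ratios survive the inversion. The ERA+ gap, scaled by LgERA, does not reproduce the ERA gap. The variable names are illustrative.

```python
LG_ERA = 4.50
era_a, era_b = LG_ERA / 1.07, LG_ERA / 0.84   # roughly 4.206 and 5.357
plus_a, plus_b = LG_ERA / era_a, LG_ERA / era_b

# Ratio: ERA+ ratio (A/B) equals ERA ratio (B/A) because ERA+ is inverted
print(round(plus_a / plus_b, 3), round(era_b / era_a, 3))   # 1.274 1.274

# Differential: scaling the ERA+ gap by LgERA does not recover the ERA gap
print(round((plus_a - plus_b) * LG_ERA, 3), round(era_b - era_a, 3))
```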

There is one positive quality of ERA+ worth noting. Suppose one wishes to convert ERA to an estimated W% for an individual pitcher, ignoring factors like IP/G that make the team's W% in the pitcher's games differ from his individual W%. ERA+ provides a run ratio that can be plugged into a Pythagorean estimator. An individual W% can easily be figured as (ERA+)^2/((ERA+)^2 + 1). (ERA+)^2, therefore, is an estimate of individual win ratio.
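The Pythagorean conversion can be sketched in Python as follows. It assumes the exponent of 2 used in the text. The function name and the `exponent` parameter are illustrative additions.

```python
def pyth_wpct_from_era_plus(era_plus, exponent=2.0):
    """Individual W% estimate treating ERA+ (LgERA/ERA) as a run ratio."""
    win_ratio = era_plus ** exponent   # (ERA+)^2 estimates the win ratio
    return win_ratio / (win_ratio + 1)

print(round(pyth_wpct_from_era_plus(1.50), 3))  # 0.692
print(pyth_wpct_from_era_plus(1.00))            # 0.5
```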

Proponents of ERA+ have pointed out that ERA+ has a wider spread of observed values than aERA (ERA/LgERA, discussed below) does. This is true, at least for above-average pitchers, but I fail to see why that's a positive if the numbers themselves don't mean what people often interpret them to mean. We could increase the spread even more by, say, cubing ERA+, but the results would have no mathematical or baseball meanings attached. ERA+ also depresses differences for pitchers with very bad ERAs, but of course this is brushed off because most discussion of ranking players focuses on the greats rather than the scrubs.

Runs allowed has a floor at zero, so a pitcher can't possibly allow more than 100% fewer runs than average. ERA+, however, grows without bound as ERA approaches zero.

Moving into the realm of pure opinion at this point:

Pros of ERA+ construction:

1. Inverted scale makes larger numbers better, which is aesthetically pleasing to some
2. Maintains ratio comparability
3. Corresponds to Pythagorean win ratio
4. People have become familiar with the scale

Cons of ERA+ construction:

1. Inverted scale wreaks havoc--makes earned runs the denominator quantity, which makes averaging confusing, and makes differentials between ERA+s useless
2. Doesn't offer a coherent definition of "better" or "worse" to be put into English, and "less" or "more" cannot be used due to mathematical properties

I did ERA+ some favors there by splitting pros up as much as possible, while the inverted scale drawback could have been expanded into several associated complaints.

Properties of ERA/LgERA

Personally, I use RA/LgRA when I need a statistic in this family, and I just call it Adjusted RA. "Fancy Pants Handle" on BTF coined the name "ERA-" for ERA/LgERA, which is clever, but I'll just call it aERA for this post.

One of the most basic properties of aERA is that the lower it is, the better. The convention with adjusted statistics dating back to Palmer has been to make higher better, but I don't see why that needs to be the case. Lower is better for ERA and many other pitching statistics in their unadjusted form, and even the most casual of fans understands this.

An aERA of .75 means that the pitcher allowed runs at 75% of the league average, or that he allowed 25% fewer runs per inning than the league average. Unlike similar statements made with ERA+, these kinds of statements are mathematically accurate.

Ratio comparisons between pitchers remain workable. Going back to pitcher A with a 4.206 ERA and pitcher B with a 5.357 ERA in a 4.50 LgERA context, pitcher A's aERA is .9347 and pitcher B's is 1.1904. The ratio, .9347/1.1904 = .785, is the same as the ratio between their ERAs (4.206/5.357 = .785).

Where aERA distinguishes itself from ERA+ is the differential comparison, which is possible with the former but not the latter. The difference between the two pitchers' ERAs is 1.151 runs; the difference between their aERAs multiplied by the LgERA by definition is also 1.151 runs ((1.1904-.9347)*4.5 = 1.151).
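This Python sketch uses the same two pitchers (4.206 and 5.357 ERA, LgERA = 4.50). It shows that aERA preserves the ratio between ERAs, and, once multiplied by LgERA, the difference as well. The `aera` helper is a hypothetical name.

```python
LG_ERA = 4.50
era_a, era_b = 4.206, 5.357

def aera(era, lg_era):
    """Adjusted ERA with the pitcher's ERA in the numerator: ERA/LgERA."""
    return era / lg_era

a, b = aera(era_a, LG_ERA), aera(era_b, LG_ERA)

print(round(a / b, 3), round(era_a / era_b, 3))              # 0.785 0.785
print(round((b - a) * LG_ERA, 3), round(era_b - era_a, 3))   # 1.151 1.151
```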

aERA keeps innings pitched as the denominator, since aERA = ER*9/(IP*LgERA), so weighted averaging is achieved by using innings, which is far more natural than using earned runs.
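Here is a minimal illustration of innings weighting. It uses two hypothetical seasons with unequal innings, and the ER and IP figures are made up for illustration. Weighting each season's aERA by IP reproduces the aERA computed from the pooled totals; both come out to .775.

```python
LG_ERA = 4.50
# Hypothetical (ER, IP) for two seasons with unequal innings
seasons = [(60, 180.0), (95, 220.0)]

def aera(er, ip, lg_era):
    """aERA = ER*9/(IP*LgERA); innings pitched is the denominator."""
    return er * 9 / (ip * lg_era)

total_ip = sum(ip for _, ip in seasons)
ip_weighted = sum(aera(er, ip, LG_ERA) * ip for er, ip in seasons) / total_ip
pooled = aera(sum(er for er, _ in seasons), total_ip, LG_ERA)

print(round(ip_weighted, 4), round(pooled, 4))
```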

The only mathematical property on which aERA arguably loses to ERA+ by my reckoning is conversion to estimated W%. ERA+ can be seen as a stand-in for run ratio, which when squared equals win ratio (using Pythagorean). aERA, on the other hand, serves as a stand-in for runs allowed ratio, and so W% = 1/(1 + aERA^2). Squaring aERA on its own produces not an approximate win ratio but the approximate win ratio of one's opponents, or the approximate loss ratio of a pitcher (in other words, an aERA of .5 means that a pitcher's opponents are expected to have a win ratio of .25, or that he will have a .25:1 ratio of losses to opponents' losses).
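Below is a small Python sketch of the aERA conversion, assuming exponent 2 as in the text. Because aERA = 1/ERA+, it gives the same W% as the ERA+ route. The function names are illustrative.

```python
def wpct_from_aera(aera, exponent=2.0):
    """W% = 1/(1 + aERA^z); aERA^z estimates the pitcher's loss ratio (L/W)."""
    return 1 / (1 + aera ** exponent)

def wpct_from_era_plus(era_plus, exponent=2.0):
    """W% = ERA+^z/(ERA+^z + 1); ERA+^z estimates the win ratio (W/L)."""
    return era_plus ** exponent / (era_plus ** exponent + 1)

print(wpct_from_aera(0.5))  # 0.8
```

An aERA of .5 yields a W% of .800, which matches a 2.00 ERA+ (4/(4 + 1)).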

Properties of "ERA#"

A number of sabermetricians have taken issue with some or all of the questionable properties of ERA+ described above. In response to one of Tango Tiger's posts, Guy suggested an alternative to aERA that would retain the "higher is better" property of ERA+ but also retain some of the good properties of aERA. His formula was 2 - ERA/LgERA, which of course is equal to 2 - aERA.

This is the version that Forman briefly implemented on Baseball-Reference. Following the lead of BTF poster Greg Pope, some folks in the discussion there have taken to calling Guy's version ERA# to distinguish it from ERA+, and I'll use that naming convention here.

By subtracting aERA from two, ERA# caps itself on the high end at two. It also allows for negative numbers in cases where a pitcher's ERA is more than double the league average (ERA+ below .5).

If you write it with a common denominator, ERA# = (2*LgERA - ERA)/LgERA. It could also be written as ERA# = 2 - ER*9/(IP*LgERA), which demonstrates that innings are the denominator as in aERA.
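The three equivalent ways of writing ERA# can be sketched in Python. The function names are hypothetical. An ERA exactly double the league's gives an ERA# of zero, triple gives -1, and a zero ERA hits the cap of two.

```python
def era_sharp(era, lg_era):
    """Guy's ERA#: 2 - ERA/LgERA, i.e. 2 - aERA."""
    return 2 - era / lg_era

def era_sharp_common_denominator(era, lg_era):
    """Same quantity written as (2*LgERA - ERA)/LgERA."""
    return (2 * lg_era - era) / lg_era

def era_sharp_from_counts(er, ip, lg_era):
    """Same quantity with innings as the denominator: 2 - ER*9/(IP*LgERA)."""
    return 2 - er * 9 / (ip * lg_era)

print(era_sharp(9.0, 4.5), era_sharp(13.5, 4.5))  # 0.0 -1.0
```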

Like ERA+, ERA# cannot sustain both ratio and differential comparisons between ERAs the way aERA does. ERA+ sacrifices differential comparisons; ERA# sacrifices ratio comparisons. An ERA# of 1.15 means that the pitcher allowed 15% fewer runs per inning than average; an ERA# of -1 tells us that a pitcher allowed 200% more runs per inning than average. This property holds for all values of ERA#.

A natural reaction might be to wonder how that can be when ERA# is capped at two, but the cap at two is natural--one cannot possibly allow more than 100% fewer runs per inning than the league average. Negative runs may exist in our run estimators, but they do not exist in reality.

However, ERA# ratios are not comparable across pitchers. To illustrate, allow me to return to pitcher A with a 4.206 ERA and pitcher B with a 5.357 ERA, both in a 4.5 LgERA context. Pitcher A's ERA# is 1.0653, and Pitcher B's ERA# is .8096. The difference between their ERAs is 1.15 runs, which can be found by taking (1.0653-.8096)*4.5 = 1.15. But the ratio between their ERA#s is 1.0653/.8096 = 1.316, which does not correspond to the ratio between their ERAs.
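This Python sketch uses the same two pitchers (4.206 and 5.357 ERA, LgERA = 4.50) and assumes Guy's formula 2 - ERA/LgERA. The ERA# gap, scaled by LgERA, matches the ERA gap. The ERA# ratio does not match the ERA ratio.

```python
LG_ERA = 4.50
era_a, era_b = 4.206, 5.357

def era_sharp(era, lg_era):
    """ERA# = 2 - ERA/LgERA."""
    return 2 - era / lg_era

sa, sb = era_sharp(era_a, LG_ERA), era_sharp(era_b, LG_ERA)

# Differential: preserved once multiplied by LgERA
print(round((sa - sb) * LG_ERA, 3), round(era_b - era_a, 3))  # 1.151 1.151
# Ratio: not preserved
print(round(sa / sb, 3), round(era_b / era_a, 3))              # 1.316 1.274
```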

Since ERA# does not retain ratio comparability, it cannot be plugged directly into Pythagorean to produce a W% estimate, which could be considered a drawback. However, as its proponents have pointed out, it does have a fairly solid relationship with 2*W% without making any adjustments at all.

Why does it work out this way? One way to demonstrate it is to use Bill Kross' W% estimator, which mathematically is equivalent to a linearization of Pythagorean with exponent 2 at win ratio = run ratio = 1 (in other words, a perfectly average team). I have written about his equation at (considerable) length before. It is:

W% = R/(2*RA) if R < RA, W% = 1 - RA/(2*R) if R > RA
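Kross' estimator is easy to express in Python. This sketch compares it against Pythagorean with exponent 2; the function names and run totals are illustrative. Because of the two equations, a team's estimate and its opponents' estimate always sum to one.

```python
def kross_wpct(r, ra):
    """Bill Kross' two-equation W% estimator, as given in the text."""
    if r < ra:
        return r / (2 * ra)
    return 1 - ra / (2 * r)   # also covers r == ra, giving .500

def pyth_wpct(r, ra, exponent=2.0):
    """Pythagorean W% with a fixed exponent (2 by default)."""
    return r ** exponent / (r ** exponent + ra ** exponent)

for r, ra in [(4.0, 5.0), (5.0, 4.5), (4.5, 4.5)]:
    print(r, ra, round(kross_wpct(r, ra), 3), round(pyth_wpct(r, ra), 3))
```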

The second equation, for R > RA, is the one that can be directly tied to ERA#. If you think of ERA as a stand-in for runs allowed and LgERA as a stand-in for runs scored, then 2*W% = ERA# using Kross' method (*). Of course, this is only true if R > RA, which for our purposes here is equivalent to ERA < LgERA. So for above-average pitchers, ERA# can be viewed as an estimate of 2*W%. For below-average pitchers, it's still pretty close. Kross' two-equation solution ensures that W% estimates for a team and its opponents sum to one, but in normal situations, using one equation or the other does not cause massive distortions, so ERA# still approximates 2*W%. Of course, if someone wanted their adjusted ERA to track 2*W% for some reason, they could use two equations just as Kross does. For ERA < LgERA, you would use ERA# as discussed. For ERA > LgERA, you would mimic 2*(R/(2*RA)), which simplifies to R/RA, which, using our stand-ins, simplifies to...ERA+.
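Here is a Python sketch of a hypothetical two-equation adjusted ERA that tracks 2*W% under Kross. LgERA stands in for runs scored and ERA for runs allowed; the 4.50 league and the sample ERAs are illustrative. For ERA < LgERA the result matches ERA#, and for ERA > LgERA it matches ERA+.

```python
def two_w_pct_kross(era, lg_era):
    """2*W% under Kross, with LgERA standing in for R and ERA for RA."""
    r, ra = lg_era, era
    w = r / (2 * ra) if r < ra else 1 - ra / (2 * r)
    return 2 * w

LG_ERA = 4.50
for era in (3.60, 4.50, 5.40):
    era_sharp = 2 - era / LG_ERA   # ERA#
    era_plus = LG_ERA / era        # ERA+
    print(era, round(two_w_pct_kross(era, LG_ERA), 4),
          round(era_sharp, 4), round(era_plus, 4))
```

For a 3.60 ERA, 2*W% and ERA# are both 1.2. For a 5.40 ERA, 2*W% and ERA+ are both about .833.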

100% Opinion (as opposed to ~ 75% opinion in the rest of the post)

The major objections to the ERA+ change, as best as I can summarize them, are:

1. Familiarity with the existing ERA+

People have grown familiar with ERA+ as it is. They have a feel for how the all-time leaders stack up, perhaps they have explained it to their friends, etc. As a hard-core sabermetric sort, I can't really relate to this.

2. ERA+ distribution is comparable to OPS+ distribution

I'm not going to take the time to get into this right now, as this post is long enough as is, but these two statistics measure different things to begin with. I'm not sure why their scales should be expected to match or what it means if they do, without more context, and when you start adding that context you really shouldn't be using OPS+ anyway.

3. The new formula is less intuitive

I will admit that there is some validity in this objection, as it is marginally easier to figure LgERA/ERA than it is to figure 2 - ERA/LgERA, and the former is more intuitive. The problem is that once many people start trying to explain what ERA+ means, they demonstrate that they either don't really understand what it means or that they don't have a solid mathematical foundation for the claims they make. People are in the habit of saying that a 200 ERA+ means "twice as good as average", which is logically ambiguous and even worse mathematically. Very few people say that what it means is that the League ERA was twice the pitcher's ERA (or that the pitcher's ERA was half the league's ERA). What good is ease of computation if it comes at the price of easy explanation?

My own personal preference is for aERA, since I'm not bothered by lower being better and because, when possible, I prefer statistics that offer both ratio and differential comparability. However, if the adjusted ERA standard must display above-average as >100, then I prefer ERA#.

What bothers me most about the debate I've seen is not that people can't express what ERA+ really means or that fuzzy terminology is used, but the impression I get that some people have placed certain statistics on a pedestal and never want to see them change, even if the changes are sound. When sabermetric formulas are treated as inviolable constants, the entire purpose of sabermetrics goes out the window.

I don't mean to suggest that any alteration to a sabermetric statistic, no matter how marginal the benefit, should be implemented immediately and without discussion. That too would go against the spirit of the field. But it's important to remember that sabermetrics is always moving forward (or at least striving to move forward), and from time to time new and better methods will be developed. If and when there is a backlash to those newer methods simply because they are new, then ERA+ and OPS+ and the other statistics have simply become a new-age version of BA, HR, RBI, Wins--numbers celebrated for their own sake, rather than for what they tell us about the game.

Mathematical Miscellany

ERA+ = 1/aERA and aERA = 1/ERA+
ERA# = 2 - aERA and aERA = 2 - ERA#
ERA+ = 1/(2 - ERA#) and ERA# = 2 - 1/ERA+

(*) 2*W% = 2*(1 - RA/(2*R)) = 2 - 2*RA/(2*R) = 2 - RA/R ~= ERA#
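The conversions above can be written as small Python functions, here with a round-trip check; the function names are illustrative. ERA+ is undefined for a zero ERA, which corresponds to the ERA# cap of two.

```python
def aera_from_plus(era_plus):
    """aERA = 1/ERA+."""
    return 1 / era_plus

def sharp_from_aera(aera):
    """ERA# = 2 - aERA."""
    return 2 - aera

def plus_from_sharp(era_sharp):
    """ERA+ = 1/(2 - ERA#); undefined at ERA# = 2 (a zero ERA)."""
    return 1 / (2 - era_sharp)

def sharp_from_plus(era_plus):
    """ERA# = 2 - 1/ERA+."""
    return 2 - 1 / era_plus

# Round-trip check on a few ERA+ values
for era_plus in (0.75, 1.0, 1.07, 1.5):
    era_sharp = sharp_from_aera(aera_from_plus(era_plus))
    assert abs(plus_from_sharp(era_sharp) - era_plus) < 1e-12
    assert abs(sharp_from_plus(era_plus) - era_sharp) < 1e-12
```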

5 comments:

  1. Good summary of ERA+ vs ERA# (and aERA). I like the way you presented the mathematical implications of each.

    For this part:

    "W% = R/(2*RA) if R > RA, W% = 1 - RA/(2*R) if R < RA The second equation, for RA < R, is the one that can be directly tied to ERA#."

    I think there is a typo where the > and < signs are reversed. R/(2*RA) is for when a team is outscored (R < RA), and 1 - RA/(2*R) is for when a team outscores its opponents (R > RA). Then, in the next sentence, you have it right: the second equation (1 - RA/(2*R)) is ERA#/2, and it is the equation for RA < R, which is the opposite of what the first sentence says. But the next paragraph says ERA# ~ 2*W% for below average pitchers: that should be for above average pitchers, when RA < R.

    That only really matters once you start to get close to the negative ERA# range, though, like you noted with it working pretty well in normal situations either way. The errors for the estimation vs. pythagorean record (with an exponent of 2) are pretty symmetrical around average until you get to the extreme ranges (i.e. the error in the estimation for a 90 ERA# is about the same as for a 110 ERA#, and the error for a 70 ERA# is about the same as for a 130 ERA#).

    There is another quirk of the ERA#/2 trick that causes it to skew toward working better for above-average pitchers, though. When you use PythagenPat to estimate W%, then the exponent will generally be smaller than 2, and that will change the linear estimate. Using the estimate for z=2 eliminates the symmetry of errors (against PythagenPat) around average, and the estimation becomes better for above average pitchers and worse for below average pitchers. So it is probably worth noting that in practice, the ERA#/2 trick actually works better for above average pitchers, although it's because using the linear estimate for z=2 skews the errors, not so much because the run ratio is greater than or less than 1.

  2. Kincaid, you're absolutely right about me reversing signs on the equation. Thanks a bunch for pointing that out--I've now corrected both the equations and the accompanying text.

    I didn't even consider the ramifications when using a floating exponent, but you're right, that does skew things even more in favor of above-average pitchers since standard Pyth on which the approximation is based over-estimates their W%s.

  3. 1. There's no e in Forman, in this case at least

    2. The problem with changing the definition of statistics is that you end up causing arguments that are the result of people using the two different definitions. I remember having a long argument with Michael Wolverton on Usenet in the mid-90s that happened only because we were using ERA+ figures from different editions of Total Baseball. (That argument directly resulted in the invention of SNWL, so it turned out to be a good thing). Regardless of the merits of the versions of the stats, I'd like to avoid confusion. Maybe the new version could be called ERA* instead of ERA+

    (P.S. - You are correct that AERA was used in the ESPN Baseball Encyclopedia for legal reasons only. Of course, OPS+ and ERA+ were listed under a variety of names in the various editions of Total Baseball; it wasn't until I came in for the 7th edition that we settled on ERA+ and OPS+ there.)

  4. Kincaid and Greg have demonstrated that this post needed some more proof-reading. At least I got Sean's name right the first time.

    I didn't touch on the name confusion issue, which is certainly a valid point. It's probably true that the ship has sailed on the use of ERA+ as a name for anything other than LgERA/ERA.

  5. Actually, if individual ERA is

    ER*9/IP

    And league ERA is

    LgER*9/LgIP,

    you get:

    ERA+ = (LgER/ER)*(IP/LgIP)

    That is, one way to treat the "weighting" is as an individual's share of IP. The other way to treat the weighting is as the inverse of the individual's share of ER. (As I think of it, we should actually compute the league ERA as

    LgERA = (LgER*9 -ER*9)/(LgIP - IP)

    I'm not sure I want to try to decompose that into league and individual effects...

