Sunday, March 28, 2010

Esoteric Ramblings About +

Recently, Sean Forman changed the formula for ERA+ used at Baseball-Reference. It turned out to be a fleeting alteration, as for now the old ERA+ is back. There's been a lot written about the properties of the two ERA+ formulas, and I have nothing new to add, but that's never stopped me before.

Everything that follows takes ERA and ERA+ for what they are. I'm not going to get into why I prefer total runs to earned runs, or the virtues of DIPS-based stats, or any other topic. Just how best to compare a pitcher's ERA to the league average, implicitly assuming that is something one is interested in doing.

A Brief, Trivial History of ERA+

As far as I can find, the earliest reference to an adjusted ERA in the literature occurs in Pete Palmer and John Thorn's seminal Hidden Game of Baseball, circa 1984. In The Hidden Game, "Normalized ERA" is calculated as LgERA/ERA*100 (there is a park-adjusted version as well, but I'll sidestep the issue of park effects for the remainder of the post as they just serve to muddy the water even more, and the concept of comparing ERA to LgERA stands whether park is accounted for or not. I'll also ignore the scalar of 100 for the rest of the post because it is cosmetic and does nothing to alter the properties of the method, provided that it is corrected for when appropriate).

Interestingly, at no point in the book (at least that I could find through skimming) does Palmer explicitly explain why he chose to calculate LgERA/ERA rather than ERA/LgERA. There is a general explanation that normalized statistics set the average at 100 and above-average performances receive marks greater than 100, but the case of ERA is not given special attention.

In their later work, Total Baseball, the statistic was called Adjusted ERA but the ERA+ abbreviation was used (at least as of Total Baseball III, published in 1994). In the ESPN Baseball Encyclopedia, a collaboration between Palmer and Gary Gillette, it remains Adjusted ERA but with a new abbreviation (AERA); the names of most of the derived stats changed, presumably to avoid any sort of issue between the two works.

In his Historical Baseball Abstract (1985), Bill James uses what he calls "Percentage of League" as his main rate stat for pitchers. Percentage of League is RA/LgRA. Obviously, it differs from ERA+ by using total runs allowed, but more relevant for the purpose of this discussion is that it puts the individual's average in the numerator. James does not discuss the properties of one form against the other.

Properties of ERA+

Treating LgERA as a constant (and it is from the backwards-looking perspective of the historical record), the formula for ERA+ is:

LgERA/ERA = LgERA/(ER*9/IP) = (LgERA*IP)/(ER*9)

The denominator of ERA is innings pitched; the denominator of ERA+ is earned runs. This is a point that many people have overlooked (embarrassingly, I made this mistake myself in 2007, although I had come to my senses by early 2008). The practical consequence is that if you are averaging ERA+ across samples, you cannot weight it by innings pitched--you must weight it by earned runs.

An example is hardly necessary given that this point has been repeated ad nauseam, but consider a pitcher with an ERA of 3.25 in one season and 4.00 in a second season, both achieved in leagues with LgERA = 4.50 and both in 200 innings. His ERA+ is 1.385 in year one and 1.125 in year two. The natural inclination is to simply average the two and produce an ERA+ of 1.255.

However, his ERA for the two seasons combined is 3.625 against a league average of 4.50, producing an ERA+ of 1.241. The discrepancy is because the denominator of ERA+ is not innings.

In order to accurately figure ERA+ for the two seasons combined from the single-season ERA+s, one has to weight them by earned runs. In that case (we can use ERA as a stand-in for earned runs since we've specified that innings = 200 in both cases):

(3.25*1.385 + 4.00*1.125)/(3.25 + 4.00) = 1.241
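For anyone who wants to check the arithmetic, here is a minimal Python sketch of the two-season example. It treats LgERA as a fixed 4.50 and stores each season as (ER, IP). The helper names `era` and `era_plus` are just illustrative, and the x100 scalar is dropped as above. It compares three things: the innings-weighted average of the single-season ERA+s, the earned-run-weighted average, and ERA+ computed from the combined totals.

```python
LG_ERA = 4.50  # league ERA, treated as a fixed constant


def era(er, ip):
    """Earned runs per nine innings."""
    return er * 9 / ip


def era_plus(pitcher_era, lg_era=LG_ERA):
    """ERA+ without the cosmetic x100 scalar: LgERA / ERA."""
    return lg_era / pitcher_era


# The two 200-inning seasons from the example, stored as (ER, IP)
seasons = [(3.25 * 200 / 9, 200.0), (4.00 * 200 / 9, 200.0)]
plus = [era_plus(era(er, ip)) for er, ip in seasons]

# Innings-weighted average (a simple mean here, since IP are equal)
total_ip = sum(ip for _, ip in seasons)
ip_weighted = sum(p * ip for p, (_, ip) in zip(plus, seasons)) / total_ip

# Earned-run-weighted average of the single-season ERA+s
total_er = sum(er for er, _ in seasons)
er_weighted = sum(p * er for p, (er, _) in zip(plus, seasons)) / total_er

# ERA+ computed directly from the combined totals
combined = era_plus(era(total_er, total_ip))

print(f"IP-weighted {ip_weighted:.4f}, ER-weighted {er_weighted:.4f}, "
      f"combined {combined:.4f}")
# prints: IP-weighted 1.2548, ER-weighted 1.2414, combined 1.2414
```

Only the earned-run weighting matches the combined figure, because ER is the denominator of ERA+.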

Admittedly, the discrepancy in a case like this is quite small. Others might argue that weighting by earned runs is not that big of a hassle. However, once park factors are introduced into the mix, the task becomes trickier, as it's necessary to park-adjust the earned run totals being used to weight ERA+.

A common way to explain ERA+ figures in English is to say that an ERA+ of 1.07 means a pitcher is "7% better than average", or an ERA+ of .84 means that he is "16% worse than average". This usage doesn't really hold up mathematically, though. In a 4.5 LgERA context, an ERA+ of 1.07 corresponds to a 4.206 ERA. But 4.206 is 93.47% of 4.5, or 6.53% less. What ERA+ actually says, in English, is that the League ERA was 7% higher than the pitcher's ERA.
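The same translation as a short Python sketch (variable names are mine). It keeps the ERA unrounded, so it reports 93.46% and 6.54% rather than the 93.47% and 6.53% you get from the rounded 4.206. The point is unchanged: the pitcher's ERA is about 6.5% below the league's, while the league's ERA is exactly 7% above the pitcher's.

```python
lg_era = 4.50
era_plus = 1.07

pitcher_era = lg_era / era_plus          # ERA implied by a 1.07 ERA+ (about 4.206)
share_of_league = pitcher_era / lg_era   # pitcher's ERA as a fraction of LgERA
pct_fewer = 1 - share_of_league          # how much lower the pitcher's ERA really is
pct_league_higher = lg_era / pitcher_era - 1  # what ERA+ actually measures

print(f"ERA {pitcher_era:.3f} is {share_of_league:.2%} of LgERA, "
      f"{pct_fewer:.2%} lower")
print(f"LgERA is {pct_league_higher:.2%} higher than the pitcher's ERA")
```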

Using "better" and "worse" rather than the mathematically definable "less" and "more" is one of the causes of all the confusion. What does it mean to be 50% better with a 1.50 ERA+? 50% fewer runs (no, that's not it--that would be an ERA+ of 2.00)? 50% more wins than average? And if one pitcher is 50% better (1.50 ERA+) and another is 50% worse (.50 ERA+), then they should add up to be average, right? They don't: in a 4.50 LgERA context those pitchers have ERAs of 3.00 and 9.00, which over equal innings average to 6.00, well above the league's 4.50.

I'm repeating the takeaway from the second-to-last paragraph now, but I think it's worthwhile. ERA+ does not tell you that a pitcher's ERA was X% less or more than the league's ERA. It tells you that the league's ERA was X% less or more than the pitcher's ERA. If you want to claim that the league's ERA being X% more than an individual's ERA means that the pitcher was X% better than average, you can try, but it will require a very odd and very convoluted definition of "better" to hold up as logically consistent.

The end result is that while ERA+ ratios are meaningful, ERA+ differentials are not. Let's return to the two pitchers above, one with a 4.206 ERA and 1.07 ERA+ and the other with a 5.357 ERA and .84 ERA+ in a 4.50 LgERA context. The ratio between their ERAs is 5.357/4.206 = 1.274. The ratio between their ERA+ is also 1.274 (although we have to take Pitcher A/Pitcher B in this case because of the inversion of ERA+)--1.07/.84 = 1.274.

However, there's nothing you can do with (1.07-.84). Simple mathematical operations to compare the differences between ERA+s are impossible because there are no common denominators. (One might object at this point and say that because each batter has a different number of PA, OBA doesn't have a common denominator. That's not true, though, since anything per PA is a rate of batter performance per unit of playing time. While the PA for any two batters may not be equal, OBA remains the proportion of PA in which the batter gets on base. Using ER as a denominator, on the other hand, leaves one with a ratio rather than a rate--a ratio that, as Colin Wyers points out, is outs to runs rather than the much more useful runs per out that ERA itself returns).
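A minimal Python sketch of the ratio-versus-differential point, using pitchers A (1.07 ERA+) and B (.84 ERA+) in a 4.50 LgERA context. The helper `era_plus` is illustrative. The ERA ratio and the inverted ERA+ ratio agree. Scaling the ERA+ gap by LgERA, however, gives a number that does not match the actual gap in ERA, and it has no interpretation of its own.

```python
LG_ERA = 4.50


def era_plus(pitcher_era, lg_era=LG_ERA):
    """ERA+ without the x100 scalar."""
    return lg_era / pitcher_era


# Pitchers A (1.07 ERA+) and B (.84 ERA+) from the text
era_a, era_b = LG_ERA / 1.07, LG_ERA / 0.84     # about 4.206 and 5.357
plus_a, plus_b = era_plus(era_a), era_plus(era_b)

# Ratios survive the inversion: ERA_B / ERA_A equals ERA+_A / ERA+_B
era_ratio = era_b / era_a
plus_ratio = plus_a / plus_b

# Differences do not: scaling the ERA+ gap by LgERA does not
# recover the gap in earned runs per nine innings
era_gap = era_b - era_a                        # about 1.15 runs
plus_gap_in_runs = (plus_a - plus_b) * LG_ERA  # about 1.035, which means nothing

print(f"ratios: {era_ratio:.3f} vs {plus_ratio:.3f}")
print(f"gaps:   {era_gap:.3f} vs {plus_gap_in_runs:.3f}")
```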

There is one positive quality of ERA+ worth noting. If one wishes to convert ERA to an estimated W% for an individual pitcher (while ignoring factors like IP/G that will make the team's W% in games the pitcher works differ from that estimate), ERA+ provides a run ratio that can be plugged into a Pythagorean estimator. An individual W% can easily be figured as (ERA+)^2/((ERA+)^2 + 1). (ERA+)^2, therefore, is an estimate of individual win ratio.
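As a sketch, the Pythagorean conversion in Python. ERA+ is treated as a run ratio, with LgERA standing in for runs scored and ERA for runs allowed. The exponent defaults to 2 as in the text. Making it a parameter is my own addition for anyone who prefers a different Pythagorean exponent.

```python
def pythag_wpct(era_plus, exponent=2.0):
    """Estimated individual W% from ERA+ (no x100 scalar).

    ERA+ is treated as a run ratio (runs scored / runs allowed, using
    LgERA and ERA as stand-ins), so ERA+ ** exponent is the win ratio.
    """
    win_ratio = era_plus ** exponent
    return win_ratio / (win_ratio + 1)


for ep in (0.75, 1.00, 1.25, 1.50):
    print(f"ERA+ {ep:.2f} -> W% {pythag_wpct(ep):.3f}")
```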

Proponents of ERA+ have pointed out that ERA+ has a wider spread of observed values than aERA (ERA/LgERA, discussed below) does. This is true, at least for above-average pitchers, but I fail to see why that's a positive if the numbers themselves don't mean what people often interpret them to mean. We could increase the spread even more by, say, cubing ERA+, but the results would have no mathematical or baseball meaning attached. ERA+ also compresses differences between pitchers with very bad ERAs, but of course this is brushed off because most discussion of ranking players focuses on the greats rather than the scrubs.

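Runs allowed has a floor at zero, so a pitcher can't possibly allow more than 100% fewer runs than average. That floor keeps aERA from dropping below zero, while ERA+ has no upper limit as ERA approaches zero, which is where its extra spread at the top end comes from.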

Moving into the realm of pure opinion at this point:

Pros of ERA+ construction:

1. Inverted scale makes larger numbers better, which is aesthetically pleasing to some
2. Maintains ratio comparability
3. Corresponds to Pythagorean win ratio
4. People have become familiar with the scale

Cons of ERA+ construction:

1. Inverted scale wreaks havoc--makes earned runs the denominator quantity, which makes averaging confusing, and makes differentials between ERA+s useless
2. Doesn't offer a coherent definition of "better" or "worse" to be put into English, and "less" or "more" cannot be used due to mathematical properties

I did ERA+ some favors there by splitting pros up as much as possible, while the inverted scale drawback could have been expanded into several associated complaints.

Properties of ERA/LgERA

Personally, I use RA/LgRA when I need a statistic in this family, and I just call it Adjusted RA. "Fancy Pants Handle" on BTF coined the name "ERA-" for ERA/LgERA, which is clever, but I'll just call it aERA for this post.

One of the most basic properties of aERA is that the lower it is, the better. The convention with adjusted statistics dating back to Palmer has been to make higher better, but I don't see why that needs to be the case. Lower is better for ERA and many other pitching statistics in their unadjusted form, and even the most casual of fans understands this.

An aERA of .75 means that the pitcher allowed runs at 75% of the league average, or that he allowed 25% fewer runs per inning than the league average. Unlike similar statements made with ERA+, these kinds of statements are mathematically accurate.

Ratio comparisons between pitchers remain workable. Going back to pitcher A with a 4.206 ERA and pitcher B with a 5.357 ERA in a 4.50 LgERA context, pitcher A's aERA is .9347 and pitcher B's is 1.1904. The ratio, .9347/1.1904 = .785, is the same as the ratio between their ERAs (4.206/5.357 = .785).

Where aERA distinguishes itself from ERA+ is the differential comparison, which is possible with the former but not the latter. The difference between the two pitchers' ERAs is 1.151 runs; the difference between their aERAs multiplied by the LgERA by definition is also 1.151 runs ((1.1904-.9347)*4.5 = 1.151).

aERA keeps innings pitched as the denominator, since aERA = ER*9/(IP*LgERA), so weighted averaging is achieved by using innings, which is far more natural than using earned runs.
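A minimal Python sketch of both aERA properties. The helper `aera` is illustrative. The first part uses pitchers A and B from the text, with 200 IP each assumed so that ER can be backed out. The second part uses two made-up (ER, IP) seasons with unequal innings. The first part shows the differential carrying back to runs. The second shows that weighting single-season aERAs by innings reproduces the aERA of the combined totals.

```python
LG_ERA = 4.50


def aera(er, ip, lg_era=LG_ERA):
    """Adjusted ERA as ERA / LgERA = ER*9 / (IP*LgERA); lower is better."""
    return er * 9 / (ip * lg_era)


# Differential: pitchers A (4.206 ERA) and B (5.357 ERA), 200 IP each
a = aera(er=4.206 * 200 / 9, ip=200)   # about 0.9347
b = aera(er=5.357 * 200 / 9, ip=200)   # about 1.1904
gap_in_runs = (b - a) * LG_ERA          # 1.151, same as 5.357 - 4.206

# Averaging: hypothetical (ER, IP) seasons with unequal innings
seasons = [(72.0, 180.0), (95.0, 220.0)]
total_ip = sum(ip for _, ip in seasons)
ip_weighted = sum(aera(er, ip) * ip for er, ip in seasons) / total_ip
combined = aera(sum(er for er, _ in seasons), total_ip)

print(f"gap {gap_in_runs:.3f} runs; "
      f"IP-weighted {ip_weighted:.4f} vs combined {combined:.4f}")
```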

The only mathematical property on which aERA arguably loses to ERA+ by my reckoning is conversion to estimated W%. ERA+ can be seen as a stand-in for run ratio, which when squared equals win ratio (using Pythagorean). aERA on the other hand serves as a stand-in for runs allowed ratio, and so W% = 1/(1 + aERA^2). Squaring aERA on its own produces not an approximate win ratio but the approximate win ratio of one's opponents, or the approximate loss ratio of a pitcher (in other words, an aERA of .5 means that a pitcher's opponents are expected to have a win ratio of .25, or that he will have a .25:1 ratio of losses to opponents' losses).
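The aERA version of the Pythagorean conversion, sketched in Python with illustrative function names and exponent 2. It also checks that the result matches the ERA+ form once ERA+ = 1/aERA is substituted, and that aERA^2 behaves as a loss ratio (losses per win) rather than a win ratio.

```python
def wpct_from_aera(aera):
    """W% = 1 / (1 + aERA^2): aERA stands in for RA/R, exponent 2."""
    return 1 / (1 + aera ** 2)


def wpct_from_era_plus(era_plus):
    """The same estimate written in terms of ERA+ = 1/aERA."""
    return era_plus ** 2 / (era_plus ** 2 + 1)


aera = 0.5
w = wpct_from_aera(aera)          # 0.8
loss_ratio = aera ** 2            # 0.25 losses per win, not a win ratio
print(f"W% {w:.3f}, losses per win {(1 - w) / w:.2f}, aERA^2 {loss_ratio:.2f}")
```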

Properties of "ERA#"

A number of sabermetricians have taken issue with some or all of the questionable properties of ERA+ described above. In response to one of Tango Tiger's posts, Guy suggested an alternative to aERA that would retain the "higher is better" property of ERA+ but also retain some of the good properties of aERA. His formula was 2 - ERA/LgERA, which of course is equal to 2 - aERA.

This is the version that Forman briefly implemented on Baseball-Reference. Following the lead of BTF poster Greg Pope, some folks in the discussion there have taken to calling Guy's version ERA# to distinguish it from ERA+, and I'll use that naming convention here.

By subtracting aERA from two, ERA# caps itself on the high end at two. It also allows for negative numbers in cases where a pitcher's ERA is more than double the league average (an ERA+ below .5).

If you write it with a common denominator, ERA# = (2*LgERA - ERA)/LgERA. It could also be written as ERA# = 2 - ER*9/(IP*LgERA), which demonstrates that innings are the denominator as in aERA.

Like ERA+, ERA# cannot sustain both ratio and differential comparisons between ERAs the way aERA does. ERA+ sacrifices differential comparisons; ERA# sacrifices ratio comparisons. An ERA# of 1.15 means that the pitcher allowed 15% fewer runs per inning than average; an ERA# of -1 tells us that a pitcher allowed 200% more runs per inning than average. This property holds for all values of ERA#.

A natural reaction might be to wonder how that can be when ERA# is capped at two, but the cap at two is natural--one cannot possibly allow more than 100% fewer runs per inning than the league average. Negative runs may exist in our run estimators, but they do not exist in reality.

However, ERA# ratios are not comparable across pitchers. To illustrate, allow me to return to pitcher A with a 4.206 ERA and pitcher B with a 5.357 ERA, both in a 4.5 LgERA context. Pitcher A's ERA# is 1.0653, and Pitcher B's ERA# is .8096. The difference between their ERAs is 1.15 runs, which can be found by taking (1.0653-.8096)*4.5 = 1.15. But the ratio between their ERA#s is 1.0653/.8096 = 1.316, which does not correspond to the ratio between their ERAs.
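The ERA# version of the same comparison, as a Python sketch (the helper `era_sharp` is an illustrative name for Guy's 2 - ERA/LgERA). The gap carries back to runs, but the ratio of ERA#s (about 1.316) does not match the ratio of ERAs (about 1.274). ERA# minus one reads directly as the fraction fewer runs per inning than average.

```python
LG_ERA = 4.50


def era_sharp(pitcher_era, lg_era=LG_ERA):
    """ERA# = 2 - ERA/LgERA (no x100 scalar); capped at 2, can go negative."""
    return 2 - pitcher_era / lg_era


era_a, era_b = 4.206, 5.357
sharp_a, sharp_b = era_sharp(era_a), era_sharp(era_b)  # about 1.0653 and 0.8096

# Differential comparison works: the ERA# gap times LgERA is the ERA gap
gap_in_runs = (sharp_a - sharp_b) * LG_ERA   # 1.151

# Ratio comparison does not: about 1.316 versus about 1.274
sharp_ratio = sharp_a / sharp_b
era_ratio = era_b / era_a

# ERA# minus 1 is the fraction fewer runs per inning than average
print(f"A allowed {sharp_a - 1:.2%} fewer runs per inning than average; "
      f"B allowed {1 - sharp_b:.2%} more")
```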

Since ERA# does not retain ratio comparability, it cannot be plugged directly into Pythagorean to produce a W% estimate, which could be considered a drawback. However, as its proponents have pointed out, it does have a fairly solid relationship with 2*W% without making any adjustments at all.

Why does it work out this way? One way to demonstrate it is to use Bill Kross' W% estimator, which mathematically is equivalent to a linearization of Pythagorean with exponent 2 at win ratio = run ratio = 1 (in other words, a perfectly average team). I have written about his equation at (considerable) length before. It is:

W% = R/(2*RA) if R < RA, W% = 1 - RA/(2*R) if R > RA

The second equation, for R > RA, is the one that can be directly tied to ERA#. If you think about ERA as a stand-in for runs allowed and LgERA as a stand-in for runs scored, then 2*W% = ERA# using Kross' method (*). Of course, this is only true if R > RA, which for our purposes here is equivalent to ERA < LgERA. So for above-average pitchers, ERA# can be viewed as an estimate of 2*W%. For below-average pitchers, it's still pretty close. Kross' two-equation solution ensures that W% estimates for a team and its opponents sum to one, but in normal situations, using one equation or the other does not cause massive distortions, so ERA# still approximates 2*W%. Of course, if someone wanted their adjusted ERA to track 2*W% for some reason, they could use two equations just like Kross does. For ERA < LgERA, you would use ERA# as discussed. For ERA > LgERA, one would mimic 2*(R/(2*RA)), which simplifies to R/RA, which using our stand-ins simplifies to...ERA+.
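A Python sketch of that two-equation idea. `kross_wpct` implements the Kross estimator exactly as written above. Sending the R == RA case to the second branch is my choice; both branches give .500 there. `kross_adjusted_era` is a hypothetical name for "2*W% under Kross with LgERA as R and ERA as RA". It lands on ERA# for above-average pitchers and on ERA+ for below-average ones.

```python
def kross_wpct(r, ra):
    """Bill Kross' two-equation W% estimator."""
    if r < ra:
        return r / (2 * ra)
    return 1 - ra / (2 * r)  # also gives .500 when r == ra


def era_sharp(pitcher_era, lg_era):
    return 2 - pitcher_era / lg_era


def era_plus(pitcher_era, lg_era):
    return lg_era / pitcher_era


def kross_adjusted_era(pitcher_era, lg_era):
    """2*W% under Kross, with LgERA standing in for R and ERA for RA."""
    return 2 * kross_wpct(r=lg_era, ra=pitcher_era)


lg = 4.50
for e in (3.00, 4.50, 6.00):
    print(f"ERA {e:.2f}: 2*W% {kross_adjusted_era(e, lg):.3f}, "
          f"ERA# {era_sharp(e, lg):.3f}, ERA+ {era_plus(e, lg):.3f}")
```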

100% Opinion (as opposed to ~ 75% opinion in the rest of the post)

The major objections to the ERA+ change, as best as I can summarize them, are:

1. Familiarity with the existing ERA+

People have grown familiar with ERA+ as it is. They have a feel for how the all-time leaders stack up, perhaps they have explained it to their friends, etc. As a hard-core sabermetric sort, I can't really relate to this.

2. ERA+ distribution is comparable to OPS+ distribution

I'm not going to take the time to get into this right now, as this post is long enough as is, but these two statistics measure different things. I'm not sure why their scales should be expected to match or what it means if they do, without more context, and when you start adding that context you really shouldn't be using OPS+ anyway.

3. The new formula is less intuitive

I will admit that there is some validity in this objection, as it is marginally easier to figure LgERA/ERA than it is to figure 2 - ERA/LgERA, and the former is more intuitive. The problem is that once many people start trying to explain what ERA+ means, they demonstrate that they either don't really understand what it means or that they don't have a solid mathematical foundation for the claims they make. People are in the habit of saying that a 200 ERA+ means "twice as good as average", which is logically ambiguous and even worse mathematically. Very few people say that what it means is that the League ERA was twice the pitcher's ERA (or that the pitcher's ERA was half the league's ERA). What good is ease of computation if it comes at the price of easy explanation?

My own personal preference is for aERA, since I'm not bothered by lower being better and because, when possible, I prefer statistics that offer both ratio and differential comparability. However, if the adjusted ERA standard must display above-average as >100, then I prefer ERA#.

What bothers me most about the debate I've seen is not that people can't express what ERA+ really means or that fuzzy terminology is used, but the impression I get that some people have placed certain statistics on a pedestal and never want to see them change, even if the changes are sound. When sabermetric formulas are treated as inviolable constants, the entire purpose of sabermetrics goes out the window.

I don't mean to suggest that any alteration to a sabermetric statistic, no matter how marginal the benefit, should be implemented immediately and without discussion. That too would go against the spirit of the field. But it's important to remember that sabermetrics is always moving forward (or at least striving to move forward), and from time to time new and better methods will be developed. If and when there is a backlash to those newer methods simply because they are new, then ERA+ and OPS+ and the other statistics have simply become a new-age version of BA, HR, RBI, Wins--numbers celebrated for their own sake, rather than for what they tell us about the game.

Mathematical Miscellany

ERA+ = 1/aERA and aERA = 1/ERA+
ERA# = 2 - aERA and aERA = 2 - ERA#
ERA+ = 1/(2 - ERA#) and ERA# = 2 - 1/ERA+

(*) 2*W% = 2*(1 - RA/(2*R)) = 2 - 2*RA/(2*R) = 2 - RA/R ~= ERA#
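The conversions above, written as small Python functions (the names are illustrative) with a round-trip check. The check confirms that converting directly between ERA+ and ERA# agrees with going through aERA.

```python
def era_plus_from_aera(aera):
    return 1 / aera


def era_sharp_from_aera(aera):
    return 2 - aera


def era_plus_from_sharp(sharp):
    return 1 / (2 - sharp)


def era_sharp_from_plus(plus):
    return 2 - 1 / plus


for aera in (0.5, 0.9347, 1.0, 1.1904, 1.5):
    plus, sharp = era_plus_from_aera(aera), era_sharp_from_aera(aera)
    # Each direct conversion should agree with going through aERA
    assert abs(era_plus_from_sharp(sharp) - plus) < 1e-12
    assert abs(era_sharp_from_plus(plus) - sharp) < 1e-12
    print(f"aERA {aera:.4f}  ERA+ {plus:.4f}  ERA# {sharp:.4f}")
```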

Monday, March 22, 2010

2010 Indians: Bad (but not as bad as some think)

Even by the meager standards of my own writing, I am not good at writing micro- or trend analysis of players and teams (and even by the meager standards of my own post titles, this is a bad title). There will be dozens of better previews of the Indians' 2010 season that you can read elsewhere on the internet. What I hope to do here is express some broad thoughts on the team's outlook from the perspective of a fan without a tremendous amount of emotional investment.

Cleveland fans suffered through a second consecutive season of underperforming expectations in 2009, as a team that was expected to contend got off to a slow start, stumbled around for a while, and then fell all the way to sharing the cellar with Kansas City. This (along with the salary dump/rebuilding trades of Cliff Lee and Victor Martinez) has caused fans to take a pessimistic outlook towards the 2010 season and the organization in general.

Some of the frustration is certainly warranted--it is frustrating when a team trades away two homegrown Cy Young winners in as many years, and it's even worse when the team came within a game of the pennant just two and a half years ago. However, much of the pessimism is over the top.

To listen to the fans who call talk shows, the team will be lucky to match last season's win total of 65. Of course, such a finish is well within the confidence interval I'd set for the team, but I don't think it's a particularly good median projection.

Looking at the personnel, it's best to start with the team's biggest weakness: starting pitching. The good news about Cleveland's starting pitching is that there should be enough warm bodies to run out to the mound. The bad news is that the back-of-the-rotation options are nearly indistinguishable from those that would replace them.

The #1 starter will be Jake Westbrook, making his return after one and a half years on the mend from Tommy John. Behind him comes Fausto Carmona, still seeking to recapture the low-strikeout hard sinker magic of 2007. He's been pounded in every other major league stint of his career and in the last two has lost all semblance of command. The third starter is Justin Masterson, yet another sinkerballer, and one who has seen most of his major league appearances out of the Boston bullpen.

Westbrook, Carmona, and Masterson all have the potential to be valuable members of a rotation. But when they are the front three, there's cause for serious concern.

The fourth and fifth spots will likely go to Aaron Laffey and Mitch Talbot, besting David Huff. Jeremy Sowers' March injury left him out of the running and will likely provide an avenue to forestall a decision on his future (he's out of options), while OSU product Scott Lewis remains hampered by shoulder trouble. Carlos Carrasco should be second in line to get the call from Columbus, while the team will seek to get top prospect Hector Rondon another season on the farm prior to a possible September callup.

Laffey, Lewis, Sowers, and Huff (along with prospect TJ House) are all emblematic of what's become an organizational hallmark--the finesse left-handed starter. I certainly don't want to overstate their similarities lest the PitchF/x-ers correctly rebuff me with detailed analysis of the differing repertoires of the group, but it's quite a departure from the teams of the 90s which had next to no left-handed starters (Brian Anderson led 1994-99 Indian southpaws with 17 starts).

The bullpen will be anchored by Kerry Wood, at least until he gets injured or is traded. (Editor's note: I wrote this about a week before Wood was officially pronounced out for eight weeks, but he was already behind schedule. Plus, seriously, it's Kerry Wood.) Chris Perez is penciled in as the top righty setting him up; his stuff continues to outrank his performance. If Rafael Perez can bounce back, he should combine with Tony Sipp to give the team a pair of lefties that can hold their own against right-handed batters as well.

Joe Smith and Jensen Lewis would appear to have the upper hand to gain middle relief jobs, but both had disappointing '09s and have not impressed to this point in the spring. Rule V pick Hector Ambriz is another option, and if Mitch Talbot fails to win a starting role he will surely be in the pen. Jess Todd will likely be sent to AAA, while Saul Rivera is a longshot despite his history with Acta in Washington. Jamey Wright may well win a spot as a veteran presence, even if a mediocre one; Jason Grilli was a prime contender for the spot but was injured and is out for the season. Josh Judy may be the nearest to ready among a group of young relief possibilities, but he's not pitched above AA and it would be a surprise if he made the team.

My best guess (and it's just that) on the bullpen makeup is: Chris Perez, Rafael Perez, Tony Sipp, Jamey Wright, Joe Smith, Jensen Lewis, and Hector Ambriz.

The offense was essentially average last year, scoring 4.9 runs/game versus the 4.8 league average. I expect that the 2010 offense will perform similarly. Lou Marson should best Wyatt Toregas for the starting catching job, holding it down for a few months while Carlos Santana gets some AAA seasoning. Marson projects as an average-ish offensive catcher with the ability to draw some walks. Mike Redmond will serve as the veteran backup, and hopefully will not weasel his way into too much playing time at Marson's or Santana's expense.

Russell Branyan, a late signing, will play first when healthy and drive fans crazy with his strikeouts. He also could spell Travis Hafner at DH if his shoulder does not allow him to resume everyday status. However, as of this writing Branyan has yet to make his spring debut and could open the season on the DL.

Second baseman Luis Valbuena showed surprising power spurts last year, but his fielding metrics and OBA leave lots of room for growth if he is to justify his status as the centerpiece of the Franklin Gutierrez trade. At third, Jhonny Peralta should rebound from a dreadful offensive season to get back into the hunt for "most average player in the league" status. If he has any success, look for the Indians to move him at the deadline as they are unlikely to exercise his $7 million option for 2011. The team appears to be high on Lonnie Chisenhall and is positioning him as the third baseman of the future.

Asdrubal Cabrera will be under the microscope this year as he assumes leadoff duties. He has emerged as one of the better young shortstops in the game thanks to good fielding and solid offensive contributions. His walk rate is not great, so if he has a BA dropoff he might become shaky as a leadoff option.

Thanks to the presence of Valbuena and Peralta (each of whom played shortstop last season) on the roster, it is not imperative that Cleveland carry a utility infielder who can play short, which could open the door for Mark Grudzielanek as utility-man and Valbuena insurance. Jason Donald, acquired in the Cliff Lee trade, will likely play every day in Columbus, leaving former Met Anderson Hernandez as the top utility candidate who can play short. Brian Bixler is also under consideration but his inclusion would be a surprise.

Many fans want to see Michael Brantley in left field and leading off every day, but Brantley projects to an ISO in the neighborhood of .100 and would probably be better served with another season at AAA. Depending on Branyan's status, he could open the season in Cleveland, but it's more likely that the team would go with a veteran in that case. Matt LaPorta originally appeared to be ticketed for first, but with the Branyan signing he figures to remain in left for another season.

Grady Sizemore looks to bounce back from an injury-plagued campaign, and this observer is still waiting for him to luck his way into a high-BA season that will make the average fan take note of what a great player he is when healthy. Sizemore will bat second this year, somewhat placating fans who still think he should be batting third. Shin-Soo Choo combines with Sizemore to give the Indians two of the better outfielders in the AL.

When Travis Hafner played last year, he rebounded to average production for a 1B/DH. Of course, it is not average production that the team is paying for, and 379 PA is far from full-time. He may be an albatross to the bottom line, but he wasn't an albatross to the lineup, at least in 2009. His largest cost to the team on the field was leaving them a man short every third day and a fielder short every day (the latter being expected for a DH-exclusive type of player like Pronk).

Trevor Crowe has a good shot at a reserve outfield spot as he can play center and could serve as a pinch-runner. Austin Kearns also has a decent shot to make it as a 1B/LF/DH and a right-handed bat in a lineup whose power threats are predominantly left-handed (Choo, Hafner, Sizemore). If I had to guess, I'd say that the bench is Mike Redmond, Anderson Hernandez, Trevor Crowe, and Austin Kearns. If Branyan is on the DL to start the season, add Andy Marte to that prognostication as I assume Brantley is headed for Columbus regardless. Marte had a superficially impressive season as a Clipper in '09 (.327/.369/.593), but didn't hit in the majors (85 OPS+ in 175 PA) and is now 26.

I'm not big on "best-case" scenario forecasts; they'd probably be better termed "90th percentile" or something along those lines. In any case, the best-case scenario I see for this team as the term is used by these types of previews is: Carmona, Westbrook, and Masterson are effective enough to drag the rotation close to average, while the bullpen is above-average and the offense pretty solid--enough to earn 86 wins or so with some magic Pythagorean dust the Indians have lacked lately and thus a team right in the thick of the AL Central race.

Worst-case? Disastrous pitching, continuing injuries for Sizemore, Hafner, and Branyan, and an epic anti-pennant race with the Royals.

Median? Solid offense, below average bullpen, and bad starting pitching leave the Indians at 74-88 or something in that neighborhood and a fourth-place finish several games ahead of Kansas City and a few games behind Detroit.

I actually happen to be fairly optimistic about the organization's long-term prospects--I generally think that Mark Shapiro's regime makes sound decisions (which is not to say that they are perfect, or to excuse the poor drafts the team has had). Chris Antonetti will become GM later this year as Shapiro is promoted to President (of the team, not the USA, although I doubt he'd be significantly worse in that capacity than the current occupant of the office), ensuring organizational stability for the next few years. Manny Acta got my tacit endorsement as a managerial hire. There remain legitimate questions about ownership's commitment to payroll, but I don't disagree with the decisions not to pay market value for Sabathia, Lee, and Martinez, so it's hard for me to criticize too much on that front, at least at this time.

Thursday, March 11, 2010

Runs Anything and the Hall of Fame

I don't actually want to discuss specific Hall of Fame candidacies, but I will present career Runs Anything figures for predominantly post-1900 Hall of Famers (not including those selected in 2010, as I wrote this a week before the results were announced) and a number of players who often come up in Hall of Fame debates. While I don't care for many of the Hall's selections, using membership as a standard is a convenient way to keep the player pool to a reasonable size.

First, here is a list of the top ten Hall of Famers in R+. They are mostly the usual suspects, but I think there is one name that will surprise you--it certainly surprised me:

Frank Chance scored 60% more runs per out than the average player in his day. He did have a short career and he was on a historically great team, but his is still a surprising name to have pop up on a list of this type.

Here is a similar list for RBI+:

Willie Stargell is probably the most surprising name on the list (and perhaps Cobb if one fails to account for context in setting their expectations), but for the most part these are exactly the players you would expect to see: middle-of-the-order studs.

There are five players who crack the top ten in both R+ and RBI+ (Cobb, Gehrig, Hornsby, Ruth, and Williams)--it is thus no surprise that they make up the top five in Runs Anything:

A natural question to ask is "what is the degree of agreement between Runs Anything and a sabermetrically-approved context-neutral measure of the same thing, on the career level?" With the release of wRC+ on Fangraphs, it is very easy to compare ANY to such a metric. I did so for the Hall of Famers.

Of course, this is a very biased sample--it is only those players selected as the greatest of all-time, and there may well be a tendency for players who excel in R and RBI to a greater extent than context-neutral statistics to be voted into the Hall. The players in the sample all have long careers, there is a higher representation of players that excel in both runs and RBI than one would observe among the general player population, and there are also bound to be more middle-of-the-order hitters, who are placed in a much better position to excel in both categories than leadoff hitters, for instance. wRC+ includes a park adjustment; ANY does not.

With all of the obvious problems with restricting the comparison to this group, why am I bothering? Because players of this group are the only ones for whom mainstream fans really examine career statistics closely. Ordinary players are hardly ever placed in historical perspective; the Hall of Fame debates are really the only time careers are given any sort of thorough examination. Ordinary players are talked about each season, but when their useful playing careers end, they are forgotten by and large. Few people are particularly concerned about Jeff King's career batting performance and how it compares to that of Al Martin. Those that are probably aren't using R and RBI anyway.

As a brief digression, the fact that HOF debates are the only time players are really placed into historical context is what might make it seem as if I care about them. If you read this blog, you'll note a few posts tagged "Hall of Fame"--despite the fact that I do not think the current Hall of Fame structure is capable of being salvaged and don't care which players are chosen. But if someone like me (interested in both statistics and the history of the game) chooses to ignore the discussion surrounding the Hall altogether, they are basically shutting themselves out of comparing players' careers. It's really the only time the topic gets any sort of mainstream play (one could point to top 100 lists, or the comparisons between Aaron and Bonds when the home run record fell, or what have you, but those are in a certain sense subsets of the Hall debate--examining the records of no-brainer Hall of Famers).

If you skipped the last two paragraphs, the takeaway is that I am only looking at the relationship between ANY and wRC+ for great players because that is the only situation in which ANY would be used in practice (in fact, it won't be used at all, but that's beside the point).

So, for the Hall of Famers, the correlation between ANY and wRC+ is +.94. The average absolute value of the difference between the two is 7, and the average difference (ANY - wRC+) is -1 (in other words, the average player has a very slightly lower ANY). The players that ANY favors the most:

It's interesting how tightly the three catchers (Bench, Berra, Campanella) cluster on both measures: they all have an ANY of 141-143 and a wRC+ of 130. Traynor, Medwick, Klein, and Wilson can be lumped into their own group as 1930s hitters (although Medwick, Klein, and Wilson, as outfielders, form a tighter group if Traynor is left out).

The players that ANY doesn't like:

A lot of high-OBA, top of the order types in this group, highlighted by Henderson and Morgan. It's interesting that two Hall of Famers who would probably be considered in the bottom half of the Hall show up here (Carey and Combs), and that a player often put forth as one of the very worst (Ferrell) is here as well.
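For anyone who wants to run the same kind of comparison on their own list, here is a minimal Python sketch of the three agreement figures quoted above for ANY and wRC+: the Pearson correlation, the average absolute difference, and the average difference (ANY - wRC+). The function names are hypothetical, and the lists in the usage line are placeholders, not the Hall of Fame data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


def agreement(any_vals, wrc_vals):
    """Correlation, mean |ANY - wRC+|, and mean (ANY - wRC+) for paired career figures."""
    diffs = [a - w for a, w in zip(any_vals, wrc_vals)]
    return {
        "correlation": pearson(any_vals, wrc_vals),
        "mean_abs_diff": sum(abs(d) for d in diffs) / len(diffs),
        "mean_diff": sum(diffs) / len(diffs),
    }


# Placeholder inputs only: substitute career ANY and wRC+ for the same players, in the same order.
print(agreement([1, 2, 3], [2, 4, 6]))
```

A negative mean difference means ANY runs lower than wRC+ on average, which is how the -1 above should be read.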

The correlation between R+ and wRC+ is +.87; between RBI+ and wRC+ it is +.78. At least with this very biased sample, runs are more closely related to estimated context-neutral production than RBI are. The next logical step is to run a regression to estimate wRC+ from R+ and RBI+ (and then never actually use it, for a myriad of reasons, as this is the type of mathematical gyration that would drive me crazy if it were being used for serious purposes):

wRC+ ~ = .616(R+) + .287(RBI+) + 14.7

This equation estimates that a player with average R+ and RBI+ would actually have a 105 wRC+, which is not quite satisfying; it also puts a double-weight on runs scored. We could try a regression without an intercept:

wRC+ ~ = .710(R+) + .302(RBI+)

This equation puts a player with R+ = RBI+ = 100 at 101, which is more in line with what we'd expect, but it produces less accurate estimates for this group.
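The regression step can be sketched in plain Python: an ordinary least squares fit via the normal equations, with or without an intercept, plus the two fitted equations above evaluated for a player with R+ = RBI+ = 100. The helper names are hypothetical, and since the Hall of Fame data isn't reproduced here, the fitting routine is checked against synthetic numbers generated from a known equation rather than used to re-derive the coefficients in the post.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_wrc(r_plus, rbi_plus, wrc_plus, intercept=True):
    """OLS coefficients [a, b, c] (or [a, b]) for wRC+ ~ a(R+) + b(RBI+) (+ c)."""
    rows = [[r, rbi, 1.0] if intercept else [r, rbi] for r, rbi in zip(r_plus, rbi_plus)]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, wrc_plus)) for i in range(k)]
    return solve(xtx, xty)


def with_intercept(r_plus, rbi_plus):
    """First equation from the post."""
    return 0.616 * r_plus + 0.287 * rbi_plus + 14.7


def no_intercept(r_plus, rbi_plus):
    """Second (no-intercept) equation from the post."""
    return 0.710 * r_plus + 0.302 * rbi_plus


print(round(with_intercept(100, 100), 1), round(no_intercept(100, 100), 1))  # 105.0 101.2

# Synthetic sanity check: data built from a known equation should be recovered.
r_s = [90, 120, 150, 110, 140]
rbi_s = [100, 95, 160, 130, 105]
wrc_s = [0.6 * r + 0.3 * b + 15 for r, b in zip(r_s, rbi_s)]
print([round(c, 3) for c in fit_wrc(r_s, rbi_s, wrc_s)])  # [0.6, 0.3, 15.0]
```

Dropping the intercept forces the fitted line through the origin, which is why that version lands closer to 100 for an average player while fitting this particular group less well.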

To get back to the point of these posts, one can use runs scored and RBI to derive a decent estimate of a player's productivity, particularly over a career. I'd never use it, and even when dressed up it retains the drawbacks inherent in looking at R and RBI, but I think it could have some limited persuasive value with mainstream fans if you could get them to accept the key premise (productivity must be expressed in relation to outs). Of course, if you can get someone to accept that premise, they are probably open to further sabermetric indoctrination, and that would obviously be preferable.

This spreadsheet includes career totals in these categories for all of the predominantly post-1900 Hall of Famers as well as a few other selected players (the others are not in any way intended to be a comprehensive list of the most productive non-HOF players).

Wednesday, March 03, 2010

Runs Anything

In The Politics of Glory (later Whatever Happened to the Hall of Fame, although I still think the original title was much better), Bill James examines the history of the Hall of Fame and discusses some of the general arguments advanced on behalf of candidates.

After quoting a letter to the editor questioning Reggie Jackson's Hall of Fame worthiness due to his low batting average, James responds: "If you have to focus on one number, don't focus on batting average. Focus on runs--runs scored, runs batted in, runs created, runs produced, runs anything. Focus on what wins the ballgame."

For some reason the last phrase in his list--"runs anything"--has stuck with me since I first read the book. It seems like a great name to slap on a metric based on some tally of runs, but one the person writing about the metric doesn't actually believe in--and for that very reason I've used it here.

Let me accept the premise which is often either explicitly or implicitly put forth by non-sabermetricians, namely that estimates of runs (such as runs created) are unreliable and not based in reality. Ergo, if I want a metric expressed in runs for an individual batter, I will be limiting myself to those based on runs scored and RBI.

Let me also add an arbitrary constraint that I can only use R and RBI; I can't consider any other piece of statistical information (like home runs) in constructing the numerator of my metric. This allows us to sidestep the lively discussion about the merits of R+RBI versus R+RBI-HR.

Of course it goes without saying that I don't accept this premise, and you and I both know the myriad weaknesses of using runs scored and RBI to evaluate individual batters. I won't even insult your intelligence by listing them. Given these constraints, what is a reasonable (note that I did not say "the best", "ideal", or any of a number of possible phrases that would indicate I am taking this more seriously than I am) metric we can construct while still infusing it with some sabermetric-suggested properties?

There are three such properties that I think would be imperative in constructing such a metric. These are based on principles that are really not negotiable from my point of view; any analysis (even one constrained to using R and RBI) that does not consider them is one that is fatally flawed, IMO. Of course the points that follow are all obvious to you (although you are more than welcome to not care for how I express them):

1. You must consider runs scored AND runs batted in. Too often in mainstream analysis, runs are pushed aside while RBI are given all the attention. Of course driving in runs is just one side of the coin; someone must be on base to score them. The bias towards driving runs in seems to manifest itself most plainly in modern MVP voting, in which cleanup hitters with gaudy RBI totals are rewarded handsomely (see Juan Gonzalez, Ryan Howard, and Justin Morneau) while their teammates who get on base in front of them (Chase Utley and Joe Mauer in 2006) are ignored.

2. The proper opportunity factor for a rate of run production is outs. Certainly, we can complicate things by talking about the merits of R+/PA, RAA/PA, R+/O+, and the like, but that's beyond the scope of an exercise limiting one to only considering R and RBI, and really is a topic that has more theoretical than practical implications in any event.

It is imperative to consider the batter's role in using up his team's opportunities by making out and creating additional opportunities for his teammates by avoiding outs. Any attempt to quantify offensive performance that doesn't account for this is woefully incomplete.

3. When comparing players across eras, it is necessary to consider the win impact of a player's run contribution. Runs are not equally valuable in all contexts; the more runs are scored per game, the less win impact each run has.

I have not considered park factor here; after all, a metric of this type is designed for non-statheads, the effects are of a lesser magnitude than league scoring level, and it's way too much precision for such a clumsy metric.

So, here are the metrics I'm going to use:

R+ = (R/O)/Lg(R/O)*100
RBI+ = (RBI/O)/Lg(RBI/O)*100

Where outs are simply AB-H; again, we really should consider caught stealing at the least, and could consider the ancillary outs as well. I'm going to err on the side of simplicity given the inputs.

To combine R+ and RBI+ into one number, I simply average them. In recent years, league RBI have usually totaled around 95% of league runs scored, since of course RBI are not credited on every run. Because R+ and RBI+ are each measured against their own league rate, averaging them counts the two equally, and I'll call this figure Runs Anything:

ANY = [(R+) + (RBI+)]/2

The result is a figure similar in scale to OPS+, wRC+, and the like--efficiency of run production expressed as a ratio to the league average, with the decimal point discarded.
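The definitions above translate directly into code. A minimal Python sketch, assuming counting totals and league per-out rates are already in hand; the function names are hypothetical, and outs are AB - H as in the post, with no caught stealing or other outs. The batter and league rates in the usage lines are made up for illustration.

```python
def outs_made(ab, h):
    """Outs as used here: at bats minus hits (ignores caught stealing and other outs)."""
    return ab - h


def plus_rate(count, outs, lg_per_out):
    """Per-out rate relative to the league per-out rate, scaled so average = 100."""
    return (count / outs) / lg_per_out * 100


def runs_anything(r, rbi, outs, lg_r_per_out, lg_rbi_per_out):
    """Return (R+, RBI+, ANY), where ANY = [(R+) + (RBI+)] / 2.

    Each component is already relative to its own league rate, so the plain
    average weights runs and RBI equally.
    """
    r_plus = plus_rate(r, outs, lg_r_per_out)
    rbi_plus = plus_rate(rbi, outs, lg_rbi_per_out)
    return r_plus, rbi_plus, (r_plus + rbi_plus) / 2


# Hypothetical batter and league rates, for illustration only.
outs = outs_made(ab=500, h=150)  # 350 outs
print(runs_anything(63, 56, outs, lg_r_per_out=0.18, lg_rbi_per_out=0.16))
```

A batter who scores and drives in runs at exactly the league per-out rates, as this hypothetical one does, should come out at (essentially) 100 on all three figures.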

I have also computed a runs above average figure, which I call ANYA. It is the average of runs above average and RBI above average, which admittedly gives a little more weight to runs scored for an average player. I have not bothered to convert to win value; that is partially accounted for by figuring runs above average, but not completely, which is why sabermetricians are so fond of converting to wins when comparing players across divergent contexts. However, that is too much precision for me to worry about in such an inherently imprecise metric:

ANYA = average(R - Lg(R/O)*O, RBI - Lg(RBI/O)*O)
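ANYA can be sketched the same way. As an illustration, the usage lines plug in the 2009 Howard and Utley inputs from the example that follows (runs, RBI, outs, and NL rates of .176 runs and .168 RBI per out); the resulting ANYA values are simply arithmetic on those inputs, not figures reported in the post. The function name is hypothetical.

```python
def anya(r, rbi, outs, lg_r_per_out, lg_rbi_per_out):
    """Average of runs above average and RBI above average, each measured against
    what a league-average hitter would produce in the same number of outs."""
    runs_above_avg = r - lg_r_per_out * outs
    rbi_above_avg = rbi - lg_rbi_per_out * outs
    return (runs_above_avg + rbi_above_avg) / 2


NL_2009_R_PER_OUT, NL_2009_RBI_PER_OUT = 0.176, 0.168  # league rates as given in the post

for name, r, rbi, outs in [("Howard", 105, 141, 444), ("Utley", 112, 93, 410)]:
    print(f"{name}: ANYA {anya(r, rbi, outs, NL_2009_R_PER_OUT, NL_2009_RBI_PER_OUT):+.1f}")
# Howard: ANYA +46.6
# Utley: ANYA +32.0
```

Unlike ANY, this is a counting figure, so Howard's extra outs are charged against him only through the league-average baseline rather than through a rate.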

Let's run an example that I mentioned in passing earlier through the Runs Anything framework: Ryan Howard and Chase Utley in 2009. Howard finished third in the MVP voting, while Utley languished back in eighth place. (Context-neutral) sabermetric measures are unanimous in rating Utley's performance as more valuable to the Phillies than Howard's, but Howard drove in 141 runs to tie for the NL lead while Utley drove in 93. I obviously can't identify exactly what the NL MVP voters' thought process was, but it seems like a safe bet that the 48 RBI gap between the two was a key factor in their placement.

Utley scored more runs than Howard, although the gap is only seven (112-105). So by any kind of "runs produced" metric, regardless of whether you take out homers or not, Howard ranks well ahead of Utley.

However, what that analysis does not consider is that Howard made 34 more outs than Utley (444-410). The NL averages were .176 runs and .168 RBI per out, which means that Howard's R+, RBI+, and ANY are 134, 189, and 162, while Utley's are 155, 135, and 145.

This is a useful illustration, as we still get the "wrong" result even when using ANY. ANY does not adjust for the way batting order position affects R and RBI production, and the other pitfalls of using actual R and RBI counts are still in play. Still, the gap between 162 and 145 is much smaller than a cursory look at the R and RBI totals for Howard and Utley would suggest. And of course the analysis ignores the difference in fielding value between the two. Even if ANY captured the true offensive value of the two players, the slick-fielding second baseman would come out on top once defense is taken into account.
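To check the arithmetic, here is a short Python sketch that reproduces the Howard and Utley figures from the inputs stated above (2009 runs, RBI, outs, and NL averages of .176 runs and .168 RBI per out). The function name is hypothetical; rounding is applied only when printing, so ANY is averaged from the unrounded R+ and RBI+.

```python
LG_R_PER_OUT = 0.176    # 2009 NL runs per out, from the post
LG_RBI_PER_OUT = 0.168  # 2009 NL RBI per out, from the post


def plus_figures(r, rbi, outs):
    """Return (R+, RBI+, ANY) against the 2009 NL per-out rates."""
    r_plus = (r / outs) / LG_R_PER_OUT * 100
    rbi_plus = (rbi / outs) / LG_RBI_PER_OUT * 100
    return r_plus, rbi_plus, (r_plus + rbi_plus) / 2


players = {"Howard": (105, 141, 444), "Utley": (112, 93, 410)}  # R, RBI, outs (AB - H)

for name, (r, rbi, outs) in players.items():
    r_plus, rbi_plus, any_ = plus_figures(r, rbi, outs)
    print(f"{name}: R+ {r_plus:.0f}, RBI+ {rbi_plus:.0f}, ANY {any_:.0f}")
# Howard: R+ 134, RBI+ 189, ANY 162
# Utley: R+ 155, RBI+ 135, ANY 145
```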

Believe it or not, I'm going to wring another post out of this topic.