Tuesday, December 30, 2008


* Advice to aspiring sabermetric bloggers who’d like to increase comments on their site: blog about Jim Rice. It doesn’t have to be about Jim Rice’s Hall of Fame candidacy, it just has to be timed around the Hall of Fame season (unfortunately for you, this is his last year of eligibility, so chop chop). As a matter of fact, you can even explicitly say that your point is not to discuss Rice’s candidacy as a whole, but rather one of the many arguments that have been offered on his behalf. It doesn’t matter.

Then comes the hard part. You have to figure out a way to get your post linked by a much more prominent blog, like the BTF Newsblog or Rob Neyer. I don’t really have any advice on this count, since I don’t go around seeking to have my posts linked anywhere, but I’m sure you can figure something out. Send them nice emails or something.

And now you’re in business. Now you will get a steady stream of comments from Rice fans and supporters, and this post of yours will get more comments than the last two dozen boring posts you wrote combined. The drawback is that these new posters consider you a big meanie for daring to say something less than complimentary about Rice, have no interest in sabermetrics, and will never read your blog again. But hey, at least you got comments!

Seriously, though, this illustrates why topics like the Hall of Fame debate du jour will always be plastered all over the net: they sell. They get attention and comments and page views, and for anyone with any dreams of making money off his blog, it’s tough to pass up. I am always amused by which posts of mine get linked in other places most often, because the ones that get the most exposure (which admittedly is still not much at all) are the ones I tend to think are the weakest and/or most blah. Posts about Jim Rice, OPS, the MVP award, and rankings of players are popular. Posts about Base Runs and the nineteenth century National League, not so much.

* I recently went digging through a box of Baseball Weekly back issues, looking for one with roughly the same date as now--the end of December. In short order I found the “on sale through December 26, 1995” issue with Ozzie Smith on the cover. I was hoping to find some stuff that read very funny in retrospect--not in an attempt to mock the writer, but just as an illustration of how funny some things that seemed perfectly reasonable at the time would look thirteen years later.

Unfortunately, I didn’t find too much material along those lines. One of the problems was that it is one of those skinny, wintertime, forty-page editions they used to have. There aren’t too many opinion columns, just a lot of notes on which marginal free agents are signing where. I should try it again over the spring with the 1996 preview issue, which should be a much more fertile ground for the kind of stuff I was looking for, and will juxtapose well with my annual “don’t take predictions too seriously” disclaimer.

All that being said, there was one gem to be found, a letter to the editor from Dave Roman of Brooklyn. This time, I will be poking a little fun at the author, as this is not just a bad prediction, it’s a bad prediction combined with a large presumption of knowledge and a dash of haughtiness thrown in--the kind of stuff I would make fun of anywhere, anytime:

Your piece (Mets, Marlins rebuilding, Dec. 6-12) should serve as an excellent reference for the Yankees and their owner, who should take notes on how to stop insulting their fans with lousy management tactics and overall lack of planning. There are no “yes men” or lackeys at Shea.

As a longtime Mets fan, I just sit here and laugh quietly knowing that class organizations like my Mets, the Marlins, AND Buck Showalter’s Diamondbacks will win championships before the Yankees.

In fairness, the Marlins have won a couple of championships, although in the process they completely destroyed any perception of being a class organization. And the Diamondbacks did win a World Series, albeit after Showalter was fired. But the Mets…well, at least they managed to win a pennant. And lose the subsequent World Series. To the Yankees. Good call, Dave.

* Within the last month I received my annual (actually, now bi-annual thanks to a title shakeup) copy of the SABR Baseball Research Journal. I have been a member of SABR since 1999 and have read all of the BRJs since then, as well as all the back issues dating back to 1996 and probably about a dozen from before then.

The difference in quality between the early journals (I’m using “early” here to mean “early in my time as a member”, not to refer to the BRJs of the 70s and early 80s) and the most recent editions is night and day. The older ones had many more articles, generally much shorter in length, and many about trivial topics (which is some people’s cup of tea, but not mine). The sabermetric pieces in them were generally trash; freak show rankings of players, endless rediscoveries of bases/something, and the like. There were always some very good non-statistical articles to be found, but the consistency of quality left something to be desired.

It is therefore a great endorsement of both the former SABR publications director Jim Charlton and his successor, Nick Frankovich, to say that the recent editions have been light years better. From the sabermetric perspective, the BRJ remains non-essential, and that will probably always be the case, as sabermetricians were early adopters of the internet and most of the best research will always wind up there. Additionally, the internet provides a great “peer review” outlet for sabermetric research, and allows for frequent publication.

However, the sabermetric pieces that are in the BRJ now are of a much higher quality than those of a decade ago. While the 2008 edition features a dialogue between Bill James and Phil Birnbaum lifted from the pages of By the Numbers (the newsletter of SABR’s Statistical Analysis committee), they are good articles as is a lot of what is in BTN. The old state of affairs was that the sabermetric articles in BTN, the quarterly newsletter of a single committee, were vastly superior to those in SABR’s flagship annual publication, which was pretty sad.

The historical articles are of greater quality and greater detail than those of the past, generally speaking. A couple that I really enjoyed in this one were Daniel Levitt, Mark Armour, and Matthew Levitt’s piece on Harry Frazee and Jerry Kuntz’s piece on George Lawson, a baseball rabble-rouser who tried to organize a couple of “major” leagues. If you were unfamiliar with Mr. Lawson, don’t feel bad, as I was too. But the man was an unbelievable character, as Mr. Kuntz’s piece demonstrates. Apparently Mr. Kuntz is working on a book about the Lawson brothers, which, if this piece is any indication, will be a very entertaining read for a biography, even if much of the story is non-baseball.

As with any other journal covering a broad spectrum, there are articles that will not be up your alley, and there are certainly a few pieces that didn’t do anything for me. Regardless, the recent BRJs are a huge step forward from those of a decade ago, and are a great example of the benefits of SABR membership and the wide-ranging interests and expertise of its members.

* I am an unabashed fan of the World Baseball Classic and am very much looking forward to the second edition. I am also an American and an unabashed supporter of the US team. That being said, though, I have no issues whatsoever with Alex Rodriguez choosing to play for the Dominican Republic.

While I am fairly patriotic (although that’s not why I go by the handle “Patriot”--that's another story, and not a particularly interesting one), I am an individualist first, and so I support any individual’s right to represent whichever nation he’d like in something as innocuous as a baseball tournament (if ARod had signed up for the Iranian army, that might be another story). While the various means of determining citizenship for international competition are a little silly, as long as a player is eligible under the rules, I see no problem with him choosing to represent either of his possible choices.

From a value perspective, the loss of ARod is not really much of a blow to the US team, if his replacements at third are indeed David Wright and Chipper Jones. While Adrian Beltre is a fine player (assuming he is playing in the WBC), it’s safe to say that Rodriguez is more valuable to the Dominicans relative to his replacement.

While I will be rooting for the US, my number one wish for the tournament is that Cuba be kept from winning, and strengthening the Dominican side without doing much damage to the American cause is a winning move on this front. My antipathy towards the Cubans is purely political and not directed at their baseball people; I feel bad for their players, and I also feel bad for the Cubans like Jose Contreras, Kendry Morales, and Orlando Hernandez who may well want to represent their country but cannot because they fled the Castros’ regime. Unfortunately the players are pawns in a political game, and unlike baseball games which are ultimately for fun, that one is for keeps.

I can also justify pulling for the US and the DR on the basis that I generally prefer to see the “better” team win a small sample size tournament. All things being equal, I will root for the team with a better regular season record in a playoff matchup. While the US failed to reach the semifinals in 2006 (and the DR fell there to Cuba), it would be difficult to argue that the US and the DR don’t produce the most plentiful talent, as evidenced by major league performance, with apologies to Venezuela.

Finally, I can’t help but be amused at the stories that said something to the effect of “Howard declines invite to Team USA”. I don’t doubt the veracity of this, but in a sane world, this would read like me announcing that I am declining to toss my hat into the ring for the Browns’ coaching job.

Tuesday, December 16, 2008

A Jim Rice Post (sorry)

I really don’t want this to be about Jim Rice or the Hall of Fame, but you may not believe me if you keep reading. What I really want to do here is make a point, and Rice just happens to serve as the example. However, the Hall of Fame debates are near the forefront of the baseball scene right now, and one writer’s argument about Rice has inspired me to write this piece.

Another disclaimer: this is not a good post. It just isn’t. Don’t say you weren’t warned.

I also don’t want to make this about the writer in question, Peter Abraham. I cannot claim to be familiar with his work, but my first impression is that he seems to be reasonably thoughtful and intelligent. However, he happened to raise a particular point, which I have seen broached elsewhere in different forms, explicitly and openly, and thus it is easy to respond to. It is a lot more difficult to respond to “Some people say…” as it seems as if you are setting up a strawman piñata to bash with a Louisville Slugger.

This is what Mr. Abraham wrote in his blog at LoHud.com:

But here’s the problem, the Hall is full of players who were elected based on those standards. So should Jim Rice suffer or Bert Blyleven be elevated because smart people came up with better, more revealing statistics?

Nobody cared about on-base percentage in the 70s and 80s. Rice’s job was to swing for the fences. But now we know OBP matters. But Jim Rice can’t get in the DeLorean and take more pitches because it would make the Baseball Prospectus guys respect him more.

I have four points I would like to make in response, and I will take the lazy route and make a list:

1. Jim Rice had a low walk rate for a big slugger, even for a benighted era. I looked at the relative walk rates (W/(AB + W) relative to the league average of the same) for all of the 350 HR men in MLB history who started their careers in 1974 or earlier; Rice started his career in 1974, so these are all players who presumably would not have been affected by external pressure to take more pitches:

As you can see, most of these hitters drew a lot of walks relative to their peers, regardless of whether it was “their job” or not. Rice, walking at just 79% of the league average, is 38th of 41 on this list.

I could see this argument if Rice were not unique, and the big sluggers of the benighted era as a group were not drawing walks. That’s just not the case.
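For concreteness, the relative walk rate used above is easy to compute. A minimal sketch; the league totals below are made-up numbers for illustration (they are not actual 1974-89 league figures), though they land near the 79% figure cited:

```python
def walk_rate(bb, ab):
    """Walks per (AB + W), the rate used in the comparison above."""
    return bb / (ab + bb)

def relative_walk_rate(bb, ab, lg_bb, lg_ab):
    """Player's walk rate as a fraction of the league's walk rate."""
    return walk_rate(bb, ab) / walk_rate(lg_bb, lg_ab)

# Hypothetical example: 670 walks in 8225 at-bats, in a league
# with invented totals of 94,000 walks and 906,000 at-bats
print(round(relative_walk_rate(670, 8225, 94000, 906000), 2))  # 0.8
```

A result of 0.8 means the player walked at 80% of the league rate; anything well below 1.0 marks a free swinger relative to his peers.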

In fairness, you can pick apart this list in a couple of ways. For one, the 350 homer minimum puts Rice, with 382, near the bottom. He’s being compared to a bunch of players who are better than him. A true group of his comparables would set the line lower, so that Rice was near the middle of the group. On the other hand, since the Hall of Fame is the overarching subject of this whole discussion, these are the guys he should be compared to. Also, eyeballing similarity, I would say the most similar players (in terms of career length, era, and production) on this list would be Dwight Evans, Dick Allen, Norm Cash, Rocky Colavito, Frank Howard, Graig Nettles, Billy Williams, Tony Perez, Dave Kingman, Orlando Cepeda, and Lee May--all of whom walked more than he did except the last two. You don’t need to compare him to Ruth, Foxx, and Mantle to see that he didn’t draw a lot of walks for a slugger.

Another complaint is that by requiring the players to have embarked on their careers in 1974 and earlier, I am cutting out players who are essentially contemporaries of Rice while including many others who are clearly not his contemporaries (Ruth, Gehrig, Ott, etc.). Rice belongs in a group with Andre Dawson, who’s excluded here, much more than he does with them.

Caveats aside, I think that anyone claiming that sluggers of Rice’s era didn’t walk (or weren’t expected to walk), and that it thus doesn’t matter, should have to establish that contention. It flies against common sense and against a cursory look at the evidence, and I am trying to be generous.

2. I let it go in my first point, but I don’t believe there was a benighted era (Abraham does not explicitly state that there was, but it is hinted at). Sure, baseball people (and writers, and fans, etc.) generally have a better or more complete understanding of the value of OBA and walks now than they did in 1980 or 1920. But I think it’s just a little bit condescending to pretend as if it was a revelation to them. Perhaps to some; but there have always been Earl Weavers and Branch Rickeys out there who understand the importance of getting on base/avoiding outs as well as anybody in any time.

There are certainly people throughout baseball, past and present, who didn’t properly appreciate the value of getting on base. And there are many more that, while recognizing the theoretical value of getting on base, have ignored or undervalued it when using statistics to judge a player. That’s a far cry from believing that this belief was so overwhelming that it prevented Rice from displaying patience, and the walk rates of other big home run hitters don't support such a position.

3. Even if someone threatened to fine Rice for every pitch he took, I don’t care. In the end, I am only interested in assessing value. If Rice’s lack of selectivity was caused by the conditions of his time, and made him a less valuable player in his own time and place than he might have been had he played in 2005, I don’t care.

Generally speaking, the same things that won baseball games in 2005 won baseball games in 1975. Evolving metrics just allow us to better quantify reality--they don’t change it.

This is where the discussion crosses into opinion territory of course--"value rules all" does not have to be the underlying philosophy that you use when evaluating players. I happen to believe that the only truly fair way to evaluate a player is to estimate how many wins he contributed to his team in the unique context in which he actually performed. Anything else is judging him on what he might have been or could have been and ultimately, what the individual thinks he could have been, rather than what he was. From reading the arguments of other writers over the years, it is my observation that straying too far off the value reservation often results in a mess of countering what-ifs and contradictions. It is much easier to just judge the player’s achievements as they relate to winning baseball games in the environment in which he actually performed.

Perhaps Rice would have taken more pitches if he had played today…but perhaps his aggressiveness enabled him to hit some of his home runs. Perhaps he would have gotten frustrated as a rookie when his hitting coach tried to impose this upon him, pressed, and gotten labeled as a AAAA player. Perhaps he would have been Manny Ramirez instead. The point is, you cannot possibly know what would have happened with any degree of certainty. Sabermetric estimates of Rice’s value in his own time and place are certainly not without flaw, weakness, and oversight, but they also are grounded in the principle of assessing what Rice actually did, and estimating its value.

I would even go so far as to say that Abraham’s argument, taken to its extreme, glorifies statistics above winning. One of the common complaints about sabermetricians is that all we care about is the numbers. Of course, the reason we look at certain statistics and interpret them as we do is because they correlate with wins. If you throw up your hands and say “In Rice’s time, people valued BA, HR, and RBI” and look at these despite agreeing that they are less telling than other metrics, aren’t you in fact saying that putting up statistics (specifically, statistics that are in vogue in a given time and place) is what matters?

4. Even if one accepts Abraham’s premise, I can’t imagine using it as an argument against a player. He cites Rice and Blyleven as two sides of the coin; one whose standing has been hurt by the proliferation of sabermetric ideas and one who has benefited from it.

Abraham suggests that perhaps Rice would have played the game differently if he played today. As I have spent the rest of this post explaining, I don’t buy it, and even if I did, it wouldn’t change my opinion of him. But if you do, you may accept Abraham’s argument and view Rice’s relatively low OBAs (for a Hall of Fame corner outfielder) as a product of environment.

I have to ask, though, how would Blyleven’s performance be affected? Blyleven looks better when you evaluate him by runs allowed instead of by win-loss record. Had the observers of the time thought more about his ERA instead of his W%, he would have been better regarded in his own time. But how would it have affected what he did on the field? Blyleven was trying to prevent runs and win games. Are we to believe that he would have been able to allow fewer runs (or that he would have allowed more) had his contemporaries not paid attention to the “W” and “L” columns? Since I have to assume that just about everyone would answer “no”, then what good does it do to evaluate Blyleven in an outdated light? At least in the case of Rice, Abraham has offered a possible cause and effect relationship between contemporary views of statistics and performance. I don’t see any offered for the case of ERA v. W-L.

I have tried to avoid mentioning the Hall of Fame, because I don’t want to get into that debate (“that debate” being the one about which specific players should be in/out, not the election process itself or how a theoretical Hall might look). The same issues that come up in relation to the Hall are relevant for the general discussion of player value down through the years. My opinion of the Hall will be the same after Rice’s induction as it was before Rice’s induction, and while you can probably tell what I think of Jim Rice as a player, my intent really was just to use him as a vehicle to touch on the larger issues.

Tuesday, December 09, 2008

Demystifying Fibonacci Win Points

In his seminal 1994 book The Politics of Glory, Bill James introduced a simple method for evaluating pitcher win-loss records by combining the two components into one number. He called this method Fibonacci Win Points, and found that it was a fairly good predictor of which starting pitchers would be selected for the Hall of Fame.

You still see Win Points brought up from time to time for that kind of lightweight analysis, but I get the impression that most of the users don’t really know what the results are telling them (outside of the general idea that high numbers are good, it’s a combination of wins and losses, etc.). Since no one is taking the results too seriously, it’s not a big deal. But I always like to look underneath the math hood and see how things work.

So warning: this is a math post, and doesn’t really have much to do with baseball. The formula for Win Points (which I will abbreviate “FIB”) is wins times winning percentage, plus wins, minus losses. Writing it as a formula:

FIB = W/(W + L)*W + W - L

We can also write it as W^2/(W + L) + W - L, and make it abundantly clear that this is a unitless measure. The results bear a resemblance to win figures, but they are no longer a meaningful baseball unit.

To understand how FIB works a little better, let’s express all of the terms as per-decision rates. W% is still winning percentage, but wins are now equal to W% and losses are equal to 1-W%. The formula for Win Point rate (FIBr) becomes:

FIBr = W%*W% + W% - (1-W%) = (W%)^2 + 2(W%) - 1

To convert back to Win Points, we simply multiply FIBr by the number of decisions with which the pitcher was credited.

FIBr is a useful way to explore the relationship between Win Points and W%. We can see that a .500 pitcher will have a FIBr of (.5)^2 + 2(.5) - 1 = .25. So, a 10-10 pitcher will get .25 Win Points/decision, or 5 points. What does a pitcher have to do to earn .5 win points/decision?
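As a quick sanity check, both forms of the formula are trivial to code up. A minimal sketch (function names are my own):

```python
def fib(w, l):
    """Fibonacci Win Points: wins times W%, plus wins, minus losses."""
    return w / (w + l) * w + w - l

def fibr(pct):
    """Win Points per decision as a function of W%."""
    return pct ** 2 + 2 * pct - 1

print(fib(10, 10))       # 5.0
print(fibr(0.5) * 20)    # 5.0 -- same answer via the rate form
```

Multiplying the rate by decisions recovers the original figure exactly, which confirms the two forms are equivalent.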

We can find that by setting FIBr equal to .5, and solving for W%. It is a simple quadratic equation, which becomes very easy to see when we use “x” in place of W%:

FIBr = x^2 + 2x - 1

Setting FIBr equal to .5 and solving for x gives sqrt(10)/2 - 1, or .581. A pitcher with a W% of .581 will get a win point for every two decisions, whereas one with a .500 W% gets a win point every four decisions. It’s a very steep function (more on this later).

What is the point at which a pitcher gets zero win points? Solve the equation for FIBr = 0, and you get sqrt(2) - 1 = .414. While that value is superficially similar to a replacement level baseline, win points do not serve as a WAR method; .414 is simply the baseline below which negative values are returned.

Finally, what is the point at which the pitcher gets as many win points per decision as he does wins? Set FIBr equal to W% and solve, and you get (sqrt(5) - 1)/2 = .618. This consequence of James’ formula is why he called them “Fibonacci” win points, as (sqrt(5) - 1)/2 is the reciprocal of the golden ratio.
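All three benchmark W%s fall out of the same quadratic. Setting x^2 + 2x - 1 = c and taking the positive root of the quadratic formula gives x = sqrt(2 + c) - 1, which a few lines of code can verify (a sketch):

```python
from math import sqrt

def pct_for_rate(c):
    """W% at which FIBr = c: positive root of x^2 + 2x - (1 + c) = 0."""
    return sqrt(2 + c) - 1

print(round(pct_for_rate(0.5), 3))   # 0.581: half a win point per decision
print(round(pct_for_rate(0.0), 3))   # 0.414: the zero point
# The self-matching point solves x^2 + 2x - 1 = x, i.e. x^2 + x - 1 = 0:
print(round((sqrt(5) - 1) / 2, 3))   # 0.618: the golden ratio's reciprocal
```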

Allow me to go completely down the math digression path for a moment, with a disclaimer that I am not by any means a math professor and that James covered this ground in The Politics of Glory. The Fibonacci sequence starts with 0 and 1; at each step, you add the last two numbers together to yield the next term. So 0 + 1 = 1, 1 + 1 = 2, 1 + 2 = 3, 2 + 3 = 5, 3 + 5 = 8, 5 + 8 = 13, 8 + 13 = 21, 13 + 21 = 34, 21 + 34 = 55, and so on. As this sequence approaches infinity, the ratio between the previous term and the next term approaches (sqrt(5) - 1)/2. You can see this even in the early iterations; 8/13 = .61538, while 34/55 = .61818181… .

This will happen regardless of which two numbers you use to start out with. If we start out with 1 and 155, 1 + 155 = 156, 155 + 156 = 311, 156 + 311 = 467, 311 + 467 = 778, 467 + 778 = 1245, 778 + 1245 = 2023…1245/2023 = .6154, and you can see where this is going.
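A few lines of code confirm that the ratio converges to the same value no matter which pair starts the sequence (a sketch):

```python
def ratio_after(a, b, steps):
    """Extend a Fibonacci-style sequence and return the consecutive-term ratio."""
    for _ in range(steps):
        a, b = b, a + b
    return a / b

print(round(ratio_after(0, 1, 30), 6))    # 0.618034
print(round(ratio_after(1, 155, 30), 6))  # 0.618034
```

Thirty steps is overkill; the ratio is already good to several decimal places within a dozen terms.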

Writing it out to ten decimal places, (sqrt(5) - 1)/2 = .6180339887. The reciprocal of .6180339887 is 1.6180339888…the only reason why it is not exactly equal to 1 + itself is because I rounded.

The square of the number is .381966011, which has a reciprocal of 2.618033989. So the reciprocal of its square is 2 + itself. I wish I could tell you that the reciprocal of its cube is 3 + itself, but alas… However, the square plus the number itself is one; thus the square is also the complement. Wikipedia gives some more details on the Fibonacci sequence and the golden ratio and their appearances in art and nature, as well as a much more detailed discussion of their mathematical properties.

Getting back to Win Points, we can differentiate FIBr with respect to W% to see just how steep the function is:

dFIBr = 2(W%) + 2

If we look at this specifically at the mean W% (.5 of course), we get a slope of 3. We can write the tangent line at that point in point-slope form as:

y - y1 = m(x - x1), where y = FIBr, y1 = FIBr(.5) = .25, m = dFIBr evaluated at .5 (which is 3), x = W%, and x1 = .5. This gives FIBr - .25 = 3(W% - .5), which can be simplified to:

FIBr ~= 3(W%) - 1.25

Which can also be written as:

FIBr ~= 3(W% - 5/12) ~= 3(W% - .417)

Again, this is just a linearization of the FIBr function around a .500 W%; it is exact at that point, but doesn’t hold over the entire range of W%. However, it does give us some insight into how Win Points work. The baseline is a W% of .417 (the true curve’s zero point is .414, but the linear approximation doesn’t award any win points until W% reaches .417), and each point of W% above .417 is multiplied by three.
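To see how the tangent-line approximation behaves away from .500, it's easy to compare the two across a range of W%s (a sketch):

```python
def fibr(pct):
    """Exact win points per decision."""
    return pct ** 2 + 2 * pct - 1

def fibr_linear(pct):
    """Tangent-line approximation at W% = .500."""
    return 3 * pct - 1.25

for pct in (0.400, 0.450, 0.500, 0.550, 0.600):
    e, a = fibr(pct), fibr_linear(pct)
    print(f"{pct:.3f}  exact {e:+.4f}  approx {a:+.4f}  diff {e - a:+.4f}")
```

The difference works out to exactly (W% - .5)^2, which is why the approximation holds up well over the normal range of starting pitcher W%s: even at .600 or .400 the gap is only .01 win points per decision.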

Comparing this to a standard WAR formula, the WAR baseline will be in the same ballpark as .414 (I use .390), but each point of W% above the baseline is equally valuable. A .500 pitcher and a .495 pitcher will have the same WAR gap, given equal decisions, as a .600 and a .595 pitcher. That will not be true for Win Points, which rewards higher W%s more. This may be why James found them useful for predicting the Hall of Fame’s treatment of pitchers, as Win Points put a premium on excellence, beyond its real world win value.

Just to give you an idea of how well the approximation works on the career level, I figured actual Fibonacci Win Points and the linear knockoff for all post-1900 pitchers with 150 or more wins as of 2007. There are 198 pitchers, for whom the average absolute difference between the two is 2.3 win points. The differences are proportional to win points; the biggest differences are for those with the most win points (the largest single difference is 15 between Christy Mathewson’s 433 actual and 418 approximate). I think that this supports my claim that the linear approximation is a decent tool by which to understand how Win Points work.

It should go without saying that the approximation works best for those pitchers with W%s near .500, as that is the point at which I found the tangent line. If you were to find the tangent line at .600, you would get more accurate results for pitchers with W%s near .600, naturally.

Now I will give a freak show stat of my own, which I would never really encourage anyone to use. I include it here as a way to use the career value metric I prefer (given the constraint of working with the actual W-L record) with a career wins pseudo-scale. It is just a quick z-score conversion of pitcher WAR to a pseudo-Wins unit. For the 150 win pitchers, the mean number of wins was 210 with a standard deviation of 55. I figured WAR crudely as (W% - .39)*(W + L); crude because it is using actual wins and losses as the inputs, not because the concept is flawed. This metric has a mean of 65 and a standard deviation of 25 for this group. Setting the z-scores equal and rearranging to isolate wins gives this conversion:

pseudo-Wins = 2.2*WAR + 67
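The whole chain (crude WAR from the actual record, then the z-score conversion) takes only a few lines. A sketch; the sample record is Randy Johnson's through 2007, if my notes are right, and it reproduces the 319 figure mentioned below:

```python
def crude_war(w, l, baseline=0.39):
    """Crude WAR from an actual W-L record: (W% - baseline) * decisions."""
    return (w / (w + l) - baseline) * (w + l)

def pseudo_wins(w, l):
    """z-score conversion of crude WAR onto a career-wins pseudo-scale."""
    return 2.2 * crude_war(w, l) + 67

print(round(pseudo_wins(284, 150)))  # 319, from a 284-150 record
print(round(pseudo_wins(324, 292)))  # 251, from Nolan Ryan's 324-292
```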

Obviously this is only intended for use with career WAR…saying that CC was worth 7 WAR last year and that is equivalent to 82 wins would make no sense. Of course it would make no sense for a pitcher with 7 career WAR either; this was based on pitchers with 150 wins and shouldn’t be used outside the range of long, reasonably productive careers. Well, it really shouldn’t be used at all, but I needed some original content, even of questionable quality, to make me feel better about a post rehashing James’ method.

The lowest-ranking 300 win pitcher under this formula (and thus, by definition, WAR as defined above as well) is Nolan Ryan (251). There is only one pitcher with 300 pseudo-Wins without 300 actual wins, and that may change this season--Randy Johnson at 319. The other non-300 game winners with 290 or more pseudo wins are Jim Palmer (296) and Whitey Ford (293). In case it is not clear, I have been using actual wins and losses, not “neutral wins and losses” as in the previous post and other posts through the years.

In summation, Win Points may well have value as a gauge of Hall of Fame chances or as a reasonable way to combine wins and losses into one number. However, they are unitless, and they place a very high premium on excellent performance, much more than the actual, tangible win impact of said performance. Despite their baseline of .414, which is a reasonable “replacement” level, the aforementioned premium should dispel any notion that they are a WAR knockoff. And of course, they just so happen to tie in with a fascinating mathematical sequence and ratio, which gives an otherwise forgettable “freak show stat” (I apply that label in the kindest sense of the term, and it is one which Bill James often applied to his own inventions) a little more pizzazz.

Tuesday, December 02, 2008

W-L Records of Mussina and Contemporaries

The retirement of Mike Mussina and the imminent departure of the rest of a roughly contemporary group of great pitchers that many lump together (Clemens, Maddux, Johnson, Smoltz, Glavine, Schilling, Martinez, Brown) has led to a number of discussions about how they stack up, which ones are worthy of the Hall of Fame, and the like. I don’t wish to enter the Hall of Fame debate, but I do want to provide a little bit of information and use this as an opportunity to re-make a larger point.

In the course of these discussions, sometimes pitcher W-L records are brought up. I have no particular desire to promote their inclusion in these discussions, but to the extent that it is inevitable that they will be used, I do wish that people would take the time to really think about how to evaluate them, and the assumptions that their approach entails.

Everyone who is reading this blog knows about the deficiencies of pitcher W-L record as a serious analytical tool. At the risk of being patronizing, I will list some of the biggies:

1) they are heavily affected by the offensive performance of the pitcher’s teammates
2) the accounting rules used to assign them are outdated and often result in questionable (to put it generously) results
3) they are heavily affected, in the modern era at least, by a pitcher’s bullpen support
4) like most other basic pitching measures, they do not isolate the pitcher’s efforts from those of the fielders behind him

On the other hand, there are a few good things to be said about them:

1) they are inherently (at least as a W% or a W/L ratio) park and era adjusted, as the mean is .500 always and forever
2) if you subscribe to the notion that a player really only adds value in games his team ends up winning (I don’t), at least a pitcher’s win is always a team win
3) when analyzed, they often lead to similar conclusions as other measures of pitching effectiveness; they are positively and fairly strongly correlated with ERA and similar metrics

Please do not get me wrong--I don’t believe that the positives outweigh the negatives. However, they are not going anywhere, and some people will continue to use them to evaluate pitchers. With that being the case, the question that I am addressing is “How can W-L records be interpreted so as to make the best estimate of a pitcher’s true value?”

Ideally, run support data can be used (either average or discrete figures from each game) to provide context for the W-L record. However, this introduces the issue of park effects, which we could previously ignore (one of the positive attributes of W-L). There is a bit more math involved as well, which may not deter me or you but will lose many of the fans who continue to rely on W-L. Additionally, there is the problem of past seasons not covered by the efforts of Retrosheet and others. In pointing out these drawbacks, I am not trying to shun this approach in favor of what follows, which is admittedly more rudimentary and less optimal.

A very common approach that even casual, non-sabermetric fans seem to gravitate towards is comparing a pitcher’s W% to that of his teammates. This approach dates to at least 1944 and Ted Oliver’s “Kings of the Mound”. It seems like a common sense way to account and adjust for the quality of a pitcher’s team, it is easy to do computationally, and it involves data (team W-L record) that is readily available. So what’s not to like?

Notice that I slipped “adjust for the quality of a pitcher’s team” in there. That’s exactly what a direct comparison of pitcher W-L to teammates’ W-L record does. But why would one want to adjust for the quality of the team? The team’s record includes the contributions of the team’s hitters, fielders, and relievers, all of which influence the W-L records of starting pitchers. But it also includes the contributions of the team’s other starting pitchers, which are irrelevant to any individual starter. If Stephen Drew plays well, he helps to increase Brandon Webb’s “teammate W%”. And if Dan Haren pitches well, he also winds up increasing Brandon Webb’s “teammate W%”. The difference is that while Drew’s actions serve to increase Webb’s chances of earning a win (or avoiding a loss), Haren’s do no such thing. They are confined to a completely different set of games, games in which Webb does not pitch.

Therefore, assuming that the goal of any method of comparing pitcher W% to team W% is to estimate what his W% would be on an average team, the simple differential between W% and teammates’ W% (which I will call Mate for the sake of brevity) is flawed. This is because it implicitly assumes that all of the team’s deviation from .500 is the product of offense, fielding, and relief support, ignoring the contributions of the other starting pitchers.

In order to come up with a simple model, let’s make the following assumptions:

1) 50% of an average team’s deviation from .500 is due to offense; 50% is due to defense
2) Pitching is 100% of defense (this is obviously a faulty assumption, unlike the first one, which is reasonable)
3) The starting pitcher, in one of his starts, is the entirety of the pitching; his relievers will not affect the outcome (again, faulty, although closer to reality than #2)
4) Team W% can be modeled linearly (faulty, but reasonable, as a linear model works fine for normal teams)

Given these assumptions, a pitcher should be compared not to Mate directly, but to the average of Mate and .500. In doing so, the assumption is that half of the deviation from .500 was due to the offense, and has changed the W% of a hypothetical average pitcher on this team from .500 to (Mate + .5)/2.

Continuing to apply the linear assumption, a pitcher’s Neutral W% (hypothetical on a .500 team) can be figured as:

NW% = W% - (Mate + .5)/2 + .5 = W% - Mate/2 + .25

Under the traditional approach, a .600 pitcher on a .600 Mate team would have a NW% of .500. Under this approach, his NW% will be .6 - .6/2 + .25 = .550. Compared to a simple differential, this approach is kinder to pitchers on good teams and less generous to those on bad teams.
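The arithmetic above is simple enough to sketch in a few lines of Python (the function and variable names are mine, not anything from the post):

```python
def neutral_w_pct(w_pct: float, mate: float) -> float:
    """NW% = W% - (Mate + .5)/2 + .5: compare the pitcher to the average
    of his teammates' W% and .500, rather than to Mate directly."""
    return w_pct - (mate + 0.5) / 2 + 0.5

# The worked example: a .600 pitcher on a .600 Mate team
print(round(neutral_w_pct(0.600, 0.600), 3))  # .550 under the 50/50 split
print(round(0.600 - 0.600 + 0.500, 3))        # .500 under the traditional differential
```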

One can argue about the assumptions above; you can use more sophisticated assumptions about fielding and bullpen support, use a Pythagorean model instead of a linear one, and the like. I think those refinements are overkill, since any analysis of W-L records is going to be inherently fraught with imprecision, but if you want to go further down that path, I won’t try to stop you.

Another, simpler option is to alter the weights on Mate and .500. I have weighted them 50/50; perhaps 40% of Mate and 60% of .500 would better account for some of the factors our assumptions brushed aside (I picked those specific numbers as an example rather than for any justifiable reason).
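Generalizing the function to take the Mate weight as a parameter makes the family of assumptions explicit (again, this is my own sketch, not code from the post):

```python
def neutral_w_pct_weighted(w_pct: float, mate: float, mate_weight: float = 0.5) -> float:
    """Baseline = mate_weight*Mate + (1 - mate_weight)*.500; NW% = W% - baseline + .5."""
    baseline = mate_weight * mate + (1 - mate_weight) * 0.5
    return w_pct - baseline + 0.5

# mate_weight = 1.0 reproduces the traditional W% - Mate differential,
# 0.0 sets NW% = W%, 0.5 is the 50/50 split, and 0.4 is the illustrative
# 40/60 split mentioned above.
```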

In any individual case, the 50/50 assumption may wind up being “worse” than the standard 100/0 assumption, or the 0/100 assumption (which would just set NW% = W%, assuming that the pitcher was solely responsible for deviation from .500). The average team may have a perfect offense/defense value split, but very few teams actually do. An example that you will see in the data presented later is the Braves teams of the 1990s. Their defenses were better relative to the league than their offenses, and thus even after neutralizing the W% of a Maddux or a Smoltz in the manner prescribed here, they are being shortchanged. However, in more cases than not, 50/50 is going to match reality better than 100/0 or 0/100, and thus is better suited for general application.

Regardless of the assumptions made in figuring NW%, once we have NW% by any method, we can extend it to value measures. The most common is “Wins Above Team”, first figured by Oliver and carried on by countless analysts since. It is figured as (NW% - .5)*(W + L), and is the number of wins beyond those expected of an average pitcher in the same number of decisions.

We can also compare the pitcher to some replacement level; I use a .390 W% as my replacement level for starting pitchers, and thus what I call WCR (Wins Compared to Replacement, as I don’t want to overuse the common WAR acronym) is simply (NW% - .39)*(W + L).

Both formulas assume that the pitcher’s decisions will remain constant; you could use estimated decisions (ex. IP/9) in their place, as the number of decisions itself can be affected by external factors. However, I am most comfortable assuming decisions are a constant. Sticking with actual decisions, we can figure a new W-L record, with NW = NW%*(W + L) and NL = (1 - NW%)*(W + L).
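The value measures can be bundled together in one function (a sketch under the 50/50 assumption; the names are mine, and .390 is the replacement level stated above):

```python
def pitcher_value(w: int, l: int, mate: float, repl: float = 0.390):
    """Return (NW%, WAT, WCR, NW, NL), holding actual decisions constant."""
    decisions = w + l
    nw_pct = w / decisions - (mate + 0.5) / 2 + 0.5        # neutral W%
    wat = (nw_pct - 0.500) * decisions                     # Wins Above Team
    wcr = (nw_pct - repl) * decisions                      # Wins Compared to Replacement
    nw, nl = nw_pct * decisions, (1 - nw_pct) * decisions  # neutral W-L record
    return nw_pct, wat, wcr, nw, nl

# e.g. a hypothetical 20-10 pitcher on a .550 Mate team
nw_pct, wat, wcr, nw, nl = pitcher_value(20, 10, 0.550)
```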

If by any chance this sounds familiar, I have written about all of this before. My previous posts on this matter were by no means original; the idea of the 50/50 split was explained and implemented by Rob Wood in his August 1999 By The Numbers article, "Evaluating Pitchers' Winning Percentages: A Mathematical Modeling Approach" (pdf link). I have also published some results for great pitchers on this blog; here I am going to supplement that with updated (through 2008) results for the pitchers generally considered to be Mussina’s contemporaries (Brown, Smoltz, Schilling, Martinez, Glavine, Johnson, Maddux, Clemens).

Here is the career data for those pitchers, sorted by NW:

Career Mate is weighted by decisions in each season, the reasoning behind which should be obvious. A few observations about the results:
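The decisions-weighting can be sketched as follows (the season lines here are invented for illustration, not any real pitcher's):

```python
# Weight each season's Mate by that season's decisions, so seasons in which
# the pitcher had more decisions count proportionally more toward career Mate.
# These season lines are made up for illustration.
seasons = [
    {"w": 18, "l": 7, "mate": 0.560},
    {"w": 11, "l": 10, "mate": 0.480},
    {"w": 15, "l": 8, "mate": 0.530},
]
total_dec = sum(s["w"] + s["l"] for s in seasons)
career_mate = sum(s["mate"] * (s["w"] + s["l"]) for s in seasons) / total_dec
```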

* For pitchers with 150 or more Neutral Wins, Lefty Grove has the highest career NW%, at .650. When I last figured Clemens, he was at .654, but his performance in 2007 dropped him to .6497, now behind Grove’s .6502. Randy Johnson still leads Grove, but has slipped from .661 to .653 and may not hang on. Pedro Martinez has also slipped, from .680 to .671. I would wager that one of them manages to hang on, but it is within the realm of possibility that Grove will retain the career lead.

* Maddux may have slipped ahead of Clemens in wins, but the Rocket still has a seven win edge in NW, and with Maddux’ retirement seeming quite possible, Warren Spahn will remain the post-war leader at 355.

* Glavine may wind up outside the 300 NW club, but Randy Johnson is closer in NW than in actual wins thanks to pitching on slightly below average teams over the course of his career; Schilling is the only other member of this group with sub-.500 teammates.

I have posted the complete career data for these guys so that you can look at individual seasons (although an imprecise metric like this is best used when aggregated over a long period of time).

Finally, allow me to briefly comment on the Hall of Fame as it relates to Mussina. I have written before that I don’t really care who goes into Cooperstown, because I think that their process is broken beyond drastic repair, and has been for many years. However, I don’t waive the right to comment generally on the issue; my policy is simply not to advocate or give a yes/no answer for or against any particular player. (I strive for neutrality, but sometimes I can’t help myself, so if you want to accuse me of hypocrisy, have at it.)

There are 49 post-1900 starters in the Hall (depending on who you consider to be post-1900; I did not count Kid Nichols but did count Cy Young, if that helps). Fourteen of them (29%) won 300 games, so any notion that 300 wins is a time-established standard for induction is off-base. Thirty-six pitchers have been selected by the BBWAA, including all 14 of the 300 win group (39%). So even if you limit it to the writers, 61% of starting pitchers inducted did NOT win 300 games.

I have Mussina’s neutral W-L record as 262-161 (.619 NW%). Eyeballing similarity, that’s in the same area code as Carl Hubbell (244-163, .600), Joe McGinnity (234-154, .603), Bob Feller (257-171, .601), Bob Gibson (248-177, .583), Juan Marichal (236-149, .614), and Jim Palmer (251-169, .599). All of those guys are in the Hall of Fame, and seem to be regarded as fine choices.
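As a sanity check, the quoted percentages can be recomputed from the records (any last-digit disagreements with the figures above come from the records themselves being rounded to whole wins and losses):

```python
# Neutral W-L records as quoted in the text; the percentage is just W/(W+L).
records = {
    "Mussina": (262, 161),
    "Hubbell": (244, 163),
    "McGinnity": (234, 154),
    "Feller": (257, 171),
    "Gibson": (248, 177),
    "Marichal": (236, 149),
    "Palmer": (251, 169),
}
for name, (w, l) in sorted(records.items(), key=lambda kv: -kv[1][0] / sum(kv[1])):
    print(f"{name}: {w / (w + l):.3f}")
```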

I certainly am not saying Mussina must be elected because of his neutral W-L record alone; there are certainly better metrics by which to evaluate pitchers, other factors that you may want to consider beyond career value, and there's nothing stopping you from having your own standards for what is a Hall of Famer. However, the notion that Mussina’s career W-L record is in and of itself a liability, absent mitigating circumstances or a divergent opinion about what the standard should be, seems misguided. Put another way, Mussina’s career W-L record is one that typically would be associated with a Hall of Fame pitcher.

One of the observations that has been put out there by a number of writers (I think I remember Tom Verducci in particular mentioning this) is that Mussina spun a career of near-misses--he wasn’t that far from 300, he always just missed 20 wins (until 2008 of course), he just missed winning the Series as a Yank, he just missed a couple of no-hitters/perfect games.

Personally, I was always a fan of Mussina as a result of two things, one of them a near-miss: his almost perfect game against the Indians in 1997. The other was that in my childhood/adolescence I played Front Page Sports: Baseball incessantly. I recall that the pitcher on the box of the first edition (’94) bore a resemblance to Mussina. If you actually dredge this up, you may well find that I am way off and it embarrassingly looks nothing like him, or that it actually is him. Regardless, I pretended that it was him. He was also money in the ’96 version; in my seasons, he seemed to be the most consistently effective pitcher, better than Johnson or Cone or Clemens or even Maddux. The Yankees are now short a starter, and my old FPS Cleveland Indians are missing one too. So long, Moose.