Tuesday, December 02, 2008

W-L Records of Mussina and Contemporaries

The retirement of Mike Mussina and the imminent departure of the rest of a roughly contemporary group of great pitchers that many lump together (Clemens, Maddux, Johnson, Smoltz, Glavine, Schilling, Martinez, Brown) has led to a number of discussions about how they stack up, which ones are worthy of the Hall of Fame, and the like. I don’t wish to enter the Hall of Fame debate, but I do want to provide a little bit of information and use this as an opportunity to re-make a larger point.

In the course of these discussions, sometimes pitcher W-L records are brought up. I have no particular desire to promote their inclusion in these discussions, but to the extent that it is inevitable that they will be used, I do wish that people would take the time to really think about how to evaluate them, and the assumptions that their approach entails.

Everyone who is reading this blog knows about the deficiencies of pitcher W-L record as a serious analytical tool. At the risk of being patronizing, I will list some of the biggies:

1) they are heavily affected by the offensive performance of the pitcher’s teammates
2) the accounting rules used to assign them are outdated and often result in questionable (to put it generously) results
3) they are heavily affected, in the modern era at least, by a pitcher’s bullpen support
4) like most other basic pitching measures, they do not isolate the pitcher’s efforts from those of the fielders behind him

On the other hand, there are a few good things to be said about them:

1) they are inherently (at least as a W% or a W/L ratio) park and era adjusted, as the mean is .500 always and forever
2) if you subscribe to the notion that a player really only adds value in games his team ends up winning (I don’t), at least a pitcher’s win is always a team win
3) when analyzed, they often lead to similar conclusions as other measures of pitching effectiveness; they are positively and fairly strongly correlated with ERA and similar metrics

Please do not get me wrong--I don’t believe that the positives outweigh the negatives. However, they are not going anywhere, and some people will continue to use them to evaluate pitchers. With that being the case, the question that I am addressing is “How can W-L records be interpreted so as to make the best estimate of a pitcher’s true value?”

Ideally, run support data can be used (either average or discrete figures from each game) to provide context for the W-L record. However, this introduces the issue of park effects, which we could previously ignore (that being one of the positive attributes of W-L). There is a bit more math involved as well, which may not deter me or you but will lose many of the fans who continue to rely on W-L. Additionally, there is the problem of past seasons not covered by the efforts of Retrosheet and others. Pointing out these drawbacks is not an attempt to shun this approach in favor of what is to follow, which admittedly is a more rudimentary and less optimal approach.

A very common approach that even casual, non-sabermetric fans seem to gravitate towards is comparing a pitcher’s W% to that of his teammates. This approach dates to at least 1944 and Ted Oliver’s “Kings of the Mound”. It seems like a common sense way to account and adjust for the quality of a pitcher’s team, it is easy to do computationally, and it involves data (team W-L record) that is readily available. So what’s not to like?

Notice that I slipped “adjust for the quality of a pitcher’s team” in there. That’s exactly what a direct comparison of pitcher W-L to teammates’ W-L record does. But why would one want to adjust for the quality of the team? The team’s record includes the contributions of the team’s hitters, fielders, and relievers, all of which influence the W-L records of starting pitchers. But it also includes the contributions of the team’s other starting pitchers, which are irrelevant to any individual starter. If Stephen Drew plays well, he helps to increase Brandon Webb’s “teammate W%”. And if Dan Haren pitches well, he also winds up increasing Brandon Webb’s “teammate W%”. The difference is that while Drew’s actions serve to increase Webb’s chances of earning a win (or avoiding a loss), Haren’s do no such thing. They are confined to a completely different set of games, games in which Webb does not pitch.

Therefore, assuming that the goal of any method of comparing pitcher W% to team W% is to estimate what his W% would be on an average team, the simple differential between W% and teammates’ W% (which I will call Mate for the sake of brevity) is flawed. This is because it implicitly assumes that all of the team’s deviation from .500 is the product of offense, fielding, and relief support, ignoring the contributions of the other starting pitchers.

In order to come up with a simple model, let’s make the following assumptions:

1) 50% of an average team’s deviation from .500 is due to offense; 50% is due to defense
2) Pitching is 100% of defense (this is obviously a faulty assumption, unlike the first one, which is reasonable)
3) The starting pitcher, in one of his starts, is the entirety of the pitching; his relievers will not affect the outcome (again, faulty, although closer to reality than #2)
4) Team W% can be modeled linearly (faulty, but reasonable, as a linear model works fine for normal teams)

Given these assumptions, a pitcher should be compared not to Mate directly, but to the average of Mate and .500. In doing so, the assumption is that half of the deviation from .500 was due to the offense, and has changed the W% of a hypothetical average pitcher on this team from .500 to (Mate + .5)/2.

Continuing to apply the linear assumption, a pitcher’s Neutral W% (hypothetical on a .500 team) can be figured as:

NW% = W% - (Mate + .5)/2 + .5 = W% - Mate/2 + .25

Under the traditional approach, a .600 pitcher on a .600 Mate team would have a NW% of .500. Under this approach, his NW% will be .6 - .6/2 + .25 = .550. Compared to a simple differential, this approach is kinder to pitchers on good teams and less generous to those on bad teams.
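For those who prefer to see the arithmetic spelled out, here is a minimal Python sketch of the two adjustments. The function names are my own labels for illustration; the formulas are the ones given above, with the simple differential re-centered on .500 so that the two are directly comparable.

```python
def neutral_wpct(wpct: float, mate: float) -> float:
    """NW% under the 50/50 linear model: W% - (Mate + .5)/2 + .5."""
    baseline = (mate + 0.5) / 2  # expected W% of an average pitcher on this team
    return wpct - baseline + 0.5


def simple_differential(wpct: float, mate: float) -> float:
    """Traditional approach: compare W% to Mate directly, re-centered on .500."""
    return wpct - mate + 0.5


# A .600 pitcher on poor, average, and good teams
for mate in (0.400, 0.500, 0.600):
    print(
        f"Mate {mate:.3f}: "
        f"simple {simple_differential(0.600, mate):.3f}, "
        f"50/50 {neutral_wpct(0.600, mate):.3f}"
    )
```

The last row reproduces the example in the text (.500 under the simple differential, .550 under the 50/50 split), and the first row shows the flip side: the same pitcher on a .400 Mate team gets .700 under the simple differential but only .650 here.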

One can argue about the assumptions above; you can use more sophisticated assumptions about fielding and bullpen support, use a Pythagorean model instead of a linear one, and the like. I think those refinements are overkill, since any analysis of W-L records is going to be inherently fraught with imprecision, but if you want to go further down that path, I won’t try to stop you.

Another, simpler option is to alter the weights on Mate and .500. I have weighted them 50/50; perhaps 40% of Mate and 60% of .500 would better account for some of the factors our assumptions brushed aside (I picked those specific numbers as an example rather than for any justifiable reason).

In any individual case, the 50/50 assumption may wind up being “worse” than the standard 100/0 assumption, or the 0/100 assumption (which would just set NW% = W%, assuming that the pitcher was solely responsible for deviation from .500). The average team may have a perfect offense/defense value split, but very few teams actually do. An example that you will see in the data presented later is the Braves teams of the 1990s. Their defenses were better relative to the league than their offenses, and thus even after neutralizing the W% of a Maddux or a Smoltz in the manner prescribed here, they are being shortchanged. However, for more cases than not, 50/50 is going to match reality better than 100/0 or 0/100, and thus is better suited for general application.
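The different weightings can all be treated as one formula with an adjustable weight on Mate. The sketch below is only an illustration of that idea; the function name and the `mate_weight` parameter are my own inventions, and the 40/60 line is included purely because it was the arbitrary example mentioned above.

```python
def neutral_wpct_weighted(wpct: float, mate: float, mate_weight: float = 0.5) -> float:
    """NW% with an adjustable split between Mate and .500.

    mate_weight = 1.0 is the traditional 100/0 comparison to Mate,
    mate_weight = 0.5 is the 50/50 split,
    mate_weight = 0.0 is the 0/100 case, which leaves NW% = W%.
    """
    if not 0.0 <= mate_weight <= 1.0:
        raise ValueError("mate_weight must be between 0 and 1")
    baseline = mate_weight * mate + (1.0 - mate_weight) * 0.5
    return wpct - baseline + 0.5


# The same .600 pitcher on a .600 Mate team under each weighting
for weight in (1.0, 0.5, 0.4, 0.0):
    print(f"weight on Mate {weight:.1f}: NW% = {neutral_wpct_weighted(0.600, 0.600, weight):.3f}")
```

For this pitcher the results run from .500 (100/0) through .550 (50/50) and .560 (40/60) up to .600 (0/100), which gives a sense of how much the choice of weight matters for a pitcher on a good team.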

Regardless of the assumptions made in figuring NW%, once we have NW% by any method, we can extend it to value measures. The most common is “Wins Above Team”, first figured by Oliver and carried on by countless analysts since. It is figured as (NW% - .5)*(W + L), and is the number of wins beyond those expected of an average pitcher in the same number of decisions.

We can also compare the pitcher to some replacement level; I use a .390 W% as my replacement level for starting pitchers, and thus what I call WCR (Wins Compared to Replacement, as I don’t want to overuse the common WAR acronym) is simply (NW% - .39)*(W + L).

Both formulas assume that the pitcher’s decisions will remain constant; you could use estimated decisions (e.g. IP/9) in their place, as the number of decisions itself can be affected by external factors. However, I am most comfortable assuming decisions are a constant. Sticking with actual decisions, we can figure a new W-L record, with NW = NW%*(W + L) and NL = (1 - NW%)*(W + L).
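Here is a short Python sketch of the value measures, holding decisions constant as described. The function name and the 20-10 pitcher with a .600 NW% are hypothetical, chosen only to make the arithmetic easy to follow; the .390 replacement level is the one I use above.

```python
REPLACEMENT_WPCT = 0.390  # replacement level for starting pitchers used in this post


def value_measures(wins: int, losses: int, nw_pct: float,
                   replacement: float = REPLACEMENT_WPCT) -> dict:
    """Wins Above Team, Wins Compared to Replacement, and the neutral W-L record."""
    decisions = wins + losses  # actual decisions are assumed to stay constant
    return {
        "WAT": (nw_pct - 0.5) * decisions,
        "WCR": (nw_pct - replacement) * decisions,
        "NW": nw_pct * decisions,
        "NL": (1.0 - nw_pct) * decisions,
    }


# Hypothetical pitcher: 20-10 actual record, .600 NW%
for name, value in value_measures(20, 10, 0.600).items():
    print(f"{name}: {value:.1f}")
```

With 30 decisions, that works out to 3.0 WAT, 6.3 WCR, and a neutral record of 18-12.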

If by any chance this sounds familiar, I have written about all of this before. My previous posts on this matter were by no means original; the idea of the 50/50 split was explained and implemented by Rob Wood in his August 1999 By The Numbers article, "Evaluating Pitchers' Winning Percentages: A Mathematical Modeling Approach" (pdf link). I have also published some results for great pitchers on this blog; here I am going to supplement that with updated (through 2008) results for the pitchers generally considered to be Mussina’s contemporaries (Brown, Smoltz, Schilling, Martinez, Glavine, Johnson, Maddux, Clemens).

Here is the career data for those pitchers with the list sorted by NW:

Career Mate is weighted by decisions in each season, the reasoning behind which should be obvious. A few observations about the results:

* For pitchers with 150 or more Neutral Wins, Lefty Grove has the highest career NW%, at .650. When I last figured Clemens, he was at .654, but his performance in 2007 dropped him to .6497, now behind Grove’s .6502. Randy Johnson still leads Grove, but has slipped from .661 to .653 and may not hang on. Pedro Martinez has also slipped, from .680 to .671. I would wager that one of them manages to hang on, but it is within the realm of possibility that Grove will retain the career lead.

* Maddux may have slipped ahead of Clemens in wins, but the Rocket still has a seven win edge in NW, and with Maddux’ retirement seeming quite possible, Warren Spahn will remain the post-war leader at 355.

* Glavine may wind up outside the 300 NW club, but Randy Johnson is closer in NW than in actual wins thanks to pitching on slightly below average teams over the course of his career; Schilling is the only other member of this group with sub-.500 teammates.

I have posted the complete career data for these guys so that you can look at individual seasons (although an imprecise metric like this is best used when aggregated over a long period of time).
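If you want to roll individual seasons up into career figures yourself, the following Python sketch shows the decision-weighted career Mate. The three seasons are made-up numbers for illustration only, not data for any real pitcher, and the function names are my own. Because the model is linear, applying the NW% formula to career W% and decision-weighted career Mate gives the same neutral wins as summing the season-by-season figures.

```python
# (wins, losses, Mate) per season; invented numbers, not a real pitcher
seasons = [
    (18, 9, 0.540),
    (11, 12, 0.470),
    (20, 8, 0.580),
]


def career_mate(seasons):
    """Career Mate, weighting each season's Mate by the pitcher's decisions."""
    total_decisions = sum(w + l for w, l, _ in seasons)
    return sum((w + l) * mate for w, l, mate in seasons) / total_decisions


def career_neutral_wins(seasons):
    """Neutral wins from career W% and decision-weighted career Mate (50/50 model)."""
    wins = sum(w for w, _, _ in seasons)
    decisions = sum(w + l for w, l, _ in seasons)
    nw_pct = wins / decisions - career_mate(seasons) / 2 + 0.25
    return nw_pct * decisions


def summed_season_neutral_wins(seasons):
    """Neutral wins figured one season at a time, then added up."""
    return sum((w / (w + l) - mate / 2 + 0.25) * (w + l) for w, l, mate in seasons)


print(f"Career Mate: {career_mate(seasons):.3f}")
print(f"Career NW (career formula): {career_neutral_wins(seasons):.2f}")
print(f"Career NW (summed seasons): {summed_season_neutral_wins(seasons):.2f}")
```

The last two lines agree, which is the practical reason for weighting Mate by decisions rather than taking a simple average of the seasonal figures.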

Finally, allow me to briefly comment on the Hall of Fame as it relates to Mussina. I have written before that I don’t really care who goes into Cooperstown, because I think that their process is broken beyond drastic repair, and has been for many years. However, I don’t waive the right to comment generally on the issue; my policy is simply not to advocate or give a yes/no answer for or against any particular player. (I strive for neutrality, but sometimes I can’t help myself, so if you want to accuse me of hypocrisy, have at it.)

There are 49 post-1900 starters in the Hall (depending on who you consider to be post-1900; I did not count Kid Nichols but did count Cy Young, if that helps). Fourteen of them (29%) won 300 games, so any notion that 300 wins is a time-established standard for induction is off-base. Thirty-six pitchers have been selected by the BBWAA, including all 14 of the 300 win group (39%). So even if you limit it to the writers, 61% of starting pitchers inducted did NOT win 300 games.

I have Mussina’s neutral W-L record as 262-161 (.619). Eyeballing similarity, that’s in the same area code as Carl Hubbell (244-163, .600), Joe McGinnity (234-154, .603), Bob Feller (257-171, .601), Bob Gibson (248-177, .583), Juan Marichal (236-149, .614), and Jim Palmer (251-169, .599). All of those guys are in the Hall of Fame, and seem to be regarded as fine choices.

I certainly am not saying Mussina must be elected because of his neutral W-L record alone; there are certainly better metrics by which to evaluate pitchers, other factors that you may want to consider beyond career value, and there's nothing stopping you from having your own standards for what is a Hall of Famer. However, the notion that Mussina’s career W-L record is in and of itself a liability, absent mitigating circumstances or a divergent opinion about what the standard should be, seems misguided. Put another way, Mussina’s career W-L record is one that typically would be associated with a Hall of Fame pitcher.

One of the observations that has been put out there by a number of writers (I think I remember Tom Verducci in particular mentioning this) is that Mussina spun a career of near-misses--he wasn’t that far from 300, he always just missed 20 wins (until 2008 of course), he just missed winning the Series as a Yank, he just missed a couple of no-hitters/perfect games.

Personally, I was always a fan of Mussina as a result of two things, one of those a near-miss: his almost perfect game against the Indians in 1997. The other was that in my childhood/adolescence I played Front Page Sports: Baseball incessantly. I recall that the pitcher on the box of the first edition (’94) bore a resemblance to Mussina. If you actually dredge this up, you may well find that I am way off and it embarrassingly looks nothing like him, or that it actually is him. Regardless, I pretended that it was him. He was also money in the ’96 version; in my seasons, he seemed to be the most consistently effective pitcher, better than Johnson or Cone or Clemens or even Maddux. The Yankees are now short a starter, and my old FPS Cleveland Indians are missing one too. So long, Moose.


  1. I had an unhealthy addiction to Front Page Sports Baseball 94. I used to skip school so I could play this game. I ended up missing out on the 5th grade end-of-the year party because my grades dropped from playing too much Front Page Sports Baseball.

    The game had an MLBPA license, but it didn't have an MLB license. Because of this, the cover featured Mussina with a black cap and orange uniform, but with no logos on them.

    Here's a short video clip of FP Baseball 1994:


  2. I'm glad my memory wasn't off. I know the later versions had Randy Johnson on the box.

    The only thing that drove me nuts about that game is that batters didn't have any particular plate discipline tendency. Otherwise, it was awesome, and for the time was easily the most realistic game.

    The best was the fake team names that tried to mimic the actual names (due to the licensing issue you mentioned). The "Cleveland Natives" was hilariously bad.

  3. Patriot,

    I can't believe this game was considered state-of-the-art 13 or 14 years ago. Did you watch the video? I remember that clunky VCR replay camera. They mentioned it in the video.

    Plate discipline is handled poorly in nearly every baseball video-game. I haven't played video-games in 5+ years, so I don't know if it has been improved upon in the newer baseball video games. High Heat Baseball for the PS2 handled plate discipline fairly well. I could draw about 2-3 walks per game.

  4. I cannot believe that you would knock the revolutionary Camera Angle Management System (CAMS) (TM). I thought that was so cool at the time.

    FPS is the last baseball video game I played (I also played Hardball, Tony LaRussa, and Earl Weaver, but those are all either contemporaries or predate FPS). I play OOTP now (actually, I haven't in about a year, but if I was going to play a game, that would be the one), but it has no graphics. Which is okay because I was a nerd who only managed on FPS anyway. Sometimes I would pitch and run the bases, but that was it.

