Monday, October 30, 2006

Evaluating Pitcher Winning %, Pt. 3

All of the widely used and published Wins Above Team formulas have compared a pitcher to what a .500 pitcher would be expected to do for that team. But with most sabermetricians preferring the use of a lower baseline for most player comparison questions (usually the murkily defined “replacement level”), why has no one adapted replacement level for use in a WAT system?

The most likely answer is that most sabermetricians don’t really bother with WAT-type methods, and so there is no need to bother with a replacement level. As this series shows, I am bothering, so I will do it.

Before we apply replacement level to Wins Above Team, we must determine what replacement level is. I have decided, for reasoning that I will not repeat here, to use a pitcher who allows runs at 125% of the league average as the definition of a replacement-level pitcher. Using a Pythagorean exponent of 2, this corresponds to a .390 W%. The .390 W% is what we assume this pitcher will have in an average context; i.e. on a .500 team.
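As a quick check on the .390 figure, here is a minimal Python sketch of the Pythagorean calculation (the function name pythag_wpct is my own, not from any library). It assumes a league-average offense (run ratio 1.0) and runs allowed at 1.25 times league average.

```python
def pythag_wpct(runs_scored: float, runs_allowed: float, z: float = 2.0) -> float:
    """Pythagorean W% from runs scored and allowed, in any consistent units."""
    return runs_scored ** z / (runs_scored ** z + runs_allowed ** z)


# Replacement pitcher in an average context (a .500 team): average offense,
# runs allowed at 125% of league average, Pythagorean exponent of 2.
replacement_nwpct = pythag_wpct(1.0, 1.25, z=2.0)
print(f"{replacement_nwpct:.3f}")  # 0.390
```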

Now we can apply the Oliver approach. Recall from part one that NW% = W% - Mate + .500. We know that the replacement pitcher has a .390 NW%, so if we know Mate, we can solve for RW% (replacement W%) as follows:
RW% = NW% + Mate - .500
Suppose we have a replacement level pitcher on a .550 team. We are saying that his W% with this team, based on the Oliver assumption, will be .390 + .550 - .500 = .440. So to calculate what I will call Wins Compared to Replacement (WCR), we have:
WCR = (W% - RW%)*(W + L)
For our default .390 NW%, RW% = Mate - .110.
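The Oliver replacement baseline and WCR translate directly into code. This is a minimal sketch; the function names are hypothetical, and the 15-10 record is made up purely for illustration.

```python
def oliver_rw(mate: float, nw: float = 0.390) -> float:
    """Replacement W% under the Oliver assumption: RW% = NW% + Mate - .500."""
    return nw + mate - 0.500


def wcr(wins: int, losses: int, rw: float) -> float:
    """Wins Compared to Replacement: (W% - RW%) * (W + L)."""
    decisions = wins + losses
    return (wins / decisions - rw) * decisions


rw = oliver_rw(0.550)             # replacement on a .550 team
print(f"{rw:.3f}")                # 0.440
print(f"{wcr(15, 10, rw):+.1f}")  # +4.0 for a hypothetical 15-10 pitcher
```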

We can define RW% for Deane’s construct as well. Assuming that the replacement has a lower W% than that of his team (which should almost always be the case for a replacement-level pitcher), we need to solve for RW% in the formula NW% = .5 - (Mate - RW%)/(2*Mate), which gives:
RW% = 2*NW%*Mate
With our .390 NW%, we get .78*Mate.
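A minimal sketch of the Deane version, with a round-trip check that plugging RW% back into Deane’s NW% formula recovers .390. It assumes, as the text does, that the replacement’s W% is below his team’s; the function names are hypothetical.

```python
def deane_nw(w_pct: float, mate: float) -> float:
    """Deane's NW% for a pitcher below his team: .5 - (Mate - W%)/(2*Mate)."""
    return 0.5 - (mate - w_pct) / (2 * mate)


def deane_rw(mate: float, nw: float = 0.390) -> float:
    """Replacement W% from solving Deane's formula: RW% = 2*NW%*Mate."""
    return 2 * nw * mate


rw = deane_rw(0.600)
print(f"{rw:.3f}")                   # 0.468
print(f"{deane_nw(rw, 0.600):.3f}")  # 0.390, the NW% we started from
```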

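With the Pythagorean-based model I discussed in part 2, a Mate team scores x times the league-average runs. Because the model treats the team as equally strong on offense and defense (scoring x and allowing 1/x times league average), Mate/(1-Mate) = x^(2z), so:
x = (Mate/(1-Mate))^(1/(2z))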
where z is the Pythagorean exponent. Knowing x, we just need to figure out the W% for a pitcher allowing runs at the replacement level (125% of league average, or generally, r). We don’t have to screw around with the replacement pitcher’s W%, because our starting definition of replacement is based on runs allowed. So:
RW% = x^z/(x^z + r^z)
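The Pythagorean version can be sketched as follows. One assumption to flag: x is computed as (Mate/(1 - Mate))^(1/(2z)), treating the Mate team as equally strong on offense and defense, because that is the form that reproduces the x = 1.107 used for a .600 team in the example that follows. The function name is hypothetical.

```python
def pythag_rw(mate: float, r: float = 1.25, z: float = 2.0) -> float:
    """Replacement W% for a pitcher allowing r times league-average runs.

    Assumes a balanced Mate team that scores x and allows 1/x times the
    league average, so that Mate/(1 - Mate) = x**(2*z).
    """
    x = (mate / (1 - mate)) ** (1 / (2 * z))  # team offense relative to league
    return x ** z / (x ** z + r ** z)


print(f"{(0.600 / 0.400) ** 0.25:.3f}")  # 1.107, x for a .600 team with z = 2
print(f"{pythag_rw(0.600):.3f}")         # 0.439
```

The unrounded result is about .439; the .440 quoted for this case comes from rounding x to 1.107 before squaring. On a .500 team, x = 1 and the function returns the .390 baseline.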

Using the quick approximation to Pythagorean (which I guess I should call the Wood approach, since he published it in 1999), the definition is NW% = RW% - (Mate + .5)/2 + .5. Solving this gives:
RW% = NW% + (Mate + .5)/2 - .5
For a .390 replacement, this comes out to RW% = .14 + Mate/2
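The Wood approach is simple enough to check by hand, but here is a short sketch (hypothetical names) confirming that solving for RW% gives .14 + Mate/2 when NW% = .390, and that the result maps back to an NW% of .390.

```python
def wood_nw(w_pct: float, mate: float) -> float:
    """Wood NW%: W% - (Mate + .5)/2 + .5."""
    return w_pct - (mate + 0.5) / 2 + 0.5


def wood_rw(mate: float, nw: float = 0.390) -> float:
    """Replacement W% from solving the Wood formula: NW% + (Mate + .5)/2 - .5."""
    return nw + (mate + 0.5) / 2 - 0.5


# Expected RW%: .390, .417, .440, .490 for the four Mate values below.
for mate in (0.500, 0.554, 0.600, 0.700):
    rw = wood_rw(mate)
    shortcut = 0.14 + mate / 2
    print(f"Mate {mate:.3f}: RW% {rw:.3f} (shortcut {shortcut:.3f}), "
          f"NW% {wood_nw(rw, mate):.3f}")
```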

What would our estimates of replacement winning percentages be on a .600 team, in a z = 2 environment? For Oliver, it would be .390 + .600 - .500 = .490. For Deane, 2*.390*.600 = .468. For my approach, x = 1.107, so 1.107^2/(1.107^2 + 1.25^2) = .440. For Wood, .14 + .6/2 = .440.

As you can see, the assumptions affect our estimate of the replacement pitcher. Wood and I think that if you put a replacement level pitcher on a truly great team, a .700 team, he will still only have a .490 W%! Oliver's assumptions would lead you to believe he would be a .590 pitcher and Deane's would lead you to believe he is a .546 pitcher.
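To see the four models side by side, the sketch below restates each replacement formula (same assumptions as above, including the balanced-team form of x for the Pythagorean model) and evaluates them for .600 and .700 teams with z = 2. The names are hypothetical.

```python
def oliver_rw(mate, nw=0.390):
    return nw + mate - 0.5


def deane_rw(mate, nw=0.390):
    return 2 * nw * mate


def pythag_rw(mate, r=1.25, z=2.0):
    x = (mate / (1 - mate)) ** (1 / (2 * z))  # balanced-team assumption
    return x ** z / (x ** z + r ** z)


def wood_rw(mate, nw=0.390):
    return nw + (mate + 0.5) / 2 - 0.5


MODELS = {"Oliver": oliver_rw, "Deane": deane_rw,
          "Pythagorean": pythag_rw, "Wood": wood_rw}

for mate in (0.600, 0.700):
    row = ", ".join(f"{name} {fn(mate):.3f}" for name, fn in MODELS.items())
    print(f"Mate {mate:.3f}: {row}")
# Mate 0.600: Oliver 0.490, Deane 0.468, Pythagorean 0.439, Wood 0.440
# Mate 0.700: Oliver 0.590, Deane 0.546, Pythagorean 0.494, Wood 0.490
```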

So now you can couple your NW% with replacement value, if you so desire. For our old friend Red Ruffing, pitching on .554 Mate teams for his career, his replacement would be expected to go .417 (Wood assumptions), so his 273-225 record is +65 wins above replacement. According to B-R, the league ERA in Ruffing’s time was 4.15. A decent assumption for modern times is that 90% of runs are earned, and so we’d convert that to an RA of 4.61. Ruffing himself had a 4.39 RA in 4344 innings. A replacement would be expected to allow 4.61*1.25 = 5.76, and so Ruffing would be (5.76-4.39)*4344/9 = +661 runs above replacement. Assuming that RPW = RPG, the RPW would be 2*4.61 = 9.22, making him 661/9.22 = +72 wins above replacement. So his career W-L record is about 7 wins worse than his runs allowed would indicate. You’d have to do this sort of comparison for other pitchers to get a sense of how this should impact his HOF case (if it should at all, which of course is a whole different can of worms).
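The Ruffing arithmetic, both the record-based and the runs-based estimates, can be reproduced with a short script. The inputs are the figures quoted above; the 90% earned-run share and RPW = RPG are the assumptions stated in the text. Carrying 4.61*1.25 = 5.7625 unrounded instead of 5.76 gives about 662 runs rather than 661, which does not change the roughly 72 wins.

```python
# Inputs quoted in the text for Red Ruffing's career.
wins, losses = 273, 225
mate = 0.554          # Mate W% of his teams
lg_era = 4.15         # league ERA in his time, per B-R
earned_share = 0.90   # assumed share of runs that are earned
ra, innings = 4.39, 4344
r = 1.25              # replacement allows 125% of league-average runs

# Record-based: wins above a Wood-model replacement.
rw = 0.390 + (mate + 0.5) / 2 - 0.5
decisions = wins + losses
record_wins = (wins / decisions - rw) * decisions

# Runs-based: runs saved versus a replacement, converted to wins.
lg_ra = round(lg_era / earned_share, 2)  # 4.61
repl_ra = lg_ra * r                      # 5.7625, left unrounded
runs_above = (repl_ra - ra) * innings / 9
rpw = 2 * lg_ra                          # RPW = RPG = 9.22
runs_wins = runs_above / rpw

print(f"RW% {rw:.3f}, record-based {record_wins:+.0f} wins")
print(f"{runs_above:.0f} runs above replacement, {runs_wins:+.0f} wins")
print(f"gap: {runs_wins - record_wins:.1f} wins")
```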

After going through all of this, I am now going to ask “Why?” I suppose I could consult a therapist of some kind to help answer this, because after all it was me who wrote all of this, and now I am going to explain why I don’t think it’s a great idea to compare a pitcher’s W% to his team’s at all.

And the reasoning behind this is not going to be the standard complaints about the deficiencies of win-loss records, which everybody reading this blog already knows (anyone who doesn’t is in way over their head and stopped reading a long time ago). Even if you grant me, for the sake of argument, that win-loss records ARE valuable for evaluating pitchers, I will claim that they should not be compared to the W% of their teams.

The reason for this is that all throughout this series, I have been referring to the assumptions that each of these approaches makes. Oliver implicitly assumes that a team has an average offense. Deane implicitly assumes that most of the deviation from .500 is due to the defense, with the offense playing a small role. Wood explicitly assumes that a team is equally skilled on offense and defense.

But in the real world, when we look at real teams, we don’t have to assume anything about their offense and defense. We know exactly how many runs they scored and how many runs they allowed. We can compare a pitcher directly to the number of games he’d be expected to win based on the number of runs his team scored. Or, even better in modern times, we can use his Run Support: the number of runs his team scored either in games he pitched or while he was in the game, depending on whose definition of the statistic you use.
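A minimal sketch of that comparison, assuming the same Pythagorean form and the 125%-of-league replacement definition used earlier, with the team’s actual run support taking the place of an assumed offense. The function names and the 15-10, 5.0-runs-of-support, 4.5 R/G league example are hypothetical, and park adjustment is left to whoever supplies the inputs.

```python
def pythag_wpct(runs_scored: float, runs_allowed: float, z: float = 2.0) -> float:
    """Pythagorean W% from runs scored and allowed per game."""
    return runs_scored ** z / (runs_scored ** z + runs_allowed ** z)


def wins_vs_support(wins: int, losses: int, support: float, lg_runs: float,
                    r: float = 1.25, z: float = 2.0) -> float:
    """Wins above a replacement pitcher given the pitcher's actual run support.

    support and lg_runs are runs per game; park-adjust them first if you can.
    The replacement allows r * lg_runs per game.
    """
    decisions = wins + losses
    rw = pythag_wpct(support, r * lg_runs, z)  # replacement W% with this support
    return (wins / decisions - rw) * decisions


# Hypothetical pitcher: 15-10 with 5.0 runs of support in a 4.5 R/G league.
print(f"{wins_vs_support(15, 10, 5.0, 4.5):+.1f}")  # about +4.0
```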

The only drawback to this is that we have to consider park factors or, if we choose to ignore them, accept that our model has a flaw. But that is just one extra step, and I think it is far preferable to the silly assumption that all teams are perfectly balanced between offense and defense.

Actually, I cheated a little bit. One could, I suppose, make the argument that the W% implicitly contains information about the team’s distribution of runs scored, while using the team’s R/G or Run Support ignores the distribution. This is true, but the W% contains many other polluting elements, and therefore the fact that it does in a small way include information on the run distribution does not make it a better option on the whole.

I suppose, if you really wanted to get into it, you could look at a pitcher’s performance when his team scores three runs, compare this to what an average or replacement-level pitcher would do when supported by three runs, and repeat this for every support level. Bill James did this to compare Danny Jackson and Walt Terrell in the 1988 Abstract, concluding that despite Terrell’s superior W-L record, Jackson pitched better. Ironically, B-R lists the most similar pitcher to Danny Jackson as…Walt Terrell (although Jackson is only eighth on Terrell’s list; Richard Dotson wins there).

1 comment:

  1. Over at the Inside the Book Blog, Tango points out something that I overlooked, but concur on: that over the course of a career, we should expect the offense/defense split to approach equality. In fact, he points out that the range in actual off/def split for the population of pitchers with long careers would probably not be far removed from the distribution of park factors for these pitchers.

    Anyway, I made the unrelated point in my post there that the Oliver method should be retired. It is only natural to simply compare pitcher W% to Mate, but if you just sit down and consider the assumption that you are making by doing this, you will see that it is going to be misleading more often than not. I think this is just a case of people not thinking about what they are doing. I would be surprised if there were a sizeable number of people who would argue against the Wood assumptions (at least as an alternative to the Oliver assumptions).

    Additionally, the Wood approach is very simple and tracks the more precise Pythagorean approach very well. I see no reason to use the Oliver or Deane approaches if you are going to do this type of analysis.

