## Thursday, August 24, 2006

### Evaluating Pitcher Winning %, Pt. 1

How best to evaluate a pitcher’s W-L record? While it has plenty of contextual biases, one that it does not have is park/era, since W% always is .500 for the league as a whole. This makes pitcher win-loss record a fairly interesting thing to look at, at least on the career level.

But of course the biggest pollution is the quality of the team around him. So it only seems natural that for many years, would-be sabermetricians have compared a pitcher’s W% to that of his team. Usually, this comparison is done only after the pitcher in question’s decisions have been removed. The reasoning for this is that we do not want to compare the pitcher to a standard that he himself has contributed to. Anyway, Ted Oliver’s Weighted Rating System was the first such approach, and the one most commonly used:
Rating = (W% - Mate)*(W + L)

Where Mate, to borrow a designation from Rob Wood, is the W% of his teammates: (TmW - W)/(TmW + TmL - W - L). Oliver’s rating gives a number of wins above what an average teammate would have achieved in the same number of decisions. We could also call this Wins Above Team (WAT), as Total Baseball does.
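To make the bookkeeping concrete, here is a minimal Python sketch of Mate and the Oliver rating. The function names and the 87-67 team record are my own hypothetical choices for illustration (picked so that Mate comes out near .540); they are not from Oliver or Wood.

```python
def mate_pct(w, l, tm_w, tm_l):
    """W% of the pitcher's teammates: the team record with his decisions removed.

    Undefined (division by zero) if the pitcher has every team decision.
    """
    return (tm_w - w) / (tm_w + tm_l - w - l)


def oliver_wat(w, l, tm_w, tm_l):
    """Oliver rating: wins above what his teammates' W% gives in his decisions."""
    w_pct = w / (w + l)
    return (w_pct - mate_pct(w, l, tm_w, tm_l)) * (w + l)


# Hypothetical example: a 20-10 pitcher on a team that finished 87-67.
mate = mate_pct(20, 10, 87, 67)   # teammates went 67-57
wat = oliver_wat(20, 10, 87, 67)
print(f"Mate = {mate:.3f}, WAT = {wat:+.1f}")
```

Note that the team totals include the pitcher's own decisions, so they are subtracted out before the comparison is made.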

A related question is what is the projected W% of this pitcher on an otherwise .500 team? I’ll call this Neutral W%, to use the same abbreviation but a different name than Bill Deane does, so that my general term won’t get confused with his specific one. For the Oliver approach:
NW% = W% - Mate + .500

If this is not intuitively obvious, consider a 20-10 pitcher on a .540 Mate team. His WAT is (.667 - .540)*(20 + 10) = +3.8. If he is 3.8 wins above average in 30 decisions, this implies that he is 3.8 wins better than 15-15, or 18.8-11.2. This is an equivalent W% of 18.8/30 = .627, the same result as .667-.540+.500.
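The same arithmetic as a short Python sketch, for anyone who wants to check it. The function names are my own, and Mate is passed in directly as .540 rather than derived from a team record.

```python
def oliver_nw_pct(w, l, mate):
    """Oliver-style Neutral W%: the pitcher's W% re-centered on a .500 team."""
    return w / (w + l) - mate + 0.5


def neutral_record(w, l, mate):
    """Neutral wins and losses over the pitcher's actual number of decisions."""
    decisions = w + l
    nw = oliver_nw_pct(w, l, mate)
    return nw * decisions, (1 - nw) * decisions


nw = oliver_nw_pct(20, 10, 0.540)
n_wins, n_losses = neutral_record(20, 10, 0.540)
wat = (nw - 0.5) * (20 + 10)      # WAT recovered from NW%

print(f"NW% = {nw:.3f}")                                # NW% = 0.627
print(f"Neutral record = {n_wins:.1f}-{n_losses:.1f}")  # Neutral record = 18.8-11.2
print(f"WAT = {wat:+.1f}")                              # WAT = +3.8
```

The last calculation shows that WAT and NW% carry the same information: WAT is just the neutral W%'s distance from .500, scaled by decisions.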

What begins to become clear as you look at how the method works is that it assumes that a .500 pitcher on this team would have a .540 record. This means that all of the team’s deviation from .500 is attributed to the offense or fielders. This assumption is clearly wrong, at least for a randomly selected team--given a random team, we should assume that it is equally skilled on offense and defense. Obviously, in some cases this assumption will be dreadfully wrong--but it will be correct more often than assuming that EVERY team deviates from .500 only because of offense and one particular pitcher whom we isolate to calculate his WAT/NW%.

We can find some historical examples where the assumption of the Oliver method really causes problems. The most notorious case is that of Red Ruffing, who so far as I know is the only Hall of Fame starter with a W% worse than that of his teammates. For his career, Ruffing was 273-225(.548), while the rest of his team was .554. This is a .494 NW% and -3 WAT. As a side note, WAT is also equal to (NW%-.500)*(W+L).

Ruffing did pitch for Yankee teams with great offenses, but he also had mound teammates like Lefty Gomez, Johnny Allen, and Spud Chandler (at various times). In 1936, for example, Ruffing was 20-12(.625), while the rest of the team was .678, for -1.7 WAT and a .447 NW%. His team did score a whopping 1065 runs, but it also led the league in run prevention, allowing only 731. An average pitcher in the 1936 AL (who would have a 5.67 RA) would figure to have only a .594 record if supported by New York’s 6.87 runs/game.

We’ll check in on Ruffing more as we go. Bill Deane, formerly a Senior Research Associate at the Hall of Fame, developed his own method to divorce a pitcher’s W% from that of his team. Deane’s insight was that the further above .500 a team’s W% was, the less margin there was to improve upon it. A .500 team could be bettered by .500; a .625 team only by .375. A bad team could be improved by even more. So Deane rated equally pitchers who improved their teams by equal percentages of the potential margin.

A .550 pitcher on a .500 team improved his team by .050 out of a possible .500 (10%); so did a .460 pitcher on a .400 team (.060/.600 = 10%). Thus, they are each credited with the same .550 NW% (Deane used the term Normalized W% for this). If it is not clear why the normalized percentage should be .550 for each pitcher, it is because a .500 team has a .500 margin for improvement, and 10% of .500 is .050. Following this logic, Deane wound up with these formulas for NW%:
If W% >= Mate:
NW% = (W% - Mate)/(2*(1 - Mate)) + .500
If W% < Mate:
NW% = .500 - (Mate - W%)/(2*Mate)

The second formula comes from the fact that on a .600 team, there is a .600 margin for lowering the W%; a .550 pitcher did this by 8.33%, so .0833*.5 = .042, for a NW% of .458. Total Baseball (unlike Thorn & Palmer’s earlier Hidden Game which used Oliver’s formula) used Deane’s methodology to calculate WAT. A poster child for considering the margin for improvement is Steve Carlton in 1972, who was 27-10(.730) for a team that was otherwise 32-87(.269). Under the Oliver methodology, this is a nearly impossible .961 NW% and +17.1 WAT. Using Deane’s approach, it is an .815 NW% and +11.7 WAT (still the highest since Lefty Grove in 1931).
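Here is a minimal Python sketch of the two-branch Deane formula, run against the margin examples and the Carlton season above. The function names are mine, and the explicit handling of the W% = Mate case is my own addition (it returns .500, which is what either branch gives, and it avoids dividing by zero when Mate is exactly 0 or 1).

```python
def deane_nw_pct(w_pct, mate):
    """Deane's Normalized W%: share of the available margin, rescaled to a .500 team."""
    if w_pct == mate:
        return 0.5                                   # no improvement, no decline
    if w_pct > mate:
        return 0.5 + (w_pct - mate) / (2 * (1 - mate))   # margin above is 1 - Mate
    return 0.5 - (mate - w_pct) / (2 * mate)             # margin below is Mate


def oliver_nw_pct(w_pct, mate):
    """Oliver's straight difference, for comparison."""
    return w_pct - mate + 0.5


def wat(nw_pct, decisions):
    """Wins Above Team implied by a neutral W% over a number of decisions."""
    return (nw_pct - 0.5) * decisions


# Margin examples from the text
print(f"{deane_nw_pct(0.550, 0.500):.3f}")   # 0.550
print(f"{deane_nw_pct(0.460, 0.400):.3f}")   # 0.550
print(f"{deane_nw_pct(0.550, 0.600):.3f}")   # 0.458

# Steve Carlton, 1972: 27-10, rest of the team 32-87
w_pct, mate, decisions = 27 / 37, 32 / 119, 37
for name, f in (("Oliver", oliver_nw_pct), ("Deane", deane_nw_pct)):
    nw = f(w_pct, mate)
    print(f"{name}: NW% = {nw:.3f}, WAT = {wat(nw, decisions):+.1f}")
# Oliver: NW% = 0.961, WAT = +17.1
# Deane: NW% = 0.815, WAT = +11.7
```

Unlike the Oliver version, the Deane result can never leave the 0 to 1 range, since the pitcher's gain or loss is always measured against the room that was actually available.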

How does Ruffing fare under this approach? Career-wise, since his W% was so close to Mate to begin with, not much changes--he now sports a .495 NW% (v. .494) and -2.7 WAT (v. -3). In 1936 he moves from .447 to .461 NW% and from -1.7 to -1.3 WAT.