How best to evaluate a pitcher’s W-L record? While it has plenty of contextual biases, one that it does not have is park/era, since W% always is .500 for the league as a whole. This makes pitcher win-loss record a fairly interesting thing to look at, at least on the career level.
But of course the biggest pollution is the quality of the team around him. So it only seems natural that for many years, would-be sabermetricians have compared a pitcher’s W% to that of his team. Usually, this comparison is done only after the pitcher in question’s decisions have been removed. The reasoning for this is that we do not want to compare the pitcher to a standard that he himself has contributed to. Anyway, Ted Oliver’s Weighted Rating System was the first such approach, and the one most commonly used:
Rating = (W% - Mate)*(W + L)
Where Mate, to borrow a designation from Rob Wood, is the W% of his teammates (TmW - W)/(TmW + TmL - W - L). Oliver’s rating gives a number of wins above what an average teammate would have achieved in the same number of decisions. We could also call this Wins Above Team as Total Baseball does.
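As a rough illustration, the Mate and Rating formulas can be sketched in a few lines of Python. The function names are illustrative labels rather than anything from Oliver or Wood, and the 74-56 team record is invented purely so that the teammates come out at 54-46.

```python
def mate_pct(w, l, tm_w, tm_l):
    """W% of the pitcher's teammates: team record minus his own decisions."""
    return (tm_w - w) / (tm_w + tm_l - w - l)


def oliver_rating(w, l, tm_w, tm_l):
    """Wins above what an average teammate would win in the same decisions."""
    decisions = w + l
    w_pct = w / decisions
    return (w_pct - mate_pct(w, l, tm_w, tm_l)) * decisions


# Hypothetical pitcher: 20-10 on a team that went 74-56 overall,
# so his teammates were 54-46.
mate = mate_pct(20, 10, 74, 56)
rating = oliver_rating(20, 10, 74, 56)
print(f"Mate = {mate:.3f}, Rating = {rating:+.1f}")
```

Note that Mate is undefined when the pitcher has every one of his team's decisions, since the denominator is then zero; the sketch does not guard against that case.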
A related question is what is the projected W% of this pitcher on an otherwise .500 team? I’ll call this Neutral W%, to use the same abbreviation but a different name than Bill Deane does, so that my general term won’t get confused with his specific one. For the Oliver approach:
NW% = W% - Mate + .500
If this is not intuitively obvious, consider a 20-10 pitcher on a .540 Mate team. His WAT is (.667 - .540)*(20 + 10) = +3.8. If he is 3.8 wins above average in 30 decisions, this implies that he is 3.8 wins better than 15-15, or 18.8-11.2. This is an equivalent W% of 18.8/30 = .627, the same result as .667-.540+.500.
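The arithmetic of the 20-10 example can be checked with a short Python sketch. The function names are illustrative, and the only inputs are the 20-10 record and the .540 Mate from the example; it simply confirms that going through WAT and going through NW% directly land on the same number.

```python
def oliver_nw_pct(w, l, mate):
    """Neutral W% under the Oliver approach: W% - Mate + .500."""
    return w / (w + l) - mate + 0.5


def wat_from_nw_pct(nw_pct, w, l):
    """Wins above team, recovered from a neutral W%."""
    return (nw_pct - 0.5) * (w + l)


w, l, mate = 20, 10, 0.540
nw_pct = oliver_nw_pct(w, l, mate)
wat = wat_from_nw_pct(nw_pct, w, l)

# Route 2: add WAT to a .500 record in the same decisions (15-15).
neutral_wins = 0.5 * (w + l) + wat
print(f"NW% = {nw_pct:.3f}, WAT = {wat:+.1f}, "
      f"neutral record = {neutral_wins:.1f}-{(w + l) - neutral_wins:.1f}")
```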
What begins to become clear as you look at how the method works is that it assumes that a .500 pitcher on this team would have a .540 record. This means that all of the team’s deviation from .500 is attributed to the offense or fielders. This assumption is clearly wrong, at least for a randomly selected team--given a random team, we should assume that they are equally skilled on offense and defense. Obviously, in some cases this assumption will be dreadfully wrong--but it will be correct more often than assuming that EVERY team deviates from .500 only because of offense and one particular pitcher whom we isolate to calculate his WAT/NW%.
We can find some historical examples where the assumption of the Oliver method really causes problems. The most notorious case is that of Red Ruffing, who so far as I know is the only Hall of Fame starter with a W% worse than that of his teammates. For his career, Ruffing was 273-225(.548), while the rest of his team was .554. This is a .494 NW% and -3 WAT. As a side note, WAT is also equal to (NW%-.500)*(W+L).
Ruffing did pitch for Yankee teams with great offenses, but he also had mound teammates like Lefty Gomez, Johnny Allen, and Spud Chandler (at various times). In 1936, for example, Ruffing was 20-12(.625), while the rest of the team was .678, for -1.7 WAT and a .447 NW%. His team did score a whopping 1065 runs, but they also led the league in fewest runs allowed with 731. An average pitcher in the 1936 AL (who would have a 5.67 RA) would figure to have only a .594 record if supported by New York’s 6.87 runs/game.
We’ll check in on Ruffing more as we go. Bill Deane, formerly a Senior Research Associate at the Hall of Fame, developed his own method to divorce a pitcher’s W% from that of his team. Deane’s insight was that the further above .500 a team’s W% was, the less margin there was to improve upon it. A .500 team could be bettered by .500; a .625 team only by .375. A bad team could be improved by even more. So Deane rated equally pitchers who improved their teams by equal percentages of the potential margin.
A .550 pitcher on a .500 team improved his team by .050 out of a possible .500 (10%); so did a .460 pitcher on a .400 team (.060/.600 = 10%). Thus, they are each credited with the same .550 NW% (Deane used the term Normalized W% for this). If it is not clear why the normalized percentage should be .550 for each pitcher, it is because a .500 team has a .500 margin for improvement, and 10% of .500 is .050. Following this logic, Deane wound up with these formulas for NW%:
If W% >= Mate:
NW% = (W% - Mate)/(2*(1 - Mate)) + .500
If W% < Mate:
NW% = .500 - (Mate - W%)/(2*Mate)
The second formula comes from the fact that on a .600 team, there is a .600 margin for lowering the W%; a .550 pitcher did this by 8.33%, so .0833*.5 = .042, for a NW% of .458. Total Baseball (unlike Thorn & Palmer’s earlier Hidden Game which used Oliver’s formula) used Deane’s methodology to calculate WAT. A poster child for considering the margin for improvement is Steve Carlton in 1972, who was 27-10(.730) for a team that was otherwise 32-87(.269). Under the Oliver methodology, this is a nearly impossible .961 NW% and +17.1 WAT. Using Deane’s approach, it is an .815 NW% and +11.7 WAT (still the highest since Lefty Grove in 1931).
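For anyone who wants to reproduce these numbers, here is a minimal Python sketch of both NW% formulas side by side. The function names are illustrative, and the inputs are just the Carlton record and the .550-on-a-.600-team case quoted above.

```python
def oliver_nw_pct(w_pct, mate):
    """Oliver: the whole difference from Mate is added to .500."""
    return w_pct - mate + 0.5


def deane_nw_pct(w_pct, mate):
    """Deane: scale the difference by the available margin.

    Above Mate the margin is (1 - Mate); below it the margin is Mate.
    Undefined if Mate is exactly 1 (first branch) or 0 (second branch).
    """
    if w_pct >= mate:
        return (w_pct - mate) / (2 * (1 - mate)) + 0.5
    return 0.5 - (mate - w_pct) / (2 * mate)


def wat(nw_pct, decisions):
    """Wins above team implied by a neutral W%."""
    return (nw_pct - 0.5) * decisions


# Steve Carlton, 1972: 27-10 for a team that was otherwise 32-87.
w, l = 27, 10
w_pct, mate = w / (w + l), 32 / (32 + 87)

for name, fn in (("Oliver", oliver_nw_pct), ("Deane", deane_nw_pct)):
    nw = fn(w_pct, mate)
    print(f"{name}: NW% = {nw:.3f}, WAT = {wat(nw, w + l):+.1f}")

# A .550 pitcher on a .600 team, using the below-Mate branch.
print(f"{deane_nw_pct(0.550, 0.600):.3f}")
```

The two methods agree whenever Mate is exactly .500, since both margins are then .500 and the factor of 2 cancels.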
How does Ruffing fare under this approach? Career-wise, since his W% was so close to Mate to begin with, not much changes--he now sports a .495 NW%(v. .494) and -2.7 WAT(v. -3). In 1936 he moves from .447 to .461 and from -1.7 to -1.3 WAT.
Thursday, August 24, 2006
Evaluating Pitcher Winning %, Pt. 1
Thursday, August 10, 2006
Third and Third
As I write this, Oregon State, pretenders to the abbreviation of OSU, won the College World Series yesterday (yes, I really did write this in June). I figured it would be a good time to look back at the season of The OSU.
The Buckeyes finished third in the B10 regular season, crippled by a sweep in the heart of darkness. Northwestern shockingly was able to grab second after a horrific non-conference performance. Minnesota had a second consecutive year where they were not a major player in the race for the regular season title, but still qualified for the six-team tournament, which was filled out by Purdue and Illinois.
In the tournament, OSU beat Purdue and Northwestern but were tripped up in the winner’s bracket final by Minnesota and then lost the loser’s bracket final to those who shall not be named. They who shall not be named beat Minnesota two straight as the Gophers for the second straight year placed second in the tournament (to OSU in 2005). So those who shall not be named got the only B10 bid to the NCAA tournament.
While the Buckeyes fell short of a championship, they still had a solid season. Considering all games, OSU was second in W% at 37-21, .638 (those guys led at .672). But the Buckeyes paced the conference in EW%(.721; Minnesota was second at .627) and PW%(.728 with Minnesota second at .628). The Buckeyes also led in R/G(6.66; MSU second at 6.18) and RA/G(4.17; the bad guys second at 4.42). Northwestern’s W%, EW%, and PW% were .411(ninth of ten), .467(fifth), and .417(ninth), a simply bizarre combination for a second-place team. They were lucky that they did not have to face OSU, but even had they played the Bucks and been swept they would have finished in the first division.
With that, I will take a look at the individual performances of OSU players. Incidentally, all of the spreadsheets I used will be posted soon on my website if you are interested. Offensively, the Buckeyes were led by B10 MVP Ronnie Bourquin, the third baseman who was a second round pick of the Tigers. He narrowly missed the B10 triple crown, and hit 416/490/612 with 67 RC, a 12.2 RG (versus a conference average of 5.63), and +36 RAA. As you can see, his ISO was .196, but various scouting reports I saw before the draft said that he had power potential he had not shown in games. I have no trouble believing this, and can certainly understand why nobody on the collegiate level tried to mess with the form of a .400 hitter.
The Ohio offense was solid from top to bottom--sophomore centerfielder and leadoff man Matt Angle improved greatly, with a .449 OBA, 25-29 stealing, and +21. Sophomore catcher Eric Fryer was great again, with more power but fewer walks than Angle, resulting in nearly identical values(Angle created 54 runs and 9.2 per game; Fryer 54 and 9.2 per game). Senior captain and eighth round Oriole selection Jeddidiah Stephen finished his career at +16, 8.2, and his junior double play partner Jason Zoeller was second to him on the team in isolated power, +11 runs and 7.8 per game.
Junior Jacob Howell struggled through hamstring injuries, but hit a sizzling 402/448/500, 9.9, +15 RAA when able to play. The two weak spots in the lineup were Justin Miller, a freshman first baseman who started slowly but improved as the year went on, finishing at 4.3, -5, and junior rightfielder Wes Schirtzinger, who struggled greatly at the plate, with 257/321/296, 3.8, -11. The other hitters with over 100 PA were freshman OF/1B/P JB Shuck (6.1, +2) and DH Adam Schneider (4.7, -4).
The Buckeye pitching was solid again, tops in the B10 without a real standout. The ace was junior lefty Dan DeLucia with a 3.67 RA, +27 RAA, and 5.8 K per game, which may be why he went undrafted. Cory Luebke, a 22nd round pick of the Rangers as a draft eligible sophomore, was 4.34 and +15. Freshman Jake Hale was the (relative) weak link at 4.92, +7. B10 Freshman of the Year JB Shuck probably looked better with traditional stats, as his 4.56 RA was a full two runs higher than his 2.51 ERA. Shuck, depending on your perspective, was victimized by his defense or had some mistakes obscured by the silly points of the earned run rule. His 4.52 eRA and .298 H/BIP lead me to the latter. But for a freshman, 79 innings and 12 runs above average is nothing to sneeze at.
There were really only four pitchers who got significant innings out of the pen. Rory Meister served as closer and had a 4.36 RA despite a 5.76 eRA. His control was very poor, walking 28 in 33 frames, but his H/BIP was a very high .382. Josh Barerra, a true freshman, had similar issues, walking 20 in 38 innings with a 5.68 RA and 7.16 eRA but a .429 hit rate. Both pitchers struck out a lot of batters and have shown evidence that they can be effective, but certainly need some polish. Trey Fausnaugh was pounded again with a 6.11 RA and 8.13 eRA. Like the other key relievers, he was victimized by a high hit per BIP rate at .409. Dan Barker was good again in 4 starts and 14 relief appearances, with a 4.15 RA and 3.57 eRA.
This was a fairly young team, but with only two (potentially three if Luebke were to sign with Texas) major losses, and a solid performance, it looks as if Ohio State will once again be a major player in the 2007 Big Ten race.