Walk Like a Sabermetrician: Meanderings

Tuesday, September 16, 2008

Meanderings

...thus, you are free to disregard it even more than a normal post.

*If the only difference between a no-hitter and a one-hitter (in a specific case) is a matter of scorer’s discretion, then why should I treat that particular no-hitter as something special? I believe this is a case where people elevate statistics above the game itself. Regardless of how the play was scored, the outcome was the same--a runner on first base.

This is not a diatribe against the error, although I could write one. If MLB had overruled the scorer’s decision, what would that mean? It’s even less obviously meaningful in a case in which it is the pitcher’s own supposed error that is in question. Either way, he was responsible for what happened.

I think that much of the problem here is the error itself: it promotes the silly belief that by tracking it, we can separate pitching from fielding, when the two are in fact much more deeply intertwined. For my money, the pitcher’s job is to both pitch and field when appropriate, and what matters is the overall performance, rather than a somewhat arbitrary segmentation of his role into two distinct parts. Whether it actually was a hit or an error, Sabathia allowed a runner to reach first base. In the terms that actually matter (runs, outs, wins), that’s the bottom line.

(In case anyone reads this in five years, the CC Sabathia one-hitter of August 31 is the game in question.)

*When I have occasion to discuss the matter of “Who’s the better player” with people not versed in sabermetrics (I generally try to avoid this, but it’s a topic that’s fairly pervasive when you talk baseball, and so it’s sometimes necessary), I still find that there are people who absolutely balk at the idea that outs are what a player’s overall production should be measured against.

In my arrogant and stupid early days as a sabermetrician (some things never change), I used to argue for PA as the denominator for Runs Created, not outs. However misguided I may have been, I never claimed that the number of outs a player makes is unimportant once you account for how many times he has been to the plate. You might argue that I implicitly did so by not agreeing with the outs camp, and that’s fair, but I understood the idea that the number of plate appearances the team would get is directly tied to the rate at which they make outs.

The interesting thing is that everyone agrees with the idea of an outs denominator when applied to pitchers. What are the two most used pitching rate stats by the general public? Certainly ERA is one of them, and if WHIP is not second, it’s definitely third or fourth (K/W and W% are the only other contenders). What both of those have in common is that they are denominated in outs.

Why don’t people see this and assume that hitting stats should be constructed in the same manner? I see three possibilities. One is that people just don’t think that hard about this stuff. The second is that people just don’t connect innings pitched with outs--that's exactly what they are, opponents’ outs divided by three, but the name doesn’t make you think about it that way. They don’t have anything to do with the actual division into innings...wait, you guys know this. The third is that perhaps people inherently recognize the fact that the individual pitcher (plus his defense, whose efforts are all over his statistical line) is his own team, whereas the batter is operating in a context of eight others. I doubt that is it, though, since that is a subtle point that many people interested in sabermetrics sometimes need a little prodding on (for example, you can’t apply RC directly to individual hitters). It is a valid point, but one that doesn’t really affect the rate stat issue except for very extreme cases.
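To make the "innings are just outs" point concrete, here is a minimal Python sketch that writes ERA and WHIP directly in terms of outs. The function names and the sample pitching line are made up for illustration; only the standard definitions (ERA = 9*ER/IP, WHIP = (W + H)/IP, IP = outs/3) are assumed.

```python
def innings_from_outs(outs):
    """Innings pitched are simply opponents' outs divided by three."""
    return outs / 3


def era(earned_runs, outs):
    """ERA = 9 * ER / IP, which is the same as 27 * ER / outs."""
    return 27 * earned_runs / outs


def whip(walks, hits, outs):
    """WHIP = (W + H) / IP, which is the same as 3 * (W + H) / outs."""
    return 3 * (walks + hits) / outs


# Hypothetical pitching line: 600 outs recorded, 70 ER, 50 W, 180 H.
print(innings_from_outs(600))       # innings pitched
print(round(era(70, 600), 2))       # ERA on an outs denominator
print(round(whip(50, 180, 600), 2)) # WHIP on an outs denominator
```

Nothing changes but the scale factor, which is the point: both stats are already rates per out.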

Perhaps my experience is not reflective of the thoughts of the larger group of baseball fans. I don’t know, but next time I get the opportunity, I’ll try pulling out the “you measure pitchers with a denominator of outs” card.

Along the same lines, I am always amazed at the deference with which some people view the RBI, while paying no attention to runs scored. Not that I want to perpetuate the use of either, but it is certainly incomplete to consider only one side of the coin.

*A couple years ago I wrote a bit about methods to adjust a pitcher’s win-loss record based solely on his team’s record, and I posted some career data for notable pitchers. This family of “Neutral W%” metrics was pioneered by Ted Oliver, and the version I used is based on logic first formally presented by Rob Wood. Anyway, the formula for NW% that I use is:

NW% = W% - Mate/2 + .25

Where Mate is the team’s winning percentage with the individual’s decisions removed.

This is a linear approximation of a more advanced Pythagorean-based function, and works fine for most cases. Occasionally you will get extreme cases that may test it, and Cliff Lee’s season had the potential to be one of them. Fortunately (due to my rooting preferences), the Tribe has started to win games that Lee does not start with some regularity, and the uniqueness of the situation has been lessened. Still, it is odd to see a 21-2 pitcher (as of September 10) on a 71-73 (and thus .413 Mate) team. That gives him a NW% of .956, as opposed to his actual .913 W%.
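For anyone who wants to check the arithmetic, here is a minimal Python sketch of the linear formula. The function and argument names are mine; the inputs are Lee's 21-2 record and the Tribe's 71-73 record quoted above.

```python
def mate_pct(team_w, team_l, w, l):
    """Team winning percentage with the pitcher's own decisions removed."""
    return (team_w - w) / ((team_w + team_l) - (w + l))


def neutral_wpct(team_w, team_l, w, l):
    """Linear approximation: NW% = W% - Mate/2 + .25"""
    return w / (w + l) - mate_pct(team_w, team_l, w, l) / 2 + 0.25


# Cliff Lee as of September 10, 2008: 21-2 for a 71-73 team.
print(f"Mate = {mate_pct(71, 73, 21, 2):.3f}")      # Mate = 0.413
print(f"NW%  = {neutral_wpct(71, 73, 21, 2):.3f}")  # NW%  = 0.956
```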

If you go through the more complex Pythagorean approach (with an exponent of 2), you get a NW% estimate of .926 for Lee. So for a pretty extreme case, we are off by .7 wins over 23 decisions. This, coupled with the fact that the whole exercise of using a pitcher’s W% in the first place is imprecise, is why I use the linear approximation to find NW%.
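The Pythagorean version is not spelled out in this post, so the Python sketch below is only one reading of it that happens to reproduce the .926 figure. The assumption, which is mine, is that the teammates' run ratio implied by Mate is split evenly (geometrically) between offense and defense, mirroring the Mate/2 term in the linear formula. All function and variable names are hypothetical.

```python
from math import sqrt


def pythag_neutral_wpct(team_w, team_l, w, l, exponent=2.0):
    """Pythagorean-style NW% under an even offense/defense split (assumed).

    Requires at least one loss for both the pitcher and his teammates,
    since win odds are undefined otherwise.
    """
    mate_w, mate_l = team_w - w, team_l - l
    # Teammates' win odds -> implied run ratio (runs scored / runs allowed).
    mate_run_ratio = (mate_w / mate_l) ** (1 / exponent)
    # ASSUMPTION: half of that ratio (geometrically) belongs to the offense.
    offense = sqrt(mate_run_ratio)
    # Pitcher's own win odds -> run ratio with his actual offense behind him.
    pitcher_run_ratio = (w / l) ** (1 / exponent)
    # Swap in a league-average offense (ratio of 1.0).
    neutral_ratio = pitcher_run_ratio / offense
    odds = neutral_ratio ** exponent
    return odds / (1 + odds)


lee_pythag = pythag_neutral_wpct(71, 73, 21, 2)
lee_linear = 21 / 23 - (50 / 121) / 2 + 0.25
print(f"{lee_pythag:.3f}")                      # 0.926
print(f"{(lee_linear - lee_pythag) * 23:.1f}")  # 0.7 wins over 23 decisions
```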

How good is a .956 NW%? I have figured NW% for every Hall of Fame pitcher as well as a number of other great pitchers, around 150 or so. Obviously, there are many fine pitchers outside the scope of that group, including one year wonders like Lee. Still, it’s a sample that’s likely to include many of the best single season performances.

In this group, there is only one pitcher with ten or more wins in a season and a NW% of .900 or better--Randy Johnson, 1995. The Big Unit went 18-2 for a 79-66 team, which works out to a .906 NW%.

Steve Carlton’s 27-10 season in 1972 for a 59-97 team comes in at .845. Koufax’s best season is 1964 (19-5, .821). Grove’s best is 1931 (31-4, .816). Seaver’s best is 1981 (14-2, .842). Pedro’s best is 1999 (23-4, .839). Maddux’s best is 1995 (19-2, .866). Clemens’ best is 2001 (20-3, .846). Cy Young’s best is 1901 (33-10, .770). Mathewson’s best is 1909 (25-6, .782). Walter Johnson’s best is 1913 (36-7, .844). Alexander’s best is 1915 (31-10, .740). Joe Wood’s 34-5 season in 1912 was good for a .808 NW%.

Obviously, I do not mean to suggest that Lee has had the greatest season ever. NW% is a crude tool on a number of levels. However, one must admit that from a “playing around with numbers” perspective, Lee’s win-loss record is remarkable.

8 comments:

Unknown, September 16, 2008 at 1:33 AM
It is refreshing to learn Cliff Lee is a "one year wonder" with his astonishing record of 22 - 2.

Speaking of wonder, I wonder who the Cliff Lee was who was fourth in the Cy Young voting three years ago with a record of 18 - 5.

Or was that other Cliff Lee also a "one year wonder."

Anonymous, September 16, 2008 at 2:38 AM
Patriot,

I have seen you express R/O in 2 ways and R/PA in 2 ways. I wondered if you could explain the difference between:

R/O
R+/O

and

R/PA
R+/PA

p, September 16, 2008 at 10:07 AM
Robert, you mean the Cliff Lee who was fourth in the league with a whopping 6.46 run support and who finished 19th in Runs Above Replacement, but got Cy Young votes because of his shiny W-L record? I remember him too, and he certainly doesn't make this Lee season any less out of line with his career.

Terps, R/O and R/PA are the straightforward versions you could expect--RC/Out and RC/PA, with RC of course being the run estimator of your choice (as long as it's an absolute out value (-.1 type) version).

R+/PA was posted at FanHome several years back by a poster named Sibelius. What it does is add the extra runs generated indirectly by the batter as a result of creating more PAs for his teammates to his regular RC. The formula for R+ is:

R+ = RC + (O/PA - Lg(O/PA))*Lg(R/O)
or = RC + (OBA - LgOBA)*Lg(R/O) if you are only considering outs = AB - H

Then R+/PA is just that divided by PA. If you use this to find runs above average (R+/PA - Lg(R/PA))*PA, you will get the exact same result as if you used (R/O - Lg(R/O))*O. The rank of players in R+/PA and R/O will not match exactly, but they will be very close.

R+/O+ is something that David Smyth came up with, but it really only applies with a Theoretical Team estimator. Basically, the "R+" is the runs the team expects to score as a result of adding the player (which includes the effect of extra PA created), and the "O+" is his PA times the new out rate/PA of his team. It's not really worth worrying about.

p, September 16, 2008 at 10:09 AM
Sorry, I reversed the sign in the R+/PA formula. It should be

R+ = RC - (O/PA - Lg(O/PA))*Lg(R/O)

If you make outs at a rate less than the league average, your contribution goes up. Duh, p.
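A quick Python sketch of the equivalence claimed above, using the corrected sign. One assumption of mine: the adjustment term is a per-PA quantity, so it is multiplied by PA before being subtracted from the RC total; without that scaling the two runs above average figures do not match. The player and league totals are invented for illustration.

```python
def r_plus(rc, outs, pa, lg_r, lg_o, lg_pa):
    """R+ = RC - (O/PA - Lg(O/PA)) * PA * Lg(R/O)  (PA scaling is assumed)."""
    return rc - (outs / pa - lg_o / lg_pa) * pa * (lg_r / lg_o)


def raa_per_pa(rc, outs, pa, lg_r, lg_o, lg_pa):
    """Runs above average from R+/PA against league R/PA."""
    return (r_plus(rc, outs, pa, lg_r, lg_o, lg_pa) / pa - lg_r / lg_pa) * pa


def raa_per_out(rc, outs, lg_r, lg_o):
    """Runs above average from R/O against league R/O."""
    return (rc / outs - lg_r / lg_o) * outs


# Hypothetical player: 100 RC, 400 outs, 650 PA.
# Hypothetical league: 750 R, 4100 outs, 6200 PA.
via_pa = raa_per_pa(100, 400, 650, 750, 4100, 6200)
via_outs = raa_per_out(100, 400, 750, 4100)
print(round(via_pa, 3), round(via_outs, 3))  # the two figures agree
```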

Unknown, September 16, 2008 at 5:17 PM
It just goes to show how deceptive statistics can be.

If you look at career stats among active pitchers with more than 100 decisions, there's that name "Cliff Lee" again. He has a career record of 76 - 38. His .667 career win percentage ranks him fourth behind Pedro Martinez, Johan Santana and Roy Oswalt.

How silly of me not to realize that with your advanced mathematical formulas you can dismiss him as a "one year wonder." Go figure.

p, September 16, 2008 at 5:50 PM
Statistics can be deceptive...and win-loss record is exempt from this?

I never meant to imply that Cliff Lee was a bad pitcher. But I stand by the claim that Lee, as one of the top pitchers in the league, is a one year wonder, at least to this point in his career.

Cliff Lee has exceeded the W-L record that you would expect from his runs allowed rate and his run support by about five games over the course of his career, about 1 win/32 starts. Nothing out of the ordinary. Nothing to suggest that his pedestrian runs allowed rates and peripherals give a misleading estimate of his contribution.

Anonymous, September 21, 2008 at 3:16 PM
Speaking of surprise performances for the Indians this year, how about Shin-Soo Choo? Talk about flying under the radar. In 88 games he's hitting .310/.400/.559. I had always thought that Choo would be a good player if given the opportunity to play on a regular basis.

p, September 21, 2008 at 11:27 PM
I too am a fan of Choo. As I'm sure the Mariner fans will never forget, in 2006 two separate trades sent the Indians' first base platoon of Eduardo Perez and Ben Broussard to Seattle for Shin-Soo Choo and Asdrubal Cabrera. Good one, Bavasi.