Tuesday, December 26, 2006

Leadoff Hitters, 2006

Last year I did a piece ranking the leadoff performances of each team in a number of categories. I will do the same this year, although without a lot of the comments about each method and how it is calculated. I’ll refer you to last year’s post for that.

In brief, though, I don’t believe that this is a particularly useful activity--for the large part, hitters are hitters, regardless of what slot in the order they bat. Leadoff is probably the most important role, but in general, the best leadoff hitter and the best hitter period would be the same guy. That does not of course mean that leadoff is necessarily the best possible slot for the best hitter, but in general, too much is made of lineup construction among traditional folks anyway. Nevertheless, it is an interesting exercise if not particularly enlightening.

The data comes from the Baseball Direct Scoreboard and is for the team’s #1 slot hitters as a whole. I have listed in parentheses the guys who had the most games played while batting in the #1 spot, which sometimes is less than half of the team’s games. The “ML average” listed in the table is for ML leadoff hitters, not the entire league as a whole. I’ll sometimes discuss the league total in my comments.

The first category I’ll look at is good old runs scored, per 25.5 outs:
1. CLE(Sizemore), 7.3
2. NYA(Damon), 6.7
3. NYN(Reyes), 6.6
ML Average, 5.5
28. ARI(Counsell), 4.8
29. CIN(Freel), 4.6
30. CHN(Pierre), 4.2
Johnny Damon’s Red Sox were number one a year ago, and his Yankees are #2 this go around. Of course, runs scored are heavily influenced by the succeeding batters, and it’s little surprise three of the game’s best offenses are represented in the top 3 spots here. Juan Pierre was seen as a leadoff solution for the Cubs, but as I pointed out last year, this was dubious as he was coming off a year in Florida in which the Marlins were in many of the trailer categories.

On Base Average is an obvious criterion to look at:
1. CLE(Sizemore), .369
2. LA(Furcal), .366
3. SEA(Suzuki), .365
ML Average, .339
28. ARI(Counsell), .301
29. MIL(Weeks), .300
30. PIT(Duffy), .298
The average for all players was .333, so the leadoff advantage is only six points; last year it was ten. The Yankees are fourth on the list, so Damon did his job, although none of the OBAs are eye-popping for an individual player.

Runners On Base Average removes HR and CS from OBA, leaving it not as a pure measure of skill but as an accounting for the percentage of PA in which the leadoff man sets the table by remaining on base:
1. SEA(Suzuki), .351
2. LA(Furcal), .330
3. OAK(Kendall), .327
ML Average, .305
28. WAS(Soriano), .271
29. ARI(Counsell), .267
30. MIL(Weeks), .265
The average for all hitters was .293, making a larger leadoff/overall gap in ROBA than in OBA. Last year, when the opposite was true, I presumed it was because of the high number of caught stealings racked up by leadoff hitters. Washington is near the bottom here because of Soriano’s 40 HR season (they ranked ninth in OBA). The usual suspects, Cleveland and New York, come in at seventh (.321) and eleventh (.318) respectively.
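For anyone who wants to see the difference between the two measures mechanically, here is a minimal sketch. It assumes the simplest form of OBA, (H + W)/(AB + W), and assumes ROBA just subtracts HR and CS from the numerator; the exact treatment of hit batters and the like is not spelled out above, and the stat line is hypothetical.

```python
def oba(h, bb, ab):
    """Simple On Base Average: (H + W) / (AB + W). Assumed form, no HBP or SF."""
    return (h + bb) / (ab + bb)


def roba(h, bb, hr, cs, ab):
    """Runners On Base Average: OBA with HR and CS removed from the numerator.

    A home run clears the batter off the bases and a caught stealing erases
    him, so neither leaves a runner on for the hitters behind him.
    """
    return (h + bb - hr - cs) / (ab + bb)


# Hypothetical leadoff line, not taken from the tables above.
line = dict(h=180, bb=60, hr=20, cs=10, ab=640)
print(round(oba(line["h"], line["bb"], line["ab"]), 3))
print(round(roba(**line), 3))
```

A power hitter like Soriano loses more between the two figures than a singles hitter does, which is the Washington effect described above.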

Run Element Ratio from Bill James is not a skill or production measure at all. It is a ratio between offensive elements ideally placed at the beginning of an inning to set it up (walks and steals) versus those ideally placed at the end to clean it up (extra bases):
1. LAA(Figgins), 2.1
2. OAK(Kendall), 1.8
3. MIN(Castillo), 1.7
ML Average, 1.0
28. CLE(Sizemore), .6
29. TOR(Johnson), .6
30. TEX(Matthews), .6
The overall average is .7, and only five teams were below that with their leadoff hitters (the three above as well as Kansas City and Tampa Bay). Sizemore’s power again put Cleveland in the bottom three, while Texas is last for the second year in a row despite changing their primary leadoff hitter from David Dellucci to Gary Matthews. Since those two are now in Cleveland and Los Angeles, we’ll see if they can do it again with a new man in 2007.
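As a rough sketch of the calculation: the form I am assuming for James' ratio is (W + SB)/(TB - H), which matches the description above (walks and steals over extra bases) but is not quoted from him here. The stat line is hypothetical.

```python
def run_element_ratio(bb, sb, tb, h):
    """Run Element Ratio, assumed form: (W + SB) / (TB - H).

    The numerator holds the inning-starting elements (walks, steals); the
    denominator is extra bases on hits, the inning-finishing element.
    """
    extra_bases = tb - h
    if extra_bases == 0:
        raise ValueError("no extra bases; ratio is undefined")
    return (bb + sb) / extra_bases


# Hypothetical player: 60 walks, 30 steals, 270 total bases on 180 hits.
print(run_element_ratio(bb=60, sb=30, tb=270, h=180))
```

A ratio above 1 marks a table-setter type; a slugger batting first, like Sizemore, drags the figure well below 1.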

Another Bill James tool was his own method for evaluating leadoff hitters, which I call Leadoff Efficiency. This is the number of expected runs scored per 25.5 outs, which is a (relatively) pure measure of the leadoff man, unlike the actual runs scored figures we looked at first:
1. CLE(Sizemore), 7.1
2. NYN(Reyes), 6.6
3. WAS(Soriano), 6.6
ML Average, 5.6
28. STL(Eckstein), 4.8
29. ARI(Counsell), 4.7
30. MIL(Weeks), 4.6
Damon is again just off the list, fourth at 6.5. Last year the leadoff efficiency formula overestimated actual runs scored for leadoff hitters by a fairly big margin, but this year, the actual was 5.49 and the expected 5.55, not bad at all. Scott Podsednik is fourth to last and Chone Figgins sixth to last.

One can always just look at a leadoff hitter just like we would any other. So here is the list by good old Runs Created per Game:
1. CLE(Sizemore), 6.9
2. NYN(Reyes), 6.4
3. WAS(Soriano), 6.3
ML Average, 4.9
28. LAA(Figgins), 3.9
29. ARI(Counsell), 3.9
30. MIL(Weeks), 3.7
The average for all hitters was 5.0, so once again the average leadoff hitter was worse than the average hitter. Damon is fourth at 6.2.

Last year I included what I called Pure Leadoff RAA. Basically, it is the linear weight RAA total a player would get if he always batted with nobody on base and nobody out (the ideal leadoff situation). I based it off of Pete Palmer’s Run Expectancy table from The Hidden Game for simplicity’s sake, which means it is not fully adapted to the run environment of today’s game, but the values should not be too far off. One assumption that the formula makes that I did not mention last year is it assumes that all stolen base attempts occur during the next batter’s PA (or in other words, in a runner at first, no out situation). Here are the figures in this category:
1. CLE(Sizemore), +37
2. NYN(Reyes), +32
3. WAS(Soriano), +29
ML Average, +1
28. ARI(Counsell), -9
29. STL(Eckstein), -10
30. MIL(Weeks), -12
NYA is again fourth at +27. The top three are the same as the RG list for overall hitting, with Soriano’s homers only worth 1 run instead of 1.46 there.

Last year, David Smyth suggested that I look at a modified OPS, 2*OBA + SLG, which I will call 2OPS. Since the optimal OPS construction is something like 1.7 or 1.8*OBA + SLG, using 2 is a way to give a bit more credit to the on base side of things while still having a decent overall measure of production. Since the OPS units are meaningless anyway, I scaled these back so that the league 2OPS ~ league OPS. So these figures are for (2*OBA + SLG)*.7:
1. CLE(Sizemore), 893
2. WAS(Soriano), 864
3. TEX(Matthews), 847
ML Average, 767
28. ARI(Counsell), 686
29. PIT(Duffy), 680
30. MIL(Weeks), 677
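The 2OPS calculation is simple enough to sketch in a couple of lines. The .7 scale factor is the one given above; the function name and the sample OBA/SLG pair are made up for illustration.

```python
def two_ops(oba, slg, scale=0.7):
    """Scaled 2OPS: (2*OBA + SLG) * scale, with scale = .7 as in the text."""
    return (2 * oba + slg) * scale


# Hypothetical hitter with a .350 OBA and .450 SLG.
value = two_ops(0.350, 0.450)
print(round(value * 1000))  # shown in "points", like the list above
```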

Although he may not fit the ideal prototype of a leadoff hitter, Grady Sizemore still comes out on top in most categories as the top leadoff hitter in the game in 2006, with Jose Reyes, Alfonso Soriano, and Johnny Damon close behind in many categories.

Friday, December 22, 2006

Historical Hitting by Position

I’m excessively fond of chiding the National League as the “Neanderthal League” at every available opportunity for their refusal to use the designated hitter. I don’t wish to discuss the DH here per se, but use one argument against it that you’ll occasionally see as a springboard for discussion.

Sometimes the anti-DH argument will include the rhetorical question, “Why stop at pitcher? Why not have a defensive shortstop and a DH for him, or a defensive catcher and a DH for him?” While it is true of course that you could put together a better offense by completely ignoring defensive ability, there is absolutely no comparison between the performance of shortstops relative to the population of hitters at large and that of pitchers. In order to believe that the circumstances would become such that there would be popular support for a similar shortstop or catcher DH, one must assume, I would think, that shortstops, like pitchers, have progressively seen their offensive levels decline relative to hitters as a whole.

And this is a useful point for a brief discussion of offensive positional adjustments throughout the decades. While this question is just one part of a tangled web of questions dealing with how to value players at different positions, I’m not going to discuss that issue but just the historical facts.

On my website, there is a chart showing offensive PADJs broken down into the ten decades from 1900-1998. There are a couple of issues with this chart; first, it considers a player only at the position he is listed at first in Total Baseball. In other words, the position in which he played the most games in a given season is his position. If a player appeared in 25 games as an outfielder and 24 as a first baseman, he is 100% an outfielder. Secondly, it does not account for the three outfield positions, but lumps them all together. And thirdly, it uses the flawed model of basic Runs Created to evaluate each player’s offense.

With the exception of problem two, in which we lose valuable data on the breakdown between outfield positions, I don’t believe that the other two flaws are particularly consequential when dealing with a large group of aggregated players.

Looking at the chart, one of the most interesting things is that third basemen were worse hitters than second basemen in the 1900-1929 period. It is not until the thirties that third basemen hit better than second basemen. This phenomenon has been noted by other analysts, notably Bill James in Win Shares; I’m just pointing out that this data agrees with the earlier conclusions (which of course it should since the other studies were constructed similarly).

Getting to the issue of offensive balance between the positions and whether or not it has declined historically, if you look at the non-pitching fielding positions (here catcher, first, second, third, short, and outfield), you will see that the standard deviation of position adjustment was higher in the early days than it is today:
1900: .154, 1910: .143, 1920: .156, 1930: .156, 1940: .132, 1950: .122, 1960: .151, 1970: .159, 1980: .134, 1990: .135, 1900-1998: .134

“1900” means the ten-year period starting in 1900 (1900-1909), and so on. In fact, the highest standard deviation came in the 1970s when the DH was adopted. In the 70s, shortstops hit at just 77% of the league average (only aught catchers hit worse, 76%). But this is still a far cry from pitchers’ best showing, 45% in the aughts and the twenties. Pitchers showed a pattern of steady decline to as low as 30% in the 60s and 70s when the DH came of age and 26% and 27% in the eighties and nineties.

The best offensive showing by any position is the first sackers of the 1930s, 129%. In an eight team universe, Lou Gehrig, Jimmie Foxx, and later Hank Greenberg and Johnny Mize are bound to wreak some havoc on the overall figures.

Anyway, you can peruse the chart yourself if you wish for other interesting things. The main point I wanted to make is that at no time in twentieth century major league history has the balance of offensive production between the positions been greater than in the 1980s and 1990s. While the chart does not include 1999-2006, I do not believe that the trend would be significantly different.

It is possible I suppose that the DH itself has had some impact on this. DHs would probably be stuck at 1B or a corner outfield perch in earlier times, and allowing them their own category would allow defensive specialists to sneak in the field at those positions while maintaining overall offensive output. This could cause the balance between the fielding positions to be greater than it would be in absence of the DH. First base offense has been essentially unchanged since the 70s, but outfielders have dropped a little bit. But even if this effect is significant, I don’t believe that it is significant enough to mask a markedly worsening balance or collapse of short, catcher, or other low-offense positions. In fact, shortstops have bounced back relative to the league hitting as a whole from the aforementioned 77% in the 70s; the composite league comparison compares to all hitters, including DHs.

I don’t think that the historical data shows a significant trend, taking a full century view, towards more or less of a balance. But it clearly does not show the widening imbalance that would justify concerns about multiple DHs becoming a possibility. And if you think that I’ve wasted your time with some rudimentary stuff and this whole thing was an excuse to get a post up finally and bash the NL some more, you might be on to something.

Monday, October 30, 2006

Evaluating Pitcher Winning %, Pt. 3

All of the widely used and published Wins Above Team formulas have compared a pitcher to what a .500 pitcher would be expected to do for that team. But with most sabermetricians preferring the use of a lower baseline for most player comparison questions (usually the murkily-defined “replacement level”), why has no one adapted replacement level to use in a WAT system?

The most likely answer is that most sabermetricians don’t really bother with WAT-type methods, and so there is no need to bother with a replacement level. As my writing this series has shown, I am bothering, so I will do it.

Before we apply replacement level to Wins Above Team, we must determine what replacement level is. I have decided, for reasoning that I will not repeat here, to use a pitcher who allows runs at 125% of the league average as the definition of a replacement-level pitcher. Using a Pythagorean exponent of 2, this corresponds to a .390 W%. The .390 W% is what we assume this pitcher will have in an average context; i.e. on a .500 team.
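The .390 figure can be checked directly. This is a minimal sketch of the standard Pythagorean estimate for a pitcher who allows r times the league average with league-average support; the function name is mine.

```python
def pythag_wpct_from_ra_ratio(r, z=2.0):
    """W% for a pitcher allowing r * league-average runs with average support.

    With runs scored fixed at 1.0 (league average), Pythagoras gives
    1**z / (1**z + r**z).
    """
    return 1.0 / (1.0 + r ** z)


# Replacement level as defined above: 125% of league-average runs allowed.
print(round(pythag_wpct_from_ra_ratio(1.25), 3))
```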

Now we can apply the Oliver approach. Recall from part one that NW% = W% - Mate + .500. We know that the replacement pitcher has a .390 NW%, so if we know Mate, we can solve for RW% (Replacement W%) as follows:
RW% = NW% + Mate - .500
Suppose we have a replacement level pitcher on a .550 team. We are saying that his W% with this team, based on the Oliver assumption, will be .390 + .550 - .500 = .440. So to calculate what I will call Wins Compared to Replacement (WCR), we have:
WCR = (W% - RW%)*(W + L)
For our default .390, RW% = Mate - .11
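Here is a small sketch of the Oliver-based calculation, using the formulas just given. The 15-10 pitcher is hypothetical, and the function names are mine.

```python
def oliver_rw(mate, repl_nw=0.390):
    """Replacement W% under the Oliver assumption: RW% = NW% + Mate - .500."""
    return repl_nw + mate - 0.500


def wcr(wins, losses, rw):
    """Wins Compared to Replacement: (W% - RW%) * (W + L)."""
    decisions = wins + losses
    return (wins / decisions - rw) * decisions


# Replacement pitcher on a .550 team, as in the example above.
rw = oliver_rw(0.550)
print(round(rw, 3))

# Hypothetical 15-10 pitcher whose teammates played .550 ball.
print(round(wcr(15, 10, rw), 1))
```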

We can define RW% for Deane’s construct as well. Assuming that the replacement has a lower W% than that of his team (which should almost always be the case for a replacement-level pitcher), then we need to solve for RW% in the formula NW% = .5 - (Mate - RW%)/(2*Mate), which gives:
RW% = 2*NW%*Mate
With our .390 NW%, we get .78*Mate.

With the Pythagorean-based model I discussed in part 2, which treats the team as equally far from average on offense and defense, we know that the percentage of league-average runs a Mate team will score is equal to x, where:
x = (Mate/(1-Mate))^(1/(2*z))
where z is the Pythagorean exponent. Knowing x, we just need to figure out the W% for a pitcher allowing runs at the replacement level (125% of league average, or generally, r). We don’t have to screw around with the replacement pitcher’s W%, because our starting definition of replacement is based on runs allowed. So:
RW% = x^z/(x^z + r^z)

In using the quick method approximating Pythagorean, which I guess I should call the Wood approach, since he published it in 1999, the definition will be NW% = RW% - (Mate +.5)/2 + .5. Solving this gives:
RW% = NW% + (Mate + .5)/2 - .5
For a .390 replacement, this comes out to RW% = .14 + Mate/2

What would our estimates of replacement winning percentages be on a .600 team, in a z = 2 environment? For Oliver, it would be .390 + .600 - .500 = .490. For Deane, 2*.390*.600 = .468. For my approach, x = 1.107, so 1.107^2/(1.107^2 + 1.25^2) = .440. For Wood, .14 + .6/2 = .440.
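The four estimates can be put side by side with a short sketch. One assumption to flag: for the Pythagorean version I use x = (Mate/(1-Mate))^(1/(2z)), i.e. a team that scores x times the league average and allows 1/x times it, because that is the form that reproduces the x = 1.107 quoted above for a .600 team. The function names are mine.

```python
def oliver_rw(mate, nw=0.390):
    """Oliver: RW% = NW% + Mate - .500."""
    return nw + mate - 0.5


def deane_rw(mate, nw=0.390):
    """Deane: RW% = 2 * NW% * Mate (valid when RW% is below Mate)."""
    return 2 * nw * mate


def pythag_rw(mate, r=1.25, z=2.0):
    """Pythagorean: balanced-team run ratio x, then x^z / (x^z + r^z).

    Assumes the team is equally good on offense and defense, which is the
    reading that matches x = 1.107 for a .600 team.
    """
    x = (mate / (1 - mate)) ** (1 / (2 * z))
    return x ** z / (x ** z + r ** z)


def wood_rw(mate, nw=0.390):
    """Wood quick method: RW% = NW% + (Mate + .5)/2 - .5."""
    return nw + (mate + 0.5) / 2 - 0.5


for mate in (0.600, 0.700):
    row = [f(mate) for f in (oliver_rw, deane_rw, pythag_rw, wood_rw)]
    print(mate, [round(v, 3) for v in row])
```

The unrounded Pythagorean figure can differ from the hand calculation in the third decimal, since the text rounds x before squaring it.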

As you can see, the assumptions affect our estimate of the replacement pitcher's record. Wood and I think that if you put a replacement level pitcher on a truly great team, a .700 team, he will still only have a .490 W%! Oliver's assumptions would lead you to believe he would be a .590 pitcher and Deane's would lead you to believe he is a .546 pitcher.

So now you can couple your NW% with replacement value, if you so desire. For our old friend Red Ruffing, pitching on .554 Mate teams for his career, his replacement would be expected to go .417 (Wood assumptions), so his 273-225 is +65 wins above replacement. According to B-R, the league ERA in Ruffing’s time was 4.15. A decent assumption for modern times is that 90% of runs are earned, and so we’d convert that to RA of 4.61. Ruffing himself had a 4.39 RA in 4344 innings. A replacement would be expected to allow 4.61*1.25 = 5.76, and so Ruffing would be (5.76-4.39)*4344/9 = +661 runs above replacement. Assuming that RPW = RPG, the RPW would be 9.22, making him 661/9.22 = +72 wins above replacement. So his career W-L record is about 7 WAR worse than his runs allowed. You’d have to do this sort of comparison for other pitchers to get a sense of how this should impact his HOF case (if it should at all, which of course is a whole different can of worms).
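The Ruffing arithmetic can be retraced in a few lines. This sketch uses only the inputs quoted above (273-225, .554 Mate, 4.15 league ERA, 90% earned, 4.39 RA, 4344 innings); the variable names are mine, and because it carries full precision instead of rounding at each step, the intermediate run total comes out a run or two off the hand figure.

```python
# Record-based estimate (Wood assumptions).
wins, losses, mate = 273, 225, 0.554
repl_wpct = 0.390 + (mate + 0.5) / 2 - 0.5       # about .417
wins_above_repl_record = wins - repl_wpct * (wins + losses)

# Runs-allowed estimate.
lg_era, earned_share = 4.15, 0.90
lg_ra = lg_era / earned_share                    # about 4.61
repl_ra = lg_ra * 1.25                           # about 5.76
ruffing_ra, innings = 4.39, 4344
runs_above_repl = (repl_ra - ruffing_ra) * innings / 9
rpw = 2 * lg_ra                                  # RPW = RPG, both teams' runs
wins_above_repl_runs = runs_above_repl / rpw

print(round(wins_above_repl_record), round(wins_above_repl_runs))
print(round(wins_above_repl_runs - wins_above_repl_record))
```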

After going through all of this, I am now going to ask “Why?” I suppose I could consult a therapist of some kind to help answer this, because after all it was me who wrote all of this, and now I am going to talk about why I don’t think it’s a great idea to compare a pitcher’s W% to his team’s at all.

And the reasoning behind this is not going to be the standard complaints about the deficiencies of win-loss records that everybody reading this blog already knows or if they don’t, are way over their head and stopped reading a long time ago. Even if you grant me for the sake of argument that win-loss records ARE valuable for evaluating pitchers, I will claim that they should not be compared to the W% of their teams.

The reason for this is that all throughout this series, I have been referring to the assumptions that each of these approaches makes. Oliver inherently assumes that a team has an average offense. Deane inherently assumes that most of the deviation from .500 is due to the defense, with the offense playing a small role. Wood explicitly assumes that a team is equally skilled on offense and defense.

But in the real world, when we look at real teams, we don’t have to assume anything about their offense and defense. We know exactly how many runs they scored, or how many runs they allowed. We can compare a pitcher directly to the number of games he’d be expected to win based on the number of runs his team scored. Or in modern times, even better, his Run Support, the number of runs his team scored in either games he pitched or while he was in the game, depending on whose definition of the statistic you use.

The only drawback to this is that we have to consider park factors, or if we choose to ignore them, accept that we have a flaw in our model. But doing so is just one extra step, and I think that far outweighs the silly assumption that all teams are perfectly balanced between offense and defense.

Actually, I cheated a little bit. One could, I suppose, make the argument that the W% implicitly contains information about the team’s distribution of runs scored, while using the team’s R/G or Run Support ignores the distribution. This is true, but the W% contains many other polluting elements, and therefore the fact that it does in a small way include information on the run distribution does not make it a better option on the whole.

I suppose, if you really wanted to get into it, you could look at a pitcher’s performance when his team scores three runs, compare this to what an average or replacement level pitcher would do when supported by three runs, and repeat this for every support level. Bill James did this to compare Danny Jackson and Walt Terrell in the 1988 Abstract, concluding that despite Terrell’s superior W-L record, Jackson pitched better. Ironically, B-R lists the most similar pitcher to Danny Jackson as…Walt Terrell (although Jackson is only eighth on Terrell’s list; Richard Dotson wins there).

Sunday, October 29, 2006

Why I'm Glad the Cardinals Won

Taking off on the title of my pre-World Series post, I am happy that the Cardinals won, for the reasons I described last time. Unsurprisingly, after the Series, many people have been dismayed that the "inferior" team won. I just still fail to see why I should throw out the past years in determining this. Sure, the Tigers were probably the better team in 2006. But we all know that the playoffs are largely a crapshoot. If you use a broader historical perspective, the 2006 Cardinals are the declining end of a great run that could have easily produced a world champ in 2000 or 2002 or 2004 or 2005. That they lost to equal or inferior teams in some of those years is balanced by the fact that their inferior team beat some superior teams in 2006.

I have also seen a comment about how boring this postseason was. And I agree, in theory--certainly nothing remarkable that makes it stand out from other years. But one of these comments specifically mentioned how it was worse than previous years, including 2005. Which is funny, because I wrote a post last year complaining about the 05 playoffs. To review, here is the total number of playoff games in each year, a fairly good gauge of how competitive they were:
01--35 games
Same number of games as last year. However, we at least had a great LCS this year, one that would be much more fondly remembered had the better (Mets) team won and a truly spectacular play (the Chavez catch) counted for something other than the Cardinals' margin of victory.

One place where 2005 does have the advantage is in extra inning games--while there were only two, they were the longest game in playoff history and the longest game in World Series history. The 2006 playoffs featured NO extra-inning games, as did 2002.

So 2005 and 2006 rank about the same in my eyes on the excitement level. Perhaps I wasn't reading enough, but I don't remember seeing my sentiment about 2005 echoed a lot elsewhere, while I have seen its mirror image this year. To me at least, 2006 was better because the team I disliked the most out of the playoff field lost the World Series rather than winning it.

Now, on a rare non-baseball foray, the greatest single day in American professional sports will take place on Saturday. Yes, I believe that this event trumps the Super Bowl. The other sports are harder to pin one day on because you never know how long a World Series or Stanley Cup is going to last. I firmly believe that the single greatest day in American professional sport is the Breeders' Cup.

While I know next to nothing about horse racing and am completely unqualified to make predictions, the Breeders' Cup is a great day because you have seven championships essentially decided in one day. Again, with me not being a horse racing expert, some of these, like the Filly & Mare Turf, or the Juvenile Fillies, don't do a lot for me. But the Distaff, the Mile, and the Juvenile do pique my interest, the Sprint is a fascinating race to watch, the Turf is an interesting opportunity to see how the rest of the world likes its racing, and the Classic of course is the piece de resistance.

The only division that I even cursorily follow is the Classic, and so I am familiar with the horses there. All of the attention is on Bernardini, Lava Man, and Invasor, of course. I am torn on who to pull for (and pick) between Bernardini and Lava Man. I think Lava Man's west coast handicap wins are thoroughly impressive, although I believe the Beyers speak differently. Lava Man also has the great gelding, claimer story. On the other hand, Bernardini dominated two of the big four three-year-old races (Preakness and Travers), and I always love to see a three-year-old who can run with the older horses (none has won the Classic since Tiznow in 2000, and before that it was Bernardini's pop, AP Indy in 1992).

Both are grandsons (horse racing enthusiasts will cringe at my use of a human relationship description I am sure) of the great Seattle Slew, although Bernardini certainly has the more accomplished father in AP Indy (Lava Man through Slew City Slew). I would love to see either of them win, I guess. As to who I pick, I'm going to go with Lava Man. And I may well be giving short shrift to very deserving horses like Invasor, Premium Tap, and Perfect Drift.

Anyway, back to baseball, at least tangentially. It recently occurred to me that George Steinbrenner spends his money exactly as I would if I was rich. If I had millions of dollars to spend, what would I spend it on?
1) Supporting a certain university (check: George does this, with the same university)
2) Buy a baseball team (check)
3) Buy thoroughbreds (check)
I've got to respect a guy like that. And John Galbreath, former owner of the Pirates, who did the exact same thing (and actually won some Triple Crown races).

I should have a post up some time in the next couple of weeks about OBA/SLG combos; a tired subject, but one that should be revisited from time to time. There's been some interesting work on The Book blog on this topic, so there's at least some new ground to cover.

Friday, October 20, 2006

Why I'm Rooting for the Cardinals

Disclaimer: This post is not befitting of the stated serious sabermetric focus of this blog. Of course, neither are a lot of others. But this one in particular.

First, I hate the Tigers because I hate all things from Michigan. But even as a baseball fan, I would root for the Cardinals.

My usual approach, unless it is a team that I would consider myself a fan of in any degree whatsoever (Indians, Reds, Yankees), or a team with a particular favorite player of mine (Nick Swisher, Barry Bonds, Rickey Henderson, Bobby Abreu are probably the best recent examples), is to root for the team that I feel is better. Some like to root for the underdog. This is a foreign concept to me, unless the degree to which the favorite has been hyped as invincible is particularly nauseating.

So why would I pull for the Cardinals, who if they win would have the worst record of any World Series winner? They are definitely underdogs, and they don't have a particular favorite player of mine, and I am certainly not a fan of theirs.

But I think reasonable people can agree that the playoffs are largely a crapshoot. Does this mean that they are completely random, that there are no factors above and beyond sheer team quality that can help us predict the outcome? Of course not. Does claiming that the playoffs are a crapshoot mean that I am denying that the winners deserve their titles? No, of course not. For whatever reason, Major League Baseball has decided that you need to win 3 consecutive series in order to be crowned world champion, and all of the teams know this, and so whoever wins has won the championship as MLB has defined it. Fine.

Anyway, the Tigers clearly had a better season in 2006 than did the Cardinals. But who was better in 2005? 2004? What team in the National League was better than St. Louis in those years? None that I know of.

The point is that the Cardinals, had "luck" gone their way, were a team that was perfectly "worthy" of winning a World Series in 2004 or 2005. And so if they were to win this year, when they aren't "worthy", would this be some sort of injustice? Well, as stated above, the "better" team losing can never be an injustice. But the point is that if the Cardinals could have won in 2004, but they came up on the wrong end of the crapshoot, and they shouldn't win in 2006, but happen to have the dice fall their way, then hasn't a "wrong" just been righted?

I am not expressing my feelings on this well, but they are crystal clear in my mind. Maybe this approach: If you were an owner, and you accept the premise that the playoffs are largely a crapshoot, what would you order your general manager to do:
1. Build a team that will win the World Series in a given year
2. Build a team that will consistently be a playoff contender and hope that the crapshoot works out in one of those years

I don't know about you, but I would pick option 2 every day of the week. So if that is the standard by which I judge the worthiness of teams, who is more worthy between the Tigers and Cardinals? Clearly the Cardinals, who have a mini-dynasty in the NL Central and have been consistent contenders over the past decade.

Now of course we can't know the future--perhaps the Tigers are on the cusp of such a run and this is the beginning of it. I have serious doubts that this is the case, but I wouldn't say it was impossible. What I do know is that the Cardinals have done it and the Tigers have not yet.

Anyway, I mentioned near the beginning that the Cardinals would have the worst record for a World Series winner, 83-78. The current holder of that distinction is the 1987 Twins, who went 85-77. In the ALCS, they beat the AL East winner, with a much better record, the Detroit Tigers. In the World Series, they beat the 95-67 National League pennant winners--the St. Louis Cardinals.

The Detroit Tigers were 95-67 this season. Does this mean a darn thing? No; any yahoo can go back and fit a supposed historical pattern to a current event. But I'm just sayin.

Tuesday, October 17, 2006

Against My Better Judgment

This morning there were three new comments in my email for recent posts on this blog. This doesn’t sound like a lot, but for this site it is a remarkably high number, and so I figured something must be up. And what was up is that it (my MVP post) was linked at BTF.

So I went over there and took a look at the thread. And against my better judgment, I will respond to some of the criticisms of my post. This is against my better judgment because I do not intend to get into the business of going around the internet looking for people commenting about my work and responding to it, unless they actually have a point. I don't think that much of what is said in the BTF thread really has a point, or in any case is misdirected. But there are some fairly egregious misrepresentations of the analytical structure I am using, and so I will make an exception. I don't want to let the misconceptions define what I am doing.

First, let me briefly describe that analytical structure. My RAR numbers come from comparing a player to a replacement-level HITTER at his position. This approach inherently assumes that the total offense + defense value of each position is equal. This assumption is one that reasonable people can certainly disagree with; in fact, I don’t agree with it 100% myself. However, it is by far the easiest way to go about constructing an analytical system.

There was a very good thread at the Tango/MGL/Dolphin Inside the Book Blog about some of the issues involving positional adjustments, and I suggest that you read that if you want to delve into this further.

The replacement level player is defined as one who performs at 73% of the positional average (equivalent to a .350 OW%). I personally believe that this baseline is too low (for my views on replacement level in general see the “Baselines” essay on my website). But it has been, or at least was for a long time, the standard level used by sabermetricians. Bill James eventually settled on .350 as did the Big Bad Baseball Annual. Perhaps there has been some movement away from this standard; Baseball Prospectus’ VORP now uses close to 80% at each position (which I think is probably a better choice than 73%), but on the other hand Clay Davenport’s structure defines replacement as a .230 EQA which is equivalent to a .350 OW%. Anyway, I used 73% in this analysis. If you want to quibble with this, be my guest. But keep in mind that my analysis and ranking of players would change if I used a different baseline.

Now a little bit about defensive value. There are multiple competing defensive metrics out there, few of which have published comprehensive 2006 values yet. Therefore, even if I could settle on which metric to use, it is simply not feasible to include defensive values in the numbers on my site. This is why I throw in defense in terms like “approximately ten runs”, and it seems to be an afterthought. I do not disavow the value of defense, not by any stretch.

But remember that once you have gone down the offensive positional adjustments route, you are essentially assuming that the defensive value differences between positions are equal to the offensive value differences between the positions. Again, feel free to disagree with this assumption, but it is hardly one that is unique to me or one that is not used by other sabermetricians (in the linked thread, MGL to some extent defends this line of analysis, and it is exactly what the Pete Palmer analytical system does, as well as any VORP+defense measure that BP may publish. So it’s not like I’m some lone crackpot out there on this front.)

And once you have determined that the defensive value gap between positions is equal to the offensive value gap between positions, and applied an offensive position adjustment to correct for this, you are now operating in a world where +5 runs above average playing shortstop has equal incremental value (that is, when added to our offensive, position-adjusted RAR) to +5 runs above average playing first base, or any other position.

Keeping these points in mind, I will quote from a few posts on the BTF thread, and show what these misconceptions are.

#1 RMc Presents
“Er...isn't being 10 runs below average worse than being at zero? Jeter costs his team ten runs -- about one win -- on defense. Ortiz doesn't cost the Red Sox anything on defense. How can this be a wash?”

This is in reference to me saying that Jeter is probably something like -10 runs defensively. Yes, Ortiz has a big 0, but remember that my structure has assumed that total value of 1B = total value of DH. This, as other BTF posters rightfully point out, is not true and is illogical. Therefore, a DH should be penalized an additional amount for the loss in flexibility or whatever you want to call it of him being a DH. Ortiz being a DH every day prevents Manny from DHing when he’s a little tired for instance. Where I will disagree with some of the BTF posters later is that they seem to think that this means a given DH could never be more valuable than a given 1B. They may not have stated that, but that seems to be the implication. I would say that a DH is probably costing you 5-10 defensive runs versus a first baseman. And so Ortiz does have a cost, and yes, it could very well be a wash with Jeter’s cost (recognizing that Jeter, even as a poor shortstop, is a plus defensively, but he is a cost relative to an average shortstop, which is what the analytical structure demands that we consider).

#19 The Yankee Clapper
“Putting aside the question of Jeter's exact defensive value, it is ridiculous to say Ortiz is a neutral value on defense. His defensive value is much less than Jeter's.”

Absolutely agreed. But remember again that the positional adjustment has already incorporated much of the defensive value gap. If you just compare Ortiz and Jeter to a replacement level hitter, regardless of position, then Ortiz is +81 to start out with and Jeter is +64.

#26 Gaelan
“This is a preposterous ballot. How can Ortiz be second and Ramirez be left off the ballot when Ramirez has better offensive rate stats, has more defensive value. The only advantage Ortiz has is in playing time and that's not enough to make up the difference let alone vault him so far ahead.”

Ramirez does have better offensive rate stats. Ortiz does have more playing time, which the poster kind of glosses over as if it’s not really that important. But it is. We’re talking about a difference of 128 plate appearances. If we just compare Ortiz and Ramirez to a replacement level hitter, regardless of position, Ortiz is +81 and Ramirez is +59. So Ramirez has to be 22 runs more valuable defensively than Ortiz in order to close this gap.

Now let’s put in the positional adjustment. It’s now +70 to +64 in favor of Ortiz. So now it’s only 6 runs he needs to make up. But I don’t think he can do it. Ramirez, according to MGL, is truly a dreadful defensive outfielder, projected to be about -15 runs. Even if we dock Ortiz a full 10 runs for the difference between being a first baseman and being a DH, he gains 5 runs on the deal.
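The arithmetic in the Ortiz/Ramirez comparison can be laid out as a short sketch. The run values are the ones quoted above (position-adjusted RAR of +70 and +64, about -15 for Ramirez in the field, and a full 10-run DH penalty for Ortiz); the function and variable names are made up for this illustration.

```python
def total_value(position_adjusted_rar: float, defensive_runs: float) -> float:
    """Position-adjusted offensive RAR plus a defensive run estimate."""
    return position_adjusted_rar + defensive_runs


ORTIZ_PRAR = 70      # position-adjusted RAR quoted in the text
RAMIREZ_PRAR = 64    # position-adjusted RAR quoted in the text

# Defensive estimates from the text: a full 10-run DH penalty for Ortiz,
# and roughly -15 runs for Ramirez in left field.
ortiz_total = total_value(ORTIZ_PRAR, -10)
ramirez_total = total_value(RAMIREZ_PRAR, -15)

gap_before_defense = ORTIZ_PRAR - RAMIREZ_PRAR
gap_after_defense = ortiz_total - ramirez_total

print(f"Gap before defense: {gap_before_defense} runs in favor of Ortiz")
print(f"Gap after defense:  {gap_after_defense} runs in favor of Ortiz")
```

Under these inputs the lead grows from 6 runs to 11, which is the 5-run gain described above.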

Now it is true that on the Red Sox, if Ortiz was a better left fielder than Manny, he would play, so Manny clearly has more defensive value within the context of the Red Sox. But on a league-wide basis, this is not true. Playing Manny in left field may be the right move for Boston, but it is a bad move for a hypothetical team. I choose not to define value in terms of the personnel of the team surrounding him, because there are way too many factors to consider and way too much uncertainty in putting tangible values to those factors. If you don’t like my way of defining value, fine, but I think that my conclusion is perfectly reasonable within that framework.

#26 Gaelan (cont.)
“In general treating firstbasemen and DH as equivelant is a huge mistake since they aren't the same. It results in overestimating the value of DH's and underestimating the value of 1B. For instance the only way to dismiss Morneau's candidacy is to treat him as a DH which is ridiculous.”

To paraphrase my new friend, this contention is preposterous. I agree that an average 1B is clearly more valuable than an average DH. But to claim that this results in underestimating the value of a 1B is just false. First of all, including DHs in the pool of 1B for the purposes of calculating the offensive positional adjustment actually REDUCES the magnitude of the adjustment, because the group of players who actually play 1B outhit the group of players who actually DH. The explanations for this have, in all likelihood, next to nothing to do with the inherent differences between playing 1B and DH, and everything to do with the personnel actually picked to fill those roles (DHs are often old, half-crippled, players nursing injuries, etc.).

So if you just calculate my RAR v. position for a 1B and a DH, yes, the 1B will get the short end of the stick. That’s why it’s appropriate to penalize a DH another five or ten runs. But it makes no difference, none whatsoever, in comparing the 1B to a shortstop.

As to the claim about Justin Morneau, as I discussed at length in my post, he, by any reasonable sabermetric approach, is not a contender for the top spot on the ballot. Would it be reasonable for someone to list him eighth or tenth? Sure. But it is pretty darn easy to dismiss his candidacy for being THE MVP out of hand. Morneau was ninth in the league in RC. He was twelfth in RG. And while he trails four DHs in this category, he also trails a left fielder, two right fielders, a catcher, two shortstops, and a third baseman. That doesn’t sound like an MVP to me, and we have not yet included the position in the numbers we are looking at. Do that, and he ranks sixteenth. Even if you throw out the four DHs, as apparently Gaelan would have you do, he is twelfth, behind a bunch of guys who play more demanding defensive positions than he does. Unless Justin Morneau is something like +25 runs defensively at first base, he has no claim to being anywhere near the top of the ballot.

#26 Gaelan (cont.)
“And how the hell does Rodriguez get anywhere near that ballot. He's not in the top ten in any total offensive metric, he's bad defensively and he certainly shouldn't get any credit for intangibles.”

I don’t know what this guy defines as a “total offensive metric”, or what stats source he uses, but I have ARod as sixth in RC. He is ninth in position-adjusted RAR (PRAR). The guys who are near ARod in terms of PRAR are Manny Ramirez, Jermaine Dye, Vlad Guerrero, Jim Thome, Victor Martinez, and Vernon Wells. We can throw out Thome by Gaelan-logic since he’s a DH. Martinez is a bad defensive catcher. Ramirez is a huge defensive negative. So perhaps Dye, Guerrero, and Wells are better, relative to their positions, than ARod is. ARod had a poor season defensively, from all accounts. But without most defensive metrics available yet, I don’t like to infer too much, and even if those guys are better than ARod, which they may well be, he is still in the top 15.

#30 One Alou
“And what makes this ironic is that although BPro realise that a replacement DH is a league average hitter, for the purposes of VORP they set the replacement level for DHs absurdly low (far lower than at first base, left or right field). Which is why Ortiz and Hafner have such absurdly inflated VORPs. And why, in turn, sabertypes who should know better put Ortiz and Hafner absurdly high on their ballots.”

Again, as I already stated, yes, DHs deserve an additional penalty versus my numbers. However, I am not using VORP, regardless of the similarities in the method, and I never claimed to be using VORP. If I were using VORP, I would have enough decency to say so and not pretend that I am presenting BP’s work as my own. And my replacement level for DHs is equal to that of 1B, but higher than that of the corner outfield spots and any other position.

#36 Gaelan (quoting the segment of One Alou’s post reprinted above)
“This is so true. Hence the ballot full of shortstops and DH's but no corner outfielders or firstbasemen.”

Had he scrolled down the page, he would have seen my NL ballot. Of course there are no DHs on it, but there are three first basemen…in the top five! The only shortstop appears at tenth. The reason why my AL ballot is full of shortstops and DHs is that no corner outfielder or first baseman was among the top 10 players in the AL. If it was systematic bias, it would be hard as heck to have them go 1, 2, 5 in the NL, now wouldn’t it? I’m surprised somebody didn’t criticize me for giving short shrift to up-the-middle players in the NL.

Let’s look at the AL list of RAR, with no position adjustment, except I will tell you what position each guy in each spot actually does play. The leader is a DH. Second is a DH. So far we have DH, DH. Expanding this to the top ten we have DH, DH, LF, DH, RF, SS, RF, CF, 3B, C. As you can see, no first basemen were among the top ten hitters in the AL regardless of position! It’s darn hard to make an MVP case for a first baseman who can’t crack the top ten in hitting RAR.

And of course the corner outfielders who do make the list are going to be hit by a big position adjustment, relative to the shortstops and centerfielders and catchers, etc. LF and RF are much less valuable defensively than any position other than 1B/DH. The identities of the corner outfielders above are Ramirez, Dye, and Vlad, by the way. Ramirez was dreadful defensively, and I have already acknowledged that Dye and Vlad could easily be in the 7-10 range. But you just as easily could choose not to have them in there, and have them as the ones just off the ballot, which is of course what I have done.

Monday, October 16, 2006

Internet Baseball Awards: MVP

Last year, in a rare convergence of my mind with the collective of the BBWAA, the actual MVP awards went the way I would have voted them, or close to it. ARod was my choice as AL MVP, and while I gave a very slight edge to Derrek Lee in the NL, it was tough to argue with the choice of Albert Pujols.

I will not go through ten guys and give my opinions on the minutiae of their ranking; instead I’ll just discuss the candidates for the top few spots. In the AL, there are a number of worthy candidates. Looking solely at runs above a replacement level hitter at their position, the top guys are Derek Jeter (+72), Travis Hafner (+71), David Ortiz (+70), Carlos Guillen (+65), Joe Mauer (+64), Manny Ramirez (+64), Grady Sizemore (+62), and Miguel Tejada (+61). The ones who stand out here in my mind are Jeter, Ortiz, Hafner, and Mauer.

Jeter hit 347/413/487, 123 RC in 692 PA. Ortiz hit 313/409/631, 137 in 677. Hafner hit 313/439/667, 124 in 554. Mauer hit 318/433/507, 108 in 600.

While I am hesitant about how much weight to give to the information, Fangraphs does a great job of providing us with WPA data. Jeter was +5.98; noted “Clutch God” Big Papi was +8.04; Pronk was +4.44; and Mauer was +2.36.

How big are these differences, compared to our basic evaluation of each player’s offensive contributions? AL teams averaged 4.97 runs/game this year, which equates to a R-W converter of about 10.2 r/w. So I will multiply the wins figures from Fangraphs by 10.2 to put them into runs, and compare to runs created above average (an average HITTER, not position-specific):
These figures certainly bolster the standing of Jeter and Ortiz while dropping that of Hafner and Mauer.
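As a rough illustration of the conversion described above, the sketch below multiplies each player’s Fangraphs WPA by the 10.2 runs-per-win figure. Only the WPA values and the 10.2 converter come from the text; the dictionary and variable names are assumptions made for this example, and the runs created above average side of the comparison is not reproduced here.

```python
RUNS_PER_WIN = 10.2  # AL run-to-win converter quoted in the text

# WPA figures (in wins) quoted from Fangraphs in the text.
wpa_wins = {
    "Jeter": 5.98,
    "Ortiz": 8.04,
    "Hafner": 4.44,
    "Mauer": 2.36,
}

# Express each player's WPA in runs so it can sit next to run-based measures.
wpa_runs = {name: round(wins * RUNS_PER_WIN, 1) for name, wins in wpa_wins.items()}

for name, runs in wpa_runs.items():
    print(f"{name}: {runs:+.1f} WPA runs")
```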

One question that needs to be answered is “how bad of a shortstop is Jeter”? I am no fielding maven, and so I have to look at other people’s rankings, many of which are not yet available for 2006. I gather that a conservative estimate is that he is 10 runs below average at shortstop. But remember that Ortiz has no defensive value whatsoever, and while we have assumed that everyone is equal defensively (and the big assumption that all positions have equal offense + defense value, which may well not be true), remember that I have lumped 1B and DH together. So if we impose a penalty for DHing rather than playing first, I think that defense is a wash.

That leaves the “clutch” factor as the only thing left to be considered. But I don’t think it really helps Big Papi that much. He was 22 runs better through WPA than we expected; Jeter 19. And I can’t justify giving him the MVP award on that.

The wildcard is Johan Santana, who had an outstanding season. Jeter is +72 RAR and +53 RAA, Santana +82 RAR and +50 RAA. Santana’s runs are more valuable, as they occur in the lower run context that he creates. But giving Jeter credit for the “clutch” runs edges him back ahead. And so, in a close call, I see it:

1) SS Derek Jeter, NYA
2) DH David Ortiz, BOS
3) SP Johan Santana, MIN
4) DH Travis Hafner, CLE
5) C Joe Mauer, MIN
6) SP Roy Halladay, TOR
7) SS Carlos Guillen, DET
8) CF Grady Sizemore, CLE
9) SS Miguel Tejada, BAL
10) 3B Alex Rodriguez, NYA

Two players who have gotten significant MVP hype but do not crack my ten are Frank Thomas and Justin Morneau. Quite frankly, I am mystified (but not particularly surprised, given the fact that some people campaigned for Shannon Stewart a few years ago) that those guys are even considered candidates. Thomas is a great player, a surefire Hall of Famer, and one who has oddly been praised at the beginning of his career but underrated in the middle to end. Certainly Thomas’ run from 1991-1997 or so was brilliant, but he has been a very good player since then, albeit one who gets hurt a lot. But it seems to me as if his reputation was not at that level; it seemed that the perception was that Ken Griffey was still very good but always hurt, while Thomas was mediocre and always hurt. So I am glad to see the Big Hurt getting respect at this late stage.

Much of his candidacy stems from his status as the leading player on a playoff contender, and in fact he was the most valuable Oakland hitter. But a DH simply cannot be the MVP while creating 96 runs. Thomas ranks just thirteenth in the league in RG, and eighteenth in Runs Above Replacement (PRIOR to applying a positional adjustment). His WPA of +3.19 is only 3 runs better than would be expected. Big Frank is a valuable player, but not one of the top ten in the league.

Then there is Justin Morneau. On the surface, the reason for his candidacy seems to be the same as Thomas’: big bopper on a division champ. He also has the glistening 130 RBI, second only to Big Papi. But in the sabermetric measures, it is no less mystifying. He ranks sixteenth in position-adjusted RAR at +47. His 7.3 RG ranks twelfth, certainly underwhelming for a first baseman who is staking an MVP claim. His +4.46 WPA is nine runs better than expected. But even if we give him credit for the full nine runs, moving him to +56, he still trails all of the hitters on my ballot, as well as Jermaine Dye and Vladimir Guerrero, and is tied with Jim Thome. In fairness, those four are only separated by two runs, pretty negligible. But he’s still a long way away from Jeter, Papi, and Pronk.

What befuddles me is that Batting Average is perceived at least to be the holy grail at which the traditionalists worship. And Morneau’s teammate is the first AL catcher ever to win the BA title and the first ML catcher to lead both leagues. And yet Morneau may be an even bigger MVP candidate. I forget where exactly I saw it, but a study once showed that the most important factor in historical MVP voting has been RBIs. Morneau certainly has those, and that seems to be what is driving his candidacy.

In the National League, there are four players who I think are in the running: Albert Pujols, Ryan Howard, Miguel Cabrera, and Carlos Beltran. Let me do a chart of some pertinent data rather than writing it out:
“EXP” is the difference between estimated WPA runs and RC above average. Looking at this, it is tough to give it to anyone other than Prince Albert. He’d be the choice based on just going by the basic numbers, he has the best “clutch” performance, and he’s a better defensive first baseman than Howard. I jump Beltran over Cabrera because he is a fine defensive outfielder while Cabrera apparently is not all that at third.

1) 1B Albert Pujols, STL
2) 1B Ryan Howard, PHI
3) CF Carlos Beltran, NYN
4) 3B Miguel Cabrera, FLA
5) 1B Lance Berkman, HOU
6) SP Brandon Webb, ARI
7) SP Chris Carpenter, STL
8) SP Roy Oswalt, HOU
9) 2B Chase Utley, PHI
10) SS Jose Reyes, NYN

Wednesday, October 11, 2006

Why I Don't Visit BTF Much Anymore

The Cory Lidle accident/tragedy/developing story is a poor time to air my grievances with Baseball Think Factory (actually, not Jim Furtado and the fine people who run the site but many of the people who post there). Of course, that never stopped anyone posting on BTF either:

But of course it's not that simple. I make fun of players who scabbed in '95, but most of them were too naive to have really thought about the implications of what they were doing. It's certainly possible that Lidle, and Rick Reed, and Kevin Millar and all the rest are perfectly fine people from a moral standpoint, just a little dim. So there's a difference between being a scab because you don't understand about unions and dignity and human rights and all, just because you want money now instead of later, or you're afraid to pass up an "opportunity," and being a scab after having considered and understood the implications. It's the same difference between killing someone who attacks you and killing someone who's at home in bed at the time.

So maybe Lidle was a jerk who spit in the face of dozens of millions of hard-working American families, or maybe he was a naive, silly kid. I have no idea which, so I certainly feel no concrete emotion but sadness about his untimely death. But I'd be lying to say that I feel as bad about it as I would if it was someone who I knew was a truly good person. People have differing value to history, which is a taboo thing to say, but absolutely true, and something that has to be faced.

This is a very sick man. And the degeneration into politics (of a leftist bent, mostly, since that's the view of most of the posters) on every thread, along with the general jackassery, makes it pretty much intolerable.

Now you might say I'm a hypocrite, that I'm taking advantage of Lidle to complain about BTF. And I suppose you may be right. But I've never seen a better, more timely illustration of why I hate reading that site.

Internet Baseball Awards: Cy Young

Last year, neither pitcher who I chose for the Cy Young won the actual award. In the NL, Chris Carpenter won, while I favored Roger Clemens (although certainly Carpenter had a strong case). In the AL, Bartolo Colon won the Cy on the strength of his wins, while Johan Santana was again the AL’s top starter.

And now the AL mistake grows in magnitude, because Santana should now be polishing his third trophy. There is little doubt that he will win all polls this year, as he won the pitching triple crown. Had Roy Halladay not been felled in September by an injury, he actually would have given him a run for his money. Santana pitched 233 innings with a 3.05 RA (+82 RAR); Halladay 220 with a 3.26 (+72). The peripherals all favor Santana, but Halladay was once again outstanding as well.

Beyond them, it gets a little murky. The next three pitchers on the RAR leaderboard are Chien-Ming Wang (+57), Curt Schilling (+53), and Barry Zito (+53). All of them have shortcomings in the peripheral measures; Wang’s GRA is 4.64, Schilling’s eRA is 4.62 with a 4.04 GRA, and Zito’s eRA is 4.76. Wang fares decently in eRA (4.07)--his poor showing in GRA is likely due to his groundball tendencies. eRA as I am figuring it now incorporates actual doubles and triples allowed, while GRA assumes that balls in play will become hits at an average rate and the type of hit will be average as well. Likely due in large part to his groundball ability, Wang gave up the third-lowest Isolated Power on balls in play in the AL. Schilling managed to be “run lucky” but also “hit unlucky”, allowing a .327 %H, and thus I see his performance as a washout and revert to the run-based measure.

After them is Francisco Liriano (+53), brilliant in a brief campaign. After him are John Lackey and the previously discussed rookies Weaver and Verlander.

I have to whittle the ballot down to five, and I have not mentioned the two relievers who had dominant seasons, BJ Ryan and Jonathan Papelbon. Giving each of them a little extra credit for high leverage justifies moving them past the second tier of starters, and into third and fourth on the ballot. Which spot for each though? Ryan worked more games (65 to 59) and more innings (72 to 68). Papelbon leads in RA and ERA (.91 to 1.33 in ERA and a similar margin in RA). They are very similar in terms of inherited runners per game (.45 for Ryan, .41 for Papelbon). Both were very effective at keeping them from scoring, although more so for Ryan, bringing his RRA to .41 versus .52 for Papelbon. Ryan leads in eRA (1.29 to 1.52) and they are in a dead heat in GRA. I give the edge, ever so slightly, to Ryan. And so my AL Cy ballot falls into place:

AL Cy Young
1) Johan Santana, MIN
2) Roy Halladay, TOR
3) BJ Ryan, TOR
4) Jonathan Papelbon, BOS
5) Chien-Ming Wang, NYA

In the National League, a popular pick among the traditional punditry seems to be Trevor Hoffman. I’ll have some of what they’re having. Hoffman got a lot of (well-deserved) attention for breaking the saves record; he truly is one of the best relievers of all-time. But his 2006 season was hardly Cy worthy, even in rudimentary measures like saves; 46 saves in 51 chances is great, but not historic by any means. Hoffman benefited from a .247 %H, and worked only 63 innings. +25 RAR is nothing compared to the +47 put up by Cy-discussion worthy BJ Ryan in the AL.

In fact, no NL reliever had a good enough campaign to crack my ballot. Takashi Saito was the most impressive (+31 RAR, 2.02 eRA), but giving him as much of a leverage bonus as Ryan or Papelbon is tough to do when he was not the closer for the entire season (he has around a 1.5 LI according to Fangraphs, while Ryan is near 1.7 and Papelbon 1.9).

So it comes down to the starters, not the most impressive crop either. The three I’ll look at for the honor are Roy Oswalt, Chris Carpenter, and Brandon Webb. Carpenter, the defending winner, had another fine season (221 IP, 3.33 RA, 3.38 eRA, +64 RAR). Webb bested him across the board, although not by large margins (235, 3.29, 3.20, +70). Oswalt matched Webb in RAR while pitching 220 innings, but his 3.92 eRA is significantly higher than the other two pitchers’.

Looking at wins and losses, they are almost identical; Oswalt and Carpenter 15-8, Webb 16-8. Run support? You guessed it, they’re a match; Oswalt 5.22, Webb and Carpenter 5.40. However, bringing park adjustments into play (Arizona is a strong hitter’s park at 106, HOU moderate at 101, St. Louis 99), Webb’s win-loss mark is 5.8 wins better than a replacement versus 5.1 for Oswalt and 4.5 for Carpenter.

The three are close enough that you can reasonably argue for all of them, but I will go with Webb, who is tied with Oswalt in RAR while besting him in peripherals and posting the best W-L mark. After them, the surprising (and over his head) Bronson Arroyo and John Smoltz round out my ballot. Roger Clemens might be in the mix had he pitched a full season or close to it--he was +41 RAR in 113 innings with a brilliant 2.68 RA and 2.83 eRA.

NL Cy Young
1) Brandon Webb, ARI
2) Chris Carpenter, STL
3) Roy Oswalt, HOU
4) Bronson Arroyo, CIN
5) John Smoltz, ATL

Monday, October 09, 2006

(Rhetorical) Question

Suppose that you read somebody's blog, and they gave predictions on the playoff series. And while this person explicitly told you that he didn't have a lot of confidence in his predictions, and that the playoffs were a crapshoot and anyone could win, etc., he picked the wrong team in each of the four series. The question is, would you care to know who he picked for the next round of the playoffs?

The answer, I believe, is yes, you would want to know, because maybe he's always 100% wrong, and that's just as valuable as being 100% right. And in that spirit, I offer (against my better judgement) the A's in the ALCS and the Mets in the NLCS. Why the A's? No particular reason. Call it a hunch (or more likely, a hope).

Of course it is fated that the Tigers will win the World Series. So did the Diamondbacks, Angels, Marlins, and White Sox. What did all these teams have in common? I couldn't stand them.

I need some content for this post, and it will just be some random stuff:
1. There was a little flap this week about some comments Phil Birnbaum posted on his blog (linked under his name on the side of the page). The Cliff's Notes is that he criticized academics for ignoring the work of sabermetricians when they write baseball-related articles. Unfortunately, Phil specifically discussed a paper by JC Bradbury of Sabernomics (also linked on the side of the page). The problem with this is that while JC may in this instance (or may not; I don't really have an opinion and don't want to get involved) have been "dismissive" of the work of sabermetricians, he has excellent credentials as someone who engages with the sabermetric community at large, best exemplified by his occasional writing for The Hardball Times. So while I agree completely with Phil's criticism of academics in general (it is a sentiment that to some degree I expressed here; my target was "statisticians" but of course many of the statisticians I was referring to are academics), it was unfortunate that he coupled this complaint with an article by Bradbury, when there are plenty of other examples out there.

2. Kevin Kennedy criticized Joe Torre and Bruce Bochy for not bringing back their top pitchers to pitch in game four, facing elimination. Now in the case of Bochy, I can see it since Peavy could have pitched game four and maintained his normal rest. And that allows you to come back with Woody Williams in game five, plus maybe use David Wells as a long man on three days rest. So I can understand that. But if you throw Wang on three days rest in game four, you are stuck with Jaret Wright anyway, and you don't get either your "ace" on full rest or skip your bum. So I'm not sure what that is supposed to accomplish. Yes, you can't get to game five if you don't win game four, but your goal has to be to win both games. Having a 60% chance to win game four and a 40% chance to win game five is not more valuable to you than having a 50% chance to win each of the two games (just throwing out numbers there), even if it makes you feel better. It doesn't matter if you lose in four or lose in five.

3. One thing Bochy did that did bother me was bat Josh Bard cleanup. I don't even want to get into the debate about ideal lineup construction and how big of an impact in terms of runs the batting order makes. That's not the point. I just see it as laziness on the part of the manager. And other guys do this too (I remember seeing Mike Redmond batting third for the Twins while Joe Mauer was getting a day off). You slide the replacement into the regular guy's spot because that way everybody else stays in their same spot. If the players really do have a psychological problem with batting in different spots (and really, what's the difference between say #6 and #7 anyway?), then if I were a manager I'd see what I could do to shake it up and change their thinking on it.

4. Remedial Math for Torii Hunter. But keeping it simple with run expectancy, and overly simplistic assumptions. Everyone knows that with 2 outs and a runner at first, Hunter dove for a ball and it got by him and was an inside-the-park homer. Given Tango's 1999-2002 RE table, the expectancy is .251 runs. Obviously it is 0 if you make the catch. If it goes for an inside the parker, it is none on and 2 out, which is .117, plus the 2 runs that scored on the play. So 2.117 runs. The breakeven percentage is:
.251 = 0(X)+2.117(1-X)
Where X = 88.1%. So if you can make the play 88% of the time, you break even. Of course, this assumes that it always goes for a homer if you don't catch it, which is not true, and makes the play a little better. But the point is, you've gotta be pretty confident that you're going to make the play.

5. Technical Sabermetric stuff that apparently needs to be repeated constantly, and hopefully somebody will remember:
a) Win Shares has a built-in baseline. Putting lipstick (distribution of team wins) on a pig (marginal runs versus a .200 player) leaves you with a pig, not Kate from Lost.
b) OW% is the W% a team would have if they had AVERAGE defense and hit like the player in question. And therefore when you say a replacement player is a .350 player, it doesn't mean that a team filled with replacement players would be .350. They'd be worse. A replacement level offense with an average defense would be .350.
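The arithmetic behind items 2 and 4 above can be checked with a short sketch. The inputs are the ones given in the text (the made-up 60%/40% and 50%/50% win chances, and the .251 and .117 run expectancies from Tango's 1999-2002 table); the function names are hypothetical, and the breakeven model keeps the same simplifying assumption that a missed dive always becomes an inside-the-park homer.

```python
def chance_to_win_both(p_game_four: float, p_game_five: float) -> float:
    """Probability of winning two games, treating them as independent."""
    return p_game_four * p_game_five


def breakeven_catch_rate(re_without_dive: float, runs_if_missed: float,
                         runs_if_caught: float = 0.0) -> float:
    """Solve re_without_dive = runs_if_caught*X + runs_if_missed*(1 - X) for X."""
    return (runs_if_missed - re_without_dive) / (runs_if_missed - runs_if_caught)


# Item 2: ace on short rest in game four versus a more even split.
short_rest_plan = chance_to_win_both(0.60, 0.40)
even_plan = chance_to_win_both(0.50, 0.50)
print(f"60%/40% plan: {short_rest_plan:.2f}, 50%/50% plan: {even_plan:.2f}")

# Item 4: two runs score on the homer, then none on with two out (.117).
breakeven = breakeven_catch_rate(0.251, 2.0 + 0.117)
print(f"Breakeven catch rate: {breakeven:.3f}")
```

The first comparison shows why loading up on game four does not help: the product is what matters, and it is slightly lower for the lopsided split. The second solves the breakeven equation directly, landing at a catch rate of a little over 88%.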

Thursday, October 05, 2006

Internet Baseball Awards: Rookie of the Year

Last year was a bad year in terms of big-impact rookies. Ryan Howard was my choice (and the BBWAA’s) for the NL, despite having just 345 plate appearances. What else was one to do when the other choices were Jeff Francoeur, Willy Taveras, Garrett Atkins, and Gary Majewski?

In the American League, Huston Street was a solid choice, but the crop was not spectacular. That is most decidedly not the case this year. There is a bumper crop, particularly among AL pitchers.

I will start my 2006 analysis as always with the superior league that uses modern rules. The top rookie position players are Kenji Johjima (296/323/459, 72 RC, +26 RAR, +10 RAA), Ian Kinsler (278/338/441, 62, +21, +6), and Mike Napoli (231/356/461, 46, +20, +10). Two catchers and a second baseman, with Johjima being the top choice I think. But none of the position players will crack the five-man ballot that can be submitted for the IBAs.

That is because of some remarkable pitchers, two relievers and three starters. Joel Zumaya worked 62 games and 83 innings for the Tigers, with a 2.38 RRA, +35 v. a replacement level pitcher, ranking fifth in the league. But Jonathan Papelbon worked 59 times, 68 IP, with a microscopic RRA of .52, +43 v. replacement. Not only that, but Papelbon pitched in higher leverage situations than Zumaya, making him a clear choice as top rookie reliever.

Then there are the three starters. I expect Justin Verlander to come out as the top choice of the writers, thanks to his 17 wins. And Verlander was good, +48 RAR in 186 IP with a 3.89 RA. But his eRA was significantly higher (4.63), and he got 6.77 runs of support per start. I estimate that his 17-9 record bettered that of a replacement-level pitcher with 6.77 RS by 2.5 games.

Jered Weaver came out with an identical +48 RAR, but in only 123 innings. That’s because he had a 2.72 RA. Weaver’s 3.36 eRA was more in line with his RA, but on the other hand, his %H was just .245, almost certainly an unsustainable performance. His 11-2 with 5.49 runs was 5.1 games better than replacement.

Francisco Liriano pitched even fewer innings than the other two, just 121. But he bettered them both in RAR at +53, with a 2.31 RA (best among any AL pitcher with 15 or more starts). His 2.57 eRA was also the best in the league, as was his 3.02 GRA. Liriano went 12-3 on 5.36 RS, the lowest of any of the three, good for +5.6 against replacement.

Five runs against replacement is certainly not an insurmountable margin; there might be considerations in somebody’s mind that would tip the scale to Verlander. But I see no reason to rank Verlander ahead of two pitchers who were far more dominant and who, had they been given an opportunity to be in the rotation on opening day as Verlander was, would surely have bettered him by even more.

The real question is where Papelbon fits in. WPA advocates will point to his +5.24 WPA and LI of approximately 2, while Liriano was at +3.02. But as I have expressed on this blog before, I believe that WPA is just one way of conceptualizing value, namely in real time. I do not see its usage, even for value applications, as a given. Furthermore, Rookie of the Year is not a purely value question. I am not sure exactly what the language for the award is, but it seems to me as if it is for the most outstanding rookie, not the most valuable rookie. So I don’t see any reason to overturn the RAR difference, and I have little doubt that if both stay reasonably healthy, Francisco Liriano will have a more outstanding career than Jonathan Papelbon.

However, since the other two starters have flaws (Verlander’s relatively poor component stats, Weaver’s low hit rate), I will slip Papelbon in front of them. So here’s how I see it:

1) SP Francisco Liriano, MIN
2) RP Jonathan Papelbon, BOS
3) SP Jered Weaver, LAA
4) SP Justin Verlander, DET
5) RP Joel Zumaya, DET

Now the Neanderthal League, where pitchers still carry around clubs and flail them at the birds flying past them. There are three interesting relief pitchers, two of them Dodgers. One is Takashi Saito, with 78 IP, +31 RAR, a 2.39 RRA, a 2.02 eRA, a 2.22 GRA--brilliant all around, and now the LA closer. Jonathan Broxton sets up for him and in 76 IP is +25 RAR with a 3.02 RRA and 3.88 eRA. Adam Wainwright, now the Cardinals’ closer, pitched 75 innings, +24 RAR, 3.09 RRA, 3.44 eRA.

There is also quite a bumper crop of NL starters, with guys like Chad Billingsley, Paul Maholm, John Maine, Cole Hamels, and Scott Olsen not getting past the first cut despite solid debut seasons. The four I considered are Florida’s Josh Johnson (157 IP, +38 RAR, 3.76 RA, 4.16 eRA) and Anibal Sanchez (114, +35, 3.20, 3.55, and of course a no-hitter), San Diego’s Clay Hensley (187, +36, 4.20, 4.43) and San Fran’s Matt Cain (190, +32, 4.45, 4.14).

There are a number of position players that deserve a look, but I’ll cut it to three, leaving out notables like Russell Martin, Josh Barfield, Prince Fielder, Chris Duncan, and Josh Willingham. Those three are Nationals third baseman Ryan Zimmerman, and the Marlins double play combo, Dan Uggla and Hanley Ramirez.

Zimmerman hit 290/356/477 in 675 PA, for 102 RC, +40 RAR. Uggla hit 287/340/488 in 659, 98 RC, +42 RAR. Ramirez hit 298/356/489 in 689 PA for 114 RC, +59 RAR.

And that pretty much closes the case. Zimmerman is reputed to have great defense, which is enough for him to jump Uggla, but not make up the twenty run gap to Ramirez. Saito gets a few bonus points for high-leverage, and while Johnson and Sanchez are really too close to call, I’ll go with Johnson, as Sanchez allowed just a .245 %H (Johnson at a more normal .283):

1) SS Hanley Ramirez, FLA
2) 3B Ryan Zimmerman, WAS
3) 2B Dan Uggla, FLA
4) RP Takashi Saito, LA
5) SP Josh Johnson, FLA

Wednesday, October 04, 2006

Internet Baseball Awards: Manager of the Year

In this exciting series, I will share my votes for the Internet Baseball Awards, sponsored by Baseball Prospectus, and the thinking or lack thereof behind my choices.

Manager of the Year has always struck me as a silly award, because you can’t really evaluate a manager statistically. By convention you give it to the manager of the most surprising team, or to the manager of a team that is just really, really good. And of course which teams you think are most surprising is biased based on your personal feelings before the season started, which may have been unfounded. Despite all of this, I will reluctantly choose in this manner as well.

In the American League, the big surprise team was the Tigers. The A’s managed to milk a .574 W% out of a .497 PW%, although it was the sub-.500 PW% that surprised me, not their division title. Some will hail the Twins as a surprise, but I picked them to play well. The East has a choke job in the Red Sox, the Blue Jays who did about as well as could be expected, and the DRays and Orioles are non-factors. Out West, the Angels, Rangers, and Mariners did about what I thought they would.

So I will go with the two “surprises” and the guy who managed the best team in the league:
1) Jim Leyland, DET
2) Ken Macha, OAK
3) Joe Torre, NYA

In the National League, the Marlins were pegged by me, foolishly in retrospect, to lose 100 games. There is a good chance that Joe Girardi will win manager of the year after being fired. The Astros and Reds surprised me, but Garner is not likely to garner a lot of support because they were pennant winners last year. The West played out like I thought it would, except I didn’t expect the Dodgers to end up in the postseason. So I have it:
1) Joe Girardi, FLA
2) Jerry Narron, CIN
3) Grady Little, LA

Tuesday, October 03, 2006

Quick Playoff Preview

This is not going to be extensive like those you will find elsewhere, but I just wanted to get on record with my picks (recognizing as always that a short series can easily turn any which way and not to take any prediction too seriously). And yes, my method of setting probabilities is flawed, as I treat PW% as the actual true W%, do not adjust for home/road, do not adjust for pitching matchups, etc. Anyway:
A’s v. Twins
A’s: 574, 529, 497 (W%, EW%, PW%), 4.81 R/G, 4.53 RA/G (park adjusted)
Twins: 593, 578, 553, 4.94, 4.22
Rooting for: A’s
As you can see, the Twins beat the A’s in every category, and the A’s component stats indicate a sub-.500 performance this year. The A’s are my preference (like many sabermetricians, I have a soft spot for Billy Beane, but more importantly they have my favorite player in baseball, Nick Swisher). One must remember that the Twins don’t have Liriano, who contributed to much of their success, but it’s still hard to pick against Minnesota.
P(Twins win game; Log5 based on PW%): 55.6%
P(Twins win series; above result, Binomial distribution): 60.4%

Dodgers v. Mets
Dodgers: 543, 546, 549, 5.27, 4.83
Mets: 599, 568, 572, 5.31, 4.65
Rooting for: Mets
Surprisingly, the Dodgers have below average defense, but their offense is strong enough to be equal, park-adjusted, with the vaunted Mets attack. This is a case where I am going to pick the upset: the Mets played poorly down the stretch, and really have a paper thin starting rotation (Glavine +7 RAA, El Duque -7, Trachsel -10, Maine +6).
P(Mets win game) = 52.3%
P(Mets win series) = 54.3%

Tigers v. Yankees
Tigers: 586, 597, 561, 5.23, 4.30
Yankees: 599, 608, 624, 5.86, 4.83
Rooting for: Yankees
The Yankees offense is by far the best in baseball, even more powerful than it was for most of the year with a full crew of Abreu, Sheffield, and Matsui. As Baseball Prospectus pointed out yesterday, the Tigers bench is putrid (Vance Wilson, Alexis Gomez, Ramon Santiago, Neifi Perez, Omar Infante). Their starters aren’t as good as everyone thinks they are; Verlander has a 4.63 eRA, Robertson 4.77, Rogers 4.35, Bonderman 4.51. This is a series the Yankees should win.
P(Yankees win game): 56.5%
P(Yankees win series): 62.1%

Cardinals v. Padres
Cardinals: 516, 513, 494, 4.90, 4.78
Padres: 543, 534, 555, 4.80, 4.46
Rooting for: Padres
This is a far cry from past Cardinal teams, limping down the stretch, with a surprisingly bad rotation (only Chris Carpenter ranks above average). The Padres have solid starters, a better bullpen, and almost as good an offense. They are my pick.
P(Padres win game): 56.1%
P(Padres win series): 61.3%
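
The game and series percentages in these previews can be reproduced with a short sketch. It assumes the Division Series are best-of-five (first to three wins) and, as admitted up top, treats PW% as true talent with no home/road or matchup adjustment. The function names are mine, made up for illustration.

```python
from math import comb


def log5(pw_a, pw_b):
    """Chance that team A beats team B in a single game under Log5."""
    a_wins = pw_a * (1 - pw_b)
    b_wins = pw_b * (1 - pw_a)
    return a_wins / (a_wins + b_wins)


def series_win_prob(p_game, wins_needed=3):
    """Chance of reaching `wins_needed` wins before the opponent does.

    Sums over the number of losses absorbed before the clinching win,
    treating every game as an independent trial with the same p_game.
    wins_needed=3 is an assumption (best-of-five).
    """
    q = 1 - p_game
    return sum(
        comb(wins_needed - 1 + losses, losses) * p_game ** wins_needed * q ** losses
        for losses in range(wins_needed)
    )


# (favorite, favorite PW%, underdog PW%) as listed in the previews above
matchups = [
    ("Twins", 0.553, 0.497),
    ("Mets", 0.572, 0.549),
    ("Yankees", 0.624, 0.561),
    ("Padres", 0.555, 0.494),
]

for team, pw_fav, pw_dog in matchups:
    p = log5(pw_fav, pw_dog)
    print(f"{team}: game {p:.1%}, series {series_win_prob(p):.1%}")
```

The series figures may differ from the ones listed by about a tenth of a percentage point, depending on whether the single-game probability is rounded before it is fed into the binomial step.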

I like the Dodgers over the Padres in the NLCS, the Yankees over the Twins, and the Yankees over the Dodgers in a World Series rematch of 1981…and 1978…and 1977…and….and 1941. But I am pulling for a Subway Series because I want to see the two best teams in baseball square off (although all of the NY hype for that is hard to stomach). But in the modern playoff system, that rarely happens.

Monday, October 02, 2006

2006 Park Factors

Herein I will present run and home run park factors for 2006 based on 5-year data (when applicable). I will provide a brief overview of the methodology--if you want a full description you can find it on my website or if you have any questions feel free to leave a comment. Basically, I first calculate a raw park factor (based on runs or home runs per game), accounting for the fact that the “true neutral” context includes a small percentage of games played in the team’s home park as well as the road games (I do not, however, weight by the actual number of games played in each park). Then I take the average of that number and one, to account for the fact that only half of the games are actually played at home (this means that the final result is meant to be applied to total statistics, not simply home statistics). Then I regress this figure towards one, with less weight given to one as the number of years of data we have for the park increases.

Here are the PFs for 2006 (listed Run, HR):
ARI: 106, 106
ATL: 99, 98
BAL: 98, 103
BOS: 102, 94
CHA: 102, 113
CHN: 101, 107
CIN: 101, 108
CLE: 97, 93
COL: 112, 112
DET: 97, 94
FLA: 96, 93
HOU: 101, 105
KC: 100, 93
LA: 96, 104
LAA: 97, 94
MIL: 100, 103
MIN: 100, 95
NYA: 98, 101
NYN: 97, 95
OAK: 99, 100
PHI: 103, 108
PIT: 100, 95
SD: 94, 93
SEA: 96, 96
SF: 99, 90
STL: 99, 97
TB: 99, 98
TEX: 107, 108
TOR: 103, 107
WAS: 97, 94

There has been some talk about how Coors Field has played differently this year, often attributed to more aggressive use of the humidor. Coors Field did have its lowest raw home/road run ratio of the last five years--just 1.15, compared to a previous low of 1.24 in 2003. However, this still made it the second most offense-friendly park in the majors this year (Great American in Cincinnati was at 1.15 as well; Royals and Bank One were in that neighborhood too. Side note: Yes, I know it’s not called Bank One anymore. Or Royals for that matter. I don’t care, I use the name I’m used to). Some people will argue that this is a fundamental change and that it should be handled differently than other parks, but I’m going to use the five-year average. If somebody could show me that it was a statistically significant difference, that would be one thing, but I haven’t seen evidence to that effect and anecdotes about the ball being moister or whatever the claim is don’t fly with me. However, I will give you the results from a few alternative ways I could calculate the Coors PF (I certainly do not hold all of these as equally valid):
1) Treat 2006 as one year, don’t regress. Then the PFs would be 107, 108
2) Treat 2006 as one year, regress towards 1 like I would for another park. 104, 105
3) Treat 2006 as one year, but regress towards a historically normal Coors PF like 1.15 instead. 110, 111
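
For anyone who wants to play with these alternatives, here is a rough sketch of the run PF arithmetic. Two things in it are assumptions rather than the documented method: the correction for the home park's share of the league context (ratio * teams / (teams - 1 + ratio)), and the one-year regression weight of 0.6, which I picked only because it roughly reproduces the three run figures listed. The function name and parameters are hypothetical.

```python
def park_factor(raw_ratio, n_teams=30, weight=1.0, target=1.0):
    """Sketch of a run park factor from a raw home/road run ratio.

    raw_ratio: home runs per game divided by road runs per game
    n_teams:   teams in the majors, each park weighted equally (assumption)
    weight:    share of the estimate kept when regressing (1.0 = no regression)
    target:    value being regressed towards (1.0 for a neutral park)
    """
    # Assumed correction: the "neutral" road context still contains a
    # 1/n_teams share of games that would be played in this park.
    corrected = raw_ratio * n_teams / (n_teams - 1 + raw_ratio)
    # Only half of a team's games are at home, so average with one.
    halved = (corrected + 1) / 2
    # Regress towards the target; the weight would grow with years of data.
    return weight * halved + (1 - weight) * target


coors_2006 = 1.15  # raw home/road run ratio quoted above

no_regress = park_factor(coors_2006)                           # option 1
to_one = park_factor(coors_2006, weight=0.6)                   # option 2
to_coors = park_factor(coors_2006, weight=0.6, target=1.15)    # option 3

for label, pf in [("1", no_regress), ("2", to_one), ("3", to_coors)]:
    print(f"option {label}: {round(100 * pf)}")
```

The home run factors would work the same way, starting from the raw home/road home run ratio instead of the run ratio.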

If I was going to do something out of the ordinary, it would be option 3. As MGL has pointed out, parks shouldn’t be regressed towards 1. Each park should be regressed towards its own expected value, which we could determine based on a number of factors such as altitude, fence distance, surface type, foul territory, weather, outdoor/dome, etc. However, I have not undertaken the research that would be necessary to produce these custom values, and I don’t believe that anyone else has, at least not published and with a formula that can be used to combine all of the factors (obviously studies have been done into the more specific areas, but you would have to consider all of them in order to properly do what I am talking about).

So since we know that Coors Field has traditionally played as a very strong hitter’s park, and since we know that altitude is one of the largest factors in how a park should play (and that Coors is clearly at a very high altitude, i.e. very conducive to offense), it is silly to expect that the unseen mean we are regressing to for it is 1. That is what I have done in calculating all the parks, of course, but when we have five years of data, the differences are fairly negligible (If I didn’t regress the five year Coors results at all, I would get 114, 113 versus the 112, 112 I show above). For one year, though, it would definitely be an issue, especially for a park like Coors that we expect to be extreme, even if the balls are damp or whatever exactly it is they are supposed to be this year that they weren’t in the past.

Wednesday, September 27, 2006

Evaluating Pitcher W%, Pt. 2

As discussed in the first part of this series, an overlooked aspect of the traditional NW%/WAT approach is that it makes certain assumptions about how a team achieves its winning percentage (namely, all through the efforts of non-pitchers). So why not attempt to improve our methodology by using a more realistic model of W% causation?

I should note at this point that a lot of the ideas I am going to discuss were first published by Rob Wood in the August, 1999 edition of By the Numbers (see link to BTN archives on the right side of the page). While my results may not exactly match his, and my explanation is my own, it would be disingenuous to not acknowledge that he did this stuff first.

For any given team, our best assumption will be that their offense and defense are equally responsible for the team’s deviation from .500. Certainly this assumption will be wrong in some cases, worse than using the Oliver or Deane assumptions. But it will be correct more often and the overall error introduced by this assumption will be less than for others.

To keep things simple, I will assume that all defense is pitching. This is an obviously faulty assumption, but it will keep things workable, and again while this assumption will not always hold, it is better to assume that all defense is pitching than to assume that all deviation from .500 is the product of the offense. If one wanted to get even more precise than we are going to be, one could introduce a correction for this.

Suppose we have a pitcher working for a team with Mate .540, who goes 15-10 (.600 W%). His NW% and WAT under Oliver are .560 and +1.5. Under the Deane method, they are .565 and +1.63.

However, we are now going to assume that this team has pulled itself away from .500 through equal efforts by the offense and the pitching (excluding the pitcher in question of course, since we have removed his decisions from the rest of the team’s when calculating Mate). Using the Pythagorean theory, we can write this equation:
Mate = x^2/(x^2+(1/x)^2)

x is the percentage of league average runs the team must score (or inversely allow) in order to achieve a given W%. x can be solved for:
x = (Mate/(1-Mate))^.25
Thus, we expect a .540 team to score runs at 104.1% of the league average and allow runs at 96.1%. Once we know this, we can calculate the W% we expect the team to have given only its non-average offense (since we are assuming that all defense is pitching and the other pitchers are irrelevant when evaluating our pitcher): x^2/(x^2+1), in this case .520.

We can also solve for the implied runs allowed ratio of our pitcher. He achieved a W% of .600 on a team with an offense that scored at 104.1% of average, so:
W% = x^2/(x^2 + y^2)
.6 = 1.041^2/(1.041^2 + y^2)

y can be solved for as:
y = x*sqrt((1-W%)/W%)
Or in this case, y = .85.

Now we know that by achieving a .600 W% for a team of this caliber, the pitcher’s performance was equivalent to allowing runs at 85% of the league average. To calculate his Neutral W%, we put him on a team with an average offense, and find that 1/(1+.85^2) = .581. That makes his WAT +2.03--significantly different than the Oliver and Deane estimates, because they (largely) assume that only the offense has caused the team to rise above .500, whereas we are assuming that it is a joint and balanced effort between the offense and the other members of the pitching staff.
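
The steps above chain together neatly, so here is a small sketch of them in one place. The function name and the dictionary keys are mine; z is the Pythagorean exponent, and the run ratio x uses the exponent 1/(2z), which reduces to the .25 power above when z = 2.

```python
def pythag_neutral(wins, losses, mate, z=2.0):
    """Neutral W% and WAT, assuming offense and pitching split the
    team's deviation from .500 equally and all defense is pitching."""
    decisions = wins + losses
    wpct = wins / decisions
    # Run ratio (relative to league average) of a balanced team with this Mate
    x = (mate / (1 - mate)) ** (1 / (2 * z))
    # W% an average pitcher would post with that offense behind him
    offense_only = x ** z / (x ** z + 1)
    # Implied runs-allowed ratio of the pitcher given his actual W%
    y = x * ((1 - wpct) / wpct) ** (1 / z)
    # Put him in front of an average offense
    nw = 1 / (1 + y ** z)
    wat = (nw - 0.5) * decisions
    return {"x": x, "offense_only": offense_only, "y": y, "nw": nw, "wat": wat}


result = pythag_neutral(15, 10, 0.540)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Carrying full precision through gives a WAT a hair under the +2.03 quoted, which comes from rounding NW% to .581 first.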

We can generalize this for non-2 exponents as x = (Mate/(1-Mate))^(1/(2z)), y as x*((1-W%)/W%)^(1/z), and NW% as 1/(1 + y^z), where z is the exponent we are using. But is any of this really necessary?

We found that a balanced .540 team would allow a .500 pitcher to be a .520 pitcher. If we simply calculate NW% for our pitcher as .600-.520+.500, as we did for Oliver, we find .580--pretty much equivalent to our convoluted Pythagorean approach. Not only that, but .520 is also equal to the average of .540 and .500. So can we just use this kind of approximation?

While I am a strong advocate of using methods that are theoretically sound across as many potential contexts as possible, practically we only care about the real range of major league teams, which I’ll just assume for the modern times are bounded between .250 and .750. If we find our expected W% for a .250 team with only the offense, it is .366. The average of .5 and .25 is .375, a difference of less than 3%. So it seems pretty safe to use this simplified assumption, and not screw around with all of the Pythagorean calculations. I should also note that there is a further error introduced when you eschew the calculation of the pitcher’s NW% through the Pythagorean approach as well. So there are two sources of error: 1) estimating the comparison level as the midway point between Mate and .500, and 2) estimating that the pitcher’s NW% will be the same linear difference from .500 as his W% is from the estimate in part 1).

Again, I should note that this is the conclusion that Rob Wood came to, and also the same as Tango Tiger’s quick and dirty method linked below, in response to the first installment of this series. (As an aside, that is the problem with doing series in installments as I am wont to do, and at the same time having a few smart people read it. They figure out what you are doing, or what you should be doing, before you post it. I should either scrap the installment approach or not allow any readers). So, to summarize the quick approach:
NW% = W% - (Mate + .5)/2 + .5
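
A sketch of the quick approach, along with a check of how far the midpoint baseline strays from the Pythagorean offense-only W% across the .250 to .750 range. The function names are hypothetical. The offense-only figure simplifies to sqrt(r)/(sqrt(r) + 1) with r = Mate/(1 - Mate), since x^z equals the square root of r whatever exponent z is used.

```python
def quick_neutral(wpct, mate):
    """Quick NW%: compare the pitcher to the midpoint of Mate and .500."""
    return wpct - (mate + 0.5) / 2 + 0.5


def wins_above_team(nw, decisions):
    """WAT from a neutral W% and the pitcher's number of decisions."""
    return (nw - 0.5) * decisions


def offense_only_wpct(mate):
    """Pythagorean W% of an average pitcher on a balanced team with this Mate."""
    root = (mate / (1 - mate)) ** 0.5
    return root / (root + 1)


# How good is the midpoint as a stand-in for the Pythagorean baseline?
for mate in (0.250, 0.400, 0.540, 0.600, 0.750):
    exact = offense_only_wpct(mate)
    midpoint = (mate + 0.5) / 2
    print(f"Mate {mate:.3f}: Pythagorean {exact:.3f}, midpoint {midpoint:.3f}")

# The 15-10 pitcher on a .540 Mate team
nw = quick_neutral(15 / 25, 0.540)
print(f"NW% {nw:.3f}, WAT {wins_above_team(nw, 25):+.2f}")
```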

How does this kind of approach change the standing of the historical pitchers discussed in the first installment? Well, Red Ruffing now has a NW% of .521 and +10.5 WAT. Still not in the league with many other Hall of Fame pitchers, but far from concluding that he was a true sub-.500 pitcher. Steve Carlton in 1972 now has a NW% of .846 and +12.8 WAT. Interestingly, he does better under this approach than the Deane approach, probably because the Deane approach looks at the percentage of possible improvement. But in Carlton’s case, the offense is, at least by our assumptions, so bad, that even an otherworldly performance can only do so much to raise the team’s fortunes.

In a third and final installment, which I promise will be posted by the end of the decade, I will look at replacement level and ask the question “Why even compare to Mate at all”, and perhaps throw in some other odds and ends.

Rob Wood in Aug 99 BTN(pdf)

Tango's blog entry

Thursday, August 24, 2006

Evaluating Pitcher Winning %, Pt. 1

How best to evaluate a pitcher’s W-L record? While it has plenty of contextual biases, one that it does not have is park/era, since W% always is .500 for the league as a whole. This makes pitcher win-loss record a fairly interesting thing to look at, at least on the career level.

But of course the biggest pollution is the quality of the team around him. So it only seems natural that for many years, would-be sabermetricians have compared a pitcher’s W% to that of his team. Usually, this comparison is done only after the pitcher in question’s decisions have been removed. The reasoning for this is that we do not want to compare the pitcher to a standard that he himself has contributed to. Anyway, Ted Oliver’s Weighted Rating System was the first such approach, and the one most commonly used:
Rating = (W% - Mate)*(W + L)

Where Mate, to borrow a designation from Rob Wood, is the W% of his teammates (TmW - W)/(TmW + TmL - W - L). Oliver’s rating gives a number of wins above what an average teammate would have achieved in the same number of decisions. We could also call this Wins Above Team as Total Baseball does.

A related question is what is the projected W% of this pitcher on an otherwise .500 team? I’ll call this Neutral W%, to use the same abbreviation but a different name than Bill Deane does, so that my general term won’t get confused with his specific one. For the Oliver approach:
NW% = W% - Mate + .500

If this is not intuitively obvious, consider a 20-10 pitcher on a .540 Mate team. His WAT is (.667 - .540)*(20 + 10) = +3.8. If he is 3.8 wins above average in 30 decisions, this implies that he is 3.8 wins better than 15-15, or 18.8-11.2. This is an equivalent W% of 18.8/30 = .627, the same result as .667-.540+.500.

What begins to become clear as you look at how the method works is that it assumes that a .500 pitcher on this team would have a .540 record. This means that all of the team’s deviation from .500 is attributed to the offense or fielders. This assumption is clearly wrong, at least for a randomly selected team--given a random team, we should assume that they are equally skilled on offense and defense. Obviously, in some cases this assumption will be dreadfully wrong--but it will be correct more often than assuming that EVERY team deviates from .500 only because of offense and one particular pitcher whom we isolate to calculate his WAT/NW%.

We can find some historical examples where the assumption of the Oliver method really causes problems. The most notorious case is that of Red Ruffing, who so far as I know is the only Hall of Fame starter with a W% worse than that of his teammates. For his career, Ruffing was 273-225 (.548), while the rest of his team was .554. This is a .494 NW% and -3 WAT. As a side note, WAT is also equal to (NW%-.500)*(W+L).

Ruffing did pitch for Yankee teams with great offenses, but he also had mound teammates like Lefty Gomez, Johnny Allen, and Spud Chandler (at various times). In 1936, for example, Ruffing was 20-12 (.625), while the rest of the team was .678, for -1.7 WAT and a .447 NW%. His team did score a whopping 1065 runs, but they also led the league with just 731 runs allowed. An average pitcher in the 1936 AL (who would have a 5.67 RA) would figure to have only a .594 record if supported by New York’s 6.87 runs/game.

We’ll check in on Ruffing more as we go. Bill Deane, formerly a Senior Research Associate at the Hall of Fame, developed his own method to divorce a pitcher’s W% from that of his team. Deane’s insight was that the further above .500 a team’s W% was, the less margin there was to improve upon it. A .500 team could be bettered by .500; a .625 team only by .375. A bad team could be improved by even more. So Deane rated equally pitchers who improved their teams by equal percentages of the potential margin.

A .550 pitcher on a .500 team improved his team by .050 out of a possible .500 (10%); so did a .460 pitcher on a .400 team (.060/.600 = 10%). Thus, they are each credited with the same .550 NW% (Deane used the term Normalized W% for this). If it is not clear why the normalized percentage should be .550 for each pitcher, it is because a .500 team has a .500 margin for improvement, and 10% of .500 is .050. Following this logic, Deane wound up with these formulas for NW%:
If W% >= Mate:
NW% = (W% - Mate)/(2*(1 - Mate)) + .500
If W% < Mate:
NW% = .500 - (Mate - W%)/(2*Mate)

The second formula comes from the fact that on a .600 team, there is a .600 margin for lowering the W%; a .550 pitcher did this by 8.33%, so .0833*.5 = .042, for a NW% of .458. Total Baseball (unlike Thorn & Palmer’s earlier Hidden Game which used Oliver’s formula) used Deane’s methodology to calculate WAT. A poster child for considering the margin for improvement is Steve Carlton in 1972, who was 27-10(.730) for a team that was otherwise 32-87(.269). Under the Oliver methodology, this is a nearly impossible .961 NW% and +17.1 WAT. Using Deane’s approach, it is an .815 NW% and +11.7 WAT (still the highest since Lefty Grove in 1931).

How does Ruffing fare under this approach? Career-wise, since his W% was so close to Mate to begin with, not much changes--he now sports a .495 NW% (v. .494) and -2.7 WAT (v. -3). In 1936 he moves from .447 to .461 and from -1.7 to -1.3 WAT.
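
To make the two methods easy to compare side by side, here is a short sketch of the Oliver and Deane formulas from this post. The function names are mine. Carlton's team totals are built from the 27-10 and 32-87 records quoted above, and the WAT figures are carried at full precision, so they can differ from the rounded ones in the text by a few hundredths.

```python
def mate_wpct(w, l, team_w, team_l):
    """W% of the pitcher's teammates, with his own decisions removed."""
    return (team_w - w) / (team_w + team_l - w - l)


def oliver_nw(wpct, mate):
    """Oliver: straight difference from Mate, re-centered on .500."""
    return wpct - mate + 0.5


def deane_nw(wpct, mate):
    """Deane: share of the available margin above (or below) Mate."""
    if wpct >= mate:
        return (wpct - mate) / (2 * (1 - mate)) + 0.5
    return 0.5 - (mate - wpct) / (2 * mate)


def wat(nw, decisions):
    """Wins Above Team implied by a neutral W%."""
    return (nw - 0.5) * decisions


# Steve Carlton, 1972: 27-10 for a team that was otherwise 32-87
w, l = 27, 10
team_w, team_l = 27 + 32, 10 + 87
wpct = w / (w + l)
mate = mate_wpct(w, l, team_w, team_l)

for name, method in [("Oliver", oliver_nw), ("Deane", deane_nw)]:
    nw = method(wpct, mate)
    print(f"{name}: NW% {nw:.3f}, WAT {wat(nw, w + l):+.1f}")
```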

Thursday, August 10, 2006

Third and Third

As I write this, yesterday Oregon State, pretenders to the abbreviation of OSU, won the College World Series (yes, I really did write this in June). I figured it would be a good time to look back at the season of The OSU.

The Buckeyes finished third in the B10 regular season, crippled by a sweep in the heart of darkness. Northwestern shockingly was able to grab second after a horrific non-conference performance. Minnesota had a second consecutive year where they were not a major player in the race for the regular season title, but still qualified for the six-team tournament, which was filled out by Purdue and Illinois.

In the tournament, OSU beat Purdue and Northwestern but were tripped up in the winner’s bracket final by Minnesota and then lost the loser’s bracket final to those who shall not be named. They who shall not be named beat Minnesota two straight as the Gophers for the second straight year placed second in the tournament (to OSU in 2005). So those who shall not be named got the only B10 bid to the NCAA tournament.

While the Buckeyes fell short of a championship, they still had a solid season. Considering all games, OSU was second in W% at 37-21, .638 (those guys led at .672). But the Buckeyes paced the conference in EW%(.721; Minnesota was second at .627) and PW%(.728 with Minnesota second at .628). The Buckeyes also led in R/G(6.66; MSU second at 6.18) and RA/G(4.17; the bad guys second at 4.42). Northwestern’s W%, EW%, and PW% were .411(ninth of ten), .467(fifth), and .417(ninth), a simply bizarre combination for a second-place team. They were lucky that they did not have to face OSU, but even had they played the Bucks and been swept they would have finished in the first division.

With that, I will take a look at the individual performances of OSU players. Incidentally, all of the spreadsheets I used will be posted soon on my website if you are interested. Offensively, the Buckeyes were led by B10 MVP Ronnie Bourquin, the third baseman who was a second round pick of the Tigers. He narrowly missed the B10 triple crown, and hit 416/490/612 with 67 RC, a 12.2 RG (versus a conference average of 5.63), and +36 RAA. As you can see, his ISO was .196, but various scouting reports I saw before the draft said that he had power potential he had not shown in games. I have no trouble believing this, and can certainly understand why nobody on the collegiate level tried to mess with the form of a .400 hitter.

The Ohio offense was solid from top to bottom--sophomore centerfielder and leadoff man Matt Angle improved greatly, with a .449 OBA, 25-29 stealing, and +21. Sophomore catcher Eric Fryer was great again, with more power but fewer walks than Angle, resulting in nearly identical values (Angle created 54 runs and 9.2 per game; Fryer 54 and 9.2 per game). Senior captain and eighth round Oriole selection Jeddidiah Stephen finished his career at +16, 8.2, and his junior double play partner Jason Zoeller was second to him on the team in isolated power, +11 runs and 7.8 per game.

Junior Jacob Howell struggled through hamstring injuries, but hit a sizzling 402/448/500, 9.9, +15 RAA when able to play. The two weak spots in the lineup were Justin Miller, a freshman first baseman who started slowly but improved as the year went on, finishing at 4.3, -5, and junior rightfielder Wes Schirtzinger, who struggled greatly at the plate, with 257/321/296, 3.8, -11. The other hitters with over 100 PA were freshman OF/1B/P JB Shuck (6.1, +2) and DH Adam Schneider (4.7, -4).

The Buckeye pitching was solid again, tops in the B10 without a real standout. The ace was junior lefty Dan DeLucia with a 3.67 RA, +27 RAA, and 5.8 K per game, which may be why he went undrafted. Cory Luebke, a 22nd round pick of the Rangers as a draft eligible sophomore, was 4.34 and +15. Freshman Jake Hale was the (relative) weak link at 4.92, +7. B10 Freshman of the Year JB Shuck probably looked better with traditional stats, as his 4.56 RA was a full two runs higher than his 2.51 ERA. Shuck, depending on your perspective, was victimized by his defense or had some mistakes obscured by the silly points of the earned run rule. His 4.52 eRA and .298 H/BIP lead me to the latter. But for a freshman, 79 innings and 12 runs above average is nothing to sneeze at.

There were really only four pitchers who got significant innings out of the pen. Rory Meister served as closer and had a 4.36 RA despite a 5.76 eRA. His control was very poor, walking 28 in 33 frames, but his H/BIP was a very high .382. Josh Barerra, a true freshman, had similar issues, walking 20 in 38 innings with a 5.68 RA and 7.16 eRA but a .429 hit rate. Both pitchers struck out a lot of batters and have shown evidence that they can be effective, but certainly need some polish. Trey Fausnaugh was pounded again with a 6.11 RA and 8.13 eRA. Like the other key relievers, he was victimized by a high hit per BIP rate at .409. Dan Barker was good again in 4 starts and 14 relief appearances, with a 4.15 RA and 3.57 eRA.

This was a fairly young team, but with only two (potentially three if Luebke was to sign with Texas) major losses, and a solid performance, it looks as if Ohio State will once again be a major player in the 2007 Big Ten race.

Thursday, July 13, 2006


I have always considered myself to be much more a fan of baseball in general than of any specific team. This is not the case for me in other professional sports. That is not to say I am not interested in NFL games that do not involve the Browns, but when the Browns were moved, my interest level in the NFL plummeted. I do not at all believe that this would be the case if the Indians were to dissipate.

I have always rooted for all Ohio teams, although since I have lived in the Cleveland and Columbus areas, I am more partial to those teams than to those from Cincinnati (of course, Columbus has just one pro franchise, and it's the only team from Ohio in the NHL, so there's really no conflict there at all. But if it comes down to Indians/Reds or Browns/Bengals, I definitely side with Cleveland).

Anyway, all of this personal rambling is to get to the point that my dual status of 1) being a bigger fan of the sport in general than of any specific team and 2) having the Reds as my second favorite, rather than favorite, team is quite a fortunate thing today. Otherwise, I would be infuriated that the Reds traded away two everyday players, 26 years of age, and both rated as above average hitters for their position a year ago (Lopez by quite a bit, Kearns by the skin of his teeth), in exchange for a couple of relief pitchers, Royce Clayton, and Brendan Harris. It is simply unbelievable to me.

And what really makes it galling is that Jim Bowden, whose tenure to this point in Washington has been embarrassingly bad, has completely stuck it to his old employers.