Walk Like a Sabermetrician: October 2010

Tuesday, October 26, 2010

The Two Best Events in Sports

You can have the Super Bowl, the Stanley Cup, and the NCAA Tournament (except for the games involving my alma mater). Take the Olympics, the Masters, and the World Cup (please take the World Cup, I beg you). Just leave me the World Series and the Breeders' Cup.

Those two events are by far the most compelling (IMO should have gone without saying) in all of sports. Coincidentally, they both occur in autumn, sometimes even overlapping. This year, barring a horrible streak of rainouts, the World Series will have concluded before the horses hit the track at Churchill Downs, but with both events coming up I will bore you with a few of my stray thoughts.

* I do not have a rooting interest when it comes to who wins the World Series, seeing as I don't particularly care for either club. I do have a rooting interest in the series though--rooting for seven games. There has not been a World Series game seven since 2002--also a series that matched San Francisco against an AL West club, apropos of nothing.

If one assumes that the outcomes of the games in a series are independent, and that the two teams are equally matched with constant strength and no home field advantage, then the probability of a series of length N is as follows (from the negative binomial distribution):

4 = 12.5%
5 = 25%
6 = 31.25%
7 = 31.25%

The probability of going seven years without a seventh game is (1 - .3125)^7 = 7.3%; it's not a particularly likely streak, but it's not remarkable either. I'd still like to see it come to an end in 2010.
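For anyone who wants to check those figures, here is a minimal sketch under the same assumptions (independent games, constant per-game win probability, no home field advantage). The function name series_length_probs is just my label for illustration.

```python
from math import comb

def series_length_probs(p=0.5, wins_needed=4):
    """Probability that a best-of-(2*wins_needed - 1) series lasts exactly n games.

    Assumes independent games and a constant win probability p for team A.
    """
    q = 1.0 - p
    probs = {}
    for n in range(wins_needed, 2 * wins_needed):
        # The series winner takes game n plus exactly wins_needed - 1 of the first n - 1 games.
        ways = comb(n - 1, wins_needed - 1)
        a_wins = ways * p ** wins_needed * q ** (n - wins_needed)
        b_wins = ways * q ** wins_needed * p ** (n - wins_needed)
        probs[n] = a_wins + b_wins
    return probs

lengths = series_length_probs()
for n, prob in lengths.items():
    print(f"{n} games: {prob:.2%}")

# Chance of seven straight World Series without a game seven
drought = (1 - lengths[7]) ** 7
print(f"seven-year drought: {drought:.1%}")
```

Passing a p other than 0.5 shows how quickly a mismatch shortens the expected series.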

* I don't believe there's much value in handicapping a seven-game series, but I would give Texas an edge, something like a 55% chance of winning. There's even less value in doing so on the basis of full-season team records, but I'll proceed for the sake of discussion.

San Francisco had a better actual W% (.568 to .556) and expected W% (based on runs scored and allowed, .581 to .564), but Texas had the edge in predicted W% (based on runs created and runs created allowed, .557 to .543). However, these comparisons don't take into account strength of schedule, which can be a significant factor between the unbalanced schedule and the AL/NL imbalance.

My crude ranking system (yet to be published, as it will take a long, boring post to explain it) gives the Rangers the edge on two of the three comparisons when SOS is taken into account. Based on W/L, Texas has a rating of 121 to San Francisco's 118 (or a 51% chance of winning a seven-game series with no HFA). Based on R/RA, San Fran leads 129 to 119 (54%). Based on RC/RCA, it's Texas 123 to 110 (56%). Considering that, I think 55% is a reasonable estimate.
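Since the rating system itself is not explained here, the following is only a sketch of how ratings like these could be turned into series odds. It assumes (and this is an assumption, not a description of the actual method) that a team's per-game win probability is its rating divided by the sum of the two ratings, and then applies the usual independent-games math for a best-of-seven. That assumption does happen to reproduce the percentages quoted above.

```python
from math import comb

def series_win_prob(p, wins_needed=4):
    """Chance of winning a series given a constant per-game win probability p."""
    q = 1.0 - p
    # Sum over the number of losses (0 to wins_needed - 1) suffered before clinching.
    return sum(comb(wins_needed - 1 + k, k) * p ** wins_needed * q ** k
               for k in range(wins_needed))

def rating_to_game_prob(rating_a, rating_b):
    # ASSUMPTION: ratings behave like win ratios, so A's per-game chance is a / (a + b).
    return rating_a / (rating_a + rating_b)

comparisons = {
    "W/L (TEX over SF)": (121, 118),
    "R/RA (SF over TEX)": (129, 119),
    "RC/RCA (TEX over SF)": (123, 110),
}
for label, (favorite, underdog) in comparisons.items():
    p_game = rating_to_game_prob(favorite, underdog)
    print(f"{label}: game {p_game:.3f}, series {series_win_prob(p_game):.0%}")
```

Note how little a seven-game series magnifies a small per-game edge.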

* From a preseason perspective, when's the last time there was a more surprising World Series than TEX/SF? I intend that as a rhetorical question, as the answer depends on your own perspectives on the teams before the season. For me, it's probably the most surprising since 2005. I picked Texas second in the AL West and San Francisco fourth in the NL West.

There have been other pennant winners that I did not pick for the playoffs (a long list, in fact, owing to both my misjudgments and the inherent inaccuracy of the exercise), but I had picked both 2005 pennant winners to finish fourth in their division, so that one stands out.

While record in the preceding season is far from a perfect measure of preseason expectations, it might be instructive to look at the combined previous season W% of the two World Series participants. In the expansion era (1961-), the average pennant winner played .550 ball in year X-1. Both TEX (.537) and SF (.543) were below average, although not by a huge margin. Combined, their .540 W% ranks 31st out of the 49 World Series.
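As a quick illustration of the combined figure, here is a sketch that pools wins and games for the two teams, using the 2009 records quoted later in this post (TEX 87-75, SF 88-74). Pooling rather than averaging the two W% is my assumption about the calculation; with equal schedules the two approaches agree anyway.

```python
def combined_wpct(records):
    """Pooled winning percentage for a list of (wins, losses) records."""
    wins = sum(w for w, _ in records)
    games = sum(w + l for w, l in records)
    return wins / games

tex_2009 = (87, 75)
sf_2009 = (88, 74)

print(f"TEX {combined_wpct([tex_2009]):.3f}")
print(f"SF  {combined_wpct([sf_2009]):.3f}")
print(f"combined {combined_wpct([tex_2009, sf_2009]):.3f}")
```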

Several series in the twenty-first century have been lower, including 2001 NYA/ARI (.533), 2006 DET/STL (.528), 2002 ANA/SF (.509), 2007 BOS/COL (.500), and 2008 TB/PHI (.478). This should not be too surprising, since the expanded playoffs have had the effect of reducing the same season W% of pennant winning teams.

The highest previous season W% of the era was on display in the 1999 NYA/ATL series, by a huge margin; the two teams had combined for a .678 W% in 1998. 1962 NYA/SF (.614), 1970 BAL/CIN (.611), 1978 NYA/LA (.611, and a World Series rematch), and 1964 NYA/STL (.610) are the other high points. The highest of this decade was surprisingly 2003 NYA/FLA; the Marlins were below .500 in 2002, but the Yankees' 103-58 carried the combination to .563.

The two lowest X-1 W% combinations both involve the Twins. Not surprisingly, the worst-to-first MIN/ATL series of 1991 is last at .429, with the 1987 MIN/STL series at .464. The other series featuring teams that had combined to be sub-.500 in the previous season were 1988 OAK/LA (.475), the aforementioned 2008 TB/PHI, 1967 BOS/STL (.478), and 1965 MIN/LA (.491).

The Twins also account for another dubious distinction; their three World Series are the only ones in the expansion era in which both teams were sub-.500 the previous season. Both the 1965 and 1987 series saw Minnesota playing an opponent that had won the NL pennant in year X-2, but struggled in year X-1 before rebounding and taking the flag back.

I'm now descending from "vaguely interesting trivia" to "absolutely worthless drivel", but it's something I noticed looking over the data. This series features one of the closest year X-1 matches between the two participants, with just one game difference between them (TEX was 87-75, SF 88-74 in 2009). The only perfect match of the era is the 1985 STL/KC series (both were 84-78), with 1965 MIN/LA (79-83, 80-82) and 1980 KC/PHI (85-77, 84-78) also off by just one game.

* I have to admit feeling a twinge of happiness with the Phillies' defeat. The Phillies were, both in my estimation and the conventional wisdom, the strongest NL team in 2010. But when a national baseball writer (even if it is a demonstrated fool like Tracy Ringolsby) picks a team to sweep through the playoffs 11-0, it's hard for me to not root against them. It wasn't just Ringolsby--the Vegas notion that the Phillies were 2-1 favorites to win the World Series is tough to defend logically.

The recent Phillies are among the more overhyped teams of recent memory. Their regular season records have been good, certainly, but not historically special. Winning two straight pennants (combined with the third that some members of the media awarded to them) caused a lot of people to downplay the regular season record.

I looked at the team with the best record in the NL over each three-year period beginning in 1961. Obviously, there is nothing special about this approach, no reason to think that looking at three years is better than looking at two or four or using a different approach altogether. It is a timeframe that fits the Phillies' record, as it captures their world title, their pennant, and their best regular season record.

The Phillies' three-year regular season record of 282-204 (.580) is the best in the NL over the last three seasons, but it ranks 28th of 48 in the expansion era, hardly the record of a historically great club. Recent NL leaders with better marks include several combinations of Cardinal seasons (2000-02, 2003-05, 2004-06) and all of the three-year groups formed from Atlanta's 1990s run.

At least to this point, the Phillies' would-be-dynasty is certainly no better than St. Louis' 2004-2006. The Cards' record over that period was 288-197 (.594). Their postseason results were the same as the Phillies': a World Series win, a World Series loss, and an NLCS loss.

Absolutely worthless drivel: the best three-year NL record during the period was 310-176 (.638), compiled by the 1997-99 Braves. If you want one which includes a World Series win, it's the 1974-76 Reds (308-178, .634). The lowest three-year mark by a team which led the NL over that stretch is 260-226 (.535) by the 1982-84 Phillies.

* The main storyline for the Breeders' Cup revolves around Zenyatta. For those of you who may be unfamiliar with horse racing, Zenyatta is a six-year-old mare that has raced nineteen times in her career and has never been beaten. Nineteen straight is the longest winning streak in major North American horse racing, surpassing the streaks of sixteen compiled by Citation and Cigar. Most of Zenyatta's victories have come in races against other fillies and mares, but after winning the Breeders' Cup Distaff in 2008 (I refuse to refer to this as the "Ladies' Classic" as is now proper), she became the first female ever to win the Breeders' Cup Classic in 2009.

Zenyatta will be retired after the race, and so there would be obvious interest in the final start of a legend, let alone the fact that she could finish her career perfect and become just the second horse to win the Classic twice (Tiznow repeated in 2000-01). She also could become the first horse to win three Breeders' Cup races.

This will be a tough task, however. The Zenyatta-doubters (a group in which I admittedly would include myself) will point out that she has run most of her races over a synthetic surface and in California, and that her only race against males was the 2009 Classic (which, in fairness, is the premier race in North America).

Zenyatta certainly has a good chance to win, and I can even get behind the idea that she's the deserving favorite. However, if you let me have a choice between Zenyatta and the field, that's easy. Quality Road, Blame, and Lookin' at Lucky all should get support, and there are some other horses of intrigue that may run (like Japanese star Espoir City, the usual European invaders, and second line three-year olds First Dude, Fly Down, and Paddy O'Prado).

* A related Zenyatta storyline that some racing writers have begun to wring their hands over is whether Zenyatta will be Horse of the Year or not. Horse of the Year is voted on by a group of turf writers, similar to the MVP award. However, it's held in a little higher esteem in the horse racing world than the MVP is in baseball circles. A better comparison is college football's national championship, particularly prior to the BCS.

Like the MNC, the winner can be viewed as the overall champion of the season. In college football, you had conference champions (think divisional awards for horse racing, like Champion Three-Year Old Male, Champion Older Female, or Champion Sprinter) and bowl game champions (think Breeders' Cup race winners). There was no unified way to pick an overall champion, so journalists got together and took a poll. The strange thing is that people found themselves intensely emotionally invested in the outcome of that poll, but so be it.

Zenyatta, whose career accomplishments pretty straightforwardly place her among the all-time greats, has never been voted HOTY. Some folks seem to be concerned about this apparent contradiction, a great performer in an individual sport never voted as the best in a given year.

Historically, it is very difficult for a filly or mare to get the nod as HOTY. Since 1971, when the current honors (the Eclipse Awards) were introduced, only four female horses have won the honor: All Along in 1983, Lady's Secret in 1986, Azeri in 2002, and Rachel Alexandra in 2009. Generally, HOTY goes to the top older male horse (which is logical since older male horses are generally the best horses. It's similar to the MNC usually going to the most impressive champion of a major conference). A three-year old male also has a clear path to the award, by scoring impressive victories over older horses (as done by Tiznow in 2000 or Curlin in 2007) or by dominating races against other horses of his generation (Point Given in 2001).

Generally, horses from other groups will only get consideration if there is no clear choice among the top males. Even fillies/mares having undefeated seasons will get passed over in favor of a worthy male (see Personal Ensign in 1988; she even won the Whitney against males but couldn't beat out Breeders' Cup Classic winner Alysheba for HOTY).

That's pretty much what happened to Zenyatta in 2008. She won all seven of her races, but didn't face males or run off of a California synthetic track. Curlin had an impressive season, winning the Dubai World Cup, the Stephen Foster, the Woodward, and the Jockey Club Gold Cup, becoming the all-time earnings leader in the process. He got HOTY.

The 2009 HOTY race was a little less conventional, as three-year-old filly Rachel Alexandra won the award. Rachel Alexandra had won the prestigious Kentucky Oaks and Mother Goose against other three-year-old fillies, but also defeated three-year-old males in the Preakness and Haskell and older males in the Woodward. She was not entered in the Breeders' Cup because her owners did not want her to run over a synthetic track; her detractors claimed it was because they wanted to duck Zenyatta.

Zenyatta certainly had a case for HOTY, and would have been a reasonable choice, but I agreed with the decision to side with Rachel Alexandra. Rachel Alexandra won races all over the country; Zenyatta ran only in California. Rachel Alexandra was a dirt horse; Zenyatta stuck to synthetic surfaces, which simply have not yet reached the same level of importance in American racing. Zenyatta's win over males was admittedly in the most impressive race possible, but Rachel Alexandra's three wins over males were all in Grade I races. Perhaps the best point in Zenyatta's favor was that she won at the classic 1 1/4 mile distance, while Rachel Alexandra's longest race was the 1 3/16 mile Preakness.

Looking at the 2010 HOTY race, Zenyatta will win by acclamation if she can repeat in the Classic. She will also be an easy choice if any horse other than Blame, Lookin' at Lucky, or Quality Road wins the race. In the event that one of those colts wins, though, it would be tough to deny him the award. Each would have a head-to-head win over Zenyatta (and the rest of the field) in the most important race of the year. Blame boasts victories in the Stephen Foster and Whitney; Quality Road in the Donn, Met Mile, and Woodward; and Lookin' at Lucky in the Preakness and the Haskell.

It is possible that Zenyatta could be voted HOTY even in the event that one of her top three challengers wins the Classic. However, I would hope that voters would do this out of a belief that she was the top horse in 2010, not out of a desire to right the historical record. (*) A horse, especially a mare, can easily be considered an all-time great through finishing second three times in the HOTY voting.

(*) If this post wasn't already too long, I would advance a half-baked theory about how this is exactly what happened in 1998, when HOTY voters went for Skip Away instead of Awesome Again, after passing over Skip Away for Favorite Trick in 1997.

Monday, October 18, 2010

Even More Mundane Comments on the Playoff Structure

You certainly don't need me to point out to you that run scoring is down in the playoffs compared to the regular season. I am just going to give you some data on the matter, and half-heartedly explore one possible explanation for why that is.

I figured the RPG (total runs in the game by both teams) for every World Series (through 2008) and a comparison of that to the overall RPG for that season (figured as a simple average of the AL and NL RPG) and the RPG for the two World Series participants (again, a simple average). I have limited the scope to the World Series so that the cross-era comparisons are on more of a level footing.
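To make the three quantities concrete, here is a small sketch of the comparison for a single season. The input numbers are placeholders invented for illustration, not real season data, and expressing the comparison as a ratio is my choice rather than something stated above.

```python
def simple_average(a, b):
    return (a + b) / 2

def ws_rpg_comparison(ws_runs, ws_games, al_rpg, nl_rpg, team_a_rpg, team_b_rpg):
    """Return World Series RPG alongside the two regular season baselines.

    RPG here means total runs per game by both teams combined.
    """
    ws_rpg = ws_runs / ws_games
    league_rpg = simple_average(al_rpg, nl_rpg)        # overall RPG for the season
    teams_rpg = simple_average(team_a_rpg, team_b_rpg)  # RPG of the two participants
    return {
        "ws_rpg": ws_rpg,
        "league_rpg": league_rpg,
        "teams_rpg": teams_rpg,
        "ws_vs_league": ws_rpg / league_rpg,
        "ws_vs_teams": ws_rpg / teams_rpg,
    }

# Placeholder inputs, NOT real season data
example = ws_rpg_comparison(ws_runs=48, ws_games=6, al_rpg=9.4, nl_rpg=9.0,
                            team_a_rpg=9.6, team_b_rpg=9.2)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

A ratio below 1 means the Series was lower scoring than the baseline it is compared against.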

I've averaged the data by decade (a simple average for 1900-1909, 1910-1919, etc.) so that we can see how it has changed over time:

Frankly, this chart surprised me. I had expected that in recent years the disparity between regular season and World Series RPG levels would have increased, but in fact recent decades are the closest matches for regular season scoring.

Since I assumed this to be the case, I was going to put forth the argument that the playoff structure coupled with changes in the game (specifically, the increased use of relief pitchers) has caused the post-season to become a different game from the regular season, to a greater extent than in the past. My personal take on this phenomenon was to be that it was unfortunate--that the run scoring levels, pitcher usage, strategy choices, etc. should ideally be as close to the same as possible for the regular season and the post-season. I don't like the idea of playing 162 games under one set of conditions and switching to very different conditions to crown a champion.

But my assumption was unfounded. While run scoring declines in the World Series (you can pick your explanation--colder weather, reduced usage of marginal pitchers, increased usage of one-run strategies, or whatever other theory you'd like to advance), the decline has not grown over time. Today's World Series are generally as close to regular season scoring levels as they have ever been.

One other little tidbit to note is that generally, with the 1990s and 2000s actually being the most obvious exceptions, the pennant winners combine for a higher RPG than the majors as a whole. Obviously we expect that pennant winners are very good teams, and will likely both score more runs than the league average and allow fewer. If a team were equally good offensively and defensively in terms of runs above average, then its RPG would be equal to the league average.

However, if pennant winners were especially strong on defense relative to offense, then their RPG should be lower than the league average. Of course, you can rightly point out that runs scored and allowed have different win values, dependent on their unique combination of runs scored and allowed, and so you don't want to draw too much of a conclusion from this one way or another. Park factors are also ignored by this crude comparison. But if pitching and defense were everything as a minority of traditionalists would have you believe, then pennant winners should certainly have lower RPG than the league average.
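The RPG logic in the last two paragraphs can be checked with a toy calculation. This is only a sketch: the league RPG of 9.0 and the per-game margins are made-up numbers, and it ignores park factors just as the comparison above does.

```python
def team_rpg(league_rpg, offense_above_avg, defense_above_avg):
    """Team runs per game (scored plus allowed).

    Both margins are runs per game better than an average team:
    extra runs scored on offense, runs prevented on defense.
    """
    avg_side = league_rpg / 2  # an average team scores and allows half the league RPG
    scored = avg_side + offense_above_avg
    allowed = avg_side - defense_above_avg
    return scored + allowed

LEAGUE_RPG = 9.0  # hypothetical league scoring level

print(f"balanced team:      {team_rpg(LEAGUE_RPG, 0.4, 0.4):.1f}")
print(f"defense-heavy team: {team_rpg(LEAGUE_RPG, 0.2, 0.6):.1f}")
print(f"offense-heavy team: {team_rpg(LEAGUE_RPG, 0.6, 0.2):.1f}")
```

The offensive and defensive margins cancel when they are equal, so only an imbalance between the two moves a team's RPG away from the league figure.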

Moving along, one possible explanation for lower scoring levels in the post-season is increased usage of top pitchers. I did a little crude investigating on this front by figuring the percentage of regular season innings thrown by a team's top three pitchers (in terms of innings), and comparing that to the percentage of World Series innings thrown by the top three pitchers (again, in terms of innings). Please note that I did not consider the same three pitchers--the group under consideration is the three pitchers with the most innings in the games being considered (regular season or World Series).
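A sketch of that IP% calculation follows. The pitcher labels and innings totals are hypothetical, and the sketch assumes innings are already stored as true decimals (so 200 1/3 innings is 200.333, not the box score notation 200.1).

```python
def top_n_ip_share(innings_by_pitcher, n=3):
    """Share of a staff's innings thrown by its n busiest pitchers."""
    total = sum(innings_by_pitcher.values())
    top = sorted(innings_by_pitcher.values(), reverse=True)[:n]
    return sum(top) / total

# Hypothetical staff, NOT a real team. The top three are picked separately
# for each sample, so they need not be the same three pitchers.
regular_season = {"A": 230, "B": 215, "C": 200, "D": 180, "E": 160,
                  "F": 90, "G": 80, "H": 75, "I": 70, "J": 65, "K": 60}
world_series = {"A": 18, "B": 15, "C": 12, "D": 6, "F": 5, "G": 4, "H": 2}

reg_share = top_n_ip_share(regular_season)
ws_share = top_n_ip_share(world_series)
print(f"regular season IP%: {reg_share:.1%}")
print(f"World Series IP%:   {ws_share:.1%}")
print(f"ratio of WS to regular season: {ws_share / reg_share:.0%}")
```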

The reason I chose three pitchers is because presumably the top three in IP will be the front three starters, which is all teams have traditionally needed to use in a seven-game series (of course in today's game four starters are usually employed). There are a number of weaknesses to this approach, including but by no means limited to:

1. It doesn't include the effect of relief aces, who have a disproportionate impact on win probability thanks to working in high leverage situations, and are often employed differently in the playoffs. They are also a relatively modern phenomenon that will damage cross-era comparisons of IP%.

2. It doesn't account for injuries and other factors that alter pitching workloads. If, for instance, a top pitcher is out for the Series, IP% will likely be lower than it might have been, but only because of the absence of the pitcher, not because of any intentional alteration in strategy.

3. IP% can be highly influenced by series length. If a series only goes four games, then it is likely that a larger percentage of the workload can be borne by the key pitchers of the staff.

4. Rainouts or other delays in the series can greatly skew the results by allowing pitchers to pitch more than they would have. This is particularly evident in the 1989 World Series and its earthquake delay; the A's IP% for the series was 88%, the highest in twenty years.

5. I am only considering the World Series; presumably managers are more conservative with the usage of their pitching staffs in earlier playoff rounds, or at least no more aggressive.

So I'm not claiming that these results are particularly informative. Nonetheless, I broke them up by decade as I did for the RPG data. Reg IP% is the simple average of IP% for the two pennant winners, simply averaged for the decade; WS IP% is the same for the World Series; and RAT is the ratio of WS IP% to Reg IP%, expressed as a percentage:

Again, I have to admit this is not what I expected to see. I expected that teams of earlier eras, heavily concentrating their workload on a few pitchers to begin with, would show a more even IP% between the regular season and World Series. The opposite appears to be true; earlier teams ratcheted up the workload for frontline pitchers in the Series to a greater extent than today's pennant winners do. Of course, the weakness in using three pitchers is illustrated by the fact that the ratio was fairly stable until the 1970s, around which time the trend towards larger starting staffs was accelerating.

Again, the data here is by no means conclusive or even particularly insightful. However, I expected to find support for my seat-of-the-pants belief that style of play in the playoffs had become more removed from the regular season over time. Instead, I have no solid ground to stand on to make such a claim (there may well be data out there that would support such a position, but it isn't here).

My argument would have been that because 1) teams could now get a higher percentage of innings from front-line pitchers in the World Series than in the regular season, and 2) runs scored declined more precipitously in the World Series, changes should be made to the playoff series format to bring it closer to regular season conditions. The most obvious alteration would be to eliminate off-days, which would eliminate the possibility of using a three-man rotation and possibly even encourage the use of five starters as in the regular season.

Leaving aside the practical problems with such a change (chief among them revenue concerns), I personally believe that such modifications would make the playoff series a better test of team strength since they would more closely track the conditions of the regular season. But I didn't find any evidence that the disparity between regular season play and World Series play has increased over time--to the limited extent that the data here addresses the issue, the disparity has actually lessened. Any push for changes that would close the gap is undermined by the fact that larger disparities (at least in terms of these two measures) were accepted throughout the twentieth century.

Tuesday, October 12, 2010

Two Wildcards? Too Many

Apparently, the vast conspiracy that determines what national baseball writers should pontificate about has finally tired of steroids, and has moved on to the pressing issue of whether or not there should be two wildcards in each league. Tom Verducci, Buster Olney, and Jayson Stark have all penned articles on this topic.

I normally ignore the musings of that type of baseball writer, but sometimes it's harder to do that for any number of reasons. It might be that the moral outrage is completely off the scales (as with a typical steroids column) or that the idea is unbelievably stupid (as with the calls for Bud Selig to whitewash events from a game so that Armando Galarraga could be a trivia answer). In this case, not only do I consider the idea stupid, but it would seriously dampen my own enthusiasm for the playoffs.

The folly of wasting one's time on this sort of thing is that just because Jayson Stark advocates something doesn't mean it has a snowball's chance in hell of coming to fruition, and I don't think this proposal is any different. However, in the course of responding I have some potentially interesting data for you on the records of playoff teams in the wildcard era.

In the 32 league-seasons since the wildcard was implemented (1995-2010), the average W% for the best team in the league is .620. The second-best division winner averages .583, the third-best .556. The wildcard team is .573 on average, while the team that would be the second wildcard averages .548.

Ten times (31%) the wildcard has had the second-best record in the league, better than every team except the one that bested it for the division title. It has happened 6 times in the AL and 4 in the NL. You might expect that this happens disproportionately when an AL East team wins the wildcard, benefiting the Yankees or Red Sox. That is in fact the case. The wildcard has come out of the AL East twelve times, and in five of those seasons (42%) has had the second-best record in the league. That still leaves five seasons out of 20 (25%) in which the wildcard was not an AL East team and had the league's second-best record.

Only eight times has the wildcard had the worst record of the playoff participants (25%), twice in the AL and six times in the NL. This has become the usual circumstance in the NL, as the wildcard has not bested the #3 division winner since 2004. The opposite holds in the AL, where it has not happened since 1999, when West champ Texas edged out wildcard Boston by one game. In fact, the only other time it happened in the AL was in 1996, when West champ Texas had a better record than wildcard Baltimore.

Of course, these W% comparisons don't account for the strength of the teams' schedules, which can be significant in the era of the unbalanced schedule. That would involve some more extensive computations, and I'm not sure it would significantly change the results. I have no doubt that the AL East was easily the strongest division in baseball in 2010, and yet it still managed to produce the wildcard and the team that would have been the second wildcard.

Once one accepts that wildcards are going to be part of the playoff format, I don't think it makes a lot of sense to construct additional barriers to them winning, which is what the second wildcard proposal would do. And while its proponents claim that it would emphasize winning the division, they seem to gloss over the fact that it would inevitably at some point allow a third-place team to qualify for the playoffs. Of course, you could limit the second wildcard to only second-place teams, but that would further reduce the expected W% of that team and increase dependence on the division format.

And why exactly is winning the division important anyway? Does it really make sense to put more emphasis on winning divisions when they consist of uneven numbers of teams and when they are often not even close to being competitively balanced (see the 2009-2010 NL Central)? Why should a future team in the mold of the 2010 Rays have to jump through hoops just because they happen to be a member of the same arbitrary five team grouping as New York? It's bad enough that teams in the AL West only have to defeat three opponents.

It seems to me as if a lot of the proponents of this plan have really never accepted the expanded playoffs. They yearn for the days in which there were only two divisions, and the possibility existed for two teams with great records to slug it out and one to be shut out. Of course, the reality is that this was less common than some would have you believe (how convenient that the classic ATL/SF race of 1993 happened in the last year of the old format and thus is frozen in time), but they have a point. There is something to be said for using the regular season and not a five-game series to cull the field down to four teams, or even two. I would have no objections with a return to that format.

However, if the expanded playoffs are non-negotiable, then any attempt to punish a wildcard team with an outstanding record only serves to further de-emphasize regular season success in every regard other than defeating the teams in one's own division, while simultaneously giving an opportunity to the other teams that have failed in their divisional races. It does emphasize winning a division, but that division is itself a far cry from the six or seven team groupings that existed in the would-be golden age. And in doing so, it does nothing to emphasize regular season success for teams that are fortunate enough to be grouped with three-five weak teams (hello, 2010 Rangers). The two best teams in baseball slug it out while mediocrities lope comfortably home? Sorry, I don't think that's excitement or upholding tradition--I think it's farcical.

Tuesday, October 05, 2010

Playoff Meanderings

When I write a post to give some thoughts/impressions on the playoffs, I always like to work in a numerical example of why I feel that formally projecting the outcomes is largely a waste of time. These are always repackaged versions of the same idea (assuming independence of outcomes, constant team strength from game-to-game, no home field advantage), namely using the binomial distribution to estimate some sort of probability. Last year I estimated the probability that the Nationals would win the World Series, if only they somehow managed to make the playoffs. This year, I'm going to look at it from the perspective of "How good do you have to be to have an X% chance of winning the pennant?"

I'm going to assume that a team has a constant W% against its playoff opponents, whatever that W% might be, and that the other seven playoff participants are of equal quality. Neither will change from round-to-round. Suppose we have a .500 team. What is the probability they win the pennant given those conditions? That's easy--25%. So how good does a team have to be in order to have a 40% chance of winning the pennant? 10%? 50%?

This chart uses trial and error to estimate that probability. Given the initial assumptions, one can use the binomial distribution to estimate the probability of winning a five game series, then a seven game series. I won't bother reviewing that math because I think most of you probably already know about the binomial distribution, and it is incidental to the point:

The W% listed on the chart is the one that Excel Goal Seek provides, rounded to three decimal places. The takeaway from this chart is that it's just not credible to be supremely confident about which team is going to win the pennant--and yet mainstream fans and pundits will be. To have even a 50/50 chance (given the stated assumptions, of course), a team must be a .606 (98 win) team--relative to its playoff opponents. Even if the playoff opponents are of just .525 true quality, Log5 estimates that the .606 team would be a true quality .630 (102 win) team.
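For those who would rather not use a spreadsheet, here is a rough sketch of the same calculation. It keeps the stated assumptions (independent games, constant W% against playoff opponents, no home field advantage) and swaps in a simple bisection search as a stand-in for Goal Seek; the function names are mine. The Log5 helper just inverts the standard Log5 formula to back out overall quality from a W% against opponents of a given quality.

```python
from math import comb

def series_win_prob(p, wins_needed):
    """Chance of winning a series given a constant per-game win probability p."""
    q = 1.0 - p
    return sum(comb(wins_needed - 1 + k, k) * p ** wins_needed * q ** k
               for k in range(wins_needed))

def pennant_prob(p):
    # Best-of-five division series, then best-of-seven LCS.
    return series_win_prob(p, 3) * series_win_prob(p, 4)

def required_wpct(target, prob_fn=pennant_prob, tol=1e-9):
    """Bisection stand-in for Goal Seek; prob_fn must increase with p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_fn(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def log5_true_quality(p_vs_opp, opp_quality):
    """Invert Log5: overall W% of a team that beats opp_quality opponents at p_vs_opp."""
    odds = (p_vs_opp / (1 - p_vs_opp)) * (opp_quality / (1 - opp_quality))
    return odds / (1 + odds)

for target in (0.10, 0.25, 0.40, 0.50):
    need = required_wpct(target)
    print(f"{target:.0%} pennant chance needs W% {need:.3f} ({need * 162:.0f} wins)")

even_money = required_wpct(0.50)
print(f"true quality vs .525 opponents: {log5_true_quality(even_money, 0.525):.3f}")
```

The 25% row returns .500 and the 50% row returns .606, matching the figures in the text.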

The W% necessary to achieve a given probability only increases, of course, when the goal is changed from the pennant to the World Series, and the team must win one five-game and two seven-game series:

I have been amazed by the number of people who should know better who seem to want to hand the Phillies the pennant. The Phillies are clearly the strongest NL team entering the post-season, but I don't for a second believe they have the average 60% chance to win each game that they would need in order to be even a 50-50 shot.
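To put a number on that, here is a sketch (same assumptions as before: independent games, constant per-game probability, no home field advantage) that evaluates a 60% per-game team's pennant chances, and then searches a grid of W% values for the point at which a team becomes an even-money bet to win the World Series. The grid search and the names used are my own choices for illustration.

```python
from math import comb

def series_win_prob(p, wins_needed):
    """Chance of winning a series given a constant per-game win probability p."""
    q = 1.0 - p
    return sum(comb(wins_needed - 1 + k, k) * p ** wins_needed * q ** k
               for k in range(wins_needed))

def run_prob(p, rounds):
    """Chance of winning every series in rounds (each entry is wins needed)."""
    prob = 1.0
    for wins_needed in rounds:
        prob *= series_win_prob(p, wins_needed)
    return prob

PENNANT = (3, 4)          # best-of-five, then best-of-seven
WORLD_SERIES = (3, 4, 4)  # one best-of-five and two best-of-sevens

print(f"pennant chance at .600 per game: {run_prob(0.600, PENNANT):.1%}")

# Smallest W% (to three decimals) giving at least a 50% title chance
for thousandths in range(500, 1001):
    p = thousandths / 1000
    if run_prob(p, WORLD_SERIES) >= 0.5:
        break
print(f"even-money World Series W%: {p:.3f}")
```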

Over the last couple of weeks I've been working on a rating system that takes strength of schedule into account. There's nothing unique about it; countless other people have published similar systems (or the results of said systems) that are based on similar and quite likely better-conceived methodology. It is creatively called crude team ranking (CTR).

Allow me to use these yet unpublished ratings (I need to write up a formal explanation, and that should appear some time during the offseason) to estimate the playoff probabilities. Please don't take the results too seriously, as they have many flaws (they don't consider home field advantage, they don't estimate true talent by using regression or projected performance, and they don't consider specific team personnel). They are based 50% on actual record and 50% on expected record from R/RA and RC/RCA. Still, they offer a reasonable estimate of the various probabilities, and they are very easy to figure with the spreadsheet:

The first column is the team's overall ranking among the 30 MLB teams in CTR. The playoff teams are pretty close to being the top eight in CTR; #4 Boston and #8 Toronto did not make the playoffs. Cincinnati has the lowest odds to advance to the second round or win the World Series, but Texas has a lower estimated probability of winning the pennant (since they are guaranteed to play a very highly ranked team in the LCS). Similarly, Philly has the highest first round and pennant odds, but the prospect of a strong AL World Series opponent knocks their championship odds just below those of New York and Tampa Bay.

It's worth noting that even the team with the worst rating in the playoffs (CIN) playing the best team in its circuit is still estimated to have a 41% chance of winning the five-game series, illustrating again the intended takeaway for this post.

Pushing the system a little bit further still, here are the estimated probabilities of each LCS matchup, along with the probability of each team winning such a series:

Again, the biggest on-paper mismatch possible (NYA/TEX) results in a series in which the underdog has a 40% chance to win, which undersells it just a little because I haven't considered HFA which would go to Texas in such a series. For the World Series:

Now we have a series in which one team has an approximately 2/3 chance of winning (NYA/CIN). If you'd like to see a very evenly matched series, pull for Texas to pull two upsets and make it out of the AL, as the rankings consider them to be about equal with Atlanta and San Francisco, while still being close enough to Philadelphia on the plus side and Cincinnati on the down side to offer an even matchup. The highest individual World Series win probability for the NL in non-PHI series is Atlanta's estimated 46% chance to beat Minnesota.

Other miscellaneous (and equally crude) probabilities:

* Both NYA and TB eliminated in first round: 19%
* NYA, TB, PHI all eliminated in first round: 8%
* All favorites win in first round (TB, NYA, PHI, SF): 9%
* All favorites lose in first round: 4%
* World Series does not feature TB, NYA, or PHI: 26%
* American League wins World Series: 56%

(The rankings favor ATL over SF, but the consensus is definitely the reverse, the two teams' ratings are very close, and ATL has suffered key injuries that the system, based on composite regular season performance alone, can't account for.)

Some random observations:

* BP points out in their series preview that the Reds and Phillies are pretty close in third-order W% (based on RC and RC allowed and adjusted for strength of schedule). My figures agree that it's fairly close, but still have the Phillies in front. However, it is worth pointing out that it is not a matter of the Reds looking better when using what I call PW%; it's the Phillies looking worse. The Reds' W% is .562 and their PW% is .564. They won as many games as you'd expect from their RC/RCA.

* I think the impact of Philadelphia's three starters is being overstated. The Braves should be a pretty good reminder that a collection of great starters is no guarantee of post-season glory (which is not to suggest that winning five pennants is anything to sneeze at). The Astros' staff which is being cited as the last comparable front three lost in the LCS in '04, then were swept in the World Series in '05, outpitched by a group of lesser pitchers who just happened to be having an amazing run. The Astros of 1998 didn't have the same kind of consistent stars that the Braves, later Astros, or Phillies possess, but they did have Randy Johnson plus Mike Hampton, Shane Reynolds, and Jose Lima, and they were quickly dispatched by San Diego.

I'm not saying that great starting pitching is going to harm the Phillies' chances, but it will cause them to be overstated.

* Jon Heyman posted an article with his odds for winning the World Series, which I'll use as an illustration. I don't mean to pick on Heyman, because there are a lot of folks out there who do that already. I've presented the odds based on the charts above as well:

Heyman's odds for most of the teams are fairly reasonable; as you can see, there are three teams we disagree on significantly. Heyman gives Philadelphia 2-1 odds, which is equivalent to having a 1/3 probability. I realize I'm beating a dead horse here, but I can't stress this enough. As demonstrated above, in order to have a 1/3 chance of winning the World Series, a team would need to have an expected W% of close to .600 against their playoff opponents. It doesn't make any sort of logical sense to suggest that any recent team short of the '98 Yankees is even close to that level.

Minnesota at 20-1 is not quite as silly, but is close, for the same reasons. That's a 4.76% chance of winning, and a team would need to have a .425 expected W% versus their opponents to be that big of a longshot. This is not unique to Heyman at all--in general, people tend to overstate the probabilities of the most likely sports outcomes and understate the probabilities of the least likely.

One nice thing about Heyman's odds is that they sum to 100%, almost exactly. I'm actually impressed by that.

* I will close by noting my order of preference for the world championship--you don't care, nor should you, and it has nothing to do with sabermetrics at all. But after seeing the team I was pulling against the most win four out of five World Series from 2001-2005, I find it cathartic to pre-vent:

1. NYA
2. TB
3. ATL
(I would be happy if any of those three teams won)
4. MIN
5. TEX
(I would be unhappy if any of these three won)
6. CIN
7. SF
8. PHI

Monday, October 04, 2010

End of Season Statistics 2010

For the past few years I have been using the same essay to introduce my yearly statistical reports. This is still largely the same explanation as in past years, and much of it is simply copied and pasted, but there are a few new explanations and digressions.

The spreadsheets are published as Google Spreadsheets, which you can download in Excel format by changing the extension in the address from "=html" to "=xls". That way you can download them and manipulate things however you see fit.

The data comes from a number of different sources. Most of the basic data comes from Doug's Stats, which is a very handy site. KJOK's park database provided some of the data used in the park factors, but for recent seasons park data comes from anywhere that has it--Doug's Stats, or Baseball-Reference, or ESPN.com, or MLB.com. Data on pitchers' doubles, triples, and inherited runners comes from ESPN.com. Data on pitchers' batted ball types allowed, doubles/triples allowed, and inherited/bequeathed runners comes from Baseball Prospectus.

The basic philosophy behind these stats is to use the simplest methods that have acceptable accuracy. Of course, "acceptable" is in the eye of the beholder, namely me. I use Pythagenpat not because other run/win converters, like a constant RPW or a fixed exponent, are not accurate enough for this purpose, but because it's mine and it would be kind of odd if I didn't use it.

If I seem to be a stickler for purity in my critiques of others' methods, I'd contend it is usually in a theoretical sense, not an input sense. So when I exclude hit batters, I'm not saying that hit batters are worthless or that they *should* be ignored; it's just easier not to mess with them and not that much less accurate.

I also don't really have a problem with people using sub-standard methods (say, Basic RC) as long as they acknowledge that they are sub-standard. If someone pretends that Basic RC doesn't undervalue walks or cause problems when applied to extreme individuals, I'll call them on it; if they explain its shortcomings but use it regardless, I accept that. Take these last three paragraphs as my acknowledgment that some of the statistics displayed here have shortcomings as well.

The League spreadsheet is pretty straightforward--it includes league totals and averages for a number of categories, most or all of which are explained at appropriate junctures throughout this piece. The advent of interleague play has created two different sets of league totals--one for the offense of league teams and one for the defense of league teams. Before interleague play, these two were identical. I do not present both sets of totals (you can figure the defensive ones yourself from the team spreadsheet, if you desire), just those for the offenses. The exception is for the defense-specific statistics, like innings pitched and quality starts. The figures for those categories in the league report are for the defenses of the league's teams. However, I do include each league's breakdown of basic pitching stats between starters and relievers (denoted by "s" or "r" prefixes), and so summing those will yield the totals from the pitching side. The one abbreviation you might not recognize is "N"--this is the league average of runs/game for one team, and it will pop up again.

The Team spreadsheet focuses on overall team performance--wins, losses, runs scored, runs allowed. The columns included are: Park Factor (PF), Home Run Park Factor (PFhr), Winning Percentage (W%), Expected W% (EW%), Predicted W% (PW%), wins, losses, runs, runs allowed, Runs Created (RC), Runs Created Allowed (RCA), Home Winning Percentage (HW%), Road Winning Percentage (RW%) [exactly what they sound like--W% at home and on the road], Runs/Game (R/G), Runs Allowed/Game (RA/G), Runs Created/Game (RCG), Runs Created Allowed/Game (RCAG), and Runs Per Game (RPG, the average number of runs scored and allowed per game). Ideally, I would use outs as the denominator, but for teams, outs and games are so closely related that I don’t think it’s worth the extra effort.

The runs and Runs Created figures are unadjusted, but the per-game averages are park-adjusted, except for RPG which is also raw. Runs Created and Runs Created Allowed are both based on a simple Base Runs formula. The formula is:
A = H + W - HR - CS
B = (2TB - H - 4HR + .05W + 1.5SB)*.76
C = AB - H
D = HR
Naturally, A*B/(B + C) + D.
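As a quick illustration, the formula above can be transcribed directly into Python. This is just a sketch; the function name and the sample team totals are made up for the example and do not describe any real club.

```python
def base_runs(ab, h, tb, hr, w, sb, cs):
    """Simple Base Runs estimate from the A, B, C, D factors given above."""
    a = h + w - hr - cs                                    # baserunners
    b = (2 * tb - h - 4 * hr + 0.05 * w + 1.5 * sb) * 0.76  # advancement
    c = ab - h                                             # outs
    d = hr                                                 # automatic runs
    return a * b / (b + c) + d

# Hypothetical team season totals, for illustration only
runs = base_runs(ab=5500, h=1450, tb=2300, hr=160, w=550, sb=100, cs=40)
print(round(runs, 1))
```

The same function works for Runs Created Allowed if you feed it the totals a team's pitchers gave up.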

I have explained the methodology used to figure the PFs before, but the cliff’s notes version is that they are based on five years of data when applicable, include both runs scored and allowed, and they are regressed towards average (PF = 1), with the amount of regression varying based on the number of years of data used. There are factors for both runs and home runs. The initial PF (not shown) is:

iPF = (H*T/(R*(T - 1) + H) + 1)/2
where H = RPG in home games, R = RPG in road games, T = # teams in league (14 for AL and 16 for NL). Then the iPF is converted to the PF by taking x*iPF + (1-x), where x = .6 if one year of data is used, .7 for 2, .8 for 3, and .9 for 4+.
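Here is a small Python sketch of the two steps just described, the initial factor and the regression toward 1. The function and variable names are mine, and the sample RPG values are invented for the example.

```python
def initial_pf(home_rpg, road_rpg, teams):
    """iPF = (H*T/(R*(T - 1) + H) + 1)/2, using the symbols defined above."""
    return (home_rpg * teams / (road_rpg * (teams - 1) + home_rpg) + 1) / 2

# Regression weight x by number of years of data; .9 applies to four or more
REGRESSION = {1: 0.6, 2: 0.7, 3: 0.8}

def park_factor(home_rpg, road_rpg, teams, years):
    """Regress the initial factor toward 1: PF = x*iPF + (1 - x)."""
    if years < 1:
        raise ValueError("need at least one year of data")
    x = REGRESSION.get(years, 0.9)
    return x * initial_pf(home_rpg, road_rpg, teams) + (1 - x)

# Hypothetical AL park: 10 RPG at home, 9 RPG on the road, one year of data
print(round(park_factor(10.0, 9.0, 14, 1), 3))
```

A park with identical home and road RPG comes out at exactly 1 regardless of how many years are used, which is a handy sanity check.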

It is important to note, since there always seems to be confusion about this, that these park factors already incorporate the fact that the average player plays 50% on the road and 50% at home. That is what the adding one and dividing by 2 in the iPF is all about. So if I list Fenway Park with a 1.02 PF, that means that it actually increases RPG by 4%.

In the calculation of the PFs, I did not get picky and take out “home” games that were actually at neutral sites, like the Astros/Cubs series that was moved to Milwaukee in 2008.

This year, I've added Team Offense and Defense spreadsheets. These include the following categories:

Team offense: Plate Appearances, Batting Average (BA), On Base Average (OBA), Slugging Average (SLG), Secondary Average (SEC), Walks per At Bat (WAB), Isolated Power (ISO), R/G at home (hR/G), and R/G on the road (rR/G). BA, OBA, SLG, WAB, and ISO are park-adjusted by dividing by the square root of park factor (or the equivalent; WAB = (OBA - BA)/(1 - OBA) and ISO = SLG - BA).

Team defense: Innings Pitched, BA, OBA, SLG, Innings per Start (IP/S), Starter's eRA (seRA), Reliever's eRA (reRA), RA/G at home (hRA/G), RA/G on the road (rRA/G), Battery Mishap Rate (BMR), Modified Fielding Average (mFA), and Defensive Efficiency Record (DER). BA, OBA, and SLG are park-adjusted by dividing by the square root of PF; seRA and reRA are divided by PF.

The three fielding metrics I've included are limited to those that a) I can calculate myself and b) are based on the basic available data, not specialized PBP data. The three metrics are explained in this post, but here are quick descriptions of each:

1) BMR--wild pitches and passed balls per 100 baserunners = (WP + PB)/(H + W - HR)*100

2) mFA--fielding average removing strikeouts and assists = (PO - K)/(PO - K + E)

3) DER--the Bill James classic, using only the PA-based estimate of plays made. Based on a suggestion by Terpsfan101, I've tweaked the error coefficient. Plays Made = PA - K - H - W - HR - HB - .64E and DER = PM/(PM + H - HR + .64E)

Next are the individual player reports. I defined a starting pitcher as one with 15 or more starts. All other pitchers are eligible to be included as a reliever. If a pitcher has 40 appearances, then they are included. Additionally, if a pitcher has 50 innings and less than 50% of his appearances are starts, he is also included as a reliever (this allows some swingmen type pitchers who wouldn’t meet either the minimum start or appearance standards to get in).
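The inclusion rules above are easy to misread, so here is one way to write them down in Python. This is only a sketch of my reading of the paragraph; the function name and the return values are made up for the example.

```python
def pitcher_role(g, gs, ip):
    """Classify a pitcher for the reports: 'starter', 'reliever', or None (not listed)."""
    if gs >= 15:
        return "starter"
    if g >= 40:
        return "reliever"
    # Swingman rule: 50+ innings with fewer than half of appearances as starts
    if g > 0 and ip >= 50 and gs / g < 0.5:
        return "reliever"
    return None

print(pitcher_role(g=33, gs=33, ip=210.0))  # starter
print(pitcher_role(g=30, gs=8, ip=85.0))    # reliever (swingman rule)
print(pitcher_role(g=25, gs=0, ip=30.0))    # None
```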

For all of the player reports, ages are based on simply subtracting their year of birth from 2010. I realize that this is not compatible with how ages are usually listed and so “Age 27” doesn’t necessarily correspond to age 27 as I list it, but it makes everything a heckuva lot easier, and I am more interested in comparing the ages of the players to their contemporaries, for which case it makes very little difference. The "R" category records rookie status with an "R" for rookies and a blank for everyone else; I've trusted Baseball Prospectus on this. Also, all players are counted as being on the team with whom they played/pitched (IP or PA as appropriate) the most. In a few cases in which it might make a big difference (like Cliff Lee, who split his time between one of the best hitters' parks and one of the best pitchers' parks in MLB), I have figured his PF separately as a weighted average (by IP) of his two parks, but I have not done this for every player that split time.

For relievers, the categories listed are: Games, Innings Pitched, Run Average (RA), Relief Run Average (RRA), Earned Run Average (ERA), Estimated Run Average (eRA), DIPS Run Average (dRA), Batted Ball Run Average (cRA), SIERA-style Run Average (sRA), Guess-Future (G-F), Inherited Runners per Game (IR/G), Batting Average on Balls in Play (%H), Runs Above Average (RAA), and Runs Above Replacement (RAR).

IR/G is per relief appearance (G - GS); it is an interesting thing to look at, I think, in lieu of actual leverage data. You can see which closers come in with runners on base, and which are used nearly exclusively to start innings. Of course, you can’t infer too much; there are bad relievers who come in with a lot of people on base, not because they are being used in high leverage situations, but because they are long men being used in low-leverage situations already out of hand.

For starting pitchers, the columns are: Wins, Losses, Innings Pitched, RA, RRA, ERA, eRA, dRA, cRA, sRA, G-F, %H, Pitches/Start (P/S), Quality Start Percentage (QS%), RAA, and RAR. RA and ERA you know--R*9/IP or ER*9/IP, park-adjusted by dividing by PF. The formulas for eRA, dRA, cRA, and sRA are in this article; I'm not going to copy them here, but all of them are based on the same Base Runs equation and they all estimate RA, not ERA:

* eRA is based on the actual results allowed by the pitcher (hits, doubles, home runs, walks, strikeouts, etc.). It is park-adjusted by dividing by PF.

* dRA is the classic DIPS-style RA, assuming that the pitcher allows a league average %H, and that his hits in play have a league-average S/D/T split. It is park-adjusted by dividing by PF.

* cRA is based on batted ball type (FB, GB, POP, LD) allowed, using the actual estimated linear weight value for each batted ball type. It is not park-adjusted.

* sRA is a SIERA-style RA, based on batted balls but broken down into just groundballs and non-groundballs. It is not park-adjusted either.

Both cRA and sRA are running a little high when compared to actual RA for 2010. Both measures are very sensitive and need to be recalibrated in order to overcome batted ball-type definition differences, frequencies of hit types on each kind of batted ball, and other factors, so keep in mind that they may not perfectly track RA without those adjustments (which I have not made in this case).

G-F is a junk stat, included here out of habit because I've been including it for years. It was intended to give a quick read of a pitcher's expected performance in the next season, based on eRA and strikeout rate. Although the numbers vaguely resemble RAs, it's actually unitless. As a rule of thumb, anything under four is pretty good for a starter. G-F = 4.46 + .095(eRA) - .113(KG). It is a junk stat. JUNK STAT JUNK STAT JUNK STAT. Got it?

%H is BABIP, more or less; I use an estimate of PA (IP*x + H + W, where x is the league average of (AB - H)/IP). %H = (H - HR)/(IP*x + H - HR - K). Pitches/Start includes all appearances, so I've counted relief appearances as one-half of a start (P/S = Pitches/(.5*G + .5*GS)). QS% is just QS/GS; I don't think it's particularly useful, but Doug's Stats include QS so I include it.
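For illustration, here are the %H and Pitches/Start calculations as a short Python sketch. The function names and sample numbers are hypothetical, and x must be supplied by the caller as the league average of (AB - H)/IP.

```python
def pct_h(h, hr, k, ip, x):
    """%H = (H - HR)/(IP*x + H - HR - K), where x is league (AB - H)/IP."""
    return (h - hr) / (ip * x + h - hr - k)

def pitches_per_start(pitches, g, gs):
    """Each relief appearance counts as half a start: GS + .5*(G - GS) = .5*G + .5*GS."""
    return pitches / (0.5 * g + 0.5 * gs)

# Hypothetical starter: 200 IP, 180 H, 20 HR, 150 K, league x of 2.85
print(round(pct_h(h=180, hr=20, k=150, ip=200.0, x=2.85), 3))
print(pitches_per_start(pitches=3300, g=33, gs=33))
```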

I've used a stat called Relief Run Average (RRA) in the past, based on Sky Andrecheck's article in the August 1999 By the Numbers; that one only used inherited runners, but I've revised it to include bequeathed runners as well, making it equally applicable to starters and relievers. I am using RRA as the building block for baselined value estimates for all pitchers this year. I explained RRA in this article, but the bottom line formulas are:

BRSV = BRS - BR*i*sqrt(PF)
IRSV = IR*i*sqrt(PF) - IRS
RRA = ((R - (BRSV + IRSV))*9/IP)/PF
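A minimal Python sketch of those three formulas follows. I am reading BR and BRS as bequeathed runners and bequeathed runners scored, IR and IRS as inherited runners and inherited runners scored, and i as the league average share of inherited runners who score; that reading of i is an assumption on my part for the example, and the full definitions are in the linked article. The sample line is invented.

```python
from math import sqrt

def relief_run_average(r, ip, br, brs, ir, irs, i, pf):
    """RRA from the formulas above; i is assumed to be the league rate
    at which inherited runners score."""
    brsv = brs - br * i * sqrt(pf)   # bequeathed runners scored versus expectation
    irsv = ir * i * sqrt(pf) - irs   # inherited runners stranded versus expectation
    return ((r - (brsv + irsv)) * 9 / ip) / pf

# Hypothetical reliever in a neutral park (PF = 1)
rra = relief_run_average(r=30, ip=70.0, br=20, brs=8, ir=40, irs=10, i=0.30, pf=1.0)
print(round(rra, 2))
```

In this made-up case the pitcher's plain RA is 30*9/70 = 3.86, and the runner adjustments bring his RRA down to about 3.34.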

The two baselined stats are Runs Above Average (RAA) and Runs Above Replacement (RAR). RAA uses the league average runs/game (N) for both starters and relievers, while RAR uses separate replacement levels for starters and relievers. Thus, RAA and RAR will be pretty close for relievers:

RAA = (N - RA)*IP/9
RAR (relievers) = (1.11*N - RA)*IP/9
RAR (starters) = (1.28*N - RA)*IP/9
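The three formulas above differ only in the baseline multiplier, which a short Python sketch makes plain. The function name and the sample pitcher are hypothetical; pass in whichever run average is being used as the building block.

```python
def pitcher_baselined(ra, ip, n, starter):
    """Return (RAA, RAR) for a pitcher, with n as league average runs/game."""
    raa = (n - ra) * ip / 9
    repl = 1.28 if starter else 1.11   # replacement level multipliers from above
    rar = (repl * n - ra) * ip / 9
    return raa, rar

# Hypothetical starter: 3.50 RA over 200 IP in a 4.5 N league
raa, rar = pitcher_baselined(ra=3.50, ip=200.0, n=4.5, starter=True)
print(round(raa, 1), round(rar, 1))
```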

All players with 350 or more plate appearances are included in the Hitters spreadsheets. Each is assigned one position, the one at which they appeared in the most games. The statistics presented are: Games played (G), Plate Appearances (PA), Outs (O), Batting Average (BA), On Base Average (OBA), Slugging Average (SLG), Secondary Average (SEC), Runs Created (RC), Runs Created per Game (RG), Speed Score (SS), Hitting Runs Above Average (HRAA), Runs Above Average (RAA), Hitting Runs Above Replacement (HRAR), and Runs Above Replacement (RAR).

I do not bother to include hit batters, so take note of that for players who do get plunked a lot. Therefore, PA are simply AB + W. Outs are AB - H + CS. BA and SLG you know, but remember that without HB and SF, OBA is just (H + W)/(AB + W). Secondary Average = (TB - H + W)/AB = SLG - BA + (OBA - BA)/(1 - OBA). I have not included net steals as many people (and Bill James himself) do--it is solely hitting events.

BA, OBA, and SLG are park-adjusted by dividing by the square root of PF. This is an approximation, of course, but I'm satisfied that it works well. The goal here is to adjust for the win value of offensive events, not to quantify the exact park effect on the given rate. I use the BA/OBA/SLG-based formula to figure SEC, so it is park-adjusted as well.

Runs Created is actually Paul Johnson's ERP, more or less. Ideally, I would use a custom linear weights formula for the given league, but ERP is just so darn simple and close to the mark that it’s hard to pass up. I still use the term “RC” partially as a homage to Bill James (seriously, I really like and respect him even if I’ve said negative things about RC and Win Shares), and also because it is just a good term. I like the thought put in your head when you hear “creating” a run better than “producing”, “manufacturing”, “generating”, etc. to say nothing of names like “equivalent” or “extrapolated” runs. None of that is said to put down the creators of those methods--there just aren’t a lot of good, unique names available. Anyway, RC = (TB + .8H + W + .7SB - CS - .3AB)*.322.

RC is park adjusted by dividing by PF, making all of the value stats that follow park adjusted as well. RG, the rate, is RC/O*25.5. I do not believe that outs are the proper denominator for an individual rate stat, but I also do not believe that the distortions caused are that bad. (I still intend to finish my rate stat series and discuss all of the options in excruciating detail, but alas you’ll have to take my word for it now).
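Putting the last few paragraphs together, here is a sketch in Python of RC, outs, and RG for a single hitter. The function name and the sample batting line are made up for illustration.

```python
def runs_created(ab, h, tb, w, sb, cs, pf=1.0):
    """ERP-style RC as given above, park-adjusted by dividing by PF.
    Returns (RC, outs, RG)."""
    rc = (tb + 0.8 * h + w + 0.7 * sb - cs - 0.3 * ab) * 0.322 / pf
    outs = ab - h + cs
    rg = rc / outs * 25.5
    return rc, outs, rg

# Hypothetical hitter in a neutral park
rc, outs, rg = runs_created(ab=550, h=165, tb=280, w=60, sb=10, cs=4)
print(round(rc, 1), outs, round(rg, 2))
```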

I have decided to switch to a watered-down version of Bill James' Speed Score this year; I only use four of his categories. Previously I used my own knockoff version called Speed Unit, but trying to keep it from breaking down every few years was a wasted effort.

Speed Score is the average of four components, which I'll call a, b, c, and d:

a = ((SB + 3)/(SB + CS + 7) - .4)*20
b = sqrt((SB + CS)/(S + W))*14.3
c = ((R - HR)/(H + W - HR) - .1)*25
d = T/(AB - HR - K)*450

James actually uses a sliding scale for the triples component, but it strikes me as needlessly complex and so I've streamlined it. I also changed some of his division to mathematically equivalent multiplications.
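Here is the four-component version as a Python sketch. I am reading S as singles, T as triples, and K as strikeouts, which are my interpretation of the abbreviations; the function name and the sample line are invented for the example.

```python
from math import sqrt

def speed_score(sb, cs, s, w, r, hr, h, t, ab, k):
    """Average of the four components a, b, c, d defined above."""
    a = ((sb + 3) / (sb + cs + 7) - 0.4) * 20   # stolen base percentage
    b = sqrt((sb + cs) / (s + w)) * 14.3        # stolen base attempt frequency
    c = ((r - hr) / (h + w - hr) - 0.1) * 25    # runs scored per time on base
    d = t / (ab - hr - k) * 450                 # triples per ball in play
    return (a + b + c + d) / 4

# Hypothetical player
ss = speed_score(sb=20, cs=5, s=100, w=50, r=90, hr=20, h=160, t=5, ab=550, k=100)
print(round(ss, 2))
```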

There are a whopping four categories that compare to a baseline; two for average, two for replacement. Hitting RAA compares to a league average hitter; it is in the vein of Pete Palmer’s Batting Runs. RAA compares to an average hitter at the player’s primary position. Hitting RAR compares to a “replacement level” hitter; RAR compares to a replacement level hitter at the player’s primary position. The formulas are:

HRAA = (RG - N)*O/25.5
RAA = (RG - N*PADJ)*O/25.5
HRAR = (RG - .73*N)*O/25.5
RAR = (RG - .73*N*PADJ)*O/25.5

PADJ is the position adjustment, and it is based on 1992-2001 offensive data. For catchers it is .89; for 1B/DH, 1.19; for 2B, .93; for 3B, 1.01; for SS, .86; for LF/RF, 1.12; and for CF, 1.02.
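The four baselined figures can be sketched in Python as follows, using the PADJ values just listed. The dictionary keys, function name, and sample player are my own choices for the example.

```python
# Position adjustments from the paragraph above
PADJ = {"C": 0.89, "1B": 1.19, "DH": 1.19, "2B": 0.93, "3B": 1.01,
        "SS": 0.86, "LF": 1.12, "RF": 1.12, "CF": 1.02}

def hitter_baselined(rg, outs, n, pos):
    """Return a dict of HRAA, RAA, HRAR, and RAR for one hitter."""
    scale = outs / 25.5
    padj = PADJ[pos]
    return {
        "HRAA": (rg - n) * scale,
        "RAA": (rg - n * padj) * scale,
        "HRAR": (rg - 0.73 * n) * scale,
        "RAR": (rg - 0.73 * n * padj) * scale,
    }

# Hypothetical shortstop: 6.5 RG, 400 outs, 4.5 N league
vals = hitter_baselined(rg=6.5, outs=400, n=4.5, pos="SS")
print({key: round(v, 1) for key, v in vals.items()})
```

Because the shortstop PADJ is below 1, the position-adjusted figures come out higher than the pure hitting figures for this player.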

That was the mechanics of the calculations; now I'll twist myself into knots trying to justify them. If you only care about the how and not the why, stop reading now.

The first thing that should be covered is the philosophical position behind the statistics posted here. They fall on the continuum of ability and value in what I have called "performance". Performance is a technical-sounding way of saying "Whatever arbitrary combination of ability and value I prefer".

With respect to park adjustments, I am not interested in how any particular player is affected, so there are no separate adjustments for lefties and righties, for instance. The park factor is an attempt to determine how the park affects run scoring rates, and thus the win value of runs.

I apply the park factor directly to the player's statistics, but it could also be applied to the league context. The advantage to doing it my way is that it allows you to compare the component statistics (like Runs Created or OBA) on a park-adjusted basis. The drawback is that it creates a new theoretical universe, one in which all parks are equal, rather than leaving the player grounded in the actual context in which he played and evaluating how that context (and not the player's statistics) was altered by the park.

The good news is that the two approaches are essentially equivalent; in fact, they are equivalent if you assume that the Runs Per Win factor is equal to the RPG. Suppose that we have a player in an extreme park (PF = 1.15, approximately like Coors Field pre-humidor) who has an 8 RG before adjusting for park, while making 350 outs in a 4.5 N league. The first method of park adjustment, the one I use, converts his value into a neutral park, so his RG is now 8/1.15 = 6.957. We can now compare him directly to the league average:

RAA = (6.957 - 4.5)*350/25.5 = +33.72

The second method would be to adjust the league context. If N = 4.5, then the average player in this park will create 4.5*1.15 = 5.175 runs. Now, to figure RAA, we can use the unadjusted RG of 8:

RAA = (8 - 5.175)*350/25.5 = +38.77

These are not the same, as you can obviously see. The reason for this is that they take place in two different contexts. The first figure is in a 9 RPG (2*4.5) context; the second figure is in a 10.35 RPG (2*4.5*1.15) context. Runs have different values in different contexts; that is why we have RPW converters in the first place. If we convert to WAA (using RPW = RPG), then we have:

WAA = 33.72/9 = +3.75
WAA = 38.77/10.35 = +3.75

Once you convert to wins, the two approaches are equivalent. The other nice thing about the first approach is that once you park-adjust, everyone in the league is in the same context, and you can dispense with the need for converting to wins at all. You still might want to convert to wins, and you'll need to do so if you are comparing the 2010 players to players from other league-seasons (including between the AL and NL in the same year), but if you are only looking to compare Josh Hamilton to Robinson Cano, it's not necessary. WAR is somewhat ubiquitous now, but personally I prefer runs when possible--why mess with decimal points if you don't have to?
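The worked example can be reproduced in a few lines of Python, which also shows that the agreement is exact rather than a rounding coincidence when RPW is set equal to RPG. The function names are made up for the sketch.

```python
def waa_adjust_player(rg, outs, n, pf):
    """Method 1: park-adjust the player, compare to the neutral league."""
    raa = (rg / pf - n) * outs / 25.5
    return raa / (2 * n)              # RPW assumed equal to RPG = 2*N

def waa_adjust_league(rg, outs, n, pf):
    """Method 2: leave the player alone, park-adjust the league context."""
    raa = (rg - n * pf) * outs / 25.5
    return raa / (2 * n * pf)         # RPW assumed equal to RPG = 2*N*PF

w1 = waa_adjust_player(8.0, 350, 4.5, 1.15)
w2 = waa_adjust_league(8.0, 350, 4.5, 1.15)
print(round(w1, 2), round(w2, 2))
```

Algebraically, dividing the second method's numerator and denominator by PF gives the first method, so the two WAA figures match for any inputs.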

The park factors used to adjust player stats here are run-based. Thus, they make no effort to project what a player "would have done" in a neutral park, or account for the different effects parks have on specific events (walks, home runs, BA) or types of players. They simply account for the difference in run environment that is caused by the park (as best I can measure it). As such, they don't evaluate a player within the actual run context of his team's games; they attempt to restate the player's performance as an equivalent performance in a neutral park.

I suppose I should also justify the use of sqrt(PF) for adjusting component statistics. The classic defense given for this approach relies on basic Runs Created--runs are proportional to OBA*SLG, and OBA*SLG/PF = OBA/sqrt(PF)*SLG/sqrt(PF). While RC may be an antiquated tool, you will find that the square root adjustment is fairly compatible with linear weights or Base Runs as well. I am not going to take the space to demonstrate this claim here, but I will some time in the future.

Many value figures published around the sabersphere adjust for the difference in quality level between the AL and NL. I don't, but this is a thorny area where there is no right or wrong answer as far as I'm concerned. I also do not make an adjustment in the league averages for the fact that the overall NL averages include pitcher batting and the AL does not (not quite true in the era of interleague play, but you get my drift).

The difference between the leagues may not be precisely calculable, and it certainly is not constant, but it is real. If the average player in the AL is better than the average player in the NL, it is perfectly reasonable to expect the average AL player to have more RAR than the average NL player, and that will not happen without some type of adjustment. On the other hand, if you are only interested in evaluating a player relative to his own league, such an adjustment is not necessarily welcome.

The league argument only applies cleanly to metrics baselined to average. Since replacement level compares the given player to a theoretical player that can be acquired on the cheap, the same pool of potential replacement players should by definition be available to the teams of each league. One could argue that if the two leagues don't have equal talent at the major league level, they might not have equal access to replacement level talent--except such an argument is at odds with the notion that replacement level represents talent that is truly "freely available".

So it's hard to justify the approach I take, which is to set replacement level relative to the average runs scored in each league, with no adjustment for the difference in the leagues. The best justification is that it's simple and it treats each league as its own universe, even if in reality they are connected.

The replacement levels I have used here are very much in line with the values used by other sabermetricians. This is based on my own "research", my interpretation of other people's research, and a desire to not stray from consensus and make the values unhelpful to the majority of people who may encounter them.

Replacement level is certainly not settled science. There is always going to be room to disagree on what the baseline should be. Even if you agree it should be "replacement level", any estimate of where it should be set is just that--an estimate. Average is clean and fairly straightforward, even if its utility is questionable; replacement level is inherently messy. So I offer the average baseline as well.

For position players, replacement level is set at 73% of the positional average RG (since there's a history of discussing replacement level in terms of winning percentages, this is roughly equivalent to .350). For starting pitchers, it is set at 128% of the league average RA (.380), and for relievers it is set at 111% (.450).
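To see where the winning percentage equivalents come from, here is a rough Python check. It assumes a plain Pythagorean formula with an exponent of 2, which is my simplification for the sketch and not necessarily the converter used to arrive at the figures above.

```python
def pyth_wpct(run_ratio, exponent=2.0):
    """Pythagorean W% from the ratio of runs scored to runs allowed.
    The exponent of 2 is an assumption made for this illustration."""
    return run_ratio**exponent / (1 + run_ratio**exponent)

hitter = pyth_wpct(0.73)        # scores 73% of average, allows average
starter = pyth_wpct(1 / 1.28)   # scores average, allows 128% of average
reliever = pyth_wpct(1 / 1.11)  # scores average, allows 111% of average
print(round(hitter, 3), round(starter, 3), round(reliever, 3))
```

All three land within a few points of the rounded .350, .380, and .450 quoted above.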

I am still using an analytical structure that makes the comparison to replacement level for a position player by applying it to his hitting statistics. This is the approach taken by Keith Woolner in VORP (and some other earlier replacement level implementations), but the newer metrics (among them Rally and Fangraphs' WAR) handle replacement level by subtracting a set number of runs from the player's total runs above average in a number of different areas (batting, fielding, baserunning, positional value, etc.), which for lack of a better term I will call the subtraction approach.

The offensive positional adjustment makes the inherent assumption that the average player at each position is equally valuable. I think that this is close to being true, but it is not quite true. The ideal approach would be to use a defensive positional adjustment, since the real difference between a first baseman and a shortstop is their defensive value. When you bat, all runs count the same, whether you create them as a first baseman or as a shortstop.

That being said, using “replacement hitter at position” does not cause too many distortions. It is not theoretically correct, but it is practically powerful. For one thing, most players, even those at key defensive positions, are chosen first and foremost for their offense. Empirical work by Keith Woolner has shown that the replacement level hitting performance is about the same for every position, relative to the positional average.

Figuring what the defensive positional adjustment should be, though, is easier said than done. Therefore, I use the offensive positional adjustment. So if you want to criticize that choice, or criticize the numbers that result, be my guest. But do not claim that I am holding this up as the correct analytical structure. I am holding it up as the most simple and straightforward structure that conforms to reality reasonably well, and because while the numbers may be flawed, they are at least based on an objective formula that I can figure myself. If you feel comfortable with some other assumptions, please feel free to ignore mine.

That still does not justify the use of HRAR--hitting runs above replacement--which compares each hitter, regardless of position, to 73% of the league average. Basically, this is just a way to give an overall measure of offensive production without regard for position with a low baseline. It doesn't have any real baseball meaning.

A player who creates runs at 90% of the league average could be above-average (if he's a shortstop or catcher, or a great fielder at a less important fielding position), or sub-replacement level (DHs that create 4 runs a game are not valuable properties). Every player is chosen because his total value, both hitting and fielding, is sufficient to justify his inclusion on the team. HRAR fails even if you try to justify it with a thought experiment about a world in which defense doesn't matter, because in that case the absolute replacement level (in terms of RG, without accounting for the league average) would be much higher than it is currently.

The specific positional adjustments I use are based on 1992-2001 data. There's no particular reason for not updating them; at the time I started using them, they represented the ten most recent years. I have stuck with them because I have not seen compelling evidence of a change in the degree of difficulty or scarcity between the positions between now and then, and because I think they are fairly reasonable. The positions for which they diverge the most from the defensive position adjustments in common use are 2B, 3B, and CF. Second base is considered a premium position by the offensive PADJ (.94), while third base and center field are both neutral (1.01 and 1.02).

Another flaw is that the PADJ is applied to the overall league average RG, which is artificially low for the NL because of pitchers' batting. When using the actual league average runs/game, it's tough to just remove pitchers--any adjustment would be an estimate. If you use the league total of runs created instead, it is a much easier fix.

One other note on this topic is that since the offensive PADJ is a proxy for average defensive value by position, ideally it would be applied by tying it to defensive playing time. I have done it by outs, though.

The reason I have taken this flawed path is that 1) it ties the position adjustment directly into the RAR formula rather than leaving it as something to subtract on the outside and, more importantly, 2) there’s no straightforward way to do it. The best would be to use defensive innings--set the full-time player to X defensive innings, figure how Derek Jeter’s innings compared to X, and adjust his PADJ accordingly. Games in the field or games played are dicey because they can cause distortion for defensive replacements. Plate appearances avoid the problem that outs have of being highly related to player quality, but they still carry the illogic of basing it on offensive playing time. And of course the differences here are going to be fairly small (a few runs). That is not to say that this way is preferable, but it’s not horrible either, at least as far as I can tell.

To compare this approach to the subtraction approach, start by assuming that a replacement level shortstop would create .86*.73*4.5 = 2.825 RG (or would perform at an overall level of equivalent value to being an average fielder at shortstop while creating 2.825 runs per game). Suppose that we are comparing two shortstops, each of whom compiled 600 PA and played an equal number of defensive games and innings (and thus would have the same positional adjustment using the subtraction approach). Alpha made 380 outs and Bravo made 410 outs, and each ranked as dead-on average in the field.

The difference in overall RAR between the two using the subtraction approach would be equal to the difference between their offensive RAA compared to the league average. Assuming the league average is 4.5 runs, and that both Alpha and Bravo created 75 runs, their offensive RAAs are:

Alpha = (75*25.5/380 - 4.5)*380/25.5 = +7.94

Similarly, Bravo is at +2.65, and so the difference between them will be 5.29 RAR.

Using the flawed approach, Alpha's RAR will be:

(75*25.5/380 - 4.5*.73*.86)*380/25.5 = +32.90

Bravo's RAR will be +29.58, a difference of 3.32 RAR, which is two runs off of the difference using the subtraction approach.
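For anyone who wants to check the arithmetic, here is a minimal Python sketch of the Alpha/Bravo comparison. The function and constant names are my own for illustration; the inputs (4.5 league RG, 73% replacement level, .86 shortstop PADJ, 25.5 outs per game, 75 runs created) are the ones used in the example above.

```python
OUTS_PER_GAME = 25.5   # outs per game, as in the formulas above
LEAGUE_RG = 4.5        # assumed league average runs per game
REPL_PCT = 0.73        # replacement level as a share of league average
SS_PADJ = 0.86         # offensive positional adjustment for shortstop


def runs_per_game(runs_created, outs):
    """Convert runs created and outs into runs per game (RG)."""
    return runs_created * OUTS_PER_GAME / outs


def offensive_raa(runs_created, outs):
    """Runs above the league average hitter (the subtraction approach
    then applies the same positional adjustment to both players)."""
    rg = runs_per_game(runs_created, outs)
    return (rg - LEAGUE_RG) * outs / OUTS_PER_GAME


def positional_rar(runs_created, outs, padj):
    """Runs above a replacement hitter at the position, scaled by outs."""
    rg = runs_per_game(runs_created, outs)
    return (rg - LEAGUE_RG * REPL_PCT * padj) * outs / OUTS_PER_GAME


alpha_raa = offensive_raa(75, 380)
bravo_raa = offensive_raa(75, 410)
alpha_rar = positional_rar(75, 380, SS_PADJ)
bravo_rar = positional_rar(75, 410, SS_PADJ)

print(f"Subtraction approach gap: {alpha_raa - bravo_raa:.2f}")
print(f"Outs-based PADJ gap:      {alpha_rar - bravo_rar:.2f}")
```

The two gaps differ only because the outs-based version charges Bravo's extra 30 outs against the lower shortstop replacement baseline rather than against the league average.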

The downside to using PA is that you really need to consider park effects if you do, whereas outs allow you to sidestep park effects. Outs are constant; plate appearances are linked to OBA. Thus, they not only depend on the offensive context (including park factor), but also on the quality of one's team. Of course, attempting to adjust for team PA differences opens a huge can of worms which is not really relevant; for now, the point is that using outs for individual players causes distortions, sometimes trivial and sometimes bothersome, but almost always makes one's life easier.

I do not include fielding (or baserunning outside of steals, although that is a trivial consideration in comparison) in the RAR figures--they cover offense and positional value only. This in no way means that I do not believe that fielding is an important consideration in player valuation. However, two of the key principles of these stat reports are 1) not incorporating any data that is not readily available and 2) not simply including other people's results (of course I borrow heavily from other people's methods, but only adapting methodology that I can apply myself).

Any fielding metric worth its salt will fail to meet either criterion--they use zone data or play-by-play data which I do not have easy access to. I do not have a fielding metric that I have stapled together myself, and so I would have to simply lift other analysts' figures.

Setting the practical reason for not including fielding aside, I do have some reservations about lumping fielding and hitting value together in one number because of the obvious differences in reliability between offensive and fielding metrics. In theory, they absolutely should be put together. But in practice, I believe it would be better to regress the fielding metric to a point at which it would be roughly equivalent in reliability to the offensive metric.

Offensive metrics have error bars associated with them, too, of course, and in evaluating a single season's value, I don't care about the vagaries that we often lump together as "luck". Still, there are errors in our assessment of linear weight values and players that collect an unusual proportion of infield hits or hits to the left side, errors in estimation of park factor, and any number of other factors that make their events more or less valuable than an average event of that type.

Fielding metrics offer up all of that and more, as we cannot be nearly as certain of true successes and failures as we are when analyzing offense. Recent investigations, particularly by Colin Wyers, have raised even more questions about the level of uncertainty. So, even if I was including a fielding value, my approach would be to assume that the offensive value was 100% reliable (which it isn't), and regress the fielding metric relative to that (so if the offensive metric was actually 70% reliable, and the fielding metric 40% reliable, I'd treat the fielding metric as .4/.7 = 57% reliable when tacking it on, to illustrate with a simplified and completely made up example presuming that one could have a precise estimate of nebulous "reliability").
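To make the made-up example concrete, here is a small Python sketch of that relative regression. It assumes that "regressing" means shrinking the fielding figure toward zero (average) in proportion to its relative reliability; that reading, the function names, and the +10 fielding runs are all my own illustration, not published numbers.

```python
def relative_reliability(fielding_rel, offense_rel):
    """Fielding reliability expressed relative to offense, with offense
    treated as if it were 100% reliable."""
    return fielding_rel / offense_rel


def regressed_fielding_runs(raw_fielding_runs, fielding_rel, offense_rel):
    """Shrink raw fielding runs toward zero (average) by the relative
    reliability. Assumption: regression is toward league average."""
    return raw_fielding_runs * relative_reliability(fielding_rel, offense_rel)


# The made-up reliabilities from the text: offense 70%, fielding 40%.
weight = relative_reliability(0.40, 0.70)
credited = regressed_fielding_runs(10.0, 0.40, 0.70)  # hypothetical +10 fielder

print(f"Relative reliability: {weight:.0%}")
print(f"Fielding runs credited: {credited:.1f}")
```

Under those assumptions a fielder rated +10 would be credited with a little under six runs when his fielding is tacked on to his RAR.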

Given the inherent assumption of the offensive PADJ that all positions are equally valuable, once RAR has been figured for a player, fielding value can be accounted for by adding on his runs above average relative to a player at his own position. If there is a shortstop that is -2 runs defensively versus an average shortstop, he is without a doubt a plus defensive player, and a more valuable defensive player than a first baseman who was +1 run better than an average first baseman. Regardless, since it was implicitly assumed that they are both average defensively for their position when RAR was calculated, the shortstop will see his value docked two runs. This DOES NOT MEAN that the shortstop has been penalized for his defense. The whole process of accounting for positional differences, going from hitting RAR to positional RAR, has benefited him.

I've found that there is often confusion about the treatment of first basemen and designated hitters in my PADJ methodology, since I consider DHs as in the same pool as first basemen. The fact of the matter is that first basemen outhit DHs. There is any number of potential explanations for this; DHs are often old or injured, players hit worse when DHing than they do when playing the field, etc. This actually helps first basemen, since the DHs drag the average production of the pool down, thus resulting in a lower replacement level than I would get if I considered first basemen alone.

However, this method does assume that a 1B and a DH have equal defensive value. Obviously, a DH has no defensive value. What I advocate to correct this is to treat a DH as a bad defensive first baseman, and thus knock another five or ten runs off of his RAR for a full-time player. I do not incorporate this into the published numbers, but you should keep it in mind. However, there is no need to adjust the figures for first basemen upwards--the only necessary adjustment is to take the DHs down a notch.

Finally, I consider each player at his primary defensive position (defined as where he appears in the most games), and do not weight the PADJ by playing time. This does shortchange a player like Ben Zobrist (who saw significant time at a tougher position than his primary position), and unduly boost a player like Buster Posey (who logged a lot of games at a much easier position than his primary position). For most players, though, it doesn't matter much. I find it preferable to make manual adjustments for the unusual cases rather than add another layer of complexity to the whole endeavor.
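If you want to make that manual adjustment yourself, one simple option is to weight the PADJ by games at each position. The sketch below is only an illustration of that idea and is not how the published figures are computed; the player and his games split are hypothetical, while the PADJ values (.94 for 2B, 1.02 for CF) are the ones quoted earlier.

```python
def weighted_padj(games_by_position, padj_by_position):
    """Games-weighted average of the positional adjustments.
    Assumption: games at each position stand in for defensive playing time."""
    total_games = sum(games_by_position.values())
    weighted_sum = sum(
        padj_by_position[pos] * games
        for pos, games in games_by_position.items()
    )
    return weighted_sum / total_games


# PADJ values quoted in this post for two of the positions.
PADJ = {"2B": 0.94, "CF": 1.02}

# Hypothetical player: 90 games at second base, 60 in center field.
games = {"2B": 90, "CF": 60}

blended = weighted_padj(games, PADJ)
print(f"Primary-position PADJ: {PADJ['2B']:.3f}")
print(f"Games-weighted PADJ:   {blended:.3f}")
```

As noted above, games in the field can be distorted by defensive replacements, so defensive innings would be the better weight if you have them.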

2010 Park Factors
2010 Leagues
2010 Teams
2010 Team Offense
2010 Team Defense
2010 AL Relievers
2010 NL Relievers
2010 AL Starters
2010 NL Starters
2010 AL Hitters
2010 NL Hitters
