Tuesday, November 17, 2009

IBA Ballot: MVP

Disclaimer: Presented below is my ballot (and some justification) for one of the categories in the Internet Baseball Awards hosted at Baseball Prospectus. I’m just one person, and the whole point of having a vote like the IBA is to get a wide variety of (intelligent) perspectives, and so I will not feel in the least bit slighted if you don’t give a flip about this. You've been warned. Also, the RAA and RAR figures that will be cited are my own estimates, detailed here. Any Leverage Index, WPA, or UZR figures cited are from FanGraphs; any quality of opposition or baserunning figures are from Baseball Prospectus.

The AL MVP debate will not be much of a debate after all--with the Twins' September surge, Joe Mauer should coast to the award. As you will see, I ultimately agree with this, but I think there's a solid case to be made that Zack Greinke was the most valuable player in the American League. The statistical comparison between the two hits on any number of hot spots--pitcher v. hitter, DIPS and fielding support, evaluating fielding, what the most appropriate baseline is--and depending on the judgment calls you make on those matters, it is not that hard to come down on Greinke's side.

RAR favors Greinke, +91 to +82. Mauer is generally considered a solid defensive catcher--let's call it five runs in lieu of a more rigorous estimate. On the other hand, that RAR figure assumes that Mauer is a full-time catcher, when in fact he appeared in 109 games behind the plate and 28 as a DH. That knocks around three runs off his position adjustment, leaving him at +84 (please note that I am overstating the precision of the initial estimates and the subsequent adjustments for the sake of discussion). BPro estimates his non-SB baserunning at -3 runs, which would lower his RAR accordingly.
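The running adjustments to Mauer's figure can be tallied in a few lines of Python. This is only a bookkeeping sketch: the function name `adjust_rar` and the adjustment labels are illustrative, and the inputs are the rounded figures quoted above, so the result inherits all of their false precision.

```python
def adjust_rar(base_rar, adjustments):
    """Apply a list of (label, runs) adjustments to a baseline RAR figure."""
    return base_rar + sum(runs for _, runs in adjustments)


# Rounded figures from the text; the labels are illustrative only.
mauer_adjustments = [
    ("fielding (rough guess)", +5),
    ("time at DH instead of catcher", -3),
]

mauer_rar = adjust_rar(82, mauer_adjustments)  # 84
# Optionally fold in the BPro non-SB baserunning estimate as well.
mauer_with_baserunning = adjust_rar(mauer_rar, [("non-SB baserunning", -3)])  # 81

print(mauer_rar, mauer_with_baserunning)
```

Keeping each adjustment as a labeled entry makes it easy to drop or resize any one of them, which is the whole point of the comparison: the ranking flips depending on which judgment calls you accept.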

Greinke's RAR is based on just taking his actual runs allowed into consideration. Suppose that you were to use his dRA (basically, simple DIPS RA) as the fuel for RAR instead. In that case, he would drop to...you guessed it, +84. Greinke allowed a high BABIP (not really a surprise with KC's poor fielding behind him), but DIPS throws the situational pitching baby out with the fielding bathwater.

There's also the matter of baseline. If you use average, Mauer is ahead +67 to +61 before considering his defense. If you use something in the middle, you're liable to end up with another statistical tie.

I'm not going to try to argue for one or the other, just that they're too close to call. The deciding factor for me is that Mauer is a position player and Greinke is a pitcher. I have no problem voting for a pitcher for MVP--my ballots probably average around 2.5 pitchers per league season. But if a pitcher and a position player are in a dead heat, I'm going to side with the position player more often than not. Last year I went with Cliff Lee for AL MVP as no position player turned in a comparable season.

Behind them, Roy Halladay and Felix Hernandez had seasons that would often be good enough to win Cy Youngs, and the rest of the AL hitters collectively had another year without any real jaw-dropping performances. So the two hurlers go 3-4, with Ben Zobrist and Derek Jeter the next two position players.

Why Zobrist over Jeter? Zobrist does well in the defensive metrics, but you don't have to put a lot of weight on that to make a reasonable case for him over Jeter. I have Zobrist and Jeter even as offensive players without considering position (60 to 59 RAR, Zobrist's superior rate balanced by Jeter's extra 100 PA). So you only have to believe that Zobrist's fielding was more valuable than Jeter's, not that it was truly spectacular.

Evan Longoria's +52 RAR leaves him down the ballot if you go just by hitting, but of course he has a good defensive reputation and his UZR was a whopping +19. Even if you only want to credit him as a +10 fielder, it's enough to vault him past some not particularly impressive fielders.

After yet another pitcher (Verlander), the last two spots on the ballot go to first basemen--Mark Teixeira and Miguel Cabrera. Kevin Youkilis might be the most surprising omission from my ballot, and you can certainly make a case for him over either of those two. Even giving him credit for his time at third base, I have him at +48 RAR versus +55 for Teixeira and +53 for Cabrera. My RAR figures lazily omit hit batters, but giving him another three runs for getting plunked and two runs for fielding (FanGraphs' estimate) leaves him in a dead heat. I went with the other two, but reasonable people will surely differ on this one.

Kendry Morales, on the other hand, will get mainstream MVP support, but at +42 RAR he's well behind the other first basemen, and even a generous (and likely unwarranted) fielding estimate just gets him into the mix. Was he a better value than the man he replaced? Absolutely. But I can't call him a more valuable player.

Victor Martinez ranks fourth in RAR among position players, but doesn't crack the ballot. Why? For one thing, the aforementioned RAR figure treats him as a pure catcher, but in reality 46% of his games played were at first base or DH. Incorporating that into his positional adjustment drops his RAR to +48, thirteenth in the league.

1) C Joe Mauer, MIN
2) SP Zack Greinke, KC
3) SP Roy Halladay, TOR
4) SP Felix Hernandez, SEA
5) 2B Ben Zobrist, TB
6) SS Derek Jeter, NYA
7) 3B Evan Longoria, TB
8) SP Justin Verlander, DET
9) 1B Mark Teixeira, NYA
10) 1B Miguel Cabrera, DET

In the National League, there is one super candidate with no real competition. Despite tailing off a bit in the second half, Albert Pujols recorded what is IMO the best season of his career (although picking between Pujols seasons is like picking between...nah, I'm bad at analogies), finishing second in BA, first in OBA, SLG, secondary average, Runs Created, and all four of the baselined categories I track. His RAR lead is a whopping 21 runs over Hanley Ramirez, and there's no amount of finessing the numbers that will close that gap.

Behind him, it is too close to call between Hanley Ramirez and Chase Utley once you give Utley credit for fielding and getting hit...Ramirez is +80 RAR, but you can't give him a big fielding number, while Utley is +64 with a very believable +12 UZR and some runs lying around from plunkings and baserunning. I went with Ramirez because I trust the offensive numbers more, but I wouldn't argue one bit if you think Utley was more valuable. Utley's oft-overlooked contributions allowed him to pass the two big first base bats, Prince Fielder and Adrian Gonzalez, but they are next on my ballot, with Gonzalez getting a narrow edge due to his fielding prowess (he trails 77-74 in RAR).

Ryan Zimmerman had a +18 UZR, which at full credit would put him ahead of the first basemen. I hedge a little bit and place him behind them, followed by a cavalcade of pitchers and Troy Tulowitzki:

1) 1B Albert Pujols, STL
2) SS Hanley Ramirez, FLA
3) 2B Chase Utley, PHI
4) 1B Adrian Gonzalez, SD
5) 1B Prince Fielder, MIL
6) 3B Ryan Zimmerman, WAS
7) SP Tim Lincecum, SF
8) SP Chris Carpenter, STL
9) SP Adam Wainwright, STL
10) SS Troy Tulowitzki, COL

Tuesday, November 10, 2009

IBA Ballot: Cy Young

Disclaimer: Presented below is my ballot (and some justification) for one of the categories in the Internet Baseball Awards hosted at Baseball Prospectus. I’m just one person, and the whole point of having a vote like the IBA is to get a wide variety of (intelligent) perspectives, and so I will not feel in the least bit slighted if you don’t give a flip about this. You've been warned. Also, the RAA and RAR figures that will be cited are my own estimates, detailed here. Any Leverage Index, WPA, or UZR figures cited are from FanGraphs; any quality of opposition or baserunning figures are from Baseball Prospectus.

In the American League, the top spot is a no-brainer. Zack Greinke was just eleven innings off the league lead (ranking sixth) and led the AL in RA, ERA, eRA, dRA, RAA, and RAR. His +91 RAR was the highest for any pitcher season since 2001.

Behind him, the race for second is close as both Roy Halladay and Felix Hernandez had tremendous seasons that are hard to tell apart at first glance--their RAs differ by just .03 with a 1/3 inning difference. Hernandez had a lower ERA and eRA, but their dRAs were just about equal. The deciding factor for me is Halladay's slightly higher quality of opposition--5.1 to 4.9 in RG, a difference of around 4 runs over a full season. You can't go wrong choosing between these two.

Justin Verlander is a fairly clear #4 for me, leaving two rival lefties to duke it out for fifth--Jon Lester and CC Sabathia. I went with Lester, but that's another race that is too close to call. Sabathia has the innings edge, but Lester has a lower RA and the peripherals are split (Lester had a better eRA, Sabathia a better dRA):

1) Zack Greinke, KC
2) Roy Halladay, TOR
3) Felix Hernandez, SEA
4) Justin Verlander, DET
5) Jon Lester, BOS

In the National League, it's the race for the top that's too close to call. Either Tim Lincecum or Chris Carpenter would be very deserving should they win. Carpenter had a lower RA, but Lincecum pitched a lot more. The net difference between the two is an extra 18 runs in 32 innings (a RA of 5.06). That level of performance is close enough to replacement level that Lincecum's RAR lead is just two, which is by no means conclusive.
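The "extra 18 runs in 32 innings" figure is just a run average computed on the marginal workload. A minimal sketch of that arithmetic, assuming the standard runs-per-nine-innings definition of RA (the function name `run_average` is illustrative):

```python
def run_average(runs, innings):
    """Runs allowed per nine innings."""
    return 9 * runs / innings


# Lincecum's extra workload relative to Carpenter, per the text.
marginal_ra = run_average(runs=18, innings=32)
print(f"{marginal_ra:.2f}")  # 5.06
```

Since a 5.06 RA is not far from what a replacement-level pitcher would be expected to allow, those extra innings add only a couple of runs of value, which is why the RAR gap is so small.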

Their eRAs are about equal; Lincecum has a clear advantage in dRA. Carpenter has the better win-loss record, which I mention although I put no stock in it. They are about equal in quality start percentage. Quality of opposition is no help, as Lincecum's opponents combined for a 4.5 RG and Carpenter's 4.4. With so little to separate them, I stick with the RAR order, but this is certainly a race that could go either way--just like Lincecum v. Santana, 2008.

Adam Wainwright and Dan Haren take positions 3 and 4, while I went with Javier Vazquez and his superior peripherals over teammate Jair Jurrjens and Matt Cain, as all of them are separated by just 2 RAR. But no one really cares about fifth-place on an IBA Cy Young ballot:

1) Tim Lincecum, SF
2) Chris Carpenter, STL
3) Adam Wainwright, STL
4) Dan Haren, ARI
5) Javier Vazquez, ATL

Tuesday, November 03, 2009

Statistical Meanderings 2009

What follows is a disjointed collection of observations and thoughts, largely spurred by perusing the end of season statistical reports published here.

* The American League outscored the National League 4.82 to 4.43 runs per game this season. The gap of .39 was the largest since 1998 (.41, 5.01 to 4.60). The AL had a higher BA (.267 to .259), a slightly lower walk rate (.099 walk:at bat ratio versus .102), and higher isolated power (.161 to .150).

* I track two different winning percentage estimators, both of which utilize Pythagenpat but with different inputs. Expected W% is based on actual runs scored and allowed, while Predicted W% is based on runs created and runs created allowed (actually Base Runs, but you get the idea). I always like to point out teams with very similar figures in all three categories (actual W%, EW%, and PW%) as well as those with divergent figures.

Teams that are close across the board include Colorado (.568, .556, .561), Texas (.537, .528, .527), and both Chicagos (.516, .524, .522 for the Cubs and .488, .495, .497 for the White Sox). Teams with some notable variations include the Angels (.599, .572, .524), Blue Jays (.463, .517, .514), and Diamondbacks (.432, .461, .495).

An interesting group of teams that may tend to be underrated next year by those who simply look at the so-called Johnson effect are those whose PW% match their W% more closely than their EW% does. These are teams that won more games than their R/RA would suggest, but whose R/RA was weaker than their RC/RCA would suggest. David Cameron noted this in his discussion of the Mariners, and they fit the bill (.525, .464, .490) as do San Diego (.463, .413, .440) and the Yankees (.630, .595, .628).

Cameron discusses this effect in terms of summed WAR for the members of a team; since WAR is based on RC, at least for batters, the results should be similar. However, I think it is a clumsy way of looking at things--it is much more direct to just apply your run estimator directly to the team totals and plug those results into your win estimator. If you want to talk about individual players' contributions, then obviously it makes sense to bring WAR into the discussion.
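For readers who have not seen Pythagenpat written out, here is a minimal sketch. It assumes the commonly cited form in which the Pythagorean exponent is total runs per game raised to a small power; the value z = 0.29 is an assumption (values around .28 to .29 are typical), and the team totals in the example are hypothetical, not taken from the 2009 reports.

```python
def pythagenpat(runs, runs_allowed, games, z=0.29):
    """Estimate W% with a run-environment-dependent Pythagorean exponent.

    z = 0.29 is an assumed value, not necessarily the one used for the
    figures quoted in this post.
    """
    rpg = (runs + runs_allowed) / games   # total runs per game, both teams
    x = rpg ** z                          # exponent grows with scoring level
    return runs ** x / (runs ** x + runs_allowed ** x)


# Hypothetical team: 800 runs scored, 700 allowed, 162 games.
ew_pct = pythagenpat(800, 700, 162)
print(f"{ew_pct:.3f}")
```

Feeding actual runs scored and allowed into the function gives the Expected W% described above; feeding in runs created and runs created allowed gives Predicted W%. Nothing else about the calculation changes.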

* Three teams had over ten runs per game scored in their games. I have to admit, I wouldn't have guessed one of them if I had twenty tries, and it would have taken me multiple guesses to come up with another. The Yankees would be one of the first teams most people would guess, I imagine, but the Indians and Angels are a little tougher.

On the flip side of that, you can probably guess in short order that San Francisco had the lowest run context of any team (just 7.83 RPG). They were fifth to last in MLB in park-adjusted R/G (and only .06 ahead of the last place team) and first in park-adjusted RA/G, so it's no surprise that the combination lapped the field (the next lowest RPG was Seattle, 8.22). No team had been under 8 RPG since the 2005 Astros (7.99) and no team had been below 7.83 since the 2003 Dodgers (6.98!).

When I posted this factoid on Twitter, Tommy Bennett asked about how the Dodgers would come out park-adjusted (SF this year had a 100 PF by my estimate). The LA PF in 2003 was 94, so the 6.98 is park adjusted to 7.43--still lower than the Giants, but it slashes half of the gap away.
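The park adjustment here is a simple division by the park factor expressed as a fraction of 100. A small sketch using the figures quoted above (the function name `park_adjust` is illustrative):

```python
def park_adjust(rpg, park_factor):
    """Convert an observed runs-per-game figure to a neutral-park equivalent."""
    return rpg / (park_factor / 100)


dodgers_2003 = park_adjust(6.98, 94)    # roughly 7.43
giants_2009 = park_adjust(7.83, 100)    # unchanged, since the PF is 100

raw_gap = 7.83 - 6.98                   # gap before park adjustment
adjusted_gap = giants_2009 - dodgers_2003

print(f"{dodgers_2003:.2f}", f"{raw_gap:.2f}", f"{adjusted_gap:.2f}")
```

The gap shrinks from about .85 runs per game to about .40, which is the "half of the gap" referred to above.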

* There was a lot of hoopla about the new Yankee Stadium being an offensive paradise and of CitiField being where home runs go to die, but the traditional park factor approaches just don't bear this out (I emphasize traditional as park factors, particularly for home runs, can be much improved by incorporating more advanced data than simple home run counts from 81 game sample sizes, and so I'm not asking you to forget what you've read on HitTracker, and of course you should know about the sample size issues inherent when working with one-year PFs).

NYA does have a high HR PF (107), but a neutral run PF (99). If this trend continues, Yankee Stadium will find itself in a group of parks that are unfairly labeled as hitters' paradises due to their higher HR factors, but which have much more muted effects on overall scoring. Since the easiest way to observe a park effect without data is home run frequency, these parks get a bad rap from the mainstream media and casual fans. Camden Yards (105 HR factor/100 runs), Great American (111/104), Enron (105/99), and Citizens Bank (109/103) seem to fit the bill. Other parks with a similar five-year split, including SkyDome (106/100) and Comiskey (112/103), don't seem to get the same treatment, although my perception could certainly be off.

Meanwhile, there were actually more homers hit in Mets home games (1.60) than in road games (1.52). Take it for what it's worth, and don't discard more detailed and relevant data.
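To put the Mets' home and road rates on the same footing as the HR factors quoted above, one can take the raw ratio of the two. This is a deliberately crude sketch and should not be read as the method behind the published park factors, which would normally involve more than one season of data and further adjustments; the function name `raw_home_road_index` is illustrative.

```python
def raw_home_road_index(home_rate, road_rate):
    """Unregressed one-year home/road ratio, scaled so 100 is neutral."""
    return 100 * home_rate / road_rate


# Homers per game in Mets home games vs. road games, per the text.
mets_hr_index = raw_home_road_index(1.60, 1.52)
print(f"{mets_hr_index:.0f}")  # 105
```

A raw index a bit above 100 from a single season is well within the noise, which is why the caveat about more detailed data still applies.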

One thing you will note in looking at the park factors is that there are few parks that come out as extreme in favor of pitchers. No park has a PF less than 97 except for Petco, which stands alone at 91, which is about as low as you'll ever see. In fact, it matches the lowest in my 1901-2006 spreadsheet, tied with Braves Field (1936), County Stadium (1959), and Dodger Stadium (1966).

* Here are the runs above average for each playoff team's offense and defense (crudely based on runs scored/allowed per game versus the league average, park-adjusted):



You can see that three teams displayed significantly stronger offense than defense (NYA, LAA, PHI); three were fairly balanced (BOS, MIN, LA); and two displayed significantly stronger defense (COL, STL). Both pennant winners were drawn from the stronger offense group.

This observation is not intended to trumpet offense over defense, but simply to poke holes in a conventional wisdom that should already be dead.

* I try to avoid writing too much about Cleveland, but I am a fan so it happens from time to time. When I heard that the Indians had Tomo Ohka (I don't recall if I learned this during spring training or when he was recalled), I thought "He's still around?" Then he proceeded to allow a .257 %H, which allowed his RA to hover right around 6. And yet at times, reason aside and just going by feelings, I actually felt good when he was on the mound. It says a lot about the Tribe's campaign, at least from a fan's emotional perspective.

* No NL reliever had an eye-popping season--the top eight finishers in RAR are mostly journeymen and middle relievers, with Ryan Franklin the exception to the middle relief trend but not to the journeyman trend. Heath Bell at #9 is the first stereotypical power closer, but this was his first season in that role.

* Brad Lidge ranked last among NL relievers in RAR (-17); last year he was third (+21).

* I list a zillion run averages for relief pitchers; it's overkill, but there's no reason not to fill up the page. Rafael Soriano was about as consistent across the board in those categories as one can be: 3.00 RA, 2.82 RRA, 3.00 ERA, 3.07 eRA, 3.02 dRA.

* If there's anyone who should feel fortunate about the myriad of problems encountered by the Mets, it should be Francisco Rodriguez. Rodriguez' 2009 performance was lost in the avalanche of injuries and despair, but it was not impressive--in fact, without a (deserved) allowance for his work with inherited runners, his RA was higher than the league average (for all pitchers, not just relievers). He was 35/42 in save situations, which is not terrible but nothing to write home about, and his WPA was -.45. A performance like that coupled with a Mets team in contention would have been a made-to-order storyline.

* David Hernandez easily had the worst dRA among AL starters--7.17, with the next highest belonging to Trevor Cahill (6.05).

* Zack Greinke's +91 RAR is the best in the majors in several years. I have my own spreadsheets going back to 2003, and it is the highest in that time. I didn't do a thorough check of 2002, but I'm pretty sure it's the highest RAR since Randy Johnson, 2001 (+92). This was not an ordinary, run-of-the-mill Cy Young type season; it was a top-season-of-the-decade contender type season.

* Here is a list of combined RAR for each team's top two starting pitchers (only teams with two +30 pitchers included, and only those that spent a full season with the team):



No real point here; you already knew Carpenter/Wainwright and Lincecum/Cain were really good. If I wanted to push it, I'd talk about how little the top teams in this regard accomplished in the playoffs...

* Before the season I picked Ubaldo Jimenez to win the NL Cy Young. I don't take those awards predictions very seriously, and I don't expect readers to either; I picked Ubaldo because 1) I genuinely thought he was sitting on a big year and 2) he pitched in the WBC, and I wanted to thumb my nose at the "WBC ruins pitchers" hypothesis--which, incidentally, I didn't hear nearly as much of this year as in 2006. Jimenez didn't crack my top five for the Cy Young, but he was one of the top ten pitchers in the NL this year, and I really enjoy watching him pitch.

* Ricky Nolasco had a strange season; you don't need me to tell you this, but I'll do it anyway. His peripherals were strong: 3.88 dRA and 9.5 KG, but unfortunately that big K performance against Atlanta in the last week will severely damage his sleeper prospects for 2010.

* Livan Hernandez had his typical innings-eating, flirting with replacement level type season. Yet he managed to toss 58% quality starts. The NL average was 50%, and only two other pitchers (Nolasco and Derek Lowe) were better than league average with RAs over 5.

* Nick Swisher drew 97 walks, second in the AL; his W/AB ratio was .195, first in the AL; his .250 ISO was eighth in the AL; and thus his .447 SEC was second in the AL. This all makes me very proud.

* As I mention in my MVP ballot post, this was another down year for AL position players, with the obvious exception of Joe Mauer. Of course lots of players had good years, but there were just three players over 60 RAR, compared to six in 2006, eight in 2006, and five in 2005.

* I am not one who is generally in the habit of urging athletes to retire. As long as someone is willing to employ them, and they want to do it, what harm is it to me? All the hand-wringing about "legacy" fails to impress me, as I'm aware of very few examples of players whose images have been permanently tarnished by late career ineffectiveness. Most of the old athletes with tarnished legacies did it to themselves through their post-career off-field lives (see OJ and Pete Rose), not by hanging on too long at the end.

With that being said, it really does seem to be time for Ken Griffey to give it up. His 4.8 RG was average, but then you consider that he's a DH, and he really was not very far above replacement level. Last year was about the same when you factor in his dreadful fielding performance. I don't find it depressing or anything, but that level of performance (particularly with a $2 MM price tag) is not helpful.

* There was much wailing and gnashing of teeth among the talk radio type of Indian fans when Ryan Garko was dealt. A first baseman with a HRAA of zero, who was -2 in 2008 and +11 in 2007.

* The NL continues to have the upper hand at first base versus the AL. NL first basemen ranked first, third, fourth, tenth, eleventh, twelfth, fourteenth, and nineteenth in the league in RAR (four of the top ten and eight of the top twenty). AL first basemen managed fifth, seventh, and eighteenth, and only one DH chipped in (thirteenth).

There is nothing in the way I figure RAR that discriminates against AL first basemen. The NL first basemen have simply produced more runs over the last few seasons than their AL counterparts.

* David Ortiz managed 5.1 RG and +14 RAR this year; Travis Hafner was at 5.9 and +16. Just three years ago, those two ran two-three in the AL, each over 70 RAR. As career DHs with big contracts in their early-to-mid thirties (and fun nicknames that start with p), they make an obvious pairing. Hafner hit better this season, but Ortiz was better in 2008 and Hafner's shoulder is a recurring issue. I wouldn't want either of their contracts, but I think I'd rather have Ortiz going forward on the field--but it's close.

* Mark Reynolds shattered his own strikeout record with 223. I have to believe that this is pretty close to the upper limit on this record, at least for the time being. In saying so, I realize full well that I may look like a moron by this time next year. It is easy to go through the archives of baseball punditry and find statements that something will never happen again, only to have it happen in short order. Personally, I find the ever-present "Will X be the last pitcher to win 300 games?" articles insufferable.

But when you top the previous record by nineteen, while having a 40 HR season, I don't see a lot of room for record extension. Even with 44 homers, Reynolds "only" created 105 runs; only Jay Bruce had a higher HR:RC ratio among NL players. His batting average when he made contact was .423, which led the NL; his slugging average on contact was .885 (Ryan Howard was next at .815). That level of production is probably not sustainable, and if it falls, he'd probably lose some playing time.

I am not saying that I think Reynolds is going to crash and burn--I wouldn't expect him to replicate 2009, but I don't think he's going to fall off a cliff. I just don't think he'll continue to strike out at the same rate and still play full-time. As Bill James pointed out in one of his Gold Mines, the trend with the 200 strikeout barrier has been for young hitters to challenge it in some of their first few seasons, then improve/refine their approach and stop striking out so much. Perhaps Reynolds is an anomaly. Time will tell. (Quick: count the clichés in this post!)

* It's tough to pass up opportunities to poke fun at the Reds, and in that regard Willy Taveras' season was too good to be true. His .267 OBA was second worst in the NL and he was the only NL player to slug under .300 (.279). His .087 secondary average was easily the worst in baseball--Cesar Izturis was next at .119. It was tough to imagine the Reds failing to upgrade their center field situation, but Patterson had turned in -25 RAA/-10 RAR...Taveras -24/-9.

* Are we going to have to start a "Free Chris Iannetta" movement? Iannetta may have hit just .220, but with a .364 SEC he still created 4.8 runs per game. This came on the heels of a 6 RG season, and he's just 26. Admittedly, he struggled in July and August, but is that really a good reason to bench him for Yorvit Torrealba?

* Kansas City boasted four of the bottom thirteen AL hitters in terms of RAR (all four had <= 0 RAR). These four combined for 1,730 PA, creating 172 runs whilst making 1,231 outs. They had a combined RG of 3.6, -76 RAA, and -9 RAR.

In fairness, that includes Yuniesky Betancourt's performance in Seattle--the Royals themselves "only" invested 1,496 PA between the four. The other three were Willie Bloomquist, Jose Guillen, and Mike Jacobs. What is really sad about this is that all of them were recent acquisitions from outside the organization: Betancourt in a mid-season trade, Bloomquist and Guillen as free agents, and Jacobs in an off-season trade. Their 2009 salaries totaled nearly $19M. Good work, Dayton.
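The combined RG figure for the four Royals follows from their runs created and outs. The sketch below assumes RG means runs created per game's worth of outs, with 25.5 outs per game; that constant and the function name `runs_per_game` are assumptions for illustration, not something stated in this post.

```python
OUTS_PER_GAME = 25.5  # assumed constant; the post does not state the value used


def runs_per_game(runs_created, outs, outs_per_game=OUTS_PER_GAME):
    """Runs created per game's worth of batting outs."""
    return runs_created * outs_per_game / outs


# Combined line for the four KC hitters, per the text.
kc_four_rg = runs_per_game(runs_created=172, outs=1231)
print(f"{kc_four_rg:.1f}")  # 3.6
```

Under that assumption the quoted 172 runs and 1,231 outs reproduce the 3.6 RG cited above.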

Sunday, October 25, 2009

Disjointed Ramblings on the Indians' Managerial Vacancy

NOTE: I wrote this on Thursday and didn't expect the Indians to hire Acta over the weekend.

While the Indians have been searching for their next manager, it has been amusing to observe the reaction of non-analytical fans on message boards and talk radio. There are a large number of people who are furious at the prospect of Manny Acta becoming manager.

Let me digress for a moment by saying that I hope he gets the job. From everything I've read and heard from him, his outlook on the game is one that I can relate to. He says the right things about being open to analytics and his managing seems to reflect that. His bullpen usage seems to this distant observer to fall into the over-managing category, but I have to question how much of that was conviction and how much of that was trying to squeeze every possible advantage out of a bunch of lemons. In any event, I'm thoroughly unconcerned about his win-loss record in Washington, a franchise that was a basket case before he got there and maybe now with a new GM can finally right itself. (Acta bonus fact: He's the David Aardsma or Hank Aaron of big league managers--first all-time alphabetically.)

I say all of that, but if you asked me whether it was more likely, should Acta become Tribe skipper, that he would be considered a success or a failure when his tenure was over, I wouldn't hesitate: failure. It's a cliché, but it's a cliché with a lot of truth: managers are hired to be fired. Most of them get three or four years to turn around a team that was usually already in some sort of distress (or else they wouldn't have been in the market for a new manager at all) and fail to do so, often through no fault of their own.

I don't want to make it sound as if I think managers are unimportant--I certainly think they are less important than a lot of non-analytical observers believe they are, but I also am much more concerned about the identity of the GM and whether anyone can hit, pitch, and field. I do believe, however, that most of what really separates managers from one another are factors that we as outsiders cannot judge with any sort of accuracy--discipline, motivation, the makeup of their coaching staff, how well they interface with the GM, and the like. Those things may not turn the Royals into World Series contenders, but I believe they matter more than the usually small tactical differences between managers (there are exceptions of course, many of whom do not need to be named).

The amusing part is the ways that fans attempt to evaluate managers. The following is an incomplete listing of some of the criteria I see fans using:

1. Tactics: Of course, this is where your baseball worldview really comes into play. One man's genius is another man's moron on the tactical scale. While sabermetrics certainly has some insight to offer on this front, it's not as if you can just plug some variables into a formula and get a strategic rating.

2. Past success: Fans like it better when the prospective manager has won something. However...

3. Freshness: Other fans don't want a "retread" manager. Of course, there is no definition of what constitutes a retread versus a Proven Veteran (TM) manager. Bobby Valentine managed parts of fifteen seasons, compiling a .510 W%, two playoff appearances, and a pennant. Does that make him a proven winner, a proven mediocrity, a winner, a loser, or something else? Does his tenure in Japan count for anything?

4. Media image

These criteria often result in a bewildering mix of contradictory preferences. With the Phillies winning another pennant, there are now Tribe fans bemoaning that Charlie Manuel was once our manager. But how many of these folks were upset that he was fired? How many of them believed that he was a country bumpkin? How many of them really, honestly believe that he would have led the Indians to victory with the same players Eric Wedge was given, or that Wedge would have flopped with Chase Utley and Jimmy Rollins on his team?

My opinion of Charlie Manuel today is the same as it was the day he was fired by Cleveland: Nice guy. Presumably knows a lot about hitting. Makes a lot of inexplicable decisions while managing.

Since I think it's a pretty decent bet that Eric Wedge will be a manager again, I can't wait to see what will happen if he ever leads a team to a pennant. Near the end of his tenure, it was hard to find many Indian fans who had anything positive at all to say about the man (other than perhaps that he had class). I've written some tepid pro-Wedge stuff over the past year and only because no one reads this blog was I able to avoid being labeled as an apologist. Should he win, he will join Manuel as a tool with which to attack the organization--rather than as the cautionary tale about judging a manager on his record in one stop.

Anyway, to sum up my position:

1. Managers matter, but not as much as the average fan thinks they do.
2. Much of what distinguishes managers from one another is almost unknowable to outsiders.
3. I prefer a manager who is open to analysis and/or independently came to a similar view of baseball as the one I possess.
4. It's silly to think that because a manager didn't win during one job, he'll never win in another.
5. It's more likely that Manny Acta will be unceremoniously fired than that he will lead the Indians to a World Series. That doesn't mean he's a bad hire--I'd say that about anyone stepping into this position.

To really beat the dead horse that is the fourth point, try a thought experiment. Write down the names of 5-10 current managers that you think you'd like to have managing your team. It's a pretty decent bet that a lot of your picks have been fired at some point.

Suppose you'd chosen the eight managers who managed in the postseason this year:

Ron Gardenhire, MIN--first managerial position
Joe Girardi, NYA--fired by Florida, although not really for on-field performance
Mike Scioscia, LAA--first managerial position
Terry Francona, BOS--fired by PHI (285-363, .440)
Tony LaRussa, STL--fired by CHA (522-510, .506)
Joe Torre, LA--fired by ATL, NYN, STL (894-1003, .471), not extended by NYA
Charlie Manuel, PHI--fired/not extended by CLE (220-190, .537)
Jim Tracy, COL--fired by LA and PIT (562-572, .496)
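The winning percentages in that list are just wins divided by decisions. A quick sketch that recomputes them from the win-loss records quoted above (the dictionary keys are shorthand labels of mine):

```python
def win_pct(wins, losses):
    """Winning percentage from a win-loss record (ties ignored)."""
    return wins / (wins + losses)


# Records with the teams that let each manager go, as listed above.
prior_records = {
    "Francona (PHI)": (285, 363),
    "LaRussa (CHA)": (522, 510),
    "Torre (ATL, NYN, STL)": (894, 1003),
    "Manuel (CLE)": (220, 190),
    "Tracy (LA, PIT)": (562, 572),
}

formatted = {name: f"{win_pct(w, l):.3f}" for name, (w, l) in prior_records.items()}
for name, pct in formatted.items():
    print(name, pct)
```

Three of the five had losing records at the stops where they were let go, which is the point of the exercise: a poor record in one job says little about the next one.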

Tuesday, October 20, 2009

IBA Ballot: Rookie of the Year

Disclaimer: Presented below is my ballot (and some justification) for one of the categories in the Internet Baseball Awards hosted at Baseball Prospectus. I’m just one person, and the whole point of having a vote like the IBA is to get a wide variety of (intelligent) perspectives, and so I will not feel in the least bit slighted if you don’t give a flip about this. You've been warned. Also, the RAA and RAR figures that will be cited are my own estimates, detailed here. Any Leverage Index, WPA, or UZR figures cited are from FanGraphs; any quality of opposition or baserunning figures are from Baseball Prospectus.

In the American League, there aren't many viable position player candidates for the award. The top ranking rookie in RAR is Nolan Reimold, but at +21 he doesn't rank very highly, and his fielding knocks him out of contention. Elvis Andrus moves up to +30 if UZR is taken at face value, which gets him on the ballot but not at the top. Gordon Beckham is second among position players at around +20. Had he played more, he would definitely be a contender.

That leaves the pitchers. Jeff Niemann, Rick Porcello, Ricky Romero, and Brad Bergesen all are at +30 or better. All of them have less impressive peripherals than actual run averages, which leaves me without a lot of justification for changing the RAR ordering listed above. (Certainly the differences are small enough that you could justify making changes, so I don't mean to imply that you must rank them in that order. I just don't see anything that makes me want to do so.)

Some people may give Porcello a boost due to his age, but I do not take age into account when looking at Rookie of the Year. If it was an award for best future potential, then I would be looking at a very different set of candidates.

One name missing from my ballot that you'll surely see on many others is Brett Anderson. If you are of the school of thought that pitcher seasonal awards should be determined by DIPS school metrics, then Anderson is the best of the lot. If you aren't, you'll note that his RA is nearly half a run higher than that of Romero, who has the highest of the four pitcher cohort defined above.

None of those players are my selection for the top spot, though. It is usually very difficult for a relief pitcher to crack one of my Cy Young ballots, and you can just about forget about MVP support entirely. However, I think that I tend to support more relievers for ROY than the mainstream media does. I believe this is because of my philosophy that the ROY should be for the rookie who provides the most value and that age is not a factor. Often rookie relievers are on the old side (Brad Ziegler last year), and even if they are young it would usually be a stretch to project them as having greater future value than a starter or a position player. Some people also may tend to write off the extra value relievers generate by pitching in high leverage situations in a ROY discussion because they are looking for the best performance. Since I don't consider age and do consider leverage, I've reserved spots on my ballots for Ziegler (2008), Okajima (2007), Papelbon, Zumaya, and Saito (2006), and Street and Majewski (2005).

In that spirit, my choice this year is Andrew Bailey. Bailey was +31 RAR with strong peripherals (2.09 eRA and 3.09 dRA), and served as Oakland's closer for most of the season, going 26/30 in save opportunities and recording a 1.4 LI. While his LI was lower than that of many bullpen aces, it's still enough for me to edge him ahead of Niemann and into the top spot on the ballot. This is how I see it:

1) RP Andrew Bailey, OAK
2) SP Jeff Niemann, TB
3) SP Rick Porcello, DET
4) SS Elvis Andrus, TEX
5) SP Brad Bergesen, BAL

Moving on to the Neanderthal League (I promise I won't use that again when I discuss the Cy Young and MVP), I reluctantly have to support JA Happ for the award. Happ, at +49 RAR, has a ten run lead over any other NL rookie. So why am I reluctant about my choice? Simply, Happ's eRA is a full run higher than his RA and his dRA is another .7 runs higher still. I suspect that many sabermetrically-inclined folks will dock him significantly for this, but I am always very cautious of doing so. Differences between RA and DIPS can be either due to "luck" (I'm using that word as a catch-all, not literally) or superior fielding support. If it's the latter, then I'm all for adjusting it away. If it's the former, then I'm not--and it is often a little of both.

Had Tommy Hanson been in the majors longer, he may have made it all a moot point, as he was outstanding over 128 innings of work, good enough to edge Randy Wells as the second most impressive rookie starter.

Among position players, Chris Coghlan seems to be getting some mainstream support, but he doesn't crack my ballot. Yes, he was +37 RAR, tying him with Andrew McCutchen, but his -10 UZR puts a big dent in that. Comparing Coghlan to McCutchen offensively, Coghlan hit .323 to McCutchen's .288, but McCutchen's edge in secondary average was .311 to .246. Put it all together, and they each created 6.3 runs per game. Coghlan played more, but he was a bad left fielder and McCutchen was an average center fielder according to UZR. It's not a particularly tough call for me.

Comparing McCutchen to his out of nowhere teammate Garrett Jones, Jones was certainly a better hitter on a rate basis, but he played in 26 fewer games, came to the plate 130 fewer times, and had a -6 UZR in right.

So I have it:

1) SP JA Happ, PHI
2) CF Andrew McCutchen, PIT
3) SP Tommy Hanson, ATL
4) SP Randy Wells, CHN
5) RF Garrett Jones, PIT

Thursday, October 08, 2009

End of Season Statistics, 2009

Note: This is largely the same explanation as for the last two years.

For the past several years I have been posting Excel spreadsheets with sabermetric stats like RC for regular players on my website. I have not been doing this because I think it is a unique thing that nobody else does--Hardball Times, Baseball Prospectus, and other sites have similar data available. However, since I figure my own stats for myself anyway, I figured I might as well post it on the net.

This year, I am not putting out Excel spreadsheets, but I will have Google Spreadsheets that I will link to from this blog.  If you would prefer an Excel copy of the spreadsheets, all you have to do is change the end of the link from "=html" to "=xls".  What I wanted to do here is a quick run down of the methodology used. These will be added as they are completed; as I post this, there are none, but by the end of the week they should start popping up.

First, I should acknowledge that the primary data source is Doug’s Stats, and that park data for past seasons comes from KJOK’s park database. Baseball-Reference.com and ESPN.com round out the sources.

The general philosophy of these stats is to do what is easiest while not being too imprecise, unless you can do something just a little bit more complex and be more precise. Or at least it used to be. Then I decided to put my money where my mouth was on the matter of Base Runs for pitchers and teams and Pythagenpat. On the other hand, using ERP as the run estimator is not optimal--I could, in lieu of having empirical linear weights for 2009, use Base Runs or another approach to generate custom linear weights. I have decided that does not constitute a worthwhile improvement. Others might disagree, and that’s alright. I’m not claiming that any of these numbers are the state of the art or cannot be improved upon.

First, the team report. I list Park Factor (PF), Winning %, Expected Winning % (EW%), Predicted Winning % (PW%), Wins, Losses, Runs, Runs Allowed, Runs Created (RC), Runs Created Allowed (RCA), Runs/Game (R/G), Runs Allowed/Game (RA/G), Runs Created per Game (RCG), and Runs Created Allowed per Game (RCAG):

EW% is based on runs and runs allowed in Pythagenpat, with the exponent = RPG^.29. PW% is based on runs created and runs created allowed in Pythagenpat.
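
For anyone who wants to check the EW% column by hand, here is a minimal Python sketch of Pythagenpat. It assumes the usual form W% = R^x/(R^x + RA^x) and takes RPG to mean runs scored plus runs allowed per game; the function and argument names are made up for illustration.

```python
def pythagenpat(runs, runs_allowed, games, z=0.29):
    """Expected W% with a floating exponent x = RPG^z.

    RPG is assumed to be combined runs per game (both teams).
    """
    rpg = (runs + runs_allowed) / games
    x = rpg ** z
    return runs ** x / (runs ** x + runs_allowed ** x)


# Hypothetical team: 800 runs scored, 700 allowed in 162 games.
ew_pct = pythagenpat(800, 700, 162)
```

Feeding in RC and RCA instead of actual runs and runs allowed gives the PW% flavor of the same calculation.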

Runs Created and Runs Created Allowed are both based on a simple Base Runs formula. For the offense, the formula is:
A = H + W - HR - CS
B = (2TB - H - 4HR + .05W + 1.5SB)*.76
C = AB - H
D = HR
For the defense:
A = H + W - HR
B = (2TB - H - 4HR + .05W)*.78
C = AB - H (approximated as IP*2.82, or whatever the league (AB-H)/IP average is)
D = HR
Of course, these are both put together, like all BsR, as A*B/(B + C) + D. The only difference between the formulas is that I include SB and CS for the offense, but don’t want to waste time scrounging up stolen bases allowed for the defense.
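
The two versions can be sketched in Python as follows. This is only an illustration of the formulas as written above; the function names are invented, and the 2.82 default for league (AB-H)/IP is just the placeholder figure quoted above, not a measured value for any particular league.

```python
def base_runs(a, b, c, d):
    """Generic Base Runs: A*B/(B + C) + D."""
    return a * b / (b + c) + d


def offense_bsr(ab, h, tb, hr, w, sb, cs):
    """Team offense version, including SB and CS."""
    a = h + w - hr - cs
    b = (2 * tb - h - 4 * hr + 0.05 * w + 1.5 * sb) * 0.76
    c = ab - h
    d = hr
    return base_runs(a, b, c, d)


def defense_bsr(ip, h, tb, hr, w, lg_outs_per_ip=2.82):
    """Team defense version; C is approximated from innings pitched."""
    a = h + w - hr
    b = (2 * tb - h - 4 * hr + 0.05 * w) * 0.78
    c = ip * lg_outs_per_ip
    d = hr
    return base_runs(a, b, c, d)


# Hypothetical team totals, for illustration only.
rc = offense_bsr(ab=5500, h=1450, tb=2300, hr=170, w=550, sb=100, cs=40)
rca = defense_bsr(ip=1450, h=1450, tb=2300, hr=170, w=550)
```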

R/G, RA/G, RCG, and RCAG are all calculated straightforwardly by dividing by games, then park adjusted by dividing by park factor. Ideally, I would use outs as the denominator, but for teams, outs and games are so closely related that I don’t think it’s worth the extra effort.

Next, we have park factors. I have explained the methodology used to figure the PFs before, but the cliff’s notes version is that they are based on five years of data when applicable, include both runs scored and allowed, and they are regressed towards average (PF = 1), with the amount of regression varying based on the number of years of data used. There are factors for both runs and home runs. The initial PF (unshown) is:
iPF = (H*T/(R*(T - 1) + H) + 1)/2
where H = RPG in home games, R = RPG in road games, T = # teams in league (14 for AL and 16 for NL). Then the iPF is converted to the PF by taking 1 - (1 - iPF)*x, where x = .6 if one year of data is used, .7 for 2, .8 for 3, and .9 for 4+.
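
As a rough Python sketch of those two steps (names are illustrative, and the five-year data gathering that produces the home and road RPG is left out):

```python
def initial_pf(home_rpg, road_rpg, teams):
    """iPF = (H*T/(R*(T - 1) + H) + 1)/2."""
    return (home_rpg * teams / (road_rpg * (teams - 1) + home_rpg) + 1) / 2


def park_factor(home_rpg, road_rpg, teams, years):
    """Regress iPF toward 1 based on the years of data behind it."""
    if years < 1:
        raise ValueError("need at least one year of data")
    # x = .6 for one year, .7 for two, .8 for three, .9 for four or more
    x = {1: 0.6, 2: 0.7, 3: 0.8}.get(years, 0.9)
    ipf = initial_pf(home_rpg, road_rpg, teams)
    return 1 - (1 - ipf) * x


# Hypothetical AL park: 10 RPG at home, 9 RPG on the road, five years of data.
pf = park_factor(10.0, 9.0, teams=14, years=5)
```

A park with identical home and road RPG comes out at exactly 1 regardless of the regression amount.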

It is important to note, since there always seems to be confusion about this, that these park factors already incorporate the fact that the average player plays 50% on the road and 50% at home. That is what the adding one and dividing by 2 in the iPF is all about. So if I list Fenway Park with a 1.02 PF, that means that it actually increases RPG by 4%.

In the calculation of the PFs, I did not get picky and take out “home” games that were actually at neutral sites, like the Astros/Cubs series that was moved to Milwaukee in 2008.

I also offer a league report, for which some explanation is necessary. The advent of interleague play has created two different sets of league totals--one for the offense of league teams and one for the defense of league teams. Before interleague play, these two were identical. I do not present both sets of totals (you can figure the defensive ones yourself from the team spreadsheet, if you desire), just those for the offenses. The exception is for the defense-specific statistics, like innings pitched and quality starts. The figures for those categories in the league report are for the defenses of the league's teams.

Next is the relief pitchers report. I defined a starting pitcher as one with 15 or more starts. All other pitchers are eligible to be included as a reliever. If a pitcher has 40 appearances, then they are included. Additionally, if a pitcher has 50 innings and less than 50% of his appearances are starts, he is also included as a reliever (this allows some swingmen type pitchers who wouldn’t meet either the minimum start or appearance standards to get in).
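
Those cutoffs amount to a small decision rule, sketched here in Python. The function name is hypothetical, and the sketch only encodes the thresholds stated above; any other minimums for the starter report are not covered.

```python
def pitcher_report(games, starts, innings):
    """Return "starter", "reliever", or None under the stated cutoffs."""
    if starts >= 15:
        return "starter"
    if games >= 40:
        return "reliever"
    # Swingman rule: 50+ innings with fewer than half of appearances as starts
    if innings >= 50 and games > 0 and starts / games < 0.5:
        return "reliever"
    return None
```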

For all of the player reports, ages are based on simply subtracting their year of birth from 2009. I realize that this is not compatible with how ages are usually listed and so “Age 27” doesn’t necessarily correspond to age 27 as I list it, but it makes everything a heckuva lot easier, and I am more interested in comparing the ages of the players to their contemporaries, for which case it makes very little difference.

Anyway, for relievers, the statistical categories are Games, Innings Pitched, Run Average (RA), Relief Run Average (RRA), Earned Run Average (ERA), Estimated Run Average (eRA), DIPS-style estimated Run Average (dRA), Guess-Future (G-F), Strike Zone ERA (szERA), Inherited Runners per Game (IR/G), Inherited Runs Saved (IRSV), hits per ball in play (%H), Runs Above Average (RAA), and Runs Above Replacement (RAR).

All of the run averages are park adjusted with the exception of szERA. RA is R*9/IP, and you know ERA. Relief Run Average subtracts IRSV from runs allowed, and thus is (R - IRSV)*9/IP; it was published in By the Numbers by Sky Andrecheck. eRA, dRA, %H, and RAA will be explained in the starters section.

Guess-Future is a JUNK STAT. G-F is A JUNK STAT. I just wanted to make that clear so that no anonymous commentator posts that without any explanation. It is just something that I have used for some time that combines eRA and strikeout rate into a unitless number. As a rule of thumb, anything under 4 is pretty good. I include it not because I think it is meaningful, but because it is a number that I have been looking at for some time and still like to, despite the fact that it is a JUNK STAT. JUNK STATS can be fun as long as you recognize them for what they are. G-F = 4.46 + .095(eRA) - .113(KG), where KG is strikeouts per 9 innings. JUNK STAT JUNK STAT JUNK STAT JUNK STAT JUNK STAT

Inherited Runners per Game is per relief appearance (G - GS); it is an interesting thing to look at, I think, in lieu of actual leverage data. You can see which closers come in with runners on base, and which are used nearly exclusively to start innings. Of course, you can’t infer too much; there are bad relievers who come in with a lot of people on base, not because they are being used in high leverage situations, but because they are long men or what have you. I think it’s mildly interesting, so I include it.

Inherited Runs Saved is the number of inherited runners an average reliever would have allowed to score, given the same number of inherited runners, minus the number of inherited runners the reliever actually allowed to score. I do not park adjust this figure. Of course, the way I am doing it is without regard to which base the runners were on, which of course is a very important thing to know. Obviously, a lot of these reliever measures are superfluous if you have access to WPA and LI data and the like.

IRSV = Inherited Runners*(1 - League % Stranded) - Inherited Runs Scored

Runs Above Replacement is a comparison of the pitcher to a replacement level reliever, which is assumed to be a .450 pitcher, or as I would prefer to say, one who allows runs at 111% of the league average. So the formula is (1.11*N - RRA)*IP/9, where N is league runs/game. Runs Above Average is simply (N - RRA)*IP/9. Note that RAR compares the reliever to a replacement-level pitcher, while RAA compares him to an average pitcher regardless of role, not to an average relief pitcher.
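
Here is a minimal Python sketch of the reliever value formulas, with made-up function names. It takes IRSV as an input and skips the park adjustment for brevity, so treat it as an illustration of the arithmetic rather than a reproduction of the spreadsheet.

```python
def relief_run_average(runs, irsv, ip):
    """RRA = (R - IRSV)*9/IP."""
    return (runs - irsv) * 9 / ip


def reliever_raa(rra, ip, lg_ra):
    """Runs above an average pitcher: (N - RRA)*IP/9."""
    return (lg_ra - rra) * ip / 9


def reliever_rar(rra, ip, lg_ra, repl=1.11):
    """Runs above a replacement reliever: (1.11*N - RRA)*IP/9."""
    return (repl * lg_ra - rra) * ip / 9


# Hypothetical reliever: 70 IP, 20 runs allowed, 2 inherited runs saved,
# in a league averaging 4.5 runs per game.
rra = relief_run_average(runs=20, irsv=2, ip=70)
raa = reliever_raa(rra, ip=70, lg_ra=4.5)
rar = reliever_rar(rra, ip=70, lg_ra=4.5)
```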

On to the starting pitchers. The categories are Wins, Losses, Innings Pitched, Run Average, ERA, eRA, dRA, KG, szERA, G-F, %H, Neutral W% (NW%), Quality Start% (QS%), RAA, and RAR.

The run averages (RA, ERA, eRA, dRA) are all park-adjusted, simply by dividing by park factor; the exception is szERA, which is not adjusted.

eRA is figured by plugging the pitcher’s stats into the Base Runs formula above (the one not including SB and CS that is used for estimating team runs allowed), multiplying the estimated runs by nine and dividing by innings.

dRA is a DIPS method (which of course means that Voros McCracken is the true developer), using Base Runs as the run estimator. This is overkill, since a DIPS estimator like FIP will work just fine, but I decided to use Base Runs wherever I could this year. To find it, first estimate PA as IP*x + H + W, where x = Lg(AB-H)/IP. Then, find %K (K/PA), %W (W/PA), %HR (HR/PA), and BIP% = 1 - %K - %W - %HR. Next, find estimated %H as BIP%*Lg%H. (I will just call this %H for the sake of this explanation, but it is not the same as the %H displayed in the stats; that is the pitcher’s actual rate, (H-HR)/(estimated PA-W-K-HR).)


Then you use BsR to find the new estimated RA:

A = %H + %W
B = (2*(%H*Lg(TB-4*HR)/(H-HR) + 4*%HR) - %H - 5*%HR + .05*%W)*.78
C = 1 - %H - %W - %HR
D = %HR

dRA = (A*B/(B+C) + D)/C*25.2/PF
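
Putting the whole dRA recipe in one place, here is a Python sketch. The argument names are invented: `lg_outs_per_ip` is x = Lg(AB-H)/IP, `lg_hit_rate` is Lg%H, and `lg_tb_per_hit` stands for the league Lg(TB-4*HR)/(H-HR) term. The league values in the example are placeholders, not actual 2009 figures.

```python
def dra(ip, h, w, k, hr, lg_outs_per_ip, lg_hit_rate, lg_tb_per_hit, pf=1.0):
    """DIPS-style run average using Base Runs on per-PA rates."""
    pa = ip * lg_outs_per_ip + h + w          # estimated plate appearances
    pct_k, pct_w, pct_hr = k / pa, w / pa, hr / pa
    bip = 1 - pct_k - pct_w - pct_hr          # balls in play per PA
    pct_h = bip * lg_hit_rate                 # estimated non-HR hits per PA
    a = pct_h + pct_w
    b = (2 * (pct_h * lg_tb_per_hit + 4 * pct_hr)
         - pct_h - 5 * pct_hr + 0.05 * pct_w) * 0.78
    c = 1 - pct_h - pct_w - pct_hr            # outs per PA
    d = pct_hr
    return (a * b / (b + c) + d) / c * 25.2 / pf


# Hypothetical starter with placeholder league constants.
example = dra(ip=200, h=190, w=60, k=180, hr=20,
              lg_outs_per_ip=2.82, lg_hit_rate=0.30, lg_tb_per_hit=1.28)
```

Note that the pitcher's actual hits only enter through the PA estimate; the hit rate on balls in play is replaced by the league figure, which is the whole point of the exercise.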

Yes, it's true that pitchers do have some control over their BABIP, and presenting a DIPS run average here is in no way intended to deny that fact. Even if there was no insight to DIPS whatsoever, though, I still think that a DIPS run average would be an interesting freak show statistic, as it only considers the three true outcomes. To restate my point, even if Voros' insight has no analytical utility (and I don't think anyone worth listening to has staked out such an extreme position), it would still be worth some kicks to ignore defense-influenced events.

szERA is a Tango Tiger creation which uses only the difference between strikeouts and walks per PA to estimate ERA. I have not used actual PA here but instead have estimated PA as (IP*x + H + W) as in dRA above, giving this formula for szERA (which is not park-adjusted):

szERA = 5.4 - 12*(K-W)/(IP*x + H + W)

Neutral Winning Percentage is the pitcher’s winning percentage adjusted for the quality of his team. It makes the assumption that all teams are perfectly balanced between offense and defense, and then projects what the pitcher’s W% would be on an average team. I do not place a lot of faith in anything based on wins and losses, of course, and particularly not for a one-year sample. In the long run, we would expect pitchers to pitch for fairly balanced teams and for run support for an individual to be approximately the same as for the pitching staff as a whole. For individual seasons, we know that things are not going to even out.

I used to use Run Support to compare a pitcher’s W% to what he would have been expected to earn, but now I have decided that is more trouble than it is worth. RS can be a pain to run down, and I don’t put a lot of stock in the resulting figures anyway. So why bother? NW% = W% - (Mate + .5)/2 + .5, where Mate is (Team Wins - Pitcher Wins)/(Team Decisions - Pitcher Decisions). With this form, a pitcher whose teammates play exactly .500 ball has an NW% equal to his actual W%.

Likewise, I include Quality Start Percentage (which of course is just QS/GS) only because my data source (Doug’s Stats) includes them. As for RAA and RAR for starters, RAA = (N - RA)*IP/9, and RAR = (1.25*N - RA)*IP/9.

For hitters with 300 or more PA, I list Games (G), Plate Appearances (PA), Outs (O), Batting Average (BA), On Base Average (OBA), Slugging Average (SLG), Runs Created (RC), Runs Created per Game (RG), Secondary Average (SEC), Speed Unit (SU), Hitting Runs Above Average (HRAA), Runs Above Average (RAA), Hitting Runs Above Replacement (HRAR), and Runs Above Replacement (RAR).

I do not bother to include hit batters, so take note of that for players who do get plunked a lot. Therefore, PA are simply AB + W. Outs are AB - H + CS. BA and SLG you know, but remember that without HB and SF, OBA is just (H + W)/(AB + W). Secondary Average = (TB - H + W)/AB. I have not included net steals as many people (and Bill James himself) do--it is solely hitting events.

For the last two seasons, the park adjustment method I’ve used for BA, OBA, SLG, and SEC has been based on the same principle as the “Willie Davis method” introduced by Bill James in the New Historical Baseball Abstract. The idea is to deflate all of the positive offensive events by a constant percentage in order to make the new runs created estimate from those stats equal to the park adjusted runs created we get from the player’s actual stats. I based it on the run estimator (ERP) that I use here instead of RC.

However, this year I have decided that this is really not necessary. One can obtain similar results by just using the square root of park factor, and while the Willie Davis method is clever and elegant, it's still an approximation that has its accuracy constrained by the accuracy of the run estimator itself. The square root adjustment is much quicker and again, the results will be similar.

Next up is Runs Created, which as previously mentioned is actually Paul Johnson’s ERP. Ideally, I would use a custom linear weights formula for the given league, but ERP is just so darn simple and close to the mark that it’s hard to pass up. I still use the term “RC” partially as a homage to Bill James (seriously, I really like and respect him even if I’ve said negative things about RC and Win Shares), and also because it is just a good term. I like the thought put in your head when you hear “creating” a run better than “producing”, “manufacturing”, “generating”, etc. to say nothing of names like “equivalent” or “extrapolated” runs. None of that is said to put down the creators of those methods--there just aren’t a lot of good, unique names available. Anyway, RC = (TB + .8H + W + .7SB - CS - .3AB)*.322.

RC is park adjusted by dividing by PF, making all of the value stats that follow park adjusted as well. RG, the rate, is RC/O*25.5. I do not believe that outs are the proper denominator for an individual rate stat, but I also do not believe that the distortions caused are that bad. (I still intend to finish my rate stat series and discuss all of the options in excruciating detail, but alas you’ll have to take my word for it now).
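
For reference, the RC and RG calculations look like this as a Python sketch (function names are made up; outs are AB - H + CS as defined earlier):

```python
def runs_created(ab, h, tb, w, sb, cs, pf=1.0):
    """ERP-based RC, park adjusted by dividing by PF."""
    return (tb + 0.8 * h + w + 0.7 * sb - cs - 0.3 * ab) * 0.322 / pf


def batting_outs(ab, h, cs):
    """Outs = AB - H + CS."""
    return ab - h + cs


def runs_per_game(rc, outs):
    """RG = RC/O*25.5."""
    return rc / outs * 25.5


# Hypothetical hitter in a neutral park.
rc = runs_created(ab=550, h=165, tb=280, w=70, sb=10, cs=4)
outs = batting_outs(ab=550, h=165, cs=4)
rg = runs_per_game(rc, outs)
```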

Speed Unit is my own take on a “speed skill” estimator ala Speed Score. I AM NOT CLAIMING THAT IT IS BETTER THAN SPEED SCORE. I don’t use Speed Score because I always like to make up my own crap whenever possible (while of course recognizing that others did it first and better), because some of the categories aren’t readily available, and because I don’t want to mess with square roots. Anyway, it considers four categories: runs per time on base, stolen base percentage (using Bill James’ technique of adding 3 to the numerator and 7 to the denominator), stolen base frequency (steal attempts per time on base), and triples per ball in play. These are then converted to a pseudo Z-score in each category, and are on a 0-100 scale. I will not reprint the formula here, but I have written about it before here. I AM NOT CLAIMING THAT IT IS BETTER THAN SPEED SCORE. I AM NOT CLAIMING THAT IT IS AS GOOD AS SPEED SCORE.

There are a whopping four categories that compare to a baseline; two for average, two for replacement. Hitting RAA compares to a league average hitter; it is in the vein of Pete Palmer’s Batting Runs. RAA compares to an average hitter at the player’s primary position. Hitting RAR compares to a “replacement level” hitter; RAR compares to a replacement level hitter at the player’s primary position. The formulas are:

HRAA = (RG - N)*O/25.5
RAA = (RG - N*PADJ)*O/25.5
HRAR = (RG - .73*N)*O/25.5
RAR = (RG - .73*N*PADJ)*O/25.5

PADJ is the position adjustment, and it is based on 1992-2001 offensive data. For catchers it is .89; for 1B/DH, 1.19; for 2B, .93; for 3B, 1.01; for SS, .86; for LF/RF, 1.12; and for CF, 1.02.
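
The four baseline comparisons can be sketched in Python like so. The dictionary simply restates the PADJ values above; the function name and the example player are hypothetical.

```python
PADJ = {"C": 0.89, "1B": 1.19, "DH": 1.19, "2B": 0.93, "3B": 1.01,
        "SS": 0.86, "LF": 1.12, "RF": 1.12, "CF": 1.02}


def hitter_values(rg, outs, lg_rg, position, repl=0.73):
    """Return HRAA, RAA, HRAR, and RAR for a park-adjusted RG."""
    padj = PADJ[position]
    games = outs / 25.5  # outs expressed in 25.5-out games
    return {
        "HRAA": (rg - lg_rg) * games,
        "RAA": (rg - lg_rg * padj) * games,
        "HRAR": (rg - repl * lg_rg) * games,
        "RAR": (rg - repl * lg_rg * padj) * games,
    }


# Hypothetical shortstop: 6.0 RG over 408 outs in a 4.75 N league.
values = hitter_values(rg=6.0, outs=408, lg_rg=4.75, position="SS")
```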

How do I deal with players who split time between teams? I assign all of their statistics to the team with which they played more, even if this means it is across leagues. This is obviously the lazy way out; the optimal thing would be to look at the performance with the teams separately, and then sum them up.

You can stop reading now if you just want to know how the numbers were calculated. The rest of this post will be of a rambling nature and will discuss the underpinnings behind the choices I have made on matters like park adjustments, positional adjustments, run to win converters, and replacement levels.

First of all, the term “replacement level” is obnoxious, because everyone brings their preconceptions to the table about what that means, and people end up talking past each other. Unfortunately, that ship has sailed, and the term “replacement level” is not going away. Secondly, I am not really a believer in replacement level. I don’t deny that it is a valid concept, or that comparisons to replacement level can be useful for answering certain questions. I just don’t believe that replacement level is clearly the correct baseline. I also don’t believe that it’s clearly NOT the correct baseline, and since most sabermetricians use it, I go along with the crowd in this case.

The way that reads is probably too wishy-washy; I do think that it is PROBABLY the correct choice. There are few things in sabermetrics that I am 100% sure of, though, and this is certainly not one of them.

I have used distinct replacement levels for batters, starters, and relievers. For batters, it is 73% of the league RG, or since replacement levels are often discussed in these terms, a .350 W% (at least using a conventional Pythagorean exponent of two). For starters, I used 125% of the league RA or a .390 W%. For relievers, I used 111% of the league RA or a .450 W%. I am certainly not positive that any of these choices are “correct”. I do think that it is extremely important to use different replacement levels for starters and relievers; Tango Tiger's work on reliever replacement level convinced me of this (he actually uses .380, .380, .470 as his baselines). Relievers have a natural RA advantage over starters, and thus their replacements will as well.
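
The conversion between a percentage of league average and a W% can be checked with a couple of lines of Python. This sketch assumes a Pythagorean exponent of two and an otherwise average team, and the function names are made up.

```python
def batter_wpct(run_ratio, exponent=2):
    """W% for a hitter who creates run_ratio times the league average."""
    return run_ratio ** exponent / (run_ratio ** exponent + 1)


def pitcher_wpct(run_ratio, exponent=2):
    """W% for a pitcher who allows run_ratio times the league average."""
    return 1 / (1 + run_ratio ** exponent)


batter = batter_wpct(0.73)     # roughly .350
starter = pitcher_wpct(1.25)   # roughly .390
reliever = pitcher_wpct(1.11)  # roughly .450
```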

Now, park adjustments. Since I am concerned about the player’s value last season, the proper type of PF to use is definitely one based on runs. Given that, there are still two paths you can go down. One is to park adjust the player’s statistics; the other is to park adjust the league or replacement statistics when you plug in to a RAA or RAR formula. I go with the first option, because it is more useful to have adjusted RC or adjusted RA, ERA, etc. than to only have the value stats adjusted. However, given a certain assumption about the run to win converter, the two approaches are equivalent.

Speaking of those RPW: David Smyth, in his Base Wins methodology, uses RPW = RPG. If the RPG is 9.4, then there are 9.4 runs per win. It is true that if you study marginal RPW for teams, the relationship is not linear. However, if you back up from the team and consider things in league context, one can make the case that the proper approach is the simple RPW = RPG.

Given that RPW = RPG, the two park factor approaches are equivalent. Suppose that we have a player in an extreme park (PF = 1.15, approximately like Coors Field) who has an 8 RG before adjusting for park, while making 350 outs in a 4.5 N league. The first method of park adjustment, the one I use, converts his value into a neutral park, so his RG is now 8/1.15 = 6.957. We can now compare him directly to the league average:

RAA = (6.957 - 4.5)*350/25.5 = +33.72

The second method would be to adjust the league context. If N = 4.5, then the average player in this park will create 4.5*1.15 = 5.175 runs. Now, to figure RAA, we can use the unadjusted RG of 8:

RAA = (8 - 5.175)*350/25.5 = +38.77

These are not the same, as you can obviously see. The reason for this is that they are in two different contexts. The first figure is in a 9 RPG (2*4.5) context; the second figure is in a 10.35 RPG (2*4.5*1.15) context. Runs have different values in different contexts; that is why we have RPW converters. If we convert to WAA (using RPW = RPG), then we have:

WAA = 33.72/9 = +3.75
WAA = 38.77/10.35 = +3.75
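
The same example can be run through a short Python sketch (variable names are just for illustration) to confirm that the two routes agree once RPW = RPG is applied:

```python
pf, rg, outs, lg_n = 1.15, 8.0, 350, 4.5

# Method 1: park adjust the player, compare to the neutral league.
raa_player_adj = (rg / pf - lg_n) * outs / 25.5
waa_player_adj = raa_player_adj / (2 * lg_n)

# Method 2: park adjust the league context, compare the raw RG to it.
raa_context_adj = (rg - lg_n * pf) * outs / 25.5
waa_context_adj = raa_context_adj / (2 * lg_n * pf)

print(round(raa_player_adj, 2), round(raa_context_adj, 2))
print(round(waa_player_adj, 2), round(waa_context_adj, 2))
```

Algebraically, the second WAA is just the first with both numerator and denominator multiplied by PF, which is why they match exactly and not just to two decimals.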

Once you convert to wins, the two approaches are equivalent. This is another advantage for the first approach: since after park adjusting, everyone in the league is in the same context, there is no need to convert to wins at all. Sure, you can convert to wins if you want. If you want to compare to performances from other seasons and other leagues, then you need to. But if all you want to do is compare Ryan Howard to Adrian Gonzalez to Joey Votto, there is no need to convert to wins. Personally, I think that stating something as +34 is a lot nicer than stating it as +3.8, if you can get away with it. None of this is to deny that wins are the ultimate currency, but runs are directly related to wins, and so there is no difference in conclusion from using them if the RPW is the same for all players, which it is for a given league season coupled with park adjusting runs rather than context.

Finally, there is the matter of position adjustments. What I have done is apply an offensive positional adjustment to set a baseline for each player. A second baseman’s RAA will be figured by comparing his RG to 93% of the league average, while a third baseman’s will compare to 101%, etc. Replacement level is set at 73% of the estimated average for each position.

So what I am doing is comparing to a “replacement hitter at position”. As Tango Tiger has pointed out, there is really no such thing as a “replacement hitter” or a “replacement fielder”--there are just replacement players. Every player is chosen because his total value, both hitting and fielding, is sufficient to justify his inclusion on the team. Segmenting it into hitting and fielding replacements is not realistic and causes mass confusion.

That being said, using “replacement hitter at position” does not cause too many distortions. It is not theoretically correct, but it is practically powerful. For one thing, most players, even those at key defensive positions, are chosen first and foremost for their offense. Empirical work by Keith Woolner has shown that the replacement level hitting performance is about the same for every position, relative to the positional average.

The offensive positional adjustment makes the inherent assumption that the average player at each position is equally valuable. I think that this is close to being true, but it is not quite true. The ideal approach would be to use a defensive positional adjustment, since the real difference between a first baseman and a shortstop is their defensive value. When you bat, all runs count the same, whether you create them as a first baseman or as a shortstop.

Figuring what the defensive positional adjustment should be, though, is easier said than done. Therefore, I use the offensive positional adjustment. So if you want to criticize that choice, or criticize the numbers that result, be my guest. But do not claim that I am holding this up as the correct analytical structure. I am holding it up as the most simple and straightforward structure that conforms to reality reasonably well, and because while the numbers may be flawed, they are at least based on an objective formula. If you feel comfortable with some other assumptions, please feel free to ignore mine.

One other note here is that since the offensive PADJ is a proxy for average defensive value by position, ideally it would be applied by tying it to defensive playing time. I have done it by outs, though. For example, shortstops have a PADJ of .86. If we assume that an average full-time player makes 10% of his team’s outs (about 408 for a 162 game season with 25.5 O/G) and the league has a 4.75 N, the average shortstop is getting an adjustment of (1 - .86)*4.75/25.5*408 = +10.6 runs. However, I am distributing it based on player outs. If you have one shortstop who makes 350 outs and another who makes 425 outs, then the first player will be getting 9.1 runs while the second will be getting 11.1 runs, despite the fact that they may both be full-time players.
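
To make the outs-based distribution concrete, here is a small Python sketch of the implied position credit (the function name is hypothetical):

```python
def position_credit(padj, lg_rg, outs):
    """Runs of positional credit implied by PADJ over a given number of outs."""
    return (1 - padj) * lg_rg / 25.5 * outs


# Shortstops (PADJ = .86) in a 4.75 N league.
full_time = position_credit(0.86, 4.75, 408)   # about 10.6 runs
fewer_outs = position_credit(0.86, 4.75, 350)  # about 9.1 runs
more_outs = position_credit(0.86, 4.75, 425)   # about 11.1 runs
```

The two-run gap between the 350-out and 425-out shortstops comes entirely from how many outs they made at the plate, not from how much they played in the field.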

The reason I have taken this flawed path is because 1) it ties the position adjustment directly into the RAR formula rather than leaving it as something to subtract on the outside and more importantly 2) there’s no straightforward way to do it. The best would probably be to use defensive innings--set the full-time player to X defensive innings, figure how Derek Jeter’s innings compare to X, and adjust his PADJ accordingly. Games in the field or games played are dicey because they can cause distortion for defensive replacements. Plate Appearances avoid the problem that outs have of being highly related to player quality, but they still carry the illogic of basing it on offensive playing time. And of course the differences here are going to be fairly small (a few runs). That is not to say that this way is preferable, but it’s not horrible either, at least as far as I can tell.

Given the inherent assumption of the offensive PADJ that all positions are equally valuable, once we have a player’s RAR, we should account for his defensive value by adding on his runs above average relative to a player at his own position. If there is a shortstop out there who is -2 runs defensively versus an average shortstop, he is without a doubt a plus defensive player, and a more valuable defensive player than a first baseman who was +1 run better than an average first baseman. Regardless, since we have implicitly assumed that they are both average defensively for their position when RAR was calculated, the shortstop will see his value docked two runs. This DOES NOT MEAN that the shortstop has been penalized for his defense. The whole process of accounting for positional differences, going from hitting RAR to positional RAR, has benefited him.

It is with some misgivings that I publish “hitting RAR” at all, since I have already stated that there is no such thing as a replacement level hitter. It is useful to provide a low baseline total offensive evaluation that does not include position, though, and it can also be thought of as the theoretical value above replacement in a world in which nobody plays defense at all, yet players are still selected with defensive ability in mind. Imagine that you had to pick a team thinking you were going to play baseball as usual, but right before the game was about to start and your lineup was set, you found out that a third party was going to man the field for both teams. I realize that scenario is contrived and absurd, but there is utility in having a measure that compares a player to a low baseline without bringing fielding into the mix.

The DH is a special case, and it caused a lot of confusion when my MVP post was linked at BTF once. Some of that confusion has to do with assuming that any runs above replacement methodology is the same as VORP from Baseball Prospectus. Obviously there are similarities between my approach and VORP, but there are also key differences. One key difference is that I use a better run estimator. Simple, humble old ERP is, in my opinion, a superior estimator to the complex MLV. I agree with almost all of the logic behind MLV--but using James’ Runs Created as the estimator to fuel it is putting lipstick on a pig (this is a much more exciting way of putting it in the 2008 context, don’t you think?).

The big difference, though, as it relates to the DH, is that VORP considers the DH to be a unique position, and I consider DHs to be in the same pool as first basemen. The fact of the matter is that first basemen outhit DHs. There are any number of potential explanations for this; DHs are often old or injured, hitting as a DH is harder than hitting as a position player, etc. Anyway, the exact procedure for VORP is proprietary, but it is apparent that they use some sort of average DH production to set the DH replacement level. This makes the replacement level for a DH lower than the replacement level for a first baseman.

A couple of the aforementioned nimrods took the fact that VORP did this and assumed that my figures did as well. What I do is evaluate 1B and DH against the same replacement RG. This actually helps first basemen, since the DHs drag the average production of the pool down, thus resulting in a lower replacement level than I would get if I considered first basemen on their own. Contrary to what the chief nimrod thought, this is not “treating a 1B as a DH”. It is “treating a 1B as a 1B/DH offensively”.

It is true, however, that this method assumes that a 1B and a DH have equal defensive value. Obviously, a DH has no defensive value. What I advocate to correct this is to treat a DH as a bad defensive first baseman, and thus knock another five or ten runs off of his RAR for a full-time player. I do not incorporate this into the published numbers, but you should keep it in mind. However, there is no need to adjust the figures for first basemen upwards, despite what the nimrods might think--the only necessary adjustment is to take the DHs down a notch. The simple fact of the matter is that first basemen get higher RAR figures by being pooled with the DHs than they would otherwise.

Finally, I consider each player at his primary defensive position (defined as where he appears in the most games), and do not weight the PADJ by playing time. This does shortchange a player like Kevin Youkilis (who sees significant time at a tougher position than his primary position), and unduly boosts a player like Victor Martinez (who logs a lot of games at a much easier position than his primary position). For most players, though, it doesn't matter much. I find it preferable to make manual adjustments for the unusual cases rather than add another layer of complexity to the whole endeavor.

2009 Park Factors

2009 Leagues

2009 Teams

2009 AL Relievers

2009 NL Relievers

2009 AL Starters

2009 NL Starters

2009 AL Hitters

2009 NL Hitters

Wednesday, October 07, 2009

Playoff Musings

For the sake of discussion, let's assume that we can estimate a team's true W% by taking 40% of their actual W%, 20% of their EW%, 20% of their PW%, and 20% of .500. Let's also assume that there is no quality difference between the NL and AL, no home field advantage, and that all games are independent of one another with the probability of a given outcome constant across games. This will allow us to estimate the quality of the playoff teams as follows:
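As a minimal sketch, the weighting just described looks like this in code. The function name is my own, and the inputs in the example are made-up placeholders rather than any playoff team's actual figures.

```python
def estimated_true_wpct(w_pct, ew_pct, pw_pct):
    """Crude true-talent estimate: 40% actual W%, 20% EW%, 20% PW%,
    and 20% of .500 (regression toward average)."""
    return 0.4 * w_pct + 0.2 * ew_pct + 0.2 * pw_pct + 0.2 * 0.500

# Hypothetical team: .600 actual W%, .580 EW%, .570 PW%
print(round(estimated_true_wpct(0.600, 0.580, 0.570), 3))
```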





We can couple this with the binomial distribution (since we assumed independence and a constant W%) to figure the probability of each team winning the Division Series, and then the LCS and World Series (P(DS) is the probability that a team will win the Division Series, and so on):




Why am I doing this? It's obvious that while these estimates might be reasonable, they could be improved fairly easily. We know there's a home field advantage, we could incorporate the actual pitching matchups, we could come up with a better combination of the various W%s--or better yet, we could look at projections for the team's actual personnel rather than using aggregate season W%s. So why bother, especially when you can find playoff odds reports elsewhere on the net that do take some of those factors into account?

As I said, they're a reasonable starting point without getting more involved. For the rest of the piece I will treat them as more than that, for the sake of discussion. More importantly, though, they illustrate what sabermetricians usually mean if they say something like "the playoffs are a crapshoot". The most lopsided first round matchup still yields a one-in-three chance for the underdog, and no team has a greater than 40% or less than 10% chance to win the pennant.

In fact, the source of a lot of the differences is the uneven first round matchups between NYA/MIN and LA/STL. Here are the probabilities of each team winning the World Series given that they make it out of the first round:




Here the probabilities range only from 16 to 36%. The Cardinals move ahead of the Phillies and Rockies; their first round matchup with the NL's top team drags down their chances, but if they get past LA, they are the strongest remaining team in the senior circuit.

Suppose that for some reason the Phillies' place in the playoffs (including seeding--obviously they would actually play the Dodgers and not the Rockies, but that's beside the point) was taken by the Nationals. What would the probabilities look like in that case? Washington was last in the majors with a crude strength estimate of .408--plugging that in produces these results:




Even the Nationals have a 1:4 chance of advancing to the LCS, benefiting from the five-game series. It gets tougher in the two seven game series, but they again have a roughly 1:5 chance of winning that series should they get to either one. Even looking at the playoffs as a whole, Washington has a 1% chance to win the World Series given these assumptions.

1 in 100 may not sound like a lot, but considering that they were the worst team in MLB, it's not that bad. Do you think the average mainstream media member would give them a 1% chance in that scenario? Do you think they'd say they had a 25% chance to beat Colorado in a division series?

I'd have to guess that, no, they wouldn't. I could be wrong, but it seems as if the public in general is far too confident in their ability to project the results of the postseason. If you can't state with more than 80% certainty that the Nationals wouldn't advance past the Dodgers, then what can you say with that kind of confidence?

This is all just a long way of saying that I avoid making predictions about the playoffs. I don't think that I'm smart enough to tell you anything with a high enough certainty level to even make it worthwhile. So instead I offer anti-analysis, something completely personal and not entirely rational--my rooting interests.

I am definitely rooting for the Yankees to win the World Series. First, they have Nick Swisher, who is my favorite player in the game. Second, I actually like George Steinbrenner, and I would love to see him win another championship. Third, I would love to see A-Rod silence the critics that have attempted to brand him as a choker (although even a superlative performance throughout the playoffs by A-Rod would not make that meme disappear completely). Fourth, when the Yankees win there is always a hue and cry from the crowd that constantly wrings its hands about competitive balance, and I really don't agree with their position at all and am amused by their lamentations. Cold and petty? You betcha.

The Red Sox also offer the competitive balance angle, as well as a front office that is easy to root for. I've never been an Angel fan in the past, but Bobby Abreu has always been one of my favorite players and Chone Figgins leading the league in walks doesn't hurt. I also have to admit to being a little partisan towards the AL due to the DH issue. Albert Pujols alone is enough to make the Cardinals likeable, and I have no real problems with the other NL clubs. So:

1. Yankees
2. Cardinals
3. Red Sox
4. Angels
5. Rockies
6. Phillies
7. Dodgers
8. Twins

What I'm really rooting for though is some competitive series and compelling games (although I don't know how likely it is that the drama of the AL Central playoff will be topped). There hasn't been a six-game World Series since 2003, so that would be a great place to start.

Tuesday, September 22, 2009

More Mundane Comments on the Playoff Structure

In the previous post I briefly mentioned my dislike of the five-game series format currently used in the Division Series and formerly used in the LCS. But what is the real difference between a five and seven game series? If we make some simple assumptions about team quality, how often will the better team win a series of X length? Common sense tells us that the longer the series, the more likely the better team will win, but let's attempt to quantify that. (Actually, it's not attempting, since given the assumptions that I will make, the answers are simple probability, and it's also tough to classify it as an "attempt" since many people have done it before).

First, let's start with the assumptions:

* each game result is independent of the other games in the series (this assumption is likely weaker for the post-season than for the regular season, as the series status has a great influence on how the manager approaches the game, particularly with regards to pitcher usage).

* there is no home field advantage

* the probability of a win for the teams is the same from game-to-game--we are not making any allowances for the aforementioned home field advantage, the identity of the starting pitcher, etc.

With these assumptions in place, we can use the binomial and geometric distributions and the principles behind them to crudely model series of X length. Throughout the rest of the piece, I will refer to "better" or "correct" outcomes. Please understand that I am using these terms in conjunction with the stated assumptions--we know the precise probability of each team winning, and therefore we absolutely know which team is better and ideally will win the series. Obviously, in real life situations we do not know with certainty which team is better. Which is the point--if a playoff format does a poor job of rewarding the better team when we are certain about its identity, it will be even less efficient at that task when we don't know which team is better.

First, let's look at the probability of a team winning the series, given that it is of X length. This can be done with the binomial distribution. For example, for a seven-game series, we simply add up the probability that a given team will win all seven games (even though they will not all be played), six out of seven, five out of seven, and four out of seven. This is the probability that they will win the series.
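The seven-game calculation just described can be sketched as follows. The function name is mine; it relies only on the stated assumptions (independent games, constant single-game win probability) and uses Python's standard `math.comb`.

```python
from math import comb

def series_win_prob(p, games):
    """Probability of winning a best-of-`games` series (games must be odd)
    for a team that wins each game with probability p.

    All `games` games are treated as played, as in the text; the team wins
    the series if it takes a majority of them.
    """
    need = games // 2 + 1
    return sum(comb(games, k) * p**k * (1 - p)**(games - k)
               for k in range(need, games + 1))

# A 65% single-game favorite in series of several lengths
for length in (1, 5, 7, 15):
    print(length, round(series_win_prob(0.65, length), 3))
```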

I will present the probabilities for each interval of .01 in W% between .51 and .65. I have limited the range because realistically in playoff series we will rarely see matchups in which one team is a heavy favorite over the other. The most unbalanced realistic playoff matchup would pit a .700 team against a .500 team, with an expected W% of .700. And that is assuming that the teams' sample W%s are their true talent W%s, which would be unlikely for a .700 team. Again, these W%s are the expectations for a single game between the two teams.

I figured the probabilities for series ranging in length from one to fifteen games. I went up to fifteen games because fifteen games was the actual length of the World's Series in 1887, even if the series was not treated with the full championship reverence of today's World Series:



I bolded the 53% line because I'm going to use it as the "average" playoff series--I realize this table is tough to read with fifteen different scenarios. The reason I chose that particular W% is explained below--it's not profound by any stretch (*).

One takeaway from this chart is how silly it is when folks talk about locks to win a playoff series. Even in a situation in which one team has a 65% chance to win each game (which is a big mismatch in the playoffs--a .500 team against a 105 win team or a 90 win team against a 112 win team using Log5), that team only has an 80% shot at winning a seven-game series. Even if you more than double the series length to fifteen games, there's still an 11.3% chance of an upset.

When sabermetrically-inclined people say that the playoffs are a crapshoot, this is the kind of thing they're generally talking about. It's not that you have no way of knowing which team is better or estimating the degree to which they are, it's just that even in a case where you have clear superiority, the short length of the series makes an upset quite feasible.

It was quite amusing during the Roy Halladay sweepstakes to hear commentators talk about how the Phillies were a lock to win the pennant if they got Halladay. Just like it was amusing to read about how the Cubs were going to march right through the weak NL to the pennant last year, or how the Tigers were going to trounce the Cardinals in the World Series. I wish I knew one-twentieth as much about baseball as those folks think they know.

Let's express that table in a more useful form by showing the marginal probabilities for each extension of series length. For example, the team that wins 51% of their games will win a one-game playoff 51% of the time. Expanding to a three game playoff will lead to them winning 51.5% of the time, an increase of .5%. If we expand to a five game playoff, they will win 51.9% of the time, an additional increase of .4%. This will enable us to see the benefit to lengthening series in terms of ensuring the better team wins:




As you can see, the added benefit starts diminishing quickly and for the normal range, essentially levels out after you make the move to seven games. Of course, these are the marginal outcomes, so longer series are still "better"...but less so with every additional pair of games.
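For anyone who wants to reproduce the marginal figures, here is a rough sketch under the same assumptions. The function names are mine, and the example uses the 53% line.

```python
from math import comb

def series_win_prob(p, games):
    """P(winning a best-of-`games` series) with single-game win probability p."""
    need = games // 2 + 1
    return sum(comb(games, k) * p**k * (1 - p)**(games - k)
               for k in range(need, games + 1))

def marginal_gains(p, max_games=15):
    """(series length, gain in win probability over the series two games shorter)."""
    lengths = list(range(1, max_games + 1, 2))
    probs = [series_win_prob(p, g) for g in lengths]
    return [(g, later - earlier)
            for g, earlier, later in zip(lengths[1:], probs, probs[1:])]

for length, gain in marginal_gains(0.53):
    print(length, f"{gain:.4f}")
```

Each added pair of games helps the better team by less than the pair before it, which is the diminishing return visible in the table.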

Since the marginal benefit levels off after lengthening to seven games (for the nearly even matchups at least--the more lopsided matchups continue to show significant increases), it seems like as good of a point as any at which to stop.

Of course, I have approached this solely from the perspective of encouraging correct outcomes. This is not the goal of a league--if it was, there would be no need for any kind of playoffs at all. The league is going to act in a way so as to maximize its profits. Which is well and good, but I am examining this from the personal perspective of what I'd like to see and/or what will produce the best outcomes.

There is one thing that overlaps between my perspective and the economic interests of the owners, and that is the desire for a competitive series. Close series encourage higher ratings, and longer series mean more ticket revenue. For a fan, there's nothing more exciting than a decisive game for the world championship after a hard fought series. While I have a strong preference for better outcomes, I can't completely suppress the desire for a winner-take-all finale.

So, given the underlying assumptions of this post, let's look at the probabilities of a decisive game, given a series of X length. This is done with the geometric distribution, and I have included the formula (**) because I think many fewer people are familiar with it than the binomial distribution--just speaking for myself, I know the binomial function by heart but have to look up the geometric function just to be safe:



A five-game series with a fairly normal matchup will produce a game five about 37% of the time; a seven-game series about 30% of the time. So for a roughly 1% increase in the likelihood of the better team winning, you give up decisive games in 7% of your series.

The next step, moving from a seven-game series to a nine-game series, would result in roughly the same increase in the likelihood of the better team winning while sacrificing another 4% of series without a grand finale.

All told, it shouldn't be too surprising that the probabilities here can be read to suggest that MLB has correctly identified the series lengths that provide the best combination of practicality, uncertainty of outcome, and producing desired outcomes. The extra benefit in terms of desired outcomes from expanding to longer series is relatively small, and is offset by a larger percentage drop in the expected proportion of series with decisive games.

Finally, let's take a look at the potential value of home field advantage in a five-game series. I previously looked at World Series HFA (i.e. seven-game series), and the same principles will apply here. I have not looked at the empirical data in this case and will only be discussing theoretical results.

First, we can use the geometric distribution to calculate the percentage of series that are expected to go X games, assuming that each game is a 50/50 proposition (in other words, not considering HFA):



Just as is the case for a seven-game series, the probability of a full-length series and one short of it are equal. This makes logical sense, of course; in order to create this situation the first three games must have produced a 2-1 series. There is a 50% chance that the team that has already won two games wins again, ending the series, and a 50% chance that the team behind forces a decisive game.

Unlike a seven-game series, it is impossible for the team with on-paper home field advantage to play more road games than home games, as the format is 2-2-1 (Obviously, I'm talking about the current format; I'm aware that it was sometimes different in the past). Theoretically, on-paper home field advantage results in a true home field advantage 62.5% of the time, and the other 37.5% of the time there is no HFA for either team.

In order to add HFA into the mix, we need to identify all the possible series sequences, which I will not reproduce here. Suffice it to say that from the perspective of the winning team, there is one series sequence that produces a three-game series (WWW), three that produce a four-game series (WWLW, WLWW, and LWWW), and six that produce a five-game series.

I will assume a home field W% of .573, which is the empirical World Series statistic. I believe that the "true" parameter is likely lower, for reasons discussed in the earlier post, but I'll use the sample statistic for the sake of discussion. Retaining the assumptions of evenly matched teams and independent game outcomes, the probability of the team with on-paper HFA winning a five-game series is 52.78%, compared to a 52.31% chance in a seven-game series. So HFA is theoretically more important in a shorter series (no surprise, but we've estimated the degree).
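One way to check this kind of figure without listing every series sequence is sketched below. It assumes, as in the binomial approach earlier, that every scheduled game is played; that does not change who wins the series, and it means only the number of home and road games matters, not their order. The function names are mine.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def hfa_series_win_prob(home_wpct, home_games, road_games):
    """Series win probability for the team with on-paper HFA, assuming
    evenly matched teams so that the home team wins each game with
    probability home_wpct."""
    need = (home_games + road_games) // 2 + 1
    total = 0.0
    for h in range(home_games + 1):        # wins in its home games
        for r in range(road_games + 1):    # wins in its road games
            if h + r >= need:
                total += (binom_pmf(h, home_games, home_wpct)
                          * binom_pmf(r, road_games, 1 - home_wpct))
    return total

print(hfa_series_win_prob(0.573, 3, 2))  # five-game series: 3 home, 2 road
print(hfa_series_win_prob(0.573, 4, 3))  # seven-game series: 4 home, 3 road
```

Setting `home_wpct` to .500 returns exactly 50% for either format, which is a quick sanity check on the setup.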

It should also be noted that we would expect the empirical home field advantage in a five-game series to be even stronger because in those series, the on-paper advantage usually goes to the team with the better record. The same applies to LCS games, but not to the World Series as on-paper home field advantage is chosen without regard to the specific teams competing.

That's it, except for the asterisked digressions.

(*) There is no particularly compelling reason to use 53% as a default W%; I just wanted a line that you could focus on that was reasonably telling, because the whole table is a bit much.

Anyway, I chose 53% because in the World Series (for 1923-2008 with a few years excluded), empirically the mean W% of the team with the better record has been .635, and the team with the lesser record has a mean of .594. Regressing 30% to .500, this results in .595 and .566. Log5 tells us that a .595 team should beat a .566 team 53% of the time. And there you are.
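The regression and Log5 steps can be written out as a small sketch. The function names are mine; the Log5 expression is the standard one.

```python
def regress_to_mean(w_pct, amount=0.30, mean=0.500):
    """Pull a W% the given fraction of the way toward the mean."""
    return mean + (1 - amount) * (w_pct - mean)

def log5(a, b):
    """Log5 estimate of the probability that a team with true W% a
    beats a team with true W% b in a single game."""
    return a * (1 - b) / (a * (1 - b) + b * (1 - a))

better = regress_to_mean(0.635)
lesser = regress_to_mean(0.594)
print(round(log5(better, lesser), 2))
```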

(**) The geometric distribution (strictly speaking its generalization, the negative binomial distribution; the geometric is the special case r = 1) gives the percentage of time a certain number of failures (x) occur before a certain number of successes (r) occur for a binomial process. In the case of a baseball series, r is the number of wins for the victor in a series (3 for a five-game series, 4 for a seven-game series, etc.), x is the number of wins for the series loser (in a five-game series with a decisive game, x = 2; for a seven-game series with a decisive game, x = 3). We also need to know the probability of a success (P), and calculate the number of combinations using the combination function C(x + r - 1, x).

To find the probability of a decisive game for the series as a whole (with either team winning), we need to do a calculation for each team, which is why the results are summed below--one for the winner and one for the loser. Let G be the number of games in a full length series, W the number of wins for the winning team in such a series, L the number of wins for the losing team in such a series (equivalently, the winner's losses), and P the probability of one of the teams winning an individual game. Then the probability of a decisive game is:

C(G - 1, L)*P^W*(1 - P)^L + C(G - 1, L)*P^L*(1 - P)^W

For example, the probability of a seventh game in a series (G = 7) in which one team has a 55% chance of winning each game (P = .55) is (W is 4 and L is 3, of course):

C(6, 3)*(1-.55)^3*.55^4 + C(6, 3)*.55^3*(1-.55)^4 = 30%
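The general formula above translates directly into a few lines of code. This is only a sketch with names of my choosing; W and L are derived from G for a best-of-G series.

```python
from math import comb

def decisive_game_prob(p, games):
    """Probability that a best-of-`games` series reaches its final game,
    where p is one team's probability of winning each game."""
    w = games // 2 + 1        # wins needed to take the series
    l = games - w             # series loser's wins in a full-length series
    c = comb(games - 1, l)    # orderings of the games before the clincher
    return c * p**w * (1 - p)**l + c * p**l * (1 - p)**w

print(round(decisive_game_prob(0.55, 7), 3))  # the worked example above
print(round(decisive_game_prob(0.53, 5), 3))  # five-game series, 53% line
```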

One other thing to note is the expected number of games in a series. This is found by summing G*P(G) for all possible series outcomes. So in a five game series, the expected number of games is:

3*P(3 games) + 4*P(4 games) + 5*P(5 games)

Reverting to the assumption that each game is a 50/50 proposition, the expected number of games in a five-game series is 4.125. The expected number of games in a seven-game series is 5.8125. You can see that there are diminishing returns going on; despite lengthening the possible length of the series by two games, our expectation is that the actual number of games will only increase by 1.6875 games.
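A short sketch of the series-length distribution and its expected value follows, again under the independence and constant-probability assumptions. The function names are mine.

```python
from math import comb

def series_length_dist(p, games):
    """Map each possible series length to its probability for a
    best-of-`games` series with single-game win probability p."""
    w = games // 2 + 1
    dist = {}
    for loser_wins in range(w):
        n = w + loser_wins                 # total games played
        c = comb(n - 1, loser_wins)        # orderings before the clinching game
        dist[n] = c * (p**w * (1 - p)**loser_wins
                       + (1 - p)**w * p**loser_wins)
    return dist

def expected_games(p, games):
    """Expected number of games: the sum of G * P(G) over all lengths."""
    return sum(n * prob for n, prob in series_length_dist(p, games).items())

print(series_length_dist(0.5, 5))
print(expected_games(0.5, 5), expected_games(0.5, 7))
```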

The probability of a decisive game hints at this as well, but this is another way you could attempt to quantify the real observed benefit of lengthening a series.