Tuesday, November 03, 2009

Statistical Meanderings 2009

What follows is a disjointed collection of observations and thoughts, largely spurred by perusing the end of season statistical reports published here.

* The American League outscored the National League 4.82 to 4.43 runs per game this season. The gap of .39 was the largest since 1998 (.41, 5.01 to 4.60). The AL had a higher BA (.267 to .259), a slightly lower walk rate (.099 walk:at bat ratio versus .102), and higher isolated power (.161 to .150).

* I track two different winning percentage estimators, both of which use Pythagenpat but with different inputs. Expected W% (EW%) is based on actual runs scored and allowed, while Predicted W% (PW%) is based on runs created and runs created allowed (actually Base Runs, but you get the idea). I always like to point out teams with very similar figures in all three categories (actual W%, EW%, and PW%) as well as those with divergent figures.

Teams that are close across the board include Colorado (.568, .556, .561), Texas (.537, .528, .527), and both Chicagos (.516, .524, .522 for the Cubs and .488, .495, .497 for the White Sox). Teams with some notable variations include the Angels (.599, .572, .524), Blue Jays (.463, .517, .514), and Diamondbacks (.432, .461, .495).

An interesting group of teams that may tend to be underrated next year by those who simply look at the so-called Johnson effect are those whose PW% match their W% more closely than their EW% does. These are teams that won more games than their R/RA would suggest, but whose R/RA was weaker than their RC/RCA would suggest. David Cameron noted this in his discussion of the Mariners, and they fit the bill (.525, .464, .490) as do San Diego (.463, .413, .440) and the Yankees (.630, .595, .628).

Cameron discusses this effect in terms of summed WAR for the members of a team; since WAR is based on RC, at least for batters, the results should be similar. However, I think it is a clumsy way of looking at things--it is much more direct to just apply your run estimator to the team totals and plug those results into your win estimator. If you want to talk about individual players' contributions, then obviously it makes sense to bring WAR into the discussion.
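Both estimators run the same win formula and differ only in their inputs. A minimal sketch, assuming the common Pythagenpat form in which the exponent is the run context (runs per game in the team's games) raised to a power z; the post does not state its z, so the default of .29 (within the commonly cited .28 to .29 range) and the team totals below are assumptions, and `pythagenpat_wpct` is a hypothetical name:

```python
def pythagenpat_wpct(runs: float, runs_allowed: float, games: int, z: float = 0.29) -> float:
    """Estimate winning percentage from runs scored and allowed via Pythagenpat.

    The exponent is the run context (runs per game in the team's games,
    both teams combined) raised to the power z. z = 0.29 is an assumption.
    """
    rpg = (runs + runs_allowed) / games  # run context
    x = rpg ** z                         # Pythagenpat exponent
    return runs ** x / (runs ** x + runs_allowed ** x)


# EW%: feed actual runs scored and allowed.
# PW%: feed runs created and runs created allowed (Base Runs in the post);
# estimating those inputs is outside the scope of this sketch.
ew = pythagenpat_wpct(800, 700, 162)  # hypothetical team totals
pw = pythagenpat_wpct(780, 710, 162)  # hypothetical RC / RCA totals
```

Because one function serves both estimators, any gap between EW% and PW% comes entirely from the gap between actual runs and estimated runs, which is the "apply your run estimator to the team totals" approach described above.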

* Three teams saw more than ten runs per game scored in their games. I have to admit, I wouldn't have guessed one of them if I had twenty tries, and it would have taken me multiple guesses to come up with another. The Yankees would be one of the first teams most people would guess, I imagine, but the Indians and Angels are a little tougher.

On the flip side, you can probably guess in short order that San Francisco had the lowest run context of any team (just 7.83 RPG). They were fifth-to-last in MLB in park-adjusted R/G (only .06 ahead of the last-place team) and first in park-adjusted RA/G, so it's no surprise that the combination lapped the field (the next lowest RPG was Seattle's 8.22). No team had been under 8 RPG since the 2005 Astros (7.99), and no team had been below 7.83 since the 2003 Dodgers (6.98!).

When I posted this factoid on Twitter, Tommy Bennett asked how the Dodgers would come out park-adjusted (SF this year had a 100 PF by my estimate). The LA PF in 2003 was 94, so the 6.98 park-adjusts to 7.43 (6.98/.94). That is still lower than the Giants, but it cuts the gap roughly in half.
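The Dodgers adjustment amounts to dividing the raw figure by the park factor expressed as a fraction of 100. A minimal sketch, assuming the PF already reflects the team's home/road mix, as the post's figures do (`park_adjust` is a hypothetical name):

```python
def park_adjust(raw_rpg: float, park_factor: float) -> float:
    """Neutralize a per-game run figure using a park factor on a 100 scale."""
    return raw_rpg / (park_factor / 100)


dodgers_2003 = park_adjust(6.98, 94)   # about 7.43
giants_2009 = park_adjust(7.83, 100)   # a 100 PF leaves the figure unchanged
```

The raw gap of .85 RPG (7.83 minus 6.98) shrinks to about .40 after the adjustment, which is the "half of the gap" mentioned above.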

* There was a lot of hoopla about the new Yankee Stadium being an offensive paradise and about CitiField being where home runs go to die, but the traditional park factor approaches just don't bear this out. (I emphasize "traditional" because park factors, particularly for home runs, can be much improved by incorporating more advanced data than simple home run counts from 81-game samples; I'm not asking you to forget what you've read on HitTracker, and of course you should be aware of the sample size issues inherent in working with one-year PFs.)

NYA does have a high HR PF (107), but a neutral run PF (99). If this trend continues, Yankee Stadium will find itself in a group of parks that are unfairly labeled as hitters' paradises due to their higher HR factors, but which have much more muted effects on overall scoring. Since the easiest way to observe a park effect without data is home run frequency, these parks get a bad rap from the mainstream media and casual fans. Camden Yards (105 HR factor/100 runs), Great American (111/104), Enron (105/99), and Citizens Bank (109/103) seem to fit the bill. Other parks with a similar five-year split, including SkyDome (106/100) and Comiskey (112/103), don't seem to get the same treatment, although my perception could certainly be off.

Meanwhile, there were actually more homers hit in Mets home games (1.60) than in road games (1.52). Take it for what it's worth, and don't discard more detailed and relevant data.

One thing you will note in looking at the park factors is that there are few parks that come out as extreme in favor of pitchers. No park has a PF less than 97 except for Petco, which stands alone at 91, which is about as low as you'll ever see. In fact, it matches the lowest in my 1901-2006 spreadsheet, tied with Braves Field (1936), County Stadium (1959), and Dodger Stadium (1966).

* Here are the runs above average for each playoff team's offense and defense (crudely based on runs scored/allowed per game versus the league average, park-adjusted):

You can see that three teams displayed significantly stronger offense than defense (NYA, LAA, PHI); three were fairly balanced (BOS, MIN, LA); and two displayed significantly stronger defense (COL, STL). Both pennant winners were drawn from the stronger offense group.

This observation is not intended to trumpet offense over defense, but simply to poke holes in a conventional wisdom that should already be dead.
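The offense/defense split is described only as runs scored or allowed per game versus the league average, park-adjusted. One way to sketch it, assuming the park adjustment divides by PF/100 and the per-game difference is scaled back up by games played; the post does not give its exact formula, and the function names and inputs below are hypothetical:

```python
def offense_raa(runs: float, games: int, league_rpg: float, park_factor: float) -> float:
    """Offensive runs above average: park-adjusted R/G minus league R/G, times games."""
    adj_rpg = (runs / games) / (park_factor / 100)
    return (adj_rpg - league_rpg) * games


def defense_raa(runs_allowed: float, games: int, league_rpg: float, park_factor: float) -> float:
    """Defensive runs above average: league R/G minus park-adjusted RA/G, times games.

    Positive values mean the team allowed fewer runs than an average team would.
    """
    adj_rapg = (runs_allowed / games) / (park_factor / 100)
    return (league_rpg - adj_rapg) * games


# Hypothetical team in a neutral park (PF 100) in a 4.50 R/G league
off = offense_raa(810, 162, 4.50, 100)  # about +81
dfn = defense_raa(650, 162, 4.50, 100)  # about +79
```

Under this sketch, a team like the hypothetical one above would count as "fairly balanced," while a large positive difference between `off` and `dfn` would put a team in the stronger-offense group.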

* I try to avoid writing too much about Cleveland, but I am a fan so it happens from time to time. When I heard that the Indians had Tomo Ohka (I don't recall if I learned this during spring training or when he was recalled), I thought "He's still around?" Then he proceeded to allow a .257 %H, which allowed his RA to hover right around 6. And yet at times, reason aside and just going by feelings, I actually felt good when he was on the mound. It says a lot about the Tribe's campaign, at least from a fan's emotional perspective.

* No NL reliever had an eye-popping season--the top eight finishers in RAR are mostly journeymen and middle relievers, with Ryan Franklin the exception to the middle relief trend but not to the journeyman trend. Heath Bell at #9 is the first stereotypical power closer, but this was his first season in that role.

* Brad Lidge ranked last among NL relievers in RAR (-17); last year he was third (+21).

* I list a zillion run averages for relief pitchers; it's overkill, but there's no reason not to fill up the page. Rafael Soriano was about as consistent across the board in those categories as one can be: 3.00 RA, 2.82 RRA, 3.00 ERA, 3.07 eRA, 3.02 dRA.

* If there's anyone who should feel fortunate about the myriad of problems encountered by the Mets, it should be Francisco Rodriguez. Rodriguez' 2009 performance was lost in the avalanche of injuries and despair, but it was not impressive--in fact, without a (deserved) allowance for his work with inherited runners, his RA was higher than the league average (for all pitchers, not just relievers). He was 35/42 in save situations, which is not terrible but nothing to write home about, and his WPA was -.45. A performance like that coupled with a Mets team in contention would have been a made-to-order storyline.

* David Hernandez easily had the worst dRA among AL starters--7.17, with the next highest belonging to Trevor Cahill (6.05).

* Zack Greinke's +91 RAR is the best in the majors in several years. I have my own spreadsheets going back to 2003, and it is the highest in that time. I didn't do a thorough check of 2002, but I'm pretty sure it's the highest RAR since Randy Johnson, 2001 (+92). This was not an ordinary, run-of-the-mill Cy Young type season; it was a top-season-of-the-decade contender type season.

* Here is a list of combined RAR for each team's top two starting pitchers (only teams with two +30 pitchers included, and only those that spent a full season with the team):

No real point here; you already knew Carpenter/Wainwright and Lincecum/Cain were really good. If I wanted to push it, I'd talk about how little the top teams in this regard accomplished in the playoffs...

* Before the season I picked Ubaldo Jimenez to win the NL Cy Young. I don't take those awards predictions very seriously, and I don't expect readers to either; I picked Ubaldo because 1) I genuinely thought he was sitting on a big year and 2) he pitched in the WBC, and I wanted to thumb my nose at the "WBC ruins pitchers" hypothesis--which, incidentally, I didn't hear nearly as much of this year as in 2006. Jimenez didn't crack my top five for the Cy Young, but he was one of the top ten pitchers in the NL this year, and I really enjoy watching him pitch.

* Ricky Nolasco had a strange season; you don't need me to tell you this, but I'll do it anyway. His peripherals were strong: 3.88 dRA and 9.5 KG, but unfortunately that big K performance against Atlanta in the last week will severely damage his sleeper prospects for 2010.

* Livan Hernandez had his typical innings-eating, flirting with replacement level type season. Yet he managed to toss 58% quality starts. The NL average was 50%, and only two other pitchers (Nolasco and Derek Lowe) were better than league average with RAs over 5.
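For reference, a quality start is conventionally a start of at least six innings with no more than three earned runs allowed. A minimal sketch of the percentage, with each start given as a hypothetical (outs recorded, earned runs) pair; `quality_start_pct` is a hypothetical name:

```python
from typing import Iterable, Tuple


def quality_start_pct(starts: Iterable[Tuple[int, int]]) -> float:
    """Share of starts that are quality starts.

    Each start is (outs_recorded, earned_runs); six innings = 18 outs.
    """
    starts = list(starts)
    if not starts:
        raise ValueError("need at least one start")
    qs = sum(1 for outs, er in starts if outs >= 18 and er <= 3)
    return qs / len(starts)


sample = [(18, 3), (21, 1), (15, 2), (24, 4)]  # hypothetical starts
pct = quality_start_pct(sample)  # two of four qualify
```

Counting outs rather than using baseball's 6.1/6.2 innings notation avoids treating those decimals as real fractions of an inning.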

* Nick Swisher drew 97 walks, second in the AL; his W/AB ratio was .195, first in the AL; his .250 ISO was eighth in the AL; and thus his .447 SEC was second in the AL. This all makes me very proud.
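Secondary average is usually computed with Bill James's formula, (TB - H + BB + SB - CS) / AB, which equals ISO plus walks per at bat plus net steals per at bat. That decomposition is why SEC tracks ISO and W/AB so closely. A minimal sketch, assuming the post uses that standard definition (its exact walk term is not stated), with hypothetical inputs:

```python
def isolated_power(ab: int, h: int, tb: int) -> float:
    """ISO = (TB - H) / AB, i.e. SLG minus BA."""
    return (tb - h) / ab


def secondary_average(ab: int, h: int, tb: int, bb: int, sb: int = 0, cs: int = 0) -> float:
    """Bill James's secondary average: (TB - H + BB + SB - CS) / AB."""
    return (tb - h + bb + sb - cs) / ab


# Hypothetical hitter: 500 AB, 125 H, 250 TB, 100 BB, no steals
iso = isolated_power(500, 125, 250)          # .250
sec = secondary_average(500, 125, 250, 100)  # .450 = .250 ISO + .200 W/AB
```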

* As I mention in my MVP ballot post, this was another down year for AL position players, with the obvious exception of Joe Mauer. Of course, lots of players had good years, but there were just three players over 60 RAR, compared to six in 2007, eight in 2006, and five in 2005.

* I am not one who is generally in the habit of urging athletes to retire. As long as someone is willing to employ them, and they want to do it, what harm is it to me? All the hand-wringing about "legacy" fails to impress me, as I'm aware of very few examples of players whose images have been permanently tarnished by late-career ineffectiveness. Most of the old athletes with tarnished legacies did it to themselves through their post-career, off-field lives (see OJ and Pete Rose), not by hanging on too long at the end.

With that being said, it really does seem to be time for Ken Griffey to give it up. His 4.8 RG was average, but once you consider that he's a DH, he really was not very far above replacement level. Last year was about the same when you factor in his dreadful fielding performance. I don't find it depressing or anything, but that level of performance (particularly with a $2 MM price tag) is not helpful.

* There was much wailing and gnashing of teeth among the talk-radio type of Indians fans when Ryan Garko was dealt. Garko is a first baseman with an HRAA of zero this year, after -2 in 2008 and +11 in 2007.

* The NL continues to have the upper hand at first base versus the AL. NL first basemen ranked first, third, fourth, tenth, eleventh, twelfth, fourteenth, and nineteenth in the league in RAR (four of the top ten and eight of the top twenty). AL first basemen managed fifth, seventh, and eighteenth, and only one DH chipped in (thirteenth).

There is nothing in the way I figure RAR that discriminates against AL first basemen. The NL first basemen have simply produced more runs over the last few seasons than their AL counterparts.

* David Ortiz managed 5.1 RG and +14 RAR this year; Travis Hafner was at 5.9 and +16. Just three years ago, those two ran two-three in the AL, each over 70 RAR. As career DHs with big contracts in their early-to-mid thirties (and fun nicknames that start with p), they make an obvious pairing. Hafner hit better this season, but Ortiz was better in 2008 and Hafner's shoulder is a recurring issue. I wouldn't want either of their contracts, but I think I'd rather have Ortiz going forward on the field--but it's close.

* Mark Reynolds shattered his own strikeout record with 223. I have to believe that this is pretty close to the upper limit on this record, at least for the time being. In saying so, I realize full well that I may look like a moron by this time next year. It is easy to go through the archives of baseball punditry and find statements that something will never happen again, only to have it happen in short order. Personally, I find the ever-present "Will X be the last pitcher to win 300 games?" articles insufferable.

But when you top the previous record by nineteen, while having a 40 HR season, I don't see a lot of room for record extension. Even with 44 homers, Reynolds "only" created 105 runs; only Jay Bruce had a higher HR:RC ratio among NL players. His batting average when he made contact was .423, which led the NL; his slugging average was .885 (Ryan Howard was next at .815). That level of production is probably not sustainable, and if it falls, he'd probably lose some playing time.

I am not saying that I think Reynolds is going to crash and burn--I wouldn't expect him to replicate 2009, but I don't think he's going to fall off a cliff. I just don't think he'll continue to strike out at the same rate and still play full-time. As Bill James pointed out in one of his Gold Mines, the trend with the 200-strikeout barrier has been for young hitters to challenge it in their first few seasons, then improve or refine their approach and stop striking out so much. Perhaps Reynolds is an anomaly. Time will tell. (Quick: count the clichés in this post!)
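The "when he made contact" rates can be read as batting average and slugging computed over only the at bats that did not end in a strikeout. A minimal sketch, assuming that definition (the post does not spell it out) and using hypothetical numbers; `contact_rates` is a hypothetical name:

```python
from typing import Tuple


def contact_rates(ab: int, h: int, tb: int, so: int) -> Tuple[float, float]:
    """BA and SLG on contact, treating every non-strikeout at bat as contact."""
    contact_ab = ab - so
    return h / contact_ab, tb / contact_ab


# Hypothetical hitter: 500 AB, 125 H, 250 TB, 150 SO -> 350 contact at bats
ba_con, slg_con = contact_rates(500, 125, 250, 150)
```

This framing shows why an extreme strikeout hitter can post eye-catching contact rates: removing 150 strikeouts from the denominator raises this hypothetical .250 BA to about .357 on contact.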

* It's tough to pass up opportunities to poke fun at the Reds, and in that regard Willy Taveras' season was too good to be true. His .267 OBA was second worst in the NL, and he was the only NL player to slug under .300 (.279). His .087 secondary average was easily the worst in baseball--Cesar Izturis was next at .119. It was tough to imagine the Reds failing to upgrade their center field situation, but Corey Patterson had turned in -25 RAA/-10 RAR in 2008...Taveras -24/-9.

* Are we going to have to start a "Free Chris Iannetta" movement? Iannetta may have hit just .220, but with a .364 SEC he still created 4.8 runs per game. This came on the heels of a 6 RG season, and he's just 26. Admittedly, he struggled in July and August, but is that really a good reason to bench him for Yorvit Torrealba?

* Kansas City boasted four of the bottom thirteen AL hitters in terms of RAR (all four had <= 0 RAR). These four combined for 1,730 PA, creating 172 runs whilst making 1,231 outs. They had a combined RG of 3.6, -76 RAA, and -9 RAR.

In fairness, that includes Yuniesky Betancourt's performance in Seattle--the Royals themselves "only" invested 1,496 PA between the four. The other three were Willie Bloomquist, Jose Guillen, and Mike Jacobs. What is really sad about this is that all of them were recent acquisitions from outside the organization: Betancourt in a mid-season trade, Bloomquist and Guillen as free agents, and Jacobs in an off-season trade. Their 2009 salaries totaled nearly $19M. Good work, Dayton.
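RG converts runs created into a per-game rate by dividing by outs made and multiplying by an outs-per-game constant. The post does not state its constant; assuming 25.5 outs per game (a value that reproduces the 3.6 RG from the quartet's 172 runs created and 1,231 outs), a minimal sketch with a hypothetical function name:

```python
OUTS_PER_GAME = 25.5  # assumed constant; the post does not state the value it uses


def runs_per_game(runs_created: float, outs: int, outs_per_game: float = OUTS_PER_GAME) -> float:
    """Runs created per game's worth of outs (RG)."""
    return runs_created / outs * outs_per_game


royals_four = runs_per_game(172, 1231)  # about 3.56, which rounds to 3.6
```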


  1. Using RA as a measure for Frankie Rodriguez is misleading -- his RA is exaggerated by a few big run-scoring games. For closers, I prefer OPS against as a fairer measure of overall performance. By that measure, Frankie's '09 was not far off his '08 performance with the Angels, and was far better than the NL average for pitchers generally and for relievers in particular.

  2. I don't disagree that RA can be misleading for relievers. The K-Rod comment was focused on the perception of his performance, and how it might have been different had the Mets been in the race.

    I don't like OPS, though; I prefer a metric with similar inputs like Component ERA or the "eRA" I publish here. Rodriguez' eRA rose from 3.50 in 2008 to 3.93 this year, and while that is above average, it's not particularly impressive either.

