Monday, December 03, 2012

Statistical Meanderings 2012

This post is probably shorter than in past years, which is good for you if you choose to read. There’s not much of substance here--just a few observations about quirky things in my stat reports that happened to catch my eye. Jayson Stark-level stuff, but with better (I hope) stats.

* If I could only know one statistic regarding each major league season, I would want to know the league average of runs per game. It is the simplest way to boil down the environment in which the game is played into a single number. As such, I like to make note of how it fluctuates from year to year, even if one cannot draw any conclusions about long-term trends from single seasons. In fact, I consider this number so important that I refer to it as N in my league stat report.

Runs were up slightly in the majors in 2012 relative to 2011, with an uptick in NL scoring leading the way. N for the 2012 AL was 4.45 versus 4.46 in 2011, but the NL’s crept up to 4.22 from 4.13. I should note that interleague play has made it more difficult to look at N for each league, since runs no longer equal runs allowed for the league. The version here is just runs scored, which I persist with by convention and expediency rather than anything I can justify to you.

* Home W% also rebounded a bit after cratering in 2011. Home teams won at a .533 clip in 2012, up from .526, but well behind the conspiracy theory-inspiring .559 of 2010. Four of thirty teams had a better record on the road, while there were seven such teams in 2011, and just two in 2010. Again, trends need to be observed over time.

* As the Rays fell out of contention in September, one factor that was bandied about in mainstream circles was a decline in their team fielding. It’s true that Tampa went from leading the majors in modified FA in 2011 (.978) to bringing up the rear in the AL in 2012 (.963), but their DER remained excellent. Tampa’s .694 was good enough for a virtual five-way tie for the ML lead (SEA, LAA, OAK, and WAS were all at either .695 or .694).

And the Rays pitching was outstanding; they narrowly missed leading the majors in both starter (their 3.90 led the AL by a large margin as DET was second at 4.26, but WAS was at 3.85) and reliever eRA (the closest challenger to their 3.19 was the Reds at 3.44).

* Given the fact that Colorado spent a good chunk of the season dabbling with a pitch-restricted four-man rotation, it’s not a surprise that their starting staff put up some unusual numbers. But there are a few that really catch the eye:

- COL got quality starts in just 17% of their games (Coors doesn’t help QS% of course, but regardless of the park, you don’t get a quality start without pitching six innings); MIN was next lowest at 38%.
- COL starters logged 4.72 innings per start; MIN was next with 5.43.
- COL starters had a 5.71 eRA (this figure is park-adjusted), but it was not the worst in the majors. That fate belonged to...the Twins (5.73).

* The four lowest scoring teams in the majors played in the National League, which isn’t shocking, but the fact that every AL team outscored the Reds was very surprising to me. CIN’s 4.05 R/G was well ahead of their two division rivals, at least--Houston (3.64) and the Cubs (3.67) brought up the rear, and Miami didn’t do much better (3.76). The Marlins and Reds were the two most underachieving offensive teams in terms of scoring fewer runs than predicted by Base Runs, with respective shortfalls of 47 and 40 runs.

Of course, that means the fifth lowest-scoring offense belonged to an AL team, and if you’ve been paying any attention at all over the last few seasons, you should guess that it was Seattle. You’d be right--the Mariners mustered just 4.06 R/G, which was actually better than their RC/G (4.04). Seattle was last in home R/G (3.17--the Mets were next at 3.54), but ranked eighth in road R/G (4.47; for reference, TEX was at 4.46). The Mets were also much better on the road, ranking seventh just ahead of Seattle with 4.48. In case you were wondering, the home and road R/G are raw numbers.

* For the past three years, I’ve included a chart with each playoff team’s runs above average on offense and defense, and in all three, playoff teams on average have been better offensively than defensively. That was not the case in 2012--it was a big year for playoff team defense:

The two teams that jump off the page for me are the Tigers and the Giants. All season I had a good time snarking at the expense of those wild-eyed dreamers who thought that Detroit might have a chance at scoring 1000 runs, and in fact the Tigers were below the AL average in R/G after adjusting for park. It would have seemed difficult to believe two years ago that San Francisco would win another World Series but do it with the offense contributing more than the defense.

* The Mets factoids provide a great segue into discussing Citi Field’s park factor. When significant changes are made to a park, my usual course of action is to treat it as a brand new park. This is not an entirely satisfactory solution, but neither is regressing a park to 1.00 as my methodology also does. One could attempt to develop a hypothetical mean for a park, considering factors such as dimensions, altitude, temperature, knowledge about how the park played before alterations, and other factors. My approach keeps it simple, but not necessarily more accurate--deal only with the park under its current dimensions, use a maximum of five years of data in whichever direction (past or future) that you can grab it from (although 2012 park factors are necessarily 100% based on past experience at this point), and regress heavily.

Considering only 2012, I have Citi Field’s park factor as .96 for runs and 1.02 for home runs. If instead I’d used all four years of available data (2009-2012), it would have been .96 and .95. In this case, regardless of how I figured the park factor, it wouldn’t have made any difference for Mets players.
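For anyone who wants to see the "cap the window, then regress toward 1.00" idea in concrete form, here is a minimal Python sketch. It is not my actual park factor calculation: the raw home/road ratio, the regression weight, and the function name are all assumptions made for illustration, and the inputs in the usage note are made up.

```python
def regressed_park_factor(home_rpg, road_rpg, weight, max_years=5):
    """Illustrative park factor regressed toward 1.00.

    home_rpg, road_rpg: per-season runs per game (both teams combined) in the
        team's home and road games, ordered oldest to newest.
    weight: share of the raw deviation from 1.00 that is kept (0 to 1).
        This value is an assumption; a small weight means heavy regression.
    max_years: only the most recent max_years seasons are used.
    """
    if not home_rpg or len(home_rpg) != len(road_rpg):
        raise ValueError("need matching, non-empty home and road samples")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")

    # Keep only the seasons inside the window.
    home = home_rpg[-max_years:]
    road = road_rpg[-max_years:]

    # Raw factor: average scoring at home relative to average scoring on the road.
    raw = (sum(home) / len(home)) / (sum(road) / len(road))

    # Pull the raw factor toward the neutral value of 1.00.
    return 1.0 + weight * (raw - 1.0)
```

With made-up inputs of 8.0 and 9.0 R/G at home against 10.0 and 10.0 on the road, the raw factor is .85, and a weight of .5 pulls it halfway back to 1.00, or about .925.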

* NL relief leaders in strikeout rate:

Kimbrel and Chapman are in video game territory, but the most surprising name to me is Jason Grilli.

What I list as KG and WG are strikeouts and walks per plate appearance, scaled to the league average of plate appearances per game. I’ve always advocated looking at strikeout and walk rates per PA rather than per inning, and am happy to see that PA-denominated rates have gained a lot of traction in stathead circles (if you think I’m implying causation between the two clauses of this sentence, I’m not). Why have I chosen to scale them to PA-games rather than just leaving them as per PA proportions or percentages?

For the same reason I express batters’ RC rates as per 25.5 outs rather than per out--I have a much better feel for the per game scale, and since it’s only a scalar multiplication, there’s no harm done in terms of the ability to make comparisons. Using the actual league average of PA/G can admittedly create some odd results that don’t occur when using outs/game. Outs/game are generally fixed across leagues, but PA/G can vary significantly across contexts. The same is true for K/9. If K/PA is constant between two entities, K/9 will be higher for the entity with a higher average of PA/G (this is the biggest reason I dislike K/9). So the KG figures I report will rank an AL pitcher with the same K/PA ahead of a comparable NL pitcher.

I don’t really consider this a problem, since I’d recommend using adjusted statistics when comparing across leagues anyway. But I can see why some might find that objectionable for a single season; perhaps next year I’ll use the major league average PA/G to set the scale.
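To make the scaling concrete, here is a rough Python sketch of the KG calculation alongside K/9. Everything in it is illustrative: the two pitchers are made up, the league PA/G values are placeholders rather than actual 2012 figures, and the function names are just labels chosen for this example.

```python
def per_game_rate(events, plate_appearances, league_pa_per_game):
    """Events per PA, scaled to a league-average game's worth of PA."""
    return events / plate_appearances * league_pa_per_game


def per_nine_rate(events, innings):
    """Traditional per-nine-innings rate such as K/9."""
    return events * 9 / innings


# Hypothetical league contexts (placeholder values, not real league averages).
LEAGUE_X_PA_PER_GAME = 38.0
LEAGUE_Y_PA_PER_GAME = 38.5

# Two made-up pitchers with the same K/PA of .200.
# Pitcher B faces more batters per inning (800 PA in 180 IP rather than 200 IP).
pitcher_a = {"k": 160, "pa": 800, "ip": 200.0}
pitcher_b = {"k": 160, "pa": 800, "ip": 180.0}

# Same K/PA, different league scale: the higher-PA/G league yields the higher KG.
kg_x = per_game_rate(160, 800, LEAGUE_X_PA_PER_GAME)  # about 7.6
kg_y = per_game_rate(160, 800, LEAGUE_Y_PA_PER_GAME)  # about 7.7

# Same K/PA, but K/9 is higher for the pitcher who faces more batters per inning.
k9_a = per_nine_rate(pitcher_a["k"], pitcher_a["ip"])  # about 7.2
k9_b = per_nine_rate(pitcher_b["k"], pitcher_b["ip"])  # about 8.0
```

The KG gap comes entirely from the choice of league scale and would vanish if one PA/G figure were used for both leagues; the K/9 gap comes from the pitchers themselves, which is the distortion the per-PA approach avoids.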

* I am easily amused by teams whose estimated W%s are roughly equal and pitchers whose various run averages are similar. A great example of the latter this year was Josh Johnson. Johnson’s RA was 3.95 and his bullpen support was pretty average, so his RRA was just a touch lower at 3.90. He didn’t allow many unearned runs, so his 3.81 ERA was pretty close to his actual RA. His eRA was 3.81, so his peripheral statistics were in line with his actual RA. And his dRA was 3.90 thanks to his .304 %H being right in line with the NL average of .303.

* Remember when Aaron Cook was a nightmare for sabermetrics? The trashiest site in sports media was all over that case, but in the end, not only was Cook unable to sustain success with a 1.9 KG (and who could have ever foreseen that?), his .295 %H was indistinguishable from the AL’s .296 average. And his 5.89 dRA was equal to his 5.89 RRA. Aaron Cook 2012 could be the DIPS poster child.

Speaking of former DIPS debunkers, how did JA Happ manage to get his strikeout rate up to 8.6?

* Major league pitchers with 15 or more starts who averaged at least 100 pitches/start (treating stray relief appearances as half a start): Homer Bailey, Clay Buchholz, Madison Bumgarner, Matt Cain, Johnny Cueto, Yu Darvish, RA Dickey, Yovani Gallardo, Dillon Gee, Gio Gonzalez, Miguel Gonzalez, Cole Hamels, Aaron Harang, Matt Harrison, Felix Hernandez, Phil Hughes, Ubaldo Jimenez, Josh Johnson, Ian Kennedy, Clayton Kershaw, Cliff Lee, Tim Lincecum, Justin Masterson, Jon Niese, Jake Peavy, David Price, CC Sabathia, Chris Sale, Max Scherzer, Jamie Shields, Jason Vargas, Justin Verlander, Edinson Volquez, CJ Wilson.

You are more than free to quibble on this, but the name that really jumps out to me on that list is Dillon Gee. I would argue that the other pitchers on the list pitched very well in 2012 (even if that was a surprise, as in the case of Miguel Gonzalez) or had a reputation (like veteran workhorse Aaron Harang or whatever-the-heck-he-is Ubaldo Jimenez). And Gee only made 17 starts, so his workload was not comparable to the pitchers who were in the rotation all year. Still, Gee was -8 RAA in 2012 and -13 in 2011.

His teammate Jon Niese may have caught your eye as well, but what’s really interesting is that both Gee (103 pitches/start) and Niese (101) actually threw more pitches per start than their knuckleballing, Cy Young-contending teammate RA Dickey (100).
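The filter behind the list above (15 or more starts, at least 100 pitches per start, with a relief appearance counted as half a start) can be sketched in a few lines of Python. This is only an illustration of that rule: the function names are hypothetical and the sample pitcher is made up.

```python
def effective_starts(games, starts, relief_weight=0.5):
    """Starts plus relief appearances, each relief outing counted as half a start."""
    relief_appearances = games - starts
    return starts + relief_weight * relief_appearances


def pitches_per_start(pitches, games, starts):
    """Total pitches divided by effective starts."""
    return pitches / effective_starts(games, starts)


def makes_list(pitches, games, starts, min_starts=15, min_pitches=100.0):
    """True if a pitcher clears both the starts and pitches-per-start cutoffs."""
    return starts >= min_starts and pitches_per_start(pitches, games, starts) >= min_pitches


# Made-up pitcher: 30 starts, 2 relief appearances, 3100 total pitches.
# Effective starts = 30 + 0.5 * 2 = 31, so 3100 / 31 = 100 pitches per start.
example = pitches_per_start(3100, 32, 30)
```

Counting a stray relief outing as half a start keeps a swingman's short appearances from dragging his average down as much as a full start would, without ignoring those pitches entirely.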

* There have been a number of trades in recent years which have involved trading an established starting pitcher for a package that includes a young major league or very advanced starting pitching prospect, only to see the latter outperform the former. Sometimes the superior performance has only been for one season, and either the jury is still out on which guy ultimately is more effective or a second season has given mixed results (examples include Daniel Hudson/Edwin Jackson, Clayton Richard/Jake Peavy, Jarrod Parker/Trevor Cahill). One of the funniest (and the most personal for me) is the Ubaldo Jimenez trade. I was vehemently opposed to this trade from Cleveland’s perspective when it went down. What happened this year was that all three pitchers involved in the trade were ineffective, but that still is a big win for Colorado when you consider the contract status of all involved:

The Indians’ pitching in general was awful. I wrote a little bit above about the horrible starting pitching of Colorado and Minnesota, but Cleveland was right behind. The Indians’ 5.64 IP/S trailed only those two and Kansas City, their 45 QS% was sixth-lowest in MLB, and their 5.65 starters’ eRA trailed only the terrible two as well (fourth-worst was the Royals, again, but with a relatively Verlanderesque 5.29). Cleveland allowed 5.58 runs/game on the road, worst in the majors (Houston was next at 5.42; Minnesota next in the AL all the way down at 5.12). Not that they were much better at home--4.85 RA/G was the fourth-most (Colorado was worst, of course, with 6.46).

Their starting pitchers were every bit as dreadful individually as the team totals would indicate:

At least two were above replacement level. 77 primarily AL pitchers made 15 or more starts, and in terms of RAR, the Indians boasted six of the bottom nineteen. Obviously Manny Acta’s fault.

* Adam Dunn hit .204 this year, which was a bit of a disappointment to me. Throughout the season, I was hoping that his final average would slip below the arbitrary Mendoza line in order to have a legitimately solid batting season at that BA level to point to (Dunn was +12 HRAA). It is a bit of consolation that his park-adjusted BA is .199, and that his .445 secondary average led all AL hitters with 400+ PA (Jose Bautista and David Ortiz each had a higher SEC and were close to that cutoff). Dunn also came tantalizingly close to breaking the strikeout record, but his 222 fell one short of Mark Reynolds’ 2009 total.

* Bryan LaHair and Anthony Rizzo have obviously had very different pasts and likely will have very different futures. But in 2012, they were statistically about as close as you could ever hope to see from two guys at the same position on the same team:

The only appreciable difference is in the shape of production, with Rizzo posting a higher BA and LaHair a higher SEC.

* Another fun first base pair is erstwhile division rivals Mark Teixeira and Adrian Gonzalez:

The differences here are more marked, thanks to a bigger gap in shape of performance and Gonzalez’s extra 166 PA. The collapse of Gonzalez’s secondary contributions was well documented during the season, and I have no insight to offer regarding the cause. Since Gonzalez became a full-time player in 2006, his annual raw secondary averages have been:

.247, .288, .320, .351, .489, .371, .327, .231

If Ned Colletti knew what secondary average was, he might be concerned.

* Here is a pair of veteran teammates:

Looking at just their offensive contributions, you can see that they were equally effective when in the lineup, albeit with a shape difference (Player A had a superior BA, Player B a superior SEC). Obviously Player A was in the lineup every day, while Player B missed some time. It isn’t reflected here, but Player A also “played” a tougher fielding position. All told, though, there appears to be little good reason why Player A is still celebrated and Player B is reviled. Don’t overthink this one--you know who they are.

