Tuesday, November 25, 2008

Hitting by Position, 2008

Here is another mail-it-in annual post; this time I will look at offensive production by position, based on the data from baseball-reference.com. This is actually one of my favorite areas of inquiry, although the one-year data shouldn’t be overanalyzed.

First, here are the positional totals for 2008. “MLB” is the overall total for MLB, which is not the same as the sum of all the positions here, as pinch-hitters and runners are not included in those. “POS” is the MLB totals minus the pitcher totals, yielding the composite performance by non-pitchers. “PADJ” is the position adjustment, which is the position RG divided by the position (non-pitcher) average. “LPADJ” is the long-term positional adjustment that I used, based on 1992-2001 data. The rows “79” and “3D” are the combined corner outfield and 1B/DH totals, respectively:

Again, I don’t want to draw any conclusions from one year of data, so I’ll let those figures speak for themselves.
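For readers who want to reproduce the PADJ column, here is a minimal sketch of the calculation described above (position RG divided by the non-pitcher average RG). The function name and the sample RG values are hypothetical and chosen only for illustration; they are not the 2008 figures.

```python
def position_adjustment(position_rg: float, non_pitcher_rg: float) -> float:
    """PADJ as described in the post: position RG / non-pitcher average RG."""
    return position_rg / non_pitcher_rg


# Hypothetical RG values, for illustration only (NOT the actual 2008 data).
sample_rg = {"C": 4.3, "1B": 5.6, "SS": 4.4}
non_pitcher_avg = 5.0  # hypothetical "POS" row RG

padj = {pos: position_adjustment(rg, non_pitcher_avg) for pos, rg in sample_rg.items()}
for pos, value in padj.items():
    print(f"{pos}: {value:.2f}")
```

A value above 1 means the position out-hit the average non-pitcher; a value below 1 means it lagged.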

Now, let’s take a look at the exciting and pivotal spectacle of pitcher batting. Here are the basic stats for the pitchers of each NL team. RAA is runs above the average pitcher, whose RG is .35 according to the first chart. It should be noted that sacrifices, a major part of a pitcher’s batting responsibility, are not included in these figures in any way:

The Cubs’ pitchers were clearly the standouts, as they were the only ones who managed to crack the Mendoza line, led in OBA, led by 60 points in SLG, and were 8 RAA ahead of their closest challenger, the Cardinals. Nonetheless, they still created runs at only 40% of the overall league average.

The Rockies bring up the rear at -8, which is even worse considering I did not apply a park adjustment to the pitcher figures.
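The post does not spell out how RAA is computed, so the following is only a sketch under a stated assumption: that RAA is the gap between a staff's RG and the baseline RG, scaled by the outs the staff made and expressed per 25.5 outs. The .35 baseline comes from the text; the function name, the staff RG, and the out total are hypothetical.

```python
OUTS_PER_GAME = 25.5  # outs per "individual game", the scale used elsewhere on this blog


def runs_above_average(rg: float, baseline_rg: float, outs: float) -> float:
    """ASSUMED formula: (RG - baseline RG) * outs / 25.5."""
    return (rg - baseline_rg) * outs / OUTS_PER_GAME


PITCHER_BASELINE_RG = 0.35  # average pitcher RG, per the text

# Hypothetical staff: RG of 1.0 over 280 outs (illustration only).
staff_raa = runs_above_average(rg=1.0, baseline_rg=PITCHER_BASELINE_RG, outs=280)
print(f"RAA: {staff_raa:+.1f}")
```

Under this assumption, a staff exactly at the baseline comes out at zero regardless of playing time.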

Last year, Toronto pitchers turned in an excellent 5.1 RG in their 21 PA. This year, in 16 PA, Blue Jay hurlers failed to reach base. The Twins were the most productive AL pitching staff, compiling a .316/.316/.368 line in 19 PA that compares favorably to that of their center fielder, Carlos Gomez (.258/.289/.360).

Now, let’s take a look at the worst hitting positions in the majors, as measured by RAA (compared to the overall MLB average for 2008 at the position, with left and right fields considered together, and with park adjustment). It’s more interesting to look at the worst than the best, as the best are easy to figure out--they are generally teams with a star player who plays all the time. It’s no surprise that St. Louis first basemen or Minnesota catchers hit well. So first, a simple list of the best positions, and then a table for the worst:


The Astros have the unfortunate distinction of two sinkholes, which is a good news, bad news situation. The bad news is obvious; the good news is that it shouldn’t be that hard to improve at those positions. You can see why Mariner fans were fed up with Jose Vidro and the other players in their DH; wonder why Washington has horrid production out of left despite Jim Bowden’s love of collecting candidates for the outfield corners (Dukes, Kearns, Pena); and see that one really bad position can be overcome (Angels).

In the past I have concluded this piece by discussing those teams which had unusually strong, weak, and negative correlations between expected and actual production at the position. This time, I am going to instead present a series of charts showing the RAA at each position for each team, organized by division. Below average performances are in red; outstanding performances (arbitrarily defined as +20 RAA or more) are in bold; and each table is sorted by “SUM”, which is the sum of the RAA figures for the positions (no pitchers or pinch hitters). These ARE park adjusted:

Did you ever expect to see a team with below average production at every position except for the one largely manned by Cristian Guzman? The Mets are also interesting--three positions were at +20 or more, and the primary performer at each of those positions was in the top ten on my IBA ballot for NL MVP. The other five positions are a composite -16. While the Mets still led the division in RAA, it illustrates my contention in defense of my ballot that the Mets’ stars can hardly be blamed for the team’s failure to make the playoffs. Florida led the majors in combined middle infield RAA (+77) on the backs of Dan Uggla and Hanley Ramirez. Atlanta had the lowest combined outfield RAA in the majors (-62); they balanced this out by having the best infield RAA in the majors (+75) and solid catcher production (+19).

Pittsburgh had the worst combined middle infield RAA (-46). St. Louis had the top outfield RAA (+50); they also had the top combined corner RAA (+99 for 1B, 3B, LF, and RF). Eyeballing it, the Cubs may have gotten the most balanced contributions relative to positional norms for a team with a good offense. Cincinnati had only three positive positions, two of which were manned by favored whipping boys of what I consider the “Pete Rose idolizing” segment of their fan base (Encarnacion and Dunn).

San Diego got its best production out of center field, and the fourth highest RAA at that position in the majors. Extra credit to anyone who thought before the campaign that a Jody Gerut/Scott Hairston combination would pull that off. San Francisco had the lowest infield RAA (-87) in the majors, “besting” their neighbors, the A’s (-81). No other team was worse than -70.

Baltimore shortstops post-Tejada were certainly problematic, although excellent performance from Brian Roberts and Nick Markakis managed to keep the Orioles’ attack respectable. As I pointed out in a post a few weeks ago, Toronto was a team stocked with guys who hit like middling middle infielders. Only their center fielders managed to match the league average.

Is there any primary position holder for a +20 position who gets more grief from his hometown fans than Jhonny Peralta? (There may well be, but Peralta is a favorite whipping boy on Cleveland sports talk. While I freely admit that his fielding is subpar, he’s still at worst an average player. Yet there are a number of Indians fans who can’t wait to ditch him.) The Twins offense defied all run estimators this year, finishing second in the AL with 5.1 R/G but only eighth in RC/G at 4.8. Only the Mauer and Morneau-manned positions managed above average performances.

Finally, sneaking in on the last table, is a team (Oakland) below average at every position. They were last in the majors in composite corner position RAA (-73).

You will note that the AL teams have a negative total; this is because I used the overall MLB average. Believe it or not, the AL with pitchers removed hit .268/.332/.421, while the NL with pitchers removed hit .267/.336/.426.

Here is a link to a Google spreadsheet with the positional data for each team.

Tuesday, November 11, 2008

Why I Don’t Care About the BBWAA Awards

Because it’s pretty apparent that the BBWAA doesn’t care. Why should I?

This post is gratuitous piling-on, no doubt, as the votes for NL Rookie of the Year received by Edinson Volquez have already become and will continue to be easy message board/blog fodder.

Seriously, though, why should anyone care about an award if three of the thirty-two voters can’t even correctly identify who is eligible for it? You trust people drawn from this same pool to fill out a ten-deep MVP ballot intelligently?

This is not a sabermetrician’s rant against the stupid old sportswriters looking at RBI and “chemistry” or win-loss record or what have you. This is much more elementary.

In the original Historical Baseball Abstract, Bill James compared and contrasted the voting system for MVP (and by extension Cy Young and ROY, since they are similar) and the Hall of Fame. Both are chosen by members of the same group (although the HOF electorate is much larger and does not require the close attention to current baseball that one must engage in to be granted an MVP ballot), but James argued that one system is intelligently designed and the other was haphazardly designed and is inherently flawed.

I agree with James’ points in that article. However, any system, no matter how well-designed, is going to produce bad results if you have unqualified or unserious people as the voters. And it is clear that there are at least three people with a say in the matter who are one or both of the above. Even worse is the fact that the BBWAA apparently did not notice this when they tabulated the vote!

If you are getting a tone of outrage from this, then I have failed. I’m not outraged; I’m actually more amused and bemused. I like the Hot Stove fodder that the MVP and similar awards provide, and naming the best player, pitcher, and rookie in each league is a perfectly worthwhile activity. But each individual can have that discussion with their friends and internet associates, post their own ballot on a blog or message board, participate in a broad-based amalgamation like the IBA, and so on without caring what the BBWAA decides, except as a passing curiosity. And hey, at least the IBA restricts the ballot to eligible candidates.

Tuesday, November 04, 2008

Leadoff Hitters, 2008

Once again, here is a look at the composite performances of the players who batted in the leadoff spot for each team. The data is from baseball-reference.com and again, it includes ALL of the PA out of the leadoff spot. In parentheses I list the players who appeared in twenty or more games in the #1 slot (which is not the same as starting twenty games; they could have been pinch runners, defensive replacements, etc.), but that does not in any way mean that they are the only contributors to the team total.

I always feel obliged to point out that as a sabermetrician, I feel that the importance of the batting order is often overstated, and that the best leadoff hitters would generally be the best cleanup hitters, the best #9 hitters, etc. However, since the leadoff spot gets a lot of attention, it is instructive to look at how each team fared there.

The conventional wisdom would say that the most important function of the leadoff hitter is to get on base and score runs. So a good place to start is looking at runs scored per 25.5 outs (AB - H + CS):

1. FLA (Ramirez), 7.3

2. DET (Granderson), 6.4

3. TEX (Kinsler/Arias), 6.4

Leadoff Average, 5.1

MLB Average, 4.8

28. TOR (Inglett/Eckstein/Scutaro/Rios), 4.4

29. WAS (Lopez/Guzman/Harris/Bonifacio), 4.3

30. OAK (Ellis/Suzuki/Davis/Buck/Hannahan/Sweeney), 3.7

For clarification, “Leadoff Average” is the average for leadoff hitters, and “MLB Average” is the average for all hitters, regardless of lineup slot.
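The runs-per-25.5-outs figure above can be sketched as follows, using the post's definition of outs (AB - H + CS). The function names and the sample batting line are hypothetical and for illustration only.

```python
OUTS_PER_GAME = 25.5  # outs per "individual game", as used in the text


def outs_made(ab: int, h: int, cs: int) -> int:
    """Outs as defined in the post: AB - H + CS."""
    return ab - h + cs


def runs_per_game(r: int, ab: int, h: int, cs: int) -> float:
    """Runs scored per 25.5 outs."""
    return r * OUTS_PER_GAME / outs_made(ab, h, cs)


# Hypothetical team leadoff line (illustration only, not real 2008 data).
example = runs_per_game(r=110, ab=700, h=190, cs=10)
print(f"{example:.1f} runs per 25.5 outs")
```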

I am not going to insult your intelligence by extensively lecturing about the drawbacks of using actual runs scored figures or any of the other metrics presented here.

Another very basic measure by which to gauge a leadoff hitter is On Base Average. I did not include hit batters or sacrifices, so this is just (H + W)/(AB + W):

1. FLA (Ramirez), .385

2. BAL (Roberts), .373

3. CLE (Sizemore), .364

Leadoff Average, .341

MLB Average, .329

28. COL (Taveras/Barmes/Podsednik), .308

29. HOU (Bourn/Matsui/Erstad), .290

30. OAK (Ellis/Suzuki/Davis/Buck/Hannahan/Sweeney), .277

Last year, leadoff hitters had the same .341 OBA, but the league average was .331, so there was a tiny relative improvement for the leadoff spot this year.
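As a small worked example, the simplified OBA formula and the year-to-year comparison can be sketched like this. The four averages are the ones quoted in the text; the function name and the sample batting line are hypothetical.

```python
def on_base_average(h: int, w: int, ab: int) -> float:
    """Simplified OBA from the post: (H + W) / (AB + W), no HBP or sacrifices."""
    return (h + w) / (ab + w)


# Hypothetical batting line, for illustration only.
sample_oba = on_base_average(h=190, w=60, ab=700)

# Averages quoted in the text.
leadoff_2007, league_2007 = 0.341, 0.331
leadoff_2008, league_2008 = 0.341, 0.329

# Leadoff OBA relative to the overall average in each season.
rel_2007 = leadoff_2007 / league_2007
rel_2008 = leadoff_2008 / league_2008
print(f"2007: {rel_2007:.3f}  2008: {rel_2008:.3f}")
```

The ratio moves from roughly 3.0% above average to roughly 3.6% above, which is the tiny relative improvement mentioned above.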

I had an online discussion with an Indians fan some time in May about who, to that point, was the team’s most productive offensive player. He argued for Victor Martinez on the basis of his Batting Average, which was sad and predictable. What was odd about it was that when I pointed out Sizemore’s far-superior OBA, he scoffed that it wasn’t in the top ten in the league and thus was inadequate for a leadoff hitter.

I don’t know if he is representative of the larger group of casual fans or not, but in his case at least, there is a misguided belief that leadoff hitters have superior OBAs. As a group, they don’t, at least not to an extent where a .360 OBA would be subpar.

I am amused by the third- and second-to-last finishes of the Rockies, spearheaded by Willy Taveras, and the Astros, led by Michael Bourn. Houston parted with Taveras in the Jason Jennings trade, then decided they couldn’t live without a speedy center fielder who can’t hit, so they accepted Bourn as the key piece for Brad Lidge.

A slightly modified OBA is what I like to call Runners On Base Average. It is the A component of Base Runs per PA, and it simply removes home runs and caught stealing from the numerator of OBA. Thus, it leaves only times in which the hitter was actually on base, waiting to be driven in by the subsequent batters.

1. BAL (Roberts), .348

2. SEA (Suzuki), .345

3. LAA (Figgins), .338

Leadoff Average, .309

MLB Average, .297

28. MIN (Gomez/Span), .276

29. HOU (Bourn/Matsui/Erstad), .258

30. OAK (Ellis/Suzuki/Davis/Buck/Hannahan/Sweeney), .258

Florida falls to tenth (.324) and Cleveland to seventeenth (.313), mostly because they tied for the ML lead with 34 homers by leadoff hitters.
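To illustrate why a homer-heavy leadoff spot drops in ROBA, here is a sketch that takes the simplified OBA above and removes home runs and caught stealing from the numerator, as described. Keeping AB + W as the denominator is my reading of "removes ... from the numerator"; the function names and both sample lines are hypothetical.

```python
def oba(h: int, w: int, ab: int) -> float:
    """Simplified OBA: (H + W) / (AB + W)."""
    return (h + w) / (ab + w)


def roba(h: int, w: int, hr: int, cs: int, ab: int) -> float:
    """Runners On Base Average: (H + W - HR - CS) / (AB + W)."""
    return (h + w - hr - cs) / (ab + w)


# Two hypothetical leadoff lines with identical OBA but different power.
power = {"h": 190, "w": 60, "hr": 34, "cs": 5, "ab": 700}
slap = {"h": 190, "w": 60, "hr": 4, "cs": 5, "ab": 700}

for name, line in (("power", power), ("slap", slap)):
    print(name, f"OBA={oba(line['h'], line['w'], line['ab']):.3f}",
          f"ROBA={roba(**line):.3f}")
```

Both lines reach base equally often, but the power line is standing on a base far less frequently when the next hitter comes up.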

Now I will look at two statistics which describe the shape of performance, not the quality (ROBA is sort of in this class--a high ROBA is good, but so are home runs, which don’t help you out there). The first is simply the ratio of runs scored to RBI. Leadoff hitters as a group score many more runs than they drive in, partly due to their skills and partly due to lineup dynamics. Those with low ratios don’t fit the traditional leadoff profile as well as those with high ratios:

1. LAA (Figgins), 2.8

2. SEA (Suzuki), 2.5

3. BOS (Ellsbury), 2.2

Leadoff Average, 1.6

28. CIN (Hairston/Patterson/Dickerson/Bruce/Freel), 1.2

29. CHN (Soriano), 1.2

30. CLE (Sizemore), 1.1

MLB Average, 1.0

A similar idea posited by Bill James is the Run Element Ratio, which James intended to balance skills more helpful in setting up an inning (walks and steals) against those more helpful in driving runners in (power, measured by extra bases). RER is simply the ratio (SB + W)/(TB - H):

1. LAA (Figgins), 3.0

2. COL (Taveras/Barmes/Podsednik), 1.8

3. BOS (Ellsbury), 1.8

Leadoff Average, 1.0

MLB Average, .8

28. CHN (Soriano), .6

29. ARI (Drew/Young), .6

30. SD (Gerut/Giles/Hairston), .5
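The Run Element Ratio above is a one-line calculation; a minimal sketch follows. The function name and both sample lines are hypothetical and only meant to show how a speed-and-walks profile separates from a power profile.

```python
def run_element_ratio(sb: int, w: int, tb: int, h: int) -> float:
    """RER as given in the post: (SB + W) / (TB - H)."""
    return (sb + w) / (tb - h)


# Hypothetical lines, for illustration only.
table_setter = run_element_ratio(sb=40, w=60, tb=250, h=190)  # few extra bases
slugger = run_element_ratio(sb=10, w=50, tb=330, h=180)       # many extra bases

print(f"table setter: {table_setter:.1f}, slugger: {slugger:.1f}")
```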

Returning to measures which attempt to measure quality, Bill James used an estimated runs scored to rate leadoff hitters. He assumed that if a leadoff hitter reached first (S + W - SB - CS), he would score 35% of the time; 55% from second (D + SB); 80% from third (T), and of course once for each home run. Expressed per 25.5 outs, I’ll call this Leadoff Efficiency:

1. FLA (Ramirez), 7.3

2. CLE (Sizemore), 7.1

3. BAL (Roberts), 6.6

Leadoff Average, 5.7

MLB Average, 5.5

28. MIN (Gomez/Span), 4.8

29. HOU (Bourn/Matsui/Erstad), 4.4

30. OAK (Ellis/Suzuki/Davis/Buck/Hannahan/Sweeney), 4.3

As Tango Tiger pointed out in the comments last year, James’ weights aren’t optimal. You can see this in the fact that he expects leadoff hitters to score 5.72 runs/“individual game”, whereas they actually average 5.36. Tango suggested alternate scoring percentages of 30/50/65. I stuck with James’ here, but please heed the warnings about them.
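Here is a sketch of the Leadoff Efficiency calculation with both sets of weights, so the effect of swapping James' 35/55/80 for Tango's 30/50/65 can be seen directly. I am assuming outs are AB - H + CS, the same definition used for runs scored per 25.5 outs; the function name and the sample batting line are hypothetical.

```python
OUTS_PER_GAME = 25.5

JAMES_WEIGHTS = (0.35, 0.55, 0.80)  # score rates from first, second, third
TANGO_WEIGHTS = (0.30, 0.50, 0.65)  # alternate rates suggested by Tango


def leadoff_efficiency(ab, h, d, t, hr, w, sb, cs, weights=JAMES_WEIGHTS):
    """Estimated runs scored per 25.5 outs (outs ASSUMED to be AB - H + CS)."""
    w1, w2, w3 = weights
    singles = h - d - t - hr
    on_first = singles + w - sb - cs
    on_second = d + sb
    on_third = t
    est_runs = w1 * on_first + w2 * on_second + w3 * on_third + hr
    return est_runs * OUTS_PER_GAME / (ab - h + cs)


# Hypothetical leadoff line, for illustration only.
line = dict(ab=700, h=190, d=40, t=6, hr=15, w=60, sb=30, cs=10)
james = leadoff_efficiency(**line)
tango = leadoff_efficiency(**line, weights=TANGO_WEIGHTS)
print(f"James weights: {james:.2f}  Tango weights: {tango:.2f}")
```

With the lower weights the estimate drops for every hitter, which is the direction needed to close the gap between expected and actual runs noted above.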

When I first did a review of leadoff hitters in this vein, David Smyth suggested that I include 2*OBA + SLG. Since the optimal weight for OBA in an x*OBA + SLG construction is somewhere in the vicinity of 1.7, using “2OPS” is closer to the mark than regular OPS, while also providing an extra boost in value for OBA. So here is that list. The actual figure displayed is .7*(2*OBA + SLG), which brings it in line with the regular OPS scale; OPS and 2OPS are both unitless, so I may as well express 2OPS on the more familiar scale:

1. FLA (Ramirez), 906

2. CLE (Sizemore), 860

3. SD (Gerut/Giles/Hairston), 851

Leadoff Average, 768

MLB Average, 753

28. COL (Taveras/Barmes/Podsednik), 675

29. HOU (Bourn/Matsui/Erstad), 640

30. OAK (Ellis/Suzuki/Davis/Buck/Hannahan/Sweeney), 617
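The scaled 2OPS figure can be sketched as below. The .7 multiplier and the 2*OBA + SLG form come from the text; the function names and the sample .350/.450 line are hypothetical, and the whole-number display (for example 906 rather than .906) is my reading of how the list is formatted.

```python
def ops(oba: float, slg: float) -> float:
    """Regular OPS: OBA + SLG."""
    return oba + slg


def two_ops(oba: float, slg: float, scale: float = 0.7) -> float:
    """Scaled 2OPS from the post: .7 * (2*OBA + SLG)."""
    return scale * (2 * oba + slg)


# Hypothetical line, for illustration only.
sample_oba, sample_slg = 0.350, 0.450
regular = ops(sample_oba, sample_slg)
scaled = two_ops(sample_oba, sample_slg)

# Displayed as whole numbers, as the list appears to do.
print(f"OPS: {round(regular * 1000)}  2OPS: {round(scaled * 1000)}")
```

Because OBA is counted twice before scaling, a hitter whose value is concentrated in getting on base gains ground relative to regular OPS, and one whose value is concentrated in slugging loses some.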

Finally, we can evaluate the leadoff men in exactly the same way as I would evaluate anyone else--their RG, based on ERP:

1. FLA (Ramirez), 7.1

2. CLE (Sizemore), 6.7

3. BAL (Roberts), 6.1

Leadoff Average, 5.1

MLB Average, 4.8

28. MIN (Gomez/Span), 4.0

29. HOU (Bourn/Matsui/Erstad), 3.4

30. OAK (Ellis/Suzuki/Davis/Buck/Hannahan/Sweeney), 3.1

Here is a link to the spreadsheet if you want to examine this yourself.