Walk Like a Sabermetrician: Leadoff Hitters, 2010

This post kicks off a series of posts that I write every year, and therefore struggle to infuse with any sort of new perspective. However, they're a tradition on this blog and hold some general interest, so away we go.

This post looks at the offensive performance of teams' leadoff batters. I will try to make this as clear as possible: the statistics are based on the players that hit in the #1 slot in the batting order, whether they were actually leading off an inning or not. It includes the performance of all players who batted in that spot, including substitutes like pinch-hitters. Listed in parentheses after a team are all players that appeared in twenty or more games in the leadoff slot--while you may see a listing like "MIN (Span)", this does not mean that the statistic is based solely on Span's performance; it is the total of all Minnesota batters in the #1 spot, of which Span was the only one to appear in that spot in twenty or more games. I will list the top and bottom three teams in each category (plus the top/bottom team from each league if they don't make the ML top/bottom three); complete data is available in a spreadsheet linked at the end of the article. There are also no park factors applied anywhere in this article.

That's as clear as I can make it, and I hope it will suffice. I always feel obligated to point out that as a sabermetrician, I think that the importance of the batting order is often overstated, and that the best leadoff hitters would generally be the best cleanup hitters, the best #9 hitters, etc. However, since the leadoff spot gets a lot of attention, and teams pay particular attention to the spot, it is instructive to look at how each team fared there.

The conventional wisdom is that the primary job of the leadoff hitter is to get on base, and most simply, score runs. So let's start by looking at runs scored per 25.5 outs (AB - H + CS):

1. NYA (Jeter/Gardner), 6.5
2. FLA (Coghlan/Maybin/Bonifacio/Ramirez), 6.0
3. DET (Jackson), 5.9
Leadoff average, 5.0
ML average, 4.4
28. CLE (Brantley/Crowe/Cabrera), 4.0
29. WAS (Morgan), 4.0
30. SEA (Suzuki), 4.0

Obviously this category is heavily influenced by the quality of the subsequent batters in the lineup; the best indication of this is Ichiro's last-place finish, as you'll see that his leadoff spot actually ranks among the leaders in a couple of more independent categories. Ichiro was the only batter to appear in the leadoff spot in all of his team's games; Juan Pierre (156), Rickie Weeks (155), and Denard Span (151) were the other batters to appear in 150 or more games.
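For anyone who wants to reproduce the rate from the spreadsheet, here is a minimal sketch of runs per 25.5 outs, with outs taken as AB - H + CS as defined above. The function name and the sample line are hypothetical; they are not figures for any actual team.

```python
def runs_per_25_5_outs(r, ab, h, cs, outs_per_game=25.5):
    """Runs scored per 25.5 outs, where outs = AB - H + CS."""
    outs = ab - h + cs
    if outs <= 0:
        raise ValueError("outs must be positive")
    return r * outs_per_game / outs


# Hypothetical leadoff-slot totals, made up for illustration only.
example_rate = runs_per_25_5_outs(r=100, ab=680, h=190, cs=20)
print(round(example_rate, 1))
```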

The other obvious metric to look at is On Base Average, which speaks to the other conventional goal of a leadoff hitter. The figures here exclude HB and SF to be directly comparable to earlier versions of this article, but those categories are available in the spreadsheet if you'd like to include them:

1. ARI (Johnson/Drew/Young), .366
2. SEA (Suzuki), .358
3. LA (Furcal/Podsednik), .351
Leadoff average, .324
ML average, .322
28. CIN (Phillips/Cabrera/Stubbs), .299
29. WAS (Morgan), .293
30. CLE (Brantley/Crowe/Cabrera), .292

The Reds just cannot seem to find a way to get their leadoff hitters on base. Last year they ranked 29th with a .301 OBA led by Willy Taveras, Drew Stubbs, and Chris Dickerson; and in 2008 they were 24th with Jerry Hairston, Corey Patterson, Jay Bruce and Dickerson. At least this year they weren't wasting PAs on proven failures like Taveras and Patterson.

As alluded to above, Seattle's leadoff hitters had the highest OBA in the American League, yet scored the fewest runs per out.

The next statistic is what I call Runners On Base Average. The genesis of it is from the A factor of Base Runs. It measures the number of times a batter reaches base per PA--excluding homers, since a batter that hits a home run never actually runs the bases.

My 2009 leadoff post was linked to a Cardinals message board, and this metric was the cause of a lot of confusion (this was mostly because the poster in question was thick-headed as could be, but it's still worth addressing). ROBA, like several other methods that follow, is not really a quality metric, it is a descriptive metric. A high ROBA is a good thing, but it's not necessarily better than a slightly lower ROBA plus a higher home run rate (which would produce a higher OBA and more runs). Listing ROBA is not in any way, shape or form a statement that hitting home runs is bad for a leadoff hitter. It is simply a recognition of the fact that a batter that hits a home run is not a baserunner.

ROBA also removes CS, and so the formula is (H + W - HR - CS)/(AB + W):

1. SEA (Suzuki), .337
2. NYA (Jeter/Gardner), .328
3. DET (Jackson), .323
5. LA (Furcal/Podsednik), .323
Leadoff average, .296
ML average, .290
26. TOR (Lewis/Snider/Wise), .276
28. SF (Torres/Rowand), .269
29. CIN (Phillips/Cabrera/Stubbs), .267
30. WAS (Morgan), .260

Arizona's leadoff hitters, who led the majors in OBA, rank sixth in ROBA because they were second in the majors with 25 homers (Milwaukee hit 28 from the leadoff spot). Washington loses ground on the list (although they were just 29th in OBA) not because their leadoff hitters were driving the ball out of the park (their 5 homers tied for fifth-fewest), but because they led the majors by being caught stealing 19 times.

I will also include what I've called Literal OBA here--this is just ROBA with HR subtracted from the denominator so that a homer does not lower LOBA, it simply has no effect. You don't really need ROBA and LOBA (or either, for that matter), but this might save some poor message board out there twenty posts, so here goes. LOBA = (H + W - HR - CS)/(AB + W - HR):

1. SEA (Suzuki), .340
2. NYA (Jeter/Gardner), .332
3. DET (Jackson), .330
4. ARI (Johnson/Drew/Young), .330
Leadoff average, .301
ML average, .298
26. CLE (Brantley/Crowe/Cabrera), .280
28. SD (Hairston/Venable/Gwynn/Eckstein), .275
29. CIN (Phillips/Cabrera/Stubbs), .273
30. WAS (Morgan), .262
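The difference between OBA, ROBA, and LOBA is easiest to see side by side. The sketch below codes the three formulas as given in this post (OBA without HB and SF) and then adds one extra home run to a made-up batting line: ROBA drops slightly, while LOBA does not move. The function names and the sample line are hypothetical.

```python
def oba(h, w, ab):
    """On Base Average, excluding HB and SF: (H + W)/(AB + W)."""
    return (h + w) / (ab + w)


def roba(h, w, hr, cs, ab):
    """Runners On Base Average: (H + W - HR - CS)/(AB + W)."""
    return (h + w - hr - cs) / (ab + w)


def loba(h, w, hr, cs, ab):
    """Literal OBA: (H + W - HR - CS)/(AB + W - HR)."""
    return (h + w - hr - cs) / (ab + w - hr)


# Hypothetical batting line, not a real team's totals.
line = dict(h=180, w=60, hr=10, cs=10, ab=650)
# The same line with one additional home run (one more AB, H, and HR).
plus_homer = dict(h=181, w=60, hr=11, cs=10, ab=651)

roba_before, roba_after = roba(**line), roba(**plus_homer)
loba_before, loba_after = loba(**line), loba(**plus_homer)

print(f"ROBA {roba_before:.4f} -> {roba_after:.4f}")  # falls a little
print(f"LOBA {loba_before:.4f} -> {loba_after:.4f}")  # unchanged
```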

The next two categories are most definitely categories of shape, not value. The first is the ratio of runs scored to RBI. Leadoff hitters as a group score many more runs than they drive in, partly due to their skills and partly due to lineup dynamics. Those with low ratios don’t fit the traditional leadoff profile as closely as those with high ratios:

1. FLA (Coghlan/Maybin/Bonifacio/Ramirez), 2.5
2. TEX (Andrus), 2.4
3. DET (Jackson), 2.1
Leadoff average, 1.7
27. KC (Podsednik/Blanco/DeJesus), 1.4
28. SD (Hairston/Venable/Gwynn/Eckstein), 1.4
29. SF (Torres/Rowand), 1.3
30. PHI (Victorino/Rollins), 1.2
ML average, 1.1

Florida's leadoff hitters scored a lot of runs (as we saw earlier), so it's no surprise they had a high R/RBI ratio. Texas ranks second because they drove in just 42 runs (tied with Cleveland for fewest), and with a ML-low .290 SLG it's not hard to see why (they trailed in SLG by a wide margin; CHA was next at .310).

A similar gauge, but one that doesn't rely on the teammate-dependent R and RBI totals, is Bill James' Run Element Ratio. RER was described by James as the ratio between those things that were especially helpful at the beginning of an inning (walks and stolen bases) to those that were especially helpful at the end of an inning (extra bases). It is a ratio of "setup" events to "cleanup" events. Singles aren't included because they often function in both roles.

Of course, there are RBI walks, and doubles can be a great way to start an inning, but RER classifies events based on when they have the highest relative value, at least from a simple analysis:

1. CHA (Pierre), 4.1
2. TEX (Andrus), 2.4
3. HOU (Bourn/Bourgeois), 2.6
Leadoff average, 1.1
23. TOR (Lewis/Snider/Wise), .9
ML average, .8
28. NYN (Reyes/Pagan), .7
29. ATL (Prado/Infante), .7
30. SF (Torres/Rowand), .6

The influence of stolen bases is pretty strong in RER, which is why the White Sox rank so highly--their 67 swipes led all leadoff spots, with the Astros next at 61. Atlanta's 10 steals only beat out two teams (Boston and St. Louis) that stole nine.
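The formula for RER is not written out above. The commonly cited form of James's ratio is (W + SB)/(TB - H), where TB - H is extra bases on hits, and the sketch below assumes that form. The function name and the sample totals are hypothetical.

```python
def run_element_ratio(w, sb, d, t, hr):
    """Assumed form of Run Element Ratio: (W + SB)/(TB - H).

    TB - H counts extra bases only: one per double, two per triple,
    three per home run. Singles drop out of both sides.
    """
    extra_bases = d + 2 * t + 3 * hr
    if extra_bases == 0:
        raise ValueError("no extra bases; ratio is undefined")
    return (w + sb) / extra_bases


# Hypothetical totals for illustration: 80 setup events, 50 extra bases.
example_rer = run_element_ratio(w=50, sb=30, d=25, t=5, hr=5)
print(round(example_rer, 1))
```

Because stolen bases sit in the numerator one for one with walks, a slot with a great many steals and little power will post a very high ratio.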

Speaking of stolen bases, I decided it would be worthwhile this year to look at a pure measure of base stealing. Obviously there's a lot more that goes into being a leadoff hitter than simply stealing bases, but it is one of the areas that is often cited as important. So I've included the ranking for what some analysts call net steals, SB - 2*CS. I'm not going to worry about the precise breakeven rate, which is probably closer to 75% than 67%, but is also variable based on situation. The ML and leadoff averages in this case are per team lineup slot:

1. OAK (Crisp/Davis/Pennington), 45
2. CHA (Pierre), 33
3. HOU (Bourn/Bourgeois), 31
Leadoff average, 11
ML average, 3
28. ATL (Prado/Infante), -2
29. LAA (Aybar/Abreu), -4
30. COL (Fowler/Gonzalez/Young), -10

It is really quite mind-boggling that a playoff contender playing in the majors' most offense-friendly park would allow its leadoff men to attempt 41 steals with a 59% success rate. And there's those Moneyball A's...never mind, too easy.

Oakland and Philadelphia tied for the lead in SBA at 87.5%; the Phillies were 35/40, and fourth in net steals. The only teams below the 2/3 success rate that net steals treats as breakeven were BOS (9/14), LAA (22/35), ATL (10/16), and of course COL (24/41). Leadoff hitters' composite SBA was 75.7%, compared to the overall major league rate of 72.4%.
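A short sketch of the net steals arithmetic, using only the steals/attempts figures quoted in this post. The function names are mine. Since SB - 2*CS is negative exactly when the success rate is below 2/3, the two views always agree on which teams are in the red.

```python
# (stolen bases, attempts) as quoted in the text above.
LEADOFF_STEALS = {
    "PHI": (35, 40),
    "BOS": (9, 14),
    "LAA": (22, 35),
    "ATL": (10, 16),
    "COL": (24, 41),
}


def net_steals(sb, attempts):
    """SB - 2*CS, with CS derived as attempts minus steals."""
    cs = attempts - sb
    return sb - 2 * cs


def success_rate(sb, attempts):
    """Share of steal attempts that succeeded."""
    return sb / attempts


for team, (sb, att) in LEADOFF_STEALS.items():
    print(team, net_steals(sb, att), f"{success_rate(sb, att):.1%}")
```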

Let's shift gears back to quality measures, beginning with one that David Smyth proposed when I first wrote this annual leadoff review. Since the optimal weight for OBA in an x*OBA + SLG metric is generally something like 1.7, David suggested figuring 2*OBA + SLG for leadoff hitters, as a way to give a little extra boost to OBA while not distorting things too much, or even suffering an accuracy decline from standard OPS. Since this is a unitless measure anyway, I multiply it by .7 to approximate the standard OPS scale and call it 2OPS:

1. ARI (Johnson/Drew/Young), 850
2. LA (Furcal/Podsednik), 794
3. MIL (Weeks), 791
4. SEA (Suzuki), 776
ML average, 733
Leadoff average, 722
28. SD (Hairston/Venable/Gwynn/Eckstein), 638
29. WAS (Morgan), 633
30. CLE (Brantley/Crowe/Cabrera), 629
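The 2OPS calculation is a one-liner. The sketch below assumes the list above shows the result times 1000 with the decimal point dropped, which is how the three-digit figures read. The function name and the OBA/SLG inputs are hypothetical.

```python
def two_ops(oba, slg, scale=0.7):
    """2OPS = .7 * (2*OBA + SLG), roughly on the scale of standard OPS."""
    return scale * (2 * oba + slg)


# Hypothetical slot with a .330 OBA and a .400 SLG.
value = two_ops(oba=0.330, slg=0.400)
print(round(1000 * value))  # displayed the way the list above shows it
```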

Along the same lines, one can also evaluate leadoff hitters in the same way I'd go about evaluating any hitter, and just use Runs Created per Game with standard weights (this will include SB and CS, which are ignored by 2OPS):

1. ARI (Johnson/Drew/Young), 6.2
2. LA (Furcal/Podsednik), 5.4
3. MIL (Weeks), 5.3
4. SEA (Suzuki), 5.2
ML average, 4.5
Leadoff average, 4.4
28. TEX (Andrus), 3.1
29. CLE (Brantley/Crowe/Cabrera), 3.1
30. WAS (Morgan), 3.1

Not surprisingly, this list is extremely similar to the 2OPS list.

Finally, allow me to close with a crude theoretical measure of linear weights supposing that the player always led off an inning (that is, batted in the bases empty, no outs state). There are weights out there (see The Book) for the leadoff slot in its average situation, but this variation is much easier to calculate (although also based on a silly and impossible premise).

The weights I used were based on the 2010 run expectancy table from Baseball Prospectus. Ideally I would have used multiple seasons but this is a seat-of-the-pants metric. Here are the relevant lines of that RE table:

To calculate the value of a single or walk in (---, 0), simply subtract .492 from .859 to get .367. Similarly, the value of a double is 1.101 - .492 = .610 and a triple is 1.358 - .492 = .866. A home run is worth one run, as the state remains (---, 0) but there is a run on the board.

Assuming (conveniently and inaccurately) that all stolen base attempts occur with 0 out and are of second base, the value of a steal is 1.101 - .859 = .242, which necessarily is the same as the extra value of a double over that of a single or walk.
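The subtractions above can be sketched as a small lookup, using only the four zero-out run expectancy values quoted in the text. The dictionary keys and the function name are my own labels. Note that from these rounded entries the double comes out at .609, a hair under the .610 quoted, presumably a rounding artifact.

```python
# Zero-out run expectancy values quoted above, keyed by base state.
RE_ZERO_OUT = {
    "empty": 0.492,
    "first": 0.859,
    "second": 1.101,
    "third": 1.358,
}


def leadoff_event_values(re_table):
    """Run value of each event when it starts from bases empty, no outs."""
    start = re_table["empty"]
    return {
        "single_or_walk": re_table["first"] - start,
        "double": re_table["second"] - start,
        "triple": re_table["third"] - start,
        # A homer leaves the state at (---, 0) and puts one run on the board.
        "home_run": 1.0,
        # Assumes every steal is of second base with nobody out.
        "steal": re_table["second"] - re_table["first"],
    }


values = leadoff_event_values(RE_ZERO_OUT)
for event, run_value in values.items():
    print(f"{event}: {run_value:.3f}")
```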

I will deal with outs in such a manner so as to force the average leadoff hitter to zero RAA. They will not come out to zero without special treatment; after all, this is a theoretical construct. Leadoff hitters are not perfectly average nor are events evenly distributed across base/out states.

First, a caught stealing costs the team the value of the baserunner previously earned (-.367), plus the cost of the out itself, which also applies to (AB - H). So to calculate the out value, solve this equation for x:

0 = .367(S + W - CS) + .61D + .866T + HR + .242SB + x*(AB - H + CS)
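Solving that equation is just a matter of moving the positive-event total to the other side and dividing by outs. A minimal sketch, using made-up composite totals rather than the actual 2010 leadoff figures (those are in the spreadsheet); the function name is hypothetical.

```python
def solve_out_value(s, w, d, t, hr, sb, cs, ab, h):
    """Out value x that forces the group's total to zero runs above average."""
    positive = (0.367 * (s + w - cs) + 0.61 * d + 0.866 * t
                + hr + 0.242 * sb)
    outs = ab - h + cs
    return -positive / outs


# Hypothetical composite totals (H = S + D + T + HR), for illustration only.
totals = dict(s=3500, w=1700, d=900, t=150, hr=350, sb=800, cs=250,
              ab=20000, h=4900)
x = solve_out_value(**totals)

# Plugging x back into the equation should return (essentially) zero.
residual = (0.367 * (totals["s"] + totals["w"] - totals["cs"])
            + 0.61 * totals["d"] + 0.866 * totals["t"] + totals["hr"]
            + 0.242 * totals["sb"]
            + x * (totals["ab"] - totals["h"] + totals["cs"]))
print(round(x, 3), abs(residual) < 1e-9)
```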

For 2010 leadoff hitters, x = -.230, and so our theoretical leadoff RAA (which I'll call raw Leadoff Efficiency because I was already using that name for a different metric in the past) is:

rLE = .367(S + W) + .61D + .866T + HR + .242SB - .583CS - .23(AB - H)

To convert this to a rate (it is a RAA total in its current form), I divided by PA (AB + W) and multiplied by the average number of PA for leadoff hitters in 2010 (742). This yields Leadoff Efficiency:

1. ARI (Johnson/Drew/Young), 27
2. LA (Furcal/Podsednik), 15
3. MIL (Weeks), 15
4. SEA (Suzuki), 12
ML average, 1
Leadoff average, 0
28. SD (Hairston/Venable/Gwynn/Eckstein), -19
29. CLE (Brantley/Crowe/Cabrera), -21
30. WAS (Morgan), -22

The fact that this list is very similar to the lists based on metrics designed to apply generic weights to all batters illustrates how the relative values of offensive events are fairly stable.
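To make the last two steps concrete, here is a sketch that applies the rLE coefficients exactly as printed above and then rescales to 742 PA. The function names and the sample team line are hypothetical, not any real team's totals.

```python
def raw_leadoff_efficiency(s, w, d, t, hr, sb, cs, ab, h):
    """rLE with the coefficients as printed in the post (an RAA total)."""
    return (0.367 * (s + w) + 0.61 * d + 0.866 * t + hr
            + 0.242 * sb - 0.583 * cs - 0.23 * (ab - h))


def leadoff_efficiency(s, w, d, t, hr, sb, cs, ab, h, avg_pa=742):
    """rLE per PA (AB + W), scaled to the average leadoff-slot PA total."""
    pa = ab + w
    return raw_leadoff_efficiency(s, w, d, t, hr, sb, cs, ab, h) / pa * avg_pa


# Hypothetical team line (H = S + D + T + HR), for illustration only.
team = dict(s=130, w=60, d=30, t=5, hr=10, sb=30, cs=10, ab=660, h=175)
raw = raw_leadoff_efficiency(**team)
le = leadoff_efficiency(**team)
print(round(raw, 2), round(le))
```

Scaling to a common PA total matters because teams' leadoff slots do not all come to the plate the same number of times.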

One thing I noticed when writing this article was how many teams were using multiple players in their leadoff spot. Compared to 2007-2009, there were indeed a lot of different players used in the role:

The first column is the average number of games in the leadoff spot for the team leader; the second column is the number of teams that had a player appear in 100 or more games as a leadoff man, and the third is the total number of players with 20 or more appearances in the leadoff spot.

What was unusual in 2010 was not the number of players appearing in twenty or more games, but rather the lack of players that led off in the bulk of their team's games. For now it is just a blip; it will be interesting to see if it remains that, or is indicative of a trend. My guess is the former, but it caught my eye and so I mentioned it here.

Assuming for the sake of discussion that it is the beginning of a trend, one would have to question whether the new approach is working. Leadoff hitters' composite OBA was just two points better than the major league average, the smallest margin since I started tracking it in 2005 (the previous low was six points in 2006). Leadoff hitters were also below-average in a generic RC analysis (4.4 RG versus a ML average of 4.5), and it's tough to believe that represents optimal lineup construction.

Here is a link to a Google Spreadsheet with the data used in this post.


Monday, December 13, 2010
