Writing an Indians preview is something of an annual tradition here, and I see no reason to stop now. My interest at this point really lies in amusing myself by seeing if I can correctly predict the 25-man roster. There are a lot of places on the internet where you can find micro-analysis of teams and players; it’s not my strength as an analyst, and so for the most part any such analysis here will be perfunctory by design.
The Indians certainly surpassed my expectations in 2013, winning 92 games and earning one of the two wildcard playoff berths before being shut out by Tampa Bay in the one-game playoff, which I had the mixed pleasure of attending (pro: first playoff game; con: lousy outcome for an Indians fan). However, the offseason that followed has felt a lot like 2008--standing pat rather than bolstering the roster to take another shot. While such a decision should hardly be surprising given the track record of the Dolan-era Indians, it was a little jarring coming on the heels of the team’s aggressive attempt to contend in 2011 (most notably the Ubaldo Jimenez trade, which finally paid dividends and looks like less of a disaster than anticipated, as the prospects sent to Colorado have fallen upon hard times) and last offseason’s spending spree that netted Nick Swisher and Michael Bourn. The only free agent signing who is a lock for the 2014 roster is David Murphy, and a second LOOGY for the pen (Josh Outman) is the only trade acquisition who figures to make an impact. Meanwhile, two starters (Jimenez and Scott Kazmir) signed multi-year deals with other clubs.
This is not to say that these were not the right moves--I tend to think that the Indians need to make significant headway in player development before they can consider themselves a perennial contender given their financial resources--but they suggest that the front office doesn’t buy its own rhetoric about the team’s competitive prospects in 2014. I don’t think I would have wanted to sign Jimenez and Kazmir to those contracts either, but it’s hard to say that the Indians roster looks better today than it did on October 2.
Starting pitching remains a big question mark for the Tribe. Justin Masterson was good again last year, but he has yet to put together consecutive effective seasons, and his struggles with lefties and his sinker/slider reliance make him a disquieting pitcher in whom to place one’s trust. He is also the recipient of a big arbitration raise and a potential trade deadline casualty should Cleveland fail to contend. Danny Salazar was brilliant in the minors and in his ten major league starts last year, and appears to have all the necessary stuff to be a top-of-the-rotation arm, but is building off a base of just 149 total innings pitched. Corey Kluber was a pleasant surprise, a reliable middle-of-the-rotation arm earning cult status in the internet baseball community (or maybe just with one extremely loyal devotee; the strength of the Kluber movement is unclear).
Zach McAllister represents a fourth righty with a rotation spot essentially sewn up, but his 3.75 ERA exaggerates his effectiveness (4.54 RA, 4.88 eRA), and he does not strike me as a pitcher with significant untapped potential. There are a variety of contenders for the fifth spot, from washed-up vets (Aaron Harang and Shaun Marcum) to a former top prospect (Carlos Carrasco) to a soon-to-be-former top prospect (Trevor Bauer) to a junkballing righty (Josh Tomlin). The smart money is on Carrasco; the organization has remained positive about his potential and he is out of options. If he does not win the job, expect him to get a bullpen spot. Bauer will probably be given more time to work through his mechanical issues in AAA, Marcum is unlikely to be ready for Opening Day, and Harang appears to be emergency depth. Tomlin is the pitcher most likely to give Carrasco a run for his money.
The bullpen will take on a very different look than it did on Opening Day 2013, as the seventh, eighth, and ninth inning options are all in very different places. The bullpen shakeup started with injury and legal trouble for Chris Perez, who was utilized as the closer when available but fell out of favor with the brass, leading to an unceremonious non-tender. Vinnie Pestano, who was extremely effective in the setup role in 2011-12, struggled all season, spending much of it in Columbus. Joe Smith was solid and moved up from the seventh to eighth inning, but signed a multi-year deal with the Angels.
In their stead, the Indians have returned to their standard closer strategy: someone else’s reject, often with questionable stuff or command. John Axford represents the latter, and sad to say, one of those 30-save seasons with a poor RA appears to be the optimum outcome. Cody Allen will serve as one of the top setup men, provided his arm doesn’t fall off--Terry Francona’s usage of Allen was bizarre last year; even as he moved up the pecking order, he was not held back from more mundane situations, leading to a largely unnecessary 77 appearances (second in the AL). He’ll be joined by Bryan Shaw, who earned Francona’s trust with a 2.84 eRA, 9.0 KG, and 3.5 WG over 75 innings.
The other two sure things for the pen are the LOOGYs. Marc Rzepczynski pitched very well after being plucked from St. Louis’ bullpen excess, and Josh Outman was picked up from Colorado in exchange for Drew Stubbs. Neither is particularly exciting in terms of potential to outgrow his role, although both have enough starting pedigree to warrant long relief duty.
That leaves two spots open. I expect one of them to go to Pestano, who has by far the highest upside of any of the candidates. Other options include Nick Hagadone, who’d be a third lefty and could provide long-ish relief; a few righties who had cups of coffee in 2013 in CC Lee, Preston Guilmet, and Blake Wood; lefties Scott Barnes and Colt Hynes; and non-roster invitees including David Aardsma, Scott Atchison, Matt Capps, JC Ramirez, and Mike Zagurski. Capps would have been a pretty good guess given the team’s propensity to give the seventh spot to a veteran reclamation project, but a sore shoulder will hold him back. My guess here is Wood, who was serviceable for the Royals over 120 innings in 2010-11 but missed all of 2012 and spent 2013 working his way through the Indians farm system, save for two September appearances in the bigs. Wood has walked too many batters throughout his career to be reliable, but his stuff and relative youth (28) will make him an attractive flier.
Turning to the position players, the third base situation has to take center stage. Former first-round pick Lonnie Chisenhall has never really hit that well at any level (94 OPS+ over 682 PA in the majors, 821 career minor league OPS), which has made alternatives attractive, including moving Carlos Santana to third. Santana was not serving as the regular catcher down the stretch, as Yan Gomes’ superior defense and surprising offensive showing freed Santana up to DH. If Santana is passable at third, I expect him to play there, a conclusion that surprises me. When the Indians started talking about this over the winter, I sort of laughed it off, but I am now convinced of their sincerity.
Gomes will be the catcher and is a prime candidate for offensive regression. Nick Swisher is slated as a full-time first baseman but can play right if need be; while he held his own offensively with 3 RAA, the Indians had hoped for more in the first year of his four-year deal. Second baseman Jason Kipnis was not nearly as good in the second half as he was in the first (149 OPS+ to 103), but the season as a whole was worthy of down-ballot MVP consideration. Asdrubal Cabrera had a disappointing year (just 2 RAA and a .297 OBA), but the only thing that would jeopardize his position at shortstop is a deadline selloff.
Left field goes to Michael Brantley and his shiny new contract extension; I had been a skeptic of Brantley’s value, but he provided average production even for a corner spot last year, easing concerns that he might be an overextended fourth outfielder. If Swisher was a minor disappointment, Michael Bourn was a major one; the center fielder had an OBA of just .322, created 4 runs/game, and missed 32 games due to injury. With Drew Stubbs shipped to Colorado, right field figures to be a platoon of Ryan Raburn (right-handed) and David Murphy (left-handed). Raburn made a deal with the devil last year (.272/.357/.543 in 277 PA) and Murphy had a miserable year (.214/.273/.363, -5 RAR, third-lowest among AL regulars); with more moderate luck, together they might make a mediocre whole.
The prospect of Santana being able to cut it at third should engender more excitement than it actually does, as Cleveland lacks a good DH candidate to plug in his place. Jason Giambi provided the single most memorable Indians moment since 2007, but his overall season line of .183/.282/.371 and his current age of 43 don’t lie: he has no business getting regular playing time in the majors.
I’ll assume that Chisenhall plays third and Santana is the regular DH, which still leaves a big roster question: will the Indians carry a third catcher or just use Santana as Gomes’ caddy? Given that the options are veterans Matt Treanor and Luke Carlin or minor league non-hitter Roberto Perez (623 OPS in 2013, 685 career), I’m guessing no, leaving three bench spots. One clearly belongs to utility man Mike Aviles, who is fine in that role but got too many starts (87) in 2013. Since Brantley can play a passable center, the Raburn/Murphy platoon can double as the fourth outfielder, which leaves room for Jason Giambi. The Indians also could consider Jeff Francoeur (just say no), Nyjer Morgan, or Matt Carson as fourth outfielders. Elliott Johnson could snag a utility spot, while September pinch-runner Jose Ramirez will be farmed out for everyday playing time. It’s conceivable that a hot spring from Bryan LaHair could put him in the mix for DH at-bats as well. Frankly, my guess on the composition of the bench is no better than drawing names out of a hat.
I don’t think this Indians team is well-positioned to repeat its success of 2013. The rotation isn’t particularly strong and the bullpen is unproven; the offense should be fine, but I see no reason to expect it to be any better than it was in 2013. The Tigers remain the class of the division, although they have never been as invulnerable as mainstream thought would have them; the Royals are lurking, and while it may be completely irrational, I am always wary of a White Sox miracle. As far as wildcard prospects go, the only AL teams I’d write off completely, other than Houston, play in the Central. I see the 50th percentile scenario for the Tribe as 80-82, third place.
1. 8 Michael Bourn
2. 3 Nick Swisher
3. 4 Jason Kipnis
4. D Carlos Santana
5. 7 Michael Brantley
6. 9 David Murphy
7. 6 Asdrubal Cabrera
8. 2 Yan Gomes
9. 5 Lonnie Chisenhall
Bench: IF Mike Aviles, IF Elliott Johnson, OF Ryan Raburn, DH Jason Giambi
1. Justin Masterson (R)
2. Danny Salazar (R)
3. Corey Kluber (R)
4. Zach McAllister (R)
5. Carlos Carrasco (R)
RP: Blake Wood (R)
RP: Vinnie Pestano (R)
LOOGY: Josh Outman (L)
LOOGY: Marc Rzepczynski (L)
SU: Bryan Shaw (R)
SU: Cody Allen (R)
CL: John Axford (R)
Saturday, March 01, 2014
Monday, February 10, 2014
2014 will mark Greg Beals’ fourth season at the helm of the OSU baseball program, which means there are no excuses. He has had plenty of time to recruit his own players into the program and have them playing key roles, and perhaps more importantly he has had plenty of time in which to mold them to play baseball as Greg Beals believes it should be played. Unfortunately for the state of the program, 2013 was not encouraging on either front. One of the strengths that allowed OSU to contend in the Big Ten last season was a strong trio of weekend starters, a group that must now be replaced en masse. It is now time for Beals’ recruits to lead the offense, and so far the most noticeable characteristic of a Beals offense is baserunning that would make a JV high school coach blush.
One spot at which the Buckeyes are well-positioned is catcher, where junior Aaron Gretz will finally have the job to himself. Gretz, for whatever reason, was never able to fully overcome Greg Solomon in the eyes of Beals, despite being a better defensive catcher and possessing any sort of batting eye at all. Gretz figures to be an above-average Big Ten catcher, and is coming off a busy offseason in which the Minnesota native spent several weeks as a backup goalie for the OSU hockey team thanks to a transfer and an injury to the two scholarship goalies.
At first base, sophomore Zach Ratcliff figures to get the nod after emerging as one of OSU’s few power threats midway through the Big Ten season. The other option is redshirt junior Josh Dezse, who had to sit out 2013 with a back injury. Dezse will also pull double duty on the mound, so it stands to reason that he will act as the primary DH to save unnecessary wear and tear. Sophomore Troy Kuhn will start at second base after serving as the utility infielder in his freshman campaign. Kuhn got off to a hot start collecting base hits, but showed little in secondary skills as a freshman, particularly in the power department. His choice of walkup music (“Who Let the Dogs Out”) may have been the most disappointing element of his campaign.
The first crack at the shortstop job will go to sophomore Craig Nennig, but there have to be serious questions about his ability to hold the job after hitting just .125/.143/.146 in 52 PA as a freshman, all against non-conference foes. At third base, sophomore Jacob Bosiokovic will be counted on to be a key cog in the offense. Bosiokovic showed flashes of potential, hitting for big power when he first got into the lineup, but his overall season line of .273/.327/.369 must improve greatly for the Buckeye offense to run as expected.
In the outfield, senior left fielder Tim Wetzel will look to rebound as an on-base guy and basestealing threat after a dismal junior season (.215/.292/.304) that is the outlier in his three-year track record. Wetzel could also slide back over to center field in the event that true freshman Troy Montgomery stumbles. The Indiana native is said to be a prototypical leadoff-hitting center fielder and will get a chance to assume both roles early in his career. In right field, junior Pat Porter was OSU’s only consistent offensive threat last year, building on a solid freshman campaign, and along with Bosiokovic and Dezse will be counted on to man the middle of the lineup.
With so many youngsters graduating to the starting lineup, the bench will be unproven. The backup catcher figures to be junior Connor Sabanosh, a junior college transfer from Arizona (Arizona JUCOs have continued to be a key pipeline for Beals). Freshman Jalen Washington is listed in the old Rico Washington role (C/IF), and will be an option at the infield spots as well as behind the plate.
The utility infielder should be redshirt sophomore Ryan Leffel, who showed promise as a fielder and with his approach at the plate as a freshman in 2012 before sitting out 2013 with an injury. Leffel may be the backup plan at shortstop if Nennig cannot hit enough to retain the job. Other infielders on the roster are redshirt sophomore Nick Sergakis (a transfer from Coastal Carolina) and true freshman Curtiss Irving (first baseman) and JP Sorma. The two key outfield reserves are sophomore Jake Brobst, who got limited playing time as a freshman and can play center, and true freshman Ronnie Dawson, a powerful hitter from the Columbus area who may challenge Wetzel for his left field playing time.
For most of the offensive positions that need to be replaced, the likely starters are pretty clear based on last year’s roster utilization. The same is not true for the starting pitchers. Only one of the likely top three started a significant number of games in 2013, so uncertainty abounds.
The three most likely weekend starters in this observer’s opinion are senior Greg Greve, junior Ryan Riga and sophomore Jake Post. Greve started in 2011-12, but worked exclusively in relief last year, pitching solidly (3.65 RA). His previous performance as a starter was poor (ERA over 5 in each of 2011-12), so his spot is far from a lock. Riga was OSU’s top lefty reliever last year, also working in long relief, and was highly effective with 7.4 strikeouts and 1.7 walks per nine over 46 innings of work. Riga is probably the safest bet to actually be in OSU’s weekend plans. Post showed that he has good stuff last year in starting seven midweek games, but the results (7.63 RA in 31 innings) did not match his peripherals (7.6 K/2.6 W).
The rotation wildcard is Dezse, who was slated to move into a starting role in 2013 after serving as closer in 2011-12. Dezse has an electric fastball, but didn’t show much in the secondary offerings department when closing (and also didn’t have the lockdown numbers to back up his reputation). Given his injury, he will be worked back into the fold slowly, and (based on absolutely no inside information) I would be surprised if he ended up as a starter. That leaves the other top options as a pair of true freshmen. Lefty Zach Farmer has received rave reviews, and if half of what has been said is true, he’ll be in the rotation by the time Big Ten play opens. Right-hander Travis Larkins would be next in the pecking order.
Should Dezse wind up in the bullpen, he’ll join with junior Trace Dempsey to give the Bucks two big right-handed arms. Dempsey was untouchable for much of last season with his three-quarters movement-heavy stuff, but faltered a bit in key games against Indiana and in the Big Ten Tournament. Still, a 1.50 RA in 35 innings is a sterling campaign.
Behind Dempsey, the pen will also have key parts to replace, with Riga and Greve starting and key right-handers Brett McKinney and David Fathalikhani graduated. The only other returning reliever who pitched significantly is senior Tyler Giannonatti, who mopped up last year and probably won’t be used in high-leverage innings in 2014. Sophomore three-quarters lefty Matt Panek took a redshirt for injury last year after pitching sparingly as a freshman, but given Beals’ propensity for playing matchups, a healthy Panek should see the mound.
According to Big Ten baseball guru Chris Webb, the other most likely relievers for key innings are redshirt freshman sidearmer Michael Koltak, true freshman Adam Niemeyer, true freshman lefty Tanner Tully, and redshirt freshman Shea Murray. Other pitchers on the roster include junior lefty Michael Horesjei, redshirt freshman lefty Joe Stoll, and true freshman right-handers Kyle Michalik, Brennan Milby, and Yianni Pavlopoulus.
OSU opens its season this weekend with four games in Port Charlotte, Florida (UConn, Auburn, and two with Indiana State). The following weekend they will be back in the Sunshine State to play Central Florida, The Citadel, and Oklahoma in Orlando. The first weekend in March will see the Bucks in Greenville, North Carolina to face Pitt, Western Kentucky, and East Carolina. The destination portion of the schedule ends the following week with a three-game series at Oregon followed by a single contest at Oregon State.
The home opener is slated for March 14 with a three-game series against Siena, then midweek games against Akron and Xavier. Other midweek opponents throughout the season will be Marshall, Ohio University, Toledo, Dayton, at West Virginia, Ball State (Beals’ former employer), at Louisville, Miami, and Cincinnati.
Big Ten play opens March 21 with OSU at Michigan State. The weekends for the rest of the season will be hosting Indiana, at Nebraska, hosting Penn State, a bye week in which OSU will host Murray State, at Purdue, hosting Iowa, at Michigan, and hosting Northwestern. Should the Bucks qualify for the Big Ten Tournament by finishing in the top six, they will head to Omaha on May 21.
Will the Buckeyes get to Omaha? With Indiana breaking a twenty-nine-year drought of Big Ten participation in the College World Series, this question might carry a different connotation than in the past. But given the current status of the program, the question should be limited to qualifying for the Big Ten Tournament. The Buckeyes have too many question marks to be considered a favorite in the conference race, and given that it’s year four of Beals’ tenure, I’m hoping that I am very wrong.
1. CF Troy Montgomery (FM)
2. 2B Troy Kuhn (SM)
3. RF Pat Porter (JR)
4. DH Josh Dezse (JR)
5. 3B Jacob Bosiokovic (SM)
6. 1B Zach Ratcliff (SM)
7. C Aaron Gretz (JR)
8. LF Tim Wetzel (SR)
9. SS Craig Nennig (SM)
SP #1: R Greg Greve (SR)
SP #2: L Ryan Riga (JR)
SP #3: R Jake Post (SM)
SP #4 (midweek): L Zach Farmer (FM)
SP #5 (midweek): R Travis Larkins (FM)
RP: R Tyler Giannonatti (SR)
RP: R Michael Koltak (FM)
RP: L Matt Panek (SM)
RP: R Trace Dempsey (JR)
CL: R Josh Dezse (JR)
Tuesday, January 28, 2014
A couple of caveats apply to everything that follows in this post. The first is that there are no park adjustments anywhere. There's obviously a difference between scoring 5 runs at Petco and scoring 5 runs at Coors, but if you're using discrete data there's not much that can be done about it unless you want to use a different distribution for every possible context. Similarly, it's necessary to acknowledge that games do not always consist of nine innings; again, it's tough to do anything about this while maintaining your sanity.
All of the conversions of runs to wins are based only on 2013 data. Ideally, I would use an appropriate distribution for runs per game based on average R/G, but I've taken the lazy way out and used the empirical data for 2013 only. (I have a methodology I could use to estimate win probabilities at each level of scoring that takes context into account, but I haven't yet finished the full write-up it needs on this blog, and I'm not comfortable using it without explanation.)
The first breakout is record in blowouts versus non-blowouts. I define a blowout as a margin of five or more runs. This is not really a satisfactory definition of a blowout, as many five-run games are quite competitive--"blowout” is just a convenient label to use, and expresses the point succinctly. I use these two categories with wide ranges rather than more narrow groupings like one-run games because the frequency and results of one-run games are highly biased by the home field advantage. Drawing the focus back a little allows us to identify close games and not so close games with a margin built in to allow a greater chance of capturing the true nature of the game in question rather than a disguised situational effect.
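Anyone who wants to reproduce the split can use the minimal Python sketch below. It assumes each game is available as a (runs scored, runs allowed) pair. The function names and the five-game sample are hypothetical and serve only as illustration. The sketch applies the five-run cutoff described above and reports the record in each bucket, along with the share of games that were blowouts.

```python
from typing import Dict, Iterable, List, Tuple

BLOWOUT_MARGIN = 5  # a margin of five or more runs counts as a "blowout"


def split_record(games: Iterable[Tuple[int, int]]) -> Dict[str, List[int]]:
    """Tally [wins, losses] in blowouts and non-blowouts.

    Each game is a (runs_scored, runs_allowed) pair; ties are assumed
    not to occur, as with completed MLB games.
    """
    record = {"blowout": [0, 0], "non_blowout": [0, 0]}
    for scored, allowed in games:
        key = "blowout" if abs(scored - allowed) >= BLOWOUT_MARGIN else "non_blowout"
        record[key][0 if scored > allowed else 1] += 1
    return record


def win_pct(wins: int, losses: int) -> float:
    return wins / (wins + losses)


def blowout_share(record: Dict[str, List[int]]) -> float:
    """Fraction of all games decided by the blowout margin."""
    blowouts = sum(record["blowout"])
    return blowouts / (blowouts + sum(record["non_blowout"]))


# Hypothetical five-game sample
games = [(7, 2), (3, 4), (10, 1), (2, 8), (5, 4)]
rec = split_record(games)
print(rec)                 # {'blowout': [2, 1], 'non_blowout': [1, 1]}
print(blowout_share(rec))  # 0.6
```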
In 2013, 74.7% of major league games were non-blowouts, while the complement, 25.3%, were blowouts. Team record in non-blowouts:
And in blowouts:
Teams sorted by difference between blowout and non-blowout W%, as well as the percentage of blowouts for each team:
Baltimore is one of the teams that interest me here; their unbelievable one-run record in 2012 was well documented, so it shouldn’t be surprising that the Orioles ranked second in the majors in 2012 in non-blowout W% but were just over .500 in blowouts (23-21). In 2013, Baltimore just quit playing in blowouts, with only 15% of their games decided by five or more runs (only the White Sox, at 17%, joined them under 20%), but when they did, they had a 14-11 record. Boston had the largest W% differential between blowouts and non-blowouts and was also the best team in the majors from most results-based perspectives.
A more interesting way to consider game-level results is to look at how teams perform when scoring or allowing a given number of runs. For the majors as a whole, here are the counts of games in which teams scored X runs:
The “marg” column shows the marginal W% for each additional run scored. In 2013, the second run was the marginally most valuable while the fourth was the cutoff point between winning and losing.
I use these figures to calculate a measure I call game Offensive W% (or Defensive W% as the case may be), which was suggested by Bill James in an old Abstract. It is a crude way to use each team’s actual runs per game distribution to estimate what their W% should have been by using the overall empirical W% by runs scored for the majors in the particular season.
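The calculation is just a weighted average. Each game a team plays is credited with the league-wide W% for teams scoring that many runs. The "marg" column is the step from one run total to the next. In the minimal Python sketch below, the USE values are made-up placeholders rather than the 2013 table, and the helper names are hypothetical. Following the conventions described in this post, gDW% credits the complement of the same values to the defense. A gap between two W% estimates is converted to wins over 162 games.

```python
from typing import List, Sequence

# Placeholder league W% when scoring k runs (index k; the last entry is 12+).
# Illustrative values only, NOT the 2013 empirical figures.
USE = [0.00, 0.07, 0.20, 0.35, 0.50, 0.64, 0.75, 0.84, 0.84, 0.90, 0.93, 0.96, 1.00]


def bucket(runs: int) -> int:
    """Index into USE; every total of 12 or more shares the last bucket."""
    return min(runs, len(USE) - 1)


def marginal(use: Sequence[float]) -> List[float]:
    """Marginal W% of each additional run: W%(k) - W%(k - 1), for k >= 1."""
    return [use[k] - use[k - 1] for k in range(1, len(use))]


def g_ow(runs_scored: Sequence[int]) -> float:
    """gOW%: mean over games of the league W% at each game's run total."""
    return sum(USE[bucket(r)] for r in runs_scored) / len(runs_scored)


def g_dw(runs_allowed: Sequence[int]) -> float:
    """gDW%: same idea, crediting the complement (1 - W%) for runs allowed."""
    return sum(1.0 - USE[bucket(r)] for r in runs_allowed) / len(runs_allowed)


def win_gap(g_pct: float, standard_pct: float, games: int = 162) -> float:
    """Difference between two W% estimates, expressed in wins."""
    return (g_pct - standard_pct) * games


print(round(g_ow([3, 4, 5, 0, 12]), 3))  # 0.498
print(round(g_dw([2, 6, 1, 15]), 3))     # 0.495
```

Averaging over games is the same as summing each run total's frequency times its W%. Working game by game simply avoids building the frequency table first.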
A theoretical distribution would be much preferable to the empirical distribution for this exercise, but as I mentioned earlier I haven’t yet gotten around to writing up the requisite methodological explanation, so I’ve defaulted to the 2013 empirical data. Some of the drawbacks of this approach are:
1. The empirical distribution is subject to sample size fluctuations. In 2013, teams that scored 7 runs won 85.8% of the time while teams that scored 8 runs won 83.2% of the time. Does that mean that scoring 7 runs is preferable to scoring 8 runs? Of course not--it's a quirk in the data. Additionally, the marginal values don’t necessarily make sense even when W% increases from one runs scored level to the next. (In figuring the gEW% family of measures below, I lumped all games with 7 and 8 runs scored/allowed into one bucket, which smooths any illogical jumps in the win function but leaves the inconsistent marginal values unaddressed and fails to make any differentiation between scoring 7 and 8. The values actually used are displayed in the “use” column, and the “invuse” column contains the complements of these figures--i.e. those used to credit wins to the defense. I've used 1.0 for 12+ runs, which is a horrible idea theoretically; in 2013, teams were 102-0 when scoring 12 or more runs.)
2. Using the empirical distribution forces one to use integer values for runs scored per game. Obviously the number of runs a team scores in a game is restricted to integer values, but not allowing theoretical fractional runs makes it very difficult to apply any sort of park adjustment to the team frequency of runs scored.
3. Related to #2 (really its root cause, although the park issue is important enough from the standpoint of using the results to evaluate teams that I wanted to single it out), when using the empirical data there is always a tradeoff that must be made between increasing the sample size and losing context. One could use multiple years of data to generate a smoother curve of marginal win probabilities, but in doing so one would lose centering at the season’s actual run scoring rate. On the other hand, one could split the data into AL and NL and more closely match context, but you would lose sample size and introduce more quirks into the data.
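The smoothing choices described in the first item can be sketched as follows: pooling 7 and 8 runs into one bucket, crediting 12+ runs as a certain win, and taking "invuse" as the complement. This is a minimal Python illustration, not the spreadsheet actually used. The function names and the (wins, games) counts are hypothetical, chosen so that 7 runs "beats" 8 runs the way the 2013 data did.

```python
from typing import Dict, List, Tuple


def build_use(record: Dict[int, Tuple[int, int]],
              pool: Tuple[int, ...] = (7, 8), cap: int = 12) -> List[float]:
    """Build the "use" column from league (wins, games) by runs scored.

    Run totals in `pool` share a single pooled W%, and every total of
    `cap` or more is credited as a win (1.0).
    """
    pooled_w = sum(record[k][0] for k in pool)
    pooled_g = sum(record[k][1] for k in pool)
    use = []
    for k in range(cap):
        if k in pool:
            use.append(pooled_w / pooled_g)
        else:
            wins, games = record[k]
            use.append(wins / games)
    use.append(1.0)  # the 12+ bucket
    return use


def inv_use(use: List[float]) -> List[float]:
    """The defensive column: the complement of each "use" value."""
    return [1.0 - u for u in use]


# Hypothetical (wins, games) counts; note 7 runs "beats" 8 runs here
record = {0: (0, 50), 1: (10, 100), 2: (25, 100), 3: (40, 100), 4: (52, 100),
          5: (65, 100), 6: (75, 100), 7: (85, 100), 8: (80, 100),
          9: (88, 100), 10: (45, 50), 11: (23, 25)}
use = build_use(record)
print(use[7], use[8])  # 0.825 0.825
```

Pooling removes the 7-versus-8 inversion in W% itself. As noted above, it does so by making the marginal value of the eighth run exactly zero.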
I will use my theoretical distribution (Enby, which you can read about here) for a few charts in this post. The first is a comparison of the frequency of scoring X runs in the majors to what would be expected given the overall major league average of 4.166 R/G (Enby distribution parameters are r = 3.922, B = 1.07, z = .0649):
Enby generally does a decent job of estimating the actual scoring distribution, and while I am certainly not an unbiased observer, I think it does so here as well.
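The post leaves the details of Enby to a separate write-up, so the exact form in the Python sketch below is an assumption. It models Enby as a zero-modified negative binomial. Here z is the probability of a shutout, and a negative binomial with mean r·B and variance r·B·(1 + B) is rescaled to carry the remaining 1 - z over one or more runs. That reading is at least consistent with the parameters quoted in this post. It returns about 4.164 R/G for the majors (stated: 4.166), 3.692 for Chicago (3.691), and 3.382 for Atlanta's runs allowed (3.383). Treat it as a reconstruction, not the published definition.

```python
import math


def nb_pmf(k: int, r: float, b: float) -> float:
    """Negative binomial P(k) with mean r*b and variance r*b*(1 + b)."""
    log_p = (math.lgamma(r + k) - math.lgamma(r) - math.lgamma(k + 1)
             + k * math.log(b / (1 + b)) - r * math.log(1 + b))
    return math.exp(log_p)


def enby_pmf(k: int, r: float, b: float, z: float) -> float:
    """ASSUMED form: P(0) = z; P(k >= 1) is the negative binomial
    rescaled so that the positive run totals sum to 1 - z."""
    if k == 0:
        return z
    return (1 - z) * nb_pmf(k, r, b) / (1 - nb_pmf(0, r, b))


def enby_mean(r: float, b: float, z: float, max_runs: int = 200) -> float:
    """Expected runs per game; the tail beyond max_runs is negligible."""
    return sum(k * enby_pmf(k, r, b, z) for k in range(max_runs + 1))


# Parameters quoted in the post for the 2013 majors as a whole
for k in range(6):
    print(k, round(enby_pmf(k, 3.922, 1.07, 0.0649), 3))
print(round(enby_mean(3.922, 1.07, 0.0649), 3))
```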
I will not go into the full details of how gOW%, gDW%, and gEW% (which combines both into one measure of team quality) are calculated in this post, but full details were provided here. The “use” column here is the coefficient applied to each game to calculate gOW% while the “invuse” is the coefficient used for gDW%. For comparison, I have looked at OW%, DW%, and EW% (Pythagenpat record) for each team; none of these have been adjusted for park to maintain consistency with the g-family of measures which are not park-adjusted.
For most teams, gOW% and OW% are very similar. Teams whose gOW% is higher than OW% distributed their runs more efficiently (at least to the extent that the methodology captures reality); the reverse is true for teams with gOW% lower than OW%. The teams that had differences of +/- 2 wins between the two metrics were (all of these are the g-type less the regular estimate):
Positive: CHA, MIL, CHN, BAL, MIA, PIT, MIN
Negative: BOS, OAK, STL, TEX, CLE
There was an abnormally high number of teams this season whose gOW% diverged significantly from their standard OW%; as you’ll see in a moment, the opposite was true for gDW%. The White Sox’ gOW% of .467 was 3.5 games better than their OW% of .445. Their gOW% was seventh-lowest in the majors, but their OW% was second-worst. So while their offense was still bad, they wound up distributing their runs in a manner that should have resulted in more wins than one would expect from their R/G average.
As such, Chicago makes for an interesting case study in how a measly 3.69 runs/game can be doled out more efficiently. The black line is Chicago’s actual 2013 run distribution, the blue line is Enby’s estimate for a team averaging 3.691 R/G (r = 3.662, B = 1.018, z = .0853), and the red line is that of the majors as a whole (Chicago did not actually score more than twelve runs in a game this season, but fifteen is the standard I’ve always used in these graphs):
Chicago scored 3, 4, and 5 runs significantly more often than Enby would expect and more often than the major league average, despite having a poor offense. Scoring 3-5 runs is a good spot to be in, at least in the current scoring environment--in 2013, teams won 54% of the time when scoring 3-5.
I deliberately wrote the preceding paragraph to be a little misleading--Chicago's propensity to score 3-5 runs was not really a positive, since it meant fewer games in which they scored more than five runs. The White Sox were shut out more often than the major league average (8% to 6.8%), scored < 2 runs more often than average (19.1% to 18%), but scored < 3 runs less often than average (50.6% to 47.8%). That is the only step at which Chicago was above average, and they quickly fell into well below average territory--Chicago scored < 6 runs 82% of the time versus the average of 71.9%:
Teams with differences of +/- 2 wins between gDW% and standard DW%:
Negative: ATL, TEX, OAK
The 3.7-win discrepancy between Atlanta’s gDW% (.570) and standard DW% (.592) was the largest such difference for any unit in the majors (greater than Chicago’s gOW% difference). The Braves were the only team that did not allow eleven or more runs in a game; the major league average was 3.4% of games, and only Oakland (one) and St. Louis (two) had fewer than three such games. Avoiding those disaster games helped keep their RA/G low, but the Braves allowed four and five runs more often than both the major league average and the Enby expectation for a team allowing 3.383 runs per game (r = 3.478, B = .983, z = .1023):
Teams with differences of +/- 2 wins between gEW% and EW% (standard Pythagenpat):
Positive: SEA, CHA, PHI, PIT, MIN, CHN
Negative: OAK, TEX, STL, ATL, BOS, CLE, CIN
Every team on the negative list except Texas made the playoffs, so they obviously were not too badly hampered by seemingly inefficient run distributions. Standard Pythagenpat had a freakishly good year predicting actual W% in 2013, with an RMSE of 3.66, while gEW% had a 3.95 RMSE. gEW% does not incorporate any knowledge about the joint distribution of runs scored and allowed; if you do that, you may as well just look at actual win-loss record. But since it doesn’t have knowledge of the joint distribution, it’s quite possible for standard EW% to perform better as a predictor.
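For reference, the standard EW% here is Pythagenpat, in which the Pythagorean exponent itself depends on the scoring level. The minimal Python sketch below computes that estimate and an RMSE comparison. The exponent parameter of 0.29 is an assumption (a commonly used value, but not one stated in this post). So is the choice to express errors in wins per 162 games. The function names are hypothetical.

```python
import math
from typing import Sequence


def pythagenpat(runs: float, runs_allowed: float, games: int, z: float = 0.29) -> float:
    """Pythagenpat W%: exponent x = ((R + RA) / G) ** z.

    z = 0.29 is an assumed value for illustration.
    """
    x = ((runs + runs_allowed) / games) ** z
    return runs ** x / (runs ** x + runs_allowed ** x)


def rmse_wins(predicted: Sequence[float], actual: Sequence[float], games: int = 162) -> float:
    """Root mean square error of W% estimates, expressed in wins per `games`."""
    errors = [(p - a) * games for p, a in zip(predicted, actual)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


print(round(pythagenpat(750, 650, 162), 3))
```

The same RMSE function applies unchanged to gEW% estimates, which makes the two predictors directly comparable on one team set.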
For now, most of the applications of this methodology, at least in my writings, have been freak-show in nature. The more interesting questions will be easier to investigate once I’ve finished my update of the Enby methodology. Do certain types of offenses tend to bunch their runs more efficiently? Can the estimate of variance of runs scored (which is really the key assumption underpinning Enby) be improved by considering team characteristics? How well do efficient or non-efficient distributions by teams predict team performance in future years? I don’t mean to imply that others have not investigated these questions, simply that I hope to have more interesting material in these year-end reviews starting in 2014. I said that last year too, though.
Tuesday, January 14, 2014
For the last few years I have published a set of team ratings that I call "Crude Team Ratings". The name was chosen to reflect the nature of the ratings--they have a number of limitations, of which I documented several when I introduced the methodology.
I explain how CTR is figured in the linked post, but in short:
1) Start with a win ratio figure for each team. It could be actual win ratio, or an estimated win ratio.
2) Figure the average win ratio of the team’s opponents.
3) Adjust for strength of schedule, resulting in a new set of ratings.
4) Begin the process again. Repeat until the ratings stabilize.
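A minimal Python sketch of those four steps follows. It is an illustration under assumptions, not necessarily the exact procedure behind the published ratings: the “average” in step 2 is taken as a games-weighted arithmetic mean of opponents’ current ratings, step 3 multiplies a team’s win ratio by that average (reading a win ratio as the ratio of a team’s rating to its opponents’ rating, as in Log5), and each pass is rescaled so the mean rating is 100. Mathematically this is power iteration, so it settles down for a realistic schedule; in a degenerate two-team league it would oscillate rather than converge. All names are hypothetical.

```python
def crude_team_ratings(win_ratio, games, max_iter=1000, tol=1e-10):
    """win_ratio: dict team -> W/L (actual or estimated).
    games: dict (team, opponent) -> games played, listed in both directions.
    Returns (ratings, sos), both on a scale where the mean rating is 100."""
    teams = list(win_ratio)
    ratings = dict(win_ratio)                      # step 1: start from win ratio
    sos = {}
    for _ in range(max_iter):
        new = {}
        for t in teams:
            opps = [(o, n) for (a, o), n in games.items() if a == t]
            played = sum(n for _, n in opps)
            # step 2: games-weighted average rating of the opponents
            sos[t] = sum(ratings[o] * n for o, n in opps) / played
            # step 3: adjust the team's win ratio for schedule strength
            new[t] = win_ratio[t] * sos[t]
        scale = 100 * len(new) / sum(new.values())  # rescale: mean rating = 100
        new = {t: r * scale for t, r in new.items()}
        converged = all(abs(new[t] - ratings[t]) < tol for t in teams)
        ratings = new                               # step 4: repeat until stable
        if converged:
            break
    return ratings, sos

# Hypothetical three-team league, ten games between each pair
teams = ["A", "B", "C"]
win_ratio = {"A": 14 / 6, "B": 10 / 10, "C": 6 / 14}
schedule = {(t, o): 10 for t in teams for o in teams if t != o}
ratings, sos = crude_team_ratings(win_ratio, schedule)
print({t: round(r, 1) for t, r in ratings.items()})
```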
First, CTR based on actual wins and losses. In the table, “aW%” is the winning percentage equivalent implied by the CTR and “SOS” is the measure of strength of schedule--the average CTR of a team’s opponents. The rank columns provide each team’s rank in CTR and SOS:
This was a banner year for those of us who prefer the best teams to make it through the playoffs, as the pennant winners ranked one-two in MLB. The ten playoff teams were also the ten highest-rated teams, with the exception of #9 Texas, but of course the Rangers had their shot in the one-game playoff. Also, the Rangers were still only fifth in the AL, so it’s not as if their schedule unfairly kept them out. What is a departure from recent seasons is that no AL also-rans finished with higher ratings than the NL playoff qualifiers. Still, the AL again dominated the top spots, as only St. Louis snuck into the top five.
Below are the mean ratings for each league and division, actually calculated as the geometric rather than arithmetic mean:
Last year, the AL-NL gap was 112-89, and if you count Houston with the NL it was 106-88 in 2013. In any event, the AL remains the stronger league based on the interleague results (which is what underpins any differences in these rankings), with an implied W% of .521 against the NL.
Speaking of Houston, they actually ticked up a bit in CTR, from 46 to 48. While I wouldn’t claim that is a meaningful difference, it does indicate that their four-win drop is largely a function of opponent quality, as they moved from the 21st most difficult schedule in 2012 to 4th in 2013. They also provide a good opportunity to point out that the schedule rankings are dependent on the quality of the team in question--Houston's schedule was tougher than that of their divisional opponents because they did not get the benefit of playing nineteen games against Houston.
Schedule can make a big difference when comparing two teams from different leagues, one in a tough division and one in a weak one; naturally, the largest schedule disparity is between the winner of the weakest division (NL East) and the cellar dweller of the strongest (AL East). In the actual tallies, Atlanta was 96-66 and Toronto was 74-88. However, the ratings (as indicated by aW%) suggest that Atlanta was equivalent to a 92-70 team and Toronto to 78-84, an eight-game swing in a head-to-head comparison. Atlanta’s SOS of 90 and Toronto’s of 112 imply that Toronto’s average opponent would have a .554 W% against Atlanta’s average opponent, comparable in 2013 CTR terms to the Dodgers or Rangers.
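Because the ratings are on a win-ratio scale, a comparison like the .554 figure is just Log5: one rating divided by the sum of the two. A short Python check follows. Converting a rating to a W-L equivalent against an average (100) team is an assumption about how aW% is derived, and `ctr_to_record` is a hypothetical helper.

```python
def log5(rating_a, rating_b):
    """Expected W% of A against B when both ratings are on the CTR scale."""
    return rating_a / (rating_a + rating_b)

def ctr_to_record(ctr, games=162):
    """Hypothetical helper: W-L equivalent against an average (CTR 100) team."""
    wins = games * log5(ctr, 100)
    return wins, games - wins

# Toronto's average opponent (SOS 112) vs. Atlanta's average opponent (SOS 90)
print(round(log5(112, 90), 3))  # 0.554
```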
I will present the rest of the ratings with minimal comment. The next set is based on Pythagenpat record from R/RA:
Next is based on gEW%, which is explained in this post--some of the other exhibits for the annual post on that metric are a little more involved so I’m running these ratings first. The basic idea of gEW% is to take into account (separately) the distribution of runs scored and runs allowed per game for each team rather than simply using season totals as in Pythagenpat:
And finally, based on Runs Created and Allowed run through Pythagenpat:
These ratings are based on regular season data only, but one could also choose to include playoff results in the mix. Regardless of what your thoughts may be on the value of considering playoff data, it is most commonly omitted simply because of the way statistics are presented. It usually takes extra effort to combine regular season and playoff data.
So I decided to run the win-loss based ratings with playoff records and schedules included, and to see how large a difference it would create in the results. I was a little surprised by the results:
It’s not a surprise of course that Boston strengthened its rating--the Red Sox went 11-5 against very good competition. What did surprise me was that the only other playoff team to have a noticeable change in rating was Atlanta. Their 1-3 record against the Dodgers pushed their rating down by four points. Much of the movement in ratings for the other teams was felt by non-playoff teams whose SOS numbers fluctuated, in particular the AL East in which each team gained a point, and the NL in general, whose collective rating was pushed further down.
An angle that could make the playoff-inclusive ratings more interesting would be if I included regression in the ratings, which I do not. My reasoning is that I intend the ratings to be a reflection of the actual results of the season rather than an attempt to measure true quality of the teams. Additionally, regression would have little impact on the rank order of teams--it would mostly serve to compress the variance of the ratings. On the other hand, even if one wants to use the actual record of a team untouched to establish its rating, the case can be made that its opponents’ records should still be regressed, to avoid overcompensating for strength of schedule in ratings. Some purveyors of team ratings in other sports take a similar approach in basing calculations of opponent strength on those teams’ point-based rankings, but still base each team’s own rating on their actual wins and losses.
Again, though, these ratings are advertised as crude and are clearly only intended to be used in viewing 2013 retrospectively, so I’ve not bothered with regression here. I do use regression on the rare occasions when I use the CTRs to give crude estimates of win probabilities (such as playoff odds).
Monday, December 30, 2013
Since I have a crude rating system set up to evaluate MLB teams that relies on win ratio and identity of opponents and thus can be adapted to any number of sports, I see no reason not to apply it to the lesser NFL once a year. Since I am only a casual follower of the NFL, I will endeavor to avoid excessive comment on the results.
As a brief overview, the ratings are based on win ratio for the season, adjusted over the course of several iterations for opponents’ win ratios. They know nothing about injuries, about where games were played, or about the distribution of points from game to game; nothing beyond the win ratio of each team in the league and the identity of each team’s opponents. The final result is presented in a format that can be directly plugged into Log5. I call them “Crude Team Ratings” to avoid overselling them, but they tend to match the results of systems that are not undersold fairly well.
First are ratings based on actual wins and losses. 12.2 games of regression are included when figuring the win ratios (this will apply to the point-based ratings as well). CTR is the bottom line rating, aW% converts it to an adjusted W%, and SOS is the average CTR of the team’s opponents:
I prefer to focus on the ratings based on points and points allowed, which are coupled with a Pythagorean approach published at Pro-Football Reference to generate the win ratios:
As you can see, the top five teams all hail from the NFC South and West, which unfortunately had a maximum of four playoff spots available, leaving Arizona as the odd team out. Note that despite going 10-6, a raw record that was bettered by nine NFL teams, the Cardinals ranked sixth in win-based rating, so this is not a Pythagorean fluke. Arizona was a legitimately outstanding team based on the actual on-field results in 2013, but will sit home as far lesser teams battle it out thanks to the vagaries of their micro-division.
The Browns are second-to-last either way you figure it: by W-L record only the Redskins are worse, though Washington ranks 30th by points, and by points only the Jaguars are worse, though Jacksonville ranks 27th by W-L.
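Here is a Python sketch of how the two NFL win ratios could be formed. Two details are assumptions in this sketch, not confirmed by the text: that the 12.2 games of regression are applied as 6.1 added wins and 6.1 added losses, and that the Pro-Football-Reference Pythagorean formula uses an exponent of 2.37. The function names are hypothetical.

```python
def regressed_win_ratio(wins, losses, regression_games=12.2):
    """Win ratio after adding regression_games of .500 ball (half W, half L)."""
    half = regression_games / 2
    return (wins + half) / (losses + half)

def points_win_ratio(points_for, points_against, games=16,
                     exponent=2.37, regression_games=12.2):
    """Pythagorean W% from points (assumed exponent), converted to wins
    over the season, then regressed the same way as actual W-L."""
    ratio = (points_for / points_against) ** exponent
    est_wins = games * ratio / (1 + ratio)
    return regressed_win_ratio(est_wins, games - est_wins, regression_games)

print(round(regressed_win_ratio(13, 3), 3))  # 2.099
print(round(points_win_ratio(400, 300), 3))
```

Regression pulls every win ratio toward 1, so a 13-3 team is treated as roughly a 2.1 win ratio rather than 4.3.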
I use the geometric mean of the CTR of each team to calculate division and conference ratings:
The NFC West would rank fourth if it were a team; it was an absurdly strong division, with all of its teams among the top ten. The ratings imply that the composite NFC team would be expected to win about 55.2% of the time against its AFC counterpart.
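A geometric mean is the natural average for ratings on a ratio scale: a 50 and a 200 average out to 100, just as a team half as good as average and one twice as good are symmetric around average. The sketch below computes group means that way and turns two composite ratings into a Log5 expectation; the ratings are hypothetical, not the 2013 figures.

```python
import math

def geometric_mean(ratings):
    """exp of the mean natural log of the ratings."""
    return math.exp(sum(math.log(r) for r in ratings) / len(ratings))

# Hypothetical team ratings, not the actual 2013 CTRs
nfc = geometric_mean([150, 125, 110, 90])
afc = geometric_mean([130, 100, 85, 75])
print(round(nfc, 1), round(afc, 1), round(nfc / (nfc + afc), 3))
```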
The ratings can be used to feed playoff odds, naturally; here home field is assumed to be a 32.6% boost to CTR (equivalent to a .570 home W%). I’m not going to bother with the round-by-round breakout of potential matchups as I do for MLB, but here are the overall crude odds:
It’s worth acknowledging that each of the last two Super Bowl champs was a longshot by this or any other estimate--last year’s Ravens were given only a 3% chance. Of course, I’d also point out that the probability of any longshot winning (let’s define that as 5% rounded probability or lower) is 20% and was 14% in 2012.
These odds imply a 60% chance that the NFC champ will win the Super Bowl, but also a 95% chance that the NFC champ will be favored by the odds to win the Super Bowl. The AFC’s best team, Denver, would be favored in only two potential Super Bowl matchups, as would...all five other AFC teams. The top four playoff teams in CTR hail from the NFC, the next six from the AFC, and then the winners of the micro-division lottery, Philadelphia and Green Bay. The NFL frequently provides examples of why I dislike tiny divisions, but never as clearly or as destructively as in 2013.
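The home-field assumption and the Super Bowl figures can be sketched as follows. Home field multiplies the home team’s CTR by 1.326, which is where the .570 home W% between equal teams comes from (1.326 / 2.326). The Super Bowl calculation weights each possible NFC/AFC pairing by the chance that both teams get there, treating the game as a neutral site and the two conference brackets as independent; those are assumptions of this sketch, and the team inputs are hypothetical.

```python
HOME_BOOST = 1.326  # 32.6% boost to the home team's CTR

def win_prob(team_ctr, opp_ctr, home=None):
    """Log5 win probability on CTR; home=True if team is at home,
    False if the opponent is, None for a neutral site."""
    if home is True:
        team_ctr *= HOME_BOOST
    elif home is False:
        opp_ctr *= HOME_BOOST
    return team_ctr / (team_ctr + opp_ctr)

def super_bowl_odds(nfc, afc):
    """nfc, afc: dict team -> (P(wins conference), CTR).
    Returns (P(NFC champ wins), P(NFC champ is favored)) at a neutral site,
    assuming the two conference brackets are independent."""
    p_win = p_favored = 0.0
    for p_n, ctr_n in nfc.values():
        for p_a, ctr_a in afc.values():
            both = p_n * p_a
            p = win_prob(ctr_n, ctr_a)
            p_win += both * p
            if p > 0.5:
                p_favored += both
    return p_win, p_favored

print(round(win_prob(100, 100, home=True), 3))  # 0.57

# Hypothetical conference champion probabilities and ratings
nfc = {"N1": (0.6, 160), "N2": (0.4, 120)}
afc = {"A1": (0.7, 130), "A2": (0.3, 90)}
print([round(v, 3) for v in super_bowl_odds(nfc, afc)])
```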
Tuesday, December 17, 2013
Of all the annual repeat posts I write, this is the one which most interests me--I have always been fascinated by patterns of offensive production by fielding position, particularly trends over baseball history and cases in which teams have unusual distributions of offense by position. I also contend that offensive positional adjustments, when carefully crafted and appropriately applied, remain a viable and somewhat more objective competitor to the defensive positional adjustments often in use, although this post does not really address those broad philosophical questions.
The first obvious thing to look at is the positional totals for 2013, with the data coming from Baseball-Reference.com. “MLB” is the overall total for MLB, which is not the same as the sum of all the positions here, as pinch-hitters and runners are not included in those. “POS” is the MLB totals minus the pitcher totals, yielding the composite performance by non-pitchers. “PADJ” is the position adjustment, which is the position RG divided by the overall major league average (this is a departure from past posts; I’ll discuss this a little at the end). “LPADJ” is the long-term positional adjustment that I use, based on 2002-2011 data. The rows “79” and “3D” are the combined corner outfield and 1B/DH totals, respectively:
In 2012, there was an unusual convergence of overall positional RG for third base, DH, and all three outfield spots. This did not carry over to 2013 as a more typical spread returned to the defensive spectrum. Still, when compared to the long-term averages, there were quirks as usual. Catchers continued their strong performance with a PADJ of 94 after a 97 in 2012. Right fielders went back to their recent trend of solidly outhitting their left field cousins (one of the quirks that one must be cognizant of when attempting to use offensive data to craft positional adjustments). DHs were about as low as they’ve ever been (a 102 in 1985 is the only lower showing), and pitchers rebounded from a historical low of 1 to post a PADJ of 3, which obviously vindicates any continuing resistance to the DH.
That provides a useful segue from which to take a quick look at the performance by team of NL pitchers. I need to stress that the runs created method I’m using here does not take into account sacrifices, which usually is not a big deal but can be significant for pitchers. Note that all team figures from this point forward in the post are park-adjusted. The RAA figures for each position are baselined against the overall major league average RG for the position, except for left field and right field which are pooled. So pitchers as you can see from the chart above are compared to their robust average output of .11 runs per 25.5 outs:
Dodger pitchers led in BA, OBA, and SLG and ran away with the RG lead. Zack Greinke was the standout, hitting a raw .328/.409/.379 over 72 PA thanks to a .396 BABIP. Greinke drew seven walks, as many as or more than the pitching collectives of the Padres, Marlins, Cubs, Reds, and Brewers. However, the most remarkable performance is that of Pittsburgh’s pitchers, who trudged through 318 plate appearances without a single extra base hit. In 2012 the Pirates only mustered one double in 304 PA. I assumed last year that the Pirate performance was without precedent, and clearly a .000 ISO has never been topped. San Francisco gave Pittsburgh a run for their money at the bottom of the list with a .099 BA and just one double and one triple.
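For reference, here is how the RG, PADJ, and RAA figures relate, as a Python sketch. It assumes RG is runs created per 25.5 outs and that RAA is the gap between a unit’s RG and the positional baseline RG, converted back to runs over the unit’s outs; the function names and example numbers are hypothetical.

```python
OUTS_PER_GAME = 25.5

def rg(runs_created, outs):
    """Runs created per 25.5 outs."""
    return OUTS_PER_GAME * runs_created / outs

def padj(position_rg, mlb_rg):
    """Position RG as a percentage of the overall major league RG."""
    return 100 * position_rg / mlb_rg

def raa(runs_created, outs, baseline_rg):
    """Runs above the positional average, scaled back up by the unit's outs."""
    return (rg(runs_created, outs) - baseline_rg) * outs / OUTS_PER_GAME

# Hypothetical team position: 100 RC in 510 outs against a 4.0 RG baseline
print(rg(100, 510), raa(100, 510, 4.0))  # 5.0 20.0
```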
I don’t run a full chart of the leading positions since you will very easily be able to go down the list and identify the individual primarily responsible for the team’s performance and you won’t be shocked by any of them, but the teams with the highest RAA at each spot were:
C--MIN, 1B--CIN, 2B--NYA, 3B--DET, SS--LA, LF--STL, CF--LAA, RF--WAS, DH--BOS
More interesting are the worst performing positions; the player listed is the one who appeared in the most games at that position for the team:
The Marlins, Blue Jays, and Yankees all land multiple names on the list, but Houston’s centerfielders were the very worst outfit, a hole that has been plugged elegantly by trading for Dexter Fowler. Jeff Mathis was also replaced in Miami by Jarrod Saltalamacchia, and Carlos Beltran should improve the Yankees’ production at right field and/or DH. Yankee DHs’ .186 BA was the worst of any non-NL pitcher spot, with Chicago, Toronto, and Miami catchers all posting a .193 mark. Or, to express their futility in another manner, it seems kind of shocking that only twelve team positions were less productive in terms of RG than Yankee DHs.
Teams with unusual profiles of offense by position have been of interest to me in recent years because of the way the Indians have been constructed--often they have gotten good production from positions on the right side of the defensive spectrum while struggling at the more offensively-inclined positions. The easiest way I’ve come up with to express this numerically is the correlation between a team’s RG by position and the long-term positional adjustment (I’ve pooled left and right field but not 1B and DH in this case; pitchers are excluded for all teams and DHs excluded for NL teams, and I’ve broken the lists out by league because of this):
As usual, the Indians had a negative correlation between PADJ and RG, but they were only the seventh-most extreme team in the majors. Seattle had the highest correlation, as they got little production from catcher and middle infield (2.6 RG from backstops, 3.2 from the keystone positions) while the four corners and DH all created at least 4.5 RG. On the flip side was Minnesota, largely because catcher was easily their most productive position at 6.4 RG while their left fielders and DH created 3.3 RG, better only than their shortstops.
Boston and St. Louis won their pennants largely thanks to respectively having the best offense in their leagues, and in a neat coincidence here, they were near the middle of the pack in correlation for their leagues with identical marks of +.44.
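The correlation described above is an ordinary Pearson correlation between each position’s long-term adjustment and the team’s RG at that position. A self-contained Python sketch follows; the LPADJ values and team RGs are made-up placeholders, not the figures from the tables.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Placeholder LPADJ and one hypothetical AL team's RG (corner OF pooled)
positions = ["C", "1B", "2B", "3B", "SS", "LF/RF", "CF", "DH"]
lpadj = [89, 119, 93, 103, 88, 111, 100, 114]
team_rg = [5.1, 4.2, 5.0, 3.9, 4.4, 4.3, 4.6, 4.0]
print(round(pearson(lpadj, team_rg), 2))
```

A negative result, as for this placeholder team, means the offense was concentrated at the defensively demanding positions.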
The following charts, broken out by division, display RAA for each position, with teams sorted by the sum of positional RAA. Positions with negative RAA are in red, and positions at +/- 20 RAA or beyond are bolded:
Atlanta led the NL in corner infield RAA. New York was last in the NL in outfield RAA. Miami had the worst offense in the majors with a remarkable six positions at -20 runs or worse, and the left fielders just missed at -18. Only the Giancarlo Stanton-led right fielders were above average, and their +18 only managed to offset the opposite outfield corner. The whole division struggled with production from centerfield; the division total of -104 RAA from one position was easily the worst in the majors as the next worst division total was -49 from NL Central shortstops.
St. Louis led all of the majors in outfield RAA as they were the only team with two +20 positions in the outfield. Pittsburgh’s McCutchen-led centerfielders had the highest RAA of any position in the NL. Cincinnati’s offense continues to look wobbly post-Choo as only the star-led first base, center, and right units were above average. As seen above, Milwaukee had the most unusual distribution of offense by position in the NL, and it’s actually somewhat impressive that they managed to field an average offense despite -37 runs from first base. Chicago had the worst middle infield RAA in the majors and their infield as a whole was awful at -70, with only the disaster in Miami sparing them from finishing last.
Los Angeles middle infielders led the NL in RAA; San Francisco and Arizona tied for the NL lead for total infield RAA. Colorado had the worst corner infield RAA in the NL, which may explain the desire (albeit not the decision) to give Justin Morneau a multi-year deal. This division had the highest total RAA for a position with 59 RAA from their shortstops.
Boston led the majors in total RAA as only their third basemen were below average. Red Sox middle infielders led the majors in RAA. The Yankees finishing with just two above average positions is still jarring; another way to look at their troubles is that they spent $50.5 million on their intended corner infield starters and wound up with the worst corner infield RAA in the majors.
Detroit led the majors in corner infield and overall infield RAA thanks almost solely to their third basemen compiling a whopping 71 RAA (that was all Cabrera, as other Tiger third basemen combined for 85 PA with a .222/.341/.306 line). The rest of their offense was far from impressive, although it wouldn’t be fair for me to snark too much about it since the 1,000 run talk was non-existent in the spring. The Indians were close to average around the diamond except for catcher and second base (excellent) and third base (bad). Kansas City’s middle infielders were last in the AL in RAA, and since the corner infielders were bad as well, the infield’s total RAA was also last in the league. Minnesota had only one above average position and the worst outfield production in the majors. Chicago had just two above average positions, and those barely cleared the bar with a combined 3 RAA, leading to the lowest team total RAA in the AL.
Angel outfielders led the AL in RAA, which of course is due to the great Mike Trout. Seattle’s offense is still bad, but the last two seasons have moved them past the laughingstock phase and into consistent organizational deficiency status. Houston had only one above average position, but at least they have the excuse that they weren’t really trying; what can the Yankees say?
The full spreadsheet is available here.
Tuesday, December 10, 2013
I devoted a whole post to leadoff hitters, whether justified or not, so it's only fair to have a post about hitting by batting order position in general. I certainly consider this piece to be more trivia than sabermetrics, since there’s no analytical content.
The data in this post was taken from Baseball-Reference. The figures are park-adjusted. RC is ERP, including SB and CS, as used in my end of season stat posts. The weights used are constant across lineup positions; there was no attempt to apply specific weights to each position, although they are out there and would certainly make this a little bit more interesting.
NL #3 hitters have now topped all positions in RG for five years running, and again the AL demonstrated balance between #3 and #4 while NL teams got superior performance out of #3 hitters. The other curiosity that stands out to me is that #3 and #4 were the only lineup slots in which the NL had a higher RG. Throw in the fact that the other most celebrated “key” lineup spot (leadoff) was essentially even between the two leagues, and there’s enough fuel to construct some sort of theory (for which there wouldn’t be enough evidence to proceed logically, as if that’s ever stopped anyone before).
During the playoffs I remarked that it seemed like 2013 had been a year in which the notion of batting one’s best hitter #2 had gained traction; when presented with the actual numbers here, I’d be hard pressed to defend that statement. If that were the case, in addition to a higher overall RG I’d expect to see an uptick in isolated power for #2 hitters. However, AL #2 hitters’ collective .137 ISO was better only than that of AL #1, #8, and #9 hitters, and the same was true of the NL’s .130.
Next, here are the team leaders in RG at each lineup position. The player listed is the one who appeared in the most games in that spot (which can be misleading, particularly for the bottom of the batting order where there is no fixed regular, as in the case of the Dodgers’ #8 spot, or for guys who move around the batting order like Jason Castro, who takes the blame for Houston’s #3s):
And the worst:
The domination of bad AL lineup spots by just four teams is something I’ve not seen since I’ve been running this report. It’s not that unusual to have one team with several dead spots (Seattle’s hapless offenses pulled this off), but the White Sox, Astros, and Yankees all had multiple such holes. Chicago boasting four such disasters is an impressive feat. Meanwhile, while Ryan Howard hit better than the Phillies’ collective cleanup hitters, it’s still amusing to see they were the worst unit in the NL.
The next list is the ten best positions in terms of runs above the league average for their particular lineup spot (so leadoff spots are compared to the league average leadoff performance, etc.):
Baltimore’s #5s were significantly more productive than their #3s or #4s (4.4 and 5.4 RG respectively) thanks to Buck Showalter keeping Chris Davis in that spot for much of the season. The only other #5 spot to outhit both the #3s and #4s was Philadelphia (4.5, 4.1, 5.5 RG respectively) on the backs of the Domonic Brown-led performance which paced NL #5s.
The worst positions:
Chicago’s #9 hitters had a lower RG than three groups of NL #9s (LA, COL, and PHI). They were last among AL lineup slots in BA and OBA and just narrowly missed completing the rate stat sweep as NYA #9s slugged .265 (the only other AL lineup slot with a sub-.300 SLG was SEA #9 at .275). While some passage of time in baseball is sad, like Travis Hafner and Adam Dunn-fronted spots landing on this list, it’s comforting to still have Juan Pierre to kick around.
The last set of charts show each team’s RG rank within their league at each lineup spot. The top three are bolded and the bottom three displayed in red to provide quick visual identification of excellent and poor production:
It so happens that each pennant winner sticks out as having fielded a well-balanced, productive lineup--they ranked #1 and #2 in the majors in R/G, so it’s not a surprise, but other than the very bottom of the St. Louis lineup, there were no weak links in either team’s batting order.
The spreadsheet used to generate these figures is here.
Monday, December 02, 2013
This post kicks off a series of posts that I write every year, and therefore struggle to infuse with any sort of new perspective. However, they're a tradition on this blog and hold some general interest, so away we go.
This post looks at the offensive performance of teams' leadoff batters. I will try to make this as clear as possible: the statistics are based on the players that hit in the #1 slot in the batting order, whether they were actually leading off an inning or not. It includes the performance of all players who batted in that spot, including substitutes like pinch-hitters.
Listed in parentheses after a team are all players that started in twenty or more games in the leadoff slot. While you may see a listing like “OAK (Crisp)”, this does not mean that the statistic is based solely on Crisp's performance; it is the total of all Oakland batters in the #1 spot, of which Crisp was the only one to start in that spot in twenty or more games. I will list the top and bottom three teams in each category (plus the top/bottom team from each league if they don't make the ML top/bottom three); complete data is available in a spreadsheet linked at the end of the article. There are also no park factors applied anywhere in this article.
That's as clear as I can make it, and I hope it will suffice. I always feel obligated to point out that as a sabermetrician, I think that the importance of the batting order is often overstated, and that the best leadoff hitters would generally be the best cleanup hitters, the best #9 hitters, etc. However, since the leadoff spot gets a lot of attention, and teams pay particular attention to the spot, it is instructive to look at how each team fared there.
The conventional wisdom is that the primary job of the leadoff hitter is to get on base, and most simply, score runs. It should go without saying on this blog that runs scored are heavily dependent on the performance of one’s teammates, but when writing on the internet it’s usually best to assume nothing. So let's start by looking at runs scored per 25.5 outs (AB - H + CS):
1. STL (Carpenter/Jay), 7.2
2. CIN (Choo), 6.3
3. BOS (Ellsbury), 5.9
Leadoff average, 4.8
ML average, 4.1
28. PHI (Rollins/Revere/Young/Hernandez), 3.7
29. HOU (Grossman/Villar/Altuve/Barnes), 3.4
30. MIA (Pierre/Yelich/Hechavarria), 3.0
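Runs per 25.5 outs is simple to compute; here is a Python sketch using the out definition given above (AB - H + CS). The function name and input numbers are hypothetical.

```python
def runs_per_25_5_outs(runs, ab, hits, cs):
    """Runs scored per 25.5 outs, with outs = AB - H + CS."""
    outs = ab - hits + cs
    return 25.5 * runs / outs

print(round(runs_per_25_5_outs(110, 680, 180, 10), 1))  # 5.5
```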
Speaking of getting on base, the other obvious measure to look at is On Base Average. The figures here exclude HB and SF to be directly comparable to earlier versions of this article, but those categories are available in the spreadsheet if you'd like to include them:
1. CIN (Choo), .397
2. STL (Carpenter/Jay), .371
3. MIL (Aoki), .347
4. OAK (Crisp), .346
Leadoff average, .324
ML average, .314
28. NYN (Young), .289
29. MIN (Dozier/Presley/Carroll), .283
30. MIA (Pierre/Yelich/Hechavarria), .278
The next statistic is what I call Runners On Base Average. The genesis for ROBA is the A factor of Base Runs. It measures the number of times a batter reaches base per PA--excluding homers, since a batter that hits a home run never actually runs the bases. It also subtracts caught stealing here because the BsR version I often use does as well, but BsR versions based on initial baserunners rather than final baserunners do not.
My 2009 leadoff post was linked to a Cardinals message board, and this metric was the cause of a lot of confusion (this was mostly because the poster in question was thick-headed as could be, but it's still worth addressing). ROBA, like several other methods that follow, is not really a quality metric, it is a descriptive metric. A high ROBA is a good thing, but it's not necessarily better than a slightly lower ROBA plus a higher home run rate (which would produce a higher OBA and more runs). Listing ROBA is not in any way, shape or form a statement that hitting home runs is bad for a leadoff hitter. It is simply a recognition of the fact that a batter that hits a home run is not a baserunner. Base Runs is an excellent model of offense and ROBA is one of its components, and thus it holds some interest in describing how a team scored its runs, rather than how many it scored:
1. STL (Carpenter/Jay), .352
2. CIN (Choo), .348
3. BOS (Ellsbury), .322
Leadoff average, .294
ML average, .283
28. SEA (Miller/Chavez/Saunders), .260
29. MIA (Pierre/Yelich/Hechavarria), .254
30. MIN (Dozier/Presley/Carroll), .252
The Cardinals move ahead of the Reds here, making up the 26-point gap in standard OBA. Part of this is the obvious – home runs, as Cincinnati leadoff hitters hit 21 to St. Louis’ 11. But another factor is caught stealing, as we’ll see a little later--Reds leadoff hitters were just fifteen for thirty on stolen base attempts, tied for the second most caught stealing. St. Louis leadoff hitters were just three for six on steal attempts--no other team had fewer than ten stolen bases and only Kansas City had as few caught stealing (albeit with 15 SB), so the Cardinals easily had the fewest attempts (Detroit was next with fourteen).
I will also include what I've called Literal OBA here--this is just ROBA with HR subtracted from the denominator so that a homer does not lower LOBA, it simply has no effect. You don't really need ROBA and LOBA (or either, for that matter), but this might save some poor message board out there twenty posts, by not implying that I think home runs are bad, so here goes. LOBA = (H + W - HR - CS)/(AB + W - HR):
1. CIN (Choo), .358
2. STL (Carpenter/Jay), .358
3. BOS (Ellsbury), .327
Leadoff average, .300
ML average, .290
28. SEA (Miller/Chavez/Saunders), .268
29. MIN (Dozier/Presley/Carroll), .257
30. MIA (Pierre/Yelich/Hechavarria), .257
There is a high degree of repetition for the various OBA lists, which shouldn’t come as a surprise since they are just minor variations on each other.
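The three on-base measures differ only in how home runs and caught stealing are handled. The Python sketch below implements OBA (without HB and SF, as in these lists), ROBA, and LOBA; the ROBA formula is inferred from the description of LOBA as ROBA with HR removed from the denominator. The check at the end shows the point of LOBA: adding a home run lowers ROBA but leaves LOBA unchanged. The stat line is hypothetical.

```python
def oba(h, w, ab):
    """On base average without HB and SF: (H + W) / (AB + W)."""
    return (h + w) / (ab + w)

def roba(h, w, hr, cs, ab):
    """Runners On Base Average: (H + W - HR - CS) / (AB + W)."""
    return (h + w - hr - cs) / (ab + w)

def loba(h, w, hr, cs, ab):
    """Literal OBA: (H + W - HR - CS) / (AB + W - HR)."""
    return (h + w - hr - cs) / (ab + w - hr)

base = dict(h=150, w=60, hr=10, cs=5, ab=600)
plus_hr = dict(h=151, w=60, hr=11, cs=5, ab=601)  # same line plus one homer
print(round(roba(**base), 4), round(roba(**plus_hr), 4))
print(loba(**base), loba(**plus_hr))  # 0.3 0.3
```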
The next two categories are most definitely categories of shape, not value. The first is the ratio of runs scored to RBI. Leadoff hitters as a group score many more runs than they drive in, partly due to their skills and partly due to lineup dynamics. Those with low ratios don’t fit the traditional leadoff profile as closely as those with high ratios (at least in the way their seasons played out):
1. MIA (Pierre/Yelich/Hechavarria), 2.2
2. MIL (Aoki), 2.1
3. PIT (Marte/Tabata), 2.1
7. TB (Jennings/Joyce/DeJesus), 1.9
Leadoff average, 1.6
27. CHN (DeJesus/Castro/Valbuena), 1.3
28. MIN (Dozier/Presley/Carroll), 1.2
29. KC (Gordon), 1.2
30. TEX (Kinsler/Andrus/Martin), 1.1
ML average, 1.1
Again, this is not a quality list, as indicated by the mix of good and bad OBAs among the leaders and trailers. This is also a good interlude at which to remind you that the players listed are those who started twenty or more games in the leadoff spot for their teams and they are not solely responsible for the overall performance of the team’s leadoff hitters. David DeJesus led off 66 games for the Cubs and 20 for the Rays and thus finds himself as part of both the leaders and trailers list here.
A similar gauge, but one that doesn't rely on the teammate-dependent R and RBI totals, is Bill James' Run Element Ratio. RER was described by James as the ratio between those things that were especially helpful at the beginning of an inning (walks and stolen bases) to those that were especially helpful at the end of an inning (extra bases). It is a ratio of "setup" events to "cleanup" events. Singles aren't included because they often function in both roles.
Of course, there are RBI walks and doubles are a great way to start an inning, but RER classifies events based on when they have the highest relative value, at least from a simple analysis:
1. NYN (Young), 1.7
2. HOU (Grossman/Villar/Altuve/Barnes), 1.7
3. MIL (Aoki), 1.4
Leadoff average, 1.0
ML average, .7
27. PIT (Marte/Tabata), .7
28. DET (Jackson/Dirks), .7
29. LAA (Shuck/Aybar/Bourjos), .7
30. SEA (Miller/Chavez/Saunders), .6
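Run Element Ratio is usually written as (walks + stolen bases) / (total bases - hits), the denominator being the extra bases gained on hits. A Python sketch with a hypothetical stat line follows; whether hit batsmen are folded into W is a detail this sketch does not settle.

```python
def run_element_ratio(w, sb, tb, h):
    """(W + SB) / (TB - H): setup events over extra bases on hits."""
    return (w + sb) / (tb - h)

print(round(run_element_ratio(w=70, sb=30, tb=250, h=160), 2))  # 1.11
```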
Since stealing bases is part of the traditional skill set for a leadoff hitter, I've included the ranking for what some analysts call net steals, SB - 2*CS. I'm not going to worry about the precise breakeven rate, which is probably closer to 75% than 67%, but is also variable based on situation. The ML and leadoff averages in this case are per team lineup slot:
1. BOS (Ellsbury), 47
2. NYN (Young), 27
3. BAL (McLouth/Markakis), 18
Leadoff average, 5
ML average, 3
28. CIN (Choo), -10
29. ARI (Prado/Pollock/Eaton), -11
29. HOU (Grossman/Villar/Altuve/Barnes), -11
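The SB - 2*CS formula implicitly sets the breakeven at 2/3: net steals are zero when SB = 2*CS, which is a success rate of 2/(2+1). A small sketch that generalizes the CS multiplier (cs_weight is a hypothetical parameter, not part of the original metric) shows that a 75% breakeven corresponds to a multiplier of 3:

```python
from fractions import Fraction


def net_steals(sb: int, cs: int, cs_weight: int = 2) -> int:
    """Net steals: SB minus cs_weight * CS (the rankings above use cs_weight = 2)."""
    return sb - cs_weight * cs


def implied_breakeven(cs_weight: int) -> Fraction:
    """Success rate at which net steals equal zero.

    sb - k*cs == 0  =>  sb == k*cs  =>  sb / (sb + cs) == k / (k + 1)
    """
    return Fraction(cs_weight, cs_weight + 1)


print(net_steals(30, 10))    # 30 - 2*10 = 10
print(implied_breakeven(2))  # 2/3, about 67%
print(implied_breakeven(3))  # 3/4, i.e. 75%
```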
Since 2007, the percentage of major league stolen base attempts by leadoff hitters has generally declined (2007 is an arbitrary starting point; it is simply the first year for which I have the data at my fingertips):
2007: 30.2%, 2008: 29.6%, 2009: 27.8%, 2010: 25.9%, 2011: 27.9%, 2012: 25.1%, 2013: 25.9%
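A quick check of the series above, assuming the seven figures run from 2007 through 2013 in order, counts the year-over-year declines and the net change:

```python
# Leadoff share of ML stolen base attempts, assumed to be 2007 through 2013 in order
shares = [30.2, 29.6, 27.8, 25.9, 27.9, 25.1, 25.9]
years = range(2007, 2014)

# Change in percentage points from each season to the next
changes = [round(b - a, 1) for a, b in zip(shares, shares[1:])]
declines = sum(1 for c in changes if c < 0)
net_change = round(shares[-1] - shares[0], 1)

for year, change in zip(years[1:], changes):
    print(f"{year}: {change:+.1f} points")
print(f"declined in {declines} of {len(changes)} years; net {net_change:+.1f} points")
```

By this count the share fell in four of the six year-to-year steps, for a net drop of 4.3 percentage points, so the trend is downward but not monotonic.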
Leadoff hitters should have a disproportionate share of stolen base attempts for three obvious reasons:
1. they by definition get the most plate appearances of any lineup slot, creating more opportunities to get on base
2. as a group, they usually have above-average OBAs more heavily tied up in singles and walks, creating more good opportunities to steal bases
3. managers still tend to strongly consider speed when choosing a leadoff hitter
While #1 is an unalterable truth and #2 is generally supported by sabermetric orthodoxy, #3 is a factor which may decline in importance in a more sabermetrically-minded game. The percentage of steal attempts from leadoff hitters is something I’ll be keeping an eye on in future seasons as an imperfect indicator of shifting reasoning.
Let's shift gears back to quality measures, beginning with one that David Smyth proposed when I first wrote this annual leadoff review. Since the optimal weight for OBA in an x*OBA + SLG metric is generally something like 1.7, David suggested figuring 2*OBA + SLG for leadoff hitters as a way to give OBA a little extra boost without distorting things too much, or even suffering an accuracy decline relative to standard OPS. Since this is a unitless measure anyway, I multiply it by .7 to approximate the standard OPS scale and call it 2OPS:
1. CIN (Choo), 881
2. STL (Carpenter/Jay), 832
3. OAK (Crisp), 795
Leadoff average, 727
ML average, 717
28. MIN (Dozier/Presley/Carroll), 639
29. NYN (Young), 625
30. MIA (Pierre/Yelich/Hechavarria), 607
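The 2OPS calculation is simple enough to sketch directly; multiplying by 1000 just reproduces the way the figures above are displayed, and the function name is illustrative:

```python
def two_ops(oba: float, slg: float) -> int:
    """2OPS: (2*OBA + SLG) scaled by .7 toward the OPS range, shown without the decimal."""
    return round(0.7 * (2 * oba + slg) * 1000)


print(two_ops(0.350, 0.450))  # .7 * 1.150 = .805, shown as 805
```

For a hypothetical .350 OBA / .450 SLG hitter (an 800 OPS), 2OPS comes out to 805; the .7 factor keeps the result close to the OPS scale even though OBA is counted twice.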
Along the same lines, one can evaluate leadoff hitters the same way I'd evaluate any hitter, using Runs Created per Game with standard weights (this includes SB and CS, which 2OPS ignores):
1. CIN (Choo), 6.4
2. STL (Carpenter/Jay), 5.8
3. BOS (Ellsbury), 5.4
Leadoff average, 4.4
ML average, 4.3
28. HOU (Grossman/Villar/Altuve/Barnes), 3.2
29. MIN (Dozier/Presley/Carroll), 3.2
30. MIA (Pierre/Yelich/Hechavarria), 2.9
It’s kind of sad not having the Mariners’ offense ranking last in just about everything anymore, but the Marlins’ leadoff hitters were just part of a valiant effort by Miami to take up the mantle.
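The post doesn't spell out which Runs Created formula sits behind these RG figures, so the sketch below uses Bill James's stolen-base version of basic RC, (H + W - CS) * (TB + .55*SB) / (AB + W), purely as an illustration. The out definition (AB - H + CS) and the 25.5 outs per game are assumptions, and the season line in the usage call is hypothetical:

```python
def runs_created(h: int, w: int, cs: int, tb: int, sb: int, ab: int) -> float:
    """Stolen-base version of Bill James's basic Runs Created (an assumed stand-in)."""
    on_base = h + w - cs          # times on base, less runners erased stealing
    advancement = tb + 0.55 * sb  # total bases plus partial credit for steals
    opportunities = ab + w
    return on_base * advancement / opportunities


def runs_created_per_game(h: int, w: int, cs: int, tb: int, sb: int, ab: int,
                          outs_per_game: float = 25.5) -> float:
    """Runs Created per outs_per_game outs (assumed convention)."""
    outs = ab - h + cs  # assumed out count; ignores GIDP and sacrifices
    return runs_created(h, w, cs, tb, sb, ab) / outs * outs_per_game


# Hypothetical season line, not any team's actual leadoff totals
print(round(runs_created_per_game(h=150, w=60, cs=5, tb=230, sb=20, ab=550), 2))
```

Unlike 2OPS, a formula in this family charges caught stealing against both the on-base and out totals, which is how SB and CS enter the RG rankings.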
Finally, allow me to close with a crude theoretical measure of linear weights that supposes the player always leads off an inning (that is, always bats in the bases-empty, no-outs state). There are weights out there (see The Book) for the leadoff slot in its average situation, but this variation is much easier to calculate (although it is also based on a silly and impossible premise).
The weights I used were based on the 2010 run expectancy table from Baseball Prospectus. Ideally I would have used multiple seasons, but this is a seat-of-the-pants metric. The 2010 post goes into detail on how this measure is figured; this year, I’ll just tell you that the out coefficient was -.216 and the CS coefficient was -.583, and refer you to that post for the other details. I then restate it per the number of PA for an average leadoff spot (739 in 2013):
1. CIN (Choo), 32
2. STL (Carpenter/Jay), 22
3. BOS (Ellsbury), 19
Leadoff average, 0
ML average, -2
28. HOU (Grossman/Villar/Altuve/Barnes), -20
29. MIN (Dozier/Presley/Carroll), -21
30. MIA (Pierre/Yelich/Hechavarria), -25
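To make the construction more concrete, here is a sketch of how bases-empty, no-outs linear weights can be read off a run expectancy table: each event is worth the run expectancy of the state it leaves, plus any runs scored, minus the expectancy of the starting state. Treating SB and CS as starting from a runner on first with no outs is an assumption here and may not match the 2010 post's setup. The values in TOY_RE are made up, except that three of them were chosen so the out and CS values reproduce the -.216 and -.583 coefficients quoted above; they are not the Baseball Prospectus 2010 figures:

```python
# State keys: (base occupancy, outs); "---" = bases empty, "1--" = runner on first, etc.
# TOY_RE is illustrative only, NOT the 2010 Baseball Prospectus table.
TOY_RE = {
    ("---", 0): 0.481,
    ("---", 1): 0.265,
    ("1--", 0): 0.848,
    ("-2-", 0): 1.100,
    ("--3", 0): 1.350,
}


def leadoff_weights(re: dict) -> dict:
    """Run values for events that start in the bases-empty, no-outs state."""
    start = re[("---", 0)]
    on_first = re[("1--", 0)]
    return {
        "1B": on_first - start,
        "BB": on_first - start,  # a walk leaves the same state as a single
        "2B": re[("-2-", 0)] - start,
        "3B": re[("--3", 0)] - start,
        "HR": 1.0,  # one run scores and the state returns to bases empty, no outs
        "OUT": re[("---", 1)] - start,
        # Assumed setup: steal attempts start from a runner on first, no outs
        "SB": re[("-2-", 0)] - on_first,
        "CS": re[("---", 1)] - on_first,
    }


def leadoff_efficiency(counts: dict, weights: dict, pa: int, pa_scale: int = 739) -> float:
    """Sum of weight * count over events, restated per pa_scale plate appearances."""
    raw = sum(weights[event] * n for event, n in counts.items())
    return raw * pa_scale / pa


weights = leadoff_weights(TOY_RE)
print({event: round(value, 3) for event, value in weights.items()})
```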
A common theme in these rankings has been the turnaround for Cincinnati leadoff hitters, who last year were historically awful. Truly, unbelievably (especially for a playoff team) awful. In 2012, Reds leadoff hitters, led by Zack Cozart and Brandon Phillips, were last in the majors in R/G (3.8), OBA (.247), ROBA (.224), LOBA (.229), R/BI (2.2), RER (.6), 2OPS (575), and LE (-32). To be fair, R/BI and RER are not good/bad categories, but they do indicate that the Reds did not fit the traditional leadoff hitter mold.
This year, the Shin-Soo Choo-led Reds were tops in R/G, OBA, LOBA, 2OPS, RG, and LE. The bad news is that it was just a one-year fix; the good news is that Bryan Price may take a more modern approach to leadoff decisions than Dusty Baker did. Still, the Reds had better have sent Manny Acta a fruit basket for making Choo a “proven” leadoff hitter.
For the full lists and data, see the spreadsheet here.