Walk Like a Sabermetrician: 2013

Monday, December 30, 2013

Crude NFL Ratings, 2013

Since I have a crude rating system set up to evaluate MLB teams that relies on win ratio and identity of opponents and thus can be adapted to any number of sports, I see no reason not to apply it to the lesser NFL once a year. Since I am only a casual follower of the NFL, I will endeavor to avoid excessive comment on the results.

As a brief overview, the ratings are based on win ratio for the season, adjusted over the course of several iterations for opponents’ win ratios. They know nothing about injuries, about where games were played, about the distribution of points from game to game; nothing beyond the win ratio of all of the teams in the league and each team’s opponents. The final result is presented in a format that can be directly plugged into Log5. I call them “Crude Team Ratings” to avoid overselling them, but they tend to match fairly well the results from systems that are not undersold.

First are ratings based on actual wins and losses. 12.2 games of regression are included when figuring the win ratios (this will apply to the point-based ratings as well). CTR is the bottom line rating, aW% converts it to an adjusted W%, and SOS is the average CTR of the team’s opponents:

I prefer to focus on the ratings based on points and points allowed, which are coupled with a Pythagorean approach published at Pro-Football Reference to generate the win ratios:

As you can see, the top five teams all hail from the NFC South and West, which unfortunately had a maximum of four playoff spots available, leaving Arizona as the odd team out. Note that despite going 10-6, a raw record that was bettered by nine NFL teams, the Cardinals ranked sixth in win-based rating, so this is not a Pythagorean fluke. Arizona was a legitimately outstanding team based on the actual on-field results in 2013, but will sit home as far lesser teams battle it out thanks to the vagaries of their micro-division.
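As a rough illustration of the regression step mentioned earlier, here is a minimal sketch. It assumes the 12.2 games are split evenly into 6.1 phantom wins and 6.1 phantom losses before the win ratio is taken; that split, the function name, and the choice to ignore ties are assumptions rather than details spelled out in this post.

```python
def regressed_win_ratio(wins, losses, regression_games=12.2):
    """Win ratio (W/L) after adding equal numbers of phantom wins and losses."""
    # Assumption: the regression games are split evenly between wins and losses.
    half = regression_games / 2.0
    return (wins + half) / (losses + half)


# A 10-6 team has a raw win ratio of 10/6 = 1.67; regression pulls it toward 1.
print(round(regressed_win_ratio(10, 6), 2))  # prints 1.33
```

This is only the starting point; the opponent adjustment is then iterated on top of ratios like these.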

The Browns are second-to-last either way you figure it; by W-L record the Redskins are worse, but rank 30th by points, and by points the Jaguars are worse, but rank 27th by W-L.

I use the geometric mean of the CTR of each team to calculate division and conference ratings:

The NFC West would rank fourth if it were a team--it was an absurdly strong division, with all of its teams among the top ten. The ratings imply that the composite NFC team would be expected to win about 55.2% of the time against its AFC counterpart.
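A short sketch of how a composite rating and a head-to-head estimate might be computed. It assumes CTR is a ratio-scale rating, so that Log5 reduces to A / (A + B); the function names and all ratings below are hypothetical, chosen only for illustration.

```python
import math


def geometric_mean(ratings):
    """Geometric mean of a list of positive ratings."""
    return math.exp(sum(math.log(r) for r in ratings) / len(ratings))


def win_probability(rating_a, rating_b):
    """Log5 for ratio-scale ratings: A / (A + B)."""
    return rating_a / (rating_a + rating_b)


# Hypothetical four-team division, not actual 2013 CTRs.
division = [180.0, 150.0, 140.0, 120.0]
print(round(geometric_mean(division), 1))

# A composite rated 123.2 against one rated 100 wins about 55.2% of the time.
print(round(win_probability(123.2, 100.0), 3))  # prints 0.552
```

The geometric mean is the natural average for ratings that behave like ratios, since a team twice as good as average and a team half as good as average then combine to exactly average.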

The ratings can be used to feed playoff odds, naturally; here home field is assumed to be a 32.6% boost to CTR (equivalent to a .570 home W%). I’m not going to bother with the round-by-round breakout of potential matchups as I do for MLB, but here are the overall crude odds:

It’s worth acknowledging that each of the last two Super Bowl champs was a longshot by this or any other estimate--last year’s Ravens were given only a 3% chance. Of course, I’d also point out that the probability of any longshot winning (let’s define that as 5% rounded probability or lower) is 20% this year and was 14% in 2012.

These odds imply a 60% chance that the NFC champ will win the Super Bowl, but also a 95% chance that the NFC champ will be favored by the odds to win the Super Bowl. The AFC’s best team, Denver, would be favored in only two potential Super Bowl matchups, as would...all five other AFC teams. The top four playoff teams in CTR hail from the NFC, the next six from the AFC, and then the winners of the micro-division lottery, Philadelphia and Green Bay. The NFL frequently provides examples of why I dislike tiny divisions, but never as clearly or as destructively as in 2013.
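The home field assumption behind the playoff odds above can be checked with a few lines. This sketch assumes the boost is applied by multiplying the home team's CTR by 1.326 before the ratings go into the A / (A + B) form of Log5; the function name and the sample ratings are hypothetical.

```python
HOME_BOOST = 1.326  # the 32.6% boost to CTR quoted above


def home_win_probability(home_ctr, away_ctr, boost=HOME_BOOST):
    """Probability that the home team wins, with its rating boosted."""
    boosted = home_ctr * boost
    return boosted / (boosted + away_ctr)


# Two equal teams: the home side should win about 57% of the time.
print(round(home_win_probability(100.0, 100.0), 3))  # prints 0.57
```

For evenly matched teams this reproduces the .570 home W% equivalent, since 1.326 / 2.326 = .570.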

Tuesday, December 17, 2013

Hitting by Position, 2013

Of all the annual repeat posts I write, this is the one which most interests me--I have always been fascinated by patterns of offensive production by fielding position, particularly trends over baseball history and cases in which teams have unusual distributions of offense by position. I also contend that offensive positional adjustments, when carefully crafted and appropriately applied, remain a viable and somewhat more objective competitor to the defensive positional adjustments often in use, although this post does not really address those broad philosophical questions.

The first obvious thing to look at is the positional totals for 2013, with the data coming from Baseball-Reference.com. “MLB” is the overall total for MLB, which is not the same as the sum of all the positions here, as pinch-hitters and runners are not included in those. “POS” is the MLB totals minus the pitcher totals, yielding the composite performance by non-pitchers. “PADJ” is the position adjustment, which is the position RG divided by the overall major league average (this is a departure from past posts; I’ll discuss this a little at the end). “LPADJ” is the long-term positional adjustment that I use, based on 2002-2011 data. The rows “79” and “3D” are the combined corner outfield and 1B/DH totals, respectively:

In 2012, there was an unusual convergence of overall positional RG for third base, DH, and all three outfield spots. This did not carry over to 2013 as a more typical spread returned to the defensive spectrum. Still, when compared to the long-term averages, there were quirks as usual. Catchers continued their strong performance with a PADJ of 94 after a 97 in 2012. Right fielders went back to their recent trend of solidly outhitting their left field cousins (one of the quirks that one must be cognizant of when attempting to use offensive data to craft positional adjustments). DHs were about as low as they’ve ever been (a 102 in 1985 is the only lower showing), and pitchers rebounded from a historical low of 1 to post a PADJ of 3, which obviously vindicates any continuing resistance to the DH.

That provides a useful segue from which to take a quick look at the performance by team of NL pitchers. I need to stress that the runs created method I’m using here does not take into account sacrifices, which usually is not a big deal but can be significant for pitchers. Note that all team figures from this point forward in the post are park-adjusted. The RAA figures for each position are baselined against the overall major league average RG for the position, except for left field and right field which are pooled. So pitchers as you can see from the chart above are compared to their robust average output of .11 runs per 25.5 outs:

Dodger pitchers led in BA, OBA, and SLG and ran away with the RG lead. Zack Greinke was the standout, hitting a raw .328/.409/.379 over 72 PA thanks to a .396 BABIP. Greinke drew seven walks, as many as or more than the pitching collectives of the Padres, Marlins, Cubs, Reds, and Brewers. However, the most remarkable performance is that of Pittsburgh’s pitchers, who trudged through 318 plate appearances without a single extra base hit. In 2012 the Pirates only mustered one double in 304 PA. I assumed last year that the Pirate performance was without precedent, and clearly a .000 ISO has never been topped. San Francisco gave Pittsburgh a run for their money at the bottom of the list with a .099 BA and just one double and one triple.
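To make the PADJ and RAA definitions used in this post concrete, here is a small sketch. PADJ follows the description given earlier (position RG over the overall major league RG), shown on a 100 scale to match the figures quoted. The RAA formula, which scales the RG difference by the position's outs over 25.5, is an assumption about how the per-game rate is converted to runs, and the function names and inputs are hypothetical.

```python
OUTS_PER_GAME = 25.5  # outs per game used for RG in this post


def position_adjustment(position_rg, league_rg):
    """Position RG relative to the overall league RG, on a 100 scale."""
    return 100.0 * position_rg / league_rg


def runs_above_average(team_rg, baseline_rg, outs):
    """Runs above the positional baseline over a given number of outs.

    Assumption: RAA = (team RG - baseline RG) * outs / 25.5.
    """
    return (team_rg - baseline_rg) * outs / OUTS_PER_GAME


# Hypothetical inputs, not 2013 data.
print(round(position_adjustment(3.9, 4.0), 1))       # prints 97.5
print(round(runs_above_average(4.5, 4.0, 510), 1))   # prints 10.0
```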

I don’t run a full chart of the leading positions since you will very easily be able to go down the list and identify the individual primarily responsible for the team’s performance and you won’t be shocked by any of them, but the teams with the highest RAA at each spot were:

C--MIN, 1B--CIN, 2B--NYA, 3B--DET, SS--LA, LF--STL, CF--LAA, RF--WAS, DH--BOS

More interesting are the worst performing positions; the player listed is the one who appeared in the most games at that position for the team:

The Marlins, Blue Jays, and Yankees all land multiple names on the list, but Houston’s centerfielders were the very worst outfit, a hole that has been plugged elegantly by trading for Dexter Fowler. Jeff Mathis was also replaced in Miami by Jarrod Saltalamacchia, and Carlos Beltran should improve the Yankees’ production at right field and/or DH. Yankee DHs’ .186 BA was the worst of any non-NL pitcher spot, with Chicago, Toronto, and Miami catchers all posting a .193 mark. Or, to express their futility in another manner, it seems kind of shocking that only twelve team positions were less productive in terms of RG than Yankee DHs.

Teams with unusual profiles of offense by position have been of interest to me in recent years because of the way the Indians have been constructed--often they have gotten good production from positions on the right side of the defensive spectrum while struggling at the more offensively-inclined positions. The easiest way I’ve come up with to express this numerically is the correlation between a team’s RG by position and the long-term positional adjustment (I’ve pooled left and right field but not 1B and DH in this case; pitchers are excluded for all teams and DHs excluded for NL teams, and I’ve broken the lists out by league because of this):

As usual, the Indians had a negative correlation between PADJ and RG, but they were only the seventh-most extreme team in the majors. Seattle is the team which had the highest correlation, as they got little production from catcher and middle infield (2.6 RG from backstops, 3.2 from the keystone positions) while the four corners and DH all created at least 4.5 RG. On the flip side was Minnesota, largely due to the fact that catcher was easily their most productive position with 6.4 RG and their left fielders and DH created 3.3 RG, only better than their shortstops.
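The correlation described above is an ordinary Pearson correlation between two short lists. A minimal sketch follows; the positional adjustments and team RG values are hypothetical stand-ins for an AL team (C, 1B, 2B, 3B, SS, LF/RF, CF, DH), not figures from the spreadsheet.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values in the order C, 1B, 2B, 3B, SS, LF/RF, CF, DH.
long_term_padj = [89, 114, 93, 101, 86, 106, 102, 110]
team_rg = [5.5, 4.0, 4.8, 4.2, 4.6, 3.8, 4.4, 3.5]

# A negative result means the team got its offense from the "wrong" positions.
print(round(pearson(long_term_padj, team_rg), 2))
```

With only seven or eight positions per team, a single extreme position (a star catcher, a dreadful DH) can swing the coefficient considerably, so it works better as a descriptive flag than as a precise measurement.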

Boston and St. Louis won their pennants largely thanks to having the best offenses in their respective leagues, and in a neat coincidence here, they were near the middle of the pack in correlation for their leagues with identical marks of +.44.

The following charts, broken out by division, display RAA for each position, with teams sorted by the sum of positional RAA. Positions with negative RAA are in red, and positions that are +/-20 RAA are bolded:

Atlanta led the NL in corner infield RAA. New York was last in the NL in outfield RAA. Miami had the worst offense in the majors with a remarkable six positions at -20 runs or worse, and the left fielders just missed at -18. Only the Giancarlo Stanton-led right fielders were above average, and their +18 only managed to offset the opposite outfield corner. The whole division struggled with production from centerfield; the division total of -104 RAA from one position was easily the worst in the majors as the next worst division total was -49 from NL Central shortstops.

St. Louis led all of the majors in outfield RAA as they were the only team with two +20 positions in the outfield. Pittsburgh’s McCutchen-led centerfielders had the highest RAA of any position in the NL. Cincinnati’s offense continues to look wobbly post-Choo as only the star-led first base, center, and right units were above average. As seen above, Milwaukee had the most unusual distribution of offense by position in the NL, and it’s actually somewhat impressive that they managed to field an average offense despite -37 runs from first base. Chicago had the worst middle infield RAA in the majors and their infield as a whole was awful at -70, with only the disaster in Miami sparing them from finishing last.

Los Angeles middle infielders led the NL in RAA; San Francisco and Arizona tied for the NL lead for total infield RAA. Colorado had the worst corner infield RAA in the NL, which may explain the desire (albeit not the decision) to give Justin Morneau a multi-year deal. This division had the highest total RAA for a position with 59 RAA from their shortstops.

Boston led the majors in total RAA as only their third basemen were below average. Red Sox middle infielders led the majors in RAA. The Yankees finishing with just two above average positions is still jarring; another way to look at their troubles is that they spent $50.5 million on their intended corner infield starters and wound up with the worst corner infield RAA in the majors.

Detroit led the majors in corner infield and overall infield RAA thanks almost solely to their third basemen compiling a whopping 71 RAA (all Cabrera, as other Tiger third basemen combined for 85 PA with a .222/.341/.306 line). The rest of their offense was far from impressive, although it wouldn’t be fair for me to snark too much about it since the 1,000 run talk was non-existent in the spring. The Indians were close to average around the diamond except for catcher and second base (excellent) and third base (bad). Kansas City’s middle infielders were last in the AL in RAA and as the corner infielders were bad as well, the infield’s total RAA was also last in the league. Minnesota had only one above average position and the worst outfield production in the majors. Chicago had just two above average positions, but just barely with a total of 3 RAA between them, leading to the lowest team total RAA in the AL.

Angel outfielders led the AL in RAA, which of course is due to the great Mike Trout. Seattle’s offense is still bad, but the last two seasons have moved them past the laughingstock phase and into consistent organization deficiency status. Houston had only one above average position, but at least they have the excuse that they weren’t really trying; what can the Yankees say?

The full spreadsheet is available here.

Tuesday, December 10, 2013

Hitting by Lineup Position, 2013

I devoted a whole post to leadoff hitters, whether justified or not, so it's only fair to have a post about hitting by batting order position in general. I certainly consider this piece to be more trivia than sabermetrics, since there’s no analytical content.

The data in this post was taken from Baseball-Reference. The figures are park-adjusted. RC is ERP, including SB and CS, as used in my end of season stat posts. The weights used are constant across lineup positions; there was no attempt to apply specific weights to each position, although they are out there and would certainly make this a little bit more interesting.

NL #3 hitters have now topped all positions in RG for five years running, and again the AL demonstrated balance between #3 and #4 while NL teams got superior performance out of #3 hitters. The other curiosity that stands out to me is that #3 and #4 were the only lineup slots in which the NL had a higher RG. Throw in the fact that the other most celebrated “key” lineup spot (leadoff) was essentially even between the two leagues, and there’s enough fuel to construct some sort of theory (for which there wouldn’t be enough evidence to proceed logically, as if that’s ever stopped anyone before).

During the playoffs I remarked that it seemed like 2013 had been a year in which the notion of batting one’s best hitter #2 had gained traction; when presented with the actual numbers here, I’d be hard pressed to defend that statement. In addition to the overall RG, if this was the case I’d expect to see an uptick in isolated power for #2 hitters. However, AL #2 hitters’ collective .137 ISO was better only than that of AL #1, #8, and #9 hitters, and the same was true of the NL’s .130.

Next, here are the team leaders in RG at each lineup position. The player listed is the one who appeared in the most games in that spot (which can be misleading, particularly for the bottom of the batting order where there is no fixed regular as in the case of the Dodgers’ #8 spot, or guys who move around the batting order like Jason Castro who takes the blame for Houston’s #3s):

And the worst:

The domination of bad AL lineup spots by just four teams is something I’ve not seen since I’ve been running this report. It’s not that unusual to have one team with several dead spots (Seattle’s hapless offenses pulled this off), but the White Sox, Astros, and Yankees all had multiple such holes. Chicago boasting four such disasters is an impressive feat. Meanwhile, although Ryan Howard hit better than the Phillies’ collective cleanup hitters, it’s still amusing to see they were the worst unit in the NL.

The next list is the ten best positions in terms of runs above average, relative to the average for their particular league and lineup spot (so leadoff spots are compared to the league average leadoff performance, etc.):

Baltimore’s #5s were significantly more productive than their #3s or #4s (4.4 and 5.4 RG respectively) thanks to Buck Showalter keeping Chris Davis in that spot for much of the season. The only other #5 spot to outhit both the #3s and #4s was Philadelphia (4.5, 4.1, 5.5 RG respectively) on the backs of the Domonic Brown-led performance which paced NL #5s.

The worst positions:

Chicago’s #9 hitters had a lower RG than three groups of NL #9s (LA, COL, and PHI). They were last among AL lineup slots in BA and OBA and just narrowly missed completing the rate stat sweep as NYA #9s slugged .265 (the only other AL lineup slot with a sub-.300 SLG was SEA #9 at .275). While some passage of time in baseball is sad, like Travis Hafner and Adam Dunn-fronted spots landing on this list, it’s comforting to still have Juan Pierre to kick around.

The last set of charts shows each team’s RG rank within their league at each lineup spot. The top three are bolded and the bottom three displayed in red to provide quick visual identification of excellent and poor production:

It so happens that each pennant winner sticks out as having fielded a well-balanced, productive lineup--they ranked #1 and #2 in the majors in R/G, so it’s not a surprise, but other than the very bottom of the St. Louis lineup, there were no weak links in either team’s batting order.

The spreadsheet used to generate these figures is here.

Monday, December 02, 2013

Leadoff Hitters, 2013

This post kicks off a series of posts that I write every year, and therefore struggle to infuse with any sort of new perspective. However, they're a tradition on this blog and hold some general interest, so away we go.

This post looks at the offensive performance of teams' leadoff batters. I will try to make this as clear as possible: the statistics are based on the players that hit in the #1 slot in the batting order, whether they were actually leading off an inning or not. It includes the performance of all players who batted in that spot, including substitutes like pinch-hitters.

Listed in parentheses after a team are all players that started in twenty or more games in the leadoff slot--while you may see a listing like “OAK (Crisp)”, this does not mean that the statistic is based solely on Crisp's performance; it is the total of all Oakland batters in the #1 spot, of which Crisp was the only one to start in that spot in twenty or more games. I will list the top and bottom three teams in each category (plus the top/bottom team from each league if they don't make the ML top/bottom three); complete data is available in a spreadsheet linked at the end of the article. There are also no park factors applied anywhere in this article.

That's as clear as I can make it, and I hope it will suffice. I always feel obligated to point out that as a sabermetrician, I think that the importance of the batting order is often overstated, and that the best leadoff hitters would generally be the best cleanup hitters, the best #9 hitters, etc. However, since the leadoff spot gets a lot of attention, and teams pay particular attention to the spot, it is instructive to look at how each team fared there.

The conventional wisdom is that the primary job of the leadoff hitter is to get on base, and most simply, score runs. It should go without saying on this blog that runs scored are heavily dependent on the performance of one’s teammates, but when writing on the internet it’s usually best to assume nothing. So let's start by looking at runs scored per 25.5 outs (AB - H + CS):

1. STL (Carpenter/Jay), 7.2
2. CIN (Choo), 6.3
3. BOS (Ellsbury), 5.9
Leadoff average, 4.8
ML average, 4.1
28. PHI (Rollins/Revere/Young/Hernandez), 3.7
29. HOU (Grossman/Villar/Altuve/Barnes), 3.4
30. MIA (Pierre/Yelich/Hechavarria), 3.0
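The rate in the list above is simple to reproduce. A minimal sketch, using the outs definition given in the text (AB - H + CS); the function name and the sample batting line are hypothetical.

```python
def runs_per_game(runs, ab, hits, cs, outs_per_game=25.5):
    """Runs scored per 25.5 outs, with outs defined as AB - H + CS."""
    outs = ab - hits + cs
    return runs * outs_per_game / outs


# Hypothetical leadoff slot: 100 R, 600 AB, 150 H, 10 CS -> 460 outs.
print(round(runs_per_game(100, 600, 150, 10), 1))  # prints 5.5
```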

Speaking of getting on base, the other obvious measure to look at is On Base Average. The figures here exclude HB and SF to be directly comparable to earlier versions of this article, but those categories are available in the spreadsheet if you'd like to include them:

1. CIN (Choo), .397
2. STL (Carpenter/Jay), .371
3. MIL (Aoki), .347
4. OAK (Crisp), .346
Leadoff average, .324
ML average, .314
28. NYN (Young), .289
29. MIN (Dozier/Presley/Carroll), .283
30. MIA (Pierre/Yelich/Hechavarria), .278

The next statistic is what I call Runners On Base Average. The genesis for ROBA is the A factor of Base Runs. It measures the number of times a batter reaches base per PA--excluding homers, since a batter that hits a home run never actually runs the bases. It also subtracts caught stealing here because the BsR version I often use does as well, but BsR versions based on initial baserunners rather than final baserunners do not.

My 2009 leadoff post was linked to a Cardinals message board, and this metric was the cause of a lot of confusion (this was mostly because the poster in question was thick-headed as could be, but it's still worth addressing). ROBA, like several other methods that follow, is not really a quality metric, it is a descriptive metric. A high ROBA is a good thing, but it's not necessarily better than a slightly lower ROBA plus a higher home run rate (which would produce a higher OBA and more runs). Listing ROBA is not in any way, shape or form a statement that hitting home runs is bad for a leadoff hitter. It is simply a recognition of the fact that a batter that hits a home run is not a baserunner. Base Runs is an excellent model of offense and ROBA is one of its components, and thus it holds some interest in describing how a team scored its runs, rather than how many it scored:

1. STL (Carpenter/Jay), .352
2. CIN (Choo), .348
3. BOS (Ellsbury), .322
Leadoff average, .294
ML average, .283
28. SEA (Miller/Chavez/Saunders), .260
29. MIA (Pierre/Yelich/Hechavarria), .254
30. MIN (Dozier/Presley/Carroll), .252

The Cardinals move ahead of the Reds here, making up the 26 point gap in standard OBA. Part of this is the obvious – home runs, as Cincinnati leadoff hitters hit 21 to St. Louis’ 11. But another factor is caught stealing, as we’ll see a little later--Reds leadoff hitters were just fifteen for thirty on stolen base attempts, tied for the second most caught stealing. St. Louis leadoff hitters were just three for six on steal attempts--no other team had fewer than ten stolen bases and only Kansas City had as few caught stealing (albeit with 15 SB), so the Cardinals easily had the fewest attempts (Detroit was next with fourteen).

I will also include what I've called Literal OBA here--this is just ROBA with HR subtracted from the denominator so that a homer does not lower LOBA, it simply has no effect. You don't really need ROBA and LOBA (or either, for that matter), but this might save some poor message board out there twenty posts, by not implying that I think home runs are bad, so here goes. LOBA = (H + W - HR - CS)/(AB + W - HR):

1. CIN (Choo), .358
2. STL (Carpenter/Jay), .358
3. BOS (Ellsbury), .327
Leadoff average, .300
ML average, .290
28. SEA (Miller/Chavez/Saunders), .268
29. MIN (Dozier/Presley/Carroll), .257
30. MIA (Pierre/Yelich/Hechavarria), .257
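The three on-base variants differ only in what is subtracted from the numerator and denominator, which a side-by-side sketch makes plain. The OBA here excludes HB and SF as in this article, ROBA is taken to be LOBA's numerator over the plain OBA denominator (as the description of LOBA implies), and the batting line is hypothetical.

```python
def oba(h, w, ab):
    """On Base Average without HB or SF: (H + W) / (AB + W)."""
    return (h + w) / (ab + w)


def roba(h, w, hr, cs, ab):
    """Runners On Base Average: homers and caught stealing are not baserunners."""
    return (h + w - hr - cs) / (ab + w)


def loba(h, w, hr, cs, ab):
    """Literal OBA: like ROBA, but homers are also removed from the denominator."""
    return (h + w - hr - cs) / (ab + w - hr)


# Hypothetical line: 180 H, 70 W, 20 HR, 10 CS, 650 AB.
line = dict(h=180, w=70, hr=20, cs=10, ab=650)
print(round(oba(line["h"], line["w"], line["ab"]), 3))  # prints 0.347
print(round(roba(**line), 3))                           # prints 0.306
print(round(loba(**line), 3))                           # prints 0.314
```

A homer costs a hitter a point or two of ROBA but leaves LOBA untouched, which is the only practical difference between the two.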

There is a high degree of repetition for the various OBA lists, which shouldn’t come as a surprise since they are just minor variations on each other.

The next two categories are most definitely categories of shape, not value. The first is the ratio of runs scored to RBI. Leadoff hitters as a group score many more runs than they drive in, partly due to their skills and partly due to lineup dynamics. Those with low ratios don’t fit the traditional leadoff profile as closely as those with high ratios (at least in the way their seasons played out):

1. MIA (Pierre/Yelich/Hechavarria), 2.2
2. MIL (Aoki), 2.1
3. PIT (Marte/Tabata), 2.1
7. TB (Jennings/Joyce/DeJesus), 1.9
Leadoff average, 1.6
27. CHN (DeJesus/Castro/Valbuena), 1.3
28. MIN (Dozier/Presley/Carroll), 1.2
29. KC (Gordon), 1.2
30. TEX (Kinsler/Andrus/Martin), 1.1
ML average, 1.1

Again, this is not a quality list, as indicated by the mix of good and bad OBAs among the leaders and trailers. This is also a good interlude at which to remind you that the players listed are those who started twenty or more games in the leadoff spot for their teams and they are not solely responsible for the overall performance of the team’s leadoff hitters. David DeJesus led off 66 games for the Cubs and 20 for the Rays and thus finds himself as part of both the leaders and trailers list here.

A similar gauge, but one that doesn't rely on the teammate-dependent R and RBI totals, is Bill James' Run Element Ratio. RER was described by James as the ratio between those things that were especially helpful at the beginning of an inning (walks and stolen bases) to those that were especially helpful at the end of an inning (extra bases). It is a ratio of "setup" events to "cleanup" events. Singles aren't included because they often function in both roles.

Of course, there are RBI walks and doubles are a great way to start an inning, but RER classifies events based on when they have the highest relative value, at least from a simple analysis:

1. NYN (Young), 1.7
2. HOU (Grossman/Villar/Altuve/Barnes), 1.7
3. MIL (Aoki), 1.4
Leadoff average, 1.0
ML average, .7
27. PIT (Marte/Tabata), .7
28. DET (Jackson/Dirks), .7
29. LAA (Shuck/Aybar/Bourjos), .7
30. SEA (Miller/Chavez/Saunders), .6
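Run Element Ratio is walks plus stolen bases over extra bases, where extra bases are total bases minus hits. A minimal sketch with a hypothetical batting line:

```python
def run_element_ratio(w, sb, tb, h):
    """Run Element Ratio: (W + SB) / (TB - H)."""
    extra_bases = tb - h
    return (w + sb) / extra_bases


# Hypothetical line: 60 W, 20 SB, 250 TB, 170 H -> 80 setup / 80 cleanup.
print(run_element_ratio(60, 20, 250, 170))  # prints 1.0
```

A ratio above 1 means the slot leaned toward "setup" events, and one below 1 toward "cleanup" events.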

Since stealing bases is part of the traditional skill set for a leadoff hitter, I've included the ranking for what some analysts call net steals, SB - 2*CS. I'm not going to worry about the precise breakeven rate, which is probably closer to 75% than 67%, but is also variable based on situation. The ML and leadoff averages in this case are per team lineup slot:

1. BOS (Ellsbury), 47
2. NYN (Young), 27
3. BAL (McLouth/Markakis), 18
Leadoff average, 5
ML average, 3
28. CIN (Choo), -10
29. ARI (Prado/Pollock/Eaton), -11
29. HOU (Grossman/Villar/Altuve/Barnes), -11
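The weight on caught stealing in net steals implies a break-even success rate, which is worth seeing explicitly. In this sketch the weight is a parameter (an illustrative generalization; the list above uses 2), and the sample totals are hypothetical.

```python
def net_steals(sb, cs, cs_weight=2.0):
    """Net steals: SB minus a weighted count of CS (SB - 2*CS by default)."""
    return sb - cs_weight * cs


def breakeven_rate(cs_weight=2.0):
    """Success rate at which net steals come out to zero for a given CS weight."""
    return cs_weight / (1.0 + cs_weight)


print(net_steals(30, 10))            # prints 10.0
print(round(breakeven_rate(2.0), 3)) # prints 0.667
print(breakeven_rate(3.0))           # prints 0.75
```

So a 75% break-even point would correspond to counting each caught stealing three times rather than twice.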

Since 2007, the percentage of major league stolen base attempts from leadoff hitters has declined (2007 is an arbitrary endpoint due to it being the first year I have the data at my fingertips):

30.2%, 29.6%, 27.8%, 25.9%, 27.9%, 25.1%, 25.9%

Leadoff hitters should have a disproportionate share of stolen base attempts for three obvious reasons:

1. they by definition get the most plate appearances of any lineup slot, creating more opportunities to get on base
2. as a group, they usually have above-average OBAs more heavily tied up in singles and walks, creating more good opportunities to steal bases
3. managers still tend to strongly consider speed when choosing a leadoff hitter

While #1 is an unalterable truth and #2 is generally supported by sabermetric orthodoxy, #3 is a factor which may decline in importance in a more sabermetrically-minded game. The percentage of steal attempts from leadoff hitters is something I’ll be keeping an eye on in future seasons as an imperfect indicator of shifting reasoning.

Let's shift gears back to quality measures, beginning with one that David Smyth proposed when I first wrote this annual leadoff review. Since the optimal weight for OBA in an x*OBA + SLG metric is generally something like 1.7, David suggested figuring 2*OBA + SLG for leadoff hitters, as a way to give a little extra boost to OBA while not distorting things too much, or even suffering an accuracy decline from standard OPS. Since this is a unitless measure anyway, I multiply it by .7 to approximate the standard OPS scale and call it 2OPS:

1. CIN (Choo), 881
2. STL (Carpenter/Jay), 832
3. OAK (Crisp), 795
Leadoff average, 727
ML average, 717
28. MIN (Dozier/Presley/Carroll), 639
29. NYN (Young), 625
30. MIA (Pierre/Yelich/Hechavarria), 607
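2OPS as defined above is a one-line calculation. The sketch below returns it on the decimal scale; the list displays it multiplied by 1000. The function name and sample inputs are hypothetical.

```python
def two_ops(oba, slg):
    """2OPS: .7 * (2*OBA + SLG), roughly on the standard OPS scale."""
    return 0.7 * (2.0 * oba + slg)


# Hypothetical slot with a .350 OBA and .400 SLG (standard OPS of .750).
print(round(1000 * two_ops(0.350, 0.400)))  # prints 770
```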

Along the same lines, one can also evaluate leadoff hitters in the same way I'd go about evaluating any hitter, and just use Runs Created per Game with standard weights (this will include SB and CS, which are ignored by 2OPS):

1. CIN (Choo), 6.4
2. STL (Carpenter/Jay), 5.8
3. BOS (Ellsbury), 5.4
Leadoff average, 4.4
ML average, 4.3
28. HOU (Grossman/Villar/Altuve/Barnes), 3.2
29. MIN (Dozier/Presley/Carroll), 3.2
30. MIA (Pierre/Yelich/Hechavarria), 2.9

It’s kind of sad not having the Mariners offense ranking last in just about everything anymore, but the Marlins leadoff hitters were just part of a valiant effort by Miami to take up the mantle.

Finally, allow me to close with a crude theoretical measure of linear weights supposing that the player always led off an inning (that is, batted in the bases empty, no outs state). There are weights out there (see The Book) for the leadoff slot in its average situation, but this variation is much easier to calculate (although also based on a silly and impossible premise).

The weights I used were based on the 2010 run expectancy table from Baseball Prospectus. Ideally I would have used multiple seasons but this is a seat-of-the-pants metric. The 2010 post goes into the detail of how this measure is figured; this year, I’ll just tell you that the out coefficient was -.216, the CS coefficient was -.583, and for other details refer you to that post. I then restate it per the number of PA for an average leadoff spot (739 in 2013):

1. CIN (Choo), 32
2. STL (Carpenter/Jay), 22
3. BOS (Ellsbury), 19
Leadoff average, 0
ML average, -2
28. HOU (Grossman/Villar/Altuve/Barnes), -20
29. MIN (Dozier/Presley/Carroll), -21
30. MIA (Pierre/Yelich/Hechavarria), -25

A common theme in these rankings has been the turnaround for Cincinnati leadoff hitters, who last year were historically awful. Truly, unbelievably (especially for a playoff team) awful. In 2012, Reds leadoff hitters led by Zack Cozart and Brandon Phillips were last in the majors in R/G (3.8), OBA (.247), ROBA (.224), LOBA (.229), R/BI (2.2), RER (.6), 2OPS (575), and LE (-32). To be fair R/BI and RER are not good/bad categories, but they indicate that the Reds did not fit the traditional leadoff hitter mold.

This year, the Shin-Soo Choo-led Reds were tops in OBA, LOBA, 2OPS, RG, and LE. The bad news is that it was just a one-year fix; the good news is that Bryan Price may have a more modern take on leadoff decisions than Dusty Baker. Still, the Reds better have sent Manny Acta a fruit basket for making Choo a “proven” leadoff hitter.

For the full lists and data, see the spreadsheet here.

Thursday, November 21, 2013

Statistical Meanderings 2013

Below are my annual observations from perusing the end of season stats I post on this blog. They are generally nuggets that I find interesting or amusing rather than an attempt to engage in serious analysis and should be taken in that light. You’ll notice a bit of an Indians bias in terms of what I found interesting:

* Only one team in MLB finished with between 79 and 84 wins, which seems rather remarkable--the 81-81 Diamondbacks. Making the range an even three wins on both sides of .500 (78-84, or more appropriately a W% between .481 and .519), there were two teams in this range (the Angels were 78-84). The last time there were two or fewer teams in this range was 1994, which of course was a strike-shortened season. Prior to that, one must go back to 1978, 1969, and 1967 (with 26, 24, and 20 teams in the majors respectively). The last time only one major league team fell in that range was 1965: the Cardinals were 80-81, while the Phillies were 85-76 and the Yankees were 77-85. One must go back to 1937 to find a season with no teams in this range. There were a whopping ten teams in this range in 1991.

Obviously the particular range I’ve chosen doesn’t have any particular significance, and there are some more rigorous ways one could measure the lack of centrality in 2013 team records.

* No sub-.500 team had an EW% (based on runs scored and allowed) or PW% (based on runs created and runs created allowed) above .500. The Angels were the closest in both (.481 W% with .497 EW% and .497 PW%). Only the Yankees managed a winning record with an EW% or PW% below .500 (.525 W%, .485 EW%, .446 PW%). The RMSE of EW% (Pythagenpat) as a predictor of W% was 3.66, which is definitely lower than the long-term average (although I’ve never looked at the annual breakouts closely enough to tell you if it’s unusually low or not).
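For readers who want to run this kind of check themselves, here is a minimal sketch of a Pythagenpat-style estimate and its RMSE against actual records. The 0.29 exponent constant, the choice to express the error in wins per 162 games, and the three sample team lines are all assumptions for illustration; none of them are taken from the spreadsheets.

```python
import math

def pythagenpat_wpct(runs, runs_allowed, games, z=0.29):
    """Estimated W% using an exponent that floats with scoring level.

    x = (total runs per game) ** z, where z = 0.29 is an assumed constant.
    """
    rpg = (runs + runs_allowed) / games
    x = rpg ** z
    return runs ** x / (runs ** x + runs_allowed ** x)

def rmse_in_wins(teams, games=162):
    """Root mean squared error between actual and estimated wins.

    teams is a list of (actual W%, runs scored, runs allowed) tuples.
    """
    squared_errors = [
        (games * (w_pct - pythagenpat_wpct(r, ra, games))) ** 2
        for w_pct, r, ra in teams
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up team lines for illustration only (not 2013 data)
sample = [(0.550, 750, 680), (0.500, 700, 700), (0.450, 640, 720)]
print(round(rmse_in_wins(sample), 2))
```

Feeding in all thirty teams' actual W%, runs scored, and runs allowed would give a figure comparable to the one quoted above, provided the same exponent and scale are used.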

* Atlanta had a 56-25 record at home (thanks largely to just 2.96 RA/G at home), which makes one wonder why they’d want to tear Turner Field down; their .690 mark has been matched or exceeded in the last five years only by the 2009 Yankees and Red Sox, 2010 Braves, and 2011 Brewers. On the other hand, Houston was 24-57 at home, tied for fifth-worst since 1961.

The flip side is that Atlanta was the only playoff team (for the sake of this post, I’m counting the two wildcard losers as playoff teams, which I know sets some people off) with a losing road record (Tampa Bay and Cincinnati were both a win better at 41-41). The Mets were .099 points better on the road (they actually had a winning road record at 41-40 but were just 33-48 at home). That was the biggest discrepancy in favor of road since the 2011 Mets, and in the non-Mets category since the 2002 Red Sox.

* I always like to look at the playoff teams by runs above average on offense and defense (park adjusted and just based on runs per game to keep it simple). This often gives me an opportunity to snark about the usual nonsense about pitching being paramount, and this year is no exception:

Note that I’m not making the opposite argument.

* It was probably never a great idea to lump teams into sabermetric and non-sabermetric front office buckets, or assume that the sabermetric front offices would surely produce teams with higher secondary averages, and it’s even sillier to attempt that now. Still, I find it satisfying on some level that the top four teams in secondary average were Oakland, Tampa Bay, Boston, and Cleveland.

* Drew Smyly ranked tenth among AL relievers in RAR with excellent peripherals to back it up. I didn’t realize this, and based on his playoff deployment of Smyly, neither did Jim Leyland.

* Relievers are a little hard to keep track of due to their somewhat fungible nature and the bloated size of modern bullpens--at any given moment there are roughly 210 full-time relievers in the majors. I watch enough MLB Network, enough games of teams around the league, and read enough box scores to be reasonably familiar with all major league players, but there are without fail a couple relievers on the list every year of whom I have no useful knowledge. The highest ranking in RAR was Seattle’s rookie Yoervis Medina, who was 23rd in the AL with 14.

* Brandon McCarthy had something of a disappointing season after signing with Arizona, pitching 135 innings with a 4.68 RRA for 7 RAR. I was amused last offseason, though, that McCarthy signed for 2/$15.5MM while another free agent Brandon signed with the Dodgers for 3/$22.5MM. To the surprise of just about no one, McCarthy was still a much better value than League, who ranked dead last among NL relievers in RAR and strikeout rate (-16 RAR thanks to a 7.11 RRA with a 4.4 KG).

* In the celebration of Ben Cherington’s makeover of the Red Sox that followed their World Series triumph, one move was conveniently glossed over. In pointing this out, I don’t mean to suggest that Cherington was not worthy of praise or that perfection is a reasonable goal. But the Joel Hanrahan trade made little sense to me when it was made, and as Melancon was one of the NL’s best relievers, it looks much worse in retrospect.

* It’s once again time to play: Which Yankee Reliever Whose Name Begins With R Is It?

* Jose Mijares had one of the largest gaps between his eRA (estimated RA based on opponent’s runs created) and dRA (DIPS-style estimate RA) that you’ll ever see as they were 6.53 and 3.57 respectively. This was driven by an eye-popping .428 %H. Granted, he only faced around 230 hitters, but that jumps off the page.

* Q: What do Aroldis Chapman, Craig Kimbrel, Kenley Jansen, Jason Grilli, Trevor Rosenthal, Kevin Siegrist, Jim Henderson, Francisco Rodriguez, Manny Parra, Blake Parker, David Carpenter, Jordan Walden, Paco Rodriguez, Pedro Strop, Nick Vincent, Tyler Clippard, Rex Brothers, Steve Cishek, Carlos Marmol, Antonio Bastardo, Sam LeCure, Mike Gonzalez, Mike Dunn, David Hernandez, AJ Ramos, Heath Bell, Mark Melancon, Jake Diekman, JJ Hoover, Tony Sipp, Luke Gregerson, Will Harris, Sergio Romo, Jean Machi, Dale Thayer, Javier Lopez, Adam Ottavino, Jose Mijares, Alex Wood, Tom Gorzelanny, Logan Ondrusek, and Craig Stammen have in common?

A: They were all NL relievers with higher strikeout rates than Jonathan Papelbon. That Papelbon’s KG was 8.6 speaks a lot about the current environment.

* Paul Clemens appeared in 35 games for Houston with a 5.13 RRA over 73 innings for -3 RAR. His peripherals were worse (6.12 eRA, 6.27 dRA, 5.8 KG, 3.1 WG). My honest question: could Roger Clemens have done better?

Speaking more generally of Houston’s bullpen, it had an eRA of 5.76, 1.04 runs higher than the second-worst bullpen (PHI) and 40% higher than the AL average of 4.11. For comparison, the dreadful Arizona pen of 2010 had a 5.54 eRA in a league with an average of 4.35, only 27% higher than average.

I was overly optimistic about the Astros’ outlook this year, but this is an area that an intelligent organization should be able to improve, should they deign to devote any resources to it at all.
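The league-relative comparison of the two bullpens is simple to reproduce. This is only a small sketch using the eRA and league-average figures quoted above; the function name is made up for illustration.

```python
def pct_above_league(team_ra, league_ra):
    """How far a run average sits above the league average, in percent."""
    return (team_ra / league_ra - 1) * 100

# Figures quoted in the text
houston_2013 = pct_above_league(5.76, 4.11)
arizona_2010 = pct_above_league(5.54, 4.35)

print(f"Houston 2013: {houston_2013:.0f}% above league")  # 40%
print(f"Arizona 2010: {arizona_2010:.0f}% above league")  # 27%
```

Expressing each pen relative to its own league average is what allows the cross-season comparison, since the run environments in 2010 and 2013 were different.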

* Travis Wood was among the better starters in the NL this year, at least from a non-DIPS perspective, pitching 200 innings with a 3.10 RRA and 3.36 eRA. Even if you start from his 4.15 dRA, he was at worst an average starter pitching a lot of innings. Sean Marshall, on the other hand, pitched just 16 innings for the Reds and made about $4 million more. I wouldn’t advise trading a potential starter for a reliever, even a good one like Marshall, particularly when you intend to use that reliever as a LOOGY and when the starter you’re trading could probably fill the reliever’s potential role nearly as well anyway.

* Last year I made a big point of comparing the aggregate performance of Drew Pomeranz and Alex White (not good) to Ubaldo Jimenez (just as bad and a lot more expensive). To be fair, this year I will point out that Ubaldo wiped the floor with them and was a key contributor to the Indians’ wildcard spot. Jimenez chipped in 35 RAR, good for 24th among AL starters. As you probably know, he was his old (Cleveland-style) self in the first half but much better in the second half. This lack of consistency is captured crudely by his QS%--just fifty percent, ranking tied for 46th among AL starters and a tick below the league average of 51%. Jimenez led all AL starters with a below-average QS% in RAR and strikeout rate, and was second in innings pitched (behind AJ Griffin) and RAA (behind Alexi Ogando). Only seven AL starters had a RRA better than the league average with a subpar QS%, and three of them pitched for Cleveland (Jimenez, Corey Kluber, and Scott Kazmir).

* The Indians’ starting pitching was easily the worst of any playoff team. Cleveland’s starters had an eRA of 4.55, just ahead of the AL average of 4.60 and 21st in MLB. The next poorest playoff team was Tampa Bay (4.36, 14th in MLB), with the other eight playoff teams ranking in the top ten (only the Nationals and the Cubs missed the playoffs among the top ten). Cleveland starters averaged 5.7 innings/start compared to the league average of 5.9, and only Pittsburgh was similarly poor among playoff teams (also 5.7). Seven of the playoff teams were in the top ten in this category. The Indians’ QS% of 45% was fourth-worst in MLB; Tampa Bay was next worst among playoff teams (49%, 23rd) and six of the playoff teams finished in the top ten.

* How quickly the mighty can fall when they are built on elbows and shoulders: San Francisco had one starting pitcher with positive RAA (Madison Bumgarner) along with the second and third to last NL starters in RAR. Barry Zito ranking down there was no surprise, but Ryan Vogelsong’s magic ride came to a halt with a line that pretty much made him Zito’s right-handed twin:

* Minnesota’s starting pitching was terrible once again; in 2012, they were last in starters’ eRA and second-to-last in innings/start and QS%. In 2013, they completed the triple crown--last in IP/S (5.38), QS% (38), and eRA (5.76). No team was even close to being as hapless in this department as Minnesota--Colorado starters worked 5.43 IP/S and had 40% QS, while Houston and Toronto had the next highest eRA (5.24). Rockies starters were actually respectable with a 4.43 eRA versus a NL average of 4.24.

* I came of age as a baseball fan during the mid-90s, so the recent dip in runs scored is difficult for me to process when I peruse the stats--from an analytical perspective I understand the context issue, but there’s something jarring to me about looking at a list of hitters for a league and seeing only seven players with 100 Runs Created as was the case for the NL in 2013 (there were ten in the AL). 2003 is the earliest year for which I have my end of season stats at easy disposal, and in that season 24 NL and 21 AL players created 100 runs.

Another way to express this is to look at the batting lines of NL hitters with 0 HRAA (that is average batters, albeit compared to a league average that includes pitchers). They include Luis Valbuena (.213/.319/.370), Eric Young (.254/.314/.343), Marcell Ozuna (.264/.297/.387), Brandon Crawford (.256/.316/.374), and Jesus Guzman (.232/.300/.388).

The AL average runs scored per game was 4.33, while the NL was at 4.00. For both leagues, it was the lowest scoring output since 1992 (4.32 in the AL, 3.88 in the NL).

* A quick way to see which players had seasons that most surprised me is to look down the list sorted by RAR and find the first name that makes me do a double take. In the AL, that player is definitely Jason Castro. Castro hit .277/.352/.488 over 485 PA for 39 RAR and was arguably the best catcher in the AL as the only two ahead of him on the RAR list spent a significant amount of time at other positions (Carlos Santana and Joe Mauer). Castro was an All-Star, which should have caused me to look at him more closely in-season, but then again I probably just figured that they had to pick someone from Houston.

* Texas’ once-vaunted offense was below average in 2013, scoring 17 fewer runs than an average AL team when adjusted for park. A look down the list of individuals is jarring; only Adrian Beltre, Ian Kinsler, and Nelson Cruz ranked as above average. There’s an interesting case to be made for the Rangers as a cautionary tale for something (anointing a team as the best since the 1998 Yankees in June, maybe? Obviously that was in 2012, not 2013), but I’m not quite sure what it is.

* I list a variant of Bill James’ Speed Score in my stats (I switched from my own knockoff Speed Unit a few years ago because it’s easier to disclaim the results when you just use someone else’s method), but it really serves very little purpose--it's purposefully not expressed in a meaningful unit, it’s a skill measure rather than a value measure and therefore really should consider more data than one season, and the results usually aren’t surprising. One name that popped out at me, though, was Matt Dominguez, who has a Speed Score of 1.1. The AL players with lower Speed Scores are all catchers, first basemen, or DHs, except for fellow third baseman Alberto Callaspo.

I saw five or so Astro games on TV this year but don’t recall Dominguez’ speed or lack thereof standing out, and my impression was that defense at third was his calling card (not that speed is a key factor for third base defense, but my mental picture of a good third baseman is a big but athletic guy--he wouldn’t have a high speed score, but neither would he be sandwiched between Joe Mauer and Justin Morneau on the list).

But by the components that go into Speed Score, he’s really slow. He’s only attempted one stolen base in 200 major league games (and he was caught). He has two triples, but neither came in 2013. And he’s only scored 25% of the time when reaching base, which of course is somewhat attributable to playing for Houston.

* You may have noticed in reading through that I am easily amused by comparisons of players otherwise connected, that is, traded for each other or where one replaced the other. My very favorite combination this year is the AL and NL trailers in RAR, who were once swapped as counterweights in the Zack Greinke deal. Alcides Escobar was 12 runs below replacement considering only offense and position, hitting .232/.255/.297 for 2.5 RG over 626 PA. Yuniesky Betancourt was -9 RAR, hitting .211/.238/.354 for 2.7 RG over 405 PA. And I for one am shocked that “Yuniesky Betancourt, first baseman” was a resounding failure.

Friday, November 08, 2013

IBA Ballot: MVP

I think we can all just dust off what we wrote last year, change the numbers a little bit, and save a bunch of time, because the essence of the AL MVP race is once again Cabrera v. Trout. The circumstances have changed a little, though. For one, both had better seasons with the bat in 2013 than they did in 2012 (which serves to illustrate the silliness of positing that leading the league in three particular categories makes a season inherently more valuable than another). Cabrera went from hitting .326/.390/.600 for 8.1 RG in 2012 to .344/.434/.630 for 9.6 RG in 2013. Trout had a less dramatic uptick, from .332/.406/.575 for 8.7 RG to .329/.438/.568 for 9.1 RG. These productivity increases were even more valuable than those figures suggest as the AL’s run/game average dipped from 4.45 to 4.33.

Put it all together (including position, which isn’t a huge difference when comparing a centerfielder and a third baseman using my position adjustments), and the RAR gap between the two is unchanged from 2012--three runs in favor of Trout (81 to 78 in 2012, 93 to 90 in 2013). Fielding and baserunning are still in Trout’s favor, regardless of his slippage in the fielding metrics--Cabrera also saw his fielding metrics take a plunge, and Trout’s -2 FRAA, +4 UZR, and -9 DRS aren’t enough to flip this race. Cabrera’s fielding, if given 100% credibility, might be enough to allow the rest of the field to challenge for the second spot (-13, -17, -18 in those three metrics).

This will mark the fourth consecutive year in which I have placed Cabrera in the #2 position in the AL MVP race, which has to be some kind of “record” (scare quotes since my opinions on awards are not of sufficient heft to constitute a record).

The rest of the ballot is not that interesting to discuss. The top four pitchers are sprinkled in along with Chris Davis, Robinson Cano, Josh Donaldson, and Evan Longoria. I saw no reason to deviate from RAR ordering with those guys except for Longoria, who was slightly behind Carlos Santana and David Ortiz but has a pretty clear fielding advantage over that pair:

1. CF Mike Trout, LAA
2. 3B Miguel Cabrera, DET
3. 1B Chris Davis, BAL
4. SP Max Scherzer, DET
5. SP Yu Darvish, TEX
6. 2B Robinson Cano, NYA
7. 3B Josh Donaldson, OAK
8. SP Hisashi Iwakuma, SEA
9. SP James Shields, KC
10. 3B Evan Longoria, TB

The National League race is actually more interesting, as there are five players who I believe to be very much removed from the rest of the field, any one of whom would make a completely justifiable MVP selection. And since one of the five is a pitcher, there are a number of ancillary issues that come into play.

I’ll set Clayton Kershaw aside for a moment and first discuss the four position player candidates. Two make an easy comparison to each other given position. Joey Votto and Paul Goldschmidt had very similar seasons in terms of overall offensive performance, and very similar numbers in two key broad “shape” categories, yet still achieved those in different ways. Votto had a .303 BA to Goldschmidt’s .296 and a .415 secondary average versus .404 for Goldschmidt. Votto’s SEC was balanced between a .187 walk/at bat ratio (second among all qualified major leaguers behind Mike Trout) and a .185 isolated power (38th in the NL among those with 300 PA). Goldschmidt’s W/AB was .137 (8th in the NL), but his .244 ISO was third.

I estimate that each created about 124 runs, with Votto using 20 fewer outs to do so, and so he ends up 3 RAR ahead. In the field, Goldschmidt’s metrics come out a little ahead of Votto’s, but not by a large enough margin to tip the comparison. Where Goldschmidt does have a clear edge is in context-dependent metrics like RE24 and WPA; generally I don’t put much weight on these, but Goldschmidt’s advantage is enough to push him just ahead of Votto on my ballot.

Matt Carpenter is also a legitimate candidate, with 66 RAR. Carpenter is a recent convert to second base and his metrics suggest he’s average, which may be a kinder assessment than the eighteen times Mike Matheny inserted him at third base mid-game. However, Carpenter is ranked by Baseball Prospectus as the top baserunner in the game (excluding stolen base attempts which are already considered in my RAR estimates) with an estimated 9 run contribution. Giving full weight to baserunning could move Carpenter to the head of the position player pack.

Andrew McCutchen is the fourth, and he leads the position pack with 71 RAR. His 7.39 RG is an exact match for Goldschmidt; Goldschmidt’s 40 extra PA prevent that comparison from being a runaway. While fielding metrics aren’t and haven’t been universally enthusiastic about McCutchen (-7 FRAA, 7 UZR, 7 DRS in 2013), I don’t think that’s enough to push Goldschmidt/Votto ahead.

So that leaves Kershaw v. McCutchen for NL MVP. Kershaw starts with a 77 to 71 advantage in RAR, but that is based on his actual runs allowed total. Kershaw’s RAR based on his eRA would be 72, and based on dRA it would be just 53. Using either of those figures, there’s no statistical edge for Kershaw; maybe one can create a little space by considering Kershaw’s own hitting, which was pretty good for a pitcher (.187/.238/.266 over 82, probably about 4 runs beyond an average pitcher).

If I’m going to choose a pitcher over a hitter for MVP, I’d prefer that he at least have the edge when using eRA, since the use of a component RA is conceptually the same methodology that is being used to estimate the batter’s contribution through a runs created analysis. That is, both approaches take the components of performance (hits, walks, outs, etc.) and estimate run contributions rather than look at an actual count of runs contributed/allowed.

Of course, a pitcher’s runs allowed are more attributable to an individual pitcher than runs scored or batted in are to a batter; while a pitcher’s runs allowed are influenced strongly by his fielding support, and less so by his bullpen support, the pitcher at least bears some responsibility for the situations in which he finds himself (base/out situations). The batter is presented with these situations independently of his own actions. Sequencing does matter, and pitchers have control over it--but so many other factors are in play that I do consider it worthwhile to consider methods that attempt to control for these other factors, be it sequencing (as done in the case of eRA) or fielding (as done bluntly in the case of dRA and other DIPS approaches such as FIP, and attempted more carefully in the case of some other measures like bWAR).

So my natural inclination would be to side with McCutchen, ever so slightly, but in a case like this I think it is useful to bring in the perspective of other methods (in many other cases, looking at different methods is not particularly helpful because the reason for differences is methodological choices about which one is more comfortable, or because the methodologies are quite similar and so differences are minimal). The two most used methods are Baseball-Reference and Fangraphs’ WAR. I find the latter unhelpful in a case such as this due to its complete reliance on FIP to value pitching; the former estimates that McCutchen was worth 8.2 WAR and Kershaw 7.9--a difference of about three runs.

This race is extremely close, closer still when you consider the narrow margin by which I chose McCutchen over Goldschmidt, Votto, and Carpenter. And in a complete hand waving of reason, that is what I will use to tilt the scale--that Kershaw was so much better than any other pitcher, while no one hitter could pull away from the pack. Arbitrary and capricious? Yes. Any sillier than any other rationale for separating the two? That’s for you to judge.

The toughest decision for the rest of the ballot is what to do with two players for whom fielding is such an important consideration. Yadier Molina and Carlos Gomez each have 47 RAR, tied for thirteenth in the NL, but Molina’s defense behind the plate is universally lauded and Gomez was rated highly by all the metrics (11, 24, 38). Molina’s fielding value is harder to quantify, and its impact on his overall value is muted by his poor baserunning (a very believable -5 according to BP). I give them enough of a boost to climb over all but one of the other position players ahead of them by six or fewer RAR (Freddie Freeman, Jayson Werth, Hanley Ramirez, Buster Posey, Hunter Pence, Matt Holliday) and the non-Kershaw pitchers, but not above Shin-Soo Choo (59 RAR, bad defense, extra hit batters) and David Wright (52 RAR with well-regarded fielding and baserunning). I feel bad about leaving Ramirez off the ballot since his 9.4 RG was the highest in MLB among those with 300 PA except for Miguel Cabrera, and 53 RAR in 331 PA is eye-popping, but sketchy fielding makes it a little easier to swallow. My ballot:

1. SP Clayton Kershaw, LA
2. CF Andrew McCutchen, PIT
3. 1B Paul Goldschmidt, ARI
4. 1B Joey Votto, CIN
5. 2B Matt Carpenter, STL
6. CF Shin-Soo Choo, CIN
7. 3B David Wright, NYN
8. C Yadier Molina, STL
9. CF Carlos Gomez, MIL
10. SP Matt Harvey, NYN

Finally, a brief missive on a topic I wrote about in my MVP post last year but thought worth revisiting: the margin of error for advanced metrics (I’ll use RAR, but it applies equally to WAR) and the use of that uncertainty in award discussions. It is good to acknowledge that the metrics we use have an associated level of uncertainty. It is good to recognize that other people’s award picks may be perfectly justifiable, even by your preferred method, due to the uncertainty. It is good to recognize that certain components of an uberstat may be less reliable than other components (fielding v. batting is the most obvious case and the one with the most impact), and adjust one’s rough estimate of uncertainty in the metric accordingly (or regress the components in question prior to aggregation).

But the margin of error should not be used as a backdoor credit for one’s preferred candidate. If the metric you are using can’t distinguish between Paul Goldschmidt and Joey Votto, and you’d like to use your judgment or some non-quantifiable factor to pick Goldschmidt, that’s great. Just don’t try to tell others that they are obligated to do the same. You might think that I am arguing against a strawman here; please don’t make me search a few message boards to find those making arguments along these lines in last year’s AL MVP debate.

My philosophy is typically to use a metric and follow the results fairly closely in filling out a ballot. I am not saying that this is the only justifiable way to fill out an IBA ballot, but that’s how I choose to do it. Some might dismiss such an approach as an unthinking reliance on a metric, but that ignores all of the thought that has gone into selecting the metric to be used (and more importantly, if you can get away with claiming some credit, the thought that went into developing the metric). If just picking the player with the higher RAR appears to be ducking the question of which player was more valuable by falling back on an easy answer, realize that it’s not--I've already put time into thinking generally about the questions of how to measure value and have a set (but not inflexible) manner of applying that to particular cases.

Additionally, I will tend to defer to differences in the metric, even those that are clearly not meaningful, unless I can be convinced of a good reason to deviate. This does not mean that I think the difference between 65 RAR and 64 RAR is meaningful; if the choice is essentially a coin flip, then I may as well use the metric as the coin. It’s also worth remembering that from a probability distribution, the player who is 65 RAR +/- 10 RAR is more likely to have a higher true RAR than the player who is 64 RAR +/-10 RAR (this is more important when the difference is larger, say five or ten runs).
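The probability point can be made concrete with a small sketch. It assumes each player's true value is normally distributed around the estimate, that the errors are independent, and that "+/- 10" means one standard deviation; all three are simplifying assumptions for illustration rather than properties of RAR.

```python
import math

def prob_first_is_better(est_a, est_b, sd_a=10.0, sd_b=10.0):
    """P(true A > true B) under independent normal errors around each estimate."""
    # The difference of two independent normals is normal with combined variance
    z = (est_a - est_b) / math.sqrt(sd_a ** 2 + sd_b ** 2)
    # Standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

for gap in (1, 5, 10):
    p = prob_first_is_better(64 + gap, 64)
    print(f"{gap}-run edge: {p:.3f}")
```

Under these assumptions a one-run edge is only a little better than a coin flip (roughly 53%), while edges of five and ten runs work out to roughly 64% and 76%.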

Wednesday, November 06, 2013

IBA Ballot: Cy Young

There were four AL pitchers who I estimate to have been worth 60 or more RAR in 2013, then a pack of four pitchers with 53 or 54 RAR. These two groups make a natural candidate set for the Cy Young ballot. The less interesting question is which of the four lower RAR pitchers get the #5 position. Bartolo Colon, Chris Sale, Anibal Sanchez, and Felix Hernandez can’t be distinguished based on their RAR, so I go to peripherals to give Sanchez the nod--he has the best eRA of the bunch (just edging out Sale 3.17 to 3.19) and the best dRA (my Base Runs DIPS-style run average); in fact, Sanchez’ 2.88 dRA led all AL pitchers.

The four pitchers vying for the top spot are Hisashi Iwakuma, Yu Darvish, Max Scherzer, and James Shields. Iwakuma actually leads in RAR at 67, but has two major drawbacks. The first has nothing to do with his pitching but rather with that calculation--it is assuming that Safeco was a neutral park in 2013 based on one year of data. While my standard procedure is to reset the park factor for a dimension change, it’s truly not correct to treat it as a completely new park. If we assume that Seattle’s PF is a more pitcher-friendly .96, his RAR lead over Darvish dissipates. Iwakuma also benefited from a very low BABIP (.259), although that is not unique to him as Darvish (.267) and Scherzer (.263) also had low figures. However, Iwakuma’s dRA is the worst among the pack at 4.02 (Darvish 3.50, Scherzer 3.19) and his eRA trails as well (3.35, 3.05, 2.81).

Ultimately, it’s Scherzer’s superiority in both peripheral run averages that compels me to place him first. My philosophy has always been that, when assessing value, one should start with the actual runs allowed by the pitcher, but that in cases where two pitchers are very close, peripherals act as a good tiebreaker. The difference of three RAR between Iwakuma/Darvish and Scherzer is minuscule, but Scherzer’s advantages in the peripherals are more significant. When it comes to pitchers, actual runs allowed is very meaningful, and yet still leaves things like bullpen and defensive support completely unaccounted for. Using RAR based on eRA (eRAR) and dRA (dRAR):

Iwakuma (using 1.00 PF): 67 RAR, 54 eRAR, 36 dRAR
Darvish: 64, 58, 47
Scherzer: 61, 65, 66
Shields: 60, 47, 43

Thus, I would fill out my ballot as follows:

1. Max Scherzer, DET
2. Yu Darvish, TEX
3. Hisashi Iwakuma, SEA
4. James Shields, KC
5. Anibal Sanchez, DET

In the NL, there’s no competition at all for the top spot. Clayton Kershaw had 77 RAR, 22 more than his closest competitor. Both Jose Fernandez and Matt Harvey posted similar RRA, eRA, and dRA to Kershaw, but Kershaw pitched 63 more innings than Fernandez and 58 more innings than Harvey, making it no competition from a value perspective. Of the pitchers that could compete on bulk, none come close on quality. Kershaw’s 236 innings trailed only Adam Wainwright (242), and Cliff Lee was next at 223. The only question regarding Kershaw is whether the Cy Young is enough, or whether he was the NL MVP as well.

The spots behind Kershaw on the ballot come down to choosing between two young pitchers in Harvey and Fernandez who pitched brilliantly but didn’t turn in full seasons, and two veteran workhorses in Wainwright and Lee. These four are very closely bunched in terms of RAR, but the youngsters had much lower RRAs and eRAs. In terms of dRA, it’s much closer, but Harvey and Fernandez still both were lower than the vets:

I really don’t see any reason to deviate from the RAR rankings, and filled out my ballot accordingly. That doesn’t mean I’m claiming that the differences in RAR are meaningful; I think they indicate that these four pitchers are indistinguishable in value as measured by RAR. If you believe that the replacement level is set too high, that would be reason to push Lee and Wainwright ahead; I don’t, obviously--my starting pitcher baseline is 128% of the league average runs allowed, which in W% terms is roughly .380. If I felt it was too high (quality wise rather than RA--you can see why people like to use ERA+ even if the scale distorts), I’d lower it; if anything, my inclination would be to raise the replacement level, which would benefit Harvey and Fernandez.
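The conversion from a replacement level of 128% of league-average runs allowed to a W% of roughly .380 can be checked with a quick sketch. It assumes a team that scores exactly league-average runs and a fixed Pythagorean exponent of 2; the exponent is an assumption here, and a different exponent would shift the result slightly.

```python
def wpct_from_ra_ratio(ra_ratio, exponent=2.0):
    """W% for a team with league-average scoring that allows ra_ratio times
    the league-average runs, using a fixed Pythagorean exponent (assumed)."""
    # R^x / (R^x + RA^x) with R = 1 and RA = ra_ratio (both relative to league)
    return 1 / (1 + ra_ratio ** exponent)

print(round(wpct_from_ra_ratio(1.28), 3))  # 0.379
```

Lowering the baseline (a higher ratio, such as 1.35) drops the equivalent W% further, which helps innings-heavy pitchers like Lee and Wainwright; raising it does the opposite.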

In the end, though, quibbling about spots 2-5 is irrelevant; this is not a year in which down ballot votes should have any impact on the outcome, which should be Clayton Kershaw, unanimous Cy Young winner:

1. Clayton Kershaw, LA
2. Matt Harvey, NYN
3. Jose Fernandez, MIA
4. Cliff Lee, PHI
5. Adam Wainwright, STL

Monday, November 04, 2013

IBA Ballot: Rookie of the Year

In the spirit of full disclosure of the type that no one would even care about, I did not cast a ballot in the IBAs this year; I was busy with some other stuff and forgot about the deadline. But this is how I would have voted if I did. I’m sure the voting went just fine even without my input.

It was not a particularly strong year for American League rookies. JB Shuck led AL rookies with 464 plate appearances, so there weren’t any full-time, full-season position players in the crop. Incidentally, I would love to be able to justify a vote for Shuck given his alma mater, but it wouldn’t be intellectually honest. He demonstrated the ability to be a fifth outfielder but little else, hitting .299 but with a secondary average of just .138 thanks to an .075 isolated power which ranked last among AL corner outfielders (only Ichiro at .080, Melky Cabrera at .081, and the cratering Nick Markakis at .084 failed to crack .100).

The position player who made the strongest case was Wil Myers, whose callup was delayed, holding him to 88 games and 368 PA. In that time, though, he easily led all AL rookies with 25 RAR and was one of the most productive hitters in the league, ranking twelfth with 6.2 RG. The best of the rest among the position players were middle infielders with Seattle’s pair of Brad Miller and Nick Franklin and Boston/Detroit’s Jose Iglesias. However, Iglesias’ fielding metrics did not match his defensive reputation; his BABIP-driven 12 RAR was very close to that of Miller (14) and Franklin (11). Both Miller and Franklin displayed impressive power for middle infielders (.154 and .157 ISO respectively); Franklin had 80 more PA but a BA forty points lower. Defensive metrics did not like Miller (-5, -2, -3 in FRAA, UZR, DRS) but had a mixed take on Franklin (15, -6, 3).

Myers’ best competition for the award came from his teammate Chris Archer, who led AL rookie starters with 27 RAR. Archer’s peripherals were good as well (3.79 eRA), but his .258 BABIP results in just enough of a ding to edge Myers ahead for me. Dan Straily (21 RAR) and Martin Perez (20 RAR) similarly performed less well in DIPS metrics than in actual runs allowed/peripherals. Another Ray, reliever Alex Torres, was good enough to slip into the final spot on my ballot; a 2.02 RRA over 58 innings made him as valuable as Mariano Rivera, leverage and cheap rhetorical tricks aside (18 RAR).

1. RF Wil Myers, TB
2. SP Chris Archer, TB
3. SP Dan Straily, OAK
4. SP Martin Perez, TEX
5. RP Alex Torres, TB

If Wil Myers had played in the NL, he would be on the bubble for a spot on the bottom of the ballot. The top of the ballot belongs to Jose Fernandez, a legitimate candidate for the non-Kershaw division of the Cy Young discussion, who was simply superb with a 2.33 RRA over 173 innings. Say what you will about the way the Marlins organization is managed and the financial consequences of the decision, they were absolutely right that Fernandez was ready for the majors.

Behind him, the next two spots belong to the Dodgers’ key rookies Hyun-jin Ryu and Yasiel Puig. I don’t dock either of them for international experience. Puig only had 418 PA, but when you hit .328/.388/.548 that doesn’t really matter; a full season of that production would have made Puig v. Fernandez a very interesting case. I have Ryu just ahead of Puig in RAR (43 to 41), but Ryu’s dRA wasn’t quite as good as his actual runs allowed, which is enough to scoot Puig ahead. Puig fared decently in defensive and baserunning metrics despite the well-publicized questionable decisions, leaving offense-only RAR as a decent gauge of his value.

Three other starters were in the mix, with Julio Teheran, Shelby Miller, and Gerrit Cole all topping 25 RAR, and there are also two more +20 RAR batters in Jedd Gyorko and Matt Adams. Throw in Nolan Arenado, who didn’t hit much (3.6 RG for 8 RAR) but won a Gold Glove and fared great on fielding metrics, and the NL crop puts the AL to shame. Even giving Arenado full credit for his fielding metrics (17 FRAA, 21 UZR, 30 DRS) is only enough to put him just ahead of Gyorko, so I’ll side with the hitting (Gyorko created 4.7 runs/game to Arenado’s 3.6):

1. SP Jose Fernandez, MIA
2. RF Yasiel Puig, LA
3. SP Hyun-jin Ryu, LA
4. SP Julio Teheran, ATL
5. 2B Jedd Gyorko, SD

Sunday, November 03, 2013

End of Season Statistics 2013

The spreadsheets are published as Google Spreadsheets, which you can download in Excel format by changing the extension in the address from "=html" to "=xls". That way you can download them and manipulate things however you see fit. The player spreadsheets are not ready yet, but I want to get the team stuff posted.

The data comes from a number of different sources. Most of the basic data comes from Doug's Stats, which is a very handy site, or Baseball-Reference. KJOK's park database provided some of the data used in the park factors, but for recent seasons park data comes from B-R. Data on pitcher's batted ball types allowed, doubles/triples allowed, and inherited/bequeathed runners comes from Baseball Prospectus.

The basic philosophy behind these stats is to use the simplest methods that have acceptable accuracy. Of course, "acceptable" is in the eye of the beholder, namely me. I use Pythagenpat not because other run/win converters, like a constant RPW or a fixed exponent, are not accurate enough for this purpose, but because it's mine and it would be kind of odd if I didn't use it.

If I seem to be a stickler for purity in my critiques of others' methods, I'd contend it is usually in a theoretical sense, not an input sense. So when I exclude hit batters, I'm not saying that hit batters are worthless or that they *should* be ignored; it's just easier not to mess with them and not that much less accurate.

I also don't really have a problem with people using sub-standard methods (say, Basic RC) as long as they acknowledge that they are sub-standard. If someone pretends that Basic RC doesn't undervalue walks or cause problems when applied to extreme individuals, I'll call them on it; if they explain its shortcomings but use it regardless, I accept that. Take these last three paragraphs as my acknowledgment that some of the statistics displayed here have shortcomings as well.

The League spreadsheet is pretty straightforward--it includes league totals and averages for a number of categories, most or all of which are explained at appropriate junctures throughout this piece. The advent of interleague play has created two different sets of league totals--one for the offense of league teams and one for the defense of league teams. Before interleague play, these two were identical. I do not present both sets of totals (you can figure the defensive ones yourself from the team spreadsheet, if you desire), just those for the offenses. The exception is for the defense-specific statistics, like innings pitched and quality starts. The figures for those categories in the league report are for the defenses of the league's teams. However, I do include each league's breakdown of basic pitching stats between starters and relievers (denoted by "s" or "r" prefixes), and so summing those will yield the totals from the pitching side. The one abbreviation you might not recognize is "N"--this is the league average of runs/game for one team, and it will pop up again.

The Team spreadsheet focuses on overall team performance--wins, losses, runs scored, runs allowed. The columns included are: Park Factor (PF), Home Run Park Factor (PFhr), Winning Percentage (W%), Expected W% (EW%), Predicted W% (PW%), wins, losses, runs, runs allowed, Runs Created (RC), Runs Created Allowed (RCA), Home Winning Percentage (HW%), Road Winning Percentage (RW%) [exactly what they sound like--W% at home and on the road], Runs/Game (R/G), Runs Allowed/Game (RA/G), Runs Created/Game (RCG), Runs Created Allowed/Game (RCAG), and Runs Per Game (the average number of runs scored and allowed per game). Ideally, I would use outs as the denominator, but for teams, outs and games are so closely related that I don’t think it’s worth the extra effort.

The runs and Runs Created figures are unadjusted, but the per-game averages are park-adjusted, except for RPG which is also raw. Runs Created and Runs Created Allowed are both based on a simple Base Runs formula. The formula is:

A = H + W - HR - CS
B = (2TB - H - 4HR + .05W + 1.5SB)*.76
C = AB - H
D = HR
Naturally, A*B/(B + C) + D.
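As a minimal sketch, the team Base Runs formula above could be written in Python like this. The function and argument names are illustrative only, and the sample team line is invented for the example.

```python
def team_base_runs(ab, h, tb, hr, w, sb, cs):
    """Simple team Base Runs, using the A, B, C, D terms listed above."""
    a = h + w - hr - cs                                      # baserunners
    b = (2 * tb - h - 4 * hr + 0.05 * w + 1.5 * sb) * 0.76   # advancement
    c = ab - h                                               # outs
    d = hr                                                   # automatic runs
    return a * b / (b + c) + d


# Hypothetical team season line, purely for illustration.
sample_rc = team_base_runs(ab=5500, h=1400, tb=2200, hr=160, w=500, sb=100, cs=40)
print(round(sample_rc, 1))
```

The same function gives Runs Created Allowed when it is fed the totals a team's pitchers allowed.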

I have explained the methodology used to figure the PFs before, but the cliff’s notes version is that they are based on five years of data when applicable, include both runs scored and allowed, and they are regressed towards average (PF = 1), with the amount of regression varying based on the number of years of data used. There are factors for both runs and home runs. The initial PF (not shown) is:

iPF = (H*T/(R*(T - 1) + H) + 1)/2
where H = RPG in home games, R = RPG in road games, T = # teams in league (14 for AL and 16 for NL). Then the iPF is converted to the PF by taking x*iPF + (1-x), where x = .6 if one year of data is used, .7 for 2, .8 for 3, and .9 for 4+.

It is important to note, since there always seems to be confusion about this, that these park factors already incorporate the fact that the average player plays 50% on the road and 50% at home. That is what the adding one and dividing by 2 in the iPF is all about. So if I list Fenway Park with a 1.02 PF, that means that it actually increases RPG by 4%.
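The park factor steps can be sketched as follows. This is only an illustration of the iPF and regression formulas as written above; the function name and the sample RPG figures are my own assumptions.

```python
def park_factor(home_rpg, road_rpg, teams, years):
    """Regressed park factor from home/road RPG, per the iPF formula above.

    home_rpg, road_rpg: runs per game (both teams) in home and road games
    teams: number of teams in the league
    years: seasons of data behind the RPG figures
    """
    if years < 1:
        raise ValueError("need at least one year of data")
    # Regression weight: .6 for one year, .7 for two, .8 for three, .9 for 4+.
    x = {1: 0.6, 2: 0.7, 3: 0.8}.get(years, 0.9)
    ipf = (home_rpg * teams / (road_rpg * (teams - 1) + home_rpg) + 1) / 2
    return x * ipf + (1 - x)


# Hypothetical park: 10 RPG at home, 9 RPG on the road, 15-team league, 5 years.
print(round(park_factor(10.0, 9.0, 15, 5), 3))
```

A park with identical home and road RPG comes out at exactly 1, which is a handy sanity check.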

In the calculation of the PFs, I did not get picky and take out “home” games that were actually at neutral sites.

There are also Team Offense and Defense spreadsheets. These include the following categories:

Team offense: Plate Appearances, Batting Average (BA), On Base Average (OBA), Slugging Average (SLG), Secondary Average (SEC), Walks per At Bat (WAB), Isolated Power (ISO), R/G at home (hR/G), and R/G on the road (rR/G). BA, OBA, SLG, WAB, and ISO are park-adjusted by dividing by the square root of park factor (or the equivalent; WAB = (OBA - BA)/(1 - OBA) and ISO = SLG - BA).

Team defense: Innings Pitched, BA, OBA, SLG, Innings per Start (IP/S), Starter's eRA (seRA), Reliever's eRA (reRA), Quality Start Percentage (QS%), RA/G at home (hRA/G), RA/G on the road (rRA/G), Battery Mishap Rate (BMR), Modified Fielding Average (mFA), and Defensive Efficiency Record (DER). BA, OBA, and SLG are park-adjusted by dividing by the square root of PF; seRA and reRA are divided by PF.

The three fielding metrics I've included are limited to metrics that a) I can calculate myself and b) are based on the basic available data, not specialized PBP data. The three metrics are explained in this post, but here are quick descriptions of each:

1) BMR--wild pitches and passed balls per 100 baserunners = (WP + PB)/(H + W - HR)*100

2) mFA--fielding average removing strikeouts and assists = (PO - K)/(PO - K + E)

3) DER--the Bill James classic, using only the PA-based estimate of plays made. Based on a suggestion by Terpsfan101, I've tweaked the error coefficient. Plays Made = PA - K - H - W - HR - HB - .64E and DER = PM/(PM + H - HR + .64E)
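The three fielding metrics above can be sketched in a few lines of Python. The function names and the sample team totals are hypothetical; the formulas are copied as written.

```python
def battery_mishap_rate(wp, pb, h, w, hr):
    """Wild pitches and passed balls per 100 baserunners."""
    return (wp + pb) / (h + w - hr) * 100


def modified_fielding_average(po, k, e):
    """Fielding average with strikeouts removed from putouts (no assists)."""
    return (po - k) / (po - k + e)


def defensive_efficiency(pa, k, h, w, hr, hb, e):
    """DER from the PA-based estimate of plays made, with a .64 error weight."""
    plays_made = pa - k - h - w - hr - hb - 0.64 * e
    return plays_made / (plays_made + h - hr + 0.64 * e)


# Invented team totals, for illustration only.
print(round(battery_mishap_rate(wp=50, pb=10, h=1400, w=500, hr=160), 2))
print(round(modified_fielding_average(po=4350, k=1200, e=90), 4))
print(round(defensive_efficiency(pa=6100, k=1200, h=1400, w=500, hr=160, hb=50, e=90), 3))
```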

Next are the individual player reports. I defined a starting pitcher as one with 15 or more starts. All other pitchers are eligible to be included as a reliever. If a pitcher has 40 appearances, then they are included. Additionally, if a pitcher has 50 innings and less than 50% of his appearances are starts, he is also included as a reliever (this allows some swingmen type pitchers who wouldn’t meet either the minimum start or appearance standards to get in).

For all of the player reports, ages are based on simply subtracting their year of birth from 2013. I realize that this is not compatible with how ages are usually listed and so “Age 27” doesn’t necessarily correspond to age 27 as I list it, but it makes everything a heckuva lot easier, and I am more interested in comparing the ages of the players to their contemporaries, for which case it makes very little difference. The "R" category records rookie status with an "R" for rookies and a blank for everyone else; I've trusted Baseball Prospectus on this. Also, all players are counted as being on the team with whom they played/pitched (IP or PA as appropriate) the most.

For relievers, the categories listed are: Games, Innings Pitched, estimated Plate Appearances (PA), Run Average (RA), Relief Run Average (RRA), Earned Run Average (ERA), Estimated Run Average (eRA), DIPS Run Average (dRA), Strikeouts per Game (KG), Walks per Game (WG), Guess-Future (G-F), Inherited Runners per Game (IR/G), Batting Average on Balls in Play (%H), Runs Above Average (RAA), and Runs Above Replacement (RAR).

IR/G is per relief appearance (G - GS); it is an interesting thing to look at, I think, in lieu of actual leverage data. You can see which closers come in with runners on base, and which are used nearly exclusively to start innings. Of course, you can’t infer too much; there are bad relievers who come in with a lot of people on base, not because they are being used in high leverage situations, but because they are long men being used in low-leverage situations already out of hand.

For starting pitchers, the columns are: Wins, Losses, Innings Pitched, Estimated Plate Appearances (PA), RA, RRA, ERA, eRA, dRA, KG, WG, G-F, %H, Pitches/Start (P/S), Quality Start Percentage (QS%), RAA, and RAR. RA and ERA you know--R*9/IP or ER*9/IP, park-adjusted by dividing by PF. The formulas for eRA and dRA are based on the same Base Runs equation and they estimate RA, not ERA.

* eRA is based on the actual results allowed by the pitcher (hits, doubles, home runs, walks, strikeouts, etc.). It is park-adjusted by dividing by PF.

* dRA is the classic DIPS-style RA, assuming that the pitcher allows a league average %H, and that his hits in play have a league-average S/D/T split. It is park-adjusted by dividing by PF.

The formula for eRA is:

A = H + W - HR
B = (2*TB - H - 4*HR + .05*W)*.78
C = AB - H = K + (3*IP - K)*x (where x is figured as described below for PA estimation and is typically around .93) = PA (from below) - H - W
eRA = (A*B/(B + C) + HR)*9/IP
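A minimal sketch of the eRA calculation, assuming the league constant x is supplied by the caller. The default of .93 is just the "typically around" value mentioned above, and the pitcher line is invented.

```python
def estimated_run_average(h, w, hr, tb, k, ip, x=0.93, pf=1.0):
    """eRA from a pitcher's actual results allowed, via Base Runs.

    x  : league average of (AB - H - K)/(3*IP - K); 0.93 is a placeholder
    pf : park factor; the result is divided by it
    """
    a = h + w - hr
    b = (2 * tb - h - 4 * hr + 0.05 * w) * 0.78
    c = k + (3 * ip - k) * x          # estimated AB - H
    return (a * b / (b + c) + hr) * 9 / ip / pf


# Hypothetical starter: 200 IP, 180 H, 50 W, 20 HR, 280 TB, 180 K.
print(round(estimated_run_average(h=180, w=50, hr=20, tb=280, k=180, ip=200), 2))
```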

To figure dRA, you first need the estimate of PA described below. Then you calculate W, K, and HR per PA (call these %W, %K, and %HR). Percentage of balls in play (BIP%) = 1 - %W - %K - %HR. This is used to calculate the DIPS-friendly estimate of %H (H per PA) as e%H = Lg%H*BIP%.

Now everything has a common denominator of PA, so we can plug into Base Runs:

A = e%H + %W
B = (2*(z*e%H + 4*%HR) - e%H - 5*%HR + .05*%W)*.78
C = 1 - e%H - %W - %HR
dRA = (A*B/(B + C) + %HR)/C*a

z is the league average of total bases per non-HR hit (TB - 4*HR)/(H - HR), and a is the league average of (AB - H) per game.
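The DIPS-style calculation above can be sketched as one function. The league inputs (league %H, z, and a) are passed in as arguments; the values in the example call are made up for illustration and are not actual 2013 league figures.

```python
def dips_run_average(pa, w, k, hr, lg_pct_h, z, a, pf=1.0):
    """DIPS-style RA: league-average hit rate on balls in play.

    pa       : estimated plate appearances (AB + W)
    lg_pct_h : league average %H (hits per ball in play)
    z        : league average total bases per non-HR hit
    a        : league average (AB - H) per game
    """
    pct_w, pct_k, pct_hr = w / pa, k / pa, hr / pa
    bip = 1 - pct_w - pct_k - pct_hr          # balls in play per PA
    e_h = lg_pct_h * bip                      # expected non-HR hits per PA
    A = e_h + pct_w
    B = (2 * (z * e_h + 4 * pct_hr) - e_h - 5 * pct_hr + 0.05 * pct_w) * 0.78
    C = 1 - e_h - pct_w - pct_hr
    return (A * B / (B + C) + pct_hr) / C * a / pf


# Hypothetical pitcher and league constants.
print(round(dips_run_average(pa=800, w=50, k=180, hr=20,
                             lg_pct_h=0.30, z=1.28, a=25.2), 2))
```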

In the past couple years I’ve presented a couple of batted ball RA estimates. I’ve removed these this year, not just because batted ball data exhibits questionable reliability but because these metrics were complicated to figure, required me to collate the batted ball data, and were not personally useful to me. I figure these stats for my own enjoyment and have in some form or another going back to 1997. I share them here only because I would do it anyway, so if I’m not interested in certain categories, there’s no reason to keep presenting them.

Instead, I’m showing strikeout and walk rate, both expressed as per game. By game I mean not 9 innings but rather the league average of PA/G. I have always been a proponent of using PA and not IP as the denominator for non-run pitching rates, and now the use of per PA rates is widespread. Usually these are expressed as K/PA and W/PA, or equivalently, percentage of PA with a strikeout or walk. I don’t believe that any site publishes these as K and W per equivalent game as I am here. This is not better than K%--it’s simply applying a scalar multiplier. I like it because it generally follows the same scale as the familiar K/9.

To facilitate this, I’ve finally corrected a flaw in the formula I use to estimate plate appearances for pitchers. Previously, I’ve done it the lazy way by not splitting strikeouts out from other outs. I am now using this formula to estimate PA (where PA = AB + W):

PA = K + (3*IP - K)*x + H + W
Where x = league average of (AB - H - K)/(3*IP - K)

Then KG = K/PA*Lg(PA/G) and WG = W/PA*Lg(PA/G).
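Here is a small sketch of the PA estimate and the per-game rates. It assumes the rate is events per estimated PA scaled by the league PA/G, which is how I read the description above; x and the league PA/G in the example are placeholder values.

```python
def estimate_pa(k, ip, h, w, x):
    """Estimated PA (AB + W), splitting strikeouts from other outs."""
    return k + (3 * ip - k) * x + h + w


def per_game_rate(events, pa, lg_pa_per_game):
    """Events per PA, scaled to a league-average game's worth of PA."""
    return events / pa * lg_pa_per_game


# Hypothetical pitcher; x = .93 and 38 PA/G are illustrative league values.
pa = estimate_pa(k=180, ip=200, h=180, w=50, x=0.93)
kg = per_game_rate(180, pa, 38.0)
wg = per_game_rate(50, pa, 38.0)
print(round(pa, 1), round(kg, 2), round(wg, 2))
```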

G-F is a junk stat, included here out of habit because I've been including it for years. It was intended to give a quick read of a pitcher's expected performance in the next season, based on eRA and strikeout rate. Although the numbers vaguely resemble RAs, it's actually unitless. As a rule of thumb, anything under four is pretty good for a starter. G-F = 4.46 + .095(eRA) - .113(K*9/IP). It is a junk stat. JUNK STAT JUNK STAT JUNK STAT. Got it?

%H is BABIP, more or less--%H = (H - HR)/(PA - HR - K - W), where PA was estimated above. Pitches/Start includes all appearances, so I've counted relief appearances as one-half of a start (P/S = Pitches/(.5*G + .5*GS)). QS% is just QS/GS; I don't think it's particularly useful, but Doug's Stats include QS so I include it.
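As a quick sketch, %H and Pitches/Start could be computed like this. The names and sample numbers are hypothetical, and PA is assumed to be the estimated figure described earlier.

```python
def pct_h(h, hr, pa, k, w):
    """Hits per ball in play (roughly BABIP), using estimated PA."""
    return (h - hr) / (pa - hr - k - w)


def pitches_per_start(pitches, g, gs):
    """Pitches per start, counting each relief appearance as half a start."""
    return pitches / (0.5 * g + 0.5 * gs)


# Invented pitcher line for illustration.
print(round(pct_h(h=180, hr=20, pa=800.6, k=180, w=50), 3))
print(pitches_per_start(pitches=3200, g=34, gs=30))
```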

I've used a stat called Relief Run Average (RRA) in the past, based on Sky Andrecheck's article in the August 1999 By the Numbers; that one only used inherited runners, but I've revised it to include bequeathed runners as well, making it equally applicable to starters and relievers. I am using RRA as the building block for baselined value estimates for all pitchers this year. I explained RRA in this article, but the bottom line formulas are:

BRSV = BRS - BR*i*sqrt(PF)
IRSV = IR*i*sqrt(PF) - IRS
RRA = ((R - (BRSV + IRSV))*9/IP)/PF

The two baselined stats are Runs Above Average (RAA) and Runs Above Replacement (RAR). RAA uses the league average runs/game (N) for both starters and relievers, while RAR uses separate replacement levels for starters and relievers. Thus, RAA and RAR will be pretty close for relievers:

RAA = (N - RRA)*IP/9
RAR (relievers) = (1.11*N - RRA)*IP/9
RAR (starters) = (1.28*N - RRA)*IP/9
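A sketch of RRA and the two pitcher baselines follows. The text above does not define i; I am assuming it is a league-average rate at which inherited or bequeathed runners score, so treat that parameter (and all the sample numbers) as assumptions.

```python
from math import sqrt


def relief_run_average(r, ip, br, brs, ir, irs, i, pf=1.0):
    """RRA: run average adjusted for bequeathed and inherited runners.

    br/brs : bequeathed runners / bequeathed runners who scored
    ir/irs : inherited runners / inherited runners who scored
    i      : assumed league rate at which such runners score (not defined above)
    """
    brsv = brs - br * i * sqrt(pf)    # bequeathed runs charged beyond expectation
    irsv = ir * i * sqrt(pf) - irs    # inherited runners stranded beyond expectation
    return (r - (brsv + irsv)) * 9 / ip / pf


def pitcher_raa(rra, ip, n):
    """Runs above average, versus league runs/game N."""
    return (n - rra) * ip / 9


def pitcher_rar(rra, ip, n, starter):
    """Runs above replacement: 128% of N for starters, 111% for relievers."""
    level = 1.28 if starter else 1.11
    return (level * n - rra) * ip / 9


# Hypothetical reliever and starter in a 4.3 N league.
print(round(relief_run_average(r=30, ip=70, br=20, brs=8, ir=30, irs=6, i=0.3), 2))
print(round(pitcher_rar(rra=3.5, ip=200, n=4.3, starter=True), 1))
```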

All players with 300 or more plate appearances are included in the Hitters spreadsheets (along with some players close to the cutoff point who I was interested in). Each is assigned one position, the one at which they appeared in the most games. The statistics presented are: Games played (G), Plate Appearances (PA), Outs (O), Batting Average (BA), On Base Average (OBA), Slugging Average (SLG), Secondary Average (SEC), Runs Created (RC), Runs Created per Game (RG), Speed Score (SS), Hitting Runs Above Average (HRAA), Runs Above Average (RAA), Hitting Runs Above Replacement (HRAR), and Runs Above Replacement (RAR).

I do not bother to include hit batters, so take note of that for players who do get plunked a lot. Therefore, PA are simply AB + W. Outs are AB - H + CS. BA and SLG you know, but remember that without HB and SF, OBA is just (H + W)/(AB + W). Secondary Average = (TB - H + W)/AB = SLG - BA + (OBA - BA)/(1 - OBA). I have not included net steals as many people (and Bill James himself) do--it is solely hitting events.
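The simplified rate stats above can be sketched as follows, including a check that the two expressions for Secondary Average agree. Names and the sample batting line are illustrative; no park adjustment is applied here.

```python
def hitter_rates(ab, h, tb, w, cs=0):
    """Unadjusted rates with PA = AB + W (no HB or SF)."""
    pa = ab + w
    outs = ab - h + cs
    ba = h / ab
    oba = (h + w) / pa
    slg = tb / ab
    sec = (tb - h + w) / ab           # hitting events only, no net steals
    return {"PA": pa, "O": outs, "BA": ba, "OBA": oba, "SLG": slg, "SEC": sec}


# Hypothetical hitter.
rates = hitter_rates(ab=560, h=170, tb=300, w=70, cs=4)

# The BA/OBA/SLG form of SEC matches the counting-stat form.
sec_from_rates = (rates["SLG"] - rates["BA"]
                  + (rates["OBA"] - rates["BA"]) / (1 - rates["OBA"]))
print(round(rates["SEC"], 3), round(sec_from_rates, 3))
```

The identity holds because (OBA - BA)/(1 - OBA) reduces to W/AB when OBA is (H + W)/(AB + W).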

BA, OBA, and SLG are park-adjusted by dividing by the square root of PF. This is an approximation, of course, but I'm satisfied that it works well. The goal here is to adjust for the win value of offensive events, not to quantify the exact park effect on the given rate. I use the BA/OBA/SLG-based formula to figure SEC, so it is park-adjusted as well.

Runs Created is actually Paul Johnson's ERP, more or less. Ideally, I would use a custom linear weights formula for the given league, but ERP is just so darn simple and close to the mark that it’s hard to pass up. I still use the term “RC” partially as a homage to Bill James (seriously, I really like and respect him even if I’ve said negative things about RC and Win Shares), and also because it is just a good term. I like the thought put in your head when you hear “creating” a run better than “producing”, “manufacturing”, “generating”, etc. to say nothing of names like “equivalent” or “extrapolated” runs. None of that is said to put down the creators of those methods--there just aren’t a lot of good, unique names available. Anyway, RC = (TB + .8H + W + .7SB - CS - .3AB)*.322.

RC is park adjusted by dividing by PF, making all of the value stats that follow park adjusted as well. RG, the rate, is RC/O*25.5. I do not believe that outs are the proper denominator for an individual rate stat, but I also do not believe that the distortions caused are that bad. (I still intend to finish my rate stat series and discuss all of the options in excruciating detail, but alas you’ll have to take my word for it now).
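A minimal sketch of the RC and RG calculations, with invented inputs; the function names are mine.

```python
def runs_created(tb, h, w, sb, cs, ab, pf=1.0):
    """ERP-style Runs Created as given above, park-adjusted by dividing by PF."""
    return (tb + 0.8 * h + w + 0.7 * sb - cs - 0.3 * ab) * 0.322 / pf


def runs_per_game(rc, outs):
    """RG: Runs Created per 25.5 outs."""
    return rc / outs * 25.5


# Hypothetical hitter: 560 AB, 170 H, 300 TB, 70 W, 10 SB, 4 CS.
rc = runs_created(tb=300, h=170, w=70, sb=10, cs=4, ab=560)
rg = runs_per_game(rc, outs=560 - 170 + 4)
print(round(rc, 1), round(rg, 2))
```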

I have decided to switch to a watered-down version of Bill James' Speed Score this year; I only use four of his categories. Previously I used my own knockoff version called Speed Unit, but trying to keep it from breaking down every few years was a wasted effort.

Speed Score is the average of four components, which I'll call a, b, c, and d:

a = ((SB + 3)/(SB + CS + 7) - .4)*20
b = sqrt((SB + CS)/(S + W))*14.3
c = ((R - HR)/(H + W - HR) - .1)*25
d = T/(AB - HR - K)*450

James actually uses a sliding scale for the triples component, but it strikes me as needlessly complex and so I've streamlined it. I also changed some of his division to mathematically equivalent multiplications.
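The four-component Speed Score can be sketched like this. It follows the formulas exactly as listed, with S taken to be singles; it does not apply any floor or ceiling to the components, since none is described above. The sample line is hypothetical.

```python
from math import sqrt


def speed_score(sb, cs, singles, w, r, hr, h, t, ab, k):
    """Average of the four components a, b, c, d defined above."""
    a = ((sb + 3) / (sb + cs + 7) - 0.4) * 20      # stolen base percentage
    b = sqrt((sb + cs) / (singles + w)) * 14.3     # stolen base attempt frequency
    c = ((r - hr) / (h + w - hr) - 0.1) * 25       # runs scored per time on base
    d = t / (ab - hr - k) * 450                    # triples per ball in play
    return (a + b + c + d) / 4


# Invented player line for illustration.
print(round(speed_score(sb=20, cs=5, singles=100, w=60, r=90,
                        hr=20, h=160, t=5, ab=560, k=100), 2))
```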

There are a whopping four categories that compare to a baseline; two for average, two for replacement. Hitting RAA compares to a league average hitter; it is in the vein of Pete Palmer’s Batting Runs. RAA compares to an average hitter at the player’s primary position. Hitting RAR compares to a “replacement level” hitter; RAR compares to a replacement level hitter at the player’s primary position. The formulas are:

HRAA = (RG - N)*O/25.5
RAA = (RG - N*PADJ)*O/25.5
HRAR = (RG - .73*N)*O/25.5
RAR = (RG - .73*N*PADJ)*O/25.5

PADJ is the position adjustment, and it is based on 2002-2011 offensive data. For catchers it is .89; for 1B/DH, 1.17; for 2B, .97; for 3B, 1.03; for SS, .93; for LF/RF, 1.13; and for CF, 1.02. I had been using the 1992-2001 data as a basis for the last ten years, but finally have done an update. I’m a little hesitant about this update, as the middle infield positions are the biggest movers (higher positional adjustments, meaning less positional credit). I have no qualms for second base, but the shortstop PADJ is out of line with the other position adjustments widely in use and feels a bit high to me. But there are some decent points to be made in favor of offensive adjustments, and I’ll have a bit more on this topic in general below.
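The four baselined figures can be sketched together, using the position adjustments listed above. The dictionary keys, function name, and the sample shortstop are my own illustration.

```python
# Position adjustments as listed above (2002-2011 offensive data).
PADJ = {"C": 0.89, "1B": 1.17, "DH": 1.17, "2B": 0.97, "3B": 1.03,
        "SS": 0.93, "LF": 1.13, "RF": 1.13, "CF": 1.02}


def baselined_runs(rg, outs, n, position):
    """Return HRAA, RAA, HRAR, RAR for a hitter with the given RG and outs."""
    padj = PADJ[position]
    games = outs / 25.5
    return {
        "HRAA": (rg - n) * games,
        "RAA": (rg - n * padj) * games,
        "HRAR": (rg - 0.73 * n) * games,
        "RAR": (rg - 0.73 * n * padj) * games,
    }


# Hypothetical shortstop: 6.0 RG over 400 outs in a 4.3 N league.
result = baselined_runs(rg=6.0, outs=400, n=4.3, position="SS")
print({name: round(value, 1) for name, value in result.items()})
```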

That was the mechanics of the calculations; now I'll twist myself into knots trying to justify them. If you only care about the how and not the why, stop reading now.

The first thing that should be covered is the philosophical position behind the statistics posted here. They fall on the continuum of ability and value in what I have called "performance". Performance is a technical-sounding way of saying "Whatever arbitrary combination of ability and value I prefer".

With respect to park adjustments, I am not interested in how any particular player is affected, so there is no separate adjustment for lefties and righties for instance. The park factor is an attempt to determine how the park affects run scoring rates, and thus the win value of runs.

I apply the park factor directly to the player's statistics, but it could also be applied to the league context. The advantage to doing it my way is that it allows you to compare the component statistics (like Runs Created or OBA) on a park-adjusted basis. The drawback is that it creates a new theoretical universe, one in which all parks are equal, rather than leaving the player grounded in the actual context in which he played and evaluating how that context (and not the player's statistics) was altered by the park.

The good news is that the two approaches are essentially equivalent; in fact, they are equivalent if you assume that the Runs Per Win factor is equal to the RPG. Suppose that we have a player in an extreme park (PF = 1.15, approximately like Coors Field pre-humidor) who has an 8 RG before adjusting for park, while making 350 outs in a 4.5 N league. The first method of park adjustment, the one I use, converts his value into a neutral park, so his RG is now 8/1.15 = 6.957. We can now compare him directly to the league average:

RAA = (6.957 - 4.5)*350/25.5 = +33.72

The second method would be to adjust the league context. If N = 4.5, then the average player in this park will create 4.5*1.15 = 5.175 runs. Now, to figure RAA, we can use the unadjusted RG of 8:

RAA = (8 - 5.175)*350/25.5 = +38.77

These are not the same, as you can obviously see. The reason for this is that they take place in two different contexts. The first figure is in a 9 RPG (2*4.5) context; the second figure is in a 10.35 RPG (2*4.5*1.15) context. Runs have different values in different contexts; that is why we have RPW converters in the first place. If we convert to WAA (using RPW = RPG, which is only an approximation, so it's usually not as tidy as it appears below), then we have:

WAA = 33.72/9 = +3.75
WAA = 38.77/10.35 = +3.75

Once you convert to wins, the two approaches are equivalent. The other nice thing about the first approach is that once you park-adjust, everyone in the league is in the same context, and you can dispense with the need for converting to wins at all. You still might want to convert to wins, and you'll need to do so if you are comparing the 2013 players to players from other league-seasons (including between the AL and NL in the same year), but if you are only looking to compare Jose Bautista to Miguel Cabrera, it's not necessary. WAR is somewhat ubiquitous now, but personally I prefer runs when possible--why mess with decimal points if you don't have to?
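The Coors-style example can be checked with a few lines of Python. This is just the arithmetic from the example restated, under the same RPW = RPG simplification; variable names are illustrative.

```python
def raa(rg, n, outs):
    """Runs above average for a hitter, per 25.5 outs."""
    return (rg - n) * outs / 25.5


pf, rg_raw, n, outs = 1.15, 8.0, 4.5, 350

# Method 1: adjust the player into a neutral park, compare to league N.
raa_neutral = raa(rg_raw / pf, n, outs)
# Method 2: leave the player alone, adjust the league context by PF.
raa_context = raa(rg_raw, n * pf, outs)

# Convert to wins using RPW = RPG in each context.
waa_neutral = raa_neutral / (2 * n)
waa_context = raa_context / (2 * n * pf)

print(round(raa_neutral, 2), round(raa_context, 2))
print(round(waa_neutral, 2), round(waa_context, 2))
```

The run figures differ by exactly a factor of PF, which is why dividing by the PF-scaled RPG brings the two win figures back together.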

The park factors used to adjust player stats here are run-based. Thus, they make no effort to project what a player "would have done" in a neutral park, or account for the different effects parks have on specific events (walks, home runs, BA) or types of players. They simply account for the difference in run environment that is caused by the park (as best I can measure it). As such, they don't evaluate a player within the actual run context of his team's games; they attempt to restate the player's performance as an equivalent performance in a neutral park.

I suppose I should also justify the use of sqrt(PF) for adjusting component statistics. The classic defense given for this approach relies on basic Runs Created--runs are proportional to OBA*SLG, and OBA*SLG/PF = OBA/sqrt(PF)*SLG/sqrt(PF). While RC may be an antiquated tool, you will find that the square root adjustment is fairly compatible with linear weights or Base Runs as well. I am not going to take the space to demonstrate this claim here, but I will some time in the future.

Many value figures published around the sabersphere adjust for the difference in quality level between the AL and NL. I don't, but this is a thorny area where there is no right or wrong answer as far as I'm concerned. I also do not make an adjustment in the league averages for the fact that the overall NL averages include pitcher batting and the AL does not (not quite true in the era of interleague play, but you get my drift).

The difference between the leagues may not be precisely calculable, and it certainly is not constant, but it is real. If the average player in the AL is better than the average player in the NL, it is perfectly reasonable to expect the average AL player to have more RAR than the average NL player, and that will not happen without some type of adjustment. On the other hand, if you are only interested in evaluating a player relative to his own league, such an adjustment is not necessarily welcome.

The league argument only applies cleanly to metrics baselined to average. Since replacement level compares the given player to a theoretical player that can be acquired on the cheap, the same pool of potential replacement players should by definition be available to the teams of each league. One could argue that if the two leagues don't have equal talent at the major league level, they might not have equal access to replacement level talent--except such an argument is at odds with the notion that replacement level represents talent that is truly "freely available".

So it's hard to justify the approach I take, which is to set replacement level relative to the average runs scored in each league, with no adjustment for the difference in the leagues. The best justification is that it's simple and it treats each league as its own universe, even if in reality they are connected.

The replacement levels I have used here are very much in line with the values used by other sabermetricians. This is based on my own "research", my interpretation of other people's research, and a desire to not stray from consensus and make the values unhelpful to the majority of people who may encounter them.

Replacement level is certainly not settled science. There is always going to be room to disagree on what the baseline should be. Even if you agree it should be "replacement level", any estimate of where it should be set is just that--an estimate. Average is clean and fairly straightforward, even if its utility is questionable; replacement level is inherently messy. So I offer the average baseline as well.

For position players, replacement level is set at 73% of the positional average RG (since there's a history of discussing replacement level in terms of winning percentages, this is roughly equivalent to .350). For starting pitchers, it is set at 128% of the league average RA (.380), and for relievers it is set at 111% (.450).
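One rough way to see how these percentages line up with the quoted winning percentages is a fixed-exponent Pythagorean conversion. The exponent of 2 is purely an assumption for this sketch and is not necessarily how the figures above were derived.

```python
def pythag_wpct(run_ratio, exponent=2.0):
    """W% for a team whose runs scored / runs allowed equals run_ratio."""
    powered = run_ratio ** exponent
    return powered / (1 + powered)


hitter = pythag_wpct(0.73)         # scores 73% of average, allows average
starter = pythag_wpct(1 / 1.28)    # scores average, allows 128% of average
reliever = pythag_wpct(1 / 1.11)   # scores average, allows 111% of average

print(round(hitter, 3), round(starter, 3), round(reliever, 3))
```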

I am still using an analytical structure that makes the comparison to replacement level for a position player by applying it to his hitting statistics. This is the approach taken by Keith Woolner in VORP (and some other earlier replacement level implementations), but the newer metrics (among them Rally and Fangraphs' WAR) handle replacement level by subtracting a set number of runs from the player's total runs above average in a number of different areas (batting, fielding, baserunning, positional value, etc.), which for lack of a better term I will call the subtraction approach.

The offensive positional adjustment makes the inherent assumption that the average player at each position is equally valuable. I think that this is close to being true, but it is not quite true. The ideal approach would be to use a defensive positional adjustment, since the real difference between a first baseman and a shortstop is their defensive value. When you bat, all runs count the same, whether you create them as a first baseman or as a shortstop.

That being said, using "replacement hitter at position" does not cause too many distortions. It is not theoretically correct, but it is practically powerful. For one thing, most players, even those at key defensive positions, are chosen first and foremost for their offense. Empirical work by Keith Woolner has shown that the replacement level hitting performance is about the same for every position, relative to the positional average.

Figuring what the defensive positional adjustment should be, though, is easier said than done. Therefore, I use the offensive positional adjustment. So if you want to criticize that choice, or criticize the numbers that result, be my guest. But do not claim that I am holding this up as the correct analytical structure. I am holding it up as the most simple and straightforward structure that conforms to reality reasonably well, and because while the numbers may be flawed, they are at least based on an objective formula that I can figure myself. If you feel comfortable with some other assumptions, please feel free to ignore mine.

That still does not justify the use of HRAR--hitting runs above replacement--which compares each hitter, regardless of position, to 73% of the league average. Basically, this is just a way to give an overall measure of offensive production without regard for position with a low baseline. It doesn't have any real baseball meaning.

A player who creates runs at 90% of the league average could be above-average (if he's a shortstop or catcher, or a great fielder at a less important fielding position), or sub-replacement level (DHs that create 4 runs a game are not valuable properties). Every player is chosen because his total value, both hitting and fielding, is sufficient to justify his inclusion on the team. HRAR fails even if you try to justify it with a thought experiment about a world in which defense doesn't matter, because in that case the absolute replacement level (in terms of RG, without accounting for the league average) would be much higher than it is currently.

The specific positional adjustments I use are based on 1992-2001 data. There's no particular reason for not updating them; at the time I started using them, they represented the ten most recent years. I have stuck with them because I have not seen compelling evidence of a change in the degree of difficulty or scarcity between the positions between now and then, and because I think they are fairly reasonable. The positions for which they diverge the most from the defensive position adjustments in common use are 2B, 3B, and CF. Second base is considered a premium position by the offensive PADJ (.94), while third base and center field are both neutral (1.01 and 1.02).

Another flaw is that the PADJ is applied to the overall league average RG, which is artificially low for the NL because of pitcher's batting. When using the actual league average runs/game, it's tough to just remove pitchers--any adjustment would be an estimate. If you use the league total of runs created instead, it is a much easier fix.

One other note on this topic is that since the offensive PADJ is a proxy for average defensive value by position, ideally it would be applied by tying it to defensive playing time. I have done it by outs, though.

The reason I have taken this flawed path is because 1) it ties the position adjustment directly into the RAR formula rather than leaving it as something to subtract on the outside and more importantly 2) there’s no straightforward way to do it. The best would be to use defensive innings--set the full-time player to X defensive innings, figure how Derek Jeter’s innings compared to X, and adjust his PADJ accordingly. Games in the field or games played are dicey because they can cause distortion for defensive replacements. Plate Appearances avoid the problem that outs have of being highly related to player quality, but they still carry the illogic of basing it on offensive playing time. And of course the differences here are going to be fairly small (a few runs). That is not to say that this way is preferable, but it’s not horrible either, at least as far as I can tell.

To compare this approach to the subtraction approach, start by assuming that a replacement level shortstop would create .86*.73*4.5 = 2.825 RG (or would perform at an overall level of equivalent value to being an average fielder at shortstop while creating 2.825 runs per game). Suppose that we are comparing two shortstops, each of whom compiled 600 PA and played an equal number of defensive games and innings (and thus would have the same positional adjustment using the subtraction approach). Alpha made 380 outs and Bravo made 410 outs, and each ranked as dead-on average in the field.

The difference in overall RAR between the two using the subtraction approach would be equal to the difference between their offensive RAA compared to the league average. Assuming the league average is 4.5 runs per game, and that both Alpha and Bravo created 75 runs, their offensive RAAs are:

Alpha = (75*25.5/380 - 4.5)*380/25.5 = +7.94

Similarly, Bravo is at +2.65, and so the difference between them will be 5.29 RAR.

Using the flawed approach, Alpha's RAR will be:

(75*25.5/380 - 4.5*.73*.86)*380/25.5 = +32.90

Bravo's RAR will be +29.58, a difference of 3.32 RAR, which is two runs off of the difference using the subtraction approach.
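As a rough sketch, the comparison above can be reproduced in a few lines of Python. The function and constant names are illustrative only (they are not from any published spreadsheet), and the 25.5 outs per game figure is simply the one that appears in the formulas above.

```python
# Illustrative reproduction of the Alpha/Bravo example; all names are hypothetical.
OUTS_PER_GAME = 25.5   # outs per game, as used in the formulas above
LEAGUE_RG = 4.5        # assumed league average runs per game
REPL_PCT = 0.73        # replacement level as a fraction of league average
PADJ_SS = 0.86         # offensive position adjustment for shortstop


def raa(runs_created, outs, league_rg=LEAGUE_RG):
    """Offensive runs above average, figured on an outs basis."""
    rg = runs_created * OUTS_PER_GAME / outs
    return (rg - league_rg) * outs / OUTS_PER_GAME


def rar_with_padj(runs_created, outs, padj=PADJ_SS, league_rg=LEAGUE_RG):
    """RAR with the position adjustment built into the baseline."""
    rg = runs_created * OUTS_PER_GAME / outs
    baseline = league_rg * REPL_PCT * padj
    return (rg - baseline) * outs / OUTS_PER_GAME


for name, outs in {"Alpha": 380, "Bravo": 410}.items():
    print(f"{name}: RAA = {raa(75, outs):+.2f}, "
          f"RAR = {rar_with_padj(75, outs):+.2f}")

# Under the subtraction approach both players get the same positional
# adjustment, so their RAR gap equals their RAA gap.
raa_gap = raa(75, 380) - raa(75, 410)
rar_gap = rar_with_padj(75, 380) - rar_with_padj(75, 410)
print(f"Subtraction gap = {raa_gap:.2f}, built-in PADJ gap = {rar_gap:.2f}")
```

The gap shrinks because the built-in approach charges each out against a baseline of about 2.83 RG rather than 4.5, so Bravo's 30 extra outs cost him less.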

The downside to using PA is that you really need to consider park effects if you do, whereas outs allow you to sidestep park effects. Outs are constant; plate appearances are linked to OBA. Thus, plate appearances depend not only on the offensive context (including park factor), but also on the quality of one's team. Of course, attempting to adjust for team PA differences opens a huge can of worms which is not really relevant; for now, the point is that using outs for individual players causes distortions, sometimes trivial and sometimes bothersome, but almost always makes one's life easier.

I do not include fielding (or baserunning outside of steals, although that is a trivial consideration in comparison) in the RAR figures--they cover offense and positional value only. This in no way means that I do not believe that fielding is an important consideration in player valuation. However, two of the key principles of these stat reports are 1) not incorporating any data that is not readily available and 2) not simply including other people's results (of course I borrow heavily from other people's methods, but only adapting methodology that I can apply myself).

Any fielding metric worth its salt will fail to meet either criterion--they use zone data or play-by-play data which I do not have easy access to. I do not have a fielding metric that I have stapled together myself, and so I would have to simply lift other analysts' figures.

Setting the practical reason for not including fielding aside, I do have some reservations about lumping fielding and hitting value together in one number because of the obvious differences in reliability between offensive and fielding metrics. In theory, they absolutely should be put together. But in practice, I believe it would be better to regress the fielding metric to a point at which it would be roughly equivalent in reliability to the offensive metric.

Offensive metrics have error bars associated with them, too, of course, and in evaluating a single season's value, I don't care about the vagaries that we often lump together as "luck". Still, there are errors in our assessment of linear weight values and players that collect an unusual proportion of infield hits or hits to the left side, errors in estimation of park factor, and any number of other factors that make their events more or less valuable than an average event of that type.

Fielding metrics offer up all of that and more, as we cannot be nearly as certain of true successes and failures as we are when analyzing offense. Recent investigations, particularly by Colin Wyers, have raised even more questions about the level of uncertainty. So, even if I was including a fielding value, my approach would be to assume that the offensive value was 100% reliable (which it isn't), and regress the fielding metric relative to that (so if the offensive metric was actually 70% reliable, and the fielding metric 40% reliable, I'd treat the fielding metric as .4/.7 = 57% reliable when tacking it on, to illustrate with a simplified and completely made up example presuming that one could have a precise estimate of nebulous "reliability").
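The relative-reliability idea can be sketched as follows, using the same made-up 70% and 40% figures. The function names are hypothetical, and shrinking toward zero (average for the position) is an assumption made for this sketch rather than a published procedure.

```python
# Minimal sketch of regressing a fielding estimate relative to the offensive
# metric's reliability. Names and the regress-to-zero target are assumptions.

def relative_reliability(fielding_reliability, offensive_reliability):
    """Fielding reliability expressed relative to offensive reliability."""
    if not 0 < offensive_reliability <= 1:
        raise ValueError("offensive_reliability must be in (0, 1]")
    return fielding_reliability / offensive_reliability


def regress_fielding(fielding_runs, fielding_reliability, offensive_reliability):
    """Shrink fielding runs toward zero (positional average) by the relative weight."""
    weight = relative_reliability(fielding_reliability, offensive_reliability)
    return fielding_runs * weight


weight = relative_reliability(0.4, 0.7)
print(f"Relative weight: {weight:.0%}")

# A hypothetical fielder rated +10 runs would be credited with this many
# runs when tacked on to an offensive figure treated as fully reliable.
print(f"Regressed fielding runs: {regress_fielding(10.0, 0.4, 0.7):.1f}")
```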

Given the inherent assumption of the offensive PADJ that all positions are equally valuable, once RAR has been figured for a player, fielding value can be accounted for by adding on his runs above average relative to a player at his own position. If there is a shortstop that is -2 runs defensively versus an average shortstop, he is without a doubt a plus defensive player, and a more valuable defensive player than a first baseman who was +1 run better than an average first baseman. Regardless, since it was implicitly assumed that they are both average defensively for their position when RAR was calculated, the shortstop will see his value docked two runs. This DOES NOT MEAN that the shortstop has been penalized for his defense. The whole process of accounting for positional differences, going from hitting RAR to positional RAR, has benefited him.

I've found that there is often confusion about the treatment of first basemen and designated hitters in my PADJ methodology, since I consider DHs as in the same pool as first basemen. The fact of the matter is that first basemen outhit DHs. There are any number of potential explanations for this; DHs are often old or injured, players hit worse when DHing than they do when playing the field, etc. This actually helps first basemen, since the DHs drag the average production of the pool down, thus resulting in a lower replacement level than I would get if I considered first basemen alone.

However, this method does assume that a 1B and a DH have equal defensive value. Obviously, a DH has no defensive value. What I advocate to correct this is to treat a DH as a bad defensive first baseman, and thus knock another five or ten runs off of his RAR for a full-time player. I do not incorporate this into the published numbers, but you should keep it in mind. However, there is no need to adjust the figures for first basemen upwards--the only necessary adjustment is to take the DHs down a notch.
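For readers who want to make these adjustments on their own, here is a minimal sketch of adding fielding to RAR after the fact. The function name, the 30 RAR inputs, and the 7.5-run DH penalty (the midpoint of "five or ten") are all assumptions chosen for illustration, not published figures.

```python
# Hypothetical helper: tack fielding onto positional RAR after the fact.
# The default DH penalty of 7.5 runs is an assumed midpoint of "five or ten".

def value_with_fielding(rar, fielding_runs_vs_position=0.0,
                        is_dh=False, dh_penalty=7.5):
    """Add runs saved relative to an average fielder at the player's own position.

    A full-time DH is treated as a bad defensive first baseman and is
    docked dh_penalty runs instead.
    """
    if is_dh:
        return rar - dh_penalty
    return rar + fielding_runs_vs_position


# Three hypothetical full-time players, each with 30 RAR before fielding.
shortstop = value_with_fielding(30.0, fielding_runs_vs_position=-2.0)
first_baseman = value_with_fielding(30.0, fielding_runs_vs_position=1.0)
designated_hitter = value_with_fielding(30.0, is_dh=True)
print(shortstop, first_baseman, designated_hitter)
```

The shortstop's 30 RAR already reflects the benefit of the .86 PADJ, so docking two runs for being below the average shortstop does not mean he is being rated as a worse fielder than the first baseman.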

Finally, I consider each player at his primary defensive position (defined as where he appears in the most games), and do not weight the PADJ by playing time. This does shortchange a player like Ben Zobrist (who saw significant time at a tougher position than his primary position), and unduly boost a player like Joe Mauer (who logged a lot of games at a much easier position than his primary position). For most players, though, it doesn't matter much. I find it preferable to make manual adjustments for the unusual cases rather than add another layer of complexity to the whole endeavor.

2013 Leagues

2013 Teams

2013 Team Offense

2013 Team Defense

2013 AL Relievers

2013 NL Relievers

2013 AL Starters

2013 NL Starters

2013 AL Hitters

2013 NL Hitters
