Monday, January 23, 2017

Crude Team Ratings, 2016

For the last several years I have published a set of team ratings that I call "Crude Team Ratings". The name was chosen to reflect the nature of the ratings--they have a number of limitations, of which I documented several when I introduced the methodology.

I explain how CTR is figured in the linked post, but in short:

1) Start with a win ratio figure for each team. It could be actual win ratio, or an estimated win ratio.

2) Figure the average win ratio of the team’s opponents.

3) Adjust for strength of schedule, resulting in a new set of ratings.

4) Begin the process again. Repeat until the ratings stabilize.

The resulting rating, CTR, is an adjusted win/loss ratio rescaled so that the majors’ arithmetic average is 100. The ratings can be used to directly estimate W% against a given opponent (without home field advantage for either side); a team with a CTR of 120 should win 60% of games against a team with a CTR of 80 (120/(120 + 80)).
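The four steps and the win-probability formula can be sketched in a few lines of Python. This is a minimal sketch, not the actual CTR code. It assumes that the strength-of-schedule adjustment in step 3 is simply the team's win ratio multiplied by the games-weighted average rating of its opponents, with ratings rescaled on each pass so the arithmetic average is 100. The linked methodology post remains the authority on the details, and the function names are hypothetical.

```python
def crude_team_ratings(win_ratio, schedule, tol=1e-9, max_iter=1000):
    """Iteratively adjust win ratios for strength of schedule (sketch).

    win_ratio: team -> W/L ratio (actual or estimated)
    schedule:  team -> {opponent: games played}
    Returns team -> rating, rescaled so the arithmetic average is 100.
    """
    ratings = dict(win_ratio)  # step 1: start from the raw win ratios
    for _ in range(max_iter):
        adjusted = {}
        for team, opponents in schedule.items():
            games = sum(opponents.values())
            # step 2: games-weighted average rating of the opponents (SOS)
            sos = sum(ratings[opp] * n for opp, n in opponents.items()) / games
            # step 3 (assumed form): scale the team's win ratio by its SOS
            adjusted[team] = win_ratio[team] * sos
        # rescale so the average rating is 100
        scale = 100 * len(adjusted) / sum(adjusted.values())
        adjusted = {team: r * scale for team, r in adjusted.items()}
        # step 4: stop once no rating moves by more than tol
        if all(abs(adjusted[t] - ratings[t]) < tol for t in adjusted):
            return adjusted
        ratings = adjusted
    return ratings


def expected_wpct(ctr_a, ctr_b):
    """Expected W% for team A against team B, no home field advantage."""
    return ctr_a / (ctr_a + ctr_b)


print(expected_wpct(120, 80))  # 0.6
```

One caveat about this design: with only two teams playing each other, the ratings in this sketch flip back and forth instead of settling. The sketch is therefore only meaningful for a league-style schedule in which teams share opponents.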

First, CTR based on actual wins and losses. In the table, “aW%” is the winning percentage equivalent implied by the CTR and “SOS” is the measure of strength of schedule--the average CTR of a team’s opponents. The rank columns provide each team’s rank in CTR and SOS:



Last year, the top ten teams in CTR were the playoff participants. That was not remotely the case this year, thanks to a resurgent gap in league strength. The top five teams in the AL made the playoffs, and the NL was very close (St. Louis slipped just ahead of New York and San Francisco, by a margin of .7 wins if you compare aW%). Even so, the Giants ranked only fifteenth in the majors in CTR. Each of the Mariners, Tigers, Yankees, and Astros was rated stronger than the Dodgers, who were the NL's actual #3 seed and #3 CTR finisher.

The Dodgers had the second-softest schedule in MLB, ahead of only the Cubs. (The natural tendency is for strong teams in weak divisions to have the lowest SOS, since they don’t play themselves. The flip side is also true: I was quite sure, without checking, that Tampa Bay had the toughest schedule.) The Dodgers’ average opponent was about as good as the Pirates or the Marlins, while the Mariners’ average opponent was rated stronger than the Cardinals.

At this point you probably want to see just how big a gap there was between the AL and NL in average rating. Originally I gave the arithmetic average CTR for each division, but that’s mathematically wrong--you can’t average ratios like that. Then I switched to geometric averages, but what I really should have done all along is give the arithmetic average aW% for each division/league. aW% converts CTR back to an “equivalent” W-L record, such that the average across the major leagues will be .50000. I do this by taking CTR/(100 + CTR) for each team, then applying a small fudge factor to force the average to .500. To maintain some basis for comparison with prior years, I’ve provided the geometric average CTR alongside the arithmetic average aW%. I’ve also provided the equivalent CTR, found by solving for CTR in the equation:

aW% = CTR/(100 + CTR)*F, where F is the fudge factor (it was 1.0012 for 2016 lest you be concerned there is a massive behind-the-scenes adjustment taking place).
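The conversion and its inverse are easy to express in code. This is a minimal sketch. It assumes that F is chosen as .500 divided by the unadjusted average of CTR/(100 + CTR), which is one straightforward way to force the average to .500 that is consistent with the equation above. The function names are hypothetical.

```python
import math


def aw_pct(ctrs):
    """Convert CTRs to aW%, forcing the major league average to .500.

    Returns (team -> aW%, fudge factor F), following
    aW% = CTR/(100 + CTR) * F.
    """
    raw = {team: ctr / (100 + ctr) for team, ctr in ctrs.items()}
    fudge = 0.5 / (sum(raw.values()) / len(raw))  # F forces the mean to .500
    return {team: w * fudge for team, w in raw.items()}, fudge


def ctr_from_aw(aw, fudge):
    """Invert aW% = CTR/(100 + CTR) * F to get the equivalent CTR."""
    x = aw / fudge
    return 100 * x / (1 - x)


def geometric_mean(values):
    """Geometric average, the older way of summarizing a group's CTRs."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))
```

To get a division's equivalent CTR, average its teams' aW% arithmetically, then pass that average through `ctr_from_aw` with the same F.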



Every AL division was better than every NL division. This contrasts with 2015, when the two worst divisions were the NL East and West but the NL Central was the best. Whether you use the geometric or backdoor-arithmetic average CTRs to calculate it, the average AL team’s expected W% versus an average NL team is .545. The easiest SOS in the AL belonged to the Indians, as is to be expected for the strongest team in the weakest division. It was still one point higher than the toughest NL schedule (the Reds’, the weakest team in the strongest division).

I also figure CTRs based on various alternate W% estimates. The first is based on game-Expected W%, which you can read about here. It uses each team’s game-by-game distribution of runs scored and allowed, but treats the two as independent:



Next is Expected W%, that is to say Pythagenpat based on actual runs scored and allowed:



Finally, CTR based on Predicted W% (Pythagenpat based on runs created and allowed, actually Base Runs):



A few seasons ago I started including a CTR version based on actual wins and losses, but including the postseason. I am not crazy about this set of ratings, but I can’t quite articulate why.

On the one hand, adding in the playoffs is a no-brainer. The extra games are additional datapoints regarding team quality. If we have confidence in the rating system (and I won’t hold it against you if you don’t), then the unbalanced nature of the schedule for these additional games shouldn’t be too much of a concern. Yes, you’re playing stronger opponents, but the system understands that and will reward you (or at least not penalize you) for it.

On the other hand, there is a natural tendency among people who analyze baseball statistics to throw out the postseason. This stems from concerns about unequal opportunity (since most of the league doesn’t participate) and from historical precedent. Unequal opportunity is a legitimate concern when evaluating individuals--particularly for counting or pseudo-counting metrics like those that use a replacement level baseline--but much less of a concern with teams. Even though the playoff participants may not be the ten most deserving teams by a strict, metric-based definition of “deserving”, there’s no question that teams are responsible for their own postseason fate to a much, much greater extent than any individual player is. The argument from tradition is fine if the issue at hand is the record for team wins or individual home runs or the like. It is not particularly applicable when we are simply using the games that have been played as datapoints by which to gauge team quality.

Additionally, the fact that playoff series are not played to their conclusion could be seen as introducing bias. If the Red Sox get swept by the Indians, they not only get three losses added to their ledger, they lose the opportunity to offset that damage. The number of games that are added to a team’s record, even within a playoff round, is directly related to their performance in the very small sample of games.

Suppose that after every month of the regular season, the bottom four teams in the league-wide standings were dropped from the schedule. So after April, the 7-17 Twins’ record is frozen in place. Do you think this would improve our estimates of team strength? I don’t just mean the effect of the smaller sample; their record as used in the ratings could obviously be regressed more heavily than those of teams that played more games. But dropping them would freeze our on-field observations of the Twins, and the overall effect would be to make the dropped teams look worse than their “true” strength.

I doubt that poorly reasoned argument swayed even one person, so the ratings including playoff performance are:



The teams sorted by difference between playoff CTR (pCTR) and regular season CTR (rsCTR):



It’s not uncommon for the pennant winners to be the big gainers, but the Cubs and Indians made a lot of hay this year, as the Cubs managed to pull every other team in the NL Central up one point in the ratings. The Rangers did the reverse with the AL West by getting swept out of the proceedings. They still had a better ranking than the team that knocked them out, as did Washington.

Tuesday, January 10, 2017

Hitting by Position, 2016

Of all the annual repeat posts I write, this is the one which most interests me--I have always been fascinated by patterns of offensive production by fielding position, particularly trends over baseball history and cases in which teams have unusual distributions of offense by position. I also contend that offensive positional adjustments, when carefully crafted and appropriately applied, remain a viable and somewhat more objective competitor to the defensive positional adjustments often in use, although this post does not really address those broad philosophical questions.

The first obvious thing to look at is the positional totals for 2016, with the data coming from Baseball-Reference.com. “MLB” is the overall total for MLB, which is not the same as the sum of all the positions here, as pinch-hitters and runners are not included in those. “POS” is the MLB totals minus the pitcher totals, yielding the composite performance by non-pitchers. “PADJ” is the position adjustment, which is the position RG divided by the overall major league average (this is a departure from past posts; I’ll discuss this a little at the end). “LPADJ” is the long-term positional adjustment that I use, based on 2002-2011 data. The rows “79” and “3D” are the combined corner outfield and 1B/DH totals, respectively:



Obviously when looking at a single season of data it’s imperative not to draw any sweeping conclusions. That doesn’t make it any less jarring to see that second basemen outhit every position save the corner infield spots, or that left fielders created runs at the league average rate. The utter collapse of corner outfield offense left them, even pooled, ahead only of catcher and shortstop. Pitchers also added another point of relative RG, marking two years in a row of improvement (such as it is) over their first negative run output in 2014.

It takes historical background to fully appreciate how unusual the second base and corner outfield performances were. 109 for second base is the position’s best showing since 1924, which was 110 thanks largely to Rogers Hornsby, Eddie Collins and Frankie Frisch. Second base had not hit for the league average since 1949. (I should note that the historical figures I’m citing are not directly comparable. They are based on each player’s primary position and include all of his PA, regardless of whether he was actually playing the position at the time, unlike the Baseball-Reference positional figures used for 2016.) Corner outfield was even more extreme at 103, the nadir for the 116 seasons starting with 1901 (the previous low was 107 in 1992).

If the historical perspective is of interest, you may want to check out Corinne Landrey’s article in The Hardball Times Baseball Annual. She includes some charts showing OPS+ by position in the DH era. She theorizes that an influx of star young players, still playing on the right side of the defensive spectrum, has led to the positional shakeup. While I cautioned above about over-generalizing from one year of data, it has been apparent over the last several years that the spread between positions has declined. Landrey’s explanation is as viable as any I’ve seen to explain this season’s results.

Moving on to looking at more granular levels of performance, I always start by looking at the NL pitching staffs and their RAA. I need to stress that the runs created method I’m using here does not take into account sacrifices, which usually is not a big deal but can be significant for pitchers. Note that all team figures from this point forward in the post are park-adjusted. The RAA figures for each position are baselined against the overall major league average RG for the position, except for left field and right field which are pooled.



This is the second consecutive year that the Giants led the league in RAA, and of course they employ the active pitcher most known for his batting. But as usual the spread from top to bottom is in the neighborhood of twenty runs.

I don’t run a full chart of the leading positions since you will very easily be able to go down the list and identify the individual primarily responsible for the team’s performance and you won’t be shocked by any of them, but the teams with the highest RAA at each spot were:

C--WAS, 1B--CIN, 2B--WAS, 3B--TOR, SS--LA, LF--PIT, CF--LAA, RF--BOS, DH--BOS

More interesting are the worst performing positions; the player listed is the one who started the most games at that position for the team:



I have as little use for batting average as anyone, but I still find the Angels’ .209 left field average to be the single most entertaining number on that chart (remember, that’s park-adjusted; it was .204 raw). The least entertaining thing, for me at least, was the Indians’ production at catcher. It was tolerable when Roberto Perez was drawing walks but intolerable when Terry Francona was pinch-running for him in Game 7.

I like to attempt to measure each team’s offensive profile by position relative to a typical profile. I’ve found it frustrating as a fan when my team’s offensive production has come disproportionately from “defensive” positions rather than offensive positions (“Why can’t we just find a corner outfielder who can hit?”). The best way I’ve yet been able to come up with to measure this is to look at the correlation between RG at each position and the long-term positional adjustment. A positive correlation indicates a “traditional” distribution of offense by position--more production from the positions on the right side of the defensive spectrum. (To calculate this, I use the long-term positional adjustments that pool 1B/DH as well as LF/RF, and because of the DH I split it out by league):



As you can see, there are good offenses with high correlations, good offenses with low correlations, and every other combination. I have often used this space to bemoan the Indians’ continual struggle to get adequate production from first base, which contributes to their usual finish in the bottom third or so of correlation. This year they rank in the middle of the pack. While it is likely a coincidence that they had a good season, it’s worth noting that Mike Napoli was only average for a first baseman. Even that is much better than some of their previous showings.

Houston’s two best hitting positions (in terms of RG, not relative to positional averages) were second base and shortstop. In fact, the Astros’ positions in descending order of RG were 4, 6, 9, 2, 5, 3, D, 7, 8. That’s how you get a fairly strong negative correlation between RG and PADJ.
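The correlation described above is an ordinary Pearson coefficient computed across a team's positions. This is a minimal sketch. It assumes the team's park-adjusted RG and the long-term positional adjustments are supplied as dictionaries keyed by position, and that the 1B/DH and LF/RF pooling and the league split are handled by whoever builds those inputs. The function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def offensive_profile(rg_by_pos, padj_by_pos):
    """Correlate a team's RG at each position with the long-term PADJ.

    Only positions present in both dictionaries are used. A positive result
    means a "traditional" shape (offense concentrated at the bat-first
    positions); a negative result, like Houston's, means the reverse.
    """
    positions = sorted(set(rg_by_pos) & set(padj_by_pos))
    return pearson([rg_by_pos[p] for p in positions],
                   [padj_by_pos[p] for p in positions])
```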

The following charts, broken out by division, display RAA for each position, with teams sorted by the sum of positional RAA. Positions with negative RAA are in red, and positions that are +/-20 RAA are bolded:



Boston had the AL’s most productive outfield, while Toronto was just an average offense after bashing their way to a league-leading 118 total RAA in 2015. It remains jarring to see New York at the bottom of an offense list, even just for a division, and their corner infielders were the worst in the majors.



Other than catcher, Cleveland was solid everywhere, with no bold positions--and in this division, that’s enough to lead in RAA and power a cruise to the division title. Detroit had the AL’s top corner infield RAA (no thanks to third base). Kansas City, where to begin with the sweet, sweet schadenfreude? Esky Magic? No, already covered at length in the leadoff hitters post. Maybe the fact that they had the worst middle infield production in MLB? Or that the bros at the corners chipped in another -19 RAA to also give them the worst infield? The fact that they were dead last in the majors in total RAA? It’s just too much.



The pathetic production of the Los Angeles left fielders was discussed above. The Mike Trout-led center fielders were brilliant, the best single position in the majors. And so, even with a whopping -31 runs from left field, the Angels had the third-most productive outfield in MLB. Houston’s middle infielders, also mentioned above, were the best in the majors. Oakland’s outfield RAA was last in the AL.



Washington overcame the NL’s least productive corner infielders, largely because they had the NL’s most productive middle infielders. Miami had a similar but even more extreme juxtaposition, the NL’s worst infield and the majors’ best outfield, and that with a subpar season from Giancarlo Stanton as right field was the least productive of the three spots. Atlanta had the NL’s worst-hitting middle infield, and Philadelphia the majors’ worst outfield despite Odubel Herrera making a fool of me.



Chicago was tops in the majors in corner infield RAA and total infield RAA. No other teams in this division achieved any superlatives, but thanks to Joey Votto and a half-season of Jonathan Lucroy, every team was in the black for total RAA, even if we were to add in Cincinnati’s NL-trailing -9 RAA from pitchers.



No position grouping superlatives in this division, but it feels like more should be said about Corey Seager. It seems like a rookie shortstop hitting as he did, fielding adequately enough to be a serious MVP candidate for a playoff team in a huge market for one of the five or so most venerated franchises should have gotten a lot more attention than it did. Is it the notion that a move to third base is inevitable? Is he, like the superstar down the road, just considered too boring of a personality?

The full spreadsheet is available here.

Monday, December 12, 2016

Hitting by Lineup Position, 2016

I devoted a whole post to leadoff hitters, whether justified or not, so it's only fair to have a post about hitting by batting order position in general. I certainly consider this piece to be more trivia than sabermetrics, since there’s no analytic content.

The data in this post was taken from Baseball-Reference. The figures are park-adjusted. RC is ERP, including SB and CS, as used in my end of season stat posts. The weights used are constant across lineup positions; there was no attempt to apply specific weights to each position, although they are out there and would certainly make this a little bit more interesting:



The seven-year run of NL #3 hitters as the best position in baseball was snapped by AL #3 hitters, albeit by an insignificant .01 RG. Mike Trout’s previous career high in PA out of the #3 spot was 336 in 2015, and he racked up 533 this year, so I’m going to give full credit to Trout; as we will see in a moment, the Angels’ #3 hitters were the best single lineup spot in baseball. #2 hitters did not outperform #5s in both circuits as they did last year, just in the AL. However, the NL made up for it by having their leadoff hitters create runs at almost exactly the same rate as their #5s.

Next are the team leaders and trailers in RG at each lineup position. The player listed is the one who appeared in the most games in that spot (which can be misleading, especially for spots low in the batting order where many players cycle through):





A couple of things that stood out to me were St. Louis’ dominance at the bottom of the order and the way in which catchers named Perez managed to sabotage lineup spots for two teams. Apologies to Carlos Beltran (the real culprits for the poor showing of Texas #3 hitters were Adrian Beltre, Prince Fielder, and Nomar Mazara) and Luis Valbuena (Carlos Gomez and Marwin Gonzalez).

The case of San Diego’s cleanup hitters deserves special attention. Yangervis Solarte was actually pretty good when batting cleanup, as his .289/.346/.485 line in 289 PA compares favorably to the NL average for cleanup hitters. The rest of the Padres who appeared in that spot combined for 399 PA with a dreadful .187/.282/.336 line. Just to give you a quick idea of how bad this is, the 618 OPS would have been the eleventh-worst among any non-NL #9 lineup spot in the majors, leading only 6 AL #9s, 2 #2s, a #7, and the horrible Oakland #2s. It was also worse than the Cardinals’ #9 hitters.

The next list is the ten best positions in terms of runs above average relative to average for their particular league spot (so AL leadoff spots are compared to the AL average leadoff performance, etc.):



And the ten worst:



Joe Mauer himself wasn’t that bad, with a 799 OPS when hitting third. That’s still well below the AL average, but not bottom-ten-in-RAA bad without help from his friends.

The last set of charts show each team’s RG rank within their league at each lineup spot. The top three are bolded and the bottom three displayed in red to provide quick visual identification of excellent and poor production:





The full spreadsheet is available here.

Monday, December 05, 2016

Leadoff Hitters, 2016

I will try to make this as clear as possible: the statistics are based on the players that hit in the #1 slot in the batting order, whether they were actually leading off an inning or not. It includes the performance of all players who batted in that spot, including substitutes like pinch-hitters.

Listed in parentheses after a team are all players that started in twenty or more games in the leadoff slot. You may see a listing like "COL (Blackmon)", but this does not mean that the statistic is based solely on Blackmon's performance; it is the total of all Colorado batters in the #1 spot, of whom Blackmon was the only one to start there in twenty or more games. I will list the top and bottom three teams in each category (plus the top/bottom team from each league if they don't make the ML top/bottom three); complete data is available in a spreadsheet linked at the end of the article. There are also no park factors applied anywhere in this article.

That's as clear as I can make it, and I hope it will suffice. I always feel obligated to point out that as a sabermetrician, I think that the importance of the batting order is often overstated, and that the best leadoff hitters would generally be the best cleanup hitters, the best #9 hitters, etc. However, since the leadoff spot gets a lot of attention, and teams pay particular attention to the spot, it is instructive to look at how each team fared there.

The conventional wisdom is that the primary job of the leadoff hitter is to get on base, and most simply, score runs. It should go without saying on this blog that runs scored are heavily dependent on the performance of one’s teammates, but when writing on the internet it’s usually best to assume nothing. So let's start by looking at runs scored per 25.5 outs (AB - H + CS):

1. HOU (Springer/Altuve), 6.9
2. COL (Blackmon), 6.7
3. DET (Kinsler), 6.6
Leadoff average, 5.2
ML average, 4.5
28. SF (Span), 4.4
29. KC (Escobar/Dyson/Merrifield), 4.1
30. OAK (Crisp/Burns), 3.4

Again, no park adjustments were applied, so the Rockies’ performance was good but it wasn’t really “best in the NL” good. I’m also going to have a hard time resisting just writing “Esky Magic” every time the Royals appear on a trailers list.
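The runs-per-outs rate is simple arithmetic. As a sketch, here it is with the out estimate from the text (AB - H + CS); the function name is hypothetical.

```python
def runs_per_25_5_outs(r, ab, h, cs):
    """Runs scored per 25.5 outs, with outs estimated as AB - H + CS."""
    outs = ab - h + cs
    return r * 25.5 / outs
```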

The most basic team independent category that we could look at is OBA (figured as (H + W + HB)/(AB + W + HB)):

1. CHN (Fowler/Zobrist), .383
2. HOU (Springer/Altuve), .375
3. STL (Carpenter), .370
Leadoff average, .341
ML average, .324
28. WAS (Turner/Revere/Taylor), .305
29. KC (Escobar/Dyson/Merrifield), .298
30. OAK (Crisp/Burns), .290

Esky Magic. And once again Billy Burns chipping in to Oakland’s anemic showing and of course Kansas City just had to have Billy Burns.
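As a sketch, here is the OBA formula used here in Python (the function name is hypothetical). Note that, as written, it leaves sacrifice flies out of the denominator, unlike the official on-base percentage.

```python
def oba(h, w, hb, ab):
    """On base average as figured here: (H + W + HB)/(AB + W + HB)."""
    return (h + w + hb) / (ab + w + hb)
```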

The next statistic is what I call Runners On Base Average. The genesis for ROBA is the A factor of Base Runs. It measures the number of times a batter reaches base per PA--excluding homers, since a batter that hits a home run never actually runs the bases. It also subtracts caught stealing here because the BsR version I often use does as well, but BsR versions based on initial baserunners rather than final baserunners do not. Here ROBA = (H + W + HB - HR - CS)/(AB + W + HB).

This metric has caused some confusion, so I’ll expound. ROBA, like several other methods that follow, is not really a quality metric, it is a descriptive metric. A high ROBA is a good thing, but it's not necessarily better than a slightly lower ROBA plus a higher home run rate (which would produce a higher OBA and more runs). Listing ROBA is not in any way, shape or form a statement that hitting home runs is bad for a leadoff hitter. It is simply a recognition of the fact that a batter that hits a home run is not a baserunner. Base Runs is an excellent model of offense and ROBA is one of its components, and thus it holds some interest in describing how a team scored its runs, rather than how many it scored:

1. CHN (Fowler/Zobrist), .351
2. MIA (Gordon/Suzuki/Dietrich/Realmuto), .335
3. ATL (Inciarte/Peterson/Markakis), .331
4. HOU (Springer/Altuve), .331
Leadoff average, .305
ML average, .287
28. TEX (Choo/Odor/DeShields/Profar), .264
29. WAS (Turner/Revere/Taylor), .260
30. MIN (Dozier/Nunez), .256

Kansas City leadoff hitters finished tied for last in the majors with five home runs (with Miami), so Esky Magic was only good for 23rd place. Twins leadoff hitters, thanks primarily to Dozier, led the majors with 39 homers. So they actually wound up with a runner on base after only around 25.6% of leadoff hitter plate appearances. Their .320 OBA was well below average too, but again, ROBA describes how an offense plays out--other considerations are necessary to determine how good it was.
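Here is a sketch of ROBA as defined above (the function name is hypothetical). The usage lines illustrate the descriptive point made in the text. Adding a plate appearance that ends in a home run leaves the ROBA numerator unchanged but adds one to the denominator, so ROBA dips slightly even though OBA rises.

```python
def roba(h, w, hb, hr, cs, ab):
    """Runners On Base Average: (H + W + HB - HR - CS)/(AB + W + HB)."""
    return (h + w + hb - hr - cs) / (ab + w + hb)


base = roba(30, 10, 0, 5, 5, 90)        # (30 + 10 - 5 - 5) / 100 = .300
plus_homer = roba(31, 10, 0, 6, 5, 91)  # one more PA, a home run: 30 / 101
```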

I also include what I've called Literal OBA. This is just ROBA with HR subtracted from the denominator, so that a homer does not lower LOBA; it simply has no effect. It “literally” (not really, thanks to errors, being thrown out stretching, caught stealing after subsequent plate appearances, etc.) is the proportion of plate appearances in which the batter becomes a baserunner able to be advanced by his teammates. You don't really need ROBA and LOBA (or either, for that matter), but this might save some poor message board out there twenty posts by not implying that I think home runs are bad, so here goes. LOBA = (H + W + HB - HR - CS)/(AB + W + HB - HR):

1. CHN (Fowler/Zobrist), .360
2. HOU (Springer/Altuve), .344
3. STL (Carpenter), .342
Leadoff average, .313
ML average, .297
28. OAK (Crisp/Burns), .273
29. MIN (Dozier/Nunez), .270
30. WAS (Turner/Revere/Taylor), .268
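Here is a sketch of LOBA (the function name is hypothetical). HR leaves both the numerator and the denominator, so adding a plate appearance that ends in a home run has no effect at all, which is exactly the property described above.

```python
def loba(h, w, hb, hr, cs, ab):
    """Literal OBA: (H + W + HB - HR - CS)/(AB + W + HB - HR)."""
    return (h + w + hb - hr - cs) / (ab + w + hb - hr)


base = loba(30, 10, 0, 5, 5, 90)        # 30 / 95
plus_homer = loba(31, 10, 0, 6, 5, 91)  # one more PA, a home run: still 30 / 95
```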

The next two categories are most definitely categories of shape, not value. The first is the ratio of runs scored to RBI. Leadoff hitters as a group score many more runs than they drive in, partly due to their skills and partly due to lineup dynamics. Those with low ratios don’t fit the traditional leadoff profile as closely as those with high ratios (at least in the way their seasons played out):

1. MIA (Gordon/Suzuki/Dietrich/Realmuto), 2.6
2. SD (Jankowski/Jay), 2.3
3. ATL (Inciarte/Peterson/Markakis), 2.0
6. LAA (Escobar/Calhoun), 1.9
Leadoff average, 1.5
ML average, 1.0
26. STL (Carpenter), 1.3
28. BOS (Betts/Pedroia), 1.2
29. OAK (Crisp/Burns), 1.2
30. MIN (Dozier/Nunez), 1.1

This speaks more to me than to the measure, but the most interesting thing I learned from that list was that Travis Jankowski was San Diego’s primary leadoff hitter (71 games). Looking at the rest of the list, I think I could have guessed most teams’ leadoff men in two or three tries, but I never would have gotten the Padres.

A similar gauge, but one that doesn't rely on the teammate-dependent R and RBI totals, is Bill James' Run Element Ratio. RER was described by James as the ratio between those things that were especially helpful at the beginning of an inning (walks and stolen bases) to those that were especially helpful at the end of an inning (extra bases). It is a ratio of "setup" events to "cleanup" events. Singles aren't included because they often function in both roles.

Of course, there are RBI walks and doubles are a great way to start an inning, but RER classifies events based on when they have the highest relative value, at least from a simple analysis:

1. MIA (Gordon/Suzuki/Dietrich/Realmuto), 1.8
2. ATL (Inciarte/Peterson/Markakis), 1.4
3. PHI (Herrera/Hernandez), 1.4
6. NYA (Ellsbury/Gardner), 1.2
Leadoff average, .8
ML average, .7
26. COL (Blackmon), .5
28. TB (Forsythe/Guyer), .5
29. DET (Kinsler), .5
30. BAL (Jones/Rickard), .4

The Orioles certainly had a non-traditional leadoff profile, thanks mostly to Jones. Their five stolen base attempts were the fewest of any team, they were tied for third with 30 homers, and they drew 20 fewer walks than an average team out of the leadoff spot.
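As a sketch, RER can be computed as below. This assumes the standard form (W + SB)/(TB - H), where TB - H counts the extra bases on hits. That form matches the description above but is not spelled out in this post, and the function name is hypothetical.

```python
def run_element_ratio(w, sb, tb, h):
    """Run Element Ratio: setup events over cleanup events.

    Setup events are walks and stolen bases; cleanup events are the extra
    bases on hits (TB - H). A single adds one TB and one H, so it cancels
    out and counts toward neither side.
    """
    return (w + sb) / (tb - h)
```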

Since stealing bases is part of the traditional skill set for a leadoff hitter, I've included the ranking for what some analysts call net steals, SB - 2*CS. I'm not going to worry about the precise breakeven rate, which is probably closer to 75% than 67%, but is also variable based on situation. The ML and leadoff averages in this case are per team lineup slot:

1. WAS (Turner/Revere/Taylor), 30
2. MIL (Villar/Santana), 27
3. MIA (Gordon/Suzuki/Dietrich/Realmuto), 22
4. CLE (Santana/Davis), 20
Leadoff average, 6
ML average, 2
28. TB (Forsythe/Guyer), -11
29. SEA (Aoki/Martin), -13
30. PHI (Herrera/Hernandez), -16

The Indians are a good example of why I list all players who had at least twenty starts in the leadoff spot; AL steal leader Rajai Davis’ 69 games leading off led to them leading the AL in net steals.
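Net steals takes one line, but writing it out also shows where the 67% figure mentioned above comes from. If a caught stealing is charged as w steals, SB - w*CS is zero at a success rate of w/(1 + w). Here is a sketch with hypothetical function names.

```python
def net_steals(sb, cs, cs_weight=2):
    """Net steals: SB - 2*CS by default (each CS costs two steals)."""
    return sb - cs_weight * cs


def implied_breakeven(cs_weight=2):
    """Stolen base success rate at which net steals comes out to zero."""
    return cs_weight / (1 + cs_weight)
```

With the default weight of 2 the implied breakeven is 2/3. A 75% breakeven would correspond to a weight of 3.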

Shifting back to quality measures, first up is one that David Smyth proposed when I first wrote this annual leadoff review. Since the optimal weight for OBA in a x*OBA + SLG metric is generally something like 1.7, David suggested figuring 2*OBA + SLG for leadoff hitters, as a way to give a little extra boost to OBA while not distorting things too much, or even suffering an accuracy decline from standard OPS. Since this is a unitless measure anyway, I multiply it by .7 to approximate the standard OPS scale and call it 2OPS:

1. COL (Blackmon), 880
2. BOS (Betts/Pedroia), 872
3. HOU (Springer/Altuve), 865
Leadoff average, 775
ML average, 745
28. SF (Span), 722
29. OAK (Crisp/Burns), 654
30. KC (Escobar/Dyson/Merrifield), 650

Esky Magic.
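As a sketch, 2OPS on the same scale as the lists above (the function name is hypothetical; OBA and SLG are passed as decimals, and the result is multiplied by 1000 to match the OPS-style display):

```python
def two_ops(oba, slg):
    """2OPS: 0.7 * (2*OBA + SLG), displayed like OPS without the decimal."""
    return 0.7 * (2 * oba + slg) * 1000
```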

Along the same lines, one can also evaluate leadoff hitters in the same way I'd go about evaluating any hitter, and just use Runs Created per Game with standard weights (this will include SB and CS, which are ignored by 2OPS):

1. COL (Blackmon), 6.4
2. BOS (Betts/Pedroia), 6.3
3. HOU (Springer/Altuve), 6.2
Leadoff average, 4.9
ML average, 4.5
28. SF (Span), 4.1
29. KC (Escobar/Dyson/Merrifield), 3.4
30. OAK (Crisp/Burns), 3.3

Esky Magic.

The same six teams make up the leaders and trailers, which shouldn’t be a big surprise.

Allow me to close with a crude theoretical measure of linear weights supposing that the player always led off an inning (that is, batted in the bases empty, no outs state). There are weights out there (see The Book) for the leadoff slot in its average situation, but this variation is much easier to calculate (although also based on a silly and impossible premise).

The weights I used were based on the 2010 run expectancy table from Baseball Prospectus. Ideally I would have used multiple seasons but this is a seat-of-the-pants metric. The 2010 post goes into the detail of how this measure is figured; this year, I’ll just tell you that the out coefficient was -.224, the CS coefficient was -.591, and for other details refer you to that post. I then restate it per the number of PA for an average leadoff spot (746 in 2014):

1. HOU (Springer/Altuve), 30
2. COL (Blackmon), 28
3. CHN (Fowler/Zobrist), 27
Leadoff average, 7
ML average, 0
28. SF (Span), -8
29. KC (Escobar/Dyson/Merrifield), -19
30. OAK (Crisp/Burns), -21

Esky Magic. Lest anyone think I am being unduly critical of Escobar's performance (he did, after all, start only 82 of KC's games, about half, as the leadoff hitter), note that Escobar hit .242/.272/.289 in the #1 spot. The rest of the Royals combined for .274/.317/.378, which would still rank second-worst in the majors in 2OPS. So the Royals' team performance was terrible, but Escobar was dreadful. Just the worst.

The spreadsheet with full data is available here.

Monday, November 28, 2016

Statistical Meanderings, 2016

What follows is an abbreviated version of my annual collection of oddities that jump out at me from the year-end statistical reports I publish on this blog. These tidbits are intended as curiosities rather than as sober sabermetric analysis:

* The top ten teams in MLB in W% were the playoff participants, and the top six were the division winners. This is a rare case in which obvious inequities weren't created by micro-divisions, in stark contrast to 2015's NL Central debacle.

* In the NL, only Washington (.586) had a better overall W% than Chicago's road W% (.575). Of course, the Cubs were a truly great team, and with 103 wins and a world title on the heels of 97 wins a year ago, they belong in any discussion of the greatest teams of all-time. In Baseball Dynasties, Eddie Epstein and Rob Neyer used three years as their base time period for ranking the greatest dynasties. Another comparable regular season in 2017, regardless of playoff result, would in my opinion place the Cubs prominently on a similarly premised list.

Most impressive about the Cubs is that despite winning 103, their EW% (.667) and PW% (.660) outpaced their actual W% of .640.

* It is an annual tradition to run a chart in this space that compares the offensive and defensive runs above average for each of the playoff teams. RAA is figured very simply here by comparing park adjusted runs or runs allowed per game to the league average. Often I enjoy showing that the playoff teams were stronger offensively than defensively, but that was not the case in 2016:



This is another way to show just how great the Cubs were--only two other playoff teams were as many as 80 RAA on either side of the scorecard and the Cubs were +101 offensively and +153 defensively.
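The RAA figure described above is simple enough to sketch. This assumes park adjustment means dividing per-game runs by a runs park factor centered on 1.0 and multiplying the per-game difference from league average by games played. Both the park-factor form and the function names are assumptions, not necessarily the exact method behind the chart.

```python
def offensive_raa(r_per_g: float, lg_r_per_g: float, games: int, park_factor: float = 1.0) -> float:
    """Offensive runs above average: (park-adjusted R/G - league R/G) * games."""
    return (r_per_g / park_factor - lg_r_per_g) * games


def defensive_raa(ra_per_g: float, lg_r_per_g: float, games: int, park_factor: float = 1.0) -> float:
    """Defensive runs above average: (league R/G - park-adjusted RA/G) * games."""
    return (lg_r_per_g - ra_per_g / park_factor) * games
```

With hypothetical inputs, a team scoring 5.0 R/G in a neutral park against a 4.5 league average over 162 games comes out at +81 on offense.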

* The Twins have a multi-year run of horrible starting pitching, and 2016 only added to the misery. Only the Angels managed a worse eRA from their starters (5.61 to 5.58); only A's starters logged fewer innings per start among AL teams (5.39 to 5.40); and the Twins were dead last in the majors in QS% (36%). In their surprising contention blip of 2015, the Twins were only in the bottom third of the AL in starting pitching performance, but in 2014 they were last in the majors in eRA and second-last in IP/S (ahead of only Colorado) and QS%; in 2013 they were last in all three categories; and in 2012 they were last in the majors in eRA and second-last in IP/S and QS%.

* From my perspective, there was a lot to like about the 2016 season in terms of team performance, chiefly the Indians winning the pennant and a postseason that the lesser participants did not luck their way through. Both were helped along by the comeuppance finally delivered to the Royals. It wasn't quite as glorious as it might have been, as they still managed to scrap out a .500 record, but the fundamental problems with their vaunted contact offense were laid bare. KC was easily the lowest-scoring team in the AL at 4.05 R/G, with the Yankees of all teams second-worst at 4.19. They were last in the majors with .075 walks/at bat (COL, at .084, was second-worst). They were last in the AL in isolated power by 12 points (.137) and beat out only Atlanta and Miami in the majors, edging out the 30th-ranked Braves by just .007. Combining those two, their .212 secondary average was last in the majors, sixteen points below the Marlins. But they were at the AL average in batting average at .257, so that's something.
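The "combining those two" step works because the secondary average used here is ISO plus walks per at bat (.137 + .075 = .212). A minimal sketch follows. Note that Bill James' original secondary average also credits SB - CS. The figure in the text matches without that term, so it is left out.

```python
def secondary_average(iso: float, walks_per_ab: float) -> float:
    """Secondary average as implied by the text: ISO + walks per at bat (no SB - CS term)."""
    return iso + walks_per_ab


kc_sec = secondary_average(0.137, 0.075)
print(f"{kc_sec:.3f}")  # 0.212
```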

* Andrew Miller averaged 17.1 strikeouts and 1.3 walks per 37.2 plate appearances (I use the league average of PA/G to restate K and W rates per PA on the familiar per-nine-innings scale while still using the proper denominator of PA). If you halve his K rate and double his walk rate, that's 8.6 and 2.6, which is still the line of a pretty solid reliever. A comparable but slightly inferior performer this year was Tony Watson (8.2 and 2.8).
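The rate conversion in parentheses above amounts to multiplying a per-PA rate by league PA per game. Here is a small sketch; the function name is hypothetical and 37.2 is the league figure quoted in the text. The halving and doubling of Miller's rates is just arithmetic on the quoted numbers.

```python
LEAGUE_PA_PER_GAME = 37.2  # league-average PA per game, as quoted in the text


def per_game_rate(events: float, pa: float, scale: float = LEAGUE_PA_PER_GAME) -> float:
    """Restate a per-PA rate on a per-game (roughly per-nine-innings) scale."""
    return events / pa * scale


halved_k = 17.1 / 2  # 8.55, which the text rounds to 8.6
doubled_w = 1.3 * 2  # 2.6
```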

* Boston's bullpen was built (or at least considered by some preseason) to be a lockdown unit with Tazawa, Uehara, and Kimbrel. Tazawa had a poor season with 0 RAR; Uehara and Kimbrel missed some time with injuries and were just okay when they pitched for 10 RAR each. Combined they had 20 RAR. Dan Otero, a non-roster invitee to spring training with Cleveland, had 26 RAR.

* Matt Albers (-18) had the lowest RAR of anyone who qualified for any of my individual stat reports. I don't think that save is very likely at this point.

* Using only your impression of Toronto's starters (their talent, stuff, age, etc.), try to match each to their strikeout and walk rates (the five pitchers are RA Dickey, Marco Estrada, JA Happ, Aaron Sanchez, and Marcus Stroman):



The correct answer from A to E is Dickey, Sanchez, Stroman, Estrada, Happ. I never got a chance to play this game without being spoiled, but I'm certain that I would have at least said that Aaron Sanchez was pitcher D.

* Jameson Taillon made it to the majors at age 25, and the thing that jumped out at me from his stat line was his very low walk rate (1.5, lower than any NL starter with 15 starts save Clayton Kershaw and Bartolo Colon; note that Taillon just cleared the bar for inclusion).

John Lackey, at age 38, chipped in 49 RAR to Chicago (granted, fielding support contributed to his performance). Taillon and Lackey are always linked in my head thanks to a Fangraphs prospect post from several years ago that I will endeavor to find. I believe the Fangraphs writer offered Lackey as a comp for Taillon. A commenter, perhaps a Pittsburgh partisan, responded by saying it was a ridiculous comparison, essentially an insult to Taillon.

My thought at the time was that if I had any pitching prospect in the minors, and you told me that if I signed on the dotted line he would wind up having John Lackey's career, I would take it every time. That's not to say that there aren't pitchers in the minors who will exceed Lackey's career, but to think that such a career is less than the median likely outcome for any pitching prospect is pretty aggressive. And this was before Lackey's late-career performance, which has further bolstered his standing. What odds would you place now on Jameson Taillon having a better career than John Lackey?

* Jeff Francoeur had exactly 0 RAR. Ryan Howard had 1, before fielding/baserunning which would push him negative.

* I mentioned in my MVP post how unusual it was that Kyle and Corey Seager were both worthy of being on the MVP ballot. They performed fairly comparably across the board:



Chase and Travis d'Arnaud also had pretty similar numbers. Not good numbers, but similar nonetheless (which in Chase's case was probably a triumph whilst a disappointment for Travis):



* It wouldn't be a meanderings post without some Indians-specific comments. It has actually been harder than usual to move on to writing the year-end posts because of the disappointment of seeing the Indians lose their second, third, and fourth consecutive games with a chance to close out the World Series. Three of those losses came by one run, and two came in extra-inning Game 7s. The Indians have now gone 68 seasons without winning the World Series, losing four consecutive World Series after winning the first two in franchise history. That now matches the record of the Red Sox from 1918 - 1986, which, if Ken Burns' "Baseball" and plagiarist/self-proclaimed patron saint of sad sack franchises Doris Kearns Goodwin are to be believed, was a level of baseball fan suffering unmatched and possibly comparable to the Battle of Stalingrad. Well, except for the initial World Series winning streak--Boston won their first five World Series.

The two Cleveland notes I have are negative, which is only because I have been thinking about them in conjunction with Game 7. One is how bad Yan Gomes was this season, creating just 1.9 runs per game over 262 PA, dead last in the AL among players with 250 or more PA. I did not understand Terry Francona's decision to pinch-run for Roberto Perez with the Indians down multiple runs in the seventh inning. He must have felt that a basestealing threat would distract Jon Lester, but given the inning and the extent of Cleveland's deficit, it basically ensured that Gomes would have to bat at some point. And bat he did, with the go-ahead run on first and two outs in the eighth against a laboring Chapman who had just coughed up the lead.

Also costly was the decision to bring Michael Martinez in to play outfield in the ninth. That move made more sense given Coco Crisp's noodle arm, but to see Martinez make the last out was a tough pill to swallow (and had Martinez somehow reached base, Gomes would have followed). And don't even get me started on the intentional walks in the tenth inning.

Also, it must be noted that Mike Napoli, who struggled in the postseason, was a very average performer in the regular season, creating 5.2 runs per game as a first baseman. This is not intended as a criticism of Napoli, especially since I have been kvetching for years about the Indians' inability to get even average production out of the corners. Napoli fit that need perfectly. But it felt as if the fans and media evaluated his performance as better than that (even limited strictly to production in the batter's box and not alleged leadership/veteran presence/etc.).

* For various reasons, a few of the players who were in the thick of the NL MVP race a year ago and were surely considered favorites coming into this season had disappointing seasons. These three outfielders (Bryce Harper, Andrew McCutchen, Giancarlo Stanton) all wound up fairly close in 2016 RAR (28, 27, 23 respectively), yielding the MVP center stage to youngsters (Kris Bryant and Corey Seager), first basemen (Freddie Freeman, Anthony Rizzo, Joey Votto) and a guy having a career year (Daniel Murphy).

More interestingly, those big three outfielders combined for 78 RAR--five fewer than Mike Trout.

Wednesday, November 16, 2016

Hypothetical Ballot: Cy Young

There are no particular standout candidates for the Cy Young in either league, and I was tempted to open up this post by saying something like “Maybe it is a harbinger of things to come, as starting pitchers’ workloads continue to decrease and more managers consider times through the order in making the decision to go to the bullpen…we can expect more seasons like this, where no Cy Young contender really distinguishes himself.”

And then I stopped and concluded, “You idiot, don’t you dare write that.” This is exactly the kind of banal over-extrapolation of heavily selected data that I rail against constantly. In the long run, is it possible that those factors could contribute to a dilution of clear Cy Young candidates, leaving voters to comb over a pack of indistinguishable guys pitching 180 innings a year? Entirely possible. Does that make 2016 the new normal? Of course not. Just last year, there was an epic three-way NL Cy Young race. This year, only an injury to Clayton Kershaw seems to have stood in the way of a historic season and Cy Young landslide.

In the AL race, Justin Verlander had a 70 to 61 RAR lead over Chris Sale, with a pack of pitchers right behind them (Rick Porcello 59, Corey Kluber 58, Jose Quintana 57, Aaron Sanchez/JA Happ/Masahiro Tanaka 56). Conveniently, the first four in RAR are also the only pitchers who would still have 50 or more RAR whether figured from eRA or dRA, with one exception: Verlander allowed a BABIP of just .261, so his dRA is 3.80, significantly higher than his 3.04 RRA. However, none of the others look better using dRA--all three are five to eight runs worse. So I go with Verlander for the top spot and Porcello second over Sale (he led the AL with a 3.14 eRA, and since we are talking about one-run differences here, Bill James would at least want us to consider his 22-4 W-L record). I didn’t actually consider the W-L record, but he does rank just ahead of Sale if you weight RAR from actual/eRA/dRA at 50%/30%/20%, which has no scientific basis but seems reasonable enough. Again, there’s only a one-RAR difference between Sale and Porcello, so using W-L or flipping a coin to order them is just as reasonable. I gave the fifth spot to Jose Quintana over Aaron Sanchez, and would not have guessed that Quintana had a better strikeout rate (8.1 to 7.8).
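The 50%/30%/20% blend mentioned above is a plain weighted sum of three RAR estimates. Here is a sketch of it; the function name and the inputs in the check are hypothetical, since the post does not list all three RAR figures for each pitcher. The weights themselves are, as noted, a judgment call rather than a fitted result.

```python
WEIGHTS = (0.5, 0.3, 0.2)  # actual (RRA-based), eRA-based, dRA-based RAR


def blended_rar(rar_actual: float, rar_era: float, rar_dra: float, weights=WEIGHTS) -> float:
    """Weighted average of RAR figured from actual runs, eRA, and dRA."""
    w_act, w_era, w_dra = weights
    return w_act * rar_actual + w_era * rar_era + w_dra * rar_dra
```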

This leaves out Zach Britton, who I credit with just 35 RAR. I remain thoroughly unconvinced that leverage bonuses are appropriate. Each run allowed and out recorded is worth the same to the final outcome regardless of what inning it comes in. The difference between starters and relief aces is that some of the games the former pitch could have been won or lost with worse or better performances, while relief aces generally are limited to pitching in close games. But the fact that Britton pitches the ninth doesn’t make his shutout inning any more valuable than the one Chris Tillman pitched in the fourth within the context of that single game. To the extent that Britton contributes more value on a per inning basis, it’s because he pitched in a greater proportion of games in which one run might have made a difference, not because that is more apparent for any particular game at the point at which Britton appears in it than it was when the starter was pitching. I have alluded to this viewpoint many times, but have never written it up satisfactorily because I’ve not figured out how to propose a leverage adjustment that captures it, without going to the extreme that value can only be generated by pitching in games your team wins.

1. Justin Verlander, DET
2. Rick Porcello, BOS
3. Chris Sale, CHA
4. Corey Kluber, CLE
5. Jose Quintana, CHA

In the NL, there were seven starters with 60 or more RAR and then a gap of four to Jake Arrieta, which makes a good cohort to consider for the ballot. Of this group, Tanner Roark and Madison Bumgarner were at the bottom in terms of RAR and had high dRAs (4.17 and 3.87), which justifies dropping them.

That leaves Jon Lester (71 RAR), Kyle Hendricks (70), Max Scherzer (70), Johnny Cueto (65), and Clayton Kershaw (64). If you weight 50/30/20 as for the AL, all five are clustered between 60 and 64 RAR. This makes it tempting just to pick Kershaw, as he was far and away the best in every rate stat and narrowly missed leading the league in RAA despite pitching only 149 innings.

Among the four who pitched full seasons, Scherzer ranks first in innings and third in RRA, eRA, and dRA. However, he pitched significantly more innings than the Cubs candidates--25 more than Lester and 38 more than Hendricks. Comparing him to Cueto, who pitched nine fewer innings, Scherzer leads in RRA by .09 runs, eRA by .13 runs, and trails in dRA by .09 runs. So for my money Scherzer provided the best mix of effectiveness and durability.

All that’s left is a direct comparison of Scherzer to Kershaw, in which I think the innings gap is just too great to overcome without giving excessive weight to peripherals. The difference between Scherzer and Kershaw is 79 innings with a 3.62 RRA. To put it in 2016 performance terms, that makes Scherzer equivalent to Kershaw plus a solid reliever like Felipe Rivero or Travis Wood. That’s too much value for me to ignore in favor of the gaudy (and they are gaudy!) rate stats:

1. Max Scherzer, WAS
2. Jon Lester, CHN
3. Kyle Hendricks, CHN
4. Clayton Kershaw, LA
5. Johnny Cueto, SF

Hypothetical Ballot: MVP

You could basically copy and paste the same thing for AL MVP every year, so I’ll try to keep it brief. My position is that wins are value, and 8 wins don’t count for more when the rest of your teammates were worth 50 than when they were worth only 30.

But the debate over the definition of value is not what I find most obnoxious about the Mike Trout-era MVP discussions. It’s easy enough to disagree on that point and move on. What is most bothersome is the way that people attempt to co-opt terms that sound sabermetric, like “error bars,” to push their own narratives.

Let’s suppose that Player A is estimated to have contributed 87 RAR and Player B is estimated to have contributed 80 RAR, and that the standard error is something like 10 runs. In this case, it is certainly not conclusive that Player A was truly more valuable than Player B. I would grant that Player B would be a reasonable choice as MVP.
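To put a rough number on "still quite likely," here is a sketch that assumes each estimate carries an independent, normally distributed error with a standard error of 10 runs. These are assumptions layered on the hypothetical above, and the function name is illustrative. Under them, the difference has a standard error of about 14.1 runs, and Player A comes out ahead roughly 69% of the time.

```python
from math import erf, sqrt


def prob_a_more_valuable(est_a: float, est_b: float, se_a: float = 10.0, se_b: float = 10.0) -> float:
    """P(true A > true B), assuming independent normal errors on each estimate."""
    se_diff = sqrt(se_a ** 2 + se_b ** 2)
    z = (est_a - est_b) / se_diff
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF at z


p = prob_a_more_valuable(87, 80)
print(f"{p:.2f}")  # 0.69
```

If 10 runs is instead read as the standard error of the difference itself, z becomes 0.7 and the probability rises to about .76.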

But if you’re filling out your MVP ballot, *should* you put Player B ahead of Player A? It’s still quite likely that Player A was more valuable than Player B. To me, you need to have a good reason to put Player B ahead, particularly when the margin is “significant” but not beyond the “error bar”.

Worse yet, though, is the attempt to twist oneself into a pretzel to make up those good reasons. The real gem going around, which you will see in comment sections and message boards, is that the error bars must be larger for Player A. Because you see, Player A’s park became a strong pitcher’s park right around when he arrived, and parks don’t change character like that (says someone who has never examined historical park factors). Because you see, Player A always leads the league in RAR, and by a wide margin--that just can’t be right. Player A is so consistently great in the metrics that the metrics must be wrong.

The world is not worthy of Player A. Every week of Player A’s career is scrutinized by pseudo-sabermetricians who have deadlines to fill with their micro-analytical pablum, and who, when they aren’t vulturing over Player A, are busy extrapolating trends from blips in thirty-team samples to blame metrics for their own arrogance. Player A can’t win with the people who should be appreciating him--not in the sense that a fan might but exactly in the sense that a detached analyst would.

I’m sure you’ve deduced by now that Player A is Mike Trout, and you may have guessed that Player B is Mookie Betts. Except those aren’t even my true estimates of their RAR; they’re what I would come up with for their RAR if I took my hitting/position RAR + BP’s baserunning runs (for non-steals, since steals are incorporated in the first piece) + the average of each player’s BP FRAA, BIS DRS, and MGL UZR. In other words, it's what I would get if I didn’t regress fielding at all, which I don’t think is the correct position. When adding components together, if one (hitting) is more reliable than another (fielding), it doesn’t make sense to ignore that. In actually estimating RAR for the purpose of filling out a fake MVP ballot, I used 50% FRAA, 25% DRS, 25% UZR, and halved it. Then Trout is at 86 RAR, Betts 68, and Jose Altuve slides in between them at 71, which explains the top of my ballot.

If anything, I think I may be generous to Betts, who needs all of his 8 baserunning runs and 11 “regressed” fielding runs to make up for 49 hitting RAR, which ranked just ninth in the league. Kyle Seager also made it onto my ballot on the strength of 8 fielding runs, and Francisco Lindor came close with 5 from baserunning and 10 from fielding. David Ortiz and Miguel Cabrera gave up 5 runs from non-hitting activities (or in Ortiz’s case, non-activity), which pushed them just off the ballot. Last year’s Player B, Josh Donaldson, was only a hair behind Betts, having another excellent season with 65 RAR and average-to-good fielding except in FRAA, which didn’t like his performance at all (-12).
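The regressed fielding and component totals described above can be sketched as follows. The function names are hypothetical, and the order of the fielding triples quoted in the NL discussion is assumed to be (FRAA, DRS, UZR), matching the order given explicitly for Aledmys Diaz. As a check, Betts' 49 hitting + 8 baserunning + 11 fielding reproduces the 68 RAR stated above.

```python
def regressed_fielding(fraa: float, drs: float, uzr: float) -> float:
    """Weight FRAA/DRS/UZR at 50/25/25, then halve (regress halfway to zero)."""
    return 0.5 * (0.5 * fraa + 0.25 * drs + 0.25 * uzr)


def total_rar(hitting_rar: float, baserunning: float, fielding: float) -> float:
    """Hitting/position RAR + non-steal baserunning runs + (regressed) fielding runs."""
    return hitting_rar + baserunning + fielding


betts = total_rar(49, 8, 11)                  # 68, matching the text
bryant_field = regressed_fielding(2, 10, 12)  # 3.25, assuming (FRAA, DRS, UZR) order
```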

The AL starting pitchers lacked any standout Cy Young candidates, but made up for it by being tightly bunched, so four of the final six spots go to them:

1. CF Mike Trout, LAA
2. 2B Jose Altuve, HOU
3. RF Mookie Betts, BOS
4. 3B Josh Donaldson, TOR
5. SP Justin Verlander, DET
6. 2B Robinson Cano, NYA
7. SP Rick Porcello, BOS
8. SP Chris Sale, CHA
9. SP Corey Kluber, CLE
10. 3B Kyle Seager, SEA

In the NL, I think Kris Bryant is a pretty clear pick for the top spot. He was second in the league in RAR by just one run to Joey Votto, a deficit he makes up with baserunning alone and pads with strong fielding runs (2, 10, 12). Anthony Rizzo seems to be the other top candidate in mainstream opinion, but he ranks only third among first basemen on my ballot. Rizzo, Freddie Freeman, and Joey Votto all had similar playing time, but Votto and Freeman both significantly outhit Rizzo (Rizzo 6.9 RG, Votto 8.2, Freeman 7.6). Rizzo makes up much of the ground on Votto with his glove, but Freeman is no slouch himself.

Corey Seager got mixed reviews as a fielder (-8, 0, 11), so he falls just behind Freeman on my ballot. I’m quite certain I’ve never had brothers on both of my MVP top 10s in the same year, or any year. Daniel Murphy was third to Votto and Bryant in RAR, but his fielding reviews aren’t so mixed (-5, -11, -6), and even before considering fielding he was just behind Max Scherzer in RAR. From there, it’s just a matter of mixing in the pitchers and noting that four Cubs are on the ballot:

1. 3B Kris Bryant, CHN
2. 1B Freddie Freeman, ATL
3. SS Corey Seager, LA
4. SP Max Scherzer, WAS
5. 2B Daniel Murphy, WAS
6. SP Jon Lester, CHN
7. 1B Joey Votto, CIN
8. 1B Anthony Rizzo, CHN
9. SP Kyle Hendricks, CHN
10. SP Clayton Kershaw, LA

Wednesday, November 09, 2016

Hypothetical Ballot: Rookie of the Year

It was a bad year for rookies in the AL, made more interesting by the very late arrival of Gary Sanchez. Most of the discussion about the award seems to center around whether it is appropriate to give it to Sanchez based on his brilliant 227 PA, and whether ROY should be a value award, a future prospect award, or some kind of ungodly hybrid of the two. My own approach is that it should be a value award--anyone who is a rookie should be eligible and my primary criterion is how productive they were in 2016, not how old they are, their prospect pedigree, how their team held down their service time, or the like. Only in a very close decision would I factor in those criteria. I understand why others might consider those factors, and why it makes a lot more sense to deviate from a value approach for ROY than for Cy Young or MVP.

As such, I don’t consider Sanchez’s case to be particularly compelling. Yes, Sanchez was more productive on a rate basis than any AL hitter other than Mike Trout. Yes, the lack of a standout candidate in the rest of the league makes Sanchez all the more appealing. But Sanchez’s performance far outpaced both his prospect status and his minor league numbers (807 OPS in 313 PA at AAA this year, 815 across AA and AAA last year). If I was going to consider a shooting star exception, it would be for someone who checked all the boxes. I would much rather have Sanchez’s future than any of the other four players on my ballot, but in 2016 he fell in the middle in terms of value.

With Sanchez out, the top of the ballot comes down to Michael Fulmer, who is the top non-Sanchez candidate in the popular discussion, and Chris Devenski. I watched a game in which Devenski pitched this year and was vaguely aware of his existence in subsequent box scores, but how effectively he was pitching completely escaped my attention until I put together my annual stat reports. Devenski pitched extremely well for Houston, mostly in relief (48 games, 5 starts) with a 1.80 RRA over 108 innings. His peripherals were strong as well (2.39 eRA and 2.79 dRA).

Fulmer pitched 159 innings with a 3.41 RRA for 42 RAR versus Devenski’s 39. Fulmer’s peripherals were also reasonably strong (3.46 eRA, 4.02 dRA), and since this was a curious case I also checked Baseball Prospectus’ DRA, which attempts to normalize for any number of relevant variables (park, umpires, defensive support, framing, quality of opposition, etc.). Using DRA, Fulmer has a clear edge considering his quantity advantage (3.49 to 3.72).

One thing my RAR figures oversimplify is pitchers’ roles--it is a binary choice of reliever (with replacement level at 111% of league average) or starter (replacement level at 128% of league average). If I figured RAR using Devenski’s innings split to set his replacement level (83 innings in relief to 24 starting works out to 115% of league average as the replacement level), his RAR would edge up to 41. It should be noted too that Devenski pitched decently in his five starts, averaging just under 5 innings with a 4.01 RA.
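The 115% figure above is an innings-weighted average of the two replacement levels: (83 × 1.11 + 24 × 1.28) / 107 ≈ 1.148. Here is a small sketch of that step, along with one plausible RAR form: (replacement RA/9 - pitcher RA/9) × IP / 9. The RAR form and all function names are assumptions rather than the exact formula behind the figures in this post.

```python
RELIEF_REPL = 1.11  # reliever replacement level, as a fraction of league-average RA
START_REPL = 1.28   # starter replacement level, as a fraction of league-average RA


def blended_replacement(relief_ip: float, start_ip: float) -> float:
    """Innings-weighted replacement level for a pitcher who both started and relieved."""
    return (relief_ip * RELIEF_REPL + start_ip * START_REPL) / (relief_ip + start_ip)


def rar(ip: float, ra9: float, lg_ra9: float, repl: float) -> float:
    """Assumed RAR form: (replacement RA/9 - pitcher RA/9) * IP / 9."""
    return (repl * lg_ra9 - ra9) * ip / 9


level = blended_replacement(83, 24)
print(f"{level:.0%}")  # 115%
```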

I think the two are very close; this is a case where Fulmer’s status as a starter and a younger, better-regarded prospect leaves him just ahead for me. Even so, I assume Devenski will rank higher on my ballot than on almost any other submitted, even for the IBAs.

Filling out the bottom of the ballot, the only other legitimate hitting candidate, Tyler Naquin and his 26 RAR, was heavily platooned and fares poorly in defensive metrics. That leaves two A’s pitchers, one a starter and one a reliever. If I strictly followed RAR, I would actually have the latter (Ryan Dull) ahead of the former (Sean Manaea), and the peripherals don’t really help either’s case, but since they were so close I will vote here for prospect status.

1. SP Michael Fulmer, DET
2. RP Chris Devenski, HOU
3. C Gary Sanchez, NYA
4. SP Sean Manaea, OAK
5. RP Ryan Dull, OAK

The top of the NL ballot is easy, as Corey Seager is a legitimate MVP candidate and far outshines the rest of the rookies. There is a cluster of qualified candidates in the 30-40 RAR range who make up the rest of my ballot. Kenta Maeda gets the nod over Junior Guerra as top pitcher based on stronger peripherals, with apologies to Zach Davies, Tyler Anderson, and Steven Matz. Among hitters, Aledmys Diaz led in RAR with 37 to Trea Turner’s 34, but Diaz’s fielding metrics are bad (-9 FRAA, -3 DRS, -8 UZR) while Turner’s are…not as bad (-3, -2, -5). Both are credited with baserunning value beyond their steals by BP (2 runs for Diaz, 4 for Turner); when you add it up it’s very close, but I consider Turner’s age and the fact that he did it in 130 PA to put him ahead:

1. SS Corey Seager, LA
2. SP Kenta Maeda, LA
3. SP Junior Guerra, MIL
4. CF Trea Turner, WAS
5. SS Aledmys Diaz, STL