Tuesday, March 30, 2021

2021 Predictions

I’m not telling you anything you don’t already know, but the 2021 season will be the hardest to forecast in recent times. While no one was doing systematic forecasts as we would recognize them in the sabermetric era at the time, I believe the last season that would have posed a greater challenge to forecasters was 1946, when so many players returned from military service. 2021 is hard to predict because we only had a sixty-game season on which to judge players’ current performance, and because there was no minor league season at all. The only season that would have been tougher to predict would have been 1995, if any poor sap had attempted that using the rosters as they stood prior to the labor ceasefire.

Of course, this does not pose any particular challenge to me in writing this, because I don’t do a systematic forecast of my own. I usually use publicly available player projections as a starting point, making my own seat-of-the-pants adjustments for performance and playing time; because of the additional inaccuracy inherent in such an exercise when trying to predict 2021, I have eschewed that and just used team-level projections as a starting point. Since this is an exercise in fun (baseball is supposed to be fun) and not a serious sabermetric endeavor, cutting out the trappings of formal analysis will not harm it – you can’t go any lower.

AL East

1. New York

2. Tampa Bay (wildcard)

3. Toronto

4. Boston

5. Baltimore

I’ve been cooler on this generation of Yankees contenders than perhaps I should have been, but they’ve always seemed to rely on paper-thin rotations and relatively fragile offensive stars. They’ve also had strong divisional challenges from the Red Sox and the Rays, although rarely simultaneously. This year, it’s hard to develop a compelling case that they aren’t the best team in the AL on paper, as the concentration of top teams seems to have swung to the NL. Their rotation remains dependent on fragile pitchers, but which AL team’s isn’t? The Rays remain the pick for second over the Blue Jays here, as I think it’s easy to underestimate how big the gap between them was last year. The Red Sox contending really would not surprise me, although no one would deserve it less than their entitled fans, who would rather ignore Alex Verdugo’s existence than consider that maybe trading a player with one year of control left might make baseball sense.

AL Central

1. Minnesota

2. Chicago

3. Cleveland

4. Kansas City

5. Detroit

I was all set to pick the White Sox, then I looked more closely at the numbers and concluded that the Twins were a slightly better bet – even before Eloy Jimenez was injured. Due to the circumstances of the times, I have actually consumed more spring training baseball this year than ever before, and this has caused my opinion of the Indians’ chances to plummet, which may well be an overreaction. This team will need its pitching to be a strength, and it’s easy to be glib and say that the rotation is strong. Upon introspection, it may dawn on you as it did me that they have precisely one starter who has completed an entire MLB season in a rotation. I can’t recall a past Indians bullpen that relied as heavily as this one will on back-end arms with good stuff but questionable control, and I actually think Phil Maton may be their best reliever. The offense remains cursed by the franchise’s inability to produce cheap corner bats who can contribute anything. Tribe fans and radio play-by-play announcers alike are contemptuous of the decision to hold on to Jake Bauers and start him at first rather than put him through waivers, but as uninspiring as Bauers’ past two seasons have been (yes, he didn’t play last year, but if you can function as a major league left fielder and were not called up to the horror show that was the 2020 Cleveland outfield, it speaks volumes), Bobby Bradley is not exactly Andrew Vaughn as a first base prospect. A Ben Gamel/Amed Rosario center field platoon? A pair of lumbering Padres castoffs being counted on as key cogs in the offense? And history repeats itself with the farm system: through development and trades the Indians have built up a fine collection of middle infield prospects (Andres Gimenez, Gabriel Arias, Owen Miller, Tyler Freeman, Brayan Rocchio), but corner bats remain elusive (a lot rides on Nolan Jones). It’s better than the opposite problem, I suppose, but oddly more frustrating as a fan.

The Royals think this is their year because they always think this is their year; I think the gap between them and the Indians is pretty narrow, but that doesn’t make you a contender. Will the media en masse ever consider AJ Hinch a possible feel-good story? I would guess not, but what do they know?

AL West

1. Houston

2. Los Angeles (wildcard)

3. Oakland

4. Seattle

5. Texas

Last year, the Astros were both the team hurt most by the sixty-game season and the team helped most by the expanded playoffs, where they provided evidence that they actually were still a decent team. The starting pitching is scary, the offense is weaker and more fragile than before, but they still stand out in this group of teams. I’ve picked the Angels as a wildcard many times during their upstream swims attempting to get back to Mike Trout’s natural habitat; I’ll probably regret it again, but the ChiSox are no sure bet and the East teams will have a tough schedule to overcome. Perhaps their biggest threat will come from the A’s, still a team that could have a scary rotation (and may have the other kind of scary rotation due to the unreliability of Manaea, Puk, Luzardo, etc.) and a couple of offensive stars. This division appears to be the epicenter of explicit six-man rotations, with both the Angels and the Mariners employing them. Do you think announcers will make a big deal of saying things like: “This is the first time a player has done X in Globe Life Field with fans in the ballpark for a Rangers game?” It seems like a preposterous suggestion, but is it really that much more ridiculous than “The Red Sox haven’t won a World Series AT HOME since…”?

NL East

1. New York

2. Atlanta (wildcard)

3. Washington

4. Philadelphia

5. Miami

The 2020 season should have been hard to predict, although for different reasons than 2021. There was no issue with data – the same level of historical statistics was available for 2020 as for prior seasons. The issue, of course, was that a sixty-game season was subject to a higher degree of variance from expectation than a 162-game season is.
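To put a rough number on that extra variance, here is a minimal sketch that treats every game as an independent coin flip between evenly matched teams (a simplifying assumption for illustration, not a model of real teams) and computes the standard deviation of winning percentage due to chance alone over a season of a given length. The function name `luck_sd` is hypothetical.

```python
import math


def luck_sd(games, p=0.5):
    """Binomial standard deviation of W% from chance alone.

    Assumes each game is an independent trial with win probability p
    (a simplification for illustration).
    """
    return math.sqrt(p * (1 - p) / games)


for games in (60, 162):
    sd = luck_sd(games)
    # Also express the spread in wins per 162 games for comparability
    print(f"{games} games: SD of W% = {sd:.4f} (~{sd * 162:.1f} wins per 162)")
```

Under these assumptions the sixty-game spread is about 1.6 times the full-season spread (the square root of 162/60), which is the sense in which a short season leaves more room for outcomes to stray from underlying quality.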

Yet something interesting happened – I did better on my predictions than I ever had before. This was not due to some special insight on my part – I thought the picks I made were pretty obvious and pretty chalky. One of the most interesting things about the 2020 season is how few flukes there were on the team level (of course, this arrogantly assumes that my assumptions were correct – one must acknowledge the possibility that the sixty-game season enabled my poor predictions to appear more accurate than they actually were).

In any event, I was right on five of six division winners, both pennant winners, and the identity of the world champion. I bring this up here because this is the one division I missed on – I picked the Mets, and I’m going to double down.

This division also features the team that I think is most likely to disappoint, or at least the team that certainly would have been before sabermetric thinking became widely diffused. The Marlins had horrible component statistics last year, should have been bad on paper, look like they are bad on paper again, but made it into the playoffs with a team with a reasonable number of young players, particularly on the mound. It’s exactly the kind of team that it would seem reasonable to think had a breakthrough if you weren’t wise to the fine print.

This is the consensus toughest division, and I don’t disagree – the top four are all real contenders. If you’re a fan of the Deserved family of metrics from Baseball Prospectus, bet hard on the Phillies.

NL Central

1. Milwaukee

2. St. Louis

3. Chicago

4. Cincinnati

5. Pittsburgh

This is the consensus weakest division, and again I concur, although I think the Brewers are very interesting with a high-upside collection of pitchers. The Cardinals are getting a lot of buzz for acquiring Nolan Arenado, and I don’t see any reason he wouldn’t bounce back to something resembling his prior form, but in terms of helping them in 2021, I think there were a number of positions where an upgrade would have fit better with the current roster. The Cubs are probably being underrated due to the revulsion at seeing a team that’s been a contender for the past six years seemingly enter a retrenchment, but the offense could still be a force. As shifts have come to the fore, we’ve seen a blurring of the line between second and third basemen, with Milwaukee’s usage of Mike Moustakas as one of the harbingers. Moustakas’ current team is also involved in some interesting infield moves, but bringing back the Howard-Johnson-at-shortstop strategy is considerably bolder than swapping the Moustakases and Travis Shaws of the world between second and third.

NL West

1. Los Angeles

2. San Diego (wildcard)

3. Arizona

4. San Francisco

5. Colorado

There’s not much to be said about the Dodgers – they are the model franchise of the day, arguably the model franchise of the entire free agency era. It was good to see them finally get a World Series trophy, but frankly they deserve more. They would be more than worthy of being the first repeat champions in the last two decades. The Padres are fascinating in their own right, likely doomed to a one-game playoff no matter how much they invest in their roster. One interesting thing about the eight-team playoff structure used in 2020 is that the presumed #1 wildcard team is the only team that would have qualified for the playoffs under the old system whose chances of winning the World Series clearly increase as a result. The division winners all have to play a three-game series to advance under the 2020 system, which is clearly worse than an automatic berth in the LDS (if there is a dominant #1 like the Dodgers, the #2 and #3 teams do benefit from a higher likelihood that the #1 gets taken out before a potential LCS matchup, but it’s not enough to offset having to play a three-game series against a competitive opponent). The #5 team would rather be in a one-game playoff with the #4 team than in a three-game series, assuming that the teams’ regular season records are indicative of their true strength. Of course, the #6-#8 teams benefit. Last year San Diego lost the first game of their series with St. Louis; a one-game playoff with Atlanta (as I’m predicting) is a poor reward for all of that investment. The Diamondbacks, Giants, and Rockies are all in the terrible position of being older than you would think (especially San Francisco) and in a division with powerhouses that look to be set up for a few years at least.
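The claim that the weaker team prefers the shorter series can be illustrated with a small sketch. It assumes every game is independent with the same per-game win probability (no home field, no rotation effects), and the .45 and .55 values are hypothetical; `series_win_prob` is an illustrative helper, not anything from the post.

```python
from math import comb


def series_win_prob(p, games):
    """Probability of winning a best-of-`games` series (games odd).

    Assumes every game is independent with the same per-game win
    probability p.
    """
    need = games // 2 + 1
    # Winning a best-of-n is equivalent to winning at least `need` of n
    # games if all n were played out.
    return sum(comb(games, k) * p**k * (1 - p) ** (games - k)
               for k in range(need, games + 1))


for p in (0.45, 0.55):
    print(p, round(series_win_prob(p, 1), 4), round(series_win_prob(p, 3), 4))
```

Under these assumptions a .45 underdog wins a single game 45% of the time but a best-of-three only about 42.5% of the time, while the .55 favorite rises from 55% to about 57.5%: lengthening a series helps the better team and hurts the weaker one.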

WORLD SERIES

Los Angeles (N) over New York (A)

Wednesday, March 17, 2021

Subtweeting Without Twitter, Vol. I

Using a positional adjustment as part of a total value metric (WAR, VORP, etc.) doesn't imply that players can be freely interchanged across positions any more than noting that a pizza and a t-shirt both cost $15 implies an assertion that one can wear a pizza or eat a t-shirt.

Saturday, March 06, 2021

dWhat!%

It’s understandable that the editing process for Baseball Prospectus 2021 overlooked something trivial like explaining what a metric in the team prospectus box means. After all, it must have been exhausting work to ensure that each of the many political non-sequiturs in the book was on message (Status: success! You can give this book to your children to read with confidence that they are in a safe space, with no deviation from the blessed orthodoxy). With the vital imperative of ideological conformity handled, they would next have needed to run a fine-tooth comb over any reference to the aesthetics of present-day MLB on-field play to ensure the proper level of smug conflation of one’s own preferences with the perfect ideal. Another success. Finally, they could turn their attention to making sure there were the requisite number of sneering statements about the fact that there even was an MLB season in 2020. As always, left unaddressed was how a publication that exists (in theory at least – reading the 2021 annual, this may be a fatally flawed assumption on my part) to analyze professional baseball could continue to exist if professional baseball ceased to exist, but who knows? When you toe the line so perfectly, maybe you can figure out a way to get in some of that sweet $1.9 trillion.

So it is entirely understandable that a publication rooted in statistical analysis could completely overlook such a triviality as explaining a metric that none of its writers ever bother to refer to anyway. The metric in question is called “dWin%”. It didn’t replace any team metric that was listed in the 2020 edition – it literally fills in a blank space in the right data column. A search for the terms “dWin%” and “Deserved Winning Percentage” on the BP website doesn’t yield any obvious relevant hits (non-paywalled, at least). So the best I can do is make an educated guess about what this metric is.

I gave away my guess by searching for “Deserved Winning Percentage”. BP has adopted a family of metrics with the “Deserved” prefix which utilize Jonathan Judge’s mixed model methodology to adjust for all manner of effects (going well beyond the staples of traditional sabermetrics like league run environment and park). The team prospectus box lists “DRC+” and “DRA-”, which are the DRC metric for hitters and DRA for pitchers, each indexed to the league average. So it’s only natural to assume that dWin% is some type of combination of these two to yield a team’s “deserved” winning percentage.

It’s also natural to assume that there would be a close relationship between DRC+, DRA-, and dWin%. If the first two are in essence run ratios (with myriad adjustments, of course, but essentially an estimate of the percentage difference between a team’s deserved rate of runs scored or allowed and the league average), then dWin% should follow from them fairly directly. If we were in the realm of actual runs scored and allowed, or runs created and runs created allowed, we could confidently state that one powerful way to express the relationship would be a Pythagorean approach. Namely, the square of the ratio of DRC+ to DRA- should be close to the ratio of dWin% to its complement.
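As a sketch of that Pythagorean relationship, the helper below treats DRC+/DRA- as a runs-scored to runs-allowed ratio and converts it to a winning percentage with a fixed exponent of 2. This is an assumption for illustration; BP has not documented how dWin% is computed, and `implied_wpct` is a hypothetical name.

```python
def implied_wpct(drc_plus, dra_minus, exponent=2.0):
    """W% implied by an offense index and a run-prevention index.

    Treats DRC+ / DRA- as a ratio of runs scored to runs allowed and
    applies a fixed Pythagorean exponent (2 by default, an assumption).
    """
    ratio = drc_plus / dra_minus
    odds = ratio ** exponent  # W% / (1 - W%)
    return odds / (1 + odds)


# The A's figures discussed in this post: 103 DRC+, 98 DRA-
print(round(implied_wpct(103, 98), 3))
```

For the A’s this yields roughly .525. Since DRA- likely excludes fielding, a gap between this figure and a listed dWin% is not by itself proof of an error.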

There are two obvious caveats to throw on this conclusion:

1) While the statistical introduction does not specifically refer to DRA- (it refers just to DRA, which was listed for teams rather than DRA- in the 2020 edition), it’s reasonable to assume that DRA- is the indexed version of DRA. DRA is a pitching metric, which attempts to state a pitcher’s deserved runs allowed after removing the impact of the defense that supports him. This means that comparing the ratio of DRC+ and DRA- on the team level likely ignores fielding, and thus the relationship I’ve posited above would be incomplete. I would be remiss if I did not point out that this is not the fault of BP, except to the extent that we are left to speculate about the meaning of these metrics; there's certainly nothing wrong with having a measure that attempts to isolate the performance of a team's pitching staff.

2) It is possible that there is something else going on besides fielding in the process of developing the Deserved family of metrics that would invalidate this manner of combining the offensive and pitching components. Without being privy to the full nature of the adjustments made in these metrics, it’s hard to speculate on what if anything that might be, but I would be remiss in not raising the possibility that there’s something going on behind the curtain or that I have simply overlooked.

I’m not going to run a chart of all of the team values, because that would be infringing on BP’s property rights, and given the first paragraph of this post that would be practically unwise even if it were not morally objectionable. A few summary points provide defensible ground:

1) The average of the team DRC+s listed in the annual is 99.3 and the average of DRA-s is 99.5. Given that the figures are rounded to the nearest whole number (e.g. 99 = 99%), this is encouraging, as we would expect the league average to be 100.

2) The average of the team dWin%s is .464. Less encouraging. As I was reading through the book, there were two team figures that really caught my eye and led me to this more formal examination. The first was Philadelphia, which had a dWin% of .580, ranking second in MLB. Their DRA- was 83, also second.

The Deserved family of metrics has always produced some eyebrow-raising results, which are difficult to evaluate objectively given the somewhat black-box nature of the metrics and the complexity of the mathematical approach involved (I will be the first to admit that “mixed models” of the kind described are beyond my own mathematical toolkit). So it’s dangerous to focus too much on any particular result, as it may just be a vehicle by which to expose one’s own ignorance. For a second-generation sabermetrician, this is a particular nightmare: becoming the sportswriter you laughed at as a twelve-year-old for dismissing RC/27 as impossibly complex and unintelligible.

Still, it is quite remarkable that the team which allowed the second-most park-adjusted runs per inning in the majors might actually have turned in the second-best pitching performance. In fairness, it was a sixty-game season, so the deviation between underlying quality of performance and actual outcome could be enormous, and the East could have been the toughest of the three sub-leagues, especially in terms of balance, as the Dodgers tip the scales West. Most significantly, DRA is just a pitching metric, and the Phillies’ defense was dreadful at turning balls in play into outs – they were last in the majors in DER at .619. Boston was at .623 and the next worst team was Washington at .642. Further, the East subleague combined for a .657 DER (the Mets had the fourth-worst DER, and Toronto and Miami made it six East teams in the bottom ten) compared to .685 for the Central and .684 for the West. It’s still hard to believe that the Phillies’ pitchers deserved to have the second-fewest runs allowed in the majors, but easy to buy that they performed much, much better than their runs allowed would suggest.
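For readers unfamiliar with DER (defensive efficiency record), one common approximation is one minus the rate at which balls in play become hits. BP’s exact formula may differ (for example in the handling of errors or sacrifices), so treat this as a sketch; the pitching line in the example is hypothetical, not actual 2020 data.

```python
def der(pa, h, hr, bb, so, hbp):
    """Defensive efficiency: share of balls in play converted into outs.

    One common approximation: balls in play = PA - BB - SO - HBP - HR,
    and hits on balls in play = H - HR.
    """
    balls_in_play = pa - bb - so - hbp - hr
    return 1 - (h - hr) / balls_in_play


# Hypothetical team pitching line, not actual 2020 data
print(round(der(pa=2200, h=500, hr=80, bb=200, so=500, hbp=20), 3))
```

On this scale, the gap between the Phillies’ .619 and the Central’s .685 works out to roughly 6.6 more hits per 100 balls in play.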

However, every factor that would explain how their pitching was actually second-best does nothing to explain how their overall deserved team performance was also second-best. Adjusting away terrible defensive support doesn’t mean that the team’s poor runs allowed weren’t deserved, it just means that the blame should be pinned on the fielders and not the pitchers. Again, it’s hard to pinpoint any exact criticism given the nature of the metrics, but this one is tough to accept at face value.

It also seems that if one had conviction in the result, it would show up in the narrative somewhere. There’s always been a disconnect between what BP statistics say and what their authors write, which owes partly to the ensemble approach to writing and presumably partly to the timing (the authors of team chapters probably start very soon after the season and without the benefit of the full spread of data that will appear in the book). Still, it seems as if this disconnect has increased with the advent of the deserved metrics, which often tell a very different story than even the mainstream traditional sabermetric tools (e.g. an EqA or a FIP, to refer to metrics previously embraced by BP). But I can assure you that if I believed the Phillies’ underlying performance as a team was actually second only to the Dodgers, I’d work that into any retrospective of their 2020 performance and forecast of their 2021.

The second team that caught my eye was the A’s, who posted a 103 DRC+, 98 DRA-, and .499 dWin%. The obvious disconnect between an above-average offense, above-average pitching, but sub-.500 deserved W% could be explained by defense. What can’t be explained is how a .499 dWin% ranks ninth in the majors, at least until you line up the thirty teams and see that the average is .464. While we can charitably assume that a combination of our own ignorance and the proprietary nature of the calculations can explain many odd results from the deserved stats, I don’t know what can satisfactorily explain a W% metric that averages to .464 for the whole league.

The hope is that this is simply some scalar error, a fudge factor not applied somewhere. There is some evidence that this is the case – if you take the ratio of DRC+ to DRA- and plot it against the ratio of dWin% to (1 – dWin%), you get a correlation of +0.974 and a pretty straight line, as you would expect given what should be in the vicinity of a Pythagorean relationship. It might even work out as you’d expect if dWin% is baking in fielding.
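A check along these lines can be sketched as follows. This variant squares the DRC+/DRA- ratio so that a pure multiplicative error would show up as a constant slope through the origin alongside a near-perfect correlation; the figure quoted above was computed on the unsquared ratio. The function name and the team values are hypothetical, not BP’s data.

```python
import math


def pythag_check(drc_plus, dra_minus, dwin):
    """Compare (DRC+/DRA-)^2 with dWin% odds across teams.

    Returns the Pearson correlation between the two series and the
    least-squares slope through the origin. A slope away from 1 paired
    with a near-perfect correlation would point to a constant scalar
    rather than noise.
    """
    x = [(c / a) ** 2 for c, a in zip(drc_plus, dra_minus)]
    y = [w / (1 - w) for w in dwin]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * syy)
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return r, slope


# Hypothetical teams whose dWin% odds are exactly 0.87 * (DRC+/DRA-)^2
drc = [110, 95, 103, 100]
dra = [90, 105, 98, 100]
k = 0.87
dwin = [k * (c / a) ** 2 / (1 + k * (c / a) ** 2) for c, a in zip(drc, dra)]
r, slope = pythag_check(drc, dra, dwin)
print(round(r, 3), round(slope, 3))
```

With real team values, a slope well below 1 together with a correlation near 1 would be consistent with the missing-fudge-factor hypothesis, though it could not rule out fielding or other adjustments baked into dWin%.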

Still, it’s disappointing that the question has to be asked.

Wednesday, March 03, 2021

Rob Manfred: Run Killer

There are many “crimes against baseball” that one could charge Rob Manfred with, if one were inclined to use hyperbolic language and pretend that the commissioner had the sole authority to decide matters (I tend to do neither but am guilty of seeking a more eye-catching post title):

* Attacking the best player in his sport for not going along with whatever horrible promotional scheme the commissioner had dreamed up

* Making a general mess of negotiations with the MLBPA

* Teaming up with authoritarian governments ranging from cities in Arizona to Leviathan itself to attempt to delay or prevent baseball from being played

* Claiming to be open to every harebrained scheme to rein in shifts, home runs, strikeouts, or whatever the current groupthink of the aesthetically-offended crowd finds most troublesome

From my selfish perspective as a sabermetrician, though, I will argue that the greatest crime of all is that he has rendered team runs scored and allowed totals unusable. The extra innings rule, which I doubt will ever go away even if seven-inning doubleheaders do, makes anything using actual runs scored incomparable with historical standards (in the sense of parameters of metrics rather than context). An RMSE test of a run estimator against team runs scored? Can’t use it. Pythagenpat? Nope. Relief pitchers’ run averages? Use with extreme caution.

Of course, I am not seriously suggesting that the ease with which existing metrics can be used should be a consideration in determining the rules of the game. But if you use these metrics, it is necessary to recognize that they are very much compromised by the rule.

So how can we adjust for it? I will start with a plea that the keepers of the statistical record (which in our day means sites like Baseball-Reference and Fangraphs) compile a split of runs scored and allowed in regulation and extra innings, as well as team innings pitched/batted in regulation and extra innings, and display this data prominently. Having it will allow for adjustments to be made that can at least partially correct, and more importantly increase awareness of the compromised nature of the raw data.

I want to acknowledge a deeper problem that also exists, and then not dwell on it too much even though it is quite important and renders the simple fixes I’m going to offer inaccurate. This is a problem that Tom Tango pointed out some time ago, particularly as it related to run expectancy tables – innings that are terminated due to walkoffs. In such innings, there are often significant potential runs left stranded on base, and so including these innings will understate the final number of runs one could expect. Tango corrected for this by removing these potential game-ending innings from RE calculations. It’s even more of a problem when it comes to extra innings: in a regulation game, only one of the eighteen half-innings (the bottom of the ninth) can be cut short by a walkoff, but in extra innings every bottom half is potentially game-ending, so such half-innings make up half of all extra half-innings. This means that when we look at just extra innings, the potential runs lost upon termination of the game make up a significant portion of the total runs.

I gathered the 2020 data on runs scored by inning from Baseball-Reference, and divided each inning into regulation and extras. I did not, however, do this correctly, as the seven-inning doubleheader rule complicates matters. The eighth and ninth innings of a standard nine-inning game are played under very different circumstances than the eighth and ninth innings of a seven-inning doubleheader. I have ignored this distinction here and treated all eighth and ninth innings as belonging to standard games, but this is a distortion. I didn’t feel like combing through box scores to dig out the real data, as I’m writing this post for illustrative and not analytical purposes, but it buttresses my plea for the keepers of the data to do this. This is not solely out of my laziness (although I really don’t want to have to compile it myself), but also a recognition of the reality that many casual consumers of statistics will not even be cognizant of the problem if it is not made clear in the presentation of data.

Forging ahead despite these two serious data issues that remain unresolved (counting eighth and ninth innings of seven-inning doubleheaders as regulation innings rather than extra innings, and ignoring the potential runs lost due to walkoffs), I used the team data on runs by inning from Baseball-Reference to get totals for innings played and runs scored in regulation and extra innings. Note that these are innings played, not innings pitched, which understates the true nature of the problem: while almost all regulation innings include three outs (the exception being bottoms of the ninth terminated on walkoffs), a much greater proportion of the extra innings do not.

Still:



Expressed on the intuitive scale of runs per 9 innings, regulation innings yielded 4.80 runs, while extra innings were good for a whopping 8.40, a rate 75% higher. And no wonder, as Baseball Prospectus’ RE table for 2019 shows .5439 for (---, 0 out) and 1.1465 for (-x-, 0), a rate 111% higher. That we don’t see that big of a difference is due in some indeterminate part to sample size and environmental differences (e.g. a high-leverage reliever is likely pitching in an extra-inning situation, unless they have all been in the game already), but probably more significantly to the lost potential runs.

Considering all runs scored and innings, there were .5378 runs/inning or 4.84 R/9 in the majors in 2020, so even a crude calculation suggests a distortion of around 1% embedded in the raw data due to extra innings. Of course, the impact can vary significantly at the team level since the team-level proportion of extra innings will vary (1.25% of MLB innings played were extras, ranging from a low of 0.40% for Cincinnati to 3.44% for Houston).
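The crude distortion figure can be reproduced from the two rates above. This sketch assumes, as a simplification, that every team’s regulation and extra innings were played at the 2020 league rates of 4.80 and 8.40 R/9, so only the share of innings that were extras varies; the helper names are hypothetical.

```python
# 2020 league rates from this post (runs per 9 innings played)
REG_R9 = 4.80    # regulation innings
EXTRA_R9 = 8.40  # Manfred extra innings


def blended_r9(extra_share, reg_r9=REG_R9, extra_r9=EXTRA_R9):
    """R/9 over all innings when `extra_share` of innings are extras."""
    return (1 - extra_share) * reg_r9 + extra_share * extra_r9


def extra_inning_inflation(extra_share):
    """Fractional inflation of R/9 relative to regulation-only play."""
    return blended_r9(extra_share) / REG_R9 - 1


for label, share in [("MLB", 0.0125), ("CIN", 0.0040), ("HOU", 0.0344)]:
    print(f"{label}: {extra_inning_inflation(share):.2%}")
```

The blended league rate of 4.845 matches the 4.84 R/9 above, and the team-level range (roughly 0.3% for Cincinnati to 2.6% for Houston under these assumptions) shows why a single league-wide correction would not be enough.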

How to correct for this? If the walkoff problem didn’t exist, I would suggest a relatively simple approach. After separating each team’s data into regulation and extra innings, calculate each team’s “pre-Manfred runs” as:

PMR = Runs in Regulation Innings + Runs in Extra Innings – park-adjusted RE for (-x-,0)*Extra Innings

= Runs - park-adjusted RE for (-x-,0)*Extra Innings
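The PMR formula can be sketched in a few lines. Two details are assumptions, not the post’s: the park adjustment is modeled as a simple multiplicative scaling of the neutral RE value by a runs park factor (the post notes that a standard park factor would likely overstate the effect), and `baseline_re` is a hypothetical option, left at zero so the function follows the formula as written. The team in the example is hypothetical.

```python
# Neutral-environment RE values quoted in this post (BP 2019 table)
RE_RUNNER_ON_SECOND_0_OUT = 1.1465  # (-x-, 0)
RE_BASES_EMPTY_0_OUT = 0.5439       # (---, 0)


def pre_manfred_runs(runs, extra_innings, park_factor=1.0,
                     re_start=RE_RUNNER_ON_SECOND_0_OUT, baseline_re=0.0):
    """PMR = Runs - park-adjusted RE(-x-, 0) * Extra Innings.

    park_factor: runs park factor used to scale the neutral RE value to
        the team's park (a simple multiplicative assumption).
    baseline_re: hypothetical option subtracted from re_start before
        scaling; 0.0 reproduces the formula in the text.
    """
    park_re = (re_start - baseline_re) * park_factor
    return runs - park_re * extra_innings


# Hypothetical team: 300 runs, 10 extra innings, neutral park
print(round(pre_manfred_runs(300, 10), 3))
```

As written, each extra inning is charged the full 1.1465 runs, which exceeds the roughly 0.93 runs per extra inning (8.40 per 9) observed in 2020. A reader who wants to remove only the placed runner’s increment over a normal inning might pass `baseline_re=RE_BASES_EMPTY_0_OUT` instead.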

You could address the walkoff problem by adding in the park-adjusted RE for any innings that terminated, but this gets tricky for two reasons:

1) it means that the simple data dividing runs and innings into “regulation” and “extra” is inadequate for the task; I doubt “potential runs lost at time of game termination” would ever find their way into a standard table of team offensive statistics

2) it overcorrects to the extent that the legacy statistics we have always used ignore the loss of those potential runs as well. Of course, the issue is more pronounced with extra innings, since potentially game-ending innings represent a huge proportion of extra innings rather than a small proportion of regulation innings (and because the nature of Manfred extra innings increases the proportion of walkoffs within the subset of extra innings, since run expectancy is 111% higher at the start of a Manfred extra inning than at the start of a standard inning).

Also note that when I say park-adjusted, I mean that the run expectancy would have to be park-adjusted not in order to normalize across environments, but rather to transform a neutral-environment RE table to the specific park. I wouldn’t want to use “just” 1.1465 for Coors Field, but rather a higher value, so that the PMR estimate can still be used in conjunction with our Coors Field park adjustment just as the Rockies’ raw runs total would have been pre-2020. Another complication is that the standard runs park factor would likely overstate the park impact because of the issue of lost potential runs (they too would increase in expected value as the park factor increased).

The manner in which I attempted to adjust in my 2020 End of Season statistics was to restate everything for a team on a per nine inning basis, and then use the R/9 and RA/9 figures in conjunction with standard methodology. But this is also unsatisfactory – for instance, a Pythagorean estimate ceases to be an estimate of the team’s actual W%, but rather a theoretical estimate of what their W% would be if they played a full slate of nine-inning games. The extra innings aren’t really a problem here, but the seven-inning doubleheaders are. As long as these accursed games exist, in order to develop a true Pythagorean estimate of team wins, one would first have to estimate the exponent that would hold for a seven-inning game (Tango came up with a Pythagorean exponent of 1.57 through an empirical analysis; my theoretical approach would be to use the Enby distribution to develop theoretical W%s for seven-inning games for a representative variety of underlying team strengths in terms of runs scored and allowed per inning, then use these to determine the best Pythagenpat z value). One would then use runs scored and allowed per inning to estimate separate W%s for seven- and nine-inning games, and weight these by the proportion of the team’s games that were scheduled for seven and nine innings.
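The final weighting step might look like the following sketch. It applies Pythagenpat (exponent = runs per game raised to the power z) to per-inning rates. The value z = 0.28 is a commonly cited nine-inning figure and serves here only as a placeholder for both game lengths, since the seven-inning z would still have to be estimated (e.g. via Enby). The function names and the team are hypothetical.

```python
def pythagenpat_wpct(r_per_inn, ra_per_inn, innings, z=0.28):
    """Pythagenpat W% for a game of `innings` innings.

    Runs per game (both teams) = innings * (R/inn + RA/inn);
    exponent = (runs per game) ** z. The R/RA ratio is unaffected by
    using per-inning rates instead of per-game totals.
    """
    rpg = innings * (r_per_inn + ra_per_inn)
    x = rpg ** z
    return r_per_inn ** x / (r_per_inn ** x + ra_per_inn ** x)


def blended_wpct(r_per_inn, ra_per_inn, share_seven, z7=0.28, z9=0.28):
    """Weight seven- and nine-inning estimates by share of scheduled games."""
    w7 = pythagenpat_wpct(r_per_inn, ra_per_inn, 7, z7)
    w9 = pythagenpat_wpct(r_per_inn, ra_per_inn, 9, z9)
    return share_seven * w7 + (1 - share_seven) * w9


# Hypothetical team: .60 R/inning scored, .50 allowed,
# 20% of games scheduled for seven innings
print(round(blended_wpct(0.60, 0.50, 0.20), 3))
```

Even with the same z, the seven-inning estimate sits closer to .500 than the nine-inning one, because fewer runs per game yield a smaller exponent; a separately fitted seven-inning z would refine that effect.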

I also took the unfortunate step of ignoring actual runs everywhere (as I mentioned in passing earlier, Manfred extra innings wreak havoc on relievers’ run averages), since the league averages are polluted by Manfred extra innings. Again, I am not advocating that sabermetric expediency drive the construction of the rules of baseball, but it is a happy coincidence that sabermetric expediency tracks in this case with aesthetic considerations. I should include a caveat about aesthetic considerations being in the eye of the beholder, but the groupthink crowd that is now in the ascendancy rarely sees the need to do so. No surprise, as many also subscribe to the totalitarian thinking that is ascendant in the broader society. They’ll tell you all about it, and about what a terrible person you are if you dissent, for $25.19.