Wednesday, April 28, 2021

Rate Stat Series, pt. 2: PA Generation

This is a little bit of a detour and certainly nothing new (I don’t know who originally laid out this logic/math – the earliest use I’m aware of was in 1960 by D’Esopo & Lefkowitz as part of their Scoring Index model), but I think a discussion of it is appropriate in the context of this series, and I will later make use of these formulas.  It’s also ground I covered in the original series, but I think my explanation this time is slightly more coherent.

Each batting team starts each inning (excluding scenarios where a walkoff is possible) with three plate appearances guaranteed. Thus each team starts each game with twenty-seven plate appearances guaranteed (excluding scenarios where the home team forgoes batting the bottom of the ninth, rainouts, post-2020 doubleheaders, etc.). Any plate appearances beyond that must be earned by batters avoiding outs. Since it’s more natural to think of a positive outcome rather than the avoidance of a negative outcome, I will simplify and say that each extra plate appearance must be earned by a batter reaching base (and not being subsequently retired on the basepaths).

For the sake of discussion (and in keeping with the simple set of statistics being used in the metrics in this series), I’m going to ignore the existence of baserunning outs, including caught stealing, pickoffs, outs stretching, outs advancing, and runners retired on double/triple plays (although not on fielder’s choices, since the batter is charged with an out in that case). I’m going to assume that the out rate is the complement of on base average, which in this series will be defined simply as (H + W)/(AB + W). In reality, considering all the ways in which outs can be made, it would be a more involved equation (I’ve used the acronym NOA for Not Out Average and OA for the complement, Out Average) which would look something like this, although I still don’t think I’ve accounted for every possible event (you try incorporating fielders’ choices without complicating the equation significantly):

NOA = (H + W + HB + CI + ROE – CS – DP – Outs Stretching – Outs Advancing – Pickoffs – 2*TP)/(AB + W + HB + SF + SH + CI)

Alternatively, for a team when LOB data is available (and ignoring the walkoff situation), you could have OA = (Plate Appearances – Runs Scored – Left On Base)/Plate Appearances. All of this is just an attempt to calculate, as best we can from the available statistics we have restricted ourselves to, Outs/Plate Appearances. NOA or OA as appropriate could be substituted for OBA in the equations that follow as long as the appropriate corresponding adjustments are made to the numerator.
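As a small sketch of the bookkeeping above, the Python below encodes the NOA formula and the LOB-based OA exactly as written, alongside the simplified OBA used in this series. The dictionary keys (such as `OutsStretching`) are hypothetical labels chosen for the example, not fields from any particular data source, and the game line in the usage is made up.

```python
def noa(counts: dict[str, int]) -> float:
    """Not Out Average, following the NOA formula above term by term.

    Keys are hypothetical labels chosen for this sketch; any missing
    category is treated as zero.
    """
    def g(key: str) -> int:
        return counts.get(key, 0)

    reached = g("H") + g("W") + g("HB") + g("CI") + g("ROE")
    # Baserunning outs erase times on base; each DP adds one out beyond
    # the batter's, each TP adds two.
    erased = (g("CS") + g("DP") + g("OutsStretching")
              + g("OutsAdvancing") + g("Pickoffs") + 2 * g("TP"))
    chances = g("AB") + g("W") + g("HB") + g("SF") + g("SH") + g("CI")
    return (reached - erased) / chances


def oa_from_lob(pa: int, runs: int, lob: int) -> float:
    """Team Out Average from LOB data, ignoring walkoffs: (PA - R - LOB)/PA."""
    return (pa - runs - lob) / pa


def simple_oba(h: int, w: int, ab: int) -> float:
    """The simplified OBA used throughout this series: (H + W)/(AB + W)."""
    return (h + w) / (ab + w)


# Hypothetical single-game team line: 38 PA, 4 runs, 7 left on base,
# which implies 27 outs in 38 PA.
print(round(oa_from_lob(38, 4, 7), 4))
```

With every extra category set to zero, `noa` collapses to `simple_oba`, which is the simplification this series adopts.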

Let’s assume for the purpose of developing an equation for team plate appearances that the OBA is constant across each of the nine batters in the lineup and doesn’t vary for any other reason (this is obviously never true, but it is a fine simplifying assumption for modeling PA generation). Then a team will start an inning with three plate appearances. For each of those three guaranteed PAs, there is a probability (equal to OBA, given our assumption) that the batter avoids an out (reaches base, given that there are no baserunning outs). This increases the expected number of plate appearances by OBA.

It doesn’t stop there, though. Each additional PA that is generated also has an OBA chance of creating an additional PA, which itself has an OBA chance of creating an additional PA. Thus, for each of the guaranteed PA, the expected number of additional PA generated is:

OBA + OBA*OBA + OBA*OBA*OBA + … = OBA + OBA^2 + OBA^3 + … + OBA^n

which, as n approaches infinity and given that OBA is between 0 and 1 (as it must be by definition, and strictly less than 1 for an inning ever to end), resolves to:

OBA/(1 – OBA)
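A minimal Python sketch of the argument so far, assuming (as above) a constant OBA for every batter and no baserunning outs: it compares partial sums of the series with the closed form, and runs a seeded Monte Carlo inning as a sanity check. The function names are my own; .343 is the 1994 AL figure used in the next paragraph.

```python
import random


def partial_sum(oba: float, n: int) -> float:
    """Extra PA per guaranteed PA: OBA + OBA^2 + ... + OBA^n."""
    return sum(oba ** k for k in range(1, n + 1))


def expected_pa_inning(oba: float, outs: int = 3) -> float:
    """Closed form: each guaranteed PA plus its OBA/(1 - OBA) extras."""
    if not 0 <= oba < 1:
        raise ValueError("OBA must be in [0, 1) for the series to converge")
    return outs / (1 - oba)


def simulate_inning(oba: float, rng: random.Random, outs: int = 3) -> int:
    """One inning under the constant-OBA, no-baserunning-outs assumption."""
    pa = made = 0
    while made < outs:
        pa += 1
        if rng.random() >= oba:  # batter makes an out with probability 1 - OBA
            made += 1
    return pa


if __name__ == "__main__":
    oba = 0.343
    print(partial_sum(oba, 50), oba / (1 - oba))  # partial sum vs. limit
    print(expected_pa_inning(oba))                # 3/(1 - OBA)
    print(9 * expected_pa_inning(oba))            # 27/(1 - OBA)
    rng = random.Random(1994)
    trials = 100_000
    print(sum(simulate_inning(oba, rng) for _ in range(trials)) / trials)
```

The simulated average should hover around 3/(1 – OBA) per inning; it illustrates the model's arithmetic, not real games, so it inherits every simplification above.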

The 1994 AL had an OBA of .343. Thus, each guaranteed plate appearance should have generated .343/(1 - .343) = .522 additional plate appearances. In an average inning, starting with three guaranteed PA, we would expect 3 + 3*.522 = 3*(1 + .522) = 4.566 PA, and thus in a game we would expect 9*4.566 = 41.09 PA. Note that instead of calculating the .522 additional PA, we can simplify this to 3/(1 – OBA) for an inning or 27/(1 – OBA) for a game. In reality there were 39.24 PA per game, so we have an unacceptable 4.7% error. What went wrong?

I mixed definitions of plate appearances and OBA inconsistently, and I also ignored that the three guaranteed PA are equal to the number of outs permitted in the inning. In order to estimate the number of plate appearances per inning or game consistently, we need to divide the average number of outs/game by 1 – OBA:

PA/G = (O/G)/(1 – OBA)

The definition of outs consistent with our simple OBA of (H + W)/(AB + W), whose complement is our out average, is AB – H. In the 1994 AL there were 25.19 outs/game using this definition, so our expected PA/G is:

25.19/(1 - .343) = 38.34

The actual average was 38.35; we’re off only due to rounding, as this is now just a mathematical truism: by our simplified definitions, plate appearances = outs + times on base. Using this equation to estimate team PA/G from their OBA for the 1994 AL, the RMSE is .259, which is about .7% of the average PA/G. We shouldn’t expect perfect accuracy at the team level, since team PA will be affected by different quantities of all the statistical categories we’re ignoring that have an impact on the actual number of PA a team generates, as well as differences in the number of extra-inning games, forgone bottoms of the ninth, and walkoff-shortened innings.
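The corrected estimate is simple enough to sketch in Python. The RMSE helper shows how a team-level comparison like the one above could be computed; I don't reproduce the 1994 AL team data here, so the identity check uses hypothetical totals.

```python
import math


def pa_per_game(outs_per_game: float, oba: float) -> float:
    """PA/G = (O/G)/(1 - OBA), with O = AB - H and OBA = (H + W)/(AB + W)."""
    return outs_per_game / (1 - oba)


def rmse(actual: list[float], predicted: list[float]) -> float:
    """Root mean square error between paired team values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# League-level check with the 1994 AL figures quoted in the text.
print(round(pa_per_game(25.19, 0.343), 2))  # → 38.34

# The identity: with these definitions, outs/(1 - OBA) is exactly AB + W
# (up to floating-point noise).
ab, h, w = 4000, 1100, 400  # hypothetical team totals
oba = (h + w) / (ab + w)
print(pa_per_game(ab - h, oba), ab + w)
```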

The key points to keep in mind as we move forward in discussing rate stats are:

1.      The number of plate appearances a team will get is a function of their out rate, and simplifying terms we can very accurately estimate team PA as a function of on base average

2.      Since players have an impact on the number of plate appearances their team gets, and thus the number of plate appearances they get, a proper rate stat for measuring overall offensive productivity must account for that impact

Thursday, April 15, 2021

Almost Perfect

In my earlier days as a baseball fan, I was really interested in no-hitters, and outside of the Indians winning the World Series, my most fervent desire as a fan was to witness one even if only on the radio. Eventually this faded, due to some combination of growing jaded about the extent to which baseball fans sometimes elevate trivial events above game outcomes, the pernicious influence of Voros McCracken on how I thought about the hits column for pitchers, and after fifteen years of intense baseball-watching finally witnessing one (I'm now up to five).

Perfect games retain a bit more of their mystique for me, due to being much more rare (someone who has watched as many games over the years as I have is bound to have seen a no-hitter, but one can't really expect to see a perfect game) and not relying on any arbitrary distinction between hits and errors (which of course doesn't affect all no-hitters). The two games I had taken in that came closest to being perfect games prior to last night were Mike Mussina against the Indians in 1997 and Armando Galarraga's should-have-been perfect game against the Indians in 2010. The latter game is a case in point of what I meant about fans sometimes being more interested in trivial events than game outcomes - there was more outcry in favor of replay as a result of that game than there was cumulatively from many calls that much more directly influenced which team won a given game.

Last night's effort by Carlos Rodon combined elements of both of the ninth innings of these games in a way that people who believe in hocus pocus should embrace. From Galarraga's, we took the extremely close play at first base, with Josh Naylor playing the role of Jason Donald, desperately trying to reach first after making weak contact towards first base. In this case, the play was actually much closer, but no replay was required as the call on the field was that Jose Abreu beat him to the bag by a narrow margin. 

From the Mussina game, we borrowed the man, lineup slot, and fielding position to break it up. With one out in the ninth, Sandy Alomar, the Indians' catcher, singled off Mussina; Roberto Perez was only hit on the back foot by a slider, but history repeated itself in who ended it. Of course, if Rodon had to lose the perfect game, he got a better outcome than the other two, as he at least got to keep the no-hitter.

Naturally, all of the near-perfect games I've seen have been pitched against the Indians. In addition to the infinitely more important distinction of now having the longest World Series drought, after Joe Musgrove's no-hitter for the Padres, the Indians now have the longest drought between no-hitters, it having been nearly forty years since Len Barker's perfect game.

I was keeping score of the Mussina game and Rodon's effort last night, but not the Galarraga game, which I listened to on the radio while I watched some other game on TV. 

Wednesday, April 14, 2021

Rate Stat Series, pt. 1: Introduction

This blog has existed for sixteen years now, and yet with the exception of some (relatively) recent stuff I’ve written about the Enby distribution for team runs per game and the Cigol approach to estimating team winning percentage from Enby, almost all of the interesting sabermetric work appeared in the blog’s first five years, and most in the first year or two.

There are a number of reasons for that - one is that when I started, I was a college student with a lot more free time on his hands than I have with a 9-5. Related, I was also more eager to spend a lot of time staring at numbers in my free time when I didn’t spend a good portion of my day staring at numbers. Remember the Bill James line about how a column of numbers that would put an actuary to sleep can be made to dance if you put Bombo Rivera’s picture on the flip side of the card? Sometimes the numbers do indeed dance, but the actuary in question would rather watch a ballgame or read about the Battle of Gravelines than manipulate them in the evening, dancing or no.

More generally, there has been much less to investigate in the area of sabermetrics that I primarily practice, which I will call for lack of a better term “classical sabermetrics”. I would define classical sabermetrics as sabermetric study which is primarily focused on game-level (or higher, e.g. season, player career, etc.) data that relates to baseball outcomes on the field (e.g. hits, walks, runs scored, wins). Classical sabermetrics is/was the primary field of inquiry of those I have previously called first or second-generation sabermetricians.

Classical sabermetrics is not dead, but to date the last great achievement of the field was turned in by Voros McCracken when he developed DIPS. I’m not arrogant enough to declare that nothing more will ever be found in the classical field, and there is still much work to be done, but at least as far as I can see, it is highly likely that it will consist of tinkering and incrementally improving work that has already been done, and probably with little impact on the practical implementation of sabermetric ideas. For example, I still would love to find a modification to Pythagenpat that works better for 2 RPG environments, or a different run estimator construct that would preserve the good properties of Base Runs while better handling teams that hit tons of triples. All of this is quite theoretical, and of no practical value to someone who is attempting to run the Pirates.

Which increasingly is what sabermetric practitioners are attempting to do, whether directly through employment by major league teams, or indirectly through publishing post-classical sabermetric research in the public sphere. Let me be very clear: this is not in any way a lament for a simpler, purer time in the past. I think it’s wonderful that sabermetric analysis has transcended the constraints of the data used in its classical practice and is exerting an influence on the game on the field.

Notwithstanding, I am still a classical sabermetrician, not because I don’t value the insight provided by post-classical sabermetrics but because I don’t have some combination of the skillset or the way of thinking or the resources or the drive to become proficient enough in newer techniques to offer anything of value in that space. Thus it is natural that I have less to share here.

The topic that I am embarking on discussing is squarely in the realm of “quite theoretical and of no practical value to someone who is attempting to run the Pirates”. About fifteen years ago, I started writing a “Rate Stat Series”, and aborted it somewhere in the middle. I have stated several times that I intend to revisit it, but until now have not. The Rate Stat Series was and now is intended to be a discussion of how best to express a batter’s overall productivity in a single rate stat. I should note three things that it is not:

1. The discussion is strictly limited to the construction of a rate stat measuring overall offensive productivity, not a subset thereof. I am not suggesting that if you are measuring a batter’s walk rate, strikeout rate, ground-rule double rate, or any other component rate you can dream up, that you should follow the conclusions here. For most general applications, plate appearances makes perfect sense as the denominator for a rate for any of those quantities. There may be reasons to follow a sort of decision tree approach that results in different denominators for some applications (McCracken was an innovator in this approach, in DIPS and park factors). All of that is well and good and completely outside the scope of this series.

2. The premise presupposes that the unit of measurement of a batter’s productivity has already been converted to a run-basis. Thus it is not a question of OPS v. OTS v. OPS+ v. 1.8*OBA + SLG v. wOBA v. EqA v. TAv v. whatever, but rather what the denominator for a batter’s estimated run contribution should be. The obvious choices are outs and plate appearances, but there are other possibilities. Spoiler alert: My answer is “it depends”.

3. Revolutionary, groundbreaking, or any other similar adjective. I’m attempting to describe my thoughts on methods that already exist and were created by other people in a coherent, unified format.

In sitting down to write this, I realized I made two fundamental mistakes in my first attempt:

1. I was attempting to “prove” my preferences mathematically, which is not a bad thing in theory, but some of what I was doing begged the question and some of this discussion is of a theoretical nature that lends itself more to logical reasoning/“proofs” than to mathematical “proofs”. I’ve tried to anchor my conclusions in math, logic, and reason where possible, but have also embraced that some of it is subjective and must be so.

2. I posted pieces before I finished writing the whole thing, or even knowing exactly where it was going.

These are rectified in this attempt – all of my assertions are wildly unsupported and as I hit post, all planned installments exist in at least a detailed outline form. While I have attempted to avoid the two mistakes I identified in the previous series, as I look at this series in full I can see I may have just replaced them with two characteristics that will make reading this a real chore:

1. I’m overly wordy, repeating myself a lot and trying to be way too precise in my language (although I fear not as precise as the topic demands). There’s a lot of jargon in an attempt to delineate between the various concepts and methodological choices.

2. There’s way too much algebra; where possible, I didn’t want to just assert that mathematical operations resolved in a certain way and give an empirical example that backs me up, so there are a lot of “proofs” that will be of no general interest.

Allow me to close by laying some groundwork for future posts. I am going to use the 1994 AL as a reference point, and when I use examples they will generally be drawn from this league-season. Why have I chosen the 1994 AL?

1. 1994 was the year I became a baseball fan, and I was primarily focused on the AL at that time, so it is nostalgic. I have not turned into a get off my lawn type who thinks that baseball reached its zenith in 1994 and it’s all been downhill since, but I do think that about 1994 Topps, the greatest baseball card set of all-time.

2. As the year in which the “silly ball era” really broke out, and due to the strike shortening the season, there are some fairly extreme performances that are useful when talking about the differences between rate stat approaches.

As discussed, this series starts from the premise that a batter’s contribution is measured in terms of runs, and works from there. This approach does not require the use of any particular run estimator, although one of my assertions is that the choice of run estimator and the choice of rate/denominator for the rate are logically linked. There are three types of run estimators that I will use in the series: a dynamic model, a linear model, and a hybrid theoretical team model.

In order to avoid differences in the run estimator(s) used unduly influencing differences in the resulting rate stats, I am going to anchor a set of internally consistent run estimators in the reference period of the 1994 AL. It will come as no surprise if you’ve read anything I’ve written about run estimators in the past that I am using Base Runs for this job. The point of this series is not to tell you which particular run estimator to use or how to construct it. It really doesn’t matter which version of Base Runs I use (if you are still stuck on Runs Created, there’s no judgment from this corner, at least for the duration of this discussion), or which categories I include in the formula – this is about the conceptual issues regarding the rate that you calculate after estimating the batter’s run contribution, so I am keeping it very simple, looking just at hits, walks, and at bats (thus defining outs as at bats minus hits) and ignoring steals/caught stealing, hit batters, intentional walks, sacrifices, etc. Since I’m doing this with the run estimator, I will also do it with most other statistics I cite – for example, throughout this series OBA will be (H + W)/(AB + W), and PA will just be AB + W.

A version of Base Runs I have used is below. It’s not perfect by any means; it overvalues extra base hits as we’ll see below, but again, the specific estimator is for example only in this series – the thinking behind constructing the resulting rates is what we’re after:

A = H + W – HR

B = (2TB - H – 4HR + .05W)*.78

C = AB – H

D = HR

BsR = (A*B)/(B + C) + D
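A minimal Python version of this Base Runs formula, assuming a batting line that carries only the categories used in this series. The `BattingLine` class and its field names are my own invention for illustration; singles are implied as H minus the extra-base hits, and the example line is hypothetical rather than a real player.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Only the categories this series uses; singles are implied by H."""
    ab: int
    h: int
    doubles: int
    triples: int
    hr: int
    w: int

    @property
    def tb(self) -> int:
        singles = self.h - self.doubles - self.triples - self.hr
        return singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr


def base_runs(x: BattingLine, b_mult: float = 0.78) -> float:
    """The simple Base Runs version above: A*B/(B + C) + D."""
    a = x.h + x.w - x.hr                                   # baserunners
    b = (2 * x.tb - x.h - 4 * x.hr + 0.05 * x.w) * b_mult  # advancement
    c = x.ab - x.h                                         # outs
    d = x.hr                                               # automatic runs
    return a * b / (b + c) + d


# A hypothetical batting line, not a real player.
line = BattingLine(ab=600, h=180, doubles=30, triples=5, hr=25, w=60)
print(line.tb, round(base_runs(line), 1))
```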

Typically, any reconciliation of Base Runs to a desired estimate of runs scored for an entity like a league is done using the B factor, since it is already something of a balancing factor in the formula, representing the somewhat nebulous concept of “advancement” while the other components (A = baserunners, C = outs, D = automatic runs) represent much more tightly defined quantities. In order to force the Base Runs estimate for the 1994 AL to equal the actual number of runs scored, you need to replace the .78 multiplier with .79776, which can be determined by first calculating the needed B value, obtained by setting (A*B)/(B + C) + D equal to R and solving for B (where R is the actual runs scored total):

Needed B = (R – D)*C/(A – R + D)

Divide this by (2TB – H – 4HR + .05W) and you get a .79776 multiplier. I usually don’t force the estimated runs equal to the actual runs, but for this series, I want to be internally consistent between all of the estimators and also be able to write formulas using league runs rather than having to worry about any discrepancies between league runs and estimated runs.
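The calibration can be sketched the same way in Python: solve for the needed B, then divide by the raw B expression to get the multiplier. The league totals in the usage are hypothetical (I am not reproducing the actual 1994 AL totals), so the point is only that the recalibrated BsR returns R exactly.

```python
def needed_b(a: float, c: float, d: float, runs: float) -> float:
    """B that solves (A*B)/(B + C) + D = R; requires D < R < A + D."""
    return (runs - d) * c / (a - runs + d)


def calibrated_multiplier(ab, h, tb, hr, w, runs) -> float:
    """Replacement for the .78 multiplier that makes league BsR equal R."""
    a = h + w - hr
    c = ab - h
    raw_b = 2 * tb - h - 4 * hr + 0.05 * w
    return needed_b(a, c, hr, runs) / raw_b


def bsr(ab, h, tb, hr, w, mult) -> float:
    """Base Runs with an arbitrary B multiplier."""
    a, c, d = h + w - hr, ab - h, hr
    b = (2 * tb - h - 4 * hr + 0.05 * w) * mult
    return a * b / (b + c) + d


# Hypothetical league totals, used only to show the mechanics.
totals = dict(ab=5000, h=1350, tb=2100, hr=150, w=500)
m = calibrated_multiplier(**totals, runs=700)
print(round(m, 5), bsr(**totals, mult=m))
```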

So our dynamic run estimator (BsR) used throughout this series will be:

A = H + W – HR = S + D + T + W

B = (2TB - H – 4HR + .05W)*.79776 = .7978S + 2.3933D + 3.9888T + 2.3933HR + .0399W

C = AB – H = Outs

D = HR

BsR = (A*B)/(B + C) + D
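To confirm where the per-event B coefficients come from, this Python sketch applies the factor definitions to a single occurrence of each event at the .79776 multiplier. The `EVENTS` table layout is an assumption made for the example; printing with four decimals should reproduce the B coefficients listed above.

```python
MULT = 0.79776  # calibrated 1994 AL multiplier from the text


def event_factors(tb: int, h: int, hr: int, w: int, ab: int,
                  mult: float = MULT) -> tuple[int, float, int, int]:
    """Contribution (a, b, c, d) of one occurrence of an event to A, B, C, D."""
    a = h + w - hr
    b = (2 * tb - h - 4 * hr + 0.05 * w) * mult
    c = ab - h
    return a, b, c, hr


# (TB, H, HR, W, AB) for one occurrence; "D" here means doubles, as in the text.
EVENTS = {
    "S": (1, 1, 0, 0, 1),
    "D": (2, 1, 0, 0, 1),
    "T": (3, 1, 0, 0, 1),
    "HR": (4, 1, 1, 0, 1),
    "W": (0, 0, 0, 1, 0),
    "Out": (0, 0, 0, 0, 1),
}

for name, stats in EVENTS.items():
    a, b, c, d = event_factors(*stats)
    print(f"{name:>3}: a={a} b={b:.4f} c={c} d={d}")
```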

To be consistent, I will also use the intrinsic linear weights for the 1994 AL that are derived from this BsR equation as the linear weights run estimator. The intrinsic linear weights are derived through partial differentiation of BsR with respect to each component. If we define A, B, C, and D to be the league totals of those, and a, b, c, and d to be the coefficient for a given event in each of the A, B, C, and D factors respectively, then the linear weight of a given event is calculated as:

LW = ((B + C)*(A*b + B*a) – A*B*(b + c))/(B + C)^2 + d

For the 1994 AL, this results in the equation, where RC is to denote absolute runs created:

LW_RC = .5069S + .8382D + 1.1695T + 1.4970HR + .3495W - .1076(outs)
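Here is a hedged Python sketch of the partial-derivative formula, reusing the per-event (a, b, c, d) coefficients from the calibrated BsR. Since I am not reproducing the actual 1994 AL totals, the league counts below are hypothetical, so the printed weights will not match LW_RC; the mechanics are what the example is meant to show.

```python
MULT = 0.79776  # calibrated 1994 AL multiplier from the text

# (TB, H, HR, W, AB) for one occurrence of each event; "D" means doubles.
EVENTS = {
    "S": (1, 1, 0, 0, 1), "D": (2, 1, 0, 0, 1), "T": (3, 1, 0, 0, 1),
    "HR": (4, 1, 1, 0, 1), "W": (0, 0, 0, 1, 0), "Out": (0, 0, 0, 0, 1),
}


def factors(tb, h, hr, w, ab, mult=MULT):
    """Base Runs factors (A, B, C, D) for any line, or (a, b, c, d) for one event."""
    return h + w - hr, (2 * tb - h - 4 * hr + 0.05 * w) * mult, ab - h, hr


def intrinsic_lw(A, B, C, a, b, c, d):
    """Partial derivative of A*B/(B + C) + D for an event with coefficients a, b, c, d."""
    return ((B + C) * (A * b + B * a) - A * B * (b + c)) / (B + C) ** 2 + d


def league_weights(counts, mult=MULT):
    """Intrinsic LW for every event, evaluated at the league's A, B, C totals."""
    totals = [sum(counts[e] * EVENTS[e][i] for e in EVENTS) for i in range(5)]
    A, B, C, _ = factors(*totals, mult=mult)
    return {e: intrinsic_lw(A, B, C, *factors(*EVENTS[e], mult=mult)) for e in EVENTS}


# Hypothetical league counts (not the actual 1994 AL totals).
counts = {"S": 1000, "D": 200, "T": 25, "HR": 150, "W": 500, "Out": 3650}
weights = league_weights(counts)
for e, v in weights.items():
    print(f"{e:>3}: {v:+.4f}")
```

Because A*B/(B + C) + D is homogeneous of degree one in A, B, C, and D, multiplying each intrinsic weight by its league count and summing returns the league BsR exactly, which makes a handy check on any implementation.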

We will also need a version of LW expressed in the classic Pete Palmer style to produce runs above average rather than absolute runs. That’s just a simple algebra problem to solve for the out value needed to bring the league total to zero, which results in:

LW_RAA = .5069S + .8382D + 1.1695T + 1.4970HR + .3495W - .3150(outs)
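The algebra problem amounts to dividing the league's positive linear weights total by league outs and flipping the sign. A short Python sketch using the LW_RC coefficients from above, applied to hypothetical league counts (so the printed value is not the 1994 AL's -.3150). For the 1994 AL itself, where the intrinsic weights sum back to the league BsR (forced to equal R), this is equivalent to subtracting league runs per out from the LW_RC out value.

```python
def raa_out_value(weights: dict[str, float], counts: dict[str, int]) -> float:
    """Out value that brings the league's linear weights total to zero.

    weights: absolute-run (RC) weights per event, including "Out".
    counts: league totals per event, with "Out" = AB - H.
    """
    positive = sum(weights[e] * counts[e] for e in weights if e != "Out")
    return -positive / counts["Out"]


lw_rc = {"S": 0.5069, "D": 0.8382, "T": 1.1695, "HR": 1.4970,
         "W": 0.3495, "Out": -0.1076}
# Hypothetical league counts, used only to show the mechanics.
counts = {"S": 1000, "D": 200, "T": 25, "HR": 150, "W": 500, "Out": 3650}
print(round(raa_out_value(lw_rc, counts), 4))
```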

I am ignoring any questions about what the appropriate baseline for valuing individual offensive performance is. Regardless of where you side between replacement level, average, and other less common approaches, I hope you will agree that average is a good starting point which can usually be converted to an alternative baseline much more easily than if you start with an alternative baseline. Average is also the natural starting point for linear weights analysis since the empirical technique of calculating linear weights based on average changes in average run expectancy is by definition going to produce an estimate of runs above average.

Later we will also have some “theoretical team” run estimators built off this same foundation, but discussion of them will fit better when discussing that concept in greater detail.

I will also be ignoring park factors and the question of context in this series (at least until the very end, where I will circle back to context). Since I am narrowly focused on the construction of the final rate stat, rather than a full-blown implementation of a rating system for players, park factors can be ignored. Since I am anchoring everything in the 1994 AL, the context of the league run environment can also be ignored since it will be equal for all players once we ignore park factors.

Thursday, April 01, 2021

Give Us This Day Our Daily Ball

Rob Manfred, who art Commissioner

Halloweth be our game

Thy rule changes be undone, thy no longer assault fun

In 2022 as it was in 2002

Give us this day our daily ball

And reconcile with Tony Clark as we reconcile to runners on in extra innings

And lead us not into strike or lockout

And deliver us from pitchers hitting

For thine is the office and the power and the responsibility until 2024

Play ball

Tuesday, March 30, 2021

2021 Predictions

I’m not telling you anything you don’t already know, but the 2021 season will be the hardest to forecast in recent times. While there weren’t people doing systematic forecasts as we would recognize them in the sabermetric era at the time, the last season that I believe would have posed a greater challenge to forecasters was the 1946 season, in which so many players returned from military service. 2021 is hard to predict because we only had a sixty-game season on which to judge players’ current performance, and because there was no minor league season at all. The only season that would have been tougher to predict would have been 1995, if any poor sap had attempted that using the rosters as they stood prior to the labor ceasefire.

Of course, this does not pose any particular challenge to me in writing this, because I don’t do a systematic forecast of my own. I usually use publicly available player projections as a starting point, making my own seat-of-the-pants adjustments for performance and playing time; because of the additional inaccuracy inherent to such an exercise in trying to predict 2021, I have eschewed that and just used team-level projections as a starting point. Since this is an exercise in fun (baseball is supposed to be fun) and not a serious sabermetric endeavor, cutting out the trappings of formal analysis will not harm it – you can’t go any lower.

AL East

1. New York

2. Tampa Bay (wildcard)

3. Toronto

4. Boston

5. Baltimore

I’ve been cooler on this generation of Yankees contenders than perhaps I should have been, but they’ve always seemed to rely on paper-thin rotations and relatively fragile offensive stars. They’ve also had strong divisional challenges from the Red Sox and the Rays, although rarely simultaneously. This year, it’s hard to develop a compelling case that they aren’t the best team in the AL on paper, as the concentration of top teams seems to have swung to the NL. Their rotation remains dependent on fragile pitchers, but which AL team’s isn’t? The Rays remain the pick for second over the Blue Jays here, as I think it’s easy to underestimate how big the gap between them was last year. The Red Sox contending really would not surprise me, although no one would deserve it less than their entitled fans who would rather ignore Alex Verdugo’s existence than consider that maybe trading a player with one year of control left might make baseball sense.

AL Central

1. Minnesota

2. Chicago

3. Cleveland

4. Kansas City

5. Detroit

I was all set to pick the White Sox, then I looked more closely at the numbers and concluded that the Twins were a slightly better bet – even before Eloy Jimenez was injured. I have actually consumed more spring training baseball this year due to the circumstances of the times than ever before, and this has caused my opinion of the Indians’ chances to plummet, which may well be an overreaction. This team will need its pitching to be a strength, and it’s easy to be glib and say that the rotation is strong. Upon introspection, it may dawn on you as it did me that they have precisely one starter who has completed an entire MLB season in a rotation. I can’t recall a past Indians bullpen that will rely so heavily on back-end arms with good stuff but questionable control, and I actually think Phil Maton may be their best reliever. The offense remains cursed by the franchise’s inability to produce cheap corner bats who can contribute anything. Tribe fans and radio play-by-play announcers alike are contemptuous of the decision to clutch onto Jake Bauers and start him at first rather than put him through waivers, but as uninspiring as Bauers’ past two seasons have been (yes, he didn’t play last year but if you can function as a major league left fielder and were not called up to the horror show that was the 2020 Cleveland outfield, it speaks volumes), Bobby Bradley is not exactly Andrew Vaughn as a first base prospect. A Ben Gamel/Amed Rosario center field platoon? A pair of lumbering Padres castoffs being counted on as key cogs in the offense? And history repeats itself again with the farm system, as through development and trades the Indians have built up a fine collection of middle infield prospects (Andres Gimenez, Gabriel Arias, Owen Miller, Tyler Freeman, Brayan Rocchio) but corner bats remain elusive (a lot rides on Nolan Jones). It’s better than the opposite problem, I suppose, but oddly more frustrating as a fan.

The Royals think this is their year because they always think this is their year; I think the gap between them and the Indians is pretty narrow but that doesn’t make you a contender. Will the media en masse ever consider AJ Hinch as a possible feel-good story? I would guess not, but what do they know?

AL West

1. Houston

2. Los Angeles (wildcard)

3. Oakland

4. Seattle

5. Texas

Last year, the Astros were both the team hurt most by the sixty-game season and the team helped most by the expanded playoffs, where they provided evidence that they actually were still a decent team. The starting pitching is scary, the offense is weaker and more fragile than before, but they still stand out in this group of teams. I’ve picked the Angels as a wildcard many times during their upstream swims attempting to get back to Mike Trout’s natural habitat; I’ll probably regret it again, but the ChiSox are no sure bet and the East teams will have a tough schedule to overcome. Perhaps their biggest threat will come from the A’s, still a team that could have a scary rotation (and may have the other kind of scary rotation due to the unreliability of Manaea, Puk, Luzardo, etc.) and a couple of offensive stars. This division appears to be the epicenter of explicit six-man rotations, with the Angels and the Mariners. Do you think announcers will make a big deal of saying things like: “This is the first time a player has done X in Globe Life Field with fans in the ballpark for a Rangers game?” It seems like a preposterous suggestion but is it really that much more ridiculous than “The Red Sox haven’t won a World Series AT HOME since…”?

NL East

1. New York

2. Atlanta (wildcard)

3. Washington

4. Philadelphia

5. Miami

The 2020 season should have been hard to predict, although for different reasons than 2021. There was no issue with data – the same level of historical statistics was available for 2020 as for prior seasons. The issue of course was that a sixty-game season was subject to a higher degree of variance from expectation than a 162 game season is.

Yet something interesting happened – I did better on my predictions than I ever had before. This was not due to some special insight on my part – I thought the picks I made were pretty obvious and pretty chalky. One of the most interesting things about the 2020 season is how few flukes there were on the team level (of course, this arrogantly assumes that my assumptions were correct – one must acknowledge the possibility that the sixty-game season enabled my poor predictions to appear more accurate than they actually were).

In any event, I was right on five of six division winners, both pennant winners, and the identity of the world champion. I bring this up here because this is the one division I missed on – I picked the Mets, and I’m going to double down.

This division also features the team that I think is most likely to disappoint – or at least certainly would have been, before sabermetric thinking became widely diffused. The Marlins had horrible component statistics last year, should have been bad on paper, look like they are bad on paper again, but made it into the playoffs with a team with a reasonable number of young players, particularly on the mound. It’s exactly the kind of team that it would seem reasonable to think had a breakthrough if you weren’t wise to the fine print.

This is the consensus toughest division, and I don’t disagree – the top four are all real contenders. If you’re a fan of the deserved family of metrics from Baseball Prospectus, bet hard on the Phillies.   

NL Central

1. Milwaukee

2. St. Louis

3. Chicago

4. Cincinnati

5. Pittsburgh

This is the consensus weakest division, and again I concur, although I think the Brewers are very interesting with a high-upside collection of pitchers. The Cardinals are getting a lot of buzz for acquiring Nolan Arenado, and I don’t see any reason he wouldn’t bounce back to something resembling his prior form, but in terms of helping them in 2021, I think there were a number of positions where an upgrade would have fit better with the current roster. The Cubs are probably being underrated due to the revulsion to a team that’s been a contender for the past six years seeming to enter a retrenchment, but the offense could still be a force. As shifts have come to the fore, we’ve seen a blurring of the line between second and third basemen, with Milwaukee’s usage of Mike Moustakas as one of the harbingers. Moustakas’ current team is also involved in some interesting infield moves, but bringing back the Howard-Johnson-as-a-shortstop strategy is considerably bolder than swapping the Moustakases and Travis Shaws of the world between second and third.

NL West

1. Los Angeles

2. San Diego (wildcard)

3. Arizona

4. San Francisco

5. Colorado

There’s not much to be said about the Dodgers – they are the model franchise of the day, arguably the model franchise of the entire free agency era. It was good to see them finally get a World Series trophy, but frankly they deserve more. They would be more than worthy of being the first repeat champions in the last two decades. The Padres are fascinating in their own right, but likely doomed to a one-game playoff no matter how much they invest in their roster. One interesting thing about the eight-team playoff structure used in 2020 is that the presumed #1 wildcard team is the only team that would have qualified for the playoffs under the old system that clearly sees its chances of winning the World Series increase as a result. The division winners all have to play a three-game series to advance under the 2020 system, clearly worse than an automatic berth in the LDS (if there is a dominant #1 like the Dodgers, the #2 and #3 teams do benefit from a higher likelihood that the #1 gets taken out before a potential LCS matchup, but that is not enough to offset having to play a three-game series against a competitive opponent). The #5 team would rather be in a one-game playoff with the #4 team than in a three-game series, assuming that the teams’ regular season records are indicative of their true strength. Of course the #6-#8 teams benefit. Last year San Diego lost the first game of their series with St. Louis; a one-game playoff with Atlanta (as I’m predicting) is a poor reward for all of that investment. The Diamondbacks, Giants, and Rockies are all in the terrible position of being older than you would think (especially San Francisco) and in a division with powerhouses that look to be set up for a few years at least.


Los Angeles (N) over New York (A)

Wednesday, March 17, 2021

Subtweeting Without Twitter, Vol. I

Using a positional adjustment as part of a total value metric (WAR, VORP, etc.) doesn't imply that players can be freely interchanged across positions any more than noting that a pizza and a t-shirt both cost $15 implies an assertion that one can wear a pizza or eat a t-shirt.

Saturday, March 06, 2021


It’s understandable that the editing process for Baseball Prospectus 2021 overlooked something trivial like explaining what a metric in the team prospectus box means. After all, it must have been exhausting work to ensure that each of the many political non-sequiturs in the book was on message (Status: success! You can give this book to your children to read with confidence that they are in a safe space, with no deviation from the blessed orthodoxy). The vital imperative of ideological conformity handled, they would have needed next to run a fine-tooth comb over any reference to the aesthetics of present day MLB on-field play to ensure the proper level of smug conflation of one’s own preferences with the perfect ideal. Another success. Finally, they could turn their attention to making sure there were the requisite number of sneering statements about the fact that there even was an MLB season in 2020.  As always, left unaddressed was how a publication that exists (in theory at least – reading the 2021 annual, this may be a fatally flawed assumption on my part) to analyze professional baseball could continue to exist if professional baseball ceased to exist, but who knows? When you toe the line so perfectly, maybe you can figure out a way to get in some of that sweet $1.9 trillion.

So it is entirely understandable that a publication rooted in statistical analysis could completely overlook such a triviality as explaining a metric that none of its writers ever bother to refer to anyway. The metric in question is called “dWin%”. It didn’t replace any team metric that was listed in the 2020 edition – it literally fills in a blank space in the right data column. A search of the terms “dWin%” and “Deserved Winning Percentage” on the BP website doesn’t yield any obvious relevant hits (non-paywalled, at least). So the best I can do is make an educated guess about what this metric is.

I gave away my guess by searching for “Deserved Winning Percentage”. BP has adopted a family of metrics with the “Deserved” prefix which utilize Jonathan Judge’s mixed model methodology to adjust for all manner of effects (going well beyond the staples of traditional sabermetrics like league run environment and park). The team prospectus box lists “DRC+” and “DRA-”, which are the DRC metric for hitters and DRA for pitchers indexed to the league average. So it’s only natural to assume that dWin% is some type of combination of these two to yield a team’s “deserved” winning percentage.

It’s also natural to assume that there would be a close relationship between DRC+, DRA-, and dWin%. The first two are in essence run ratios (with myriad adjustments, of course, but essentially an estimate of the percentage difference between a team’s deserved rate of runs scored or allowed and the league average). If we were in the realm of actual runs scored and allowed, or runs created and runs created allowed, we could confidently state that one powerful way to express the relationship would be a Pythagorean approach: the square of the ratio of DRC+ to DRA- should be close to the ratio of dWin% to its complement.
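A minimal sketch of that Pythagorean relationship, assuming DRC+ and DRA- can be treated directly as run ratios and using the exponent of 2 described above. The function name is hypothetical, and like the comparison in the text it ignores fielding entirely. Plugging in the A’s figures discussed later in the post (103 DRC+, 98 DRA-) gives roughly .525, against the .499 dWin% BP lists.

```python
def pythagorean_dwin(drc_plus, dra_minus, exponent=2.0):
    """Hypothetical W% implied by treating DRC+/DRA- as a run ratio.

    odds = (DRC+ / DRA-) ** exponent, and W% = odds / (1 + odds),
    i.e. W% / (1 - W%) equals the ratio raised to the exponent.
    """
    odds = (drc_plus / dra_minus) ** exponent
    return odds / (1.0 + odds)


# The A's figures quoted in the post: 103 DRC+, 98 DRA-.
print(round(pythagorean_dwin(103, 98), 3))  # 0.525
```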

There are two obvious caveats to throw on this conclusion:

1) While the statistical introduction does not specifically refer to DRA- (it refers just to DRA, which was listed for teams rather than DRA- in the 2020 edition), it’s reasonable to assume that DRA- is the indexed version of DRA. DRA is a pitching metric, which attempts to state a pitcher’s deserved runs allowed after removing the impact of the defense that supports him. This means that comparing the ratio of DRC+ and DRA- on the team level is likely ignoring fielding, and thus the relationship I’ve posited above would be incomplete. To be clear, this is not the fault of BP, except to the extent that we are left to speculate about the meaning of these metrics, as there’s certainly nothing wrong with having a measure that attempts to isolate the performance of a team’s pitching staff.

2) It is possible that there is something else going on besides fielding in the process of developing the Deserved family of metrics that would invalidate this manner of combining the offensive and pitching components. Without being privy to the full nature of the adjustments made in these metrics, it’s hard to speculate on what if anything that might be, but I would be remiss in not raising the possibility that there’s something going on behind the curtain or that I have simply overlooked.

I’m not going to run a chart of all of the team values, because that would be infringing on BP’s property rights, and given the first paragraph of this post that would be practically unwise even if it were not morally objectionable. A few summary points provide defensible ground:

1) the average of the team DRC+s listed in the annual is 99.3 and the average of DRA-s is 99.5. Given that the figures are rounded to the nearest whole number (e.g. 99 = 99%), this is encouraging as we would expect the league average to be 100.

2) the average of the team dWin%s is .464. Less encouraging. As I was reading through the book, there were two team figures that really caught my eye and led me to this more formal examination. The first was Philadelphia, which had a dWin% of .580, ranking second in MLB. Their DRA- was 83, also second.

The Deserved family of metrics has always produced some eyebrow-raising results, which are difficult to evaluate objectively given the somewhat black box nature of the metrics and the complexity of the mathematical approach involved (I will be the first to admit that “mixed models” of the kind described are beyond my own mathematical toolkit). So it’s dangerous to focus too much on any particular result, as it may just be a vehicle by which to expose one’s own ignorance. As a second-generation sabermetrician, this is a particular nightmare: becoming the sportswriter you laughed at as a twelve-year-old for dismissing RC/27 as impossibly complex and unintelligible.

Still, it is quite remarkable that the team which allowed the second-most park-adjusted runs per inning in the majors might actually have turned in the second-best performance. In fairness, it was a sixty-game season, so the deviation between underlying quality of performance and actual outcome could be enormous, and the East could have been the toughest of the three sub-leagues, especially in terms of balance as the Dodgers tip the scales West. Most significantly, DRA- is just a pitching metric, and the Phillies’ defense was dreadful at turning balls in play into outs – they were last in the majors in DER at .619. Boston was at .623 and the next worst team was Washington at .642. Further, the East sub-league combined for a .657 DER (the fourth-worst DER belonged to the Mets, and Toronto and Miami made it six of the bottom ten) compared to .685 for the Central and .684 for the West. It’s still hard to believe that the Phillies’ pitchers deserved to have the second-fewest runs allowed in the majors, but easy to buy that they performed much, much better than their runs allowed would suggest.

However, every factor that would explain how their pitching was actually second-best does nothing to explain how their overall deserved team performance was also second-best. Adjusting away terrible defensive support doesn’t mean that the team’s poor runs allowed weren’t deserved, it just means that the blame should be pinned on the fielders and not the pitchers. Again, it’s hard to pinpoint any exact criticism given the nature of the metrics, but this one is tough to accept at face value.

It also seems that if one had conviction in the result, it would show up in the narrative somewhere. There’s always been a disconnect between what BP statistics say and what their authors write, which owes partly to the ensemble approach to writing and presumably partly to the timing (the authors of team chapters probably start very soon after the season and without the benefit of the full spread of data that will appear in the book). Still, it seems as if this disconnect has increased with the advent of the deserved metrics, which often tell a very different story than even the mainstream traditional sabermetric tools (e.g. an EqA or a FIP, to refer to metrics previously embraced by BP). But I can assure you that if I believed the Phillies’ underlying performance as a team was actually second only to the Dodgers, I’d work that into any retrospective of their 2020 performance and forecast of their 2021.

The second team that caught my eye was the A’s, who posted a 103 DRC+, 98 DRA-, and .499 dWin%. The obvious disconnect between an above-average offense, above-average pitching, but sub-.500 deserved W% could be explained by defense. What can’t be explained is how a .499 dWin% ranks ninth in the majors, at least until you line up the thirty teams and see that the average is .464. While we can charitably assume that a combination of our own ignorance and the proprietary nature of the calculations can explain many odd results from the deserved stats, I don’t know what can satisfactorily explain a W% metric that averages to .464 for the whole league.

The hope is that this is simply some scalar error, a fudge factor not applied somewhere. There is some evidence that this is the case – if you take the ratio of DRC+ to DRA- and plot it against the ratio of dWin% to (1 – dWin%), you get a correlation of +0.974 and a pretty straight line, as you would expect given what should be in the vicinity of a Pythagorean relationship. It might even work out as you’d expect if dWin% is baking in fielding.
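One way to probe the scalar-error hypothesis, sketched here as an assumption rather than anything BP documents: if dWin% odds equal scale × (DRC+/DRA-)^k, then a straight-line fit of log(dWin% odds) on log(DRC+/DRA-) recovers k as the slope and the fudge factor as exp(intercept). A scale well below 1 with a sensible exponent would point to a missing adjustment rather than a broken model. Note that this correlation is computed on the log scale, not on the raw ratios used for the +0.974 figure above. The demo league is entirely synthetic; 0.866 is chosen only because it is roughly the odds implied by a .464 average (.464/.536).

```python
import math


def fit_log_odds(drc_plus, dra_minus, dwin):
    """Hypothetical check: regress log(dWin% odds) on log(DRC+/DRA-).

    Returns (k, scale, r): the fitted exponent, the multiplicative
    fudge factor exp(intercept), and the Pearson correlation, all
    computed on the log scale.
    """
    xs = [math.log(c / a) for c, a in zip(drc_plus, dra_minus)]
    ys = [math.log(w / (1.0 - w)) for w in dwin]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    k = sxy / sxx                   # slope: estimated Pythagorean exponent
    scale = math.exp(my - k * mx)   # exp(intercept): estimated fudge factor
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation of the logs
    return k, scale, r


# Synthetic (hypothetical) league: exponent 2 with a 0.866 fudge factor.
demo_drc = [90, 95, 100, 105, 110]
demo_dra = [105, 100, 100, 95, 92]
demo_odds = [0.866 * (c / a) ** 2 for c, a in zip(demo_drc, demo_dra)]
demo_dwin = [o / (1.0 + o) for o in demo_odds]
k, scale, r = fit_log_odds(demo_drc, demo_dra, demo_dwin)
print(round(k, 3), round(scale, 3), round(r, 3))  # 2.0 0.866 1.0
```

With real team data the fit would not be exact; the interesting output is whether the recovered scale sits near 1.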

Still, it’s disappointing that the question has to be asked.

Wednesday, March 03, 2021

Rob Manfred: Run Killer

There are many “crimes against baseball” that one could charge Rob Manfred with, if one were inclined to use hyperbolic language and pretend that the commissioner had the sole authority to decide matters (I tend to do neither but am guilty of seeking a more eye-catching post title):

* Attacking the best player in his sport for not going along with whatever horrible promotional scheme the commissioner had dreamed up

* Making a general mess of negotiations with the MLBPA

* Teaming up with authoritarian governments ranging from cities in Arizona to Leviathan itself to attempt to delay or prevent baseball from being played

* Claiming to be open to every harebrained scheme to rein in shifts, home runs, strikeouts, or whatever the current groupthink of the aesthetically-offended crowd finds most troublesome

From my selfish perspective as a sabermetrician, though, I will argue that the greatest crime of all is that he has rendered team runs scored and allowed totals unusable. The extra innings rule, which I doubt will ever go away even if seven-inning doubleheaders do, makes anything using actual runs scored incomparable with historical standards (in the sense of parameters of metrics rather than context). An RMSE test of a run estimator against team runs scored? Can’t use it. Pythagenpat? Nope. Relief pitchers’ run averages? Use with extreme caution.

Of course, I am not seriously suggesting that the ease with which existing metrics can be used should be a consideration in determining the rules of the game. But if you use these metrics, it is necessary to recognize that they are very much compromised by the rule.

So how can we adjust for it? I will start with a plea that the keepers of the statistical record (which in our day means sites like Baseball-Reference and FanGraphs) compile a split of runs scored and allowed in regulation and extra innings, as well as team innings pitched/batted in regulation and extra innings, and display this data prominently. Having it will allow for adjustments to be made that can at least partially correct the distortion, and, more importantly, will increase awareness of the compromised nature of the raw data.

I want to acknowledge a deeper problem that also exists, and then not dwell on it too much even though it is quite important and renders the simple fixes I’m going to offer inaccurate. This is a problem that Tom Tango pointed out some time ago, particularly as it related to run expectancy tables – innings that are terminated due to walkoffs. In such innings, there are often significant potential runs left stranded on base, and so including these innings will understate the final number of runs one could expect. Tango corrected for this by removing these potential game-ending innings from RE calculations. It’s even more of a problem when it comes to extra innings: in a regulation game only the bottom of the ninth, 1/18 of the half-innings, can be cut short by a walkoff, but in extra innings every bottom half, 1/2 of the half-innings, can be. This means that when we look at just extra innings, the potential runs lost upon termination of the game make up a significant portion of the total runs.

I gathered the 2020 data on runs scored by inning from Baseball-Reference, and divided each inning into regulation and extras. I did not, however, do this correctly, as the seven-inning doubleheader rule complicates matters. The eighth and ninth innings of a standard nine-inning game are played under very different circumstances than the eighth and ninth innings of a seven-inning doubleheader. I have ignored these games here, and treated all eighth and ninth innings as belonging to standard games, but this is a distortion. I didn’t feel like combing through box scores to dig out the real data as I’m writing this post for illustrative and not analytical purposes, but it buttresses my plea for the keepers of the data to do this. This is not solely out of my laziness (although I really don’t want to have to compile it myself), but also a recognition of the reality that many casual consumers of statistics will not even be cognizant of the problem if it is not made clear in the presentation of data.

Forging ahead despite these two serious data issues that remain unresolved (counting eighth and ninth innings of seven-inning doubleheaders as regulation innings rather than extra innings, and ignoring the potential runs lost due to walkoffs), I used the team data on runs by inning from Baseball-Reference to get totals for innings played and runs scored in regulation and extra innings. Note that these are innings played, not innings pitched, which understates the true nature of the problem: almost all regulation innings include three outs (the exceptions being bottoms of the ninth terminated on walkoffs), while a much greater proportion of extra innings do not.


Expressed on the intuitive scale of runs per 9 innings, regulation innings yielded 4.80 runs, while extra innings were good for a whopping 8.40, a rate 75% higher. And no wonder, as the Baseball Prospectus RE table for 2019 shows .5439 for (---, 0 out) and 1.1465 for (-x-, 0), a rate 111% higher. That we don’t see that big of a difference is due in some indeterminate part to sample size and environmental differences (e.g. a high-leverage reliever is likely pitching in an extra inning situation, unless they have all been in the game already), but probably more significantly to the lost potential runs.
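The percentages in the preceding paragraph can be checked directly from the quoted figures. This small sketch also converts extra-inning R/9 back to runs per inning so it can be set beside the 1.1465 run expectancy at the start of a Manfred inning; the constant and function names are just illustrative labels.

```python
# Figures quoted in the text (2020 MLB innings data; BP 2019 RE table).
REG_R9 = 4.80         # runs per 9 regulation innings
EXTRA_R9 = 8.40       # runs per 9 extra innings
RE_EMPTY_0 = 0.5439   # RE, bases empty, nobody out
RE_SECOND_0 = 1.1465  # RE, runner on second, nobody out


def pct_higher(a, b):
    """Percentage by which rate a exceeds rate b."""
    return (a / b - 1.0) * 100.0


print(round(pct_higher(EXTRA_R9, REG_R9)))         # 75
print(round(pct_higher(RE_SECOND_0, RE_EMPTY_0)))  # 111

# Observed runs per extra inning versus RE at the start of a Manfred inning.
per_inning_extra = EXTRA_R9 / 9
print(round(per_inning_extra, 3))                # 0.933
print(round(RE_SECOND_0 - per_inning_extra, 3))  # 0.213
```

The gap of roughly .21 runs per extra inning is the amount the text attributes to sample size, environment, and, probably most of all, runs lost to walkoffs.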

Considering all runs scored and innings, there were .5378 runs/inning or 4.84 R/9 in the majors in 2020, so even a crude calculation suggests a distortion of around 1% embedded in the raw data due to extra innings. Of course, the impact can vary significantly at the team level since the team-level proportion of extra innings will vary (1.25% of MLB innings played were extras, ranging from a low of 0.40% for Cincinnati to 3.44% for Houston).
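The roughly 1% league-wide distortion follows from a simple weighted blend of the regulation and extra-inning rates. The sketch below is hypothetical in one important respect: applying the league-wide 4.80 and 8.40 rates to a single team’s share of extra innings ignores that team’s quality and park, so the team-level numbers only illustrate how the distortion scales with the share.

```python
REG_R9 = 4.80    # league runs per 9 regulation innings (from the text)
EXTRA_R9 = 8.40  # league runs per 9 extra innings (from the text)


def blended_r9(extra_share, reg_r9=REG_R9, extra_r9=EXTRA_R9):
    """R/9 over all innings when extra_share of the innings are extras."""
    return (1.0 - extra_share) * reg_r9 + extra_share * extra_r9


def distortion_pct(extra_share, reg_r9=REG_R9, extra_r9=EXTRA_R9):
    """Percent by which the blended R/9 exceeds the regulation-only R/9."""
    return (blended_r9(extra_share, reg_r9, extra_r9) / reg_r9 - 1.0) * 100.0


# Extra-inning shares quoted in the text: MLB overall, CIN (low), HOU (high).
for label, share in [("MLB", 0.0125), ("CIN", 0.0040), ("HOU", 0.0344)]:
    print(label, round(blended_r9(share), 3), round(distortion_pct(share), 2))
```

At the league-wide 1.25% share the blend comes to about 4.845 R/9, in line with the 4.84 figure above and just under a 1% inflation; under these assumptions Houston’s 3.44% share would mean a distortion of about 2.6%.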

How to correct for this? If the walkoff problem didn’t exist, I would suggest a relatively simple approach. After separating each team’s data into regulation and extra innings, calculate each team’s “pre-Manfred runs” as:

PMR = Runs in Regulation Innings + Runs in Extra Innings – park adjusted RE for (-x-,0)*Extra Innings

= Runs - park adjusted RE for (-x-,0)*Extra Innings
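The PMR formula above translates directly into code. In this sketch, the neutral RE of 1.1465 is the BP 2019 value quoted earlier, and `park_factor` is a hypothetical stand-in for the park adjustment discussed below: simply multiplying the neutral RE by a runs park factor is an assumption, not a method the text specifies. Like the formula itself, it ignores the walkoff problem. For runs allowed, the same subtraction would use a team’s extra innings pitched instead of extra innings batted.

```python
def pre_manfred_runs(runs, extra_innings, re_second_no_out=1.1465, park_factor=1.0):
    """Sketch of PMR = Runs - park-adjusted RE(-x-, 0) * Extra Innings.

    runs: total runs, regulation plus extra innings.
    extra_innings: extra innings batted (each starts with a runner on second).
    re_second_no_out: neutral RE for (-x-, 0); default is the BP 2019 value.
    park_factor: HYPOTHETICAL multiplier moving the neutral RE into a
        specific park (assumed, not taken from the text).
    Walkoff-terminated innings (lost potential runs) are not handled.
    """
    return runs - re_second_no_out * park_factor * extra_innings


# Hypothetical team: 300 runs, 5 extra innings batted, neutral park.
print(round(pre_manfred_runs(300, 5), 4))  # 294.2675
```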

You could address the walkoff problem by adding in the park adjusted RE for any innings that terminated, but this gets tricky for two reasons:

1) it means that the simple data dividing runs and innings into “regulation” and “extra” is inadequate for the task; I doubt “potential runs lost at time of game termination” would ever find their way into a standard table of team offensive statistics

2) it overcorrects to the extent that the legacy statistics we have always used ignore the loss of those potential runs as well. Of course, the issue is more pronounced with extra innings, as potential walkoff innings represent a huge proportion of extra innings rather than a small proportion of regulation innings (and because the nature of Manfred extra innings increases the proportion of walkoffs within the subset of extra innings, since run expectancy is 111% higher at the start of a Manfred extra inning than at the start of a standard inning).

Also note that when I say park-adjusted, I mean that the run expectancy would have to be park-adjusted not in order to normalize across environments, but rather to transform a neutral environment RE table to the specific park. I wouldn’t want to use “just” 1.1465 for Coors Field, but rather a higher value so that the PMR estimate can still be used in conjunction with our Coors Field park adjustment as the Rockies’ raw runs total would have been pre-2020. Another complication is that the standard runs park factor would likely overstate the park impact because of the issue of lost potential runs (they too would increase in expected value as the park factor increased).

The manner in which I attempted to adjust in my 2020 End of Season statistics was to restate everything for a team on a per nine inning basis, and then use the R/9 and RA/9 figures in conjunction with standard methodology. But this is also unsatisfactory – for instance, a Pythagorean estimate ceases to be an estimate of the team’s actual W%, but rather a theoretical estimate of what their W% would be if they played a full slate of nine inning games. The extra innings aren’t really a problem here, but the seven-inning doubleheaders are. As long as these accursed games exist, in order to develop a true Pythagorean estimate of team wins, one would have to estimate the exponent that would hold for a seven-inning game (Tango came up with a Pythagorean exponent of 1.57 through an empirical analysis; my theoretical approach would be to use the Enby distribution to develop theoretical W%s for seven-inning games for a representative variety of underlying team strengths in terms of runs and runs allowed per inning, then use this to determine the best Pythagenpat z value), and then use runs and runs allowed per inning rates to estimate separate W%s for seven- and nine-inning games, then weight these by the proportion of a team’s games that were scheduled for seven and nine innings.
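A rough sketch of that weighting scheme, under loudly stated assumptions. For nine-inning games it uses Pythagenpat (exponent = runs per game raised to z), with z = 0.29 as an assumed value that the text does not give. For seven-inning games it plugs in Tango’s empirical 1.57 as a fixed exponent, standing in for the Enby-derived Pythagenpat z the text proposes but does not compute. The function names are hypothetical.

```python
def pyth_wpct(r, ra, exponent):
    """Pythagorean W% from a scoring ratio; scale-free, so per-inning rates work."""
    odds = (r / ra) ** exponent
    return odds / (1.0 + odds)


def blended_wpct(r_per_inn, ra_per_inn, share_seven, z=0.29, seven_exponent=1.57):
    """Hypothetical W% weighted by scheduled seven- and nine-inning games.

    r_per_inn, ra_per_inn: runs scored and allowed per inning.
    share_seven: fraction of the team's games scheduled for seven innings.
    z: ASSUMED Pythagenpat z for nine-inning games.
    seven_exponent: Tango's empirical exponent, used here as a fixed value
        in place of the Enby-based z the text suggests deriving.
    """
    rpg9 = 9.0 * (r_per_inn + ra_per_inn)  # runs per nine-inning game, both teams
    w9 = pyth_wpct(r_per_inn, ra_per_inn, rpg9 ** z)
    w7 = pyth_wpct(r_per_inn, ra_per_inn, seven_exponent)
    return share_seven * w7 + (1.0 - share_seven) * w9


# Hypothetical team: .58 runs scored and .50 allowed per inning,
# with 6 of its 60 games scheduled for seven innings.
print(round(blended_wpct(0.58, 0.50, 6 / 60), 3))
```

Because the seven-inning exponent is smaller, a team that outscores its opponents gets a W% slightly closer to .500 in those games, which is the effect the weighting is meant to capture.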

I also took the unfortunate step of ignoring actual runs everywhere (as I mentioned in passing earlier, Manfred extra innings wreak havoc on relievers’ run averages), since the league averages are polluted by Manfred extra innings. Again, I am not advocating that sabermetric expediency drive the construction of the rules of baseball, but it is a happy coincidence that sabermetric expediency tracks in this case with aesthetic considerations. I should include a caveat about aesthetic considerations being in the eyes of the beholder, but the groupthink crowd that is now in the ascendancy rarely sees the need to do so. No surprise, as many also subscribe to the totalitarian thinking that is ascendant in the broader society. They’ll tell you all about it, and about what a terrible person you are if you dissent, for $25.19.