Tuesday, October 20, 2020

End of Season Statistics, 2020

At first I wasn’t sure if I would go through the trouble of putting together year-end statistics for a 60-game season. This reticence was due entirely to the small sample size, and not to any belief that the season itself was “illegitimate” or an “exhibition” or any of the other pejoratives that have been lobbed against it. On the contrary, I was thrilled that we got a season at all given the immense social and political pressure aligned against any attempt by human beings to engage in voluntary economic activity. 2020 has provided a glimpse into the near future, and it is indeed a boot stamping on a human face forever.

But I digress. 60 games is a very small sample size when one is accustomed to 162, but the reason I decided to do this exercise anyway was simple: I wanted to better understand what happened in those 60 games. I did decide to cull some of the categories I usually look at, mostly to reduce the amount of effort necessary on my part to produce the statistics. Additionally, I had to make some changes to park factors and comparisons to league average which I will describe below.

It is common sabermetric practice to compare player performance to the league average. This is always a simplification of reality, and there are alternative options. Bill James argued in the original Historical Baseball Abstract that the true context for a player’s performance was his team’s games. This is true as far as it goes, and it conveniently allowed James to sidestep the issue of developing a complete set of historical park factors. On the other hand, we understand that the run environment for a given team is shaped by the quality of their hitters and pitchers. A batter on a team with excellent pitching will benefit in any kind of comparison to average from his teammates’ suppression of runs, but so would a player from another team were they to switch places. Since the usual goal of these exercises is to facilitate context-neutral (this is a loaded term, as there are many shades to a claim of context neutrality which I will not address here) comparisons between players, we typically decide that it is preferable to place the player’s performance in a wider league context, even if that context now includes the events of thousands of games in which the player in question did not participate.

We could split the difference, and one could argue that perhaps we should. We could build a custom “league context” for each team based on their schedule. Were the schedule perfectly balanced, this would not be necessary; alas, the general trend in MLB has been for schedules to become more unbalanced over time. We could, in a typical season, construct a context in which to evaluate Indians players in which 19/162 of the non-Indians portion consists of the Twins, and another 19/162 for each of the Royals, White Sox, and Tigers, and 6/162 for the Reds, and 7/162 for the Yankees, and 6/162 for the Orioles, etc. based on the number of games each team actually plays with each opponent. This is opposed to the league average, in which we simply compare to the entire AL, which implicitly assumes balance and ignores the existence of interleague games altogether.
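The schedule-weighted context sketched above amounts to a weighted average of opponent run rates, with games played against each opponent as the weights. Here is a minimal Python sketch of that idea; the team labels and run rates are hypothetical placeholders, not actual data, and this is only an illustration of the weighting, not a refined method.

```python
def schedule_weighted_context(games_vs, rate_by_team):
    """Weighted average of opponents' rates, weighted by games played against each."""
    total_games = sum(games_vs.values())
    weighted_sum = sum(games * rate_by_team[team] for team, games in games_vs.items())
    return weighted_sum / total_games

# Hypothetical schedule and run rates, for illustration only.
games_vs = {"OppA": 19, "OppB": 19, "OppC": 6}
runs_per_game = {"OppA": 5.0, "OppB": 4.0, "OppC": 4.5}

context = schedule_weighted_context(games_vs, runs_per_game)
print(round(context, 2))
```

With a perfectly balanced schedule every weight would be equal and this would collapse to the ordinary league average.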

I am not advocating for all of this complexity, and the approach I just sketched out is insufficiently refined to work in practice. The point I’m trying to make is that the league context is not exactly right, but it is a useful approximation, and with a semi-balanced schedule it makes sense.

When does it not make sense? In 2020, when there is not any semblance of a balanced schedule. The games played by teams in the AL Central bear no relation to those played by teams in the AL East or the AL West, because there are no games or opponents in common. To compare to the AL or NL average in 2020 is not a useful simplification – it is an outright abrogation of any attempt to accurately model the world as it is.

Thus I will be replacing the traditional AL/NL breakdown of the statistics with an East/Central/West breakdown. All comparisons to league average will compare to one of the three divisions, rather than the two leagues. Of course, 2/3 of any given team’s games were against their league brethren and only 1/3 against teams from the other circuit, so this is still a simplification of reality – but no more so than the use of league average in a normal season with an unbalanced schedule. In fact, in a way of looking at things which in my opinion is damning to the wildly unbalanced schedule used in a typical season, teams played close to the same share of their games against their intra-“league” opponents as they normally do (for this example, treating the NL Central as intraleague opponents):


Of course, there are problems associated with using the three divisions as leagues for the purpose of statistical comparisons. The big one is that we all know that league quality is not necessarily equal, or even close to equal, between the AL and NL; even less so as you divide the teams further into E/C/W, partly due to making the units even smaller. I ignore this when dealing with AL/NL; for instance, in ranking players by runs above average, I don’t try to account for league strength, and I’ve also ignored it here. This is a distortion – if we for the sake of argument assume that the Central is weaker than the East, a player who is +10 runs relative to the Central would be less valuable in a truly context-neutral comparison than one who is +10 runs relative to the East. This goes for comparisons to replacement level as well, as one of the assumptions of replacement level comparisons is that a replacement player can be obtained at zero marginal cost and thus is equally available to any team in MLB.

Usually I gloss over these simplifying assumptions without discussion; I wanted to call them out here because the non-traditional approach makes the simplifications more glaring. In short, I will pretend there are three leagues for all calculations that require the use of league average, but I will still group the reports into AL/NL.

The other thorny issue for 2020 is park factors. One complication is hinted at by the discussion above regarding balanced schedules; what happened in games at Yankee Stadium in 2020 is completely irrelevant to the park factor for Safeco Field. The set of parks that comprise road games for any team in 2020 is very different than that which fed their historical park factor calculations.

But you can also go crazy trying to be too precise on a question like this, so I have gone for what I consider to be a less than optimal but defensible approach. It is highly subjective, but it makes sense to me, and the whole purpose of these reports is for me to calculate the statistics that I want to see – if I wanted someone else’s ideal statistics, I could save a lot of effort and find that in other places.

The first step in estimating park factors was to choose a home/road runs ratio to use as a starting point for each park. In so doing, I completely threw out 2020 data except when it was absolutely necessary (the new park in Texas and the Buffalo park used by the Blue Jays). This was simply to avoid having to deal with not just a different number of “home” and “road” games by team, but also the issue of seven-inning games, artificially inflated runs totals from extra inning games, and the like. Hopefully these are just temporary issues, although it seems there is momentum for the scourge of the latter infecting the game on a permanent basis. Should they become permanent issues, I will think of solutions – but it’s not worth it for this 60 game look.

So what I’ve done is for each extant major league park, I’ve used up to four years of data from 2016-2019 (up to because some parks didn’t exist for the entire span) and calculated the ratio of home RPG to road RPG. Then I have regressed this ratio. The regression coefficients were based on an updated study I did of the slope of the linear regression equation using home/road RPG ratios from 1-5 years of data as the independent variable and the home/road RPG ratio for the subsequent season as the dependent variable. This is similar to how the regression coefficients I have used in the past were derived, but the previous ones were only rules of thumb rather than the actual coefficients, and were based on a study by MGL and not my own work. The results are broadly similar, although they give slightly less weight to 4-5 years of historical data. I will do a proper write-up of the study and provide the dataset at a later date, but for now it must suffice to say that I used those results to come up with an equation based on games which is .1364*ln(G/162) + .5866 (the rounded results I’m actually going to apply based on count of seasons are .59 for one year, .68 for two years, .74 for three years, and .78 for four years).
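As a quick check on the equation above, this short Python sketch evaluates it at one through four full seasons (treating a season as 162 games) and reproduces the rounded weights quoted in the text. The function name is just a label chosen for illustration.

```python
import math

def regression_weight(games):
    """Weight placed on the observed home/road run ratio, per the equation above."""
    return 0.1364 * math.log(games / 162) + 0.5866

# One through four full seasons of data.
weights = [round(regression_weight(162 * seasons), 2) for seasons in (1, 2, 3, 4)]
print(weights)
```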

Let me walk through the park factor calculations for the Blue Jays. They played 26 games at home, which had an 11.35 RPG, and 34 on the road, which had a 9.38. So their initial home/road run ratio is 1.209, but over 60 games we only weight that at .1364*ln(60/162) + .5866 = .451 (of course, their actual ratio should be given even less credibility because the equation inherently assumes a 50/50 split of home and road games). So the run ratio we use as a starting point for their park factor is 1.209*.451 + 1.00*(1 - .451) = 1.094.

To finish it off for application to full season statistics, we need to halve the difference between the calculated ratio and 1.0. Well, usually we halve it, since half of teams’ games are at home or close enough. But in 2020 there was enough weirdness in this regard (especially for the Blue Jays) that I used the team’s actual percentage of home games. In the Blue Jays’ case this is 26/60, so their final park factor is 1.094*26/60 + 1*34/60 = 1.04.
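The Blue Jays walkthrough can be strung together in a few lines of Python. This is a sketch of the steps as described, using the rounded RPG figures from the text (which give a raw ratio of about 1.210 rather than 1.209, a difference that does not survive rounding the final factor to two places). The function and argument names are illustrative assumptions.

```python
import math

def park_factor(home_rpg, road_rpg, home_games, road_games):
    """Regress the home/road run ratio toward 1.00, then weight by share of home games."""
    games = home_games + road_games
    raw_ratio = home_rpg / road_rpg
    weight = 0.1364 * math.log(games / 162) + 0.5866
    regressed_ratio = raw_ratio * weight + 1.00 * (1 - weight)
    home_share = home_games / games
    # Road parks are assumed to average 1.00, as in the text.
    return regressed_ratio * home_share + 1.00 * (1 - home_share)

blue_jays_pf = park_factor(home_rpg=11.35, road_rpg=9.38, home_games=26, road_games=34)
print(round(blue_jays_pf, 2))
```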

There are a couple things I could have done to refine this a little, but when rounding to two decimal places they don’t have a huge impact. One would be to use something other than 1.00 as the “road” park factor, and another would be to ensure that the average for each of the E/C/W is 1.00, or doing the latter only as a shortcut. Since they round to 1.00 for each of the E/C/W when rounded to two places, that’s close enough for me. We also could have used innings as a denominator rather than games, but it’s more work than I’m willing to put in for analyzing a sixty-game season.

All data was gathered from various pages on Baseball-Reference. The League spreadsheet is pretty straightforward--it includes league totals and averages for a number of categories, most or all of which are explained at appropriate junctures throughout this piece. The advent of interleague play has created two different sets of league totals--one for the offense of league teams and one for the defense of league teams. Before interleague play, these two were identical. I do not present both sets of totals (you can figure the defensive ones yourself from the team spreadsheet, if you desire), just those for the offenses. The exception is for the defense-specific statistics, like innings pitched and quality starts. The figures for those categories in the league report are for the defenses of the league's teams. However, I do include each league's breakdown of basic pitching stats between starters and relievers (denoted by "s" or "r" prefixes), and so summing those will yield the totals from the pitching side. I have broken this down by E/C/W and A/N/MLB.

I added a column this year for “ActO”, which is actual (rather than estimated) outs made by the team offensively. This can be determined from the official statistics as PA – R – LOB. I have then replaced the column I usually show for league R/G (“N”) with R/9, which is actually R*27/ActO, which is equivalent to R*9/IP. This restates the league run average in the more familiar per nine innings. I’ve done the same for “OG”, which is Outs/Game but only for those outs I count in the individual hitter’s stats (AB – H + CS), “PA/G”, which is normally just (AB + W)/G, and “KG” and “WG” (normally just K/G and W/G) – these are now “O/9”, “PA/9”, still “KG”/“WG” and are per 27 actual outs.
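A small Python sketch of the ActO and per-nine conversions described above. The team totals are hypothetical and serve only to show the arithmetic.

```python
def actual_outs(pa, runs, lob):
    """Actual outs made by an offense: every batter scores, is left on base, or is out."""
    return pa - runs - lob

def per_nine(total, act_outs):
    """Restate a counting total per 27 actual outs (nine innings)."""
    return total * 27 / act_outs

# Hypothetical team totals, for illustration only.
act_o = actual_outs(pa=2250, runs=280, lob=420)
r_per_9 = per_nine(280, act_o)
print(act_o, round(r_per_9, 2))
```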

The Team spreadsheet focuses on overall team performance--wins, losses, runs scored, runs allowed. The columns included are: Park Factor (PF), Winning Percentage (W%), Expected W% (EW%), Predicted W% (PW%), wins, losses, runs, runs allowed, Runs Created (RC), Runs Created Allowed (RCA), Home Winning Percentage (HW%), Road Winning Percentage (RW%) [exactly what they sound like--W% at home and on the road], R/9, RA/9, Runs Created/9 (RC/9), Runs Created Allowed/9 (RCA/9), and Runs Per Game (the average number of runs scored and allowed per game). For the offensive categories, runs/9 are based on runs per 27 actual outs; for pitching categories, they are runs/9 innings.

I based EW% and PW% on R/9 and RA/9 (and RC/9 and RCA/9) rather than the actual runs totals. This means that they are not estimating what a team’s winning percentage should have been in the actual game constructions that they played, but what it should have been if playing nine inning games but scoring/allowing runs at the same rate per inning. EW%, which is based on actual R and RA, is also polluted by inflated runs in extra inning games; PW%, which is based on RC and RCA, doesn’t suffer from this distortion.

The runs and Runs Created figures are unadjusted, but the per-game averages are park-adjusted, except for RPG which is also raw. Runs Created and Runs Created Allowed are both based on a simple Base Runs formula. The formula is:

A = H + W - HR - CS

B = (2TB - H - 4HR + .05W + 1.5SB)*.76

C = AB - H

D = HR

Naturally, A*B/(B + C) + D.
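The team Base Runs formula above translates directly into Python. The sketch below uses a hypothetical team batting line; the `b_mult` argument is an illustrative name for the B multiplier, defaulting to the standard .76.

```python
def base_runs(ab, h, tb, hr, w, sb, cs, b_mult=0.76):
    """Simple Base Runs estimate: A*B/(B + C) + D, with the factors defined above."""
    a = h + w - hr - cs
    b = (2 * tb - h - 4 * hr + 0.05 * w + 1.5 * sb) * b_mult
    c = ab - h
    d = hr
    return a * b / (b + c) + d

# Hypothetical team totals, for illustration only.
team = dict(ab=2000, h=500, tb=850, hr=80, w=200, sb=30, cs=10)
rc_standard = base_runs(**team)
rc_higher_mult = base_runs(**team, b_mult=0.83)
print(round(rc_standard, 1), round(rc_higher_mult, 1))
```

Raising the B multiplier increases the estimate for the same inputs, which is the direction of the adjustment that would be needed to match run totals inflated by the extra inning rule.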

In order to actually estimate runs scored accurately for 2020, one would need to use .83 as the B multiplier. When I first saw the discrepancy between actual and estimated runs, I was alarmed; the formula isn’t perfect, of course, but usually it doesn’t vary much from year to year. Then I realized that the biggest reason for this was likely the distortion caused by extra inning games. As such, I’ve kept with my standard formula, but we won’t be able to compare a player’s estimated runs directly to the league average. Keep in mind that any statistic based on actual runs is also contaminated. Should something like the current extra inning rule become a permanent fixture in MLB, it will be necessary to make adjustments to every metric that uses runs as an input. The extent to which the easy extra inning runs distort the statistics is something I did not fully grasp until actually sitting down and going through the year end stats exercise.

The easy runs are everywhere, and they cannot easily be removed - should the rule become permanent, I think the easiest solution will be to make “regulation” runs the starting point, and then tack on extra inning runs less the run expectancy for a man on second, nobody out times the number of extra innings. Even that is a poor solution, as it only exacerbates the distortions caused by the early termination of innings with walkoffs that Tom Tango noted some time ago. Since extra innings will be fewer under such a rule, a higher percentage of them will be walkoff-truncated than otherwise would be the case.

There are also Team Offense and Defense spreadsheets. These include the following categories:

Team offense: Plate Appearances, Batting Average (BA), On Base Average (OBA), Slugging Average (SLG), Secondary Average (SEC), Walks and Hit Batters per At Bat (WAB), Isolated Power (SLG - BA), R/G at home (hR/G), and R/G on the road (rR/G). BA, OBA, SLG, WAB, and ISO are park-adjusted by dividing by the square root of park factor (or the equivalent; WAB = (OBA - BA)/(1 - OBA), ISO = SLG - BA, and SEC = WAB + ISO).

Team defense: Innings Pitched, BA, OBA, SLG, Innings per Start (IP/S), Starter's eRA (seRA), Reliever's eRA (reRA), Quality Start Percentage (QS%), RA/G at home (hRA/G), RA/G on the road (rRA/G), Battery Mishap Rate (BMR), Modified Fielding Average (mFA), and Defensive Efficiency Record (DER). BA, OBA, and SLG are park-adjusted by dividing by the square root of PF; seRA and reRA are divided by PF.

The three fielding metrics I've included are limited to metrics that a) I can calculate myself and b) are based on the basic available data, not specialized PBP data. The three metrics are explained in this post, but here are quick descriptions of each:

1) BMR--wild pitches and passed balls per 100 baserunners = (WP + PB)/(H + W - HR)*100

2) mFA--fielding average removing strikeouts and assists = (PO - K)/(PO - K + E)

3) DER--the Bill James classic, using only the PA-based estimate of plays made. Based on a suggestion by Terpsfan101, I've tweaked the error coefficient. Plays Made = PA - K - H - W - HR - HB - .64E and DER = PM/(PM + H - HR + .64E)
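The three fielding formulas above can be written as small Python functions. The inputs below are hypothetical team defensive totals used only to exercise the arithmetic.

```python
def bmr(wp, pb, h, w, hr):
    """Battery Mishap Rate: wild pitches and passed balls per 100 baserunners."""
    return (wp + pb) / (h + w - hr) * 100

def mfa(po, k, e):
    """Modified Fielding Average: putouts excluding strikeouts, against errors."""
    return (po - k) / (po - k + e)

def der(pa, k, h, w, hr, hb, e):
    """Defensive Efficiency Record using the PA-based plays made and the .64 error weight."""
    plays_made = pa - k - h - w - hr - hb - 0.64 * e
    return plays_made / (plays_made + h - hr + 0.64 * e)

# Hypothetical team totals, for illustration only.
team_bmr = bmr(wp=20, pb=5, h=500, w=200, hr=80)
team_mfa = mfa(po=1560, k=550, e=35)
team_der = der(pa=2250, k=550, h=500, w=200, hr=80, hb=25, e=35)
print(round(team_bmr, 2), round(team_mfa, 3), round(team_der, 3))
```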

For the individual player reports, I only bothered with starting pitchers and batters, since the sample size for relief pitchers was minuscule. For starting pitchers, I included all pitchers with at least six starts. The categories presented are stripped down from prior years, and I included all in one spreadsheet rather than splitting up by league.

For starting pitchers, the columns are: Innings Pitched, Estimated Plate Appearances (PA), RA, ERA, eRA, dRA, KG, WG, G-F, %H, RAA, and RAR. RA and ERA you know--R*9/IP or ER*9/IP, park-adjusted by dividing by PF. The formulas for eRA and dRA are based on the same Base Runs equation and they estimate RA, not ERA.

* eRA is based on the actual results allowed by the pitcher (hits, doubles, home runs, walks, strikeouts, etc.). It is park-adjusted by dividing by PF.

* dRA is the classic DIPS-style RA, assuming that the pitcher allows a league average %H, and that his hits in play have a league-average S/D/T split. It is park-adjusted by dividing by PF.

The formula for eRA is:

A = H + W - HR

B = (2*TB - H - 4*HR + .05*W)*.78

C = AB - H = K + (3*IP - K)*x (where x is figured as described below for PA estimation and is typically around .93) = PA (from below) - H - W

eRA = (A*B/(B + C) + HR)*9/IP
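Here is a minimal Python sketch of the eRA calculation above, including the final division by park factor. The pitcher line is hypothetical, and x is set to .93 only because the text describes that as a typical value.

```python
def e_ra(ip, h, tb, hr, w, k, x=0.93, pf=1.0):
    """Base Runs estimate of RA from a pitcher's actual results allowed."""
    a = h + w - hr
    b = (2 * tb - h - 4 * hr + 0.05 * w) * 0.78
    c = k + (3 * ip - k) * x   # estimated AB - H
    return (a * b / (b + c) + hr) * 9 / ip / pf

# Hypothetical pitcher line, for illustration only.
pitcher_era = e_ra(ip=70, h=60, tb=100, hr=9, w=20, k=75)
print(round(pitcher_era, 2))
```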

To figure dRA, you first need the estimate of PA described below. Then you calculate W, K, and HR per PA (call these %W, %K, and %HR). Percentage of balls in play (BIP%) = 1 - %W - %K - %HR. This is used to calculate the DIPS-friendly estimate of %H (H per PA) as e%H = Lg%H*BIP%.

Now everything has a common denominator of PA, so we can plug into Base Runs:

A = e%H + %W

B = (2*(z*e%H + 4*%HR) - e%H - 5*%HR + .05*%W)*.78

C = 1 - e%H - %W - %HR

dRA = (A*B/(B + C) + %HR)/C*a

z is the league average of total bases per non-HR hit (TB - 4*HR)/(H - HR), and a is the league average of (AB - H) per game. I’ve used the MLB average for both this year, and have defined a as the league average of (AB – H) per 9 innings rather than per game.
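The DIPS-style calculation described above can be sketched in Python as follows. The league constants (league %H, z, and a) are hypothetical placeholder values, as is the pitcher line; only the structure follows the text.

```python
def d_ra(pa, w, k, hr, lg_pct_h, z, lg_outs_per_9, pf=1.0):
    """DIPS-style RA: league-average %H and hit mix on balls in play, all per PA."""
    pct_w, pct_k, pct_hr = w / pa, k / pa, hr / pa
    bip = 1 - pct_w - pct_k - pct_hr
    e_h = lg_pct_h * bip                      # expected non-HR hits per PA
    a = e_h + pct_w
    b = (2 * (z * e_h + 4 * pct_hr) - e_h - 5 * pct_hr + 0.05 * pct_w) * 0.78
    c = 1 - e_h - pct_w - pct_hr
    return (a * b / (b + c) + pct_hr) / c * lg_outs_per_9 / pf

# Hypothetical pitcher and league inputs, for illustration only.
base = d_ra(pa=290, w=20, k=75, hr=9, lg_pct_h=0.29, z=1.28, lg_outs_per_9=25.2)
more_hr = d_ra(pa=290, w=20, k=75, hr=15, lg_pct_h=0.29, z=1.28, lg_outs_per_9=25.2)
print(round(base, 2), round(more_hr, 2))
```

Because hits on balls in play are set to the league rate, only walks, strikeouts, and home runs move the result; the second call shows the estimate rising when home runs alone increase.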

Also shown are strikeout and walk rate, both expressed as per game. By game I mean not nine innings but rather the league average of PA/G (although for 2020 I’m actually using the major league average AB+W per 9 innings which was 37.9). I have always been a proponent of using PA and not IP as the denominator for non-run pitching rates, and now the use of per PA rates is widespread. Usually these are expressed as K/PA and W/PA, or equivalently, percentage of PA with a strikeout or walk. I don’t believe that any site publishes these as K and W per equivalent game as I am here. This is not better than K%--it’s simply applying a scalar multiplier. I like it because it generally follows the same scale as the familiar K/9.

To facilitate this, I’ve finally corrected a flaw in the formula I use to estimate plate appearances for pitchers. Previously, I’ve done it the lazy way by not splitting strikeouts out from other outs. I am now using this formula to estimate PA (where PA = AB + W):

PA = K + (3*IP - K)*x + H + W

where x = league average of (AB - H - K)/(3*IP – K), using one value for the entire majors

Then KG = K/PA*Lg(PA/G) and WG = W/PA*Lg(PA/G).
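A sketch of the PA estimate and the per-game rates in Python, reading KG and WG as strikeouts and walks per estimated PA scaled by the league PA per game (37.9 for 2020, per the text). The pitcher line and the value of x are hypothetical.

```python
def estimate_pa(ip, h, w, k, x):
    """Estimated PA (AB + W), with strikeouts split out from other outs."""
    return k + (3 * ip - k) * x + h + w

def per_game_rate(events, pa, lg_pa_per_game=37.9):
    """Events per PA, scaled to the league average PA per nine innings."""
    return events / pa * lg_pa_per_game

# Hypothetical pitcher line, for illustration only.
pa = estimate_pa(ip=70, h=60, w=20, k=75, x=0.93)
kg = per_game_rate(75, pa)
wg = per_game_rate(20, pa)
print(round(pa, 2), round(kg, 2), round(wg, 2))
```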

G-F is a junk stat, included here out of habit because I've been including it for years. It was intended to give a quick read of a pitcher's expected performance in the next season, based on eRA and strikeout rate. Although the numbers vaguely resemble RAs, it's actually unitless. As a rule of thumb, anything under four is pretty good for a starter. G-F = 4.46 + .095(eRA) - .113(K*9/IP). It is a junk stat. JUNK STAT JUNK STAT JUNK STAT. Got it?

%H is BABIP, more or less--%H = (H - HR)/(PA - HR - K - W), where PA was estimated above.
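For completeness, G-F and %H as given above are simple enough to sketch in Python. The inputs are hypothetical, with PA taken as an estimated figure of the kind produced by the PA formula.

```python
def g_f(era_est, k, ip):
    """The junk stat: 4.46 + .095*eRA - .113*(K per nine innings)."""
    return 4.46 + 0.095 * era_est - 0.113 * (k * 9 / ip)

def pct_h(h, hr, pa, k, w):
    """Hits on balls in play per ball in play, using estimated PA."""
    return (h - hr) / (pa - hr - k - w)

# Hypothetical pitcher line, for illustration only.
junk = g_f(era_est=3.80, k=75, ip=70)
babip_like = pct_h(h=60, hr=9, pa=280.55, k=75, w=20)
print(round(junk, 2), round(babip_like, 3))
```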

The two baselined stats are Runs Above Average (RAA) and Runs Above Replacement (RAR). Different baselines are used for starters and relievers, although these should be updated given the changes in usage patterns that have taken place since I implemented the adjustment in 2015. It was based on patterns from the previous (circa 2015) several seasons of league average starter and reliever eRA. Thus it does not adjust for any advantages relief pitchers enjoy that are not reflected in their component statistics. This could include runs allowed scoring rules that benefit relievers (although the use of RRA should help even the scales in this regard, at least compared to raw RA) and the talent advantage of starting pitchers. The RAR baselines do attempt to take the latter into account, and so the difference in starter and reliever RAR will be more stark than the difference in RAA.

For 2020, given the small sample sizes and the difficulties of looking at the league average of actual runs, I decided to use eRA to calculate the baselined metrics. So they are no longer based on actual runs allowed by the pitcher, but rather on the component statistics. Since I’m using eRA rather than dRA, I am considering the actual results on balls in play allowed by the pitcher rather than a DIPS approach. Also, the baseline used is based on the division average eRA, not the league average.

RAA (starters) = (1.025*Div(eRA) - eRA)*IP/9

RAR (starters) = (1.28*Div(eRA) – eRA)*IP/9

RAA and RAR are then rescaled so that the three divisions are all on a (roughly) equal win basis. This is a step I don’t normally take – I usually publish the figures expressed in units of runs, without any adjustment for win value. Normally, I intend for the statistics to be viewed in the context of the particular league-season in question, and thus no adjustment for the win value of runs is necessary. However, with three different run contexts being used in each league, it is necessary in 2020. Why did I not convert them to WAR?

1. I have always preferred, when possible, to leave things denominated in runs, if for no reason deeper than it’s easier to work without the decimal place; I’d rather write “37 RAR” than “3.7 WAR”.

2. In order to actively describe the figures as WAR, I would have to make what is not an altogether obvious decision about what the runs per game context used to determine runs per win actually is. It can’t be based on figures that include easy runs without any adjustment. For the purpose of these statistics, I did not want to try to figure out a corrected runs figure. That is something that will be necessary if the rule becomes permanent.

So what I did instead was simply take the same figure I used as the baseline for RAA or RAR, the eRA of the division as a whole, and divided the initial estimate of RAR by it, then multiplied by the major league average eRA. This puts all the divisions on an equal basis if one assumes 1) that the division/MLB average eRA is a reasonable substitute for whatever the true run scoring rate for the league should be (and I think it is a reasonable substitute) and 2) given the assumption in 1, it means the practical assumption we are making is that RPW = RPG (multiplying by 2 is lost in the wash), which is not the most accurate way to estimate RPW but is a very simple one that also has the neat property of corresponding to assuming a Pythagorean exponent of 2. It is not what I would do as a best practice, but I think it is an acceptable approximation under the circumstances. So the final formulas for RAA and RAR are:

RAA = (1.025*Div(eRA) - eRA)*IP/9/Div(eRA)*MLB(eRA)

RAR = (1.28*Div(eRA) – eRA)*IP/9/Div(eRA)*MLB(eRA)
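The final starter formulas differ only in the baseline multiplier, so a single Python function covers both. The eRA, innings, and division/MLB averages below are hypothetical values for illustration.

```python
def starter_runs_above(era_est, ip, div_era, mlb_era, baseline_mult):
    """Runs above a baseline, rescaled from the division's run context to MLB's."""
    runs = (baseline_mult * div_era - era_est) * ip / 9
    return runs / div_era * mlb_era

# Hypothetical pitcher and context values, for illustration only.
inputs = dict(era_est=3.80, ip=70, div_era=4.60, mlb_era=4.50)
raa = starter_runs_above(**inputs, baseline_mult=1.025)
rar = starter_runs_above(**inputs, baseline_mult=1.28)
print(round(raa, 1), round(rar, 1))
```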

All players with 100 or more plate appearances (official, total plate appearances) are included in the Hitters spreadsheets (along with some players close to the cutoff point who I was interested in). Each is assigned one position, the one at which they appeared in the most games. The statistics presented are: Games played (G), Plate Appearances (PA), Outs (O), Batting Average (BA), On Base Average (OBA), Slugging Average (SLG), Secondary Average (SEC), Runs Created (RC), Runs Created per Game (RG), Hitting Runs Above Average (HRAA), Runs Above Average (RAA), and Runs Above Replacement (RAR).

PA is defined simply as AB + W + HB. Outs are AB - H + CS. BA and SLG you know, but remember that without SF, OBA is just (H + W + HB)/(AB + W + HB). Secondary Average = (TB - H + W + HB)/AB = SLG - BA + (OBA - BA)/(1 - OBA). I have not included net steals as many people (and Bill James himself) do, but I have included HB which some do not.
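The definitions above, and the equivalence of the two Secondary Average expressions, can be checked with a short Python sketch. The batting line is hypothetical, and no park adjustment is applied here.

```python
def hitter_rates(ab, h, tb, w, hb):
    """PA, BA, OBA, SLG, and SEC as defined above (no SF, HB included)."""
    pa = ab + w + hb
    ba = h / ab
    oba = (h + w + hb) / pa
    slg = tb / ab
    sec = (tb - h + w + hb) / ab
    return pa, ba, oba, slg, sec

# Hypothetical batting line, for illustration only.
pa, ba, oba, slg, sec = hitter_rates(ab=200, h=56, tb=98, w=25, hb=3)

# The rate-based form of SEC should match the counting-stat form.
sec_from_rates = slg - ba + (oba - ba) / (1 - oba)
print(round(sec, 3), round(sec_from_rates, 3))
```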

BA, OBA, and SLG are park-adjusted by dividing by the square root of PF. This is an approximation, of course, but I'm satisfied that it works well. The goal here is to adjust for the win value of offensive events, not to quantify the exact park effect on the given rate. I use the BA/OBA/SLG-based formula to figure SEC, so it is park-adjusted as well.

Runs Created is actually Paul Johnson's ERP, more or less. Ideally, I would use a custom linear weights formula for the given league, but ERP is just so darn simple and close to the mark that it’s hard to pass up. I still use the term “RC” partially as a homage to Bill James (seriously, I really like and respect him even if I’ve said negative things about RC and Win Shares), and also because it is just a good term. I like the thought put in your head when you hear “creating” a run better than “producing”, “manufacturing”, “generating”, etc. to say nothing of names like “equivalent” or “extrapolated” runs. None of that is said to put down the creators of those methods--there just aren’t a lot of good, unique names available.

RC = (TB + .8H + W + HB - .5IW + .7SB - CS - .3AB)*.310

RC is park adjusted by dividing by PF, making all of the value stats that follow park adjusted as well. RG, the Runs Created per Game rate, is RC/O*25.5. I do not believe that outs are the proper denominator for an individual rate stat, but I also do not believe that the distortions caused are that bad. (I still intend to finish my rate stat series and discuss all of the options in excruciating detail, but alas you’ll have to take my word for it now).
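A Python sketch of RC and RG as described above, with the park adjustment applied to RC. The batting line is hypothetical.

```python
def runs_created(ab, h, tb, w, hb, iw, sb, cs, pf=1.0):
    """ERP-style Runs Created, park-adjusted by dividing by PF."""
    raw = (tb + 0.8 * h + w + hb - 0.5 * iw + 0.7 * sb - cs - 0.3 * ab) * 0.310
    return raw / pf

def runs_per_game(rc, outs):
    """RG: Runs Created per 25.5 outs."""
    return rc / outs * 25.5

# Hypothetical batting line, for illustration only.
rc = runs_created(ab=200, h=56, tb=98, w=25, hb=3, iw=2, sb=4, cs=1)
outs = 200 - 56 + 1          # AB - H + CS
rg = runs_per_game(rc, outs)
print(round(rc, 1), round(rg, 2))
```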

The baselined stats are calculated in the same manner the pitcher stats are, except here using the division’s RG as the reference level, then with an adjustment to bring the three divisions onto the same win-value basis by using the ratio of division RG to MLB RG:

HRAA = (RG – divRG)*O/25.5/divRG*MLB(RG)

RAA = (RG – divRG*PADJ)*O/25.5/divRG*MLB(RG)

RAR = (RG – divRG*PADJ*.73)*O/25.5/divRG*MLB(RG)

PADJ is the position adjustment, based on 2010-2019 offensive data. For catchers it is .92; for 1B/DH, 1.14; for 2B, .99; for 3B, 1.07; for SS, .95; for LF/RF, 1.09; and for CF, 1.05. As positional flexibility takes hold, fielding value is better quantified, and the long-term evolution of the game continues, it's right to question whether offensive positional adjustments are even less reflective of what we are trying to account for than they were in the past. But while I do not claim that the relationship is or should be perfect, at the level of talent filtering that exists to select major leaguers, there should be an inverse relationship between offensive performance by position and the defensive responsibilities of the position. Not a perfect one, but a relationship nonetheless. An offensive positional adjustment then allows for a more objective approach to setting a position adjustment. Again, I have to clarify that I don’t think subjectivity in metric design is a bad thing - any metric, unless it’s simply expressing some fundamental baseball quantity or rate (e.g. “home runs” or “on base average”) is going to involve some subjectivity in design (e.g. linear or multiplicative run estimator, any of myriad different ways to design park factors, whether to include a category like sacrifice flies that is more teammate-dependent).
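The three hitter formulas and the position adjustments above can be combined in one small Python function. The player's RG and outs and the division/MLB RG values are hypothetical; the PADJ table is taken from the text.

```python
PADJ = {"C": 0.92, "1B": 1.14, "DH": 1.14, "2B": 0.99, "3B": 1.07,
        "SS": 0.95, "LF": 1.09, "RF": 1.09, "CF": 1.05}

def hitter_runs_above(rg, outs, div_rg, mlb_rg, padj=1.0, repl=1.0):
    """Runs above a baseline of div_rg*padj*repl, rescaled to the MLB run context."""
    runs = (rg - div_rg * padj * repl) * outs / 25.5
    return runs / div_rg * mlb_rg

# Hypothetical shortstop and context values, for illustration only.
player = dict(rg=6.08, outs=145, div_rg=4.70, mlb_rg=4.60)
hraa = hitter_runs_above(**player)
raa = hitter_runs_above(**player, padj=PADJ["SS"])
rar = hitter_runs_above(**player, padj=PADJ["SS"], repl=0.73)
print(round(hraa, 1), round(raa, 1), round(rar, 1))
```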

The replacement levels I have used here are very much in line with the values used by other sabermetricians. This is based on my own "research", my interpretation of other people's research, and a desire to not stray from consensus and make the values unhelpful to the majority of people who may encounter them.

Replacement level is certainly not settled science. There is always going to be room to disagree on what the baseline should be. Even if you agree it should be "replacement level", any estimate of where it should be set is just that--an estimate. Average is clean and fairly straightforward, even if its utility is questionable; replacement level is inherently messy. So I offer the average baseline as well.

For position players, replacement level is set at 73% of the positional average RG (since there's a history of discussing replacement level in terms of winning percentages, this is roughly equivalent to .350). For starting pitchers, it is set at 128% of the league average RA (.380), and for relievers it is set at 111% (.450).
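The rough winning percentage equivalents quoted above can be approximated in Python. This sketch assumes a Pythagorean exponent of 2 and an otherwise average team, which is my assumption for illustration rather than a stated method; the results land within a few points of the rounded figures in the text.

```python
def equivalent_w_pct(run_ratio, exponent=2.0):
    """Pythagorean W% for a team whose runs scored / runs allowed equals run_ratio."""
    return run_ratio ** exponent / (run_ratio ** exponent + 1)

# Hitters score 73% of average; pitchers allow 128% or 111% of average.
hitter_w = equivalent_w_pct(0.73)
starter_w = equivalent_w_pct(1 / 1.28)
reliever_w = equivalent_w_pct(1 / 1.11)
print(round(hitter_w, 3), round(starter_w, 3), round(reliever_w, 3))
```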

The spreadsheets are published as Google Spreadsheets, which you can download in Excel format by changing the extension in the address from "=html" to "=xlsx", or in open format as "=ods". That way you can download them and manipulate things however you see fit.

2020 League

2020 Park Factors

2020 Teams

2020 Team Defense

2020 Team Offense

2020 Starters

2020 Hitters

Sunday, October 18, 2020

In Defense of the Astros

I am glad to be writing this post as an irrelevant digression rather than a timely attempt to wrap one’s mind around the possibility of a team with a 29-31 regular season record winning the World Series. My standard outlook when watching the playoffs is to pull for the better team. This is not an ironclad rule, as I do allow my various prejudices (fandom, a favorite player, a least favorite player, style of play, etc.) to override the general principle. I also do this despite being unable to be truly surprised by any outcome of a seven-game series between any two legitimate major league baseball teams. It does not at all surprise me when inferior teams win series against superior teams, and while it can occasionally be fun to see this happen just to see the egg on the face of people who wildly overestimate the probability of the better team winning (for me, the 2010-11 Phillies were the archetype of this phenomenon), I generally find it disappointing.

After Houston tied the series at three, though, and as I read some online reactions to the possibility of them appearing in the World Series, I decided it would be worthwhile to explain at length why I do not feel that they would be an illegitimate champion should it come to pass. The fact that it did not come to pass makes this discussion academic, but look at where you are. One thing that should be said upfront is that given the emotion generated by the Astros, it is impossible to disentangle how much of the backlash to the possibility of them winning was rooted in genuine concern about the legitimacy of the outcome of the MLB season (concerns that, I have argued, would have been much better channeled towards the possibility of Miami advancing deep into the playoffs), and how much is simply rationalization for the often disproportionate reaction to the sign stealing scandal. There are also people who are looking for any reason to discount the 2020 season, whether out of allowing the perfect to be the enemy of the good (I just cannot relate to people who would rather see no season than 60 games; neither can the bank accounts of the players), or out of darker authoritarian and totalitarian impulses. Arguments borne from there are outside the scope of this piece – this is about the implications of a 29-31 team appearing in or winning the World Series.

The first question that should be asked is “How good are the Astros, really?” My answer is – almost certainly better than 29-31. Why do I say that?

1. The Astros’ underlying performance was better than their win-loss record

The Astros scored 4.80 and allowed 4.72 runs per nine innings, which in a world of nine-inning games would be expected to result in a winning percentage of .508. The Astros tallied 4.58 runs created and 4.50 runs created allowed per nine innings, which in that same world would also be expected to result in a winning percentage of .508.
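For anyone who wants to check the .508 figures, here is a minimal sketch of the conversion from run rates to an expected W%. It assumes the estimator is Pythagenpat (the method named later in this post for the Marlins) and it assumes an exponent power of 0.29; that constant and the function name are my assumptions for illustration, not something stated above.

```python
def pythagenpat(runs_scored, runs_allowed, exponent_power=0.29):
    """Estimate W% from runs scored and allowed per game (or per nine innings).

    exponent_power=0.29 is an assumed constant; values near it are commonly
    used for Pythagenpat, but the post does not state which one was applied.
    """
    run_environment = runs_scored + runs_allowed   # total runs per game
    x = run_environment ** exponent_power          # environment-specific exponent
    return runs_scored ** x / (runs_scored ** x + runs_allowed ** x)


# Inputs taken from the paragraph above
actual_runs_wpct = pythagenpat(4.80, 4.72)     # runs scored / allowed per nine
runs_created_wpct = pythagenpat(4.58, 4.50)    # runs created / allowed per nine

print(f"{actual_runs_wpct:.3f} {runs_created_wpct:.3f}")
```

Under these assumptions both estimates round to .508, in line with the figures quoted above. A slightly different exponent power moves the result only in the fourth decimal place for teams this close to even.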

Of course, this is not significantly or in any way meaningfully different than their actual W% of .483, but if you have a psychological hangup on a sub-.500 team winning a pennant, their underlying performance was ever so slightly better than that of a .500 team, and to the extent that these tiny differences mean anything, they should mean more the smaller the sample size of games actually is.

2. The Astros’ win-loss record would likely have been over .500, even at their same level of performance, had a normal schedule been played

Last year, the AL West was the best division in MLB, with an average schedule-adjusted W% of .527 (of course, the Astros as the best team in baseball contributed much to this). The NL West was third at .510. While such calculations cannot be made for 2020 because of the lack of inter-geographic games (we can add the playoffs in, but they don’t provide a sufficient sample to be relied upon, although it is hard not to notice the complete playoff decimation of the Central), my guess is that playing a western-only schedule was tougher than the average schedule faced by an MLB team. This also suggests a slightly better than .500 true performance.

3. Most importantly, the Astros’ true talent was significantly better than that of a .483 team

Some time before the season, I grabbed the win projections from three sources that I think do quality work on these sorts of questions: Baseball Prospectus, Clay Davenport, and Fangraphs. Their win projections for the Astros were 36, 36, and 35. Respectively, that put them second, first, and tied for first in the AL (I should note, I didn’t record the date I jotted down these numbers, so they may not have represented the final estimate from each source).

Of course, things changed, most notably the injuries to Justin Verlander and Yordan Alvarez, but these are insufficient to take a 35 win team to under .500. Some Houston hitters (particularly Jose Altuve) had poor seasons that might have recalibrated estimates of their true talent, but how much stock can you put in sixty games of additional data? There are players going the other direction too (Framber Valdez is a big one). It seems highly likely that the Astros’ underlying true talent was that of a winning team.

I note this not to self-aggrandize (heaven knows I’m wrong about these things more than I’m right), but rather to cover myself in case anyone looks back at my 2020 Predictions post: I actually did not pick the Astros to make the playoffs under the traditional format, but did once the field was expanded. While I did not expound on my logic, I did think the Astros were more vulnerable than the purely objective approaches did, mainly because I thought that a starting staff anchored by two pitchers older than me was not a foolproof plan (even if they are Hall of Famers to be). I actually picked the Rays to win the AL pennant, which didn’t take any particular insight but viewed in the wrong light could discredit the opinions I’m expressing here.

If you now will accept the premise that the Astros 1) performed like a winning team 2) would have been a winning team against a normal schedule and 3) could expect to perform better over any stretch of future games due to their underlying talent, are the Astros in fact a beneficiary of a sixty-game season – or a victim? I would argue the latter – the 2020 Astros were a good team that happened to have a mediocre record, largely caused by a season with a schedule that was too short and too unbalanced.

I don’t know what the purpose of a playoff system is (actually I do – it’s to make money). Is it an attempt to reward the teams that performed the best in the regular season? (I’m pretty sure it’s not). Is it an attempt to identify the best team, treating the regular season only as a means of culling the herd? (I think many fans feel this way, which I think is crazy). In reality it is likely some combination of all three. But if the point of a playoff system is to use the regular season to cull the herd, then attempt to identify the best team, I am far from convinced that the Astros should not have been a beneficiary of such a system. And while I would generally consider such a system foolhardy, it is certainly more defensible when the regular season used to do the culling is only sixty games – we should have much less confidence than we normally would that the results of the sixty games should count for more than the results of the playoffs.

To put this into quantifiable terms, let’s suppose that we decide that in order for a champion to be legitimate, they have to meet some minimal standard of competence. We could then test the worthiness of a champion by calculating the probability that a team that met that minimum standard of competence would have compiled a record equal to or worse than that of the team in question.

For example, let’s say that we decree that in order to be a worthy champion, a team should be at least a true .556 W% team. This is of course an arbitrary value; I’ve chosen it because it corresponds to a 90 win team, which is a value I have always subjectively thought of as marking a legitimate contender. So under this argument, we will accept the playoff result as legitimate, to the extent that the champion it produces was in fact a 90 win quality team.

Using the binomial distribution, it is simple to calculate the probability that a team with a given W% would have a record of 29-31 or worse. The probability of a .556 team going 29-31 or worse over sixty games is 15.8%. Contrast this with the Rays, who were 40-20; a .556 team would finish 40-20 or worse with probability 97.0%.
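The binomial calculation described above can be sketched in a few lines. This is only an illustration under the usual simplifying assumption that every game is an independent trial with the same win probability; the function name is hypothetical.

```python
from math import comb


def prob_wins_at_most(wins, games, true_wpct):
    """P(team wins `wins` or fewer of `games`) for a team with W% `true_wpct`.

    Treats each game as an independent coin flip with the same win
    probability, which is the simplification the binomial model makes.
    """
    return sum(
        comb(games, k) * true_wpct ** k * (1 - true_wpct) ** (games - k)
        for k in range(wins + 1)
    )


# A true .556 team finishing 29-31 or worse, and 40-20 or worse, in 60 games
astros_or_worse = prob_wins_at_most(29, 60, 0.556)
rays_or_worse = prob_wins_at_most(40, 60, 0.556)

print(f"29-31 or worse: {astros_or_worse:.1%}")
print(f"40-20 or worse: {rays_or_worse:.1%}")
```

The first figure should land near the 15.8% quoted above and the second near 97%. Small differences in the last digit can come from whether .556 or the unrounded 90/162 is used as the true W%.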

What would an equivalent performance to Houston’s 29-31 be over the course of 162 games? A .556 team would win 83 or fewer with a probability of 14.9% and 84 or fewer with a probability of 18.9%. The 2006 Cardinals won the World Series after going 83-78 (darn that missing game; the probability for 83-78 is 17.0%). So if you demand that your world champion be at least a .556 team, you could plausibly argue that the Astros would have been the least worthy champion in history. (I would in fact argue that the Astros were more worthy under this criterion based on the considerations discussed above, particularly strength of schedule since it remains grounded in actual wins and losses rather than component performance or true talent).

But .556 may well be setting the bar too low. Is it too much to demand that a world champion be a .600 team (equivalent to about 97 wins)? One could certainly argue that it is not – the champion should be a team that has demonstrated excellence, not simply a contender.

The probability that a .600 team would win 29 or fewer out of 60 is 4.4%. Conveniently, the probability that a .600 team would win 86 or fewer out of 162 is 4.4%. So under this formulation, the Astros could be seen as an equally worthy champion to a team that went 86-76 over a full season. The 2006 Cardinals are joined by the 1987 Twins (85-77) in not clearing this bar, and the 2014 Giants (88-74) were close as well. Of course, in all cases I would argue that we should at least adjust actual wins for strength of schedule, but I think this suffices to make the point. One can make a very logical case that a 29-31 team would not in fact have been the least worthy World Series winner in history. (I am only going to hint at the low hanging fruit offered by the sixty-game record of the reigning world champions who denied a much more worthy team called the Astros in 2019).
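The "equivalent record" idea can be sketched as a search: compute the chance that a team of the minimum standard does no better than 29 wins in 60 games, then find the 162-game win total whose "this many or fewer" probability is closest to it. The function names are hypothetical, and the closest-match rule is my assumption about how to pick a single equivalent total.

```python
from math import comb


def prob_wins_at_most(wins, games, true_wpct):
    """Binomial probability of `wins` or fewer in `games` independent games."""
    return sum(
        comb(games, k) * true_wpct ** k * (1 - true_wpct) ** (games - k)
        for k in range(wins + 1)
    )


def equivalent_wins(wins, games, true_wpct, target_games=162):
    """Win total over `target_games` whose tail probability best matches
    the tail probability of `wins` in `games` for the same true W%."""
    target = prob_wins_at_most(wins, games, true_wpct)
    return min(
        range(target_games + 1),
        key=lambda w: abs(prob_wins_at_most(w, target_games, true_wpct) - target),
    )


# Houston's 29-31 translated to a full season under each standard in the post
for standard in (0.556, 0.600):
    print(standard, equivalent_wins(29, 60, standard))
```

Under the .600 standard this search should point to 86 wins, consistent with the matching 4.4% figures above; under .556 it should point to 83, since 15.8% sits closer to the 14.9% for 83 wins than to the 18.9% for 84.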

Saturday, October 10, 2020

Meanderings

* I used to get emails when there were comments awaiting moderation. This stopped at some point, and so there were a handful of non-spam comments that had been lingering for some time. I want to thank Tango Tiger and David Pinto (along with a couple anonymous readers) for their comments and apologize for neglecting to publish them until now.

* I’ve watched the overwhelming majority of playoff games played since 1997, and I think Game 5 of the TB/NYA ALDS was possibly one of the ten best games I remember. I say I think because a) recency bias is real and b) I haven’t sat down and comprehensively reviewed past games to make sure I didn’t miss any. Some that stood out off the top of my head are:

1997 ALDS Game 4 (CLE/NYA)

1997 ALCS Game 6 (CLE/BAL)

1999 NLCS Game 5 (NYN/ATL)

2001 WS Game 7 (ARI/NYA)

2003 ALCS Game 7 (NYA/BOS)

2004 ALCS Game 4 (BOS/NYA)

2005 NLDS Game 4 (HOU/ATL)

2005 NLCS Game 5 (STL/HOU)

2006 NLCS Game 7 (STL/NYN)

2011 WS Game 6 (STL/TEX)

2012 NLDS Game 4 (WAS/STL)

2017 WS Game 5 (HOU/LA)

I think the Rays/Yankees tilt belongs in the company of those games. So imagine my surprise when I perused some comments online and saw people using the game as an occasion to recite their evergreen complaints about modern baseball, particularly in this case focused on the fact that the game wasn’t decided by the starting pitchers, and that all of the runs scored on home runs.

In looking at the above list of twelve games that were particularly memorable, do you know what I can tell you about only a handful of them? The identities of the starting pitchers (Nagy/Mussina, Schilling/Johnson, Pedro...that’s about all I got). If you are complaining that pitchers rarely complete games in October, well, you missed the boat twenty-five years ago. While one may aesthetically prefer games determined by starters, I think the Rays/Yankees game is an odd one to find flaw with on that front. The fact of the matter is that the playoff format preordained that a decisive game would not be decided by the starters. Gerrit Cole started on short rest and made about as many pitches as could have reasonably been expected. And the lack of an obvious choice of starter actually contributed to one of the great features of this game, namely Kevin Cash’s perfectly executed plan to essentially use one of his best pitchers for each time through the order.

As to the aesthetics of the home run/strikeout game, I think there is a lot of projection going on. People know that they don’t fancy high-HR, high-K baseball, which is certainly their prerogative, but they pretend to know universally what all potential consumers of baseball find aesthetically pleasing. I have not seen any convincing evidence that the current style of baseball is driving people away from the game – arguments that focus on TV ratings are guilty of presupposing that a multi-faceted phenomenon can be boiled down to the stated aesthetic preferences of those who advance them.

You know what wouldn’t have made that game any more memorable? If five strikeouts had been replaced by five ground balls rolled over to second.

When I flip over to the NBA Finals for a few minutes and see Anthony Davis shooting threes, I think that I would much rather see Hakeem Olajuwon and Patrick Ewing battling each other in the post. Whenever I watch an NFL game, I think about how much more pleasing it would be to watch if teams occasionally lined up in the I or the Pro Set on 2nd & 7, and the quarterback was under center unless it was third & long. But I don’t suppose that my own preferences in these regards are indicative of those of the audience at large, or somehow representative of the manner in which basketball or football “should” be played.

Do the aesthetic flaws I find with those games reduce my interest in them from previous levels? Sure, although the primary reason I watch less NBA or NFL than I did previously is that as other obligations and interests reduce the total time I have available to devote to sports, I choose to achieve that by holding the time I allot to baseball reasonably constant, thereby crowding out the lesser sports. Even if the games returned to styles that I preferred, my time investment would not significantly increase – and I think the same is true for many of the complainers about baseball. In fact, I think many of them use the aesthetic argument as an excuse. For whatever reason, many of them have less time to devote to baseball, or feel that it is in some way childish or otherwise a waste of time. Aesthetics makes a nice excuse to justify to yourself why you invested all that time in the past (it was a different game!) or to try to save face with your internet baseball friends who you fear are judging you for abandoning the thread that brought you together in the first place.

* I am strongly opposed to expanded playoffs. Yet I find hand-wringing about the Astros advancing to the ALCS to be bizarre. Yes, I realize that the Astros carry with them baggage for reasons beyond their 2020 performance, but these lamentations are ostensibly grounded in the fact that they had a 29-31 regular season record.

The fact of the matter is that a sixty-game season is very much incapable of producing the same level of certainty about a team’s quality that a 162-game season can. It should not surprise anyone that a good team could have a sub-.500 record over sixty games. As fiercely opposed to expanded playoffs as I am, the fewer games are played in the regular season:

a) the more justification there is for an expanded playoff field

b) more importantly, the more a team’s performance in the playoffs should change our perception of their actual quality

When the 2006 Cardinals just barely scraped over .500 and went on to win the World Series, I would argue that their performance in the playoffs should have positively impacted our perception of their quality – but only slightly so. Their mediocre record spoke more to their quality than their eleven playoff wins. But if the Astros obtain comparable success, it should provide a much greater positive lift to our perception.
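One rough way to see the size of that difference is to regress each record toward .500 by adding a fixed number of average games, then check how far the estimate moves once playoff games are added. This is a sketch under loud assumptions: the 70-game regression constant and the function names are mine, playoff games are weighted the same as regular season games, and the 11-5 playoff line for Houston is purely hypothetical, chosen to mirror the 2006 Cardinals’ 11-5.

```python
def regressed_wpct(wins, games, prior_games=70, prior_wpct=0.5):
    """Estimate true W% by adding `prior_games` of `prior_wpct` ball.

    prior_games=70 is an assumed regression constant for illustration only.
    """
    return (wins + prior_games * prior_wpct) / (games + prior_games)


def playoff_lift(season_wins, season_games, playoff_wins, playoff_games):
    """Change in the regressed estimate once playoff games are included."""
    before = regressed_wpct(season_wins, season_games)
    after = regressed_wpct(season_wins + playoff_wins,
                           season_games + playoff_games)
    return after - before


# 2006 Cardinals: 83-78 regular season, 11-5 in the playoffs
cardinals_lift = playoff_lift(83, 161, 11, 16)
# 2020 Astros: 29-31 regular season, hypothetical 11-5 playoff run
astros_lift = playoff_lift(29, 60, 11, 16)

print(f"{cardinals_lift:.3f} {astros_lift:.3f}")
```

With these assumptions the same playoff run lifts the estimate for the sixty-game team by nearly twice as much as it does for the full-season team. The exact ratio depends on the assumed regression constant, but the direction does not.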

In the preceding paragraphs I have been discussing the matter as if the 2020 regular season was the only information we had by which to gauge the Astros’ true talent. Of course, this is untrue, and I would argue that the Astros are a good baseball team (even with the injury to Justin Verlander which couldn’t be factored into pre-season assessments) that had a poor sixty games. Another playoff team, one that will be a very trendy pick for 2021, that had the inverse of Houston’s record (31-29), is one about which I would make the opposite argument. The Miami Marlins are a bad baseball team that had a lucky win-loss record over sixty games despite playing like a bad baseball team. (Their Pythagenpat record, based on runs per nine innings, was only .431; based on runs created, only .417. I know some of you are screaming right now about the 29 runs allowed in one game, but of course you can’t just throw that out – perhaps it should be truncated, but it can’t be ignored). The Marlins on paper before the season projected to be one of the worst teams in the NL. I can’t imagine a better under bet on team wins for 2021.

Yet because of a measly two-game difference in their records, the Astros get scorn for advancing, while a Marlins advance would have been treated as a heart-warming story. In my world, the inverse is true. There’s no team I wanted out of the playoffs more than the Marlins; while I would have preferred that the A’s had beaten the Astros, and would prefer the Rays to do so, Houston’s success to date is a fantastic troll job of some very odd ways to think about baseball.

* Warning: what follows is not sports-related, and I don’t presume anyone reading this blog is here for anything other than my opinion on sports. The thoughts in this post would have been better expressed succinctly on a platform like Twitter, but I can no longer in good conscience use Twitter – it’s been two years since I tweeted regularly and a few months since I completely deleted my account.

Ostensibly, social media platforms like Twitter exist to facilitate free speech; in practice, regardless of whether it is/was the intent of their creators (and it now seems quite clear that it is the current intent of their owners), they serve as a mechanism for the suppression of free thought. It is their prerogative to do so (although I do not believe for one moment that this recognition of principle would be reciprocated), and it is my prerogative to refuse to use their service. I realize that I am writing this on a Google platform; the only thing I can say is that the value proposition of a free blogging platform makes getting in bed with the devil more attractive than participating in the cesspool that is Twitter circa 2020.