At first I wasn’t sure if I would go through the trouble of putting together year-end statistics for a 60-game season. This reticence was due entirely to the small sample size, and not to any belief that the season itself was “illegitimate” or an “exhibition” or any of the other pejoratives that have been lobbed against it. On the contrary, I was thrilled that we got a season at all given the immense social and political pressure aligned against any attempt by human beings to engage in voluntary economic activity. 2020 has provided a glimpse into the near future, and it is indeed a boot stamping a human face forever.
But I digress. 60 games is a very small sample size when one is accustomed to 162, but the reason I decided to do this exercise anyway was simple: I wanted to better understand what happened in those 60 games. I did decide to cull some of the categories I usually look at, mostly to reduce the amount of effort necessary on my part to produce the statistics. Additionally, I had to make some changes to park factors and comparisons to league average which I will describe below.
It is common sabermetric practice to compare player performance to the league average. This is always a simplification of reality, and there are alternative options. Bill James argued in the original Historical Baseball Abstract that the true context for a player's performance was his team's games. This is true as far as it goes, and it conveniently allowed James to sidestep the issue of developing a complete set of historical park factors. On the other hand, we understand that the run environment for a given team is shaped by the quality of their hitters and pitchers. A batter on a team with excellent pitching will benefit in any kind of comparison to average from his teammates' suppression of runs, but so would a player from another team were they to switch places. Since the usual goal of these exercises is to facilitate context-neutral (this is a loaded term, as there are many shades to a claim of context neutrality which I will not address here) comparisons between players, we typically decide that it is preferable to place the player's performance in a wider league context, even if that context now includes the events of thousands of games in which the player in question did not participate.
We could split the difference, and one could argue that perhaps we should. We could build a custom "league context" for each team based on their schedule. Were the schedule perfectly balanced, this would not be necessary; alas, the general trend in MLB has been for schedules to become more unbalanced over time. We could, in a typical season, construct a context for evaluating Indians players in which 19/162 of the non-Indians portion consists of the Twins, another 19/162 each of the Royals, White Sox, and Tigers, 6/162 of the Reds, 7/162 of the Yankees, 6/162 of the Orioles, etc., based on the number of games each team actually plays against each opponent. This is opposed to the league average, in which we simply compare to the entire AL, which implicitly assumes balance and ignores the existence of interleague games altogether.
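To make the idea concrete, here is a minimal sketch in Python of what such a schedule-weighted context might look like; the opponents, game counts, and R/G figures are hypothetical stand-ins, not the method actually used in this report:

# Sketch: build a schedule-weighted "league context" for one team.
# Opponents, game counts, and R/G values below are illustrative only.
opponent_rpg = {"MIN": 4.9, "KCR": 4.3, "CHW": 4.6, "DET": 4.1, "CIN": 4.5}
games_vs     = {"MIN": 19,  "KCR": 19,  "CHW": 19,  "DET": 19,  "CIN": 6}

total_games = sum(games_vs.values())
context = sum(opponent_rpg[t] * games_vs[t] for t in games_vs) / total_games
print(f"schedule-weighted opponent R/G context: {context:.2f}")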
I am not advocating for all of this complexity, and the approach I just sketched out is insufficiently refined to work in practice. The point I’m trying to make is that the league context is not exactly right, but it is a useful approximation, and with a semi-balanced schedule it makes sense.
When does it not make sense? In 2020, when there is not any semblance of a balanced schedule. The games played by teams in the AL Central bear no relation to those played by teams in the AL East or the AL West, because there are no games or opponents in common. To compare to the AL or NL average in 2020 is not a useful simplification – it is an outright abrogation of any attempt to accurately model the world as it is.
Thus I will be replacing the traditional AL/NL breakdown of the statistics with an East/Central/West breakdown. All comparisons to league average will compare to one of the three divisions, rather than the two leagues. Of course, 2/3 of any given team's games were against their league brethren and only 1/3 against teams from the other circuit, so this is still a simplification of reality – but no more so than the use of league average in a normal season with an unbalanced schedule. In fact, in a way of looking at things which in my opinion is damning to the wildly unbalanced schedule used in a typical season, teams played close to the same share of their games against their intra-"league" opponents as they normally do (for this example, treating the NL Central as intraleague opponents):
Of course, there are problems associated with using the three divisions as leagues for the purpose of statistical comparisons. The big one is that we all know that league quality is not necessarily equal, or even close to equal, between the AL and NL; even less so as you divide the teams further into E/C/W, partly due to making the units even smaller. I ignore this when dealing with AL/NL; for instance, in ranking players by runs above average, I don’t try to account for league strength, and I’ve also ignored it here. This is a distortion – if we for the sake of argument assume that the Central is weaker than the East, a player who is +10 runs relative to the Central would be less valuable in a truly context-neutral comparison than one who is +10 runs relative to the East. This goes for comparisons to replacement level as well, as one of the assumptions of replacement level comparisons is that a replacement player can be obtained at zero marginal cost and thus is equally available to any team in MLB.
Usually I gloss over these simplifying assumptions without discussion; I wanted to call them out here because the non-traditional approach makes the simplifications more glaring. In short, I will pretend there are three leagues for all calculations that require the use of league average, but I will still group the reports into AL/NL.
The other thorny issue for 2020 is park factors. One complication is hinted at by the discussion above regarding balanced schedules; what happened in games at Yankee Stadium in 2020 is completely irrelevant to the park factor for Safeco Field. The set of parks that comprise road games for any team in 2020 is very different than that which fed their historical park factor calculations.
But you can also go crazy trying to be too precise on a question like this, so I have gone for what I consider to be a less than optimal but defensible approach. It is highly subjective, but it makes sense to me, and the whole purpose of these reports is for me to calculate the statistics that I want to see – if I wanted someone else’s ideal statistics, I could save a lot of effort and find that in other places.
The first step in estimating park factors was to choose a home/road runs ratio to use as a starting point for each park. In so doing, I completely threw out 2020 data except when it was absolutely necessary (the new park in Texas and the Buffalo park used by the Blue Jays). This was simply to avoid having to deal with not just a different number of “home” and “road” games by team, but also the issue of seven-inning games, artificially inflated runs totals from extra inning games, and the like. Hopefully these are just temporary issues, although it seems there is momentum for the scourge of the latter infecting the game on a permanent basis. Should they become permanent issues, I will think of solutions – but it’s not worth it for this 60 game look.
So what I’ve done is for each extant major league park, I’ve used up to four years of data from 2016-2019 (up to because some parks didn’t exist for the entire span) and calculated the ratio of home RPG to road RPG. Then I have regressed this ratio. The regression coefficients were based on an updated study I did of the slope of the linear regression equation using home/road RPG ratios from 1-5 years of data as the independent variable and the home/road RPG ratio for the subsequent season as the dependent variable. This is similar to how the regression coefficients I have used in the past were derived, but the previous ones were only rules of thumb rather than the actual coefficients, and were based on a study by MGL and not my own work. The results are broadly similar, although they give slightly less weight to 4-5 years of historical data. I will do a proper write-up of the study and provide the dataset at a later date, but for now it must suffice to say that I used those results to come up with an equation based on games which is .1364*ln(G/162) + .5866 (the rounded results I’m actually going to apply based on count of seasons is .59 for one year, .68 for two years, .74 for three years, and .78 for four years).
Let me walk through the park factor calculations for the Blue Jays. They played 26 games at home, which had an 11.35 RPG, and 34 on the road, which had a 9.38 RPG. So their initial home/road run ratio is 1.209, but over 60 games we only weight that at .1364*ln(60/162) + .5866 = .451 (of course, their actual ratio should be given even less credibility because the equation inherently assumes a 50/50 split of home and road games). So the run ratio we use as a starting point for their park factor is 1.209*.451 + 1.00*(1 - .451) = 1.094.
To finish it off for application to full season statistics, we need to halve the difference between the calculated ratio and 1.0. Well, usually we halve it, since half of teams' games are at home or close enough. But in 2020 there was enough weirdness in this regard (especially for the Blue Jays) that I used the team's actual percentage of home games. In the Blue Jays' case this is 26/60, so their final park factor is 1.094*26/60 + 1*34/60 = 1.04.
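For those who prefer to see the arithmetic in code form, here is a quick sketch of the calculation just described, using the Blue Jays' figures (26 home games at 11.35 RPG, 34 road games at 9.38 RPG); this is simply the method above, nothing new:

import math

def park_factor(home_g, home_rpg, road_g, road_rpg):
    # Regressed park factor per the method described above.
    games = home_g + road_g
    ratio = home_rpg / road_rpg                      # raw home/road run ratio
    weight = .1364 * math.log(games / 162) + .5866   # regression weight based on games
    regressed = ratio * weight + 1.0 * (1 - weight)  # regress toward 1.00
    # weight by the team's actual share of home games rather than assuming a 50/50 split
    return regressed * home_g / games + 1.0 * road_g / games

print(round(park_factor(26, 11.35, 34, 9.38), 2))    # ~1.04 for the Blue Jays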
There are a couple things I could have done to refine this a little, but when rounding to two decimal places they don't have a huge impact. One would be to use something other than 1.00 as the "road" park factor, and another would be to ensure that the average park factor for each of the E/C/W divisions is 1.00 (or to do only the latter as a shortcut). Since they round to 1.00 for each of the E/C/W when rounded to two places, that's close enough for me. We could also have used innings as the denominator rather than games, but that's more work than I'm willing to put in for analyzing a sixty-game season.
All data was gathered from various pages on Baseball-Reference. The League spreadsheet is pretty straightforward--it includes league totals and averages for a number of categories, most or all of which are explained at appropriate junctures throughout this piece. The advent of interleague play has created two different sets of league totals--one for the offense of league teams and one for the defense of league teams. Before interleague play, these two were identical. I do not present both sets of totals (you can figure the defensive ones yourself from the team spreadsheet, if you desire), just those for the offenses. The exception is for the defense-specific statistics, like innings pitched and quality starts. The figures for those categories in the league report are for the defenses of the league's teams. However, I do include each league's breakdown of basic pitching stats between starters and relievers (denoted by "s" or "r" prefixes), and so summing those will yield the totals from the pitching side. I have broken this down by E/C/W and A/N/MLB.
I added a column this year for "ActO", which is actual (rather than estimated) outs made by the team offensively. This can be determined from the official statistics as PA – R – LOB. I have then replaced the column I usually show for league R/G ("N") with R/9, which is actually R*27/ActO, which is equivalent to R*9/IP. This restates the league run average in the more familiar per-nine-innings form. I've done the same for "OG", which is Outs/Game but only for those outs I count in the individual hitter's stats (AB – H + CS), "PA/G", which is normally just (AB + W)/G, and "KG" and "WG" (normally just K/G and W/G) – these are now "O/9" and "PA/9", still "KG"/"WG", and all are per 27 actual outs.
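As a small sketch of the restated rates, with illustrative totals for a hypothetical 60-game team (not actual 2020 figures):

# Restate run and rate stats per 27 actual outs, as described above.
R, PA, LOB = 290, 2300, 410          # placeholder totals
AB, H, CS, K, W = 2050, 520, 15, 560, 200

act_o = PA - R - LOB                 # actual outs made on offense
r9  = R * 27 / act_o                 # R/9, equivalent to R*9/IP
o9  = (AB - H + CS) * 27 / act_o     # outs counted in hitters' stats, per 27 actual outs
pa9 = (AB + W) * 27 / act_o
kg, wg = K * 27 / act_o, W * 27 / act_o
print(round(r9, 2), round(o9, 1), round(pa9, 1), round(kg, 1), round(wg, 1))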
The Team spreadsheet focuses on overall team performance--wins, losses, runs scored, runs allowed. The columns included are: Park Factor (PF), Winning Percentage (W%), Expected W% (EW%), Predicted W% (PW%), wins, losses, runs, runs allowed, Runs Created (RC), Runs Created Allowed (RCA), Home Winning Percentage (HW%), Road Winning Percentage (RW%) [exactly what they sound like--W% at home and on the road], R/9, RA/9, Runs Created/9 (RC/9), Runs Created Allowed/9 (RCA/9), and Runs Per Game (the average number of runs scored and allowed per game). For the offensive categories, runs/9 are based on runs per 27 actual outs; for pitching categories, they are runs per 9 innings.
I based EW% and PW% on R/9 and RA/9 (and RC/9 and RCA/9) rather than the actual runs totals. This means that they are not estimating what a team's winning percentage should have been given the actual game constructions that they played, but rather what it should have been had they played nine-inning games while scoring/allowing runs at the same rate per inning. EW%, which is based on actual R and RA, is also polluted by inflated runs in extra inning games; PW%, which is based on RC and RCA, doesn't suffer from this distortion.
The runs and Runs Created figures are unadjusted, but the per-game averages are park-adjusted, except for RPG which is also raw. Runs Created and Runs Created Allowed are both based on a simple Base Runs formula. The formula is:
A = H + W - HR - CS
B = (2TB - H - 4HR + .05W + 1.5SB)*.76
C = AB - H
D = HR
Naturally, A*B/(B + C) + D.
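In code, the team Base Runs estimate looks like this; the team totals in the example are placeholders, not a real 2020 team line:

def base_runs(AB, H, TB, HR, W, SB, CS, b_mult=.76):
    # Simple Base Runs version used for team RC/RCA, per the formula above.
    A = H + W - HR - CS
    B = (2*TB - H - 4*HR + .05*W + 1.5*SB) * b_mult
    C = AB - H
    D = HR
    return A * B / (B + C) + D

print(round(base_runs(AB=2000, H=520, TB=900, HR=80, W=210, SB=30, CS=10), 1))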
In order to actually estimate runs scored accurately for 2020, one would need to use .83 as the B multiplier. When I first saw the discrepancy between actual and estimated runs, I was alarmed; the formula isn’t perfect, of course, but usually it doesn’t vary much from year to year. Then I realized that the biggest reason for this was likely the distortion caused by extra inning games. As such, I’ve kept with my standard formula, but we won’t be able to compare a player’s estimated runs directly to the league average. Keep in mind that any statistic based on actual runs is also contaminated. Should something like the current extra inning rule become a permanent fixture in MLB, it will be necessary to make adjustments to every metric that uses runs as an input. The extent to which the easy extra inning runs distort the statistics is something I did not fully grasp until actually sitting down and going through the year end stats exercise.
The easy runs are everywhere, and they cannot easily be removed - should the rule become permanent, I think the easiest solution will be to make “regulation” runs the starting point, and then tack on extra inning runs less the run expectancy for a man on second, nobody out times the number of extra innings. Even that is a poor solution, as it only exacerbates the distortions caused by the early termination of innings with walkoffs that Tom Tango noted some time ago. Since extra innings will be fewer under such a rule, a higher percentage of them will be walkoff-truncated than otherwise would be the case.
There are also Team Offense and Defense spreadsheets. These include the following categories:
Team offense: Plate Appearances, Batting Average (BA), On Base Average (OBA), Slugging Average (SLG), Secondary Average (SEC), Walks and Hit Batters per At Bat (WAB), Isolated Power (SLG - BA), R/G at home (hR/G), and R/G on the road (rR/G). BA, OBA, SLG, WAB, and ISO are park-adjusted by dividing by the square root of park factor (or the equivalent; WAB = (OBA - BA)/(1 - OBA), ISO = SLG - BA, and SEC = WAB + ISO).
Team defense: Innings Pitched, BA, OBA, SLG, Innings per Start (IP/S), Starter's eRA (seRA), Reliever's eRA (reRA), Quality Start Percentage (QS%), RA/G at home (hRA/G), RA/G on the road (rRA/G), Battery Mishap Rate (BMR), Modified Fielding Average (mFA), and Defensive Efficiency Record (DER). BA, OBA, and SLG are park-adjusted by dividing by the square root of PF; seRA and reRA are divided by PF.
The three fielding metrics I've included are limited to those that a) I can calculate myself and b) are based on the basic available data, not specialized PBP data. The three metrics are explained in this post, but here are quick descriptions of each:
1) BMR--wild pitches and passed balls per 100 baserunners = (WP + PB)/(H + W - HR)*100
2) mFA--fielding average removing strikeouts and assists = (PO - K)/(PO - K + E)
3) DER--the Bill James classic, using only the PA-based estimate of plays made. Based on a suggestion by Terpsfan101, I've tweaked the error coefficient. Plays Made = PA - K - H - W - HR - HB - .64E and DER = PM/(PM + H - HR + .64E)
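For convenience, here are the three metrics as simple functions, transcribing the formulas above directly:

def bmr(WP, PB, H, W, HR):
    # Battery Mishap Rate: wild pitches plus passed balls per 100 baserunners
    return (WP + PB) / (H + W - HR) * 100

def mfa(PO, K, E):
    # Modified Fielding Average: fielding average with strikeouts (and assists) removed
    return (PO - K) / (PO - K + E)

def der(PA, K, H, W, HR, HB, E):
    # Defensive Efficiency Record with the .64 error coefficient, per the formula above
    pm = PA - K - H - W - HR - HB - .64 * E
    return pm / (pm + H - HR + .64 * E)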
For the individual player reports, I only bothered with starting pitchers and batters, since the sample size for relief pitchers was minuscule. For starting pitchers, I included all pitchers with at least six starts. The categories presented are stripped down from prior years, and I included all in one spreadsheet rather than splitting up by league.
For starting pitchers, the columns are: Innings Pitched, Estimated Plate Appearances (PA), RA, ERA, eRA, dRA, KG, WG, G-F, %H, RAA, and RAR. RA and ERA you know--R*9/IP or ER*9/IP, park-adjusted by dividing by PF. The formulas for eRA and dRA are based on the same Base Runs equation and they estimate RA, not ERA.
* eRA is based on the actual results allowed by the pitcher (hits, doubles, home runs, walks, strikeouts, etc.). It is park-adjusted by dividing by PF.
* dRA is the classic DIPS-style RA, assuming that the pitcher allows a league average %H, and that his hits in play have a league-average S/D/T split. It is park-adjusted by dividing by PF.
The formula for eRA is:
A = H + W - HR
B = (2*TB - H - 4*HR + .05*W)*.78
C = AB - H = K + (3*IP - K)*x (where x is figured as described below for PA estimation and is typically around .93) = PA (from below) - H - W
eRA = (A*B/(B + C) + HR)*9/IP
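Here is a sketch of eRA as a function of the pitcher's component line; the PA estimate uses the formula described further below, and the pitcher line in the example is a placeholder, not a real player:

def estimate_pa(IP, H, W, K, x):
    # Estimated PA = K + (3*IP - K)*x + H + W, with x the league (AB - H - K)/(3*IP - K)
    return K + (3*IP - K)*x + H + W

def e_ra(IP, H, TB, HR, W, K, x=.93):
    # eRA: Base Runs applied to the pitcher's actual components, expressed per 9 innings
    A = H + W - HR
    B = (2*TB - H - 4*HR + .05*W) * .78
    C = estimate_pa(IP, H, W, K, x) - H - W   # estimated outs (AB - H)
    return (A * B / (B + C) + HR) * 9 / IP

print(round(e_ra(IP=60, H=55, TB=90, HR=8, W=20, K=65), 2))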
To figure dRA, you first need the estimate of PA described below. Then you calculate W, K, and HR per PA (call these %W, %K, and %HR). Percentage of balls in play (BIP%) = 1 - %W - %K - %HR. This is used to calculate the DIPS-friendly estimate of %H (H per PA) as e%H = Lg%H*BIP%.
Now everything has a common denominator of PA, so we can plug into Base Runs:
A = e%H + %W
B = (2*(z*e%H + 4*%HR) - e%H - 5*%HR + .05*%W)*.78
C = 1 - e%H - %W - %HR
dRA = (A*B/(B + C) + %HR)/C*a
z is the league average of total bases per non-HR hit (TB - 4*HR)/(H - HR), and a is the league average of (AB - H) per game. I’ve used the MLB average for both this year, and have defined a as the league average of (AB – H) per 9 innings rather than per game.
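The same thing for dRA, with everything on a per-PA basis; lg_h_pct, z, and a are the league inputs just described, and the values in the example are placeholders:

def d_ra(IP, H, HR, W, K, x, lg_h_pct, z, a):
    # DIPS-style RA: league-average hit rate on balls in play, league-average S/D/T split
    pa = K + (3*IP - K)*x + H + W          # estimated PA
    pW, pK, pHR = W/pa, K/pa, HR/pa        # per-PA rates
    bip = 1 - pW - pK - pHR                # share of PA ending with a ball in play
    eH = lg_h_pct * bip                    # estimated non-HR hits per PA
    A = eH + pW
    B = (2*(z*eH + 4*pHR) - eH - 5*pHR + .05*pW) * .78
    C = 1 - eH - pW - pHR
    return (A * B / (B + C) + pHR) / C * a

print(round(d_ra(IP=60, H=55, HR=8, W=20, K=65, x=.93, lg_h_pct=.30, z=1.30, a=25.5), 2))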
Also shown are strikeout and walk rate, both expressed as per game. By game I mean not nine innings but rather the league average of PA/G (although for 2020 I’m actually using the major league average AB+W per 9 innings which was 37.9). I have always been a proponent of using PA and not IP as the denominator for non-run pitching rates, and now the use of per PA rates is widespread. Usually these are expressed as K/PA and W/PA, or equivalently, percentage of PA with a strikeout or walk. I don’t believe that any site publishes these as K and W per equivalent game as I am here. This is not better than K%--it’s simply applying a scalar multiplier. I like it because it generally follows the same scale as the familiar K/9.
To facilitate this, I’ve finally corrected a flaw in the formula I use to estimate plate appearances for pitchers. Previously, I’ve done it the lazy way by not splitting strikeouts out from other outs. I am now using this formula to estimate PA (where PA = AB + W):
PA = K + (3*IP - K)*x + H + W, where x = league average of (AB - H - K)/(3*IP – K), using one value for the entire majors
Then KG = K/PA*Lg(PA/G) and WG = W/PA*Lg(PA/G).
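In code, the per-game scaling is just a multiplier on the per-PA rates, using the 37.9 PA-per-9 figure mentioned above; the pitcher line is again a placeholder:

def k_and_w_per_game(K, W, IP, H, x=.93, lg_pa_per_g=37.9):
    # Scale K and W rates to the league PA per game (for 2020, AB+W per 9 innings)
    pa = K + (3*IP - K)*x + H + W          # estimated PA, as above
    return K / pa * lg_pa_per_g, W / pa * lg_pa_per_g

kg, wg = k_and_w_per_game(K=65, W=20, IP=60, H=55)
print(round(kg, 1), round(wg, 1))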
G-F is a junk stat, included here out of habit because I've been including it for years. It was intended to give a quick read of a pitcher's expected performance in the next season, based on eRA and strikeout rate. Although the numbers vaguely resemble RAs, it's actually unitless. As a rule of thumb, anything under four is pretty good for a starter. G-F = 4.46 + .095(eRA) - .113(K*9/IP). It is a junk stat. JUNK STAT JUNK STAT JUNK STAT. Got it?
%H is BABIP, more or less--%H = (H - HR)/(PA - HR - K - W), where PA was estimated above.
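And, for completeness, the junk stat and %H as one-liners (the former remains junk):

def g_f(e_ra_value, K, IP):
    # G-F junk stat: 4.46 + .095*eRA - .113*(K*9/IP)
    return 4.46 + .095 * e_ra_value - .113 * (K * 9 / IP)

def h_pct(H, HR, PA, K, W):
    # %H, essentially BABIP, using the estimated PA
    return (H - HR) / (PA - HR - K - W)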
The two baselined stats are Runs Above Average (RAA) and Runs Above Replacement (RAR). Different baselines are used for starters and relievers, although these should be updated given the changes in usage patterns that have taken place since I implemented the adjustment in 2015. It was based on patterns from the previous (circa 2015) several seasons of league average starter and reliever eRA. Thus it does not adjust for any advantages relief pitchers enjoy that are not reflected in their component statistics. This could include runs allowed scoring rules that benefit relievers (although the use of RRA should help even the scales in this regard, at least compared to raw RA) and the talent advantage of starting pitchers. The RAR baselines do attempt to take the latter into account, and so the difference in starter and reliever RAR will be more stark than the difference in RAA.
For 2020, given the small sample sizes and the difficulties of looking at the league average of actual runs, I decided to use eRA to calculate the baselined metrics. So they are no longer based on actual runs allowed by the pitcher, but rather on the component statistics. Since I’m using eRA rather than dRA, I am considering the actual results on balls in play allowed by the pitcher rather than a DIPS approach. Also, the baseline used is based on the division average eRA, not the league average.
RAA (starters) = (1.025*Div(eRA) - eRA)*IP/9
RAR (starters) = (1.28*Div(eRA) – eRA)*IP/9
RAA and RAR are then rescaled so that the three divisions are all on a (roughly) equal win basis. This is a step I don’t normally take – I usually publish the figures expressed in units of runs, without any adjustment for win value. Normally, I intend for the statistics to be viewed in the context of the particular league-season in question, and thus no adjustment for the win value of runs is necessary. However, with three different run contexts being used in each league, it is necessary in 2020. Why did I not convert them to WAR?
1. I have always preferred, when possible, to leave things denominated in runs, if for no reason deeper than that it's easier to work without the decimal place; I'd rather write "37 RAR" than "3.7 WAR".
2. In order to actually describe the figures as WAR, I would have to make what is not an altogether obvious decision about which runs-per-game context to use in determining runs per win. It can't be based on figures that include easy runs without any adjustment, and for the purpose of these statistics, I did not want to try to figure out a corrected runs figure. That is something that will be necessary if the rule becomes permanent.
So what I did instead was to take the same figure I used as the baseline for RAA or RAR – the eRA of the division as a whole – divide the initial estimate of RAA or RAR by it, and then multiply by the major league average eRA. This puts all the divisions on an equal basis if one assumes 1) that the division/MLB average eRA is a reasonable substitute for whatever the true run scoring rate for the league should be (and I think it is) and 2) that, given the assumption in 1, the practical assumption we are making is that RPW = RPG (the multiplication by 2 is lost in the wash). That is not the most accurate way to estimate RPW, but it is a very simple one that also has the neat property of corresponding to a Pythagorean exponent of 2. It is not what I would do as a best practice, but I think it is an acceptable approximation under the circumstances. So the final formulas for RAA and RAR are:
RAA = (1.025*Div(eRA) - eRA)*IP/9/Div(eRA)*MLB(eRA)
RAR = (1.28*Div(eRA) – eRA)*IP/9/Div(eRA)*MLB(eRA)
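Here is a sketch of those final starter calculations; div_era and mlb_era stand for the division and MLB average eRA, and the numbers in the example are placeholders:

def starter_raa_rar(e_ra_value, IP, div_era, mlb_era):
    # Baselined starter metrics, rescaled from the division to the MLB run context
    raa = (1.025 * div_era - e_ra_value) * IP / 9 / div_era * mlb_era
    rar = (1.28  * div_era - e_ra_value) * IP / 9 / div_era * mlb_era
    return raa, rar

# placeholder: a 4.20 eRA starter over 60 IP in a 4.60 division, 4.70 MLB context
print([round(v, 1) for v in starter_raa_rar(4.20, 60, 4.60, 4.70)])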
All players with 100 or more plate appearances (official, total plate appearances) are included in the Hitters spreadsheets (along with some players close to the cutoff point who I was interested in). Each is assigned one position, the one at which they appeared in the most games. The statistics presented are: Games played (G), Plate Appearances (PA), Outs (O), Batting Average (BA), On Base Average (OBA), Slugging Average (SLG), Secondary Average (SEC), Runs Created (RC), Runs Created per Game (RG), Hitting Runs Above Average (HRAA), Runs Above Average (RAA), and Runs Above Replacement (RAR).
PA is defined simply as AB + W + HB. Outs are AB - H + CS. BA and SLG you know, but remember that without SF, OBA is just (H + W + HB)/(AB + W + HB). Secondary Average = (TB - H + W + HB)/AB = SLG - BA + (OBA - BA)/(1 - OBA). I have not included net steals as many people (and Bill James himself) do, but I have included HB which some do not.
BA, OBA, and SLG are park-adjusted by dividing by the square root of PF. This is an approximation, of course, but I'm satisfied that it works well. The goal here is to adjust for the win value of offensive events, not to quantify the exact park effect on the given rate. I use the BA/OBA/SLG-based formula to figure SEC, so it is park-adjusted as well.
Runs Created is actually Paul Johnson's ERP, more or less. Ideally, I would use a custom linear weights formula for the given league, but ERP is just so darn simple and close to the mark that it’s hard to pass up. I still use the term “RC” partially as a homage to Bill James (seriously, I really like and respect him even if I’ve said negative things about RC and Win Shares), and also because it is just a good term. I like the thought put in your head when you hear “creating” a run better than “producing”, “manufacturing”, “generating”, etc. to say nothing of names like “equivalent” or “extrapolated” runs. None of that is said to put down the creators of those methods--there just aren’t a lot of good, unique names available.
RC = (TB + .8H + W + HB - .5IW + .7SB - CS - .3AB)*.310
RC is park adjusted by dividing by PF, making all of the value stats that follow park adjusted as well. RG, the Runs Created per Game rate, is RC/O*25.5. I do not believe that outs are the proper denominator for an individual rate stat, but I also do not believe that the distortions caused are that bad. (I still intend to finish my rate stat series and discuss all of the options in excruciating detail, but alas you’ll have to take my word for it now).
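Here is the hitter rate calculation in code, using the RC formula above; the batting line and park factor are placeholders:

def runs_created(TB, H, W, HB, IW, SB, CS, AB):
    # ERP-style Runs Created, as defined above
    return (TB + .8*H + W + HB - .5*IW + .7*SB - CS - .3*AB) * .310

def runs_per_game(rc, outs, pf=1.00):
    # RG: park-adjusted RC per 25.5 outs
    return rc / pf / outs * 25.5

rc = runs_created(TB=110, H=60, W=25, HB=3, IW=2, SB=4, CS=2, AB=220)
print(round(rc, 1), round(runs_per_game(rc, outs=220 - 60 + 2, pf=1.02), 2))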
The baselined stats are calculated in the same manner the pitcher stats are, except here using the division’s RG as the reference level, then with an adjustment to bring the three divisions onto the same win-value basis by using the ratio of division RG to MLB RG:
HRAA = (RG – divRG)*O/25.5/divRG*MLB(RG)
RAA = (RG – divRG*PADJ)*O/25.5/divRG*MLB(RG)
RAR = (RG – divRG*PADJ*.73)*O/25.5/divRG*MLB(RG)
PADJ is the position adjustment, based on 2010-2019 offensive data. For catchers it is .92; for 1B/DH, 1.14; for 2B, .99; for 3B, 1.07; for SS, .95; for LF/RF, 1.09; and for CF, 1.05. As positional flexibility takes hold, fielding value is better quantified, and the long-term evolution of the game continues, it's right to question whether offensive positional adjustments are even less reflective of what we are trying to account for than they were in the past. But while I do not claim that the relationship is or should be perfect, at the level of talent filtering that exists to select major leaguers, there should be an inverse relationship between offensive performance by position and the defensive responsibilities of the position. Not a perfect one, but a relationship nonetheless. An offensive positional adjustment thus allows for a more objective approach to setting a position adjustment. Again, I have to clarify that I don't think subjectivity in metric design is a bad thing - any metric, unless it's simply expressing some fundamental baseball quantity or rate (e.g. "home runs" or "on base average"), is going to involve some subjectivity in design (e.g. a linear or multiplicative run estimator, any of the myriad ways to design park factors, whether to include a category like sacrifice flies that is more teammate-dependent).
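Putting the pieces together, here is a sketch of the baselined hitter metrics using the PADJ values listed above; the division and MLB RG figures and the player line are placeholders:

PADJ = {"C": .92, "1B": 1.14, "DH": 1.14, "2B": .99, "3B": 1.07,
        "SS": .95, "LF": 1.09, "RF": 1.09, "CF": 1.05}

def hitter_value(rg, outs, pos, div_rg, mlb_rg):
    # HRAA/RAA/RAR per the formulas above, rescaled to the MLB run context
    scale = outs / 25.5 / div_rg * mlb_rg
    hraa = (rg - div_rg) * scale
    raa  = (rg - div_rg * PADJ[pos]) * scale
    rar  = (rg - div_rg * PADJ[pos] * .73) * scale
    return hraa, raa, rar

# placeholder: a 5.7 RG shortstop making 162 outs in a 4.6 division, 4.7 MLB context
print([round(v, 1) for v in hitter_value(5.7, 162, "SS", 4.6, 4.7)])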
The replacement levels I have used here are very much in line with the values used by other sabermetricians. This is based on my own "research", my interpretation of other people's research, and a desire to not stray from consensus and make the values unhelpful to the majority of people who may encounter them.
Replacement level is certainly not settled science. There is always going to be room to disagree on what the baseline should be. Even if you agree it should be "replacement level", any estimate of where it should be set is just that--an estimate. Average is clean and fairly straightforward, even if its utility is questionable; replacement level is inherently messy. So I offer the average baseline as well.
For position players, replacement level is set at 73% of the positional average RG (since there's a history of discussing replacement level in terms of winning percentages, this is roughly equivalent to .350). For starting pitchers, it is set at 128% of the league average RA (.380), and for relievers it is set at 111% (.450).
The spreadsheets are published as Google Spreadsheets, which you can download in Excel format by changing the extension in the address from "=html" to "=xlsx", or in open format as "=ods". That way you can download them and manipulate things however you see fit.