Wednesday, November 04, 2020

Musings on Positional Adjustments

This is an old post that I never published. It is sort of an attempt to justify why I use offensive positional adjustments, which is an even more dated position today than it was when I wrote it. In re-reading it, though, I thought my comments about zero-level defense were at least somewhat pertinent (if not particularly insightful) given Bill James' current effort at developing "Runs Saved Against Zero".

This post is not intended to be a comprehensive discussion of the issue of position adjustments; it will just quickly sketch out a system to classify adjustments and then I’ll offer a few of my opinions on them. There is a lot more that could be said and most of it could and has been said more eloquently by others.

The most important technical distinction between position adjustments (which I’ll shorten to PADJ sometimes) is which type of metric is used to set them--offensive or defensive. This distinction is well-known and gets a lot of attention. One that is talked about less is the difference between explicit and implicit position adjustments, and while people who get their hands dirty with various rating systems are well aware of implicit position adjustments, the average reader presented with a metric might gloss over them.

Explicit position adjustments are obvious and are acknowledged as being position adjustments. The first well-known example of their usage was in Pete Palmer’s linear weights system. They have also been used in VORP, just about every implementation of WAR, and many other metrics.

Implicit position adjustments usually crop up in the work of Bill James, although there are other metrics out there that utilize them. Implicit position adjustments are not really implicit in the truest sense of the word--they are obviously position adjustments if you look at them and consider what their function is. James likes to hide them in his fielding systems.

James’ metrics have always attempted to measure absolute wins and losses. I’ve always maintained that this is a fool’s errand, and that absolute wins and losses only make sense on the team level, not the player level. Most sabermetricians are in general agreement on this, and construct systems that are built to yield results against some baseline.

This is especially true for defensive metrics, whether for pitching or fielding. Absolute metrics (such as runs created) are tempting to apply to individual batters because there is a theoretical minimum on the number of runs a player can create (zero, of course), and that floor represents the worst possible performance. There is no such cap on the poor performance of a defense; a team could allow an infinite number of runs. The only real cap on the poor performance of an individual fielder is the number of balls that are hit to the locations on the field that fall under his responsibility.

As such, it is impossible to develop a true zero baseline metric to evaluate pitchers or fielders (one can certainly argue that it’s impossible for batters as well, but the existence of the theoretical floor makes it undeniably more tempting). You have to start by comparing to a non-zero baseline (average being the most straightforward), but the problem is compounded for fielders by the fact that it’s also impossible to directly compare fielders at different positions. The fielding standards, be it in fielding average, range factor, or more complex methods vary wildly from one position to another. While all fielders have the same objective (record outs and prevent the opponent from scoring), the primary ways in which fielders at different positions contribute to the common goal are very different.

That pretty much leaves comparing a player to the average fielder at his position as the only viable starting point for the developer of a fielding metric. As is, the results are not satisfactory for inclusion in a total value metric, because they implicitly assume that an average fielder at any position is equal in value to an average fielder at any other position.

There is no one with any degree of baseball knowledge who believes this to be true. Everyone agrees that an average shortstop is harder to find than an average first baseman--that is, the pool of available players that can adequately field shortstop is much smaller than the pool of adequate available first basemen. This basic truth is sometimes obfuscated by silly hypotheticals (e.g. “if you didn’t have a catcher, every pitch with a runner on base would be a passed ball” and “without a first baseman, it would be nearly impossible to convert a groundball into an out”), but serious people agree on this.

So what is one to do about this problem? You have to do something--you cannot have a functioning estimate of total value that pretends first basemen and shortstops are equal in fielding value. The easiest answer is a position adjustment.

While James attempts to express all of his value metrics relative to an absolute baseline, he of course can’t pull off a clean implementation. His solution, in both his early 1980s Defensive Winning Percentage and his more recent Win Shares, is to develop a fielding winning percentage for each position and convert this to wins and losses (the terminology and procedure are a little different in Win Shares but that’s a long story).

To make the conversion from a rate to a total of wins and losses, James multiplies by a number of games for which each position is assumed to be responsible. Positions on the left side of the defensive spectrum are assigned less responsibility than those on the right side…and thus this is an implicit position adjustment.

In pointing this out, I don’t mean to suggest that James is in any way dishonest in describing his systems--the assigned games are clearly defined in the system and aren’t hidden. The characterization I’ve offered of these adjustments as “implicit” is therefore not really accurate. The real difference between James-style position adjustments and the ones I’ve defined as “explicit” is that explicit adjustments either add or subtract a set number of runs dependent upon a player’s position or apply a different baseline to their rate in converting to a value stat.

The other major characteristic that defines a position adjustment’s type is whether it is an offensive PADJ or a defensive PADJ. The categories are not black and white--many positional adjustments incorporate subjective weighting of various factors, which could include offensive performance by players at a position, the range of offensive performance, the performance of fielders at multiple positions, comparisons of average salary as a stand-in for how teams value players, subjective corrections that the developer feels better match the results of the system to common sense--but usually the primary basis can be identified as either offensive or defensive.

Offensive position adjustments have fallen out of favor recently, although there are still some people using them (including me). The offensive PADJ originated with Pete Palmer, who used it as part of his linear weights system. The other most prominent use came in Keith Woolner’s VORP.

Defensive positional adjustments are a more recent phenomenon, but are key to both the Fangraphs and Chone WAR methodologies. Tango Tiger was the driving force behind their development, and Chone has also done his own research to establish the adjustments for his WAR.

Before deciding how to construct a position adjustment, it’s a good idea to take a step back and figure out why you need a position adjustment at all. Taking the reason behind your metric for granted is a path to just slapping numbers around indiscriminately and failing to model baseball reality. From my perspective, the only real reason that a PADJ is necessary is that it is essentially impossible to measure a player’s fielding value independent of his position. Therefore, one has to have a way of comparing the value of fielding performances across positions--a position adjustment.

A common misperception regarding all position adjustments among people not that well-versed in sabermetrics is that they provide a bonus “just for playing the position”. While I suppose that might be technically true in the sense of calculation, the underlying need for such an adjustment is discussed above. If one does not believe in applying a positional adjustment, and accepts the use of defensive metrics baselined to an average fielder at the position, then they must conclude that, as a group, the most valuable players are those at left-side of the spectrum positions. Or, in other words, that the overall average value of players at a given position is strictly a function of their aggregate offensive production.

It is possible to complicate the question of position adjustments by talking about baselines (particularly replacement level) and other considerations, but at the heart of the issue is the need to compare the value of a shortstop who is -5 fielding runs relative to an average shortstop, a third baseman who is +10 runs relative to an average third baseman, and an average first baseman.

Such a viewpoint suggests that a defensive PADJ is the way to go, since the sole reason for needing the adjustment is consideration of defense. So while the overwhelming positive of a defensive PADJ is that it is defined in the terms that necessitate the entire endeavor, it also carries a few negatives.

One is the difficulty of accurately evaluating fielding value, even within the constraints of one’s own position. While it is quite possible that any biases or methodological errors will balance out when aggregated over a large population of players, it would nonetheless be more comforting to begin from metrics in which one had a great deal of confidence.

Another key issue is that while the pool of players who log time at multiple positions is relatively large when comparing similar position groups (particularly outfielders, but also middle infielders, corner infielders, etc.), there is a much smaller available sample of players who play very different positions, at least in the same or adjacent seasons. And catcher? Forget about it--Bill James even left catcher off of the defensive spectrum due to the difficulty of comparing it directly to the positions whose occupants stand in fair territory.

Players that move positions introduce all kinds of selective sampling issues as well. Consider the problem of comparing positions where left-handers are de facto ineligible, and the fact that players moved off a position are more likely to have been stopgaps. For a more complete discussion of these issues (and an all-around good discussion of PADJ issues), see Colin Wyers’ article at Baseball Prospectus.

Thus, to avoid strange conclusions, defensive position adjustments are always going to require a little subjective massaging. That’s not necessarily a bad thing--the construction of any metric requires subjective decisions made on the part of the developer--but it makes them inherently high maintenance.

Of course, offensive position adjustments are best employed with a measure of caution as well. The pros of offensive adjustments are that they are very easy to work with. Offensive statistics are more reliable than fielding statistics, require much more basic data to calculate, and are available throughout the entire history of the game. Rather than having to compare performance of players across positions, one can at least start by simply looking at the average performance of all players at a particular position.

An offensive PADJ implicitly assumes that teams allocate their talent in such a manner that the average player at any position is equal to the average player at any other position--alternatively stated, that the offensive value gap between positional averages is equal to the defensive value gap. This is certainly never 100% truly the case for any sample period, particularly for single years. Offensive PADJs based on one year of data or other short stretches should be viewed with a great deal of skepticism.

Another problem lurking is what Wyers, in the linked article, refers to as the “Mays problem”--the existence of supremely talented players that excel at both hitting and fielding. Such players might be superstars at any position (ignoring handedness and other impediments), even first base, thanks to their hitting alone but are able to handle the defensive rigors of right-side defensive spectrum positions. While more ordinary players offer a package of offensive and defensive skills that limits their possible fielding positions commensurate to their offensive production, these players are playable anywhere. There are also potential issues with the very worst players at a position.

The Mays problem skews offensive positional averages, so Wyers proposes using an alternative offensive PADJ that adjusts the overall positional average for the gap between the upper and lower median of observed performance at the position. This approach (and other similar algorithms that could be offered) is novel but involves subjective choices similar to those necessitated by defensive PADJs.

The offensive PADJ will surely fail at lower levels of baseball thanks to the Mays problem--the best high school players, for instance, are often the cleanup hitter, ace pitcher, and center fielder or shortstop when not pitching. Such all-around stars are also more common in college ball or in earlier, less developed major leagues than they are in the modern major leagues with their high overall quality of play and relatively strong competitive balance. An offensive PADJ approach will surely break down at those low levels without serious alterations.

There are other relevant issues to discuss with respect to position adjustments, such as their relationship to replacement level and the manner in which they are applied (that is, whether they should be used to change the baseline to which performance is compared or assigned as a lump sum based on playing time; the possible answers to this question are also closely tied to one’s choice of offensive or defensive adjustment), but those will have to wait for some other time.

Tuesday, October 20, 2020

End of Season Statistics, 2020

At first I wasn’t sure if I would go through the trouble of putting together year-end statistics for a 60-game season. This reticence was due entirely to the small sample size, and not to any belief that the season itself was “illegitimate” or an “exhibition” or any of the other pejoratives that have been lobbed against it. On the contrary, I was thrilled that we got a season at all given the immense social and political pressure aligned against any attempt by human beings to engage in voluntary economic activity. 2020 has provided a glimpse into the near future, and it is indeed a boot stamping a human face forever.

But I digress. 60 games is a very small sample size when one is accustomed to 162, but the reason I decided to do this exercise anyway was simple: I wanted to better understand what happened in those 60 games. I did decide to cull some of the categories I usually look at, mostly to reduce the amount of effort necessary on my part to produce the statistics. Additionally, I had to make some changes to park factors and comparisons to league average which I will describe below.

It is common sabermetric practice to compare player performance to the league average. This is always a simplification of reality, and there are alternative options. Bill James argued in the original Historical Baseball Abstract that the true context for a player’s performance was his team’s games. This is true as far as it goes, and it conveniently allowed James to sidestep the issue of developing a complete set of historical park factors. On the other hand, we understand that the run environment for a given team is shaped by the quality of their hitters and pitchers. A batter on a team with excellent pitching will benefit in any kind of comparison to average from his teammates’ suppression of runs, but so would a player from another team were they to switch places. Since the usual goal of these exercises is to facilitate context-neutral (this is a loaded term, as there are many shades to a claim of context neutrality which I will not address here) comparisons between players, we typically decide that it is preferable to place the player’s performance in a wider league context, even if that context now includes the events of thousands of games in which the player in question did not participate.

We could split the difference, and one could argue that perhaps we should. We could build a custom “league context” for each team based on their schedule. Were the schedule perfectly balanced, this would not be necessary; alas, the general trend in MLB has been for schedules to become more unbalanced over time. We could, in a typical season, construct a context in which to evaluate Indians players in which 19/162 of the non-Indians portion consists of the Twins, and another 19/162 for each of the Royals, White Sox, and Tigers, and 6/162 for the Reds, and 7/162 for the Yankees, and 6/162 for the Orioles, etc. based on the number of games each team actually plays with each opponent. This is opposed to the league average, in which we simply compare to the entire AL, which implicitly assumes balance and ignores the existence of interleague games altogether.

I am not advocating for all of this complexity, and the approach I just sketched out is insufficiently refined to work in practice. The point I’m trying to make is that the league context is not exactly right, but it is a useful approximation, and with a semi-balanced schedule it makes sense.

When does it not make sense? In 2020, when there is not any semblance of a balanced schedule. The games played by teams in the AL Central bear no relation to those played by teams in the AL East or the AL West, because there are no games or opponents in common. To compare to the AL or NL average in 2020 is not a useful simplification – it is an outright abrogation of any attempt to accurately model the world as it is.

Thus I will be replacing the traditional AL/NL breakdown of the statistics with an East/Central/West breakdown. All comparisons to league average will compare to one of the three divisions, rather than the two leagues. Of course, 2/3 of any given team’s games were against their league brethren and only 1/3 against teams from the other circuit, so this is still a simplification of reality – but no more so than the use of league average in a normal season with an unbalanced schedule. In fact, in a way of looking at things which in my opinion is damning to the wildly unbalanced schedule used in a typical season, teams played close to the same share of their games against their intra-“league” opponents as they normally do (for this example, treating the NL Central as intraleague opponents):


Of course, there are problems associated with using the three divisions as leagues for the purpose of statistical comparisons. The big one is that we all know that league quality is not necessarily equal, or even close to equal, between the AL and NL; even less so as you divide the teams further into E/C/W, partly due to making the units even smaller. I ignore this when dealing with AL/NL; for instance, in ranking players by runs above average, I don’t try to account for league strength, and I’ve also ignored it here. This is a distortion – if we for the sake of argument assume that the Central is weaker than the East, a player who is +10 runs relative to the Central would be less valuable in a truly context-neutral comparison than one who is +10 runs relative to the East. This goes for comparisons to replacement level as well, as one of the assumptions of replacement level comparisons is that a replacement player can be obtained at zero marginal cost and thus is equally available to any team in MLB.

Usually I gloss over these simplifying assumptions without discussion; I wanted to call them out here because the non-traditional approach makes the simplifications more glaring. In short, I will pretend there are three leagues for all calculations that require the use of league average, but I will still group the reports into AL/NL.

The other thorny issue for 2020 is park factors. One complication is hinted at by the discussion above regarding balanced schedules; what happened in games at Yankee Stadium in 2020 is completely irrelevant to the park factor for Safeco Field. The set of parks that comprise road games for any team in 2020 is very different than that which fed their historical park factor calculations.

But you can also go crazy trying to be too precise on a question like this, so I have gone for what I consider to be a less than optimal but defensible approach. It is highly subjective, but it makes sense to me, and the whole purpose of these reports is for me to calculate the statistics that I want to see – if I wanted someone else’s ideal statistics, I could save a lot of effort and find that in other places.

The first step in estimating park factors was to choose a home/road runs ratio to use as a starting point for each park. In so doing, I completely threw out 2020 data except when it was absolutely necessary (the new park in Texas and the Buffalo park used by the Blue Jays). This was simply to avoid having to deal with not just a different number of “home” and “road” games by team, but also the issue of seven-inning games, artificially inflated runs totals from extra inning games, and the like. Hopefully these are just temporary issues, although it seems there is momentum for the scourge of the latter infecting the game on a permanent basis. Should they become permanent issues, I will think of solutions – but it’s not worth it for this 60 game look.

So what I’ve done is for each extant major league park, I’ve used up to four years of data from 2016-2019 (up to because some parks didn’t exist for the entire span) and calculated the ratio of home RPG to road RPG. Then I have regressed this ratio. The regression coefficients were based on an updated study I did of the slope of the linear regression equation using home/road RPG ratios from 1-5 years of data as the independent variable and the home/road RPG ratio for the subsequent season as the dependent variable. This is similar to how the regression coefficients I have used in the past were derived, but the previous ones were only rules of thumb rather than the actual coefficients, and were based on a study by MGL and not my own work. The results are broadly similar, although they give slightly less weight to 4-5 years of historical data. I will do a proper write-up of the study and provide the dataset at a later date, but for now it must suffice to say that I used those results to come up with an equation based on games which is .1364*ln(G/162) + .5866 (the rounded results I’m actually going to apply based on count of seasons is .59 for one year, .68 for two years, .74 for three years, and .78 for four years).
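As a quick check on the equation above, here is a minimal Python sketch of the regression weight as a function of games. The function name regression_weight and the full_season parameter are labels of my own choosing, not part of the original study; only the .1364 and .5866 coefficients come from the text.

```python
import math

def regression_weight(games, full_season=162):
    """Weight given to an observed home/road RPG ratio, per the equation
    .1364*ln(G/162) + .5866 quoted in the text."""
    return 0.1364 * math.log(games / full_season) + 0.5866

# Reproduce the rounded per-season weights listed in the text.
for seasons in (1, 2, 3, 4):
    print(seasons, round(regression_weight(162 * seasons), 2))
```

Run as written, the loop should reproduce the rounded weights of .59, .68, .74, and .78 for one through four seasons.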

Let me walk through the park factor calculations for the Blue Jays. They played 26 games at home, which had an 11.35 RPG, and 34 on the road, which had a 9.38. So their initial home/road run ratio is 1.209, but over 60 games we only weight that at .1364*ln(60/162) + .5866 = .451 (of course, their actual ratio should be given even less credibility because the equation inherently assumes a 50/50 split of home and road games). So the run ratio we use as a starting point for their park factor is 1.209*.451 + 1.00*(1 - .451) = 1.094.

To finish it off for application to full season statistics, we need to halve the difference between the calculated ratio and 1.0. Well, usually we halve it, since half of teams’ games are at home or close enough. But in 2020 there was enough weirdness in this regard (especially for the Blue Jays) that I used the team’s actual percentage of home games. In the Blue Jays’ case this is 26/60, so their final park factor is 1.094*26/60 + 1*34/60 = 1.04.
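The Blue Jays walkthrough can be sketched end to end in a few lines of Python. This is only an illustration of the steps described here: the function names are hypothetical, and the inputs are the rounded RPG figures quoted in the text, so the raw ratio comes out near 1.210 rather than the 1.209 shown (presumably a rounding difference in the inputs).

```python
import math

def regression_weight(games, full_season=162):
    # Weight from the equation in the text: .1364*ln(G/162) + .5866
    return 0.1364 * math.log(games / full_season) + 0.5866

def park_factor(home_rpg, road_rpg, games, home_games):
    """Regress the home/road RPG ratio toward 1.00, then scale the
    difference from 1.00 by the team's actual share of home games."""
    raw_ratio = home_rpg / road_rpg
    weight = regression_weight(games)
    regressed = raw_ratio * weight + 1.00 * (1 - weight)
    road_games = games - home_games
    return regressed * home_games / games + 1.00 * road_games / games

# Blue Jays 2020 figures quoted in the text: 26 home games at 11.35 RPG,
# 34 road games at 9.38 RPG.
blue_jays_pf = park_factor(11.35, 9.38, games=60, home_games=26)
print(round(blue_jays_pf, 2))
```

Passing the actual number of home games, rather than hard-coding one half, is what handles the lopsided 26/34 split.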

There are a couple things I could have done to refine this a little, but when rounding to two decimal places they don’t have a huge impact. One would be to use something other than 1.00 as the “road” park factor, and another would be to ensure that the average for each of the E/C/W is 1.00, or to do the latter only as a shortcut. Since they round to 1.00 for each of the E/C/W when rounded to two places, that’s close enough for me. We could also have used innings as a denominator rather than games, but it’s more work than I’m willing to put in for analyzing a sixty-game season.

All data was gathered from various pages on Baseball-Reference. The League spreadsheet is pretty straightforward--it includes league totals and averages for a number of categories, most or all of which are explained at appropriate junctures throughout this piece. The advent of interleague play has created two different sets of league totals--one for the offense of league teams and one for the defense of league teams. Before interleague play, these two were identical. I do not present both sets of totals (you can figure the defensive ones yourself from the team spreadsheet, if you desire), just those for the offenses. The exception is for the defense-specific statistics, like innings pitched and quality starts. The figures for those categories in the league report are for the defenses of the league's teams. However, I do include each league's breakdown of basic pitching stats between starters and relievers (denoted by "s" or "r" prefixes), and so summing those will yield the totals from the pitching side. I have broken this down by E/C/W and A/N/MLB.

I added a column this year for “ActO”, which is actual (rather than estimated) outs made by the team offensively. This can be determined from the official statistics as PA – R – LOB. I have then replaced the column I usually show for league R/G (“N”) with R/9, which is actually R*27/ActO, which is equivalent to R*9/IP. This restates the league run average in the more familiar per nine innings. I’ve done the same for “OG”, which is Outs/Game but only for those outs I count in the individual hitter’s stats (AB – H + CS), “PA/G”, which is normally just (AB + W)/G, and “KG” and “WG” (normally just K/G and W/G) – these are now “O/9”, “PA/9”, and still “KG”/“WG”, and all are per 27 actual outs.
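For illustration, the ActO and per-27-outs conversions might look like this in Python. The function names and the team totals are hypothetical and are not taken from any actual 2020 team or league line.

```python
def actual_outs(pa, runs, lob):
    """Actual offensive outs: every plate appearance ends with the batter
    scoring, being left on base, or becoming an out, so ActO = PA - R - LOB."""
    return pa - runs - lob

def per_nine(count, act_outs):
    """Restate a counting stat per 27 actual outs (nine innings)."""
    return count * 27 / act_outs

# Hypothetical offensive totals, chosen only to make the arithmetic easy.
pa, runs, lob = 2300, 280, 420
act_o = actual_outs(pa, runs, lob)   # 2300 - 280 - 420 = 1600
r_per_9 = per_nine(runs, act_o)      # 280 * 27 / 1600
print(act_o, r_per_9)
```

The same per_nine helper would apply to the outs, PA, K, and W columns, since they all share the 27-actual-outs denominator.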

The Team spreadsheet focuses on overall team performance--wins, losses, runs scored, runs allowed. The columns included are: Park Factor (PF), Winning Percentage (W%), Expected W% (EW%), Predicted W% (PW%), wins, losses, runs, runs allowed, Runs Created (RC), Runs Created Allowed (RCA), Home Winning Percentage (HW%), Road Winning Percentage (RW%) [exactly what they sound like--W% at home and on the road], R/9, RA/9, Runs Created/9 (RC/9), Runs Created Allowed/9 (RCA/9), and Runs Per Game (the average number of runs scored and allowed per game). For the offensive categories, runs/9 are based on runs per 27 actual outs; for pitching categories, they are runs/9 innings.

I based EW% and PW% on R/9 and RA/9 (and RC/9 and RCA/9) rather than the actual runs totals. This means that they are not estimating what a team’s winning percentage should have been in the actual game constructions that they played, but what it should have been if playing nine inning games but scoring/allowing runs at the same rate per inning. EW%, which is based on actual R and RA, is also polluted by inflated runs in extra inning games; PW%, which is based on RC and RCA, doesn’t suffer from this distortion.

The runs and Runs Created figures are unadjusted, but the per-game averages are park-adjusted, except for RPG which is also raw. Runs Created and Runs Created Allowed are both based on a simple Base Runs formula. The formula is:

A = H + W - HR - CS

B = (2TB - H - 4HR + .05W + 1.5SB)*.76

C = AB - H

D = HR

Naturally, A*B/(B + C) + D.
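A minimal Python sketch of this Base Runs formula follows. The function name, argument names, and sample totals are hypothetical; the coefficients are the ones listed above, with the B multiplier exposed as a parameter so that an alternative value can be tried.

```python
def base_runs(h, w, hr, cs, tb, sb, ab, b_mult=0.76):
    """Simple Base Runs estimate using the A, B, C, D factors in the text."""
    a = h + w - hr - cs                                       # baserunners
    b = (2 * tb - h - 4 * hr + 0.05 * w + 1.5 * sb) * b_mult  # advancement
    c = ab - h                                                # outs
    d = hr                                                    # automatic runs
    return a * b / (b + c) + d

# Hypothetical team totals, not an actual 2020 team line.
team = dict(h=500, w=200, hr=80, cs=10, tb=800, sb=30, ab=2000)
rc_standard = base_runs(**team)               # standard .76 multiplier
rc_alt = base_runs(**team, b_mult=0.83)       # a larger B multiplier
print(round(rc_standard, 1), round(rc_alt, 1))
```

Raising the B multiplier always raises the estimate, since B/(B + C) is increasing in B.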

In order to actually estimate runs scored accurately for 2020, one would need to use .83 as the B multiplier. When I first saw the discrepancy between actual and estimated runs, I was alarmed; the formula isn’t perfect, of course, but usually it doesn’t vary much from year to year. Then I realized that the biggest reason for this was likely the distortion caused by extra inning games. As such, I’ve kept with my standard formula, but we won’t be able to compare a player’s estimated runs directly to the league average. Keep in mind that any statistic based on actual runs is also contaminated. Should something like the current extra inning rule become a permanent fixture in MLB, it will be necessary to make adjustments to every metric that uses runs as an input. The extent to which the easy extra inning runs distort the statistics is something I did not fully grasp until actually sitting down and going through the year end stats exercise.

The easy runs are everywhere, and they cannot easily be removed - should the rule become permanent, I think the easiest solution will be to make “regulation” runs the starting point, and then tack on extra inning runs less the run expectancy for a man on second, nobody out times the number of extra innings. Even that is a poor solution, as it only exacerbates the distortions caused by the early termination of innings with walkoffs that Tom Tango noted some time ago. Since extra innings will be fewer under such a rule, a higher percentage of them will be walkoff-truncated than otherwise would be the case.

There are also Team Offense and Defense spreadsheets. These include the following categories:

Team offense: Plate Appearances, Batting Average (BA), On Base Average (OBA), Slugging Average (SLG), Secondary Average (SEC), Walks and Hit Batters per At Bat (WAB), Isolated Power (ISO), R/G at home (hR/G), and R/G on the road (rR/G). BA, OBA, SLG, WAB, and ISO are park-adjusted by dividing by the square root of park factor (or the equivalent; WAB = (OBA - BA)/(1 - OBA), ISO = SLG - BA, and SEC = WAB + ISO).

Team defense: Innings Pitched, BA, OBA, SLG, Innings per Start (IP/S), Starter's eRA (seRA), Reliever's eRA (reRA), Quality Start Percentage (QS%), RA/G at home (hRA/G), RA/G on the road (rRA/G), Battery Mishap Rate (BMR), Modified Fielding Average (mFA), and Defensive Efficiency Record (DER). BA, OBA, and SLG are park-adjusted by dividing by the square root of PF; seRA and reRA are divided by PF.

The three fielding metrics I've included are limited to metrics that a) I can calculate myself and b) are based on the basic available data, not specialized PBP data. The three metrics are explained in this post, but here are quick descriptions of each:

1) BMR--wild pitches and passed balls per 100 baserunners = (WP + PB)/(H + W - HR)*100

2) mFA--fielding average removing strikeouts and assists = (PO - K)/(PO - K + E)

3) DER--the Bill James classic, using only the PA-based estimate of plays made. Based on a suggestion by Terpsfan101, I've tweaked the error coefficient. Plays Made = PA - K - H - W - HR - HB - .64E and DER = PM/(PM + H - HR + .64E)
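The three fielding formulas above translate directly into code. The following Python sketch uses hypothetical function names and made-up team totals purely to show the arithmetic; the formulas themselves are transcribed as written in the list.

```python
def bmr(wp, pb, h, w, hr):
    """Battery Mishap Rate: wild pitches and passed balls per 100 baserunners."""
    return (wp + pb) / (h + w - hr) * 100

def mfa(po, k, e):
    """Modified Fielding Average: (PO - K)/(PO - K + E)."""
    return (po - k) / (po - k + e)

def der(pa, k, h, w, hr, hb, e):
    """Defensive Efficiency Record using the PA-based plays made estimate
    and the .64 error coefficient given in the text."""
    plays_made = pa - k - h - w - hr - hb - 0.64 * e
    return plays_made / (plays_made + h - hr + 0.64 * e)

# Hypothetical defensive totals, not an actual team line.
print(bmr(wp=10, pb=5, h=400, w=200, hr=100))
print(mfa(po=1600, k=600, e=25))
print(der(pa=2000, k=500, h=450, w=200, hr=70, hb=20, e=25))
```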

For the individual player reports, I only bothered with starting pitchers and batters, since the sample size for relief pitchers was minuscule. For starting pitchers, I included all pitchers with at least six starts. The categories presented are stripped down from prior years, and I included all in one spreadsheet rather than splitting up by league.

For starting pitchers, the columns are: Innings Pitched, Estimated Plate Appearances (PA), RA, ERA, eRA, dRA, KG, WG, G-F, %H, RAA, and RAR. RA and ERA you know--R*9/IP or ER*9/IP, park-adjusted by dividing by PF. The formulas for eRA and dRA are based on the same Base Runs equation and they estimate RA, not ERA.

* eRA is based on the actual results allowed by the pitcher (hits, doubles, home runs, walks, strikeouts, etc.). It is park-adjusted by dividing by PF.

* dRA is the classic DIPS-style RA, assuming that the pitcher allows a league average %H, and that his hits in play have a league-average S/D/T split. It is park-adjusted by dividing by PF.

The formula for eRA is:

A = H + W - HR

B = (2*TB - H - 4*HR + .05*W)*.78

C = AB - H = K + (3*IP - K)*x (where x is figured as described below for PA estimation and is typically around .93) = PA (from below) - H - W

eRA = (A*B/(B + C) + HR)*9/IP
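As a rough sketch, the eRA calculation looks like this in Python. The function name, the default x of .93, and the sample stat line are assumptions for illustration; IP must be a true decimal (65 1/3 innings is 65.333, not 65.1).

```python
def era_estimate(ip, h, tb, hr, w, k, x=0.93, pf=1.0):
    """Base Runs estimate of RA from a pitcher's actual results allowed.

    x is the league rate of non-strikeout outs per non-strikeout out opportunity
    (assumed here to be .93); pf is the park factor.
    """
    a = h + w - hr
    b = (2 * tb - h - 4 * hr + 0.05 * w) * 0.78
    c = k + (3 * ip - k) * x          # estimated AB - H
    runs_per_9 = (a * b / (b + c) + hr) * 9 / ip
    return runs_per_9 / pf            # park-adjust by dividing by PF


# Hypothetical 60-inning stat line
print(round(era_estimate(ip=60, h=50, tb=85, hr=8, w=20, k=65), 2))
```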

To figure dRA, you first need the estimate of PA described below. Then you calculate W, K, and HR per PA (call these %W, %K, and %HR). Percentage of balls in play (BIP%) = 1 - %W - %K - %HR. This is used to calculate the DIPS-friendly estimate of %H (H per PA) as e%H = Lg%H*BIP%.

Now everything has a common denominator of PA, so we can plug into Base Runs:

A = e%H + %W

B = (2*(z*e%H + 4*%HR) - e%H - 5*%HR + .05*%W)*.78

C = 1 - e%H - %W - %HR

dRA = (A*B/(B + C) + %HR)/C*a

z is the league average of total bases per non-HR hit (TB - 4*HR)/(H - HR), and a is the league average of (AB - H) per game. I’ve used the MLB average for both this year, and have defined a as the league average of (AB – H) per 9 innings rather than per game.
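Here is a minimal Python sketch of the dRA steps, with everything on a per-PA basis. The function name and the league constants in the example (Lg%H of .290, z of 1.30, a of 25.2) are placeholders I made up, not actual 2020 values.

```python
def dra_estimate(pa, w, k, hr, lg_pct_h, z, a, pf=1.0):
    """DIPS-style Base Runs estimate of RA.

    pa       -- estimated plate appearances (AB + W)
    lg_pct_h -- league %H (hits on balls in play per ball in play)
    z        -- league total bases per non-HR hit
    a        -- league (AB - H) per nine innings
    """
    pct_w, pct_k, pct_hr = w / pa, k / pa, hr / pa
    bip = 1 - pct_w - pct_k - pct_hr            # share of PA ending with a ball in play
    e_pct_h = lg_pct_h * bip                    # expected non-HR hits per PA

    A = e_pct_h + pct_w
    B = (2 * (z * e_pct_h + 4 * pct_hr) - e_pct_h - 5 * pct_hr + 0.05 * pct_w) * 0.78
    C = 1 - e_pct_h - pct_w - pct_hr
    return (A * B / (B + C) + pct_hr) / C * a / pf


# Hypothetical pitcher and placeholder league constants
print(round(dra_estimate(pa=250, w=20, k=65, hr=8, lg_pct_h=0.290, z=1.30, a=25.2), 2))
```

Note that the pitcher's actual hits never enter the calculation; only walks, strikeouts, and home runs do.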

Also shown are strikeout and walk rate, both expressed as per game. By game I mean not nine innings but rather the league average of PA/G (although for 2020 I’m actually using the major league average AB+W per 9 innings which was 37.9). I have always been a proponent of using PA and not IP as the denominator for non-run pitching rates, and now the use of per PA rates is widespread. Usually these are expressed as K/PA and W/PA, or equivalently, percentage of PA with a strikeout or walk. I don’t believe that any site publishes these as K and W per equivalent game as I am here. This is not better than K%--it’s simply applying a scalar multiplier. I like it because it generally follows the same scale as the familiar K/9.

To facilitate this, I’ve finally corrected a flaw in the formula I use to estimate plate appearances for pitchers. Previously, I’ve done it the lazy way by not splitting strikeouts out from other outs. I am now using this formula to estimate PA (where PA = AB + W):

PA = K + (3*IP - K)*x + H + W

where x = league average of (AB - H - K)/(3*IP - K), using one value for the entire majors.

Then KG = K/PA*Lg(PA/G) and WG = W/PA*Lg(PA/G).
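A short Python sketch of the PA estimate and the per-game rates follows. It reads KG and WG as the per-PA rate scaled up to the league PA per game, which is how the text describes them; the function names are mine, and x = .93 is the "typical" value mentioned earlier, not the exact 2020 figure.

```python
def estimate_pa(ip, h, w, k, x=0.93):
    """Estimated PA (AB + W), splitting strikeouts out from other outs."""
    return k + (3 * ip - k) * x + h + w


def per_game_rate(events, pa, lg_pa_per_game=37.9):
    """Events per PA, scaled to the league average PA per game (37.9 for 2020)."""
    return events / pa * lg_pa_per_game


# Hypothetical 60-inning stat line
pa = estimate_pa(ip=60, h=50, w=20, k=65)
print(round(pa, 2), round(per_game_rate(65, pa), 2), round(per_game_rate(20, pa), 2))
```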

G-F is a junk stat, included here out of habit because I've been including it for years. It was intended to give a quick read of a pitcher's expected performance in the next season, based on eRA and strikeout rate. Although the numbers vaguely resemble RAs, it's actually unitless. As a rule of thumb, anything under four is pretty good for a starter. G-F = 4.46 + .095(eRA) - .113(K*9/IP). It is a junk stat. JUNK STAT JUNK STAT JUNK STAT. Got it?

%H is BABIP, more or less--%H = (H - HR)/(PA - HR - K - W), where PA was estimated above.
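For completeness, G-F and %H are one-liners. The sketch below uses made-up inputs, and the function names are my own.

```python
def g_f(era_est, k, ip):
    """The junk stat: a unitless blend of eRA and strikeouts per nine innings."""
    return 4.46 + 0.095 * era_est - 0.113 * (k * 9 / ip)


def pct_h(h, hr, pa, k, w):
    """%H: non-HR hits per ball in play, with PA estimated as AB + W."""
    return (h - hr) / (pa - hr - k - w)


# Hypothetical pitcher: eRA 3.87, 65 K in 60 IP, 50 H, 8 HR, 20 W, 241.95 estimated PA
print(round(g_f(3.87, 65, 60), 2))
print(round(pct_h(50, 8, 241.95, 65, 20), 3))
```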

The two baselined stats are Runs Above Average (RAA) and Runs Above Replacement (RAR). Different baselines are used for starters and relievers, although these should be updated given the changes in usage patterns that have taken place since I implemented the adjustment in 2015. The adjustment was based on league average starter and reliever eRA over the several seasons preceding 2015. Thus it does not adjust for any advantages relief pitchers enjoy that are not reflected in their component statistics. This could include runs allowed scoring rules that benefit relievers (although the use of RRA should help even the scales in this regard, at least compared to raw RA) and the talent advantage of starting pitchers. The RAR baselines do attempt to take the latter into account, and so the difference in starter and reliever RAR will be more stark than the difference in RAA.

For 2020, given the small sample sizes and the difficulties of looking at the league average of actual runs, I decided to use eRA to calculate the baselined metrics. So they are no longer based on actual runs allowed by the pitcher, but rather on the component statistics. Since I’m using eRA rather than dRA, I am considering the actual results on balls in play allowed by the pitcher rather than a DIPS approach. Also, the baseline used is based on the division average eRA, not the league average.

RAA (starters) = (1.025*Div(eRA) - eRA)*IP/9

RAR (starters) = (1.28*Div(eRA) - eRA)*IP/9

RAA and RAR are then rescaled so that the three divisions are all on a (roughly) equal win basis. This is a step I don’t normally take – I usually publish the figures expressed in units of runs, without any adjustment for win value. Normally, I intend for the statistics to be viewed in the context of the particular league-season in question, and thus no adjustment for the win value of runs is necessary. However, with three different run contexts being used in each league, it is necessary in 2020. Why did I not convert them to WAR?

1. I have always preferred, when possible, to leave things denominated in runs, if for no reason deeper than that it’s easier to work without the decimal place; I’d rather write “37 RAR” than “3.7 WAR”.

2. In order to actively describe the figures as WAR, I would have to make what is not an altogether obvious decision about what the runs per game context used to determine runs per win actually is. It can’t be based on figures that include easy runs without any adjustment. For the purpose of these statistics, I did not want to try to figure out a corrected runs figure. That is something that will be necessary if the rule becomes permanent.

So what I did instead was simply take the same figure I used as the baseline for RAA or RAR, the eRA of the division as a whole, and divided the initial estimate of RAR by it, then multiplied by the major league average eRA. This puts all the divisions on an equal basis if one assumes 1) that the division/MLB average eRA is a reasonable substitute for whatever the true run scoring rate for the league should be (and I think it is a reasonable substitute) and 2) given the assumption in 1, it means the practical assumption we are making is that RPW = RPG (multiplying by 2 is lost in the wash), which is not the most accurate way to estimate RPW but is a very simple one that also has the neat property of corresponding to assuming a Pythagorean exponent of 2. It is not what I would do as a best practice, but I think it is an acceptable approximation under the circumstances. So the final formulas for RAA and RAR are:

RAA = (1.025*Div(eRA) - eRA)*IP/9/Div(eRA)*MLB(eRA)

RAR = (1.28*Div(eRA) - eRA)*IP/9/Div(eRA)*MLB(eRA)
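The final starter formulas can be sketched as a single Python function, with the baseline multiplier (1.025 for RAA, 1.28 for RAR) passed in. The function name and the division and MLB eRA values in the example are invented for illustration.

```python
def starter_baselined_runs(era_est, ip, div_era, mlb_era, baseline_mult):
    """Runs above a baseline, rescaled from the division run context to the MLB one."""
    runs_vs_baseline = (baseline_mult * div_era - era_est) * ip / 9
    return runs_vs_baseline / div_era * mlb_era


# Hypothetical starter: 3.50 eRA in 63 IP; division eRA 4.50, MLB eRA 4.60
raa = starter_baselined_runs(3.50, 63, 4.50, 4.60, baseline_mult=1.025)
rar = starter_baselined_runs(3.50, 63, 4.50, 4.60, baseline_mult=1.28)
print(round(raa, 1), round(rar, 1))
```

When the division eRA equals the MLB eRA, the rescaling drops out and the result matches the unscaled starter formulas.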

All players with 100 or more plate appearances (official, total plate appearances) are included in the Hitters spreadsheets (along with some players close to the cutoff point who I was interested in). Each is assigned one position, the one at which they appeared in the most games. The statistics presented are: Games played (G), Plate Appearances (PA), Outs (O), Batting Average (BA), On Base Average (OBA), Slugging Average (SLG), Secondary Average (SEC), Runs Created (RC), Runs Created per Game (RG), Hitting Runs Above Average (HRAA), Runs Above Average (RAA), and Runs Above Replacement (RAR).

PA is defined simply as AB + W + HB. Outs are AB - H + CS. BA and SLG you know, but remember that without SF, OBA is just (H + W + HB)/(AB + W + HB). Secondary Average = (TB - H + W + HB)/AB = SLG - BA + (OBA - BA)/(1 - OBA). I have not included net steals as many people (and Bill James himself) do, but I have included HB which some do not.

BA, OBA, and SLG are park-adjusted by dividing by the square root of PF. This is an approximation, of course, but I'm satisfied that it works well. The goal here is to adjust for the win value of offensive events, not to quantify the exact park effect on the given rate. I use the BA/OBA/SLG-based formula to figure SEC, so it is park-adjusted as well.

Runs Created is actually Paul Johnson's ERP, more or less. Ideally, I would use a custom linear weights formula for the given league, but ERP is just so darn simple and close to the mark that it’s hard to pass up. I still use the term “RC” partially as a homage to Bill James (seriously, I really like and respect him even if I’ve said negative things about RC and Win Shares), and also because it is just a good term. I like the thought put in your head when you hear “creating” a run better than “producing”, “manufacturing”, “generating”, etc. to say nothing of names like “equivalent” or “extrapolated” runs. None of that is said to put down the creators of those methods--there just aren’t a lot of good, unique names available.

RC = (TB + .8H + W + HB - .5IW + .7SB - CS - .3AB)*.310

RC is park adjusted by dividing by PF, making all of the value stats that follow park adjusted as well. RG, the Runs Created per Game rate, is RC/O*25.5. I do not believe that outs are the proper denominator for an individual rate stat, but I also do not believe that the distortions caused are that bad. (I still intend to finish my rate stat series and discuss all of the options in excruciating detail, but alas you’ll have to take my word for it now).
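Here is a small Python sketch of RC and RG as defined above. The function names and the sample batting line are hypothetical.

```python
def runs_created(ab, h, tb, w, hb, iw, sb, cs, pf=1.0):
    """ERP-style Runs Created, park-adjusted by dividing by PF."""
    raw = (tb + 0.8 * h + w + hb - 0.5 * iw + 0.7 * sb - cs - 0.3 * ab) * 0.310
    return raw / pf


def runs_per_game(rc, ab, h, cs):
    """RG: Runs Created per 25.5 outs, where outs = AB - H + CS."""
    outs = ab - h + cs
    return rc / outs * 25.5


# Hypothetical hitter in a neutral park
rc = runs_created(ab=200, h=55, tb=95, w=25, hb=2, iw=1, sb=4, cs=1)
print(round(rc, 1), round(runs_per_game(rc, ab=200, h=55, cs=1), 2))
```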

The baselined stats are calculated in the same manner the pitcher stats are, except here using the division’s RG as the reference level, then with an adjustment to bring the three divisions onto the same win-value basis by using the ratio of division RG to MLB RG:

HRAA = (RG - divRG)*O/25.5/divRG*MLB(RG)

RAA = (RG - divRG*PADJ)*O/25.5/divRG*MLB(RG)

RAR = (RG - divRG*PADJ*.73)*O/25.5/divRG*MLB(RG)

PADJ is the position adjustment, based on 2010-2019 offensive data. For catchers it is .92; for 1B/DH, 1.14; for 2B, .99; for 3B, 1.07; for SS, .95; for LF/RF, 1.09; and for CF, 1.05. As positional flexibility takes hold, fielding value is better quantified, and the long-term evolution of the game continues, it's right to question whether offensive positional adjustments are even less reflective of what we are trying to account for than they were in the past. But while I do not claim that the relationship is or should be perfect, at the level of talent filtering that exists to select major leaguers, there should be an inverse relationship between offensive performance by position and the defensive responsibilities of the position. Not a perfect one, but a relationship nonetheless. An offensive positional adjustment then allows for a more objective approach to setting a position adjustment. Again, I have to clarify that I don’t think subjectivity in metric design is a bad thing - any metric, unless it’s simply expressing some fundamental baseball quantity or rate (e.g. “home runs” or “on base average”) is going to involve some subjectivity in design (e.g. linear or multiplicative run estimator, any of the myriad ways to design park factors, whether to include a category like sacrifice flies that is more teammate-dependent).
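Putting the three hitter formulas together with the PADJ values listed above gives something like the following Python sketch. The dictionary layout, function name, and the division and MLB RG figures in the example are my own assumptions.

```python
# Offensive position adjustments (2010-2019 data), as listed in the text
PADJ = {"C": 0.92, "1B": 1.14, "DH": 1.14, "2B": 0.99, "3B": 1.07,
        "SS": 0.95, "LF": 1.09, "RF": 1.09, "CF": 1.05}


def hitter_baselined_runs(rg, outs, position, div_rg, mlb_rg):
    """HRAA, RAA and RAR for a hitter, rescaled to the MLB run context."""
    scale = outs / 25.5 / div_rg * mlb_rg
    padj = PADJ[position]
    return {
        "HRAA": (rg - div_rg) * scale,               # vs. average hitter
        "RAA": (rg - div_rg * padj) * scale,         # vs. average at the position
        "RAR": (rg - div_rg * padj * 0.73) * scale,  # vs. replacement at the position
    }


# Hypothetical shortstop: 5.81 RG over 146 outs; division RG 4.60, MLB RG 4.50
print({name: round(value, 1)
       for name, value in hitter_baselined_runs(5.81, 146, "SS", 4.60, 4.50).items()})
```

A shortstop's PADJ is below 1, so his RAA comes out higher than his HRAA; the reverse holds for a first baseman.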

The replacement levels I have used here are very much in line with the values used by other sabermetricians. This is based both on my own "research", my interpretation of other people's research, and a desire to not stray from consensus and make the values unhelpful to the majority of people who may encounter them.

Replacement level is certainly not settled science. There is always going to be room to disagree on what the baseline should be. Even if you agree it should be "replacement level", any estimate of where it should be set is just that--an estimate. Average is clean and fairly straightforward, even if its utility is questionable; replacement level is inherently messy. So I offer the average baseline as well.

For position players, replacement level is set at 73% of the positional average RG (since there's a history of discussing replacement level in terms of winning percentages, this is roughly equivalent to .350). For starting pitchers, it is set at 128% of the league average RA (.380), and for relievers it is set at 111% (.450).
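The winning percentage equivalents can be roughly checked with a Pythagorean calculation. The exponent of 2 in this Python sketch is my assumption, since the text does not say which conversion is used, so it should only be expected to land near the quoted figures.

```python
def wpct_from_run_ratio(runs_scored_ratio, exponent=2.0):
    """Pythagorean W% for a team scoring `runs_scored_ratio` times what it allows.

    The exponent of 2 is an assumption made for this illustration.
    """
    r = runs_scored_ratio ** exponent
    return r / (1 + r)


hitter = wpct_from_run_ratio(0.73)        # scores 73% of average, allows average
starter = wpct_from_run_ratio(1 / 1.28)   # average offense, allows 128% of average
reliever = wpct_from_run_ratio(1 / 1.11)  # average offense, allows 111% of average
print(round(hitter, 3), round(starter, 3), round(reliever, 3))
```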

The spreadsheets are published as Google Spreadsheets, which you can download in Excel format by changing the extension in the address from "=html" to "=xlsx", or in open format as "=ods". That way you can download them and manipulate things however you see fit.

2020 League

2020 Park Factors

2020 Teams

2020 Team Defense

2020 Team Offense

2020 Starters

2020 Hitters

Sunday, October 18, 2020

In Defense of the Astros

I am glad to be writing this post as an irrelevant digression rather than a timely attempt to wrap one’s mind around the possibility of a team with a 29-31 regular season record winning the World Series. My standard outlook when watching the playoffs is to pull for the better team. This is not an ironclad rule, as I do allow my various prejudices (fandom, a favorite player, a least favorite player, style of play, etc.) to override the general principle. I also do this despite being unable to be truly surprised by any outcome of a seven-game series between any two legitimate major league baseball teams. While it does not at all surprise me when inferior teams win series against superior teams, and while it can occasionally be fun to see this happen just to see the egg on the face of people who wildly overestimate the probability of the better team winning (for me, the 2010-11 Phillies were the archetype of this phenomenon), I generally find it disappointing.

After Houston tied the series at three, though, and as I read some online reactions to the possibility of them appearing in the World Series, I decided it would be worthwhile to explain at length why I do not feel that they would be an illegitimate champion should it come to pass. The fact that it did not come to pass makes this discussion academic, but look at where you are. One thing that should be said upfront is that given the emotion generated by the Astros, it is impossible to disentangle how much of the backlash to the possibility of them winning was rooted in genuine concern about the legitimacy of the outcome of the MLB season (concerns that, I have argued, would have been much better channeled towards the possibility of Miami advancing deep into the playoffs), and how much is simply rationalization for the often disproportionate reaction to the sign stealing scandal. There are also people who are looking for any reason to discount the 2020 season, whether out of allowing the perfect to be the enemy of the good (I just cannot relate to people who would rather see no season than 60 games; neither can the bank accounts of the players), or out of darker authoritarian and totalitarian impulses. Arguments borne from there are outside the scope of this piece – this is about the implications of a 29-31 team appearing in or winning the World Series.

The first question that should be asked is “How good are the Astros, really?” My answer is – almost certainly better than 29-31. Why do I say that?

1. The Astros’ underlying performance was better than their win-loss record

The Astros scored 4.80 and allowed 4.72 runs per nine innings, which in a world of nine-inning games would be expected to result in a winning percentage of .508. The Astros tallied 4.58 runs created and 4.50 runs created allowed per nine innings, which in that same world would also be expected to result in a winning percentage of .508.

Of course, this is not significantly or in any way meaningfully different than their actual W% of .483, but if you have a psychological hangup on a sub-.500 team winning a pennant, their underlying performance was ever so slightly better than that of a .500 team, and to the extent that these tiny differences mean anything, they should mean more the smaller the sample size of games actually is.
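The expected winning percentages above can be approximated with a Pythagenpat calculation. In this Python sketch the exponent rule x = RPG^.29 is an assumption on my part (it is a common choice, but it is not stated in this post), so the output should only be expected to land close to the quoted .508.

```python
def pythagenpat_wpct(runs_per_game, runs_allowed_per_game, pat_power=0.29):
    """Pythagenpat W%: the exponent grows with the run environment.

    pat_power = .29 is an assumed constant for this illustration.
    """
    x = (runs_per_game + runs_allowed_per_game) ** pat_power
    return runs_per_game ** x / (runs_per_game ** x + runs_allowed_per_game ** x)


print(round(pythagenpat_wpct(4.80, 4.72), 3))   # actual runs per nine innings
print(round(pythagenpat_wpct(4.58, 4.50), 3))   # runs created per nine innings
```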

2. The Astros’ win-loss record would likely have been over .500, even at their same level of performance, had a normal schedule been played

Last year, the AL West was the best division in MLB, with an average schedule-adjusted W% of .527 (of course, the Astros as the best team in baseball contributed much to this). The NL West was third at .510. While such calculations cannot be made for 2020 because of the lack of inter-geographic games (we can add the playoffs in, but they don’t provide a sufficient sample to be relied upon, although it is hard not to notice the complete playoff decimation of the Central), my guess is that playing a western-only schedule was tougher than the average schedule faced by a MLB team. This also suggests a slightly better than .500 true performance.

3. Most importantly, the Astros’ true talent was significantly better than that of a .483 team

Some time before the season, I grabbed the win projections from three sources that I think do quality work on these sorts of questions: Baseball Prospectus, Clay Davenport, and Fangraphs. Their win projections for the Astros were 36, 36, and 35. Respectively, that put them second, first, and tied for first in the AL (I should note, I didn’t record the date I jotted down these numbers, so they may not have represented the final estimate from each source).

Of course, things changed, most notably the injuries to Justin Verlander and Yordan Alvarez, but these are insufficient to take a 35 win team to under .500. Some Houston hitters (particularly Jose Altuve) had poor seasons that might have recalibrated estimates of their true talent, but how much stock can you put in sixty games of additional data? And there are players going the other direction too (Framber Valdez is a big one). It seems highly likely that the Astros’ underlying true talent was that of a winning team.

I note the following not to self-aggrandize (heaven knows I’m wrong about these things more than I’m right), but rather to cover myself in case anyone looks back at my 2020 Predictions post: I actually did not pick the Astros to make the playoffs under the traditional format, but did once the field was expanded. While I did not expound on my logic, I did think the Astros were more vulnerable than the purely objective approaches did, mainly because I thought that a starting staff anchored by two pitchers older than me was not a foolproof plan (even if they are Hall of Famers to be). I actually picked the Rays to win the AL pennant, which didn’t take any particular insight but viewed in the wrong light could discredit the opinions I’m expressing here.

If you now will accept the premise that the Astros 1) performed like a winning team 2) would have been a winning team against a normal schedule and 3) could expect to perform better over any stretch of future games due to their underlying talent, are the Astros in fact a beneficiary of a sixty-game season – or a victim? I would argue the latter – the 2020 Astros were a good team that happened to have a mediocre record, largely caused by a season with a schedule that was too short and too unbalanced.

I don’t know what the purpose of a playoff system is (actually I do – it’s to make money). Is it an attempt to reward the teams that performed the best in the regular season? (I’m pretty sure it’s not). Is it an attempt to identify the best team, treating the regular season only as a means of culling the herd? (I think many fans feel this way, which I think is crazy). In reality it is likely some combination of all three. But if the point of a playoff system is to use the regular season to cull the herd, then attempt to identify the best team, I am far from convinced that the Astros should not have been a beneficiary of such a system. And while I would generally consider such a system foolhardy, it is certainly more defensible when the regular season used to do the culling is only sixty games – we should have much less confidence that the results of the sixty games should count for more than the results of the playoffs than we normally would.

To put this into quantifiable terms, let’s suppose that we decide that in order for a champion to be legitimate, they have to meet some minimal standard of competence. We could then test the worthiness of a champion by calculating the probability that a team that met that minimum standard of competence would have compiled a record equal or worse than the team in question.

For example, let’s say that we decree that in order to be a worthy champion, a team should be at least a true .556 W% team. This is of course an arbitrary value; I’ve chosen it because it corresponds to a 90 win team, which is a value I have always subjectively thought of as marking a legitimate contender. So under this argument, we will accept the playoff result as legitimate, to the extent that the champion it produces was in fact a 90 win quality team.

Using the binomial distribution, it is simple to calculate the probability that a team with a given W% would have a record of 29-31 or worse. The probability of a .556 team going 29-31 or worse over sixty games is 15.8%. Contrast this with the Rays, who were 40-20, which would happen with probability 97.0%.

What would an equivalent performance to Houston’s 29-31 be over the course of 162 games? A .556 team would win 83 or fewer with a probability of 14.9% and 84 or fewer with a probability of 18.9%. The 2006 Cardinals won the World Series after going 83-78 (darn that missing game; the probability for 83-78 is 17.0%). So if you demand that your world champion be at least a .556 team, you could plausibly argue that the Astros would have been the least worthy champion in history. (I would in fact argue that the Astros were more worthy under this criterion based on the considerations discussed above, particularly strength of schedule since it remains grounded in actual wins and losses rather than component performance or true talent).

But .556 may well be setting the bar too low. Is it too much to demand that a world champion be a .600 team (equivalent to about 97 wins)? One could certainly argue that it is not – the champion should be a team that has demonstrated excellence, not simply a contender.

The probability that a .600 team would win 29 or fewer out of 60 is 4.4%. Conveniently, the probability that a .600 team would win 86 or fewer out of 162 is 4.4%. So under this formulation, the Astros could be seen as an equally worthy champion to a team that went 86-76 over a full season. The 2006 Cardinals are joined by the 1987 Twins (85-77) in not clearing this bar, and the 2014 Giants (88-74) were close as well. Of course, in all cases I would argue that we should at least adjust actual wins for strength of schedule, but I think this suffices to make the point. One can make a very logical case that a 29-31 team would not in fact have been the least worthy World Series winner in history. (I am only going to hint at the low hanging fruit offered by the sixty-game record of the reigning world champions who denied a much more worthy team called the Astros in 2019).
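The binomial probabilities quoted in this section are easy to check with nothing but the standard library. The Python sketch below assumes independent games with a fixed win probability, which is what the binomial model implies; the function name is my own, and small differences from the quoted figures can arise from rounding of the .556 threshold.

```python
from math import comb


def prob_wins_at_most(wins, games, true_wpct):
    """P(a team with the given true W% wins `wins` or fewer of `games`)."""
    return sum(
        comb(games, k) * true_wpct ** k * (1 - true_wpct) ** (games - k)
        for k in range(wins + 1)
    )


cases = [
    ("Astros 29-31 vs .556", 29, 60, 0.556),
    ("Rays 40-20 vs .556", 40, 60, 0.556),
    ("83 wins in 162 vs .556", 83, 162, 0.556),
    ("84 wins in 162 vs .556", 84, 162, 0.556),
    ("Astros 29-31 vs .600", 29, 60, 0.600),
    ("86 wins in 162 vs .600", 86, 162, 0.600),
]
for label, wins, games, wpct in cases:
    print(f"{label}: {prob_wins_at_most(wins, games, wpct):.1%}")
```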

Saturday, October 10, 2020

Meanderings

* I used to get emails when there were comments awaiting moderation. This stopped at some point, and so there were a handful of non-spam comments that had been lingering for some time. I want to thank Tango Tiger and David Pinto (along with a couple anonymous readers) for their comments and apologize for neglecting to publish them until now.

* I’ve watched the overwhelming majority of playoff games played since 1997, and I think Game 5 of the TB/NYA ALDS was possibly one of the ten best games I remember. I say I think because a) recency bias is real and b) I haven’t sat down and comprehensively reviewed past games to make sure I didn’t miss any. Some that stood out off the top of my head are:

1997 ALDS Game 4 (CLE/NYA)

1997 ALCS Game 6 (CLE/BAL)

1999 NLCS Game 5 (NYN/ATL)

2001 WS Game 7 (ARI/NYA)

2003 ALCS Game 7 (NYA/BOS)

2004 ALCS Game 4 (BOS/NYA)

2005 NLDS Game 4 (HOU/ATL)

2005 NLCS Game 5 (STL/HOU)

2006 NLCS Game 7 (STL/NYN)

2011 WS Game 6 (STL/TEX)

2012 NLDS Game 4 (WAS/STL)

2017 WS Game 5 (HOU/LA)

I think the Rays/Yankees tilt belongs in the company of those games. So imagine my surprise when I perused some comments online and saw people using the game as an occasion to recite their evergreen complaints about modern baseball, particularly in this case focused on the fact that the game wasn’t decided by the starting pitchers, and that all of the runs scored on home runs.

In looking at the above list of twelve games that were particularly memorable, you know what I can tell you about only a handful of them? The identities of the starting pitchers (Nagy/Mussina, Schilling/Johnson, Pedro...that’s about all I got). If you are complaining that pitchers rarely complete games in October, well, you missed the boat twenty-five years ago. While one may aesthetically prefer games determined by starters, I think the Rays/Yankees game is an odd one to find flaw with on that front. The fact of the matter is that the playoff format preordained that a decisive game would not be decided by the starters. Gerrit Cole started on short rest and made about as many pitches as could have reasonably been expected. And the lack of an obvious choice of starter actually contributed to one of the great features of this game, namely Kevin Cash’s perfectly executed plan to essentially use one of his best pitchers for each time through the order.

As to the aesthetics of the home run/strikeout game, I think there is a lot of projection going on. People know that they don’t fancy high-HR, high-K baseball, which is certainly their prerogative, but they pretend to know universally what all potential consumers of baseball find aesthetically pleasing. I have not seen any convincing evidence that the current style of baseball is driving people away from the game – arguments that focus on TV ratings are guilty of presupposing that a multi-faceted phenomenon can be boiled down to the stated aesthetic preferences of those who advance them.

You know what wouldn’t have made that game any more memorable? If five strikeouts had been replaced by five ground balls rolled over to second.

When I flip over to the NBA Finals for a few minutes and see Anthony Davis shooting threes, I think that I would much rather see Hakeem Olajuwon and Patrick Ewing battling each other in the post. Whenever I watch a NFL game, I think about how much more pleasing it would be to watch if teams occasionally lined up in the I or the Pro Set on 2nd & 7, and the quarterback was under center unless it was third & long. But I don’t suppose that my own preferences in these regards are indicative of those of the audience at large, or somehow representative of the manner in which basketball or football “should” be played.

Do the aesthetic flaws I find with those games reduce my interest in them from previous levels? Sure, although the primary reason I watch less NBA or NFL than I did previously is that as other obligations and interests reduce the total time I have available to devote to sports, I choose to achieve that by holding the time I allot to baseball reasonably constant, thereby crowding out the lesser sports. Even if the games returned to styles that I preferred, my time investment would not significantly increase – and I think the same is true for many of the complainers about baseball. In fact, I think many of them use the aesthetic argument as an excuse. For whatever reason, many of them have less time to devote to baseball, or feel that it is in some way childish or otherwise a waste of time. Aesthetics makes a nice excuse to justify to yourself why you invested all that time in the past (it was a different game!) or to try to save face with your internet baseball friends who you fear are judging you for abandoning the thread that brought you together in the first place.

* I am strongly opposed to expanded playoffs. Yet I find hand-wringing about the Astros advancing to the ALCS to be bizarre. Yes, I realize that the Astros carry with them baggage for reasons beyond their 2020 performance, but these lamentations are ostensibly grounded in the fact that they had a 29-31 regular season record.

The fact of the matter is that a sixty-game season is very much incapable of producing the same level of certainty about a team’s quality that a 162 game season can. It should not surprise anyone that a good team could have a sub-.500 record over sixty games. As fiercely opposed to expanded playoffs as I am, the fewer games that are played in the regular season:

a) the more justification there is for an expanded playoff field

b) more importantly, the more a team’s performance in the playoffs should change our perception of their actual quality

When the 2006 Cardinals just barely scraped over .500 and went on to win the World Series, I would argue that their performance in the playoffs should have positively impacted our perception of their quality – but only slightly so. Their mediocre record spoke more to their quality than their eleven playoff wins. But if the Astros obtain comparable success, it should provide a much greater positive lift to our perception.

In the preceding paragraphs I have been discussing the matter as if the 2020 regular season was the only information we had by which to gauge the Astros’ true talent. Of course, this is untrue, and I would argue that the Astros are a good baseball team (even with the injury to Justin Verlander which couldn’t be factored into pre-season assessments) that had a poor sixty games. Another playoff team, one that will be a very trendy pick for 2021, that had the inverse of Houston’s record (31-29), is one about which I would make the opposite argument. The Miami Marlins are a bad baseball team that had a lucky win-loss record over sixty games despite playing like a bad baseball team. (Their Pythagenpat record, based on runs per nine innings, was only .431; based on runs created, only .417. I know some of you are screaming right now about the 29 runs allowed in one game, but of course you can’t just throw that out – perhaps it should be truncated, but it can’t be ignored). The Marlins on paper before the season projected to be one of the worst teams in the NL. I can’t imagine a better under bet on team wins for 2021.

Yet because of a measly two game difference in their records, the Astros get scorn for advancing, while a Marlins advance would have been treated as a heart-warming story. In my world, the inverse is true. There’s no team I wanted out of the playoffs more than the Marlins; while I would have preferred that the A’s had beaten the Astros, and would prefer the Rays to do so, Houston’s success to date is a fantastic troll job of some very odd ways to think about baseball.

* Warning: what follows is not sports-related, and I don’t presume anyone reading this blog is here for anything other than my opinion on sports. The thoughts in this post would have been better expressed succinctly on a platform like Twitter, but I can no longer in good conscience use Twitter – it’s been two years since I tweeted regularly and a few months since I completely deleted my account.

Ostensibly, social media platforms like Twitter exist to facilitate free speech; in practice, regardless of whether it is/was the intent of their creators (and it now seems quite clear that it is the current intent of their owners), they serve as a mechanism for the suppression of free thought. It is their prerogative to do so (although I do not believe for one moment that this recognition of principle would be reciprocated), and it is my prerogative to refuse to use their service. I realize that I am writing this on a Google platform; the only thing I can say is that the value proposition of a free blogging platform makes getting in bed with the devil more attractive than getting in to participate in the cesspool that is Twitter circa 2020.

Wednesday, September 16, 2020

Scorekeeping Meanderings

The absence of baseball from mid-March to late July resulted in me spending a lot of time pondering the art of scorekeeping. This is somewhat counterintuitive, I suppose, as scorekeeping is an activity that usually is predicated on live baseball being played, whereas other baseball-adjacent interests like sabermetrics, baseball books, baseball cards, and your OOTP game (go Squires!) can be pursued just as well without a season in-progress. However, one of the ways that I sought to connect with baseball was through examining my old scoresheets. I wouldn’t say that I “relive” a game through reading scoresheets – I don’t, for instance, start at the top of the first and walk through the play-by-play. It is more of a survey of the scoresheet, looking at the names in the lineup, scanning first vertically for the flow of the game, then horizontally for the performance of individual batters. Given the manner in which I keep score, focusing on pitching performance is more of a chore, but usually the first ways of ingesting the scoresheet provide direction on whether there is anything of note. At the very least, I feel like I accomplished something – I now at least have weekly posts scheduled at Weekly Scoresheet through some time in January 2022.

What follows is a collection of disjointed opinions on priorities in keeping score, many of which I’ve previously written about. Of course scorekeeping is a deeply personal activity, and so these are my priorities – they need not be the priorities of any other scorekeeper. I have much enjoyed perusing the BaseballScorecards subreddit, which is the best repository of examples of personal scoresheets that I have found.

1. My primary goal in keeping score is to record as much information as possible about the game. In the Statcast era, it may be necessary to caveat this by saying that what I really mean is “as much information as can be gathered by the human eye watching on TV or at the park” – the amount of information that can be collected about a baseball game now far exceeds the capacity of our basic senses. There are some additional caveats about what this means for practical scorekeeping.

I want to have the entire account on one side of one sheet of 8.5 x 11 paper. I think that scoresheets read easier when they are broken up by innings. One of the innovations of the Project Scoresheet system and its offshoots (more on other aspects of these later) was to only provide for six scoreboxes for each player rather than nine, which increases the amount of space available for recording each plate appearance at the cost of losing the clear distinction between innings. I choose to retain the distinction at the cost of sacrificing some space.

I am always confused by the manner in which a team batting around causes some scorekeepers to lose their minds, and start crossing out the numbers in the subsequent innings. This happens in a small number of innings, and usually resolves itself after the following inning if we’re talking about high-level baseball. I simply draw some additional lines at the top and bottom of the scoreboxes that have been used as overflow and move along with my life.

The desire to maximize space is a reason why I don’t like using a pictorial representation of a diamond in a scorebox. In addition to not having artistic talent (you would think this wouldn’t matter when you are just tracing, but you’d be surprised), it renders the inside of the box of limited value for recording information.

2. Since requiring that the entire scoresheet be contained on a single side of a single sheet of paper limits the amount of available space, I completely eschew any space in which to total individual statistics. It’s just as well – I don’t really want to spend time after the game on this – box scores are readily available, and more importantly, with each batter only getting 3-6 plate appearances in a typical game, I can quickly scan horizontally across the scoresheet to take in his entire performance.

This doesn’t work as well for pitchers, as tracking a starter requires scanning over multiple innings. For this reason, when I use an alternative scoresheet that uses one side of an 8.5 x 11 sheet for each team, I devote some of the additional space to tracking pitchers’ statistics.

3. In recording the game action, I focus primarily on describing what happened rather than adhering rigidly to the rules of scorekeeping as laid out in section 9 of the official rulebook. The reasoning behind this is that, as above, I am primarily concerned with capturing an accurate account of the game rather than using the scoresheet as a means of compiling a statistical account of it (thus my preferences here would not serve an official scorer well to adopt). And if I am successful at the former goal, the latter will be recoverable even if it is not immediately evident from the notations on the scoresheet. This is not to say that I intentionally flout section 9, but rather disregard it at my convenience.

A few examples will make my point more clear. One is a strikeout/throw out double play. This is a double play, but nowhere on the sheet do I indicate it is a double play through a symbol like “DP”. It will be evident from my scoresheet that the batter struck out, and the runner was caught stealing on the last pitch of his plate appearance. Thus it is not necessary to note that it was a double play – the information recorded is sufficient to work this out after the fact.

Another is catcher’s interference (which it certainly feels as if I have seen Sandy Leon commit more often in 2020 than multiple Indians catchers have over a number of campaigns). This is technically an error on the catcher, but I simply mark it as “INT”. The “E2” is implied; no need to take up space recording it.

Additionally, sometimes I will deviate from the official scoring if I think the official account obscures the matter. The most obvious examples are judgment calls like hit/error or wild pitch/passed ball; sometimes my judgment differs, and I’ll go with my opinion. Usually I don’t, though, because I generally favor eliminating as many of these judgment calls from the official record as possible. I would prefer to see a category “Battery Errors” rather than WP/PB; as such, I’m usually content to jot down the official scorer’s ruling even if I might have seen it differently.

A more arcane example can be summed up by this play which ended the Indians/Cardinals game on 8/29/2020. You watch the play and tell me how you would score the putout of Molina.

The official scoring for this play, at least as shown on MLB Gameday, was 353. I do not for the life of me understand why Carlos Santana is credited with the putout of Molina. I have not even attempted to understand it because it is not worth my time – either it is an error in Gameday, or it is an asinine rule to which I refuse to give any credence. It is true that Molina was not put out by the act of Ramirez tagging him, as the umpire ruled him out for leaving the baseline first. I understand that a defensive player must be credited with the putout, but why a phantom putout would be awarded to Santana, rather than the closest defender (Ramirez) or the closest defender in the direction in which Molina was originally oriented (Perez) or the next closest defender in the direction in which Molina turned (Lindor) is beyond me. I have this scored as “DP35 [OBL]” – the putout to Ramirez, and “OBL” indicating that it was a technical putout credited as a result of the out actually occurring when Molina left the baseline (I realize that there are some technicalities about baseline v. established running lane, etc.). I find this to be a much better representation of what actually transpired on the field, official scoring be damned.

4. I know that I’ve written this missive before, but one of the arguments for the Project Scoresheet system is that it eliminates backtracking. “Backtracking” is defined as the act of having to go back to a previous scorebox to record events that occur as the result of a subsequent plate appearance, which is a long-winded way of saying “tracking baserunners”. Perhaps it is simply being used to it after nearly twenty-five years of scorekeeping, but I’ve never felt this is a great burden. At most you have four boxes in play at any given moment, and I don’t find it unnatural to monitor the progress of baserunners individually.

The Project Scoresheet system introduces a different type of backtracking, which I find much more troublesome – what I would call “readback backtracking”. Since the Project Scoresheet account is entirely linear, you have to go back to try to figure out which runner is which – in the plate appearance of the #6 hitter, who is the runner on first base? It takes reading through the prior boxes to figure it out, which makes it very difficult to tell quickly which player actually stole a base or scored a run.

This is not a knock on the Project Scoresheet system. Because it is an entirely linear system, it is 1) the quickest way to record the information and 2) the easiest format with which to enter it into a computer. The latter is the reason why Project Scoresheet used that format, as it started as a volunteer effort to first record and then to computerize accounts of all major league games. With regard to #1, I will occasionally start scoring a game before I am ready to use my normal scoresheet, and just need to jot down the events of the game in order to copy over later. When I do this, I use a strictly linear approach.

An offshoot of the Project Scoresheet system is the Reisner system, in which at the start of each plate appearance the location and identity of the baserunners are recorded. There is limited backtracking to note batter-runners who wound up scoring, but while readback backtracking is reduced, it’s still present. Personally, I find it tiring to keep repeating the location of baserunners that never advance (e.g. a leadoff walk that stays put will result in noting that the runner is at first base three times).

5. Nomenclature – you may note that I usually refer to a “scoresheet” rather than a “scorebook” or a “scorecard”. The reason for the first is simple – I prefer loose leaf sheets rather than binding them in a book. Sheets are easier to store (I have an entire filing cabinet devoted to scoresheets), but most importantly they reduce risk. If I spill something on a single sheet, or it slips out of my binder on the way home from the park, that is unfortunate but not a catastrophe as it would be if an entire scorebook met a tragic end.

“Scorecard” is certainly the more romantic term, but to me it implies one of two things: 1) the Official Scorecard they try to sell you at the park, which is always an abomination if intended for actual scorekeeping or 2) printing on heavier stock, whereas for ease of storage and reproduction I prefer a standard sheet of printer paper backed by a clipboard.

Thursday, September 10, 2020

29 Runs

Estimated probability of scoring >= 29 runs for various levels of R/G (using Enby distribution with Tango Distribution c = .767):


R/G Prob 1 in…
3.50 0.00006% 1,719,611
3.55 0.00007% 1,520,381
3.60 0.00007% 1,343,511
3.65 0.00008% 1,197,110
3.70 0.00009% 1,058,359
3.75 0.00011% 939,653
3.80 0.00012% 836,424
3.85 0.00013% 745,271
3.90 0.00015% 664,698
3.95 0.00017% 594,326
4.00 0.00019% 531,900
4.05 0.00021% 476,471
4.10 0.00023% 427,207
4.15 0.00026% 383,962
4.20 0.00029% 345,397
4.25 0.00032% 310,973
4.30 0.00036% 280,638
4.35 0.00040% 253,096
4.40 0.00044% 228,784
4.45 0.00048% 207,280
4.50 0.00053% 187,670
4.55 0.00059% 170,297
4.60 0.00065% 154,651
4.65 0.00071% 140,549
4.70 0.00078% 128,012
4.75 0.00086% 116,513
4.80 0.00094% 106,275
4.85 0.00103% 97,006
4.90 0.00113% 88,732
4.95 0.00123% 81,106
5.00 0.00135% 74,290
5.05 0.00147% 68,093
5.10 0.00160% 62,454
5.15 0.00174% 57,319
5.20 0.00190% 52,713
5.25 0.00206% 48,508
5.30 0.00224% 44,665
5.35 0.00243% 41,153
5.40 0.00263% 37,991
5.45 0.00285% 35,046
5.50 0.00309% 32,391
5.55 0.00334% 29,956
5.60 0.00361% 27,719
5.65 0.00390% 25,664
5.70 0.00420% 23,805
5.75 0.00453% 22,065
5.80 0.00488% 20,490
5.85 0.00525% 19,037
5.90 0.00565% 17,697
5.95 0.00607% 16,480
6.00 0.00652% 15,336
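
For anyone who wants to play with tail probabilities like these, here is a rough Python sketch. It is not the Enby distribution used for the table above; it swaps in a simpler approach, applying the Tango Distribution (as I understand its usual statement) to each inning and convolving nine independent innings, so its output should not be expected to match the table. The function names are made up for illustration, and it ignores extra innings and unplayed bottoms of the ninth.

```python
def tango_inning_pmf(runs_per_inning, c=0.767, max_runs=40):
    """Per-inning run probabilities for 0..max_runs runs.

    Assumed form of the Tango Distribution:
      f(0) = 1 / (1 + c*RI)
      d    = 1 - c*f(0)
      f(1) = (1 - f(0)) * (1 - d)
      f(k) = f(k-1) * d   for k >= 2
    """
    f0 = 1.0 / (1.0 + c * runs_per_inning)
    d = 1.0 - c * f0
    pmf = [f0]
    p = (1.0 - f0) * (1.0 - d)
    for _ in range(max_runs):
        pmf.append(p)
        p *= d
    return pmf


def convolve(a, b, max_runs):
    """Distribution of the sum of two independent run totals, cut off at max_runs."""
    out = [0.0] * (max_runs + 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            if i + j <= max_runs:
                out[i + j] += pa * pb
    return out


def prob_at_least(rg, threshold=29, innings=9, c=0.767):
    """P(runs >= threshold) for a team averaging rg runs per nine-inning game."""
    max_runs = threshold - 1  # only totals below the threshold are needed
    inning = tango_inning_pmf(rg / innings, c, max_runs)
    game = [1.0] + [0.0] * max_runs  # start with zero runs, probability 1
    for _ in range(innings):
        game = convolve(game, inning, max_runs)
    return 1.0 - sum(game)


def one_in(prob):
    """Express a probability as '1 in N' games."""
    return round(1.0 / prob)


if __name__ == "__main__":
    for rg in (4.0, 4.5, 5.0):
        p = prob_at_least(rg)
        print(f"{rg:.2f} {p:.5%} {one_in(p):,}")
```

The "1 in…" column in the table is simply the reciprocal of the probability column, which is all that the one_in helper does.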