Wednesday, April 14, 2021

Rate Stat Series, pt. 1: Introduction

This blog has existed for sixteen years now, and yet with the exception of some (relatively) recent stuff I’ve written about the Enby distribution for team runs per game and the Cigol approach to estimating team winning percentage from Enby, almost all of the interesting sabermetric work appeared in the blog’s first five years, and most in the first year or two.

There are a number of reasons for that. One is that when I started, I was a college student with a lot more free time on my hands than I have now with a 9-5. Relatedly, I was also more eager to spend a lot of time staring at numbers in my free time when I didn’t spend a good portion of my day staring at numbers. Remember the Bill James line about how a column of numbers that would put an actuary to sleep can be made to dance if you put Bombo Rivera’s picture on the flip side of the card? Sometimes the numbers do indeed dance, but the actuary in question would rather watch a ballgame or read about the Battle of Gravelines than manipulate them in the evening, dancing or no.

More generally, there has been much less to investigate in the area of sabermetrics that I primarily practice, which I will call, for lack of a better term, “classical sabermetrics”. I would define classical sabermetrics as sabermetric study which is primarily focused on game-level (or higher, e.g. season, player career, etc.) data that relates to baseball outcomes on the field (e.g. hits, walks, runs scored, wins). Classical sabermetrics is/was the primary field of inquiry of those I have previously called first- or second-generation sabermetricians.

Classical sabermetrics is not dead, but to date the last great achievement of the field was turned in by Voros McCracken when he developed DIPS. I’m not arrogant enough to declare that nothing more will ever be found in the classical field, and there is still much work to be done, but at least as far as I can see, it is highly likely that it will consist of tinkering and incrementally improving work that has already been done, and probably with little impact on the practical implementation of sabermetric ideas. For example, I still would love to find a modification to Pythagenpat that works better for 2 RPG environments, or a different run estimator construct that would preserve the good properties of Base Runs while better handling teams that hit tons of triples. All of this is quite theoretical, and of no practical value to someone who is attempting to run the Pirates.

Which increasingly is what sabermetric practitioners are attempting to do, whether directly through employment by major league teams, or indirectly through publishing post-classical sabermetric research in the public sphere. Let me be very clear: this is not in any way a lament for a simpler, purer time in the past. I think it’s wonderful that sabermetric analysis has transcended the constraints of the data used in its classical practice and is exerting an influence on the game on the field.

Notwithstanding, I am still a classical sabermetrician, not because I don’t value the insight provided by post-classical sabermetrics but because I don’t have some combination of the skillset or the way of thinking or the resources or the drive to become proficient enough in newer techniques to offer anything of value in that space. Thus it is natural that I have less to share here.

The topic that I am embarking on discussing is squarely in the realm of “quite theoretical and of no practical value to someone who is attempting to run the Pirates”. About fifteen years ago, I started writing a “Rate Stat Series”, and aborted it somewhere in the middle. I have stated several times that I intend to revisit it, but until now have not. The Rate Stat Series was and now is intended to be a discussion of how best to express a batter’s overall productivity in a single rate stat. I should note three things that it is not:

1. The discussion is strictly limited to the construction of a rate stat measuring overall offensive productivity, not a subset thereof. I am not suggesting that if you are measuring a batter’s walk rate, strikeout rate, ground-rule double rate, or any other component rate you can dream up, that you should follow the conclusions here. For most general applications, plate appearances makes perfect sense as the denominator for a rate for any of those quantities. There may be reasons to follow a sort of decision tree approach that results in different denominators for some applications (McCracken was an innovator in this approach, in DIPS and park factors). All of that is well and good and completely outside the scope of this series.

2. The premise presupposes that the unit of measurement of a batter’s productivity has already been converted to a run-basis. Thus it is not a question of OPS v. OTS v. OPS+ v. 1.8*OBA + SLG v. wOBA v. EqA v. TAv v. whatever, but rather what the denominator for a batter’s estimated run contribution should be. The obvious choices are outs and plate appearances, but there are other possibilities. Spoiler alert: My answer is “it depends”.

3. Revolutionary, groundbreaking, or any other similar adjective. I’m attempting to describe my thoughts on methods that already exist and were created by other people in a coherent, unified format.

In sitting down to write this, I realized I made two fundamental mistakes in my first attempt:

1. I was attempting to “prove” my preferences mathematically, which is not a bad thing in theory, but some of what I was doing begged the question and some of this discussion is of a theoretical nature that lends itself more to logical reasoning/“proofs” than to mathematical “proofs”. I’ve tried to anchor my conclusions in math, logic, and reason where possible, but have also embraced that some of it is subjective and must be so.

2. I posted pieces before I finished writing the whole thing, or even knowing exactly where it was going.

These are rectified in this attempt – all of my assertions are wildly unsupported and as I hit post, all planned installments exist in at least a detailed outline form. While I have attempted to avoid the two mistakes I identified in the previous series, as I look at this series in full I can see I may have just replaced them with two characteristics that will make reading this a real chore:

1. I’m overly wordy, repeating myself a lot and trying to be way too precise in my language (although I fear not as precise as the topic demands). There’s a lot of jargon in an attempt to delineate between the various concepts and methodological choices.

2. There’s way too much algebra; where possible, I didn’t want to just assert that mathematical operations resolved in a certain way and give an empirical example that backs me up, so there’s a lot of “proofs” that will be of no general interest.

Allow me to close by laying some groundwork for future posts. I am going to use the 1994 AL as a reference point, and when I use examples they will generally be drawn from this league-season. Why have I chosen the 1994 AL?

1. 1994 was the year I became a baseball fan, and I was primarily focused on the AL at that time, so it is nostalgic. I have not turned into a get off my lawn type who thinks that baseball reached its zenith in 1994 and it’s all been downhill since, but I do think that about 1994 Topps, the greatest baseball card set of all-time.

2. As the year in which the “silly ball era” really broke out, and due to the strike shortening the season, there are some fairly extreme performances that are useful when talking about the differences between rate stat approaches.

As discussed, this series starts from the premise that a batter’s contribution is measured in terms of runs, and works from there. This approach does not require the use of any particular run estimator, although one of my assertions is that the choice of run estimator and the choice of rate/denominator for the rate are logically linked. There are three types of run estimators that I will use in the series: a dynamic model, a linear model, and a hybrid theoretical team model.

To keep differences in the run estimator(s) used from unduly influencing differences in the resulting rate stats, I am going to anchor a set of internally consistent run estimators in the reference period of the 1994 AL. It will come as no surprise if you’ve read anything I’ve written about run estimators in the past that I am using Base Runs for this job. The point of this series is not to tell you which particular run estimator to use or how to construct it. It really doesn’t matter which version of Base Runs I use (if you are still stuck on Runs Created, there’s no judgment from this corner, at least for the duration of this discussion), or which categories I include in the formula – this is about the conceptual issues regarding the rate that you calculate after estimating the batter’s run contribution. So I am keeping it very simple, looking just at hits, walks, and at bats (thus defining outs as at bats minus hits) and ignoring steals/caught stealing, hit batters, intentional walks, sacrifices, etc. Since I’m doing this with the run estimator, I will also do it with most other statistics I cite – for example, throughout this series OBA will be (H + W)/(AB + W), and PA will just be AB + W.

A version of Base Runs I have used is below. It’s not perfect by any means; it overvalues extra base hits as we’ll see below, but again, the specific estimator is for example only in this series – the thinking behind constructing the resulting rates is what we’re after:

A = H + W – HR

B = (2TB – H – 4HR + .05W)*.78

C = AB – H

D = HR

BsR = (A*B)/(B + C) + D
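For readers who prefer code to algebra, the formula above can be sketched in Python. This is a minimal illustration rather than a definitive implementation: the function name `base_runs`, its parameter names, and the team totals in the example are all hypothetical and were invented only to exercise the formula.

```python
def base_runs(ab, h, tb, hr, w, b_mult=0.78):
    """Simplified Base Runs from at bats, hits, total bases, home runs, walks."""
    a = h + w - hr                                  # A: baserunners
    b = (2 * tb - h - 4 * hr + 0.05 * w) * b_mult   # B: advancement
    c = ab - h                                      # C: outs
    d = hr                                          # D: automatic runs
    return a * b / (b + c) + d


# Hypothetical team totals, invented for illustration (not a real team).
team = {"ab": 5500, "h": 1500, "tb": 2400, "hr": 170, "w": 550}
estimate = base_runs(**team)
print(f"Estimated runs: {estimate:.1f}")
```

The `b_mult` argument is left adjustable so the same function can be reused with a different multiplier.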

Typically, any reconciliation of Base Runs to a desired estimated number of runs scored for an entity like a league is done using the B factor, since it is already something of a balancing factor in the formula, representing the somewhat nebulous concept of “advancement” while the other components (A = baserunners, C = outs, D = automatic runs) represent much more tightly defined quantities. In order to force the Base Runs estimate for the 1994 AL to equal the actual number of runs scored, you need to replace the .78 multiplier with .79776, which can be determined by first calculating the needed B value (where R is the actual runs scored total):

Needed B = (R – D)*C/(A – R + D)

Divide this by (2TB – H – 4HR + .05W) and you get a .79776 multiplier. I usually don’t force the estimated runs to equal the actual runs, but for this series, I want to be internally consistent between all of the estimators and also be able to write formulas using league runs rather than having to worry about any discrepancies between league runs and estimated runs.
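The reconciliation step can be sketched the same way. The league totals below are made-up round numbers, not the actual 1994 AL figures, so the resulting multiplier will not be .79776; the point is only to show that solving for the needed B and dividing by the unscaled advancement term reproduces the target run total. The function names are hypothetical.

```python
def raw_advancement(h, tb, hr, w):
    """The B factor before any multiplier is applied."""
    return 2 * tb - h - 4 * hr + 0.05 * w


def needed_b_multiplier(ab, h, tb, hr, w, runs):
    """Multiplier that forces Base Runs to equal the actual run total."""
    a = h + w - hr
    c = ab - h
    d = hr
    needed_b = (runs - d) * c / (a - runs + d)
    return needed_b / raw_advancement(h, tb, hr, w)


def base_runs(ab, h, tb, hr, w, b_mult):
    a = h + w - hr
    b = raw_advancement(h, tb, hr, w) * b_mult
    c = ab - h
    return a * b / (b + c) + hr


# Hypothetical league totals, invented for illustration only.
league = {"ab": 78000, "h": 21000, "tb": 33000, "hr": 2200, "w": 8000}
league_runs = 11000

mult = needed_b_multiplier(**league, runs=league_runs)
check = base_runs(**league, b_mult=mult)
print(f"multiplier = {mult:.5f}, reconciled BsR = {check:.1f}")
```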

So our dynamic run estimator (BsR) used throughout this series will be:

A = H + W – HR = S + D + T + W

B = (2TB – H – 4HR + .05W)*.79776 = .7978S + 2.3933D + 3.9888T + 2.3933HR + .0399W

C = AB – H = Outs

D = HR

BsR = (A*B)/(B + C) + D

To be consistent, I will also use the intrinsic linear weights for the 1994 AL that are derived from this BsR equation as the linear weights run estimator. The intrinsic linear weights are derived through partial differentiation of BsR with respect to each component. If we define A, B, C, and D to be the league totals of those, and a, b, c, and d to be the coefficient for a given event in each of the A, B, C, and D factors respectively, then the linear weight of a given event is calculated as:

LW = ((B + C)*(A*b + B*a) – A*B*(b + c))/(B + C)^2 + d

For the 1994 AL, this results in the following equation, where RC denotes absolute runs created:

LW_RC = .5069S + .8382D + 1.1695T + 1.4970HR + .3495W - .1076(outs)
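The intrinsic-weight calculation is mechanical enough to sketch in Python. The per-event coefficients come from the BsR definition above (for example, a double adds 1 to A and 3 × .79776 to B). The league totals are hypothetical round numbers rather than the real 1994 AL totals, so the weights produced here will not match the published ones, and the names `COEFS`, `factors`, and `intrinsic_weights` are assumptions of this sketch.

```python
B_MULT = 0.79776  # multiplier quoted in the text

# (a, b, c, d): each event's coefficient in the A, B, C and D factors.
COEFS = {
    "S":   (1, 1 * B_MULT,    0, 0),
    "D":   (1, 3 * B_MULT,    0, 0),
    "T":   (1, 5 * B_MULT,    0, 0),
    "HR":  (0, 3 * B_MULT,    0, 1),
    "W":   (1, 0.05 * B_MULT, 0, 0),
    "out": (0, 0,             1, 0),
}


def factors(totals):
    """Return the A, B, C, D totals for a set of event counts."""
    return tuple(
        sum(COEFS[event][i] * n for event, n in totals.items())
        for i in range(4)
    )


def intrinsic_weights(totals):
    """Partial derivative of BsR with respect to each event."""
    A, B, C, D = factors(totals)
    weights = {}
    for event, (a, b, c, d) in COEFS.items():
        numerator = (B + C) * (A * b + B * a) - A * B * (b + c)
        weights[event] = numerator / (B + C) ** 2 + d
    return weights


# Hypothetical league event totals, invented for illustration only.
league = {"S": 15000, "D": 4000, "T": 500, "HR": 2200, "W": 8000, "out": 57000}

weights = intrinsic_weights(league)
A, B, C, D = factors(league)
bsr = A * B / (B + C) + D
lw_total = sum(weights[event] * n for event, n in league.items())
print({event: round(value, 4) for event, value in weights.items()})
print(round(bsr, 1), round(lw_total, 1))
```

Because each of A, B, C, and D is linear in the event counts, applying the intrinsic weights to the same totals they were derived from reproduces the BsR estimate for those totals, which makes a handy sanity check.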

We will also need a version of LW expressed in the classic Pete Palmer style to produce runs above average rather than absolute runs. That’s just a simple algebra problem to solve for the out value needed to bring the league total to zero, which results in:

LW_RAA = .5069S + .8382D + 1.1695T + 1.4970HR + .3495W - .3150(outs)
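The “simple algebra problem” can be sketched as follows: keep the positive-event weights, then choose the out value so that the league total comes to zero. The event weights are the ones quoted above; the league totals are hypothetical, so the out value computed here will differ from -.3150, and the function names are assumptions of this sketch.

```python
# Positive-event weights quoted in the text.
EVENT_WEIGHTS = {"S": 0.5069, "D": 0.8382, "T": 1.1695, "HR": 1.4970, "W": 0.3495}


def raa_out_weight(event_weights, league):
    """Out value that makes the league's linear weights total equal zero."""
    positive_runs = sum(w * league[event] for event, w in event_weights.items())
    return -positive_runs / league["out"]


def lw_runs(event_weights, out_weight, line):
    """Linear weights runs for any batting line (player, team, or league)."""
    positive_runs = sum(w * line[event] for event, w in event_weights.items())
    return positive_runs + out_weight * line["out"]


# Hypothetical league event totals, invented for illustration only.
league = {"S": 15000, "D": 4000, "T": 500, "HR": 2200, "W": 8000, "out": 57000}

out_raa = raa_out_weight(EVENT_WEIGHTS, league)
league_raa = lw_runs(EVENT_WEIGHTS, out_raa, league)
print(f"out weight = {out_raa:.4f}, league RAA = {league_raa:.6f}")
```

One way to read the two published out values: if the absolute weights sum to league runs, then the RAA out value is the absolute out value minus league runs per out, and the gap between -.1076 and -.3150 would imply roughly .2074 runs per out for the reference league.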

I am ignoring any questions about what the appropriate baseline for valuing individual offensive performance is. Regardless of where you side between replacement level, average, and other less common approaches, I hope you will agree that average is a good starting point which can usually be converted to an alternative baseline much more easily than if you start with an alternative baseline. Average is also the natural starting point for linear weights analysis since the empirical technique of calculating linear weights based on average changes in average run expectancy is by definition going to produce an estimate of runs above average.

Later we will also have some “theoretical team” run estimators built off this same foundation, but discussion of them will fit better when discussing that concept in greater detail.

I will also be ignoring park factors and the question of context in this series (at least until the very end, where I will circle back to context). Since I am narrowly focused on the construction of the final rate stat, rather than a full-blown implementation of a rating system for players, park factors can be ignored. Since I am anchoring everything in the 1994 AL, the context of the league run environment can also be ignored since it will be equal for all players once we ignore park factors.
