This blog has
existed for sixteen years now, and yet with the exception of some (relatively)
recent stuff I’ve written about the Enby distribution for team runs per game
and the Cigol approach to estimating team winning percentage from Enby, almost
all of the interesting sabermetric work appeared in the blog’s first five
years, and most in the first year or two.
There are a
number of reasons for that - one is that when I started, I was a college
student with a lot more free time on his hands than I have with a 9-5. Related,
I was also more eager to spend a lot of time staring at numbers in my free time
when I didn’t spend a good portion of my day staring at numbers. Remember the
Bill James line about how a column of numbers that would put an actuary to
sleep can be made to dance if you put Bombo Rivera’s picture on the flip side
of the card? Sometimes the numbers do indeed dance, but the actuary in question
would rather watch a ballgame or read about the Battle of Gravelines than
manipulate them in the evening, dancing or no.
More generally,
there has been much less to investigate in the area of sabermetrics that I primarily
practice, which I will call, for lack of a better term, “classical
sabermetrics”. I would define classical sabermetrics as sabermetric study which
is primarily focused on game-level (or higher, e.g. season, player career,
etc.) data that relates to baseball outcomes on the field (e.g. hits, walks,
runs scored, wins). Classical sabermetrics is/was the primary field of inquiry
of those I have previously called first or second-generation
sabermetricians.
Classical
sabermetrics is not dead, but to date the last great achievement of the field
was turned in by Voros McCracken when he developed DIPS. I’m not arrogant
enough to declare that nothing more will ever be found in the classical field,
and there is still much work to be done, but at least as far as I can see, future work will most likely consist of tinkering with and incrementally improving what has already been done, probably with little impact on the
practical implementation of sabermetric ideas. For example, I still would love
to find a modification to Pythagenpat that works better for 2 RPG environments,
or a different run estimator construct that would preserve the good properties
of Base Runs while better handling teams that hit tons of triples. All of this
is quite theoretical, and of no practical value to someone who is attempting to
run the Pirates.
Which
increasingly is what sabermetric practitioners are attempting to do, whether
directly through employment by major league teams, or indirectly through
publishing post-classical sabermetric research in the public sphere. Let me be
very clear: this is not in any way a lament for a simpler, purer time in the
past. I think it’s wonderful that sabermetric analysis has transcended the
constraints of the data used in its classical practice and is exerting an
influence on the game on the field.
Nevertheless,
I am still a classical sabermetrician, not because I don’t value the insight
provided by post-classical sabermetrics but because I don’t have some
combination of the skillset or the way of thinking or the resources or the
drive to become proficient enough in newer techniques to offer anything of
value in that space. Thus it is natural that I have less to share here.
The topic that I am embarking on is squarely in the realm of “quite theoretical and of no practical value to someone who is attempting to run the Pirates”. About fifteen
years ago, I started writing a “Rate Stat Series”, and aborted it somewhere in
the middle. I have stated several times that I intend to revisit it, but until
now have not. The Rate Stat Series was, and remains, intended to be a discussion
of how best to express a batter’s overall productivity in a single rate stat. I
should note three things that it is not:
1. The
discussion is strictly limited to the construction of a rate stat measuring
overall offensive productivity, not a subset thereof. I am not suggesting that
if you are measuring a batter’s walk rate, strikeout rate, ground-rule double
rate, or any other component rate you can dream up, you should follow the
conclusions here. For most general applications, plate appearances make
perfect sense as the denominator for a rate for any of those quantities. There
may be reasons to follow a sort of decision tree approach that results in
different denominators for some applications (McCracken was an innovator in
this approach, in DIPS and park factors). All of that is well and good and
completely outside the scope of this series.
2. The premise
presupposes that the unit of measurement of a batter’s productivity has already
been converted to a run basis. Thus it is not a question of OPS v. OTS v. OPS+
v. 1.8*OBA + SLG v. wOBA v. EqA v. TAv v. whatever, but rather what the
denominator for a batter’s estimated run contribution should be. The obvious
choices are outs and plate appearances, but there are other possibilities.
Spoiler alert: My answer is “it depends”.
3.
Revolutionary, groundbreaking, or any other similar adjective. I’m attempting
to describe my thoughts on methods that already exist and were created by other
people in a coherent, unified format.
In sitting down
to write this, I realized I made two fundamental mistakes in my first attempt:
1. I was
attempting to “prove” my preferences mathematically, which is not a bad thing
in theory, but some of what I was doing begged the question and some of this
discussion is of a theoretical nature that lends itself more to logical
reasoning/“proofs” than to mathematical “proofs”. I’ve tried to anchor my
conclusions in math, logic, and reason where possible, but have also embraced
that some of it is subjective and must be so.
2. I posted
pieces before I finished writing the whole thing, or even knew exactly where it was going.
These are
rectified in this attempt – all of my assertions are wildly unsupported and as
I hit post, all planned installments exist in at least a detailed outline form.
While I have attempted to avoid the two mistakes I identified in the previous series, as I look at this attempt in full I can see I may have just
replaced them with two characteristics that will make reading this a real
chore:
1. I’m overly wordy, repeating myself a lot and trying to be way too precise in my language (although I fear not as precise as the topic demands). There’s a lot of jargon, in an attempt to distinguish between the various concepts and methodological
choices.
2. There’s way too much algebra; where possible, I didn’t want to just assert that mathematical operations resolved in a certain way and give an empirical example that backs me up, so there are a lot of “proofs” that will be of no general
interest.
Allow me to
close by laying some groundwork for future posts. I am going to use the 1994 AL as a reference point, and when I use
examples they will generally be drawn from this league-season. Why have I
chosen the 1994 AL?
1. 1994 was the
year I became a baseball fan, and I was primarily focused on the AL at that time, so it is nostalgic. I have
not turned into a get-off-my-lawn type who thinks that baseball reached its zenith in 1994 and it’s all been downhill since, but I do think that about 1994 Topps, the greatest baseball card set of all time.
2. Because it was the year in which the “silly ball era” really broke out, and because the strike shortened the season, there are some fairly extreme performances that are useful when talking about the differences between rate stat approaches.
As discussed,
this series starts from the premise that a batter’s contribution is measured in
terms of runs, and works from there. This approach does not require the use of
any particular run estimator, although one of my assertions is that the choice
of run estimator and the choice of rate/denominator for the rate are logically
linked. There are three types of run estimators that I will use in the series:
a dynamic model, a linear model, and a hybrid theoretical team model.
To avoid having differences in the run estimator(s) used unduly influence the resulting rate stats, I am going to anchor a set of internally
consistent run estimators in the reference period of the 1994 AL. It will come as no surprise if you’ve
read anything I’ve written about run estimators in the past that I am using
Base Runs for this job. The point of this series is not to tell you which
particular run estimator to use or how to construct it. It really doesn’t
matter which version of Base Runs I use (if you are still stuck on Runs
Created, there’s no judgment from this corner, at least for the duration of
this discussion), or which categories I include in the formula – this is about
the conceptual issues regarding the rate that you calculate after estimating
the batter’s run contribution, so I am keeping it very simple, looking just at
hits, walks, and at bats (thus defining outs as at bats minus hits) and
ignoring steals/caught stealing, hit batters, intentional walks, sacrifices,
etc. Since I’m doing this with the run
estimator, I will also do it with most other statistics I cite – for example,
throughout this series OBA will be (H + W)/(AB + W), and PA will just be AB +
W.
A version of
Base Runs I have used is below. It’s not perfect by any means; it overvalues
extra-base hits as we’ll see below, but again, the specific estimator is for
example only in this series – the thinking behind constructing the resulting
rates is what we’re after:
A = H + W - HR
B = (2TB - H - 4HR + .05W)*.78
C = AB - H
D = HR
BsR = (A*B)/(B + C) + D
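To make the arithmetic concrete, here is a minimal sketch of this version of Base Runs in Python; the function and variable names are mine, for illustration only:

    # A minimal sketch of the Base Runs version above. Inputs are event counts
    # for a batter, team, or league; b_mult is the B multiplier (.78 here).
    def base_runs(singles, doubles, triples, hr, walks, ab, b_mult=0.78):
        hits = singles + doubles + triples + hr
        total_bases = singles + 2*doubles + 3*triples + 4*hr
        a = hits + walks - hr                                    # A: baserunners
        b = (2*total_bases - hits - 4*hr + 0.05*walks) * b_mult  # B: advancement
        c = ab - hits                                            # C: outs
        d = hr                                                   # D: automatic runs
        return a * b / (b + c) + d                               # BsR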
Typically, any
reconciliation of Base Runs to a desired runs scored total for an
entity like a league is done using the B factor, since it is already something
of a balancing factor in the formula, representing the somewhat nebulous
concept of “advancement” while the other components (A = baserunners, C = outs,
D = automatic runs) represent much more tightly defined quantities. In order to
force the Base Runs estimate for the 1994 AL to equal the actual number of runs
scored, you need to replace the .78 multiplier with .79776, which can be
determined by first calculating the needed B value (where R is the actual runs
scored total):
Needed B = (R - D)*C/(A - R + D)
Divide this by (2TB - H - 4HR + .05W) and you get a .79776 multiplier. I usually don’t force the
estimated runs equal to the actual runs, but for this series, I want to be
internally consistent between all of the estimators and also be able to write
formulas using league runs rather than having to worry about any discrepancies
between league runs and estimated runs.
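In code, the calibration might look like the sketch below (names again mine); it solves BsR = R for the needed B and then divides by the unscaled B components. Applied to the 1994 AL totals, it yields the .79776 multiplier:

    # Solve for the B multiplier that forces Base Runs to equal actual runs (R),
    # implementing the "Needed B" algebra above. A sketch with my own names.
    def calibrated_b_multiplier(singles, doubles, triples, hr, walks, ab, runs):
        hits = singles + doubles + triples + hr
        total_bases = singles + 2*doubles + 3*triples + 4*hr
        a = hits + walks - hr                        # A: baserunners
        c = ab - hits                                # C: outs
        d = hr                                       # D: automatic runs
        needed_b = (runs - d) * c / (a - runs + d)   # B with A*B/(B+C) + D = R
        return needed_b / (2*total_bases - hits - 4*hr + 0.05*walks)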
So our dynamic
run estimator (BsR) used throughout this series will be:
A = H + W - HR = S + D + T + W
B = (2TB - H - 4HR + .05W)*.79776 = .7978S + 2.3933D + 3.9888T + 2.3933HR + .0399W
C = AB - H = Outs
D = HR
BsR = (A*B)/(B + C) + D
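If you want to verify the expanded coefficients in the B factor, a few lines suffice; each tuple is simply the TB, H, HR, and W contributed by one of each event:

    # Check the per-event B weights against (2TB - H - 4HR + .05W)*.79776.
    B_MULT = 0.79776
    events = {"S": (1, 1, 0, 0), "D": (2, 1, 0, 0), "T": (3, 1, 0, 0),
              "HR": (4, 1, 1, 0), "W": (0, 0, 0, 1)}
    for name, (tb, h, hr, w) in events.items():
        print(name, round((2*tb - h - 4*hr + 0.05*w) * B_MULT, 4))
    # S 0.7978, D 2.3933, T 3.9888, HR 2.3933, W 0.0399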
To be
consistent, I will also use the intrinsic linear weights for the 1994 AL that are derived from this BsR equation
as the linear weights run estimator. The intrinsic linear weights are derived
through partial differentiation of BsR with respect to each component. If we
define A, B, C, and D to be the league totals of those, and a, b, c, and d to
be the coefficient for a given event in each of the A, B, C, and D factors
respectively, then the linear weight of a given event is calculated as:
LW = ((B + C)*(A*b + B*a) - A*B*(b + c))/(B + C)^2 + d
For the 1994 AL, this results in the following equation, where RC denotes absolute runs created:
LW_RC = .5069S + .8382D + 1.1695T + 1.4970HR + .3495W - .1076(outs)
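Here is the same partial-derivative formula as a Python sketch (the function name is mine): A, B, and C are the league totals of the Base Runs factors, while a, b, c, and d are one event’s coefficients within each factor:

    # Intrinsic linear weight of one event, per the formula above.
    # A, B, C: league totals of the Base Runs factors; a, b, c, d: the event's
    # coefficients in each factor (e.g. a single has a=1, b=.7978, c=0, d=0).
    def intrinsic_lw(A, B, C, a, b, c, d):
        return ((B + C) * (A*b + B*a) - A*B*(b + c)) / (B + C)**2 + d
    # An out (a=0, b=0, c=1, d=0) reduces to -A*B/(B + C)**2.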
We will also
need a version of LW expressed in the classic Pete Palmer style to produce runs
above average rather than absolute runs. That’s just a simple algebra problem
to solve for the out value needed to bring the league total to zero, which
results in:
LW_RAA = .5069S + .8382D + 1.1695T + 1.4970HR + .3495W - .3150(outs)
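The algebra in code form, with hypothetical dictionaries of weights and league totals as inputs:

    # Choose the out weight so that the weighted league totals sum to zero.
    # weights: event -> LW_RC weight (outs excluded); totals: event -> count.
    def raa_out_weight(weights, totals, outs):
        positive_runs = sum(weights[e] * totals[e] for e in weights)
        return -positive_runs / outs
    # For the 1994 AL, this lands on the -.3150 out weight shown above.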
I am ignoring
any questions about what the appropriate baseline for valuing individual
offensive performance is. Regardless of whether you side with replacement level, average, or other less common approaches, I hope you will agree that average is a good starting point, which can usually be converted to an alternative baseline much more easily than the reverse.
Average is also the natural starting point for linear weights analysis since
the empirical technique of calculating linear weights based on average changes
in average run expectancy is by definition going to produce an estimate of runs
above average.
Later we will
also have some “theoretical team” run estimators built off this same foundation, but discussion of them will fit better when we take up that concept in greater detail.
I will also be
ignoring park factors and the question of context in this series (at least
until the very end, where I will circle back to context). Since I am narrowly
focused on the construction of the final rate stat, rather than a full-blown
implementation of a rating system for players, park factors can be ignored.
Since I am anchoring everything in the 1994 AL, the context of the league run
environment can also be ignored since it will be equal for all players once we
ignore park factors.