Monday, July 12, 2010

Flavors of Component ERA

The introduction of SIERA has triggered a lot of discussion about ERA estimators in the sabermetric community. This piece does not further that discussion; rather, it takes a step back and considers the choices made in constructing such metrics and sketches out a classification system for them.

I'm under no illusion that anyone is going to adopt the system and start referring to ERC as an ERD or xFIP as an EBS-R. My intent is simply to organize my thoughts on the various constructions and hopefully to prod you to do the same. As with so many sabermetric decisions regarding methodology, the choice of which metric to use is very much dependent on the question you'd like to answer. With luck, this missive will help to clarify (at least in my own mind) the varying constructions of ERA estimators.

I have chosen four properties on which to base the classification system: what the formula estimates (earned runs or total runs allowed); what inputs the method takes (traditional results, defense-independent, or batted ball data); whether the weights placed on events in the formula are dynamic or static; and whether runs are estimated through a model of the scoring process, linear weights, or regression.

Delving into the properties individually:

1. Earned runs (E) or total runs allowed (T)

A system that produces an estimate in earned runs (or ERA) will get an E code; those that estimate total runs get a T. This is pretty straightforward and needs no further explanation.

2. Result inputs (R), defense-independent inputs (D), or batted ball inputs (B)

Some of the categories used have a bit of overlap; batted ball inputs are defense-independent, but make a more restrictive grouping.

By result inputs, I mean traditional statistics: hits allowed, walks, strikeouts, and the like. Defense-independent inputs remove non-HR hits from consideration. Batted ball inputs can include walks, strikeouts, and hit batters in addition to batted ball types, of course.

3. Dynamic (D) or static (S) weighting

Any fully linear formula is static; any formula that involves interaction (to use SIERA's term) between events is dynamic. Generally, dynamic formulas in this category will be those that use a dynamic run estimator like Runs Created or Base Runs as their foundation, although there are two notable exceptions, both from Baseball Prospectus.

4. Regression-based (R)

As I describe it above, this category would cover whether the run estimator used was model-based (like RC, BsR, Markov models, etc.), used linear weights, or used regression. As such, it would largely repeat the property covered by the third criterion. Additionally, regression is not a mutually exclusive category, as both models and linear weight formulas could be at least partially regression-based. So instead, I'm just going to include a "-R" at the end of the classification for formulas that are regression-generated.

This is not intended to be a scarlet letter of any kind, but I can see where some readers might get that impression. I just think it is helpful to have a further understanding of where event weights are coming from and to distinguish between methods that get their dynamism from the run estimator used and those that get it from conscious choices of the creator to add interactive terms--distinctions that would otherwise be glossed over by this classification system.
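To make the four-part code concrete, here is a minimal sketch of how the scheme could be written down. The class name EstimatorClass and its field names are my own invention for illustration; only the letters themselves come from the properties described above.

```python
from dataclasses import dataclass

# Allowed letters for each property, as described in the text above.
TARGETS = ("E", "T")       # earned runs / total runs allowed
INPUTS = ("R", "D", "B")   # results / defense-independent / batted ball
WEIGHTINGS = ("D", "S")    # dynamic / static


@dataclass(frozen=True)
class EstimatorClass:
    """Hypothetical container for one metric's classification."""
    target: str
    inputs: str
    weighting: str
    regression: bool = False  # True appends the "-R" suffix

    def __post_init__(self):
        # Reject letters that are not part of the scheme.
        if self.target not in TARGETS:
            raise ValueError(f"unknown target code: {self.target!r}")
        if self.inputs not in INPUTS:
            raise ValueError(f"unknown input code: {self.inputs!r}")
        if self.weighting not in WEIGHTINGS:
            raise ValueError(f"unknown weighting code: {self.weighting!r}")

    def code(self) -> str:
        """Build the three-letter code, plus "-R" when regression-generated."""
        base = self.target + self.inputs + self.weighting
        return base + "-R" if self.regression else base


# An earned-run, result-based, dynamic estimator with no regression flag.
example = EstimatorClass(target="E", inputs="R", weighting="D")
print(example.code())
```

The regression flag is kept separate from the three letters because, as noted above, it is not mutually exclusive with the other properties.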

Now, I'll consider several component ERAs in common use in the sabermetric community and explain how this scheme would classify them. The list is not intended to be comprehensive in any way; it includes some of the most commonly used metrics and some that help to illustrate the shortcomings of my classification system:

Bill James' Component ERA: ERD

Component ERA estimates earned runs; it uses results (hits, walks, strikeouts, home runs, and the like) as its inputs; and it is based on a dynamic model of run scoring (Runs Created).

As you can see when it is put in practice, this classification system does not make any sort of judgment on the utility of the metric. I don't care for Runs Created, but it is still a dynamic run estimator. Any sort of result-based ERA estimator I'd propose would be based on Base Runs, but would still get the same "ERD" classification.

Tango Tiger's FIP: EDS-R

FIP is an earned run estimator that does not include non-HR hits and uses static weights (with a floating constant). The coefficients logically correspond to linear weights, but were regression-derived, so I tack on the "-R" at the end of the code.
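For readers who have not seen it written out, here is a sketch of FIP in its commonly published form, with weights of 13 on home runs, 3 on walks plus hit batters, and -2 on strikeouts. The function name and argument names are mine, some versions omit hit batters, and the constant is passed in rather than hard-coded because it floats to match the league ERA.

```python
def fip(hr, bb, hbp, k, ip, constant):
    """Commonly published form of FIP (a sketch, not an official definition).

    ip is innings pitched as a true decimal (200 1/3 innings = 200.333...),
    and constant is the league-specific value that scales FIP to ERA.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    # Static weights: each event has the same value regardless of the
    # pitcher's other numbers, which is what earns the "S" in the code.
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical stat line and a hypothetical league constant of 3.20.
value = fip(hr=20, bb=50, hbp=5, k=180, ip=200, constant=3.20)
print(round(value, 3))
```

Doubling every input, innings included, leaves the result unchanged, which is one way to see that nothing in the formula interacts.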

My own eRA: TRD & dRA: TDD

These two are "my own" only in the sense that I defined a specific name, formula, and abbreviation for them--otherwise, they simply use Base Runs to estimate RA, one using actual results and one assuming that balls in play become hits with the league average frequency and severity. No innovation on my part, just applying David Smyth's run estimator and Voros McCracken's theory. I have always preferred to look at total runs allowed rather than earned runs, and so I constructed metrics to estimate RA, using my preferred run estimator.

Dave Studeman's xFIP: EBS-R

xFIP illustrates a shortcoming of the R/D/B descriptions, as it only uses one batted ball component (league average for HR on FB), but I've included it in a group with other metrics that use multiple batted ball categories.
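A sketch of the idea, assuming the commonly published FIP weights (13, 3, -2): the only batted ball input is the fly ball count, which is multiplied by a league HR/FB rate to stand in for actual home runs. The function and argument names are hypothetical, and the league rate and constant are inputs rather than known values.

```python
def xfip(fb, bb, hbp, k, ip, lg_hr_per_fb, constant):
    """Sketch of xFIP: FIP with expected rather than actual home runs.

    fb is fly balls allowed; lg_hr_per_fb is the league home-run-per-fly-ball
    rate; constant is the league-specific scaling value. ip is a true decimal.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    # The single batted ball component: fly balls times the league rate.
    expected_hr = fb * lg_hr_per_fb
    return (13 * expected_hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical inputs: 200 fly balls at an assumed 10% league HR/FB rate.
value = xfip(fb=200, bb=50, hbp=5, k=180, ip=200,
             lg_hr_per_fb=0.10, constant=3.20)
print(round(value, 3))
```

Grounders, line drives, and popups never appear, which is why the "B" label fits only loosely.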

Graham MacAree's tRA: TBS

tRA is one of the rare metrics that estimates total runs allowed. It does so by considering all batted ball types at their actual linear weight values.

Peter Jensen's DIRVA: TDS

DIRVA is expressed in terms of runs above/below average, so it's not really an ERA/RA estimator in that form, although of course it could be converted into such.

Jensen's method also differs from many others here because it uses the actual play-by-play sequence. A pitcher's DIRVA is his run value prevented (based on the RE table) for defense-independent events, minus the run value prevented on defense-independent events for an average pitcher in the same number of innings. If you don't understand that sentence, that's because it's not very well written--please go read Jensen's explanation, which is infinitely more coherent.

What makes DIRVA unique is that it considers sequencing, something that the other metrics ignore by estimating runs from scratch rather than taking advantage of play-by-play data. This property is not reflected in its code, and static is a questionable description of the weighting--the weights are static in that they are derived from a fixed RE table, but the weight of any given walk will depend on the base/out state in which it occurs, rather than the value of a walk being a constant as it is for other methods tagged with the "S" for static.

Nate Silver's Quick ERA: EBD-R

and

Matt Swartz and Eric Seidman's SIERA: EBD-R

Both metrics from Baseball Prospectus estimate earned runs using batted ball data with dynamic weighting based on regression results. Neither metric considers the complete spectrum of batted ball types. Silver considers only groundballs, while Swartz and Seidman use the term (GB-FB-PU)/PA, a choice based on their research on the persistence of line drive frequency and on their belief that popups might well turn into flyballs in the future.

I've written a little bit about SIERA before, but the point I want to make here is that the "interactivity" that is one of its main selling points is not unique. The specific implementation of interactivity is--SIERA only looks at certain interactions (strikeout rate with itself, grounders less flies and popups per PA with itself, strikeouts, and walks). But interactivity itself is present in any metric based on a dynamic run estimator.

Consider an estimator based on Runs Created, which has a numerator of (H + W)*TB. Of course, that can be broken down further into:

(S + D + T + HR + W)*(S + 2D + 3T + 4HR)

If you foil that out, you get:

S^2 + 2SD + 3ST + 4S(HR) + DS + 2D^2 + 3DT + 4D(HR) + ....

There's interactivity everywhere. Every term in the A factor interacts with every term in the B factor. Of course, you might not actually want all of that interaction. Base Runs (and RC) work by homogenizing baserunners' starting locations, assuming that the score rate is constant regardless of whether a runner starts at first, second, or third. Obviously, that is false, but if you tried to work around that you would have a mess on your hands, and you would lose the benefit of using a formula rather than a more involved method like a Markov model.
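One way to see the interaction numerically is to ask what one additional single is worth under basic Runs Created, (H + W)*TB/(AB + W), in two different contexts. The sketch below does that. The two stat lines are made up for illustration, the function names are mine, and at bats are approximated as hits plus outs.

```python
def runs_created(s, d, t, hr, w, outs):
    """Basic Runs Created: (H + W) * TB / (AB + W)."""
    h = s + d + t + hr                 # hits
    tb = s + 2 * d + 3 * t + 4 * hr    # total bases
    ab = h + outs                      # simplification: AB = hits + outs
    return (h + w) * tb / (ab + w)


def marginal_single(s, d, t, hr, w, outs):
    """Change in Runs Created from adding one single to the line."""
    return (runs_created(s + 1, d, t, hr, w, outs)
            - runs_created(s, d, t, hr, w, outs))


# Two hypothetical lines with the same number of outs.
low = dict(s=100, d=20, t=2, hr=10, w=40, outs=450)
high = dict(s=140, d=40, t=5, hr=30, w=80, outs=450)

low_value = marginal_single(**low)
high_value = marginal_single(**high)
print(f"extra single, low-offense line:  {low_value:.3f} runs")
print(f"extra single, high-offense line: {high_value:.3f} runs")
```

The single is worth more in the second line because there are more baserunners for it to advance and more power behind it to drive it in. A static formula would assign it the same value in both cases.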

The other thing that makes SIERA's interaction unique is that it is a regression equation; a lot of analysts just run linear regressions that don't include any interactive terms.

There's a lot more that could be written on this topic, specifically about which class of methods is appropriate for a given purpose and how much emphasis should be placed on correlation with future ERA. A lot of that is squarely in the realm of opinion, though, and I'll spare you my thoughts for the time being. I believe that having a clear understanding of how each metric operates is a necessary first step in sorting through those more philosophical questions.
