Saturday, July 23, 2011

Saying Nothing About ERA Estimators

If you follow the sabermetric blog/Twittersphere at all (and if you don’t, why on earth are you wasting time here?), I’m sure you can figure out what prompted this post. However, I’m not going to name the metric that has generated discussion about this general topic because this post is not meant to be targeted at anyone, or to be a debunking of a particular metric, or anything other than me expressing my opinion about the construction of ERA estimators. Others have different philosophies and they are welcome to them. This is mine:

First, I find it helpful to classify the inputs and construction of each metric. This is not necessary, but the reason I find it helpful is that the ERA estimators out there are relatively diverse. Compared to the sabermetric metrics that exist for evaluating offense, they are extremely diverse. Almost all offensive rate stats are built around an estimate of runs created, divided by either outs or plate appearances. Almost all of them start with the traditional results-based batting line.

ERA estimators, on the other hand, are all over the place. Some follow the lead of their batting cousins and use a run estimator as their base, but some are regression-based. Some use actual results, while some use batted ball data. Some use batted ball data but decide to combine the four standard categories (flyballs, line drives, groundballs, and popups) in some manner. Some assume that the pitcher has no control over what happens once the ball is put into play. Some have implicit or explicit regression built-in with regard to balls in play. Some limit themselves only to what happens when the ball is not put into play. Some estimate ERA, and some estimate total runs allowed.

You probably don’t personally need more than one overall batting metric. That doesn’t mean there shouldn’t be diversity across the sabermetric community--there's nothing wrong with having a number of intelligently designed choices--but as an individual you don’t need both wOBA and True Average; one will suffice. That is not necessarily the case with ERA estimators. Sometimes you might be interested in one that is results-based, sometimes you might be interested in DIPS, and sometimes you might want to venture out into the uncertain world of batted ball metrics. Even when using a common construction (BsR or LW, for example), there is arguably a place for two or three or more different variations based on the inputs.

I believe that the most logical place to start with an ERA estimator is estimating runs. That is intentionally written to sound a little silly, but it is not a philosophy shared by all developers of these metrics. Some put formulas down on the page that they would never consider using to estimate how many runs a team would score. I say that the place to start is with a logical run estimator. Given the team-level nature of the task, that suggests to me the use of Base Runs or another dynamic estimator, but I’m not going to argue too strenuously if you start with linear weights.
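For anyone who hasn't worked with it, here is a minimal Python sketch of one commonly cited basic version of Base Runs (David Smyth's A*B/(B+C) + D form). The post doesn't say which version to use; this simple form ignores HBP, stolen bases, double plays and the like, and the function names and inputs are my own choices for illustration. Treating AB - H as outs for the per-nine conversion is also an approximation.

```python
def base_runs(h: float, bb: float, hr: float, tb: float, ab: float) -> float:
    """Estimate runs with a basic form of Base Runs: A * B / (B + C) + D."""
    a = h + bb - hr                                        # A: baserunners, excluding batters who homer
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02    # B: advancement factor
    c = ab - h                                             # C: outs (approximated as AB - H)
    d = hr                                                 # D: runs that score automatically
    return a * b / (b + c) + d


def runs_per_nine(runs: float, outs: float) -> float:
    """Scale a run total to a per-nine-innings rate (27 outs)."""
    return 27 * runs / outs


# Made-up pitcher line for illustration.
runs = base_runs(h=200, bb=60, hr=20, tb=300, ab=750)
print(round(runs, 2), round(runs_per_nine(runs, outs=750 - 200), 2))
```

With that made-up line the estimate comes out to roughly 95 runs, or about 4.67 runs per nine over 550 outs.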

This path will not necessarily minimize your RMSE or give the best correlation with future ERA. With respect to the latter, if your goal is to provide the best possible estimate of future ERA, your metric is not attempting to measure how well the pitcher actually performed; it’s trying to forecast how well he will perform in the future. Certain constructions will by their nature be less accurate at estimating ERA in the same period. Every step you take down the path from outcome inputs (hits, walks, home runs, etc.) to component-based inputs (ignoring the actual outcomes of balls in play, or looking at batted ball types, etc.) will cost you accuracy when the standard is same-period ERA. However, accuracy at predicting same-period ERA is still a fair yardstick for comparing methods within the same class.
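When comparing estimators of the same class against same-period ERA (or RA), RMSE is the usual yardstick. A minimal sketch follows; the optional weights are my own addition (weighting by innings pitched is an assumption, not something the post prescribes), so that a reliever with a handful of innings doesn't count as much as a full-season starter. The ERA, estimate, and innings figures are made up.

```python
import math
from typing import Optional, Sequence


def rmse(estimates: Sequence[float], actuals: Sequence[float],
         weights: Optional[Sequence[float]] = None) -> float:
    """(Optionally weighted) root mean squared error between estimates and actual values."""
    if not estimates or len(estimates) != len(actuals):
        raise ValueError("need two non-empty sequences of equal length")
    if weights is None:
        weights = [1.0] * len(estimates)
    if len(weights) != len(estimates):
        raise ValueError("weights must match the number of estimates")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    squared = sum(w * (e - a) ** 2 for e, a, w in zip(estimates, actuals, weights))
    return math.sqrt(squared / total_weight)


# Hypothetical same-period ERAs and two hypothetical estimators of the same class.
era = [3.10, 4.25, 5.60]
estimator_a = [3.40, 4.00, 5.10]
estimator_b = [3.90, 4.60, 4.70]
innings = [200, 180, 60]
print(rmse(estimator_a, era, innings), rmse(estimator_b, era, innings))
```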

Beginning the construction of the metric with a model of run scoring avoids some of the problems inherent in using actual pitcher runs allowed. I’m going to gloss over the fact that the number of runs a pitcher allows, regardless of whether it’s from a base period or a future period, is always dependent upon his defense and other factors outside of his control. There are still other concerns that do not apply when looking at true team-level data. The way runs are charged to individual pitchers is biased towards pitchers who inherit baserunners at the expense of those who bequeath baserunners. In practice, that means favoring relievers at the expense of starters, although depending on the performance of the relievers who inherit baserunners, individual bequeathers might actually benefit.

Thus, whenever an approach detects a reliever ERA advantage, some of it is attributable to the way runs are assigned and not to the actual effectiveness of the pitcher. It might even be possible to increase the accuracy of a metric by giving a bonus to relievers. It is entirely unclear to me what benefit this provides other than lowering RMSE. It doesn’t tell you anything about how well the pitchers performed, and it certainly doesn’t help you measure “true talent” any better--if that is the objective, an adjustment in the opposite direction could be warranted.

Another advantage of modeling runs is that you can easily move between RA and ERA. Most sabermetricians prefer RA because of the biases present in ERA and the distortions created by reconstructing imaginary innings sans errors. It’s easy to rescale from RA to ERA by multiplying by a constant like .91. While it’s also easy to divide by .91 to go the other way, if the metric has been tailored to match ERA, you’ve baked the biases of ERA into your metric. This could potentially be most problematic for a regression-based estimator that uses batted ball data. Even if this bias is small, it’s still completely unnecessary.
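The rescaling itself is simple arithmetic. A sketch, using the .91 constant from the post as a placeholder (the actual ratio of earned runs to total runs varies by league and season, so treat the constant as an assumption):

```python
RA_TO_ERA = 0.91  # example constant from the post; the true ratio varies by league and season


def ra_per_nine(runs: float, innings: float) -> float:
    """Runs allowed per nine innings. Innings must be true fractions (190 + 1/3), not box-score '190.1'."""
    return 9 * runs / innings


def ra_to_era(ra: float, factor: float = RA_TO_ERA) -> float:
    """Rescale a run-average estimate to the ERA scale."""
    return ra * factor


def era_to_ra(era: float, factor: float = RA_TO_ERA) -> float:
    """Undo the rescaling, moving from the ERA scale back to the RA scale."""
    return era / factor


ra = ra_per_nine(runs=95, innings=190)  # made-up totals
print(ra, ra_to_era(ra))
```

The conversion is only harmless as a last step on a metric that estimates all runs. Dividing by the constant cannot remove whatever ERA-specific biases were fitted into a metric tailored to match ERA.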

Finally, the issue of dynamism is one that is often misunderstood with respect to ERA estimators. SIERA trumpets its “interactive” nature in its name (which does distinguish it from FIP and other linear methods) but any metric based on the foundation of a dynamic run estimator is by nature interactive. Instead of the interactivity being limited to target categories, though, every event interacts with every other event. Singles interact with triples, walks interact with home runs, doubles interact with triples, home runs interact with outs, outs interact with themselves...you get the idea (and I think that’s enough talk of events interacting with themselves).
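To make "interactive" concrete, the sketch below computes the marginal value of one added home run under the same basic Base Runs form (an assumption; the post doesn't fix a version) in two made-up lines that differ only in walks. Under a linear weights formula, the added homer would be worth its fixed coefficient in both lines; under Base Runs it is worth more when there are more runners on base to drive in.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Line:
    h: float   # hits
    bb: float  # walks
    hr: float  # home runs
    tb: float  # total bases
    ab: float  # at-bats


def base_runs(x: Line) -> float:
    """Basic Base Runs: A * B / (B + C) + D."""
    a = x.h + x.bb - x.hr
    b = (1.4 * x.tb - 0.6 * x.h - 3 * x.hr + 0.1 * x.bb) * 1.02
    c = x.ab - x.h
    return a * b / (b + c) + x.hr


def marginal_hr_value(x: Line) -> float:
    """Runs added by one more home run (one more H, HR and AB, four more TB)."""
    plus_one = replace(x, h=x.h + 1, hr=x.hr + 1, tb=x.tb + 4, ab=x.ab + 1)
    return base_runs(plus_one) - base_runs(x)


# Two made-up lines that differ only in walks.
low_traffic = Line(h=200, bb=30, hr=20, tb=300, ab=750)
high_traffic = Line(h=200, bb=90, hr=20, tb=300, ab=750)
print(round(marginal_hr_value(low_traffic), 2), round(marginal_hr_value(high_traffic), 2))
```

With these inputs the extra homer is worth roughly 1.37 runs in the low-walk line and about 1.47 in the high-walk line: the walks and the home run interact.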

Building your metric around a run estimator does not necessarily restrict you to simply plugging in the numbers in the appropriate place. Suppose you wanted to construct a metric based on batted ball types, strikeouts, and walks. One way to go about it would be to simply go through and estimate singles, doubles, triples, homers, and outs in play based on the percentage of each batted ball type that winds up as each event. So, you would end up with equations that might look something like this:

Singles = .057FB + .217GB + .516LD + .017PU
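That equation is a weighted sum of batted ball counts. A minimal sketch using the singles coefficients above; the post gives only the singles equation, so the equations for doubles, triples, homers, and outs in play would need their own coefficients, which I have not filled in. The batted ball counts are made up.

```python
from typing import Mapping

# Share of each batted ball type that becomes a single, from the example equation above.
SINGLES_RATES = {"FB": 0.057, "GB": 0.217, "LD": 0.516, "PU": 0.017}


def estimate_event(batted_balls: Mapping[str, float], rates: Mapping[str, float]) -> float:
    """Expected count of one event: sum over batted ball types of count * rate."""
    return sum(rate * batted_balls.get(kind, 0.0) for kind, rate in rates.items())


pitcher_bip = {"FB": 200, "GB": 250, "LD": 110, "PU": 40}  # hypothetical counts
print(round(estimate_event(pitcher_bip, SINGLES_RATES), 2))
```

For these counts that works out to about 123 estimated singles, which would then go into the run estimator alongside the other estimated events.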

However, if you believe that you have gleaned some other insights into the relationship between events that could improve your metric (such as strikeout pitchers having lower HR/FB rates), you could still build that into your formula for estimated home runs and plug the result into the run estimator. It’s more difficult than running a regression, and a more delicate balancing act (at least in terms of developing the formula), but it allows you to stay grounded in a model that estimates runs by taking a first step of, well, estimating runs.
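Here is one hedged way that idea could look in code. Everything specific is hypothetical: the baseline HR/FB rate, the linear strikeout adjustment and its slope, and the rule that fly balls no longer counted as homers become outs are placeholders I chose for illustration, not values or methods from the post. The run estimator is the same basic Base Runs form, now built from estimated components.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EstimatedLine:
    """Estimated event totals for a pitcher (outs = batting outs, i.e. AB - H)."""
    singles: float
    doubles: float
    triples: float
    hr: float
    bb: float
    outs: float


def base_runs(x: EstimatedLine) -> float:
    """Basic Base Runs, A * B / (B + C) + D, built from estimated components."""
    h = x.singles + x.doubles + x.triples + x.hr
    tb = x.singles + 2 * x.doubles + 3 * x.triples + 4 * x.hr
    a = h + x.bb - x.hr
    b = (1.4 * tb - 0.6 * h - 3 * x.hr + 0.1 * x.bb) * 1.02
    return a * b / (b + x.outs) + x.hr


def k_adjusted_hr(fb: float, hr_per_fb: float, k_rate: float,
                  league_k_rate: float, slope: float) -> float:
    """Hypothetical HR estimate: baseline HR/FB scaled down for above-average strikeout rates."""
    multiplier = max(0.0, 1.0 - slope * (k_rate - league_k_rate))
    return fb * hr_per_fb * multiplier


def apply_hr_adjustment(line: EstimatedLine, new_hr: float) -> EstimatedLine:
    """Swap in the adjusted HR total; hypothetically, removed homers become outs (and vice versa)."""
    shift = line.hr - new_hr
    return replace(line, hr=new_hr, outs=line.outs + shift)


# Made-up estimated line and made-up adjustment parameters.
baseline = EstimatedLine(singles=130, doubles=40, triples=4, hr=20, bb=60, outs=550)
new_hr = k_adjusted_hr(fb=200, hr_per_fb=0.10, k_rate=0.25, league_k_rate=0.18, slope=1.0)
adjusted = apply_hr_adjustment(baseline, new_hr)
print(round(base_runs(baseline), 1), round(base_runs(adjusted), 1))
```

The bookkeeping in `apply_hr_adjustment` is a small example of the balancing act: if you change one estimated event, the batted balls it loses or gains have to show up somewhere else, or the run estimator ends up working from an inconsistent line.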

Again, I want to make it clear that I am attempting to explain where I’m coming from when I examine metrics of this type. There is room for legitimate philosophical differences, and I’m not trying to state that sabermetricians who deviate from the way I’d do it are engaging in poor practice. It would certainly be possible to develop a lousy metric while basing it on a run estimator and following some of the other suggestions.
