Wednesday, August 10, 2011

Sample Simple Limited Input BsR ERA Estimator

In my last post on ERA estimators, I described how my philosophy towards constructing those metrics is predicated on starting with a solid model to estimate runs. By using a solid foundation, you can be confident that, at the very least, your metric will adhere to the fundamental constraints of the run scoring process. The designer retains freedom to experiment and estimate when it comes to selecting the inputs into the model (i.e., what gets filled in for hits, walks, home runs, etc.).

In that article I asserted that this could be done and granted that it might be a more difficult process, but I didn’t demonstrate how. This post will offer an (admittedly simple) estimator using BsR with limited inputs and a lot of estimation. The point is not to develop a metric that anyone will actually use.

I’m going to define plate appearances as AB + W, which can be approximated by IP*2.84 + H + W (it can also of course be calculated from the horribly named BFP column), but I’ll just refer to it as PA in the equations. The BsR equation I’ll be using as a basis is:

A = H + W - HR
B = (2TB - H - 4HR + .05W)*.78 = 1.56TB - .78H - 3.12HR + .039W
C = AB - H
D = HR
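
As a minimal sketch, the four components above can be combined in the usual Base Runs form, A*B/(B + C) + D. The function name and argument names are my own for illustration, and the stat line in the usage example is made up rather than taken from any real pitcher.

```python
def base_runs(ab, h, tb, hr, w):
    """Base Runs from the A, B, C, D components defined above."""
    a = h + w - hr                                # baserunners (excluding HR)
    b = (2 * tb - h - 4 * hr + 0.05 * w) * 0.78   # advancement factor
    c = ab - h                                    # outs (batting outs only)
    d = hr                                        # automatic runs
    return a * b / (b + c) + d


# Hypothetical totals, for illustration only.
runs = base_runs(ab=800, h=200, tb=320, hr=20, w=60)
print(round(runs, 1))
```

Writing the full-input version first makes it easier to see which arguments the limited-input version has to replace with estimates.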

We only have direct knowledge of walks. Everything else will have to be filled in using estimation, for which I’ll use the 2010 major league totals. I’m not going to attempt to state any interrelationships between strikeouts, walks, and the events to be estimated--everything will simply be based on a scalar times (PA - W - K), a quantity which I’ll call N (the estimate of N based on IP is IP*2.84 + H - K).
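
The two IP-based approximations used so far can be sketched as small helpers. The function names are hypothetical, and `ip` is assumed to be innings as a true decimal (200 1/3 innings is 200.333, not the box-score notation 200.1).

```python
def estimate_pa(ip, h, w):
    """Approximate PA (defined here as AB + W) from innings, hits, and walks."""
    return ip * 2.84 + h + w


def estimate_n(ip, h, k):
    """Approximate N = PA - W - K, i.e. at bats that did not end in a strikeout."""
    return ip * 2.84 + h - k


# Made-up pitching line, for illustration only.
ip, h, w, k = 200.0, 180, 50, 170
print(estimate_pa(ip, h, w), estimate_n(ip, h, k))
```

The walk term cancels when N is computed from the PA estimate, which is why `estimate_n` does not need walks as an input.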

In 2010, the ratio of hits to N was .369; the ratio of homers to N was .04; the ratio of total bases to N was .578; and the ratio of (AB - H - K) to N was .768. Thus:

A =.369N + W - .04N = W + .329N
B = 1.56(.578N) - .78(.369N) - 3.12(.04N) + .039W = .039W + .489N
C = K + .768N
D = .04N
BsR = (W + .329N)(.039W + .489N)/(.039W + .489N + K + .768N) + .04N
= (W + .329N)(.039W + .489N)/(.039W + K + 1.257N) + .04N
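
The arithmetic behind the collapsed coefficients can be checked with a short sketch. The function and key names are my own; the default ratios are the 2010 figures quoted above, and other seasons would need their own values.

```python
def limited_input_coefficients(h_per_n=0.369, hr_per_n=0.04,
                               tb_per_n=0.578, outs_in_play_per_n=0.768):
    """Per-N coefficients for A, B, C, D when only W and K are known directly."""
    a_n = h_per_n - hr_per_n                                    # A = W + a_n*N
    b_n = 1.56 * tb_per_n - 0.78 * h_per_n - 3.12 * hr_per_n    # B = .039W + b_n*N
    c_n = outs_in_play_per_n                                    # C = K + c_n*N
    d_n = hr_per_n                                              # D = d_n*N
    return {"a_n": a_n, "b_n": b_n, "c_n": c_n, "d_n": d_n,
            "denominator_n": b_n + c_n}                         # N term of B + C


for name, value in limited_input_coefficients().items():
    print(name, round(value, 3))
```

Keeping the league ratios as parameters makes it straightforward to refit the estimator to a different run environment without touching the Base Runs structure.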

To convert to RA, multiply by 9 and divide by (C/2.84), which is a rough estimate of innings pitched, since C stands in for AB - H and AB - H is approximately IP*2.84 (ideally, you would separate strikeouts from outs in play for this estimate). This is equivalent to multiplying by 25.56 and dividing by C:

Estimated RA = ((W + .329N)(.039W + .489N)/(.039W + K + 1.257N) + .04N)*25.56/(K + .768N)
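
A minimal sketch of the estimated RA calculation follows, with the 2010-based coefficients hard-coded. The function name is hypothetical, and the inputs in the usage example are made up rather than drawn from a real pitcher. N is assumed to be supplied directly, either as PA - W - K or from the IP-based approximation.

```python
def estimated_ra(w, k, n):
    """Estimated RA from walks, strikeouts, and N = PA - W - K."""
    a = w + 0.329 * n
    b = 0.039 * w + 0.489 * n
    c = k + 0.768 * n              # estimated outs; c / 2.84 approximates innings
    d = 0.04 * n
    bsr = a * b / (b + c) + d
    return bsr * 25.56 / c         # 25.56 = 9 * 2.84


# Made-up inputs, for illustration only.
print(round(estimated_ra(w=50, k=170, n=578), 2))
```

With N held fixed, adding strikeouts lowers the estimate and adding walks raises it, since those are the only two inputs that are not filled in from league averages.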

The range for the estimated RA when applied is not as wide as the range for actual RA, which shouldn’t be a surprise since I intentionally took everything except strikeouts and walks out of the equation and didn’t do anything to amplify their value. For example, the top five starters in the AL in 2010 according to this formula were:



Again, the point is not to offer this as an equation that should be used. It’s simply an illustration of constructing a Base Runs equation while restricting the list of available inputs, yet still estimating each component separately. The same idea can be extended to produce other metrics grounded in the Base Runs model: adding home runs to walks and strikeouts, for instance, would result in a standard DIPS-style estimator, and there are many other possible combinations of inputs. As I mentioned in the previous post, one need not be tied to the “dumb” kind of estimation on display here (i.e., assuming that the allowed variables have no ability to improve the prediction of the missing variables).

2 comments:

  1. I like to estimate PA as:
    .94*(IP*3-K) + H + BB + K

    The reason is that a K is a PA.

    ***

    I suppose we can go really crazy, and estimate that .94 based on H+BB per IP. After all, the more runners on base, the more runners out on base. When H+BB = 0, then that .94 is really 1.00. I've never tried to come up with something (mostly because it's overkill I think). But, I'm putting it out there in case someone wants to try.

    But that first equation though, that should be done. It's a simple enough change, and prevents the bias of K pitchers.

  2. Voros used something similar splitting out the Ks with the original DIPS.

    I am by nature quite fastidious when it comes to the design of metrics and quite lazy about what inputs go into them afterwards. Not a good trait, but one nonetheless.
