Monday, April 25, 2016

LWR and Component Deflators

If I tell you that a hitter has a line of .260/.330/.400, and I tell you that he plays in a park that inflates run scoring by 5%, what would you estimate his batting line would be in a neutral park? Suppose we know nothing at all about the park other than its effect on runs scored; we don’t know how it changes the rates of home runs, or strikeouts, or hits, or any other statistical category. We don’t know the dimensions or the altitude or the fence height or the type of playing surface, so we can’t make any estimates based on those factors either. How are you going to answer this?

One path you could take is to assume that the park influences the rates of all events equally. You could assume that the park will increase the number of singles by X%, the number of walks by X%, the number of home runs by X%, etc. Will this necessarily be the best approach? No; after all, we expect real-world park factors for walks to be much less extreme than those for home runs, for instance. It may be an acceptable approximation, but it would be a stretch to say it was the optimal approach.

However, what if we are only concerned about value, and don’t wish to adjust individual components? Personally, this is the question that interests me the most. I don’t care if the park benefits a certain type of hitter more than another; I just want to know what the effect was on runs scored. If a batter’s style or approach enables him to take more advantage of a park than the average batter, I don’t wish to take that credit away from him, as it generates real value for his team.

So if I want to adjust a player’s slash line for park, I don’t care about the precise park effect on doubles, walks, or strikeouts. I want to answer “What batting line would provide equivalent value in a neutral park, assuming that the proportional relationships between the components of this man’s line remained constant?” In other words, if this batter had a 2:1 ratio of singles to extra base hits in reality, I want him to still have a 2:1 ratio when we’ve adjusted his line for park.

This is where the approach mentioned above comes into play; assume the park has an equal effect on each mutually exclusive component of the batting line, and go from there. In order to start this process, we will make the assumption that the linear weight values will remain constant as we move between environments. Obviously this is a faulty assumption; linear weight values are dependent on context. However, linear weight values are also fairly stable over similar run environments, particularly in the case of the out value when we are using the -.1 type. A park increasing run scoring by 5% shouldn’t have too dramatic of an effect on the coefficients so as to render our conclusions invalid. Nonetheless, for more extreme parks, the potential for problems will be larger.

Let us define a new variable, a. a is what I will call the “component deflator” (I am borrowing the term “deflator” from Stephen Tomlinson’s “run deflator” as defined in the Big Bad Baseball Annual). Assuming stable linear weight values, using the definition of terms from the last post, and limiting the scope of our categories to the basic mutually exclusive offensive categories (singles, doubles, triples, homers, walks, batting outs) we can start by saying that:

RC/PF = new RC

New RC/Out = (sS*a + dD*a + tT*a + hHR*a + wW*a + x*(1 - S*a - D*a - T*a - HR*a - W*a))/(1 - S*a - D*a - T*a - HR*a - W*a)

All we have done is assume that the frequency of each event will be equally affected, by a factor of “a”. We can also simplify the out term to be 1 - a*(S + D + T + HR + W).

Just as seen in the last installment, we can cancel out the out terms from the numerator and the denominator, and thus save ourselves a lot of hassle, and write everything in terms of Linear Weight Ratio:

New LWR = (S*a + d'D*a + t'T*a + h'HR*a + w'W*a)/(1 - a*(S + D + T + HR + W))

Where new LWR = (New RC/O - x)*s'.

I have kept symbols everywhere to keep this as general as possible, but let’s remember what they actually are. d', t', etc. are all known values, all fixed coefficients. S, D, T, etc. are simply the frequencies of a set of mutually exclusive events. The only unknown variable in this equation is a, the common deflator.

Occasionally I find it necessary to include a disclaimer that I am not a mathematician, and this is one of those times. The way I am going to describe solving for a severely overcomplicates the matter and makes the connection between LWR and a seem tenuous at best. And it’s true, you don’t need to convert to LWR in order to do this type of approximation; I just like doing it that way because of the aforementioned canceling out of the out term in the RC/O numerator and denominator.

To solve for a, let’s define the LWR numerator as N:

N = S + d'D + t'T + h'HR + w'W

One way to look at this is that, in effect, we have stated the player’s positive linear weight contributions from all events as an equivalent number of singles, since singles have a weight of one.

Let’s also write the denominator in an equivalent number of singles. Find (S + D + T + HR + W)/S and call this D (sorry for doubling up with doubles here). This is the ratio of all non-out PA outcomes to singles, so D*S is simply the total frequency of non-out outcomes.

Then, the New LWR can be viewed as this:

New LWR = N*a/(1 - D*S*a)

We have reduced all of the events down to an equivalent number of singles, and can solve for the ratio of singles under our new conditions to singles under old conditions that results in the desired new LWR. This is a, and it is the same ratio that will apply to the other events (except outs, which have to be handled differently):

a = New LWR/(N + D*S*New LWR)
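The solution for a can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the dictionary layout are mine, the weights are the ERP-based LWR weights used in the worked example of this post, and the frequencies are the league-average line from that same example.

```python
# ERP-based LWR weights from the post, relative to a single (weight 1).
WEIGHTS = {"S": 1.0, "D": 1.67, "T": 2.33, "HR": 3.0, "W": 0.67}


def lwr(freqs, weights=WEIGHTS):
    """Linear Weight Ratio: weighted positive events per out (per-PA rates)."""
    n = sum(weights[e] * freqs[e] for e in weights)
    outs = 1.0 - sum(freqs[e] for e in weights)
    return n / outs


def component_deflator(freqs, new_lwr, weights=WEIGHTS):
    """Solve New LWR = N*a / (1 - D*S*a) for a."""
    n = sum(weights[e] * freqs[e] for e in weights)
    # D*S in the post's notation is the total non-out frequency.
    non_outs = sum(freqs[e] for e in weights)
    return new_lwr / (n + non_outs * new_lwr)


# Per-PA frequencies taken from the worked example in this post.
freqs = {"S": 0.167, "D": 0.041, "T": 0.006, "HR": 0.021, "W": 0.086}

# Sanity check: asking for the LWR the player already has should give a = 1.
a_same = component_deflator(freqs, lwr(freqs))
```

The sanity check is the useful part: if the target LWR equals the current LWR, nothing should change, so a should come back as 1 (up to floating-point noise).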

Now, our player’s new rate of singles will be S*a. His new rate of doubles will be D*a (with D here meaning doubles again, not the ratio defined above), his new rate of home runs will be HR*a, and so on for all events except outs. His new rate of outs will be 1 - S*a - D*a - T*a - HR*a - W*a, or substitute “PA” for “1” if you are using the actual count of each event rather than the per-PA frequencies. The outs can also be adjusted as 1 - a*(1 - Outs), or PA - a*(PA - Outs), depending on whether you are using frequencies or counts.
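Applying the deflator is just a rescaling, which a small Python sketch can make concrete. The function name and the "Outs" key are assumptions for illustration; the frequencies and the value a = 1.083 come from the worked example in this post.

```python
def deflate_line(freqs, a):
    """Scale every non-out per-PA frequency by a; outs take up the remainder."""
    new = {event: rate * a for event, rate in freqs.items()}
    # Outs are whatever is left of the plate appearance: 1 - a*(S+D+T+HR+W).
    new["Outs"] = 1.0 - sum(new.values())
    return new


freqs = {"S": 0.167, "D": 0.041, "T": 0.006, "HR": 0.021, "W": 0.086}
old_outs = 1.0 - sum(freqs.values())

line = deflate_line(freqs, 1.083)

# The two ways of writing the out adjustment should agree.
alt_outs = 1.0 - 1.083 * (1.0 - old_outs)
```

For counts rather than frequencies, the same idea applies with PA in place of 1.0.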

Those of you who are astute and who are not totally bewildered by the circuitous way I defined terms and got to this point (which should eliminate most of you, since if you are in fact astute you are rightfully thinking “What the heck is wrong with this guy?”) may notice that the execution here is similar to Bill James’ “Willie Davis method”. And that it is. James converts a player’s batting line into an equivalent number of singles, finds the proportion of translated singles to original singles necessary to yield the right new number of Runs Created (which involves the quadratic formula due to the nature of the RC formula), and adjusts the other events accordingly. So the procedure I’m using here is not in any way new or unique; it is just an application of it in the case of Linear Weights.

Let me walk you through an example, since I’ve made this confusing as all get out. Let’s review the ERP-based LWR that I derived last time for example purposes:

LWR = (S + 1.67D + 2.33T + 3HR + .67W)/(1 - S - D - T - HR - W)

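Let’s suppose that we want to take a league-average player from the 1990 NL and project his statistics in an extreme park, a mid-90s Coors type park with a 1.20 PF. Here are his statistics, given as the per-PA frequencies used in the calculations below: S = .167, D = .041, T = .006, HR = .021, W = .086, for an LWR of .545.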



With some basic algebraic manipulations on the equations in the last post, we can go directly from LWR to New LWR by this formula:

New LWR = ((LWR/s' + x)*adjustment - x)*s'

Where we recall that x is the linear weight value of an out (-.097 in this case), s' is the reciprocal of the linear weight value of a single (2.058 in this case), and adjustment is the scalar effect on runs/out (1.20 in this case). So:

New LWR = ((.545/2.058 - .097)*1.2 + .097)*2.058 = .614
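As a quick check on that arithmetic, here is a minimal Python sketch of the LWR-to-New-LWR conversion. The function and parameter names are assumptions of mine; the default values of x and s' are the ones quoted in this post.

```python
def new_lwr(lwr, adjustment, x=-0.097, s_prime=2.058):
    """Convert LWR to New LWR for a given scalar effect on runs per out.

    x is the linear weight value of an out and s_prime is the reciprocal
    of the linear weight value of a single.
    """
    rc_per_out = lwr / s_prime + x          # back out RC/O from LWR
    adjusted = rc_per_out * adjustment      # scale runs per out
    return (adjusted - x) * s_prime         # convert back to LWR


result = new_lwr(0.545, 1.20)
unchanged = new_lwr(0.545, 1.00)
```

Note that because x is negative, "- x" shows up as "+ .097" in the written-out calculation. With an adjustment of exactly 1, the function should hand back the LWR it was given.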

From here, we need to find “N” and “D”:

N = S + 1.67D + 2.33T + 3HR + .67W = .167 + 1.67(.041) + 2.33(.006) + 3(.021) + .67(.086) = .370

D = (S + D + T + HR + W)/S = (.167 + .041 + .006 + .021 + .086)/.167 = 1.922

And now we can solve for a:

a = New LWR/(N + D*S*New LWR) = .614/(.370 + 1.922*.167*.614) = 1.083

In order to increase this player’s RC/O by 20%, we need to increase his singles, doubles, triples, homers, and walks by 8.3% each. This yields a new batting line of:



So this player has gone from hitting .257/.321/.384 to hitting .281/.348/.419. His park-adjusted value has been held constant, as have his relative frequencies of each positive PA outcome. The key is that he has more of all the positive events, and thus fewer outs.

In case you are curious, from the limited set of frequencies defined here, BA is (S + D + T + HR)/(1 - W); OBA is S + D + T + HR + W; and SLG is (S + 2D + 3T + 4HR)/(1 - W).
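Those three definitions are easy to verify against the before and after lines quoted above. The following Python sketch does so; the function name and dictionary layout are assumptions for illustration, and the deflator is the rounded a = 1.083 from the example.

```python
def slash_line(f):
    """Return (BA, OBA, SLG) from per-PA frequencies of S, D, T, HR, W."""
    hits = f["S"] + f["D"] + f["T"] + f["HR"]
    total_bases = f["S"] + 2 * f["D"] + 3 * f["T"] + 4 * f["HR"]
    at_bats = 1.0 - f["W"]  # per PA, with walks as the only non-AB event here
    return hits / at_bats, hits + f["W"], total_bases / at_bats


before = {"S": 0.167, "D": 0.041, "T": 0.006, "HR": 0.021, "W": 0.086}
a = 1.083
after = {event: rate * a for event, rate in before.items()}

ba0, oba0, slg0 = slash_line(before)
ba1, oba1, slg1 = slash_line(after)
print(f"{ba0:.3f}/{oba0:.3f}/{slg0:.3f} -> {ba1:.3f}/{oba1:.3f}/{slg1:.3f}")
```

OBA scales by exactly a, while BA and SLG rise by a bit more than a because the extra walks also shrink the at-bat denominator.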
