Tuesday, August 16, 2011

Ramblings on the Percentage of Runs Scored via Home Run

During the early part of the season, the perceived high dependence of the Yankees’ offense on home runs was often fodder for discussion in the mainstream baseball media. The implication was that while the Yankees were scoring a lot of runs, the fact that many of the runs were being scored on home runs was a sign that the performance was either unsustainable or would not be duplicable against quality pitching. The statistic most commonly cited in these discussions was the percentage of runs that scored on home runs. This post is not intended to comment on the sustainability or quality of pitching issues, but rather to offer a quick critique of the “percentage of runs scored on home runs” figure.

If one broadly divides approaches to credit runs to various individuals or events into two categories, there are those that only consider the final outcome (whether the run scored or not, and who did the scoring or the driving in) and those that attempt to assign credit at all steps in the process, even incremental steps that don’t directly push a run across the plate. The former class includes runs scored and RBI, of course, while the second class includes methods like linear weights and Base Runs.

The percentage of runs a team scores on homers obviously falls into the first class, and in fact when you think about it you will realize that the statistic is founded on an RBI perspective. The way a run gets tossed into the “resulting from a home run” bucket is to score on a home run--that is, to be driven in by a home run. Of course, one could also look at the question from the runs scored perspective--what percentage of the runners that score reached base on a home run? This is also very easy to compute, as it is simply the ratio between home runs and runs scored, data that is readily available for any team (whereas the number of runs actually scored on homers is harder to come by).
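To make the distinction concrete, here is a minimal sketch of the two figures. The function names and the team totals are made up for illustration and do not describe any real team.

```python
def pct_runs_on_hr(runs_on_hr, runs):
    """RBI perspective: share of runs that were driven in by a home run."""
    return runs_on_hr / runs


def hr_per_run(home_runs, runs):
    """Runs scored perspective: share of run scorers who reached base on a home run."""
    return home_runs / runs


# Hypothetical team totals (assumed for illustration only).
runs, home_runs, runs_on_hr = 800, 200, 320

print(f"%onHR = {pct_runs_on_hr(runs_on_hr, runs):.3f}")
print(f"HR/R  = {hr_per_run(home_runs, runs):.3f}")
```

Every home run scores at least the batter, so the RBI-perspective figure can never be smaller than HR/R.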

The RBI-based approach is subject to possible distortions in a manner somewhat similar to the issues with earned runs. If a home run is involved at all, the entire run is chalked up to the homer. Often, the home run is the key event enabling a run, but outside of the batter that actually hits the home run, it is never the only contributing event. In some cases, it is even relatively insignificant. If a home run scores a runner from third with no one out, it really didn’t have a large marginal value with respect to scoring the runner from third--the probability of that runner scoring was already very high, and any number of other events would have allowed him to score. On the flip side, when a runner on first base with two outs is driven home by a home run, the home run is much more vital to scoring the runner.

The discussion in the last paragraph is making the case for transitioning from the outcome perspective to the run expectancy perspective of linear weights. Using play-by-play data, one can calculate the actual linear weight value of the home runs hit by a team. Such an approach will still be subject to sequencing fluctuations and arguably may not be as predictive as a more context-neutral approach.

One obvious context-neutral approach is to use standard linear weight values applied uniformly to all events to estimate the number of runs contributed by home runs. This figure can then be compared to the total number of runs scored. Using fixed linear weight values, though, this approach ends up boiling down to the ratio of home runs to runs scored, times a constant. For example, if the linear weight value of a home run is 1.4 runs, the result of that calculation will just be 1.4 times the simple home run to run ratio.
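A quick sketch of why the fixed-weight version adds no new information: multiplying HR/R by a constant rescales every team by the same factor, so the ordering of teams cannot change. The 1.4 weight is the example value from the paragraph above; the three teams and their totals are hypothetical.

```python
HR_WEIGHT = 1.4  # example linear weight value of a home run, from the text


def lw_hr_share(home_runs, runs, hr_weight=HR_WEIGHT):
    """Estimated runs contributed by home runs, divided by actual runs scored."""
    return hr_weight * home_runs / runs


# Hypothetical (home runs, runs scored) totals, assumed for illustration only.
teams = {"Team A": (200, 800), "Team B": (150, 700), "Team C": (120, 750)}

by_ratio = sorted(teams, key=lambda t: teams[t][0] / teams[t][1])
by_lw = sorted(teams, key=lambda t: lw_hr_share(*teams[t]))

print(by_ratio)
print(by_lw)
```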

The next refinement is to not use actual runs scored at all; this post is going to be way too dry as is, so I won’t even bother trying to explain why mixing actual runs scored with estimated run contributions is a bad idea--it should be relatively obvious. Instead, you can compare the estimated run value of the team’s home runs to the run value of all of its offensive events.

There is a complicating factor in using linear weights (or intrinsic weights derived from a dynamic run estimator as I will in a moment) in this manner--the negative run value of the out. Simply taking Number of Event * Coefficient of Event for every event and dividing by the estimate of runs scored will result in percentages that sum to more than 100%, until outs are subtracted (and outs will have a negative percentage). This means that you can’t use the value literally--if the ratio of 1.4*HR/estimated runs scored is 25%, it doesn’t mean that 25% of the runs were scored because of home runs. Alternatively, one could look only at positive events, but then the denominator is no longer runs at all. As long as the number is viewed as a ratio and not a true percentage contribution, the result can still be useful in measuring the contribution of the home run to the offense.
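The effect of the negative out value can be seen in a small sketch. The weights below are round placeholder values that I am assuming for illustration (only the 1.4 for the home run comes from this post), and the event counts are hypothetical.

```python
# Assumed, illustrative linear weights; not fitted values and not from any real season.
WEIGHTS = {"1B": 0.47, "2B": 0.78, "3B": 1.09, "HR": 1.40, "BB": 0.33, "OUT": -0.10}

# Hypothetical team event counts.
COUNTS = {"1B": 960, "2B": 290, "3B": 30, "HR": 170, "BB": 550, "OUT": 4050}

# Number of Event * Coefficient of Event, for every event.
run_values = {e: WEIGHTS[e] * COUNTS[e] for e in WEIGHTS}
est_runs = sum(run_values.values())

# Each event's product as a share of estimated runs scored.
shares = {e: v / est_runs for e, v in run_values.items()}
positive_share_total = sum(s for s in shares.values() if s > 0)

# The alternative denominator: positive events only.
positive_total = sum(v for v in run_values.values() if v > 0)
hr_share_of_runs = shares["HR"]
hr_share_of_positive = run_values["HR"] / positive_total

print(f"positive shares sum to {positive_share_total:.1%}")
print(f"out share is {shares['OUT']:.1%}")
print(f"HR / est. runs = {hr_share_of_runs:.1%}, HR / positive total = {hr_share_of_positive:.1%}")
```

The positive shares overshoot 100% by exactly the amount that the negative out share takes back, which is why the home run figure reads better as a ratio than as a literal percentage of runs.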

Using a dynamic run estimator like Base Runs has the advantage of attempting to take into account the interaction between the offensive events rather than just assuming a fixed value. However, in the case of the home run, the additional value of considering dynamism is less than it might be for some other events because the value of a home run stays relatively fixed. The intrinsic value of a home run in BsR is:

((B + C)*A*b - A*B*b)/(B + C)^2 + 1

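Where A, B, and C are the total A, B, and C factors for the team, and b is the B coefficient for the home run. In the version of BsR used below, the home run has no A or C coefficient, so those terms drop out of the formula, and the + 1 is the home run’s D coefficient.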

Take this BsR equation:

A = H + W - HR
B = .82S + 2.24D + 3.67T + 2.04HR + .1W
C = AB - H
D = HR

The formula for the intrinsic weight of the HR is:

((B + C)*A*2.04 - A*B*2.04)/(B + C)^2 + 1
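Here is a minimal sketch of that calculation, with a numerical check that the formula really is the marginal value of one more home run under this BsR equation. The function names and the team line are hypothetical; note that adding a home run also adds an at bat and a hit, which is why A and C do not move.

```python
def bsr_factors(ab, h, d, t, hr, w):
    """Return the A, B, C, D factors of the BsR equation in this post."""
    s = h - d - t - hr  # singles
    A = h + w - hr
    B = 0.82 * s + 2.24 * d + 3.67 * t + 2.04 * hr + 0.1 * w
    C = ab - h
    D = hr
    return A, B, C, D


def base_runs(ab, h, d, t, hr, w):
    A, B, C, D = bsr_factors(ab, h, d, t, hr, w)
    return A * B / (B + C) + D


def hr_intrinsic_weight(ab, h, d, t, hr, w):
    """Intrinsic linear weight of the home run, using the formula in the text."""
    A, B, C, _ = bsr_factors(ab, h, d, t, hr, w)
    b = 2.04  # B coefficient of the home run
    return ((B + C) * A * b - A * B * b) / (B + C) ** 2 + 1


# Hypothetical team line (assumed for illustration only).
team = dict(ab=5500, h=1450, d=290, t=30, hr=170, w=550)
weight = hr_intrinsic_weight(**team)

# Finite-difference check: add a small fraction of a home run,
# which is also a fraction of an at bat and of a hit.
eps = 1e-4
bumped = dict(team, ab=team["ab"] + eps, h=team["h"] + eps, hr=team["hr"] + eps)
numeric = (base_runs(**bumped) - base_runs(**team)) / eps

print(f"formula: {weight:.4f}  numeric: {numeric:.4f}")
```

The numerator simplifies to A*C*2.04, so the weight can also be written as A*C*2.04/(B + C)^2 + 1.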

I’ve also figured the intrinsic weights for the other events so that I can also show you the percentage of the positive intrinsic linear weight total contributed by home runs (“POS” in the chart below).

With this, we can look at the four different approaches I’ve discussed for 2010. In the chart below, “hr” is the intrinsic LW of the home run, “RonHR” is the number of runs that actually scored on home runs, “%onHR” is the RBI-perspective figure that gets a lot of media play (RonHR/R), HR/R is the runs scored perspective figure, BsR% is (hr*HR/BsR), and Pos% is hr*HR divided by the sum of the products of all positive event counts and their respective intrinsic weights.
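For anyone who wants to reproduce the columns, here is a rough sketch of how they could be computed from a team’s totals using the BsR equation given earlier. The general weight formula is the partial derivative of BsR with respect to each event, which reduces to the home run formula above. The team line, actual runs, and RonHR are hypothetical, and I am assuming that the Pos% denominator includes the home run’s own product.

```python
# (a, b, c, d): each event's coefficient in the A, B, C, and D factors.
# "D" is doubles, "W" is walks, and "OUT" is AB - H.
COEF = {
    "S": (1, 0.82, 0, 0),
    "D": (1, 2.24, 0, 0),
    "T": (1, 3.67, 0, 0),
    "HR": (0, 2.04, 0, 1),
    "W": (1, 0.10, 0, 0),
    "OUT": (0, 0, 1, 0),
}


def factors(counts):
    """Team A, B, C, D totals from event counts."""
    return tuple(sum(COEF[e][i] * n for e, n in counts.items()) for i in range(4))


def base_runs(counts):
    A, B, C, D = factors(counts)
    return A * B / (B + C) + D


def intrinsic_weights(counts):
    """Partial derivative of BsR with respect to each event."""
    A, B, C, _ = factors(counts)
    return {
        e: ((B + C) * (a * B + A * b) - A * B * (b + c)) / (B + C) ** 2 + d
        for e, (a, b, c, d) in COEF.items()
    }


def hr_measures(counts, runs, runs_on_hr):
    lw = intrinsic_weights(counts)
    hr_runs = lw["HR"] * counts["HR"]
    positive_total = sum(lw[e] * n for e, n in counts.items() if lw[e] > 0)
    return {
        "hr": lw["HR"],
        "%onHR": runs_on_hr / runs,
        "HR/R": counts["HR"] / runs,
        "BsR%": hr_runs / base_runs(counts),
        "Pos%": hr_runs / positive_total,
    }


# Hypothetical team (assumed for illustration only).
counts = {"S": 960, "D": 290, "T": 30, "HR": 170, "W": 550, "OUT": 4050}
measures = hr_measures(counts, runs=770, runs_on_hr=280)

for name, value in measures.items():
    print(f"{name:>6}: {value:.3f}")
```

Because every factor is linear in the event counts, the products of count and intrinsic weight (outs included) add up exactly to BsR, so Pos% always comes out smaller than BsR%.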

I’m not going to add much comment on these figures. This list is sorted by BsR%, which I think is the best measure of how large a share of the offense the home run represented. Toronto was in its own world, of course, with respect to home runs hit and the share of offense contributed by the homer no matter how one estimates it. Also note that the estimated intrinsic linear weight value of the home run for every major league team falls in the [1.401, 1.444] range except for the Jays, who were 3.7 standard deviations below the mean at 1.355.
