## Monday, June 03, 2013

### Offensive Percentages

I wrote this post (and its companion which will go up at a later date) in 2010, but didn't like it enough to publish it here. However, this topic came up on Twitter recently, and Sky Kalkman wrote up his take on it here. Since I already had this written, I thought I might as well add it to the conversation.

In one of his early national Abstracts, Bill James published a method that estimated the percentage of a player's offensive value (actually, his Runs Created) which was derived from batting average. Since RC is simply (H+W)*TB/(AB+W), one can calculate what a player's RC would be in the absence of power and walks (setting W = 0 and TB = H) by taking H^2/AB. Dividing this by RC gave James an estimate of what percentage of the player's contribution came from base hits alone.
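As a rough sketch of that calculation, the snippet below computes basic RC and the hits-only share for a made-up stat line. The function names and the sample numbers are my own illustration, not anything taken from the Abstracts.

```python
def basic_rc(h, w, tb, ab):
    """Basic Runs Created: (H + W) * TB / (AB + W)."""
    return (h + w) * tb / (ab + w)


def hits_only_share(h, w, tb, ab):
    """Share of RC that remains when walks and extra bases are stripped out.

    With W = 0 and TB = H, basic RC collapses to H^2 / AB.
    """
    return (h * h / ab) / basic_rc(h, w, tb, ab)


# Hypothetical stat line (not a real player): 150 H, 50 W, 250 TB, 500 AB
share = hits_only_share(h=150, w=50, tb=250, ab=500)
print(f"{share:.3f}")
```

For this invented line the hits-only share comes out to about .495, i.e. roughly half of the player's RC would survive if every hit were a single and he never walked.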

James' method was later expanded by Gerry Myerson in the Big Bad Baseball Annual to estimate the share of RC derived from walks and power. James Fraser (whatever happened to him, anyway?) later applied a similar approach to Extrapolated Runs.

Let's start with a simple, static linear formula, basically Paul Johnson's ERP. This is not the most precise run estimator available, but it's easy to work with and is good enough for this type of application:

RC = (.5H + TB + W - .3(AB-H))*.324 ~= .49H + .32EB + .32W - .1(AB - H) = .59H + .32EB + .32W - .1AB
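To see how much the rounding of the coefficients matters, here is a small sketch that evaluates both forms of the formula. The function names and the stat line are hypothetical; only the coefficients come from the formula above.

```python
def erp_exact(h, tb, w, ab):
    """ERP-style estimate with the .324 multiplier applied to the full expression."""
    return (0.5 * h + tb + w - 0.3 * (ab - h)) * 0.324


def erp_rounded(h, tb, w, ab):
    """Same estimate using the rounded per-event weights, with EB = TB - H."""
    eb = tb - h
    return 0.59 * h + 0.32 * eb + 0.32 * w - 0.1 * ab


# Hypothetical stat line (not a real player): 150 H, 250 TB, 50 W, 500 AB
exact = erp_exact(150, 250, 50, 500)
rounded = erp_rounded(150, 250, 50, 500)
print(round(exact, 2), round(rounded, 2))
```

The two versions land about a run apart for this line (roughly 87.5 versus 86.5), which is just the accumulated effect of rounding each weight to two decimals.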

It is pretty easy to split this up into the basic components of hits, walks, and power (as shown). However, there is the little problem of the negative runs that are charged for outs made. If you lump them in with hits, the share of offense contributed by base hits will be driven down. If you ignore them and compare to total RC, you'll end up saying that the percentage of value contributed by hits, walks, and power combined is greater than 100%, and by a different amount for each player. So instead, I’ll look at the contribution of hits, walks, and extra bases towards the positive linear weight value, and ignore the negative from outs. I make no claim that this is the optimal way to do this, but it seems like the least bad alternative.

Since we're not dealing with actual RC figures anymore, we can safely ignore the .324 multiplier and make it real simple:

Pos = .5H + TB + W = 1.5H + EB + W

The percentage of positive linear weights contributed by hits, walks, and power (extra bases) is straightforward:

H% = 1.5H/Pos

W% = W/Pos

P% = EB/Pos
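A minimal sketch of these three shares in code, assuming you have H, TB, and W on hand. The function name and the sample stat line are made up for illustration.

```python
def positive_shares(h, tb, w):
    """Return (H%, W%, P%) as fractions of Pos = 1.5H + EB + W."""
    eb = tb - h                 # extra bases beyond the first base of each hit
    pos = 1.5 * h + eb + w      # positive linear weight total (outs ignored)
    return 1.5 * h / pos, w / pos, eb / pos


# Hypothetical stat line (not a real player): 150 H, 250 TB, 50 W
h_pct, w_pct, p_pct = positive_shares(h=150, tb=250, w=50)
print(f"({h_pct:.0%}, {w_pct:.0%}, {p_pct:.0%})")
```

This prints (60%, 13%, 27%) for the invented line. The three shares sum to one by construction, since Pos is just the sum of the three numerators.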

I'm not quite sure how to express this coherently, but these percentages don't really represent the portion of a player's overall offensive value arising from those three components. They represent the share of a player's absolute positive Runs Created that arises from those three components. If you tried to apply this approach to absolute RC, it would fall apart, because you have to do something about the outs. If you tried to apply this approach to a baselined metric (RAA, RAR), it would really fall apart. You would have players with a negative denominator, and thus negative percentages; players with negative hit contributions and a negative denominator, resulting in positive percentages; and all manner of other results that wouldn't make much sense.

The bottom line is that, as Bill James explained when he introduced his version, you can't use the percentages literally. That doesn't make these percentages useless, but it does make them more of a freak show stat than they otherwise might be. Still, if you don't treat the percentages as literal, but as abstractions, and only compare them relatively between players, they have the potential to yield some insight.

Let's begin with the major league percentages for 2009 [I'm going to display these as (H, W, P) from this point]:

AL: (61, 15, 24)
NL: (61, 16, 23)

Simply collecting base hits is responsible for about 60% of the positive run value in the majors. It's not that batting average is worthless--if you break OBA down into the portions derived from base hits and walks, and SLG into the portions derived from base hits and power, the hits portion is the more important one in each case. The problem with BA is that it doesn't add much additional information given that you already have the more complete metrics. Getting hits is still a very important part of offense, and no sabermetrician will ever tell you otherwise.

Of course, the way I've split things up is to put the first base of every hit together. You could split off singles on their own, and leave the first bases of extra base hits in the "power" grouping, and of course the share of positive value credited to "power" would go up. Personally, I think this kind of approach is more useful if the extra bases are spun off.

In any event, players will have much more extreme profiles than the league as a whole. Consider these four players from the 2009 AL:

Suzuki: (76, 7, 16)
Punto: (60, 30, 10)
Pena: (41, 22, 37)
Delmon Young: (71, 5, 24)

Ichiro led in H%; Punto led in W% and trailed in P%; Pena led in P% and trailed in H%; and Delmon Young was last in W%.

The disclaimer about abstraction can be illustrated by example. Compare Suzuki and Punto. 7% of Suzuki's positive linear weight total came from walks, while 30% of Punto's did. Suzuki's walk rate was .059, Punto's was .145. If we could use the percentages literally, then Suzuki's overall rate of offensive productivity would be proportional to .059/.07 = .843 and Punto's to .145/.3 = .483. It doesn't matter whether you use RC/PA, RC/O, or any other sensible overall rate--you're not going to be able to reconcile the players' ratio in those metrics with the players' ratio in these nonsense units. You might be able to tie them loosely to an overall metric--after all, they can be tied back to "Pos" by definition. However, the positive linear weight values on their own, without subtracting or dividing by outs in any way, don't capture the full extent of a player's offensive productivity.
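The arithmetic in that comparison can be reproduced in a few lines. The function name is my own label, and the inputs are just the walk rates and rounded W% figures quoted above.

```python
def implied_overall_rate(walk_rate, walk_share):
    """Overall rate implied if the walk share could be read literally."""
    return walk_rate / walk_share


suzuki = implied_overall_rate(0.059, 0.07)
punto = implied_overall_rate(0.145, 0.30)
print(f"{suzuki:.3f} {punto:.3f} ratio={suzuki / punto:.2f}")
```

Read literally, the shares would imply that Suzuki's overall rate was about 1.74 times Punto's, in units that don't correspond to any real measure of productivity.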

Next time, I'll look at how H, W, and P% look for hitters when they are grouped by overall productivity. To be one of the very best hitters, a player is going to have to contribute in all three areas--a player like Ichiro gives us some hint as to the upper limit for a player with very little secondary contribution. Looking at hitters' breakdowns by quality groups will not provide much analytical value, but it does help in identifying players with unique styles.