Tuesday, August 07, 2007

Career Walk Rates (an excuse to make a quick point)

Putting aside my attempt at being an objective analyst for a moment, my favorite offensive event is the walk, and my favorite kind of players are those who walk a lot. So I’m just going to take a quick look at some walk-focused derived career statistics for those major leaguers with 5000 AB between 1901 and 2005, as a vehicle to talk about a couple of those statistics. Also, please note that there are no new insights in this post; these ideas did not originate with me, nor are they unique to me. The career lists are really just space-filler to justify this post’s existence, so that I can complain about a certain quickie stat that people use.

Right off the bat, I’m going to ignore hit batters and sacrifices. So the only events being considered are at bats and walks in this analysis. Now, if you want to determine a player’s propensity to walk, what is the first possible statistic that comes to mind? I think if you’re like most people, you would say the percentage of plate appearances in which the hitter walked. And I think you would be right. So here are the top and bottom ten in Walk Percentage, W/(AB + W):
1. Ted Williams (20.8)
2. Barry Bonds (20.2)
3. Babe Ruth (19.7)
4. Eddie Yost (18.0)
5. Mickey Mantle (17.6)
6. Mark McGwire (17.6)
7. Jim Thome (17.5)
8. Frank Thomas (17.4)
9. Joe Morgan (16.7)
10. Rickey Henderson (16.7)
591. Kitty Bransfield (4.2)
592. Manny Sanguillen (4.2)
593. Tim Foli (4.2)
594. Enos Cabell (4.2)
595. Everett Scott (4.0)
596. Hal Chase (3.6)
597. Art Fletcher (3.5)
598. Ozzie Guillen (3.5)
599. Shawon Dunston (3.3)
600. George Stovall (3.2)
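
For anyone who wants to build a list like this from their own data, here is a minimal sketch in Python. Only the formula W/(AB + W) comes from the text above; the function name walk_pct and the three batting lines are hypothetical (the counts are made up, not real career totals).

```python
def walk_pct(ab, w):
    """Walks per plate appearance, counting only at bats and walks."""
    pa = ab + w
    if pa == 0:
        raise ValueError("need at least one at bat or walk")
    return w / pa

# Hypothetical batting lines as (AB, W); these are not real career totals.
players = {
    "Patient Pete": (8000, 2000),
    "Average Al": (8000, 800),
    "Hacker Hal": (8000, 300),
}

# Rank from highest walk percentage to lowest.
ranked = sorted(players, key=lambda name: walk_pct(*players[name]), reverse=True)
for rank, name in enumerate(ranked, start=1):
    print(f"{rank}. {name} ({100 * walk_pct(*players[name]):.1f})")
```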

Obviously, we could get into league adjustments, park adjustments, or at least equivalent run values (as I did with the “Gavvys” for home runs), but that would be defeating the point of this post, which is not to look at the career lists as much as it is to discuss the construction of the stats themselves.

Another measure you’ll sometimes see people look at goes by isolated walks, on base extension, or other names, but is simply OBA-BA. It seems reasonable enough at first glance; OBA measures times on base by hits and walks, BA measures the frequency of hits per at bat, so the difference should tell you something about walk frequency. What kind of list does that make?
1. Barry Bonds (.141)
2. Ted Williams (.136)
3. Eddie Yost (.134)
4. Babe Ruth (.130)
5. Mark McGwire (.129)
6. Jim Thome (.126)
7. Mickey Mantle (.124)
8. Joe Morgan (.122)
9. Earl Torgeson (.121)
10. Frank Thomas (.121)

Obviously, this is a similar list, but not in the same order. Why is this? Well, what is OBA-BA?
(H + W)/(AB + W) - H/AB. BA and OBA have different denominators, so it is not exactly clear what this is measuring. But with a little bit of algebra, you can write “ISW” as:
ISW = W*(AB - H)/(AB*(AB + W))
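
The algebra is short enough to write out. Putting both terms over the common denominator AB*(AB + W), the H*AB terms cancel and W factors out of what is left (AB, H, and W are the same totals used throughout this post):

```latex
\begin{align*}
\mathrm{OBA} - \mathrm{BA}
  &= \frac{H + W}{AB + W} - \frac{H}{AB} \\
  &= \frac{AB\,(H + W) - H\,(AB + W)}{AB\,(AB + W)} \\
  &= \frac{AB \cdot W - H \cdot W}{AB\,(AB + W)} \\
  &= \frac{W\,(AB - H)}{AB\,(AB + W)}
\end{align*}
```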

As you can probably now see, this is a statistic that doesn’t really measure anything. Why are walks multiplied by outs in the numerator, and at bats multiplied by plate appearances in the denominator? Is there any logical explanation for this?

No, there isn’t. OBA-BA is just something that people use because they are lazy, and it obviously tracks the true walk rate well. But like OPS, it is a statistic that doesn’t have units; arguably unlike OPS, it doesn’t make even a bit of sense, despite generally tracking a useful thing (walking rate/frequency/ability).

You can fiddle with that ISW equation to further see what it winds up doing; if I rewrite it as the mathematically equivalent W/(AB + W)*(AB - H)/AB, and then rewrite (AB - H)/AB as the equivalent one minus batting average, you can see that:
ISW = W%*(1 - BA)

In other words, the more base hits you get, the worse you look in OBA-BA, despite an equal walk rate. I have never really seen it used for serious analysis, which is good because it never should be, and I don’t think there is any reason to ever use it.
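
A quick numeric check makes the penalty concrete. This is only a sketch: the function name rates and both batting lines are hypothetical (made-up counts, not real players), and exact fractions are used so the identity can be checked without rounding.

```python
from fractions import Fraction

def rates(ab, h, w):
    """Return (BA, OBA, W%) as exact fractions, using only AB, H, and W."""
    ba = Fraction(h, ab)
    oba = Fraction(h + w, ab + w)
    w_pct = Fraction(w, ab + w)
    return ba, oba, w_pct

# Two hypothetical hitters with the same AB and W (so the same W%),
# differing only in hits. These are made-up numbers.
for label, hits in [("low-average hitter", 2000), ("high-average hitter", 2800)]:
    ba, oba, w_pct = rates(8000, hits, 1000)
    isw = oba - ba
    # The identity from the text: OBA - BA = W% * (1 - BA)
    assert isw == w_pct * (1 - ba)
    print(f"{label}: BA {float(ba):.3f}, W% {float(w_pct):.3f}, OBA-BA {float(isw):.3f}")
```

Both hitters walk in exactly one ninth of their plate appearances, yet the one with 800 more hits comes out with the lower OBA-BA (about .072 against about .083).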

If you just have the three major rate stats at your disposal, then you can calculate W% as (OBA - BA)/(1 - BA). Another quick walk stat you can use is walks per at bat. Now you have to be incredibly lazy to not want to add walks back into the denominator, but walks per at bat (which I’ll call WAB) does have a useful property, in that it ties into what I have always considered the “fourth rate stat”, secondary average, which is equal to ISO + WAB if you ignore stolen bases. What kind of career list does WAB give?
1. Ted Williams (.262)
2. Barry Bonds (.253)
3. Babe Ruth (.246)
4. Eddie Yost (.220)
5. Mickey Mantle (.214)
6. Mark McGwire (.213)
7. Jim Thome (.212)
8. Frank Thomas (.211)
9. Joe Morgan (.201)
10. Rickey Henderson (.200)

As you can see, this is the exact same order as the W% list. And if you understand the math, this is no surprise. Remember, we eliminated all extraneous categories (HB, SH, SF, INT) from consideration, so there are only at bats and walks. Therefore, W/AB is just the ratio form of the stat, where W% is the percentage version. That may be unclear; if so, consider:
Winning % = W/(W + L)      Win Ratio = W/L
Walk % = W/(W + AB)        WAB = W/AB

Win % is to win ratio as Walk % is to WAB. This is true of all ratios and percentages, mathematically, but I thought that using another example from baseball might help people see this. Therefore, W% and WAB are directly related--W% = WAB/(1 + WAB) and WAB = W%/(1 - W%). They will always produce the same list in order, and while W% is probably a more intuitive and useful form, WAB is the ratio of walks to non-walks, which can be a legitimate form of the stat. And if for some reason you want to get WAB from the basic rate stats, WAB = (OBA - BA)/(1 - OBA).
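
The four conversions can be sketched in a few lines of Python. The function names and the sample batting line are hypothetical (the counts are made up); the formulas are the ones given in the text, and the assertions simply confirm that each conversion reproduces the value computed directly from AB and W.

```python
import math

def wpct_from_wab(wab):
    """Percentage form from ratio form: W% = WAB / (1 + WAB)."""
    return wab / (1 + wab)

def wab_from_wpct(w_pct):
    """Ratio form from percentage form: WAB = W% / (1 - W%)."""
    return w_pct / (1 - w_pct)

def wpct_from_rates(ba, oba):
    """W% from the rate stats: (OBA - BA) / (1 - BA)."""
    return (oba - ba) / (1 - ba)

def wab_from_rates(ba, oba):
    """WAB from the rate stats: (OBA - BA) / (1 - OBA)."""
    return (oba - ba) / (1 - oba)

# Hypothetical career line (made-up counts); only AB, H, and W, as in the text.
ab, h, w = 8000, 2400, 1000
ba = h / ab
oba = (h + w) / (ab + w)
w_pct = w / (ab + w)   # computed directly from the counts
wab = w / ab           # computed directly from the counts

# Each conversion should agree with the direct calculation.
assert math.isclose(wpct_from_wab(wab), w_pct)
assert math.isclose(wab_from_wpct(w_pct), wab)
assert math.isclose(wpct_from_rates(ba, oba), w_pct)
assert math.isclose(wab_from_rates(ba, oba), wab)
print(f"W% = {w_pct:.3f}, WAB = {wab:.3f}")
```

Because WAB/(1 + WAB) only ever increases as WAB increases, sorting players by either stat has to give the same order.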

Anyway, one other walk stat that I do like is an estimate of the percentage of runs created derived from walks. You can do this pretty easily with a linear RC formula, and it has been done by many other sabermetricians in the past. It is sort of a junk stat, but it is a fun way to find players who may not have historically high walk rates but depended on the walk to contribute to their teams.

I am going to use one RC formula for the whole century; essentially Paul Johnson’s ERP (TB + .8H + W - .3AB)*.324. You can of course expand that out to see the weight on each event, but all that we care about here is that a walk is worth .32 runs. So .32*W/RC = percentage of RC derived from walks, which I will call %W:
1. Donie Bush, 44.5 %W, 13.8 W%
2. Eddie Yost, 44.5, 18.0
3. Miller Huggins, 44.3, 15.3
4. Eddie Joost, 42.0, 15.7
5. Mark Belanger, 38.1, 9.1
6. Earl Torgeson, 37.5, 16.5
7. Elmer Valo, 37.1, 15.8
8. Rickey Henderson, 36.9, 16.7
9. Joe Morgan, 36.9, 16.7
10. Burt Shotton, 36.6, 12.6

As you can see, the top of this list is largely made up of middling or poor hitters, as the great walkers like Ruth and Bonds and Mantle created a lot of runs by walking, but also many by getting base hits and hitting for power. Mark Belanger’s 9.1 W% is quite pedestrian, but he brought little else to the table as a hitter, and so a large share of his value did come from walking.

591. Manny Sanguillen, 11.6, 4.2
592. George Stovall, 11.6, 3.2
593. George Sisler, 11.5, 5.4
594. Heinie Zimmerman, 11.4, 4.4
595. Art Fletcher, 11.3, 3.5
596. Dante Bichette, 11.3, 5.3
597. Joe Medwick, 11.0, 5.4
598. Hal Chase, 10.3, 3.6
599. Garret Anderson, 10.2, 4.5
600. Shawon Dunston, 9.4, 3.3

Some good players, some bums, some crooks, a Coors Field fraud, a Michigan man; all in all, a group of players that I love to hate.
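
For anyone who wants to compute %W themselves, here is a minimal sketch in Python. The ERP formula is the one quoted above; the function names and both batting lines are hypothetical (made-up counts). The sketch uses the full .324 walk weight rather than the rounded .32; the text does not say which was used for the published lists, so treat that as an assumption.

```python
WALK_WEIGHT = 0.324  # the walk coefficient in the ERP formula; the text rounds it to .32

def erp(ab, h, tb, w):
    """Estimated runs, using the ERP-style formula quoted in the text."""
    return (tb + 0.8 * h + w - 0.3 * ab) * 0.324

def pct_rc_from_walks(ab, h, tb, w):
    """Share of estimated runs created that comes from walks (%W)."""
    return WALK_WEIGHT * w / erp(ab, h, tb, w)

def walk_pct(ab, w):
    """Walks per plate appearance, counting only at bats and walks."""
    return w / (ab + w)

# Hypothetical batting lines as (AB, H, TB, W); made-up numbers, not real players.
hitters = {
    "slugger who also walks": (8000, 2400, 4500, 1500),
    "light hitter": (6000, 1400, 1700, 600),
}

for name, (ab, h, tb, w) in hitters.items():
    share = 100 * pct_rc_from_walks(ab, h, tb, w)
    rate = 100 * walk_pct(ab, w)
    print(f"{name}: {share:.1f} %W, {rate:.1f} W%")
```

With these made-up lines the light hitter has the lower walk rate but the higher %W, which is the same pattern as Belanger in the list above.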

The takeaway here is that if you’re going to figure a walk rate stat from BA and OBA, then for Pete’s sake (Rose? Alexander? Schourek?), please use (OBA - BA)/(1 - OBA) or (OBA - BA)/(1 - BA), not the unitless and nonsensical OBA-BA.

1 comment:

  1. Teddy Ballgame rocks...his rate may have been even higher had he not missed several of his prime years helping folks stand up to fascists during the war.

