Monday, January 09, 2006

Win Shares Walkthrough, pt. 7 (Conclusion)

Distributing Fielding Win Shares to Individuals
Just as for hitting and pitching, we will assign fielding win shares to individuals by calculating claim points, and giving them the same percentage of the position’s win shares as they have of the position’s claim points. Each position, with the exception of second and short, has a different claim point formula. I will figure the claim points for the Braves’ starter at each position, and their win shares.

At catcher, the formula for claim points is:
cCP = PO + 2*(A - CS) - 8*E + 6*DP - 4*PB - 2*SB + 4*CS + 2*RS
where RS = (TmERA - CERA)*INN/9
RS is runs saved, based on Catcher’s ERA. I do not have Catcher’s ERA for 1993, so I have just set every catcher’s RS equal to zero. The Braves’ primary catcher, Damon Berryhill, recorded 570 putouts, 52 assists, 6 errors, 2 double plays, 6 passed balls, 62 steals allowed, and 28 caught stealing. His claim points are 570 + 2*(52 - 28) - 8*6 + 6*2 - 4*6 - 2*62 + 4*28 = 546. Just as in the offensive and defensive portions of the process, we zero out any negative claim points. Braves’ catchers totaled 990 claim points, giving Berryhill 546/990 of the 7.528 win shares for catchers, or 4.15.
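As a minimal sketch of the catcher calculation above (the function and variable names are my own, not part of the Win Shares system), the formula and the proportional split can be written out like this, with RS defaulting to zero as in the walkthrough:

```python
def catcher_claim_points(po, a, e, dp, pb, sb, cs, rs=0.0):
    """Catcher claim points per the cCP formula; negative totals are zeroed out."""
    cp = po + 2 * (a - cs) - 8 * e + 6 * dp - 4 * pb - 2 * sb + 4 * cs + 2 * rs
    return max(cp, 0.0)


def position_win_shares(player_cp, position_cp, position_ws):
    """Give the player the same fraction of win shares as of claim points."""
    return position_ws * player_cp / position_cp


# Berryhill's 1993 line, as quoted in the text (RS left at zero: no Catcher's ERA data)
berryhill_cp = catcher_claim_points(po=570, a=52, e=6, dp=2, pb=6, sb=62, cs=28)
berryhill_ws = position_win_shares(berryhill_cp, position_cp=990, position_ws=7.528)
print(berryhill_cp, round(berryhill_ws, 2))
```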

At first base, we have:
1bCP = PO + 2*A - 5*E
Sid Bream had 627 PO, 62 A, and 3 E for 627 + 2*62 - 5*3 = 736 of the team’s 1638 total claim points, giving him 736/1638 of the 2.644 shares, or 1.19.

At second base, we introduce the concept of Range Bonus Plays. RBP are credited to any player whose Range Factor ((PO+A)*9/INN) is higher than the team average at the position, and are figured as:
RBP = (RF - PosRF)*INN/9
We only credit RBP for players whose RF exceeds the positional average; therefore, there are no negative figures. Then we have this formula for CP at 2B and SS:
2bCP = ssCP = PO + 2*A - 5*E + 2*RBP + DP
Atlanta second basemen had a combined range factor of 5.320. Mark Lemke recorded 329 PO, 442 A, 14 E, and 100 DP in 1299 innings. His range factor therefore was (329+442)*9/1299 = 5.341. This is higher than the team average range factor at second base, so he gets (5.341-5.320)*1299/9 = 3 RBP. Then he has 329 + 2*442 - 5*14 + 2*3 + 100 = 1249 of the 1401 claim points, giving him 9.34 of the 10.476 win shares at second.
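Here is a rough sketch of the Range Bonus Plays step and the 2B/SS claim point formula, using Lemke's numbers. The helper names are mine, and I am assuming RBP is rounded to a whole number before it goes into the claim points, since that is what the worked figure of 3 implies (unrounded it comes out a little above 3).

```python
def range_factor(po, a, innings):
    """Range factor: plays made per nine innings."""
    return (po + a) * 9 / innings


def range_bonus_plays(player_rf, team_pos_rf, innings):
    """RBP is credited only when the player beats the team average at the position."""
    return max(player_rf - team_pos_rf, 0.0) * innings / 9


def middle_infield_claim_points(po, a, e, dp, rbp):
    """Shared 2B/SS formula; negative totals are zeroed out."""
    return max(po + 2 * a - 5 * e + 2 * rbp + dp, 0)


lemke_rf = range_factor(po=329, a=442, innings=1299)
lemke_rbp = range_bonus_plays(lemke_rf, team_pos_rf=5.320, innings=1299)
# Assumption: round RBP to a whole play, matching the 3 RBP used in the text
lemke_cp = middle_infield_claim_points(po=329, a=442, e=14, dp=100, rbp=round(lemke_rbp))
lemke_ws = 10.476 * lemke_cp / 1401
print(round(lemke_rbp), lemke_cp, round(lemke_ws, 2))
```

A player below the team average simply gets zero RBP from `range_bonus_plays`, which is the "no negative figures" rule.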

Jeff Blauser, at shortstop, had 189 PO, 426 A, 19 E, and 86 DP in 1323 innings, for no range bonus plays. This gives him 189 + 2*426 - 5*19 + 2*0 + 86 = 1032 of the 1204 claim points, and therefore 6.65 of the 7.753 win shares at short.

At third base, the formula is:
3bCP = PO + 2*A - 5*E + 2*RBP
Terry Pendleton had 128 PO, 319 A, and 19 E in 1392 innings for 5 RBP. He has 128 + 2*319 - 5*19 + 2*5 = 681 of the 708 third base claim points, for 7.33 of the 7.62 win shares at third base.

For outfielders, we divide the RF by 3, since there are three outfield positions. Center fielders will generally have a higher range factor than the guys in the corners, and James notes that one function of RBP is to give more of the credit in the outfield to the center fielder. Then we apply this formula:
ofCP = PO + 4*A - 5*E + 2*RBP
Otis Nixon was the Braves’ primary center fielder, recording 308 PO, 4 A, and 3 E in 998 innings for 66.4 RBP. His claim points are 308 + 4*4 - 5*3 + 2*66.4 = 442, out of 1190 for the team’s outfielders, for 6.71 of the 18.068 outfield win shares.
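The first base, third base, and outfield formulas differ only in the weight on assists and in whether there is a range bonus term, so one small function can cover all three. This is just a sketch with my own naming; the RBP values for Pendleton and Nixon are taken as given from the text, because the team range factors needed to derive them are not listed here.

```python
def claim_points(po, a, e, assist_weight=2, rbp=0.0):
    """PO + w*A - 5*E + 2*RBP, zeroed if negative.

    assist_weight is 2 at 1B and 3B and 4 in the outfield;
    rbp stays 0 at first base, which has no range bonus term.
    """
    return max(po + assist_weight * a - 5 * e + 2 * rbp, 0.0)


# (claim points, position total claim points, position win shares) from the text
bream = (claim_points(627, 62, 3), 1638, 2.644)
pendleton = (claim_points(128, 319, 19, rbp=5), 708, 7.62)
nixon = (claim_points(308, 4, 3, assist_weight=4, rbp=66.4), 1190, 18.068)

for name, (cp, pos_cp, pos_ws) in [("Bream", bream), ("Pendleton", pendleton), ("Nixon", nixon)]:
    print(name, round(cp), round(pos_ws * cp / pos_cp, 2))
```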

To find Fielding Win Shares, we simply add the win shares credited at each position for a given player. One Brave, Bill Pecota, wound up with win shares at three positions (second, third, and outfield), although his total is just 0.38. The fielder with the most FWS for Atlanta was Mark Lemke with 9.34, all at second base.

My take: I don’t really have any opinion here; the system seems sound as far as I can tell, but of course the important stuff was done when we credited some of the defense to fielding and then the fielding to each position.

Putting it All Together
Win Shares for a player are just the sum of their batting, pitching, and fielding win shares. Then a rounding process is used. The Win Shares are rounded to whole numbers which must sum to 3 times the team win total. You could also display Win Shares unrounded, but James says that the difference between, say, 30 and 31 WS is very small to begin with and to display decimal places implies more accuracy than is actually there. He would prefer to keep the property that the team total sums to 3 times team wins.

So Bill’s rounding process is to round all numbers down to integers, and sum them. Then he orders the players by their remainders and gives one extra win share to each player in turn, starting with the highest remainder, until the players’ win shares sum up to the proper team number.

For example, suppose there was a team with 5 players that earned 25 win shares:
A had 10.005
B had 3.764
C had 0.963
D had 5.468
E had 4.800
Rounding down, A has 10, B has 3, C has 0, D has 5, and E has 4. That gives 22 WS, three short. Player C has the highest remainder, so he gets one WS. Then comes player E, who also gets one. Player B is third on the list, and also gets one. Now the team total is 25 and we stop with these final figures:
A has 10
B has 4
C has 1
D has 5
E has 5
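The rounding procedure described above is what is usually called a largest-remainder method. Here is a short sketch of it applied to the five-player example; the function name is mine, and it assumes the raw figures already sum to the team total (so the rounded-down sum is never over it). How ties in the remainders would be broken is not covered in the description, so this just keeps the input order.

```python
import math


def round_win_shares(raw, team_total):
    """Round down, then hand out one win share at a time by largest remainder."""
    rounded = {name: math.floor(ws) for name, ws in raw.items()}
    shortfall = team_total - sum(rounded.values())
    # Players ordered from largest to smallest fractional remainder
    by_remainder = sorted(raw, key=lambda name: raw[name] - rounded[name], reverse=True)
    for name in by_remainder[:shortfall]:
        rounded[name] += 1
    return rounded


raw = {"A": 10.005, "B": 3.764, "C": 0.963, "D": 5.468, "E": 4.800}
final = round_win_shares(raw, team_total=25)
print(final)
```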

My take: In my spreadsheet, I keep the fractional numbers, but it is true that small differences are not significant. However, I don’t really see any reason to further reduce the accuracy by rounding it off. Bill’s position is to display imprecision and acknowledge that it is imprecise. My position is to display precision and acknowledge that it is imprecise. Just a difference of opinion, and not in any way a flaw in the method.

We can now compare the Win Shares that I found for the 1993 Braves to the actual Win Shares. There are certain to be some differences as I did not know exactly what years Bill used to set the park factors, nor did I have clutch hitting data, nor did I use precisely the same RC formula, nor did I have holds for a couple of relievers, nor did I have catcher ERA. Outside of those things, though, I believe I had all of the data needed.

My results did not match perfectly. I had the offense at 130.1, while Bill had 130.2. He had the pitching at 129.4, I had it at 127.8. He had the fielding at 52.4, I had it at 54.1. These differences are not insignificant, so it is possible that there is an error in my spreadsheet, but I have not been able to find it. Anyway, I will list the team in order of Win Shares that I came up with and put James’ in parentheses:
Blauser 29(29)
Justice 27(29)
Maddux 26(25)
Gant 24(25)
Glavine 20(20)
Avery 19(19)
Pendleton 19(16)
McMichael 17(17)
Smoltz 16(16)
Lemke 16(15)
Nixon 15(13)
McGriff 14(16)
Sanders 9(11)
Berryhill 9(8)
Bream 8(8)
Bedrosian 6(7)
Howell 6(6)
Mercker 5(6)
Olson 5(5)
Stanton 5(5)
Wohlers 4(4)
Smith 4(4)
Pecota 2(2)
Belliard 2(2)
Cabrera 2(1)
Klesko 1(2)
Lopez 1(1)
Jones 1(0)
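One quick sanity check on the list above is that both columns should add up to the same team total, since both sets of rounded Win Shares have to sum to three times team wins. The sketch below (my own throwaway code, with the numbers copied from the list) totals each column and sorts the players by how far my figure is from James’.

```python
# (player, my WS, James' WS), copied from the list above
results = [
    ("Blauser", 29, 29), ("Justice", 27, 29), ("Maddux", 26, 25), ("Gant", 24, 25),
    ("Glavine", 20, 20), ("Avery", 19, 19), ("Pendleton", 19, 16), ("McMichael", 17, 17),
    ("Smoltz", 16, 16), ("Lemke", 16, 15), ("Nixon", 15, 13), ("McGriff", 14, 16),
    ("Sanders", 9, 11), ("Berryhill", 9, 8), ("Bream", 8, 8), ("Bedrosian", 6, 7),
    ("Howell", 6, 6), ("Mercker", 5, 6), ("Olson", 5, 5), ("Stanton", 5, 5),
    ("Wohlers", 4, 4), ("Smith", 4, 4), ("Pecota", 2, 2), ("Belliard", 2, 2),
    ("Cabrera", 2, 1), ("Klesko", 1, 2), ("Lopez", 1, 1), ("Jones", 1, 0),
]

my_total = sum(mine for _, mine, _ in results)
james_total = sum(james for _, _, james in results)

# Signed difference (mine minus James'), largest absolute gap first
diffs = sorted(((mine - james, name) for name, mine, james in results),
               key=lambda d: abs(d[0]), reverse=True)
total_abs_diff = sum(abs(d) for d, _ in diffs)

print(my_total, james_total)
print(diffs[0], total_abs_diff)
```

Both columns come to 312, the biggest single gap is Pendleton at three win shares, and no other player is off by more than two.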

Again, I am not quite sure what has caused the errors with the pitchers. Some of it is the data differences, but the offense/defense split should not have been affected (unless it was by different park factors). For hitters, I used a different RC formula since I did not have the clutch hitting data, so that is probably the largest factor contributing to the errors.

Final Thoughts on Win Shares
There will be no “My Take” section here because that’s what this whole part is. I have already listed my concerns/questions/disagreements/quibbles with the Win Shares method, and I will not rehash all of those here.

Instead, I will just state that personally, I don’t have a lot of use for the end result, Win Shares, but I will concede that there may well be useful stuff inside the process. For example, the idea of evaluating the team’s fielding and then breaking that down to individual credit may prove to be very useful. Bill claims to have had some new insights into fielding stats and I don’t doubt him; it’s just that I don’t keep up as much as I should with fielding evaluation so I’m not the right person to evaluate that and comment about it.

I do, however, believe that the hitting and pitching components are not a step forward. They do not allocate absolute wins; well, they do, but they don’t do it correctly. So what are you left with? You are left with runs above replacement (a very low replacement), with the scale nuked. I don’t find this very useful.

One aspect of Win Shares that most other systems do not incorporate is reducing the player’s rating if the team underperforms its expectations. If you create 100 runs on a team that creates 700, but only scores 650, your RC will be adjusted down. If you play on a team that, based on its R and RA, should win 55% of its games but only wins 52%, your runs will be worth less in terms of wins. These things are disliked by some people, but they are perfectly defensible in a value method. There may be some room for disagreement on whether to apply the adjustments proportionally to a player’s production, as James does, or whether to distribute them proportionally to playing time. But these adjustments in theory are fine.
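To make the two options concrete, here is a small sketch of the runs adjustment for the 100 RC example. The proportional version follows the description above; the playing-time version is only my guess at one way such an alternative could work, and the plate appearance figures in it are hypothetical numbers made up for illustration.

```python
def adjust_proportional(player_rc, team_rc, team_runs):
    """Scale the player's RC by the ratio of actual team runs to team RC."""
    return player_rc * team_runs / team_rc


def adjust_by_playing_time(player_rc, team_rc, team_runs, player_pa, team_pa):
    """Alternative (assumed form): charge the team shortfall by share of team PA."""
    shortfall = team_rc - team_runs
    return player_rc - shortfall * player_pa / team_pa


# 100 RC on a team that creates 700 but scores only 650
proportional = adjust_proportional(100, team_rc=700, team_runs=650)

# Hypothetical playing time: 600 of the team's 6200 plate appearances
by_time = adjust_by_playing_time(100, team_rc=700, team_runs=650,
                                 player_pa=600, team_pa=6200)
print(round(proportional, 1), round(by_time, 1))
```

Under the proportional version a hitter who created more than his playing-time share of the runs absorbs more of the shortfall, which is exactly where the room for disagreement comes in.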

But you don’t have to go through the Win Shares system to apply similar adjustments. You can apply similar adjustments to an individual’s RC, or his WAR, or whatever. You could even find the team marginal runs/win (derived from the WS system), and use this as the RPW converter to convert RAR to WAR, or RAA to WAA. But there’s no need in my opinion, when evaluating hitters, to go through the WS process.

For pitchers, ideally it would be nice to have a different baseline depending on their reliance on the fielders, but it should clearly be individually-based, not team-based, so the mechanism in Win Shares won’t get you there, even if you understood why it is designed the way it is.

So in short, I don’t think Win Shares is necessary for hitters, I don’t think it’s necessary for pitchers, and I think it may provide some insights into fielding evaluation, but I can’t really tell you for sure. I don’t think it would get nearly as much attention as it does if it had not been invented by Bill James, but I also think that Bill James is clearly the biggest name in sabermetrics so I do not begrudge him this.

What I do resent is people who accept WS because it comes from Bill James without questioning it themselves. I’m not thinking of any specific people; I have just seen, in various discussions, sentiment to the effect of “well, Bill spent a lot of time on it, it must make sense.”

There is nothing wrong with using WS in an analysis or talking about how the system could be improved, etc. as long as you recognize the strengths and weaknesses of the method and recognize how they might affect the analysis you are using them for. This of course is true to some extent for all sabermetric measures.
