Tuesday, December 20, 2005

Win Shares Walkthrough, pt. 4

Splitting Defensive Win Shares between Pitching and Fielding

This process involves seven “Claim Point” formulas, which when combined give an estimate of the percentage of defense attributable to pitchers. Each of these claim points is classified as either a “pitching” claim point, i.e. one that is attributable to the skill of the pitchers, or a “fielding” claim point that is attributable to the fielders, with the exception of the first. The first formula is based on the team’s Defensive Efficiency Record, which is an estimate of the percentage of balls hit into play against them that are turned into outs. This one, called CL-1, counts for both pitching and fielding. We first find the DER = (BF - H - W - K - HB)/(BF - HR - W - K - HB). The Braves faced 6015 batters, and allowed 1297 hits, 480 walks, 22 hit batters, 101 homers, and 1036 strikeouts. So their DER is (6015 - 1297 - 480 - 1036 - 22)/(6015 - 101 - 480 - 1036 - 22) = .7267. We then find adjDER as 1 - (1 - DER)/PF(S). The Braves’ PF(S) was .995, giving an adjusted DER of 1 - (1 - .7267)/.995 = .7253. The claim points are found by:
CL-1 = 100 + (adjDER - LgDER)*2500
The NL DER was .7114, so the Braves’ CL-1 is 100 + (.7253 - .7114)*2500 = 134.75
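
For readers who want to check the arithmetic, here is a minimal Python sketch of the CL-1 steps. The function and argument names are illustrative (they are not from the book), and the inputs are the 1993 Braves figures quoted above. Because the sketch carries full precision instead of rounding DER and adjDER to four places, the result lands near 134.8 rather than exactly 134.75.

```python
def der(bf, h, w, k, hb, hr):
    """Defensive Efficiency Record: estimated share of balls in play turned into outs."""
    return (bf - h - w - k - hb) / (bf - hr - w - k - hb)


def adj_der(raw_der, pf_s):
    """Park-adjust DER using the PF(S) park factor."""
    return 1 - (1 - raw_der) / pf_s


def cl1(adjusted_der, lg_der):
    """Claim point 1: 100 plus 2500 times the DER margin over the league."""
    return 100 + (adjusted_der - lg_der) * 2500


# 1993 Braves inputs as quoted in the text
braves_der = der(bf=6015, h=1297, w=480, k=1036, hb=22, hr=101)
braves_adj = adj_der(braves_der, pf_s=0.995)
braves_cl1 = cl1(braves_adj, lg_der=0.7114)

print(round(braves_der, 4), round(braves_adj, 4), round(braves_cl1, 2))
```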

The second criterion is a pitching claim point, based on the strikeout rate. First we find the team’s strikeouts per game as K*9/IP = KG. The Braves had 6.408, and we easily convert this to CL-2:
CL-2 = (KG + 2.5)/7*200
For the Braves, (6.408 + 2.5)/7*200 = 254.51

The third criterion is a pitching claim point, based on walks compared to the league average. The formula is:
CL-3 = Lg(W + HB)/IP * TmIP - W - HB + 200
In words, we find the league average of walks and hit batters per inning, multiply it by the team’s innings, and subtract the team’s own W and HB to find out how many fewer than average they allowed, then add this to 200. The league average was .3782 and the Braves pitched 1455 innings with 502 W+HB, so:
.3782*1455 - 502 + 200 = 248.28

The fourth criterion is another pitching claim point, based on home runs allowed. We find the number of homers fewer than expected, multiply by 5 and add to 200.
CL-4 = (LgHR/IP*TmIP - HR/PF(HR))*5 + 200
The league average was .0964 HR/IP, and the Braves allowed 101 homers with a 1.019 PF(HR), so:
(.0964*1455 - 101/1.019)*5 + 200 = 405.73
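
The three pitching-only claim points can be sketched the same way. This is only a convenience for checking the numbers, with illustrative function names; the league rates (.3782 W+HB per inning, .0964 HR per inning) are the rounded values given in the text, so results can differ from the figures above in the second decimal place.

```python
def cl2(k, ip):
    """Claim point 2: based on strikeouts per nine innings (KG)."""
    kg = k * 9 / ip
    return (kg + 2.5) / 7 * 200


def cl3(lg_w_hb_per_ip, ip, w, hb):
    """Claim point 3: walks plus hit batters fewer than league expectation, plus 200."""
    return lg_w_hb_per_ip * ip - w - hb + 200


def cl4(lg_hr_per_ip, ip, hr, pf_hr):
    """Claim point 4: park-adjusted homers fewer than expected, times 5, plus 200."""
    return (lg_hr_per_ip * ip - hr / pf_hr) * 5 + 200


# 1993 Braves inputs as quoted in the text
braves_cl2 = cl2(k=1036, ip=1455)
braves_cl3 = cl3(lg_w_hb_per_ip=0.3782, ip=1455, w=480, hb=22)
braves_cl4 = cl4(lg_hr_per_ip=0.0964, ip=1455, hr=101, pf_hr=1.019)

print(round(braves_cl2, 2), round(braves_cl3, 2), round(braves_cl4, 2))
```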

The fifth criterion is the first of two that are for fielding only, and it is based on the rate of errors and passed balls. This is put together like this:
CL-5 = (Lg(E + .5*PB)/IP*TmIP - E - .5*PB) + 100
The league average of errors and half of PB per inning was .0974, while the Braves committed 108 errors and 13 passed balls, resulting in:
(.0974*1455 - 108 - .5*13) + 100 = 127.22

The sixth criterion is the most complex and compares the team’s double plays to the expected number of double plays (a fielding claim). First, calculate the percentage of non-HR hits that are singles in the league as Lg%S = S/(H - HR). Then make an estimate of Runners on First Base (RoF) for the team and the league:
RoF = (H - HR)*Lg%S + W + HB - SH - WP - BK - PB
In the 1993 NL, 77.8% of non-HR hits were singles. The Braves allowed 77 sacrifice hits, 46 wild pitches, and 9 balks, giving:
(1297 - 101)*.778 + 480 + 22 - 77 - 46 - 9 - 13 = 1287.49
The league has a RoF estimate of 19790. Expected DPs is the league DP per RoF, times the team RoF, times the ratio of team assists per inning to league assists per inning (this is used as an estimation of the opposing hitters’ groundball tendencies):
ExpDP = Lg(DP/RoF)*TmRoF*(A/IP)/Lg(A/IP)
The NL turned 2028 double plays, and the Braves recorded 1769 assists versus 24442 for the league (in 20284 innings). So the Braves had 1769/1455 = 1.216 assists/inning versus 24442/20284 = 1.205 for the league. Put it all together:
2028/19790*1287.49*1.216/1.205 = 133.14
CL-6 is just the excess double plays multiplied by 4/3, plus 100:
CL-6 = (DP - ExpDP)*4/3 + 100
For the Braves, who turned 146 DP, (146 - 133.14)*4/3 + 100 = 117.15
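
Since CL-6 has the most moving parts, a short sketch of the two fielding claim points may help. The function names are illustrative, and the league inputs (.0974 errors-plus-half-PB per inning, 77.8% singles, 19790 league RoF) are the rounded figures from the text. The sketch does not round the assist rates to three decimals, so its CL-6 differs slightly from 117.15.

```python
def cl5(lg_err_rate, ip, e, pb):
    """Claim point 5: errors plus half of passed balls fewer than expected, plus 100."""
    return (lg_err_rate * ip - e - 0.5 * pb) + 100


def runners_on_first(h, hr, lg_pct_singles, w, hb, sh, wp, bk, pb):
    """Estimated runners on first base (RoF)."""
    return (h - hr) * lg_pct_singles + w + hb - sh - wp - bk - pb


def expected_dp(lg_dp, lg_rof, tm_rof, tm_a, tm_ip, lg_a, lg_ip):
    """League DP per RoF, scaled by team RoF and the team/league assist-rate ratio."""
    return lg_dp / lg_rof * tm_rof * (tm_a / tm_ip) / (lg_a / lg_ip)


def cl6(dp, exp_dp):
    """Claim point 6: excess double plays times 4/3, plus 100."""
    return (dp - exp_dp) * 4 / 3 + 100


# 1993 Braves and NL inputs as quoted in the text
braves_cl5 = cl5(lg_err_rate=0.0974, ip=1455, e=108, pb=13)
braves_rof = runners_on_first(h=1297, hr=101, lg_pct_singles=0.778,
                              w=480, hb=22, sh=77, wp=46, bk=9, pb=13)
braves_exp_dp = expected_dp(lg_dp=2028, lg_rof=19790, tm_rof=braves_rof,
                            tm_a=1769, tm_ip=1455, lg_a=24442, lg_ip=20284)
braves_cl6 = cl6(dp=146, exp_dp=braves_exp_dp)

print(round(braves_cl5, 2), round(braves_rof, 2),
      round(braves_exp_dp, 2), round(braves_cl6, 2))
```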

The seventh and final criterion is simply 405 times the team’s winning percentage:
CL-7 = 405*W%
For the Braves (104-58, .642), 405*.642 = 260

The percentage of defense attributable to pitching is the sum of the pitching claim points, plus 650, divided by the sum of all claim points (with CL-1 double counted because it is credited to both pitchers and fielders) plus 1097.5.
Pitch% = (CL-1 + CL-2 + CL-3 + CL-4 + CL-7 + 650)/(2*CL-1 + CL-2 + CL-3 + CL-4 + CL-5 + CL-6 + CL-7 + 1097.5)
For the Braves:
(134.75 + 254.51 + 248.28 + 405.73 + 260 + 650)/(2*134.75 + 254.51 + 248.28 + 405.73 + 127.22 + 117.15 + 260 + 1097.5) = .7026

Therefore, we will assign 70.26% of the Braves’ 182 defensive win shares to the pitchers, or 128. The Field% = 1 - Pitch%, of course, and PWS = Pitch%*DWS while FWS = Field%*DWS. The Braves’ fielders get 54 win shares collectively.
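
Putting the seven claim points together, the split can be sketched as follows. The dictionary layout and names are illustrative; the claim point values are the rounded Braves figures from the walkthrough, and the sketch deliberately leaves out the constraints on the split.

```python
def pitch_pct(cl):
    """Share of defense credited to pitchers; cl maps claim point number (1-7) to value."""
    pitching = cl[1] + cl[2] + cl[3] + cl[4] + cl[7] + 650
    # CL-1 is counted twice in the denominator: once for pitchers, once for fielders
    total = 2 * cl[1] + cl[2] + cl[3] + cl[4] + cl[5] + cl[6] + cl[7] + 1097.5
    return pitching / total


# Rounded 1993 Braves claim points from the walkthrough
braves_cl = {1: 134.75, 2: 254.51, 3: 248.28, 4: 405.73,
             5: 127.22, 6: 117.15, 7: 260.0}
braves_dws = 182  # team defensive win shares

share = pitch_pct(braves_cl)
pws = share * braves_dws          # pitching win shares
fws = (1 - share) * braves_dws    # fielding win shares

print(f"Pitch% = {share:.4f}, PWS = {pws:.1f}, FWS = {fws:.1f}")
```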

There are a couple of constraints placed on these figures, but it doesn’t appear as if they are relevant today. The first is that the Pitch% must be between 60 and 75%, and the second, which takes precedence over the first, is that a team must have between .16375 and .32375 FWS/game.

My Take: I do not know exactly the logic behind these steps, because Bill does not explain what that is (specifically I mean; obviously, the formulas are given and we know which criteria are credited to pitchers and which to fielders, but outside of that, we don’t know much), but I do not like this step. If Win Shares truly represents a step forward in measuring fielding, then the step that determines how much of a team’s defense the fielders deserve credit for is pretty darn important.

First let me explain the general logic behind the scales. Each CL-x formula has an average, which represents the amount of weight it is given. For example, the average CL-1 is 100, meaning it is weighted by 100. Let me just make a list of the averages:
CL-1 = 100
CL-2 = 200
CL-3 = 200
CL-4 = 200
CL-5 = 100
CL-6 = 100
CL-7 = 202.5
Most of these are easy to figure out, because they just take some difference from league average and add it to the number. Obviously, if the team is average, zero plus that number will equal that number. The only exceptions are CL-2, which assumes an average team will have 4.5 KG, and CL-7, which is just 405*.500 = 202.5.

If you plug those averages into the Pitch% formula, you will get .675, meaning that Win Shares assumes that an average defense is 67.5% attributable to pitching, which is in line with the 2/3 approximation that some sabermetricians use.
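
The .675 figure is easy to confirm. The snippet below is just a check of the arithmetic, with an illustrative dictionary of the averages listed above; it also confirms that a KG of 4.5 is what makes CL-2 come out to 200.

```python
# Average value of each claim point, as listed in the text
averages = {
    "CL-1": 100, "CL-2": 200, "CL-3": 200, "CL-4": 200,
    "CL-5": 100, "CL-6": 100, "CL-7": 405 * 0.500,
}

pitching_keys = ("CL-1", "CL-2", "CL-3", "CL-4", "CL-7")
pitching_side = sum(averages[k] for k in pitching_keys) + 650

# CL-1 is counted twice in the denominator, so add it once more to the plain sum
all_points = averages["CL-1"] + sum(averages.values()) + 1097.5

avg_pitch_pct = pitching_side / all_points

# CL-2 for a team with the assumed average of 4.5 strikeouts per game
avg_kg_cl2 = (4.5 + 2.5) / 7 * 200

print(pitching_side, all_points, avg_pitch_pct, avg_kg_cl2)
```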

For all I know, all of these formulas could be very well-founded and work. The only problem is, James does not explain the weightings or how he reached them. In such a crucial stage of the process, there is basically no justification offered other than “it works”. I can accept that it may give a reasonable estimate, but do not expect me to adopt your system unless I have some hard data or reasoning to back it up.

One thing that puzzles me is the willy-nilly mix of counting numbers and rates. CL-1 uses a rate, DER. CL-2 uses a rate, KG. CL-3 uses counting, walks above average. CL-4 uses counting, homers above average. CL-5 uses counting, errors above average. CL-6 uses counting, double plays above average. CL-7 uses a rate, winning percentage. This is completely flummoxing. Why does a pitching staff’s K ability get expressed as a rate, while their control ability gets expressed as a count? If a full season is played, these things should even out, but what about strike seasons? What if you try to use Win Shares in the middle of a season? This part won’t work. It takes a full season for the variances from expectation for walks, double plays, etc. to get to the same proportion used in the formulas as the variances for the rates. KG is a number between 0 and 27 whether it is April 1 or the last day of the season. But how many errors above average can you possibly be after ten games?

I can’t express just how bizarre I think this is. Win Shares will not work if you use them in the middle of a season, because these formulas will not work. They will be comparing apples and oranges.
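
To make the rate-versus-count complaint concrete, here is a small sketch using a made-up team. The assumptions are entirely hypothetical: the team strikes out 6.408 per nine innings all year and allows 10% fewer W+HB per inning than a league rate of .3782. The CL-2 value is the same after 90 innings as after 1455, while the CL-3 value has barely moved off 200 at the 90-inning mark.

```python
LG_W_HB_PER_IP = 0.3782   # league rate quoted in the text
TEAM_KG = 6.408           # hypothetical constant strikeouts per nine innings
TEAM_WALK_FACTOR = 0.90   # hypothetical: team allows 10% fewer W+HB than league


def cl2_at(ip):
    """CL-2 depends only on the strikeout rate, so innings cancel out."""
    strikeouts = TEAM_KG * ip / 9
    kg = strikeouts * 9 / ip
    return (kg + 2.5) / 7 * 200


def cl3_at(ip):
    """CL-3 depends on a raw count, so it grows with innings pitched."""
    team_w_hb = TEAM_WALK_FACTOR * LG_W_HB_PER_IP * ip
    return LG_W_HB_PER_IP * ip - team_w_hb + 200


for innings in (90, 720, 1455):
    print(innings, round(cl2_at(innings), 2), round(cl3_at(innings), 2))
```

Under these assumptions the same underlying walk skill is worth about 3 points over the 200 baseline after ten games and about 55 over a full season, while the strikeout skill is worth the same amount throughout.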

There are other questions to be asked too. For one thing, strikeouts are not compared to the league average. This makes absolutely perfect sense--after all, if everybody in the league has a KG of 9, that is four fewer outs in the field than a league where everybody has a KG of 5, even if no teams deviate from the average. So I understand this step. But why aren’t walks treated the same way? After all, if there is a league like the late 40s AL with a billion walks, won’t everybody in that league allow a lot of runs due to walks, which are not controlled at all by the fielders? If one league has an average of 400 walks/team, and a second league has an average of 500 walks/team, the second league’s pitchers are all allowing a lot of runs without involving the fielders at all. And the same argument goes for home runs. In a “three true outcomes” game, in which every play is a homer, walk, or strikeout, you don’t need fielders. You can do the old Satchel Paige legend and have them sit on the mound.

This would cause problems in the Win Shares system, though, because one can argue against that by saying that we cannot know whether a certain walk rate is good or not unless we evaluate it against the context (read league average). 4 walks/game is a solid performance in the 1949 AL where the average is 4.5, but atrocious in the 1880 NL where the average is 1.1. But is this not true for Ks as well? Walter Johnson was a great strikeout pitcher in his day, but his strikeout rates look like Nate Cornejo’s compared to Nolan Ryan’s. Part of the problem, then, is that the pitching/fielding split is kept constant over time. If you had a league with a very low strikeout and walk rate, you would want to increase the fielding share, but still credit those pitchers that excelled in strikeouts and walks.

My point is that there are two aspects to the K/W/HR rates. One is obvious: how does it compare to the other pitchers in the league? The second is more subtle: what do the absolute rates say about the importance of pitching in this league? The W and HR claim formulas address the first question; the K formula addresses the second. Ideally, both questions would be answered. The first question might change the percentage of team defense attributed to pitching; the second would address the percentage that is assumed to be the case for an average team. In the "Three True Outcomes" game, fielding is zero. In a league in which there is a mix of the three true outcomes and balls in play, but all the pitchers allow them at the exact same rate, it makes no difference which pitchers you have, and so pitching must be zero.

James says that even with the use of .52/1.52 instead of .5/1.5, pitchers seem to rate too low. And perusing Win Shares lists, one tends to agree with him. According to Win Shares, as best as I can tell, the last pitcher ranked as the top player in the league was Steve Carlton in 1972. In the most recent year published in the book, 2001, the majors’ top rated pitcher was Randy Johnson with 25.70 Win Shares. The top five position players in each league easily exceeded this. The AL WS leader was Jason Giambi at 35.81, but fifth place Jim Thome came in at 29.44. The NL WS leader was Barry Bonds at 52.22, but Gary Sheffield came in fifth at 28.32.

One reason for the low ranking of pitchers could be a flaw in other sabermetric methods. For example, when we use Run Average or ERA together with a baseline to rank pitchers, we credit all of the marginal impact to the pitcher. But of course we also recognize that the defense has played a part in preventing those runs. So we may well be overrating the pitcher’s impact. While that is likely a factor, I think there is a much more basic reason for the relatively low rankings of pitchers: the fact that the pitchers’ share is determined at the team level, and not the individual level.

Win Shares holds that excellence in getting strikeouts and preventing walks and homers shows that more credit should be given to the pitchers. But of course the K, W, and HR skills vary wildly among individuals on a staff. When the Diamondbacks’ 2001 Win Shares were split up, Randy Johnson and Curt Schilling both were excellent in those areas, which helped bolster the team’s pitching share. But soft-tossers like Mike Morgan and Greg Swindell were on that team too, bringing it down. The pitching share is obviously different depending on who is on the mound. And different from era to era.

But Win Shares, by apportioning the Win Shares to the pitching staff and then to individuals, uses the total team performance, and therefore cannot properly credit Johnson and Schilling. I would suggest that it could do more to properly credit them by emphasizing the three true outcomes in the pitcher claim point process which distributes the team pitching WS to the individual pitchers. But as we will see, the criteria there are R, ER, IP, W, L, SV, and HLD; essentially, nothing that will allow Johnson to gain tons of points for not relying on his defense (outside of the positive effect excellence in the three true outcomes has on ERA, Wins, etc.). What I am suggesting are additional bonuses for doing things that increase the team’s pitchers’ share. I am not suggesting that this approach would be ideal, because I think the best thing to do would be to allocate the percentage differently for each pitcher. However, the “bonuses” might be the easiest way for these factors to be incorporated into the Win Shares framework.

One final little thing is that the pitching/fielding split excludes pitchers from receiving fielding win shares. There is a good case to be made, I think, for doing this, given that pitchers’ runs allowed totals include whatever defensive impact they had. However, I am not sure how this would impact comparisons of the pitching/fielding breakdown to other estimates of the breakdown done by other people who may have included pitchers’ fielding in the total share for fielding (of course, the fielding impact of pitchers is very small compared to other positions, at least as far as I know).

Component ERA in Win Shares
Component ERA (ERC) is used for a very small portion of the Win Shares method of distributing pitching win shares to individuals, but the formula is complex and so I put it in a separate section. ERC is an estimate of what a pitcher’s ERA should be given his component stats (IP, H, W, HR, etc.). It is based on the Runs Created model of runs = A*B/C where A = baserunners, B = advancement, and C = PA.
A = H + W + HB
B = ((H - HR)*1.255 + HR*4)*.89 + (W + HB - IW)*.56
C = BF
You then take A*B/C to estimate runs allowed. Runs allowed are divided by innings and multiplied by 9 to estimate run average. Then, if the RA estimate is >=2.24, subtract .56 to get ERC. If it is less than 2.24, multiply by .75 to get ERC.

However, in the Win Shares methodology, Bill just adds back the earned run adjustment, so we can ignore it. Therefore, I will simplify the formula and call it RAC (for Component RA). RAC = A*B*9/(BF*IP)

I will run through this for Mike Stanton, who on the basis of recording a save or hold will get credit for “save equivalent innings” and therefore will have RAC play into our evaluation of his performance. Stanton pitched 52 innings, allowing 51 hits, 29 walks (7 intentional), zero hit batters, and 4 homers while facing 236 batters. Therefore:
A = 51 + 29 + 0 = 80
B = ((51 - 4)*1.255 + 4*4)*.89 + (29 + 0 - 7)*.56 = 79.06
His RAC is 80*79.06*9/(236*52) = 4.64, significantly better than the 6.06 RA he actually allowed.
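
The same calculation as a short sketch, again with illustrative function names. It includes the RA-to-ERC conversion described earlier for completeness, even though Win Shares effectively works with RAC. The two branches of the conversion meet at 2.24, where both give 1.68.

```python
def rac(h, w, hb, iw, hr, bf, ip):
    """Component run average: Runs Created-style estimate of runs per nine innings."""
    a = h + w + hb                                              # baserunners
    b = ((h - hr) * 1.255 + hr * 4) * 0.89 + (w + hb - iw) * 0.56  # advancement
    return a * b * 9 / (bf * ip)


def erc_from_ra(ra_estimate):
    """Convert the run average estimate to Component ERA."""
    if ra_estimate >= 2.24:
        return ra_estimate - 0.56
    return ra_estimate * 0.75


# Mike Stanton's line as quoted in the text
stanton_rac = rac(h=51, w=29, hb=0, iw=7, hr=4, bf=236, ip=52)
stanton_erc = erc_from_ra(stanton_rac)

print(round(stanton_rac, 2), round(stanton_erc, 2))
```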

My Take: The ERC formula is pretty straightforward assuming use of Runs Created. Of course, I would prefer to see a more accurate run estimator used as the basis, but that’s not happening. 1.255 is an approximation of the average number of TB per non-HR hit. I am not exactly sure why Bill multiplies estimated TB by .89 and walks by .56, while in the regular RC formula TB are weighted around 1 and walks around .26.

The subtraction of .56 is also a little confusing, since I believe research shows that assuming a multiplier all the way is more accurate (approximately 90% of runs are earned). If anything, one would expect the ERA and RA for very good pitchers to be closer (linearly) than those for bad pitchers, so multiplying by 75% for low RAs would make it worse. Of course, this is of no consequence in Win Shares because the unearned part is added back in.

2 comments:

  1. Lots of good stuff in this post, Patriot. FWIW, Charlie Saeger and I did a lot of work on the fielding/pitching split and posted it in a series of articles:

    A first look (similar to this post)
    Charlie's take
    My own suggested approach
    A comparison of the three
    A discussion of the three

    I personally like mine the best, of course, but I didn't implement it at THT because I didn't get enough folks agreeing (or not agreeing). Also, I'm not sure how much we should change the basic Win Shares methodology at THT (and still call it Win Shares). I know David Smith had at least one similar conversation at Fanhome about it.

    Both Charlie and I looked at the issue of not splitting the difference between pitching and fielding at the individual pitcher level, and we found something weird: it didn't make any difference. Win Shares came out basically the same. Now maybe we both did it wrong, and I agree that it absolutely makes sense to do it even if it makes no difference (just to be systematically logical). But, surprisingly, it didn't impact the actual results.

    Also, we publish in-season Win Shares by taking all the key "fixed" numbers and prorating them, based on the number of games played.

  2. "Both Charlie and I looked at the issue of not splitting the difference between pitching and fielding at the individual pitcher level, and we found something weird: it didn't make any difference."

    This would seem to me to be evidence that Bill's splitting system doesn't really work all that well. Intuitively, one would expect a fairly large difference between extreme pitchers like a Ryan or on the other side, a Tewksbury or a Nate Cornejo, and an average team.

    Prorating the numbers for in-season makes a lot of sense, of course. It just puzzles me that some standards are based on percentages and averages and others on raw totals. And what does that mean for shortened seasons like 1994? It is just one of the things in the methodology I find really puzzling and is not at all explained. And if all of us [online sabermetric community] can come up with these questions/fixes, I wonder why Bill never addressed them or why his peer reviewers didn't raise them.

