Today in Columbus, it is 55 degrees and sunny. This is a January day you wish you could defer until April when there is a home game to go to and it is in the 30s and drizzling. But anyway, that's not why today is a beautiful day. I was out doing some errands, and stopped into the bookstore. And to my surprise, there was the 2006 Sporting News Baseball preview magazine.
One of the sure signs that baseball is coming for me has always been the appearance of these magazines. I'm not sure why I am so drawn to them, because I usually find some of the "analysis" downright idiotic, like the one a couple years ago that told me Garret Anderson was a superstar and the Yankees, who led the AL in walks that year, were a bunch of undisciplined free-swingers. But I always buy all of them, and skim all of them, and read a few of them. Unfortunately, their ranks have dwindled over my time as a fan: "Baseball Illustrated", "Spring Training", and most lamentably, the excellent "Bill Mazeroski's" have all vanished. But there is still TSN, and Street and Smith's, and Athlon Sports, and Lindy's, and the SI baseball preview edition which usually comes out the week the season starts. The only one I will not buy is the insipid MLB-produced advertisement appropriately titled "MLB".
Anyway, these usually tend to make their appearance around the middle of February when spring training starts, at least if my memory serves. But if TSN wants to brighten today, I won't argue with them.
Saturday, January 28, 2006
Well, the gigantic Indians deal has finally gone down. I’ve been waiting to comment on it until it happened, so now I will. Quite frankly, I think it’s a brilliant move.
First, Josh Bard for Kelly Shoppach is a winning move as far as I can tell. Bard is 28 years old and a career 238/289/370 hitter in about 530 PA. Shoppach is 26 and hit 233/320/461 and 253/352/507 in AAA the last two years. I think that at worst, Shoppach will match Bard, and I think he has the potential to be a fairly decent bat for a backup catcher. He may start in Buffalo, but Einar Diaz or Tim Laker wouldn’t be a big dropoff from Bard either; I think he’s pretty expendable.
David Riske for Guillermo Mota is not quite the slam dunk; Riske has always been one of my favorites, and I am sad to see him go. I have the unfortunate suspicion that Mota is seen by the Tribe as a potential closer based on his performance in Florida in the past, since Wickman is about as stable as a three-legged table. Riske had a 3.30 eRA last year, but also a 4.98 GRA and 4.10 G-F. Mota was at 5.52, 4.24, 4.07 and is 33 whereas Riske is 30. On the other hand, Mota was sensational in 03 and solid in 04, while Riske has always been solid. I’d rather have Riske in the end.
Then you also give up Arthur Rhodes, to get Jason Michaels from Philadelphia. Rhodes is oft-injured, but still has put up some impressive performances over the years. I love the Rhodes for Michaels swap, but I do think Rhodes is a significant loss for the bullpen. Looking at the bullpen, you have Wickman, Sauerbeck, Mota, Betancourt, Cabrera, perhaps Tadano, perhaps Miller, perhaps a real surprise like Mujica, or a non-roster invitee like Karsay or (heaven help us) Danny Graves. So the bullpen certainly looks a lot shakier than it did last year.
Coco Crisp is a nice player--he was +7 runs created compared to an average LFer in 2005, and +2 in CF in 2004. But at age 27 this year, I doubt that he will improve significantly from where he is now. And he is certainly a much more valuable commodity in CF than in a corner, and apparently centerfield has been locked down in Cleveland for some time to come. Now will Jason Michaels match Crisp’s performance from last year? I doubt it, although Todd Hollandsworth and Michaels could actually be a pretty nifty platoon. Michaels has hit 308/420/439 in 255 PA v. lefties the last two years while hitting 278/356/401 in 419 v. righties. Hollandsworth, in 429 PA v. righties, has hit 262/324/427. Michaels should probably get the majority of the PAs, with Hollandsworth spelling him some against righties.
Andy Marte of course will make or break the deal. I think that a 22-year-old who hit 275/372/506 at AAA last year is pretty exciting. I know he struggled greatly in the majors, but it was around 70 PA in his first go-around. If Marte could push Aaron Boone out of the lineup by the All-Star break, then that would be great.
I think this trade is probably a step back for 2006, but I still think the Indians are good enough to win the Central. And I think it has the potential to be a huge boon for 2007 and beyond to have Marte in the fold. And if the Indians could somehow pull off the fanciful scenario of signing Jeff Weaver, then flipping Jake Westbrook for Austin Kearns…
One IMO silly concern about this trade I have heard is that John Schuerholz is a great judge of talent, and he traded Marte, so perhaps that means that something is wrong with Marte. First, I acknowledge that Schuerholz is a very smart guy, but nobody’s perfect. He traded David Cone for Ed Hearn. He traded away Jason Schmidt in the Denny Neagle trade. Edgar Renteria, while he has a big contract, is still a guy who a lot of teams would love to have playing shortstop for them. In order to get premium talent, you have to give up premium talent.
Secondly, Mark Shapiro by the same logic deserves some credit for the guys he has acquired in trades, like Travis Hafner, Cliff Lee, Grady Sizemore, and…Coco Crisp.
For a little black humor, I wish I could take credit for this, but somebody on BTF (there are so many Crisp trade threads on there that I cannot remember which it was) pointed out that Baseball-Reference lists Mr. Michaels, who had an incident with a police officer last year, as most comparable to Len Koenecke, a 1930s outfielder who met an early demise when he went psycho on an airplane and the pilot bludgeoned him to death with the fire extinguisher. Interesting.
Saturday, January 21, 2006
Recently I have started reading the 2006 Hardball Times Baseball Annual. I will do a book review at some time in the future but for now it will suffice to say that you should probably get this book. Anyway, for now I just have some comments on a technical issue that was brought up by reading Dan Fox’s article “Are You Feeling Lucky?” Mr. Fox also has an excellent blog, Dan Agonistes (linked on side of page) in addition to his writing for the Hardball Times.
Anyway, the article examines teams’ runs scored and allowed versus their BsR estimates, and runs scored and allowed versus W% by using Pythagenpat. There is a typo in the Pythpat formula--they have it as RPG^2.85, when it should be RPG^.285. But obviously the formula was applied correctly in the article, and it’s just a production mistake. There is also an error in the Indians’ and Mariners’ runs allowed that leads to a faulty conclusion about who “should have” won the AL Central (which this Indians fan just happened to notice). The Indians were not actually “lucky”--in fact, Bill James’ analysis in his Handbook, based on RC and RC Allowed, shows that the Indians were the best team in baseball, and easily the “unluckiest” or “least efficient”. Anyway, Dan told me that he will have an updated version of the article on their website, and will fix that minor snafu.
The main point here is not to criticize the article, because it’s a fine article, but to mention that there is a simple way to use the Pythagenpat relationship to estimate Runs Per Win. What Fox does is take, say, a seven game margin above Pythpat expectation, and multiply this by a RPW factor to give an equivalent number of runs. This is not technically precise, since RPW is a linear concept and Pythpat is not, but of course the linear approximation works very well and so this does not really present a problem in the analysis. Fox uses Palmer’s RPW = 10*sqrt(RPG/9). This formula is fine, but I would just like to point out there is a similar formula that comes directly from Pythpat. David Smyth, in the past, has published a formula that gives the RPW for any team, from Pyth:
RPW = 2*(R-RA)*(R^x + RA^x)/(R^x - RA^x)
Where R and RA are per game, and x is the exponent. You can check and verify that this formula works. However, at R = RA, it is undefined because the denominator will be zero. And this is a shame, because it is the point where R = RA that we would want to examine in order to conclude that in a context with an RPG of X, RPW is Y.
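Here is a minimal Python sketch of Smyth's formula. It assumes the Pythagenpat exponent x = RPG^.28 (the .28 is just one plausible choice; any fixed x behaves the same way), and the function names and the 5.0/4.2 run figures are made up for illustration. It checks that run differential per game divided by RPW reproduces the Pythagorean wins above .500 per game, and that the formula breaks down at R = RA:

```python
def pyth_exponent(r, ra, z=0.28):
    """Pythagenpat exponent x = RPG^z; z = 0.28 is an assumed value."""
    return (r + ra) ** z


def pyth_wpct(r, ra, z=0.28):
    """Pythagorean winning percentage from per-game runs scored and allowed."""
    x = pyth_exponent(r, ra, z)
    return r**x / (r**x + ra**x)


def smyth_rpw(r, ra, z=0.28):
    """Smyth's runs per win; fails when r == ra (zero denominator)."""
    x = pyth_exponent(r, ra, z)
    return 2 * (r - ra) * (r**x + ra**x) / (r**x - ra**x)


r, ra = 5.0, 4.2  # made-up per-game figures
rpw = smyth_rpw(r, ra)
# Run differential per game divided by RPW equals W% minus .500
print(round((r - ra) / rpw, 4), round(pyth_wpct(r, ra) - 0.5, 4))

try:
    smyth_rpw(4.5, 4.5)
except ZeroDivisionError:
    print("undefined at R = RA")
```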
However, if we differentiate PW% with respect to RD, we will find a formula that gives the correct result at the R = RA point. This formula is:
RPW = (2*RPG*(RR^x + 1)^2*(.5 - RD/(2*RPG))^2)/(x*RR^(x-1))
That’s confusing as heck, but remember, we want to evaluate it at R = RA. So RR = 1 and RD = 0. One raised to any power is one, so we can simplify to:
RPW = (2*RPG*(1 + 1)^2*(.5 - 0/(2*RPG))^2)/x = (2*RPG*4*.25)/x = 2*RPG/x
And what is x? We’ve set x equal to RPG to some power. Various people use different values--I originally published it as .29, David Smyth originally published it as .287, Davenport and Fox used .285, Tango Tiger found that .28 would probably provide the best combination of accuracy with extreme and regular teams. I’ll continue using x here just so that it is applicable to any of these choices.
Since x = RPG^z, we have this equation for RPW:
RPW = (2*RPG)/RPG^z
And this can be rewritten as:
RPW = 2*RPG^(1 - z)
So this is somewhere around 2*RPG^.72. So at 9.18 RPG, the 2005 average value, Palmer gives 10*sqrt(9.18/9) = 10.10 and Pythpat gives 2*9.18^.72 = 9.87. In case you are curious how these work with some real teams, with 1984-2003 teams, Palmer’s formula gives a RMSE of 3.938 and the one presented here gives 3.895. So you do not have to sacrifice accuracy with the run-of-the-mill teams.
The known point discovered by Smyth, that at RPG = 1, x must equal 1, also by definition states that RPW must equal 2 when RPG = 1. If you have a team that scores 100 runs and allows 62 runs, they will go 100-62. Their RD is 38, and 38/2 = 19. 19 is your estimate of wins above .500, and .500 is 81 wins, so 81+19 = 100. So the RPW must be two when the RPG is one. The Pythpat-based formula of course returns this result. The Palmer RPW gives 3.33.
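The comparison above is easy to reproduce. This is a small Python sketch, with hypothetical function names and z = .28 assumed for the Pythpat exponent, that evaluates both RPW formulas at the 2005 average of 9.18 RPG and at the known point RPG = 1:

```python
import math


def palmer_rpw(rpg):
    """Palmer's runs per win: 10 * sqrt(RPG / 9)."""
    return 10 * math.sqrt(rpg / 9)


def pythpat_rpw(rpg, z=0.28):
    """Runs per win implied by Pythagenpat at R = RA: 2 * RPG^(1 - z)."""
    return 2 * rpg ** (1 - z)


for rpg in (9.18, 1.0):
    print(rpg, round(palmer_rpw(rpg), 2), round(pythpat_rpw(rpg), 2))
```

Swapping in .285, .287, or .29 for z only moves the Pythpat figure slightly, and the RPG = 1 result stays at exactly two for any z.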
As a final note, one thing I cannot quite figure out from Fox’s article is whether he is using Pythpat and Palmer to find an overall value for the league, and then using that value for each team, or whether he is using the specific value for each team. The second approach would again be more precise, but the first is an alright assumption for simplicity’s sake.
Monday, January 16, 2006
It is hard to muster a lot of enthusiasm for looking at run estimators other than Linear Weights or Base Runs, unless somebody manages to improve the score rate estimator for BsR someday, or somebody comes up with an alternative model of the scoring process that works at extremes as well as everyday teams.
The method I am looking at here is not one of them [LW or BsR]. It is a faulty run estimator, with those flaws often being obvious. However, if we set James’ Runs Created as the minimum standard a run estimator must meet in order to be taken seriously, this one does that. I believe that it may in fact be superior to Runs Created, but of course inferior to LW and BsR. It probably also falls below Eric Van’s Contextual Runs (Contextual Runs is sort of a poor man’s BsR. It takes out homers, but the score rate estimator does not hold up nearly as well at theoretical extremes as BsR’s does. This is probably partially because it is based on an advancement to out ratio instead of advancement as a percentage of advancement plus outs as is done by David Smyth).
Anyway, here I will take another look at what I call Appraised Runs (all of the good names are already taken after all), which I developed a few years ago as an adaptation of Mike Gimbel’s Run Production Average. RPA started with a set of linear weights (which Gimbel never fully explained where they came from, except that they represent “run-driving” value) which clearly did not measure the same thing as traditional LW as they severely underrated on base events and overrated homers. He then adjusted these by a “set up” rating which adjusted for the value of the event in setting up further scoring opportunities for the team. Homers actually reduced the set up rating because they take runners off base.
The set up rating was then divided by the league average, and that was used to scale 50% of the run-driving value. That plus 50% of the run-driving total was the estimated number of runs scored. This step still does not make sense to me because the team’s runs scored should not depend on the league rate, nor should the number of runners they have on base after an event.
Gimbel also included a number of categories that aren’t widely available, such as balks and wild pitches (for offenses), and so it was difficult to test his work and apply it with limited data. So I came up with my own method that started with his basic values and then applied a set up rating that was not dependent on the league average.
My starting point was Gimbel’s run-driving values, scaled to equal total runs scored for the dataset I developed the formula on:
RD = .289S + .408D + .697T + 1.433HR + .164W
Then the set up figure, which I abbreviated UP, was found through regression and some trial and error, trying to get realistic intrinsic linear weight values:
UP = (5.7S + 8.6(D+T) + 1.44HR + 5W)/(AB+W) - .821
Then AR = UP*RD*.5 + RD*.5
Immediately you can see some of the flaws in this equation. A team that gets a very small number of runners on base will get a negative UP and negative runs. A team that hits 500 HRs will have a RD of 716.5, an UP of .619, and an AR of 580, when we know they will score 500 runs. It may be better at the extremes than RC, but it is not correct at the extremes and is not nearly as good as BsR. Please note, again, that I am not claiming this method should remain in use in sabermetrics. I am claiming that it is probably as good as or better than Runs Created. If you read Gimbel’s book in 1993 and did not yet know about BsR, it may well have been the best dynamic run estimator at that time.
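For anyone who wants to poke at those flaws, here is a minimal Python sketch of the basic (non-SB) AR equations given above. The function name and the second, low-on-base team are hypothetical; only the coefficients and the 500-homer example come from the formulas in this post.

```python
def appraised_runs(s, d, t, hr, w, ab):
    """Basic (no SB/CS) Appraised Runs; returns (RD, UP, AR)."""
    rd = 0.289 * s + 0.408 * d + 0.697 * t + 1.433 * hr + 0.164 * w
    up = (5.7 * s + 8.6 * (d + t) + 1.44 * hr + 5 * w) / (ab + w) - 0.821
    ar = up * rd * 0.5 + rd * 0.5
    return rd, up, ar


# The all-homer team from the text: 500 AB, 500 HR, nothing else
rd, up, ar = appraised_runs(s=0, d=0, t=0, hr=500, w=0, ab=500)
print(round(rd, 1), round(up, 3), round(ar))

# A hypothetical team with almost nobody reaching base gets UP < 0
print(appraised_runs(s=100, d=0, t=0, hr=0, w=0, ab=5000)[1] < 0)
```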
I also came up with a version that included SB and CS:
RD = .288S + .407D + .694T + 1.428HR + .164W + .099SB - .164CS
UP = (5.7S + 8.6(D+T) + 1.44HR + 5W + 1.5SB - 3CS)/(AB+W) - .818
AR = UP*RD*.5 + RD*.5
Anyway, if you do an accuracy test, using the SB version, on all teams in the 1980s excepting 1981 (the same sample I used in the post about Mann’s RPA), the RMSE of AR is 23.09, compared with 23.64 for ERP, 22.85 for BsR, and 25.15 for RC. So it is just as accurate as any of the other methods with normal teams. If you use just the RD portion, you get a RMSE of 31.06. Obviously it is flawed, not only from under-weighting the value of on base events but also not considering outs at all. This construction necessitates the use of the UP factor or some other adjustment.
There is more than a similarity of names between the Gimbel and Mann RPAs. Both start with coefficients that do not capture the importance of avoiding outs and do not properly credit getting on base, then apply adjustments in order to make the formula useful. Gimbel’s adjustments seem to be much more thought out and reasonable than Mann’s. I might be very wrong, but I get the impression that Gimbel developed his system knowing that he would need to apply adjustments and actually wanting that to be the way his estimator worked, whereas it seems that Mann kept adding stuff in order to give his estimator some semblance of accuracy.
Of course, I like to differentiate everything so that we can see the intrinsic linear weights. In order to do this, we need to differentiate UP and RD. So the derivative of UP with respect to an event will be dUP, and for RD we’ll call it dRD (which will simply be the coefficient of the event in the RD equation). We can write UP as U/P minus a constant (.821 in the basic version, .818 in the SB version), where U is the numerator and P is the denominator. We can ignore the constant, since the derivative of a constant is zero. We will call the coefficient of any event in the UP numerator u and the coefficient of any event in the UP denominator p. Then dUP = (P*u - U*p)/P^2. Knowing that value, the formula for the AR intrinsic LW is simple:
dAR = .5*(UP*dRD + RD*dUP) + .5*dRD
For example, using the non-SB version of AR, the 1979 Red Sox had an RD of 809.529, an UP of 1.17269, with U = 11663.06 and P = 5850. So the dUP for a walk was:
dUP = (5850*5 - 11663.06*1)/5850^2 = 5.139*10^(-4). We can plug into the dAR formula now:
dAR(wrt W) = .5*(1.17269*.164 + 809.529*5.139*10^(-4)) + .5*.164 = .3862
So the intrinsic LW value of a walk for the 1979 BoSox, according to AR, is .3862 runs. We can likewise calculate the intrinsic LWs for the other offensive events.
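The walk calculation can be checked with a few lines of Python. This is only a sketch: the helper names are hypothetical, and the inputs are the 1979 Red Sox aggregates quoted above rather than the full stat line.

```python
def d_up(u, p, U, P):
    """Quotient-rule derivative of U/P for an event adding u to U and p to P."""
    return (P * u - U * p) / P**2


def d_ar(up, rd, d_rd, d_up_value):
    """Intrinsic linear weight: dAR = .5*(UP*dRD + RD*dUP) + .5*dRD."""
    return 0.5 * (up * d_rd + rd * d_up_value) + 0.5 * d_rd


# 1979 Red Sox figures quoted in the text (non-SB version)
RD, U, P = 809.529, 11663.06, 5850
UP = U / P - 0.821

walk_dup = d_up(u=5, p=1, U=U, P=P)
walk_lw = d_ar(UP, RD, d_rd=0.164, d_up_value=walk_dup)
print(round(UP, 5), round(walk_lw, 4))
```

The same two helpers cover every other event: plug in that event's RD coefficient for d_rd, its UP numerator coefficient for u, and p = 1 for anything that adds an AB or a walk to the denominator.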
I have calculated the LW for each of the three methods based on the 1980s sample, and gotten these results (displayed as S, D, T, HR, W, SB, CS, O):
RC: .58, .89, 1.21, 1.52, .35, .16, -.39, -.122
AR: .51, .80, 1.09, 1.41, .35, .08, -.44, -.105
BsR: .47, .77, 1.08, 1.45, .33, .23, -.32, -.094
The AR weights are a better match for the BsR weights than are the RC weights, although it seems that I have not properly factored in the value of the stolen base in the AR SB version. Maybe SB and CS should be given more weight in the UP factor.
If you apply the linear versions in order to do an accuracy test, the RMSE for RC is 43.65 (this is not really a reflection on RC, it is that I used the technical version without using all of the categories, so you should probably take the intrinsic LW for RC with a grain of salt too), 31.27 for AR, and 23.02 for BsR (interestingly, the BsR linear weight error is only .17 higher than the error for the multiplicative BsR equation).
I tried the derivative formulas on an extreme player, one with 500 AB, 180 H, 40 D, 60 HR, 150 W, and no SB or CS. That is a batting line of 360/508/800, which of course is ridiculously awesome. The point was to see how the intrinsic LW from AR would match up with those from RC or BsR. I used the basic versions of AR and BsR but the technical version of RC, just so that RC would include walks in the B factor. This may cause other problems, since it would be unrealistic to have no DP, SF, IW, etc. for a player like this, but I think the walks not being included at all in advancement problem is bigger. The estimated runs created for the player by each method are 192.4 for RC, 202.3 for AR, and 191.2 for BsR. Here are the resulting intrinsic weights from the three equations:
RC: .84, 1.35, 1.86, 2.36, .46, -.34
AR: .76, 1.25, 1.64, 1.81, .51, -.29
BsR: .66, 1.01, 1.36, 1.52, .49, -.21
As you can see, the AR weights are a better match for the BsR weights than the RC weights, but the RC and BsR run estimates are closer to each other than either is to the AR estimate. AR performs better largely because, while it does not treat the HR correctly by counting it as an automatic run as BsR does, it does recognize that homers reduce the number of runners on base for the following batters (this is what the UP factor in its original form was supposed to capture--the replacement for it that I have used here is empirical and not theoretical as Gimbel’s was), and therefore does not allow the value of the HR to keep compounding as RC does.
Although I have not tested it, I think that it is possible that AR shares with BsR one flaw that occurs at particularly high OBAs that you could call the “triple jump” flaw--the value of a triple will jump the value of a HR at some point. This is obviously incorrect, but the distortion that comes from having the triple valued slightly higher than the home run in BsR is of a much smaller magnitude than the RC flaw of allowing a HR to continue to grow in value, up to a maximum of four runs a pop. It is unfortunate that we have this imperfection in BsR, and hopefully future innovation will allow us to eliminate it.
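The AR row for the extreme player can be reproduced from the derivative formula. The Python sketch below is illustrative only: the names are hypothetical, and it assumes the six listed weights are in the order S, D, T, HR, W, O, with a batting out adding one AB to the UP denominator and nothing else.

```python
# Coefficients of each event in RD, in the UP numerator (u), and in the
# UP denominator AB + W (p), for the non-SB version of Appraised Runs.
EVENTS = {
    "S":  (0.289, 5.7, 1),
    "D":  (0.408, 8.6, 1),
    "T":  (0.697, 8.6, 1),
    "HR": (1.433, 1.44, 1),
    "W":  (0.164, 5.0, 1),
    "O":  (0.0, 0.0, 1),  # assumed: an out adds an AB and nothing else
}


def ar_intrinsic_weights(s, d, t, hr, w, ab):
    """Return {event: dAR} using dAR = .5*(UP*dRD + RD*dUP) + .5*dRD."""
    counts = {"S": s, "D": d, "T": t, "HR": hr, "W": w}
    RD = sum(EVENTS[e][0] * n for e, n in counts.items())
    U = sum(EVENTS[e][1] * n for e, n in counts.items())
    P = ab + w
    UP = U / P - 0.821
    weights = {}
    for event, (d_rd, u, p) in EVENTS.items():
        d_up = (P * u - U * p) / P**2
        weights[event] = 0.5 * (UP * d_rd + RD * d_up) + 0.5 * d_rd
    return weights


# The extreme player: 500 AB, 180 H (80 S, 40 D, 0 T, 60 HR), 150 W
lw = ar_intrinsic_weights(s=80, d=40, t=0, hr=60, w=150, ab=500)
print({event: round(value, 2) for event, value in lw.items()})
```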
I am sure that one could improve the accuracy of this formula by fiddling around with the coefficients, particularly for UP. Perhaps one could develop an UP with a completely different structure that would be useable as well, or change the percentage of RD that is scaled by UP, etc. I’m not sure what the point would be though. This type of run estimator is best accepted for what it is: a clever but ultimately not very useful approximation of the scoring process that displays ingenuity on the part of its inventor and may have once been state of the art, but is no longer. As far as I’m concerned, this is exactly the same category that Runs Created falls into. It, however, shows no signs of being consigned to the dustbin any time soon.
Wednesday, January 11, 2006
Be forewarned, this post is all about me and has no baseball content to speak of.
For one thing, the new November, 2005 edition of By the Numbers is out, and it contains an article by yours truly entitled “Finding Implicit Linear Weights in Run Estimators”. It does not contain anything that I have not already posted on FanHome or on my website, it simply exposes some of that stuff to a wider audience. As far as I know I was the first to apply partial derivatives to run estimators (although Kevin Harlow and others now have and Ralph Caola and Ben Vollmayr-Lee apparently did so on win estimators), so I figured I would publish a piece on it before somebody else did. And I’m sure that some people have not heard about it yet. It is also another little bit of exposure for Base Runs, although that usually is a dead end.
Over the past month or so there has been a (relatively) high volume of posts on this blog. If you look at the archives, you will see that this has been an unusually high volume of posts. Do not expect this blog to always have so much new material. The last month’s output was kind of a “perfect storm” in that I had three and a half weeks off of school, lots of ideas of stuff I wanted to write about that had built up over several months, and was sufficiently motivated. Often one or two of these factors will be in place, but it is rare to have a confluence of the three. Also, things like the Win Shares and rate stat series are one idea but are able to be stretched into several posts. Hopefully I will be able to come up with enough useful content to keep this updated at least every week or so.
I’m not actually sure how many people read this thing, but whoever it is, it’s appreciated. Comments are welcomed and encouraged. I do have comment screening turned on though, so I have to read the comments and approve them before they are posted. I was getting sick of going back and deleting insightful commentary like “This is a GREAT blog. Please visit my richly scented candle website.” It is easier to screen them ahead of time. Any comment that is on-topic will be approved. But there will probably be some lag between you posting a comment and the comment appearing, unless I happen to be checking it that very minute.
Monday, January 09, 2006
Distributing Fielding Win Shares to Individuals
Just as for hitting and pitching, we will assign fielding win shares to individuals by calculating claim points, and giving them the same percentage of the position’s win shares as they have of the position’s claim points. Each position, with the exception of second and short, has a different claim point formula. I will figure the claim points for the Braves’ starter at each position, and their win shares.
At catcher, the formula for claim points is:
cCP = PO + 2*(A - CS) - 8*E + 6*DP - 4*PB - 2*SB + 4*CS + 2*RS
where RS = (TmERA - CERA)*INN/9
RS is runs saved, based on Catcher’s ERA. I do not have Catcher’s ERA for 1993, so I have just set all of the catcher’s RS equal to zero. The Braves’ primary catcher Damon Berryhill recorded 570 putouts, 52 assists, 6 errors, 2 double plays, 6 passed balls, 62 steals, and 28 caught stealing. His claim points are 570 + 2*(52 - 28) - 8*6 + 6*2 - 4*6 -2*62 + 4*28 = 546. Just as in the offensive and defensive portions of the process, we zero out any negative claim points. Braves’ catchers totaled 990 claim points, giving Berryhill 546/990 of the 7.528 win shares for catchers, or 4.15.
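As a rough Python sketch of the catcher calculation (function names are hypothetical, and RS defaults to zero to mirror the missing Catcher's ERA data here):

```python
def catcher_claim_points(po, a, e, dp, pb, sb, cs, rs=0.0):
    """Catcher claim points; negative totals are zeroed out."""
    cp = po + 2 * (a - cs) - 8 * e + 6 * dp - 4 * pb - 2 * sb + 4 * cs + 2 * rs
    return max(cp, 0)


def position_share(player_cp, team_cp, position_ws):
    """Player's slice of the position's win shares, proportional to claim points."""
    return player_cp / team_cp * position_ws


# Damon Berryhill, 1993 (RS set to zero, as in the text)
cp = catcher_claim_points(po=570, a=52, e=6, dp=2, pb=6, sb=62, cs=28)
print(cp, round(position_share(cp, 990, 7.528), 2))
```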
At first base, we have:
1bCP = PO + 2*A - 5*E
Sid Bream had 627 PO, 62 A, and 3 E for 627 + 2*62 - 5*3 = 736 of the team’s 1638 total claim points, giving him 736/1638 of the 2.644 shares, or 1.19.
At second base, we introduce the concept of Range Bonus Plays. RBP are credited to any player whose Range Factor ((PO+A)*9/INN) is higher than the team average at the position, and are figured as:
RBP = (RF - PosRF)*Inn/9
We only credit RBP for players whose RF exceeds the positional average; therefore, there are no negative figures. Then we have this formula for CP at 2B and SS:
2bCP = ssCP = PO + 2*A - 5*E + 2*RBP + DP
Atlanta second basemen had a range factor of 5.320. Mark Lemke recorded 329 PO, 442 A, 14 E, and 100 DP in 1299 innings. His range factor therefore was (329+442)*9/1299 = 5.341. This is higher than the team average range factor at second base, so he gets (5.341-5.320)*1299/9 = 3 RBP. Then he has 329 + 2*442 - 5*14 + 2*3 + 100 = 1249 of the 1401 claim points, giving him 9.34 of the 10.476 win shares at second.
Jeff Blauser, at shortstop, had 189 PO, 426 A, 19 E, and 86 DP in 1323 innings, for no range bonus plays. This gives him 189 + 2*426 - 5*19 + 2*0 + 86 = 1032 of the 1204 claim points, giving him 6.65 of the 7.753 win shares at short.
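The Lemke and Blauser figures can be checked with a short Python sketch. The function names are hypothetical, and rounding RBP to a whole number is an assumption made only to mirror the worked example above:

```python
def range_bonus_plays(po, a, inn, pos_rf):
    """RBP = (RF - PosRF) * INN / 9, credited only when RF exceeds PosRF."""
    rf = (po + a) * 9 / inn
    return max(rf - pos_rf, 0) * inn / 9


def middle_infield_claim_points(po, a, e, dp, rbp):
    """Claim points at 2B and SS: PO + 2*A - 5*E + 2*RBP + DP."""
    return po + 2 * a - 5 * e + 2 * rbp + dp


# Mark Lemke at second base; 5.320 is the team range factor from the text
lemke_rbp = range_bonus_plays(po=329, a=442, inn=1299, pos_rf=5.320)
lemke_cp = middle_infield_claim_points(329, 442, 14, 100, round(lemke_rbp))
print(round(lemke_rbp, 1), lemke_cp, round(lemke_cp / 1401 * 10.476, 2))

# Jeff Blauser at shortstop; the text credits him with no range bonus plays
blauser_cp = middle_infield_claim_points(189, 426, 19, 86, rbp=0)
print(blauser_cp, round(blauser_cp / 1204 * 7.753, 2))
```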
At third base, the formula is:
3bCP = PO + 2*A - 5*E + 2*RBP
Terry Pendleton had 128 PO, 319 A, and 19 E in 1392 innings for 5 RBP. He has 128 + 2*319 - 5*19 + 2*5 = 681 of the 708 third base claim points, for 7.33 of the 7.62 win shares at third base.
For outfielders, we divide the RF by 3, since there are three outfield positions. Center fielders will generally have a higher range factor than the guys in the corners, and James notes that one function of RBP is to give more of the credit in the outfield to the center fielder. Then we apply this formula:
ofCP = PO + 4*A - 5*E + 2*RBP
Otis Nixon was the Braves’ primary center fielder, recording 308 PO, 4 A, and 3 E in 998 innings for 66.4 RBP. His claim points are 308 + 4*4 - 5*3 + 2*66.4 = 442 out of 1190, for 6.71 of the 18.068 outfield win shares.
To find Fielding Win Shares, we simply add win shares credited at each position for a given player. One Brave, Bill Pecota, wound up with win shares at three positions (second, third, and outfield), although his total is just .38. The fielder with the most FWS for Atlanta was Mark Lemke with 9.34, all at second base.
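The first base, third base, and outfield formulas follow the same pattern. Here is a sketch in Python with hypothetical function names, using the RBP values stated in the text rather than recomputing them (the team range factors at those positions are not given here):

```python
def first_base_claim_points(po, a, e):
    """1B: PO + 2*A - 5*E."""
    return po + 2 * a - 5 * e


def third_base_claim_points(po, a, e, rbp):
    """3B: PO + 2*A - 5*E + 2*RBP."""
    return po + 2 * a - 5 * e + 2 * rbp


def outfield_claim_points(po, a, e, rbp):
    """OF: PO + 4*A - 5*E + 2*RBP."""
    return po + 4 * a - 5 * e + 2 * rbp


# (player, claim points, team claim points at position, position win shares)
rows = [
    ("Bream", first_base_claim_points(627, 62, 3), 1638, 2.644),
    ("Pendleton", third_base_claim_points(128, 319, 19, rbp=5), 708, 7.62),
    ("Nixon", outfield_claim_points(308, 4, 3, rbp=66.4), 1190, 18.068),
]
for name, cp, team_cp, pos_ws in rows:
    print(name, round(cp), round(cp / team_cp * pos_ws, 2))
```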
My take: I don’t really have any opinion here; the system seems sound as far as I can tell, but of course the important stuff was done when we credited some of the defense to fielding and then the fielding to each position.
Putting it All Together
Win Shares for a player are just the sum of their batting, pitching, and fielding win shares. Then a rounding process is used. The Win Shares are rounded to whole numbers which must sum to 3 times the team win total. You could also display Win Shares unrounded, but James says that the difference between, say, 30 and 31 WS is very small to begin with and to display decimal places implies more accuracy than is actually there. He would prefer to keep the property that the team total sums to 3 times team wins.
So Bill’s rounding process is to round all numbers down to integers, and sum them. Then he orders the players by the remainders, and gives one win share apiece to the players with the highest remainders until the players’ win shares sum up to the proper team number.
For example, suppose there was a team with 5 players that earned 25 win shares:
A had 10.005
B had 3.764
C had 0.963
D had 5.468
E had 4.800
Rounding down, A has 10, B has 3, C has 0, D has 5, and E has 4. That gives 22 WS, three short. Player C has the highest remainder, so he gets one WS. Then comes player E, who also gets one. Player B is third on the list, and also gets one. Now the team total is 25 and we stop with these final figures:
A has 10
B has 4
C has 1
D has 5
E has 5
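This is a largest-remainder rounding, and it is short to write down. A minimal Python sketch (the function name is hypothetical, and it assumes no ties in the remainders and fewer leftover win shares than players):

```python
import math


def round_win_shares(shares, team_total):
    """Round down, then hand out the leftover win shares by largest remainder."""
    rounded = {name: math.floor(ws) for name, ws in shares.items()}
    leftover = team_total - sum(rounded.values())
    by_remainder = sorted(shares, key=lambda n: shares[n] - rounded[n], reverse=True)
    for name in by_remainder[:leftover]:
        rounded[name] += 1
    return rounded


team = {"A": 10.005, "B": 3.764, "C": 0.963, "D": 5.468, "E": 4.800}
print(round_win_shares(team, team_total=25))
```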
My take: In my spreadsheet, I keep the fractional numbers, but it is true that small differences are not significant. However, I don’t really see any reason to further reduce the accuracy by rounding it off. Bill’s position is to display imprecision and acknowledge that it is imprecise. My position is to display precision and acknowledge that it is imprecise. Just a difference of opinion, and not in any way a flaw in the method.
We can now compare the Win Shares that I found for the 1993 Braves to the actual Win Shares. There are certain to be some differences as I did not know exactly what years Bill used to set the park factors, nor did I have clutch hitting data, nor did I use precisely the same RC formula, nor did I have holds for a couple of relievers, nor did I have catcher ERA. Outside of those things, though, I believe I had all of the data needed.
My results did not match perfectly. I had the offense at 130.1, while Bill had 130.2. He had the pitching at 129.4, I had it at 127.8. He had the fielding at 52.4, I had it at 54.1. These differences are not insignificant, so it is possible that there is an error in my spreadsheet, but I have not been able to find it. Anyway, I will list the team in order of Win Shares that I came up with and put James’ in parentheses:
Again, I am not quite sure what has caused the errors with the pitchers. Some of it is the data differences, but the offense/defense split should not have been affected (unless it was by different park factors). For hitters, I used a different RC formula since I did not have the clutch hitting data, so that is probably the largest factor contributing to the errors.
Final Thoughts on Win Shares
There will be no “My Take” section here because that’s what this whole part is. I have already listed my concerns/questions/disagreements/quibbles with the Win Shares method, and I will not rehash all of those here.
Instead, I will just state that personally, I don’t have a lot of use for the end result, Win Shares, but I will concede that there may well be useful stuff inside the process. For example, the idea of evaluating the team’s fielding and then breaking that down to individual credit may prove to be very useful. Bill claims to have had some new insights into fielding stats and I don’t doubt him; it’s just that I don’t keep up as much as I should with fielding evaluation so I’m not the right person to evaluate that and comment about it.
I do however, believe that the hitting and pitching components are not a step forward. They do not allocate absolute wins; well, they do, but they don’t do it correctly. So what are you left with? You are left with a runs above replacement (a very low replacement), with the scale nuked. I don’t find this very useful.
One aspect of Win Shares that most other systems do not incorporate is reducing the player’s rating if the team underperforms its expectations. If you create 100 runs on a team that creates 700, but only scores 650, your RC will be adjusted down. If you play on a team that, based on its R and RA, should win 55% of its games but only wins 52%, your runs will be worth less in terms of wins. These things are disliked by some people, but they are perfectly defensible in a value method. There may be some room for disagreement on whether to apply the adjustments proportionally to a player’s production, as James does, or whether to distribute them proportional to playing time. But these adjustments in theory are fine.
But you don’t have to go through the Win Shares system to apply similar adjustments. You can apply similar adjustments to an individual’s RC, or his WAR, or whatever. You could even find the team marginal runs/win (derived from the WS system), and use this as the RPW converter to convert RAR to WAR, or RAA to WAA. But there’s no need in my opinion, when evaluating hitters, to go through the WS process.
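As a rough illustration of that kind of adjustment, here is a minimal Python sketch of the proportional version, using the 100/700/650 example from above. The function name is my own, and scaling by actual team runs over team RC is only one way to apply the idea (distributing the shortfall by playing time would need plate appearance data that is not shown here).

```python
def adjust_for_team_shortfall(player_rc, team_rc, team_runs):
    """Scale a player's runs created by (actual team runs / team runs created).

    Hypothetical helper: a proportional adjustment, so a player who
    created more runs absorbs more of the team's shortfall.
    """
    return player_rc * team_runs / team_rc


# A player with 100 RC on a team that created 700 runs but scored only 650.
adjusted_rc = adjust_for_team_shortfall(player_rc=100, team_rc=700, team_runs=650)
print(round(adjusted_rc, 1))  # about 92.9
```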
For pitchers, ideally it would be nice to have a different baseline depending on their reliance on the fielders, but it should clearly be individually-based, not team-based, so the mechanism in Win Shares won’t get you there, even if you understood why it is designed the way it is.
So in short, I don’t think Win Shares is necessary for hitters, I don’t think it’s necessary for pitchers, and I think it may provide some insights into fielding evaluation, but I can’t really tell you. I don’t think it would get nearly as much attention as it does if it had not been invented by Bill James, but I also think that Bill James is clearly the biggest name in sabermetrics, so I do not begrudge him this.
What I do resent is people who accept WS because it comes from Bill James without questioning it themselves. I’m not thinking of any specific people; I have just seen, in various discussions, sentiment to the effect of “well, Bill spent a lot of time on it, it must make sense.”
There is nothing wrong with using WS in an analysis or talking about how the system could be improved, etc. as long as you recognize the strengths and weaknesses of the method and recognize how they might affect the analysis you are using them for. This of course is true to some extent for all sabermetric measures.
Tuesday, January 03, 2006
Distributing Fielding Win Shares to Positions
Win Shares takes a different approach from other fielding evaluation methods in that it first assigns a value to each position, then splits that up among the men who played that position. This allows James to use data for the team as a whole, rather than trying to estimate how many strikeouts there were when a particular player was in the field.
Each position has four criteria which are used to assign Win Shares. A “claim percentage” is derived from the sum of these four scales divided by 100. Each position has different criteria and different weightings assigned to them. I will call the four criteria N, X, Y, and Z in order to keep the quantity of abbreviations to a minimum.
At catcher, the four criteria are Caught Stealing Percentage, Error Percentage, Passed Balls, and Sacrifice Hits. The 50 point scale is CS% (meaning that this criterion will compose 50% of the rating). CS% = CS/(SB + CS) for the team as a whole. The Braves allowed 121 SB and 53 CS, for a CS% of 53/(53 + 121) = .305. N = 25 + (CS% - LgCS%)*150. The NL average was .3149, so the Braves’ N is 25 + (.305-.3149)*150 = 23.52. I should point out now that all of the scales at each position have a minimum value of 0 and a maximum value equal to the number of points the criterion is assigned (50 in this case).
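The caught stealing scale, including the floor of 0 and the cap at the points assigned, can be sketched in Python like this. The function names are mine, not James’s. Note that carrying CS% unrounded gives about 23.45; the 23.52 above comes from rounding CS% to .305 before plugging it in.

```python
def clamp(value, low, high):
    """Keep a rating between its minimum (0) and its point maximum."""
    return max(low, min(high, value))


def catcher_cs_points(sb, cs, lg_cs_pct, max_points=50):
    """N = 25 + (CS% - LgCS%)*150, limited to the 0..50 scale."""
    cs_pct = cs / (sb + cs)
    return clamp(25 + (cs_pct - lg_cs_pct) * 150, 0, max_points)


# Braves: 121 SB and 53 CS allowed; NL average CS% of .3149.
n_catcher = catcher_cs_points(sb=121, cs=53, lg_cs_pct=0.3149)
print(round(n_catcher, 2))  # about 23.45 with unrounded CS%
```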
Error Percentage for a catcher is E% = 1 - (cPO + cA - TmK)/(cPO + cA - TmK + cE). This removes the putout credited for a strikeout from the catchers’ total. The Braves catchers had 1056
The Passed Ball criterion incorporates something we will use in many fielding formulas, the Team League Putout Percentage (TLPO%). TLPO% = Tm(
Y = (LgPB*TLPO% - TmPB)/5 + 5. The Braves had 13 PB versus 199 for the league, so their Y was (199*.07-13)/5 + 5 = 5.19 on a 10 point scale.
The final criterion is based on team sacrifice hits allowed and is Z = 10 - TmSH/(LgSH*TLPO%)*5.
The claim percentage for
At first base, the criteria are Plays Made, Error %, “Arm Rating”, and errors by shortstops and third basemen. To find Plays Made, first a very complicated estimate of unassisted putouts by first basemen is made:
Est1BUnPO = (1bPO - .7*pA - .86*2bA - .78*(3bA + ssA) + .115*(RoF + SH) - .0575*BIP)*2/3 + (BIP*.1 - 1bA)*1/3
BIP = IP*3 - K, and is an estimate of Balls in Play. The Braves’ first baseman had 1423
We also need to find what is called the LHP+/-, the number of balls in play against left-handed pitchers above what you would expect from the league average. The formula is:
LHP+/- = TmBIP(lefties) - (LgBIP(lefties)/LgBIP*TmBIP)
Then N = ((Est1BUnPO + 1bA + .0285*LHP+/-) - Lg(Est1BUnPO + 1bA)*TLPO%)/5 + 20. The league first basemen had 3114 estimated unassisted putouts and 1670 assists, so the Atlanta N = (177.9 + 130 + .0285*116.4 - (3114+1670)*.07)/5 + 20 = 15.27 on a 40 point scale.
The E% at all positions other than catcher is figured as E/(
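Here is a minimal Python sketch of the LHP+/- formula and the first base Plays Made scale, taking the 177.9 estimated unassisted putouts, 130 assists, 116.4 LHP+/-, and the .07 TLPO% from the arithmetic above as given. The function and argument names are my own shorthand.

```python
def lhp_plus_minus(tm_bip_lefties, lg_bip_lefties, lg_bip, tm_bip):
    """Team BIP against lefties above what the league rate would predict."""
    return tm_bip_lefties - lg_bip_lefties / lg_bip * tm_bip


def first_base_plays_points(est_un_po, assists, lhp_pm,
                            lg_est_un_po, lg_assists, tlpo_pct,
                            max_points=40):
    """N at first base: team plays vs. league plays discounted at TLPO%."""
    team_plays = est_un_po + assists + 0.0285 * lhp_pm
    expected_plays = (lg_est_un_po + lg_assists) * tlpo_pct
    raw = (team_plays - expected_plays) / 5 + 20
    return max(0, min(max_points, raw))  # scales run from 0 to max_points


n_first_base = first_base_plays_points(
    est_un_po=177.9, assists=130, lhp_pm=116.4,
    lg_est_un_po=3114, lg_assists=1670, tlpo_pct=0.07,
)
print(round(n_first_base, 2))  # about 15.27
```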
The Arm rating is figured as Arm = 1bA + .5*ssDP - pPO - .5*2bDP + .015*LHP+/-. Braves 2B and SS had 108 and 97 DP, while the pitchers had 119
Z = 10 - 5*(3bE + ssE)/(Lg(3bE + ssE)*TLPO%). NL third basemen made 357 errors and the shortstops made 389. Braves third basemen and shortstops each made 19, so the Z is 10 - 5*(19 + 19)/((357+389)*.07) = 6.36 on a 10 point scale. The claim% at first base is (15.27+19.76+5.99+6.36)/100 = .474
At second base, the criteria are team DP, Assists, E%, and Putouts. N = 20 + (TmDP - ExpDP)/3. We already found the Braves’ ExpDP of 133.1, and they actually turned 146, giving an N of 24.3 on a 40 point scale.
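The Z scale and the resulting claim percentage at first base can be checked with a short Python sketch. The N, X, and Y values are simply the figures quoted above; the function names are mine.

```python
def teammate_error_points(tm_errors, lg_errors, tlpo_pct, max_points=10):
    """Z at first base: 10 - 5 * team 3B+SS errors / expected errors."""
    raw = 10 - 5 * tm_errors / (lg_errors * tlpo_pct)
    return max(0, min(max_points, raw))


def claim_percentage(n, x, y, z):
    """The four scales add up to at most 100 points; divide by 100."""
    return (n + x + y + z) / 100


z_first_base = teammate_error_points(
    tm_errors=19 + 19, lg_errors=357 + 389, tlpo_pct=0.07
)
claim_first_base = claim_percentage(15.27, 19.76, 5.99, z_first_base)
print(round(z_first_base, 2), round(claim_first_base, 3))  # about 6.36 and .474
```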
The Assists rating is found as:
X = ((2bA - 2bDP) - (Lg(2bA - 2bDP)*TLPO% - 1/35*LHP+/-))/6 + 15. The Braves 2B had 364
Y = 24 - 14*2bE%/Lg2bE%.
To find the putout criteria, we first find expected 2B
Exp2bPO = Tm(
3329*4863/47496 + 1/13*(480 - .3502*1455) + 1/32*116.4 = 342.2.
Z = 5 + (2bPO - Exp2bPO)/12, giving 5 + (364-342.2)/12 = 6.82 on a 10 point scale. The claim% at 2B is (24.3+17.64+11.97+6.82)/100 = .607
At third base the criteria are Assists, Errors Above Average, Sacrifice Hits, and Double Plays. We first find Exp3bA = TmA*Lg3bA/LgA + 1/31*LHP+/-. Braves 3B had 131
Exp3bE = (3bPO + 3bA)/LgFA@3B - (3bPO + 3bA). The league FA at third base is .945(figured as (
The Sacrifice Hit criteria uses what I will call Sacrifice Hit Rating, or SH/(G + L) = SH/(W + 2*L). The Atlanta SHR is 77/(104 + 2*58) = .35 against a league average of .326. Y = 10 - SHR/LgSHR*5, or 10 - .35/.326*5 = 4.63 points on a 10 point scale.
Expected DP at third base are found very simply as ExpDP*Lg3bDP/LgDP, or 133.1*374/2028 = 24.55. Z = (3bDP - Exp3bDP)/2 + 5 or (32-24.55)/2 + 5 = 8.73 on a 10 point scale. The Claim% at 3B is (26.93+18.95+4.63+8.73)/100 = .592
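The sacrifice hit and double play scales at third base are simple enough to sketch in Python. The inputs are the Braves and league figures quoted above; the function names are my own, and the 26.93 and 18.95 points for the first two criteria are taken as given.

```python
def sacrifice_hit_points(sh, wins, losses, lg_shr, max_points=10):
    """Y at third base: SHR = SH/(W + 2*L), compared to the league SHR."""
    shr = sh / (wins + 2 * losses)
    return max(0, min(max_points, 10 - shr / lg_shr * 5))


def third_base_dp_points(dp_3b, team_exp_dp, lg_3b_dp, lg_dp, max_points=10):
    """Z at third base: actual 3B double plays vs. the position's share."""
    exp_3b_dp = team_exp_dp * lg_3b_dp / lg_dp
    return max(0, min(max_points, (dp_3b - exp_3b_dp) / 2 + 5))


y_third = sacrifice_hit_points(sh=77, wins=104, losses=58, lg_shr=0.326)
z_third = third_base_dp_points(dp_3b=32, team_exp_dp=133.1,
                               lg_3b_dp=374, lg_dp=2028)
claim_third = (26.93 + 18.95 + y_third + z_third) / 100
print(round(y_third, 2), round(z_third, 2), round(claim_third, 3))
# about 4.63, 8.73, and .592
```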
For shortstops, the criteria are Assists, Double Plays, E%, and Putouts. First we find ExpssA = TmA*LgssA/LgA + 1/100*LHP+/-.
X = 15 + (TmDP – ExpDP)/4 = 15 + (146-133.1)/4 = 18.23 on a 30 point scale.
Y = 20 - 10*ssE%/LgssE% = 20 - 10*.0267/.0355 = 12.48 on a 20 point scale.
For outfielders, the criteria are Putouts, the team’s Defensive Efficiency Record, “Arm Elements”, and E%. Outfield putouts are first expressed as a percentage of team putouts less strikeouts and assists (assists generally come on groundballs). I will call this Putout Rating, POR = ofPO/(TmPO - TmA - TmK). Braves outfielders recorded 1055
The second criterion is very easy to calculate, using CL-1 from way back in the process when we were dividing defense between the pitchers and the fielders. X = CL-1*.29 - 9, which for
The third criterion, “Arm Elements”, compares the team sum of outfield assists and DP less SF to the league total of the same, discounted at the TLPO%:
Y = ((ofA + ofDP - TmSF) - Lg(ofA + ofDP - SF)*TLPO%)/5 + 10. Since the Braves allowed 39 SF and the league 701, their Y is ((19+5-39)-(480+87-701)*.07)/5 + 10 = 8.88 on a 20 point scale.
Finally, Z = 10 - 5*E%/LgE%, which is 10 - 5*.0192/.0208 = 5.38 on a 10 point scale. The OF Claim% is therefore (21+23.36+8.88+5.38)/100 = .586.
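A quick Python sketch of the last two outfield scales and the claim percentage, using the figures above. The function names are mine, and the 21 and 23.36 points for the first two criteria are taken as given.

```python
def outfield_arm_points(of_a, of_dp, tm_sf, lg_of_a, lg_of_dp, lg_sf,
                        tlpo_pct, max_points=20):
    """Y for outfielders: (A + DP - SF) vs. the league total at TLPO%."""
    team = of_a + of_dp - tm_sf
    expected = (lg_of_a + lg_of_dp - lg_sf) * tlpo_pct
    return max(0, min(max_points, (team - expected) / 5 + 10))


def outfield_error_points(e_pct, lg_e_pct, max_points=10):
    """Z for outfielders: 10 - 5 * E% / LgE%."""
    return max(0, min(max_points, 10 - 5 * e_pct / lg_e_pct))


y_outfield = outfield_arm_points(of_a=19, of_dp=5, tm_sf=39,
                                 lg_of_a=480, lg_of_dp=87, lg_sf=701,
                                 tlpo_pct=0.07)
z_outfield = outfield_error_points(e_pct=0.0192, lg_e_pct=0.0208)
claim_outfield = (21 + 23.36 + y_outfield + z_outfield) / 100
print(round(y_outfield, 2), round(z_outfield, 2), round(claim_outfield, 3))
# about 8.88, 5.38, and .586
```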
We are now ready to distribute the fielding win shares to each position. Each position has an “intrinsic weight”, which we will abbreviate IW. These weight the claim percentages at each position by the importance of that position. The IWs are: C = 38, 1B = 12, 2B = 32, 3B = 24, SS = 36, and OF = 58. We take, for each position (Claim% - .200)*IW, and sum these:
C = (.447-.200)*38 = 9.39
1B = (.474-.200)*12 = 3.29
2B = (.607-.200)*32 = 13.02
3B = (.592-.200)*24 = 9.41
SS = (.467-.200)*36 = 9.61
OF = (.586-.200)*58 = 22.39
These sum up to 67.11. So catchers get 9.39/67.11 = 14% of the team’s 54 FWS, or 7.56. Doing this for all positions (and not rounding the numbers):
C = 7.523, 1B = 2.644, 2B = 10.476, 3B = 7.616, SS = 7.753, OF = 18.068
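The whole distribution step can be sketched in a few lines of Python. This uses the rounded claim percentages and the rounded 54 FWS from above, so the results will differ slightly from the unrounded figures just listed; the function name is mine.

```python
INTRINSIC_WEIGHTS = {"C": 38, "1B": 12, "2B": 32, "3B": 24, "SS": 36, "OF": 58}

# Rounded claim percentages for the Braves, as computed above.
claim_pct = {"C": 0.447, "1B": 0.474, "2B": 0.607,
             "3B": 0.592, "SS": 0.467, "OF": 0.586}


def distribute_fielding_win_shares(claim_pct, team_fws,
                                   weights=INTRINSIC_WEIGHTS, floor=0.200):
    """Split team fielding win shares by (Claim% - .200) * intrinsic weight."""
    points = {pos: (claim_pct[pos] - floor) * weights[pos] for pos in weights}
    total_points = sum(points.values())
    return {pos: team_fws * p / total_points for pos, p in points.items()}


shares = distribute_fielding_win_shares(claim_pct, team_fws=54)
for pos, ws in shares.items():
    print(pos, round(ws, 2))
```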
My take: As I said earlier, fielding analysis is not something I am really qualified to pontificate about. I will leave it to the Tangos and the Mike Emeighs and the MGLs, etc. to debate the merits of the method. I will instead focus on its similarity to Defensive Winning Percentage.
DW% was used by James in his early Abstracts to evaluate fielding and then combined with OW% to give a total player rating. DW% was not used after the 1984 book. My first reaction when I saw the Defensive Win Shares formula was “it’s a revised DW%”.
Just like in WS, each position had four criteria, rated on scales that added up to 100. The criteria have changed over the years, sometimes based on better data being available (for example, James used to use opposition SB/G to rate catchers, whereas now we know SB and CS and can find the percentage) or based on new research and ideas of how to evaluate fielding (James used A/G for 1B, but now estimates unassisted putouts as well). But many of the criteria are the same or similar.
Another feature of the system was that each position had an intrinsic weight. These were 10 at C, 3 @ 1B, 8 @ 2B, 6 @ 3B, 11 @ SS, 4 @ LF, 6 @ CF, and 5 @ RF. These sum up to 53, which is the value in games given to fielding (both wins and losses) for a 162-game season. Dividing 53 by 162 gives .327, i.e. the system considers fielding to make up 32.7% of defense. Win Shares puts fielding, for an average team, at 32.5%. If you consider the outfield as a unit and scale these to 200 (the total of the intrinsic weights in Win Shares), you have:
C = 37.7(38); 1B = 11.3(12), 2B = 30.2(32), 3B = 22.6(24), SS = 41.5(36), OF = 56.6(58)
The numbers in parentheses are the WS intrinsic weights. As you can see, both systems say that fielding is approximately 32.5% of defense and weight the positions similarly (shortstop is the only position with a significantly different weight).
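The rescaling is easy to reproduce. A minimal Python sketch, with the three outfield weights combined into one unit as described above (the variable names are mine):

```python
# DW% intrinsic weights, with LF (4), CF (6), and RF (5) combined as OF.
dw_weights = {"C": 10, "1B": 3, "2B": 8, "3B": 6, "SS": 11, "OF": 4 + 6 + 5}
ws_weights = {"C": 38, "1B": 12, "2B": 32, "3B": 24, "SS": 36, "OF": 58}

dw_total = sum(dw_weights.values())   # 53 games
fielding_share = dw_total / 162       # about .327 of defense

# Scale the DW% weights to the Win Shares total of 200.
scaled = {pos: w * 200 / dw_total for pos, w in dw_weights.items()}

for pos in dw_weights:
    print(pos, round(scaled[pos], 1), ws_weights[pos])
```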
I must reach the same conclusion as my first glance: Fielding Win Shares is an updated Defensive Winning Percentage. This is not necessarily a bad thing; perhaps the original system was very good to begin with, and it has been improved by better data, better estimates, and presumably more research. And of course a huge difference is the fact that DW% looks at each fielder individually while WS starts by crediting the team, and distributes value to the players from there. But the similarities between the two systems, separated by twenty years, are still striking, at least to me.
Again, just as in the stage where responsibility was split between pitching and fielding, there is an explanation of how, but not why. Why is the intrinsic weight at shortstop 36? Why are team sacrifice hits allowed weighted double a third baseman’s double plays over expectation? Etc. These questions are not answered, nor even acknowledged by James. That is not to say that he did not think of them himself, as I’m sure he did--just that we have no way of knowing what the thought process behind the system was, and are left to puzzle over it ourselves.
Along the same line, there are differences in how the ratings are formed at each position. Most positions are given a rating for errors based on their error percentage. But at third base it is based on errors above average. These sorts of things seem like inconsistencies within the system, but if there is a good reason for them, we have not been told what it is.
Aside from the fielding nature of the method itself, the subtraction of .200 from each claim percentage hammers home that the system is giving out absolute wins on the basis of marginal runs. 50% of the league average in runs scored, with a Pythagorean exponent of 2, corresponds to a W% of .200. It is for this reason that in old FanHome discussions others and I said that WS had an intrinsic baseline of .200 (James changed the offensive margin line to 52%, which corresponds to about .213).
In an essay in the book, James discusses this, and says that the margin level (i.e. 52%) “is not a replacement level; it’s assumed to be a zero-win level”. This is fine on its face; you can assume 105% to be a zero-win level if you want. But the simple fact is that a team that scored runs at 52% of the league average with average defense will win around 20% of its games. Just because we assume this to not be the case does not mean that it is so.
Win Shares would not work for a team with a .200 W%, because the team itself would come out with negative marginal runs. If it doesn’t work at .200, how well does it work at .300, where there are real teams? That’s a rhetorical question; I don’t know. I do know that there will be a little bit of distortion everywhere.
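The W% figures for the margin levels follow directly from the Pythagorean formula with an exponent of 2. A small Python check, assuming average defense so that the run ratio is just the team’s scoring relative to the league (the function name is mine):

```python
def pythagorean_wpct(run_ratio, exponent=2):
    """W% for a team whose runs scored / runs allowed equals run_ratio."""
    return run_ratio ** exponent / (run_ratio ** exponent + 1)


wpct_at_50 = pythagorean_wpct(0.50)  # the .200 subtracted from each claim%
wpct_at_52 = pythagorean_wpct(0.52)  # James's offensive margin line
print(round(wpct_at_50, 3), round(wpct_at_52, 3))  # about .200 and .213
```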
In discussing the .200 subtraction, James says “Intuitively, we would assume that one player who creates 50 runs while making 400 outs does not have one-half the offensive value of a player who creates 100 runs while making 400 outs.” This is either true or not true, depending on what you mean by “value”. The first player has one-half the run value of the second player; 50/100 = 1/2, a mathematical fact. The first player will not have one-half the value of the second player if they are compared to some other standard. From zero, i.e. zero RC, one is valued at 50 and one is valued at 100.
By using team absolute wins as the unit to be split up, James implies that zero is the value line in win shares. Anyone who creates a run has done something to help the team win. It may be very small, but he has contributed more wins than zero. Wins above zero are useless in a rating system; you need wins and losses to evaluate something. If I told you one pitcher won 20 and the other won 18, what can you do? I guess you assume the guy who won 20 was more valuable. But what if he was 20-9, and the other guy was 18-5?
You can’t rate players on wins alone. You must have losses, or games. The problem with Win Shares is that they are neither wins nor wins above some baseline. They are wins above some very small baseline, re-scaled against team wins. If you want to evaluate WS against some baseline, you will have to jump through all sorts of hoops because you first must determine what a performance at that baseline will imply in win shares. Sabermetricians commonly use a .350 OW%, about 73% of the average runs/out, as the replacement level for a batter. A 73% batter, though, will not get 73% as many win shares as an average player. He will get less than that, because only 21% (73% - 52%) of his runs went to win shares, while for an average player it was 48%. So maybe he will get .21/.48 = 44%. I’m not sure, because I don’t jump through hoops.
Bill could use his system, and get Loss Shares, and have the whole thing balance out all right in the end. But to do it, you would have to accept negative loss shares for some players, just as you would have to accept negative win shares for some players. Since there are few players who get negative wins, and they rarely have much playing time, you can ignore them and get away with it for the most part. But in the James system, you could not just wipe out all of the negative loss shares. Any hitter who performed at greater than 152% of the league average would wind up with them, and there are (relatively) a lot of hitters who create seven runs a game.
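For what it’s worth, the back-of-the-envelope version of that hoop looks like this in Python. It assumes, as the rough estimate above does, that batting win shares per out are simply proportional to runs above the 52% margin; the function names are mine and this is not the actual WS process.

```python
def pythagorean_wpct(run_ratio, exponent=2):
    """OW% implied by a batter's runs/out relative to the league."""
    return run_ratio ** exponent / (run_ratio ** exponent + 1)


def share_of_average_win_shares(rate_vs_league, margin=0.52):
    """Marginal runs per out relative to an average hitter's marginal runs."""
    return (rate_vs_league - margin) / (1 - margin)


replacement_rate = 0.73
replacement_ow_pct = pythagorean_wpct(replacement_rate)
replacement_share = share_of_average_win_shares(replacement_rate)
print(round(replacement_ow_pct, 3), round(replacement_share, 2))
# about .348 and .44
```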
James writes in the book that with Win Shares, he has recognized that Pete Palmer was right after all in saying that using linear methods to evaluate players would result in only “limited distortions”. And it’s true that a linear method involves distortions, because when you add a player to a team, he changes the linear weights of the team. This is why Theoretical Team approaches are sometimes used. But the difference between the Palmer system and the James system is that Palmer takes one member of the team, isolates him, and evaluates him. James takes the entire team.
So while individual players vary far more in their performance than teams, they are still just a part of the team. Barry Bonds changes the linear weight values of his team, no doubt; but the difference might only be five or ten runs. Significant? Yes. Crippling to the system? Probably not. But when you take a team, particularly an unusually good or bad team, and use a linear method on the entire team, you have much bigger distortions.
Take the 1962 Mets. They scored 617 and allowed 948, in a league where the average was 726. Win Shares’ W% estimator tells me they should be (617-948+726)/(2*726) = .272. Pythagoras tells us they should be .304. That’s a difference of 5 wins. WS proceeds as if this team will win 5 fewer games than it probably will. Bonds’ LW estimate may be off by 1 win, but that is for him only. It does not distort the rest of the players (they cause their own smaller distortions themselves, but the error does not compound). Win Shares takes the linear distortion and thrusts it onto the whole team.
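The linear estimate and the size of the gap can be checked with a short Python sketch. The .304 Pythagorean figure is taken from the text as given rather than recomputed, and the 162-game schedule used to turn the W% gap into wins is my assumption; the function name is mine.

```python
def linear_wpct(runs, runs_allowed, lg_runs):
    """Linear W% estimator: (R - RA + LgR) / (2 * LgR)."""
    return (runs - runs_allowed + lg_runs) / (2 * lg_runs)


mets_linear = linear_wpct(runs=617, runs_allowed=948, lg_runs=726)
mets_pythagorean = 0.304  # figure quoted in the text, not recomputed here
schedule_games = 162      # assumed schedule length

gap_in_wins = (mets_pythagorean - mets_linear) * schedule_games
print(round(mets_linear, 3), round(gap_in_wins, 1))  # about .272 and 5.2
```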
Finally, the defensive margin of 152% corresponds to a W% of about .300, compared to .213 for the offense. The only possible cutoffs which would produce equal percentages are .618/1.618 (the golden ratio). That is not to say that they are right, because Bill is trying to make margins that work out in a linear system, but we like to think of 2 runs and 5 allowed as being equal to the complement of 5 runs and 2 allowed. In Win Shares, this is not the case. And it could be another reason why pitchers seem to rate too low with respect to batters (and our expectations).
One last nit-picky thing: why do expected putouts by second basemen and shortstops go up as walks go up? Obviously, more walks means more runners on first who may be put out at second on fielder’s choices, or steal attempts, or double plays, but so do singles and hit batters. Am I missing something really obvious here?
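To see where .618/1.618 comes from, here is a small Python check. It assumes the defensive cutoff is always the offensive cutoff plus 100% of the league average (as with 52% and 152%) and a Pythagorean exponent of 2; the two floors match when 1 + a = 1/a. The function names are mine.

```python
import math


def offense_floor_wpct(margin, exponent=2):
    """W% of a team scoring at `margin` of league average, average defense."""
    return margin ** exponent / (margin ** exponent + 1)


def defense_floor_wpct(margin, exponent=2):
    """W% of a team allowing `margin` of league average, average offense."""
    return 1 / (1 + margin ** exponent)


# James's margins: the two floors do not match.
print(round(offense_floor_wpct(0.52), 3), round(defense_floor_wpct(1.52), 3))

# Solving 1 + a = 1/a gives a = (sqrt(5) - 1) / 2, about .618.
a = (math.sqrt(5) - 1) / 2
print(round(offense_floor_wpct(a), 4), round(defense_floor_wpct(1 + a), 4))
```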