Friday, December 30, 2005

Hitting by Position, 2005

This will be a fairly boring article examining offensive production by fielding position in 2005. The data, as it did for the leadoff hitter piece, came from the Baseball Direct Scoreboard, which apparently gets its data from STATS. Anyway, I included just the basic hitting stats (AB, H, D, T, HR, W), so this examination will not include SB/CS and other useful information.

The first basic thing is to take a look at the averages at each position for all hitters. These are BA/OBA/SLG and then RG:
C : 253/310/390/4.11
1B: 276/350/471/5.69
2B: 274/330/413/4.67
3B: 270/334/435/5.01
SS: 270/321/394/4.32
LF: 276/339/448/5.24
CF: 272/330/423/4.79
RF: 270/335/454/5.23
DH: 259/334/438/5.05
P : 148/178/190/.42
TOT: 266/327/421/4.72

The Total line is not the total for all ML players, it is the total for those positions. I’m not exactly sure how STATS determines which PAs count as hitting for a particular position, but they have other categories such as pinch hitters.

I don’t see anything remarkable in those numbers; they match the defensive spectrum fairly well. The spectrum can be written as:
DH - 1B - LF - RF - 3B - CF - 2B - SS - C - P
The RGs put in order are:
1B - LF - RF - DH - 3B - CF - 2B - SS - C - P
The only difference is with the DH position. It is clear that the DH position is the least demanding defensively, but there are various possible explanations for why they don’t come out that way in terms of production. Other than that, the one year sample is a perfect match for our expectation.

One question that is raised by the DH’s performance is how do DH stats vary between the AL and NL? NL teams only use the DH for a very small number of games; do they get similar production to AL teams? And we can ask the same question about AL pitchers.

Here’s how it turned out in 05:
AL DH: 259/335/441/5.10
NL DH: 250/321/385/4.24
AL P : 115/139/167/-.12
NL P : 150/180/191/.46
As you can see, AL DHs easily outperformed their NL counterparts, and NL pitchers did the same to the AL hurlers. Whether this is because NL hitters take batting practice, or because AL teams seek out a DH and NL teams just use some guy off the bench, or because of some other potential explanation, I don’t know. But there was a big difference.

We can also use this data to come up with positional adjustments based on the ‘05 season. To do this I will divide the positional RG by the total RG. I have always lumped DH and 1B in together, and I have also combined LF and RF. It is my belief that the difference between the players who fill out those positions is largely based on defense. What I mean is that at other positions, you choose a second baseman and a shortstop and a third baseman, etc. At LF and RF, you choose two corner outfielders, and the one with the better arm may wind up in RF, or the one with more range in LF, or what have you. But the two positions are very much interchangeable. At least that’s how I see it. With that considered, here are the total positional averages for 1B/DH and LF/RF:
1B/DH: 270/345/460/5.48
LF/RF: 273/337/451/5.23
That gives us these positional adjustments (the positional adjustments that I have traditionally used, based on 1992-2001 data, are in parentheses):
C = .87(.89)
1B/DH = 1.16(1.19)
2B = .99(.93)
3B = 1.06(1.01)
SS = .91(.86)
LF/RF = 1.11(1.12)
CF = 1.01(1.02)
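As a quick check, the adjustments above are simply each position's RG divided by the overall RG. A minimal sketch using the 2005 figures quoted earlier (SS comes out .92 rather than .91 here, an artifact of rounding in the published RGs):

```python
# Positional adjustments: each position's RG divided by the RG for all positions.
# RG values are the 2005 figures quoted earlier; 1B/DH and LF/RF are the
# combined lines given just above.
rg = {"C": 4.11, "1B/DH": 5.48, "2B": 4.67, "3B": 5.01,
      "SS": 4.32, "LF/RF": 5.23, "CF": 4.79}
total_rg = 4.72  # overall RG for these positions

padj = {pos: round(r / total_rg, 2) for pos, r in rg.items()}
print(padj)
```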

Comparing the 2005 data to the 1992-2001 data, the major difference is a gain in production at the skill positions in the infield at the expense of the corners. Of course, this is just a one-year sample, so it would be inappropriate to draw any conclusions about changes in the game from it.

One fun application of these PADJs that I have not done in the past is to see how teams’ offensive production was distributed by position. Some teams might have the bulk of their offensive strength coming from the traditional hitting positions, while others may take advantage of good hitting at traditional weak spots like the middle infield.

I’m not sure that the way I’ve chosen to do this is the best, but it is what it is. I have simply found the correlation between the positional adjustment (for the league as a whole) and the positional RGs for each team. Positive correlations indicate that the team got higher production from the positions you would expect to give higher production (the left side of the spectrum). For the AL, all positions were used except pitcher, while pitcher and DH were ignored for the NL. So here are the team correlations broken down by league:
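As a sketch of the method, here is the PADJ-versus-positional-RG correlation for one hypothetical AL team. Only the PADJ vector comes from the table above; the team RG figures are invented purely for illustration:

```python
import math

# League positional adjustments from the 2005 table above, in the order
# C, 1B, 2B, 3B, SS, LF, CF, RF, DH (1B/DH and LF/RF share their combined values).
padj = [0.87, 1.16, 0.99, 1.06, 0.91, 1.11, 1.01, 1.11, 1.16]

# Hypothetical positional RGs for one team -- made up for illustration only.
team_rg = [3.8, 6.1, 4.5, 5.2, 4.0, 5.5, 4.8, 5.4, 5.6]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(padj, team_rg)
print(round(r, 2))  # production tracking the spectrum yields a high positive r
```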
American League
All players: +.33
The “all players” figure is the correlation between PADJ and RG for all 9*14 positions in the AL.
National League
All players: +.52

As you can see, most teams had fairly strong correlations between PADJ and the positional RG. Only two teams had negative correlations: the Indians and the Orioles. I expected to see Cleveland on this list, because of the pathetic performance from the corners which I have touched on a number of times here (particularly Blake, Boone, and Broussard). We’ll take a look at the team with the best correlation, the Diamondbacks. What I have done is list the league position adjustment on the first line and below that the position adjustment based on the team stats. This was calculated by taking the average RG across the positions, and then dividing the RG for a given position by this. The reasoning is that if the PADJ for catcher is .87, then a team’s catcher should have an RG 87% of his team’s RG. You can see that if the PADJ and Team Adjusted RG numbers match up well, the correlation will be high (I have dropped the decimal points):

As you can see, every position at which you expect to have below average offense had a below average output, and vice versa, except for center field. The Diamondbacks got most of their offense out of the left side of the spectrum. But how about the opposite side of the coin in Baltimore, which had the lowest correlation?
Pos        C    1B   2B   3B   SS   LF   CF   RF   DH
TmAdjRG    88   115  130  107  130  84   75   103  69
The Orioles’ two best hitters, by far, were Roberts and Tejada, at two of the weakest offensive positions. They also had pathetic production in the outfield and at DH.

Finally, since I write about them a lot anyway, here are the Indians’ figures.
The Indians had the lowest absolute correlation of any team between PADJ and RG at a position. As you can see, catcher, short, and DH were well above expectation (a mix of extreme left and extreme right), while first, second, third, and right were well below expectation.

Thursday, December 29, 2005

Win Shares Walkthrough, pt. 5

Distributing Pitching Win Shares to Individuals

The main component of the individual pitcher’s “Claim Points” formula is very similar to the lone criterion for batters: Marginal Runs. For pitchers, we also consider W, L, and SV, something called “Save Equivalent Innings”, and the sub-marginal batting performances of pitchers.

We start by calculating the zero-level. This is simply 152% of the league rate of runs per 9 innings:
ZL = LgRA*PF(R)*1.52
The 1993 NL RA was 4.521, the Braves PF(R) is .998, and so the ZL for the Braves is 4.521*.998*1.52 = 6.858. But this is the zero-level for the defense as a whole; it includes fielders and pitchers. So we find the PZL, or Pitcher ZL, as:
PZL = ZL - (ZL - RA)*Field%
The Braves’ RA was 3.458, and the Field% (Field% is not the team fielding percentage, although I can see why the abbreviation I chose might make some think that; it is the percentage of defensive win shares that has been assigned to the team’s fielders) was .297, so the PZL is 6.858 - (6.858 - 3.458)*.297 = 5.848. So Braves pitchers will get credit for their marginal runs saved compared to a 5.848 RA pitcher.
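The zero-level arithmetic above can be sketched in a few lines, using the 1993 Braves numbers from the text:

```python
# Pitcher zero-level calculation, as walked through above.
lg_ra = 4.521      # 1993 NL runs allowed per 9 innings
pf_r = 0.998       # Braves run park factor
field_pct = 0.297  # share of defensive win shares assigned to the fielders

zl = lg_ra * pf_r * 1.52          # zero level for the whole defense
ra = 3.458                        # Braves' actual RA
pzl = zl - (zl - ra) * field_pct  # pitcher-only zero level

print(round(zl, 3), round(pzl, 3))
```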

In WS, unearned runs are counted as one half for the purpose of calculating RA. So ER + .5(R - ER) simplifies to (R + ER)/2, which allows us to write this formula:
PCL-1 = IP/9*PZL - (R + ER)/2
We’ll run through these steps for a starter and a reliever. Steve Avery pitched 223 innings and allowed 81 R and 73 ER, so his PCL-1 is 223/9*5.848 - (81+73)/2 = 67.90. Mike Stanton pitched 52 innings allowing 35 R and 27 ER, for a PCL-1 of 52/9*5.848 - (35+27)/2 = 2.78.

The second criterion used for pitchers is a combination of W, L, and SV:
PCL-2 = (3*W - L + SV)/3
Avery was 18-6 with no saves, for (3*18-6)/3 = 16, while Stanton was 4-6 with 27 saves for (3*4-6+27)/3 = 11.

The third criterion is “Save Equivalent Innings”. This is designed to give relief pitchers extra credit for the high leverage value of their innings. SEI = 3*SV + HLD, with the caveat that this figure cannot be greater than 90% of the pitcher’s actual IP. Avery had no saves or holds, and therefore has 0 SEI and will get 0 PCL-3. Stanton had 27 saves and 5 holds, which is 3*27 + 5 = 86 SEI. This is greater than 90% of his 52 innings by a long shot, so he only gets credit for .9*52 = 46.8 SEI.

Then claim points are given for the marginal runs saved over the SEI, based on RAC (discussed above):
PCL-3 = (PZL - RAC)*SEI/9
For Stanton this is (5.848 - 4.64)*46.8/9 = 6.28
(This is one case where my input numbers will differ from reality, because I do not have hold data for Steve Bedrosian or Jay Howell in 1993, but I do for the other Braves relievers.)

The final criterion for pitchers is their hitting performance, if it was sub-marginal. This is PCL-4. Marginal runs hitting was already calculated when we distributed OWS to individuals. But we zeroed out negative MR; here we count them against pitchers. Mike Stanton did not bat in 1993 and therefore has 0. Steve Avery created 3.83 runs while making 72 outs. A marginal player making 72 outs would have created 8.32 runs, so Avery is -4.49 runs. This is PCL-4.

We then sum PCL-1 through PCL-4 for all players, zeroing out negative numbers. Stanton has 2.78 + 11 + 6.28 + 0 = 20.06 claim points, while Avery has 67.90 + 16 + 0 - 4.49 = 79.41 claim points. The team total is 527, and the team has 128 PWS to give out. So Stanton is credited with 20.06/527*128 = 4.87 PWS, while Avery has 79.41/527*128 = 19.29 PWS.
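The whole chain of claim criteria can be sketched as follows, rerunning the Avery and Stanton examples (full precision gives Stanton 20.07 rather than the 20.06 produced by summing the rounded components):

```python
# The four pitcher claim criteria and the final split of pitching win shares.
PZL = 5.848  # Braves pitcher zero level, computed above

def pcl1(ip, r, er, pzl=PZL):
    # marginal runs saved versus the zero level; unearned runs count half
    return ip / 9 * pzl - (r + er) / 2

def pcl2(w, l, sv):
    return (3 * w - l + sv) / 3

def pcl3(sv, hld, ip, rac, pzl=PZL):
    sei = min(3 * sv + hld, 0.9 * ip)  # save-equivalent innings, capped at 90% of IP
    return (pzl - rac) * sei / 9

# Avery: 223 IP, 81 R, 73 ER, 18-6, no SEI, sub-marginal hitting worth -4.49 (PCL-4)
avery = pcl1(223, 81, 73) + pcl2(18, 6, 0) + 0 - 4.49
# Stanton: 52 IP, 35 R, 27 ER, 4-6 with 27 SV, 5 HLD, 4.64 RAC, no PCL-4
stanton = pcl1(52, 35, 27) + pcl2(4, 6, 27) + pcl3(27, 5, 52, 4.64) + 0

team_claim, team_pws = 527, 128
for name, pts in (("Avery", avery), ("Stanton", stanton)):
    print(name, round(pts, 2), "claim points ->", round(pts / team_claim * team_pws, 2), "PWS")
```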

My take: The zeroing out issue that I railed against in the offensive section is prevalent here as well, so I will not harp on it again.

One part that I am not sure about is the use of a different zero level for pitchers than for the defense as a whole. We already assigned a given number of win shares to the pitchers based on our assessment of the percentage of defensive value attributable to the pitching staff. Then we use this pitch/field breakdown again to find the ZL. This may be double-counting, but I am not sure about it, to be honest. The effect of this decision is to have a lower ZL, so it helps better pitchers claim more win shares. Great pitchers already seem to be shortchanged, so if what I’m thinking is true, they would do even worse. Perhaps I am wrong, or just missing something really obvious here.

Some people will question the use of actual decisions in the evaluation, because these of course are dependent on many factors beyond the pitcher’s control. However, in a value method, there is potentially some “hidden” information in there. The weight given to the decisions is much lower than the weight on marginal runs. There are pros and cons, and I am ambivalent on the issue.

The third step, giving extra credit to relievers, is justifiable in a value method because they do pitch in situations in which runs are more valuable, if you use a real-time approach to value. So while the specifics of the step seem to be a guess, it is alright. Using RAC instead of actual RA is a little confusing in a value method, but James cites the misleading nature of relievers’ ERA due to inherited and bequeathed runners. If we could use actual inherited and bequeathed runner data, this would be preferable, but we don’t have that historically, so I can swallow RAC as a stand-in.

The fourth step of subtracting credit for sub-marginal offense is baffling. First of all, lumping it in with pitching means that pitching win shares incorporates offensive performance for bad hitters and does not for good pitchers. That means PWS is not a true isolation of pitching. Even more befuddling, a full-time hitter like Alfredo Griffin in 1981 does not have his value reduced by his horrific hitting, but Steve Avery does. It would be much easier to just allow Avery to have a negative OWS but then have his PWS be truly reflective of his pitching instead of including his offense. Again, the forced zeroing out of negative marginal runs comes back to create problems in Win Shares.

Wednesday, December 28, 2005

Elarton, Johnson, and Millwood

I'm sure that many Indians fans are very upset about the fact that Kevin Millwood is now a Ranger, but I am not one of them. Sure, I would love to have Millwood back. But as I have written here before, he did not pitch as well this year as his ERA would indicate, and he is a big injury risk (the only reason the Indians had him to begin with was the injury risk), and sixty million over five years is simply ridiculous. Even if you assume he'll stay healthy, I'm not sure a normal aging progression would justify that kind of contract.

Then the Indians let Scott Elarton walk and have chosen to replace him with Jason Johnson. This is another move that I have no problem with. First, the Indians are hoping that a young pitcher like Jason Davis, Fausto Carmona, or Jeremy Sowers will step up and take over the fifth starter spot by the All-Star break. Secondly, Elarton got an $8 million, two-year deal from Kansas City, who think that they will be respectable after adding tons of veteran mediocrity. Johnson signed with the Indians for a one-year guaranteed contract with an option for next year. So if a youngster steps up and takes his spot, he can be jettisoned, and if he pitches well, he can stay.

If you look at Elarton and Johnson in 05, they were very similar pitchers. Elarton pitched 181 innings with +18 RAR, -6 RAA, a 5.09 eRA, and a 4.37 G-F. Johnson pitched 210 innings with +18 RAR, -10 RAA, a 4.78 eRA, and a 4.46 G-F. Elarton is younger and has a better G-F, but Johnson had a lower eRA. It's a pretty close call, both for last year and for an expectation for next year. In 2004, Johnson had a 5.32 eRA in 196 innings. I think it's pretty clear that you should expect around 200 innings with an eRA within twenty-five points of 5.00 as a baseline for him. For the contract, I think he is clearly a superior choice to Elarton.

Don't get me wrong, I think Cleveland's starting pitching will be worse this year than it was a year ago, unless Sabathia or Lee can take a step forward or Westbrook returns to his 2004 form (very unlikely). But I would feel the same way if Millwood was still here. As a Tribe fan, I am much more concerned about the fact that it looks like we will go into the season with the three killer Bs (Broussard, Boone, and Blake) still firmly in corner positions.

Rate Stat Series, pt. 4

Perhaps the title for this series is a misnomer, because we will begin to leave the exclusive realm of rate stats to talk about value stats that are generated from them. “Offensive Evaluation” would probably suit it better, and I think I’ll use that from now on. If you are constructing an ability stat, it is probably going to be a rate stat. True, there is an ability to stay in the lineup that you may want to account for, but outside of this, ability is without question a rate. The rate ideally would be expressed with some sort of margin of error or confidence interval, because it is just an estimate of the player’s true ability. But the fundamental thing you need is a rate.

A value stat, looking backwards, needs to have playing time as a component. In this segment, we will look at a method of estimating value based on a rate stat (R/O) which we have already discussed, and see whether it stands up to the same kind of logical tests we have used previously.

The first approach is the one developed by Bill James and used in his work for many years (although it has now been discarded). Bill calculated Runs Created, but knew of course that this was just a quantity, and in order to express value, he needed to consider the quality of the performance and the “baseball time” (read outs, or PAs, or innings, etc.) over which it was accumulated. Bill, as have most analysts over the years, analyzed the issues we have covered over the last couple of installments and chose R/O as his rate stat. However, this scale was unfamiliar to the average fan. He instead chose to express it as a number of runs per game for a team, since runs/game has a scale with which more fans and even sabermetricians are familiar. So:
R/G = (O/G)*(R/O)

Where O/G is some pre-defined value. Bill used 25 O/G (and later 25.5) when considering just batting outs (AB - H), 25.5 (and later 26) when considering (AB - H + CS), and 27 when considering all outs in the official statistics (AB - H + CS + SH + SF). You could use the league average, or the team average, or what have you--25.2 is actually a more precise estimate when using batting outs, 25.5 when including CS, and 27 when including all outs. I will sometimes abbreviate R/G as RG.
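The conversion is trivial, but a one-liner makes the scale change explicit (using the more precise 25.2 outs per game for batting outs noted above):

```python
# Convert a runs-per-out rate to the familiar runs-per-game scale.
def runs_per_game(runs, outs, outs_per_game=25.2):
    return outs_per_game * (runs / outs)

# e.g. a hitter who created 100 runs while making 400 batting outs:
print(round(runs_per_game(100, 400), 2))  # 6.3
```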

Bill could have expressed this in terms of runs/team season by multiplying by some number of games (probably 162), or in terms of runs/inning, etc. He chose runs/game.

Then he decided to express the R/G figure in terms of a winning percentage through the Pythagorean theorem. He defined a player’s Offensive Winning Percentage (OW%) as the winning percentage that a team would have if each player on the team hit as the player hit, and the team allowed a league average number of runs.
OW% = (R/G)^x/((R/G)^x + Lg(R/G)^x)
Where x is the exponent used, generally 2. I will often refer to the league average R/G simply as “N”.

OW% shares the property with R/G of familiarity to an average fan--you know that .550 is a contender, .600 is probably one of the best teams in baseball, and that .700 is legendary. Of course, individual performance varies much more than team performance, so you cannot carry those interpretations of the values for teams over to individuals, but they can still serve as guideposts.

Another commonly used method of converting an R/O or R/G figure to another scale is to convert it to a number that looks like a batting average, since all fans are familiar with the BA scale. This is what Clay Davenport does in Equivalent Average (EQA) and what Eddie Epstein does with Real Offensive Value (ROV):
EQA = ((R/O)/5)^.4
ROV = .0345 + .1054*sqrt(R/G)

Let’s look at an example of two players who play in a league where the average team scores 4.5 runs/game:
PLAYER   R/G    OW%    EQA    ROV
A        8.00   .760   .333   .333
B        7.00   .708   .316   .313
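A quick sketch reproducing those conversions (R/O is taken as R/G over 25 outs per game; note the .4 exponent in the simplified EQA, which is what reproduces the table values):

```python
import math

N = 4.5  # league runs per game

def ow_pct(rg, n=N, x=2):
    # offensive winning percentage via the Pythagorean theorem
    return rg ** x / (rg ** x + n ** x)

def eqa(r, o):
    # simplified EQA from runs per out
    return ((r / o) / 5) ** 0.4

def rov(rg):
    return 0.0345 + 0.1054 * math.sqrt(rg)

for rg in (8.0, 7.0):
    print(rg, round(ow_pct(rg), 3), round(eqa(rg, 25), 3), round(rov(rg), 3))
```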

We can see that the ratio in runs between the players is 8/7 = 1.143. The ratio in OW% is 1.073 and the ratio in EQA is 1.054. So these other measures decrease the ratio between players. Now this is not necessarily a bad thing, because our ultimate goal is to move from run-based evaluations to win-based ones. But in fact, the ratios are incorrect.

Based on the Pythagorean theorem, Run Ratio = R/RA, or in our case (R/G)/Lg(R/G). Win Ratio equals Run Ratio^x, and W% = Win Ratio/(Win Ratio + 1). So the Win Ratio for Player A is 3.16 (versus a Run Ratio of 1.78) and for Player B it is 2.42 (versus a Run Ratio of 1.56). So if we have one player with a run ratio of RR1 and another with a ratio of RR2, the win ratio between the two players is (RR1/RR2)^x; Player A has a win ratio 31% higher than Player B (versus the 14% higher run ratio). So the WR grows exponentially versus the RR.

So basically, OW%, EQA, and other similar methods reduce the ratios between players and therefore distort the scale. The number you look at may be on a more familiar scale, but the scale distortion may cause confusion. While Davenport and Epstein and others are free to state the end results of their methods however they’d like, I think that the best course of action is to use the R/O or R/G or whatever scale and learn the standards. We all agree that BA is not a useful measure for a player’s total value, so why continue to use that scale? If R/O is the proper measure, let’s learn the scale.

James’s system went on to express a player’s contribution in terms of a number of offensive wins and losses. Since we already have an offensive winning percentage, all we need to find offensive wins and losses is offensive games. By definition in the OW% formula, Games = Outs/25 (or whatever value is appropriate given the type of outs that are being considered). So:
Offensive Wins = OW%*Offensive Games
Offensive Losses = (1 - OW%)*Offensive Games = Offensive Games - Offensive Wins

So now we can apparently express a player’s offensive contribution in terms of wins and losses. Great. But we still have a problem. How do we compare two players with different amounts of playing time? Consider two players:
NAME   RC    O     RG     OW%    OW-OL
A      100   400   6.25   .659   10.54-5.46
B      88    300   7.33   .726   8.71-3.29
I have assumed that N = 4.5 for both players. Player A has more OW, but he also has more OL and an OW% that is almost seventy points lower. How do we pick between these two, value-wise? Clearly, Player B’s rate of performance is superior. But Player A’s total offensive wins is higher by almost two. It’s not an obvious choice.

Well, you could say, Player A has 5.08 more wins than losses, while Player B has 5.42, so Player B is better. Alrighty then. See what you just did? You put in a baseline. Your baseline was “over .500, times two”. So you are comparing a player to an average player.

Many sabermetricians think that a better comparison is to compare each player to a “replacement” player. This debate is beyond the scope of this article, but let’s just say the replacement player would have an OW% of .350. What if we compare to .350? Well, Player A has 16 offensive games, so a replacement player would be 5.6-10.4 in those games, so our guy is +4.94 wins over him (figured as (OW% - .350)*Outs/25). Player B has 12 offensive games, so he is +4.51 wins.
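The baseline flip can be sketched directly from the Player A/B lines above (at full precision, B comes out +4.52 against replacement rather than the +4.51 quoted, which used OW% rounded to three places):

```python
N = 4.5  # league R/G for both players

def ow_pct(rc, outs, n=N):
    rg = rc / outs * 25  # 25 outs per game, as defined above
    return rg ** 2 / (rg ** 2 + n ** 2)

def wins_above(rc, outs, baseline):
    # offensive games = outs/25
    return (ow_pct(rc, outs) - baseline) * outs / 25

for name, rc, outs in (("A", 100, 400), ("B", 88, 300)):
    waa = wins_above(rc, outs, 0.500)  # versus average
    war = wins_above(rc, outs, 0.350)  # versus a .350 replacement
    print(name, round(waa, 2), round(war, 2))
```

With the average baseline B leads, with the replacement baseline A leads, which is exactly the ranking flip discussed below.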

So if we compare to average, Player B is ahead, but if we compare to replacement, Player A is ahead. And this is very common; different baselines change rankings. This is true no matter what rate stat you start out with. So what’s the point, as it applies to OW%? The point is that having a number that is supposed to be “wins” and “losses” as the OW% system does, as opposed to just having a number of wins above some baseline, as other systems do, is not a panacea. Even if you have absolute wins and losses, you are going to have to use some sort of baseline to sort out who’s better than whom. And the other systems can be adapted to other baselines (we’ll talk more about this in the next segment), so the absolute wins and losses aren’t really an advantage of this system.

Moving on from that, let’s use the OW% approach to compare two real player seasons:
NAME       RC    O     RG      OW%    OW-OL
Mantle     162   301   13.46   .910   10.96-1.08
Williams   161   257   15.66   .932   9.58-.70
These are probably two of the three or four best offensive seasons in the major leagues in the 1950s, both turned in in the 1957 AL (N = 4.23) by Mickey Mantle and Ted Williams. We see that Mantle and Williams created almost identical numbers of runs, but that Mantle made 44 more outs. This gives Williams a comfortable edge in RG (about 16% higher), and a smaller but still significant lead in OW% (about 2% higher) due to the scale distortion. So Williams has created just about as many runs as Mantle while using far fewer outs. Clearly, it seems, Williams should rate ahead of Mantle.

However, Mantle has 1.38 more offensive wins. On the other hand, he has .38 more offensive losses. If we compare them to a .500 player, Mantle is +4.94 wins and Williams is +4.44 wins (figured as (OW% - .5)*Outs/25). How can this be? How can we rate Mantle ahead of Williams, by half a win, when there is essentially no difference between the number of runs they created but a large difference in the number of outs they made?

In truth, it can’t be. It’s clearly wrong, and is caused by a flaw in the OW% way of thinking. OW% decreases the value of each additional run created. If you have a player with a 4.5 RG in a N = 4.5 league, and you add .5 runs/game and give him a 5 RG, his OW% increases by 52 points, from .500 to .552. If you add another .5 to take him up to 5.5, his OW% increases by 47 points, from .552 to .599. So the additional .5 runs had less win value, according to OW%.

Like many things, there is a grain of truth to this. It is true that for a team, going from 4.5 runs to 5 will cause a greater increase in W% than going from 5 to 5.5. But there is a big difference between a team adding .5 runs per game and one-ninth of a team, an individual player, adding .5 runs per game while batting one-ninth of the time. Each additional run created by a player will in theory have less value than the previous one, but treating the player as a team blows this way out of proportion.

The more fundamental reason why OW% gives this clearly incorrect result is how it defines games. You get credit for more games as you make more outs. Williams’ OW% is .022 higher than Mantle’s, but that is not enough to offset the fact that we are now crediting Mantle with 1.76 more games than Williams. What would have happened if Mantle had made 350 outs? Well, his RG would have gone down to 11.57, and his OW% to .882, but his OW-OL would have been 12.35-1.65 for +5.35 WAA. In fact, we would have to increase Mantle’s outs to 465(!!) before his WAA would reach its peak, +5.75. That would be a player with an OW% of .809, almost exactly one hundred points lower than Mantle’s actual figure. And all he has done to increase his value is make 164 more outs! Clearly, this approach does not and cannot work.
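The pathology is easy to demonstrate: hold Mantle's 162 runs created fixed, vary his outs, and watch his wins above average rise as he makes more outs, peaking around 465 before finally declining:

```python
# WAA as a function of outs, with runs created held constant at Mantle's 162.
N = 4.23  # 1957 AL runs per game

def waa(rc, outs, n=N):
    rg = rc / outs * 25               # runs per game
    owp = rg ** 2 / (rg ** 2 + n ** 2)
    return (owp - 0.5) * outs / 25    # offensive games = outs/25

for outs in (301, 350, 465, 600):
    print(outs, round(waa(162, outs), 2))
```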

Again, a player is not a team. If we know that one team has made 100 outs, we know that they have played about 4 games. But this is because a team gets 25 outs/game (yes, I know, it’s 27, but we’re using 25 as described above when using just batting outs). A player does not get 25 outs per game. He gets to keep making outs until his team has recorded 25 outs. Then he’s done, whether he has made zero outs (if he has a 1.000 OBA) or all 25 outs (if his eight teammates have a 1.000 OBA). It is unrealistic and silly to state a player’s games in terms of outs.

Of course, even a team does not get one game per 25 outs. The number of outs does not define the number of games a team plays; the number of games defines the number of outs they get. Baseball teams play 162 games a year because somebody in the commissioner’s office said they should play 162 games a year, not because somebody in the commissioner’s office said they should make 4,374 outs a year.

So why not use plate appearances, or at bats, or some combination of outs and those? Because this whole exercise is folly. No matter what we do with the number of games, we have defined OW% in terms of games defined by outs. The bottom line is that players don’t play games themselves. We certainly want to move beyond a player’s run contribution and express his win contribution. But to do this, we need to consider how he would affect the wins of his team, not try to create some situation in which he is a team. A team of Mantles will win 91% of their games. Great. We have one Mantle, on a team with eight other guys who aren’t Mantles. We don’t care how many games nine Mantles will win.

This whole folly began when we expressed the player’s performance in terms of runs/out. From there, it was easy to ask, how many runs would he score per game? From there, it was easy to ask, what would his W% be? From there, it was easy to ask, how many games would he win or lose? From there, it was easy to ask, how many games would he win compared to some baseline? And we ended up with absurd conclusions, that any sabermetric hater would see and laugh at, and rightfully so. Runs per out is a team measure. It is alright to apply to players, maybe not theoretically correct, but it will not cause too much distortion. But if you start jumping from just using it as a rate to doing all sorts of other stuff with it, then you will get distortion.

On that note, let me clarify something from earlier. I said that a class 1 run estimator, which measures multiplicative effects, like RC, should be done in terms of R/O. This could be seen as contradicting what I just said. The point is that in fact, a player’s RC is not an accurate representation of his runs contributed to his team, but the people who use it treat it as such. Nobody actually sets out to apply a full class 1 approach to a player, because they recognize that a player is not a team. What they do is apply a class 1 run estimator without realizing that it is incompatible with a full-blown class 2 or 3 evaluation approach.

Tuesday, December 27, 2005

A Review of "Baseball Superstats 1989"

Baseball Superstats 1989 was a book written by Steve Mann outlining his approach to sabermetric analysis, using his methods to make predictions for the 1989 season, and displaying complete career statistics for players. The “superstats” referred to in the title are his own Runs Above Average figures, based essentially on OPS for hitters and ERA for pitchers. The book, read sixteen years later, does not really provide any new sabermetric insight. Mann did note back then that a “plays made” approach to fielding would result in a very cloudy picture when it came to team fielding evaluation, because there are a certain number of outs that have to be made by somebody. This is now regarded as a truism in sabermetrics, as range factor methods have given way to advanced approaches based on play-by-play data, regression analysis, and assorted other techniques. Mann essentially predicts this in the book.

The pitcher ratings are structurally sound; they are essentially equivalent to Pitching Runs, except that Mann has normalized everything to an ideal context of 4.00 ERA, similar to how Clay Davenport would later construct an ideal league for his evaluations. The major quibble I have is with the use of one year park factors (for hitters too); one line in the book that made me chuckle is “It’s often not until July or August or even September that the run production rate at a ballpark will settle into its true range, causing early evaluations to bounce around wildly.” As if you can tell the true range of a ballpark from 81 games played there and on the road. Now if one wants to use a one year PF based on some consideration, be it about the weather or what have you, they can go ahead, but don’t try to pass it off as if you know the true range of the park.

The offensive ratings are, well, not so super, but I’m not sure Mr. Mann (at least his 1989 incarnation) would take kindly to hearing this. The book is not at all short on, ahem, naked displays of ego. For example, “[if you] would like to get clean, accurate, reliable answers to your most burning baseball questions, then you’ve come to the right oasis. The superstats are like a cool refreshing dip in a clear blue pool of common sense.” Of course they are. He later writes “There is a smattering of unofficial stats that have been foisted on the public by the baseball media in recent years that are generally even more seriously flawed…we won’t even go into the spate of statistical inventions that has flowed from the fertile minds of Earnshaw Cook, Bob Kingsley, Bill James, Tom Boswell, and other researchers and writers during the last quarter century.”

Now I suppose those two clauses don’t have to be connected, but it seems fairly clear to me at least that they are. Some of the stuff Cook and James came up with is flawed, but all of it was ground-breaking. And of course Mann leaves his atrocious offensive centerpiece out of this mix.

That offensive centerpiece is the Run Productivity Average. The RPA was developed by Mann based on tallies of runs scored and RBIs generated by various offensive events in that season’s Phillies games. So one team season is used as the baseline, and runs and RBI are used rather than run expectancy. This causes some serious problems. Mann’s method is to see what percentage of singles wind up scoring, and add the average number of RBI per single. That is the single coefficient (the SB and CS coefficients were figured through a different approach which he does not exactly explain). Then he does this for all of the events, sums them, and multiplies by .52 (runs are generally equal to 52% of R + RBI). This is the base estimate, and is:
(.51S + .82D + 1.38T + 2.63HR + .25W + .29HB + .15SB - .28CS)*.52 = .265S + .426D + .718T + 1.368HR + .13W + .151HB + .078SB - .146CS

As you can see, these weights are very low compared to the RE-based weights from Linear Weights, with the exception of the homer, which is already right around where it should be. Incredibly, Mann touts his run/RBI-based approach as easier to understand than the RE approach of Pete Palmer, whom Mann refers to reverently throughout the book.

Perhaps it is easier on first blush to understand the weights based on R and RBI, but understanding RE opens the door to a whole world of knowledge in sabermetrics: win expectancy, strategy analysis, linear weights, how context affects event values, etc. Any serious analyst or would-be sabermetrician should make the effort to learn how RE works. Of course, the LW approach also saves complexity later in the process, when Mann has to add bells and whistles to feign accuracy for the RPA.

The first is that the base estimate is low by about 52 runs/team, so he adds 52 as the “garbage constant”. Then he also adds a “corrector” for OBA, since by not accounting for outs, his method captures neither the value of avoiding outs nor their negative impact on run scoring. The OBA corrector is 3000*(OBA - .330). The reasoning is that each point of OBA away from the long-term average of .330 adds 3 runs. After this, the final formula as Mann writes it is:
RPA = (.51S + .82D + 1.38T + 2.63HR + .25W + .29HB + .15SB - .28CS)*.52 + 52 + 3000*(OBA - .330)
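As code, the full formula can be sketched as follows (the function name is mine, and OBA is figured as (H+W+HB)/(AB+W+HB)):

```python
# A sketch of Mann's full RPA formula as given above; the function
# name is mine. OBA is figured as (H+W+HB)/(AB+W+HB).
def rpa(S, D, T, HR, W, HB, SB, CS, AB, H):
    # run/RBI-derived event weights, scaled by .52
    base = (.51*S + .82*D + 1.38*T + 2.63*HR + .25*W + .29*HB
            + .15*SB - .28*CS) * .52
    oba = (H + W + HB) / (AB + W + HB)
    # garbage constant plus the OBA corrector
    return base + 52 + 3000 * (oba - .330)
```

For a team with exactly a .330 OBA the corrector drops out, and the estimate is just the base weights plus 52.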

How does this formula stack up in an accuracy test? I tried all teams 1980-1989 with the exception of the strike-shortened 1981. The SB version of James’ RC came in at a RMSE of 25.15. The SB version of ERP comes in at 23.64, and the SB version of BsR at 22.85. Mann’s RPA, with both of the correctors, comes in at 27.82. It is not really in the same ballpark.

However, Mann publishes a table based on Pete Palmer’s accuracy tests that shows RPA at 22.5, behind Batting Runs and OTS Plus, ahead of RC, DLSI, TA, OPS, DX, etc. I am not quite sure how that worked out, but perhaps the regression equations that Palmer uses helped Mann, because if he applied them yearly, the “garbage constant” would have been customized by year rather than a constant 52. I am not sure if this is what happened, but that’s my best guess.

Anyway, looking at the RPA formula, we do not have any way of knowing what the intrinsic linear weights used by the formula are. However, we can very closely approximate this. Each event has a linear weight, but these are complicated by the garbage constant and the OBA corrector. Let’s look at each separately. The garbage constant is 52 for every team. How can we apportion this across a team’s offensive events? Well, the one event that is pretty much a constant for every team is outs. So let’s take 52/(AB-H). This is our first value on the out, and it is positive.

The OBA corrector for a particular team gives them an additional 3000*(OBA-.330) runs. We can calculate this value for any team. For example, the 1980 Orioles had a .3441 OBA(figured as (H+W+HB)/(AB+W+HB)), so their corrector was 42.3 runs. We add these to the 52 above, and now have an out constant of (52+42.3)/(AB-H). For the O’s, this comes to .0232 runs per out, positive. A general equation for a team is (52+3000*(OBA-.330))/(AB-H).

There is another factor we have to consider with respect to the OBA corrector, though, which is that each on base event adds additional value through the corrector, and each out reduces the run estimate through it. We can differentiate OBA to approximate this. We’ll write OBA as N/P, where N = H+W+HB and P = AB+W+HB. Then the derivative of OBA is dOBA = (P*n-N*p)/P^2, where n is the derivative of N with respect to any event (one for on base events, zero for other events) and p is the derivative of P with respect to any event (one for any batting event). We can simplify those formulas to pdOBA, the derivative for an on base event, and ndOBA, the derivative for a batting out:
pdOBA = ((P-N)/P^2)*3000
ndOBA = (-N/P^2)*3000
I have multiplied by 3000 because the OBA corrector multiplies by 3000. For the Orioles, this means that each on base event raises their RPA by .318 runs and each out reduces their RPA by .167 runs. -.167+.0232 = -.1438, our final out coefficient. We then add .167 to the coefficients for each batting event. So we have:
RPA = (.13 + pdOBA)*W + (.151 + pdOBA)*HB + (.265 + pdOBA)*S + (.426 + pdOBA)*D + (.718 + pdOBA)*T + (1.368 + pdOBA)*HR + .078*SB - .146*CS + (ndOBA+52/(AB-H))*(AB-H)

If we use this formula to estimate runs, it has a RMSE of 27.85, just slightly worse than the official RPA formula. The estimates for teams generally agree within 2 or 3 runs (this formula has a RMSE of .252 in predicting RPA).

We can also approximate the adjustments for a hypothetical team. Mann assumes that each team will make 25 outs a game, or 4050 for a season, with a .330 OBA. This means they will have 4050/(1-.330) = 6044.78 PA, which means they have N = 6044.78-4050 = 1994.78. This gives them a pdOBA of .3325 and a ndOBA of -.1638. Applying this to the above formula, we have this 100% linear equation:
RPA = .598S + .759D + 1.051T + 1.701HR + .463W + .484HB + .078SB - .146CS - .151(AB-H)
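A quick check on this linearization, using Mann’s hypothetical team (variable names are mine; the pdOBA and ndOBA formulas follow the derivatives worked out above):

```python
# Reproducing the 100% linear RPA weights for Mann's hypothetical
# team: 4050 outs and a .330 OBA. Variable names are mine.
P = 4050 / (1 - .330)           # plate appearances, about 6044.78
N = P - 4050                    # times on base, about 1994.78
pdOBA = (P - N) / P**2 * 3000   # corrector value of an on base event
ndOBA = -N / P**2 * 3000        # corrector value of a batting out
base = {'S': .265, 'D': .426, 'T': .718, 'HR': 1.368, 'W': .13, 'HB': .151}
weights = {k: round(v + pdOBA, 3) for k, v in base.items()}
weights['out'] = round(ndOBA + 52/4050, 3)  # garbage constant spread over outs
```

This reproduces the coefficients above: .598, .759, 1.051, 1.701, .463, .484, and -.151 on the out.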

This formula has a RMSE of 27.71, slightly better than the official version, but still not nearly as accurate as the other run estimators. You can see why when you compare it to linear weights coefficients. RPA overvalues walks, and overvalues home runs with respect to other hits. That’s the basic reason why the RPA does not work very well.

Anyway, the full version of RPA divides by PA. There are also some corrections to apply it to individual hitters, splitting the 52 and the OBA corrector so that players are not treated as teams. But I’m not going to go into that, because this method is not accurate enough to waste my time on it further. And of course, putting it over PA will not give a proper rate stat, as we saw in part 3 of my rate stat series. The OBA corrector adjusts for the fact that outs are not considered at all and that the value of reaching base is underweighted; it does not address the effect of PA generation at the player level. In this respect RPA is just like ERP, RC, and BsR, which are better (though still not precisely) put into a rate by dividing by outs.

Mann then goes on to lay out the superstats for batters, which are said to closely approximate the true values. And he is right that these methods are not that bad. They are basically based on OPS. OBS, which is On Base plus Steals, is used. OBS = (H+W+HB+.5SB-CS)/(AB+W+HB). I’m not exactly sure why steals are included in the OBA, but they are. Then (OBS/LgOBS + SLG/LgSLG - 1) is essentially set equal to the percentage that a player exceeds the league R/PA by.

He also offers a quick superstats rate, which replaces the above equation with (2*OPS/LgOPS-1). These equations are perfectly acceptable ways to convert OPS to runs, but OPS has its own problems as have been well-established elsewhere.
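Both conversions can be sketched as simple functions (names are mine); either percentage, times the league R/PA and the player’s PA, gives a run estimate:

```python
# Sketches of the two superstats rate conversions described above.
# Function and argument names are mine.
def superstats_pct(obs, slg, lg_obs, lg_slg):
    # percentage of league R/PA credited to the player
    return obs / lg_obs + slg / lg_slg - 1

def quick_superstats_pct(ops, lg_ops):
    return 2 * ops / lg_ops - 1
```

A league-average player comes out at exactly 100% of the league R/PA either way.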

So basically, as far as I am concerned, Mann’s theory behind run production is flawed and overly complex, and his specific execution is nothing special in modern sabermetrics, or even 1989 sabermetrics.

It probably sounds as if I’m hammering the book. That’s not really my intention; I’m just trying to explain his methodology. It is not a bad book; Mann is obviously a smart fellow, and it is a book you might wish to have in your sabermetric library. But it does not live up to its title, and if you don’t read it, you won’t be missing out on any great sabermetric truth you cannot read anywhere else.

Monday, December 26, 2005

Ranking the Leadoff Hitters

The recent signing of Johnny Damon by the Yankees raised the question of “who is the best leadoff man in baseball?”, or at least “how did the ML leadoff hitters perform last year?” On one hand, this is silly, because in general, the guys who would be the best leadoff hitters are the guys who are the best hitters period. Albert Pujols would create more runs out of the leadoff spot than the best hitter who actually bats leadoff. And this is true for every lineup spot. But this is implicitly recognized by a lot of people, sabermetricians at least, when you ask the question. The question then becomes “of the players who actually hit leadoff, who is the best” or “whose talents are best suited to hitting leadoff”. There is also the issue that leadoff hitters are only guaranteed to lead off an inning once a game. They presumably will have fewer runners on base ahead of them than other hitters, because when they truly lead off, there is nobody on, and when they bat after others they follow the weaker hitters at the bottom of the lineup.

I will go through a number of different methods and show the top and bottom three teams in the league, as well as how the Yankees and Red Sox ranked last year. I have the complete leadoff stats for each team, from STATS Inc., which include all players who hit in the leadoff spot. In parentheses I have the primary leadoff hitter for the team. Some of these primary guys played almost 162 games out of the leadoff spot, while some might have led the team with 60 games. In one case, two players were so close in terms of playing time that I have designated them as co-primary leadoff hitters(Jason Ellison and Randy Winn in SF--Winn obviously did the bulk of the leading off after he was acquired).

Anyway, the basic job of a leadoff hitter is said to be to get on base or score runs. So we’ll start with Runs scored per 25.5 outs (AB-H+CS):
1. BOS(Damon), 6.75
2. NYA(Jeter), 6.55
3. PHI(Rollins), 6.04
MLB Avg, 5.24
28. MIN(Stewart), 4.18
29. LA(Izturis), 4.08
30. CHN(Hairston), 3.98
The MLB average in this case is the average for leadoff hitters. This average for all hitters will be pretty much equal to league runs/game. In some of the other categories below, I will provide the overall MLB figures to go along with the leadoff average.

Or, since these figures are of course dependent on the hitters coming up behind the leadoff man, we can look at getting on base, with On Base Average:
1. NYA(Jeter), .372
2. BAL(Roberts), .370
3. BOS(Damon), .364
MLB Avg, .337
28. COL(Barmes), .293
29. NYN(Reyes), .292
30. CHN(Hairston), .291
Interestingly, MLB leadoff men’s OBA of .337 is just slightly better than the overall OBA of .327. Sadly, the Rockies’ leadoff men put up a .293 OBA despite the fact that their park inflates rate stats by about six or seven percent, which would be .276 park-adjusted. Ouch.

OBA includes the times the runner gets on base, but it does not subtract the outs that they make once they are there. If you are leading off, and you get thrown out on the bases, you have done nothing to help your team, because there was nobody to advance. So let’s use what I have called Not Out Average, which in this case is (H+W-CS)/(AB+W):
1. NYA(Jeter), .365
2. BOS(Damon), .363
3. BAL(Roberts), .354
MLB Average, .323
28. COL(Barmes), .288
29. CHN(Hairston), .274
30. NYN(Reyes), .263
This list of course is very similar to the others, because all we have done is take out caught stealing. The MLB Average for all hitters was .320. There is a ten point difference between leadoff hitters and the average in OBA, but just three here, because leadoff men tend to get caught stealing more than other hitters, since they attempt more steals.
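The two rates can be computed from the basic totals like this (HB is not in this data set, so it is omitted; the function names are mine):

```python
# OBA and Not Out Average as used in this piece; no HB in the data,
# so OBA is simply (H + W)/(AB + W).
def oba(AB, H, W):
    return (H + W) / (AB + W)

def not_out_avg(AB, H, W, CS):
    # NOA docks the runner for outs made on the bases
    return (H + W - CS) / (AB + W)
```
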

We could also look at this in terms of Runners On Base Average. ROBA is the A factor from BsR, per PA. This subtracts HR as well as CS. We could offer this ranking of leadoff men on the grounds that their job is to set the table, and the home run clears the table. The implication is not necessarily that the HR is a bad thing, just that it is not something that lends itself to being a leadoff hitter. I do not personally support this line of thinking, but we can still look at a list:
1. BOS(Damon), .348
2. NYA(Jeter), .340
3. STL(Eckstein), .334
MLB Average, .305
28. TEX(Dellucci), .266
29. NYN(Reyes), .263
30. CHN(Hairston), .256
Texas leadoff men rate poorly here because they clubbed 37 home runs, between Dellucci(23), Matthews(8), Soriano(4), and DeRosa(2). They are well below average in OBA as well(.319, 25th), but they hit seventeen more longballs than the next highest team(Cleveland).

Another thing we can look at is Bill James’ Run Element Ratio, which is (W + SB)/EB. This is a ratio of things that “set up” innings over things that finish off innings(drive in the runs). It’s not a measure of who is the best leadoff man, it just gives an indication as to which players have strengths that are suited to batting earlier in the inning. If two hitters are equally productive, then the one with the highest RER might well be the better choice to leadoff. But a player can have a very high RER while being a terrible player on the basis of a complete lack of power:
1. CHA(Podsednik), 3.24
2. FLA(Pierre), 1.82
3. LAA(Figgins), 1.59
12. NYA(Jeter), .979
MLB Average, .977
22. BOS(Damon), .837
28. CLE(Sizemore), .678
29. KC(DeJesus), .598
30. TEX(Dellucci), .576
Here we see the first significant difference between Damon and Jeter, that being that Jeter’s talents are better suited to setting up an inning, at least according to RER. Again the Rangers’ power out of this spot puts them near the bottom. Grady Sizemore is often mentioned as a guy who will mature out of the leadoff spot, and this provides some evidence for that, although the Indians’ leadoff men hit ten triples, which increases their EB total but is not really a power indicator any more than doubles are. The RER for all players was .690, so you can see that this stat seems to incorporate some of the thinking that goes into choosing a leadoff man.
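RER can be sketched as below; I have taken EB to be extra bases on hits, i.e. total bases minus hits (D + 2T + 3HR), which is my reading of James’s definition:

```python
# Run Element Ratio: "setup" events over "cleanup" events. EB is
# assumed here to be extra bases on hits (D + 2*T + 3*HR).
def run_element_ratio(W, SB, D, T, HR):
    return (W + SB) / (D + 2*T + 3*HR)
```
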

Another Bill James tool which does speak directly to the question of best leadoff man, and is used by Bill for that purpose, is what I will call LE for Leadoff Efficiency. It is the number of runs per 25.5 outs that a leadoff man is expected to score. Apparently, this formula was introduced in the 1979 Abstract and has not changed since. The premise is that a leadoff man will score 35% of the time he is on first, 55% of the time from second, 80% of the time from third, and that he always scores on a home run(if only Bill would see that this last part is true in team run estimation as well). Times on first is singles plus walks minus stolen base attempts. Times on second is doubles plus steals, and times on third and home are triples and homers respectively. This method gives these rankings:
1. BAL(Roberts), 6.48
2. NYA(Jeter), 6.35
3. BOS(Damon), 6.28
MLB Average, 5.44
28. HOU(Taveras), 4.51
29. COL(Barmes), 4.34
30. CHN(Hairston), 4.31
You can see that this formula expected leadoff hitters to score 5.44 runs per game, but in fact they scored 5.24. Perhaps the formula had a bad year, or perhaps it is a little too rosy. One would expect that, with the increased overall offense in MLB since this formula was developed, it would estimate too low. But that was not the case, this year at least. Willy Taveras shows up here, and is the perfect example of a leadoff hitter who does nothing to help his team score runs. Taveras actually did not score many runs, ranking fourth from the bottom in R/G. Having speed can certainly be a plus from the leadoff spot. But getting on base is the key, because you can’t use your speed on the bench. Baltimore’s leadoff hitters, ranked on top here, would actually gain efficiency without their steal attempts, because they only stole bases at a 70% clip: their raw run estimate would decrease, but saving the twelve extra outs takes their efficiency up. Interestingly, the average for all players was 5.41, meaning that an average hitter would have scored runs with practically the same efficiency leading off as the real leadoff men.
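The LE calculation described above can be written out as a function (the name is mine; SB + CS is treated as total steal attempts):

```python
# Leadoff Efficiency: expected runs scored per 25.5 outs, using the
# .35/.55/.80/1.00 scoring rates from first/second/third/home.
def leadoff_efficiency(S, D, T, HR, W, SB, CS, AB, H):
    on_first = S + W - SB - CS   # steal attempts remove the runner from first
    on_second = D + SB           # doubles plus successful steals
    runs = .35*on_first + .55*on_second + .80*T + HR
    outs = AB - H + CS
    return runs * 25.5 / outs
```
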

Reminding ourselves that the main job of a leadoff hitter, just like for any hitter, is to create runs and avoid outs, we can look at good old RG:
1. BAL(Roberts), 6.13
2. BOS(Damon), 5.83
3. NYA(Jeter), 5.82
MLB Average, 4.77
28. HOU(Taveras), 3.66
29. COL(Barmes), 3.39
30. CHN(Hairston), 3.38
Again, Colorado has a pathetic showing despite a 15% boost from the park. The Cubs are also frequent residents at the bottom of these lists, which gives you some idea why they set out to get a leadoff hitter. Unfortunately for them, Juan Pierre is 26th at 3.96, and is also just 24th in LE/G at 4.81. The leadoff average of 4.77 is higher than the league R/G, so at least teams put an above average player in the leadoff spot.

Finally, we can look at three specialized measures derived from linear weights-type thinking. The first is to ask the question, “What would the batter’s RC be if he led off an inning in every plate appearance?” I will further add the assumption that if he attempts a steal, it is of second base and it comes while the second batter is at the plate. Ideally I would use a RE table based on current data, but for ease I will use the one published by Palmer and Thorn in The Hidden Game. The RE for the inning, before anything happens, is .454. This goes up to .783 with a runner at first and no outs, so times on first add .329 runs. With a runner at second, it is 1.068, so times on second add .614 runs. With a runner at third, it is 1.277, so times on third add .823 runs. Homers of course add one run. If he makes an out, it drops to .249, so outs are worth -.205 runs. This will be expressed in terms of RAA, which I will just keep as a total. So what I’ll call Pure Leadoff RAA can be written as:
PLRAA = .329(S + W - SB - CS) + .614(D + SB) + .823(T) + HR - .205(AB - H +CS)
1. BAL(Roberts), +25
2. ATL(Furcal), +22
3. BOS(Damon), +20.1
4. NYA(Jeter), +19.8
MLB Average, +5
28. HOU(Taveras), -15
29. CHN(Hairston), -18.7
30. COL(Barmes), -19.4
This can be considered an abstract rating for a leadoff hitter, because while they face this situation more than other batting order positions, it is only guaranteed to happen once a game.
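PLRAA as given above, written as a function (the name is mine), with the Hidden Game run expectancy weights:

```python
# Pure Leadoff RAA: the batter's runs above average if every PA
# led off an inning, using the Hidden Game RE weights derived above.
def plraa(S, D, T, HR, W, SB, CS, AB, H):
    return (.329*(S + W - SB - CS) + .614*(D + SB)
            + .823*T + HR - .205*(AB - H + CS))
```
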

In the end, it is clear that of the guys who actually hit leadoff last year, Johnny Damon was one of the best. As was Derek Jeter. So the Yankees didn’t need Damon because they had a deficiency at leadoff. What they did have was a deficiency, particularly defensive, in center field. Paying Damon thirteen million a year certainly seems excessive. He is a good player, but not one of the top players in the game. For the past three years his offensive RAA compared to an average center fielder has been 0, +20, and +16. That put him second among AL center fielders to Sizemore and 23rd among all AL batters. However, given that the Yankees apparently still have money to spend, and that the biggest need they had(offensively at least; I would like to have better starting pitchers if I were Brian Cashman) was a center fielder, the contract does not seem unjustifiable.

Tuesday, December 20, 2005

Win Shares Walkthrough, pt. 4

Splitting Defensive Win Shares between Pitching and Fielding

This process involves seven “Claim Point” formulas, which when combined give an estimate of the percentage of defense attributable to pitchers. Each one of these claim points is classified as either a “pitching” claim point, i.e. one that is attributable to the skill of the pitchers, or a “fielding” claim point that is attributable to the fielders, with the exception of the first. The first formula is based on the team’s Defensive Efficiency Record, which is an estimate of the percentage of balls hit into play against them that are turned into outs. This one, called CL-1, counts for both pitching and fielding. We first find the DER = (BF - H - W - K - HB)/(BF - HR - W - K - HB). The Braves faced 6015 batters, and allowed 1297 hits, 480 walks, 22 hit batters, 101 homers, and 1036 strikeouts. So their DER is (6015 - 1297 - 480 - 1036 - 22)/(6015 - 101 - 480 - 1036 - 22) = .7267. We then find adjDER as 1 - (1 - DER)/PF(S). The Braves’ PF(S) was .995, giving an adjusted DER of 1 - (1 - .7267)/.995 = .7253. The claim points are found by:
CL-1 = 100 + (adjDER - LgDER)*2500
The NL DER was .7114, so the Braves’ CL-1 is 100 + (.7253 - .7114)*2500 = 134.75

The second criterion is a pitching claim point, based on the strikeout rate. First we find the team’s strikeouts per game, KG = K*9/IP. The Braves had 6.408, and we easily convert this to CL-2:
CL-2 = (KG + 2.5)/7*200
For the Braves, (6.408 + 2.5)/7*200 = 254.51

The third criterion is a pitching claim point, based on walks compared to the league average. The formula is:
CL-3 = Lg(W + HB)/IP * TmIP - W - HB + 200
In words, we find the league average of walks and hit batters per inning, multiply by the team’s innings, and subtract the team in question’s W and HB to find out how many fewer than expected they allowed, then add this to 200. The league average was .3782 and the Braves pitched 1455 innings with 502 W+HB, so:
.3782*1455 - 502 + 200 = 248.28

The fourth criterion is another pitching claim point, based on home runs allowed. We find the number of homers less than expected, multiply by 5, and add to 200.
CL-4 = (LgHR/IP*TmIP - HR/PF(HR))*5 + 200
The league average was .0964 HR/IP, and the Braves allowed 101 homers with a 1.019 PF(HR), so:
(.0964*1455 - 101/1.019)*5 + 200 = 405.73

The fifth criterion is the first of two that are for fielding only, and it is based on the rate of errors and passed balls. This is put together like this:
CL-5 = (Lg(E + .5*PB)/INN*TmIP - E - .5*PB) + 100
The league average of errors and half of PB per inning was .0974, while the Braves committed 108 errors and 13 passed balls, resulting in:
(.0974*1455 - 108 - .5*13) + 100 = 127.22

The sixth criterion is the most complex; it compares the team’s double plays to the expected number of double plays (a fielding claim). First, calculate the percentage of non-HR hits that are singles in the league as Lg%S = S/(H - HR). Then make an estimate of Runners on First Base(RoF) for the team and the league:
RoF = (H - HR)*Lg%S + W + HB - SH - WP - BK - PB
In the 1993 NL, 77.8% of non-HR hits were singles. The Braves allowed 77 sacrifice hits, 46 wild pitches, and 9 balks (plus the 13 passed balls noted above), giving:
(1297 - 101)*.778 + 480 + 22 - 77 - 46 - 9 - 13 = 1287.49
The league has a RoF estimate of 19790. Expected DPs is the league DP per RoF, times the team RoF, times the ratio of team assists per inning to league assists per inning (this is used as an estimation of the opposing hitters’ groundball tendencies):
ExpDP = Lg(DP/RoF)*TmRoF*(A/IP)/Lg(A/IP)
The NL turned 2028 double plays, and the Braves recorded 1769 assists versus 24442 for the league(in 20284 innings). So the Braves had 1769/1455 = 1.216 assists/inning versus 24442/20284 = 1.205 for the league. Put it all together:
2028/19790*1287.49*1.216/1.205 = 133.14
CL-6 is just the excess double plays multiplied by 4/3, plus 100:
CL-6 = (DP - ExpDP)*4/3 + 100
For the Braves, who turned 146 DP, (146 - 133.14)*4/3 + 100 = 117.15

The seventh and final criterion is simply 405 times the team’s winning percentage:
CL-7 = 405*W%
For the Braves(104-58, .642), 405*.642 = 260

The percentage of defense attributable to pitching is the sum of the pitching claim points, plus 650, divided by the sum of all claim points (with CL-1 double counted because it is credited to both pitchers and fielders) plus 1097.5.
Pitch% = (CL-1 + CL-2 + CL-3 + CL-4 + CL-7 + 650)/(2*CL-1 + CL-2 + CL-3 + CL-4 + CL-5 + CL-6 + CL-7 + 1097.5)
For the Braves:
(134.75 + 254.51 + 248.28 + 405.73 + 260 + 650)/(2*134.75 + 254.51 + 248.28 + 405.73 + 127.22 + 117.15 + 260 + 1097.5) = .7026

Therefore, we will assign 70.26% of the Braves’ 182 defensive win shares to the pitchers, or 128. The Field% = 1 - Pitch%, of course, and PWS = Pitch%*DWS while FWS = Field%*DWS. The Braves’ fielders get 54 win shares collectively.
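The whole chain can be reproduced in a few lines (1993 Braves totals and NL constants as given above; small differences from the hand-rounded intermediates in the text are expected):

```python
# Reproducing the Braves' pitching/fielding split walkthrough above.
BF, H, W, HB, HR, K, IP = 6015, 1297, 480, 22, 101, 1036, 1455
E, PB, A, DP = 108, 13, 1769, 146
SH, WP, BK = 77, 46, 9
PFS, PFHR, WPCT = .995, 1.019, .642

der = (BF - H - W - K - HB) / (BF - HR - W - K - HB)   # about .7267
adj_der = 1 - (1 - der) / PFS                          # about .7253
cl1 = 100 + (adj_der - .7114) * 2500                   # DER vs league
cl2 = (K * 9 / IP + 2.5) / 7 * 200                     # strikeout rate
cl3 = .3782 * IP - (W + HB) + 200                      # walks vs league
cl4 = (.0964 * IP - HR / PFHR) * 5 + 200               # homers vs league
cl5 = (.0974 * IP - E - .5 * PB) + 100                 # errors and PB
rof = (H - HR) * .778 + W + HB - SH - WP - BK - PB     # runners on first
exp_dp = 2028 / 19790 * rof * (A / IP) / (24442 / 20284)
cl6 = (DP - exp_dp) * 4 / 3 + 100                      # double plays
cl7 = 405 * WPCT                                       # winning percentage
pitch_pct = (cl1 + cl2 + cl3 + cl4 + cl7 + 650) / (
    2*cl1 + cl2 + cl3 + cl4 + cl5 + cl6 + cl7 + 1097.5)
```

This comes out to about .7026, matching the figure above.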

There are a couple of constraints placed on these figures, but it doesn’t appear as if they are relevant today. The first is that the Pitch% must be between 60 and 75%, and the second, which takes precedence over the first, is that a team must have between .16375 and .32375 FWS/game.

My Take: I do not know exactly the logic behind these steps, because Bill does not explain what it is (specifically, I mean; obviously, the formulas are given and we know which criteria are credited to pitchers and which to fielders, but outside of that, we don’t know much), but I do not like this step. If Win Shares truly represents a step forward in measuring fielding, then the step that determines how much of a team’s defense the fielders deserve credit for is pretty darn important.

First let me explain the general logic behind the scales. Each CL-x formula has an average, which represents the amount of weight it is given. For example, the average CL-1 is 100, meaning it is weighted by 100. Let me just make a list of the averages:
CL-1 = 100
CL-2 = 200
CL-3 = 200
CL-4 = 200
CL-5 = 100
CL-6 = 100
CL-7 = 202.5
Most of these are easy to figure out, because they just take some difference from league average and add it to the number. Obviously, if the team is average, zero plus that number will equal that number. The only exceptions are CL-2, which assumes an average team will have 4.5 KG, and CL-7, which is just 405*.500 = 202.5.

If you plug those averages into the Pitch% formula, you will get .675, meaning that Win Shares assumes that an average defense is 67.5% attributable to pitching, which is in line with the 2/3 approximation that some sabermetricians use.
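To verify, plugging the average claim point values into the Pitch% formula:

```python
# Plugging the average value of each claim point into the Pitch%
# formula gives the assumed split for an average defense.
cl = {'cl1': 100, 'cl2': 200, 'cl3': 200, 'cl4': 200,
      'cl5': 100, 'cl6': 100, 'cl7': 202.5}
avg_pitch = (cl['cl1'] + cl['cl2'] + cl['cl3'] + cl['cl4'] + cl['cl7'] + 650) / (
    2*cl['cl1'] + cl['cl2'] + cl['cl3'] + cl['cl4']
    + cl['cl5'] + cl['cl6'] + cl['cl7'] + 1097.5)
# 1552.5 / 2300 = .675
```
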

For all I know, all of these formulas could be very well-founded and work. The only problem is, James does not explain the weightings or how he reached them. In such a crucial stage of the process, there is basically no justification offered other than “it works”. I can accept that it may give a reasonable estimate, but do not expect me to adopt your system unless I have some hard data or reasoning to back it up.

One thing that puzzles me is the willy-nilly mix of counting numbers and rates. CL-1 uses a rate, DER. CL-2 uses a rate, KG. CL-3 uses a count, walks above average. CL-4 uses a count, homers above average. CL-5 uses a count, errors above average. CL-6 uses a count, double plays above average. CL-7 uses a rate, winning percentage. This is completely flummoxing. Why does a pitching staff’s K ability get expressed as a rate, while their control ability gets expressed as a count? If a full season is played, these things should even out, but what about strike seasons? What if you try to use Win Shares in the middle of a season? This part won’t work. It takes a full season for the variances from expectation for walks, double plays, etc. to reach the proportions, relative to the rate-based claim points, that the formulas assume. KG is a number between 0 and 27 whether it is April 1 or the last day of the season. But how many errors above average can you possibly be after ten games?

I can’t express just how bizarre I think this is. Win Shares will not work if you use them in the middle of a season, because these formulas will not work. They will be comparing apples and oranges.

There are other questions to be asked too. For one thing, strikeouts are not compared to the league average. This makes absolutely perfect sense--after all, if everybody in the league has a KG of 9, that is four fewer outs in the field than in a league where everybody has a KG of 5, even if no teams deviate from the average. So I understand this step. But why aren’t walks treated the same way? After all, if there is a league like the late 40s AL with a billion walks, won’t everybody in that league allow a lot of runs due to walks, which are not controlled at all by the fielders? If one league has an average of 400 walks/team, and a second league has an average of 500 walks/team, the second league’s pitchers are all allowing a lot of runs without involving the fielders at all. And the same argument goes for home runs. In a “three true outcomes” game, in which every play is a homer, walk, or strikeout, you don’t need fielders. You can do the old Satchel Paige legend and have them sit on the mound.

This would cause problems in the Win Shares system, though, because one can argue against that by saying that we cannot know whether a certain walk rate is good or not unless we evaluate it against the context (read league average). 4 walks/game is a solid performance in the 1949 AL where the average is 4.5, but atrocious in the 1880 NL where the average is 1.1. But is this not true for Ks as well? Walter Johnson was a great strikeout pitcher in his day, but his strikeout rates look like Nate Cornejo’s compared to Nolan Ryan’s. Part of the problem, then, is that the pitching/fielding split is kept constant over time. If you had a league with a very low strikeout and walk rate, you would want to increase the fielding share, but still credit those pitchers that excelled in strikeouts and walks.

My point is that there are two aspects to the K/W/HR rates. One is obvious: how does it relatively compare to the other pitchers in the league. The second is more subtle: what do the absolute weights say about the importance of pitching in this league. The W and HR claim formulas address the first question, the K formula addresses the second question. Ideally, both questions would be answered. The first question might change the percentage of team defense attributed to pitching; the second would address the percentage that is assumed to be the case for an average team. In the "Three True Outcomes" game, fielding is zero. In a league in which there is a mix of the three true outcomes and balls in play, but all the pitchers allow them at the exact same rate, it makes no difference which pitchers you have, and so pitching must be zero.

James says that even with the use of .52/1.52 instead of .5/1.5, pitchers seem to rate too low. And perusing Win Shares lists, one tends to agree with him. According to Win Shares, as best as I can tell, the last pitcher ranked as the top player in the league was Steve Carlton in 1972. In the most recent year published in the book, 2001, the majors’ top rated pitcher was Randy Johnson with 25.70 Win Shares. The top five position players in each league easily exceeded this. The AL WS leader was Jason Giambi at 35.81, but fifth place Jim Thome came in at 29.44. The NL WS leader was Barry Bonds at 52.22, but Gary Sheffield came in fifth at 28.32.

One reason for the low ranking of pitchers could be a flaw in other sabermetric methods. For example, when we use Run Average or ERA together with a baseline to rank pitchers, we credit all of the marginal impact to the pitcher. But of course we also recognize that the defense has played a part in preventing those runs. So we may well be overrating the pitcher’s impact. While that is likely a factor, I think there is a much more basic reason for the relatively low rankings of pitchers: the fact that the pitchers’ share is determined at the team level, and not the individual level.

Win Shares holds that excellence in getting strikeouts and preventing walks and homers shows that more credit should be given to the pitchers. But of course the K, W, and HR skills vary wildly among individuals on a staff. When the Diamondbacks 2001 Win Shares were split up, Randy Johnson and Curt Schilling both were excellent in those areas to help bolster the team’s pitching share. But soft-tossers like Mike Morgan and Greg Swindell were on that team too, bringing it down. The pitching share is obviously different depending on who is on the mound. And different from era to era.

But Win Shares, by apportioning the Win Shares first to the pitching staff and then to individuals, uses the total team performance, and therefore cannot properly credit Johnson and Schilling. I would suggest that it could do more to credit them properly by emphasizing the three true outcomes in the pitcher claim point process which distributes the team pitching WS to the individual pitchers. But as we will see, the criteria there are essentially R, ER, IP, W, L, SV, and HLD: nothing that will allow Johnson to gain extra points for not relying on his defense (outside of the positive effect that excellence in the three true outcomes has on ERA, wins, etc.). What I am suggesting are additional bonuses for doing things that increase the team’s pitching share. I am not suggesting that this approach would be ideal; I think the best thing to do would be to allocate the percentage differently for each pitcher. However, the “bonuses” might be the easiest way to incorporate these factors into the Win Shares framework.

One final little thing is that the pitching/fielding split excludes pitchers from receiving fielding win shares. There is a good case to be made, I think, for doing this, given that pitchers’ runs allowed totals include whatever defensive impact they had. However, I am not sure how this would affect comparisons of the pitching/fielding breakdown to other estimates done by people who may have included pitchers’ fielding in the total share for fielding (of course, the fielding impact of pitchers is very small compared to other positions, at least as far as I know).

Component ERA in Win Shares
Component ERA (ERC) is used for a very small portion of the Win Shares method of distributing pitching win shares to individuals, but the formula is complex, so I put it in a separate section. ERC is an estimate of what a pitcher’s ERA should be given his component stats (IP, H, W, HR, etc.). It is based on the Runs Created model of runs = A*B/C, where A = baserunners, B = advancement, and C = PA.
A = H + W + HB
B = ((H - HR)*1.255 + HR*4)*.89 + (W + HB - IW)*.56
C = BF
You then take A*B/C to estimate runs allowed. Runs allowed are divided by innings and multiplied by 9 to estimate run average. Then, if the RA estimate is >= 2.24, subtract .56 to get ERC; if it is less than 2.24, multiply by .75 instead.

However, in the Win Shares methodology, Bill just adds back the earned run part, so we can ignore it. Therefore, I will simplify the formula and call it RAC (for Component RA). RAC = A*B*9/(BF*IP)

I will run through this for Mike Stanton, who on the basis of recording a save or hold will get credit for “save equivalent innings” and therefore will have RAC play into our evaluation of his performance. Stanton pitched 52 innings, allowing 51 hits, 29 walks (7 intentional), zero hit batters, and 4 homers while facing 236 batters. Therefore:
A = 51 + 29 + 0 = 80
B = ((51 - 4)*1.255 + 4*4)*.89 + (29 + 0 - 7)*.56 = 79.06
His RAC is 80*79.06*9/(236*52) = 4.64, significantly better than the 6.06 RA he actually allowed.
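As a sanity check on the arithmetic above, here is the RAC calculation sketched in Python (the function and argument names are mine, not James’):

```python
# Component RA (RAC): James' ERC with the earned-run conversion omitted,
# per the simplification above. Stats are Mike Stanton's line from the text.
def rac(h, w, hb, iw, hr, bf, ip):
    a = h + w + hb                                               # baserunners
    b = ((h - hr) * 1.255 + hr * 4) * .89 + (w + hb - iw) * .56  # advancement
    return a * b * 9 / (bf * ip)

print(round(rac(h=51, w=29, hb=0, iw=7, hr=4, bf=236, ip=52), 2))  # 4.64
```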

My Take: The ERC formula is pretty straightforward assuming use of Runs Created. Of course, I would prefer to see a more accurate run estimator used as the basis, but that’s not happening. 1.255 is an approximation of the average number of TB per non-HR hit. I am not exactly sure why Bill multiplies estimated TB by .89 and weights walks at .56, while in the regular RC formula TB are weighted around 1 and walks around .26.

The subtraction of .56 is also a little confusing, since I believe research shows that assuming a multiplier all the way is more accurate (approximately 90% of runs are earned). If anything, one would expect the ERA and RA of very good pitchers to be closer (linearly) than those of bad pitchers, so multiplying by 75% for low RAs would make it worse. Of course, this is of no consequence in Win Shares because the unearned part is added back in.

Monday, December 19, 2005

Win Shares Walkthrough, pt. 3

Runs Created Formulas Used in Win Shares
James’ own RC formula is the basis for dividing team Win Shares to individual hitters. The versions used in the book are the 24 “Historical Data Group” formulas published in the STATS All-Time Major League Handbook. For the modern era, this is only one formula.

Since publishing Win Shares, James has again revised his RC formula. Also, the RC formula used includes situational offensive data which I do not have for the Braves. Because of this, I will use the Tech-1 RC formula as the starting point in my analysis here. This formula is less accurate, but it will save a lot of hassle. In the spreadsheet, I have allowed entry of other coefficients so that you can use whichever RC variation you want.

Tech-1 RC is:
A = H + W + HB - CS - DP
B = TB + .26(W + HB - IW) + .52(SB + SH + SF)
C = AB + W + HB + SH + SF
We will use Jeff Blauser as our example player. He had an A factor of 264, a B factor of 300.82, and a C factor of 710. The “classic” RC construct A*B/C gives him 111.9. However, we use Theoretical Team RC here. For more on this, see the “Runs Created” article on my site. Basically, we add the player to a reference team of 8 players who hit at fairly average rates of A/C and B/C (.3 and .375 respectively). Then we calculate the team’s RC with our player and the team’s RC without our player. The difference is the number of runs our player has created. If you let a be the chosen value of A/C (.3) and b the chosen value of B/C (.375), we have:
TT RC = (A + 8*a*C)*(B + 8*b*C)/(9*C) - 8*a*b*C
With a = .3 and b = .375 (which is what Bill uses):
TT RC = (A + 2.4C)*(B + 3C)/(9C) - .9C
For Blauser:
TT RC = (264 + 2.4*710)*(300.82 + 3*710)/(9*710) - .9*710 = 109.65
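The TT RC computation can be sketched in Python as follows (names are mine); plugging in Blauser’s factors reproduces the 109.65 figure:

```python
# Theoretical Team RC: add the player to 8 reference hitters with
# A/C = a and B/C = b, then take the difference in team RC.
def tt_rc(A, B, C, a=.3, b=.375):
    return (A + 8*a*C) * (B + 8*b*C) / (9*C) - 8*a*b*C

print(round(tt_rc(264, 300.82, 710), 2))  # Blauser: 109.65
```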

This is only the initial RC estimate; we make two adjustments to it. First, an adjustment for hitting in certain situations. What this does is add one run for each “extra” homer with a man on base and one run for each “extra” hit with runners in scoring position. If AB(OB) is AB with runners on base, H(SP) is hits with men in scoring position, etc. then this adjustment is:
SIT = H(SP) - BA*AB(SP) + HR(OB) - (HR/AB)*AB(OB)
Since I do not have the situational data for the 1993 Braves, we’ll set this at zero for everybody.

The second adjustment is to reconcile the individual values to the team total of runs scored. We sum the individual runs created and divide that sum into the team runs scored. The Braves scored 767 runs, but the players are credited with 757 RC; 767/757 = 1.013 = RF (the reconciliation factor). Each individual’s final RC estimate is then:
Final RC = (TT RC + SIT)*RF
For Blauser, this is (109.65 + 0)*1.013 = 111
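A quick sketch of the reconciliation step in Python (variable names are mine):

```python
# Reconciliation: scale each player's (TT RC + SIT) by team runs / summed RC.
team_runs, summed_rc = 767, 757
rf = team_runs / summed_rc         # reconciliation factor, about 1.013
blauser_final = (109.65 + 0) * rf  # SIT is zero here for lack of data
print(round(blauser_final, 1))     # about 111.1
```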

My Take: RC is a flawed method, although using the TT version helps correct for this. The issue of RC’s weaknesses can be read about in my aforementioned article on it. For here, I will focus on the other stuff. The situational adjustments are perfectly appropriate in a value method, my only question is imprecision. Why one run for each extra hit and home run? The effect Bill found may well be close to one run, but I’m sure it’s not exactly one run. The reconciliation is completely appropriate so that the players are credited with the same number of runs that their collective efforts actually led to.

The use of constant A/C and B/C of .3 and .375 is another case of imprecision. If you want to estimate how many runs Jeff Blauser created in 1993, value-wise, the best thing to do would be to use the actual context of the Braves team. In the end, doing this will not change your estimate very much, because the differences between real teams are small enough that most hitters will not vary by much, and then you reconcile all of the individual estimates to the actual team runs scored anyway. If we use the Braves’ actual A/C of .299 and B/C of .404, Blauser’s RC estimate becomes 112.46 instead of our original 111.12. His OWS actually decreases, though, from 22.58 to 22.34, because of the changes in the other hitters’ performances. You will see how we calculate OWS in the next section.

Distributing Offensive Win Shares to Individuals
We now distribute the team Offensive Win Shares to individuals on the basis of their personal Marginal Runs. Each player’s Marginal Runs is found by:
MR = RC - LgR/Out*Out*PF(R)*.52
Where Out = AB - H + CS + SH + SF + DP
The NL average is .1662 R/O. Blauser created 111 runs, made 446 outs, and the PF(R) is 1.015. So Blauser had:
MR = 111 - .1662*446*.998*.52 = 72.53
We now total these for all players on the team, zeroing out any negative numbers. The Braves totaled 418 MR, so Blauser gets 72.53/418 of the 130 available win shares = 22.6 offensive win shares, the highest total of any Braves hitter.
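The marginal-run and OWS split above, sketched in Python (names are mine; the .998 factor is the one that appears in the worked arithmetic above):

```python
# Marginal runs: RC minus .52 of the park-adjusted league-average
# runs per out, times outs made.
def marginal_runs(rc, outs, lg_r_per_out=.1662, pf=.998):
    return rc - lg_r_per_out * outs * pf * .52

mr = marginal_runs(111, 446)
print(round(mr, 2))              # 72.53
print(round(mr / 418 * 130, 1))  # Blauser's share of the 130 OWS: 22.6
```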

My take: It only makes sense that if we apportion win shares to the offense based on its percentage of marginal runs, we do the same for individual players. The biggest problem in this step is zeroing out negative performers. Of course, negative performers are rare, because .52 of the league R/O is equivalent to about a .215 OW%, a level at which nobody keeps a job. The margin is set low enough that among players with significant playing time, only pitchers compile negative numbers, which will cause problems later. But this step as a whole is tough to argue with given the assumptions previously made.

The part of this process that I find most concerning is that the distribution of performance among a player’s teammates can impact his own rating. For example, the 1981 Blue Jays are one of the teams with the lowest percentage of win shares assigned to offense. James tells us that they earned 28.083 OWS. The most productive hitter on the team was John Mayberry, who created approximately 44.67 runs, 23.25 runs above the margin. The Blue Jays’ hitters totaled 74.74 MR, so Mayberry gets 31.1% of the OWS, for 8.74. So he (and the other Toronto hitters) got 1 WS for every 2.66 MR.

Looking at the team, though, two players with very significant playing time are zeroed out. Alfredo Griffin, in 408 PA, was 3.5668 runs below the margin, and Danny Ainge, in 269 PA, was 5.5554 runs below the margin. Suppose that we take those sub-marginal runs and subtract them from two players, Lloyd Moseby (+8.42 MR) and Otto Velez (+15.65 MR). By doing this, we have done nothing to change the run scoring of the team--their RC is still the same, and the team-based MR are still the same. But now the individual MR totals add up to only 65.62, for 2.336 MR/WS, pushing Mayberry up to 9.95 WS. So John Mayberry gains 1.21 Win Shares by moving around about 9.12 RC among his teammates.
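The thought experiment above can be verified with a few lines of Python (all figures are those quoted in the text):

```python
# Shifting Griffin's and Ainge's sub-marginal runs onto Moseby and Velez
# leaves team RC and team MR unchanged, but shrinks the sum of positive
# individual MR, so Mayberry's fixed 23.25 MR buys more Win Shares.
team_ows, mayberry_mr = 28.083, 23.25
before = 74.74                    # positive MR with Griffin/Ainge zeroed out
after = before - 3.5668 - 5.5554  # their deficits absorbed by teammates

print(round(mayberry_mr / before * team_ows, 2))  # 8.74
print(round(mayberry_mr / after * team_ows, 2))   # 9.95
```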

The distribution of the other players’ performance should not impact John Mayberry’s value. The overall performance of his team in a value method certainly can (and of course it does in Win Shares), but there is no reason why the distribution of his teammates’ RC should impact our assessment of his value.

WS advocates will probably point out that it is unusual to have significant playing time go to hitters as bad as Griffin and Ainge, and that this was a strike-shortened season so perhaps that is why they ended up so bad, and that therefore this is a nitpick. That is a reasonable viewpoint, but I don’t think there’s anything wrong with nitpicking, because it can point out things that perhaps could be improved upon. Is Win Shares fatally flawed and worthless because of this issue? No. But is it something that should be fixed (if possible)? Yes.

There are actually much worse cases than the Mayberry example I just demonstrated. James discusses an actual case in his book, in an article comparing the offensive seasons of Chuck Klein in 1930 and Carl Yastrzemski in 1968. Klein had 27% of his team’s MR, and thereby 27% of his team’s 91.72 OWS, for 25.04. Except there were some sub-marginal performances in there, which are zeroed out. Zeroing them out INCREASES the sum of individual marginal runs: with negatives included, the individual MR would add up to the team MR, so removing the negatives leaves an individual total greater than the team total. So Klein loses WS, and winds up with 24.09. In the Mayberry example, I eliminated sub-marginal performances by reducing super-marginal performances, thereby decreasing the individual total of marginal runs and causing Mayberry to gain WS. In practice, though, any variation between the team MR and the summed individual MR will drive WS down.

Yaz comes out even worse, dropping from 41.45 to 38.18. This is going to be a problem in any league in which the DH is not used, because a fair percentage of pitchers are sub-marginal. It will be less of an issue in the modern AL than in the NL, so now we have introduced a bias towards AL players. Going back to Klein and Yaz: Klein loses 3.8% of his OWS, and Yaz loses 7.9%.

And for what? The distribution of performance among his teammates, which, as I have already argued, should not impact our estimate of a player’s value. The fundamental problem here (which I will discuss again later when we get to fielders) is that Bill uses marginal runs to distribute absolute wins. But isn’t it intuitive that absolute wins must be distributed on the basis of absolute runs? We know that when dealing with team dynamics, it is necessary to accept negative figures. If you figure linear weights run estimates for hitters (like ERP, XR, etc.), the very bad players will get negative runs. This is unfortunate, but unavoidable, at least as far as we know. James’ system purports to give absolute win estimates, but cannot do so because it works from MR. Bill’s method has the exact same problems as linear absolute run estimators; he just chooses to ignore them, and by ignoring the negatives, we are forced to skew all of the positives.

The individual players’ OWS would add up to the team OWS using the percentage of team MR--if you accept negative numbers. This is why individual players’ ERP will add up to the team total. Negative absolute wins aren’t easy to grasp intuitively; they bother us. But what is worse: rating some shlub at -2, or artificially reducing Yaz’s value because he plays with shlubs? Who do we really care about ranking, shlubs or Hall of Famers? And of course, the shlubs deserve the negative numbers; it’s not like the negatives would inflate Yaz’s value. It’s the other way around: we force the shlubs up to zero, and since the slices of the pie must add up to the whole pie, we have to take some pie away from Carl, and Jeff Blauser, and Chuck Klein, and anybody else with teammates who turn in sub-marginal performances.

And the kicker is that many of the sub-marginal offensive players, particularly pitchers, would get negative OWS but still wind up with positive overall Win Shares. Since overall WS are the payoff of the system anyway, what is wrong with accepting negative component values within a positive overall value? Instead, everything gets distorted rather than accepting a few confusing but correct negative numbers for atrocious hitters.