Thursday, December 01, 2005

Revamping Speed Unit

Bill James came up with Speed Score as a way to estimate a player’s speed from his statistics. This of course is not perfect, because the ways speed manifests itself in baseball, such as hitting triples and stealing bases, don’t correlate perfectly with raw speed. But an estimate of “baseball speed” can be made by looking at various elements of the statistical record.

James used a number of categories in his Speed Scores, such as Range Factor and (offensive) double plays, which I either did not have readily available or, in the case of Range Factor, did not feel was appropriate given the advancement in defensive stats since his early work. So I came up with my own Speed Score-type method called Speed Unit. Of course, the idea for the specific categories used and the idea for the method itself are lifted from James.

Speed Unit uses four categories: Stolen Base Frequency (SBFrq), Weighted Stolen Base Percentage (WSB%), Runs/Time on Base (R/TOB), and Triples/Balls in Play (T/BIP). Each category is evaluated against the league average and the standard deviation, in an attempt to convert it into a z-score. Then the z-scores in each category are summed and converted into a total score. The total does not really keep the properties of a z-score and is no longer on a normally distributed scale, although the individual category scores are.

The Speed Unit values published on my website in the seasonal stats are now obsolete, because I discovered something I should have noticed a few years ago when I came up with the formula: the WSB% component was dragging all the scores down, because its mean was well below zero (which should be the mean for a z-score). The reason for this was quite simple. WSB% was developed by Bill James in Speed Score. It adds 3 to SB and 4 to CS for the purpose of calculating the percentage. That way, someone with 3 SB and 0 CS won’t come out at 100%, or 0 SB and 2 CS at 0%, etc. However, adding 3 SB and 4 CS is adding in a 3/(3+4) = 42.9% SB%. This drags everybody down. Then I compare to the league average SB%, which is whatever it is, and you get too many negative numbers. I fixed this by adding a constant to the WSB% component that offsets its previous, below-zero mean.
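
To make the drag concrete, here is a small Python sketch of the WSB% padding. The function name `wsb_pct` is my own label for illustration; the Rivera and Robles lines are the 2005 totals quoted later in this post, and the 30-for-40 runner is a made-up example.

```python
def wsb_pct(sb, cs):
    """Weighted SB%: pad the record with 3 steals and 4 times caught."""
    return (sb + 3) / (sb + cs + 7)

# With no attempts at all, the padding alone gives 3/7.
print(round(wsb_pct(0, 0), 3))

# The two examples from the text: neither lands at 100% or 0%.
print(round(wsb_pct(3, 0), 3))
print(round(wsb_pct(0, 2), 3))

# Rivera (1 for 10) and Robles (0 for 8), the 2005 WSB% trailers.
print(round(wsb_pct(1, 9), 3))
print(round(wsb_pct(0, 8), 3))

# Hypothetical runner: 30 SB, 10 CS is a raw 75%, but WSB% is pulled lower.
print(round(wsb_pct(30, 10), 3))
```

Because every player is pulled toward 42.9%, even a good base stealer's WSB% sits below his raw SB%, which is why comparing WSB% to the unpadded league SB% produces mostly negative numbers.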

The other change from the previous incarnation is how the standard deviation of each category is figured. Previously, I used a study of a few years of data for regular players and tied the standard deviation in a category, proportionally, to the league average. However, when I investigated this with a new dataset of 2003-2005 regulars, the highest correlation between standard deviation and league average was +.531 for R/TOB. I decided that correlations of this magnitude were not high enough to justify tying standard deviation to the league average, so I am now just using the three-year average standard deviation in all cases. This actually simplifies the formula, which I will now present.

First we find the four basic measures which are to be converted to z-scores:
SBFrq = (SB + CS)/(H + W - HR)
T/BIP = T/(AB - HR - K)
R/TOB = (R - HR)/(H + W - HR)
WSB% = (SB + 3)/(SB + CS + 7)
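
As a rough sketch, the four measures can be computed from a stat line like this. The function name, argument names, and the stat line itself are made up for illustration (W is walks, K is strikeouts, T is triples), and the sketch assumes the player has at least one time on base and one ball in play so the denominators are not zero.

```python
def basic_measures(ab, h, w, hr, k, t, r, sb, cs):
    """Return the four Speed Unit inputs for one player-season."""
    times_on_base = h + w - hr      # home runs excluded from times on base
    balls_in_play = ab - hr - k     # at bats that put the ball in play
    return {
        "SBFrq": (sb + cs) / times_on_base,
        "T/BIP": t / balls_in_play,
        "R/TOB": (r - hr) / times_on_base,
        "WSB%": (sb + 3) / (sb + cs + 7),
    }

# Hypothetical stat line, not a real player.
line = basic_measures(ab=600, h=180, w=50, hr=10, k=80, t=8, r=100, sb=40, cs=10)
for name, value in line.items():
    print(name, round(value, 3))
```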

Then we subtract the league average from each of these, and divide by the 3-year average standard deviation to get a z-score:
sbf = (SBFrq - LgSBFrq)/.0669 = (SBFrq - LgSBFrq)*14.95
tbip = (T/BIP - LgT/BIP)/.0063 = (T/BIP - LgT/BIP)*158.7
rtob = (R/TOB - LgR/TOB)/.0640 = (R/TOB - LgR/TOB)*15.63
wsb = (WSB% - LgSB%)/.1240 + 1.31 = (WSB% - LgSB%)*8.065 + 1.31
Speed Unit = 50 + 4.25*(sbf + tbip + rtob + wsb)

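The logic behind the final step is that 4.25*(1+1+1+1) + 50 = 67. So a player who averages one standard deviation above the mean in each speed component is rated at 67, a nod to the roughly two-thirds (about 68%) of a normal distribution that falls within one standard deviation of the mean. Strictly speaking, one standard deviation above the mean is about the 84th percentile, so the score should not be read as a percentile.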
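
Putting the formulas above together, a minimal Python sketch might look like the following. The standard deviations and the 1.31 offset come from the formulas in this post; the dictionary layout, the function name, and the league averages are assumptions made up for illustration.

```python
# Three-year (2003-2005) standard deviations and the wsb offset, from the post.
SD = {"SBFrq": 0.0669, "T/BIP": 0.0063, "R/TOB": 0.0640, "WSB%": 0.1240}
WSB_OFFSET = 1.31

def speed_unit(player, league):
    """player and league are dicts of the basic measures.

    The league dict holds plain SB% under "SB%", as in the wsb formula.
    """
    sbf = (player["SBFrq"] - league["SBFrq"]) / SD["SBFrq"]
    tbip = (player["T/BIP"] - league["T/BIP"]) / SD["T/BIP"]
    rtob = (player["R/TOB"] - league["R/TOB"]) / SD["R/TOB"]
    wsb = (player["WSB%"] - league["SB%"]) / SD["WSB%"] + WSB_OFFSET
    return 50 + 4.25 * (sbf + tbip + rtob + wsb)

# Hypothetical league averages, for illustration only.
league = {"SBFrq": 0.080, "T/BIP": 0.006, "R/TOB": 0.300, "SB%": 0.700}

# A made-up player exactly one standard deviation above average everywhere.
one_sd_up = {
    "SBFrq": league["SBFrq"] + SD["SBFrq"],
    "T/BIP": league["T/BIP"] + SD["T/BIP"],
    "R/TOB": league["R/TOB"] + SD["R/TOB"],
    # wsb = 1 requires (WSB% - LgSB%)/.1240 = 1 - 1.31
    "WSB%": league["SB%"] + (1 - WSB_OFFSET) * SD["WSB%"],
}
print(round(speed_unit(one_sd_up, league), 2))  # 67.0
```

Note that because of the 1.31 offset, a player whose WSB% merely equals the league SB% gets a positive wsb score, which is the intended correction for the padding.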

Anyway, I checked to see how well the z-scores match what they should. The mean should be 0, the standard deviation should be 1, the 25th percentile (first quartile) should be at about -.675, the 50th percentile (median) should be at 0, and the 75th percentile (third quartile) should be at about .675.
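
For anyone who wants to reproduce this kind of check, here is a sketch using Python's standard `statistics` module. The `summarize` helper is a hypothetical name of my own, and its use of the population standard deviation and the module's default quartile method are assumptions; the post does not say which conventions were used.

```python
from statistics import NormalDist, mean, pstdev, quantiles

# Target values for a true standard normal distribution.
std_normal = NormalDist()  # mean 0, standard deviation 1
targets = {
    "q1": std_normal.inv_cdf(0.25),
    "median": std_normal.inv_cdf(0.50),
    "q3": std_normal.inv_cdf(0.75),
}
print({name: round(value, 3) for name, value in targets.items()})

def summarize(z_scores):
    """Mean, SD, and quartiles for a list of z-scores in one category."""
    q1, med, q3 = quantiles(z_scores, n=4)
    return {
        "mean": mean(z_scores),
        "sd": pstdev(z_scores),   # population SD (an assumption)
        "q1": q1,
        "median": med,
        "q3": q3,
    }
```

Calling `summarize` on each category's list of player z-scores gives figures that can be compared directly against `targets`.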

It came out pretty well, in my biased opinion, but of course the actual standard deviations for the period were being used, and it would be less accurate taken outside of that dataset. The means for the four z-scores were (in order of sbf, tbip, rtob, wsb) (-.003, -.024, -.029, -.001). The standard deviations were (.999, .997, .991, 1.002), the first quartiles (-.729, -.710, -.693, -.791), the medians (-.275, -.188, -.067, -.049), and the third quartiles (.392, .533, .611, .704). As you can see, the medians all sit below the means, so the bulk of the players is bunched a little to the left with a longer tail of fast players to the right (what a statistician would call right-skewed). But not all that bad.

I also compared Speed Unit to the Speed Score that would be figured as the average of the four components I consider. The correlation was .948, not surprising considering that the same inputs were used. For SU as a whole, the mean was 49.96, the standard deviation 12.97, the first quartile 40.48, the median 48.15, and the third quartile 57.72. So that is not at all normally distributed, but we didn’t expect it to be.

Now for the fun stuff. I will run down the leaders and trailers for the 2005 leagues in each category, list the 5 fastest and slowest guys in each league, and the fastest and slowest players by position:
Category Leaders
CAT…………….AL…………………………..NL
SBFrq…………..Podsednik, CHA(.423)……...Reyes, NYN(.357)
R/TOB………….Womack, NYA(.489)…….....Burke, HOU(.454)
WSB%.................Soriano, TEX(.846)…………Bay, PIT(.828)
T/BIP…………...Gomes, TB(.028)……………Roberts, SD(.02)

Category Trailers
CAT…………….AL…………………………..NL
SBFrq…………...12 tied(0)…………………...12 tied(0)
R/TOB…………..LeCroy, MIN(.155)………...Clark, ARI(.150)
WSB%..................Rivera, LAA(.235)…………Robles, LA(.200)
T/BIP…………….24 tied(0)…………………..24 tied(0)

The trailers in stolen base attempts and triples are numerous, because lots of guys don’t hit a triple or attempt a steal. The WSB% trailers had horrible numbers; Rivera was 1 for 10 on steals and Robles 0 for 8.

5 Fastest
AL………………………………..NL
Crawford, TB(91)………………..Reyes, NYN(99)
Figgins, LAA(89)………………..Pierre, FLA(92)
Logan, DET(85)…………………Rollins, PHI(85)
Womack, NYA(83)……………...Furcal, ATL(85)
Podsednik, CHA(80)…………….Freel, CIN(83)

For the ten total, you have three centerfielders, three shortstops, three leftfielders, and a second baseman.

5 Slowest
AL………………………………..NL
LeCroy, MIN(26)………………..Snyder, ARI(27)
Hall, TB(27)……………………..Piazza, NYN(29)
BMolina, LAA(28)………………LaRue, CIN(29)
Giambi, NYA(30)……………….Matheny, SF(30)
Martinez, CLE(30)………………LaRoche, ATL(30)

Here you have seven catchers and three first basemen/DHs. That’s why it’s fun to break it down by position, so that we can get some more diversity (and see who maybe shouldn’t be in centerfield after all).

Fastest by Position
POS……….AL…………………………………NL
C…………..Rodriguez, DET(62)………………LoDuca, FLA(40)
1B…………Hinske, TOR(57)………………….Pujols, STL(59)
2B…………Soriano, TEX(71)…………………Freel, CIN(83)
3B…………Teahen, KC(59)…………………...Wright, NYN(54)
SS…………Lugo, TB(70)……………………...Reyes, NYN(99)
LF…………Crawford, TB(91)…………………Burke, HOU(69)
CF…………Figgins, LAA(89)…………………Pierre, FLA(92)
RF…………Suzuki, SEA(77)…………………..Cameron, NYN(66)
DH…………Dellucci, TEX(63)

Slowest by Position
POS……….AL…………………………………NL
C…………..Hall, TB(27)………………………Snyder, ARI(27)
1B…………Giambi, NYA(30)…………………LaRoche, ATL(30)
2B…………Cantu, TB(38)……………………..Vidro, WAS(41)
3B…………Crede, CHA(36)…………………..Bell, PHI(33)
SS…………Peralta, CLE(42)…………………..Robles, LA(36)
LF…………Anderson, LAA(38)……………….Burrell, PHI(32)
CF…………Williams, NYA(34)……………….Cruz, ARI(33)
RF…………Ordonez, DET(33)…………………Jenkins, MIL(40)
DH…………LeCroy, MIN(26)

You can also try applying it to teams, although the range will obviously be much smaller. You also have to ditch R/TOB, since for a team that is mostly a reflection of advancement ability (it is actually the BsR “Score Rate”), and just use plain old SB% rather than WSB%, since they all attempt steals (and you also drop the “+1.31” from the wsb formula). Instead of 4.25 times the sum of the four z-scores, you take 5.67 times the sum of the three z-scores. Anyway, the fastest teams in baseball were the Mets, the Devil Rays, and the Phillies (all around 57-58). The slowest were the A’s, Nationals, and Dodgers (in the 39-43 range). The correlation between SU and runs scored was +.101, and the correlation with W% was +.093. Of course, these correlations, based on thirty teams in one season, should be taken with an ocean-full of salt.
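
The team version can be sketched the same way. Since the post gives no team-level standard deviations, this hypothetical function simply takes the three z-scores as inputs; how they are scaled is left as an assumption.

```python
def team_speed_unit(sbf_z, tbip_z, sb_z):
    """Team Speed Unit from three z-scores: SBFrq, T/BIP, and plain SB%."""
    return 50 + 5.67 * (sbf_z + tbip_z + sb_z)

# The 5.67 multiplier keeps the scale in line with the player version:
# 5.67 * 3 is about 17, the same as 4.25 * 4.
print(round(team_speed_unit(1, 1, 1), 2))  # 67.01
print(team_speed_unit(0, 0, 0))            # 50.0
```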
