Tuesday, September 30, 2008

Silly Playoff "Thoughts"

DISCLAIMER: This is not an analytical post. Nothing in this post should be taken seriously. This post is a waste of your time.

If you are one of those people who likes to bet on football games, and do so based on tips from those radio hucksters, then boy, do I have a World Series tip for you. You know the clowns I’m talking about--the fast talkers with a super duper lock of the week that you can buy for $10, who toss in a bunch of ridiculous win-loss records (“The Panthers are 3-11 ATS in their last 14 November home games”) as if they are meaningful.

On that level, you should pick the Chicago White Sox to win the World Series this year. Not because of anything they have done on the field, but because they are my least preferred playoff team. The team that I would have least wanted to win of the eight has won it all in 2001, 2002, 2003, and 2005. That’s four out of the last seven years. Not only that, but the Chicago White Sox with Ozzie Guillen as their manager and Nick Swisher relegated to the bench for Ken Griffey and DeWayne Wise are my least favorite playoff team of my time as a baseball fan.

Not only that, but I picked the White Sox to finish fourth in the AL Central this year. My pick of fourth in the AL Central has been a springboard to greatness for the 2005 White Sox and the 2006 Tigers. Not only that, but my fourth place NL Central pick from 2005 won their pennant. It’s getting bad enough that I think I am going to pick the Indians fourth next year just for the heck of it.

My personal preference for this year would go something like this:

1) Brewers
2) Phillies
3) Red Sox
4) Dodgers
5) Rays
6) Cubs
7) Angels
700,000,000,000) White Sox

I really have nothing against the city of Chicago (seriously!), but the Cubs get no sympathy from me just because they haven’t won since 1908. I just happen to like the other teams better than I like them. I will not be upset if any of those seven teams win.

The doomsday scenario: Cubs and White Sox play for all the marbles as Chicago is awarded the 2016 Olympics and the worse of the two fools wins the presidency. Chicago Uber Alles!

You have just wasted a few minutes of your life if you made it this far; don’t feel too bad, I wasted a few more of mine writing it. Really, though, I am trying to make a point in a roundabout way. I think that a lot of the playoff analysis that you see out there, even from analytical sites, is kind of silly and overwrought.

It’s pretty hard to pick the outcome of five- and seven-game series contested between two good teams with a great deal of accuracy. That’s not to say that you shouldn’t try to do it, or that it can’t be a fun activity, but I’m not going to join you this time (I have before and reserve the right to do so again). So what you have here is the ultimate anti-analytical approach--who do I want to win? And if you use irrelevant coincidence as the basis for your predictions, the Calcetines Blancas may be your guys.

Monday, September 29, 2008

End of Season Statistics, 2008

Note: This is largely the same explanation as last year; the only significant change is that I switched from FIP to a Base Runs-centric DIPS. Admittedly, this is completely unnecessary, but I decided to use BsR where I could to back up my advocacy for it. It certainly doesn’t hurt, but it is needlessly complicated for that (DIPS) application. I have also added an "R" column, which stands for rookie. I based this on the list of choices for the IBA Rookie of the Year listed at Baseball Prospectus. I can't guarantee that I marked every rookie (I tried), but I believe I got all the serious ROY hopefuls at the very least.

For the past several years I have been posting Excel spreadsheets with sabermetric stats like RC for regular players on my website. I have not been doing this because I think it is a unique thing that nobody else does--Hardball Times, Baseball Prospectus, and other sites have similar data available. However, since I figure my own stats for myself anyway, I figured I might as well post it on the net.

This year, I am not putting out Excel spreadsheets, but I will have Google Spreadsheets that I will link to from both this blog and my site. What I wanted to do here is a quick run down of the methodology used. These will be added as they are completed; as I post this, there are none, but by the end of the week they should start popping up.

First, I should acknowledge that the primary data source is Doug’s Stats, and that park data for past seasons comes from KJOK’s park database. Baseball-Reference.com and ESPN.com round out the sources.

The general philosophy of these stats is to do what is easiest while not being too imprecise, unless you can do something just a little bit more complex and be more precise. Or at least it used to be. Then I decided to put my money where my mouth was on the matter of Base Runs for pitchers and teams and Pythagenpat. On the other hand, using ERP as the run estimator is not optimal--I could, in lieu of having empirical linear weights for 2008, use Base Runs or another approach to generate custom linear weights. I have decided that does not constitute a worthwhile improvement. Others might disagree, and that’s alright. I’m not claiming that any of these numbers are the state of the art or cannot be improved upon.

First, the team report. I list Park Factor (PF), Winning %, Expected Winning % (EW%), Predicted Winning % (PW%), Wins, Losses, Runs, Runs Allowed, Runs Created (RC), Runs Created Allowed (RCA), Runs/Game (R/G), Runs Allowed/Game (RA/G), Runs Created per Game (RCG), and Runs Created Allowed per Game (RCAG):

EW% is based on runs and runs allowed in Pythagenpat, with the exponent = RPG^.29. PW% is based on runs created and runs created allowed in Pythagenpat.
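
In code, the calculation is nearly a one-liner. Here is a minimal Python sketch (the function name and arguments are mine, just for illustration); PW% is the same function fed RC and RCA instead of actual runs:

def pythagenpat(r_per_g, ra_per_g):
    # the Pythagenpat exponent is total runs per game raised to the .29 power
    x = (r_per_g + ra_per_g) ** .29
    return r_per_g ** x / (r_per_g ** x + ra_per_g ** x)

# e.g., a team scoring 5.0 and allowing 4.5 runs per game:
# pythagenpat(5.0, 4.5) comes out to about .550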

Runs Created and Runs Created Allowed are both based on a simple Base Runs formula. For the offense, the formula is:
A = H + W - HR - CS
B = (2TB - H - 4HR + .05W + 1.5SB)*.76
C = AB - H
D = HR
For the defense:
A = H + W - HR
B = (2TB - H - 4HR + .05W)*.78
C = AB - H (approximated as IP*2.82, or whatever the league (AB-H)/IP average is)
D = HR
Of course, these are both put together, like all BsR, as A*B/(B + C) + D. The only difference between the formulas is that I include SB and CS for the offense, but don’t want to waste time scrounging up stolen bases allowed for the defense.
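
As a sketch of how that goes together in code (my own Python translation, with names chosen for illustration):

def bsr_offense(ab, h, tb, hr, w, sb, cs):
    # A = baserunners, B = advancement, C = outs, D = automatic runs (HR)
    a = h + w - hr - cs
    b = (2*tb - h - 4*hr + .05*w + 1.5*sb) * .76
    c = ab - h
    d = hr
    return a * b / (b + c) + d

The defensive version is the same shape: drop SB and CS, use the .78 multiplier on B, and approximate AB - H as IP*2.82 (or the league rate) as noted above.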

R/G, RA/G, RCG, and RCAG are all calculated straightforwardly by dividing by games, then park adjusted by dividing by park factor. Ideally, you use outs as the denominator, but for teams, outs and games are so closely related that I don’t think it’s worth the extra effort.

Next, we have park factors. I have explained the methodology used to figure the PFs before, but the cliff’s notes version is that they are based on five years of data when applicable, include both runs scored and allowed, and they are regressed towards average (PF = 1), with the amount of regression varying based on the number of years of data used. There are factors for both runs and home runs. The initial PF (unshown) is:
iPF = (H*T/(R*(T - 1) + H) + 1)/2
where H = RPG in home games, R = RPG in road games, T = # teams in league (14 for AL and 16 for NL). Then the iPF is converted to the PF by taking 1- (1-iPF)*x, where x = .6 if one year of data is used, .7 for 2, .8 for 3, and .9 for 4+.
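
A sketch of the whole PF calculation in Python (names mine), using the regression schedule just described:

def park_factor(home_rpg, road_rpg, teams, years):
    # initial factor; the +1 and /2 blend the park with a neutral road slate
    ipf = (home_rpg * teams / (road_rpg * (teams - 1) + home_rpg) + 1) / 2
    # regress toward 1 based on how many years of data went in (.9 for 4+)
    x = {1: .6, 2: .7, 3: .8}.get(years, .9)
    return 1 - (1 - ipf) * x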

It is important to note, since there always seems to be confusion about this, that these park factors already incorporate the fact that the average player plays 50% on the road and 50% at home. That is what the adding one and dividing by 2 in the iPF is all about. So if I list Fenway Park with a 1.02 PF, that means that it actually increases RPG by 4%.

In the calculation of the PFs, I did not get picky and take out “home” games that were actually at neutral sites, like the Astros/Cubs series that was moved to Milwaukee. They simply don’t cause that big of a problem. Suppose Enron Field (I have nothing against corporate stadium names, but I refuse to learn the new ones when they come along) was a perfectly average park in a league in which there are 4.8 runs/game. At 81 home and road games per year, in the previous four years the Astros and their opponents would have scored 3110.4 runs at home and on the road.

If this season, the Astros played four “home” games in an extreme environment in which say 20 runs were scored per game, they would have 819.2 runs added in to the home total. The road games would contribute 777.6 runs to the five-year total. Now, for the five years the Astros’ home games would average 9.70272 RPG versus 9.6 for the road games. The park factor, when fully figured with the regression factor, would be 1.0045, when we know that it should be 1.0000. I’m not going to spend too much time worrying about that kind of discrepancy, and that’s a high-end example of what the discrepancy would actually be. And I round off to two decimal places anyway, so both would end up 1.00.

Next is the relief pitchers report. I defined a starting pitcher as one with 15 or more starts. All other pitchers are eligible to be included as a reliever. If a pitcher has 40 appearances, then they are included. Additionally, if a pitcher has 50 innings and less than 50% of his appearances are starts, he is also included here (this allows some swingmen type pitchers who wouldn’t meet either the minimum start or appearance standards to get in).

For all of the player reports, ages are based on simply subtracting the player’s year of birth from 2008. I realize that this is not compatible with how ages are usually listed, and so “Age 27” doesn’t necessarily correspond to age 27 as I list it, but it makes everything a heckuva lot easier, and I am more interested in comparing the ages of the players to their contemporaries, in which case it makes very little difference.

Anyway, for relievers, the statistical categories are Games, Innings Pitched, Run Average (RA), Relief Run Average (RRA), Earned Run Average (ERA), Estimated Run Average (eRA), DIPS-style estimated Run Average (dRA), Guess-Future (G-F), Inherited Runners per Game (IR/G), Inherited Runs Saved (IRSV), hits per ball in play (%H), Runs Above Average (RAA), and Runs Above Replacement (RAR).

All of the run averages are park adjusted. RA is R*9/IP, and you know ERA. Relief Run Average subtracts IRSV from runs allowed, and thus is (R - IRSV)*9/IP; it was published in By the Numbers by Sky Andrecheck. eRA, dRA, %H, and RAA will be explained in the starters section.

Guess-Future is a JUNK STAT. G-F is A JUNK STAT. I just wanted to make that clear so that no anonymous commentator posts that without any explanation. It is just something that I have used for some time that combines eRA and strikeout rate into a unitless number. As a rule of thumb, anything under 4 is pretty good. I include it not because I think it is meaningful, but because it is a number that I have been looking at for some time and still like to, despite the fact that it is a JUNK STAT. JUNK STATS can be fun as long as you recognize them for what they are. G-F = 4.46 + .095(eRA) - .113(KG), where KG is strikeouts per 9 innings. JUNK STAT JUNK STAT JUNK STAT JUNK STAT JUNK STAT

Inherited Runners per Game is per relief appearance (G - GS); it is an interesting thing to look at, I think, in lieu of actual leverage data. You can see which closers come in with runners on base, and which are used nearly exclusively to start innings. Of course, you can’t infer too much; there are bad relievers who come in with a lot of people on base, not because they are being used in high leverage situations, but because they are long men or what have you. I think it’s mildly interesting, so I include it.

Inherited Runs Saved is the difference between the number of inherited runs an average reliever would have allowed to score, given the same number of inherited runners, and the number the reliever actually allowed. I do not park adjust this figure. Of course, the way I am doing it is without regard to which base the runners were on, which of course is a very important thing to know. Obviously, with a lot of these reliever measures, if you have access to WPA and LI data and the like, that will probably be more significant.

IRSV = Inherited Runners*League % Stranded - Inherited Runs Scored

Runs Above Replacement is a comparison of the pitcher to a replacement level reliever, which is assumed to be a .450 pitcher, or as I would prefer to say, one who allows runs at 111% of the league average. So the formula is (1.11*N - RRA)*IP/9, where N is league runs/game. Runs Above Average is simply (N - RRA)*IP/9.
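
Pulling the reliever formulas together in one Python sketch (names mine; lg_strand is the league percentage of inherited runners stranded, n is league runs per game, and park adjustment of RRA is omitted for brevity):

def reliever_value(r, ip, inherited, inherited_scored, lg_strand, n):
    # IRSV per the formula above
    irsv = inherited * lg_strand - inherited_scored
    rra = (r - irsv) * 9 / ip
    raa = (n - rra) * ip / 9
    # replacement reliever allows runs at 111% of league average (a .450 pitcher)
    rar = (1.11 * n - rra) * ip / 9
    return rra, raa, rar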

On to the starting pitchers. The categories are Innings Pitched, Run Average, ERA, eRA, dRA, KG, G-F, %H, Neutral W% (NW%), Quality Start% (QS%), RAA, and RAR.

The run averages (RA, ERA, eRA, dRA) are all park-adjusted, simply by dividing by park factor.

eRA is figured by plugging the pitcher’s stats into the Base Runs formula above (the one not including SB and CS that is used for estimating team runs allowed), multiplying the estimated runs by nine and dividing by innings.

dRA is a DIPS method (which of course means that Voros McCracken is the true developer), using Base Runs as the run estimator. This is overkill, since a DIPS estimator like FIP will work just fine, but I decided to use Base Runs wherever I could this year. To find it, first estimate PA as IP*x + H + W, where x = Lg(AB-H)/IP. Then find %K (K/PA), %W (W/PA), %HR (HR/PA), and BIP% = 1 - %K - %W - %HR. Next, find estimated %H as BIP%*Lg%H (I will just call this %H for the sake of this explanation, but it is not the same as the %H displayed in the stats, which is the pitcher’s actual rate, (H-HR)/(estimated PA-W-K-HR)).

Then you use BsR to find the new estimated RA:

A = %H + %W

B = (2*(%H*Lg(TB-4*HR)/(H-HR) + 4*%HR) - %H - 5*%HR + .05*%W)*.78

C = 1 - %H - %W - %HR

D = %HR

dRA = (A*B/(B+C) + D)/C*25.2/PF
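
Strung together in Python, the whole dRA procedure looks something like this (a sketch; lg_outs_per_ip is Lg(AB-H)/IP, lg_bip_h is the league hit rate on balls in play, and lg_tb_per_hit is Lg(TB-4*HR)/(H-HR)):

def dra(ip, h, hr, w, k, lg_outs_per_ip, lg_bip_h, lg_tb_per_hit, pf):
    pa = ip * lg_outs_per_ip + h + w        # estimated plate appearances
    pk, pw, phr = k / pa, w / pa, hr / pa
    bip = 1 - pk - pw - phr
    ph = bip * lg_bip_h                     # estimated non-HR hit rate per PA
    a = ph + pw
    b = (2 * (ph * lg_tb_per_hit + 4 * phr) - ph - 5 * phr + .05 * pw) * .78
    c = 1 - ph - pw - phr
    d = phr
    return (a * b / (b + c) + d) / c * 25.2 / pf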

Neutral Winning Percentage is the pitcher’s winning percentage adjusted for the quality of his team. It makes the assumption that all teams are perfectly balanced between offense and defense, and then projects what the pitcher’s W% would be on an average team. I do not place a lot of faith in anything based on wins and losses, of course, and particularly not for a one-year sample. In the long run, we would expect pitchers to pitch for fairly balanced teams and for run support for an individual to be the same as for the pitching staff as a whole. For individual seasons, we know that things are not going to even out.

I used to use Run Support to compare a pitcher’s W% to what he would have been expected to earn, but now I have decided that is more trouble than it is worth. RS can be a pain to run down, and I don’t put a lot of stock in the resulting figures anyway. So why bother? NW% = W% - (Mate + .5)/2 +.5, where Mate is (Team Wins - Pitcher Wins)/(Team Decisions - Pitcher Decisions).

Likewise, I include Quality Start Percentage (which of course is just QS/GS) only because my data source (Doug’s Stats) includes them. As for RAA and RAR for starters, RAA = (N - RA)*IP/9, and RAR = (1.25*N - RA)*IP/9.
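
For what it’s worth, here are the starter formulas in the same Python sketch style (names mine):

def starter_value(w, l, team_w, team_l, ra, ip, n):
    # NW%: compare the pitcher's W% to his teammates' with his decisions removed
    mate = (team_w - w) / (team_w + team_l - w - l)
    nw = w / (w + l) - (mate + .5) / 2 + .5
    raa = (n - ra) * ip / 9
    # replacement starter allows runs at 125% of league average (a .390 pitcher)
    rar = (1.25 * n - ra) * ip / 9
    return nw, raa, rar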

For hitters with 300 or more PA, I list Plate Appearances (PA), Outs (O), Batting Average (BA), On Base Average (OBA), Slugging Average (SLG), Runs Created (RC), Runs Created per Game (RG), Secondary Average (SEC), Speed Unit (SU), Hitting Runs Above Average (HRAA), Runs Above Average (RAA), Hitting Runs Above Replacement (HRAR), and Runs Above Replacement (RAR).

I do not bother to include hit batters, so take note of that for players who do get plunked a lot. Therefore, PA are simply AB + W. Outs are AB - H + CS. BA and SLG you know, but remember that without HB and SF, OBA is just (H + W)/(AB + W). Secondary Average = (TB - H + W)/AB. I have not included net steals as many people (and Bill James himself) do--it is solely hitting events.

The park adjustment method I’ve used for BA, OBA, SLG, and SEC deserves a little bit of explanation. It is based on the same principle as the “Willie Davis method” introduced by Bill James in the New Historical Baseball Abstract. The idea is to deflate all of the positive offensive events by a constant percentage in order to make the new runs created estimate from those stats equal to the park adjusted runs created we get from the player’s actual stats. I based it on the run estimator (ERP) that I use here instead of RC.

X = ((TB + .8H + W - .3AB)/PF + .3(AB - H))/(TB + W + .5H)

X is unique for each player and is the deflator. Then, hits, walks, and total bases are all multiplied by X in order to park adjust them. Outs (AB - H) are held constant, so the new At Bat estimate is AB - H + H*X, which can be rewritten as AB - (1 - X)*H. Thus, we can write BA, OBA, SLG, and SEC as:

BA = H*X/(AB - (1 - X)*H)
OBA = (H + W)*X/(AB - (1 - X)*H + W*X)
SLG = TB*X/(AB - (1 - X)*H)
SEC = SLG - BA + (OBA - BA)/(1 - OBA)
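
Here is the whole adjustment as a Python sketch (again, my own translation for illustration):

def willie_davis_adjust(ab, h, tb, w, pf):
    # X deflates positive events so adjusted ERP matches park-adjusted ERP
    x = ((tb + .8*h + w - .3*ab) / pf + .3*(ab - h)) / (tb + w + .5*h)
    adj_ab = ab - (1 - x) * h              # outs (AB - H) held constant
    ba = h * x / adj_ab
    oba = (h + w) * x / (adj_ab + w * x)
    slg = tb * x / adj_ab
    sec = slg - ba + (oba - ba) / (1 - oba)
    return ba, oba, slg, sec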

Next up is Runs Created, which as previously mentioned is actually Paul Johnson’s ERP. Ideally, I would use a custom linear weights formula for the given league, but ERP is just so darn simple and close to the mark that it’s hard to pass up. I still use the term “RC” partially as a homage to Bill James (seriously, I really like and respect him even if I’ve said negative things about RC and Win Shares), and also because it is just a good term. I like the thought put in your head when you hear “creating” a run better than “producing”, “manufacturing”, “generating”, etc. to say nothing of names like “equivalent” or “extrapolated” runs. None of that is said to put down the creators of those methods--there just aren’t a lot of good, unique names available. Anyway, RC = (TB + .8H + W + .7SB - CS - .3AB)*.322.

RC is park adjusted, by dividing by PF, making all of the value stats that follow park adjusted as well. RG, the rate, is RC/O*25.5. I do not believe that outs are the proper denominator for an individual rate stat, but I also do not believe that the distortions caused are that bad. (I still intend to finish my rate stat series and discuss all of the options in excruciating detail, but alas you’ll have to take my word for it now).
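
In code, RC and RG amount to a couple of lines (a sketch; outs include CS per the definition above):

def rc_and_rg(ab, h, tb, w, sb, cs, pf):
    rc = (tb + .8*h + w + .7*sb - cs - .3*ab) * .322 / pf   # ERP, park adjusted
    outs = ab - h + cs
    return rc, rc / outs * 25.5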

Speed Unit is my own take on a “speed skill” estimator a la Speed Score. I AM NOT CLAIMING THAT IT IS BETTER THAN SPEED SCORE. I don’t use Speed Score because I always like to make up my own crap whenever possible (while of course recognizing that others did it first and better), because some of the categories aren’t readily available, and because I don’t want to mess with square roots. Anyway, it considers four categories: runs per time on base, stolen base percentage (using Bill James’ technique of adding 3 to the numerator and 7 to the denominator), stolen base frequency (steal attempts per time on base), and triples per ball in play. These are then converted to a pseudo Z-score in each category, and are on a 0-100 scale. I will not reprint the formula here, but I have written about it before here. I AM NOT CLAIMING THAT IT IS BETTER THAN SPEED SCORE. I AM NOT CLAIMING THAT IT IS AS GOOD AS SPEED SCORE.

There are a whopping four categories that compare to a baseline; two for average, two for replacement. Hitting RAA compares to a league average hitter; it is in the vein of Pete Palmer’s Batting Runs. RAA compares to an average hitter at the player’s primary position. Hitting RAR compares to a “replacement level” hitter; RAR compares to a replacement level hitter at the player’s primary position. The formulas are:

HRAA = (RG - N)*O/25.5
RAA = (RG - N*PADJ)*O/25.5
HRAR = (RG - .73*N)*O/25.5
RAR = (RG - .73*N*PADJ)*O/25.5

PADJ is the position adjustment, and it is based on 1992-2001 data. For catchers it is .89; for 1B/DH, 1.19; for 2B, .93; for 3B, 1.01; for SS, .86; for LF/RF, 1.12; and for CF, 1.02.
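
A sketch of the four value stats in Python, with the PADJ values keyed by position (1B and DH share 1.19, per the text):

PADJ = {'C': .89, '1B': 1.19, 'DH': 1.19, '2B': .93, '3B': 1.01,
        'SS': .86, 'LF': 1.12, 'RF': 1.12, 'CF': 1.02}

def value_stats(rg, outs, n, pos):
    padj = PADJ[pos]
    hraa = (rg - n) * outs / 25.5
    raa = (rg - n * padj) * outs / 25.5
    hrar = (rg - .73 * n) * outs / 25.5
    rar = (rg - .73 * n * padj) * outs / 25.5
    return hraa, raa, hrar, rar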

How do I deal with players who split time between teams? I assign all of their statistics to the team with which they played more, even if this means it is across leagues. This is obviously the lazy way out; the optimal thing would be to look at the performance with the teams separately, and then sum them up.

You can stop reading now if you just want to know how the numbers were calculated. The rest of this post will be of a rambling nature and will discuss the underpinnings behind the choices I have made on matters like park adjustments, positional adjustments, run to win converters, and replacement levels.

First of all, the term “replacement level” is obnoxious, because everyone brings their preconceptions to the table about what that means, and people end up talking past each other. Unfortunately, that ship has sailed, and the term “replacement level” is not going away. Secondly, I am not really a believer in replacement level. I don’t deny that it is a valid concept, or that comparisons to replacement level can be useful for answering certain questions. I just don’t believe that replacement level is clearly the correct baseline. I also don’t believe that it’s clearly NOT the correct baseline, and since most sabermetricians use it, I go along with the crowd in this case.

The way that reads is probably too wishy-washy; I do think that it is PROBABLY the correct choice. There are few things in sabermetrics that I am 100% sure of, though, and this is certainly not one of them.

I have used distinct replacement levels for batters, starters, and relievers. For batters, it is 73% of the league RG, or since replacement levels are often discussed in these terms, a .350 W%. For starters, I used 125% of the league RA or a .390 W%. For relievers, I used 111% of the league RA or a .450 W%. I am certainly not positive that any of these choices are “correct”. I do think that it is extremely important to use different replacement levels for starters and relievers; Tango Tiger convinced me of this last year (he actually uses .380, .380, .470 as his baselines). Relievers have a natural RA advantage over starters, and thus their replacements will as well.

Now, park adjustments. Since I am concerned about the player’s value last season, the proper type of PF to use is definitely one based on runs. Given that, there are still two paths you can go down. One is to park adjust the player’s statistics; the other is to park adjust the league or replacement statistics when you plug in to a RAA or RAR formula. I go with the first option, because it is more useful to have adjusted RC or adjusted RA, ERA, etc. than to only have the value stats adjusted. However, given a certain assumption about the run to win converter, the two approaches are equivalent.

Speaking of those RPW: David Smyth, in his Base Wins methodology, uses RPW = RPG. If the RPG is 9.4, then there are 9.4 runs per win. It is true that if you study marginal RPW for teams, the relationship is not linear. However, if you back up from the team and consider things in league context, one can make the case that the proper approach is the simple RPW = RPG.

Given that RPW = RPG, the two park factor approaches are equivalent. Suppose that we have a player in an extreme park (PF = 1.15, approximately like Coors Field) who has an 8 RG before adjusting for park while making 350 outs in a 4.5 N league. The first method of park adjustment, the one I use, converts his value into a neutral park, so his RG is now 8/1.15 = 6.957. We can now compare him directly to the league average:

RAA = (6.957 - 4.5)*350/25.5 = +33.72

The second method would be to adjust the league context. If N = 4.5, then the average player in this park will create 4.5*1.15 = 5.175 runs. Now, to figure RAA, we can use the unadjusted RG of 8:

RAA = (8 - 5.175)*350/25.5 = +38.77

These are not the same, as you can obviously see. The reason for this is that they are in two different contexts. The first figure is in a 9 RPG (2*4.5) context; the second figure is in a 10.35 RPG (2*4.5*1.15) context. Runs have different values in different contexts; that is why we have RPW converters. If we convert to WAA, then we have:

WAA = 33.72/9 = +3.75
WAA = 38.77/10.35 = +3.75
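
If you want to check that in code, the example reduces to a few lines of Python:

rg, n, pf, outs = 8.0, 4.5, 1.15, 350
raa_park_adj = (rg / pf - n) * outs / 25.5    # +33.7, in a 9.00 RPG context
raa_lg_adj = (rg - n * pf) * outs / 25.5      # +38.8, in a 10.35 RPG context
waa_1 = raa_park_adj / (2 * n)                # RPW = RPG = 9.00
waa_2 = raa_lg_adj / (2 * n * pf)             # RPW = RPG = 10.35
# waa_1 and waa_2 both come out to about +3.75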

Once you convert to wins, the two approaches are equivalent. This is another advantage for the first approach: since after park adjusting, everyone in the league is in the same context, there is no need to convert to wins at all. Sure, you can convert to wins if you want. If you want to compare to performances from other seasons and other leagues, then you need to. But if all you want to do is compare David Wright to Prince Fielder to Hanley Ramirez, there is no need to convert to wins. Personally, I think that stating something as +34 is a lot nicer than stating it as +3.8, if you can get away with it. None of this is to deny that wins are the ultimate currency, but runs are directly related to wins, and so there is no difference in conclusion from using them as long as the RPW is the same for all players, which it is for a given league season when you park adjust runs rather than the context.

Finally, there is the matter of position adjustments. What I have done is apply an offensive positional adjustment to set a baseline for each player. A second baseman’s RAA will be figured by comparing his RG to 93% of the league average, while a third baseman’s will compare to 101%, etc. Replacement level is set at 73% of the estimated average for each position.

So what I am doing is comparing to a “replacement hitter at position”. As Tango Tiger has pointed out, there is really no such thing as a “replacement hitter” or a “replacement fielder”--there are just replacement players. Every player is chosen because his total value, both hitting and fielding, is sufficient to justify his inclusion on the team. Segmenting it into hitting and fielding replacements is not realistic and causes mass confusion.

That being said, using “replacement hitter at position” does not cause too many distortions. It is not theoretically correct, but it is practically powerful. For one thing, most players, even those at key defensive positions, are chosen first and foremost for their offense. Empirical work by Keith Woolner has shown that the replacement level hitting performance is about the same for every position, relative to the positional average.

The offensive positional adjustment makes the inherent assumption that the average player at each position is equally valuable. I think that this is close to being true, but it is not quite true. The ideal approach would be to use a defensive positional adjustment, since the real difference between a first baseman and a shortstop is their defensive value. When you bat, all runs count the same, whether you create them as a first baseman or as a shortstop.

Figuring what the defensive positional adjustment should be, though, is easier said than done. Therefore, I use the offensive positional adjustment. So if you want to criticize that choice, or criticize the numbers that result, be my guest. But do not claim that I am holding this up as the correct analytical structure. I am holding it up as the most simple and straightforward structure that conforms to reality reasonably well, and because while the numbers may be flawed, they are at least based on an objective formula. If you feel comfortable with some other assumptions, please feel free to ignore mine.

One other note here is that since the offensive PADJ is a proxy for average defensive value by position, ideally it would be applied by tying it to defensive playing time. I have done it by outs, though. For example, shortstops have a PADJ of .86. If we assume that an average full-time player makes 10% of his team’s outs (about 408 for a 162 game season with 25.5 O/G) and the league has a 4.75 N, the average shortstop is getting an adjustment of (1 - .86)*4.75/25.5*408 = +10.6 runs. However, I am distributing it based on player outs. If you have one shortstop who makes 350 outs and another who makes 425 outs, then the first player will be getting 9.1 runs while the second will be getting 11.1 runs, despite the fact that they may both be full-time players.

The reason I have taken this flawed path is because 1) it ties the position adjustment directly into the RAR formula rather than leaving it as something to subtract on the outside and more importantly 2) there’s no straightforward way to do it. The best would probably be to use defensive innings--set the full-time player to X defensive innings, figure how Derek Jeter’s innings compare to X, and adjust his PADJ accordingly. Games in the field or games played are dicey because they can cause distortion for defensive replacements. Plate Appearances avoid the problem that outs have of being highly related to player quality, but they still have the illogic of basing it on offensive playing time. And of course the differences here are going to be fairly small (a few runs). That is not to say that this way is preferable, but it’s not horrible either, at least as far as I can tell.

Given the inherent assumption of the offensive PADJ that all positions are equally valuable, once we have a player’s RAR, we should account for his defensive value by adding on his runs above average relative to a player at his own position. If there is a shortstop out there who is -2 runs defensively versus an average shortstop, he is without a doubt a plus defensive player, and a more valuable defensive player than a first baseman who was +1 run better than an average first baseman. Regardless, since we have implicitly assumed that they are both average defensively for their position when RAR was calculated, the shortstop will see his value docked two runs. This DOES NOT MEAN that the shortstop has been penalized for his defense. The whole process of accounting for positional differences, going from hitting RAR to positional RAR, has benefited him.

It is with some misgivings that I publish “hitting RAR” at all, since I have already stated that there is no such thing as a replacement level hitter. It is useful to provide a low baseline total offensive evaluation that does not include position, though, and it can also be thought of as the theoretical value above replacement in a world in which nobody plays defense at all.

The DH is a special case, and it caused a lot of confusion when my MVP post was linked at BTF last year. Some of that confusion has to do with assuming that any runs above replacement methodology is the same as VORP from Baseball Prospectus. Obviously there are similarities between my approach and VORP, but there are also key differences. One key difference is that I use a better run estimator. Simple, humble old ERP is, in my opinion, a superior estimator to the complex MLV. I agree with almost all of the logic behind MLV--but using James’ Runs Created as the estimator to fuel it is putting lipstick on a pig (this is a much more exciting way of putting it in the 2008 context, don’t you think?).

The big difference, though, as it relates to the DH, is that VORP considers the DH to be a unique position, while I consider DHs to be in the same pool as first basemen. The fact of the matter is that first basemen outhit DHs. There are any number of potential explanations for this; DHs are often old or injured, hitting as a DH is harder than hitting as a position player, etc. Anyway, the exact procedure for VORP is proprietary, but it is apparent that they use some sort of average DH production to set the DH replacement level. This makes the replacement level for a DH lower than the replacement level for a first baseman.

A couple of the aforementioned nimrods took the fact that VORP did this and assumed that my figures did as well. What I do is evaluate 1B and DH against the same replacement RG. This actually helps first basemen, since the DHs drag the average production of the pool down, thus resulting in a lower replacement level than I would get if I considered first basemen on their own. Contrary to what the chief nimrod thought, this is not “treating a 1B as a DH”. It is “treating a 1B as a 1B/DH”.

It is true, however, that this method assumes that a 1B and a DH have equal defensive value. Obviously, a DH has no defensive value. What I advocate to correct this is to treat a DH as a bad defensive first baseman, and thus knock another five or ten runs off of his RAR for a full-time player. I do not incorporate this into the published numbers, but you should keep it in mind. However, there is no need to adjust the figures for first basemen upwards, despite what the nimrods might think. The simple fact of the matter is that first basemen get higher RAR figures by being pooled with the DHs than they would otherwise.

2008 Park Factors

2008 Leagues

2008 Teams

2008 AL Relievers

2008 NL Relievers

2008 AL Starters

2008 NL Starters

2008 AL Hitters

2008 NL Hitters

Tuesday, September 23, 2008

Most Vacuous Post

I hate to even write a post like this because it is the kind of thing that sportswriters love to write about--controversial, timely, and of very little actual importance in the grand scheme of things. So if you’re not interested, don’t read it.

There is a lot of opinion involved with this topic, on all sides. When you state your own opinion, some people will inherently conclude that you are disrespecting others. While there certainly are views on this topic that I don’t particularly respect, because I think they are inane or ridiculous, my intention here is simply to present my position, arguing against others only in the limited way that is required when you try to state your own.

The other problem is that this topic has been discussed by so many people for so many years that there really is nothing new to say. So there’s certainly no claim that any line of argument here is unique.

Nonetheless, here are my thoughts on the criteria for the MVP award, presented here so that I don’t have to cover the topic when I actually pick my IBA ballot.

First, let me reprint here the actual instructions that are sent to each BBWAA MVP voter:

Dear Voter:

There is no clear-cut definition of what Most Valuable means. It is up to the individual voter to decide who was the Most Valuable Player in each league to his team. The MVP need not come from a division winner or other playoff qualifier.

The rules of the voting remain the same as they were written on the first ballot in 1931:

1. Actual value of a player to his team, that is, strength of offense and defense.
2. Number of games played.
3. General character, disposition, loyalty and effort.
4. Former winners are eligible.
5. Members of the committee may vote for more than one member of a team.

You are also urged to give serious consideration to all your selections, from one to 10. A 10th-place vote can influence the outcome of an election. You must fill in all 10 places on your ballot.

Keep in mind that all players are eligible for MVP, and that includes pitchers and designated hitters.

Only regular-season performances are to be taken into consideration.

As Larry Mahnken points out in the THT piece in which the quoted text appeared, the first line basically allows the individual voter to define “most valuable” any which way they want. So how should an individual, whether they are a BBWAA voter or not, define “value”?

That is a question that, to paraphrase someone who is in the news a lot now (and with any good fortune at all will vacate that position in six weeks or so), is above my pay grade. The definition of “value” is by no means clear and is not agreed upon even in the sabermetric community where matters like definitions of terms are taken seriously. In this case, I do believe that it is a matter of personal discretion. Therefore, I will try to explain my personal approach. And it is just that, and if any BTF readers think this is indulgent, then by all means, stop reading now.

The fundamental starting point from which I come is this: the name that was chosen for the award need not be viewed through the lens of how a sabermetrician would define “value” in its most literal sense. This could also be called the “WPA is not the boss of me” principle.

Some people in the sabermetric community seem to see “valuable” and immediately jump to using what I in the past have called “literal value” methods. The most prominent example is WPA, but there are other methods that would qualify.

This response is completely understandable if the name chosen for the award is taken literally. However, I think that it is a little bit silly to read more into the name than its creators put into it. I seriously doubt that the BBWAA, when deciding to start a year-end award to honor an individual player, put a lot of effort into debating whether it should be called “Most Valuable”, “Most Outstanding”, “Best”, etc. If they did, the instructions that they left for the voters to make their choice leave scant evidence of it, as they are very bland and do not demand any particular viewpoint on what the award represents.

Had the award been called something else, I wonder how the debates about the award down through the years would have gone. If it was a “Most Outstanding Player” award, would there be a large group of people claiming that one could only truly be outstanding on a contender? Is the wish to recognize players from contenders a product of the award name, or a product of a natural desire that would manifest itself even if we were discussing the “Top Performing Player”?

In fact, I wonder if even our sabermetric perspective on what constitutes value has not been shaped by the MVP Award and the ensuing debates. I don’t mean to suggest that what is measured by the “value” metrics (WPA, Win Shares, Value-Added Batting Runs, etc. depending on how exactly you define the term) is not meaningful or that they are not worth calculating. However, I don’t dismiss the possibility that people have attached the “value” label to these metrics at least in part due to the fact that they incorporate context in a way that a baseball fan expects thanks to the MVP debates. All I’m suggesting is that the language may have evolved a bit differently in their absence.

Anyway, I choose to not take the “value” literally, as I don’t see compelling evidence that the original intent of the award was to do so, and I don’t see anything in the criteria for the award that compels me to do so. So let me give my take on each one of the criteria:

1. Actual value of a player to his team, that is, strength of offense and defense.

Again, “actual value”, interpreted sabermetrically, can lead to a conclusion at odds with mine. However (and also again), I don’t believe that interpreting the letter of the rules set out by a group of 1930s baseball writers through the eyes of a 2000s sabermetrician is required. For a non-sports analogy that will get me in trouble with some of you (but it’s September of an election year and I don’t do it that often), I believe this is similar to interpreting the Constitution. The original intent of the authors and the commonly understood meaning of the words at the time of their writing trump any modern reading of the document.

Even if I did feel compelled to hold the opposite viewpoint, they had to go mess it up by clarifying the definition of value, and talking about “strength of offense and defense”. What immediately jumps to my mind here is rate statistics, context-neutral or not. I believe that this interpretation is supported by:

2. Number of games played.

I have a rate, and now I have to balance it against playing time, and I feel comfortable in using value above replacement to address the first two considerations, regardless of what you use to fuel the RAR/VORP/WARP/etc.

Personally, I use a linear weights formula based on the player’s overall statistics as the starting point for my RAR estimates. That means that I am not considering the game situation in which his performance occurred. I adjust for park, but only for the value of runs in the park--if there is some other characteristic of the park (like being doubles-friendly or benefiting left-handed pull hitters) that the player is able to exploit, I don’t care. The player is creating actual wins for his team if he is able to do so.

I do tend to consider situational performance as a tiebreaker. If two candidates are very close, and one has a WPA or WPA/LI or VABR clearly superior to the other, then I might bump him ahead. However, I do not use those metrics as a starting point.

Why? Why not? It’s a matter of personal preference. I think that an award that honors the player who demonstrated the most ability (not a precise term for what I described above) is more interesting than an award for the player who had the best combination of performance and timing. And again, I don’t feel that the “valuable” part of the title needs to be interpreted in a sabermetric sense. Maybe you do, and I’m okay with that.

I also do not believe that WPA and similar measures are necessarily the correct way to measure literal value. They are a measure of real-time value. You could also consider a backwards-looking perspective, in which all runs were equally valuable in the end. You could take this viewpoint a step further and claim that any performances in team losses really didn’t have any value, since they were for naught in the end. A forwards-looking definition of value would not measure value in the sense that sabermetricians generally do; rather it would measure what is often referred to as “ability”. The point is that there are a number of different perspectives from which value can be defined, and any number of roads that you can take from that point. I do not accept the premise that WPA is necessarily the best road.

3. General character, disposition, loyalty and effort.

This serves, as Bill James might say, as a BS dump. I’m not a psychologist/sociologist/behaviorist and I try to avoid playing one on the internet, so I’ll give everybody equal marks here in the vast majority of cases. My point is not that you should look only at the statistics--it is just that I will not be bullied into incorporating the common perceptions for individuals on these points (ARod is a choker, Jeter is clutch, Bonds is a cancer, etc.) If you feel you have insight, go ahead and use it. Just don’t feel compelled to follow the crowd, and don’t expect others to treat your opinion as hard evidence.

In defining the two criteria above in terms of numbers, I’m NOT saying that you should take the list of RAR leaders and simply copy it onto your ballot without taking anything else into account. However, I do think it is helpful to use such a list as a starting point, and make adjustments as you deem necessary.

4. Former winners are eligible.
5. Members of the committee may vote for more than one member of a team.

These anachronisms are necessary because at least one earlier incarnation of the MVP award had the opposite rules, and so the voters needed to be reminded of things we all take for granted now.

An ancillary point that has been brought to the forefront this year due to the performance of Sabathia and to a lesser extent Ramirez is how to deal with players who switch leagues during the season. My position has always been that it is an award for NL MVP, and thus only performance that creates value in the National League should count. However, this is admittedly sort of arbitrary, particularly in this brave new world of interleague play. With the lines between the leagues blurred more than ever, one could argue that some of a player’s performance in the AL (against NL opponents) is providing value by damaging the opponents of his new team. Or one could just argue that holding on firmly to an AL/NL schism is outdated. Nonetheless, I’m sticking with my position, but without a whole lot of conviction and no desire to claim any sort of high ground.

Finally, a brief bit on the Cy Young and Rookie of the Year awards. It is incredibly hard to find information online about the rules for these awards. I have always approached the Cy Young as being for the best pitcher, and not incorporated batting value into the mix. However, I don’t have any compelling reasoning behind this and have absolutely no qualms with the viewpoint of anyone who wants to include non-pitching contributions. You can probably chalk up my exclusion of offense to sheer laziness.

For Rookie of the Year, I have always treated it exactly as I have the MVP, except limited to rookies. I have never made any allowances for age, potential, or players with top-level experience (in modern times, read Japanese veterans, although Negro Leaguers fit the bill in the early years of the award). It seems that the BBWAA has had a little bit of a backlash against Japanese players in the last few years after giving awards to Nomo, Sasaki, and Suzuki, but I see no compelling reason to follow along.

Tuesday, September 16, 2008

Meanderings

...thus, you are free to disregard it even more than a normal post.

*If the only difference between a no-hitter and a one-hitter (in a specific case) is a matter of scorer’s discretion, then why should I treat that particular no-hitter as something special? I believe this is a case where people elevate statistics above the game itself. Regardless of how the play was scored, the outcome was the same--a runner on first base.

This is not a diatribe against the error, although I could write one. If MLB had overruled the scorer’s decision, what would that mean? It’s even less obviously meaningful in a case in which it was the pitcher’s supposed error that is in question. Either way, he was responsible for what happened.

I think that much of the problem here is the error itself; the silly belief the error promotes that, by tracking it, we can separate pitching from fielding, when they are in fact much more deeply intertwined. For my money, the pitcher’s job is to both pitch and field when appropriate, and what matters is the overall performance, rather than a somewhat arbitrary segmentation of his role into two distinct parts. Whether it actually was a hit or an error, Sabathia allowed a runner to reach first base. In the terms that actually matter (runs, outs, wins), that’s the bottom line.

(In case anyone reads this in five years, the CC Sabathia one-hitter of August 31 is the game in question.)

*When I have occasion to discuss the matter of “Who’s the better player” with people not versed in sabermetrics (I generally try to avoid this, but it’s a topic that’s fairly pervasive when you talk baseball, and so it’s sometimes necessary), I still find that there are people who absolutely balk at the idea that outs are what a player’s overall production should be measured against.

In my arrogant and stupid early days as a sabermetrician (some things never change), I used to argue for PA as the denominator for Runs Created, not outs. However misguided I may have been, I never dismissed out of hand the idea that the number of outs a player makes is unimportant once you account for how many times he has been to the plate. You might argue that I implicitly did so by not agreeing with the outs camp, and that’s fair, but I understood the idea that the number of plate appearances the team would get is directly tied to the rate at which they make outs.

The interesting thing is that everyone agrees with the idea of an outs denominator when applied to pitchers. What are the two most used pitching rate stats by the general public? Certainly ERA is one of them, and if WHIP is not second, it’s definitely third or fourth (K/W and W% are the only other contenders). What both of those have in common is that they are denominated in outs.

Why don’t people see this and assume that hitting stats should be constructed in the same manner? I see three possibilities. One is that people just don’t think that hard about this stuff. The second is that people just don’t connect innings pitched with outs--that's exactly what they are, opponents’ outs divided by three, but the name obscures it. They don’t have anything to do with the actual division into innings...wait, you guys know this. The third is that perhaps people inherently recognize the fact that the individual pitcher (plus his defense, whose efforts are all over his statistical line) is his own team, whereas the batter is operating in a context of eight others. I doubt that is it, though, since that is a subtle point that many people interested in sabermetrics sometimes need a little prodding on (for example, you can’t apply RC directly to individual hitters). It is a valid point, but one that doesn’t really affect the rate stat issue except for very extreme cases.

Perhaps my experience is not reflective of the thoughts of the larger group of baseball fans. I don’t know, but next time I get the opportunity, I’ll try pulling out the “you measure pitchers with a denominator of outs” card.

Along the same lines, I am always amazed at the deference with which some people view the RBI, while paying no attention to runs scored. Not that I want to perpetuate the use of either, but it is certainly incomplete to consider only one side of the coin.

*A couple years ago I wrote a bit about methods to adjust a pitcher’s win-loss record based solely on his team’s record, and I posted some career data for notable pitchers. This family of “Neutral W%” metrics was pioneered by Ted Oliver, and the version I used is based on logic first formally presented by Rob Wood. Anyway, the formula for NW% that I use is:

NW% = W% - Mate/2 + .25

Where Mate is the team’s winning percentage with the individual’s decisions removed.

This is a linear approximation of a more advanced Pythagorean-based function, and works fine for most cases. Occasionally you will get extreme cases that may test it, and Cliff Lee’s season had the potential to be one of them. Fortunately (due to my rooting preferences), the Tribe has started to win games that Lee does not start with some regularity, and the uniqueness of the situation has been lessened. Still, it is odd to see a 21-2 pitcher (as of September 10) on a 71-73 (and thus .413 Mate) team. That gives him a NW% of .956, as opposed to his actual .913 W%.
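
You can verify that with a few lines of Python (using the linear NW% formula above):

w, l = 21, 2                                       # Lee through September 10
team_w, team_l = 71, 73
mate = (team_w - w) / (team_w + team_l - w - l)    # about .413
nw = w / (w + l) - mate / 2 + .25                  # about .956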

If you go through the more complex Pythagorean approach (with an exponent of 2), you get a NW% estimate of .926 for Lee. So for a pretty extreme case, we are off by .7 wins over 23 decisions. This, coupled with the fact that the whole exercise of using pitcher’s W% in the first place is imprecise, is why I use the linear approximation to find NW%.

How good is a .956 NW%? I have figured NW% for every Hall of Fame pitcher as well as a number of other great pitchers, around 150 or so. Obviously, there are many fine pitchers outside the scope of that group, including one-year wonders like Lee. Still, it’s a sample that’s likely to include many of the best single season performances.

In this group, there is only one pitcher with ten or more wins in a season and a NW% of .900 or better--Randy Johnson, 1995. The Big Unit went 18-2 with a 79-66 team, which works out to a .906 NW%.

Steve Carlton’s 27-10 season in 1972 for a 59-97 team comes in at .845. Koufax’s best season is 1964 (19-5, .821). Grove’s best is 1931 (31-4, .816). Seaver’s best is 1981 (14-2, .842). Pedro’s best is 1999 (23-4, .839). Maddux’s best is 1995 (19-2, .866). Clemens’ best is 2001 (20-3, .846). Cy Young’s best is 1901 (33-10, .770). Mathewson’s best is 1909 (25-6, .782). Walter Johnson’s best is 1913 (36-7, .844). Alexander’s best is 1915 (31-10, .740). Joe Wood’s 34-5 season in 1912 was good for a .808 NW%.

Obviously, I do not mean to suggest that Lee has had the greatest season ever. NW% is a crude tool on a number of levels. However, one must admit that from a “playing around with numbers” perspective, Lee’s win-loss record is remarkable.

Wednesday, September 10, 2008

Scott Lewis, Major Leaguer

I was excited that Scott Lewis got a chance to pitch for the Indians. I was thrilled when he pitched eight shutout innings. While he only struck out three, he didn't walk anyone, and allowed three hits. Maybe not as impressive in a post-DIPS world as it would have been before, but still an outstanding debut.

He is at least the 48th Buckeye in the majors, and the third this season. Who will be next? Mike Madsen would be my guess, but at this time a year ago, I would have guessed that he would make it before Lewis.

Tuesday, September 09, 2008

Why I Don’t Care About the HOF, Pt. 2

The last post had some facts in it; this one has a lot of opinions. If you are not interested in my opinion, or don’t think that I should dare comment on your blessed Hall of Fame, then save yourself the aggravation and don’t read it. It’s not worth getting your blood pressure up.

First I will attempt to explain the viewpoint expressed in the title. Why don’t I care about the Hall of Fame? First, it should be noted that a better title would have been “Why I Don’t Care About the HOF as it is Currently Constituted” or “Why I Don’t Care Who the HOF Elects, Only Why It is Broken”, as writing a couple thousand words about something is generally inconsistent with not caring about it at all. Touche. Also, by HOF I am referring to the player inductions, not to the other functions of the institution (museum, research library, etc.)

I should really make this a permanent disclaimer for every post on this blog, but again, let it be noted that: the opinions and ideas expressed in this post are not necessarily new, I do not claim to have thought of them independently of reading the work of others, and even if I don’t explicitly state that something was proposed by someone else, it very well may have been.

One of the issues I have with the Hall of Fame, albeit not one that diminishes it for modern players, is the neglect of nineteenth century players, particularly those from the 1870s, 1880s, and the pre-professional days. The HOF recently had a committee of knowledgeable people empowered to elect Negro League candidates; I believe that they need a similar initiative to honor the great stars of the nineteenth century.

In saying this, I realize that the VC is taking a vote on ten pre-1943 players, including Bill Dahlen and Deacon White. However, I do not expect that the volume of nineteenth century players that will eventually be picked is sufficient, nor does it result in the candidates being evaluated by those best suited for the task.

However, the primary reason I view the HOF as I do is that I believe it is incapable of truly honoring great players. This viewpoint is somewhat patronizing, as the Hall of Famers pretty much universally put on a good face and talk about what a great honor it is to have been elected.

However, I think that the Hall of Famers themselves, as well as the casual baseball fan, do not fully understand the number of players who are in the HOF who simply did not have the impressive careers that one would expect. They are trusting, and assume that every body that has been empowered to induct players has done so responsibly.

Perhaps I am off base. Perhaps they know more about the players than I am giving them credit for. I doubt it, but I am open to being persuaded otherwise.

The fact that Lloyd Waner is in the Hall of Fame is not in and of itself damning to the ability of the institution to honor great players, any more than Jeff Burroughs winning the AL MVP award means that Alex Rodriguez was unable to be sufficiently honored last year. However, the HOF is a perpetual institution; once Waner is there, he’s there forever. No one except Burroughs and whoever deserved the MVP award really cares that he was given it. It could set a precedent, but it seems to me as if those are wiped out from one generation to the next (at one time middle infielders were popular picks; now RBI men on contenders are in vogue). A HOF mistake like Waner is permanent, and while it doesn’t have to, it can permanently affect the standards to which other players are held.

To belabor the MVP comparison a bit (and this was actually discussed a bit in the BTF thread linked in the comments for the last post--although I had written the preceding paragraph before that discussion), while a poor MVP selection can indeed set a precedent and lead to more selections in the same vein, at least each year’s crop is not being compared to Jeff Burroughs. The selection of Justin Morneau in 2006 may have been similar in some way to that of Burroughs, but at least he was not being compared directly to Burroughs--he was being compared to Jeter, Ortiz, Hafner, Mauer, etc. In the case of the Hall of Fame, the most important comparison is not between the candidates on the ballot (after all, you can choose up to ten of them), but between the candidates and the precedent that has been set by those inducted before them. Thus the mistakes made by the HOF have much more potential for perpetuation than do those for the yearly awards.

Furthermore, if Lloyd Waner were the only questionable player in Cooperstown, it would be no big deal. Mistakes happen, and the world goes on. But it’s not just Lloyd Waner--it's Tommy McCarthy, and Harry Hooper, and Ross Youngs, and Chick Hafey, and Heinie Manush, and Earle Combs, and Edd Roush, and we’ve only touched on the outfielders (you may not consider each of those guys to be big mistakes, but I bet there are others I do not include that you might). These guys make up a significant proportion of the Hall membership, to the point where I think a Rickey Henderson or a Tony Gwynn would be completely justified in saying, “What, you expect me to be honored by being included with Heinie Manush and Ross Youngs?”

One thing that was misunderstood in the last post was my comment that the primary standard I am interested in is career value. I did not mean to (nor did I) suggest that any other viewpoint was invalid, only that the discussion was grounded in that perspective. The relevance here is that I don’t believe that Sandy Koufax belongs in the Hall of Fame. However, if you come from a peak-centric viewpoint, Koufax is very much a defensible choice, maybe even an inner circle HOFer. Selections like Koufax, which are defensible from one point of view but questionable from another, are not the problem. The problem is those choices that make very little sense from any point of view. If there is a rational, fair definition of what a Hall of Famer should be that would include Rube Marquard, I am not aware of it.

Getting back to Hooper, Hafey, et al., I don’t even mean to suggest that those guys shouldn’t be in the Hall of Fame. Maybe they should be. After all, the Hall of Fame has done very little to formally define what the standards should be, so everyone is free to make their own judgments. And if you feel that there should be 300 or 500 Hall of Famers, who am I to tell you you’re wrong?

However, at that point, I believe that the ability to honor the great players is gone. A Hall of Fame with Waner, Combs, and company can honor Kenny Lofton, Brett Butler, or Tim Salmon, but it cannot truly honor Tony Gwynn or Gary Sheffield, at least for my money.

Of course, if Gwynn chooses to be honored by it, that’s fine. Some people are naturally humble and will take any sort of accolade as an honor, and there’s nothing wrong with that. However, I personally neither celebrate nor condemn false modesty, and I do not condemn realism about one’s accomplishments either.

Some people, in defending the ability of the Hall of Fame to honor players, point out that the writers have enforced a much higher standard than the Veterans Committee has. The data presented in the last post demonstrates that this is the case. However, any distinction between which body votes you in is solely in the eyes of an observer. All of the inductees have their plaques hanging on the same wall. The only real difference that it makes is that the probability of a vets choice being present and able to enjoy their accolades is lower.

Therefore, I don’t see any good argument for the writers continuing to uphold a higher standard, independent of the inherent problem of having two groups choosing from the same players at different times. If there is a player, let’s call him Alan Trammell, who for whatever reason has been rejected by the writers, but is a very good bet to be chosen by the vets down the line, what good does it do to deny him the honor now? The only good reason for the BBWAA to reject eventual Hall of Famers is to prevent a further decay of the HOF standards.

If the BBWAA selects Trammell, then the Vets Committee might induct a lesser player like Omar Vizquel in thirty years, since they can’t induct Trammell. If the BBWAA lowers its standards, and the Vets maintain a mandate to elect somebody, then eventually the Vets’ standards will probably become lower as well, and the standards will be trapped in a downward spiral.

However, this is a problem that is irrelevant to the question of whether Trammell himself does or does not deserve to be in Cooperstown. It is a problem only when the scope is expanded to multiple players over many years. In the individual case, it just causes Trammell to wait needlessly. (If you don’t think Trammell is worthy of induction at any rate, substitute the candidate of your choice who has been passed over by the writers; my intention is not to advocate for Trammell or anyone else).

How does the Hall of Fame get out of this mess, and escape the mistakes of the past? I believe that it could be done very easily by adding tiers to the Hall (obviously not a unique suggestion on my part). Some would argue that the writers/vets breakdown already constitutes a tiered Hall, but again, to the extent that there is one, it exists in perception only and is not reflected in how members of the two groups are treated by the Hall.

I would propose a three-tiered Hall, for players only. The managers, executives, pioneers, etc. would remain in just one group which would be independent of the player tiers.

I would envision the tiers being voted on in a way not unlike what the Hall of Merit at BTF does. Instead of setting a minimum percentage of votes for a player to be elected, there would be timed elections that would result in a fixed number of honorees.

For an example of how this might work, the players currently in the Hall would all begin in the first tier. The second tier could be filled by an initial election of some sort that would put in 25 players (it doesn’t have to be 25). Then, every year there would be a second-tier vote for which all first-tier members would be eligible in perpetuity. One player would be selected per year. The third tier could be started with five of the second tier members, with an additional player chosen every three, five, or ten years depending on just how exclusive one wants this level to be.
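To make the cadence concrete, here is a minimal sketch of that schedule in Python; the election format itself is deliberately stubbed out as a ranking function (since, as noted below, the precise format is not the point), and the seeding numbers are just the placeholder figures from the paragraph above.

```python
def elect(pool, honored, ballot, n=1):
    """Promote the top-n eligible players: members of pool not already honored.
    `ballot` stands in for whatever expert electorate does the ranking."""
    eligible = [p for p in pool if p not in honored]
    honored.update(sorted(eligible, key=ballot, reverse=True)[:n])

def run_hall(tier1, ballot, years=30, tier3_interval=5):
    """Tiers are nested (tier 2 is a subset of tier 1, tier 3 of tier 2),
    so every first-tier member stays eligible in perpetuity."""
    tier2, tier3 = set(), set()
    elect(tier1, tier2, ballot, n=25)        # initial tier-2 class of 25
    elect(tier2, tier3, ballot, n=5)         # tier 3 seeded with five of those
    for year in range(1, years + 1):
        elect(tier1, tier2, ballot, n=1)     # one tier-2 honoree per year
        if year % tier3_interval == 0:       # slower clock for tier 3
            elect(tier2, tier3, ballot, n=1)
    return tier2, tier3

# Hypothetical usage: run_hall(all_hof_members, ballot=career_war.get),
# where both names are invented for illustration.
```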

I don’t have a proposal for the precise format of the elections, because 1) that’s not as important to me as the concept and 2) it’s not like it’s going to happen anyway, so what use is there in getting into specifics? I would like to think that the voting for the higher tiers could be done by a group of experts rather than the BBWAA or the former players (I do not mean to imply that writers or players could not be “experts”, just that qualified people are by no means limited to those two groups). Perhaps there would be a runoff vote, or an MVP-style rank order ballot.

A not necessarily obvious positive about a tiered Hall would be that it would keep debates about past players in the forefront of baseball news for one day a year, rather than being the sole domain of small groups on message boards. It might serve to educate casual fans about baseball history and players of the past. If you consider the precedent of the All-Century Team, in which a special panel had to be added in order to honor obvious selections like Musial and Wagner, the historical knowledge of that voting base (which admittedly may not be reflective of casual fans at large) is quite poor. I doubt that the one day of coverage would lead to any sort of sea change, but it couldn’t hurt.

Do I expect that such a scheme will ever be adopted? Of course not. The Hall has changed its procedures a number of times throughout its history, but these changes have been mainly cosmetic and have done precious little to address the underlying flaws in the process. Am I saying that they must adopt the ideas here? No. What I am saying is that they should do something to correct the fundamental flaws in the system, and this is just a potential remedy that I would find palatable. However, since I doubt the Hall will reform its election process, I will continue to be uninterested in the debates about who should be inducted.

Tuesday, September 02, 2008

Why I Don’t Care About the HOF, pt. 1

I have stated before that I don’t particularly care about Hall of Fame debates because I think that the HOF is beyond salvation. That is a slight exaggeration of my position, but it’s close enough. However, I do want to comment a little more deeply about it here. In this post I will examine what the standards for the Hall have been empirically in terms I care about. This of course does not imply that the HOF selectors have actually used the basic sabermetric standards I will look at. In the next post, I will discuss why I feel the HOF is broken and the remedies that I think would serve to make it more interesting to me. If you feel that the title is belied a little by what follows, that is understandable.

The idea of examining what the HOF standards have been is by no means a new one. I, like many sabermetricians, have been deeply influenced by Bill James, and on this topic particularly by his Politics of Glory. In that book, James set out a few methods for organizing thinking on the Hall, one of which was a Hall of Fame Standards list built from a number of reasonable criteria (did the player get 2000 hits? 2500 hits? 300 home runs? etc.). These tools were meant to mimic or approximate the things that HOF voters seem to value.
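To make the flavor of that tool concrete, here is a toy checklist in the same spirit; the three thresholds are only the examples mentioned above, not James’s actual (longer, weighted) list.

```python
# A toy James-style Hall of Fame Standards checklist; the criteria are
# illustrative examples, not the real list.
CRITERIA = [
    ("hits", 2000),
    ("hits", 2500),
    ("home_runs", 300),
]

def standards_score(career):
    """Count how many criteria a career (a dict of stat totals) meets."""
    return sum(career.get(stat, 0) >= bar for stat, bar in CRITERIA)

print(standards_score({"hits": 2600, "home_runs": 250}))  # -> 2
```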

I have no interest in what HOF voters have valued; what I want to know is how the players that they’ve selected stack up by the standard(s) that I care about. As I have written before, my starting point in evaluating a player’s career is his total value above some baseline, usually replacement level. I am not interested in “peak value” or any such thing. I don’t wish to justify this here; some of the reasoning behind my position is in this post. If you disagree, that’s fine, but that is the perspective that the ensuing discussion is grounded in.

Up front, I should also tell you that this discussion only deals with post-1900 (or at least significantly post-1900) position players, and only those who had significant careers in the major leagues as we know them today (in other words, no Negro Leaguers). I have previously written up my personal rankings of the pitchers, and in that series I wrote a bit about the HOF standards for pitchers. I have excluded nineteenth century players because I believe they have gotten short shrift from the Hall of Fame, and because there are more issues involved in evaluating them. When I finish working through my nineteenth century stats series, I may revisit this topic with a focus on the 1800s.

I did not want to spend a bunch of time calculating career WAR figures, so I used a reasonable approximation: Pete Palmer’s Batter-Fielder Wins (formerly Total Player Rating) from the 2005 ESPN Baseball Encyclopedia. In doing so, I am not endorsing TPR; it is not the world’s worst way to evaluate players, but it does have its flaws. The offensive ratings are fine; the positional adjustment approach is not optimal, but neither is the similar approach that I use, so I can scarcely complain about that. The fielding figures, however, leave a lot to be desired.

That said, I am not particularly concerned about that, because I am looking at the group of Hall of Famers in the aggregate. Even if I don’t particularly trust Palmer’s fielding rating for Nap Lajoie, any distortion for a given player will have little effect on the second basemen as a whole (particularly the median value).

I needed to convert TPR into a measure above replacement, and that is easy enough. In this issue of By the Numbers (PDF link), approaches are offered by both Bill Deane and Tom Ruane. The approach that I used is closer to Deane’s, but a little more generous--I gave each player a win for every 73.6 games played. That works out to 2.2 wins/162 games. Most estimates of replacement level peg it at around 2 wins per season for an individual position player; I used 2.2 because that is what the .350 OW% standard I use (which corresponds to scoring 73% of the average R/G), coupled with 10 runs per win (the long-term average for Palmer’s RPW formula, corresponding to 9 RPG), implies: (1 - .73)*4.5 = 1.215 fewer runs per game for a team of replacement players in an average context. Since an individual is one of nine lineup spots, that is 1.215*162/9 = 21.87 runs over a season, which divided by 10 is about 2.2 wins.
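For the code-inclined, here is the same arithmetic as a minimal sketch; the constants are the ones just given, and batter_war is simply an illustrative name for the resulting conversion.

```python
AVG_RPG = 4.5          # average runs per game for one team
RUNS_PER_WIN = 10.0    # long-term average of Palmer's RPW formula (9 RPG)

team_gap = (1 - 0.73) * AVG_RPG    # 1.215 R/G short for a team of replacements
player_gap = team_gap * 162 / 9    # 21.87 runs/season: one of nine lineup spots
wins = player_gap / RUNS_PER_WIN   # ~2.19 wins, rounded to 2.2 in the text
GAMES_PER_WIN = 162 / 2.2          # ~73.6 games per replacement win

def batter_war(tpr, games):
    """Convert career TPR (wins above average) to wins above replacement."""
    return tpr + games / GAMES_PER_WIN

print(round(wins, 2), round(GAMES_PER_WIN, 1))  # 2.19 73.6
```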

If you want to argue that plate appearances or outs would be a more accurate way to add the extra wins, I’m not going to object. Using games will tend to overvalue part-time players, who may appear briefly as a pinch-hitter or defensive replacement. Again, my objective here is not to make fine distinctions between players but rather to have a rough estimate of value that we can use to examine groups of players.

If I were dealing with pitchers, I would add a win for each 80 innings pitched. Taking a replacement pitcher as one who allows 25% more runs than average, (1.25 - 1)*4.5 = 1.125 extra runs per nine innings, which is .125 per inning; divided by 10 runs per win, that is .0125 (1/80) wins per inning.
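And the pitcher version, as a one-line sketch under the same assumptions (pitcher_war is again just an illustrative name):

```python
def pitcher_war(tpr, innings):
    """Replacement pitcher allows 25% more runs than average: 1.125 extra
    runs per nine innings, .125 per inning, .0125 wins per inning at 10
    runs per win -- one win per 80 innings."""
    return tpr + innings / 80.0
```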

Now that the details are out of the way, let’s start looking at the results. First, just for the heck of it, here are the top ten Hall of Famers in WAR using this method. Again, I’m not endorsing these figures as being particularly good; this is just to satisfy natural curiosity:

1. Ruth, 146
2. Lajoie, 129
3. Cobb, 127
4. Aaron, 126
5. Mays, 125
6. Speaker, 121
7. Wagner, 120
8. Williams, 118
9. Hornsby, 117
9. Musial, 117

I don’t think anyone really believes that Nap Lajoie was the second most valuable player in baseball history, especially after Bill James’ deconstruction of Lajoie’s fielding ranking in Win Shares.

Now let’s get on to the real matter at hand, how the Hall of Famers as a group stand in terms of TPR and WAR. The average TPR for a Hall of Famer is 35, with a WAR of 64. The median is a TPR of 33 with a WAR of 61.

So if WAR is your primary criterion, you can put anyone with a WAR greater than 61 into the Hall without lowering the bar, as such players can only serve to raise the median. Of course, this analysis ignores the difference between those elected by the Veterans Committee and those elected by the BBWAA, as well as any differences between positions.

I included the players (Gehrig and Clemente) who were chosen in special elections as writers’ choices. By my count, there are 42 VC picks, with an average TPR/WAR of 20/46. The median for that group is 18/46.

For the 77 BBWAA picks, the averages are 43/75, with medians of 39/70. So any time the writers elect a player with a WAR of at least 70, they can only raise the bar that they have set.
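If you want to reproduce these aggregates from the spreadsheet linked at the end of the post, the computation is nothing more than a grouped mean and median. A minimal sketch, with invented placeholder rows standing in for the real data:

```python
from statistics import mean, median

# (name, electing body, TPR, WAR) -- placeholder rows, not real figures
hofers = [
    ("Player A", "BBWAA", 43, 75),
    ("Player B", "BBWAA", 39, 70),
    ("Player C", "VC",    18, 46),
]

for body in ("BBWAA", "VC"):
    wars = [war for _, b, _, war in hofers if b == body]
    print(body, round(mean(wars), 1), median(wars))
```

Since adding a value above a group’s current median can only move that median up, electing a player above the group’s median WAR can never lower the bar.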

The fact that there are two different groups considering the same players with different standards is one of the big problems I have with the Hall of Fame, and I’ll expand on that thought in the next post. Regardless, it is true that the players selected by the writers contributed much more value to their teams than did those selected by the Veterans’ Committee. The highest-ranked Vets player is Arky Vaughan with 68 WAR, which ranks him below the median writers’ pick.

Here is the breakdown of the selections by position:

A couple notes on the position breakdown: “LF” is actually all corner outfielders, regardless of whether they played left or right. A player is generally classified at the position at which he played the most games, but I made several exceptions, including Ernie Banks (short rather than first), Rod Carew (second), Harmon Killebrew (third), and Paul Molitor (third).
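As a sketch of that classification rule (only the override list comes from the text; the games-by-position input is made up):

```python
# Hand overrides noted above; everyone else is classified by games played.
OVERRIDES = {
    "Ernie Banks": "SS",
    "Rod Carew": "2B",
    "Harmon Killebrew": "3B",
    "Paul Molitor": "3B",
}

def primary_position(player, games_by_pos):
    """games_by_pos maps a position code to career games played there."""
    if player in OVERRIDES:
        return OVERRIDES[player]
    pos = max(games_by_pos, key=games_by_pos.get)
    return "LF" if pos in ("LF", "RF") else pos  # corner OF share one bucket

print(primary_position("Example Player", {"RF": 900, "1B": 400}))  # -> LF
```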

To be honest, this is not what I expected to see. I expected the positions at the demanding end of the defensive spectrum to have lower mean and median WAR figures than those at the other end; I was thinking that the difficulty of measuring fielding value would lead to shortstops with unimpressive TPR/WAR but strong defensive reputations being enshrined. Instead, first basemen are tied with center fielders for the lowest median value.

Of course, it is possible that Palmer’s evaluation system is overvaluing fielding, and thus making the players at the demanding end of the spectrum look more valuable than they were. My take, though, is that it is an indication of how shallow the reasoning behind many of the HOF selections was--an emphasis on players with impressive offensive numbers, with little consideration of the other factors.

Here is a similar chart, this one giving the figures for those elected by the writers. I am not running a similar chart for the Veterans’ selections, so the last column in this chart lists the number of VC picks at the position:

One thing that can be said for the writers is that they have done an excellent job of balancing selections by position. All positions have between nine and eleven elected players (I am averaging the 21 corner outfielders across the two positions) with the exception of center field (seven).

Again, first basemen bring up the rear in terms of WAR, with a median of only 53 for the nine selections. Center fielders, it would seem, have been held to a much tougher standard than any other position. The top four players (Cobb, Speaker, Mays, Mantle) tower over the others in WAR, with DiMaggio standing alone in a middle tier. The other two choices, Snider and Puckett, are at 52 and 48 WAR respectively.

Of course, my characterization may not be fair, as center field just happens to be a position where there have been a number of megastars, but slim pickings after that. However, I do consider it surprising that some of the marginal (at least in comparison to a DiMaggio or Mays) candidates like Jimmy Wynn, Reggie Smith, and Dale Murphy have been overlooked by the writers.

In closing, I acknowledge that I have thrown the word “standards” around a lot, but my treatment of it is incomplete, as I have only looked at this from the perspective of who is in the HOF, not who is out. To truly determine what the standards of the HOF are, it is not enough to say that the median inductee is worth X WAR--we would also need to know how many non-HOFers have more than X WAR, and, more generally, to examine the potential HOFers in greater detail.

Finally, here is a link to a spreadsheet listing each player I considered, their primary position, which group elected them, and their career TPR and WAR.