Walk Like a Sabermetrician: My Top 60 Starters, Intro

I don’t have any good serious sabermetric stuff to write about currently, so I am pulling out a series I wrote ranking starting pitchers. I want to be clear that this is an activity for fun only and I don’t want to get this kind of frivolity confused with the more serious stuff I write.

When I started writing the title, I originally had “the greatest”. Then I decided against this because “greatness” means different things to different people. Then I put “most valuable”, and while I have proposed an objective, sabermetric definition of value, I realized I wasn’t following my own definition, and so that couldn’t be it. What I’ve defined as “performance” is the closest to what I’m doing here, but “The Top 60 Performing Starters” sounds stupid. So in the end, as it ultimately must be unless I were to chain myself to using a set formula and change nothing, it has to be my list. Claiming it as something else will only draw quibbles with how I defined terms. I’m sure that you will have quibbles with my approach, but no one can deny that it is my approach.

So why have I chosen to rank sixty starters? Well, I’ve limited myself to career major leaguers who pitched primarily post-1900, and there happen to be 49 pitchers in this category currently in the Hall of Fame. There are a number of historically great pitchers currently in the game, and by the time they retire and move into the Hall, there could well be 60. 50 would be too restrictive, as it wouldn’t cover all of the pitchers worthy of HOF induction (with the big unstated assumption being that the Hall has chosen the number of pitchers to honor in a sane way), but if you go to 100 you start debating whether Eddie Rommel was better than Jim Perry, and that’s not exactly stimulating. Besides, I have not run the numbers on every pitcher who ever lived, just those who are in the Hall of Fame, won a lot of games, ranked highly in TPI or WAT, or that I was curious about. It is very possible that by the time you get down to the hundredth spot, there are guys who deserve to be in the discussion that I didn’t give a look. And I don’t want to exclude them, but I also don’t want to waste my time figuring the NW% for every pitcher who could possibly have a claim.

I will try to explain the principles behind my rankings, because half the battle is how you define things. Many of the arguments in sabermetrics stem from a failure to clearly explain what the point is. My favorite example is park factors. People will criticize run PFs on the basis that they treat lefties and righties the same. But if one is after value, it doesn’t matter whether you were left-handed or not. The impact on the value of a run in that environment is the same. Now you may have other objections to park factors, or the way some people use them, but to criticize someone for using them in a value system for the reason that they don’t consider handedness is invalid.

So, this is where I’m coming from:
1. I am only considering major league performance. That means no credit for Lefty Grove in Baltimore, no credit for Bob Feller in World War II, no credit for Satchel Paige in the Negro Leagues, and no credit to Herb Score for not ducking.

2. Point 1 is not my attempt to dismiss the importance of those things. Lefty Grove was a great pitcher in Baltimore. Bob Feller probably would rank much higher had he not fought for his country. Satchel Paige and his contemporaries were victims of racism and were legitimately great players, completely worthy of their places in the Hall of Fame and in baseball lore. The exception is Herb Score; it’s impossible to extrapolate what he would have done had he ducked. Bill James distinguished between the first three types by arguing to the effect that “Grove actually was a great pitcher in 1923, and Paige in 1934, and Feller in 1944, but Herb Score actually was not in 1963.” I don’t disagree with this, but I choose not to do any assuming at all about how things would have been if not for the circumstances in question.

The Grove/Paige examples are even tougher because they were actually playing baseball in fairly high level environments, but I am just not comfortable enough interpreting their statistics (or lack thereof). I’m not qualified to do so. That does not mean that I am denying that Satchel Paige or Hilton Smith or Joe Rogan or whoever was a great pitcher, or that they should not be in the HOF, or that people who are qualified to do so shouldn’t include them in a ranking with the white pitchers of their day. I’m just not going to.

3. The guiding principle of the list is to measure based on value, or at least what I in the past have called “performance”. In other words, I care how much he actually helped the team he pitched for win games. I don’t care if he was hurt more by the park he pitched in because he was left-handed or because he gave up a lot of flyballs or anything like that. I hesitate to call my approach “value”, though, because value implies things like WPA and value-added runs, and that’s not really what I’m doing either. Basically, what I am doing is value, assuming that his events (singles, outs, walks, etc.) were distributed in a league-average way in terms of base/out situation, score differential, etc.

4. Value is measured against the nebulous replacement level, which I have defined as 125% of league average (.390 W%), for all-time. This is very debatable, as I have long been an advocate of a higher baseline than “replacement”, and assuming that it was the same in 1900 as it was in 2000 is quite an assumption.

5. Corollary to #4, I don’t put a lot of weight on “peak” value. I have never understood the fascination with peak value, as I have expressed before. First of all, nobody agrees on how to define it. To some it is the best three seasons. To some it is the best five consecutive seasons. To some it is the best seven seasons. To Don Malcolm when advocating on behalf of Dick Allen, it is the top nine consecutive seasons.

Now I suppose that there is nothing wrong with defining your criteria as “the best four consecutive seasons”, and then figuring out how players ranked based on that standard. But I just personally don’t see how that ties in to the greater HOF-type questions. To me, it seems that if one player was worth 100 wins to his teams over the course of his career, and another was worth 80, that the first guy is “greater” unless there’s a darn good reason to think otherwise.

From a value perspective, I believe that it is possible to give credit to “peak”, by looking at it in terms of pennants. However, in this context I reject the term “peak” and prefer to refer to “clustered” performance, as opposed to “scattered” performance. The pennant approach stems originally from Bill James in The Politics of Glory, and later from the research of other sabermetricians, among them Michael Wolverton and Dan Levitt. After all, pennants are forever. If you have one great season, and your team wins the World Series because of it, that can be seen as being more valuable than helping a .500 team win 83 games each year for some period of time. If one +10 season helps a team win more pennants than two +5 seasons (and as far as we can tell, it does), then it makes sense to rate the one season guy ahead. The problem with this is that the attempts to quantify it show that the rankings you get from using Pennants Added versus WAR aren’t really all that different.

I have not attempted to run a Pennants Added framework here, but I have kept track of Wins Above Average, which I do give some weight because of the pennant factor as well as the fact that I believe the WAR baseline is probably too low. So if I had a guy who was 300-280, and another who was 250-201, they would both be +74 WAR, but the 250 win pitcher would be +24.5 WAA while the 300 win pitcher would be +10 WAA, and I’d probably rank the 250 guy ahead. Generally, though, WAR is the primary factor.

6. Since the primary comparison is WAR-based, active pitchers are fair game. I’m not concerned about “ranking them too early”, because it’s unlikely that subsequent poor performances will do too much harm to their career WAR. If you compare to a higher baseline, this can be a problem. Now I have only considered older pitchers, so even if a Johan Santana would end up on the list (he wouldn’t), I didn’t figure him, so he wouldn’t be here. Active pitchers I considered are Moyer, Rogers, Clemens, Maddux, Glavine, Smoltz, Johnson, Pettitte, Pedro, Mussina, and Schilling.

7. I have not made any “timeline” adjustment. I have little doubt that the quality of play in the majors today is much better than it was a hundred or even fifty years ago, but I have treated a win in 1900 as equally valuable to a win in 2000. On the other hand, I have not given old-time pitchers any extra credit for pitching in shorter seasons in which each win was more valuable relative to the pennant.

8. The rankings are based on regular season pitching only; I have not considered hitting or post-season performance. Playoff performance certainly is valuable, but in many cases it is a negligible factor in terms of an entire career, even if you weight these games more heavily. In other cases, like Whitey Ford, it is not, and I have in some cases given some extra credit for it.

Hitting is also negligible for many pitchers. In a case like Wes Ferrell, though, you can’t ignore it, and so I have looked into his offensive value. However, there is nothing wrong in theory with having a list based solely on pitching performance, and then having another almost identical list based on pitchers’ total overall contribution. I have tried to make a hybrid, but you can legitimately split them up.
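The arithmetic behind points 4 and 5 can be checked with a few lines of Python. This is only a minimal sketch: it assumes a Pythagorean exponent of 2 to turn the 125% run ratio into a W%, and it treats wins above a baseline as wins minus the baseline W% times decisions. The function names are made up for the illustration.

```python
def baseline_wpct(run_ratio, exponent=2.0):
    """W% of a pitcher who allows run_ratio times the league-average runs,
    using a Pythagorean estimate (exponent of 2 is an assumption)."""
    return 1.0 / (1.0 + run_ratio ** exponent)


def wins_above(wins, losses, baseline):
    """Wins above what a pitcher with the given baseline W% would have
    won in the same number of decisions."""
    return wins - baseline * (wins + losses)


# 125% of league-average runs allowed works out to roughly a .390 W%.
replacement = round(baseline_wpct(1.25), 3)

# The two hypothetical records from point 5.
for w, l in [(300, 280), (250, 201)]:
    waa = wins_above(w, l, 0.500)        # versus an average pitcher
    war = wins_above(w, l, replacement)  # versus the .390 baseline
    print(f"{w}-{l}: WAA {waa:+.1f}, WAR {war:+.1f}")
```

Both records come out at about +74 against the .390 baseline, while the gap against .500 is +10 versus +24.5, which is the distinction point 5 is drawing.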

Now, what are the methods that I have used? Well, the NW%, WAT, and WCR figures that I did a series on earlier this year were a very minor consideration. The main considerations were similar stats based on runs allowed.

I use all runs, not just earned runs, which I’m not going to justify here. The Run Average is park-adjusted, using the park factors discussed here. Adjusted Run Average (ARA) is in the same vein as ERA+; it is N/RA*100, where N is league runs/game and RA is the park-adjusted RA. To figure WAA and WAR, I have assumed that the runs per win factor is equal to 2*N. This is not a terrible assumption, but it probably is not the best. There is a deeper issue here about the nature of the run to win converters and what they should do, which I honestly have not given full thought to and don’t want to deal with in this exercise. RPW = RPG is a graceful if incorrect way around it. It is also, incidentally, a consequence of using a Pythagorean exponent of 2. Anyway, that gives these formulas:
WAA = (N - RA)*IP/9/(2*N)
WAR = (1.25*N - RA)*IP/9/(2*N)
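
As a rough illustration, the ARA, WAA, and WAR formulas translate directly into Python. The function names and the sample pitcher line (a 3.60 park-adjusted RA over 270 innings in a 4.50 runs/game league) are made up for this sketch and do not describe any real pitcher.

```python
def ara(n, ra):
    """Adjusted Run Average: league runs/game over park-adjusted RA, times 100."""
    return n / ra * 100


def waa(n, ra, ip):
    """Wins Above Average, with runs per win assumed equal to 2*N."""
    return (n - ra) * ip / 9 / (2 * n)


def war(n, ra, ip):
    """Wins Above Replacement, with replacement at 125% of league runs/game."""
    return (1.25 * n - ra) * ip / 9 / (2 * n)


# Made-up inputs, purely for illustration.
N, RA, IP = 4.50, 3.60, 270.0
print(f"ARA {ara(N, RA):.0f}, WAA {waa(N, RA, IP):.2f}, WAR {war(N, RA, IP):.2f}")
```

One consequence of these two formulas is that WAR minus WAA equals IP/72 no matter what N or RA is, so under this setup the replacement-level credit is purely a function of innings pitched.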

I have also included the pitcher’s aggregate WAR in his best five seasons as “Top 5”; this is a “peak” measure, although I am wary about such things, and in this case early pitchers are definitely favored as they pitched many more innings in each season, so it is best to use it to compare contemporary pitchers if you use it at all. Also, AeRA is Adjusted Estimated Run Average, where eRA is a component ERA-type method. I have only included it for pitchers in the second half of the century as I did not want to have to come up with a run estimator covering the entire century and the changing available data. And since we are dealing with careers here, it is more likely than for a single season that any variation of RA from eRA will be a result of a poor eRA estimate, not “luck” in the small sample size making RA different from eRA. So it is a minor factor, along the lines of the W-L based tools.
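
The “Top 5” figure described above is a simple calculation. Here is a minimal sketch, assuming the five seasons need not be consecutive (the text says only “best five seasons”); the function name and the season-by-season WAR values are made up for the example.

```python
def top5(season_wars):
    """Sum of the five highest single-season WAR totals.

    Assumes the seasons need not be consecutive; a career with fewer
    than five seasons just sums whatever is there.
    """
    return sum(sorted(season_wars, reverse=True)[:5])


# Made-up season WAR values, purely for illustration.
seasons = [2.1, 6.4, 5.0, -0.3, 7.2, 4.8, 3.9, 5.5]
print(f"Top 5: {top5(seasons):.1f}")
```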

I may at some times refer to arguments that other people have made in analyzing these pitchers. One of the best sources is of course the Historical Baseball Abstract, by Bill James, as both editions spend a good number of pages on rating players. Another is the Hall of Merit, the alternative history Hall of Fame hosted by Baseball Think Factory. They have spent the last few years voting on who should be included in their Hall of Merit, and many arguments have been advanced on behalf of candidates, and some good research done on them as well.

In the end, there will be people who don’t care how I rank these guys, and to them, I say “good for you”. There are people who don’t like the practice of making these types of lists, or who think my criteria are stupid, or think I screwed over Sandy Koufax. That’s fine. But just remember that if you want to criticize my list, you should do it on the basis of my criteria. That is not to say that my criteria are unimpeachable, but if that’s your beef, feel free to criticize those. Don’t criticize what flows from them.

If I said I was going to rank the Presidents of the United States on the basis of how pretty their daughters were, and then ranked George W. Bush ahead of Bill Clinton, would it make any sense to say “Well, P, Clinton was great because he signed welfare reform and NAFTA, and Bush is terrible because of McCain-Feingold and steel tariffs”? No, because those weren’t the criteria. That is analogous to criticizing me for not ranking Koufax highly on a list that is explicitly stated as being based primarily on career WAR.

Would it make sense to say, “P, that’s a dumb way to rank presidents, and who really cares what their daughters look like?” Of course it would. That would be like saying, “Well, it’s true that Koufax doesn’t rank highly in career WAR, but that’s not a good way to rank pitchers.”

Would it make sense to say, “P, I completely disagree. Chelsea is much hotter than Barbara and Jenna combined”? Sure. I’d think you were nuts, but it would be a valid argument. That would be like saying “Given your criteria, P, I don’t see how you can possibly rank Tom Glavine ahead of John Smoltz.” You can accept my criteria and disagree with my conclusions. Or vice versa.

Sunday, May 13, 2007
