Thursday, March 11, 2010

Runs Anything and the Hall of Fame

I don't actually want to discuss specific Hall of Fame candidacies, but I will present career Runs Anything figures for predominantly post-1900 Hall of Famers (not including those selected in 2010, as I wrote this a week before the results were announced) and a number of players who often come up in Hall of Fame debates. While I don't care for many of the Hall's selections, using membership as a standard is a convenient way to keep the player pool to a reasonable size.

First, here is a list of the top ten Hall of Famers in R+. They are mostly the usual suspects, but I think there is one name that will surprise you--it certainly surprised me:



Frank Chance scored 60% more runs per out than the average player in his day. He did have a short career and he was on a historically great team, but his is still a surprising name to have pop up on a list of this type.

Here is a similar list for RBI+:



Willie Stargell is probably the most surprising name on the list (and perhaps Cobb if one fails to account for context in setting their expectations), but for the most part these are exactly the players you would expect to see: middle-of-the-order studs.

There are five players who crack the top ten in both R+ and RBI+ (Cobb, Gehrig, Hornsby, Ruth, and Williams)--it is thus no surprise that they make up the top five in Runs Anything:



A natural question to ask is "what is the degree of agreement between Runs Anything and a sabermetrically-approved context-neutral measure of the same thing, on the career level?" With the release of wRC+ on Fangraphs, it is very easy to compare ANY to such a metric. I did so for the Hall of Famers.

Of course, this is a very biased sample--it is only those players selected as the greatest of all-time, and there may well be a tendency for players who excel in R and RBI to a greater extent than context-neutral statistics to be voted into the Hall. The players in the sample all have long careers, there is a higher representation of players that excel in both runs and RBI than one would observe among the general player population, and there are also bound to be more middle-of-the-order hitters, who are placed in a much better position to excel in both categories than leadoff hitters, for instance. wRC+ includes a park adjustment; ANY does not.

With all of the obvious problems with restricting the comparison to this group, why am I bothering? Because players of this group are the only ones for whom mainstream fans really examine career statistics closely. Ordinary players are hardly ever placed in historical perspective; the Hall of Fame debates are really the only time careers are given any sort of thorough examination. Ordinary players are talked about each season, but when their useful playing careers end, they are forgotten by and large. Few people are particularly concerned about Jeff King's career batting performance and how it compares to that of Al Martin. Those that are probably aren't using R and RBI anyway.

In fact, as a brief digression, the fact that HOF debates are the only time that players are really placed into historical context is what might make it seem as if I care about them. If you read this blog, you'll note a few posts tagged "Hall of Fame"--despite the fact that I do not think the current Hall of Fame structure is capable of being salvaged and don't care which players are chosen. But if someone like me (interested in both statistics and the history of the game) chooses to ignore the discussion surrounding the Hall altogether, they are basically shutting themselves out from comparing players' careers. It's really the only time that the topic gets any sort of mainstream play (one could talk about top 100 lists, and comparisons between Aaron and Bonds when the home run record fell, or what have you, but those are in a certain sense subsets of the Hall debate--examining the records of no-brainer Hall of Famers).

If you want to skip the last two paragraphs, the takeaway is that I am only looking at the relationship between ANY and wRC+ for great players because that is the only situation in which ANY would be used in practice (in fact, it won't be at all, but that's beside the point).

So, for the Hall of Famers, the correlation between ANY and wRC+ is +.94. The average absolute value of the difference between the two is 7, and the average difference (ANY - wRC+) is -1 (in other words, the average player has a very slightly lower ANY). The players that ANY favors the most:



It's pretty interesting how close both rates are for the three catchers (Bench, Berra, Campanella); they all have ANY of 141-143 and wRC+ of 130. Traynor, Medwick, Klein, and Wilson can be tied into their own group as thirties hitters (although the others, as corner outfielders, form a tighter group than is made if Traynor is included).

The players that ANY doesn't like:



A lot of high-OBA, top of the order types in this group, highlighted by Henderson and Morgan. It's interesting that two Hall of Famers who would probably be considered in the bottom half of the Hall show up here (Carey and Combs), and that a player often put forth as one of the very worst (Ferrell) is here as well.

The correlation between R+ and wRC+ is +.87; between RBI+ and wRC+ it is +.78. At least with this very biased sample, runs are more closely related to estimated context-neutral production than RBI. The next logical step is to run a regression to estimate wRC+ from R+ and RBI+ (then to never actually use it for a myriad of reasons, as this is the type of mathematical gyration that would drive me crazy if it were being used for serious purposes):

wRC+ ~ = .616(R+) + .287(RBI+) + 14.7

This equation estimates that a player with average R+ and RBI+ (100 in each) would actually have a 105 wRC+, which is not quite satisfying; it also weights runs scored roughly twice as heavily as RBI. We could try a regression without an intercept:

wRC+ ~ = .710(R+) + .302(RBI+)

This equation puts a R+ = RBI+ = 100 player at 101, which is more in line with what we'd expect, but produces less accurate estimates for this group.
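As a quick check on the arithmetic, here is a minimal Python sketch that plugs an average player (R+ = RBI+ = 100) into both equations. The coefficients are the ones quoted above; the function names are mine and purely illustrative, and nothing here refits the regressions, since the underlying player data is not reproduced in this post.

```python
# Coefficients are copied from the two regression equations in the post.
# Function names are hypothetical, chosen only for this illustration.

def wrc_plus_with_intercept(r_plus, rbi_plus):
    """Estimate wRC+ using the regression that includes an intercept."""
    return 0.616 * r_plus + 0.287 * rbi_plus + 14.7


def wrc_plus_no_intercept(r_plus, rbi_plus):
    """Estimate wRC+ using the regression forced through the origin."""
    return 0.710 * r_plus + 0.302 * rbi_plus


# A player who is exactly average in both R+ and RBI+.
avg_with = wrc_plus_with_intercept(100, 100)
avg_without = wrc_plus_no_intercept(100, 100)

# How much more weight each equation puts on R+ than on RBI+.
weight_ratio_with = 0.616 / 0.287
weight_ratio_without = 0.710 / 0.302

print(round(avg_with), round(avg_without))
print(round(weight_ratio_with, 2), round(weight_ratio_without, 2))
```

The rounded estimates come out to 105 and 101, matching the figures in the text, and in both versions the weight on R+ is a bit more than twice the weight on RBI+.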

To get back to the point of these posts, one can use runs scored and RBI to derive a decent estimate of a player's productivity, particularly over a career. I'd never use it and even when dressed up it retains the drawbacks present in looking at R and RBI in any event, but I think that it could have some limited persuasive value with mainstream fans if you could get them to accept the key premise (productivity must be expressed in relation to outs). Of course, if you can get one to accept that premise, they are probably open to further sabermetric indoctrination, and that would obviously be preferable.

This spreadsheet includes career totals in these categories for all of the predominantly post-1900 Hall of Famers as well as a few other selected players (the others are not in any way intended to be a comprehensive list of the most productive non-HOF players).

2 comments:

  1. There is a flaw with the way Fangraphs calculates linear weights for seasons where there is no CS data. Players who stole a lot of bases during these years are going to have inflated LW. This also has the effect of deflating the LW of players who didn't steal a lot of bases. They also don't discount 19th century stolen bases. Take Hugh Nicol and his 138 stolen bases in 1887 for instance. Fangraphs credits Nicol with 28 RAA and RC for his 138 steals. Looking at my SB/CS estimates, I have Nicol with 86 SB, and 28 CS, which Fangraphs would translate to approximately 6 RAA, a difference of 22 RAA.

  2. I wasn't aware of that. However, I'm sort of in their camp on the 1880s numbers. The most accurate BsR equation I could muster up gave an intrinsic weight of .175 runs for a SB in the 1887 AA, and I didn't make any attempt to estimate "true" SB or CS. So I have the 138 steals as worth 24 runs, and Nicol as 20% above-average despite a 215/341/267 batting line.

    That's not to say my approach is correct either, but I was able to get more accurate estimates of team runs scored by treating SB in that manner.

