Thursday, January 31, 2008

Sabermetrics Wiki

Tango Tiger has set up a sabermetrics wiki here. Hopefully, this will become a one-stop sabermetric encyclopedia and research diffusion center. For more information, see this thread, and then come over and contribute.

Tuesday, January 29, 2008

Best Hitting Infields and Outfields, 2007

In a BTF thread on the Phillies’ signing of Pedro Feliz, there was some discussion of whether he, along with Howard, Utley, and Rollins, gave them the best hitting infield in baseball. Obviously, that is a forward-thinking question, but I thought that since I had the data by position for each team readily available, it would be mildly interesting to see which units performed the best last season.

The stats below are from the Baseball Direct Scoreboard; infielders are (duh) first basemen, second basemen, third basemen, and shortstops. Pitchers, catchers, and DHs do not factor into the totals for either the infield or the outfield. All players who were classified by STATS as playing the position are included in the totals. There are no park adjustments, and RC does not include basestealing.

Beginning with infielders, three teams were within one-tenth of a run per game of the lead: the Yankees (6.19 RG), the Marlins (6.18), and the Phillies (6.14). Since the comments about Feliz inspired this little survey, it is worthwhile to note that third base was definitely the weakest infield position for Philadelphia last year (4.01)--however, Feliz himself created just 4.07 runs last year.

On the other side, the Giants trailed (3.90), followed by the Twins (4.00) and the White Sox (4.04). The overall average was 5.03.

The top hitting outfield was that of the Rockies (6.63), well ahead of the Tigers (6.34) and the Phillies (6.23). However, with a park adjustment, the Rockies drop to 6.03.

The average outfield created 5.17 runs. The worst hitters were on the south side of Chicago, as White Sox outfielders limped in at 4.34, just below the Mets, Diamondbacks, and Royals (4.40).

Monday, January 28, 2008

Beating a Dead Horse, pt. 1

Last summer, I had a post that I described as a rant about OPS and my problems with the stat. I got (relatively speaking) a lot of feedback about that piece, and you can add that fact to the other evidence that OPS is still a topic of interest among people interested in sabermetrics. In the kinds of discussion forums where baseball fans of a sabermetric bent, but not hard-core sabermetricians, post, various combinations of OBA and SLG, along with rate stats revolving around bases gained, remain a popular topic.

More than one of the respondents to my post felt that it was in essence beating a dead horse, because all of the serious sabermetricians of the world have moved on and found more important things to worry about, and because enough has already been written on the topic to satisfy the inquiries of the less-informed. I largely agree with this; however, I am fairly stubborn by nature, and I like to express myself on sabermetric topics, even if others have already covered the matter thoroughly. So despite agreeing that OPS is a topic whose time has passed, I am going to write a little series on it and its cousins. Also, I feel that if people are going to use OPS, they might as well understand how it relates to run scoring rates.

I would also add that by posting about it here, I do not expect anyone to pay attention to it. If you feel that horse has been bloodied enough, then by all means, pay no attention to this post. If no one else ever reads this, that would be just fine with me.

The earlier linked post did not deal a lot with numbers, which these will have more of. I don’t want to rewrite the same thing over again, but I do want to summarize my main points:

1) OPS is a quick and dirty statistic, and it is alright for this purpose. I am not saying that you should never, ever look at OPS.

2) OPS is a statistic which is unitless. If you write the formula over a common denominator, you have ((H + W)*AB + TB*(AB + W))/(AB*(AB + W)), which is a bunch of gobbledy-gook.
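The common-denominator form can be verified numerically. This is a quick sketch using made-up counting totals (the values of H, W, TB, and AB below are purely illustrative):

```python
# A numerical check that OBA + SLG equals the common-denominator form of OPS.
# H, W, TB, AB are arbitrary illustrative totals, not any real player's line.
H, W, TB, AB = 150, 60, 250, 550

oba = (H + W) / (AB + W)
slg = TB / AB
ops = oba + slg

# The same quantity written over the common denominator AB*(AB + W):
ops_common = ((H + W) * AB + TB * (AB + W)) / (AB * (AB + W))

print(ops, ops_common)  # the two forms agree
```

The algebra checks out, but as the text says, the combined expression has no interpretable unit.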

3) OPS also does not come in an estimated unit. Runs Created is an example of a stat that does; (H+W)*TB/(AB+W) is an estimate of runs scored. Obviously the result is not actually runs scored, thus it is an estimated unit (see my previous post for more on this). OPS does not have an estimated unit. “The higher the OPS, the better” or “OPS is an estimate of a player’s offensive productivity” are both true statements, but they do not confer a unit upon the stat.

4) OBA is a fundamental baseball statistic. If OBA did not already exist, you would want to invent it. SLG is not a fundamental baseball statistic, because it does not measure any fundamental quantity--it measures “bases gained by the batter on hits”. This is a unit, and it is a useful thing to know, but it is not fundamental. If this does not convince you, ask yourself the question, “What does SLG represent?” Some people will be tempted to say “power”, but that’s obviously false since it includes singles. Some people will say “advancement of baserunners”, which can be true, but it is also true that SLG is not even close to being the most accurate estimate of advancement. In contrast, there is no statistic derivable from the official statistics that would better capture what OBA attempts to measure than OBA itself.

5) Because of points 2, 3, and 4, there is nothing special about OPS. When Pete Palmer comes around and invents “OPS+”, it may make sense to claim that the name is a little misleading, but it doesn’t make any sense to complain that it causes distortions in measuring performance because it deviates from OPS. Of course, it makes even less sense to complain about it when it was Palmer who invented both OPS and OPS+.

With that out of the way, let’s talk math. Throughout this series, I have defined OBA as (H + W)/(AB + W); I did not mess with HB or SF, so keep that in mind. Also, I will be focusing on things on the team level, and running a lot of regressions. Then I will be testing the accuracy of the equations on the same sample from which I derived them. I recognize that this is not the best approach to take, but I think that if you focus on the relative accuracy of the formulas to each other rather than the absolute RMSE figures, you will not be misled too badly.

Let’s start with the premise that we have OBA and SLG data for a large group of teams, and we want to estimate a run scoring rate from them. In this case, our large sample will be all teams 1961-2002, except 1981 and 1994. We will also look at using relative OBA, SLG, or OPS to predict relative runs scored, as established by the composite average for the dataset. I am doing it that way because that is how OPS+ is expressed, and because using a constant league average will wash out the adjustments.

Let’s define aOBA as OBA/LgOBA, aSLG as SLG/LgSLG, aOPS as OPS/LgOPS (this is what I called “SOPS+” in my earlier post), aR/P as (R/PA)/Lg(R/PA), and aR/O as (R/O)/Lg(R/O). OPS+ is OBA/LgOBA + SLG/LgSLG - 1, which is the same as aOBA + aSLG - 1.
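These definitions are simple enough to sketch in a few lines of code. The team and league figures below are hypothetical (the league numbers are borrowed from the sample averages quoted later in the post):

```python
# Relative stats as defined in the text, for a hypothetical team.
# League figures here match the 1961-2002 sample averages quoted below.
OBA, SLG = .340, .420
LgOBA, LgSLG = .324, .391

aOBA = OBA / LgOBA
aSLG = SLG / LgSLG
aOPS = (OBA + SLG) / (LgOBA + LgSLG)   # what the earlier post called "SOPS+"
OPS_plus = aOBA + aSLG - 1             # OPS+ = OBA/LgOBA + SLG/LgSLG - 1

print(f"aOBA={aOBA:.3f} aSLG={aSLG:.3f} aOPS={aOPS:.3f} OPS+={OPS_plus:.3f}")
```

Note that aOPS and OPS+ are not the same number; the difference in how they scale to runs is the subject of the regressions below.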

When we regress OPS to estimate runs, what run rate should we regress to, R/PA or R/O? We can all agree that the most important thing to know about a team’s offense is its R/O, so that would seem to be the right choice. While this does not transfer perfectly to individuals, it is still true that R/O is more telling for them than R/PA is, and R/O is generally fine to use as an individual rate stat.

So let’s look at the equations to predict aR/O from aOPS and OPS+ for the sample in question:

aR/O = 2.06(aOPS) - 1.06
aR/O = 1.06(OPS+) - .06

Here we can see that OPS has a 2:1 relationship to runs scored. If you are 5% better than the league average in OPS, you will be approximately 10% better in runs scored per out (and, by extension, in runs scored). On the other hand, OPS+ has an almost 1:1 relationship to runs scored.
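The scaling difference is easy to see by plugging a value into both regression equations from the text; here a team 5% above average by both measures:

```python
# The two regression equations quoted above, applied to a figure of 1.05
# (5% above league average) to illustrate the 2:1 vs. 1:1 scaling.
def aRO_from_aOPS(aops):
    return 2.06 * aops - 1.06

def aRO_from_OPSplus(ops_plus):
    return 1.06 * ops_plus - .06

print(aRO_from_aOPS(1.05))     # ~1.103: about 10% above average in R/O
print(aRO_from_OPSplus(1.05))  # ~1.053: about 5% above average in R/O
```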

The practical implications of this are that if you see a player listed with an OPS+ of 125, you can interpret this as “the player is estimated to create 25% more runs/out than the league average.” It is of course an estimate, and it may not be as accurate as other estimates, but it does scale properly.

You cannot do the same thing with aOPS (and since aOPS is simply OPS divided by a constant, the same goes for OPS as well). If you have a batter with an OPS of 900 in a league with an OPS of 750, saying that his aOPS is 120 means nothing other than that his OPS is 20% higher than the league average. It does not mean that he created 20% more runs--in fact, he created something close to 40% more runs.

As I mentioned in the older piece, people are conditioned to expect that when they see a stat called X+, it will be calculated as X/LgX. OPS+ breaks this convention, and it is curious why Pete Palmer chose to name it as he did (originally it was PRO+, but OPS itself was PRO, so it was the same situation). However, Pete’s choice to use OPS+ instead of aOPS was a good one, as it is more accurate and expressed in estimated units that have meaning.

Once we have the above equations, we can estimate team runs scored and see how accurate the estimates are (keeping in mind the caveats about using the regression on the sample it was derived from). We can find Runs = aR/O*Lg(R/O)*O. We know that for our sample, BA = .258, OBA = .324, SLG = .391, R/PA = .117, and R/O = .172. Plugging everything in, the RMSE against actual runs scored is 25.72 for the aOPS equation and 24.88 for the OPS+ equation.
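The accuracy test described above can be sketched as follows. The team tuples are placeholders for illustration; the actual test used every team from 1961-2002 except 1981 and 1994:

```python
import math

# Sketch of the RMSE test: estimate each team's runs from aOPS via the
# regression, then compare to actual runs scored.
LG_R_PER_OUT = .172  # the sample average R/O quoted in the text

def estimated_runs(aops, outs):
    a_ro = 2.06 * aops - 1.06           # aR/O regression from the text
    return a_ro * LG_R_PER_OUT * outs   # Runs = aR/O * Lg(R/O) * O

def rmse(teams):
    # teams: list of (aOPS, outs, actual_runs) tuples
    errs = [estimated_runs(a, o) - r for a, o, r in teams]
    return math.sqrt(sum(e * e for e in errs) / len(errs))

# A league-average team over a placeholder 4100 outs:
print(estimated_runs(1.0, 4100))  # Lg(R/O) * outs
```

The same structure, with the aR/P regression and PA in place of outs, produces the R/PA-based estimates discussed next.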

What would happen if we tried to predict R/PA, rather than R/O? We would get these equations:

aR/P = 1.74(aOPS) - .74
aR/P = .90(OPS+) + .10

The estimate for team runs scored will be Runs = aR/P*Lg(R/PA)*PA. The RMSEs in this case are 23.76 for the aOPS equation and 23.64 for the OPS+ equation--over a run better than the R/O predictions. Why is this?

First, let me claim without presenting any evidence that most statistics do better at predicting team runs scored when estimating R/PA than they do when estimating R/O. To understand why this is, we need to remind ourselves about the relationship between R/PA and R/O. Assuming, as we have in this case, that the only outs are batting outs and that there are no ways to reach base that are not included in OBA, the relationship R/O = (R/PA)/(1 - OBA) holds. This is not an “estimate”; it is a demonstrable mathematical truth. As you can see, the On Base Average is key, since it is the complement of the rate at which outs are made. It is better to estimate R/PA from OBA and SLG, then convert it to R/O by dividing by (1 - OBA). Instead of doing a regression to try to incorporate the value of OBA, you are better off using OBA directly for that purpose.
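The identity is easy to confirm under the stated assumptions. The PA, times-on-base, and run totals below are arbitrary illustrative numbers:

```python
# Check of the identity R/O = (R/PA)/(1 - OBA), assuming all outs are
# batting outs and every non-out PA is a time on base.
# PA, on_base, and R are arbitrary illustrative totals.
PA, on_base, R = 6200, 2000, 780

outs = PA - on_base        # every PA is either an out or a time on base
oba = on_base / PA
r_per_pa = R / PA
r_per_out = R / outs

# The identity holds exactly, not approximately:
print(r_per_out, r_per_pa / (1 - oba))
```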

So it is in fact more accurate to estimate R/PA from OPS than it is to estimate R/O. However, if you agree with the premise that individual productivity should be measured in terms of runs/out, and you use OPS or OPS+ as your rate stat of choice, you are in essence locking yourself in to considering the less accurate R/O relationships.

Sunday, January 20, 2008

Units and Comparisons

In writing some other stuff, I’ve noticed that I’ve been referencing this topic, and so I figured I should just write about it in isolation so that I don’t have to go off on a tangent elsewhere.

This kind of goes back to Bill James’ article in the 1987 Baseball Abstract about “meaningful and meaningless statistics”. When looking at a baseball statistic (which I’m using to mean a category like “Triples”, not a single statistic like “Curtis Granderson had 21 triples in 2007”), I ask myself a few questions. These questions don’t answer how worthwhile it is to know, but they are very useful in considering derived statistics:

1) What are the units of the statistic? Are they actual units or estimated units?

Consider a fairly mundane counting statistic, the balk. The unit of a balk is clearly “balks”, and since the category is simply a count, they are actual units.

Batting average is measured in units of “hits per time at bat”. Again, they are actual units, although the at bat itself is a weird, kind of artificial subcategory of plate appearances.

OPS is measured in units of “total bases per at bat plus times on base per plate appearance”. Since we have two different denominators, there’s not a clear unit here as there is in the two components taken separately. The unit total bases per at bat plus times on base per plate appearance has no clear meaning; people use the stat as an approximation of overall offensive ability, but overall offensive ability is not a unit either. OPS has no units.

Runs created is measured in units of “estimated runs”. While both RC and OPS are estimates, the distinction between these will be explained below.

Questions two and three apply primarily to derived statistics, not to simple counts of events.

2) If the derived statistic is measured in an estimated unit, is it one that is fundamental to our understanding of baseball?

By fundamental, I mean units of things that really matter in terms of winning baseball games. If the stat is expressed in estimated runs or wins, then it is fundamental, although I would consider some other things to be fundamental as well.

For example, On Base Average is a very fundamental thing to know; the rate of reaching base. Of course, OBA is not really an estimated unit, but an actual count of things.

I also consider any sort of event frequency with a sensible denominator to be fundamental...walks/PA, homers/PA, hits/balls in play, etc. Of course, some of these are more telling than others (catcher’s interference/PA is not particularly important to know), but they all are very straightforward, basic pieces of information.

Slugging Average is a more interesting case; clearly “bases gained by the batter on hits” is a factual count. However, bases gained by the batter on hits is not critical to understanding baseball, like the rate of reaching base is. If it were “bases gained on hits by batter and baserunners”, then it would be a bit more telling, either on the team level or as an estimated unit on the individual level. So I’ll leave that one up there as one to be decided on. It’s sort of like if you took (balks + wild pitches)/inning.

OPS fails this test, since it’s not measured in terms of anything. That does not mean that OPS cannot be transformed by mathematical operation into a fundamental estimated unit, like runs, but on its own, the units are meaningless.

Win Shares is measured in wins, except the wins are multiplied by three. I consider that to be fundamental. If you want to be a stickler and demand that Win Shares be divided by three before you consider it a fundamental estimated unit, that’s okay too. The reason I don’t draw a distinction is that the transformation is scalar and straightforward.

3) Can two players (or teams, etc.) be compared using this statistic by the difference between their figures? By the ratio? Both? Neither?

What I am getting at here is whether the difference or the ratio has meaning, other than just telling us which is better. For instance, an OPS of 1000 can be compared to an OPS of 800, and seeing that the one player is +200 points or has a 1.2 ratio, we can see that the 1000 OPS is superior. But the +200 points or the 1.2 ratio don’t have any meaning other than facilitating the comparison.

For an example of a statistic that can be compared by difference and ratio, take runs created per game. 5 RG is two more runs per game than 3 RG, or it’s 67% more runs. Either way, the numeric result of the comparison is expressed in a meaningful unit. There are many stats that would fall into this category: Winning Percentage, On Base Average, Wins, Losses, …, Balks.

However, there are some derived statistics for which only one of the operations produces a meaningful result. Take Runs Above Average for example. If I tell you that one player is +10 RAA and another is +1, then we have a ratio of 10 and a difference of 9. The difference of nine tells us that Player A contributed nine more runs beyond an average player than Player B did, which is valuable. But the ratio of 10 just obfuscates things, unless you take the position that RAA measures value, and thus Player A is ten times more valuable. Even if one takes that view, the ratio gives a much cloudier picture of the disparity in value than does the difference.

Statistics with meaningful ratios but meaningless differences are harder to come by, but one example is ERA+. ERA+ inverts the usual format of a relative statistic (X/LgX) to LgX/X, in order to make a figure above 100 desirable as it is for OPS+, or Relative Batting Average, or any number of other such stats.

This seems innocuous enough at first glance, but it causes some problems, and you need to be careful when averaging it, as Tango Tiger has shown. Suppose that you have a pitcher who works exactly 200 innings in consecutive seasons in leagues with an ERA of 4.50. In the first season, our pitcher’s ERA is 3.00, and thus his ERA+ is 1.50. In the second season, his ERA jumps to 4.00, and his ERA+ comes in at 1.125. Since he worked the same number of innings each year, we can just average his ERAs together and find that he has compiled a 3.50, which is a 1.286 ERA+. However, if we average his ERA+s, we get 1.313.
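The averaging pitfall in that example can be worked through in a few lines; the seasons are the hypothetical ones from the text (200 innings each, league ERA of 4.50):

```python
# The two-season ERA+ example from the text: equal innings, league ERA 4.50.
lg_era = 4.50
era1, era2 = 3.00, 4.00

erap1, erap2 = lg_era / era1, lg_era / era2   # ERA+ = LgERA/ERA
print(erap1, erap2)                            # 1.5 and 1.125

combined_era = (era1 + era2) / 2               # equal IP, so a simple average
print(lg_era / combined_era)                   # ~1.286, the true combined ERA+

print((erap1 + erap2) / 2)                     # 1.3125: naive average is wrong

# Weighting by earned runs (proportional to ERA here, since IP are equal)
# recovers the correct figure:
print((era1 * erap1 + era2 * erap2) / (era1 + era2))  # ~1.286
```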

The reason this ends up happening is that when you invert the calculation, earned runs, rather than innings, become the denominator quantity. So in order to average the two seasons, you must weight by earned runs (and indeed (4*1.125 + 3*1.5)/(4+3) = 1.286).

For the same reason, the difference between the two means nothing. 1.5-1.125 = .375. What does .375 represent? If we take it times the league ERA of 4.5, we would expect to get the difference between the two ERAs (1), but instead you get 1.69.

Suppose that we instead consider ERA/LgERA. The seasonal figures are now .667 and .889, and we know the total to be 3.5/4.5 = .778, and the average of .667 and .889 is indeed .778. Now the difference between the two, multiplied by the common league ERA is (.889-.667)*4.5 = 1.

The ratio between the two is .889/.667 = 1.333; the ratio between the ERA+s is 1.50/1.125 = 1.333. You can see that the ERA+ ratio is meaningful, but the difference is not, and that’s something you should always keep in mind when working with ERA+ over the course of a pitcher’s career. So while we may have become accustomed to ERA+, its reciprocal is easier to work with and is meaningful in ratio and differential comparisons.
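The ratio/difference contrast between ERA+ and its reciprocal can be checked directly, using the same two seasons from the example above (ERAs of 3.00 and 4.00 against a 4.50 league):

```python
# Ratio vs. difference for ERA+ (LgERA/ERA) and its reciprocal (ERA/LgERA).
lg = 4.50
e1, e2 = 3.00, 4.00

p1, p2 = lg / e1, lg / e2   # ERA+ form: 1.5 and 1.125
r1, r2 = e1 / lg, e2 / lg   # reciprocal form: ~.667 and ~.889

print(p1 / p2, r2 / r1)     # both ~1.333: the ratios carry the same meaning
print((p1 - p2) * lg)       # 1.6875: NOT the actual ERA gap of 1.00
print((r2 - r1) * lg)       # 1.0: the actual difference in ERA
```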

Then there are the statistics for which neither the difference nor the ratio has any intrinsic meaning. Generally speaking, these are the derived stats that I cannot stand, and wish that their inventors and figurers would convert them to a different format. Examples include OPS (since it is unitless to begin with), EQA, and Offensive Winning Percentage. I’ll examine the case of OW% here. OW% starts with a statistic, Adjusted RG, which is meaningful by both difference and ratio, and converts it into a format where it is meaningless by both, at least for application to individual players.

Assuming a pythagorean exponent of 2, OW% = RG^2/(RG^2 + Lg(R/G)^2), or ARG^2/(ARG^2 + 1). Suppose we have a hitter with an ARG of 1.10, and another with an ARG of 1.20. Player A has created 110% of the league average runs per out; Player B 120%. The ratio 1.20/1.10 is meaningful; it tells us that Player B created 9% more runs per out than did Player A. The difference 1.2 - 1.1 is meaningful as well. If we multiply it by the league average runs/out (say .18), we will find that Player B created .018 runs/out more than Player A.
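The conversion from ARG to OW% in the example is a one-liner:

```python
# OW% from Adjusted RG with a pythagorean exponent of 2, applied to the
# two hypothetical hitters in the text (ARG of 1.10 and 1.20).
def ow_pct(arg, exponent=2):
    return arg ** exponent / (arg ** exponent + 1)

a, b = ow_pct(1.10), ow_pct(1.20)
print(round(a, 3), round(b, 3))  # .548 and .590, as in the text
print(round(b / a, 3))           # ~1.078: a ratio with no clear meaning
```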

When we convert to OW%, Player A is now at .548 and Player B is at .590. The ratio between them is now .590/.548 = 1.078. In what sense was Player B 7.8% more productive, more valuable, more whatever than Player A? The answer is in no sense that reflects on their actual status as individual members of a ballclub. It is true that a whole team that hit like Player B would be expected to win 7.8% more games than a team that hit like Player A. However, this does not translate directly into any statement of their actual value as individual members of a team.

The difference is just as unintelligible. OW% is okay for the thought exercise aspect of “how good would a whole team of this guy be”, but when it comes to actually measuring the value of a player to his team in a meaningful way, it fails. All of this is not to say that non-linear relationships have no place in evaluating players, but if you’re going to use them, you need to be sure that you model reality and not an unrealistic scenario. For example, you could ask “What would the team’s W% be if it was made of eight average players and Player X” and use the pythagorean formula to make an estimate. The relationship between two players’ OW%s figured that way would not be the same as the relationship between their ARG, but one could argue that it would be a truer reflection of their value. OW% cannot make that claim, and thus the non-linearity just serves to obfuscate relationships between players.

To wrap this rambling atrocity up, the most useful statistics tend to have positive answers to all three questions: they are denominated in some sort of unit, that unit is fundamentally important, and the ratio or difference between players or teams expresses the relationship between them in a meaningful way.