Monday, May 26, 2008

A Review of “The Bill James Gold Mine 2008”

I am one of the worst book reviewers in the world. This goes beyond just being a suspect writer; it’s all about the timing. I have never reviewed a book in a timely fashion; by the time I review it, it’s yesterday’s news.

This one is no exception, as The Bill James Gold Mine 2008 from ACTA Publications has been out for at least two months now. The good thing is that this book is not a season preview book in the way that, say, Baseball Prospectus is, and both it and the review have some degree of timelessness.

I should note upfront that I am not a subscriber to Bill James Online; many of the comments I have read have discussed the value of the book for those who are subscribers to the site. This is a review of the book itself, with no ancillary product having any weight on it.

This book, like just about everything James has ever written, is a good read. The problem with it is that there is really not that much to read. There are a number of essays interspersed throughout the book; most of these are not deeply analytical in nature, but are fun to read. For instance, there is one on “Cigar Points”, which is a kind of freak show stat that measures how close a player came to reaching significant seasonal or career milestones. Come to think of it, there are several freak show stats in the book. These are presented as such, and not as any sort of earth-shattering revelation.

The downside is that much of the book is taken up with statistical tables. Some of these are quite interesting--there are breakdowns of the percentage of various pitch types thrown by a given pitcher, split by the handedness of the batter; tables of batting statistics on groundballs, line drives to right field, and the like; tables showing the percentage of pitches batters swung at or took, broken down by pitch location, etc. Some of the statistics presented are kind of silly as far as I’m concerned, but others might enjoy them. For example, I don’t get much out of a batter’s “RBI Analysis”, showing how many runs he drove in by various means and which teammates he drove in.

The problem is that these are presented on a selective basis; it’s not a reference book, so James includes the data that he thinks is interesting or wants to use to make a point or discuss an idea. These riffs are usually about a paragraph in length, and sometimes simply a sentence that highlights something interesting in the accompanying table.

I have seen some comments that it is great to have James back writing what amounts to a reprisal of the Abstract. However, it is not that, at all. Perhaps it is more in the vein of the early self-published editions that were low on words and heavy on numbers, but it doesn’t really resemble, say, the 1985 Abstract at all. If you took a later edition, like 1986 or 1987, it would be like taking very small snippets from the player and team comments and adding in the extraneous essays.

One thing that was a bit disappointing to me about the book is that it relied very little on James’ sabermetric tools--you won’t see a lot of Win Shares, or Runs Created, or Component ERA (regardless of my opinions on the specifics of those methods, their objectives are fine). Instead, unbelievably, James introduces in passing a new method of Season Scores, which puts a number on a season through (apparently) some rough guidelines. It resembles nothing so much as Approximate Value. That kind of method can be okay for its own sake, but when James introduced Win Shares, part of his stated motivation was that he needed a single, accessible number for studies, but that no one took AV seriously because of its arbitrariness. Now he is back to using that sort of method.

Another weakness that has been apparent in James’ writing for some time, not unique to this book, is a lack of currency on the work being done by others in the field. Thus, James’ “Herbie” ERA estimator is very similar to FIP, but he does not notice it. This manifests itself in another strange way--James being unaware of his own previous work. The first iteration of Herbie is based solely on homers and walks (and hit batters). James says: “This is SUCH a simple concept that somebody must have done this before, but…it’s new to me.” In fact, someone did have an ERA based solely on homers and walks, and it was Bill James, with Indicated ERA in the 1987 Abstract.

Two qualifiers to that: the first is that Indicated ERA (HR*W*100/IP^2) was not as accurate as Herbie, and the second is that James has produced such a massive volume of published work that it is hardly a failing to forget something. I’m sure that I have scribblings in old notebooks somewhere that I have since unknowingly replicated, and of course I probably haven’t spent as much time on this stuff in my life as James did in 1978 alone. And I certainly don’t have anyone to check up on me, since my stuff is not good enough to be in the public domain.

Finally, the last few pages of the book were a little disappointing. They are excerpts from several articles on the website. I would have much preferred to see one more full article than snippets from several, and in this section the book does come across as secondary to and an advertising vehicle for the website. I have no problem with pushing the website, but I would have much rather seen that done by making an open appeal than by a tease.

Another problem with my book reviews is that I tend to focus on the negatives. Is this the Abstract reborn? No. Is it the most scintillating baseball book you’ve ever read? No again. Is there too high of a ratio of numbers to words? Yes. How many better books will be published this year for people who are interested in sabermetrics? One? Two? Certainly not five. If you are interested in sabermetrics, you should probably read this book or at least check out the website. Seriously, it’s still BILL JAMES.

Monday, May 19, 2008

An Analysis of Clay Davenport’s EqR and EqA

This piece is an analysis of Clay Davenport’s EqR and EqA; it attempts to break down how they work and what their true nature is. As such, it is an opinion piece, critical in nature, and it should certainly not be construed as the kind of neutral presentation of the formulas that the last piece aspired to be.

Let’s focus on the nature of the EqR equation, which once again is:

EqR = (2*RAW/LgRAW - 1) * PA * Lg(R/PA)

where RAW = (H + TB + 1.5*(W + HB + SB) + SH + SF)/(AB + W + HB + SH + SF + SB + CS)

The formula starts by treating the league runs per plate appearance as a known constant. There is nothing inherently wrong with this, but it does give EqR an advantage in accuracy testing should other metrics not be granted the same consideration. One could also fix the values for LgRAW and Lg(R/PA) over some period of time and thus have defined numerical values as constants.
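To make the mechanics concrete, here is a minimal sketch of the formula in Python--my own rendering, not BP’s code--with the league figures passed in as the known constants the formula treats them as:

```python
def eqr(ab, h, tb, w, hb, sh, sf, sb, cs, lg_raw, lg_r_pa):
    """Unofficial sketch of EqR, per the formulas above.
    lg_raw and lg_r_pa are the league RAW and league runs per PA,
    treated (as in EqR itself) as known constants."""
    raw = (h + tb + 1.5 * (w + hb + sb) + sh + sf) / (ab + w + hb + sh + sf + sb + cs)
    pa = ab + w + hb + sh + sf
    return (2 * raw / lg_raw - 1) * pa * lg_r_pa
```

One sanity check: pass a team’s own RAW as the league RAW and the formula returns exactly PA times the league runs per PA, i.e., a team that matches the league rate is credited with league-average scoring.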

The relationship between RAW and expected runs per PA is linear, with a slope of 2 and an intercept of -1. This means that a team with a RAW 10% better than league average is expected to score 20% more runs than average. This property has sometimes led to the accuracy of EqA being underestimated in studies.

For example, in the 1999 Big Bad Baseball Annual, Jim Furtado tested the accuracy of rate stats in predicting runs by assuming that there was a 1:1 relationship between the rate in question and runs per out. This was flawed on two levels: one is that most rate stats (including OPS) will test as more accurate when used to predict R/PA. The second is that not all statistics have a 1:1 ratio; OPS, like RAW, is closer to 2:1. OTS performed very well in that test because it is one of the few that does have a 1:1 ratio (or close to it).

One way then to attempt to fine-tune the accuracy of the EqR equation is to run a regression of RAW/LgRAW versus (R/PA)/Lg(R/PA). Using data for all teams 1990-2005 (except 1994), the long-term average R/PA is .1219 and the long-term average RAW is .7706. Using those two values as constants in place of the actual league averages, we get this optimal equation:

EqR = (1.968*RAW/.7706 - .968) * PA * .1219

As you can see, very little difference, and small enough to be ignored. However, with earlier versions of RAW that did not include HB, SF, and SH, the slope was closer to 1.9 and there was a measurable improvement in accuracy as a result of using the regression coefficients.
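How little difference is easy to check numerically. A quick sketch (my own; PA fixed at a made-up team total) comparing the stock (2, -1) coefficients to the regressed (1.968, -.968) ones across a wide range of team RAWs:

```python
LG_RAW, LG_RPA, PA = 0.7706, 0.1219, 6200  # long-term constants from above; made-up team PA

def eqr_line(raw, slope, intercept):
    """EqR for a given RAW under a given slope/intercept pair."""
    return (slope * raw / LG_RAW + intercept) * PA * LG_RPA

# stock equation minus regressed equation, from a poor RAW to an excellent one
diffs = [eqr_line(r, 2.0, -1.0) - eqr_line(r, 1.968, -0.968) for r in (0.70, 0.80, 0.85)]
print([round(d, 1) for d in diffs])  # [-2.2, 0.9, 2.5]
```

Even at the extremes, the two equations disagree by under three runs over a full season, which is why the regressed coefficients can safely be ignored.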

The most important thing to understand about the structure of Equivalent Runs is that it is essentially a linear weights formula. It is difficult for some people to recognize this from just glancing at the formula. However, if you really take a look at it, you’ll see that the only source of non-linearity is the treatment of stolen bases and caught stealing. The denominator of RAW is PA + SB + CS. If SB and CS are both zero, then the RAW denominator cancels with the multiplication by PA, and the formula is 100% linear.
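One way to see the linearity concretely is additivity: a linear formula applied to the sum of two stat lines gives the sum of the two separate estimates. A quick sketch (made-up stat lines; the long-term league constants quoted above) shows additivity holds exactly with zero stolen base attempts and breaks once SB/CS are added:

```python
LG_RAW, LG_RPA = 0.7706, 0.1219  # long-term averages quoted above

def eqr(ab, h, tb, w, hb, sh, sf, sb, cs):
    raw = (h + tb + 1.5 * (w + hb + sb) + sh + sf) / (ab + w + hb + sh + sf + sb + cs)
    pa = ab + w + hb + sh + sf
    return (2 * raw / LG_RAW - 1) * pa * LG_RPA

def combine(x, y):
    return tuple(i + j for i, j in zip(x, y))

# two made-up half-season lines with zero stolen base attempts
a = (2750, 700, 1100, 250, 25, 25, 25, 0, 0)
b = (2750, 800, 1300, 300, 25, 25, 25, 0, 0)
print(abs(eqr(*combine(a, b)) - (eqr(*a) + eqr(*b))))  # effectively zero: linear

# the same lines with stolen base attempts added: additivity breaks
a2, b2 = a[:-2] + (60, 20), b[:-2] + (10, 15)
print(abs(eqr(*combine(a2, b2)) - (eqr(*a2) + eqr(*b2))))  # no longer zero
```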

There are strong points for linear run estimators, and there are also strong points for non-linear run estimators. Which class of estimator you should use depends on what you are trying to measure, and both have their uses in sabermetrics. However, it makes no sense to have a formula that is dynamic only in its treatment of stolen base attempts. Of course the value of a SB or CS is dependent on the overall context in which they occur. So is the value of a single, a homer, a walk, or any other event.

In fairness, the impact of stolen base attempts on the coefficients is fairly small. It is not a source of major distortion in the formula. Of course, the fact that they are fairly small makes the decision to treat them as denominator quantities and thus make EqR much more complex than it needs to be all the more puzzling. Such a choice can seemingly only be justified by a blind pursuit of lowering RMSE at the expense of all other attributes of a run estimator (such as logic).

What are the intrinsic linear weights used by EqR, anyway? We can use calculus to find the exact values for any given set of input stats. Let’s define some terms to make the equations easier to write and follow. L is Lg(R/PA), X = LgRAW, m = slope (we’ll use 2, but generalize it in case some other value was to be used), b = intercept (again, generalized to -1), P = PA, N = RAW numerator, D = RAW denominator. Also, let p be the derivative of PA with respect to a given event (1 for any PA event, 0 for any non-PA event like a SB or CS), n = derivative of RAW numerator with respect to a given event, and d = derivative of RAW denominator with respect to a given event. Then, we can rewrite EqR as:

EqR = (m/X)*(N/D)*L*P + b*L*P

The derivative of this with respect to a given event is:

(m*L/X)*(P*(D*n - N*d)/(D^2) + (N/D)*p) + b*L*p
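Translated into code, the derivative yields the full set of weights directly. A sketch (my own; the stat context 6200/4900/6200 is a made-up PA/numerator/denominator with zero stolen base attempts, so D = PA):

```python
L, X, M, B = 0.1219, 0.7706, 2.0, -1.0  # Lg(R/PA), LgRAW, slope, intercept

# (p, n, d) per event: derivatives of PA, the RAW numerator, and the RAW
# denominator with respect to one more of that event. A single adds 1 to H
# and 1 to TB, so n = 2; a homer adds 1 to H and 4 to TB, so n = 5; etc.
EVENTS = {
    "1B": (1, 2.0, 1), "2B": (1, 3.0, 1), "3B": (1, 4.0, 1), "HR": (1, 5.0, 1),
    "W/HB": (1, 1.5, 1), "SB": (0, 1.5, 1), "CS": (0, 0.0, 1),
    "SH/SF": (1, 1.0, 1), "OUT": (1, 0.0, 1),  # OUT = AB - H
}

def intrinsic_weights(pa, num, den):
    """Evaluate d(EqR)/d(event) at a given stat context (PA, RAW numerator N,
    RAW denominator D), per the derivative above."""
    return {ev: (M * L / X) * (pa * (den * n - num * d) / den ** 2 + (num / den) * p) + B * L * p
            for ev, (p, n, d) in EVENTS.items()}

# with no stolen base attempts, D = PA and the HR weight matches the 1.460 derived later
print(round(intrinsic_weights(6200, 4900, 6200)["HR"], 3))  # 1.46
```

Note that with zero stolen base attempts the weights do not depend on the RAW numerator at all, consistent with the simplified formula discussed below.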

We can use this equation to generate the linear weights for any given set of input stats. We’ll use the composite stats for 1990-2005 (excluding 1994) to get these weights:

EqR = .501S + .810D + 1.119T + 1.428HR + .347(W + HB) + .225SB - .238CS + .193(SH + SF) - .116(AB - H)

You can see that these are fairly reasonable linear weights. You can quibble about how optimal they actually are, but for the purposes of this article, we’ll just leave it at “they are in the ballpark”--there's nothing so far off as to be seriously distorting (as is the case for Basic RC, for instance). Thus, for all of the window dressing, EqR may as well be Estimated Runs Produced, Batting Runs, or any number of other linear weight estimators. In fact, if you apply the above formula to the teams in the aforementioned sample, and compare it to the “actual” EqR figure (using the long-term averages for L and X), the largest difference for any team is four runs--the 1991 Expos and the 1992 Angels.

To get a further handle on how the non-linearity rears its head, let’s consider Roger Maris’ 1961 season. I don’t have any special reason for choosing it, by the way. We will look at his EqR total (based on our fixed values for L and X) and the intrinsic weights with a few made-up SB/CS combinations.

The first column below gives Maris’ actual statistics. The second column has the intrinsic weights and EqR total based on those. The other columns present different, hypothetical SB/CS combinations with their intrinsic weights and EqR. For example, “30/10” is 30 SB and 10 CS. The EqR figures at the bottom have the contribution of SB and CS removed--I am trying to show the effect that different SB/CS combinations have on the weights, not the value of the stolen base attempts themselves. Again, the EqR figures at the bottom EXCLUDE SB and CS; the different SB and CS values change the weights for all of the events, and those new weights were used to generate EqR--without the contribution of SB and CS. The differences between the cases are due solely to the changes in weights brought about by considering SB and CS in the formula, not the run values of the SB and CS themselves:


Let me first note that for the column with the weights for Maris' actual performance (zero stolen base attempts), the weights are exactly the same as we would find for any player or team with zero stolen base attempts, given the same values for X and L. Algebraically, the formula for the weights simplifies under those conditions to:

m/X*n*L + b*L

For example, Maris' homer weight is 1.460. The value of a homer by the above formula is:

2/.7706*5*.1219 + (-1)*.1219 = 1.460

When looking at the other SB/CS combinations, the EqR stays within a 1.5 run range for all of the combinations. However, the weights for the various hit types move pretty wildly, even for the home run, which should be fairly stable. The out is also bizarrely affected.

This could all be easily fixed by using PA in the RAW denominator and making EqR a pure linear formula. Instead, you have something that is:
1) more complex
2) prone to problems at extremes, if not in actual results then in theory
3) very confusing to those who don’t have the patience to sort through it

The bottom line is that EqR is not a terrible run estimator; it is a terribly constructed run estimator. If someone presents you a pre-figured list of EqR, feel free to look at it and consider the results fairly reasonable. However, if you are going to pick a run estimator to use, take a 100% linear estimator or a dynamic team modeling method (like Theoretical Team Base Runs) instead of a metric that can’t decide what it wants to be and thus is a bizarre hybrid.

Then we move on to the matter of Equivalent Average. For some further exposition on the nature of my issue with EqA, I will refer you to this post, where I discuss criteria for evaluating a statistic. The third criterion I listed is that of comparability: how can two players, teams, etc. be compared with this statistic, such that the comparison itself tells you something meaningful. With a ratio? With a difference? Both? Neither?

For example, in the case of a not-too-useful metric like Batting Average, both the ratio and the difference have inherent meaning. Comparing a .300 hitter to a .250 hitter, the ratio of 1.2 means that the first player is 1.2 times more likely to get a hit in a given at bat than the second (leaving aside the question of how useful at bats, a subset of plate appearances, actually are). The difference of .050 means that the first player gets .05 more hits per at bat than the second. Both comparisons are meaningful, even if they aren’t particularly insightful.

Equivalent Average is based on runs per out. Runs per out is meaningful for both types of comparisons. EqA does not stop with R/O, though; first it divides it by five. Dividing by five does nothing to change the R/O ratio. It does make the differential comparison less useful, as “runs divided by five per out” is not a meaningful baseball unit. However, it is directly linearly-related to a meaningful one, and thus can still be considered meaningful (and if you disagree, you can just multiply by five).

However, to finish off EqA, the (R/O/5) is raised to the .4 power. This is done so that it approximates the scale of Batting Average. In doing so, though, both ratio and differential comparisons are destroyed. Consider a player with .4 runs per out (about 10 per game) and another with .2 (about 5). The true ratio is 2 and the difference is .2 runs per out.

However, when we convert to EqA, player A has a .364 and player B has a .276. What does the ratio of 1.32 mean? It doesn’t mean that player A has created 32% more runs per out. It doesn’t mean anything, unless you raise it to the 2.5 power. The difference is even more hopeless.
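The round trip is easy to sketch in code (mine, not BP’s): convert R/O to EqA, then undo the scale conversion to recover a comparison that actually means something, using the two players above:

```python
def eqa(runs_per_out):
    """EqA = (R/O / 5) ** .4, per the EqA definition."""
    return (runs_per_out / 5) ** 0.4

def to_runs_per_out(eqa_val):
    """Undo the scale conversion: raise to the 2.5 power and multiply by 5."""
    return 5 * eqa_val ** 2.5

a, b = eqa(0.4), eqa(0.2)                                  # the two players from the text
print(round(a, 3), round(b, 3))                            # 0.364 0.276
print(round(to_runs_per_out(a) / to_runs_per_out(b), 3))   # 2.0: the true ratio, recovered
print(round((a / b) ** 2.5, 3))                            # 2.0: equivalently, exponentiate the EqA ratio
```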

If someone gave me a list of EqA figures, the first thing I’d do would be to raise them to the 2.5 power and multiply by five to convert them back into runs per out. I don’t really want to get into whether having the figures on a BA scale or not is helpful; after all, that is a matter of preference. However, if you are going to make the scale conversion in a non-linear way, I believe it should be incumbent on you as the developer of the metric to make clear the issues involved in player comparisons. I do not believe that the average user of EqA has any idea of how to compare the ratios or differences in meaningful baseball terms.

Monday, May 12, 2008

How to Calculate Clay Davenport’s EqR and EqA (Unofficially, Of Course)

Equivalent Runs (EqR) and Equivalent Average (EqA) are measures of offensive productivity developed by Clay Davenport, and popularized through the various publications of Baseball Prospectus. EqR is a measure of runs contributed and EqA measures the rate of run production. While the presence of other websites and books has somewhat reduced the use of EqR and EqA by the sabermetric public, they still pop up from time to time. Unfortunately, a lot of people do not seem to understand how they are figured. I will attempt to explain that here.

I should note, of course, that I am in no way affiliated with Baseball Prospectus, and thus my description should be taken as that of an outsider, who may be out of the loop on changes to the methodology, or may himself misunderstand the procedure. So if anything that I say contradicts what they say, you know who to believe. Also, BP sometimes presents the EqR and EqA formulas within the context of a larger evaluation system that also leads to WARP. That approach is pretty much equivalent to the older one described here; this one is more straightforward if one is only interested in understanding EqR and EqA.

Also, the issue of how to figure EqR and EqA is clouded by the fact that BP rarely publishes them in their unadjusted form. The EqA figures in their annual, for instance, have been put through the wringer of Davenport Translations to conform to an ideal league environment. I have no insight to offer on this process, and my description only attempts to cover straightforward applications of the two statistics.

Finally, I should note that I have written about this subject before, and thus there is nothing new here, even in the context of my own writing. I have seen comments implying that the article on my website on this topic is somewhat arcane, and I wanted to write a more accessible version. So as always, I milk this for all it’s worth by cross-posting to the blog. This installment attempts to be objective and just give a straightforward explanation of the formulas. The next article will give my opinions on the method, but I want to keep those two parts separate.

To figure EqR, one starts by figuring Raw EqR, or simply RAW:

RAW = (H + TB + 1.5*(W + HB + SB) + SH + SF)/(AB + W + HB + SH + SF + SB + CS)

RAW is then converted into an estimate of runs, EqR. The formula is:

EqR = (2*RAW/LgRAW - 1) * PA * Lg(R/PA)

EqR starts by taking the league average runs per PA as a given, and then changes the estimate for the team based on how their RAW compares to the league RAW. This relationship has a slope of two; if a team has a RAW 10% better than the league average, they are expected to score 20% more runs than the league average.

EqR is the estimated number of absolute runs contributed; as such, it is a similar measurement to Runs Created, Extrapolated Runs, and many other familiar methods. Equivalent Average is the rate that Davenport chose, just as Bill James uses RC/G or Offensive Winning Percentage.

EqA is designed to correspond to the scale of Batting Average, due to its familiarity to most baseball fans. Thus, the average is generally around .260 (the aforementioned translations force the average to exactly .260); figures below .200 are dreadful, figures above .300 are good, etc.

EqA is based on EqR per out, but that simple figure is first divided by five, then raised to the .4 power to produce EqA:

EqA = (EqR/Out/5) ^ (.4)
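Putting the pieces together, here is a minimal, unofficial sketch of the whole chain in Python. Outs are passed in directly rather than estimated from the batting line--that is my simplification, sidestepping any question of how BP counts them:

```python
def raw_eqr_eqa(ab, h, tb, w, hb, sh, sf, sb, cs, outs, lg_raw, lg_r_pa):
    """Unadjusted RAW -> EqR -> EqA, per the formulas above (no translations)."""
    raw = (h + tb + 1.5 * (w + hb + sb) + sh + sf) / (ab + w + hb + sh + sf + sb + cs)
    pa = ab + w + hb + sh + sf
    eqr = (2 * raw / lg_raw - 1) * pa * lg_r_pa
    eqa = (eqr / outs / 5) ** 0.4
    return raw, eqr, eqa
```

Feeding it a line whose RAW equals the league RAW, with a typical league of about .12 R/PA and about .7 outs per PA, returns an EqA in the .26 neighborhood, consistent with the roughly .260 average mentioned above.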

Friday, May 02, 2008

Derby Picks

Whenever I make predictions about baseball, I try to hammer home the point that they are offered in the spirit of fun and not as a serious sabermetric exercise. Making predictions is fun, and there’s nothing wrong with fun--so long as it’s not taken more seriously than that. Unfortunately, far too many of the people who make predictions love to trumpet their successes and ignore their failures, and act as if picking a pennant race is the ultimate test of how much you know about the game. No fun.

If all of that is true of my baseball picks, then it is true even more so when I dabble in another sport, like horse racing. Whatever you think of my level of baseball knowledge, whether you think I’m an intellectual heir to Bill James (fat chance) or a complete moron, let me assure you that my level of horse racing knowledge is much, much, much lower. So all of the caveats about predictions apply even more strongly here. This is for fun. I am not Andy Beyer.

Just to hammer this home: I have been picking the Derby winner for several years and have been right only once, and that was with an undefeated favorite (Smarty Jones). Last year I told you that Nobiz Like Shobiz would win (he finished tenth). The year before I picked Lawyer Ron (a fine horse, as shown by the Older Male Eclipse Award he won last year, but he didn’t hit the board in the Derby).

This year’s Derby is even harder to make sense of, as it seems as though the talent level is not what it was the last two years (although making these kinds of sweeping judgments about a three year old crop in May is the sort of silly prediction that can get you into trouble). The growing presence of polytrack makes it harder to put the prep races into familiar perspective. The fact that the morning line favorite will break from the extreme outside post is not helping matters.

I’ll divide the race into three groups: horses I don’t like (to win this race), horses I like but don’t think will win, and horses I really like:

DON’T LIKE (the winner may very well come from this group)
1. Anak Nakal
2. Big Truck (but love the name and Barclay Tagg)
3. Bob Black Jack
4. Cowboy Cal
5. Eight Belles (filly, has never raced beyond 1 1/16 miles, has never faced a field anywhere near this good, personally can’t stand trainer Larry Jones)
6. Recapturetheglory
7. Smooth Air
8. Visionaire (nice story since he’s trained by Michael Matz of Barbaro fame)
9. Z Humor

LIKE
1. Adriano (would like a lot more if he hadn’t run all but one race on turf/poly, and ran a well-beaten ninth in that one; A.P. Indy doesn’t scream “turf pedigree”)
2. Big Brown (too many question marks: 20 post, foot issues, only 3 lifetime starts, son of a sprinter)
3. Cool Coal Man
4. Court Vision
5. Gayego
6. Monba
7. Tale of Ekati
8. Z Fortune

REALLY LIKE
1. Colonel John (love Tiznow, ran on poly but the California horses who came to the Arkansas Derby had no problems and he should love it; easy favorite IMO)
2. Denis of Cork (May be sitting on a big one after Illinois Derby flop, but very lightly raced for a Derby contender)
3. Pyro (I think the Blue Grass is a toss; that race has not been normal since they put the Poly in--remember Dominican beating Street Sense in last year’s debacle?)

If I was going to place a win bet on this (please don’t), I would go with Denis of Cork because he might give you a good price. A trifecta? Colonel John on top, Denis of Cork, Adriano. I’ve never thought losing money was fun, but the gambling industry does pretty well for itself, so I guess I’m out of touch.