Monday, May 04, 2009

Comments on Baseball Prospectus 2009

I don’t want to bill this as a "book review", because that implies a level of formality that is lacking here. And many of my opinions have already been voiced by others around the net--a particular problem when one comes in to comment on a book over two months after it has been published.

Quite frankly, this is the worst edition of Baseball Prospectus I've read, and I’ve read each edition since 1998. Don’t take this too far--I'm still glad I purchased the book, as it remains the best source for a quick, intelligent outlook on just about every relevant player in organized baseball. But even that function was diminished in this edition by the lack of an index. In fairness to BP, they quickly released an index online and apologized profusely for the oversight.

From an editing standpoint, though, it's pretty inexcusable to leave the index out. Shoddy editing also showed in the Padres chapter, which read as if it were written at the last minute as the author waited to learn whether Jake Peavy would be retained or traded. While I can certainly sympathize with the plight of attempting to publish a relevant and updated book while dealing with a deadline, I'd like to believe that the result could be a bit more polished.

I think it continues to be a mistake on the part of BP to not reveal who the authors are for each portion of the book (I'm referring to the team and player comments, not the essays which list the author, as always). Speaking with one voice may be good for marketing ("Baseball Prospectus says…" sounds a bit more impressive than "Author X of Baseball Prospectus says") and apparent continuity (BP has a fairly high level of author turnover which would be all the more apparent if each article were bylined), but I don't believe that it does the reader any favors. As a reader, I like to know who I am reading, and I can then use my past impressions of their work to inform my view of the new material. It also would explain the contradictions in statistical measures cited, which otherwise appear to be purely schizophrenic.

Some writers use EqA when they want a catch-all rate measure of offensive performance. Others use MLVr. Still others use OPS+. Player comments blatantly contradict the fielding metric results in the corresponding statistical data, often without an acknowledgment of the disconnect.

The most blatant example of this failure to define terms and establish a standard for statistical measures comes in the use of Pythagenport to estimate team wins. The BP annual refers to Pythagenport, while the glossary on their website claims that they use Pythagenpat (which, in full disclosure, I am sometimes credited with co-inventing along with David Smyth). It's not so much that I'm bothered that they may be using Pythagenpat without attribution, but that it makes it incredibly confusing to understand how their estimates are calculated. Another Pythagenport problem is on display in the Phillies chapter, in which the "Phillies in a Box" data lists the team with a 93-69 estimated record. However, the author of the chapter writes "by their Pythagorean projections, they were…an 87-win team in 2008". Huh?

In fact, there is an explanation. If you go to the BP website and look at the adjusted standings for 2008, you will find that the Phillies' "third-order record"--that is, a Pythagenport record fueled by EqR and EqR allowed and adjusted for strength of schedule--is 87-75. The average reader is never going to think of that as the explanation for the seeming contradiction between two figures on the same page. And that opens the door for three different sets of records to be considered Pythagenport--the first, second, and third order records. Really, though, only the first-order record should be referred to as "Pythagenport" with no further explanation.
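For anyone wondering how much the Pythagenport/Pythagenpat distinction matters in practice, here is a minimal sketch. The exponent formulas are the commonly published ones as I understand them (Pythagenport: 1.5*log10(RPG) + .45; Pythagenpat: RPG^.287, where .287 is just one conventional choice), not anything taken from the BP annual. The function names are my own, and the run totals in the usage lines are round illustrative numbers, not any team's actual figures.

```python
import math

def pythagenport_exponent(rpg):
    # Davenport's form: the exponent grows with the log of runs per game (both teams).
    return 1.5 * math.log10(rpg) + 0.45

def pythagenpat_exponent(rpg, z=0.287):
    # Smyth/Patriot form: the exponent is a power of runs per game.
    # z = 0.287 is an assumed, conventional value.
    return rpg ** z

def expected_wins(runs, runs_allowed, games, exponent_fn):
    # Standard Pythagorean win estimate with a run-environment-dependent exponent.
    rpg = (runs + runs_allowed) / games
    x = exponent_fn(rpg)
    wpct = runs ** x / (runs ** x + runs_allowed ** x)
    return wpct * games

# Hypothetical team: 800 runs scored, 680 allowed, 162 games.
port = expected_wins(800, 680, 162, pythagenport_exponent)
pat = expected_wins(800, 680, 162, pythagenpat_exponent)
print(round(port, 1), round(pat, 1))
```

In a normal run environment the two estimates land within a fraction of a win of each other, which is why the naming confusion is a matter of clarity rather than of materially different results.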

If I had to present a unified theory as to why multiple statistics are used for the same purpose, why statistics are called by incorrect or misleading names, and so on, it would be the drain of their top statistical talent. Clay Davenport is really the only hard-core sabermetrician still acting as a major part of their team. Nate Silver has largely moved on to political analysis, while Dan Fox and Keith Woolner were hired away by the Pirates and Indians respectively. Outside of Davenport, it is an open question as to how many BP writers could explain the intricacies and nuances of all of the statistics they publish in one form or the other.

This brain drain also explains the disappointing crop of essays in the 2009 edition. Traditionally (although I believe 2003 was an exception, and it's possible that another year or two was as well), at least one essay has been a serious sabermetric study of one sort or the other (exemplified by articles on replacement level, win expectancy, catcher's ERA, and other topics down through the years). The only essay that is even remotely sabermetric in the 2009 book is Davenport's explanation of the changes to BP's fielding and WARP methodology.

Of course, this dearth of sabermetric material shouldn't really come as a surprise, even if one ignored the absence of Fox, Silver, and Woolner. After all, Gary Huckabay (in-)famously informed us that "sabermetrics is dead" a few years ago--an arrogant proclamation at the moment it was made, and one which looks downright asinine in retrospect as new frontiers of analysis like PitchF/x have opened up.

Now, about the Davenport article. I will let the fielding portion of it go without comment and leave that for the experts in that area. Upfront, I applaud Clay for making the changes he made despite his own lingering personal reservations about the methodology. In doing so, he has made BP's WARP figures much more useful to the sabermetric community at large. I also should make it clear that although I have written somewhat negative critiques of Equivalent Runs/Average before, I respect Davenport's work and want to exclude him from my broader criticism of the post-Woolner/Fox/Silver BP.

Commencing with the review of his new methodology, he uses an offensive position adjustment approach to calculate runs above average. Apparently these are variable throughout history, although only the modern era position averages are shown. Davenport explains that these adjustments are not based solely on the average EqA for the various positions--he tweaks them based on the defensive responsibilities of the position.

The following table gives the average EqA used at each position, along with its equivalent in terms of Adjusted RG (*) (in other words, runs/out relative to the league average), and the offense-based position adjustment that I use. The juxtaposition of my values (which are nothing more than the major league averages for 1992-2001) and Davenport's is not in any way intended to imply that the ones I use are correct and that Davenport's must be evaluated in relation to them.



I don't really have a problem with any of this; my only criticism is that Davenport does not even mention defense-based position adjustments as an option. It seems as if the presentation is between two choices: an offense-based position adjustment or Davenport's old approach of separate offensive and defensive replacement levels. That's really on the level of a nitpick, though.

Then Davenport uses Jose Reyes and Johan Santana as examples of how to calculate WARP under the new methodology. It seems as if there is an error in his example, as he alternates between saying that Jose Reyes made 507 and 499 outs last year. I'm sure this is nothing more than a typo, but it does make his example a little harder to follow.

The key detail to understand is that he uses a replacement level of -22.11 offensive runs per season, which is equivalent to using a .230 EqA, which in turn is equivalent to using a .350 OW%, as I and many others do.
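The .230 EqA/.350 OW% equivalence can be checked with the EqA-to-R/O conversion from the footnote (R/O = 5*EqA^2.5, with a league average of .260). What follows is only a sketch: the function names are mine, and I am assuming a Pythagorean exponent of 2 to turn the runs-per-out ratio into OW%, which is a simplification rather than anything Davenport specifies.

```python
LEAGUE_EQA = 0.260  # league-average EqA, per the footnote

def runs_per_out(eqa):
    # Footnote conversion: R/O = 5 * EqA^2.5
    return 5 * eqa ** 2.5

def adjusted_rg(eqa, league_eqa=LEAGUE_EQA):
    # Runs per out relative to the league average (ARG).
    return runs_per_out(eqa) / runs_per_out(league_eqa)

def offensive_win_pct(arg, exponent=2.0):
    # Assumed: Pythagorean formula with exponent 2, league average scaled to 1.
    return arg ** exponent / (arg ** exponent + 1)

arg = adjusted_rg(0.230)
print(round(arg, 3), round(offensive_win_pct(arg), 3))
```

A .230 EqA works out to about 74% of the league R/O, and that ratio squared over itself plus one comes out to roughly .35.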

Another thing to keep in mind about his example is that the calculations are based on translated stats, so all of the various context differences (league, park, etc.) have already been accounted for.

There is one assumption made by Davenport that I think is unjustified, and it appears to cause pitchers' WARP to be overstated. For pitchers, Davenport still assumes that a replacement hitter would be -22 runs over a full season, and prorates this over the number of outs the pitcher actually makes at the plate. But just as replacement-level position players tend to be average fielders (or close to it), replacement-level pitchers should be average hitters relative to their peers.

So Johan Santana, with a pitiful .047 EqA (remember, Clay uses .125 as the positional average, so Johan had a bad showing at the plate even by pitcher standards), gets a little credit for his offense; he was -1.7 runs versus the average pitcher in his 69 outs, while the replacement costs his team 3.1 runs. Santana gets .3 more WARP by figuring it Davenport's way than he would if you assumed that a replacement pitcher would have no effect on the team's runs scored. All pitchers in non-DH leagues will be likewise overvalued.
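The Santana figures can be roughly reproduced from the numbers quoted above and the footnote's conversion (R/O = 5*EqA^2.5). This is my own reconstruction, not Davenport's worked example: in particular, I am modeling the replacement charge as the R/O gap between a .230 and a .260 EqA hitter over the same 69 outs, which is an assumption about how the -22 runs per season gets prorated.

```python
def runs_per_out(eqa):
    # Footnote conversion: R/O = 5 * EqA^2.5
    return 5 * eqa ** 2.5

def runs_vs_baseline(eqa, baseline_eqa, outs):
    # Runs produced relative to a baseline hitter over the same number of outs.
    return (runs_per_out(eqa) - runs_per_out(baseline_eqa)) * outs

outs = 69  # Santana's outs at the plate, from the text

# Santana (.047 EqA) versus the average pitcher (.125 EqA).
vs_avg_pitcher = runs_vs_baseline(0.047, 0.125, outs)

# Assumed model of the replacement charge: a .230 hitter versus a .260 hitter.
replacement_charge = runs_vs_baseline(0.230, 0.260, outs)

print(round(vs_avg_pitcher, 1), round(replacement_charge, 1))
```

Both figures come out close to the -1.7 and 3.1 runs cited above, and at something like ten runs per win the roughly three-run replacement charge accounts for the extra .3 WARP.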

All in all, though, the changes to WARP are welcome and sensible if a little bit overdue. Reading Davenport's piece, it seems as if he still considers WARP to be a work in progress, and so I anticipate that we will see some further refinements down the line.

(*) The conversion of EqA to R/O is R/O = 5*EqA^2.5. The average EqA is .260, which is .172 R/O, and so the equivalent ARG figure is just 5*EqA^2.5/.172.
