Thursday, April 21, 2011

Wayne Winston's Mathletics

The "book reviews" on this blog are almost always a day late and a dollar short. They are written and published long after the book itself, and my comments usually amount less to a review than to a springboard from which to discuss other topics. This one is no different.

Wayne Winston is a professor of Decision Sciences at Indiana University's business school and a former consultant to the NBA's Dallas Mavericks. He published Mathletics in 2009 with the tagline "How Gamblers, Managers, and Sports Enthusiasts Use Mathematics in Baseball, Basketball, and Football."

If you are a regular reader of this blog or similar material, do not buy this book expecting to learn a lot of new things about sabermetrics. The sabermetric material is fairly standard and rudimentary--introductory-level discussion of run estimators, park factors, replacement level, the base/out table, win expectancy, and the like. I would also not recommend it to a novice, not because it is poor (there are elements I like and dislike, as I'll discuss below), but because there are better resources out there--internet primers, Bennett and Fluck's Curve Ball, and Lee Panas' Beyond Batting Average among others.

I am not particularly well-read on either football or basketball quantitative analysis, so I cannot definitively state the level of Winston's discussion on those topics. My guess is that the football discussion is fairly basic (with the caveat that football analysis as a field lags behind apbrmetrics), but that the basketball material is much stronger. It is certainly obvious from the writing that basketball is Winston's passion, and that the adjusted plus/minus ratings are a particular favorite.

Winston's writing is not particularly strong--he writes like someone whose favorite class was math (as do I). There are some minor slip-ups in the baseball discussion; these won't mislead the reader, but they also reflect the pedestrian nature of the material:

* Winston includes a formula for estimating batting outs that accounts for ROE by putting a multiplier on at bats. But this applies the adjustment to all at bats, including those in which we know a batter did not reach on an error (hits) and those in which the likelihood was very small (strikeouts).

* He refers to Keith Woolner's statistic as VORPP--Value Over Replacement Player Points. This makes sense where he applies the replacement level concept to WPA points, but he also uses VORPP for Woolner's run-based version. Additionally, he credits the concept of replacement level to Woolner. In reality, Woolner did much to popularize replacement level, but the concept did not originate with him.

* Similarly, he credits the concept of park factors to Bill James. James had much to do with popularizing the notion that statistics could be corrected for park effect, but if any single person is to be credited with the concept, Pete Palmer would be an easy choice.

* There is a chapter that discusses player improvement over time by comparing annual performance, but it does so without even really addressing aging and survivor bias.

* The discussion of strategy is fairly bare-bones and deals only with basic estimates based on a standard run expectancy table.

There are positive things of similar magnitude to the list of negatives--for example, while he uses Runs Created, he explains that a theoretical team construct is necessary to make accurate player comparisons. As a whole, the baseball portion of the book is adequate without being excellent for a novice and a yawn for those well-versed in sabermetrics.

Being a novice myself when it comes to football and basketball analysis, I found the discussion in those chapters much more engaging. To pick out a couple of football tidbits, Winston offers a version of the famed two-point conversion chart that incorporates the expected number of possessions remaining in the game. There is also a formula for the probability of a successful NFL field goal based on distance that caught my attention, although the model produces results that are clearly too high for very long kicks.

There is also a discussion of quarterback ratings, which have always interested me. Like every other sane person, Winston has little use for the NFL system, focusing his discussion on Berri's rating from Wages of Wins and his own adaptation of Brian Burke's regression of team categories against team wins. Isolating the categories from Burke's equation that can be related directly to individual quarterbacks, Winston offers the following as a quarterback rating:

1.543*(Yards - Sack Yards)/(Attempts + Sacks) - 50.0957*(Interceptions/Attempts)

If you factor out and ignore the 1.543 coefficient, and change the second quantity's denominator to (Attempts + Sacks), this can be rewritten as:

(Yards - Sack Yards - 32.47*Interceptions)/(Attempts + Sacks)

In this form, Winston's rating is very similar to a number of rating formulas, including the NEWS rating published by Bob Carroll, John Thorn, and Pete Palmer in The Hidden Game of Football:

NEWS = (Yards - Sack Yards - 45*Interceptions + 10*Touchdowns)/(Attempts + Sacks)
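For anyone who wants to play with the numbers, here is a minimal Python sketch of the three formulas above. The function names and the season line are my own inventions for illustration (they are not from Mathletics or The Hidden Game of Football), and the "simplified" version bakes in the approximation described above of swapping the interception denominator to (Attempts + Sacks).

```python
# Interception weight after factoring out the 1.543 coefficient (about 32.47).
INT_WEIGHT = 50.0957 / 1.543


def winston_rating(yards, sack_yards, attempts, sacks, interceptions):
    """Winston's rating as quoted above, with its two different denominators."""
    return (1.543 * (yards - sack_yards) / (attempts + sacks)
            - 50.0957 * interceptions / attempts)


def simplified_rating(yards, sack_yards, attempts, sacks, interceptions):
    """Rescaled form: coefficient dropped, (Attempts + Sacks) used throughout."""
    return (yards - sack_yards - INT_WEIGHT * interceptions) / (attempts + sacks)


def news_rating(yards, sack_yards, attempts, sacks, interceptions, touchdowns):
    """NEWS as given in the formula above."""
    return ((yards - sack_yards - 45 * interceptions + 10 * touchdowns)
            / (attempts + sacks))


# Hypothetical season line, made up purely for illustration.
line = dict(yards=4000, sack_yards=200, attempts=550, sacks=30, interceptions=12)

print(round(INT_WEIGHT, 2))
print(round(winston_rating(**line), 3))
print(round(simplified_rating(**line), 3))
print(round(news_rating(touchdowns=28, **line), 3))
```

With zero sacks the two denominators coincide, so Winston's rating is exactly 1.543 times the simplified version; otherwise the rewrite is only approximate.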

Breaking into editorial mode and stepping away from Mathletics for a moment, the treatment of a touchdown pass can be thought of as somewhat analogous to the sacrifice fly in baseball. The comparison is strained, as a touchdown pass is always a positive play from any perspective, while a sacrifice fly might actually reduce run expectancy.

A fairly large number of touchdown passes occur on short passes. Suppose a quarterback completes a three-yard touchdown pass. This will actually reduce his rating in Winston's system whenever his rating prior to the touchdown (on the yards-per-play scale of the rewritten formula) is higher than three, which it almost always will be. By giving a positive weight to all passing touchdowns, one could ensure that a touchdown pass always increases the rating.
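The arithmetic is easy to check with a quick Python sketch. The stat line is hypothetical, the function name is mine, and I am using the rescaled form of the rating (interception weight 32.47, everything over Attempts + Sacks) with an optional touchdown weight bolted on, set to 10 as in NEWS.

```python
def net_yards_per_play(yards, sack_yards, attempts, sacks, interceptions,
                       touchdowns=0, int_weight=32.47, td_weight=0.0):
    """Rescaled rating with an optional per-touchdown bonus (0 = no bonus)."""
    numerator = (yards - sack_yards - int_weight * interceptions
                 + td_weight * touchdowns)
    return numerator / (attempts + sacks)


# Hypothetical quarterback before and after one three-yard touchdown pass.
before = dict(yards=4000, sack_yards=200, attempts=550, sacks=30,
              interceptions=12, touchdowns=27)
after = dict(before,
             yards=before["yards"] + 3,
             attempts=before["attempts"] + 1,
             touchdowns=before["touchdowns"] + 1)

# Change in rating with no touchdown weight (Winston-style).
no_bonus_change = net_yards_per_play(**after) - net_yards_per_play(**before)

# Change in rating with a 10-yard touchdown bonus (NEWS-style).
bonus_change = (net_yards_per_play(**after, td_weight=10)
                - net_yards_per_play(**before, td_weight=10))

print(no_bonus_change < 0)  # the short touchdown lowers the unweighted rating
print(bonus_change > 0)     # with the bonus, the same play raises it
```

In general, with a 10-yard bonus the three-yard touchdown counts as 13 yards on one play, so it raises the rating for any quarterback sitting below 13 per play.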

In doing so, however, one gives special treatment to the touchdown simply because it is a tracked category (like sacrifice flies). One could just as easily track "sacrifice grounders" or "first down completions". These theoretical categories would also be cases in which a positive or somewhat positive outcome was achieved, but the statistics treat it as a negative (a batting out or a reduction of the passer's rating, assuming the completion was short). Giving special treatment to the recorded categories can thus be seen as unhelpful and as biased toward the particular types of players who are predisposed to one or the other.

Moving back to the book, most of my comments to this point have focused on the negatives. However, there are three things that Winston does really well:

1. Winston provides downloadable spreadsheets for many of the examples. This allows the reader to follow along with the work and to learn how to carry it out in Excel. Many of the Excel steps are explained in the text as well.

The drawback to this is that some of the why behind the math is glossed over in favor of a quick Excel solution. Winston's rating systems for NBA and NFL teams basically boil down to finding the best-fitting solution for a system of linear equations to predict the point margin in each game. Winston doesn't explain the math in that manner, though, instead just explaining that the Excel solver is used to minimize error. While this gives the reader enough detail to produce their own ratings, and no one is actually going to solve hundreds of equations by hand, I personally prefer a stronger emphasis on the underlying math.
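To make the "system of linear equations" idea concrete, here is a rough Python sketch of a least-squares rating fit. This is my own illustration, not Winston's spreadsheet: I am assuming each game is modeled as margin = home rating - away rating + a single home edge, the three-team schedule is invented, and plain gradient descent stands in for whatever the Excel solver does internally.

```python
def fit_ratings(games, steps=2000, learning_rate=0.01):
    """Minimize the sum of squared errors in predicted home point margin.

    games: list of (home_team, away_team, home_margin) tuples.
    Returns (ratings centered on zero, home_edge).
    """
    teams = sorted({t for home, away, _ in games for t in (home, away)})
    ratings = {t: 0.0 for t in teams}
    home_edge = 0.0
    for _ in range(steps):
        grad = {t: 0.0 for t in teams}
        grad_home = 0.0
        for home, away, margin in games:
            # Prediction error for this game under the current ratings.
            error = ratings[home] - ratings[away] + home_edge - margin
            grad[home] += 2 * error
            grad[away] -= 2 * error
            grad_home += 2 * error
        for t in teams:
            ratings[t] -= learning_rate * grad[t]
        home_edge -= learning_rate * grad_home
    # Ratings are only defined up to a constant, so center them on zero.
    mean = sum(ratings.values()) / len(ratings)
    return {t: r - mean for t, r in ratings.items()}, home_edge


# Invented double round-robin, built from ratings A=5, B=0, C=-5, home edge 3.
games = [("A", "B", 8), ("B", "C", 8), ("C", "A", -7),
         ("B", "A", -2), ("C", "B", -2), ("A", "C", 13)]

ratings, home_edge = fit_ratings(games)
print({t: round(r, 2) for t, r in ratings.items()}, round(home_edge, 2))
```

The step size and iteration count are tuned to this toy schedule; a real season would be better served by a proper linear least-squares routine, but the objective being minimized is the same.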

2. The bibliography is excellent, as it includes not just a list of sources but descriptions of what they offer. For example, this is the description of Phil Birnbaum's Sabermetric Research blog:

This is perhaps the best mathletics blog on the Internet. Sabermetrician Phil Birnbaum gives his cogent review and analysis of the latest mathletics research in hockey, baseball, football, and basketball. This is a must-read that often gives you clear and accurate summaries of complex and long research papers.

3. Winston's description of Birnbaum's blog provides a nice transition into discussing the best thing about his approach. Winston has excellent academic credentials (he is a professor of Decision Sciences at Indiana and earned a PhD at Yale in Operations Research), but he does not beat you over the head with them. In fact, I don't think that his doctorate is ever explicitly referenced.

In any event, Winston mixes the research of other academics into his text, but he gives plenty of space to amateurs as well. Some academics who enter the sports arena seem to look down their noses at anyone who doesn't hold an advanced degree or a teaching position. Winston is not one of them. He even uses one of Birnbaum's posts to offer a counterpoint to an academic paper on the NFL draft.

Winston's book provides a great example of how sabermetric knowledge generated by academics, amateurs, and everyone in between can be integrated, and how all parties can respect and learn from each other. It also gives analysts specializing in each sport a window into the work being done on other sports. Thanks to those attributes, Mathletics is a worthwhile read.

1 comment:

  1. As I understand it, Winston's treatment analysis of NBA players was pissed on by the APBR guys.

