Thursday, April 21, 2011

Wayne Winston's Mathletics

The "book reviews" on this blog are almost always a day late and a dollar short. They are written and published long after the book, and my comments usually amount less to a review than to a springboard from which to discuss other topics. This one is no different.

Wayne Winston is a professor of Decision Sciences at Indiana University's business school and a former consultant to the NBA's Dallas Mavericks. He published Mathletics in 2009 with the tagline "How Gamblers, Managers, and Sports Enthusiasts Use Mathematics in Baseball, Basketball, and Football."

If you are a regular reader of this blog or similar material, do not buy this book expecting to learn a lot of new things about sabermetrics. The sabermetric material is fairly standard, rudimentary type material--introductory-level discussion of run estimators, park factors, replacement level, the base/out table, win expectancy, and the like. I would also not recommend it to a novice, not because it is poor (there are elements I like and dislike, as I'll discuss below), but because there are better resources out there--internet primers, Bennett and Fluck's Curve Ball, and Lee Panas' Beyond Batting Average among others.

I am not particularly well-read on either football or basketball quantitative analysis, so I cannot definitively state the level of Winston's discussion on those topics. My guess is that the football discussion is fairly basic (with the caveat that football analysis as a field lags behind apbrmetrics), but that the basketball material is much stronger. It is certainly obvious from the writing that basketball is Winston's passion, and that the adjusted plus/minus ratings are a particular favorite.

Winston's writing is not particularly strong--he writes like someone whose favorite class was math (as do I). There are some minor slip-ups in the baseball discussion; these won't mislead the reader, but they also reflect the pedestrian nature of the material:

* Winston includes a formula for estimating batting outs that accounts for ROE by putting a multiplier on at bats. But this applies the adjustment to all at bats, including those in which we know a batter did not reach on an error (hits) and those in which the likelihood was very small (strikeouts).

* He refers to Keith Woolner's statistic as VORPP--Value Over Replacement Player Points. This makes sense in that he applies the replacement level concept to WPA points, but he also refers to Woolner's run based version as VORPP. Additionally, he credits the concept of replacement level to Woolner. In reality, Woolner did much to popularize replacement level, but the concept did not originate with him.

* Similarly, he credits the concept of park factors to Bill James. James had much to do with popularizing the notion that statistics could be corrected for park effect, but if any single person is to be credited with the concept, Pete Palmer would be an easy choice.

* There is a chapter that discusses player improvement over time by comparing annual performance, but it does so without even really addressing aging and survivor bias.

* The discussion of strategy is fairly bare-bones and deals only with basic estimates based on a standard run expectancy table.

There are positive things of similar magnitude to the list of negatives--for example, while he uses Runs Created, he explains that a theoretical team construct is necessary to make accurate player comparisons. As a whole, the baseball portion of the book is adequate without being excellent for a novice and a yawn for those well-versed in sabermetrics.

Being a novice myself when it comes to football and basketball analysis, I found the discussion in those chapters much more interesting. Focusing on a couple interesting football tidbits, Winston offers a version of the famed two-point conversion chart that incorporates the expected number of possessions remaining in the game. There is also a formula for the probability of a successful field goal in the NFL based on distance that I found interesting, although the model produces results that are clearly too high for very long kicks.

There is also a discussion of quarterback ratings, which have always interested me. Like every other sane person, Winston has little use for the NFL system, focusing his discussion on Berri's rating from Wages of Wins and his own adaptation of Brian Burke's regression of team categories against team wins. Isolating the categories from Burke's equation that can be related directly to individual quarterbacks, Winston offers the following as a quarterback rating:

1.543*(Yards - Sack Yards)/(Attempts + Sacks) - 50.0957*(Interceptions/Attempts)

If you factor out and ignore the 1.543 coefficient, and change the second quantity's denominator to (Attempts + Sacks), this can be rewritten as:

(Yards - Sack Yards - 32.47*Interceptions)/(Attempts + Sacks)

In this form, Winston's rating is very similar to a number of rating formulas, including the NEWS rating published by Bob Carroll, John Thorn, and Pete Palmer in The Hidden Game of Football:

NEWS = (Yards - Sack Yards - 45*Interceptions + 10*Touchdowns)/(Attempts + Sacks)
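As a rough check on the algebra, here is a minimal Python sketch of the three formulas above. The function names and the sample stat line are invented for illustration; only the coefficients come from the formulas as quoted.

```python
def winston_rating(yards, sack_yards, attempts, sacks, interceptions):
    """Winston's rating exactly as quoted above."""
    return (1.543 * (yards - sack_yards) / (attempts + sacks)
            - 50.0957 * interceptions / attempts)


def simplified_rating(yards, sack_yards, attempts, sacks, interceptions):
    """Rewritten form: drop the 1.543 scale and put interceptions over attempts + sacks."""
    int_weight = 50.0957 / 1.543  # about 32.47 yards per interception
    return (yards - sack_yards - int_weight * interceptions) / (attempts + sacks)


def news_rating(yards, sack_yards, attempts, sacks, interceptions, touchdowns):
    """NEWS as quoted above from The Hidden Game of Football."""
    return (yards - sack_yards - 45 * interceptions + 10 * touchdowns) / (attempts + sacks)


# Hypothetical season line, made up purely for illustration.
line = dict(yards=4000, sack_yards=200, attempts=550, sacks=30, interceptions=12)

print(round(50.0957 / 1.543, 2))  # the interception weight, 32.47
print(round(winston_rating(**line), 2), round(simplified_rating(**line), 2))
print(round(news_rating(touchdowns=28, **line), 2))
```

The first two ratings are on different scales (one includes the 1.543 multiplier), so only the ordering of quarterbacks, not the raw numbers, should be compared across them.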

Breaking into editorial mode and stepping away from Mathletics for a moment, the treatment of a touchdown pass can be thought of as somewhat analogous to the sacrifice fly in baseball. The comparison is strained, as a touchdown pass is always a positive play from any perspective, while a sacrifice fly might actually reduce run expectancy.

A fairly large number of touchdown passes occur on short passes. Suppose a quarterback completes a three-yard touchdown pass. This will actually reduce his rating in Winston's ranking, as the quarterback's rating prior to the touchdown will be higher than three. By giving a positive weight to all passing touchdowns, one could ensure that a touchdown pass always increases ranking.

However, doing so gives special treatment to the touchdown because it is a tracked category (like sacrifice flies). One could just as well track "sacrifice grounders" or "first down completions". These theoretical categories would also be cases in which a positive or somewhat positive outcome was achieved, but the statistics treat it as a negative (a batting out or a reduction of the passer's rating, assuming the completion was short). Giving special treatment to the recorded categories can thus be seen as unhelpful and biased toward particular types of players that might be predisposed to one or the other.
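To make the short-touchdown point concrete, here is a small sketch that compares a plain yards-per-attempt rating with one that adds the 10-yard touchdown bonus used in NEWS. The passing line is hypothetical, and sacks and interceptions are left out to keep the arithmetic visible.

```python
def yards_per_attempt(yards, attempts, touchdowns=0, td_bonus=0.0):
    """Yards per attempt with an optional flat bonus (in yards) per touchdown pass."""
    return (yards + td_bonus * touchdowns) / attempts


# Hypothetical quarterback: 20 attempts for 140 yards (7.0 per attempt), no touchdowns yet.
before = yards_per_attempt(140, 20)

# He then completes a three-yard touchdown pass.
after_no_bonus = yards_per_attempt(143, 21)
after_with_bonus = yards_per_attempt(143, 21, touchdowns=1, td_bonus=10)

print(before, round(after_no_bonus, 2), round(after_with_bonus, 2))
```

Without the bonus, the touchdown drags the rating below 7.0; with it, the play counts as 13 yards and raises any rating that was below 13 beforehand.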

Moving back to the book, most of my comments to this point have focused on the negatives. However, there are three things that Winston does really well:

1. Winston provides downloadable spreadsheets for many of the examples. This allows the reader to follow along with the work and to learn how to carry it out in Excel. Many of the Excel steps are explained in the text as well.

The drawback to this is that some of the why behind the math is glossed over in favor of a quick Excel solution. Winston's rating systems for NBA and NFL teams basically boil down to finding the best-fitting solution for a system of linear equations to predict the point margin in each game. Winston doesn't explain the math in that manner, though, instead just explaining that the Excel solver is used to minimize error. While this gives the reader enough detail to produce their own ratings, and no one is actually going to solve hundreds of equations, I personally prefer a stronger emphasis on the underlying math.
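For readers who want to see the underlying idea outside of Excel, here is a minimal sketch of a least-squares rating fit. It is not Winston's spreadsheet: it ignores home-field advantage, the function name and the three-team schedule are invented, and it minimizes squared error by simple coordinate updates rather than a solver.

```python
def least_squares_ratings(games, sweeps=200):
    """Fit ratings so that rating[a] - rating[b] approximates each observed margin.

    games: list of (team_a, team_b, margin), where margin = points_a - points_b.
    Each sweep sets every team's rating to the average of (margin + opponent
    rating) over its games, which is the exact least-squares update for that
    team with all other ratings held fixed.
    """
    teams = sorted({t for a, b, _ in games for t in (a, b)})
    rating = {t: 0.0 for t in teams}
    for _ in range(sweeps):
        for t in teams:
            targets = []
            for a, b, margin in games:
                if a == t:
                    targets.append(margin + rating[b])
                elif b == t:
                    targets.append(-margin + rating[a])
            rating[t] = sum(targets) / len(targets)
        # Ratings are only defined up to a constant, so pin the average at zero.
        mean = sum(rating.values()) / len(teams)
        for t in teams:
            rating[t] -= mean
    return rating


# Hypothetical schedule: A beat B by 7, B beat C by 3, A beat C by 10.
games = [("A", "B", 7), ("B", "C", 3), ("A", "C", 10)]
ratings = least_squares_ratings(games)
for team, value in ratings.items():
    print(team, round(value, 2))
```

These made-up margins are perfectly consistent, so the fit reproduces them exactly; a real schedule will not fit exactly, and the same procedure then returns the ratings with the smallest total squared error.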

2. The bibliography is excellent, as it includes not just a list of sources but descriptions of what they offer. For example, this is the description of Phil Birnbaum's Sabermetric Research blog:

This is perhaps the best mathletics blog on the Internet. Sabermetrician Phil Birnbaum gives his cogent review and analysis of the latest mathletics research in hockey, baseball, football, and basketball. This is a must-read that often gives you clear and accurate summaries of complex and long research papers.

3. Winston's description of Birnbaum's blog provides a nice transition into discussing the best thing about his approach. While Winston has excellent academic credentials (he is a professor of Decision Sciences at Indiana and earned a PhD at Yale in Operations Research), he does not beat you over the head with them. In fact, I don't think that his doctorate is ever explicitly referenced.

In any event, Winston mixes the research of other academics into his text, but he gives plenty of space to amateurs as well. Some academics that enter the sports arena seem to look down their noses at anyone who doesn't hold an advanced degree or a teaching position. Winston is not one of them. He even used one of Birnbaum's posts to offer a counterpoint to an academic paper on the NFL draft.

Winston's book provides a great example of how sabermetric knowledge generated by academics, amateurs, and everyone in between can be integrated, and how all parties can respect and learn from each other. It also gives analysts specializing in each sport a window into the work being done on other sports. Thanks to those attributes, Mathletics is a worthwhile read.

Wednesday, April 13, 2011

Comments on Baseball Prospectus 2011

At some point it becomes bad sport to write the same thing about an annual book--if there’s a certain characteristic of the book that you find yourself dissatisfied with several years running, it might be a you problem. It’s one thing to decide that a certain book is not for you; it’s another to continue to believe that it will be when it’s obvious that the writers have something else in mind.

Much of what I could say about the Baseball Prospectus annual for 2011 is the same as what I said about the 2010 and 2009 editions…and so I’ll try to avoid saying it again. By now, it’s clear that BP is what it is, and that can either be a great thing or a bad thing or a mostly good thing, depending on your perspective. Mine is that it’s mostly a good thing--the redeeming qualities of the book outweigh its flaws fairly easily.

I still felt compelled to jot down a few comments on the book this year because I might have been a little unfair in nitpicking a few things in the past. Now that there is a lot of new blood on board, it’s more apparent that some of the issues (like stats not matching up between the comments and the data directly above) are systematic, and probably endemic to producing a book of this kind. To put together a tome of that size in a few months is a massive undertaking, and there are thousands of moving parts, so expecting them all to be dialed in to the same setting is unrealistic.

The cover still has the infamous phrase that I will not repeat about PECOTA; this is obviously out of the hands of the writers. They do redeem the cover with a great caption under the little photo of Albert Pujols.

That being said, I do have one major bone to pick with the new, slimmed down statistical offerings. It’s great that they stopped doubling up on metrics that measure the same thing (in the past, there have been simultaneous displays of VORP and WARP, or EqA and MLVr), and with one glaring exception the new stat lines still manage to give you most of the key metrics. That glaring exception is the lack of any kind of component ERA (or RA, which I’d prefer anyway) figure for pitchers.

It’s not simply a matter of limiting your choice to vanilla, while having to leave chocolate, strawberry, and cookies and cream aside (after all, there are a lot of flavors of component ERA). There is none whatsoever. Instead, BP has listed Fair RA, which is a fine metric constructed by Colin Wyers and the primary input for pitcher WARP. But if the choice is between having Fair RA and a component ERA in a book that is largely aimed toward predicting performance in 2011, it’s not a choice at all. Sticking with metrics under the BP umbrella, peripheral ERA and SIERA would fit the bill.

Of course, if I could strike any category from the pitcher stat line to clear space, it wouldn’t be Fair RA--it would be W-L or saves or WHIP. But since a big target audience for the book is fantasy players, that is not an option. However, it leaves everyone (including fantasy players) without a backward-looking metric that gives us the best estimate of the pitcher’s overall effectiveness in the past. I certainly hope that they will figure out a way to include Peripheral ERA or SIERA or something similar in the 2012 edition.

PECOTA is in good hands with Colin Wyers, and I’m sure there are still some bugs to be worked out, so please take this comment as more amusement than criticism: some of the PECOTA comps seem way off. I’m sure this happened in the past, and I didn’t bother to make note of it, but two players that really stood out to me were Gregor Blanco and Nick Franklin. Blanco’s top comps are Richie Ashburn, Kenny Lofton and Freddy Guzman. One of these things is not like the other, and two of them are nothing like Gregor Blanco (Lofton was still in the process of breaking out, but had already established himself as clearly better). The Franklin comps are more understandable since he’s a younger player with less of a track record, but it’s still an odd juxtaposition to see a player ranked as the #44 prospect in MLB while his top comps are identified as Adrian Beltre (ok), Hank Aaron and Willie Mays.

There are only a few team entries that have extensive sabermetric (as opposed to applied sabermetric) content. One of these is the Arizona entry, and sadly I have a bone to pick with it. The author accepts the mainstream view that Arizona’s copious strikeout totals in recent campaigns had doomed their offense. He (or she; I still maintain it would be more interesting to know which author is responsible for the team entry) asserts that “when the majority of the lineup falls prey to empty at-bats of this sort, highly volatile run-scoring can result.”

While there have been some studies done on the relationship between shape of offense and scoring distribution, I am personally unaware of any comprehensive or well-established enough to make a statement like that without the need for supporting evidence. The only statistic brought in to support that position is that Arizona scored three or more runs in an inning as often as the NL average, but scored two or fewer more often.

That is a very odd and not particularly helpful way to break down innings, because it lumps scoreless innings in with one and two run innings. To be absurd for a moment, if an offense never scored three or more runs an inning, and scored 0-2 in 100% of their innings, but 40% of those were one run and 10% were two runs, they would average a healthy 5.4 runs per game. It is true that Arizona scored in a smaller proportion of their innings than did the average NL offense--25.9% of Arizona innings resulted in a run scored compared to 26.5% for the league as a whole. But Arizona was more likely to have a multi-run inning (12.4%) than the average NL team (12.2%).
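The arithmetic behind the absurd example can be checked in a few lines. The 50% share of scoreless innings is implied by the other two figures, and the nine-innings-per-game multiplier is an assumption.

```python
# Share of innings with each run total in the deliberately absurd example above.
inning_distribution = {0: 0.50, 1: 0.40, 2: 0.10}

runs_per_inning = sum(runs * share for runs, share in inning_distribution.items())
runs_per_game = 9 * runs_per_inning  # assumes nine batting innings per game

print(round(runs_per_inning, 2), round(runs_per_game, 1))  # 0.6 5.4
```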

Another odd thing about this perspective is that it makes the inning the unit by which scoring volatility is measured. It’s true that the inning is the best level at which to understand how runs are scored, since the events that transpire in each inning are independent of those that occurred in previous innings as far as run scoring goes (I hope it’s clear that I’m talking about baserunners and outs not carrying over from one inning to the next, not lineups turning over and pitchers being removed and the like, but you never can tell). But from a win/loss perspective, it is the run distribution per game that is crucial. Admittedly, the two are very closely related, but any time you extend the time period over which such volatility is projected, its impact is reduced.

One crude but simple and reasonably sensible way to consider the win value of a team’s per game scoring distribution is a method that I call Game Offensive Winning Percentage (gOW%) and have published here for the last three years. It is based on a Bill James idea; instead of estimating an OW% from average runs scored per game, use the team’s actual distribution of runs scored. If in a given season teams that score one run win 11.8% of the time (as they did in 2010), then credit the offense with .118 wins for each game in which they score exactly one run. Repeat for all scoring levels and average and you have an alternative OW%.
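A minimal sketch of that calculation follows. Only the .118 figure for one-run games comes from the text; the function name, the rest of the league table, and the team's run distribution are made-up placeholders, with scoring levels cut off at five runs to keep the example short.

```python
def game_ow_pct(team_run_counts, league_win_pct_by_runs):
    """Average the league win rate at each scoring level over a team's games.

    team_run_counts: {runs scored in a game: number of such games}
    league_win_pct_by_runs: {runs scored in a game: league W% at that level}
    """
    games = sum(team_run_counts.values())
    credited_wins = sum(count * league_win_pct_by_runs[runs]
                        for runs, count in team_run_counts.items())
    return credited_wins / games


# Only 0.118 for one run is from the text (2010). Every other value is a
# hypothetical placeholder, and 5 stands in for "five or more" runs.
league_win_pct_by_runs = {0: 0.0, 1: 0.118, 2: 0.25, 3: 0.40, 4: 0.55, 5: 0.70}
team_run_counts = {0: 10, 1: 20, 2: 25, 3: 30, 4: 35, 5: 42}

print(round(game_ow_pct(team_run_counts, league_win_pct_by_runs), 3))
```

A real calculation would use the actual league W% at every scoring level for the season in question and the team's full game-by-game run totals.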

There are of course flaws with this method--the unit of games doesn’t always represent the same things (i.e. there are not always 27 outs per game), the use of the actual W% by runs scored in any given season is subject to sample size fluctuations, there is no adjustment for park, etc.--yet it’s still reasonable to think that if a team’s run distribution was particularly unusual, it would manifest itself in a comparison of gOW% to standard OW% based on average runs per game (in this case, without a park adjustment so as to better match gOW%).

The Diamondbacks led the NL in strikeouts in 2009 and 2010 and were second in 2008. In 2007, they ranked eleventh (and made the playoffs, see!), so those three seasons are the relevant high strikeout seasons for the team. In 2008, Arizona’s gOW% was .485 while their OW% was .479--considering their run distribution rather than just their average suggests an additional win. In 2009, it was .484/.483--no difference. In 2010, the split was .492/.502, which is -1.6 wins. So for the three years considered together, the net total is -.5 wins.
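The conversion from winning-percentage gaps to wins can be reproduced as follows, assuming a 162-game schedule in each season (the text does not state the multiplier).

```python
GAMES = 162  # assumes a full 162-game schedule

# (gOW%, OW%) pairs for Arizona as quoted above.
seasons = {2008: (0.485, 0.479), 2009: (0.484, 0.483), 2010: (0.492, 0.502)}

# Positive values mean the run distribution was worth more than the average implies.
win_diff = {year: (gow - ow) * GAMES for year, (gow, ow) in seasons.items()}
total = sum(win_diff.values())

for year, diff in win_diff.items():
    print(year, round(diff, 1))
print("total", round(total, 1))
```

Under that assumption the three seasons come out to roughly +1.0, +0.2, and -1.6 wins, for a net of about -0.5.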

Of course, this does not conclusively demonstrate that Arizona’s offense was as efficient as a typical offense with their scoring average, and it certainly doesn’t allow us to make any statements about the effect of high strikeout offenses generally. However, neither does anything offered or referenced in the BP essay, yet the author chose to make much stronger assertions than I would dare to here.

My comments on strikeouts should not be taken as a negative judgment of the book as a whole--my book “reviews”, such as they are, generally serve as an opportunity to discuss issues raised by the author rather than to offer a summary judgment on the book itself. By now, you already know whether BP is a book for you or not.

Tuesday, April 05, 2011

Scoring Self-Indulgence, pt. 1

When I have occasion to write something on paper, I usually use a pen. It’s easier that way--ball-point pens are ubiquitous and cheap; you can sign things with them; and now that the hideous scourge of blue ink has faded a bit, they no longer result in an assault on one’s sensibilities every time they are used (okay, that last one should say “my sensibilities”). In truth, I like pencil better, specifically a mechanical pencil with .5 lead. I use the real cheap Bic ones exclusively, and have for years--you know, the ones that are supposed to be disposable, but you can hold the clicker down and push the replacement lead in through the top. You can get ten of them at Wal-Mart for $2.

The pluses of the ball-point pen allow me to save that favorite writing utensil for only the most important tasks, ones that just can’t be entrusted to the terrifying permanence of ink. For most of the winter, it sits undisturbed on my bookshelf or in a pencil holder or wherever--but sometime in March, I have occasion to take it out and put it to use, and I don’t stop until mid-autumn.

You have probably surmised by now that the important task to which I refer is scorekeeping. Yes, the existence of internet gametrackers has made the collection of data for one’s own perusal something less than a necessity if one would like access to real-time information on a game, and to the extent that people do want to keep their own score, electronic applications are pushing pencil and paper aside. And admittedly, those of us who keep score not just at the ballpark but in the privacy of our own homes have always been a rare breed and prime targets for the nerd label.

Still, I have no intention of giving up scorekeeping in the foreseeable future. It is still true that if you want something done right, you have to do it yourself. GameDay may have all of the information I need, but it cannot (yet, at least) be customized to display it in the exact manner I have become accustomed to. If you want to save it for posterity, a GameDay printout lacks any sort of sentimentality whatsoever. And I might be part of a dying breed, but if I want to give my full undivided attention to the ballgame, the last thing I need to be doing is puttering around on the computer between pitches.

If this reads as a half-hearted defense of scorekeeping, I have accomplished what I set out to do with this post. For one thing, I don’t really need to justify my hobby to you; I just feel compelled to put in a good word for the practice every once in a while. I’ve never understood why announcers sometimes feel compelled to give you basic information about the sequence of plays in a game--information that they are tasked with providing--by prefacing it with “If you are keeping score at home…” Of course, this is a hanging curveball set up for the announcing partner, who gets to jump in and make a snide comment about what kind of deviants would be doing that. Considering that those of us that keep score are the least likely subset of fans to turn the game off when it’s 14-2 in the bottom of the eighth...

But the other reason that it can be difficult to espouse the virtues of scorekeeping is that scorekeeping is a very personal pursuit. Everyone has their own technique, their own special symbols built around the familiar position numbers that have united the vast majority of scorecards from the 1890s or so on (except for the occasional early twentieth century flip-flopping of 5 and 6 for third base and shortstop). This makes it difficult to generalize--I might say that I love keeping score because I could quickly get a precise count of how many balls the hapless Ranger pitchers had thrown while Neftali Feliz waited in the bullpen…but your scoresheet might not tell you that. Instead, it might tell you who won the sausage race.

There are displays of the variety and innovation in individual scorekeeping out there online, but not to an extent that I consider sufficient, so last year I asked people to send me their scoresheets for posting on my scorekeeping blog, Weekly Scoresheet. Several people graciously accepted my invitation, but I was foolish enough to make the initial request during the offseason, when even compulsive scorekeepers weren’t particularly likely to have an example sitting around. (*) So if you’re interested in sharing now, please send me an email.

Weekly Scoresheet has a whopping total of six subscribers on Google Reader, which is completely understandable--a personal scorekeeping blog is a vanity blog, plain and simple. Unfortunately, I haven’t been updating it recently because I no longer have a scanner at home, and it just isn’t a big enough priority for me to buy one even though this is 2011 and they are cheap. Eventually, Weekly Scoresheet will be back in full swing to bore the five people who read it with my own chicken-scratched records of ballgames.

In the meantime, though, I’ll be using this space to occasionally run a tutorial on my scoring system and walk through a sample game from the 2010 season. Calling it “my scoring system” is a misnomer--there's certainly nothing groundbreaking about it and most of the symbols are drawn from other people’s systems--but the great thing about scorekeeping is that the precise combination of data you record, codes you use, and the like is fairly unique. I would not encourage anyone to learn to score a game in the way I do, not because I don’t think it’s a decent system but because I would encourage you to organically develop your own that fits your needs and interests as a baseball fan. As such, an explanation and tutorial is ultimately just a way to fill up space and pad the post count. I’ll enjoy writing it; if you enjoy reading it, then so much the better.

(*) Sadly, I have to admit that I spend a not insignificant amount of free time in February doodling scoring of imaginary games, using Excel to design some new scoresheets that I’ll never use because I continue to use the same basic sheet (and I do mean basic) that I have for over a decade, wondering why the calendar can’t turn to March so that spring training games can be scored, and other such pursuits.

I will begin simple, with a look at one of my blank scoresheets--it's just a bunch of empty blocks in a 9x9 grid. I originally made this in the DOS Brief text editor prior to the 1998 season, using a lot of “_” and “|” symbols. The lines weren’t solid, so I eventually traced them down for 1999 or so. Later I would scan it as a PDF and touch it up a little bit in Photoshop, but it still has imperfect lines, which to me grant it a little character that you don’t get from using a computer to draw the lines precisely. I have a couple facsimile versions created in Excel (which I use now for any new scoresheets I make--it might not be as capable graphically as some other programs but making grids is something that spreadsheet software does very well), but there’s nothing like the original article for me.

* Why a 9x9 grid? You do realize that the average team doesn’t even use half of those scoreboxes in a game, right?

One of the great things about Project Scoresheet is that it pioneered the use of numbered boxes rather than a box for every batter to hit in every inning. This was a great way to conserve space, but it also makes it harder to view the inning as a standalone unit. I prefer to see each inning on its own. Yes, a team batting around is a mild inconvenience, and splits up an inning into two columns, but I’ve never understood why some people freak out about it and start crossing off the inning headings and pushing each inning down a column.

* No room for statistical lines (AB-R-H-BI and the like)?

Nope. I don’t think that standard compiled statistics for a single game tell you much of anything, and the sort of boilerplate box score is something that is very easy to obtain online (although some of them aren’t so accurate). I’m already sacrificing space by using the 9x9 format; I don’t want to waste any more on stat lines. Plus, if you fill them in as the game goes on, it’s another distraction and you have a bunch of ugly tallies on your sheet. If you wait until the game is over to finish your scoresheet, then it’s just work.

* No diamonds?

Nope, I’ve never liked them. They’re great if you just want a quick snapshot of where the runners are, but if you’re trying to record a lot of detail on how runners advanced, they get in the way. I split the box up into four corners for each of the bases, but I don’t think a visual aid like a diamond is necessary to accomplish this.