Monday, May 04, 2015

Counter-Revolution

Continuing the tradition of haphazard “book reviews” appearing on this blog well past the time that such a review would be relevant, I recently read The Sabermetric Revolution by Benjamin Baumer and Andrew Zimbalist and have a few thoughts on the book.

On the whole, I am not a fan of the book. While I am not personally very familiar with Baumer’s work, Zimbalist is a seminal figure in baseball economics (starting over twenty years ago with his Baseball and Billions). Unfortunately, The Sabermetric Revolution is too short (153 pages of prose not counting footnotes) and too unfocused to really showcase the authors’ knowledge.

In many respects it appears that the book was intended to be something of a rejoinder to Moneyball, pointing out areas in which Michael Lewis either played fast and loose with the facts or omitted key details. The preface is clear about this motivation, as the authors write: “This book will attempt to set the record straight on Moneyball and the role of ‘analytics’ in baseball.”

There’s no doubt from reading the book that this is a major goal of the authors, as the first chapter is devoted to “Revisiting Moneyball”. I found some of the criticism to be fair (for example, Lewis’ tendency to gloss over how much the young talent the A’s had produced, such as Eric Chavez, Miguel Tejada, Tim Hudson, Mark Mulder, and Barry Zito, contributed to the team’s success). Some, though, strikes me as 20/20 hindsight (such as a review of the infamous 2002 amateur draft) or nit-picking (such as the fact that the A’s OBA decreased in 2002 despite Beane’s emphasis on OBA). In other places, I would contend the authors are guilty of some of the same offenses they accuse Lewis of (for example, they state that Lewis gives short shrift to the work of Bill James and other sabermetric pioneers; however, their own discussion of the internet sabermetric community begins at Baseball Prospectus).

The fundamental issue I had with the book is that it is not clear what it is intended to be (aside from a Moneyball response) or who the intended audience is. The book is not detailed enough to serve as a technical introduction to sabermetrics for newcomers (for instance, I’m not sure park factors are ever discussed outside of brief allusions), but neither is it detailed or advanced enough to strongly appeal to the smaller audience of practicing sabermetricians. There is even a chapter on statistical analysis in other sports, a topic on which I am closer to the novice group, but it too is short on details. That omission is even more glaring there, since the baseball material at least includes a quick overview of sabermetric theory.

At the risk of falling into the trap of nit-picking this book myself, I think listing a number of my issues with it might be the easiest way to write it up:

* There are a number of incorrect acronym expansions in the book, some of which were surprising to me. OPS is said to stand for “Offensive Performance Statistic” rather than On-base Plus Slugging; DER for “Defensive Efficiency Rating” rather than Defensive Efficiency Record.

* The authors state that the formula for Isolated Power weights doubles and triples equally and is roughly the difference between SLG and BA, “or sometimes” (D + 2T + 3HR)/AB. While I understand the argument for treating doubles and triples equally in a power metric, Isolated Power is not “sometimes” defined as SLG minus BA. That formulation has been used in conjunction with the term “Isolated Power” since Branch Rickey linked the two (but did not set them equal) in his 1954 Life magazine article, and it was used in that manner by Bill James. While this or the meaning of the OPS acronym may seem like insignificant details, they suggest something less than a full command of sabermetric history.

* The authors state that in economic terms, WAR measures “marginal physical product” and state that this is a good idea, but they are not fans of the methodology used in current WAR implementations. Their concerns include fair ones, such as the failure to report error bars and the use of black box methodologies. But while their reasoning behind these criticisms is clearly laid out, they sometimes engage in what might be called “drive-by” criticisms, in which issues are alluded to but not fleshed out to the point where the creators and users of these metrics could offer a defense. In this manner, Baumer and Zimbalist reflect the attitude of another “insider” who has criticized replacement-level metrics, Christopher Long.

One such comment is “It is not clear that there exists a pool of replacement players with the productivity that is ascribed to them”. This essentially questions the entire concept of replacement level, but it is supported only by a footnote citing the work of JC Bradbury. This does nothing to advance the discussion of replacement level, nor does it alert readers to the well-reasoned and spirited rejoinders sabermetricians have issued to Bradbury’s contentions.

The authors then use a single example to question what is one of the least controversial steps in any WAR methodology, and the one most similar across implementations: the run to win conversion. The authors simply write: “The use of James’ Pythagorean Expectation to convert runs to wins is less than robust. One need only reflect on the 2012 Baltimore Orioles, who outperformed their expected win total by 11 games, to see how inaccurate the runs to wins conversion can be.”

If I may be impolitic and a bit unhinged for a moment, the authors should be ashamed of themselves for this statement. It is the type of statistically illiterate cherry-picking that one might expect from a Bill Madden rather than from respected professionals familiar with statistical methods. While it is without question true that win estimators (like every other statistical estimator known to man) produce poor estimates in certain individual cases, a reasoned discussion of their error bars does not begin and end with a single poor estimate. Any regression equation presented by Zimbalist in Baseball and Billions or in this work could be easily impugned by similar rhetoric, and likely more effectively given that win estimators are among the more accurate and stable estimates one will find in baseball analytics.

It might also be pointed out that the run to win converters actually used in WAR calculations are likely more robust (in the true meaning of the term, rather than in the sense of avoiding a single outlier) than Pythagorean, since they recognize that the shape of the relationship between runs and wins changes as the scoring environment changes. While the authors are surely aware of this, one could never tell from the discussion of run/win estimators in the book, as only Pythagorean constructs with fixed exponents are discussed, with no reference to variable-exponent constructions like Pythagenport/Pythagenpat or to dynamic linear run to win estimators.

* My sense from reading the book, and it may be unfair, is that Baumer and Zimbalist are eager to emphasize areas and issues in which sabermetric findings have been wrong and/or incomplete. An example is the discussion of sacrifice bunts, which points out that the initial sabermetric analysis (they do not reference Palmer and Thorn by name in this section, but The Hidden Game of Baseball is the usual source of the classical argument) was incomplete in not considering the other outcomes that may occur on sacrifice bunt attempts, such as bunt hits and errors.

This is without question a valid criticism. However, neither Baumer and Zimbalist nor other present-day critics of that conclusion acknowledge that the conventional wisdom sabermetricians pushed back against was not that the bunt was a good play because of those outcomes, but that the sacrifice, if successfully executed, was a good play. I still find myself one of the few patrons clapping when I attend a game and the team for which I am rooting successfully records the out at first base on a sacrifice. This play was seen, and still is seen by casual fans and presumably a non-negligible portion of major league managers, as a success for the offense, even without the benefit of the error or hit that can make the play a palatable strategy in certain situations. Sabermetricians have moved to a more “nuanced understanding” of the sacrifice, but they have also forced the conventional wisdom to tack on a bunch of addenda and hypotheticals that had rarely been discussed before.

* In other cases it is unclear how deep a literature review of the field the authors have performed. For instance, the authors criticize FIP for using an ERA scale (a criticism with which I agree, while noting that it can be corrected relatively easily) but state that “What this field needs is a simple, illustrative, but effective model to evaluate pitchers. Until a model can be constructed with interpretable coefficients (a la linear weights), or with meaningful interaction of terms (a la Runs Created), no real insight will be gained, and there is unlikely to be any consensus about which metric is best.”
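Two of the points in the list above lend themselves to a quick numerical check. The sketch below uses invented stat lines throughout. It shows that (D + 2T + 3HR)/AB is not an alternative definition of Isolated Power but algebraically the same quantity as SLG minus BA. It also shows that FIP's ERA scale comes entirely from an additive league constant, so one simple way to correct it (assumed here for illustration, not a method from the book) is to compute that constant from league runs allowed per nine innings instead of league ERA. The 13/3/2 weights on HR, BB+HBP, and K are the commonly published FIP weights.

```python
def batting_average(singles, doubles, triples, homers, at_bats):
    hits = singles + doubles + triples + homers
    return hits / at_bats


def slugging(singles, doubles, triples, homers, at_bats):
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    return total_bases / at_bats


def isolated_power(singles, doubles, triples, homers, at_bats):
    """Isolated Power as conventionally defined: SLG minus BA."""
    return (slugging(singles, doubles, triples, homers, at_bats)
            - batting_average(singles, doubles, triples, homers, at_bats))


def extra_bases_per_at_bat(doubles, triples, homers, at_bats):
    """(D + 2T + 3HR)/AB, algebraically identical to SLG - BA."""
    return (doubles + 2 * triples + 3 * homers) / at_bats


def fip_constant(lg_run_average, lg_hr, lg_bb, lg_hbp, lg_k, lg_ip):
    """Constant that puts FIP on the scale of whatever league average is passed:
    league ERA for the usual version, league RA for a runs-allowed version."""
    return lg_run_average - (13 * lg_hr + 3 * (lg_bb + lg_hbp) - 2 * lg_k) / lg_ip


def fip(hr, bb, hbp, k, ip, constant):
    """FIP with the commonly published 13/3/2 weights plus a league constant."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical batter: 100 1B, 30 2B, 5 3B, 20 HR in 550 AB.
iso = isolated_power(100, 30, 5, 20, 550)
print(round(iso, 3), round(extra_bases_per_at_bat(30, 5, 20, 550), 3))

# Hypothetical league totals; a league-average pitcher lands on the chosen scale.
league = dict(lg_hr=100, lg_bb=300, lg_hbp=30, lg_k=700, lg_ip=1000)
era_scale = fip_constant(4.00, **league)
ra_scale = fip_constant(4.30, **league)
print(fip(100, 300, 30, 700, 1000, era_scale))  # about 4.00
print(fip(100, 300, 30, 700, 1000, ra_scale))   # about 4.30
```

Because the FIP weights never change, the choice of scale affects only the constant, which is why the ERA-scale complaint is comparatively easy to address.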

In all, The Sabermetric Revolution is a book that I think might have been better conceived as a couple of separate journal articles on the topics on which Baumer and Zimbalist have something new to say, because the rest of the book feels like filler and does not establish a consistent purpose or tone.

Wednesday, April 15, 2015

Reinventing the Wheel, Now With Win Estimators!

It is in my nature to snark about bad baseball analysis. Maybe more of it is nurture, as much of my early sabermetric reading was the younger Bill James, with later exposure to early BP and other r.s.bb derivatives, where snark was an integral part of the culture.

That is not really intended to be an excuse, although it may well read that way. As I have grown older I believe that I have generally become more aware of how little I actually know, but, more consequentially for snarking, less interested in engaging. I have lost almost any desire I ever had to evangelize about sabermetrics to the “unwashed masses” (now there’s a snarky, loaded term). Instead I am content to write for my very small audience (and even then almost entirely about what I want to write rather than what I think anyone might want to read) and take passive-aggressive potshots on Twitter. This probably still tilts me more towards the jackass side of the scale than the average sabermetrician, but so be it.

Every once in a while, though, I run across something that irks me so much that I have to respond to it in full. Against my better judgment, I feel compelled to draft a polemic in response, even though I know there’s nothing good that can possibly come of it. That is the case with an article that appeared in the Fall 2014 issue of SABR’s The Baseball Research Journal entitled “A New Formula to Predict a Team’s Winning Percentage” and written by Stanley Rothman, Ph.D.

Historically, the quality of sabermetric articles in the BRJ has been a mixed bag. Early BRJ editions included seminal research by pioneers of the field like Pete Palmer and Dick Cramer. Eventually the quality of such articles dropped off significantly, and BRJ became a leading purveyor of the rehashed bases/X metrics that I rail against, along with other equally banal statistical pieces, with notable but rare exceptions. (That is particularly amusing since in the heyday of BRJ as a place where sabermetric research was published, Barry Codell introduced Base-Out Percentage, one of only a few times such a metric could legitimately be said to have been “invented”.)

In recent years, the quality of the statistical pieces in BRJ has improved significantly, so I hope that my mockery of this particular piece is not taken as an indictment of the entirety of the body of work the editors (now Cecilia Tan) have been doing on this front. In fact, the Fall 2014 issue features a couple of sabermetric pieces I enjoyed greatly, both based on Log5 and other predictors of head-to-head matchups (John A. Richards’ piece “Probabilities of Victory in Head-to-Head Matchups” covered the theoretical basis for Log5 and a comparison of Log5 estimates to empirical results, and Matt Haechrel did likewise for individual batter-pitcher matchups in “Matchup Probabilities in Major League Baseball”).

Dr. Rothman’s piece is an unfortunate exception. And since I consider myself (perhaps incorrectly so) to be something of a subject-matter expert in winning percentage estimators, I feel compelled to point out areas in which Rothman’s findings bury obvious, well-established principles in a barrage of linear regressions.

Rothman opens his paper by discussing Bill James’ ubiquitous and groundbreaking Pythagorean method, and then asks “Why not just use the quantity (RS-RA) to calculate EXP(W%)”? Why not indeed? This question is never satisfactorily answered in the paper. Nor is it even addressed henceforth.

Rothman proceeds to set up a W% estimator that he christens the Linear Formula as:

EXP(W%) = m*(RS-RA) + b

Note that Rothman’s terms RS and RA are just that--runs scored and runs allowed by a team. Not per game, per inning, or on any other sensible rate basis--raw, unadulterated seasonal totals.

Next, he provides the standard equations for m and b, and makes some simplifying assumptions. His regressions are run separately for each MLB season, so each team’s number of games is 162 (obviously there are some limited and non-material exceptions) and there are 30 observations in each regression (Rothman uses 1998-2012 data in his analysis). After these substitutions, the intercept b is equal to .5 and the slope m is:

m = SUM[(RS - RA)*W%]/SUM[(RS - RA)^2]
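The simplification can be checked numerically. A minimal sketch, assuming a hypothetical four-team league (the numbers are invented) whose run differentials sum to zero and whose winning percentages average .500, as they do in a real season when every team plays the same number of games. Under those conditions a general least-squares fit returns an intercept of exactly .5 and the same slope as SUM[(RS - RA)*W%]/SUM[(RS - RA)^2].

```python
def least_squares(x, y):
    """Ordinary least-squares slope and intercept for y = m*x + b."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    m = sxy / sxx
    b = y_bar - m * x_bar
    return m, b


def simplified_slope(run_diffs, win_pcts):
    """Slope after the substitutions: SUM[(RS - RA)*W%] / SUM[(RS - RA)^2]."""
    numerator = sum(d * w for d, w in zip(run_diffs, win_pcts))
    denominator = sum(d * d for d in run_diffs)
    return numerator / denominator


# Hypothetical league: run differentials sum to zero, W% averages .500.
run_diffs = [120, 40, -50, -110]
win_pcts = [0.580, 0.525, 0.470, 0.425]

m, b = least_squares(run_diffs, win_pcts)
print(m, b)
print(simplified_slope(run_diffs, win_pcts))
```

The mean run differential is zero, so the intercept collapses to the mean W% and the centered cross-product reduces to the raw one; that is all the substitution does.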

Rothman notes that for major league seasons viewed in aggregate, there is a strong correlation between SUM[(RS - RA)*W%] and SUM[(RS - RA)^2], and so he develops a formula to predict the latter from the former:

EXP{SUM[(RS - RA)^2]} = 1464.4*SUM[(RS - RA)*W%] + 32710

This is substituted into the denominator of the slope formula, with the intercept of 32710 dropped since it has little impact, to get the following equation for expected W%:

EXP(W%) = SUM[(RS - RA)*W%]/{1464.4*SUM[(RS - RA)*W%]}*(RS - RA) + .5

= .000683*(RS - RA) + .5
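The summation terms cancel, leaving a slope of 1/1464.4 per run. A minimal sketch of the resulting formula as described above (the 800/700 team is hypothetical):

```python
# Denominator from Rothman's fitted relationship, as quoted above.
SEASON_DENOMINATOR = 1464.4


def linear_formula(runs_scored, runs_allowed):
    """Rothman's Linear Formula on raw seasonal totals: .5 + (RS - RA)/1464.4."""
    return (runs_scored - runs_allowed) / SEASON_DENOMINATOR + 0.5


slope = 1 / SEASON_DENOMINATOR
print(round(slope, 6))                     # 0.000683
print(round(linear_formula(800, 700), 4))  # 0.5683 for a +100 team
```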

This is the final formula that Rothman refers to as the Linear Formula. At this point, I will offer a few of my own comments:

1) There is nothing novel about presenting a W% estimator based on some relationship between run differential and W%. The rule of thumb that ten runs equals one win is just that. One of the earliest published W% estimators, from Arnold Soolman, was based on a regression that used RS/G and RA/G as separate variables but could just as easily have used the difference (and the insignificant difference between the regression coefficients for the two terms backs that up).

2) The author’s choice to express this equation on a team-seasonal basis is, frankly, bizarre. It makes the formula much harder to apply to anything other than team seasonal totals, and it obscures the nature of the relationship between runs and wins, hiding the fact that this is little different than assuming ten runs per win. If you divide 1464.4 by 162 games/season, you find that the formula implies 9.04 runs per win and would be more conveniently expressed as .1106*(RS - RA)/G + .5.

3) I don’t understand the rationale for using a separate equation for each league-season, then developing a single slope by running another regression of various league-level quantities. It would be much more straightforward to combine all teams from the data set and run a single regression. Such an approach would also result in a higher R^2 for the team W% estimates. I don’t think that maximizing R^2 should be paramount in constructing a W% estimator, but in this case I fail to see what is gained by aggregating regressions across multiple seasons rather than studying the relationship between runs and wins directly at the team level.
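The per-game version mentioned in point 2 is a one-line conversion. A sketch (the 800/700 team is hypothetical) showing that the seasonal and per-game forms agree for a 162-game season, that the implied runs per win is about 9.04, and how the estimate compares with the ten-runs-per-win rule of thumb written in the same form:

```python
GAMES_PER_SEASON = 162
SEASON_DENOMINATOR = 1464.4  # the Linear Formula's slope is 1/1464.4 per run

# Dividing out the season length turns the slope into a per-game rate.
slope_per_game = GAMES_PER_SEASON / SEASON_DENOMINATOR
runs_per_win = SEASON_DENOMINATOR / GAMES_PER_SEASON


def seasonal_form(rs, ra):
    """.5 + (RS - RA)/1464.4, using raw seasonal totals."""
    return (rs - ra) / SEASON_DENOMINATOR + 0.5


def per_game_form(rs, ra, games):
    """.1106*(RS - RA)/G + .5, usable for any number of games."""
    return slope_per_game * (rs - ra) / games + 0.5


def ten_runs_per_win(rs, ra, games):
    """The old rule of thumb in the same form, with RPW fixed at 10."""
    return (rs - ra) / (10 * games) + 0.5


print(round(slope_per_game, 4), round(runs_per_win, 2))  # 0.1106 9.04
print(seasonal_form(800, 700), per_game_form(800, 700, 162))
print(ten_runs_per_win(800, 700, 162))
```

Writing it per game makes the structure plain: both formulas are the same straight line through .500, differing only in the assumed runs per win.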

Returning to the article, Rothman uses a Chi-Square test on 2013 data to compare the Linear Formula to Pythagorean. Setting aside the silliness of using thirty data points for an accuracy test when hundreds are available, I must give Rothman credit for not using the Linear Formula’s better test statistic to trumpet its superiority--instead he writes that “there is no reason to believe that both of these formulas cannot be used.”

The article then includes a digression on applying this approach to the NBA and NFL. The conclusion and “additional points” sections of the article provide a handful of interesting contentions:

* Rothman suggests that one of the chief advantages of the Linear Formula is that it is “easier for a general manager to understand and use”. The premise is that GMs can use the Linear Formula to calculate the marginal wins from player transactions.

While there is certainly nothing wrong with this type of back-of-the-envelope estimate, this comment would have been less bizarre twenty years ago. Now it seems incredibly naïve to suggest that the majority of major league front offices could improve their planning by using a dumbed-down win estimator. It’s hard to determine which is sillier--the notion that front offices that would entertain such analysis would not already be using more advanced models (whose output would depend much more on the projection of player performance than on how that performance is translated into wins), or the notion that front offices who were so inclined and needed to do back-of-the-envelope calculations would not be able to grasp Pythagorean.

* Apparently referring to the approximation used to derive the multi-year version of the formula above, Rothman asks “Why is there a strong positive correlation between SUM[(RS - RA)^2] and SUM[W%*(RS - RA)] in MLB?”

I might be accused of under-thinking this, but my response is “Why wouldn’t there be?” The key quantity in each sum is run differential. We know that run differential is positively correlated with W% (if it were not, this article would never have been written), so it should follow that the square of run differential (or the square root, the cube, the logarithm, any defined function) should have some relationship to the winning percentage times the run differential. And since the quantities Rothman is comparing are sums on the league level, both should increase as the differences between teams increase (i.e. if all teams were .500 and had zero run differentials, both quantities would be zero. As teams move away from the mean, both quantities increase).

* Rothman notes that if a team’s run differential is greater than 732, then the Linear Formula will produce an estimated W% in excess of 1.00. “However, this is not a problem because for the years 1998-2012 the maximum value for (RS - RA) is 300.”

Note that Rothman does not discuss the opposite problem, which is that a run differential below -732 will produce an equally implausible negative W%. But the hand-waving away of this as a potential issue, coupled with the posed but unaddressed question “Why not just use the quantity (RS-RA) to calculate EXP(W%)?”, is why this article got under my skin.
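The “Why wouldn’t there be?” answer can be pushed one step further with the slope formula given earlier. Within a single season, write x_i = RS_i - RA_i and W_i for each team’s W%, so that the x_i sum to zero, and let e_i be the residuals from that season’s least-squares fit W_i = .5 + m_s x_i + e_i. Least-squares residuals are uncorrelated with the predictor (the sum of x_i e_i is zero), which gives:

```latex
\begin{aligned}
\sum_i x_i W_i &= \sum_i x_i \left(0.5 + \hat{m}_s x_i + e_i\right) \\
               &= 0.5 \sum_i x_i + \hat{m}_s \sum_i x_i^2 + \sum_i x_i e_i \\
               &= \hat{m}_s \sum_i x_i^2
\end{aligned}
```

Read this way, SUM[(RS - RA)^2] = SUM[(RS - RA)*W%]/m_s holds exactly in every season; it is just the slope formula rearranged. If the single-season slopes are fairly stable, the two league-level sums must move together with a ratio near 1/m_s, which is at least consistent with the fitted 1464.4 and with dropping the intercept. This is a reading of the algebra, not something stated in the paper.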

If Dr. Rothman took five seconds to consider the advantages and disadvantages of different ways to construct a W% estimator, scant evidence of it has manifested itself in his paper (and given that this is a commentary on the paper, not on Dr. Rothman himself or whatever unpublished consideration he gave to these matters, that is all I have to go on). There is certainly nothing wrong with experimenting with different estimators, but these experiments should not rise to the level of publication in a printed research journal unless they yield new insight in some way. Nothing in Rothman’s piece did--in fact, given the bizarre manner in which he chose to express the equation, I would suggest that if anything the piece regresses the field’s knowledge of W% estimators.

So allow me the liberty of answering Rothman’s question and addressing the hand-waved problem for him.

Q: Why not use run differential to estimate W%?

A: Because doing so, at least through the simple linear regression approach, does not bound W% between zero and one, does not recognize that the marginal value of runs is variable, and does not recognize that the value of a run is dependent on the scoring environment.


Other than that, it’s great!

“Why not?” is a great reason to experiment, but it’s not a great reason to formally propose a new method (well, really, to recycle existing methods, but I’m piling on as it is). There is also nothing wrong with using a model with certain deficiencies that other models avoid, whether due to computational restrictions, ease of use, a lack of deleterious effect for the task at hand, etc. But it should be incumbent on the analyst and the publisher to acknowledge them.

Finally, anyone publishing sabermetric research in this day and age should recognize that whatever new approach they believe they have developed for a common problem (like win estimation, or measuring offensive performance), it’s probably not new at all. This is certainly the case here given the work of Soolman, the rule of thumb that ten runs equals one win, the dynamic runs per win formula used by Pete Palmer in The Hidden Game of Baseball and Total Baseball, and other related approaches. All of these are based on the basic construct W% = m*run differential + b.

Personal anecdote: I don’t remember when this was exactly, maybe when I was in the eighth grade, but in our math class we were learning about linear equations of the form y = mx + b and there was an example in the textbook that showed how one could eyeball a line through a scatterplot and develop the equation for that line. In other words, a manual, poor man’s linear regression.

So I did just that with a few years of team data, plotting run differential per game against W% (I want to say I used 1972-74 data), and came up with W% = .1067*RD + .5. Foolishly, I actually used this for W% estimates for a period of time. Thankfully, I was cognizant that it was not a new approach but rather just a specific implementation of one developed by others, and I did not attempt to/no one permitted me to publish it as if it was. Years later, W% = .1106*RD + .5 appeared in the pages of the Baseball Research Journal.

So that this post might have some smidgen of lasting value, I will close by reiterating the three conditions of an ideal win estimator that such linear constructs fail to satisfy. I have written plenty about win estimators in the past (and will doubtless rehash much of it again in the future), but I don’t believe I’ve explicitly singled out those properties. An ideal W% estimator would satisfy all three, which is not to say there is no use for an estimator that satisfies only two or even none. The Linear Formula satisfies none. I will discuss how three of the common approaches perform: Pythagorean (with fixed exponent), Pythagenpat, and Palmer (RPW = 10*sqrt(runs per inning by both teams)). Palmer can serve as a stand-in for any method that allows RPW to vary as the scoring level varies, and of course there are other constructs that I am not discussing.

1. The estimate should fall in the range [0,1]

The reason for this is self-explanatory. Pythagorean and Pythagenpat pass, while Palmer does not. Obviously this is not really an issue when you apply the method to normal major league teams. It can become an issue when extrapolating to individual/extreme performances, though.

2. The formula should recognize that the marginal value of runs is variable.

This is somewhat related to #1--the construct of Pythagorean results in it passing both tests. However, there are other constructs that are bounded but fail here. Palmer fails here, which is inevitable for a linear formula. The gist here is that each additional run scored is less valuable in terms of buying wins and each additional run prevented is more valuable. This is also the hardest to articulate and the hardest to prove if one has not bought into a Pythagorean-based approach (or examined other W% models such as those based on run distributions).

3. The formula should recognize that as more runs are scored, the number of marginal runs needed to earn a win increases.

This could be confused with #2, but #2 is true regardless of the scoring level in question--it's true in 1930 and in 1968. In this case, the relationship between runs and wins changes as the run environment changes. This is where a fixed exponent Pythagorean approach falls short, while both Pythagenpat and Palmer take this into account.
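The three properties can be checked directly. A rough sketch, with several assumptions stated up front: the Pythagenpat exponent is taken as (runs per game by both teams)^.29 (values near .28 to .29 are common; .29 is a choice made here), Palmer's RPW is converted to W% as .5 + (RS - RA)/(RPW*G), innings are assumed to be nine per game, and every team is hypothetical.

```python
import math


def linear_formula(rs, ra):
    """Rothman's Linear Formula on seasonal totals: .5 + (RS - RA)/1464.4."""
    return 0.5 + (rs - ra) / 1464.4


def pythagorean(rs, ra, exponent=2.0):
    """Pythagorean W% with a fixed exponent (2 by default)."""
    return rs ** exponent / (rs ** exponent + ra ** exponent)


def pythagenpat(rs, ra, games, z=0.29):
    """Exponent = (runs per game by both teams) ** z; z = .29 is an assumption."""
    return pythagorean(rs, ra, exponent=((rs + ra) / games) ** z)


def palmer_rpw(rs, ra, innings):
    """Palmer runs per win: 10 * sqrt(runs per inning by both teams)."""
    return 10 * math.sqrt((rs + ra) / innings)


def palmer(rs, ra, games, innings):
    """Assumed conversion: W% = .5 + (RS - RA) / (RPW * G)."""
    return 0.5 + (rs - ra) / (palmer_rpw(rs, ra, innings) * games)


def gain_from_one_run(estimator, rs, ra=700):
    """Change in estimated W% from scoring one more run, holding RA fixed."""
    return estimator(rs + 1, ra) - estimator(rs, ra)


G = 162
IP = 9 * G  # assume nine innings per game

# 1. Bounds: a hypothetical team scoring 10 and allowing 1 run per game.
rs, ra = 10 * G, 1 * G
print(linear_formula(rs, ra), palmer(rs, ra, G, IP))  # both above 1.0
print(pythagorean(rs, ra), pythagenpat(rs, ra, G))    # both between 0 and 1

# 2. Marginal value of runs: constant for the line, shrinking for Pythagorean.
print(gain_from_one_run(linear_formula, 700), gain_from_one_run(linear_formula, 800))
print(gain_from_one_run(pythagorean, 700), gain_from_one_run(pythagorean, 800))

# 3. Runs per win as scoring rises from 4 to 5 runs per team per game.
print(1464.4 / G)  # Linear Formula: no scoring term, so always about 9.04
print(palmer_rpw(4 * G, 4 * G, IP), palmer_rpw(5 * G, 5 * G, IP))  # rises
```

The third check is limited to the Linear Formula and Palmer, since the objection to fixed-exponent Pythagorean concerns the changing shape of the runs-to-wins relationship, which a single runs-per-win figure does not capture.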

Saturday, April 04, 2015

2015 Predictions

No involved disclaimer this year; I will just point you to this article and point out that it applies even to much more formal predictions than those displayed here. This is my opinion and it is in the spirit of fun rather than analysis:

AL EAST
1. Boston
2. Toronto
3. New York
4. Baltimore
5. Tampa Bay

I am less confident in my order here than for any other division. The whole AL is something of a tossup, though, as many others have noted. I’ve settled on Boston for the East and the pennant. Their starting pitching is mediocre on paper, but at least they should have the resources to improve it (perhaps in a Cole Hamels type way) and a fair number of competent bodies to cycle through the back of the rotation in case of injury or ineffectiveness. But their offense projects as the best in the league. Toronto in many ways is the same team, but with an offense more dependent on its stars and a rotation that, outside Drew Hutchison, may lack the upside of Boston’s. New York looks like a middle-of-the-pack team to me; they may be better than in recent years but have that improvement masked by their recent Pythagorean outperformance. Either way I think they would need a lot of old players to stay healthy to win it. Baltimore is a team that I’ve missed on repeatedly, but I don’t just assume that this year’s equivalent of Steve Pearce or Miguel Gonzalez is bound to materialize. Plus I have a natural distrust of all things Ubaldo Jimenez is even tangentially associated with. Tampa Bay is a team that PECOTA loves, but on which I tend to agree with the mainstream, although they do seem like a high-variance team that could surprise. Plus, Drew Smyly.

AL CENTRAL
1. Detroit
2. Cleveland (wildcard)
3. Chicago
4. Kansas City
5. Minnesota

I thought the Tigers were vulnerable last year; that is even more the case in 2015. Verlander’s status as an ace has been seriously impaired, and Price for Scherzer from a preseason perspective is ok, except that Drew Smyly and Austin Jackson are replaced by Alfredo Simon and Anthony Gose. Or is that Shane Greene and Rajai Davis? Does it really matter? The only reason I am picking them to win is that I can’t bring myself to pick Cleveland. The history of me picking the Shapiro era Indians to win is not a good one--I think 2007 is the only time I got it right. Plus the Indians are too popular among prognosticators for comfort. It’s easy to imagine a scenario where they are really good--Kluber approaches his 2014 performance, a couple of Carrasco/Salazar/Bauer/House are really good, Kipnis or Swisher bounces back, Gomes and Brantley don’t regress too much…but it’s also not that hard to picture multiple issues resulting in a catastrophic failure. While it’s hard to predict bullpens, the Indians’ bullpen looks a little precarious, given how hard it was worked last year and the fact that the third and fourth righties are Scott Atchison and Anthony Swarzak. The White Sox have frontline players to compete with Detroit and Cleveland for sure, but I question whether the other pieces are strong enough. Tyler Flowers (catcher) and Hector Noesi (any role) don’t inspire confidence. Kansas City will be one of the great sabermetric/mainstream divergence cases, but other than Yordano Ventura, who am I supposed to like in their rotation? Other than Alex Gordon, who am I supposed to really like in their lineup? The Royals could contend again, but it’s hard to pick them. There’s not much to say about the Twins, but I’m sure it’s all Joe Mauer’s fault anyway.

AL WEST
1. Los Angeles
2. Seattle (wildcard)
3. Oakland
4. Houston
5. Texas

My crude numbers have it as too close to call between Los Angeles and Seattle. The Angels would appear to have the stronger offense, the Mariners better pitching. All things being equal I’ll bet on the team with Mike Trout, although the lineup looks below average except for him. It would be fun if Oakland could hang around in the race and bust up some narratives, but I’ve never been a believer in the 2012-2014 A’s in making preseason predictions, so I’m certainly not going to start now. I think this will be the year that Houston safely clears the bar of respectability, although that bar seems to be set higher for them as they have become a lightning rod, often for sabermetrically-inclined people who want to prove that they are not part of the herd. Such is the price of a touch of self-promotion and the stronger “Billy Beane should have never written that book” effect. I was all set to pick Texas as some kind of dark horse bounceback contender, and then my perennial Cy Young pick Yu Darvish went down and I took it as a sign to banish them to the bottom of the league.

NL EAST
1. Washington
2. New York
3. Miami
4. Atlanta
5. Philadelphia

There’s no reason to get off the Washington bandwagon now, as this looks like the safest division pick in MLB. With the teardown of the Braves, there is no credible threat on paper. I do have to balance my backlash impulses against the tendency to overrate supposed “Super Rotations” and the notion that they somehow guarantee playoff success, as if Washington sans Scherzer wasn’t a darn good group or the experience of the Halladay/Lee/Hamels/Oswalt Phillies and the Maddux/Glavine/Smoltz/(Avery/Neagle) Braves shouldn’t have disabused that notion long ago. But bonus likability points for the fact that so many people want to make Bryce Harper into a villain. The Mets were a tempting wildcard pick for me, but the loss of Wheeler made it easier to push them down a little bit. I personally like them better than the crude numbers I run, which only estimate 79 wins. Miami has a fun young core with Stanton, Yelich, Ozuna, Fernandez, etc. but I think they’ve jumped the gun on trying to win and at least one of those moves will be exposed as a big misstep (Dee Gordon). They are the best bet at the moment to be the next non-Washington winner of this division, though. In the span of two years, Atlanta has gone from a team I irrationally liked to one I thought was good but disliked (thanks Brian McCann!) to one that actively appears to court my dislike. It may not matter because they might have the worst offense in the majors. But that distinction may go to the Phillies, who may also have one of the worst pitching staffs. But they have Ryan Howard, franchise icon.

NL CENTRAL
1. St. Louis
2. Chicago (wildcard)
3. Pittsburgh
4. Milwaukee
5. Cincinnati

St. Louis is an easy pick in a different way than Washington--they don’t tower over the field to the same extent, but they are the only thing resembling a safe pick in the Central. And they have Jason Heyward now, which is good for multiple brownie points that didn’t contribute to this pick. I feel like a sucker for picking Chicago to win a wildcard. It’s really easy to let the prospect hype run wild in one’s mind and jump the gun. But what’s one to do? On paper they do appear to be the second-best team in the division, a good offense supporting a bad pitching staff. My crude workup (based on Fangraphs’ composite projections) isn’t counting on too much from Javier Baez or a full season from Kris Bryant, although it does assume Jorge Soler is an excellent player right now. My point is that it’s not a terribly over-exuberant projection. While I don’t put a whole lot of stock in it, Joe Maddon has experience with the quick turnaround, although this time everyone is watching for it. The East offers nothing special in the way of wildcard material, and San Diego also carries potential for serious overhype. Pittsburgh should also be right in the mix, looking on paper to be pretty average on both sides of the ball. I overrated Milwaukee last year--I may be too quick to cast them aside in 2015, since they also look like a .500 team on paper, which in reality means they are a serious wildcard contender. It was hard to imagine I could be less impressed with Cincinnati’s management post-Baker, and yet here we are. Moving every possible starter to the bullpen to go with Jason Marquis (Jason Marquis is still in the league?!!) at the back of the rotation does not inspire confidence, nor does the continued sniping (and more importantly, loss of skill) of Brandon Phillips. That Raisel Iglesias, who many felt would be a reliever, somehow escaped the Aroldis Chapman Memorial Black Hole is a mystery that may never be fully explained.

NL WEST

1. Los Angeles
2. San Francisco (wildcard)
3. San Diego
4. Arizona
5. Colorado

I remain befuddled as to why PECOTA loves the Dodgers so much; they are again clear favorites, but a midpoint expectation of 98 wins doesn’t make any sense. Their offense is far from the sure thing I would expect to predict such a record, although they have intriguing Cubans on call in case of problems. Their bullpen also is far from a sure thing. San Francisco’s offense looks to be below-average, with strong pitching; if one ignores park effects, one might say the same about San Diego. The Padres made a splash on the offensive side to be sure, but the left side of the infield is still spotty and it looks to me like pitching is their strength. Flip a coin between the Giants and the Padres; I’ve picked the former simply because it’s the less desirable outcome in my eyes. Arizona and Colorado are not just the two worst teams in this division on paper; they are both contenders for the worst team in the majors, with Philadelphia and perhaps Minnesota and Texas in on the game. When the moves made by Dave Stewart make more immediate intuitive sense than those by new GM Jeff Bridich (seriously, what’s the deal with Jorge De La Rosa?), it’s time to fear for the non-California wing of the NL West.

WORLD SERIES

Washington over Boston

I picked Washington last year and see no reason to stop now. Boston has the potential to make this post look absurd by August but that will happen one way or the other regardless.

AL Rookie of the Year: SP Carlos Rodon, CHA

AL Cy Young: Yovani Gallardo, TEX
Ok, ok, that’s a joke…I picked Gallardo to win the NL Cy more times than I would care to admit.
Serious pick: Chris Sale, CHA
I’m tempted to pick Drew Smyly but that wouldn’t be serious either. I do really like Drew Smyly though.

AL MVP: 2B Robinson Cano, SEA

NL Rookie of the Year: RF Jorge Soler, CHN

NL Cy Young: Stephen Strasburg, WAS
I’m sticking with this until it happens.

NL MVP: RF Bryce Harper, WAS

Worst team in each league: MIN, PHI

Most likely to go .500 in each league: OAK, MIL

Monday, February 09, 2015

Losing Ground

OSU baseball enters 2015 coming off of one of its worst seasons in decades. The Buckeyes went 10-14 in the Big Ten, their worst record since 1987, a record fueled by a seven-game Big Ten losing streak (longest since 1987). Their 5-12 road record was the worst since 1972. In 1988, Bob Todd took over as Buckeye head coach and wasted little time in turning the program around, transforming 1987’s 19-27 overall, 4-12 B10 record into a 32-28, 16-12 team. Todd would go on to reign over the program for 22 more seasons, which served as the second golden age of OSU hardball (13 NCAA appearances, 13 seasons with either a Big Ten regular season or tournament title).

Unlike 1988, 2015 will not follow a dismal showing with a new regime. Todd’s replacement, Greg Beals, enters his fifth season at the helm and needs to turn things around in order to secure his long-term status as OSU coach. He will attempt to do so with a team that has elicited a wide range of preseason prognostications: its sheer performance and player development track record does not appear to be impressive, but some observers insist it has a surfeit of potential.

Beals has been fond of catcher platoons and has never given senior Aaron Gretz the job on a full-time basis despite his appearing to be the best option. Gretz will once again share time behind the plate with fellow senior Conor Sabanosh, a JUCO transfer in his second season as a Buck. Both hit fairly well last season and may get at-bats at DH as well. Sophomore Jalen Washington and freshman Jordan McDonough will serve as depth.

First base is an open position and may see three juniors rotate through the spot: Zach Ratcliff, Mark Leffel, and Jacob Bosiokovic. Ratcliff is limited to first defensively, but Leffel also is capable of playing third and Bosiokovic will be an option in all four corners. Each has shown flashes of being productive hitters (Leffel more as a hitter for average, the other two for power potential), but none has clearly emerged to grab the spot.

Second base will go to junior Nick Sergakis. Sergakis transferred from Coastal Carolina prior to 2014 and started the season on the bench before an injury to shortstop Craig Nennig pushed him into the lineup. Sergakis was a revelation as one of the team’s most productive hitters (and led off despite having the team’s lowest walk rate). Nennig, a junior, should be back and will play short, but while his fielding draws rave reviews, he has yet to demonstrate any ability to hit (.201/.295/.225 in about 190 career PA). Nennig’s offense will make sliding Sergakis back to short a tempting option for Beals.

At third base, junior Troy Kuhn will start. He spent most of 2014 as the second baseman before being displaced in favor of Sergakis upon Nennig’s return. Kuhn was among the team’s most productive hitters and paced OSU with six longballs, so he will be a key part of the lineup again and could move back to second if Nennig struggles. In that case, Bosiokovic and Leffel could play third. The infield backups will include the aforementioned Washington (that rare catcher/second baseman) as well as sophomore L Grant Davis (a transfer from Arizona State) and freshman Nate Romans.

The outfield should be one of Ohio’s strengths. Sophomore left fielder Ronnie Dawson was as fun a hitter to watch as OSU has had in years and was the team’s best hitter in 2014 (.337/.385/.454). Sophomore center fielder Troy Montgomery was highly touted but did not impress in his debut (.235/.297/.353). Senior right fielder Pat Porter (obligatory mention that he hails from my hometown) had a very disappointing season but rebounded with a strong summer campaign and will likely be penciled in as the #3 hitter. Bosiokovic can play either corner, and junior Jake Brobst has served mostly as a pinch-runner/defensive replacement. A pair of freshmen, Tre’ Gantt (a speedster from Indiana in the mold of Montgomery) and Ridge Winand, will complete the depth chart. The DH spot will most likely be filled by the odd men out at catcher and first base.

OSU’s #1 starter, at least to open the season, will be sophomore Tanner Tully, the Big Ten freshman of the year in 2014. Tully’s smoke-and-mirrors act featured a vanishingly low walk rate (.7 W/9) and a low K rate (5.3 K/9), which scream regression even in northern college baseball. Senior lefty Ryan Riga will look to bounce back from an injury-riddled campaign--he and Tully are fairly similar stylistically, so it would not be surprising to see them split up, with Travis Lakins taking the #2 rotation spot. Lakins is a sophomore who should be the easy favorite to be the ace by the end of the season; his talents were somewhat wasted in the bullpen in 2014, where he fanned 9.0 per nine and led the pitching staff with +12 RAA. Lakins is draft-eligible, and barring injury this should be his last season in Columbus.

Junior Jake Post is a 6-2 righty with decent stuff who has yet to find consistent effectiveness, but my bet would be that he will displace Riga or Tully by mid-season. Other starting options are lefty John Havrid, a JUCO transfer from Mesa Community College, and freshman Jacob Niggemeyer, a 28th-round pick of the Cubs.

The bullpen will be anchored by senior slinger Trace Dempsey, who may well become OSU’s all-time saves leader but had a rough 2014 (-7 RAA) after a brilliant 2013 (+13). Dempsey’s control abandoned him last year, a collapse that drew comparisons to another erstwhile Buckeye closer, Rory Meister. Beyond Dempsey, bullpen roles are largely up for grabs--Lakins was the star last year and will be starting. It is possible that a pitcher like Post could be used as the setup man, forgoing some mid-week wins for conference bullpen depth.

Otherwise, redshirt freshman Adam Niemeyer looks like the key setup man--his true freshman campaign was limited to just three appearances due to injury. Beyond that, I won’t even hazard a guess as to who will emerge from the following possibilities, other than to note that Beals tries to cultivate at least one lefty specialist in his pen:

RHP: Curtiss Irving (SM), Seth Kinker (FM), Brennan Milby (R-FM), Shea Murray (SM), Kyle Michalik (R-FM), Yianni Pavlopoulos (SM)
LHP: Michael Horejsei (JR), Matt Panek (JR), Joe Stoll (SM)

Beals appears to have instituted a shift in scheduling philosophy, opting for more weekend series over multi-team “classics”/pseudo-tournaments. The Buckeyes’ only event of the latter type will be this weekend, as they face George Mason, St. Louis, and Pitt in the Snowbird Classic in Port Charlotte. Subsequent weekends will include three-game series at Florida Atlantic, UAB, and Western Kentucky before the home opener March 10 against Indiana-Purdue Fort Wayne.

The following weekend the Buckeyes will host Evansville for a three-game series; they will then host Rider for a two-game mid-week series and open Big Ten play March 20 hosting Michigan State. Subsequent weekends will see OSU at Rutgers, home to Penn State and UNLV (the latter non-conference, of course), at Nebraska and Northwestern, home to Illinois and Maryland, and at Indiana. The mid-week slate will include home games against Toledo, Akron, Ohio University, Dayton, Kent State, Louisville, and Morehead State and trips to Miami, Cincinnati, and Youngstown State.

There is a wide variety of opinion regarding OSU’s 2015 outlook. Perfect Game tabbed them as the #35 team in the country, while Collegiate Baseball picks them tenth out of thirteen in the Big Ten (now including Maryland and Rutgers), a finish that would see OSU miss the eight-team field for the Big Ten Tournament, to be held at Target Field May 21-24.

I tend to side much more closely with Collegiate Baseball’s view than Perfect Game’s. Aside from a second-place Big Ten finish in 2013, Beals’ teams have yet to live up to the hype that his recruiting has generated. Beals’ players do not seem to have developed according to expectations--in early years many of the key players were transfers rather than high school recruits, and there have yet to be many high producers among his high school crops, especially at the plate. And I have written many times about the horrific baserunning and other tactics employed by Beals. There are teams of fifth-graders that consistently make better decisions than Beals’ crew.

What has been particularly disturbing to watch as a fan of the program is that while the rest of the Big Ten has improved (Baseball America predicts that Illinois, Maryland, M*ch*g*n and Nebraska will all qualify for the NCAA Tournament, which would be a record for the conference), OSU has slid into irrelevancy--even within the northern baseball picture. While Todd’s program was slipping from its heights near the end, he still managed to qualify for the NCAAs every other year. Beals has yet to make an NCAA Tournament appearance, and a sixth straight season (fifth under Beals) on the outside looking in would only extend OSU’s longest drought since 1983-1990 (once Todd led his team to a first tournament in 1991, he never again fell short in consecutive seasons). If OSU does not play up to the level of the optimists, then the program change that I would have liked to see after 2014 may be a fait accompli.

Monday, February 02, 2015

2014 Statistical Meanderings

This is an abridged and belated version of one of my standard annual posts, in which I poke around the statistical reports I put together here and identify items of curiosity. Curiosity is the key, as opposed to analytic insight--any insight to be found is an accident.

* Since 1961, the ten teams with the largest differential between home and road W%:



And the ten largest ratios of HW% to RW%:



* One chart I always run in this piece is a table of runs above average on offense and defense for each playoff team. These are calculated very simply as park-adjusted runs per game less the league average:



It has not been at all uncommon for the average playoff team to be better offensively than defensively and such was the case in 2014. Two playoff teams had below-average offenses while four had below-average defenses, and the world champions had the worst defensive showing of the ten.

* You can’t turn around without reading about the continual rise in strikeouts. Unlike so many, I don’t consider the current strikeout rate to be aesthetically troublesome. But you can get a sense of how crazy strikeout rates have gotten by looking at the list of relievers who strike out ten or more batters per game (I define “game” in this case as a league average number of plate appearances, not innings pitched; eligible relievers are those with forty or more appearances and fewer than fifteen starts):

Al Alburquerque, Cody Allen, Aaron Barrett, Antonio Bastardo, Joaquin Benoit, Dellin Betances, Jerry Blevins, Brad Boxberger, Carlos Carrasco, Brett Cecil, Aroldis Chapman, Steve Cishek, Tyler Clippard, Wade Davis, Jake Diekman, Sean Doolittle, Zach Duke, Mike Dunn, Josh Edgin, Danny Farquhar, Josh Fields, Charlie Furbush, Ken Giles, Greg Holland, JJ Hoover, Kenley Jansen, Kevin Jepsen, Shawn Kelley, Craig Kimbrel, Jake McGee, Andrew Miller, Pat Neshek, Darren O’Day, Joel Peralta, Oliver Perez, Yusmeiro Petit, Neil Ramirez, AJ Ramos, Addison Reed, David Robertson, Fernando Rodney, Francisco Rodriguez, Trevor Rosenthal, Tony Sipp, Will Smith, Joakim Soria, Pedro Strop, Koji Uehara, Nick Vincent, Jordan Walden, Tony Watson.

That’s 51 of the 189 eligible relievers (27%); lower the bar to nine strikeouts per game and it would be 82 (43%); at eight or more there are 110 for 58%.
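Because “per game” here means per league-average number of plate appearances rather than per nine innings, the rate is just strikeouts per batter faced scaled up by league PA per game. A minimal Python sketch follows; the function names are mine, and the 38 PA per game used in the example is an illustrative assumption, not a figure from this post. The shares at the end simply recompute the percentages quoted above.

```python
def k_per_game(strikeouts: int, batters_faced: int, lg_pa_per_game: float) -> float:
    """Strikeouts per 'game', where a game is a league-average number of PA."""
    return strikeouts / batters_faced * lg_pa_per_game


def share(count: int, total: int) -> int:
    """Percentage of eligible relievers, rounded to a whole number."""
    return round(100 * count / total)


# Hypothetical reliever: 80 K in 300 batters faced, assuming 38 PA per league game.
print(round(k_per_game(80, 300, 38.0), 1))  # 10.1

# Shares of the 189 eligible relievers at the 10, 9, and 8 K/G thresholds.
for count in (51, 82, 110):
    print(count, share(count, 189))
```

Scaling by PA rather than innings keeps a reliever who allows lots of baserunners from getting extra credit for facing more batters per inning.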

The lowest-ranking NL reliever by RAR was Rex Brothers (-8), whose strikeout rate was 7.4. The second worst was JJ Hoover (-7), who struck out 10.4 per game. I am not a huge user of WPA metrics, but Hoover’s season was noteworthy for just how bad it was from that value perspective, as he was involved in a few huge meltdowns. Per Fangraphs’ WPA figures, Hoover was second-to-last among major league pitchers with -3.56 WPA; only Edwin Jackson at -4.11 was worse, and Jackson pitched 78 more innings. Even among position players, only Jackie Bradley (-4.00) and Matt Dominguez (-3.76) ranked lower. Brothers was the closest reliever to Hoover, but his WPA was -2.31, 1.25 wins better than Hoover’s.

The anti-Hoover was his teammate Aroldis Chapman, whose numbers over 54 innings are simply ridiculous, including a 19.3 strikeout rate. It’s difficult to fathom that a pitcher with a walk rate of 4.4 could have an RRA of 1.02, an eRA of 1.15, and a dRA of 1.32, but Chapman did, and he narrowly missed leading major league relievers in eRA and dRA (Wade Davis had him by a more-than-insignificant 1.1466 to 1.1471 in the former).

* In 2010, the Giants won the World Series with Tim Lincecum and Matt Cain combining to pitch 435 innings and compile 101 RAR. Over the last five years:



While the potential for starting pitcher ruin is well understood, if you’d told me in 2010 that the Giants would win the World Series four years later while getting no contribution out of Lincecum and Cain, I would have thought that black magic was at work. It probably is.

* Speaking of bad starting pitchers, only two teams had multiple starters (who made fifteen or more starts) with negative RAR. The Cubs had two--Travis Wood and Edwin Jackson combined to start 58 games, pitch 314 innings, and compile -27 RAR. The Indians had three--Zach McAllister, Josh Tomlin, and Justin Masterson combined to start 56 games, pitch 319 innings, and compile -25 RAR (figures do include Masterson’s time in St. Louis). Both of these teams may well be trendy picks to compete in the Central divisions, and this is one reason why that may make sense. The Cubs and Indians are taking different approaches to shore up the back ends of their rotations, Chicago by bringing in an ace and a mid-rotation free agent and the Indians by counting on continued strong performances from young starters who stood out in the second half. Either approach figures to work out better than -25 RAR.

* Despite the poor CHN and CLE individual starters, there’s still nothing quite like Minnesota’s utter and complete starting pitcher futility. In 2012, they were last in starters’ eRA and second-to-last in innings/start and QS%. In 2013, they completed the triple crown--last in IP/S (5.38), QS% (38%), and eRA (5.76). In 2014, they “improved” to their 2012 standings--second-to-last in IP/S (5.64, with COL starters last at 5.59), second-to-last in QS% (41% to the Rangers’ 38%), and last in starters’ eRA (5.08, with Texas next-worst at 4.95).

* Clayton Kershaw had a great season, and was a reasonable choice as NL MVP. I’m not trying to run him down--but there is some notion out there that he had a transcendent season. I think this notion can be tempered by simply comparing his rate stats to those of Jake Arrieta:



Kershaw was better overall than Arrieta, and pitched 42 more innings. But no one should confuse Kershaw 2014 with Pedro 1999 or anything of the sort.

* One of these starting pitchers is now forever known as a clutch pitcher, a modern marvel who harkens back to the days of Gibson and Morris and whoever else has been chosen for lionization. The other is an underachieving prima donna who Ron Darling thinks is "struggling" as a major league starter. Their regular season performances were hardly distinguishable:



Madison Bumgarner and Stephen Strasburg.

* Cole Hamels was fourth among NL starting pitchers with 55 RAR, but won just nine games. This has to be one of the better pitcher seasons in recent years with single digit wins. Through the last decade of my RAR figures, here is the highest-ranking starter in each league with single digit wins:



This is an interesting collection of names--a number of outstanding pitchers and some who I hadn’t thought about in years (John Patterson, the late Joe Kennedy and Geremi Gonzalez). Since this comparison is across league-seasons, in order to rank these seasons it is necessary to convert RAR to WAR. Using RPW = RPG, Hamels’ 2014 actually ranks highest with 7.0 WAR (Harvey 6.9, Schilling 6.7, Jennings 5.9) since the 2014 NL had the lowest RPG (7.9) of any league during the period. Given that the likelihood of a starter having an outstanding season with fewer than ten wins is greater now than at any point in major league history, it’s quite possible that Hamels’ 2014 is the best such season. Sounds like a good Play Index query if you’re looking for an article idea.
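The RPW = RPG conversion is a single division, but it shows why a low-scoring league inflates WAR for the same RAR. A minimal sketch (the function name is mine); RPG is taken here to be combined runs per game for both teams, which is what the 7.9 figure for the 2014 NL implies.

```python
def rar_to_war(rar: float, rpg: float) -> float:
    """Convert runs above replacement to wins, setting runs per win equal
    to the league's runs per game (both teams combined)."""
    return rar / rpg


# Hamels, 2014 NL: 55 RAR in a league averaging 7.9 combined runs per game.
print(round(rar_to_war(55, 7.9), 1))  # 7.0
```

By the same arithmetic, 55 RAR in a hypothetical 9.0 RPG league would be worth only about 6.1 WAR.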

* The worst hitter in baseball with more than 400 plate appearances was Jackie Bradley (2.2 RG). The Red Sox have assembled a large collection of outfielders, and Bradley is unlikely to be in their plans. The second-worst hitter with more than 400 PA was Zack Cozart (2.5 RG). His team traded for a young shortstop who had 3.4 RG in 266 PA (granted, Eugenio Suarez does not appear to be the fielder that Cozart is), yet Walt Jocketty was quoted as saying "Cozart is our opening day shortstop and he’s one of the best in the league."

In addition to Cozart, the Reds featured three other hitters with 250+ PA who were essentially replacement-level: Chris Heisey (3.4 RG for a corner outfielder), Brayan Pena (3.3 for a first baseman), and Skip Schumaker (2.9 for a corner outfielder).

* San Diego liked Justin Upton (or Matt Kemp?) so much that they traded for two clones of the same player (in 2014 performance, at least):



* Many hands have been wrung regarding the apparent shift in Mike Trout’s game to old player skills rather than young player skills, particularly with the dropoff in his base stealing exploits (54 attempts in 2012 to 40 in 2013 to 18 in 2014). Yet it should still be noted that Trout ranked fifth in the AL with a 7.2 Speed Score (I use Bill James’ original formula but only consider stolen base frequency, stolen base percentage, triples rate, and runs scored per time on base). In fact, his Speed Score was up from 2013 (7.0) although down from 2012 (8.7). Here are Trout’s three-year figures in each of the four components of Speed Score:



Just to make clear what these numbers represent: Trout attempted a steal in 29.3% of his times on first base (singles plus walks) in 2012, had an 85.2% SB% when adding three steals and four caught stealings to his actual figures, hit a triple on 2.1% of his balls in play, and scored 45.2% of the time he reached base. (These are all estimated from his basic stat line rather than by counting actual times on first base, attempted steals of second, etc.)
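The four rates can be reproduced from a basic stat line. The minimal Python sketch below uses names of my own choosing, and its denominators (no HBP, home runs removed from both sides of the runs term) are inferred from the figures above rather than taken from a stated formula. It stops at the rates; the step that scales each rate onto James’ 0-10 scale and averages them is not shown.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    ab: int
    h: int
    doubles: int
    triples: int
    hr: int
    bb: int
    so: int
    r: int
    sb: int
    cs: int


def speed_components(b: BattingLine) -> dict:
    """Estimate the four Speed Score input rates from a basic batting line."""
    singles = b.h - b.doubles - b.triples - b.hr
    times_on_first = singles + b.bb               # singles plus walks
    attempts = b.sb + b.cs
    balls_in_play = b.ab - b.hr - b.so
    return {
        "attempt_rate": attempts / times_on_first,
        "sb_pct": (b.sb + 3) / (attempts + 7),     # add 3 SB and 4 CS
        "triple_rate": b.triples / balls_in_play,
        # runs other than HR per time on base other than HR
        "runs_per_on_base": (b.r - b.hr) / (b.h + b.bb - b.hr),
    }


# Trout's 2012 line as I recall it; it reproduces the four rates quoted above.
trout_2012 = BattingLine(ab=559, h=182, doubles=27, triples=8, hr=30,
                         bb=67, so=139, r=129, sb=49, cs=5)
for name, value in speed_components(trout_2012).items():
    print(f"{name}: {100 * value:.1f}%")
```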

While these categories certainly don’t capture the full picture of how speed manifests itself in on-field results, it is clear that Trout has been dialing back the most visible such element of his game, basestealing. And his 2014 SS rebound is due to two categories that are more subject to flukes (triples) and teammate influence (runs scored per time on base). Still, it may be a little early to sound the alarm bells on Trout as a one-dimensional slugger. Eventually, the sabermetric writers who have developed a cottage industry of Trout alarmism will be right about something, but there’s no need to prematurely indulge them.

Meanwhile, Bryce Harper’s Speed Scores for 2012-14 are 7.5, 4.9, 2.7.

Monday, January 19, 2015

Run Distribution & W%, 2014

A couple of caveats apply to everything that follows in this post. The first is that there are no park adjustments anywhere. There's obviously a difference between scoring 5 runs at Petco and scoring 5 runs at Coors, but if you're using discrete data there's not much that can be done about it unless you want to use a different distribution for every possible context. Similarly, it's necessary to acknowledge that games do not always consist of nine innings; again, it's tough to do anything about this while maintaining your sanity.

All of the conversions of runs to wins are based only on 2014 data. Ideally, I would use an appropriate distribution for runs per game based on average R/G, but I’ve taken the lazy way out and used the empirical data for 2014 only. (I have a methodology I could use to estimate win probabilities at each level of scoring that takes context into account, but I haven’t yet finished the full write-up on this blog that it needs before I’m comfortable using it without explanation.)

The first breakout is record in blowouts versus non-blowouts. I define a blowout as a margin of five or more runs. This is not really a satisfactory definition of a blowout, as many five-run games are quite competitive--"blowout” is just a convenient label to use, and expresses the point succinctly. I use these two categories with wide ranges rather than more narrow groupings like one-run games because the frequency and results of one-run games are highly biased by the home field advantage. Drawing the focus back a little allows us to identify close games and not so close games with a margin built in to allow a greater chance of capturing the true nature of the game in question rather than a disguised situational effect.
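The blowout/non-blowout split is a simple classification on run margin. A minimal sketch, assuming game results are available as (runs for, runs against) pairs and that there are no ties; the function name and sample games are made up for illustration.

```python
from typing import Iterable, Tuple


def blowout_split(games: Iterable[Tuple[int, int]], margin: int = 5) -> dict:
    """Split (runs_for, runs_against) results into blowouts (a margin of
    `margin` or more runs) and non-blowouts; return (wins, losses, W%) for each.
    Assumes no tied games."""
    record = {"blowout": [0, 0], "non-blowout": [0, 0]}
    for rf, ra in games:
        key = "blowout" if abs(rf - ra) >= margin else "non-blowout"
        record[key][0 if rf > ra else 1] += 1
    return {k: (w, l, w / (w + l) if w + l else None) for k, (w, l) in record.items()}


# Made-up results, purely to show the classification.
sample = [(10, 2), (3, 4), (1, 7), (5, 4), (6, 1)]
print(blowout_split(sample))
```

Note that a five-run game counts as a blowout under this definition, which matches the “five or more runs” threshold used here.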

In 2014, 74.5% of major league games were non-blowouts, while the complement, 25.5%, were blowouts. Team record in non-blowouts:



It must have been a banner year for MASN, as both the Nationals and the Orioles won a large number of competitive games, just the kind of fan-friendly programming any RSN would love to have. Arizona was second-to-last in non-blowouts in addition to dead last in blowouts:



For each team, the difference between blowout and non-blowout W%, as well as the percentage of each type of game:



Typically the teams that exhibit positive blowout differentials are good teams in general, and this year that is mostly the case, but Colorado is a notable exception with the highest difference. Not surprisingly, they also played the highest percentage of blowout games in the majors, as the run environment in which they play is a major factor. The Rockies’ blowout difference is also correlated to some degree with their home field advantage: more of their blowouts come at home, where all teams have a better record, and the Rockies have exhibited particularly large home field advantages. This year the home/road split was extreme, as Colorado’s home record was similar to the overall record of a wildcard team (.556) and their road record resembled that of a ’62 Mets or ’03 Tigers type disaster (.259).

I did not look at the home/road blowout differentials for all teams, but of the 52 blowouts Colorado participated in, 38 (73%) came at home and 14 on the road. The Rockies were 22-16 (.579) in home blowouts but just 4-10 (.286) in road blowouts.

A more interesting way to consider game-level results is to look at how teams perform when scoring or allowing a given number of runs. For the majors as a whole, here are the counts of games in which teams scored X runs:



The “marg” column shows the marginal W% for each additional run scored. In 2014, the fourth run was both the run with the greatest marginal impact on the chance of winning and the level of scoring for which a team was more likely to win than lose.
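The “marg” column is just the first difference of W% by runs scored. A minimal sketch with made-up W% values for 0 through 10 runs (only the last two, the 9- and 10-run figures, are the 2014 values cited in this post); it locates the run with the largest marginal value and the first scoring level at which a team wins more often than not.

```python
def marginal_wpct(wpct_by_runs: list) -> list:
    """Marginal W% of each additional run: entry k-1 is W%(k) - W%(k-1)."""
    return [wpct_by_runs[k] - wpct_by_runs[k - 1] for k in range(1, len(wpct_by_runs))]


# W% when scoring 0..10 runs; hypothetical except the 9- and 10-run values.
wpct = [0.0, 0.07, 0.18, 0.32, 0.52, 0.65, 0.76, 0.84, 0.90, 0.942, 0.925]
marg = marginal_wpct(wpct)
most_valuable_run = max(range(len(marg)), key=marg.__getitem__) + 1
first_winning_level = next(k for k, w in enumerate(wpct) if w > 0.5)
print(most_valuable_run, first_winning_level)  # 4 4
```

The negative final marginal value is the kind of sample-size quirk that shows up when the empirical table is used directly.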

I use these figures to calculate a measure I call game Offensive W% (or Defensive W% as the case may be), which was suggested by Bill James in an old Abstract. It is a crude way to use each team’s actual runs per game distribution to estimate what their W% should have been by using the overall empirical W% by runs scored for the majors in the particular season.
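The game-level idea reduces to a lookup and an average. A minimal sketch under my own naming, with a made-up W% table; it covers only gOW% and gDW% and does not attempt the step that combines them into gEW%, which isn’t spelled out here.

```python
from typing import Callable, Sequence


def g_ow_pct(runs_scored: Sequence[int], use: Callable[[int], float]) -> float:
    """Game Offensive W%: average, over a team's games, of the league-wide
    W% for teams scoring that many runs."""
    return sum(use(r) for r in runs_scored) / len(runs_scored)


def g_dw_pct(runs_allowed: Sequence[int], use: Callable[[int], float]) -> float:
    """Game Defensive W%: credit the defense with the complement, 1 - W%,
    at each game's runs allowed."""
    return sum(1.0 - use(r) for r in runs_allowed) / len(runs_allowed)


# Hypothetical league W% by runs scored, for illustration only.
table = {0: 0.0, 2: 0.20, 4: 0.55, 7: 0.85}
print(g_ow_pct([0, 2, 4, 7], table.__getitem__))  # about 0.40
print(g_dw_pct([2, 4], table.__getitem__))        # about 0.625
```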

A theoretical distribution would be much preferable to the empirical distribution for this exercise, but as I mentioned earlier I haven’t yet gotten around to writing up the requisite methodological explanation, so I’ve defaulted to the 2014 empirical data. Some of the drawbacks of this approach are:

1. The empirical distribution is subject to sample size fluctuations. In 2014, teams that scored 9 runs won 94.2% of the time while teams that scored 10 runs won 92.5% of the time. Does that mean that scoring 9 runs is preferable to scoring 10 runs? Of course not--it’s a quirk in the data. Additionally, the marginal values don’t necessarily make sense even when W% increases from one runs scored level to another. (In figuring the gEW% family of measures below, I lumped all games with between 10 and 14 runs scored/allowed into one bucket, which smooths any illogical jumps in the win function but leaves the inconsistent marginal values unaddressed and fails to make any differentiation between scoring in that range. The values actually used are displayed in the “use” column, and the “invuse” column contains the complements of these figures, i.e. those used to credit wins to the defense. I’ve used 1.0 for 15+ runs, which is a horrible idea theoretically; in 2014, teams were 20-0 when scoring 15 or more runs.)

2. Using the empirical distribution forces one to use integer values for runs scored per game. Obviously the number of runs a team scores in a game is restricted to integer values, but not allowing theoretical fractional runs makes it very difficult to apply any sort of park adjustment to the team frequency of runs scored.

3. Related to #2 (really its root cause, although the park issue is important enough from the standpoint of using the results to evaluate teams that I wanted to single it out), when using the empirical data there is always a tradeoff that must be made between increasing the sample size and losing context. One could use multiple years of data to generate a smoother curve of marginal win probabilities, but in doing so one would lose centering at the season’s actual run scoring rate. On the other hand, one could split the data into AL and NL and more closely match context, but one would lose sample size and introduce more quirks into the data.
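The bucketing described in the first point above (pool 10 to 14 runs, credit 15+ as a certain win, and give the defense the complement) can be sketched as follows. The function and variable names are mine, and the game counts are made up; the 9- and 10-run rows are only chosen so that their W% matches the 2014 figures quoted above.

```python
from typing import Callable, Dict, Tuple


def build_use(counts: Dict[int, Tuple[int, int]]) -> Tuple[Callable[[int], float], Callable[[int], float]]:
    """Build 'use' and 'invuse' lookups from (wins, games) by runs scored.

    Games with 10-14 runs are pooled into one bucket, and 15+ runs is
    credited as a certain win (1.0).
    """
    pooled = [counts[r] for r in range(10, 15) if r in counts]
    pooled_wpct = sum(w for w, _ in pooled) / sum(g for _, g in pooled)

    def use(runs: int) -> float:
        if runs >= 15:
            return 1.0
        if runs >= 10:
            return pooled_wpct
        wins, games = counts[runs]
        return wins / games

    def invuse(runs: int) -> float:
        return 1.0 - use(runs)  # credit to the defense for allowing `runs`

    return use, invuse


# Made-up counts; 49/52 and 37/40 match the 9- and 10-run W% quoted above.
counts = {9: (49, 52), 10: (37, 40), 11: (25, 26), 12: (15, 15)}
use, invuse = build_use(counts)
print(round(use(9), 3), round(use(10), 3), use(15))  # 0.942 0.951 1.0
```

Pooling restores a sensible ordering (9 runs < 10+ runs) in this example, but as noted, every score from 10 to 14 then gets identical credit.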

I keep promising that I will use my theoretical distribution (Enby, which you can read about here) to replace the empirical approach, but that would require me to finish writing my full explanation of the method and associated applications and I keep putting that off. I will use Enby for a couple graphs here but not beyond that.

First, a comparison of the actual distribution of runs per game in the majors to that predicted by the Enby distribution for the 2014 major league average of 4.066 runs per game (Enby distribution parameters are B = 1.059, r = 3.870, z = .0687):



Enby fares pretty well at estimating the actual frequencies, most notably overstating the probability of two or three runs and understating the probability of four runs.

I will not go into the full details of how gOW%, gDW%, and gEW% (which combines both into one measure of team quality) are calculated in this post, but full details were provided here (***). The “use” column here is the coefficient applied to each game to calculate gOW% while the “invuse” is the coefficient used for gDW%. For comparison, I have looked at OW%, DW%, and EW% (Pythagenpat record) for each team; none of these have been adjusted for park to maintain consistency with the g-family of measures which are not park-adjusted.
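For reference, the Pythagenpat-based OW%, DW%, and EW% can be sketched in a few lines. This is a minimal sketch, not the exact implementation behind these tables: the exponent constant z = 0.29 is my assumption (values around 0.28 to 0.29 are commonly used), and pairing the team’s offense or defense with the league-average R/G is the usual construction of OW% and DW% rather than something stated in this post.

```python
def pythagenpat_x(rpg_total: float, z: float = 0.29) -> float:
    """Pythagenpat exponent from combined runs per game (z = 0.29 assumed)."""
    return rpg_total ** z


def ew_pct(rs_pg: float, ra_pg: float, z: float = 0.29) -> float:
    """Expected W% from runs scored and allowed per game."""
    x = pythagenpat_x(rs_pg + ra_pg, z)
    return rs_pg ** x / (rs_pg ** x + ra_pg ** x)


def ow_pct(rs_pg: float, lg_rpg: float, z: float = 0.29) -> float:
    """Offensive W%: the team's offense paired with a league-average defense."""
    return ew_pct(rs_pg, lg_rpg, z)


def dw_pct(ra_pg: float, lg_rpg: float, z: float = 0.29) -> float:
    """Defensive W%: a league-average offense against the team's defense."""
    return ew_pct(lg_rpg, ra_pg, z)


# Colorado's 4.660 R/G against the 2014 MLB average of 4.066 (no park adjustment).
print(round(ow_pct(4.660, 4.066), 3))
```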

For most teams, gOW% and OW% are very similar. Teams whose gOW% is higher than OW% distributed their runs more efficiently (at least to the extent that the methodology captures reality); the reverse is true for teams with gOW% lower than OW%. The teams that had differences of +/- 2 wins between the two metrics were (all of these are the g-type less the regular estimate):

Positive: STL, NYA
Negative: OAK, TEX, COL

The Rockies’ -3.6 win difference between gOW% and OW% was the largest absolute offensive or defensive difference in the majors, so looking at their runs scored distribution may help in visualizing how a team can vary from expectation. Colorado scored 4.660 R/G, which results in an Enby distribution with parameters B = 1.125, r = 4.168, z = .0493:



The purple line is Colorado’s actual distribution, the red line is the major league average, and the blue line is their Enby expectation. The Rockies were held to three or fewer runs more often than Enby would expect. Major league teams had a combined .231 W% when scoring three or fewer runs, and that doesn’t even account for the park effect, which would make their expected W% even lower (of course, the park effect is also a potential contributing factor to Colorado’s inefficient run distribution itself). The spike at 10 runs stands out--the Rockies scored exactly ten runs in twelve games, twice as many as second-place Oakland. Colorado’s 20 games with 10+ runs also led the majors (the A’s again were second with seventeen such games, while the average team had just 8.3 double-digit tallies).

Teams with differences of +/- 2 wins between gDW% and standard DW%:

Positive: TEX
Negative: NYN, OAK, MIA

Texas’ efficient distribution of runs allowed offset their inefficient distribution of runs scored, while Oakland was poor in both categories, as is further illustrated by comparing EW% to gEW%:

Positive: STL, CHN, NYA, HOU
Negative: SEA, COL, MIA, OAK

The A’s EW% was 4.9 wins better than their gEW%, which in turn was 5.8 wins better than their actual W%.

Last year, EW% was actually a better predictor of actual W% than was gEW%. This is unusual since gEW% knows the distribution of runs scored and runs allowed, while EW% just knows the average runs scored and allowed. gEW% doesn’t know the joint distribution of runs scored and allowed, so oddities in how they are paired in individual games can nullify the advantage that should come from knowing the distribution of each. A simplified example of how this could happen is a team that over 162 games has an apparent tendency to “waste” outstanding offensive and defensive performances by pairing them (e.g. winning a game 12-0) or get clunkers out of the way at the same time (that same game, but from the perspective of the losing team).

In 2014, gEW% outperformed EW% as is normally the case, with a 2.85 to 3.80 advantage in RMSE when predicting actual W%. Still, gEW% was a better predictor than EW% for only seventeen of the thirty teams, but it had only six errors of +/- two wins compared to sixteen for EW%.
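The RMSE comparison and the count of two-win misses are simple to compute. A minimal sketch, assuming errors are measured in wins over 162 games and that “errors of +/- two wins” means misses of at least two wins in either direction; the two-team example is made up.

```python
import math
from typing import List, Sequence


def win_errors(pred_wpct: Sequence[float], actual_wpct: Sequence[float], games: int = 162) -> List[float]:
    """Prediction errors expressed in wins."""
    return [(p - a) * games for p, a in zip(pred_wpct, actual_wpct)]


def rmse_wins(pred_wpct: Sequence[float], actual_wpct: Sequence[float], games: int = 162) -> float:
    """Root mean square error of a W% estimator, in wins."""
    errs = win_errors(pred_wpct, actual_wpct, games)
    return math.sqrt(sum(e * e for e in errs) / len(errs))


def big_misses(pred_wpct: Sequence[float], actual_wpct: Sequence[float],
               games: int = 162, threshold: float = 2.0) -> int:
    """Count teams missed by at least `threshold` wins in either direction."""
    return sum(abs(e) >= threshold for e in win_errors(pred_wpct, actual_wpct, games))


# Made-up two-team example: misses of 3 wins and 1 win.
pred = [0.5, 0.5]
actual = [84 / 162, 80 / 162]
print(round(rmse_wins(pred, actual), 3), big_misses(pred, actual))
```

Because RMSE squares the errors, a handful of large misses can dominate it, which is why the count of two-win misses is a useful companion figure.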

Below are the various W% measures for each team, sorted by gEW%: