Now that you presumably have some confidence in Cigol’s ability to do something fairly easy by the standards of classical sabermetrics, you may have some more interest in what Cigol says about a much harder question--how does W% vary by runs scored and runs allowed in extreme situations? This is the area in which Cigol (whether powered by Enby or any other run distribution model) has the potential to enhance our understanding of the relationship between runs and wins. Unfortunately, it is difficult to tell whether these results are reasonable, since we don’t have empirical data regarding extreme teams. If Cigol deviates from Pythagenpat, we won’t know which one to trust. Throughout this post, I am going to discuss these issues as if Cigol is in fact the “true” or “correct” estimate. This is simply for the sake of discussion--it would be unwieldy to have to issue a disclaimer every time we compare Cigol and Pythagenpat. Please note that I am not asserting that this is demonstrably the case.

For a first look at how the two compare at the extreme, let’s assume that a team’s runs scored are fixed at an average 4.5, and look at their estimated W% at each interval of .5 in runs allowed from 1-15 RA/G using Cigol and Pythagenpat with three different exponents (.27, .28, and .29; I’ve always called this Pythagenpat constant z and will stick with that notation here, hoping that it will not be confused with the Enby z parameter):

Just eyeballing the data, two things are evident. The first is that Pythagenpat with any of the exponent choices is a fairly decent match at any RA value. The largest differences come at the extremes, as you’d expect, but the maximum difference is .013 between the Cigol and z = .27 estimates for the 4.5 R/15 RA team. This is a difference of a little over 2 wins over the course of a 162 game schedule, which isn’t terrible since it represents close to the maximum discrepancy. While I have not figured Enby parameters past 15 RG, at some point the differences would begin to decline as both the Cigol and Pythagenpat estimates converge at a .000 W% as RA/G keeps climbing. For comparison, a Pythagorean fixed exponent of 1.83 predicts a W% of .099 for the 4.5/15 team, almost 8 wins/162 off of the Cigol estimate.

The second thing that becomes apparent is that Cigol implies the Pythagenpat z constant is not fixed, but rather increases as scoring increases. For the lowest RPGs on the table (1-3 RA/G, which when combined with the 4.5 R/G is 5.5-7.5 RPG), .27 performs the best relative to Cigol. Once we reach 3.5 RA/G, .28 performs best, and it maintains that advantage from 3.5-8 RA/G (8-12.5 RPG). Past that point (8.5 RA/G and up, or 13 RPG and up), .29 is the top performer. This explains why studies have tended to peg z somewhere in the .28-.29 range, as such a value represents the best fit at normal major league scoring levels.
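The Pythagenpat side of this comparison is easy to reproduce. Below is a minimal Python sketch, assuming the standard Pythagenpat form in which the exponent is x = RPG^z with RPG = R/G + RA/G; the function names are my own, and Cigol itself is not reproduced because it depends on the Enby parameter tables. It prints the three Pythagenpat columns for a 4.5 R/G team and the fixed-exponent (1.83) estimate for the 4.5/15 team.

```python
def pythagorean_wp(r, ra, x):
    """W% from the Pythagorean formula with exponent x (r, ra are per-game rates)."""
    return r ** x / (r ** x + ra ** x)


def pythagenpat_wp(r, ra, z):
    """Pythagenpat W%: the exponent varies with context as x = (r + ra) ** z."""
    return pythagorean_wp(r, ra, (r + ra) ** z)


if __name__ == "__main__":
    r = 4.5  # runs scored per game, held fixed at an average level
    print(" RA/G  z=.27  z=.28  z=.29")
    for i in range(2, 31):  # RA/G = 1.0, 1.5, ..., 15.0
        ra = i / 2
        cols = "  ".join(f"{pythagenpat_wp(r, ra, z):.3f}" for z in (0.27, 0.28, 0.29))
        print(f"{ra:5.1f}  {cols}")
    # Fixed exponent of 1.83 for the 4.5 R / 15 RA team
    print(f"1.83 exponent, 4.5/15: {pythagorean_wp(4.5, 15.0, 1.83):.3f}")  # 0.099
```

Because a larger z yields a larger exponent at any RPG above 1, the z = .29 column always shows the more lopsided W% for a mismatched team.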

A nice way to see the relationship is to plot the difference (Pythagenpat - Cigol) relative to RA/G for each exponent:

The point at which all converge is 4.5 RA/G, where R = RA and all estimators predict .500. As you can see, the differences converge as we approach either a .000 or 1.000 W%, since there is a hard cap on the linear difference at those points.

This exercise gives us some direction on where to go, but it is not comprehensive enough to draw any conclusions. To do that, we need a broader set of data than simply fixing R/G at 4.5. So I figured the Cigol W% for each interval of .25 runs scored and runs allowed between 1 and 15 runs per game (removing all points at which R = RA). This yields 3,192 R/RA pairs, many of which are so extreme as to be absurd, which is the point.
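The grid itself is simple to enumerate. This Python sketch (helper names are my own) builds every R/G and RA/G value from 1 to 15 in steps of .25, drops the R = RA diagonal, and confirms the 3,192 count; it also counts the 3-7 R, 3-7 RA subset used later in this post. Building the values from integer step counts avoids floating-point drift.

```python
def grid(lo, hi, step=0.25):
    """lo, lo + step, ..., hi, built from integer step counts to avoid float drift."""
    n = round((hi - lo) / step)
    return [lo + i * step for i in range(n + 1)]


def rra_pairs(lo, hi, step=0.25):
    """Every (R/G, RA/G) pair on the grid except R = RA, where W% is .500 by symmetry."""
    values = grid(lo, hi, step)
    return [(r, ra) for r in values for ra in values if r != ra]


full = rra_pairs(1.0, 15.0)    # 57 values each way, minus the 57 ties
modern = rra_pairs(3.0, 7.0)   # 17 values each way, minus the 17 ties
print(len(full), len(modern))  # 3192 272
```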

In order to make sense of this data, we will need to simplify the scope of what we are considering, so let’s start by trying to ascertain the relationship between runs and wins if we assume that a linear model should be used. Basically, the idea here is that we should be able to determine a runs per win (RPW) factor such that:

W% = (R - RA)/RPW + .5

From this, we can calculate RPW given W%, R, and RA as:

RPW = (R - RA)/(W% - .5)
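Both forms translate to a pair of one-line functions. A minimal Python sketch (the function names are hypothetical), with R and RA as per-game rates:

```python
def linear_wp(r, ra, rpw):
    """Linear runs-to-wins model: W% = (R - RA)/RPW + .5."""
    return (r - ra) / rpw + 0.5


def implied_rpw(r, ra, wp):
    """Runs per win implied by a W% estimate: RPW = (R - RA)/(W% - .5)."""
    if wp == 0.5:
        raise ValueError("RPW is undefined when W% is exactly .500")
    return (r - ra) / (wp - 0.5)


print(round(linear_wp(5.0, 4.5, 10), 3))      # 0.55
print(round(implied_rpw(5.0, 4.5, 0.55), 3))  # 10.0
print(round(linear_wp(15.0, 1.0, 10), 3))     # 1.9
```

The last line shows the unbounded behavior of the linear form: with a fixed 10 RPW, a 15 R/1 RA team is assigned an impossible 1.9 W%.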

In its simplest form, this type of equation assumes a fixed number of runs per win; for standard scoring contexts, 10 is a nice, round number that does the job and of course has become famous as a sabermetric rule of thumb. But it has long been known that RPW varies with the scoring context, and sabermetricians have usually attempted to express this by making RPW a function of RPG. So let’s graph our data in that manner:

As you can see, RPW is not even close to being a linear function of RPG when extreme teams are considered. The bulk of the observations are scattered around a nice, linear-looking function, but the outliers are such that the linear function will fail horrifically at the extremes. And when I say extremes, I really mean extremes. For instance, a 15 R/1 RA team is at 16 RPG, but would need much more than 16 marginal runs for a marginal win--Cigol estimates that such a team would need 28.11 marginal runs (as would its 1/15 counterpart). This should make sense to you logically--the team’s W% is already so high, and so many of its games are blowouts, that you need to scatter a large number of runs around to move the win needle. This point represents the maximum RPW for the points I’ve included; the minimum is 3.69 at 1.25/1.

This is not to say that a linear model cannot be used to estimate W%; it is simply the case that one linear model cannot be used to estimate W% over a wide range of possible scoring contexts and/or disparities in team strength. Let’s suppose that we limit the scope of our data in each of these manners. First, let’s consider only cases in which a team’s runs are between 3-7 and its runs allowed are between 3-7. This essentially captures the range of teams in modern major league baseball and limits the sample to 272 data points:

I’ve taken the liberty of including a linear regression line, which now has the slope we’d expect (recall that Tango’s formula for RPW is .75*RPG + 3, and that this is consistent with Pythagenpat). The line is shifted up relative to the best fit for normal teams (or to Pythagenpat centered at 9 RPG), as there are still some extreme combinations here (for example, a 7 R/3 RA team is expected by Cigol to play .815 ball, well beyond anything we’ll ever see in modern MLB).

We can also try limiting the data in another way--only looking at cases in which the resulting records are feasible in modern MLB. For simplicity, I’ll define this as cases in which the Cigol W% is between .300 and .700 (yes, I realize the 2001 Mariners and 2003 Tigers fall outside of this range in terms of actual W%, but in fact it’s probably too wide of a band if we consider only expected W% based on R and RA). Here are the results from our Cigol data points, including all intervals of R and RA between 1-15 (this leaves us with 1,126 cases):

Once again, the slope of the line is in the ballpark of what we observe with normal teams, but the intercept is still off, shifting the line up to get closer to the extreme cases. If we make both adjustments simultaneously (look only at cases between 3-7 R, 3-7 RA, and .3-.7 Cigol W%), we are left with 202 data points and this graph:

Closer still, with the slope now essentially exactly where we expect it to be, but the intercept still shifting the line upwards. Why is this happening? We know that it’s not because of a breakdown of Cigol when estimating W% for normal teams--as we saw in the previous post, Cigol is of comparable accuracy to Pythagenpat and RPW = .75*RPG + 3 with normal teams. What’s happening is that our sample is not weighted toward near-.500 teams the way real major league data is. All of our hypothetical teams have a run differential of at least .25 runs per game in one direction or the other. In 1996-2006, about one quarter of teams had run differentials smaller than that (within +/- .25 per game).

The standard deviation of W% for 1996-2006 was .073; the standard deviation of Cigol W% for this data is .111. This illustrates the point that I and other sabermetricians who seek theoretical soundness make repeatedly--using normal major league full season data, the variance is small enough that any halfway intelligible model will come close to predicting whatever it is you’re predicting. Anything that centers estimated W% at .500 and allows it to vary as run differential varies from zero will work just fine. But if you run into a sample that includes a lot of unusual cases, or you start looking at smaller sample sizes, or a higher variance league, or try to extrapolate results to individual player data, then many formulas that work just fine normally will begin to break down.

A linear conversion between runs and wins breaks down in extreme cases for a few main reasons, including that it is unbounded, whereas real-world W% is confined to [0, 1], and that the value of a marginal run depends on not one but two determinants--the scoring context and the run differential between the two teams. There are some things we could attempt to do to salvage it, such as introducing run differential as a variable. If we did this, we could allow RPW to increase not only as RPG increases, but also as the absolute value of RD increases.

Let’s use the data set pared down in both dimensions to find an RPW estimator using both RPG and abs(RD) as predictors. I simply ran a multiple regression and got this equation:

RPW = .732*RPG + .204*abs(R - RA) + 3.081

If we assume that a team has R = RA, then this equation is a very good match for our expected .75*RPG + 3, as it would reduce to .732*RPG + 3.081. This is encouraging, since it should work with normal teams and offers the prospect for better performance with extreme teams.

Remember, though, that “extreme” in the context of this dataset is a lot more restrictive than in the broader set--we've limited the data to only 3-7 R, 3-7 RA, and .3-.7 Cigol W%. If we step outside of that range, the equation will break down again. For example, a 10 R/5 RA team has an RPW of 15.081 according to this equation, which suggests a .832 W% versus the .819 expected by Cigol. While this is not a catastrophic error (and much better than the .851 suggested by .75*RPG + 3), don’t lose sight of the fact that the W% function is non-linear.
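To check the arithmetic for the 10 R/5 RA example, here is a short Python sketch; the function names are mine, and the coefficients are the regression values quoted above and Tango’s .75*RPG + 3.

```python
def rpw_with_rd(r, ra):
    """RPW = .732*RPG + .204*|R - RA| + 3.081 (regression on the pared-down set)."""
    return 0.732 * (r + ra) + 0.204 * abs(r - ra) + 3.081


def rpw_tango(r, ra):
    """Tango's context-only formula: RPW = .75*RPG + 3."""
    return 0.75 * (r + ra) + 3


def linear_wp(r, ra, rpw):
    """W% = (R - RA)/RPW + .5."""
    return (r - ra) / rpw + 0.5


r, ra = 10.0, 5.0
print(round(rpw_with_rd(r, ra), 3))                    # 15.081
print(round(linear_wp(r, ra, rpw_with_rd(r, ra)), 3))  # 0.832
print(round(linear_wp(r, ra, rpw_tango(r, ra)), 3))    # 0.851
```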

If we use this equation on the 1996-2006 major league data (rounded to the nearest .05) discussed in the last post, the RMSE times 162 is 3.858--just a tad worse than the RPW version that does not account for RD, but still comparable to (in fact, slightly lower than) the heavy hitters Pythagenpat and Cigol. It produces a very good match for Cigol over this dataset, in fact closer to Cigol than is Pythagenpat with z = .28.

A similar equation to this one was previously developed by Tango Tiger (which is where I got the idea to use abs(R - RA) as the second variable; there might be some other ways one could construct the equation and achieve a similar outcome) and posted on FanHome in 2001:

RPW = .756*RPG + .403*abs(R - RA) + 2.645

In this version, the lower intercept is offset by the higher coefficient on RD.
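A quick way to see how the two equations trade intercept against the RD coefficient is to evaluate both at a few R/RA pairs. A Python sketch (the function names are hypothetical):

```python
def rpw_732(r, ra):
    """RPW = .732*RPG + .204*|R - RA| + 3.081 (fit on the pared-down Cigol set)."""
    return 0.732 * (r + ra) + 0.204 * abs(r - ra) + 3.081


def rpw_tango_2001(r, ra):
    """RPW = .756*RPG + .403*|R - RA| + 2.645 (Tango Tiger, FanHome, 2001)."""
    return 0.756 * (r + ra) + 0.403 * abs(r - ra) + 2.645


for r, ra in [(4.5, 4.5), (5.0, 4.0), (10.0, 5.0)]:
    print(f"{r:4.1f}/{ra:3.1f}  {rpw_732(r, ra):6.3f}  {rpw_tango_2001(r, ra):6.3f}")
```

For an evenly matched 4.5/4.5 team Tango’s version is about .22 lower (9.449 vs. 9.669); at a one-run differential (5/4) the two nearly coincide (9.852 vs. 9.873); and for the 10/5 team Tango’s is almost a full run per win higher (16.000 vs. 15.081).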

We can also attempt to improve the RPW estimate by using a non-linear equation. The best fit comes from a power regression, and again I will limit this to the 3-7 R, 3-7 RA, .300-.700 Cigol W% set of teams to produce this estimate:

RPW = 2.171*RPG^.691

This may look familiar, because as I have demonstrated in the past, the Pythagenpat implied RPW at a given RPG for a .500 team is 2*RPG^(1 - z). Here the implied z value of .309 is higher than we typically see (.27 - .29), but the form is essentially the same.
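The 2*RPG^(1 - z) result can be checked numerically. The Python sketch below (helper names are mine) takes a .500 team at a given RPG, shifts R up and RA down by a small h so that RPG stays fixed, and computes RPW = (R - RA)/(W% - .5) from Pythagenpat. This should match 2*RPG^(1 - z) because, for R near RA, W% is approximately .5 + x*(R - RA)/(2*RPG). The power-regression estimate is printed alongside for comparison.

```python
def pythagenpat_wp(r, ra, z):
    """Pythagenpat W% with exponent x = (r + ra) ** z."""
    x = (r + ra) ** z
    return r ** x / (r ** x + ra ** x)


def numeric_rpw(rpg, z, h=1e-4):
    """Marginal runs per win for a .500 team at a given RPG, holding RPG fixed."""
    half = rpg / 2
    wp = pythagenpat_wp(half + h, half - h, z)
    return (2 * h) / (wp - 0.5)  # run differential of 2h divided by W% above .500


for rpg in (7.0, 9.0, 11.0):
    print(f"RPG {rpg:4.1f}: numeric {numeric_rpw(rpg, 0.28):.3f}, "
          f"2*RPG^.72 {2 * rpg ** 0.72:.3f}, "
          f"2.171*RPG^.691 {2.171 * rpg ** 0.691:.3f}")
```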

Any linear approximation might work well near the RPG/team quality level where it was constructed, but will falter outside of that range. We could develop an equation based on teams similar to the 10/5 example that would work well for them, but we’d necessarily lose accuracy when looking at normal teams. Non-linear W% functions allow us to capture a wider range of contexts with one particular equation. We can push the envelope a little bit by using a non-linear estimate of RPW, but we’d still have to be very careful as we varied the scoring context and skill difference between the teams.

Unless we are satisfied with an equation that only needs to work for normal teams, all of this caution is a lot to go through to salvage a functional form that still allows for sub-zero or greater than one W% estimates. Instead, it makes more sense to attempt to construct a W% estimate that bounds W% between 0 and 1 and builds in non-linearity. This of course is why Bill James and many of the sabermetricians who followed him have turned to the Pythagorean family of estimators.

## Wednesday, May 23, 2018

### Enby Distribution, pt. 7: Cigol at the Extremes--Runs Per Win

## Thursday, April 26, 2018

### Enby Distribution, pt. 6: Accuracy of Enby W% Estimate

In the last post, I demonstrated how one can estimate W% from any runs per game and runs per inning distribution by using the basic principles of how baseball games are decided. This model is simple conceptually, but a bear to implement computationally when compared to the other W% estimators that have been developed by sabermetricians over the last fifty years. As such, it is not a practical tool to use for common sabermetric applications of a winning percentage estimator. If you want to know how many games a team that scores 828 runs and allows 753 runs in a season can expect to win, there are any number of formulas that are better practical options than Enby.

However, it is important to verify that Enby is able to hold its own when estimating W% for normal teams. If it does not work as well as our other tools for normal situations, it will be harder to put any stock in its results when looking at extreme situations.

To check if Enby was up to the challenge, I performed a limited accuracy test based on 1996-2006 data (a sample of 326 teams). This was in no way intended to be a comprehensive accuracy test, but rather one with a sufficiently large sample to determine if Enby can predict normal teams with comparable accuracy to other approaches.

Since I have only calculated Enby distribution parameters at intervals of .05 RG, I rounded all teams’ R/G and RA/G to the nearest .05 and used these figures as the inputs for all of the estimators. This ensured that they were all on equal footing, rather than Enby alone being subject to rounding imprecision relative to the actual R and RA counts. In addition to Enby, I tested four other estimators:

* A simple assumption of 10 RPW

* Tango’s formula that varies RPW by RPG (runs per game for both teams): RPW = .75*RPG + 3. This formula (or at least something very close to it) can be derived by using Pythagenpat.

* Pythagorean with a fixed exponent of 1.83

* Pythagenpat using x = RPG^.28
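The test setup can be sketched in a few lines of Python. This is a minimal sketch with hypothetical helper names: it rounds a team’s per-game rates to the nearest .05 and scales W% RMSE by 162; the 1996-2006 team data themselves are not reproduced here. Python’s round() breaks exact ties toward the even value, which may not match the rounding convention used for the actual test. The usage line assumes the 828 R/753 RA team from the start of this post played 162 games.

```python
import math


def round_to_05(rate):
    """Round a per-game rate to the nearest .05 (exact ties go to even, per Python's round)."""
    return round(rate * 20) / 20


def rmse_x162(actual_wp, estimated_wp):
    """W% RMSE multiplied by 162, so the error reads as wins per season."""
    if not actual_wp or len(actual_wp) != len(estimated_wp):
        raise ValueError("need two non-empty sequences of equal length")
    mse = sum((a - e) ** 2 for a, e in zip(actual_wp, estimated_wp)) / len(actual_wp)
    return math.sqrt(mse) * 162


# The 828 R / 753 RA team, assuming a 162 game season
print(round_to_05(828 / 162), round_to_05(753 / 162))  # 5.1 4.65
```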

The resulting RMSE for each estimator (W% RMSE multiplied by 162 for ease of interpretation):

The three methods which allow the relationship between runs and wins to vary by scoring context (either by explicitly changing the RPW factor or Pythagorean exponent, or by estimating the scoring distribution as Enby does) come out on top. The linear RPW formula wins here, although the best performer would be Pythagenpat with x = RPG^.29, edging it out at a 3.850 RMSE. Of course, we could also find the coefficients in Tango’s RPW formula that minimize error, and quite possibly push that method back ahead of Pythagenpat.

In any event, the three formulas allowing for customization are close enough that we can safely conclude that none is grossly deficient for the task of estimating W% for normal teams. That means that Enby has passed the first hurdle towards being taken seriously as a model for W% based on average runs scored and allowed.

I also thought it would be interesting to test the RMSE of using each W% estimator to predict Pythagenpat. This is obviously a biased approach, assuming that Pythagenpat is the standard by which other estimators should be compared. The real reason to do this is to see how closely Enby tracks Pythagenpat with normal teams, since Pythagenpat is the closest W% estimator in theory to Enby. Both attempt to dynamically model the relationship between runs and wins; the other approaches, even the dynamic RPW estimator, assume that there is a fixed relationship between runs and wins. We should expect Pythagenpat and Enby to be in general agreement. And they are (RMSE once again multiplied by 162):

Enby and Pythagenpat are essentially in lockstep. In fact, the largest discrepancy between the two is for the 2002 Braves, who scored 4.40 and allowed 3.50 runs per game (rounded). Pythagenpat expects that such a team would have a W% of .6007, while Enby predicts a .5997 W%, a difference of .15 wins over the course of a season.
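The Pythagenpat figure for the 2002 Braves is easy to verify. A short Python check, assuming z = .28 as in the test above (the Enby figure is simply the rounded value quoted in the text):

```python
def pythagenpat_wp(r, ra, z=0.28):
    """Pythagenpat W% with exponent x = (r + ra) ** z."""
    x = (r + ra) ** z
    return r ** x / (r ** x + ra ** x)


wp = pythagenpat_wp(4.40, 3.50)  # 2002 Braves, rounded R/G and RA/G
print(f"{wp:.4f}")                # 0.6007
print(f"{(wp - 0.5997) * 162:.3f} wins per 162 vs. the rounded Enby figure")
```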

The minimum RMSE between Pythagenpat and Enby occurs when the Pythagenpat exponent is dropped slightly to .279 (.026 RMSE). As the exponent varies, the discrepancy increases; with a Pythagenpat exponent of .29, the RMSE is .274.

At this point, I’d like to pause for a moment and change the name of the Enby estimate of W%. This is just for my own sanity as I write and hopefully use these tools in the future, but I want to draw a distinction between the Enby distribution, which is used to estimate the probability of scoring k runs in a game, and the methodology described for estimating W%. I’m a little hesitant to put a name on it, since I haven’t earned that right--the logic is based in reality, not my insight, and has been used by many sabermetricians long before me. Plus, I’m not very good at making up these kinds of names--if you don’t believe me, re-examine the name of the blog.

This methodology is compatible with any means of estimating the probability of scoring k runs a game, whether empirically, through the Enby distribution, solely through the Tango Distribution (as Enby itself borrows from the Tango Distribution), the Weibull distribution (as implemented by Sal Baxamusa or Steven Miller), or any other approach that may be developed in the future. Going forward, I will be referring to this as the Cigol method. As Toirtap can attest, I like to spell things backwards when I am flummoxed. Since the W% estimator is based on simple logic, Cigol it is.

## Tuesday, March 27, 2018

### 2018 Predictions

See the standard disclaimers. This is an exercise in fun more than analysis, although hopefully there's a touch of the latter or you're just wasting your time.

AL EAST

1. Boston

2. New York (wildcard)

3. Toronto (wildcard)

4. Baltimore

5. Tampa Bay

Picking the Red Sox is something of a tradition in this space. I don’t do it on purpose, it’s just that my “model” (such as it is) has tended to pick them consistently. This year it’s a virtual tie with the Yankees; some projections agree with that, but others (notably PECOTA) see a huge advantage for the latter. The Yankees arguably were more impressive last season given their component statistics, and yet Boston’s offense should bounce back, their starting pitching should be better, their bullpen could benefit from some healthy pieces coming back...and if Aaron Judge and Giancarlo Stanton combine for even ninety homers the takes will be hot. The top-heaviness of the AL, with four teams that really stand out, leaves a team like the Blue Jays a stealthy wildcard contender. Incidentally, I have them +9 runs on both offense and defense. The Orioles added enough late and the Rays subtracted enough to make me flip-flop their places, but it would be surprising if either gets into this race.

AL CENTRAL

1. Cleveland

2. Minnesota

3. Kansas City

4. Detroit

5. Chicago

As a partisan I am always queasy about picking the Indians, but last year it worked out okay, and again this year the on-paper gap is just too large to superstitiously pick someone else. But it’s easier to see how the Indians might lose to the Twins in 2018 than it was to compare them to the field in 2017. While it’s easy to overstate the impact of the Twins’ pitching additions (one could argue that Jake Odorizzi and Lance Lynn would be no more than #4 starters for the Tribe, even #5 if Danny Salazar could get it together), Cleveland’s bullpen is showing signs of vulnerability without a lot of clear candidates to step in, there are still injury questions surrounding Jason Kipnis and Michael Brantley, the outfield is unsettled...but it’s also sometimes easier to worry about these things as a fan. The Twins’ true quality for 2016-2017 might be matched by the win total, but the distribution was all off. A plexiglass principle year would not surprise. The Royals kept just enough of the band together to a) still be annoying and b) provide some measure of optimism for their partisans, but probably more of the former. I’ve been calling for the Tigers to dead cat bounce for a couple years; I’m surrendering and just expecting it for Miguel Cabrera. The White Sox have a lot of prospects and could well be the future of this division, but it’s still a year or two away.

AL WEST

1. Houston

2. Los Angeles

3. Seattle

4. Texas

5. Oakland

The Astros, in my crude system, are the second-best team in the AL...on offense and defense. Just slightly behind the Yankees and the Indians, respectively; combined, that’s enough to declare them the best and most well-rounded team on paper. Prior to Shohei Ohtani’s rough showing in spring training, I was set to pick the Angels as the second wildcard. Is dropping them a small sample size overreaction? Quite possibly, yes, but there wasn’t much separating teams like the Angels, Blue Jays, and Twins to begin with. You have to feel bad (unless you’re a fan of the…wait, do they even have a rival of note) for the Mariners - they now have the longest playoff drought in North American sports. Longer than the Cleveland Browns (this has been true for years but it is a miscarriage of justice that it’s not the Browns that hold this dubious distinction). They’ve been good enough to squeak out a second wildcard for a few years, but it never came together, and the window may be closing. The Rangers franchise history from 2010 - 2017 will make a fascinating case study some day, but I don’t think 2018 will add another dramatic return from the dead to the story. I still like the A’s players and think they could contend in the coming years, but the starting pitching is too shaky to predict good things this season.

NL EAST

1. Washington

2. New York

3. Philadelphia

4. Atlanta

5. Miami

The Nationals are basically what they have been for the last six years -- the clear favorite in the NL East. This is probably the last year for them to enjoy that status, but that’s a pretty impressive run in a division that features two big markets and a Braves franchise that until some point in the Washington run had basically contended for 25 years. As a neutral observer, it would be nice to at least see them get an NLCS out of the deal. Everyone talks about the health of the Mets rotation, but I think scoring runs might be a bigger question mark. I like the Phillies over the Braves this season, but over the next five years I’d flip that. Philly is a popular second wildcard pick--while that’s certainly within the realm of possibility, it will take better than forecast performances from some of the rookies (JP Crawford, Jorge Alfaro, Nick Williams) and Maikel Franco to make that happen. The Marlins are obviously a sad team to ponder, but the fact that Derek Jeter’s halo is being tarnished in the process makes it more entertaining than the usual Miami teardown.

NL CENTRAL

1. Chicago

2. St. Louis (wildcard)

3. Milwaukee

4. Pittsburgh

5. Cincinnati

The Cubs have the best offense in the NL by my estimation (although they distributed their runs across games so unfortunately last season that it wasn’t evident in the standings), their rotation is stronger entering this season (relative to last April) with the acquisitions of Jose Quintana and Yu Darvish, and I think they’re ready to re-challenge the Dodgers for NL superiority. The Cardinals look like a solid 86 win team, which is enough to make them a wildcard favorite; if they win with it, it’s a departure from the Pujols-era Cardinal teams which always had big stars, although maybe Carlos Martinez will take a step forward or Marcell Ozuna will hold his level and people will recognize how good he is outside of Miami and Stanton’s shadow. I look at four sources for team win projections when writing these up: my own crude version (fueled by the Steamer projections published at Fangraphs and some manual overrides on my part), Fangraphs, PECOTA from Baseball Prospectus, and Clay Davenport’s. The Brewers projected wins range from 76 - 86, which is tied with PECOTA darling Tampa Bay for the largest spread. Mine is on the low end of the spectrum--it just doesn’t seem like they have the pitching, and they have an outfield/corners logjam that’s good for depth but bad for allowing all of their name hitters to fully contribute. Last year I held on to hope for the Pirates; now I think it’s safe to say their 2012 - 2015 revival is over (come here for the bold statements). Amazingly, they would have been better off to have been in the NL East. If the only baseball I was allowed to watch this year was the games of one of the teams I picked last, I’d go with the Reds. Joey Votto, Luis Castillo, some interesting bullpen pieces, Billy Hamilton as a side show…it’s a fun team if not a good one.

NL WEST

1. Los Angeles

2. Arizona (wildcard)

3. San Francisco

4. Colorado

5. San Diego

I might be shortchanging the Dodgers by not picking them as the best team in the NL. They are still really good, they still have good depth, they still have the resources to address issues, but you know all that. There’s not much to say other than to tip one’s cap to the machine. I’m not bullish on the Diamondbacks, per se, but I’ll see your Zack Greinke decline concerns and raise you Zack Godley. I was surprised at how well the Giants came out when I put my forecast spreadsheet together; I was expecting 78-82 wins. A few more put them in prime wildcard contention position, but that was before Bumgarner and Shark became huge injury concerns. I don’t think the Rockies offense is all that good. I don’t think you can expect Charlie Blackmon to be as good, I still am skeptical of DJ LeMahieu, catcher and first base aren’t exactly settled. The Padres are definitely intriguing going forward, but it’s too soon to expect contention.

WORLD SERIES

Houston over Chicago

AL ROY: RF Austin Hays, BAL

AL Cy Young: Trevor Bauer, CLE

I don’t actually think this is the most likely outcome, I just love Trevor Bauer.

AL MVP: SS Carlos Correa, HOU

NL ROY: SP Alex Reyes, STL

NL Cy Young: Stephen Strasburg, WAS

NL MVP: RF Bryce Harper, WAS

## Wednesday, February 28, 2018

### Enby Distribution, pt. 5: W% Estimate

While an earlier post contained the full explanation of the methodology used to estimate W%, it’s an important enough topic to repeat in full here. The methodology is not unique to Enby; it could be implemented with any estimate of the frequency of runs scored per game (and in fact I first implemented it with the Tango Distribution). As I discussed last time, the math may look complicated and require a computer to implement, but the model itself is arguably the simplest conceptually because it is based on the simple logic of how games are decided.

Let p(k) be the probability of scoring k runs in a game and q(m) be the probability of allowing m runs a game. If k is greater than m, then the team will win; if k is less than m, then the team will lose. If k and m are equal, then the game will go to extra innings. In setting it up this way, I am implicitly assuming that p(k) is the probability of scoring k runs in nine innings rather than in a game. This is not a horrible way to go about it since the average major league game has about 27 outs once the influences that cause shorter games (not batting in the ninth, rain) are balanced with the longer games created by extra innings. Still, it should be noted that the count of runs scored from a particular game does not necessarily arise from an equivalent opportunity context (as defined by innings or outs) of another game.

Given this notation, we can express the probability of winning a game in the standard nine innings as:

P(win 9) = p(1)*q(0) + p(2)*[q(0) + q(1)] + p(3)*[q(0) + q(1) + q(2)] + p(4)*[q(0) + q(1) + q(2) + q(3)] + ...

Extra innings will occur whenever k and m are equal:

P(X) = p(0)*q(0) + p(1)*q(1) + p(2)*q(2) + p(3)*q(3) + p(4)*q(4) + ...
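Both sums translate directly into code. A minimal Python sketch, assuming the distributions are supplied as lists indexed by run total and truncated at some maximum (any tail beyond it is ignored); the function name is mine. Because the per-inning sums for P(win inning) and P(tie inning) have exactly the same form, the same function serves for n(k) and r(m).

```python
def win_and_tie(p, q):
    """Return (P(win), P(tie)) for a team scoring k runs with prob p[k]
    against an opponent scoring m runs with prob q[m]."""
    win = tie = 0.0
    q_below = 0.0  # running total q(0) + ... + q(k - 1)
    for k, pk in enumerate(p):
        win += pk * q_below          # p(k) * [q(0) + ... + q(k - 1)]
        if k < len(q):
            tie += pk * q[k]         # p(k) * q(k)
            q_below += q[k]
    return win, tie


# Toy example: each team scores 0 or 1 with equal probability
print(win_and_tie([0.5, 0.5], [0.5, 0.5]))  # (0.25, 0.5)
```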

When the game goes to extra innings, it becomes an inning by inning contest. Let n(k) be the probability of scoring k runs in an inning and r(m) be the probability of allowing m runs in an inning. If k is greater than m, the team wins; if k is less than m, the team loses; and if k is equal to m, then the process will repeat until a winner is determined.

To find the probability of each of the three possible outcomes of an extra inning, we can follow the same logic as used above for P(win 9). The probability of winning the inning is:

P(win inning) = n(1)*r(0) + n(2)*[r(0) + r(1)] + n(3)*[r(0) + r(1) + r(2)] + n(4)*[r(0) + r(1) + r(2) + r(3)] + ...

The probability of the game continuing (equivalent to tying the inning) is analogous to P(X) above:

P(tie inning) = n(0)*r(0) + n(1)*r(1) + n(2)*r(2) + n(3)*r(3) + n(4)*r(4) + ...

The probability of winning in extra innings [P(win X)] is:

P(win X) = P(win inning) + P(tie inning)*P(win inning) + P(tie inning)^2*P(win inning) + P(tie inning)^3*P(win inning) + ...

This is a geometric series that simplifies to:

P(win X) = P(win inning)*[1 + P(tie inning) + P(tie inning)^2 + P(tie inning)^3 + ...] = P(win inning)*1/[1 - P(tie inning)] = P(win inning)/[1 - P(tie inning)]

This could also be expressed in a very clever way using the Craps Principle if we had also computed P(lose inning); I did it that way last time, but it doesn’t really cut down on the amount of calculation necessary in this case.
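The closed form is easy to sanity-check against the series itself. A small Python sketch (names are mine) compares the two and also shows the Craps Principle version, P(win inning)/[P(win inning) + P(lose inning)], which is the same quantity because P(lose inning) = 1 - P(win inning) - P(tie inning). The per-inning probabilities are illustrative values only.

```python
def p_win_extra(p_win_inning, p_tie_inning):
    """Closed form: P(win X) = P(win inning) / [1 - P(tie inning)]."""
    if not 0 <= p_tie_inning < 1:
        raise ValueError("P(tie inning) must be in [0, 1)")
    return p_win_inning / (1 - p_tie_inning)


def p_win_extra_series(p_win_inning, p_tie_inning, innings=500):
    """Partial sum of the series: win the 1st extra inning, or tie then win, ..."""
    return sum(p_tie_inning ** i * p_win_inning for i in range(innings))


win, tie = 0.25, 0.45  # illustrative per-inning probabilities
lose = 1 - win - tie
print(p_win_extra(win, tie))         # closed form
print(p_win_extra_series(win, tie))  # truncated series
print(win / (win + lose))            # Craps Principle
```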

Since I want these last few posts to serve as a comprehensive explanation of how to calculate the Enby run and win estimates, it is necessary to take a moment to review how to use the Tango Distribution to estimate the runs per inning distribution. c of course is the constant, set at .852 when looking at a head-to-head matchup. RI is runs/inning, which I’ve defined as RG/9:

a = c*RI^2

n(0) = RI/(RI + a)

d = 1 - c*n(0)

n(1) = (1 - n(0))*(1 - d)

n(k) = n(k - 1)*d for k >= 2
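These steps can be written as a short Python function. This is a sketch under my own naming and truncation choice (kmax, the largest run total kept). As a check, the mean of the resulting distribution comes back to RI, since the geometric tail gives a mean of (1 - n(0))/(c*n(0)), which simplifies to RI.

```python
def tango_inning_dist(rg, c=0.852, kmax=40):
    """Per-inning run distribution n(0..kmax) from the Tango Distribution, RI = RG/9."""
    ri = rg / 9
    a = c * ri ** 2
    n0 = ri / (ri + a)             # probability of a scoreless inning
    d = 1 - c * n0                 # geometric decay for 2+ runs
    dist = [n0, (1 - n0) * (1 - d)]
    for _ in range(2, kmax + 1):
        dist.append(dist[-1] * d)  # n(k) = n(k - 1) * d
    return dist


n = tango_inning_dist(4.5)
print(round(n[0], 4), round(sum(n), 6), round(sum(k * nk for k, nk in enumerate(n)), 6))
# 0.7013 1.0 0.5
```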

Once we have these three key probabilities [P(win 9), P(X), and P(win X)], the formula for W% is obvious:

W% = P(win 9) + P(X)*P(win X)

We will use the Enby Distribution to determine p(k) and q(m), and the Tango Distribution to determine n(k) and r(m). In both cases, we’ll use the Tango Distribution constant c = .852 since this works best when looking at a head-to-head matchup, which certainly is the applicable context when discussing W%.
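Putting the pieces together gives the whole estimator. The Python sketch below is a stand-in, not the Enby implementation: since the Enby parameters live in the spreadsheet, it builds p(k) and q(m) by convolving nine independent Tango Distribution innings (a simplifying assumption of mine), and uses the Tango Distribution directly for n(k) and r(m). Its numbers will therefore differ somewhat from the Enby-based figures; the point is the structure W% = P(win 9) + P(X)*P(win X). The function names, including cigol_wp_sketch, are hypothetical.

```python
def tango_inning_dist(rg, c=0.852, kmax=40):
    """Per-inning distribution n(0..kmax) from the Tango Distribution, RI = RG/9."""
    ri = rg / 9
    n0 = ri / (ri + c * ri ** 2)
    d = 1 - c * n0
    dist = [n0, (1 - n0) * (1 - d)]
    for _ in range(2, kmax + 1):
        dist.append(dist[-1] * d)
    return dist


def game_dist(rg, c=0.852, kmax=40):
    """STAND-IN for Enby: nine independent innings convolved, truncated at kmax runs."""
    inning = tango_inning_dist(rg, c, kmax)
    game = [1.0] + [0.0] * kmax  # zero runs before the first inning
    for _ in range(9):
        nxt = [0.0] * (kmax + 1)
        for i, gi in enumerate(game):
            for j, nj in enumerate(inning[: kmax + 1 - i]):
                nxt[i + j] += gi * nj
        game = nxt
    return game


def win_and_tie(p, q):
    """(P(win), P(tie)) when scoring k with prob p[k] and allowing m with prob q[m]."""
    win = tie = q_below = 0.0
    for k, pk in enumerate(p):
        win += pk * q_below
        if k < len(q):
            tie += pk * q[k]
            q_below += q[k]
    return win, tie


def cigol_wp_sketch(r, ra, c=0.852):
    """W% = P(win 9) + P(X) * P(win inning) / [1 - P(tie inning)]."""
    win9, p_x = win_and_tie(game_dist(r, c), game_dist(ra, c))
    win_inn, tie_inn = win_and_tie(tango_inning_dist(r, c), tango_inning_dist(ra, c))
    return win9 + p_x * win_inn / (1 - tie_inn)


print(round(cigol_wp_sketch(4.5, 4.5), 4))  # 0.5
print(round(cigol_wp_sketch(4.4, 3.5), 4))
```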

I have put together a spreadsheet that will handle all of the calculations for you. The yellow cells are the ones that you can edit, with the most important being R (cell B1) and RA (cell L1), which naturally are where you enter the average R/G and RA/G for the team whose W% you’d like to estimate. The other yellow cell is for the c value of the Tango Distribution. Please note that editing this cell will do nothing to change the Enby Distribution parameters--those are fixed based on using c = .852. Editing c in this cell (B8) will only change the estimates of the per inning scoring probabilities estimated by the Tango Distribution. I don’t advise changing this value, since .852 has been found to work best for head-to-head matchups and leaving it there keeps the Tango Distribution estimates consistent with the Enby Distribution estimates. The sheet also calculates Pythagenpat W% for a given exponent (which you can change in cell B15).

The calculator supports the same range of values as the one for single team run distribution introduced in part 9--RG at intervals of .25 between 0-3 and 7-15 runs, and at intervals of .05 between 3-7 runs. The vlookup function will round down to the next lower R/G value on the parameter sheet. For example, the two highest values supported are 14.75 and 15.00; you can enter 14.93 if you want, but the Enby calculation will be based on 14.75 (the Pythagenpat calculation will still be based on 14.93). Have some fun playing around with it, and next time we’ll look at how accurate the Enby estimate is compared to other W% models.

## Tuesday, February 13, 2018

### Doubles or Nothing

In previewing the season to come for any team, it is customary (for good reason) to start by taking a look back at the previous season. Sometimes this is a pleasant or at least unobjectionable experience. On some occasions, though, it forces one to review an absolute disaster of a season, as was turned in by the 2017 Ohio State Buckeyes.

OSU went 22-34, which was the lowest W% by a Buckeye club since 1974. Their 8-16 Big Ten record was the worst since 1987. The seven years in which Beals has been at the helm have produced a .564 W%, which, excepting the largely overlapping span of 2008-2014, is the worst since 1986-1992. Beals has taken the program built by Bob Todd, who inherited the late-80s malaise, and driven it right back into mediocrity.

Yet merrily he rolls along, untroubled by the pressures of coaching at a school that fired its all-time winningest basketball coach for having two straight NCAA tournament misses, despite compiling a .500 record in Big Ten play over those two seasons. Beals and his unenlightened brand of baseball may be too small fry to draw the ire of AD Gene Smith, but tell that to the track, gymnastics, and women’s hockey coaches who have been pushed out in recent years. Beals’ record of doing less with a historically strong program is unmatched at the University.

When one peruses the likely lineup for 2018, it’s hard to think that a turnaround is imminent. Stranger things have happened, of course, but eight years into his tenure in Columbus, enough time to have nearly turned over two whole recruiting classes with no overlap, he is still plugging roster holes with unproven JUCO transfers, failing to develop the high school recruits he’s brought in. It’s gotten to the point that if a player doesn’t find a role as a freshman, you can basically write him off as a future contributor.

Junior Jacob Barnwell is firmly ensconced at catcher; he was an average hitter last year and appears to have the coach’s seal of approval as a receiver, so he’s golden for playing time over the next two seasons. True freshman Dillon Dingler may be the heir apparent, with junior Andrew Fishel and redshirt freshman Scottie Seymour providing depth.

Seniors Bo Coolen and Noah McGowan, both JUCO transfers a year ago, will compete for first base; Coolen was bad offensively in 2017 with no power (.074 ISO), while McGowan was a little better but still below average. Junior Brady Cherry will move from the hot corner to the keystone, a curious move to this observer; Cherry flashed power as a freshman but was middling with the bat last year. That opens up third for sophomore Connor Pohl, who filled in admirably at second last year but does look more like a third baseman; on a rate basis he was the second most productive returning hitter, although it wasn’t a huge sample size (89 PA, and very BA-heavy with a .325 BA/.225 SEC). JUCO transfer junior Kobie Foppe is penciled in at shortstop. The utility infielders are both sophomores; Noah West played more as a freshman, getting starts at second base (he didn’t hit, at .213/.278/.303) and serving as a defensive replacement for Pohl, while Carpenter had 14 hitless (one walk) PAs. True freshman Aaron Hughes rounds out the roster.

Senior Tyler Cowles has the inside track in left field, coming off a first season as a JUCO transfer in which he hit .190/.309/.314 over 129 PA. McGowan could also contend for this spot, with backup outfielders Nate Romans and Ridge Winand (both redshirt juniors) also in the mix. JUCO transfer Malik Jones has been anointed as the centerfielder, with true freshman Jake Ruby as an understudy. Right field is, along with catcher, the only spot on the roster that features an established starter at the same position; sophomore Dominic Canzone is OSU’s best returning hitter, although his production was BA-heavy (.343 BA/.205 SEC). Some combination of Cowles, McGowan, and Fishel would appear to have the first crack at DH.

OSU’s pitching was an utter disaster last year, partly due to injury and partly because, well, Greg Beals. The only sure bet for the rotation appears to be senior Adam Niemeyer, with junior lefty Connor Curlis and senior Yianni Pavlopoulos (who closed as a sophomore) most likely to join him. Their RAs were 6.23, 5.03, and 7.65 respectively in 2017, although only Curlis had good health. Junior Ryan Feltner pitched poorly last year (7.32 RA over 62 IP despite 8.2 K/9), then went to the Cape Cod league and was named Reliever of the Year. Sophomore Jake Vance had a 6.92 RA over 26 innings, largely thanks to 20 walks, and is the fifth rotation candidate.

The perennial bright spot of the pitching staff is senior righty Seth Kinker, who easily led the team with 13 RAA over 58 innings, even getting 3 starts when everything fell to pieces. He figures to be the go-to reliever, with fifth-year senior righties Kyle Michalik, Austin Woody, and Curtiss Irving in middle relief. You’re not going to believe this, but their RAs ranged between 6.85 and 7.94 over a combined 66 innings. Sophomore Thomas Waning will follow Kinker and Michalik as the latest product of one of Beals’ good traits, an affinity for sidearmers; Waning was effective (11 K, 4 W) in a 12-inning, injury-shortened debut season. Junior Dustin Jourdan will be in the mix as well.

Beals also has an affinity for lefty specialists, which he will have to cultivate anew from sophomore Andrew Magno (4 appearances in 2016) and true freshmen Luke Duermit, Griffan Smith, and Alex Theis.

The schedule is fairly typical, with the opening weekend (starting Friday) featuring a pair of games each against Canisius and UW-Milwaukee in Florida. The following weekend will see the Bucks in Arizona for the Big Ten/Pac-12 Challenge, where they’ll play two each against Utah and Oregon State. Another trip to Florida to play low-level opponents (Nicholls State, Southern Miss, and Eastern Michigan) follows, and then a trip to the Carolinas that will feature two games each against High Point, Coastal Carolina, and UNC-Wilmington.

Bizarrely, the home schedule opens March 16 with a weekend series against Cal St-Northridge; usually any home dates with non-Northern opponents come later in the calendar. Another non-conference weekend series against Georgetown follows, and then Big Ten play: Nebraska, @ Iowa, @ Penn St, Indiana, Minnesota, Illinois, Purdue, @ Michigan St. Mixed in will be a typically home-heavy mid-week slate (Eastern Michigan, Toledo, Kent St, Ohio University, Miami, Campbell) with road games at Ball St and Cincinnati.

As I wrote the roster outlook (which relied on my own knowledge and guesses but also heavily on the season preview released by the athletic department), two things that I already thought I knew struck me even more plainly.

1) This team does not appear to be very good. One can construct a rosy scenario where the pitching woes of 2017 were due largely to injury, but we’re talking about pitcher injuries. It takes extra tint on those glasses. It has to be better than last year, when nine pitchers started at least three games, but this team was 22-34; “better” isn’t going to cut it.

2) The offense has a couple of solid returnees, but in the eighth year of Beals’ tenure, major positions on the diamond are still being papered over with JUCO transfers. There is no pipeline of young players getting their feet wet in utility roles and transitioning into starting roles, as you would expect in a healthy program. There are no freshman studs to come in and commandeer lineup positions, as you would expect in a strong program. It is quite easy to imagine a scenario in which five of the nine lineup spots are held by first- or second-year JUCO transfers.

Beals has failed in recruiting, he has failed in player development, and most importantly he has failed to win at the level to which an OSU program should aspire. I’ve devoted many words in previous season previews and recaps (and the hashtag #BealsBall) to his asinine tactics. I won’t rehash that here, but I will end with a quote from the Meet the Team Dinner that program icon Nick Swisher was roped into headlining, which makes one seriously question in what decade Mr. Beals thinks he coaches:

*“Our goal in 2018 is to hit a lot of doubles,” said Beals on Saturday night.*

## Monday, January 08, 2018

### Run Distribution and W%, 2017

I always start this post by looking at team records in blowout and non-blowout games. This year, after a Twitter discussion with Tom Tango, I’ve narrowed the definition of a blowout to games in which the margin of victory is six runs or more (rather than five, the definition used by Baseball-Reference and that I had independently settled on). In 2017, the percentages of games decided by x runs and by >= x runs are:

If you draw the line at 5 runs, then 21.6% of games are classified as blowouts. At 6 runs, it drops to 15.3%. Tango asked his Twitter audience two different but related poll questions. The first was “what is the minimum margin that qualifies a game as a blowout?”, for which the plurality was six (39%, with 5, 7, and 8 the other options). The second was “what percentage of games do you consider to be blowouts?”, for which the plurality was 8-10% (43%, with the other choices being 4%-7%, 11%-15%, and 17-21%). Using the second criterion, one would have to set the bar at a margin of seven, at least in 2017.

As Tango pointed out, it is of interest that asking a similar question in different ways can produce different results. Of course this is well-known to anyone with a passing interest in public opinion polling. But here I want to focus more on some of the pros and cons of having a fixed standard for a blowout or one that varies depending on the actual empirical results from a given season.

A variable standard would recognize that as the run environment changes, the distribution of victory margins will surely change (independent of any concurrent changes in the distribution of team strength), expanding when runs are easier to come by. Of course, this point also means that what is a blowout in Coors Field may not be a blowout in Petco Park. The real determining factor of whether a game is a blowout is the probability that the trailing team can make a comeback (of course, picking one standard and applying it to all games ignores the flow of a game; if you want to make a win probability-based definition, go for it).

On the other hand, a fixed standard allows the percentage of blowouts to vary over time, and maybe it should. If the majority of games were 1-0, it would sure feel like there were very few blowouts even if the probability of the trailing team coming back was very low. Ideally, I would propose a mixed standard, in which the margin necessary for a blowout would not be a fixed % of games but rather somehow tied to the average runs scored/game. However, for this post, the answer of Tango’s audience to the simpler question is sufficient. I never had any strong rationale for using five, and it does seem like 22% of games as blowouts is excessive.

Given the criterion that a blowout is a game in which the margin of victory was six or more, here are team records in non-blowouts:

Records in blowouts:

The difference between blowout and non-blowout records (B - N), and the percentage of games for each team that fall into those categories:

Keeping in mind that I changed definitions this year (and in so doing increased random variation, if for no reason other than the smaller percentage of games in the blowout bucket), it is an oddity to see two of the very best teams in the game (HOU and WAS) with worse records in blowouts. Still, the general pattern is for strong teams to be even better in blowouts, per usual. San Diego stands out as the most extreme team, with an outlier poor record in blowouts offsetting an above-.500 record in non-blowouts, although given that they play in the park with the lowest PF, the discrepancy in blowout frequency between their home and road games should theoretically be larger than for most teams.

A more interesting way to consider game-level results is to look at how teams perform when scoring or allowing a given number of runs. For the majors as a whole, here are the counts of games in which teams scored X runs:

The “marg” column shows the marginal W% for each additional run scored. In 2017, three was the mode of runs scored, while the second run resulted in the largest marginal increase in W%.

The major league average was 4.65 runs/game; at that level, here is the estimated probability of scoring x runs using the Enby Distribution (stopping at fifteen):

In graph form (again stopping at fifteen):

This is a pretty typical visual for the Enby fit to the major league average. It’s not perfect, but it is a reasonable model.
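For readers who want to generate a chart like this themselves, below is a minimal Python sketch of the Enby probability of scoring exactly k runs. It assumes the form used in earlier installments of this series as I understand it: a negative binomial with parameters r and B, modified by an extra probability z of being shut out. The fitted parameters for 4.65 R/G are not listed in this post, so the function simply takes r, B, and z as inputs; the values in the check below are arbitrary placeholders, not fitted values.

```python
import math


def enby_pmf(k, r, B, z):
    """P(score exactly k runs) under a zero-modified negative binomial.

    Negative binomial part: Gamma(r + k) / (Gamma(r) * k!) * B^k / (1 + B)^(r + k)
    Zero modification: P(0) = z + (1 - z) * NB(0), P(k > 0) = (1 - z) * NB(k).
    """
    if k < 0:
        return 0.0
    log_nb = (math.lgamma(r + k) - math.lgamma(r) - math.lgamma(k + 1)
              + k * math.log(B) - (r + k) * math.log(1 + B))
    nb = math.exp(log_nb)
    if k == 0:
        return z + (1 - z) * nb
    return (1 - z) * nb


def enby_mean(r, B, z):
    """Expected runs per game: the zeroed-out share scores nothing."""
    return (1 - z) * r * B
```

Working in log space via `math.lgamma` keeps the calculation stable for large k, and summing the probabilities over a long enough range of k should return essentially 1.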

In previous years I’ve used this observed relationship to calculate metrics of team offense and defense based on the percentage of games in which they scored or allowed x runs. But I’ve always wanted to switch to using theoretical values based on the Enby Distribution, for a number of reasons:

1. The empirical distribution is subject to sample size fluctuations. In 2016, all 58 times that a team scored twelve runs in a game, they won; meanwhile, teams that scored thirteen runs were 46-1. Does that mean that scoring 12 runs is preferable to scoring 13 runs? Of course not; it's a quirk in the data. Additionally, the marginal values don’t necessarily make sense even when W% increases from one runs scored level to another.

2. Using the empirical distribution forces one to use integer values for runs scored per game. Obviously the number of runs a team scores in a game is restricted to integer values, but not allowing theoretical fractional runs makes it very difficult to apply any sort of park adjustment to the team frequency of runs scored.

3. Related to #2 (really its root cause, although the park issue is important enough from the standpoint of using the results to evaluate teams that I wanted to single it out), when using the empirical data there is always a tradeoff that must be made between increasing the sample size and losing context. One could use multiple years of data to generate a smoother curve of marginal win probabilities, but in doing so one would lose centering at the season’s actual run scoring rate. On the other hand, one could split the data into AL and NL and more closely match context, but you would lose sample size and introduce more quirks into the data.
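The quirk described in point 1 is easy to see by computing marginal W% directly from empirical records. Here is a short Python sketch, using only the 2016 twelve- and thirteen-run records cited above; the function name is hypothetical.

```python
def marginal_w_pct(records):
    """records maps runs scored -> (wins, losses).

    Returns runs -> W%(runs) - W%(runs - 1), for each level whose
    predecessor is also present.
    """
    w_pct = {runs: w / (w + l) for runs, (w, l) in records.items()}
    return {runs: w_pct[runs] - w_pct[runs - 1]
            for runs in sorted(w_pct) if runs - 1 in w_pct}


records_2016 = {12: (58, 0), 13: (46, 1)}
marg = marginal_w_pct(records_2016)
```

Here `marg[13]` comes out to about -.021: taken at face value, the thirteenth run "cost" W%, which is exactly the kind of sample-size artifact a theoretical distribution avoids.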

So this year I am able to use the Enby Distribution. I have Enby Distribution parameters at each interval of .05 runs/game. Since it takes a fair amount of manual work to calculate the Enby parameters, I have not done so at each .01 runs/game, and for this purpose it shouldn’t create too much distortion (more on this later). The first step is to take the major league average R/G (4.65) and park-adjust it. I could have park-adjusted home and road separately, and in theory you should be as granular as practical, but the data on teams scoring x runs or more is not readily available broken out between home and road. So each team’s standard PF, which assumes a 50/50 split of home and road games, is used. I then rounded this value to the nearest .05 and calculated the probability of scoring x runs using the Enby Distribution (with c = .852 since this exercise involves interactions between two teams).
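The park-adjustment and rounding step can be sketched in a few lines of Python. This assumes the adjustment is simply league R/G multiplied by the team's standard PF, then rounded to the nearest .05 so that a set of Enby parameters exists for it. The function name is hypothetical, and the PF of 1.05 in the check is a made-up illustrative value, not Arizona's or Texas' actual factor.

```python
def park_context_rg(league_rg, pf, step=0.05):
    """League-average R/G in a team's park environment, rounded to the nearest step.

    Uses the standard PF (50/50 home/road split). Note that Python's round()
    rounds exact halves to even; exact ties are rare with real PFs.
    """
    adjusted = league_rg * pf
    return round(round(adjusted / step) * step, 2)
```

With the 2017 league average of 4.65, a hypothetical PF of 1.05 gives 4.8825, which rounds to the 4.90 context used in the example that follows.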

For example, there were two teams that had PFs that produced a park-adjusted expected average of 4.90 R/G (ARI and TEX). In other words, an average offense playing half their games in Arizona's environment should have scored 4.90 runs/game; an average defense doing the same should have allowed 4.90 runs/game. The Enby distribution probabilities of scoring x runs for a team averaging 4.90 runs/game are:

For each x, it’s simple to estimate the probability of winning. If this team scores three runs in a particular game, then they will win if they allow 0 (4.3%), 1 (8.6%), or 2 runs (12.1%). As you can see, this construct assumes that their defense is league-average. If they allow three, then the game will go to extra innings, in which case they have a 50% chance of winning (this exercise doesn’t assume anything about inherent team quality), so in another 13.6% of games they win 50%. Thus, if the Diamondbacks score three runs, they should win 31.8% of those games (4.3% + 8.6% + 12.1% + 13.6%/2). If they allow three runs, it’s just the complement; they should win 68.2% of those games.
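The same calculation as a small Python function: given the opponent's run distribution (here, the league-average distribution in the team's park context), sum the probability of allowing fewer runs and credit half of the tie probability. The function name is hypothetical; the check uses the 4.90 R/G probabilities quoted above.

```python
def p_win_given_runs(x, allowed):
    """P(win | team scores x runs).

    allowed[m] = P(opponent scores m runs). Ties go to extra innings,
    treated as a 50/50 proposition (no assumption about team quality).
    """
    below = sum(allowed[m] for m in range(min(x, len(allowed))))
    tie = allowed[x] if x < len(allowed) else 0.0
    return below + 0.5 * tie


# Probabilities of allowing 0, 1, 2, 3 runs in the 4.90 R/G context
allowed_490 = [0.043, 0.086, 0.121, 0.136]
w3 = p_win_given_runs(3, allowed_490)
```

`w3` comes out to .318; the defensive value of allowing three runs is its complement, .682.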

Using these probabilities and each team’s actual frequency of scoring x runs in 2017, I calculate what I call Game Offensive W% (gOW%) and Game Defensive W% (gDW%). It is analogous to James’ original construct of OW% except looking at the empirical distribution of runs scored rather than the average runs scored per game. (To avoid any confusion, James in 1986 also proposed constructing an OW% in the manner in which I calculate gOW%, which is where I got the idea).
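In code, gOW% and gDW% are weighted averages: each run level's win probability is weighted by the share of the team's games in which it scored (or allowed) that many runs. A minimal Python sketch follows, assuming the win probabilities come from a calculation like the one for the Diamondbacks above and that, as described there, the defensive win probability for allowing x runs is the complement of the offensive one. The function names and the toy inputs in the check are hypothetical.

```python
def game_ow_pct(freq_scored, win_prob):
    """gOW%: sum over x of (share of games scoring x) * P(win | score x)."""
    assert len(freq_scored) == len(win_prob), "win_prob must cover every run level"
    return sum(f * w for f, w in zip(freq_scored, win_prob))


def game_dw_pct(freq_allowed, win_prob):
    """gDW%: sum over x of (share of games allowing x) * (1 - P(win | score x))."""
    assert len(freq_allowed) == len(win_prob), "win_prob must cover every run level"
    return sum(f * (1 - w) for f, w in zip(freq_allowed, win_prob))
```

Because only the team's frequency distribution changes between teams in the same run context, two teams with the same average runs can end up with different gOW% values, which is the whole point of the exercise.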

As such, it is natural to compare the game versions of OW% and DW%, which consider a team’s run distribution, to their OW% and DW% figured using Pythagenpat in a traditional manner. Since I’m now using park-adjusted gOW%/gDW%, I have park-adjusted the standard versions as well. As a sample calculation, Detroit averaged 4.54 R/G and had a 1.02 PF, so their adjusted R/G is 4.45 (4.54/1.02). To figure OW%, recall that the major league average was 4.65 R/G; since the Tigers are assumed to have average defense, we use that as their runs allowed. The Pythagenpat exponent is (4.45 + 4.65)^.29 = 1.90, and so their OW% is 4.45^1.90/(4.45^1.90 + 4.65^1.90) = .479, meaning that if the Tigers had average defense they would be estimated to win 47.9% of their games.
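The Detroit calculation in Python, as a quick way to check the arithmetic. The function name is hypothetical; z = .29 is the Pythagenpat constant used in the text.

```python
def pythagenpat_w_pct(rs, ra, z=0.29):
    """Pythagenpat W%: exponent x = (RS + RA)^z, W% = RS^x / (RS^x + RA^x)."""
    x = (rs + ra) ** z
    return rs ** x / (rs ** x + ra ** x)


league_rg = 4.65
detroit_adj_rg = round(4.54 / 1.02, 2)  # park-adjusted R/G: 4.45
exponent = (detroit_adj_rg + league_rg) ** 0.29  # about 1.90
detroit_ow = pythagenpat_w_pct(detroit_adj_rg, league_rg)
```

`detroit_ow` rounds to .479. DW% works the same way with the roles reversed: league-average runs scored against the team's park-adjusted runs allowed.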

In previous years’ posts, the major league average gOW% and gDW% worked out to .500 by definition. Since this year I’m 1) using a theoretical run distribution from Enby, 2) park-adjusting, and 3) rounding teams’ park-adjusted average runs to the nearest .05, it doesn’t work out perfectly. I did not fudge anything to correct for the league-level variation from .500, and the difference is small, but as a technical note do consider that the league average gOW% is .497 and the gDW% is .503.

For most teams, gOW% and OW% are very similar. Teams whose gOW% is higher than OW% distributed their runs more efficiently (at least to the extent that the methodology captures reality); the reverse is true for teams with gOW% lower than OW%. The teams that had differences of +/- 2 wins between the two metrics were (all of these are the g-type less the regular estimate, with the teams in descending order of absolute value of the difference):

Positive: SD, TOR

Negative: CHN, WAS, HOU, NYA, CLE

The Cubs had a standard OW% of .542, but a gOW% of .516, a difference of 4.3 wins, which is the largest such discrepancy for any team offense/defense in the majors this year. I always like to pick out this team and present a graph of their runs scored frequencies to offer a visual explanation of what is going on, which is that they distributed their runs less efficiently for the purpose of winning games than would have been expected. The Cubs averaged 5.07 R/G, which I’ll round to 5.05 to be able to use an estimated distribution I have readily available from Enby (using the parameters for c = .767 in this case since we are just looking at one team in isolation):

The Cubs were shut out or held to one run in eight more games than one would expect for a team that averages 5.05 R/G; of course, these are games that you almost certainly will not win. They scored three, four, or six runs significantly less often than would be expected; while three and four are run levels at which in 2017 you would expect to lose more often than you win, even scoring three runs makes a game winnable (.345 theoretical W% for a team in the Cubs’ run environment). The Cubs had 4.5 fewer games scoring between 9-12 runs than expected, which should be good from an efficiency perspective, since even at eight runs they should have had a .857 W%. But they more than offset that by scoring 13+ runs in a whopping 6.8% of their games, compared to an expectation of 2.0%, or 7.8 games more than expected in which they gratuitously piled on runs. Chicago scored 13+ runs in 11 games, with Houston and Washington next with nine, and it’s no coincidence that they were also very inefficient offensively.

The preceding paragraph is an attempt to explain what happened; despite the choice of non-neutral wording, I’m not passing judgment. The question of whether run distribution is strongly predictive compared to average runs has not been studied in sufficient detail (by me at least), but I tend to think that the average is a reasonable indicator of quality going forward. Even if I’m wrong, it’s not “gratuitous” to continue to score runs after an arbitrary threshold with a higher probability of winning has been cleared. In some cases it may even be necessary, as the Cubs did have three games in which they allowed 13+ runs, although they weren’t the same games. As we saw earlier, major league teams were 111-0 when scoring 13+ runs, and 294-17 when scoring 10-12.

Teams with differences of +/- 2 wins between gDW% and standard DW%:

Positive: SD, CIN, NYN, MIN, TOR, DET

Negative: CLE, NYA

San Diego and Toronto had positive differences on both sides of the ball; the Yankees and Cleveland had negative differences for both. Thus it is no surprise that those teams show up on the list comparing gEW% to EW% (standard Pythagenpat). gEW% combines gOW% and gDW% indirectly by converting both to equivalent runs/game using Pythagenpat (see this post for the methodology):

Positive: SD, TOR, NYN, CIN, OAK

Negative: WAS, CHN, CLE, NYA, ARI, HOU
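One plausible way to implement the gEW% combination is sketched below in Python; treat it as an assumption rather than a reproduction of the linked methodology. It inverts Pythagenpat numerically: find the runs scored that, against league-average runs allowed, would produce the team's gOW%; find the runs allowed that would produce its gDW%; then run those two equivalent figures back through Pythagenpat. The function names are hypothetical, and bisection is used because Pythagenpat W% increases steadily with runs scored.

```python
def pythagenpat_w_pct(rs, ra, z=0.29):
    """Pythagenpat W%: exponent x = (RS + RA)^z."""
    x = (rs + ra) ** z
    return rs ** x / (rs ** x + ra ** x)


def equivalent_runs(w_pct, league_rg, z=0.29, lo=0.01, hi=30.0, tol=1e-10):
    """Solve pythagenpat_w_pct(R, league_rg) == w_pct for R by bisection."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if pythagenpat_w_pct(mid, league_rg, z) < w_pct:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2


def game_ew_pct(g_ow, g_dw, league_rg, z=0.29):
    """Combine gOW% and gDW% via equivalent runs scored and allowed."""
    eq_r = equivalent_runs(g_ow, league_rg, z)
    # DW% = W%(league, RA), so 1 - DW% = W%(RA, league): reuse the same solver
    eq_ra = equivalent_runs(1 - g_dw, league_rg, z)
    return pythagenpat_w_pct(eq_r, eq_ra, z)
```

A team at .500 on both sides should come out at exactly .500, and feeding a known R/G through Pythagenpat and back should recover it.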

The Padres’ EW% was .362, but based on the manner in which they actually distributed their runs and runs allowed per game, one would have expected a .405 W%, a difference of 6.9 wins, which is huge for these two approaches. In reality, they had a .438 W%, so Pythagenpat’s error was 12.3 wins, which is enormous in its own right.

gEW% is usually (but not always!) a more accurate predictor of actual W% than EW%, which it should be since it has the benefit of additional information. However, gEW% assumes that runs scored and allowed are independent of each other on the game-level. Even if that were true theoretically (and given the existence of park factors alone it isn’t), gEW% would still be incapable of fully explaining discrepancies between actual records and Pythagenpat.

The various measures discussed are provided below for each team.

Finally, here are the Crude Team Ratings based on gEW% since I hadn’t yet calculated gEW% when that post was published: