## Wednesday, May 29, 2019

### Enby Distribution, pt. 10: Behavior Near 1 RPG

Even for this series, this is an esoteric topic, but I wanted to specifically explore how Enby, Cigol, runs per win, the Pythagorean exponent, etc. behave around 1 RPG. 1 RPG is not a particularly interesting point from a real-world baseball perspective. Take 20 RPG: an outlandish level of scoring for teams, but one can easily imagine a theoretical scenario constructed from real players, using the types of constructs that have sometimes been used by sabermetricians (for instance, a team of Babe Ruths with average pitching playing a team of Ty Cobbs with average pitching), in which 20 RPG would be the context. But 1 RPG? Maybe if you have a team of Rey Ordonezes facing Pedro Martinez 1999, but Pedro Martinez 1999 is backed by a team of Bill Bergens and they have to face Lefty Grove 1931?

Still, 1 RPG is of interest in the world of win estimators, as it is the point that led to Pythagenpat (and thus my own intense interest in win estimators). As you know, 1 RPG is the minimum possible scoring level since a game doesn’t end until at least one run is scored. This insight, which to my knowledge was first proffered by David Smyth, led to my discovery of the Pythagenpat exponent (and I believe Smyth’s as well). So it will always hold a special interest to me, regardless of how impractical any application may be.

In order to facilitate this, I expanded my list of Enby and Cigol parameters (the difference is that Enby uses c = .767 in the Tango Distribution and Cigol uses c = .852) to look at each .05 RPG interval from .05 to 1.95. First, using the Enby parameters, here is a graph of the estimated probability of scoring X runs for teams that average .5, 1, 1.5, and 2 R/G:

I deliberately cut off the .5 R/G team’s probability of being shut out, which is 68.7%, in order to increase the space available for other points by about 40%. One thing that should stand out if you’ve looked at any of the other graphs of this type I’ve posted is that the distinctive shape (which for lack of a more precise term I’ll call a left-tail-truncated, extremely elongated right tail bell curve) is not present. For all of these teams except the 2 R/G team, the probability of scoring x+1 runs is always lower than the probability of scoring x runs. The 2 R/G team is actually the first at .05 intervals that achieves this modest success; teams that average 1.95 R/G are expected to be shut out in 25.1% of games and score one run in 25.0%. At 2 R/G, it is 24.3% and 24.7% respectively.

My real interest with these teams is how RPW and the Pythagenpat exponent might behave at such low levels of scoring. In order to test this, I generated a Cigol W% for each possible matchup between teams averaging .05 - 2 R/G at intervals of .05. I included inverse matchups (e.g. 1.25 R/G and 2 RA/G as well as 2 R/G and 1.25 RA/G), but eliminated cases where R = RA (obviously W% is .500 at these points). I also eliminated cases in which R + RA < 1, since these are impossible:

The relationship between RPG and RPW, even in this extremely low scoring context, is generally as we’d expect. The power regression line is a decent fit and takes a very satisfying form, as Pythagenpat RPW can be shown to be equal to 2*RPG^(1 - z). The implied z value here is lower than the .27 - .29 used for more normal environments, but close enough to suggest that Pythagenpat, which is correct by definition at 1 RPG, remains a useful tool at slightly higher RPGs.
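To make the relationship concrete, here is a minimal sketch of the Pythagenpat quantities under discussion; the z = .28 value is illustrative rather than fitted to this dataset:

```python
def pythagenpat_w_pct(r_per_g, ra_per_g, z=0.28):
    """Pythagenpat W% estimate: exponent x = RPG^z."""
    rpg = r_per_g + ra_per_g
    x = rpg ** z
    return r_per_g ** x / (r_per_g ** x + ra_per_g ** x)

def pythagenpat_rpw(rpg, z=0.28):
    """Runs per win implied by Pythagenpat: 2 * RPG^(1 - z)."""
    return 2 * rpg ** (1 - z)

# At 1 RPG the exponent RPG^z equals 1 and RPW equals 2, consistent with
# every game ending 1-0: one marginal run converts a loss into a win.
print(pythagenpat_rpw(1.0))   # 2.0
```

Note that at 1 RPG the result is exact regardless of z, which is why the form is correct by definition at that point.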

To test that more directly, we can look at the required Pythagorean exponents for these teams plotted against RPG as well:

This graph is less encouraging. At first glance the most disturbing thing is that the power regression doesn’t do a great job of fitting the data, as it produces Pythagorean exponents that are too low for the higher scoring contexts. The only way to achieve an RPG approaching 4, given how I defined this dataset, is to have teams that are fairly evenly matched, while wide gaps in team quality can pop up at low RPG (for example, we could get 1 RPG from .05 R/.95 RA at one extreme of imbalance or .5 R/.5 RA at the other). This again suggests that the imbalance between the two teams has a material impact on the needed Pythagorean exponent, but one that I have as yet been unable to capture in a satisfactory equation.

The more alarming thing about these results is that they show a fraying of the Cigol W% estimates from Smyth’s logical conclusion that underpins Pythagenpat--namely, that a 1 RPG team will win exactly as many games as it scores runs. For the nine unique pairs of R/RA (not counting their inverses), the Cigol W% is off slightly, as you can see from the needed Pythagorean exponents at 1 RPG not being equal to 1:

True W% is equal to R/G, and the error/162 is (Cigol W% - True W%)*162. The errors are not horrible, all well within one standard deviation of the typical Pythagenpat error for normal major league teams, but they still call into question the theoretical validity of the Cigol estimates in extremely low scoring contexts.
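Smyth’s condition can be checked mechanically: when R + RA = 1 and the exponent is 1, the Pythagorean estimate collapses to R/(R + RA) = R, which is exactly the True W% defined above. A quick sketch (the team values are chosen arbitrarily):

```python
def pythagorean_w_pct(r, ra, x):
    """Generic Pythagorean W% with exponent x."""
    return r ** x / (r ** x + ra ** x)

for r in (0.05, 0.25, 0.40):
    ra = 1.0 - r                 # hold total scoring at exactly 1 RPG
    true_w = r                   # W% = R/G when every game ends 1-0
    est = pythagorean_w_pct(r, ra, x=1.0)
    error_per_162 = (est - true_w) * 162
    assert abs(error_per_162) < 1e-9   # exponent 1 reproduces True W% exactly
```

So the nonzero errors in the table reflect Cigol’s required exponents drifting away from 1, not any flaw in the Pythagorean form itself.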

I redid the graph by replacing the Cigol estimates for these nine teams and their inverses with the True W%. This only corrects the W% for cases where we think, for the moment, that Cigol is by definition wrong; if that is so, Cigol is likely causing significant distortions at scoring levels just above 1 RPG as well, which are not corrected. I never expected Cigol to be a perfect model (or, to phrase it more precisely, I never expected any actual implementation of Cigol to be a perfect model; the mathematical underpinnings of Cigol, given the assumption of independence of runs scored and allowed, are true by definition), but I have written much of this series as if Cigol and the previously unnamed “True W%” were one and the same. This is not the case, and it is always a bit disappointing when you find a blemish in your model.

With these corrections, we have this graph and regression equations:

This doesn’t do much to change the regression equations (changing eighteen observations out of 1,398 generally will not), but at least it looks better to have observations at (1, 1). I don’t have any correction to offer to Enby/Cigol itself to solve this problem; my inclination is to assume there are two problems at play:

1) The estimated probability of being shut out (the Enby z parameter, which I use the Tango Distribution to estimate) may not hold up at these extremely low scoring levels. Maybe the Tango Distribution c parameter, which varies based on whether the question revolves around one team’s runs-per-inning scoring distribution or a matchup between two teams, inherently assumes a covariance between R and RA that doesn’t hold when, by definition, only one team scores in a game (at 1 RPG every game ends 1-0, and many games between teams for which RPG is slightly greater than 1 would end 1-0 as well). But that is just a guess, and one that might appear to a reader to throw the other method under the bus. I don’t mean it in that way at all, of course; the Tango Distribution was not developed to be an input into a runs-per-game distribution.

2) Regardless of the z parameter, Cigol assumes that runs scored and runs allowed are independent between the two teams and from game to game. But when I say that a team that scored .6 R/G and allowed .4 must have a .600 W%, I am referring to a team that has actually put up those figures over some period of time. This is still not the same as saying that the team is a true .6/.4 team. And so there is not necessarily a flaw in Cigol at all. Enby (using the c = .852 parameters) expects a true-talent .6 R/G team to score more than one run in 13.9% of their games. So it would be extremely unlikely that any team, even at these ridiculously low scoring levels, could ever produce 1 RPG over a period of several games or longer.

But redefining the question in terms of true talent means that you could have a true talent .3 R/.4 RA team, for instance. I unceremoniously tossed these teams out of the dataset earlier, but they should have been included. So I will quickly look at Cigol’s estimate of the necessary Pythagorean exponent for these teams (these are teams scoring and allowing .05 - .9 runs per game at intervals of .05 with a total R+RA < 1):

This isn’t interesting except as confirmation that the lower bound for the exponent is 1, which means that Pythagenpat fails for these teams: Pythagenpat will allow them to have exponents below 1. For example, .5 RPG yields a Pythagenpat exponent of around .5^.28 = .824.
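The arithmetic behind that example, as a sketch (using z = .28 as in the text):

```python
# Below 1 RPG, the Pythagenpat exponent RPG^z falls below the floor of 1
# that the Cigol results imply.
z = 0.28
for rpg in (0.25, 0.5, 0.75, 1.0):
    x = rpg ** z
    print(rpg, round(x, 3))   # 0.5 RPG gives x = .824, below 1
```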

For the sake of the rest of this discussion, I will no longer hew to a strict requirement that the exponent be equal to 1 at any point (only that it never dip below 1). In its place, let me propose an alternate set of rules for an equation to estimate the Pythagorean exponent to be valid:

1) the exponent must always increase with RPG if R = RA (the equation need not be strictly limited to using RPG; however, it must strictly increase with RPG for a theoretically average team). I don’t know for sure that this is a theoretical imperative, but I want to preclude the use of a quadratic model that might appear to be a good fit but has a negative coefficient for the x^2 term, resulting in a negative derivative when x is large.

2) the exponent must be close to 1 at 1 RPG. If we came up with a power regression that said the exponent = 1.02*RPG^.272, for instance, that would be fine. It’s close to 1.

Once I decided that I didn’t need to adhere to the constraint that x = 1 when RPG = 1, I tried a number of forms of x = RPG^z plus some other term that incorporated run differential. Here are a handful of the more promising ones:

x = 1.03841*RPG^.265 + .00114*RD^2 (RMSE = 4.0084)

x = 1.04567*RPG^.2625 + .00113*RD^2 (RMSE = 4.0082)

x = 1.05299*RPG^.26 + .00113*RD^2 (RMSE = 4.0080)

x = 1.05887*RPG^.258 + .00113*RD^2 (RMSE = 4.0077)

x = 1.03059*RPG^.27 + .16066*(RD/RPG)^2 (RMSE = 4.0076)

x = 1.04561*RPG^.265 + .15274*(RD/RPG)^2 (RMSE = 4.0076)

x = 1.01578*RPG^.275 + .16862*(RD/RPG)^2 (RMSE = 4.0080)
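As a concrete illustration (not an endorsement of any one form), here is a sketch applying the first equation above; the 4.5 R/4.0 RA team is hypothetical, and RD is the per-game run differential as in the regressions:

```python
def exponent_candidate(r, ra):
    """First candidate formula: x = 1.03841*RPG^.265 + .00114*RD^2."""
    rpg = r + ra
    rd = r - ra
    return 1.03841 * rpg ** 0.265 + 0.00114 * rd ** 2

def w_pct(r, ra):
    """Pythagorean W% using the candidate exponent."""
    x = exponent_candidate(r, ra)
    return r ** x / (r ** x + ra ** x)

# A hypothetical team scoring 4.5 and allowing 4.0 per game:
print(round(exponent_candidate(4.5, 4.0), 3), round(w_pct(4.5, 4.0), 3))
```

The RD^2 term is tiny for realistic teams, which is why these forms behave almost identically to plain Pythagenpat in normal environments.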

I must have run thirty regressions, looking for some formula that would beat 4.0067 (the minimum RMSE for an optimized Pythagenpat for 1961-2014 major league teams). Just to give you an idea of how silly I got, I tried this equation to estimate x (the Pythagorean exponent), eschewing the Pythagenpat construct:

x = 10^(.30622 * log(RPG) + .0091*log(RD^2/RPG) - .01342) (RMSE = 4.011)

Abandoning for a moment the attempt to get a lower RMSE with major league teams, how do those equations fare with the full Cigol dataset compared to Pythagenpat? In this case the RMSE compares the estimated W% from the formula in question to the Cigol estimate. Using z = .2867 (the value that optimizes RMSE for the 1961-2014 major league teams), the RMSE (per 162 games) is .46784. Using z = .2852 (the value that optimizes RMSE for the full Cigol dataset), the RMSE is .46537. For each of the equations above:

x = 1.03841*RPG^.265 + .00114*RD^2 (RMSE = .37791)

x = 1.04567*RPG^.2625 + .00113*RD^2 (RMSE = .40180)

x = 1.05299*RPG^.26 + .00113*RD^2 (RMSE = .42551)

x = 1.05887*RPG^.258 + .00113*RD^2 (RMSE = .44487)

x = 1.03059*RPG^.27 + .16066*(RD/RPG)^2 (RMSE = .56590)

x = 1.04561*RPG^.265 + .15274*(RD/RPG)^2 (RMSE = .60852)

x = 1.01578*RPG^.275 + .16862*(RD/RPG)^2 (RMSE = .52524)

At least we can do better with the full Cigol dataset with a more esoteric construct than just using a fixed z value. But the practical impact is very small, and as we’ve seen these formulas add nothing to the accuracy of estimates for normal major league teams and sacrifice a bit of theoretical grounding.
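For reference, the RMSE figures quoted throughout are on a per-162-game basis; a sketch of that calculation with made-up inputs:

```python
import math

def rmse_per_162(estimates, references):
    """Root mean squared error between two W% series, scaled to 162 games."""
    n = len(estimates)
    mse = sum((e, r)[0] ** 0 and (e - r) ** 2 for e, r in zip(estimates, references)) / n
    return math.sqrt(mse) * 162

# Hypothetical comparison of a formula's W% to the Cigol W%:
est   = [0.512, 0.487, 0.601]
cigol = [0.510, 0.490, 0.598]
print(round(rmse_per_162(est, cigol), 3))   # 0.439
```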

## Wednesday, March 27, 2019

### 2019 Predictions

This is a blog intended for sabermetrically-inclined readers. I shouldn’t have to spell out a list of caveats about the for-entertainment-purposes-only content that follows, and I won’t.

AL EAST

1. New York

2. Boston (wildcard)

3. Tampa Bay

4. Toronto

5. Baltimore

I usually don’t actually show the numbers that come out of my “system,” such as it is. It is not as robust a system as PECOTA or Fangraphs’ or Clay Davenport’s predictions, simplifying where the others are more rigorous and fed by other people’s player projections, because why bother reinventing that wheel when others have already done it so well? But in the case of the 2019 AL, I think the estimates for the top four teams are illustrative of my failure to commit to any of this:

NYA 822/653, 100

HOU 814/653, 99

BOS 850/683, 99

CLE 783/634, 98

That’s (R/RA, Wins) in case it wasn’t obvious. So I can make bland statements like “the Red Sox appear to have a little better offense but worse defense than the Yankees”, but beyond that there’s not much to say other than it should be another entertaining season. It does appear to me that the Yankees and Astros have more surplus arms sitting around than the other contenders, which is certainly not a bad thing and something that the crude projection approach I take ignores. I’d expect Tampa Bay to take a step back from 2018 with a subpar offense. The Blue Jays are interesting as a sleeper, especially if the prospects show up and play more to their potential than their 2019 baseline expectation. Baltimore has two things going for them - I have Miami as worse on paper, and at least they’re trying a new approach. Actually three, because Camden Yards is awesome.

AL CENTRAL

1. Cleveland

2. Minnesota

3. Detroit

4. Kansas City

5. Chicago

The Indians are still the easy divisional favorite, to an extent that surprised me when I actually put the numbers to it. They are closer to the big three in the AL (in fact, right behind by my reckoning) than they are to the Twins. It’s easy to look at the negatives – a borderline embarrassing outfield, an unsettled bullpen with little attempt to add high upside depth, a clustering of the team’s excellence in starting pitching which is more prone to uncertainty. But it’s worth keeping in mind that Cleveland underplayed their peripherals last year (although less their PW% than their EW%) - they have some room to decline while still projecting to win 90 as they did last year. Names like Sano and Buxton make the Twins offense look better than it actually figures to be while also giving it more upside than a typical team, but the Twins look like a slightly above average offense and slightly below average defense. You can throw a blanket over the three teams at the bottom - the order I’ve picked them for 2019 is the reverse order of the optimism I’d hold for 2020 as a fan of those teams.

AL WEST

1. Houston

2. Los Angeles (wildcard)

3. Oakland

4. Seattle

5. Texas

Houston is an outstanding team once again, a World Series contender with room for misfortune. The Angels are my tepid choice for second wildcard - the Rays are in a tough division, the Twins could feast on the Central underlings but look like about as .500 of a team on paper as you can get, while the A’s can expect some regression on both offense and the bullpen. The Angels have huge rotation question marks, but all of these teams are flawed. The Mariners and Rangers both strike me as teams that could easily outplay projections; alas, it would take a surfeit of that to get into the race.

NL EAST

1. Philadelphia

2. Washington (wildcard)

3. New York

4. Atlanta

5. Miami

This should be interesting. It’s easy to overrate the Phillies given that they were in the race last year when they really shouldn’t have been as close. It would be easy to overrate the Braves, who arrived early. It would be easy to underrate the Nationals, losing their franchise icon while bringing in another ace and graduating another potential outfield star. It would be easy to underrate the Mets, who are generally a disaster but still have talent. The only thing that wouldn’t be easy to do is trick yourself into thinking the Marlins are going to win.

NL CENTRAL

1. Chicago

2. Milwaukee (wildcard)

3. St. Louis

4. Cincinnati

5. Pittsburgh

I have this about dead even on paper, but I give a slight edge to the Cubs with a bounce back from Kris Bryant and a more settled (if aging) rotation. The Brewers are legit, and their rotation should benefit from some arms that were used as swingmen last year getting a shot at starting. But the bullpen will likely be worse and some offensive regression shouldn’t come as a surprise. The Cardinals and Reds are a bit further back on paper, but close enough that it wouldn’t be that surprising if they played themselves into the mix. As a semi-Reds fan I’m a little skeptical about the chances of the quick transitional rebuild actually paying off. The Pirates look easily like the best team that I’ve picked last; the start of 2018 is a good reminder that teams like this can find themselves in the race.

NL WEST

1. Los Angeles

2. Colorado

3. Arizona

4. San Diego

5. San Francisco

The Dodgers’ run in the NL West is underappreciated, due both to their failure to win the World Series and to people inclined to write it off because of their payroll. I like their divisional chances better in 2019 as only the Rockies are real challengers. I’d put Colorado in the second tier of NL contenders with Cincinnati, St. Louis, New York, and Atlanta. If you can figure out if Arizona is starting a rebuild or trying to do one of those on-the-fly retools, let me know. Maybe let Mike Hazen know too. The Padres are interesting in that the prospects that have shown up so far haven’t lived up to expectations yet, but there are more and LOOK MANNY MACHADO. The Giants with Machado or Harper would have been the opposite of the Padres, more or less, which is considerably less interesting.

WORLD SERIES

Houston over Los Angeles

Or Houston or Boston. They’re basically interchangeable.

AL MVP: CF Mike Trout, LAA

AL Cy Young: Trevor Bauer, CLE

AL Rookie of the Year: LF Eloy Jimenez, CHA

NL MVP: 3B Nolan Arenado, COL

NL Cy Young: Aaron Nola, PHI

NL Rookie of the Year: CF Victor Robles, WAS

## Saturday, February 09, 2019

### Pitching Optional?

What happens when you take a team that got into the NCAA tournament despite finishing in the middle of the pack in its conference and relying on a makeshift pitching staff and remove the few reliable pitchers while leaving much of the offense intact? Does this sound interesting to you, like an experiment cooked up in the lab of a mad sabermetrician (or more likely a resident of that state up north)? If so, you may be interested in the 2019 Buckeyes.

In the ninth season of the seemingly never-ending Greg Beals regime, he once again has an entire unit with next to no returning experience. Sometimes this is unavoidable in college sports, but it happens to Beals with regularity as player development does not appear to be a strong suit of the program. Players typically either make an impact as true freshmen or are never heard from, while JUCO transfers are a roster staple to paper over the holes. The only difference with this year’s pitching situation is that holes are largely being plugged with freshmen rather than transfers.

The three pitchers penciled in as the rotation have precious little experience, with two true freshmen and a junior with 24 appearances and 11 starts in his career. Lefty Seth Lonsway was a nineteenth-round pick of Cincinnati and will be joined by classmate Garrett Burhenn, with Jake Vance as the junior veteran. Vance was +3 RAA in 36 innings last year, which doesn’t sound like much until you consider the dearth of returning performers on the rest of the staff.

Midweek starts and long relief could fall to sophomore lefty Griffan Smith, who was not effective as a freshman (-7 RAA in 32 innings). The other veteran relievers are junior Andrew Magno (sidelined much of last season with an injury, but Beals loves his lefty specialists so if healthy he will see the mound) and senior sidearmer Thomas Waning, who was promising as a sophomore but coughed up 18 runs in 16 frames in 2018. A trio of freshmen righties said to throw 90+ MPH (Bayden Root, TJ Brock, Will Pfenning) are joined by other freshmen in Cole Niekamp and lefty Mitch Milheim. Joe Gahm is a junior transfer from Auburn via Chattahoochee Valley Community College and, given his experience and BA ranking as a top 30 Big Ten draft prospect, should find a role. Senior Brady Cherry will also apparently get a chance to pitch this season, something he has yet to do in his Buckeye career.

The Buckeye offense is more settled, and unless the pitchers exceed reasonable expectations will have to carry the team in 2019. Sophomore Dillon Dingler moves in from center field (that’s nothing, as recent OSU catcher Jalen Washington moved to shortstop) to handle the catching duties and was raved about by the coaches last season, so big things are expected despite a .244/.325/.369 line. He’ll be backed up by sophomore transfer Brent Todys from Andrew College, with senior Andrew Fishel, junior Sam McClurg and freshman Mitchell Smith rounding out the roster.

First base will belong to junior Conner Pohl after he switched corners midway through 2018; he also played the keystone as a freshman so he’s been all over the infield. While his production was underwhelming for first base, at 3 RAA he was a contributor last season and looks like a player who should add power as he matures. Senior Kobie Foppe got off to a slow start last year, flipped from shortstop to second base, and became an ideal leadoff man (.335/.432/.385); even with some BABIP regression he should be solid. Third base will go to true freshman Zach Dezenzo, while junior shortstop Noah West needs to add something besides walks to his offensive game (.223/.353/.292). The main infield backups are freshman Nick Erwin at short, sophomore Scottie Seymour and freshmen Aaron Hughes and Marcus Ernst at the corners, and junior Matt Carpenter everywhere just like his MLB namesake (albeit without the offensive ability).

I’ll describe the outfield backwards from right to left, since junior right fielder Dominic Canzone is the team’s best offensive player (.323/.396/.447, which was a step back from his freshman campaign) and will be penciled in as the #3 hitter. The other two spots are not as settled as one would hope given the imperative of productive offense for this team. A pair of seniors will battle for center: Malik Jones did nothing at the plate as a JUCO transfer last year besides draw walks (.245/.383/.286 in 63 PA) while Ridge Winand has barely seen the field. In left, senior Nate Romans has served as a utility man previously, although he did contribute in 93 PA last year (.236/.360/.431). Senior Brady Cherry completes his bounce around the diamond, which has included starting at third and second; in 2018 he hit just .226/.321/.365, a step back from 2017. While he could get time in left, it’s more likely he’ll DH since the plan is to use him out of the bullpen as well. Other outfield backups are freshman Nolan Clegg in the corners and Alec Taylor in center.

OSU opens the season this weekend with an odd three-game series against Seton Hall in Pt. Charlotte, Florida. It is the start of a very lackluster non-conference schedule that doesn’t figure to help the Buckeyes’ cause come tournament time as the schedule did last year (although unfortunately, as you can probably tell, I tend to think the resume will be beyond help). There are no games against marquee names, although OSU will play MSU in a rare non-conference Big Ten matchup. The home schedule opens March 15 with a three-game series against Lipscomb, a one-off with Northern Kentucky, and a four-game series against Hawaii, whose players will probably be wondering what they did to wind up in Columbus in mid-March when they could be home.

Big Ten play opens March 29 at Rutgers, with the successive weekends home to Northwestern and the forces of darkness, at Maryland, home to Iowa, at Minnesota, home to PSU, and at Purdue. Midweek opponents are the typical fare of local nines, including Toledo, Cincinnati, Ohio University (away), Dayton, Xavier, Miami (away), Wright State, and Youngstown State (away). The Big Ten tournament will be played May 22-26 in Omaha.

It’s hard to be particularly optimistic that another surprise trip to the NCAA tournament is in the cards. Even some of the best pitchers who have come through OSU have struggled as freshmen, so it’s hard to project the starting pitching to be good, and while there are productive returnees at multiple positions, only Canzone is a proven excellent hitter and a couple positions are occupied by players who must make serious improvement to be average. The non-conference schedule may be soft enough to keep the record respectable, but there are few opportunities to grab wins that will help come selection time. Aspiring to qualify for the Big Ten tournament seems a more realistic goal. Beals is the longest-tenured active coach at OSU in any of the four sports that I follow rabidly, which on multiple levels is concerning (although two of the three other programs have coaches in place who have demonstrated their value at OSU, and the third did well in a three-game trial). Yet somehow Beals marches on, floating aimlessly in the middle of an improved Big Ten.

Note: This preview is always a combination of my own knowledge and observation along with the official season outlook released by the program, especially as pertains to position changes and newcomers about which I have next to no direct knowledge. That reliance was even greater this year due to the turnover on the mound.

## Monday, February 04, 2019

### Enby Distribution, pt. 9: Cigol at the Extremes--Pythagenpat Exponent

In the last installment, I explored using the Cigol dataset to estimate the Pythagorean exponent. Alternatively, we could sidestep the attempt to estimate the exponent and try to directly estimate the z parameter in the Pythagenpat equation x = RPG^z.

The positives of this approach include avoiding the scalar multipliers that move the estimator away from a result of 1 at 1 RPG, and maintaining a form that has been found useful by sabermetricians in the last decade or so. The latter is also the biggest drawback to this approach--it assumes that the form x = RPG^z is correct, and foregoes the opportunity of finding a form that provides a better fit, particularly with extreme datapoints. It’s also fair to question my objectivity in this matter, given that a plausible case could be made that I have a vested interest in “re-proving” the usefulness of Pythagenpat. That’s not my intent, but I would be remiss in not raising the possibility of my own (unintentional) bias influencing this discussion.

Given that we know the Pythagorean exponent x as calculated in the last post, it is quite simple to compute the corresponding z value:

z = log(x)/log(RPG)
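In code form this is trivial; a sketch with hypothetical example values:

```python
import math

def implied_z(x, rpg):
    """Back out the Pythagenpat z implied by a required exponent x at a given RPG."""
    return math.log(x) / math.log(rpg)

# e.g. if a hypothetical pair of teams at 9 RPG required x = 1.87,
# the implied z is near .285:
print(round(implied_z(1.87, 9.0), 4))
```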

For the full dataset I’ve used throughout these posts, a plot of z against RPG looks like this:

A quick glance suggests that it may be difficult to fit a clean function to this plot, as there is no clear relationship between RPG and z. It appears that in the 15-20 RPG range, there are a number of R/RA pairs for which a higher z is necessary than for the pairs at 20-30 RPG. While I have no particular reason to believe that the z value should necessarily increase as RPG increases, I have strong reason to doubt that the dataset I’ve put together allows us to conclude otherwise. Based on the way the pairs were chosen, extreme quality differences are overrepresented in this range. For example, there are pairs in which a team scores 14 runs per game and allows only 3. The more extreme high RPG levels are only reached when both teams are extremely high scoring; the most extreme difference captured in my dataset at 25 RPG is 15 R/10 RA.

The best fit to this graph comes from a quadratic regression equation, but the negative coefficient for RPG^2 (the equation is z = -.0002*RPG^2 + .0062*RPG + .2392) makes it unpalatable from a theoretical perspective. The apparent quadratic shape may well be an accident of the data points used as described in the preceding paragraph. Power and logarithmic functions fail to produce the upward slope from 5-10 RPG, as does a linear equation. The latter has a very low r^2 (just .022) but results in an aesthetically pleasing gently increasing exponent as RPG increases (equation of .2803 + .00025*RPG). The slope is so gentle as to result in no meaningful difference when applying the equation to actual major league teams, leaving it as useless as the r^2 suggests it would be (RMSE of 4.008 for 1961-2014, with same result if using the z value based on plugging in the average of RPG of 8.805 for that period).

It’s tempting to assume that z is higher in cases in which there is a large difference in runs scored and runs allowed. This could potentially be represented in an equation by run differential or run ratio, and such a construct would not be without sabermetric precedent, as other win estimators have been proposed that explicitly consider the discrepancy between the two teams (explicitly as in beyond the obvious truth that as you score more runs than you allow, you will win more games). (See the discussion of Tango’s old win estimator in part 7).

First, let’s take a quick peek at the z versus RPG plot we’d get for the limited dataset I’ve used throughout the series (W%s between .3 and .7 with R/G and RA/G between 3 and 7):

The relationship here is more in line with what we might have expected--z levels out as RPG increases, but there is no indication that z decreases with RPG (which assuming my reasoning above is correct, reflects the fact that the teams in this dataset are much more realistic and matched in quality than are the oddballs in the full dataset). Again, the best fit comes from a quadratic regression, but the negative coefficient for RPG^2 is disqualifying. A logarithmic equation fits fairly well (r^2 = .884), but again fails to capture the behavior at lower levels of RPG, not as damaging to the fit here because of the more limited data set. The logarithmic equation is z = .2484 + .0132*ln(RPG), but this produces a worse RMSE with the 1961-2014 teams (4.012) than simply using a fixed z.

Returning to the full dataset, what happens if we run a regression that includes abs(R - RA) as a variable alongside RPG? We get this equation for z:

z = .26846 + .00025*RPG + .00246*abs(R - RA)

This is interesting as it is the same slope for RPG as seen in the equation that did not include abs(RD), but the intercept is much lower, which means that for average (R = RA) teams, the estimated z will be lower. This equation implies that differences between a team and its opponents really drive the behavior of z in the data.

Applying this equation to the 1961-2014 data fails to improve RMSE, raising it to 4.018. So while this may be a nice idea and seem to fit the theoretical data better, it is not particularly useful in reality. I also tried a form with an RPG^2 coefficient as well (and for some reason liked it when initially sketching out this series), but the negative RPG^2 coefficient dooms the equation to theoretical failure (and with a 4.017 RMSE it does little better with empirical data):

z = .24689 - .00011*RPG^2 + .00378*RPG + .00183*abs(R - RA)

One last idea I tried was using (R - RA)^2 as a coefficient rather than abs(R - RA). Squaring run differential eliminates any issue with negative numbers, and perhaps it is extreme quality imbalances that really drive the behavior of z. Alas, a RMSE of 4.014 is only slightly better than the others:

z = .27348 + .00025*RPG + .00020*(R - RA)^2
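Plugging this squared-differential model into the Pythagenpat form looks like the following sketch (coefficients from the regression above; the team values are illustrative):

```python
def z_squared_rd(r, ra):
    """z = .27348 + .00025*RPG + .00020*(R - RA)^2, per the regression above."""
    rpg = r + ra
    return 0.27348 + 0.00025 * rpg + 0.00020 * (r - ra) ** 2

def w_pct(r, ra):
    """Pythagenpat W% using the variable-z model."""
    x = (r + ra) ** z_squared_rd(r, ra)
    return r ** x / (r ** x + ra ** x)

# For a balanced 8.8 RPG matchup the model gives z just over .275:
print(round(z_squared_rd(4.4, 4.4), 5))   # 0.27568
```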

If you are curious, using the 1961-2014 team data, the minimum RMSE for Pythagenpat is achieved when z = .2867 (4.0067). The z value that minimized RMSE for the full dataset is .2852. This may be noteworthy in its own right -- a dataset based on major league team seasons and one based on theoretical teams of wildly divergent quality and run environment coming to the same result may be an indication that extreme efforts to refine z may be a fool's errand.

You may be wondering why, after an entire series built upon my belief in the importance of equations that work well for theoretical data, I’ve switched in this installment to largely measuring accuracy based on empirical data. My reasoning is as follows: in order for a more complex Pythagenpat equation to be worthwhile, it has to have a material and non-harmful effect in situations in which Pythagenpat is typically used. If no such equation is available (which is admittedly a much higher hurdle to clear than me simply not being able to find a suitable equation in a week or so of messing around with regressions), then it is best to stick with the simple Pythagenpat form. If one a) is really concerned with accuracy in extreme circumstances and b) thinks that Cigol is a decent “gold standard” against which to attempt to develop a shortcut that works in those circumstances, then one should probably just use Cigol and be done with it. Without a meaningful “real world” difference, and as the functions needed become more and more complex, it makes less sense to use any sort of shortcut method rather than just using Cigol.

Thus I will for the moment leave the Pythagenpat z function as a humble constant, and hold Cigol in reserve if I’m ever really curious to make my best guess at what the winning percentage would be for a team that scores 1.07 runs and allows 12.54 runs per game (probably something around .0051).

The “full” dataset I’ve used in the last few posts is available here.

## Saturday, January 19, 2019

### Run Distribution and W%, 2018

I always start this post by looking at team records in blowout and non-blowout games. I define blowouts as games in which the margin of victory is six runs or more (rather than five, the definition used by Baseball-Reference). I settled on this last year after a Twitter discussion with Tom Tango and a poll that he ran. This definition results in 19.4% of major league games in 2018 being classified as blowouts; using five as the cutoff, it would be 28.0%, and using seven it would be 13.2%. Of course, using one standard ignores a number of factors, like the underlying run environment (the higher the run scoring level, the less impressive a fixed margin of victory) and park effects (which have a similar impact but in a more dramatic way when comparing teams in the same season). For the purposes here, around a fifth of games being blowouts feels right; it’s worth monitoring each season to see if the resulting percentage still makes sense.

Team records in non-blowouts:

With over 80% of major league games being non-blowouts (as we’ll see in a moment, the highest blowout % for any team was 26% for Cleveland), it’s no surprise that all of the playoff teams were above .500 in these games, although the Indians and Dodgers just barely so. The Dodgers compensated in a big way:

There was very little middle ground in blowout games, with just three teams having a W% between .400 and .500. This isn’t too surprising since strong teams usually perform very well in blowouts, and the bifurcated nature of team strength in 2018 has been much discussed. This also shows up when looking at each team’s percentage of blowouts and difference between blowout and non-blowout W%:

A more interesting way to consider game-level data is to look at how teams perform when scoring or allowing a given number of runs. For the majors as a whole, here are the counts of games in which teams scored X runs:

The “marg” column shows the marginal W% for each additional run scored. In 2018, three was the mode of runs scored, while the second run resulted in the largest marginal increase in W%. The distribution is fairly similar to 2017, with the most obvious difference being an increase in W% in one-run games from .057 to .103; not surprisingly, the proportion of shutouts increased as well, from 5.4% to 6.4%.
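The “marg” column is just the first difference of the W%-by-runs-scored table; a sketch with made-up win percentages (not the actual 2018 figures):

```python
# W% when scoring exactly k runs; these values are hypothetical.
wpct_by_runs = {0: 0.000, 1: 0.103, 2: 0.250, 3: 0.380, 4: 0.500, 5: 0.620}

# Marginal W%: the gain in W% from each additional run scored.
marg = {k: wpct_by_runs[k] - wpct_by_runs[k - 1]
        for k in wpct_by_runs if k - 1 in wpct_by_runs}

# With these inputs, the second run has the largest marginal value:
largest = max(marg, key=marg.get)
print(largest)  # 2
```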

The major league average dipped from 4.65 to 4.44 runs/game; this is the run distribution anticipated by Enby for that level (actually, 4.45) of R/G for fifteen or fewer runs:

Shutouts ran almost 1% above Enby’s estimate; that stands out in graph form, along with Enby compensating by over-estimating the frequency of two and three run games. Still, a zero-modified negative binomial distribution (which is what the distribution I call Enby is) does a decent job:
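For the curious, the zero-modified negative binomial idea can be sketched as follows; the parameter values in the example are placeholders for illustration, not the actual Enby parameters for a 4.45 R/G environment:

```python
from math import gamma, factorial

def neg_binomial_pmf(k, r, p):
    """Standard negative binomial: P(k) = Gamma(k+r)/(Gamma(r)*k!) * p^r * (1-p)^k.
    The gamma form allows non-integer r, as Enby-style fits require."""
    return gamma(k + r) / (gamma(r) * factorial(k)) * p ** r * (1 - p) ** k

def zero_modified_pmf(k, r, p, p0):
    """Fix P(0) at p0, then rescale the k >= 1 probabilities so they sum to 1 - p0."""
    if k == 0:
        return p0
    return (1 - p0) * neg_binomial_pmf(k, r, p) / (1 - neg_binomial_pmf(0, r, p))

print(zero_modified_pmf(0, 4, 0.5, 0.07))  # 0.07 by construction
```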

One way that you can use Enby to examine team performance is to use the team’s actual runs scored/allowed distributions in conjunction with Enby to come up with an offensive or defensive winning percentage. The notion of an offensive winning percentage was first proposed by Bill James as an offensive rate stat that incorporated the win value of runs. An offensive winning percentage is just the estimated winning percentage for an entity based on their runs scored and assuming a league average number of runs allowed. While later sabermetricians have rejected restating individual offensive performance as if the player were his own team, the concept is still sound for evaluating team offense (or, flipping the perspective, team defense).

In 1986, James sketched out how one could use data regarding the percentage of the time that a team wins when scoring X runs to develop an offensive W% for a team using their run distribution rather than average runs scored as used in his standard OW%. I’ve been applying that concept for as long as I’ve written this annual post, and last year was finally able to implement an Enby-based version. I will point you to last year’s post if you are interested in the details of how this is calculated, but there are two main advantages to using Enby rather than the empirical distribution:

1. While Enby may not perfectly match how runs are distributed in the majors, it sidesteps sample size issues and data oddities that are inherent when using empirical data. Use just one year of data and you will see things like teams that score ten runs winning less frequently than teams that score nine. Use multiple years to try to smooth it out and you will no longer be centered at the scoring level for the season you’re examining.

2. There’s no way to park adjust unless you use a theoretical distribution. These are now park-adjusted by using a different assumed distribution of runs allowed given a league-average RA/G for each team based on their park factor (when calculating OW%; for DW%, the adjustment is to the league-average R/G).

I call these measures Game OW% and Game DW% (gOW% and gDW%). One thing to note about the way I did this, with park factors applied on a team-by-team basis and rounding park-adjusted R/G or RA/G to the nearest .05 to use the table of Enby parameters that I’ve calculated, is that the league averages don’t balance to .500 as they should in theory. The average gOW% is .495 and the average gDW% is .505.
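The core of the gOW% calculation is comparing two run distributions head to head; here is a minimal sketch, assuming we already have a team's scoring distribution and an Enby-style league runs-allowed distribution (both toy inputs here, not real data):

```python
def game_ow_pct(score_dist, opp_dist):
    """P(score > allow) plus half of P(tie), treating the two
    distributions as independent."""
    w = 0.0
    for s, ps in score_dist.items():
        for a, pa in opp_dist.items():
            if s > a:
                w += ps * pa
            elif s == a:
                w += 0.5 * ps * pa  # credit ties as half a win
    return w

# Toy distributions (probabilities of scoring/allowing 0, 3, or 6 runs):
dist = {0: 0.1, 3: 0.5, 6: 0.4}
print(round(game_ow_pct(dist, dist), 3))  # identical inputs -> 0.5 by symmetry
```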

For most teams, gOW% and OW% are very similar. Teams whose gOW% is higher than OW% distributed their runs more efficiently (at least to the extent that the methodology captures reality); the reverse is true for teams with gOW% lower than OW%. The teams that had differences of +/- 2 wins between the two metrics were (all of these are the g-type less the regular estimate, with the teams in descending order of absolute value of the difference):

Positive: None

Negative: LA, WAS, CHN, NYN, CLE, LAA, HOU

It doesn’t help here that the league average is .495, but it’s also possible that team-level deviations from Enby are greater given the unusual distribution of offensive events (e.g. low BA, high K, high HR) that currently dominates in MLB. One of the areas I’d like to study given the time and a coherent approach to the problem is how Enby parameters may vary based on component offensive statistics. The Enby parameters are driven by the variance of runs per game and the frequency of shutouts; for both, it’s not too difficult to imagine changes in the shape of offense having a significant impact.

Teams with differences of +/- 2 wins (note: this calculation uses 162 games for all teams even though a handful played 161 or 163 in 2018) between gDW% and standard DW%:

Positive: MIA, PHI, PIT, NYN, CHA

Negative: HOU

Miami’s gDW% was .443 while their DW% was .406, a difference of 5.9 wins which was the highest in the majors for either side of the ball (their offense displayed no such difference, with .449/.444). That makes them a good example to demonstrate what having an unusual run distribution relative to Enby looks like and how that can change the expected wins:

This graph excludes two games in which the Marlins coughed up 18 and 20 runs, which themselves do much to explain the huge discrepancy--giving up twenty runs kills your RA/G but from a win perspective is scarcely different than giving up thirteen (given their run environment, Enby expected that Miami would win 1.1% of the time scoring thirteen and 0.0% allowing twenty).

Miami allowed two and three runs much less frequently than Enby expected; given that they should have won 79% of games when allowing two and 59% when allowing three that explains much of the difference. They allowed eight or more runs 23.6% of the time compared to just 12.5% estimated by Enby, but all those extra runs weren’t particularly costly in terms of wins since the Marlins were only expected to win 6.4% of such games (calculated by taking the weighted average of the expected W% when allowing 8, 9, … runs with the expected frequency of allowing 8, 9, … runs given that they allowed 8+ runs).

I don’t have a good clean process for combining gOW% and gDW% into an overall gEW%; instead I use Pythagenpat math to convert the gOW% and gDW% into equivalent runs and runs allowed and calculate an EW% from those. This can be compared to EW% figured using Pythagenpat with the average runs scored and allowed for a similar comparison of teams with positive and negative differences between the two approaches:

Positive: MIA, PHI, KC, PIT, CHA, MIN, SF, SD

Negative: LA, WAS, HOU, CHN, CLE, LAA, BOS, ATL

Despite their huge defensive difference, Miami was edged out for the largest absolute value of difference by the Dodgers (6.08 to -6.11). The Dodgers were -4.8 on offense and -1.7 on defense (astute readers will note these don’t sum to -6.11, but they shouldn’t given the nature of the math), while the Marlins’ 5.9 on defense was bolstered by only .9 on offense (as we’ve seen before, there was only a .005 discrepancy between their gOW% and OW%).

The table below has the various winning percentages for each team:

## Tuesday, December 18, 2018

### Crude Team Ratings, 2018

For the last several years I have published a set of team ratings that I call "Crude Team Ratings". The name was chosen to reflect the nature of the ratings--they have a number of limitations, of which I documented several when I introduced the methodology.

I explain how CTR is figured in the linked post, but in short:

1) Start with a win ratio figure for each team. It could be actual win ratio, or an estimated win ratio.

2) Figure the average win ratio of the team’s opponents.

3) Adjust for strength of schedule, resulting in a new set of ratings.

4) Begin the process again. Repeat until the ratings stabilize.

The resulting rating, CTR, is an adjusted win/loss ratio rescaled so that the majors’ arithmetic average is 100. The ratings can be used to directly estimate W% against a given opponent (without home field advantage for either side); a team with a CTR of 120 should win 60% of games against a team with a CTR of 80 (120/(120 + 80)).
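The four steps above can be sketched as a toy implementation; the teams, win ratios, and schedules below are invented for illustration, and details like the averaging scheme may differ from my actual spreadsheet:

```python
def crude_team_ratings(win_ratio, opponents, iters=100):
    """win_ratio[t]: W/L ratio for team t.
    opponents[t]: t's opponents, one entry per game played."""
    rating = {t: 100.0 for t in win_ratio}
    for _ in range(iters):
        # Strength of schedule: average rating of each team's opponents.
        sos = {t: sum(rating[o] for o in opps) / len(opps)
               for t, opps in opponents.items()}
        # Adjust win ratios for SOS, then rescale so the mean rating is 100.
        new = {t: win_ratio[t] * sos[t] for t in rating}
        scale = 100.0 * len(new) / sum(new.values())
        rating = {t: v * scale for t, v in new.items()}
    return rating

ratings = crude_team_ratings({'A': 2.0, 'B': 1.0, 'C': 0.5},
                             {'A': ['B', 'C'], 'B': ['A', 'C'], 'C': ['A', 'B']})

# Expected W% of one team against another: r1 / (r1 + r2)
print(round(ratings['A'] / (ratings['A'] + ratings['C']), 2))
```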

First, CTR based on actual wins and losses. In the table, “aW%” is the winning percentage equivalent implied by the CTR and “SOS” is the measure of strength of schedule--the average CTR of a team’s opponents. The rank columns provide each team’s rank in CTR and SOS:

The playoff teams all finished in the top twelve, with the third-place teams from the top-heavy AL East and West being denied spots in the dance despite having the fifth/sixth most impressive records in the majors (and damn the .3 CTR that separated us from #6org). The AL also had four of the bottom five teams; the bifurcated nature of the AL is something that was well observed and noted from the standings but also is evident when adjusting for strength of schedule. Note the hellish schedule faced by bad AL teams; Baltimore, with the worst CTR in MLB, had the toughest SOS at 118, an average opponent at the level of the Cubs. Those Cubs had the easiest schedule, playing an average opponent roughly equivalent to the Pirates.

Next are the division averages. Originally I gave the arithmetic average CTR for each division, but that’s mathematically wrong--you can’t average ratios like that. Then I switched to geometric averages, but really what I should have done all along is just give the arithmetic average aW% for each division/league. aW% converts CTR back to an “equivalent” W-L record, such that the average across the major leagues will be .50000:

The AL once again was markedly superior to the NL; despite the sorry showing of the Central, the West was almost as good as the Central was bad, and the East was strong as well. Given the last fifteen years of AL dominance, you may have glossed over the last sentence, but if you are familiar with the results of 2018 interleague play, it may give you pause. The NL went 158-142 against the AL, so how does the average AL team rank ahead? It may be counter-intuitive, but one can easily argue that the NL should have performed better than it did. The NL’s best division got the benefit of matching up with the AL’s worst division (the Centrals). The AL Central went 38-62 (.380), but the East went 54-46 (.540) and the West 50-50 (.500).

Of course, the CTRs can also use theoretical win ratios as a basis, and so the next three tables will be presented without much comment. The first uses gEW%, which is a measure I calculate that looks at each team’s runs scored distribution and runs allowed distribution separately to calculate an expected winning percentage given average runs allowed or runs scored, and then uses Pythagorean logic to combine the two and produce a single estimated W% based on the empirical run distribution:

The next version utilizes EW%, which is to say standard Pythagenpat based on actual runs scored and allowed:

And one based on PW%, which is Pythagenpat but using runs created and runs created allowed in place of actual runs totals:

Everything that I’ve shared so far has been based on regular season data only. Of course, ten teams play additional games for keeps, and we could consider these in ratings as well. Some points to consider when it comes to incorporating postseason data:

1. It would be silly to pretend that these additional games don’t give us any insight on team quality. Of course every opportunity we have to observe a team in a competitive situation increases our sample size and informs our best estimate of that team’s quality.

2. Postseason games are played under a different set of conditions and constraints than regular season games, particularly when it comes to how pitchers are used. This is not a sufficient reason to ignore the data, in my opinion.

3. A bigger problem, and one that causes me to present ratings that include postseason performance only half-heartedly, is the bias introduced to the ratings by the playoff structure. The nature of postseason series serves to exaggerate the difference in team performance observed during the series. Take the Astros/Indians ALDS. The Astros dominated the Indians over three games, which certainly provides additional evidence vis-a-vis the relative strength of the two teams. Based on regular season performance, Houston looked like a superior club (175 to 113 CTR, implying that they should win 61% of their games against the Indians), and the sweep provided additional evidence. However, because the series terminated after three games, all won by the Astros, it overstated the difference. If the series were played out to completion (assuming you can imagine a non-farcical way in which this could be done), we would expect to see Cleveland pull out a win, and even adding 1-4 and 4-1 to these two teams’ ratings would decrease the CTR gap between the two (although still correctly increasing it compared to considering only the regular season).

This is one of those concepts that seems very clear to me when I think about it, and yet is extremely difficult to communicate in a coherent manner, so let me simply assert it: I think bias is present when the number of observations depends on the outcomes of the previous observations (as in a playoff series), and that it is not present when the number of observations is independent of those outcomes (as in a regular season in which all teams play 162 games regardless of whether they are mathematically eliminated in August).

Still, I think it’s worth looking at CTRs including the post-season; I only will present these for actual wins and losses, but of course if you were so inclined you could base them on estimated winning percentages as well:

Here is a comparison of CTR including postseason (pCTR) to the regular season-only version, and the difference between the two:

I’ve been figuring CTRs since 2011 and playoff-inclusive versions since 2013, so Boston’s rating in both stood out when I saw it. I thought it might be interesting to look at the leader in each category each season. The 2018 Red Sox are the highest-rated team of the past eight seasons by a large margin (of course, such ratings do nothing to estimate any kind of underlying differences in MLB-wide quality between seasons):

I didn’t realize that the 2017 Dodgers were the previous leaders; I would have guessed it was the 2016 Cubs, although they would be much farther down the list. It is also worth noting that this year’s Astros would have been ranked #1 of the period were it not for the Red Sox. Boston’s sixteen point improvement including the playoffs was easily the best for any team that had been ranked #1, and that does make sense intuitively: 3-1 over the #3 ranked (regular season) Yankees, 4-1 over #2 ranked Houston, and 4-1 over the #9 Dodgers is one impressive playoff showing.

## Monday, December 10, 2018

### Hitting by Position -- 2018

Of all the annual repeat posts I write, this is the one which most interests me--I have always been fascinated by patterns of offensive production by fielding position, particularly trends over baseball history and cases in which teams have unusual distributions of offense by position. I also contend that offensive positional adjustments, when carefully crafted and appropriately applied, remain a viable and somewhat more objective competitor to the defensive positional adjustments often in use, although this post does not really address those broad philosophical questions.

The first obvious thing to look at is the positional totals for 2018, with the data coming from Baseball-Reference.com. “MLB” is the overall total for MLB, which is not the same as the sum of all the positions here, as pinch-hitters and runners are not included in those. “POS” is the MLB totals minus the pitcher totals, yielding the composite performance by non-pitchers. “PADJ” is the position adjustment, which is the position RG divided by the overall major league average (this is a departure from past posts; I’ll discuss this a little at the end). “LPADJ” is the long-term positional adjustment that I use, based on 2002-2011 data. The rows “79” and “3D” are the combined corner outfield and 1B/DH totals, respectively:

An annual review of this data is problematic because it can lead to seeing trends where there are actually just blips. Two years ago second basemen smashed their way to unprecedented heights; this year they were right back at their long-term average. In 2017, DHs were 4% worse than the league average -- in 2018 they were part of what one could call the left side convergence of the defensive spectrum, as DH, 1B, LF, RF, and 3B all basically hit the same. Shortstops were above league average, which is of note, while catchers and center fielders also ended up right at their normal levels (yes, I really should update my “long-term” positional adjustments; I promise to do that for next year).

Moving on to looking at more granular levels of performance, I always start by looking at the NL pitching staffs and their RAA. I need to stress that the runs created method I’m using here does not take into account sacrifices, which usually is not a big deal but can be significant for pitchers. Note that all team figures from this point forward in the post are park-adjusted. The RAA figures for each position are baselined against the overall major league average RG for the position, except for left field and right field which are pooled.

While positions relative to the league bounce around each year, it seems that the most predictable thing about this post is that the difference between the best and worst NL pitching staffs will be about twenty runs at the plate. Sixteen is a narrower spread than typical, but pitchers also fell to an all-time low -5 positional adjustment.

I don’t run a full chart of the leading positions since you will very easily be able to go down the list and identify the individual primarily responsible for the team’s performance and you won’t be shocked by any of them, but the teams with the highest RAA at each spot were:

C--LA, 1B--LA, 2B--HOU, 3B--CLE, SS--BAL, LF--MIL, CF--LAA, RF--BOS, DH--NYA

I don’t know about “shocked”, but I was surprised to see that Baltimore had the most productive shortstops. Not that I didn’t know that Manny Machado had a great “first half” of the season for the O’s, but I was surprised that whoever they threw out there for the rest of the year didn’t drag their overall numbers down further. In fact Tim Beckham and Jonathan Villar were perfectly cromulent (offensively at least, although Machado wasn’t lighting up any defensive metrics himself) and Baltimore finished two RAA ahead of the Red Sox (Bogaerts) and Indians (Lindor), and three runs ahead of the Rockies (Story).

More interesting are the worst performing positions; the player listed is the one who started the most games at that position for the team:

Boston’s catchers weren’t the worst relative to their position, but they were the worst hitting regular position in MLB on a rate basis; teams can certainly overcome a single dreadful position, but they usually don’t do it to the tune of 108 wins. The most pathetic position was definitely the Orioles’ first basemen, with a healthy lead for lowest RAA thanks to having the fifth-worst RG of any regular position (only Red Sox catchers, Tigers second basemen, Diamondback catchers, and their own catchers were worse).

I like to attempt to measure each team’s offensive profile by position relative to a typical profile. I’ve found it frustrating as a fan when my team’s offensive production has come disproportionately from “defensive” positions rather than offensive positions (“Why can’t we just find a corner outfielder who can hit?”) The best way I’ve yet been able to come up with to measure this is to look at the correlation between RG at each position and the long-term positional adjustment. A positive correlation indicates a “traditional” distribution of offense by position--more production from the positions on the right side of the defensive spectrum. (To calculate this, I use the long-term positional adjustments that pool 1B/DH as well as LF/RF, and because of the DH I split it out by league.) There is no value judgment here--runs are runs whether they are created by first basemen or shortstops:

We’ve already seen that Milwaukee’s shortstops were the least productive in the majors and their left fielders the most productive, which helps explain their high correlation. Baltimore’s low correlation likewise makes sense as they had the least productive first basemen and the most productive shortstops.
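The profile measure described above is plain Pearson correlation between a team's RG by position and the long-term adjustments; a sketch with invented numbers (the LPADJ values here are illustrative, not my actual long-term adjustments):

```python
def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical positional adjustments and one team's RG at those positions:
lpadj = [0.85, 0.95, 0.90, 1.00, 0.88, 1.05, 1.00, 1.10]
team_rg = [3.9, 4.4, 4.0, 4.7, 4.2, 5.1, 4.8, 5.3]

corr = pearson(lpadj, team_rg)
print(corr > 0)  # True: a "traditional" profile for this made-up team
```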

The following tables, broken out by division, display RAA for each position, with teams sorted by the sum of positional RAA. Positions with negative RAA are in red, and positions that are +/-20 RAA are bolded:

Boston’s monstrously productive outfield easily led the majors in RAA (as did their corner outfielders), but the catchers dragged them down just behind New York. Baltimore was below-average at every position except shortstop, so after they dealt Machado it was really ugly. They were the worst in MLB at both the infield and outfield corners.

Cleveland’s offense had issues all over the place, but a pair of MVP candidates on the infield can cover that up, at least in the AL Central where every other team was below average. Chicago had the majors’ worst outfield RAA while Detroit was last in the AL for middle infielders.

For a second straight season, Houston’s infield and middle infield were tops in the AL despite injuries slowing down Altuve and Correa. Oakland’s Matts led the majors in corner infield RAA. Los Angeles had the majors’ least productive infield, which for all its badness still couldn’t cancel out the Trout-led center fielders.

Washington led the NL in corner outfield RAA. New York was last in the NL in corner infield RAA. Last year Miami led the majors in outfield RAA; this year they trailed. This comes as no surprise of course, but is still worthy of a sad note before chuckling at the dark, monochromatic threads they will wear in 2019.

This division was nearly the opposite of the AL Central, as every team had a positive RAA with room to spare. Chicago led the majors in middle infield RAA; Milwaukee was the worst in the same category, but covered it over with the NL’s most productive outfield. I will admit to being as confused by their trade deadline maneuverings as the next guy, but when you see so starkly how little they were getting out of their middle infield, the shakeup makes more sense. Of course one of the chief regular season offenders, Orlando Arcia, raked in the playoffs.

Los Angeles’ corner infielders led the majors in RAA, and dragged their middle infielders (well, really just the second basemen) to the best total infield mark as well. The rest of the division was below average, and Colorado’s corner outfielders were last in the NL which should provide a juicy career opportunity for someone.

The full spreadsheet with data is available here.

## Monday, November 19, 2018

### Leadoff Hitters, 2018

I will try to make this as clear as possible: the statistics are based on the players that hit in the #1 slot in the batting order, whether they were actually leading off an inning or not. It includes the performance of all players who batted in that spot, including substitutes like pinch-hitters.

Listed in parentheses after a team are all players that started in twenty or more games in the leadoff slot--while you may see a listing like "HOU (Springer)" this does not mean that the statistic is based solely on Springer’s performance; it is the total of all Houston batters in the #1 spot, of which Springer was the only one to start in that spot in twenty or more games. I will list the top and bottom three teams in each category (plus the top/bottom team from each league if they don't make the ML top/bottom three); complete data is available in a spreadsheet linked at the end of the article. There are also no park factors applied anywhere in this article.

That's as clear as I can make it, and I hope it will suffice. I always feel obligated to point out that as a sabermetrician, I think that the importance of the batting order is often overstated, and that the best leadoff hitters would generally be the best cleanup hitters, the best #9 hitters, etc. However, since the leadoff spot gets a lot of attention, and teams pay particular attention to the spot, it is instructive to look at how each team fared there.

The conventional wisdom is that the primary job of the leadoff hitter is to get on base, and most simply, score runs. It should go without saying on this blog that runs scored are heavily dependent on the performance of one’s teammates, but when writing on the internet it’s usually best to assume nothing. So let's start by looking at runs scored per 25.5 outs (AB - H + CS):

1. BOS (Betts/Benintendi), 9.0

2. STL (Carpenter/Pham), 7.1

3. NYA (Gardner/Hicks/McCutchen), 6.8

Leadoff average, 5.4

ML average, 4.4

28. SF (Hernandez/McCutchen/Blanco/Panik), 4.1

29. SD (Jankowski/Margot), 4.1

30. BAL (Mancini/Mullins/Beckham), 4.0

In the years I’ve been writing this post, I’m not sure I’ve seen the same player show up as a member of both a leading team and a trailing team, but there is Andrew McCutchen, part-time leadoff hitter for both the group that scored runs at the third-highest clip and the one at the third-lowest. Leading off just 28 times for the Giants and 21 times for the Yankees, he wasn’t the driving force behind either performance.
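The rate used for this leaderboard can be sketched as:

```python
def runs_per_25_5_outs(r, ab, h, cs):
    """Runs scored per 25.5 outs, approximating outs as AB - H + CS."""
    return r * 25.5 / (ab - h + cs)

# Illustrative season line (not a real team's leadoff totals):
print(round(runs_per_25_5_outs(110, 700, 190, 5), 1))  # 5.4
```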

The most basic team independent category that we could look at is OBA (figured as (H + W + HB)/(AB + W + HB)):

1. BOS (Betts/Benintendi), .421

2. CHN (Almora/Rizzo/Murphy/Zobrist), .367

3. KC (Merrifield/Jay), .365

Leadoff average, .335

ML average, .320

28. BAL (Mancini/Mullins/Beckham), .297

29. DET (Martin/Candelario), .296

30. SF (Hernandez/McCutchen/Blanco/Panik), .294

I’m still lamenting the loss of “Esky Magic” as a punchline for every leaderboard in this post, even though it’s been two years since the Royals leadoff spot was making outs in bunches thanks to their magical shortstop. Luckily Whit Merrifield gives off scrappy player vibes that media narrative makers can get behind...well, could if anyone still cared about the Royals.

The next statistic is what I call Runners On Base Average. The genesis for ROBA is the A factor of Base Runs. It measures the number of times a batter reaches base per PA--excluding homers, since a batter that hits a home run never actually runs the bases. It also subtracts caught stealing here because the BsR version I often use does as well, but BsR versions based on initial baserunners rather than final baserunners do not. Here ROBA = (H + W + HB - HR - CS)/(AB + W + HB).

This metric has caused some confusion, so I’ll expound. ROBA, like several other methods that follow, is not really a quality metric, it is a descriptive metric. A high ROBA is a good thing, but it's not necessarily better than a slightly lower ROBA plus a higher home run rate (which would produce a higher OBA and more runs). Listing ROBA is not in any way, shape or form a statement that hitting home runs is bad for a leadoff hitter. It is simply a recognition of the fact that a batter that hits a home run is not a baserunner. Base Runs is an excellent model of offense and ROBA is one of its components, and thus it holds some interest in describing how a team scored its runs. As such it is more a measure of shape than of quality:

1. BOS (Betts/Benintendi), .360

2. KC (Merrifield/Jay), .338

3. CHN (Almora/Rizzo/Murphy/Zobrist), .334

Leadoff average, .298

ML average, .285

28. BAL (Mancini/Mullins/Beckham), .264

29. SF (Hernandez/McCutchen/Blanco/Panik), .259

30. LAA (Calhoun/Kinsler/Cozart), .257

The Angels are the only change from the top/bottom three on the OBA list; they were fourth-last in OBA at .298, but their 26 homers ranked eighth and drove their ROBA down to the bottom.

I also include what I’ve called Literal OBA--this is just ROBA with HR subtracted from the denominator so that a homer does not lower LOBA, it simply has no effect. It “literally” (not really, thanks to errors, being thrown out stretching, caught stealing after subsequent plate appearances, etc.) is the proportion of plate appearances in which the batter becomes a baserunner able to be advanced by his teammates. You don't really need ROBA and LOBA (or either, for that matter), but this might save some poor message board out there twenty posts by not implying that I think home runs are bad. LOBA = (H + W + HB - HR - CS)/(AB + W + HB - HR):

1. BOS (Betts/Benintendi), .379

2. KC (Merrifield/Jay), .343

3. CHN (Almora/Rizzo/Murphy/Zobrist), .343

Leadoff average, .306

ML average, .294

28. BAL (Mancini/Mullins/Beckham), .271

29. LAA (Calhoun/Kinsler/Cozart), .267

30. SF (Hernandez/McCutchen/Blanco/Panik), .266
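The ROBA and LOBA formulas above can be sketched directly; the sample stat line below is invented for illustration, not a real team's totals:

```python
def roba(h, w, hb, hr, cs, ab):
    """Runners On Base Average: times reaching base, less HR and CS, per PA."""
    return (h + w + hb - hr - cs) / (ab + w + hb)

def loba(h, w, hb, hr, cs, ab):
    """Literal OBA: same numerator, but HR also leaves the denominator,
    so a home run has no effect instead of lowering the rate."""
    return (h + w + hb - hr - cs) / (ab + w + hb - hr)

# An invented 600-AB season with 30 homers: LOBA exceeds ROBA because
# the homers shrink only LOBA's denominator.
line = dict(h=160, w=70, hb=5, hr=30, cs=8, ab=600)
print(round(roba(**line), 3), round(loba(**line), 3))
```

For a hitter with zero homers the two rates are identical; the gap grows with home run rate, which is exactly the shape information the pair conveys.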

The next two categories are most definitely categories of shape, not value. The first is the ratio of runs scored to RBI. Leadoff hitters as a group score many more runs than they drive in, partly due to their skills and partly due to lineup dynamics. Those with low ratios don’t fit the traditional leadoff profile as closely as those with high ratios (at least in the way their seasons played out, and of course using R and RBI incorporates the quality and style of the hitters in the adjacent lineup spots rather than attributes of the leadoff hitters’ performance in isolation):

1. SEA (Gordon/Haniger), 2.0

2. MIL (Cain/Thames), 1.9

3. MIA (Dietrich/Castro/Ortega), 1.9

Leadoff average, 1.6

28. WAS (Eaton/Turner), 1.3

29. ATL (Acuna/Inciarte/Albies), 1.3

30. TOR (Granderson/McKinney), 1.2

ML average, 1.0

I don’t know about you, but if you’d told me that leadoff spots led by Cain/Thames and Eaton/Turner would both appear as extreme on this list, I would have guessed that the former would be the one tilted to RBI.

A similar gauge, but one that doesn't rely on the teammate-dependent R and RBI totals, is Bill James' Run Element Ratio. RER was described by James as the ratio between those things that were especially helpful at the beginning of an inning (walks and stolen bases) to those that were especially helpful at the end of an inning (extra bases). It is a ratio of "setup" events to "cleanup" events. Singles aren't included because they often function in both roles.

Of course, there are RBI walks and doubles are a great way to start an inning, but RER classifies events based on when they have the highest relative value, at least from a simple analysis:

1. PHI (Hernandez), 1.4

2. KC (Merrifield/Jay), 1.3

3. TB (Smith/Kiermaier/Span), 1.1

Leadoff average, .8

ML average, .7

28. LA (Taylor/Pederson), .6

29. PIT (Frazier/Harrison/Dickerson), .5

30. TOR (Granderson/McKinney), .5

I should note that in the context-neutral RER, the two teams with seemingly backwards placement on the R/RBI list are closer to where you’d expect--the Nats were sixth at 1.0, while the Brewers were still on the high side but much closer to average (ranking tenth at .9).
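As a sketch, RER in its usual formulation is walks plus steals over extra bases (total bases minus hits); that exact formula is my assumption of the standard version, and the two sample lines are invented to show the contrast:

```python
def run_element_ratio(w, sb, tb, h):
    """Run Element Ratio as usually given: 'setup' events (walks, steals)
    divided by 'cleanup' events (extra bases = total bases minus hits)."""
    return (w + sb) / (tb - h)

# Invented lines: a walk-and-steal profile versus a slugging profile.
print(round(run_element_ratio(w=80, sb=40, tb=220, h=170), 2))  # setup-heavy
print(round(run_element_ratio(w=50, sb=5, tb=320, h=160), 2))   # cleanup-heavy
```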

Since stealing bases is part of the traditional skill set for a leadoff hitter, I've included the ranking for what some analysts call net steals, SB - 2*CS. I'm not going to worry about the precise breakeven rate, which is probably closer to 75% than 67%, but is also variable based on situation. The ML and leadoff averages in this case are per team lineup slot:

1. WAS (Eaton/Turner), 26

2. KC (Merrifield/Jay), 25

3. TB (Smith/Kiermaier/Span), 17

Leadoff average, 5

ML average, 2

27. BAL (Mancini/Mullins/Beckham), -6

27. CHN (Almora/Rizzo/Murphy/Zobrist), -6

27. LA (Taylor/Pederson), -6

30. CIN (Peraza/Schebler/Winker/Hamilton), -9
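The SB - 2*CS weighting above implies the ~67% breakeven just mentioned: net steals go positive exactly when the success rate clears two-thirds. A quick sketch with invented totals:

```python
def net_steals(sb, cs):
    """Net steals as defined in the text: each CS cancels two SB."""
    return sb - 2 * cs

def success_rate(sb, cs):
    return sb / (sb + cs)

# sb - 2*cs > 0  <=>  sb > 2*cs  <=>  sb/(sb+cs) > 2/3
print(net_steals(30, 14), round(success_rate(30, 14), 3))  # just above breakeven
print(net_steals(20, 11), round(success_rate(20, 11), 3))  # just below
```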

Shifting back to quality measures, first up is one that David Smyth proposed when I first wrote this annual leadoff review. Since the optimal weight for OBA in an x*OBA + SLG metric is generally something like 1.7, David suggested figuring 2*OBA + SLG for leadoff hitters, as a way to give a little extra boost to OBA without distorting things too much or suffering an accuracy decline relative to standard OPS. Since this is a unitless measure anyway, I multiply it by .7 to approximate the standard OPS scale and call it 2OPS:

1. BOS (Betts/Benintendi), 1017

2. CLE (Lindor), 847

3. STL (Carpenter/Pham), 842

Leadoff average, 762

ML average, 735

28. SF (Hernandez/McCutchen/Blanco/Panik), 669

29. DET (Martin/Candelario), 652

30. BAL (Mancini/Mullins/Beckham), 649
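The 2OPS formula above can be sketched directly; the list figures are on a times-1000 scale, and the .330 OBA/.420 SLG line below is an invented, roughly average leadoff profile:

```python
def two_ops(oba, slg):
    """2OPS: double-weight OBA, then scale by .7 toward the familiar OPS range."""
    return 0.7 * (2 * oba + slg)

# Times 1000 to match the list's scale; this invented profile lands in the
# mid-700s, near the stated leadoff average of 762.
print(round(1000 * two_ops(0.330, 0.420)))
```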

Along the same lines, one can also evaluate leadoff hitters in the same way I'd go about evaluating any hitter, and just use Runs Created per Game with standard weights (this will include SB and CS, which are ignored by 2OPS):

1. BOS (Betts/Benintendi), 8.7

2. CLE (Lindor), 5.9

3. STL (Carpenter/Pham), 5.7

Leadoff average, 4.7

ML average, 4.3

28. SF (Hernandez/McCutchen/Blanco/Panik), 3.5

29. DET (Martin/Candelario), 3.3

30. BAL (Mancini/Mullins/Beckham), 3.1

Seeing the same six extreme teams in perfect order overstates the correlation, but naturally there is a very strong relationship between the last two metrics. The biggest difference in any team’s ranks in the two was four spots.
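The RG figures above use my standard weights; as a rough stand-in, here is a sketch using Bill James' basic Runs Created (stolen base version) scaled per 25.5 outs, both of which are assumptions rather than the exact method behind the list. The stat line is invented:

```python
def rc_basic(h, w, tb, ab, sb=0, cs=0):
    """Bill James' basic Runs Created, stolen base version (an assumption
    here, not the exact weights behind the list above):
    (H + W - CS) * (TB + .55*SB) / (AB + W)."""
    return (h + w - cs) * (tb + 0.55 * sb) / (ab + w)

def rg(h, w, tb, ab, sb=0, cs=0, outs_per_game=25.5):
    """Runs created per game: scale RC by outs consumed (AB - H + CS)."""
    outs = ab - h + cs
    return rc_basic(h, w, tb, ab, sb, cs) * outs_per_game / outs

# Invented leadoff-spot totals in the neighborhood of the list's averages.
print(round(rg(h=160, w=60, tb=250, ab=600, sb=10, cs=4), 1))
```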

Allow me to close with a crude theoretical measure of linear weights supposing that the player always led off an inning (that is, batted only in the bases-empty, no-outs state). There are weights out there (see The Book) for the leadoff slot in its average situation, but this variation is much easier to calculate (although also based on a silly and impossible premise).

The weights I used were based on the 2010 run expectancy table from Baseball Prospectus. Ideally I would have used multiple seasons, but this is a seat-of-the-pants metric. The 2010 post goes into detail on how this measure is figured; this year, I’ll just tell you that the out coefficient was -.234 and the CS coefficient was -.601, and for other details refer you to that post. I then restate it per the number of PA for an average leadoff spot (748 in 2017):

1. BOS (Betts/Benintendi), 57

2. STL (Carpenter/Pham), 20

3. CLE (Lindor), 17

Leadoff average, 0

ML average, -7

28. SF (Hernandez/McCutchen/Blanco/Panik), -22

29. DET (Martin/Candelario), -24

30. BAL (Mancini/Mullins/Beckham), -27
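A sketch of the calculation behind the list above: only the out (-.234) and CS (-.601) coefficients are given in the text, so the positive event weights below are hypothetical placeholders, not the values from the 2010 post, and the event counts are likewise invented.

```python
# Only the out and CS weights come from the text; the rest are placeholders.
WEIGHTS = {"out": -0.234, "cs": -0.601,           # from the text
           "1b": 0.46, "2b": 0.75, "3b": 1.02,    # hypothetical placeholders
           "hr": 1.40, "w": 0.32, "sb": 0.18}     # hypothetical placeholders

def leadoff_lw(events, pa, avg_leadoff_pa=748):
    """Sum weighted events, then restate per an average leadoff spot's PA."""
    raw = sum(WEIGHTS[e] * n for e, n in events.items())
    return raw * avg_leadoff_pa / pa

# An invented season of bases-empty, no-outs plate appearances.
events = {"out": 440, "1b": 110, "2b": 30, "3b": 3,
          "hr": 20, "w": 65, "sb": 15, "cs": 6}
print(round(leadoff_lw(events, pa=690)))
```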

Boston completely dominated the quality metrics for leadoff hitters in 2018, due mostly of course to the superlative season of Mookie Betts, who was the second-best offensive player in the game in 2018. Put one of the top hitters in the leadoff spot and you can expect to lead in a lot of categories - BOS led easily not just in categories that reflect quality without any shape distortion (not necessarily context-free) like R/G, OBA, 2OPS, RG, and LE, but also in ROBA and LOBA, which are designed not to measure value but rather the rate at which leadoff hitters reach base for their teammates to drive in. What’s more, it’s not as if Boston achieved this by having a really good OBA out of the leadoff spot without a lot of power - the Red Sox and Cardinals tied for the ML lead with 38 homers out of the leadoff spot. The Indians ranked third with 37, and those three teams ranked 1-2-3 in all of the overall quality measures. One can argue about optimal lineup construction, but in 2018, leadoff hitters hit dingers like everyone else. Every team was in double digits in homers out of the leadoff spot; in 2017, two teams fell short of double digits, and one of them hit just three homers.

The spreadsheet with full data is available here.