Saturday, February 09, 2019

Pitching Optional?

What happens when you take a team that got into the NCAA tournament despite finishing in the middle of the pack in its conference and relying on a makeshift pitching staff and remove the few reliable pitchers while leaving much of the offense intact? Does this sound interesting to you, like an experiment cooked up in the lab of a mad sabermetrician (or more likely a resident of that state up north)? If so, you may be interested in the 2019 Buckeyes.

In the ninth season of the seemingly never-ending Greg Beals regime, he once again has an entire unit with next to no returning experience. Sometimes this is unavoidable in college sports, but it happens to Beals with regularity as player development does not appear to be a strong suit of the program. Players typically either make an impact as true freshmen or are never heard from, while JUCO transfers are a roster staple to paper over the holes. The only difference with this year’s pitching situation is that holes are largely being plugged with freshmen rather than transfers.

The three pitchers penciled in as the rotation have precious little experience, with two true freshmen and a junior with 24 appearances and 11 starts in his career. Lefty Seth Lonsway was a nineteenth-round pick of Cincinnati and will be joined by classmate Garrett Burhenn, with Jake Vance as the junior veteran. Vance was +3 RAA in 36 innings last year, which doesn’t sound like much until you consider the dearth of returning performers on the rest of the staff.

Midweek starts and long relief could fall to sophomore lefty Griffan Smith, who was not effective as a freshman (-7 RAA in 32 innings). The other veteran relievers are junior Andrew Magno (sidelined much of last season with an injury, but Beals loves his lefty specialists so if healthy he will see the mound) and senior sidearmer Thomas Waning, who was promising as a sophomore but coughed up 18 runs in 16 frames in 2018. A trio of freshman righties said to throw 90+ MPH (Bayden Root, TJ Brock, Will Pfenning) will be joined by fellow freshmen Cole Niekamp and lefty Mitch Milheim. Joe Gahm is a junior transfer from Auburn via Chattahoochee Valley Community College and, given his experience and BA ranking as a top 30 Big Ten draft prospect, should find a role. Senior Brady Cherry will also apparently get a chance to pitch this season, something he has yet to do in his Buckeye career.

The Buckeye offense is more settled, and unless the pitchers exceed reasonable expectations will have to carry the team in 2019. Sophomore Dillon Dingler moves in from center field (that’s nothing, as recent OSU catcher Jalen Washington moved to shortstop) to handle the catching duties and was raved about by the coaches last season so big things are expected despite a .244/.325/.369 line. He’ll be backed up by sophomore transfer Brent Todys from Andrew College, with senior Andrew Fishel, junior Sam McClurg and freshman Mitchell Smith rounding out the roster.

First base will belong to junior Conner Pohl after he switched corners midway through 2018; he also played the keystone as a freshman so he’s been all over the infield. While his production was underwhelming for first base, at 3 RAA he was a contributor last season and looks like a player who should add power as he matures. Senior Kobie Foppe got off to a slow start last year, flipped from shortstop to second base, and became an ideal leadoff man (.335/.432/.385); even with some BABIP regression he should be solid. Third base will go to true freshman Zach Dezenzo, while junior shortstop Noah West needs to add something besides walks to his offensive game (.223/.353/.292). The main infield backups are freshman Nick Erwin at short, sophomore Scottie Seymour and freshmen Aaron Hughes and Marcus Ernst at the corners, and junior Matt Carpenter everywhere just like his MLB namesake (albeit without the offensive ability).

I’ll describe the outfield backwards from right to left, since junior right fielder Dominic Canzone is the team’s best offensive player (.323/.396/.447 which was a step back from his freshman campaign) and will be penciled in as the #3 hitter. The other two spots are not as settled as one would hope given the imperative of productive offense for this team. A pair of seniors will battle for center: Malik Jones did nothing at the plate as a JUCO transfer last year besides draw walks (.245/.383/.286 in 63 PA) while Ridge Winand has barely seen the field. In left, senior Nate Romans has served as a utility man previously, although he did contribute in 93 PA last year (.236/.360/.431). Senior Brady Cherry completes his bounce around the diamond which has included starting at third and second; in 2018 he hit just .226/.321/.365, a step back from 2017. While he could get time in left, it’s more likely he’ll DH since the plan is to use him out of the bullpen as well. Other outfield backups are freshman Nolan Clegg in the corners and Alec Taylor in center.

OSU opens the season this weekend with an odd three-game series against Seton Hall in Pt. Charlotte, Florida. It is the start of a very lackluster non-conference schedule that doesn’t figure to help the Buckeyes’ cause come tournament time as the schedule did last year (although unfortunately as you can probably tell I tend to think the resume will be beyond help). There are no games against marquee names, although OSU will play MSU in a rare non-conference Big Ten matchup. The home schedule opens March 15 with a three-game series against Lipscomb, a one-off with Northern Kentucky, and a four-game series against Hawaii, whose players will probably be wondering what they did to wind up in Columbus in mid-March when they could be home.

Big Ten play opens March 29 at Rutgers, with the successive weekends home to Northwestern and the forces of darkness, at Maryland, home to Iowa, at Minnesota, home to PSU, and at Purdue. Midweek opponents are the typical fare of local nines, including Toledo, Cincinnati, Ohio University (away), Dayton, Xavier, Miami (away), Wright State, and Youngstown State (away). The Big Ten tournament will be played May 22-26 in Omaha.

It’s hard to be particularly optimistic that another surprise trip to the NCAA tournament is in the cards. Even some of the best pitchers who have come through OSU have struggled as freshmen so it’s hard to project the starting pitching to be good, and while there are productive returnees at multiple positions, only Canzone is a proven excellent hitter and a couple positions are occupied by players who must make serious improvement to be average. The non-conference schedule may be soft enough to keep the record respectable, but there are few opportunities to grab wins that will help come selection time. Aspiring to qualify for the Big Ten tournament seems a more realistic goal. Beals is the longest-tenured active coach at OSU in any of the four sports that I follow rabidly, which on multiple levels is concerning (although two of the three other programs have coaches in place who have demonstrated their value at OSU, and the third did well in a three-game trial). Yet somehow Beals marches on, floating aimlessly in the middle of an improved Big Ten.

Note: This preview is always a combination of my own knowledge and observation along with the official season outlook released by the program, especially as pertains to position changes and newcomers about which I have next to no direct knowledge. That reliance was even greater this year due to the turnover on the mound.

Monday, February 04, 2019

Enby Distribution, pt. 9: Cigol at the Extremes--Pythagenpat Exponent

In the last installment, I explored using the Cigol dataset to estimate the Pythagorean exponent. Alternatively, we could sidestep the attempt to estimate the exponent and try to directly estimate the z parameter in the Pythagenpat equation x = RPG^z.

The positives of this approach include avoiding the scalar multipliers that move the estimator away from a result of 1 at 1 RPG, and maintaining a form that has been found useful by sabermetricians in the last decade or so. The latter is also the biggest drawback to this approach--it assumes that the form x = RPG^z is correct, and forgoes the opportunity of finding a form that provides a better fit, particularly with extreme datapoints. It’s also fair to question my objectivity in this matter, given that a plausible case could be made that I have a vested interest in “re-proving” the usefulness of Pythagenpat. That’s not my intent, but I would be remiss in not raising the possibility of my own (unintentional) bias influencing this discussion.

Given that we know the Pythagorean exponent x as calculated in the last post, it is quite simple to compute the corresponding z value:

z = log(x)/log(RPG)
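
As a minimal sketch of the relationship above (the function names are my own for illustration, not anything from the Cigol spreadsheet), the exponent, the resulting W%, and the inversion back to z can be written as:

```python
import math

def pythagenpat_exponent(r, ra, z):
    """Pythagenpat exponent x = RPG^z, where RPG = R/G + RA/G."""
    return (r + ra) ** z

def pythagorean_wpct(r, ra, x):
    """Pythagorean W% = R^x / (R^x + RA^x) for a given exponent x."""
    return r ** x / (r ** x + ra ** x)

def implied_z(x, rpg):
    """Invert x = RPG^z to get z = log(x)/log(RPG); undefined at RPG = 1."""
    if rpg <= 0 or rpg == 1:
        raise ValueError("z is undefined when RPG <= 0 or RPG == 1")
    return math.log(x) / math.log(rpg)

# Round trip for a hypothetical team scoring 5 and allowing 4 per game.
x = pythagenpat_exponent(5.0, 4.0, 0.2867)
print(round(x, 4), round(pythagorean_wpct(5.0, 4.0, x), 4), round(implied_z(x, 9.0), 4))
```

The guard at RPG = 1 is there because log(1) = 0; at that point x = 1 for any z, so z cannot be recovered.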

For the full dataset I’ve used throughout these posts, a plot of z against RPG looks like this:



A quick glance suggests that it may be difficult to fit a clean function to this plot, as there is no clear relationship between RPG and z. It appears that in the 15-20 RPG range, there are a number of R/RA pairs for which a higher z is necessary than for the pairs at 20-30 RPG. While I have no particular reason to believe that the z value should necessarily increase as RPG increases, I have strong reason to doubt that the dataset I’ve put together allows us to conclude otherwise. Based on the way the pairs were chosen, extreme quality differences are overrepresented in this range. For example, there are pairs in which a team scores 14 runs per game and allows only 3. The more extreme high RPG levels are only reached when both teams are extremely high scoring; the most extreme difference captured in my dataset at 25 RPG is 15 R/10 RA.

The best fit to this graph comes from a quadratic regression equation, but the negative coefficient for RPG^2 (the equation is z = -.0002*RPG^2 + .0062*RPG + .2392) makes it unpalatable from a theoretical perspective. The apparent quadratic shape may well be an accident of the data points used as described in the preceding paragraph. Power and logarithmic functions fail to produce the upward slope from 5-10 RPG, as does a linear equation. The latter has a very low r^2 (just .022) but results in an aesthetically pleasing gently increasing exponent as RPG increases (z = .2803 + .00025*RPG). The slope is so gentle as to result in no meaningful difference when applying the equation to actual major league teams, leaving it as useless as the r^2 suggests it would be (RMSE of 4.008 for 1961-2014, with the same result if using the z value based on plugging in the average RPG of 8.805 for that period).

It’s tempting to assume that z is higher in cases in which there is a large difference in runs scored and runs allowed. This could potentially be represented in an equation by run differential or run ratio, and such a construct would not be without sabermetric precedent, as other win estimators have been proposed that explicitly consider the discrepancy between the two teams (explicitly as in beyond the obvious truth that as you score more runs than you allow, you will win more games). See the discussion of Tango’s old win estimator in part 7.

First, let’s take a quick peek at the z versus RPG plot we’d get for the limited dataset I’ve used throughout the series (W%s between .3 and .7 with R/G and RA/G between 3 and 7):



The relationship here is more in line with what we might have expected--z levels out as RPG increases, but there is no indication that z decreases with RPG (which, assuming my reasoning above is correct, reflects the fact that the teams in this dataset are much more realistic and matched in quality than are the oddballs in the full dataset). Again, the best fit comes from a quadratic regression, but the negative coefficient for RPG^2 is disqualifying. A logarithmic equation fits fairly well (r^2 = .884), but again fails to capture the behavior at lower levels of RPG, which is not as damaging to the fit here because of the more limited dataset. The logarithmic equation is z = .2484 + .0132*ln(RPG), but this produces a worse RMSE with the 1961-2014 teams (4.012) than simply using a fixed z.

Returning to the full dataset, what happens if we run a regression that includes abs(R - RA) as a variable alongside RPG? We get this equation for z:

z = .26846 + .00025*RPG + .00246*abs(R - RA)

This is interesting as it is the same slope for RPG as seen in the equation that did not include abs(RD), but the intercept is much lower, which means that for average (R = RA) teams, the estimated z will be lower. This equation implies that differences between a team and its opponents really drive the behavior of z in the data.

Applying this equation to the 1961-2014 data fails to improve RMSE, raising it to 4.018. So while this may be a nice idea and seem to fit the theoretical data better, it is not particularly useful in reality. I also tried a form with an RPG^2 coefficient as well (and for some reason liked it when initially sketching out this series), but the negative RPG^2 coefficient dooms the equation to theoretical failure (and with a 4.017 RMSE it does little better with empirical data):

z = .24689 - .00011*RPG^2 + .00378*RPG + .00183*abs(R - RA)

One last idea I tried was using (R - RA)^2 as a variable rather than abs(R - RA). Squaring run differential eliminates any issue with negative numbers, and perhaps it is extreme quality imbalances that really drive the behavior of z. Alas, an RMSE of 4.014 is only slightly better than the others:

z = .27348 + .00025*RPG + .00020*(R - RA)^2
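
To see how little these variants move the estimate for normal teams, here is a rough sketch that codes up the regression equations quoted in this post and applies them to a few R/RA pairs. The coefficients come from the text; the function names and the sample pairs are just illustrative choices.

```python
def z_linear(r, ra):
    """z = .2803 + .00025*RPG"""
    return 0.2803 + 0.00025 * (r + ra)

def z_abs_diff(r, ra):
    """z = .26846 + .00025*RPG + .00246*abs(R - RA)"""
    return 0.26846 + 0.00025 * (r + ra) + 0.00246 * abs(r - ra)

def z_quadratic(r, ra):
    """z = .24689 - .00011*RPG^2 + .00378*RPG + .00183*abs(R - RA)"""
    rpg = r + ra
    return 0.24689 - 0.00011 * rpg ** 2 + 0.00378 * rpg + 0.00183 * abs(r - ra)

def z_sq_diff(r, ra):
    """z = .27348 + .00025*RPG + .00020*(R - RA)^2"""
    return 0.27348 + 0.00025 * (r + ra) + 0.00020 * (r - ra) ** 2

def wpct(r, ra, z):
    """Pythagenpat W% with exponent x = RPG^z."""
    x = (r + ra) ** z
    return r ** x / (r ** x + ra ** x)

# Illustrative pairs: an average team, a good team, and an extreme mismatch.
for r, ra in [(4.5, 4.5), (5.0, 4.0), (14.0, 3.0)]:
    for f in (z_linear, z_abs_diff, z_quadratic, z_sq_diff):
        z = f(r, ra)
        print(f"{r:>4}/{ra:<4} {f.__name__:<12} z={z:.4f} W%={wpct(r, ra, z):.4f}")
```

For an R = RA team the abs(R - RA) version gives a lower z than the RPG-only line, which is the intercept effect described above; the W% itself is .500 regardless of z in that case.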

If you are curious, using the 1961-2014 team data, the minimum RMSE for Pythagenpat is achieved when z = .2867 (4.0067). The z value that minimized RMSE for the full dataset is .2852. This may be noteworthy in its own right -- a dataset based on major league team seasons and one based on theoretical teams of wildly divergent quality and run environment coming to the same result may be an indication that extreme efforts to refine z may be a fool's errand.
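
For anyone who wants to repeat the search for the RMSE-minimizing z on their own data, a simple grid search is enough. This is only a sketch: the four team lines below are made-up placeholders rather than real seasons, and measuring the error in wins over each team's own games played is my assumption about the RMSE convention.

```python
import math

def pythagenpat_wpct(runs, runs_allowed, games, z):
    """Pythagenpat W% from season totals, with exponent x = RPG^z."""
    x = ((runs + runs_allowed) / games) ** z
    return runs ** x / (runs ** x + runs_allowed ** x)

def rmse_wins(teams, z):
    """RMSE of actual wins vs. Pythagenpat wins over (R, RA, W, L) tuples."""
    total = 0.0
    for runs, runs_allowed, wins, losses in teams:
        games = wins + losses
        expected = games * pythagenpat_wpct(runs, runs_allowed, games, z)
        total += (wins - expected) ** 2
    return math.sqrt(total / len(teams))

def best_z(teams, lo=0.20, hi=0.40, step=0.0001):
    """Return the z on a fixed grid that minimizes rmse_wins."""
    n = int(round((hi - lo) / step))
    candidates = [lo + i * step for i in range(n + 1)]
    return min(candidates, key=lambda z: rmse_wins(teams, z))

# Hypothetical (R, RA, W, L) lines, for illustration only.
toy_teams = [(800, 700, 90, 72), (650, 750, 70, 92),
             (720, 720, 81, 81), (900, 650, 100, 62)]
z_star = best_z(toy_teams)
print(z_star, rmse_wins(toy_teams, z_star))
```

With only a handful of teams the minimum will bounce around; the point of the sketch is the procedure, not the answer it gives on toy inputs.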

You may be wondering why, after an entire series built upon my belief in the importance of equations that work well for theoretical data, I’ve switched in this installment to largely measuring accuracy based on empirical data. My reasoning is as follows: in order for a more complex Pythagenpat equation to be worthwhile, it has to have a material and non-harmful effect in situations in which Pythagenpat is typically used. If no such equation is available (which is admittedly a much higher hurdle to clear than me simply not being able to find a suitable equation in a week or so of messing around with regressions), then it is best to stick with the simple Pythagenpat form. If one a) is really concerned with accuracy in extreme circumstances and b) thinks that Cigol is a decent “gold standard” against which to attempt to develop a shortcut that works in those circumstances, then one should probably just use Cigol and be done with it. Without a meaningful “real world” difference, and as the functions needed become more and more complex, it makes less sense to use any sort of shortcut method rather than just using Cigol.

Thus I will for the moment leave the Pythagenpat z function as a humble constant, and hold Cigol in reserve if I’m ever really curious to make my best guess at what the winning percentage would be for a team that scores 1.07 runs and allows 12.54 runs per game (probably something around .0051).

The “full” dataset I’ve used in the last few posts is available here.

Saturday, January 19, 2019

Run Distribution and W%, 2018

I always start this post by looking at team records in blowout and non-blowout games. I define blowouts as games in which the margin of victory is six runs or more (rather than five, the definition used by Baseball-Reference). I settled on this last year after a Twitter discussion with Tom Tango and a poll that he ran. This definition results in 19.4% of major league games in 2018 being classified as blowouts; using five as the cutoff, it would be 28.0%, and using seven it would be 13.2%. Of course, using one standard ignores a number of factors, like the underlying run environment (the higher the run scoring level, the less impressive a fixed margin of victory) and park effects (which have a similar impact but in a more dramatic way when comparing teams in the same season). For the purposes here, around a fifth of games being blowouts feels right; it’s worth monitoring each season to see if the resulting percentage still makes sense.
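
The classification itself is trivial to reproduce from game scores. A minimal sketch, assuming each game is stored as a (runs for, runs against) pair; the five scores shown are invented purely to exercise the functions:

```python
def blowout_share(games, cutoff=6):
    """Fraction of games decided by `cutoff` or more runs."""
    blowouts = sum(1 for rf, ra in games if abs(rf - ra) >= cutoff)
    return blowouts / len(games)

def split_record(team_games, cutoff=6):
    """Return [wins, losses] in blowouts and in all other games."""
    record = {"blowout": [0, 0], "other": [0, 0]}
    for rf, ra in team_games:
        kind = "blowout" if abs(rf - ra) >= cutoff else "other"
        record[kind][0 if rf > ra else 1] += 1
    return record

# Hypothetical scores from one team's perspective.
toy_games = [(7, 1), (3, 2), (0, 6), (5, 4), (10, 5)]
print(blowout_share(toy_games))     # share at the six-run cutoff
print(blowout_share(toy_games, 5))  # share at a five-run cutoff
print(split_record(toy_games))
```

Making the cutoff a parameter makes it easy to rerun the five, six and seven run comparisons each season.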

Team records in non-blowouts:



With over 80% of major league games being non-blowouts (as we’ll see in a moment, the highest blowout % for any team was 26% for Cleveland), it’s no surprise that all of the playoff teams were above .500 in these games, although the Indians and Dodgers just barely so. The Dodgers compensated in a big way:



There was very little middle ground in blowout games, with just three teams having a W% between .400 and .500. This isn’t too surprising since strong teams usually perform very well in blowouts, and the bifurcated nature of team strength in 2018 has been much discussed. This also shows up when looking at each team’s percentage of blowouts and difference between blowout and non-blowout W%:



A more interesting way to consider game-level data is to look at how teams perform when scoring or allowing a given number of runs. For the majors as a whole, here are the counts of games in which teams scored X runs:



The “marg” column shows the marginal W% for each additional run scored. In 2018, three was the mode of runs scored, while the second run resulted in the largest marginal increase in W%. The distribution is fairly similar to 2017, with the most obvious difference being an increase in W% when scoring exactly one run from .057 to .103; not surprisingly, the proportion of shutouts increased as well, from 5.4% to 6.4%.
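
The W% and marginal columns can be built from a simple tally of games and wins at each runs-scored level. A short sketch follows; the tallies are placeholders I made up, not the 2018 counts.

```python
def wpct_by_runs(table):
    """Map runs scored -> (W%, marginal W% vs. the previous run level).

    `table` maps runs scored -> (games, wins). The marginal value is None
    for the lowest run level, since there is nothing to compare it to.
    """
    result = {}
    previous = None
    for runs in sorted(table):
        games, wins = table[runs]
        w = wins / games
        result[runs] = (w, None if previous is None else w - previous)
        previous = w
    return result

# Hypothetical tallies: runs scored -> (games, wins).
toy_table = {0: (100, 0), 1: (200, 20), 2: (250, 60), 3: (300, 120)}
for runs, (w, marg) in wpct_by_runs(toy_table).items():
    print(runs, round(w, 3), None if marg is None else round(marg, 3))
```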

The major league average dipped from 4.65 to 4.44 runs/game; this is the run distribution anticipated by Enby for that level (actually, 4.45) of R/G for fifteen or fewer runs:



Shutouts ran almost 1% above Enby’s estimate; that stands out in graph form along with Enby’s compensation by over-estimating the frequency of 2 and 3 run games. Still, a zero-modified negative binomial distribution (which is what the distribution I call Enby is) does a decent job:



One way that you can use Enby to examine team performance is to use the team’s actual runs scored/allowed distributions in conjunction with Enby to come up with an offensive or defensive winning percentage. The notion of an offensive winning percentage was first proposed by Bill James as an offensive rate stat that incorporated the win value of runs. An offensive winning percentage is just the estimated winning percentage for an entity based on their runs scored and assuming a league average number of runs allowed. While later sabermetricians have rejected restating individual offensive performance as if the player were his own team, the concept is still sound for evaluating team offense (or, flipping the perspective, team defense).
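
In code, the standard (average-runs) version of the idea looks something like the sketch below. I am assuming Pythagenpat as the win estimator and a z of .29 purely for illustration; James's original formulation used a fixed Pythagorean exponent, and the function names are mine.

```python
def offensive_wpct(team_r_per_game, league_r_per_game, z=0.29):
    """OW%: W% for the team's R/G against a league-average RA/G."""
    x = (team_r_per_game + league_r_per_game) ** z
    r, lg = team_r_per_game ** x, league_r_per_game ** x
    return r / (r + lg)

def defensive_wpct(team_ra_per_game, league_r_per_game, z=0.29):
    """DW%: W% for a league-average R/G against the team's RA/G."""
    x = (team_ra_per_game + league_r_per_game) ** z
    ra, lg = team_ra_per_game ** x, league_r_per_game ** x
    return lg / (lg + ra)

# A hypothetical team in a 4.45 R/G league.
print(round(offensive_wpct(4.90, 4.45), 3))
print(round(defensive_wpct(4.10, 4.45), 3))
```

Park adjustments are left out here; in practice the league-average figure on the other side of the ledger would be adjusted for the team's park.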

In 1986, James sketched out how one could use data regarding the percentage of the time that a team wins when scoring X runs to develop an offensive W% for a team using their run distribution rather than average runs scored as used in his standard OW%. I’ve been applying that concept for as long as I’ve written this annual post, and last year was finally able to implement an Enby-based version. I will point you to last year’s post if you are interested in the details of how this is calculated, but there are two main advantages to using Enby rather than the empirical distribution:

1. While Enby may not perfectly match how runs are distributed in the majors, it sidesteps sample size issues and data oddities that are inherent when using empirical data. Use just one year of data and you will see things like teams that score ten runs winning less frequently than teams that score nine. Use multiple years to try to smooth it out and you will no longer be centered at the scoring level for the season you’re examining.

2. There’s no way to park adjust unless you use a theoretical distribution. These are now park-adjusted by using a different assumed distribution of runs allowed given a league-average RA/G for each team based on their park factor (when calculating OW%; for DW%, the adjustment is to the league-average R/G).

I call these measures Game OW% and Game DW% (gOW% and gDW%). One thing to note about the way I did this, with park factors applied on a team-by-team basis and rounding park-adjusted R/G or RA/G to the nearest .05 to use the table of Enby parameters that I’ve calculated, is that the league averages don’t balance to .500 as they should in theory. The average gOW% is .495 and the average gDW% is .505.

For most teams, gOW% and OW% are very similar. Teams whose gOW% is higher than OW% distributed their runs more efficiently (at least to the extent that the methodology captures reality); the reverse is true for teams with gOW% lower than OW%. The teams that had differences of +/- 2 wins between the two metrics were (all of these are the g-type less the regular estimate, with the teams in descending order of absolute value of the difference):

Positive: None
Negative: LA, WAS, CHN, NYN, CLE, LAA, HOU

It doesn’t help here that the league average is .495, but it’s also possible that team-level deviations from Enby are greater given the unusual distribution of offensive events (e.g. low BA, high K, high HR) that currently dominates in MLB. One of the areas I’d like to study given the time and a coherent approach to the problem is how Enby parameters may vary based on component offensive statistics. The Enby parameters are driven by the variance of runs per game and the frequency of shutouts; for both, it’s not too difficult to imagine changes in the shape of offense having a significant impact.
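
For readers who haven't followed the Enby series, here is a generic sketch of a zero-modified negative binomial. The parameter names (r, b, and p0 for the probability of a shutout) and the values in the example are assumptions for illustration; I am not reproducing the procedure used to fit the actual Enby parameters to a given R/G.

```python
import math

def neg_binomial_pmf(k, r, b):
    """Negative binomial P(k) with mean r*b and variance r*b*(1 + b)."""
    log_p = (math.lgamma(r + k) - math.lgamma(k + 1) - math.lgamma(r)
             + k * math.log(b) - (r + k) * math.log(1 + b))
    return math.exp(log_p)

def zero_modified_pmf(k, r, b, p0):
    """Set P(0) = p0 and rescale the k >= 1 probabilities to sum to 1 - p0."""
    if k == 0:
        return p0
    base_zero = neg_binomial_pmf(0, r, b)
    return (1 - p0) * neg_binomial_pmf(k, r, b) / (1 - base_zero)

# Hypothetical parameters, not fitted Enby values.
r, b, p0 = 4.0, 1.1, 0.06
probs = [zero_modified_pmf(k, r, b, p0) for k in range(16)]
print([round(p, 3) for p in probs])
```

Because the zero probability is set separately, the mean of the modified distribution is no longer r*b, so the other parameters have to be chosen with that in mind if the distribution is to match a target R/G.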

Teams with differences of +/- 2 wins (note: this calculation uses 162 games for all teams even though a handful played 161 or 163 in 2018) between gDW% and standard DW%:

Positive: MIA, PHI, PIT, NYN, CHA
Negative: HOU

Miami’s gDW% was .443 while their DW% was .406, a difference of 5.9 wins which was the highest in the majors for either side of the ball (their offense displayed no such difference, with .449/.444). That makes them a good example to demonstrate what having an unusual run distribution relative to Enby looks like and how that can change the expected wins:



This graph excludes two games in which the Marlins coughed up 18 and 20 runs, which themselves do much to explain the huge discrepancy--giving up twenty runs kills your RA/G but from a win perspective is scarcely different than giving up thirteen (given their run environment, Enby expected that Miami would win 1.1% of the time allowing thirteen and 0.0% allowing twenty).

Miami allowed two and three runs much less frequently than Enby expected; given that they should have won 79% of games when allowing two and 59% when allowing three that explains much of the difference. They allowed eight or more runs 23.6% of the time compared to just 12.5% estimated by Enby, but all those extra runs weren’t particularly costly in terms of wins since the Marlins were only expected to win 6.4% of such games (calculated by taking the weighted average of the expected W% when allowing 8, 9, … runs with the expected frequency of allowing 8, 9, … runs given that they allowed 8+ runs).
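
The 8+ figure is just a conditional weighted average, which can be sketched as follows. The frequencies and W%s in the example are invented round numbers, not the Enby values for Miami.

```python
def conditional_wpct(freq, wpct, min_runs):
    """Expected W% given that at least `min_runs` runs were allowed.

    `freq` maps runs allowed -> expected frequency of allowing that many;
    `wpct` maps runs allowed -> expected W% when allowing that many.
    Dividing by the tail total conditions the frequencies on min_runs+.
    """
    tail = {k: p for k, p in freq.items() if k >= min_runs}
    tail_total = sum(tail.values())
    return sum(p * wpct[k] for k, p in tail.items()) / tail_total

# Hypothetical frequencies and W%s by runs allowed.
toy_freq = {7: 0.05, 8: 0.06, 9: 0.04, 10: 0.02}
toy_wpct = {7: 0.15, 8: 0.10, 9: 0.05, 10: 0.02}
print(round(conditional_wpct(toy_freq, toy_wpct, 8), 4))
```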

I don’t have a good clean process for combining gOW% and gDW% into an overall gEW%; instead I use Pythagenpat math to convert the gOW% and gDW% into equivalent runs and runs allowed and calculate an EW% from those. This can be compared to EW% figured using Pythagenpat with the average runs scored and allowed for a similar comparison of teams with positive and negative differences between the two approaches:

Positive: MIA, PHI, KC, PIT, CHA, MIN, SF, SD
Negative: LA, WAS, HOU, CHN, CLE, LAA, BOS, ATL
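
One way to do the conversion to equivalent runs, sketched under my own assumptions (a fixed z of .29, a league average of 4.45 R/G, no park adjustment, and a bisection solve, none of which is necessarily how the figures in this post were produced):

```python
def pythagenpat_wpct(r, ra, z=0.29):
    """Pythagenpat W% with exponent x = (R + RA)^z."""
    x = (r + ra) ** z
    return r ** x / (r ** x + ra ** x)

def equivalent_runs(target_wpct, league_rpg, z=0.29, tol=1e-10):
    """Find R/G such that a team scoring it against a league-average
    RA/G has the target W% (bisection; W% rises with R over this range)."""
    lo, hi = 0.01, 30.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pythagenpat_wpct(mid, league_rpg, z) < target_wpct:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def game_ew_pct(g_ow, g_dw, league_rpg, z=0.29):
    """Combine gOW% and gDW% by way of equivalent R/G and RA/G."""
    r_eq = equivalent_runs(g_ow, league_rpg, z)
    # Allowing RA against an average offense mirrors scoring RA at 1 - DW%.
    ra_eq = equivalent_runs(1 - g_dw, league_rpg, z)
    return pythagenpat_wpct(r_eq, ra_eq, z)

# Miami's gOW% and gDW% from this post, with the assumed league average.
print(round(game_ew_pct(0.449, 0.443, 4.45), 3))
```

Because the exponent depends on the combined run environment of the equivalent R and RA, the offensive and defensive differences don't simply add up to the overall difference.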

Despite their huge defensive difference, Miami was edged out for the largest absolute value of difference by the Dodgers (6.08 to -6.11). The Dodgers were -4.8 on offense and -1.7 on defense (astute readers will note these don’t sum to -6.11, but they shouldn’t given the nature of the math), while the Marlins’ 5.9 on defense was only bolstered by .9 on offense (as we’ve seen before, there was only a .005 discrepancy between their gOW% and OW%).

The table below has the various winning percentages for each team: