Monday, February 15, 2016

Running With the Devil (You Know)

Greg Beals enters his sixth season at the helm of an OSU baseball program that, by multiple common sense measures, is suffering through a rough period that corresponds closely to his tenure as coach. OSU’s six-year NCAA Tournament drought is the program’s longest since 1983-1990. OSU’s five-year record of 159-125 (.560) is the program’s worst since 1986-1990 (.500), except for the overlapping period of 2010-2014 (.543). In addition to the bottom line of wins and losses, Beals’ teams have displayed a propensity for late-season collapses and a recurring theme of horrific baserunning that I used to try to document but no longer have the strength for (nor the frequent trips to Bill Davis Stadium to see the Bucks play in person). Suffice it to say some must be seen to be believed, and the hashtag #BealsBall will give you a taste of what is being taught in Columbus.

Despite his mediocre record, Beals received a two-year contract extension through 2017 after the season as reported by B1G Baseball. Ordinarily, I would hope that this was a “program continuity” contract, granted with the knowledge that having a lame duck collegiate coach can be a major obstacle in recruiting. However, given that Beals already skippered 2015 without a contract, two years in writing has more of the feel of an actual extension.

Regardless of my own reservations about the man in charge, the players on the field deserve the bulk of the attention. But while OSU returns a large senior class, on paper this team does not strike me as an obvious Big Ten championship contender or NCAA Tournament team.

One spot where virtually no experience returns is catcher. Co-starters Aaron Gretz and Conor Sabanosh both graduated, leaving junior captain Jalen Washington as the backstop. Washington’s limited playing time over the past two seasons has been mostly as a pinch-runner or defensive replacement at second base. Thirty-three plate appearances are insufficient to draw any conclusions regarding his offense, but they have been middling-BA, no-secondary-average plate appearances. Washington will be backed up by freshman Jacob Barnwell and sophomore Jordan McDonough. The latter is more likely to get in the lineup as a DH than as a catcher, however. Freshman Andrew Fishel rounds out the corps.

First base will be up for grabs between a pair of seniors, Zach Ratcliff and Ryan Leffel. Ratcliff has always teased with power potential that the lineup could desperately use, but has never been able to find his way into the lineup on a consistent basis (career .263/.304/.403 line in 199 PA). Leffel can play third base and has always found his way into the lineup. His junior season was rough (.211/.333/.267 in 107 PA), but his sophomore season was the stuff of Beals’ dreams (.303/.369/.343 in 110 PA). I will personally be surprised if Leffel does not get the lion’s share of time at first.

The rest of the infield positions will be in the hands of multi-year senior starters. Second base will belong to Nick Sergakis, who in his two-year OSU career has hit .281/.358/.364 in 357 PA with questionable fielding. At third, Troy Kuhn is a .275/.362/.469 career hitter in 543 PA. Shortstop Craig Nennig is a slick fielder, but for some reason rarely gets pinch-hit for despite a .235/.314/.280 line in 401 PA. In the case of injuries to any of the three, expect Sergakis to slide into the open spot and senior L Grant Davis to play at second. Davis was surprisingly good offensively last year (.282/.347/.353 in 95 PA) and should make a fine utility infielder. Other infield reserves include true freshmen Brady Cherry at third, Casey Demko in the middle, and Matt Carpenter.

The team’s only legitimate power threat is junior left fielder (and erstwhile center fielder) Ronnie Dawson (.307/.379/.460 with 11 homers in 469 career PA). Dawson is prone to some baserunning blunders and outfield adventures, but still is by far the most exciting player on the roster. Junior center fielder Troy Montgomery is the leadoff hitter and had a fantastic sophomore season (.317/.431/.493 with 35 steals in 41 attempts). Right field must replace Pat Porter, drafted by the White Sox, and a platoon of junior Jacob Bosiokovic and senior Daulton Mosbarger is expected (for positions in which newcomers will play, I relied on the season preview released by the athletic department for guidance on what to expect). Bosiokovic had a promising freshman season in 2013 as a third baseman, held his ground as a sophomore, but got a medical redshirt last year after appearing in just five games. He will also be an option at first base and with a career line of .264/.341/.360 in 421 PA, a return to health would be a boost to the Buckeye offense. Mosbarger is a senior transfer from Akron, where he was a career .247/.366/.359 hitter in 465 PA. He may be an interesting case study in whether great patience can survive Beals’ preference for a contact-oriented approach.

The key reserve outfielders are speedy sophomore Tre’ Gantt (a promising .311/.373/.351 in 82 PA in 2015) and senior Jake Brobst (who has served mostly as a pinch-runner in his OSU career). Redshirt freshman Ridge Winand and true freshman Jacob Vander Wal (yes, he is the son of John) are also on the roster. DH starts could go to McDonough, Leffel/Ratcliff, Bosiokovic/Mosbarger, Gantt, or even Davis, but I think Beals will have a hard time keeping Gantt out of the lineup, even if he may not be a prototypical DH.

OSU should be able to muster a decent offense, led by Dawson and Montgomery with solid contributors in Kuhn, Sergakis, and the right field/DH units. But catcher, first base, and shortstop will likely hold the offense back from scoring runs at a championship level. And a strong lineup may be needed because questions abound on the pitching staff.

OSU lost its two best pitchers, Ryan Riga to graduation and Travis Lakins to the Red Sox, leaving junior lefty Tanner Tully, a soft-tosser who rode a 5.1/.7 K/W to a great freshman season (3.20 eRA) but was hit hard last year (5.3/1.8 with a 5.75 eRA and .345 BABIP), as the #1 starter. While it is safe to assume that Tully’s true level is somewhere in the middle, it does not appear to be the profile of a #1 starter for a conference champion. Sophomore righty Adam Niemeyer is penciled in at #2. He worked 33 innings over 12 appearances (4 starts) as a redshirt freshman, with 5.7/1.9 K/W and a 3.79 eRA. The #3 spot (and thus the one or two mid-week starter positions) is up for grabs, with next to no returning experience in the mix. Senior lefty John Havird was solid in 27 innings (mostly in relief) last year, with 9.8/3.6 K/W and a 4.17 eRA. Sophomore righty Austin Woodby transferred from Cincinnati, where he had an uninspiring freshman season in 2014 (20 K/7 W in 33 IP with a 5.18 ERA). Redshirt sophomore righty Yianni Pavlopoulos threw nine innings of relief for the Bucks in 2014, and the other contender is true freshman righty Ryan Feltner. Starting pitching is a definite area of concern.

The bullpen lost closer Trace Dempsey, and while senior Jake Post is listed on the roster, his injury at the end of 2015 plus his omission from OSU’s season preview suggests a medical redshirt is in the cards. Without Post, OSU’s bullpen is also a near-complete rebuild job from 2015. The three key right-handers are junior Shea Murray, who throws hard but has yet to harness his control (career 13 K/10 W in just 10 IP); sophomore Seth Kinker, a three-quarter slinger who had a promising freshman campaign (19 K/3 W, 2.84 eRA in 22 IP); and sophomore Kyle Michalik, who was passed by Kinker on Beals’ pecking order as 2015 progressed despite also pitching effectively (12 K/5 W, 2.92 eRA in 19 IP).

Beals is a strong believer in left-right matchups out of the pen, which means he will rely on some combination of senior Michael Horejsei (career 5.67 ERA despite 23 K/7 W in 27 IP), redshirt junior Joe Stoll (zero career appearances), and true freshman Connor Curlis to get outs. Other bullpen options include sophomore righty Curtiss Irving off a medical redshirt season and possible two-way performers Bosiokovic (who has never pitched for OSU), Mosbarger (who has pitched for Akron with a bizarre career line of 23 K/27 W but a 1.99 ERA in 31 innings), and Cherry.

The Buckeyes’ schedule appears to be designed to pad the win total early in the season, as no strong non-conference foes are on the preseason schedule, and unlike prior years, which had mid-conference season matchups against the likes of Georgia Tech, Louisville, and Oregon, OSU will not challenge national powers during the Big Ten campaign. The Buckeyes travel to the old Dodgertown to open the season the weekend of February 19 with games against Toledo (to open and close the weekend), Niagara, and Pitt. The following weekend, OSU competes in Coastal Carolina’s tournament with a pair of games against the hosts plus matchups with Duke and Liberty. The first weekend of March will see OSU in Port Charlotte to play another set of middling northern opponents (Seton Hall, Illinois St., and Boston College). The Buckeyes will then play four games at UNLV to return the Runnin’ Rebels’ 2015 trip to Columbus before the home opening series, three games with Hofstra the weekend of March 18.

From there, weekend series are mostly Big Ten: Northwestern, out-of-conference against Bethune-Cookman, @ Maryland, Rutgers, @ Illinois, @ Purdue, Iowa, the forces of evil, @ Minnesota. Mid-week opponents include home dates with Xavier, Toledo, Morehead State (two), Cincinnati, UAB (two), Florida Atlantic (two), and Eastern Michigan, along with road trips to Ohio University and Kent State.

The schedule seems to reflect where the program is right now--a middling northern program not eager to take on challenges outside of the Big Ten schedule. While such a strategy is perfectly defensible, even if the program is strong, it is a contrast to the bluster of national contention that peppered the early days of Beals’ tenure. The Buckeyes need only to finish in the top eight of a thirteen-team Big Ten (Wisconsin remains a pathetic, cowardly, contemptible institution) to get to Omaha…for the Big Ten Tournament. That should (better) be attainable, but getting back to Omaha several weeks later seems as far away as it has ever been for what was once (and could be again) a northern baseball power.

Monday, February 01, 2016

Run Distribution and W%, 2015

Every year I state that by the time this post rolls around next year, I hope to have a fully functional Enby distribution to allow the metrics herein to be more flexible (e.g. not based solely on empirical data, able to handle park effects, etc.). And every year during the year I wind up deciding that writing articles about other topics or trying to finish my professional education or watching some terrible TV show like Haven is a bigger priority than explaining how Enby applies.

Enby is a zero-modified negative binomial model to calculate the probability that a team will score X runs in a game. It is without question my favorite of my own body of sabermetric work and yet for some reason the hardest for me to get motivated to write about. Were the problem that I needed to finish working on the model itself, it would be a huge priority (almost a compulsion) for me. But I did that a long time ago, and now just need to make it presentable. I’d say maybe next year but history suggests I’d be lying to you.

Anyway, there are some elements of Enby in this post, as I’ve written enough about the model to feel comfortable using bits and pieces. But I’d like to overhaul the calculation of gOW% and gDW% that are used at the end based on Enby, and I’m not ready to do that just yet given the deficiency of the material I’ve published on Enby.

Self-indulgence, aggrandizement, and deprecation aside, I need to caveat that this post in no way accounts for park effects. But that won’t come into play as I first look at team record in blowouts and non-blowouts, with a blowout defined as a game decided by 5+ runs. Obviously some five run games are not truly blowouts, and some are; one could probably use WPA to make a better definition of blowout based on some sort of average win probability, or the win probability at a given moment or moments in the game. I should also note that Baseball-Reference uses this same definition of blowout. I am not sure when they started publishing it; they may well have pre-dated my usage of five runs as the delineator. However, I did not adopt that as my standard because of Baseball-Reference; I adopted it because it made the most sense to me, being unaware of any B-R standard.
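For readers who want to reproduce the split, here is a minimal sketch of the five-run definition in Python. The `Game` tuple, the function names, and the sample games are all made up for illustration; nothing here is the actual code or data behind the charts in this post.

```python
from collections import namedtuple

# Hypothetical record type: one team's runs scored and allowed in one game.
Game = namedtuple("Game", ["runs_for", "runs_against"])

BLOWOUT_MARGIN = 5  # a blowout is a game decided by 5+ runs


def is_blowout(game, margin=BLOWOUT_MARGIN):
    """True if the final margin is at least `margin` runs."""
    return abs(game.runs_for - game.runs_against) >= margin


def split_record(games, margin=BLOWOUT_MARGIN):
    """Return (wins, losses) in blowouts and in non-blowouts for one team.

    Assumes no ties, so every game is either a win or a loss.
    """
    record = {"blowout": [0, 0], "non_blowout": [0, 0]}
    for g in games:
        key = "blowout" if is_blowout(g, margin) else "non_blowout"
        if g.runs_for > g.runs_against:
            record[key][0] += 1
        else:
            record[key][1] += 1
    return {k: tuple(v) for k, v in record.items()}


def w_pct(wins, losses):
    """Winning percentage, or NaN when no games were played."""
    return wins / (wins + losses) if wins + losses else float("nan")


# Made-up games purely to exercise the functions.
sample = [Game(7, 1), Game(3, 2), Game(2, 8), Game(4, 5), Game(10, 5)]
rec = split_record(sample)
print(rec, w_pct(*rec["blowout"]), w_pct(*rec["non_blowout"]))
```

Keeping the margin as a parameter makes it easy to see how sensitive the split is to the (admittedly arbitrary) five-run cutoff.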

73.9% of major league games in 2015 were non-blowouts (so, of course, 26.1% were blowouts). The leading records in non-blowouts:

The three NL Central powerhouses top the list, with all playing a lot of non-blowout games as you’ll see in a moment (the Cubs had the second-highest percentage of non-blowouts, the Pirates fourth, and Cardinals seventh) and playing very well in those games. Of the three only Pittsburgh had a better record in blowouts, which is unusual as good teams tend to do better in blowouts:

The Blue Jays were an odd case as well, actually sub-.500 (56-57) in non-blowouts but dominant in blowouts. The only other playoff team to be sub-.500 in either was Texas in blowouts (22-25).

This chart is sorted by the difference between blowout and non-blowout W% and includes the percentage of blowouts for each team:

A more interesting way to consider game-level results is to look at how teams perform when scoring or allowing a given number of runs. For the majors as a whole, here are the counts of games in which teams scored X runs:

The “marg” column shows the marginal W% for each additional run scored. In 2015, the fourth run was both the run with the greatest marginal impact on the chance of winning and the level of scoring for which a team was more likely to win than lose.
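As a rough illustration of how a “marg” column can be built, the sketch below takes the W% at each run total and subtracts the W% at one run fewer. The function name is my own invention, and the only inputs used are the three values that appear in the text of this post (.000 when shut out, .082 with one run, .283 with two); the rest of the 2015 table is not reproduced here.

```python
def marginal_w_pct(w_pct_by_runs):
    """Marginal W% of the x-th run: W%(x runs) minus W%(x - 1 runs).

    Run totals with no entry for x - 1 are skipped.
    """
    return {
        x: w_pct_by_runs[x] - w_pct_by_runs[x - 1]
        for x in sorted(w_pct_by_runs)
        if x - 1 in w_pct_by_runs
    }


# Only the values quoted in this post; the full table is not reproduced.
w_pct_by_runs = {0: 0.000, 1: 0.082, 2: 0.283}
marg = marginal_w_pct(w_pct_by_runs)
for runs, value in marg.items():
    print(f"run #{runs}: {value:+.3f}")
```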

I use these figures to calculate a measure I call game Offensive W% (or Defensive W% as the case may be), which was suggested by Bill James in an old Abstract. It is a crude way to use each team’s actual runs per game distribution to estimate what their W% should have been by using the overall empirical W% by runs scored for the majors in the particular season.

The theoretical distribution from Enby discussed earlier would be much preferable to the empirical distribution for this exercise, but I’ve defaulted to the 2015 empirical data. Some of the drawbacks of this approach are:

1. The empirical distribution is subject to sample size fluctuations. In 2015, teams that scored 11 runs won 98.5% of the time while teams that scored 10 runs won 98.0% of the time. Does that mean that scoring 12 runs is preferable to scoring 11 runs? Of course not--it's a quirk in the data. Additionally, the marginal values don’t necessarily make sense even when W% increases from one runs-scored level to another (In figuring the gEW% family of measures below, I lumped games with 11 and 12 runs scored/allowed into one bucket, which smooths any illogical jumps in the win function, but leaves the inconsistent marginal values unaddressed and fails to make any differentiation between scoring in that range. The values actually used are displayed in the “use” column, and the “invuse” column holds the complements of these figures--i.e. those used to credit wins to the defense. I've used 1.0 for 13+ runs, which is a horrible idea theoretically. In 2015, teams were 81-0 when scoring 13 or more runs).

2. Using the empirical distribution forces one to use integer values for runs scored per game. Obviously the number of runs a team scores in a game is restricted to integer values, but not allowing theoretical fractional runs makes it very difficult to apply any sort of park adjustment to the team frequency of runs scored.

3. Related to #2 (really its root cause, although the park issue is important enough from the standpoint of using the results to evaluate teams that I wanted to single it out), when using the empirical data there is always a tradeoff that must be made between increasing the sample size and losing context. One could use multiple years of data to generate a smoother curve of marginal win probabilities, but in doing so one would lose centering at the season’s actual run scoring rate. On the other hand, one could split the data into AL and NL and more closely match context, but one would lose sample size and introduce more quirks into the data.
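To make the bucketing described in the first drawback concrete, here is a sketch of how “use” and “invuse” columns could be assembled from win and game counts by runs scored. The counts below are invented placeholders, not the 2015 data, and the function and parameter names are assumptions of mine; only the rules come from the post (lump 11 and 12 together, use 1.0 for 13+, and take complements for the defense).

```python
def build_use_columns(wins, games, lump=(11, 12), cap=13, max_runs=22):
    """Build the offensive ("use") and defensive ("invuse") coefficients.

    wins[x] and games[x] are league totals for teams scoring x runs.
    Run totals in `lump` share one pooled W%; totals >= cap get 1.0.
    """
    lump_wins = sum(wins[x] for x in lump)
    lump_games = sum(games[x] for x in lump)
    use = {}
    for x in range(max_runs + 1):
        if x >= cap:
            use[x] = 1.0                      # 13+ runs treated as a sure win
        elif x in lump:
            use[x] = lump_wins / lump_games   # pooled 11-12 run bucket
        else:
            use[x] = wins[x] / games[x]       # plain empirical W%
    invuse = {x: 1.0 - v for x, v in use.items()}
    return use, invuse


# Placeholder counts for 0 through 12 runs (NOT the real 2015 figures).
games = dict(enumerate([100, 200, 300, 300, 250, 200, 150, 100, 80, 50, 40, 30, 20]))
wins = dict(enumerate([0, 16, 85, 120, 150, 140, 120, 85, 72, 46, 39, 29, 20]))

use, invuse = build_use_columns(wins, games)
print(use[11], use[12], use[13], invuse[0])
```

Pooling the two buckets guarantees the win function does not dip between 11 and 12 runs, at the cost of treating them as equally valuable, which is exactly the compromise described above.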

I keep promising that I will use Enby to replace the empirical approach, but for now I will use Enby for a couple graphs but nothing more.

First, a comparison of the actual distribution of runs per game in the majors to that predicted by the Enby distribution for the 2015 major league average of 4.250 runs per game (Enby distribution parameters are B = 1.0798, r = 3.966, z = .0619):

Enby didn’t predict enough shutouts or two run games, and too many three run games. There’s also a blip in the empirical data at eight runs scored (5.29% compared to 4.55% predicted by Enby). It doesn’t show up on the chart, but Enby predicted .35% of games with 16+ runs scored; the actual frequency was .31%.
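For anyone who wants to generate the predicted frequencies themselves, here is a small sketch of a zero-modified negative binomial. I am assuming the common parametrization in which the unmodified negative binomial has mean r*B, the probability of zero is replaced by z, and the remaining probabilities are rescaled to sum to 1 - z; the function names are made up for this example.

```python
import math


def neg_binomial_pmf(k, r, B):
    """P(X = k) for a negative binomial with mean r * B (assumed form)."""
    log_p = (math.lgamma(r + k) - math.lgamma(r) - math.lgamma(k + 1)
             + k * math.log(B) - (r + k) * math.log(1 + B))
    return math.exp(log_p)


def enby_pmf(k, r, B, z):
    """Zero-modified version: P(0) is set to z, the rest is rescaled."""
    if k == 0:
        return z
    p0 = neg_binomial_pmf(0, r, B)
    return (1 - z) * neg_binomial_pmf(k, r, B) / (1 - p0)


# Parameters quoted above for the 2015 major league average.
MLB_2015 = dict(r=3.966, B=1.0798, z=0.0619)

# 0 to 60 runs is far enough out that the remaining tail is negligible.
probs = [enby_pmf(k, **MLB_2015) for k in range(61)]
mean_runs = sum(k * p for k, p in enumerate(probs))
print(f"total probability {sum(probs):.6f}, mean {mean_runs:.3f} R/G")
```

Under this assumed form the mean works out to about 4.25 runs per game, matching the league average the parameters were fit to, which is some reassurance that the parametrization is the intended one.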

I will not go into the full details of how gOW%, gDW%, and gEW% (which combines both into one measure of team quality) are calculated in this post, but full details were provided here and the paragraph below gives a quick explanation. The “use” column here is the coefficient applied to each game to calculate gOW% while the “invuse” is the coefficient used for gDW%. For comparison, I have looked at OW%, DW%, and EW% (Pythagenpat record) for each team; none of these have been adjusted for park to maintain consistency with the g-family of measures which are not park-adjusted.

A team’s gOW% is the sumproduct of their frequency of scoring x runs, where x runs from 0 to 22, and the empirical W% of teams in 2015 when they scored x runs. For example, Atlanta was shut out 17 times; they would not be expected to win any of those games (nor would they, we can be certain). They scored one run 20 times; an average team would have a .082 W% when scoring one run, so they could have been expected to win 1.64 of the twenty games given average defense. They scored two runs 32 times; an average team would have a .283 W% when scoring two, so they could have been expected to win 9.06 of those games given average defense. Sum up the estimated wins for each value of x and divide by the team’s total number of games and you have gOW%.
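The sumproduct is simple enough to sketch in a few lines. The function name and dictionaries below are hypothetical, and the data covers only the three Atlanta buckets spelled out above (zero, one, and two runs), so the final ratio is an illustration of the arithmetic rather than Atlanta's actual gOW%.

```python
def g_ow_pct(games_by_runs, use_by_runs):
    """Expected wins given average defense, divided by total games."""
    total_games = sum(games_by_runs.values())
    expected_wins = sum(n * use_by_runs[x] for x, n in games_by_runs.items())
    return expected_wins / total_games


# Atlanta's counts for 0, 1, and 2 runs, and the matching league W% values,
# as quoted in the paragraph above. Higher run totals are omitted.
atl_partial = {0: 17, 1: 20, 2: 32}
use = {0: 0.000, 1: 0.082, 2: 0.283}

expected = {x: n * use[x] for x, n in atl_partial.items()}
print(expected)                      # expected wins in each bucket
print(g_ow_pct(atl_partial, use))    # ratio over these three buckets only
```

Swapping in the complements of the same coefficients and a team's runs allowed distribution gives the defensive counterpart.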

It is thus an estimate of what W% a team with the given team’s empirical distribution of runs scored and a league average defense would have. It is thus analogous to James’ original construct of OW% except looking at the empirical distribution of runs scored rather than the average runs scored per game. (To avoid any confusion, James in 1986 also proposed constructing an OW% in the manner in which I calculate gOW%).
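For comparison, the standard OW% and DW% pair a team's average runs scored (or allowed) per game with a league average opponent in a Pythagenpat estimate. The sketch below assumes a Pythagenpat exponent of RPG^.29; the .29 is a commonly used value that I am assuming here rather than one stated in this post, and the function names are made up.

```python
LEAGUE_RPG = 4.250  # 2015 major league average runs per game, as quoted in this post


def pythagenpat_w_pct(runs, runs_allowed, z=0.29):
    """Pythagenpat W% with exponent (R/G + RA/G) ** z (z = .29 assumed)."""
    x = (runs + runs_allowed) ** z
    return runs ** x / (runs ** x + runs_allowed ** x)


def ow_pct(team_runs_per_game, league=LEAGUE_RPG):
    """Offensive W%: the team's offense paired with an average defense."""
    return pythagenpat_w_pct(team_runs_per_game, league)


def dw_pct(team_runs_allowed_per_game, league=LEAGUE_RPG):
    """Defensive W%: an average offense paired with the team's defense."""
    return pythagenpat_w_pct(league, team_runs_allowed_per_game)


print(round(ow_pct(4.47), 3))    # an offense scoring 4.47 R/G
print(round(dw_pct(3.679), 3))   # a defense allowing 3.679 R/G
```

With the assumed exponent, these two inputs land close to the .524 OW% and .565 DW% figures cited for Kansas City and Pittsburgh elsewhere in this post.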

For most teams, gOW% and OW% are very similar. Teams whose gOW% is higher than OW% distributed their runs more efficiently (at least to the extent that the methodology captures reality); the reverse is true for teams with gOW% lower than OW%. The teams that had differences of +/- 2 wins between the two metrics were (all of these are the g-type less the regular estimate):

Positive: TB, SEA, ATL, PHI, CHA, LA, STL
Negative: NYA, TOR, HOU

Teams with differences of +/- 2 wins between gDW% and standard DW%:

Positive: TEX, ATL, BOS, SEA
Negative: PIT, HOU, SF, STL, MIA

Pittsburgh’s defense allowed 3.679 runs per game, which one would expect to result in a .565 W% with average offense. But based on their runs allowed distribution, one would only expect a .540 W% paired with an average offense. That difference of 4.1 wins was the greatest absolute difference on offense and defense for any major league team, so it may be instructive to look at a graph of their runs allowed distribution and what Enby would predict for such a team (B = 1.0163, r = 3.655, z = .0859):

Pittsburgh had many fewer one-run games than one would expect (actual 8.0%, Enby estimate 14.1%), but allowed two to five runs more than would be expected and allowed eight or more runs 8.0% of the time versus an expectation of 9.4%.

Teams with differences of +/- 2 wins between gEW% and standard EW%:

Positive: SEA, ATL, CHA, PHI, OAK, TB, COL
Negative: HOU, TOR, PIT, NYA, WAS, NYN, SF

It’s no surprise that SEA, ATL, and HOU appear prominently as they were the only teams to have both their offense and defense appear on the positive and negative lists in the same direction. Even with bad clustering of both runs scored and runs allowed, Houston was a good team, but their gEW% of .539 tracks their actual W% of .531 better than their EW% of .576. In 2015, the RMSE of gEW% as a predictor of W% was about 4.4 wins, while EW% had an RMSE of 4.7 wins (gEW% usually, but not always over a thirty team sample, performs better, as it should given the advantage of knowing the actual distribution of runs scored and allowed, even treating them independently).
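The RMSE figures are expressed in wins per 162 games. A minimal sketch of that calculation follows; the function name is my own, and the only data shown is Houston's line from the paragraph above, so the output is a one-team illustration and not the league-wide 4.4 and 4.7 figures.

```python
import math


def rmse_in_wins(estimated_w_pct, actual_w_pct, games=162):
    """Root mean squared error between estimated and actual W%, in wins."""
    squared_errors = [
        ((est - act) * games) ** 2
        for est, act in zip(estimated_w_pct, actual_w_pct)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Houston only: gEW% .539, EW% .576, actual W% .531.
hou_actual = [0.531]
print(rmse_in_wins([0.539], hou_actual))  # error of gEW% in wins
print(rmse_in_wins([0.576], hou_actual))  # error of EW% in wins
```

Passing all thirty teams' estimates and actual records in place of the single-element lists would give the league-wide comparison.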

One might think that the blessed Royals, given their well-known ability to hit at the right time and play the game the right way and so many other attributes that make them so very dear to media members everywhere, would have clustered their runs efficiently. Especially their offense. But they really didn’t. KC’s gOW% was .524; their standard OW% was .524. Their run distribution, converted to equivalent wins with an average defense, was pretty much exactly what you would expect for a team that averaged 4.47 R/G. Their defense was slightly less efficient, with a .529 gDW% and .533 standard DW%. Where Kansas City made hay was the difference between their gEW% (and standard EW%) and their actual W%, which would necessarily result from a more efficient pairing of runs scored and runs allowed. It is quite tempting to credit the bullpen for this, as in theory bullpens can be strategically deployed given the game circumstances and thus increase the covariance between runs scored and allowed. But any such deviation for the Royals falls under the standard deviation from Pythagorean expectation and not anything special in the way the offense or defense alone distributed their runs.

Below is a full chart with the various actual and estimated W%s: