Walk Like a Sabermetrician: February 2011

Monday, February 21, 2011

Comments on Bill James Gold Mine 2010, pt. 1

I quite enjoyed the third edition of the Bill James Gold Mine, even though I didn't get around to reading it until a few months after it was published. It jogged some thoughts, which led to this post, which is based not so much on James' essays themselves as on the semi-related paths they sent my mind down. To me, that is one of the tests of a really good sabermetric work--does it get you thinking, even if not about the exact topics covered? James' book passed that test for me.

However, I do think that the book would be stronger if it contained more of James' essays and fewer "statistical nuggets". The nuggets were of less interest to me, and seemed to be present in lesser quantity than they were in the first two editions of the book. The reverse was true for the essays, and those are what compel me to buy the book. Not being a subscriber to Bill James Online, I'm not positive about this, but I believe that James writes a number of additional essays each year that are not included in the book.

If that is indeed the case, I believe they'd be much better off collecting all of Bill's essays in the Gold Mine and leaving the nuggets for individuals to dredge up online. Not only does the website lend itself more to the statistics (the data there is much more extensive than what could be printed in the book, even if the book were the size of one of the old Great American Baseball Stat Books) and the printed page more to the essays, but if there are any folks out there who still refuse to use the Internet and are interested in James, I'd think they'd be more enticed by the essays. A book of just the essays, with some other filler of some sort, would have a character not unlike that of the 1990-1992 Baseball Books, which I liked very much.

Of course, since it appears that the book is not even being published in 2011, those suggestions are for naught.

I have three subjects to touch on, two of which could be considered critiques and one of which is just a good old-fashioned tangent. This post went a lot longer than I originally intended, so it's been broken up into two parts:

1. Starting pitcher rankings

The longest essay in the book deals with a system to rate starting pitchers based on where they place among other starters in each of their league seasons. James first ranks pitchers by Season Score (*), and then assigns points based on the pitcher's standing in the league. Each league season has 5.5 points per team available. In a fourteen team league, the top ranked pitcher gets 12 points, the #2 ranked pitcher gets 11 points, and so on down to the #11 pitcher who gets 2 points. There are also three-point bonuses, up to nine points per season, available for truly historic seasons. The resulting metric is called Strong Season points.
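As a quick check on the arithmetic, here is a minimal Python sketch of the fourteen-team schedule described above. It assumes only what the text states (12 points for #1, one fewer per rank, 2 points for #11, nothing below that); the function name is my own, the three-point historic-season bonuses are left out, and the rule for leagues of other sizes is not covered.

```python
def points_14_team_league(rank: int) -> int:
    """Strong Season points for an in-league rank in a fourteen-team league.

    Assumed from the description: #1 earns 12 points, each subsequent rank
    earns one fewer, #11 earns 2, and ranks 12-14 earn nothing. Historic
    season bonuses are not included.
    """
    return 13 - rank if 1 <= rank <= 11 else 0


TEAMS = 14
total_points = sum(points_14_team_league(rank) for rank in range(1, TEAMS + 1))

# 5.5 points per team should be available in each league season.
print(total_points, 5.5 * TEAMS)  # 77 77.0
```

Ranks 1 through 11 sum to 77 points, which matches 5.5 points for each of fourteen teams.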

(*) James does not give the formula for Season Score in the article, but explains that it is based on W, L, IP, ERA, K, W, and SV. "The point of the system is to evaluate a pitcher's record without context"..."This was a way of trying to say 'How good are the numbers themselves?', rather than 'How good was the pitcher who compiled these numbers?'".

Personally, I'm not sure that I have a whole lot of interest in rankings of pitchers based on a method that deliberately ignores context (and James certainly does not deny the importance of context). Setting my objection aside, though, it seems to me as if the Season Score is yet another result of a process that James has repeated over the course of his career: the re-invention of Approximate Value. Of all of his methods, my impression is that there is none that James personally likes more than AV. Even Win Shares is in some respects a return to AV--while it attempts to adjust for everything, it still expresses the result in an integer. The scale is higher than that of AV (a 20 AV would be an extraordinary season, while 20 WS is good but ordinary).

So, after attempting to adjust for everything, James apparently still had a void in his own toolkit, and he filled it with the Season Score.

Digression aside, James found that a career total of 43 strong season points marks a fairly clear line for the Hall of Fame in retrospect. Only five pitchers retired for a significant length of time have more than 44 points and are not in the Hall--Vida Blue, Bert Blyleven, Ron Guidry, Carl Mays and Billy Pierce. James says that Blyleven and Guidry (60 points) are the only two pitchers who were far above 43 yet are excluded from Cooperstown. (Blyleven has been elected since James wrote the book and I wrote this post, obviously).

Since Guidry's is the most surprising result of James' survey, I'll take a closer look at him. I do not intend the discussion about Guidry to be a commentary on his Hall worthiness or even his value, but rather as a means of discussing the issue I have with the strong season method. It is important to note that James does not in any way claim that the strong season method must be used in ranking pitchers, that it is better than methods X, Y, and Z, or any such thing. James does not argue that Guidry should be in the Hall of Fame because of his showing in the system.

Guidry earned points for six seasons in James' analysis--1977-79, 1982-83, and 1985. Suppose we apply James' method, but use a different metric--a simple Runs Above Replacement, figured using total runs allowed and adjusted for park. How many points would Guidry earn under such a system?

* James ranked Guidry #6 in the AL in 1977, which is worth seven points. I have him #7, worth six points.

* James and I both have Guidry #1 in 1978 with an extraordinary season for 12 points (James awards the 9 point bonus, and I'll do so as well to keep things comparable). Guidry turned in 101 RAR, seventeen more than the next closest pitcher and nine more than any other AL pitcher in any of these six years.

* James had Guidry #3 in 1979 for ten points. I have him second, for eleven points.

* James ranks Guidry #11 in 1982 for two points. I have him all the way down at #26. His RA was 4.22 in a league in which 4.5 runs were scored per game, and he pitched in a moderate pitchers' park (.97 PF). At 34 RAR, he is eleven runs behind the eleventh-place pitcher (Geoff Zahn, 45). Presumably Season Score gives Guidry a boost because of his 14-8 record, one of the most impressive in the league (seventh in the league in Win Points).

* James ranks Guidry #4 in 1983 for nine points; I have him #6 for seven points.

* James ranks Guidry #2 in 1985 for eleven points; I have him #11 for two points. This is another season in which Guidry's W-L record seems to give him a huge season score boost (22-6).

Add it all up, and I have Guidry at 47 points--suddenly not that far above the Hall of Fame line James observed. I followed his scoring method exactly, but the results changed significantly simply by swapping the metric used to rank the pitchers.
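The tally above can be reproduced with a short sketch. It reuses the fourteen-team schedule (an assumption carried over from the description of James' method), applies the nine-point 1978 bonus to both columns as described, and takes the ranks directly from the bullet points; the names are illustrative.

```python
def rank_points(rank: int) -> int:
    # Fourteen-team AL schedule: #1 earns 12 points down to 2 points for #11.
    return 13 - rank if 1 <= rank <= 11 else 0


# Guidry's rank in each season: (James' Season Score rank, my RAR rank).
GUIDRY_RANKS = {
    1977: (6, 7),
    1978: (1, 1),
    1979: (3, 2),
    1982: (11, 26),
    1983: (4, 6),
    1985: (2, 11),
}
BONUSES = {1978: 9}  # historic-season bonus, awarded in both tallies


def career_points(which: int) -> int:
    """Sum points using column 0 (Season Score ranks) or column 1 (RAR ranks)."""
    return sum(
        rank_points(ranks[which]) + BONUSES.get(year, 0)
        for year, ranks in GUIDRY_RANKS.items()
    )


james_total = career_points(0)
rar_total = career_points(1)
print(james_total, rar_total)  # 60 47
```

Eleven of the 13 points of difference come from 1982 and 1985, the two seasons in which Guidry's won-lost record appears to have boosted his Season Score.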

More interesting, IMO, is how the use of in-season rank elevates the importance of very small performance differences. In 1979, Guidry ranked second in RAR at 71. However, Tommy John (71) and Jerry Koosman (70) were right behind him. Given that Guidry relied much less on his fielders, I strongly support the notion that he had a better season than the other lefties. Still, negligible differences in actual performance are given much greater impact when one uses a points system like James'.

Another example is 1985, in which Guidry ranks eleventh on my list at 61. Jimmy Key ranks sixth at 62--there are six pitchers within two RAR of each other. Guidry could very easily rank sixth in this season, which would be worth an additional five points. That would vault him from 47 points to 52 points, and give him a great deal more clearance over the HOF line.
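To make the sensitivity concrete, here is a tiny sketch (same assumed fourteen-team schedule) of what one extra run would do in 1985: moving from 61 to 62 RAR changes Guidry's career RAR by a single run, while moving from 11th to 6th changes his Strong Season total by five points.

```python
def rank_points(rank: int) -> int:
    # Fourteen-team AL schedule: #1 earns 12 points down to 2 points for #11.
    return 13 - rank if 1 <= rank <= 11 else 0


rar_based_total = 47          # Guidry's RAR-ranked total from the text
rar_change = 62 - 61          # one more run saved in 1985
points_change = rank_points(6) - rank_points(11)  # 7 - 2

print(rar_change, points_change, rar_based_total + points_change)  # 1 5 52
```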

This is not to say that James' ranking system is without its strong points with respect to its aims--it values peak performance and it sets an equal total value relative to the size of the league, which depending on one's perspective might be very good properties. My contention is that such a system is very sensitive to small changes in statistics, ones that would have essentially no impact on a career-based evaluation. If Guidry had been evaluated at 62 RAR and thus sixth in 1985, the extra run saved would barely register in his career RAR total--and rightfully so. Allowing one run to make a significant difference in a player's rank on an all-time list strikes me as utterly illogical and unsatisfactory.

You may object that I am using RAR rather than Season Score, and that Season Score, unlike RAR, is not subject to minute differences in performance having a large effect on rank order. While it is true that RAR and Season Score are very different methods, and that their application to Guidry might be very different as well, any metric is going to be subject to the same concerns when making a rank order over one season. There is always the potential that a very small margin could be the difference between a batting title and third place, or between fifth in the league and out of the top ten. That is true for any metric you want to pick, from BA to home runs to ERA to Season Score to RAR.

Monday, February 14, 2011

Into the Great Wide Open

For as long as I have been writing this blog, I have done an annual preview of the OSU baseball team. I have never before been at such a loss for words or offered so little insight about the team that will appear on the field in the upcoming season. While I have never possessed nor claimed any sort of insider status, any knowledge of the coaching staff’s thinking or the players’ performance in off-season drills and the like, I’ve always felt that I had a pretty good handle on what to expect the lineup to look like. The tiny scraps of information available through the media about the team, coupled with my knowledge of the personnel from the previous season and years of observation of Bob Todd’s decision-making, made it a relatively simple guessing game.

The 2011 Ohio State team is anything but a simple guessing game. There has been massive roster turnover, with the team’s two best players (ace Alex Wimmers and catcher Dan Burkhart) now playing professionally. Worst of all from a prognostication standpoint, Coach Todd has retired, and Greg Beals is now in charge of the program. Beals had moderate success at Ball State, but his Ohio roots and purported recruiting skill landed him what is likely the most attractive coaching position in Midwest baseball. He has filled out his staff with Chris Holick; Mike Stafford, his pitching coach at Ball State and a former OSU pitcher; and volunteer coach Josh Newman (another former Buckeye hurler and briefly a major leaguer with the Rockies and Royals).

Holick returns to coaching after spending two years in private industry that came on the heels of a seven-year assistant coaching career (Kent State, Arizona State, and Florida International). Stafford had served as pitching coach under Beals at Ball State since 2003; prior to that he had been a bullpen catcher for the Columbus Clippers and had pitched four years in the minors after ending his OSU closing career in 1998. Newman is entering his first season as a collegiate coach.

The talent they have to work with does not inspire a great deal of confidence for 2011, as there is very little in the way of proven performers. My fears about this were enhanced when Beals, speaking on the radio halftime show of an OSU football game this fall, answered a question about expectations for the season with vapid generalities about “playing hard”, “playing the game the right way”, “learning how to win”…the sort of talk you hear from a coach who knows he’s building for the future. At a recent meet-the-team event, Beals addressed the new NCAA rules on bats: “It's going to be a faster paced game, but a safer game with less runs scored. Our game is going to become a game of the finer skills - defense...throwing strikes...aggressive on the base paths - and we want to be ahead of this curve.” Uh-oh.

Even if the entire roster from 2010 were back, it would be difficult to get a good read on the team’s prospects. 2009 was a great season for the Buckeyes, with a Big Ten regular season title and a second-place regional finish, but 2010 was a disaster despite high expectations. OSU was in first place (although by a very slim margin in a very tight conference race) in late April when Wimmers went down with a hamstring injury. The few starts he missed might have been enough to mark the difference between finishing near the top of the conference and failing to finish in the top six to qualify for the Big Ten Tournament--the first time that fate had befallen an OSU club since 1996. Ohio’s 11-13 conference record was the first and only sub-.500 record in Big Ten play in Coach Todd’s twenty-two-year career.

Usually I write my preview and lineup expectations a month or so before the season starts; this time I’m doing it less than a week prior to Opening Day. As such, the ensuing discussion is based largely on the preview posted at the official athletic website; I’ve had to rely on such accounts to get any sort of feeling for how the lineup would shake out.

Burkhart’s successor behind the plate will be Greg Solomon, a sophomore-eligible transfer from Paradise Valley Community College in Arizona. Solomon missed most of last season with a knee injury and did not hit particularly well in his 2009 freshman season, so this position is a huge question mark (you’ll note questions as a recurring theme). Beals’ comments on him praise his defensive skill with little to say about his bat. Burkhart was such an ironman for the past three seasons that none of the other catchers on the roster have any experience. Redshirt freshman Steel Russell, son of former Pittsburgh skipper John, would seem to be the logical choice as backup. True freshman Josh Bokor, junior-eligible JUCO transfer Brad Hutton, and his brother Blake, a true freshman, could also be options. The Hutton brothers are both listed as C/IF on the roster, so it is possible they could see time at the corners as well.

For the past two seasons, first base belonged to senior Matt Streng, but he has flipped corners with sophomore Brad Hallberg. I was quite surprised to learn of this move, as Streng has never struck me as particularly athletic, but he will man the hot corner in 2011 after a very disappointing 2010 campaign that saw him hit just one home run (coincidentally, it came in the only game I was able to attend) and turn in a -13 RAA performance. Hallberg was -4 runs in 112 PA in his debut season.

Hallberg will split first base and DH duties with true freshman Josh Dezse, the most impressive member of the OSU recruiting class and a 28th round pick of the Yankees. Given the fact that Dezse is also expected to be a key part of OSU’s bullpen, it would probably make more sense to have him at first base more often than not, but we’ll have to wait and see.

Second base is another position where the Bucks have to fill a big hole, as one of my favorite players, Cory Kovanda, completed his eligibility. Kovanda was a consistent on base threat, slapping infield hits, drawing walks, and getting plunked. His replacement will be redshirt sophomore Ryan Cypret, whose father was a member of the previous coaching staff. Cypret served something of a utility infield role last season, but had just one extra base hit in 51 PA and thus has much to prove with the stick.

Shortstop is the field position with the most continuity, as Tyler Engle has spent most of his three years in Columbus playing good defense but producing little at the plate. Last year he posted an ugly .224/.342/.376 line; his 28 walks were second on the team and the only thing that made him playable. If Engle could regain something approaching his 2009 form (.285/.411/.423), it would be a huge help, as the infield looks as if it will struggle to create runs. Two true freshmen from Indiana, Derek Hannahs and Jacob Hayes, and local product Phil Jaskot figure to provide depth, with Hayes also being trained for left field.

The outfield features one returning starter, as senior Brian DeLucia will play right field. DeLucia is easily OSU’s top returning hitter, with a .320/.395/.503 line in 2010, and he’s a very good outfielder as well. The rest of the Buckeye outfield will have big shoes to fill, as left fielder Zach Hurley (.385/.438/.602) and center fielder Michael Stephens (.360/.395/.556) were the team’s two most potent hitters outside of Burkhart in 2010. Also gone to graduation is DH Ryan Dew (.348/.420/.498), leaving OSU without six of its top seven offensive performers--this from a team that averaged 6.6 runs per game to the Big Ten’s 6.8.

It was expected that sophomore Hunter Mayfield would get one of the open positions, but he transferred to Rollins College (a Division II school that managed to beat the Bucks last year, perhaps costing them more than just the game), so there is no returning experience of which to speak outside of DeLucia. Left field will be a platoon between junior David Corna and sophomore Joe Ciamocco. Despite having burned three years of eligibility combined, they have a total of one collegiate at bat between them, so it’s impossible to count on much offense out of them. Center field will belong to true freshman Tim Wetzel, described by the web account as having--you guessed it--“speed and defensive prowess”.

One can only hope that the unproven players will be better batters than the advance billing would suggest, because otherwise it seems as if Ohio will have significant trouble putting runs on the scoreboard. This is particularly unfortunate since the 2010 pitching staff was Wimmers and anyone who could stay healthy, and Wimmers, perhaps the best pitcher at OSU since Steve Arlin, is now a Twins farmhand.

The Friday starter should be senior Drew Rucinski, who had been constantly shuttled back and forth between the bullpen and the rotation by the previous staff. Rucinski was a good complement to Wimmers, but does not figure to match up well with the other top starters in the conference. Another senior, Dean Wolosiansky, has been a rotation stalwart throughout his career and should get the Saturday starts. Wolosiansky has always fit the profile of a league-average innings muncher, which is a good thing to have, but in a perfect world he would be the Sunday starter. That role apparently will go to true freshman Greg Greve, a 45th round pick by San Francisco in last year’s draft.

Greve’s spot in the rotation may not hold up if Brad Goldberg is granted a waiver to pitch in 2011. The Coastal Carolina transfer is a junior in eligibility, but may not be able to use it until 2012. He pitched just five innings for Coastal last year after pitching fourteen effective innings out of the pen as a freshman. If eligible, he figures to be a big boost to the Buckeye mound staff.

Josh Dezse, slated to start at 1B/DH, is also the favorite for the closer role. While two-way players are fairly common in the college game, especially as closers, the only prominent OSU two-way player of recent years was JB Shuck, now a solid prospect in the Astros organization. However, Shuck was used as a starting pitcher, not a reliever, and rarely was in the lineup when he pitched. It appears as if Beals is more comfortable with hybrid players than Todd was.

Senior Jared Strayer’s role and effectiveness increased as the season went on last year, and he figures to be the key middle reliever in 2011. Unfortunately, it seems as if the new staff has taught Strayer a more conventional delivery--last year he adopted a three-quarters/sidearm delivery that was a pleasure to watch and a rare sight on the Bill Davis Stadium mound.

Sophomore Brett McKinney showed some promise last year, but was way too wild and way too hittable (32 walks and 78 hits in 59 innings). He figures to be the Wednesday starter/weekend long man at this point, but he has a chance to be a quality contributor at some point during his career. Two left-handed pitchers--senior Theron Minium and junior Andrew Armstrong--figure to be the other candidates for Wednesday starts and weekend middle relief options. Armstrong showed much promise as a freshman in 2008, but was injured during the ’09 season and missed all of last season.

Junior Brian Bobinski and sophomore Cole Brown combined for eighteen ineffective innings last year and, in a perfect world, will be mop-up pitchers this season unless they have improved. Three other pitchers on the roster do not figure to see significant action: true freshman lefty Ben Bokor (twin brother of backup catcher Josh), junior walk-on Paul Guey, and a freshman walk-on who was unsuccessful in attempting to make Wake Forest’s roster last year, John Kuchno.

The OSU pre-conference schedule is significantly different from those of past years, showing a change in philosophy at the top. The schedule opens the weekend of February 18 with the Big Ten/Big East challenge in Florida, where the opponents will be Cincinnati, Louisville, and St. John’s--the latter two have both been problems for OSU in recent years, with Louisville burying the Bucks in five games over the past three seasons by a combined 62-32 score, and St. John’s having knocked Ohio out of the NCAA Tournament in 2005.

The weekend of February 25th sees the Bucks back in Florida for a four-game series with Western Michigan, a team with which OSU has a rich tradition--the teams played ten times in the NCAA Tournament between 1951 and 1967. The weekend of March 3 will be a trip to North Carolina to face Army, Western Carolina, and Akron, and the weekend of March 10 is another Florida trip to play Illinois State, Bradley, and Army.

The annual spring break trip, which this year will run from March 18-26, is the one in which the philosophical differences emerge. Coach Todd used the spring break trip as a week of tuneup for conference play, going to Florida to face mostly northern teams that OSU should expect to go 6-2 or so against. He had not taken his team to the West Coast since a 2002 trip to Albuquerque that featured a crazy 38-15 game against Toledo, and the last trip to California came around twenty years ago. In his first season, Beals will take his charges to play a three game series in Berkeley (in what sadly appears to be the final campaign for Golden Bear baseball), two games at Fresno State, and three at Cal St.-Bakersfield.

The home opener will be Tuesday, March 29 against Xavier. Other standard one game mid-week home opponents will be Miami, Akron, Bowling Green, and Toledo. The Buckeyes will travel to play at Ohio University on one Wednesday, and have two out-of-region opponents coming in for two game series: North Florida and Oklahoma State.

Big Ten play begins April 1 against Northwestern; in subsequent weeks, OSU goes to Indiana, hosts Michigan State, travels to Penn State, hosts the forces of evil, goes to Illinois, hosts Iowa, and goes to Minnesota. If OSU is able to make the top six, they’ll be able to stay home and play downtown at Huntington Park; the Big Ten Tournament will be held there May 25 - 28.

Sadly, if I had to hazard a guess, I would say that for the second straight season six other Big Ten teams will be playing in Columbus that weekend. While I thought that Todd’s program was healthy enough, 2011 would have loomed as a possible rebuilding year in any case. Coaching changes often excite fans at the outset, but first seasons tend to be rough.

Wednesday, February 09, 2011

Crude Team Ratings 2010

In a previous post I explained the methodology behind these rankings, and acknowledged a fairly decent number of shortcomings they possess. I will not harp on either of those topics again here, but that is simply to avoid being repetitive--these rankings are far from perfect and should be taken in that spirit.

I will present four different sets of rankings based on four different inputs. The manner of calculating the rankings is identical for all four; the only difference is which initial win ratio is used. I tend to believe the final set based on Runs Created/Runs Created Allowed is the most indicative of true talent, but any season aggregate metric is going to have obvious deficiencies when it comes to estimating true talent, and in reality some combination would probably be superior for that purpose.

The first ranking is what I call actual CTR, since it is based on the actual W/L ratio of each team. This is an attempt to evaluate teams based on their actual game outcomes, just adjusted for strength of schedule. Some might argue that even if actual record is used for the team, some component-estimated record should be used to gauge SOS, and they have a point, but this approach equates team strength to W/L ratio.

In the chart below, aW% is adjusted W%, and "s rk" is the rank of the team's SOS (#1 = toughest schedule):

You can see from the chart that a 100 CTR does not correspond to a .500 aW%. This is because the rankings are designed to give the average team a 100 CTR; since the properties of W/L ratio ensure that the average W/L will be > 1, an average W/L ratio is not the same thing as the W/L ratio of an average team (which is 1, of course). If this scale distortion bothers you, use aW%--and stop using ERA+, because the distortion is similar. I am unconcerned about the issue because I want the average rating (but not necessarily the median) to be 100.
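A quick illustration of why the average W/L ratio exceeds 1: a team with winning percentage p has a W/L ratio of p/(1-p), which is a convex function of p, so averaging it over a league whose W% averages .500 can only give 1 or more (exactly 1 only if every team is at .500). The two-team league below is hypothetical, chosen just to show the effect.

```python
from statistics import mean


def wl_ratio(w_pct: float) -> float:
    """Convert a winning percentage into a win/loss ratio."""
    return w_pct / (1 - w_pct)


# Hypothetical two-team league whose W% averages exactly .500.
league = [0.600, 0.400]
average_ratio = mean(wl_ratio(p) for p in league)

print(wl_ratio(0.500))          # 1.0, the W/L ratio of an average team
print(round(average_ratio, 4))  # 1.0833, the average W/L ratio
```

The .600 team's ratio of 1.5 pulls the mean up more than the .400 team's ratio of about 0.667 pulls it down, which is the same asymmetry behind the ERA+ comparison.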

The three toughest schedules belong to the bottom three teams in the AL East (BAL, TOR, BOS, although it's hardly fair to refer to the latter two teams as being at the bottom of anything). Of course, this is a consequence of playing in the stronger league and having to play a bunch of games against the two highest-rated teams. What I don't like about this definition of SOS is that one can make the case that Tampa Bay's schedule was as tough on paper as Toronto's (assuming an equal distribution of games against non-AL East opponents)--but Tampa Bay's success in winning games makes Toronto's harder in practice. It would be difficult to devise a SOS technique that took that point of view, however.

The more notable weakness of the schedule adjustment (and thus of the ratings themselves) is that it makes no correction for the influence a team has upon the win-loss records of its opponents, and thus it may distort the ratings at the extremes.

I also have some division/league ratings; these are simply the average CTR of all of the teams in the division:

Four of the six divisions were relatively equal, with an average CTR in the 96-102 range. However, one extremely good division and one extremely bad division result in the AL having a 108 ranking to the NL's 93. Those overall league rankings imply that the average AL team would be expected to have a .537 W% against the average NL team. The 2010 interleague record for the AL was .532, which of course was not generated through a balanced schedule of neutral-field meetings and covers a sample of 252 games.

While I developed the rankings this year, I ran 2009 through the spreadsheet and the AL/NL disparity was estimated to be greater (113/89, .559). The AL West was the top-rated division (126), and the NL Central fared a tick worse than they would in 2010 (80).
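The implied head-to-head figures appear to follow from treating the league ratings as win ratios, Log5-style: an average AL team would be expected to win 108/(108 + 93) of its games against an average NL team. A minimal check, assuming that formula, reproduces both the 2010 and 2009 numbers quoted above:

```python
def head_to_head_w_pct(rating_a: float, rating_b: float) -> float:
    """Expected W% for A against B when ratings are treated as win ratios."""
    return rating_a / (rating_a + rating_b)


print(round(head_to_head_w_pct(108, 93), 3))  # 0.537 (2010, AL vs. NL)
print(round(head_to_head_w_pct(113, 89), 3))  # 0.559 (2009, AL vs. NL)
```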

The weak NL Central allowed the Reds to reach the playoffs with the worst CTR of any playoff team (108) in 2010, although that figure was still better than the Cardinals' 2009 low of 103.

Switching gears, here are the gCTR figures. These are based on gEW% (described in a previous post), which takes into account the per-game distributions of team runs scored and runs allowed (but treats each distribution independently of the other):

I'm not going to have a lot to say about the charts for each input, since they track the differences between their inputs and actual W%, which I have already written about in one form or another. Next is eCTR, which uses standard Pythagenpat W-L record as its starting point:

Finally, pCTR, based on Runs Created/Runs Created Allowed used to fuel Pythagenpat:
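For readers unfamiliar with it, Pythagenpat estimates W% from runs scored and allowed using an exponent that grows with the run environment. Below is a minimal sketch; the exponent parameter z = 0.29 is an assumption on my part (values around .28 to .29 are common), and for pCTR the inputs would be Runs Created and Runs Created Allowed rather than actual runs.

```python
def pythagenpat_w_pct(runs: float, runs_allowed: float, games: float,
                      z: float = 0.29) -> float:
    """Estimated W% from runs scored and allowed (Pythagenpat).

    The exponent is x = ((R + RA) / G) ** z, so higher-scoring
    environments use a larger exponent. z = 0.29 is an assumed value.
    """
    x = ((runs + runs_allowed) / games) ** z
    return runs ** x / (runs ** x + runs_allowed ** x)
```

A team that scores and allows the same number of runs comes out at exactly .500, and swapping a team's runs scored and allowed gives the complementary estimate.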

Here the AL East really looks good, with the top four teams in baseball. The potential for this type of clustering of strong teams (and, in the case of the NL Central, lousy teams) in one division is one of the reasons I oppose treating winners of small divisions playing with unbalanced schedules as sacrosanct.

pCTR and the associated aW% are the closest in construction to other popular ratings that account for strength of schedule and use component rather than actual W-L inputs, namely the third-order records published by Baseball Prospectus and the TPI rankings figured by Justin at Beyond the Box Score. Here are the most comparable winning percentages for each methodology--my aW% based on pCTR, the third-order winning percentage from BP, and the TPI from Justin:

You can see that there is general agreement between all of the methods, which is a good sign. The best correlation is between CTR and BP (+.983); the worst between CTR and Justin (+.942), with the BP/Justin correlation falling in the middle (+.965). The methods are all in general agreement about the proper spread of the teams--the standard deviation of the BP and Justin figures is .059 compared to .060 for the CTR-based figures, .060 for my estimate of PW% without any schedule adjustments, and .068 for actual W%.

The BP approach and my own to generating the underlying W% estimate are essentially the same, except for the use of different run estimators (BP uses EqR and I use Base Runs). Justin’s approach is a little different, and I’m personally not wild about it--it breaks defense down into pitching and fielding, making use of FIP and defensive metrics like UZR and Dewan’s Runs Saved. The approach used by BP and myself looks at the actual total component statistics surrendered by the defense. This does not allow one to split defense into pitching and fielding, but it also makes use of the actual observed interaction between the two on the field rather than using estimates that might make sense in isolation but leave something missing when the two are considered as one unit.

In any event, it is encouraging to see that CTR is able to produce similar results to systems developed by others that have been around a little or a lot longer as the case may be. If CTR returned very different results, I would probably conclude that it had a serious methodological error rather than the minor though not insignificant flaws that I am already aware of.