Sunday, February 24, 2013

2013 World Yakyu Classic

The World Baseball Classic evokes a range of reactions from baseball fans. Many (particularly in the United States) are indifferent. Some, like the author, love it and consider it a huge bonus--competitive games played throughout March, just as the initial thrill of exhibition games begins to dull into counting down to Opening Day. And others hate it for one reason or another--the threat of injury to key players, or a dislike of any sort of display of national pride, regardless of how benign it might be.

The latter viewpoint is the one that I am unable to comprehend for a number of reasons that aren’t really germane to a baseball post. I will simply say that I see no evidence that any of the effects that one might view as particularly harmful have come to pass or are likely to come to pass. I don’t see riots between opposing countries’ fans in the stands, nor a bleed-over of passions stirred by the World Baseball Classic to the regular major league season in a harmful manner. There is much more ample evidence of senseless, tribal conflict between fans of the Yankees and Red Sox than there is of fans of Country X and Country Y.

Organizing teams of ballplayers by country is no less arbitrary or silly than any other manner of doing it. In fact, one could advance the argument that American pro sports have one of the most bizarre means of assigning players to teams, since veteran players are acquired under a completely different structure than young players, teams are placed in cities by the sometimes irrational decisions of a cartel, and these teams have been locked into an organizational structure that is so entrenched that moving a team from one league to another long after the distinctions between leagues have been eroded still can invoke fan meltdowns.

One element of the WBC that is more difficult to defend is the constantly changing set of rules that determine which teams play (and the name, which is beyond awful). These changes have not all been bad by any stretch. For instance, in the initial two Classics (2006 and 2009), the sixteen teams were pre-selected by MLB, but in 2013, four spots were filled through qualifying tournaments. The twelve countries that won games in the 2009 tournament were automatically qualified (Australia, China, Cuba, Dominican Republic, Italy, Japan, Korea, Mexico, The Netherlands, Puerto Rico, United States, and Venezuela), and four modified double-elimination tournaments (modified in that the game between the winner’s bracket winner and the loser’s bracket winner served as a decisive championship game rather than a must-win solely for the loser’s bracket winner) filled the remaining four spots. These tournaments, held in the fall of 2012, resulted in Brazil (over Colombia, Nicaragua, and Panama), Canada (over the Czech Republic, Germany, and Great Britain), Spain (over France, Israel, and South Africa), and Taiwan (over New Zealand, the Philippines, and Thailand) qualifying. The net result relative to the first two tournaments was replacing Panama and South Africa with Brazil and Spain.

The countries have been divided into four pools for the first round, which once again has a new format. In 2006, the first round was conducted as a round robin among the four teams, which required the use of arcane tiebreakers. In 2009, this was modified to double-elimination, which was much easier to understand. However, the 2013 format has returned to round robin and all of the confusion that goes with it. If you want to keep your sanity throughout the tournament, root for each pool to have one team go 3-0, another 2-1, another 1-2, and some poor country 0-3. Or two 2-1s and two 1-2s, although that is less likely given the often wide variations in team strength.

The teams with the two best records will advance. If there is a tie, a modified run differential ((runs scored/innings batted) - (runs allowed/innings pitched)) that for some reason the IBAF calls “Team Quality Balance (TQB)” will be used to break the tie. Only the games between the tied teams will be used to figure TQB. If there is a three-way tie, then the TQB tiebreaker will be applied, and if two teams remain tied, their head-to-head result will be the determining factor. If all three teams have the same TQB, then a TQB based on earned runs will be used and the process will begin again (this may be the single stupidest rule in baseball history)...at least until you read down to the next tiebreaker, which repeats the process with batting average. Batting average.
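For the arithmetically inclined, the TQB calculation can be sketched in a few lines of code. The team names and run/inning totals below are invented purely for illustration of a three-way tie:

```python
def tqb(runs_scored, innings_batted, runs_allowed, innings_pitched):
    """TQB = runs scored per inning batted minus runs allowed per inning pitched."""
    return runs_scored / innings_batted - runs_allowed / innings_pitched

# Hypothetical three-way tie, each team 1-1 within the tied group.
# Values: (runs scored, innings batted, runs allowed, innings pitched),
# counting only the games among the tied teams.
tied = {
    "X": (10, 18, 8, 18),
    "Y": (8, 17, 9, 18),
    "Z": (9, 18, 10, 17),
}

# Rank the tied teams from best TQB to worst.
ranked = sorted(tied, key=lambda t: tqb(*tied[t]), reverse=True)
```

Note that innings batted and innings pitched need not be equal (a home team that leads after eight and a half innings never bats in the ninth), which is presumably why the rule divides each side separately.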

The likelihood of teams being tied after the TQB step is low, but that doesn't make it any less jarring to read earned runs and batting average spelled out officially as components of a championship determination process.

A brief capsule on each first-round pool follows; the ranking listed for each team is their IBAF world ranking. These rankings are based only on international competition and thus provide no insight into these specific rosters; I’ve simply provided them for amusement. These rankings are especially harsh on countries like the Dominican Republic and Venezuela that often do not field teams for the second-rate international tournaments. Of course, given the non-existent sample size of the tournament, any predictions are beyond worthless (witness the Netherlands' two victories over the Dominican Republic in 2009), but that doesn’t prevent one from making vague assertions on team strength. Dates are EST.

Pool A

Location: Fukuoka, Japan
Dates: March 2-6
Participants: #1 Cuba, #3 Japan, #18 China, #20 Brazil

Cuba and Japan are obviously the huge favorites to win here. The two countries have already built quite a history at the WBC, with Japan defeating Cuba in the 2006 title game. In 2009, they met in the first game of the second round, with Japan winning; after Cuba beat Mexico and Japan lost to Korea, they met again in a game to determine which would advance to the semifinals. Japan ran their record to 3-0 over Cuba en route to winning a second championship. Unfortunately, their meeting will be the final game in Pool A, by which point both may already be assured of advancement, and it will be played at 5 AM EST. Japan has opted to go with all NPB players this time, so the names will not be as familiar to American fans (the now-38-year-old Kazuo Matsui is the most recognizable).

Pool B

Location: Taichung, Taiwan
Dates: March 1-5
Participants: #4 Korea, #5 Taiwan, #7 Netherlands, #10 Australia

Pool B should be compelling as it is the only pool that features four teams with a relatively decent chance of beating any of the others. Korea, who lost in the 2006 Semifinals and 2009 Final to Japan, is the favorite along with home-standing Taiwan. Taiwan suffered an embarrassing sweep out of the 2009 tournament, including a loss to China, and national baseball pride will be on the line. The Netherlands continues to improve, bolstered by the now-burgeoning talent pool in Curacao. They will not have Jurickson Profar or Kenley Jansen, but Xander Bogaerts, Andrelton Simmons, Roger Bernadina, and Andruw Jones are all familiar faces. Australia went 0-3 in the 2006 Classic, but got their first win in 2009 against Mexico and lost 5-4 to Cuba before being knocked out in a rematch with Mexico.

Pool C

Location: San Juan, Puerto Rico
Dates: March 7-10
Participants: #8 Venezuela, #12 Puerto Rico, #13 Dominican Republic, #16 Spain

Pool C may have the lowest composite IBAF ranking, but it is the pool in which it is toughest to pick two winners and perhaps the strongest in talent. While Spain has no legitimate chance to advance, the top three professional Caribbean powers should make this must-watch beisbol. Much has been written about the decline in Puerto Rican talent, and it’s true that the names aren’t as impressive as those for Venezuela and the Dominican, but if Puerto Rico can patch together a pitching staff (or in a short tournament with pitch restrictions, get one good ensemble performance), they could easily advance. It’s impossible to quantify home field advantage, but it shouldn’t hurt.

Pool D

Location: Phoenix
Dates: March 7-10
Participants: #2 United States, #6 Canada, #9 Italy, #11 Mexico

I’ll have more to say below about the US team and its performance in previous tournaments. The draw here is such that the US is the strong favorite, but as 2006 showed, Mexico and Canada are more than capable of beating the US in a single game. Italy is fortified by American players of Italian heritage, but not to an extent that makes them a strong threat (although they did send Canada home in Toronto in 2009). The key game here on paper would appear to be Canada/Mexico.

The two surviving teams from each lettered pool will advance to the second round, where the pools will be numbered and will follow a modified double-elimination format. Pools A and B will combine into Pool 1, played March 7-12 in Tokyo, while Pools C and D will merge into Pool 2, played March 12-16 in Miami. The second round will begin with the winner of one pool against the runner-up of the other pool. The winners and losers will meet; the winner of the winner’s game will punch their ticket to the semifinals, while the loser of the loser’s game will be eliminated. The remaining two teams will play for the other semifinal berth, and that team will then play the winner’s bracket winner for the pool title, which will only matter in determining semifinal matchups. (This all makes a lot more sense in bracket form). If the favorites were to win, this means Pool 1 might feature Japan, Cuba, Korea, and Taiwan, while Pool 2 would feature the United States, Mexico, the Dominican Republic, and Venezuela.
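Since the format really is clearer in bracket form, here is a rough sketch of the pool logic in code. For illustration only, I assume the favorites named above reach Pool 1 and let the "stronger" team (lower number in an arbitrary strength table of my own invention) win every game:

```python
def play_pool(g1_teams, g2_teams, pick_winner):
    """Run the four-team modified double-elimination pool described above.

    Returns (pool champion, set of semifinalists, eliminated team)."""
    g1 = pick_winner(*g1_teams)
    g2 = pick_winner(*g2_teams)
    winners = [g1, g2]
    losers = [t for t in g1_teams + g2_teams if t not in winners]
    semifinalist1 = pick_winner(*winners)       # winners' game: berth clinched
    lg_winner = pick_winner(*losers)            # losers' game: loser goes home
    eliminated = [t for t in losers if t != lg_winner][0]
    wg_loser = [t for t in winners if t != semifinalist1][0]
    semifinalist2 = pick_winner(wg_loser, lg_winner)      # second berth
    champion = pick_winner(semifinalist1, semifinalist2)  # seeding only
    return champion, {semifinalist1, semifinalist2}, eliminated

# Arbitrary strength ordering, purely for illustration.
strength = {"Japan": 1, "Cuba": 2, "Korea": 3, "Taiwan": 4}

def beats(a, b):
    return a if strength[a] < strength[b] else b

# The pool opens with each first-round winner against the other pool's runner-up.
champ, semis, out = play_pool(("Japan", "Taiwan"), ("Korea", "Cuba"), beats)
```

Under these assumptions Japan and Cuba advance, Taiwan is eliminated after two losses, and the Japan/Cuba pool final affects only semifinal seeding.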

The final four will be played in San Francisco, with the semifinals on March 17 and 18 and the championship game on March 19. The semifinals will feature the winner of one pool against the runner-up from the other.

A few other points on the tournament:

Rules

* The tiebreaker rules were covered above, but there are a few other noteworthy rules. The pitch limits are oft-discussed and too detailed to repeat here, but the key rule is that pitchers are limited to 65 pitches/game in the first round, 80 in the second round, and 95 in the final four. Pitchers cannot work more than two consecutive games, and must have a day of rest if they exceed thirty pitches and four days of rest if they exceed fifty.
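The usage rules lend themselves to a simple eligibility check. This sketch uses only the thresholds stated above; the function names are my own:

```python
# Per-game pitch limits by tournament stage, as described in the text.
PITCH_LIMITS = {"first round": 65, "second round": 80, "final four": 95}

def required_rest(pitches):
    """Days of rest required after throwing the given number of pitches."""
    if pitches > 50:
        return 4
    if pitches > 30:
        return 1
    return 0

def can_pitch(consecutive_games_worked, days_rest, last_pitch_count):
    """True if the pitcher is eligible: no third straight game, and the
    rest required by his last outing's pitch count has been served."""
    if consecutive_games_worked >= 2 and days_rest == 0:
        return False
    return days_rest >= required_rest(last_pitch_count)
```

So a pitcher who threw 35 pitches yesterday is unavailable today, and one who exceeded 50 pitches is shelved for four days regardless of round.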

* Mercy rules are in place that will halt the game if the lead is fifteen after five innings or ten after seven innings in the first two rounds.

* Starting in the thirteenth inning, runners will be placed at first and second base, and any runs they score will not be considered earned (I only mention this last part due to the silly earned run TQB tiebreaker).

* For some reason which I do not understand, there is a rule that says “Players shall not lie down or sit on the bases when time is called on the field.” This apparently is not in the MLB rulebook, but thankfully we will be spared the horrible sight of players lying down on the bases.

United States

The US team, as you know, has not acquitted itself particularly well in the WBC, needing a runs allowed tiebreaker to advance past Canada in the 2006 first round, then losing to Mexico and Korea in the second round. In 2009 the US was mercy-ruled 11-1 by Puerto Rico in the first game of the second round. After beating the Netherlands, the US needed a dramatic ninth-inning rally to slip past Puerto Rico and qualify for the semifinals, where they lost 9-4 to Japan. Overall, the US was 3-3 in 2006 and 4-4 in 2009.

This has led to a lot of armchair psychology, which is to be expected and isn’t really worth commenting on. It’s certainly not outside of the realm of possibility that the US players have a more casual mindset towards the WBC than players from other countries, or that the genius managing of Buck Martinez in 2006 was more spring training in style than competitive, or that the Asian players in particular are closer to their top physical condition in early-to-mid March.

However, it has also led to two ridiculous strains of argument that can be addressed factually. One is that the WBC results somehow demonstrate that the United States is not the #1 source for baseball talent in the world. (Obviously, this argument is only unreasonable when expressed in terms of bulk rather than per capita talent--but the construction of the argument is inherently a bulk argument, since it’s based on the performance of each country’s “best” roster). This is obviously absurd as the results of a small sample size tournament do not even begin to provide a counterpoint to the wealth of available data from major league play (as well as the performance of players crossing between professional leagues, particularly MLB and NPB).

The second is that the US WBC rosters have been unimpressive aggregations of talent, a second-rate collection of players inferior to the rosters used by the other top contenders. While it is undoubtedly true that no US WBC roster has been the best possible roster, it is nevertheless silly to pretend that the US rosters have been less than sterling collections of talent. If the US team were instead an MLB roster, it would be my choice to win the World Series.

For a crude illustration, look at the possible US lineup for 2013 compared to a similar group from the Dominican Republic and Venezuela, the other top western contenders for the WBC title, the 2012 AL and NL All-Star starting lineups, and the Tigers, who are projected by Baseball Prospectus to lead the majors in runs scored. I’ve listed OPS and an average OPS as projected by CAIRO; obviously OPS is a crude metric and averaging it as I have is crude, but for the purpose here it should suffice just fine:



In this illustration, the US team does not rise to the level of an all-star team, but compares favorably to the Dominican and Venezuelan teams as well as the team projected to have the best offense in MLB. Could the US team be better? Sure...you could replace Mauer with Posey, Teixeira with Fielder, Jones with Trout, etc. But it’s still a really good lineup.

On the pitching side, there is a more pronounced lack of top stars. The US team does not have the services of Justin Verlander, Clayton Kershaw, Cliff Lee, Stephen Strasburg, Jered Weaver, Matt Cain, CC Sabathia, David Price, Cole Hamels, and the list goes on. And one could certainly argue that the strongest aspect of a theoretical perfect US team would be the starting pitching depth which swamps what any other country can offer.

And yet, the US still boasts two of the top NL Cy Young vote-getters in RA Dickey and Gio Gonzalez, and solid major league starters behind them in Derek Holland, Ryan Vogelsong, and Ross Detwiler. The US bullpen features a number of solid arms in Steve Cishek, Chris Perez, Vinnie Pestano, Luke Gregerson, Mitchell Boggs,... It’s not an all-world group, but it’s a strong real-world unit on paper.

One can argue that the relative dearth of high-profile starting pitchers participating in the WBC benefits the US, since it also hits the Dominicans and Venezuelans: the Dominicans' only established major league starters are Wandy Rodriguez and Edinson Volquez, while the Venezuelans have Anibal Sanchez, Jhoulys Chacin, and Carlos Zambrano.

One thing that has been disappointing about the WBC from the US perspective is the failure of the first two tournaments to produce two of the three high-profile potential matchups. US/Japan has happened twice, but we have yet to see the superpower showdown of US/DR or the politically-charged US/Cuba matchup. From a US-centric perspective, I’d say those are the three most intriguing WBC matchups. Other interesting games like Japan/Korea and Dominican Republic/Venezuela have occurred as the brackets have set them up.

Making predictions about who will win the WBC is a degree sillier than making predictions on playoff series, but on paper, the United States should be the favorite.

Monday, February 18, 2013

Omar y Amigos

Back in the mid-90s, the Indians radio network carried a pregame feature before Sunday afternoon games called "Kenny's Kids". It was your standard boilerplate schlock--little kids given tickets to the game through the Indians charities or Kenny Lofton's foundation or something of the sort would get to listen to Kenny talk to another Indian, or get to ask him questions ("Mr. Wofton, Mr. Wofton, how do I get to be a baseball player?")

Then Kenny got traded prior to the 1997 season, and this grave duty fell to Omar Vizquel. The segment was renamed "Omar y Amigos". When Lofton came back the next year, Kenny's Kids did not return; Omar y Amigos continued. So if you ever thought there wasn't an upside to getting traded, then returning to your former club as a free agent...you were wrong.

Title exposition aside, I'd like to discuss Omar Vizquel's place in history. This is a standard boring article written at a low level of sabermetric literacy, one you can safely skip if you're not interested in comparing players' careers across the years (or if you value peak over career). The goal is not to pinpoint a ranking and say that "Omar Vizquel is the fourteenth best shortstop of all-time" or anything like that, just to get a general sense of how he compares to other great shortstops.

One of the most common comparisons for Vizquel, particularly among mainstream thinkers, is Ozzie Smith. The comparison generally assumes that the two provided similar value in the field, and thus can be compared on the basis of their offensive production:



It is easy to see why many traditional, context-free glances at the stats result in considering Vizquel to be Smith's equal. He hit for a higher average, hit fifty more homers, and both drove in and scored over 150 more runs than Ozzie. He hit for a better slugging average while getting on base at the same rate. Smith does have an advantage in basestealing, swiping 176 more bases than Vizquel while getting caught 19 fewer times.

Of course, as a sabermetrically-informed reader you know that context is king. It makes a huge difference; Vizquel played in parks with a composite PF of 1.01, while Smith played in parks with a park factor of .98. Much more significantly, Vizquel played in leagues in which the average team scored 4.81 runs per game; teams in Smith's leagues averaged 4.15 runs per game. Combining the two, Vizquel played in a context in which 19% more runs were scored.
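The arithmetic behind that 19% figure is simple enough to show directly, treating each player's run environment as park factor times league runs per game:

```python
# Run environment = park factor x league runs per game,
# using the figures quoted above.
vizquel_env = 1.01 * 4.81   # PF 1.01, leagues averaging 4.81 R/G
smith_env = 0.98 * 4.15     # PF .98, leagues averaging 4.15 R/G

# The ratio of the two environments reproduces the 19% gap.
ratio = vizquel_env / smith_env
```

The same construction (N*PF as a ratio between two players) reappears below in the Aparicio/Campaneris/Concepcion comparison.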

Any serious analytical approach is going to take that into account, and rather than emerging as the superior hitter, Vizquel will assuredly come out as inferior to Smith. Compared to a league average hitter (comparable to Palmer's Batting Wins, except counting stolen bases), I have Smith at +2 wins for his career and Vizquel at -23, with Smith’s RC/out relative to the league average bettering Vizquel’s 102 to 86. Not all methods think the gap is that large (see Technical Note below), but the other commonly used sabermetric methods concur that Smith was a better offensive player: he leads in TAv (.250 to .243), wRC+ (90 to 84), and OPS+ (87 to 82).

Even if one accepts that Smith and Vizquel were similar fielders, they are not particularly close in offensive value. Unless one wants to make the claim that Vizquel was a much better fielder than Smith, Ozzie is the more valuable player--easily.

Engaging in a little bit of unhealthy stereotyping (on both baseball and ethnic levels) one can find decent comparables for Vizquel in three players: Luis Aparicio, Bert Campaneris, and Dave Concepcion. I don't mean to suggest that these three are the most comparable players to Vizquel, but they are obvious comparisons as they are all shortstops, all Latin, all from the expansion era, and all are known for their glovework. One could of course compare Vizquel to players from other positions, or shortstops who arguably were roughly as valuable but with a different offense/defense split than Vizquel (like Jim Fregosi, Tony Fernandez, Junior Stephens, Edgar Renteria, or Miguel Tejada), but comparisons to the aforementioned trio are irresistible:



Once again, it is easy to see why Vizquel is highly regarded in the mainstream when looking at the raw statistics. Most people, presented with that data alone, would choose Vizquel. When park factors and league averages are considered, though, it is clear that Vizquel played in a much more offense-friendly time and place:



The final column is Vizquel's run environment (N*PF, where N is the league average runs/game) as a ratio to the others. Runs were 18% more common in Vizquel's games than Aparicio's, 25% versus Campaneris', and 17% versus Concepcion's. The result is that while Vizquel has -23 hitting WAA, Concepcion has -2, Campaneris -3, and Aparicio -10. (See technical note for an explanation of why the differences are greater than some other methods show).

Let’s suppose you refuse to consider sabermetric measures, and want to limit your offensive evaluation only to what you protest are “actual” runs--runs scored and RBI. For the sake of argument, I’ll play along, as long as you allow me to consider league context and outs made. (See this post for the full details on how these are figured, but essentially R+ is runs scored per out relative to league average, RBI+ is the same for RBI, ANY is the average of R+ and RBI+, and ANYA is the average of runs scored above league average and runs batted in above league average).
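For readers who want to follow along, the four metrics defined above can be sketched as follows; the player line and league rates here are invented purely for illustration:

```python
def rate_stats(r, rbi, outs, lg_r_per_out, lg_rbi_per_out):
    """R+ and RBI+ are the player's R and RBI per out as ratios to the
    league rates; ANY averages the two ratios; ANYA averages the raw R
    and RBI above what league-average rates would yield in his outs."""
    r_plus = (r / outs) / lg_r_per_out
    rbi_plus = (rbi / outs) / lg_rbi_per_out
    any_avg = (r_plus + rbi_plus) / 2
    anya = ((r - lg_r_per_out * outs) + (rbi - lg_rbi_per_out * outs)) / 2
    return r_plus, rbi_plus, any_avg, anya

# Hypothetical season: 80 R and 60 RBI in 450 outs, against league rates
# of .17 R and .17 RBI per out (numbers invented for illustration).
r_plus, rbi_plus, any_avg, anya = rate_stats(80, 60, 450, 0.17, 0.17)
```

A player below 1.00 in ANY, like Vizquel's 83% figure below, is producing fewer runs and RBI per out than a league-average hitter.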

When you do, it becomes clear that Vizquel’s runs scored and batted in totals are not as impressive as those in this peer group:



Vizquel scored fewer runs per out than his league average; only Concepcion joined him, but Concepcion had the best RBI rate of the five. In terms of the average, Vizquel is well behind the others at just 83% of league average R and RBI per out.

I'm sure that any Vizquel partisan who is still reading is screaming "What about fielding?" I don't have any particular insight to lend on that question; all I can do is regurgitate the figures that others have published. Bill James, in Win Shares, assigns fielders letter grades based on their Defensive Win Shares per inning rates. Vizquel (through 2001) is evaluated as a B- shortstop, with Aparicio and Campaneris earning Bs and Concepcion an A+. As you know, Win Shares fielding ratings are based on a top-down team evaluation without the benefit of play-by-play data.

Chone Smith's TotalZone method uses Retrosheet batted ball locations to estimate runs saved compared to an average fielder. Smith's RAA results for the four almost flip James' rankings on their head, with Aparicio estimated to have saved 149 runs (adding in the double play component) above an average shortstop, Vizquel 144, Campaneris 62, and Concepcion 50.

TotalZone certainly deems Vizquel to have been an outstanding fielder, but not to such an extent that it elevates him significantly above this pack. And thus, in Chone's WAR figures, Vizquel ranks comfortably ahead of Concepcion but behind Aparicio and Campaneris. Even with a more generous evaluation of Vizquel's prowess in the field, it's difficult to argue that he was clearly superior to the other members of this group.

In Baseball Prospectus’ FRAA, Vizquel ranks as the lowest of the four with just 14 (Campaneris 31, Concepcion 56, Aparicio 109). This contributes to Vizquel ranking 20 WARP behind Campaneris, 17 behind Aparicio, and 6 behind Concepcion.

That leaves the matter of his reputation as a fielder, which is considerable, and often in mainstream discussions is assumed to be nearly on par with Ozzie Smith. You can decide for yourself how much weight to place on non-statistical evaluation of fielding. The only strong suggestion I'd make is not to place too much emphasis on the evaluation of any one particular individual--including my own take on Vizquel which follows, which I offer not because I think it's particularly insightful but because it's my blog.

I watched Omar Vizquel play shortstop more than any other shortstop--I find it hard to imagine that I'll ever watch anyone more in the future. Most of this occurred during my formative years as a baseball fan, so it is quite possible that it has forever tinged my perception of shortstop fielding--that I am unable to fairly evaluate other shortstops I watch because my expectations for a major league shortstop have been set largely by watching Omar Vizquel.

In any event, I don't have much of a pre-Vizquel frame of reference to offer. I can't say that I was never impressed by Omar Vizquel because he didn't look as good to me as Ozzie did, or as Campaneris did. What I can tell you is that, in watching other shortstops after watching Vizquel, I have never felt that I was watching pale imitations of the master.

Vizquel certainly stands among the best shortstops I've seen myself, possibly even the best. But he does not tower over them. I do think that Vizquel did things with as much flair as anyone I've watched--the barehand grab-and-throws, the back turned to the plate catches of outfield pops. But when you rely on observation, flair can be misleading. It's flair that gets you on SportsCenter, not workman-like consistency. It's the flashy play that gets burned in most people's memories, not the routine play or even the tough play made in the hole with a strong plant-and-throw.

None of this is to say that Omar Vizquel was not an outstanding fielder and a fine all-around player. He was, but so were Aparicio, Campaneris, Concepcion, and others who don't form as natural of a comparison group. His reputation seems to place him in a higher class, at least at this time. These assessments seem to be based on a very rosy assessment of his fielding prowess, a lack of recognition of the high-offense era in which he played, or a combination thereof.

TECHNICAL NOTE

I did not make any allowance for players in DH leagues, which is part of the reason you'll see discrepancies between the WAA/WAR figures for players listed here and those from Pete Palmer, Chone, and other sabermetricians. Vizquel spent the majority of his career in DH leagues, but Campaneris is the only other player discussed who spent a significant amount of his career in them.

Swapping out pitchers for DHs raises the league averages that are used to set baselines for average and replacement level performance, which works against hitters in DH leagues. However, the increase in scoring, which drives runs per win up, is a real factor that should not simply be adjusted out of existence.

And while Vizquel would benefit from a DH correction, I've implicitly assumed that the position adjustment for shortstops has remained stable from Aparicio's day to Vizquel's. However, shortstop offense reached its all-time low in the 60s and 70s. The matter of positional adjustments is thornier than simply comparing mean positional offense, but there is a distinct possibility that treating the positional adjustment as a constant is hurting the pre-Vizquel members of the group, canceling out much of the DH hit that Omar takes.

Also, I realize that some people are aware of the offensive environment in which Vizquel played, but ignore it because they chalk it up to steroids. Since Vizquel is not suspected of using steroids, we're supposed to believe that he could have transferred his raw offensive performance into a different era and been more valuable relative to the league. That's a nice story.

Monday, February 11, 2013

Returning Starters (and Success?)

2013 will mark Greg Beals’ third season at the helm of the OSU baseball program, and it promises to be an important one. For one thing, any excuses regarding the talent level inherited from the previous coaching staff have now expired. This is Beals’ team, for better or worse, with a large number of key players recruited by him and ample opportunity to have brought in alternatives to any less than productive holdovers from the prior staff. On top of that, OSU will have a fairly large number of returning starters, and is expected by many to be a contender for the Big Ten championship.

The Buckeyes are likely to have a platoon arrangement behind the plate, as neither senior Greg Solomon nor sophomore Aaron Gretz has hit well enough to seize control of the job. Gretz’ only real offensive positive as a freshman was his 19 walks to 18 strikeouts in just 91 at bats, an area in which he is the opposite of Solomon, whose plate judgment in two years in Scarlet and Gray (10 W/80 K in 311 AB) has been dreadful. Gretz is considered to have one of the best catcher throwing arms in the conference, although Solomon is also fairly solid defensively. Gretz bats left and Solomon right, so there is a natural platoon possibility. They will handle all the catching if healthy--walk-on freshman Matt Emge and sophomore utility man Ryan Wonders round out the roster, but will only see the field in an emergency.

First base will be manned by senior Brad Hallberg, who has bounced between the infield corners throughout his OSU career, starting at third in 2012. Hallberg was one of the team’s better hitters last year at 311/414/431 and will once again figure in the middle of the order. Senior second baseman Ryan Cypret will be a three-year starter; he is coming off a rough 2012 (236/350/304) after a very effective 2011 (323/400/428) and will be one of the key offensive players for OSU.

Third base is vacant with Hallberg’s move across the diamond, leaving slick fielding sophomore Ryan Leffel as the most likely starter. Leffel was used often in the field during his freshman campaign as a defensive replacement when Josh Dezse left first base for the mound, but only received 28 PA. According to Chris Webb, Leffel missed early practices with a wrist injury, opening the door for freshman Craig Nenning, also considered to be a strong fielder. Third base is thus a position of concern, especially offensively.

Senior Kirby Pellant will return as the shortstop. The most notable aspect of Pellant’s game both offensively and in the field is his speed (31 steals in 38 attempts), but to this observer he is a bit lacking as a pure fielder. Offensively, Pellant is similar to a number of his teammates in that he hits for a decent average while drawing some walks but rarely ever hitting for power (274/375/340). Other possibilities around the infield include freshman first baseman Zach Ratliff, freshman third baseman/OF Jacob Bosiokovic, freshman Troy Kuhn, and the aforementioned Wonders.

In the outfield, all three spots should be manned by at least partial returning starters. Left field figures to be a battle between senior Joe Ciamacco and junior Mike Carroll. Ciamacco was held back by injuries in 2012, but was a perfect 14-14 stealing bases, as speed is also his top asset. At the plate, he hit for a decent average (.291) but only a mediocre number of walks (8 in 103 at bats) and little power (.039 ISO). Carroll is a hit-first player who spent a lot of time at DH in 2012, with a better plate approach but similar overall production to Ciamacco (279/368/333). Both bat left, so a platoon is not in the cards.

Junior Tim Wetzel is the leadoff hitter and center fielder--stop me if you’ve heard this before, but he has speed (although a poor two-year base stealing record of 17 for 30), a good eye (63 walks in 390 career at bats), but no power (.049 career ISO). Wetzel has a chance to leave his mark on OSU’s career record books in counting categories as he has already started and played in 104 games and figures to be a four-year starter. Right field will belong to Pat Porter, who played left a year ago and hails from my hometown. Porter’s 266/370/322 line looks all too familiar for this roster, but as a freshman he seemed to improve as the year went on and will be a key to the Buckeyes’ offensive success in 2013. Outfield reserves include freshman Jake Brobst, Bosiokovic, and freshman Joe Stoll, who's listed on the roster as LHP/OF.

The plan apparently is for junior Josh Dezse to serve as the DH rather than first baseman. Dezse, who has also doubled as the team’s closer, is considered one of the best pro prospects in the Big Ten, but has been something of a tease as a very good hitter but not the all-around offensive star OSU has lacked at a corner position since Ronnie Bourquin. Dezse’s career 318/425/447 line belies the fact that he has much greater power potential. He clubbed three homers in one game at Georgia Tech last year, but that represents one-third of his career total. Still, a healthy Dezse is needed to anchor the Buckeyes lineup. Carroll would seemingly have the first crack at this role if Dezse does not fill it for whatever reason, with Kuhn, Bosiokovic, or Ratliff next in line.

While the offensive starters are fairly well-established, roles on the mound are up for grabs. One that is not is the #1 starter, which will go to junior Jaron Long, a first-team all-Big Ten pick in 2012. Long is a soft-tossing right-hander who relies on his command (13 walks in 101 innings) rather than stuff (63 strikeouts). While watching him work may give the perception of doing it with mirrors, his .329 BABIP was little different than the team average of .331. Still, Long should not be counted on for a repeat performance, but should eat innings and feast on undisciplined lineups.

Behind Long, things are considerably murkier. There was a thought that Dezse could be used in a starting role, but his back issues make that considerably less likely. Senior lefty Brian King stayed in OSU’s weekend rotation throughout 2012 and thus is a good bet to be back. It was King, not Long, who was expected to be the key JUCO pitcher added to the staff, but he turned out to be only an average performer. The third starter could be senior righty Brett McKinney, who started 2012 as the #1 but eventually lost weekend starting assignments before being pressed back into action at the end of the season.

Three others stand out as possible rotation options: freshman right-hander Jacob Post was apparently impressive in fall camp, and as a new option may prove to be enticing. Senior right-hander Brad Goldberg has sat out the last two seasons as a transfer and then due to eligibility issues, but is now cleared to pitch and will be a contributor in some manner. Junior right-hander Greg Greve has started 28 games over two seasons, but has never been able to keep a hold on a rotation job, and may now be slotted for the bullpen.

Dezse has been the closer for each of the last two years, but OSU planned to transition him to a starting role for 2013. However, Webb has reported that a back issue has sidelined Dezse throughout the fall and into spring practices, which I presume leaves that plan in jeopardy. Behind Dezse, the top setup man will be senior sidearmer David Fathalikhani. Sophomore Trace Dempsey throws from a slightly higher arm slot, and gives OSU a double dose of right handers with unorthodox deliveries. Beals loves to play matchup ball, but the last two years have seen him lose his lefties to graduation one at a time--first Theron Minium and now Andrew Armstrong. Candidates to fill this role include sophomore Matt Panek (who also throws from the side) and sophomore JUCO transfer Ryan Riga. Other pitchers on the roster include right-handers Logan Bowles (freshman), Tyler Giannonatti (a JUCO transfer who had to redshirt in 2012 with an injury), Shea Murray (walk-on freshman), Tito Nava (sophomore transfer from Duke) and left-handers Michael Horesjei (walk-on sophomore), Luke McGee (freshman), and Joe Stoll (freshman).

A constant complaint towards the end of Bob Todd’s career was that the non-conference schedule was weak. Beals has beefed it up a little, but this year’s slate offers a striking dichotomy: as unambitious a pre-conference schedule as one could credibly embark upon, but a few high-profile non-conference clashes during the Big Ten season. OSU will open its season the weekend of February 15 with three games in Sarasota against Mercer, Notre Dame, and St. John’s. The following weekend they are in Port Charlotte for two games each with South Dakota St. and Mt. St. Mary’s. The first weekend of March sees the Buckeyes in Florida yet again (Deland) to play UConn, Stetson, and Central Michigan. The following weekend OSU will play at Coastal Carolina, with two against the hosts plus single games with Harvard, Ball St., and Charleston Southern.

The final non-conference weekend is March 15, a three-game home series against Bryant. Big Ten play will go: @ Purdue, Michigan St., @ Minnesota, @ Nebraska, Illinois, Penn St., @ Northwestern, Indiana (this is the same slate as 2012 except with the home series flipped). OSU will not play Iowa or Michigan (and, if completeness is sought, Maryland or Rutgers).

Non-conference opponents woven in throughout the schedule are the typical local foes in March and April--Toledo, Ohio University, Miami, West Virginia, Marshall, Akron, Cincinnati, Northern Kentucky. Where it gets interesting is May 7, a Tuesday on which OSU will open a two-game home series against Georgia Tech. The Buckeyes and Yellow Jackets have met 22 times, the first in 1924, but never before in Columbus. That weekend (on which OSU is idle from Big Ten play), Ohio State will host Oregon for a three-game series, the first ever meeting between the two schools on the diamond. On the following Tuesday, OSU will host Louisville for a single game.

Such a stretch is, as best as I can tell, unprecedented in OSU history--8 days in which the Buckeyes will play six home games against teams ranked #10 (UO), #12 (GT), and #15 (LOU) in Perfect Game’s pre-season Top 25.

Will the Buckeyes be ready to face that gauntlet? Unfortunately, I’m inclined to think they won’t be, as they don’t look like a top-tier Big Ten team to me. On the positive side, the pitching depth is beginning to return to the level one would expect at OSU, with some reserves who appear to be capable of serving as weekend starters. The offense returns all of its starters except right fielder David Corna, and the freshman class appears to offer more promise than those of recent seasons. But in my (admittedly anecdote-based) observation, one of the best ways for a college team in any sport to be overrated is to return a lot of starters from a team that was only average in the prior season (with a 33-27 record, 11-13 in the Big Ten, and a #94 ISR ranking, the 2012 Bucks fit this bill). Sure, returning starters means fewer question marks--but the degree of improvement anticipated can often be overstated.

And for me, the jury remains out on Beals. In year three, it’s time for his recruits to shine, and there’s far less ability to blame things on the prior staff. As I wrote about last year, Beals’ game management frustrates me to no end, particularly his obsession with a delayed steal of home that would embarrass many junior high coaches. The fact that he continues to try to pull it off, not just against unsuspecting non-conference opponents, but against Big Ten coaches who surely know it is coming, forces me to question his judgment and his ego. It’s hard to look at college coaches through the standard, non-nuanced sabermetric lens (bunts bad, intentional walks bad, etc.) because so many would look bad, but Beals is especially grating. What’s really bizarre is that I have actually read OSU fans on the internet rave about how much more enlightened his offensive strategy is than Todd’s, which is a case of people seeing what they want to see.

My best guess (and it’s just that) at the lineup and pitching staff:

UPDATE: Today OSU announced that Josh Dezse will miss at least the first two months of the season with a "stress reaction in his lower back". This is obviously a blow to the OSU lineup and bullpen, so I have revised my projected lineup accordingly:

1. 8 Tim Wetzel (JR)
2. 4 Ryan Cypret (SR)
3. 9 Pat Porter (SM)
4. 3 Brad Hallberg (SR)
5. D Mike Carroll (JR)
6. 7 Joe Ciamacco (SR)
7. 2 Aaron Gretz (SM)
8. 6 Kirby Pellant (SR)
9. 5 Ryan Leffel (SM)

SP #1: R Jaron Long (JR)
SP #2: L Brian King (SR)
SP #3: R Brad Goldberg (SR)
SP #4 (midweek): R Brett McKinney (SR)
SP #5 (midweek): R Jacob Post (FM)

RP: L Ryan Riga (SM)
RP: R Trace Dempsey (SM)
RP: R David Fathalikhani (SR)
CL: R Greg Greve (JR)

Thursday, January 31, 2013

MLB Has Lost Control of the Game

Many years ago, the powers that be in baseball sat down and drew up a set of rules to govern the game. These rules applied to the game on the field in an attempt to ensure that the game was contested in a sporting manner. However, an exclusive analysis performed by Walk Like a Sabermetrician regarding the 2012 MLB season reveals that these rules are routinely violated despite the penalties against them. Some may scoff at this analysis and say that some of these rules have not been broken intentionally. To do so is to coddle the rule breakers. These rules should have shaped the training of players and molded their behavior; instead, the players have failed to conform their physical actions to the sacred code of the game. Baseball is out of control, and it is out in the open for all to see, yet the powers that be have not taken any substantive steps to beef up enforcement. The shocking details:

* Pitchers have long been tasked with the simple job of providing a fair pitch to the batter, one that is within a zone deemed to be conducive to putting the ball in play. Pitchers have pushed the envelope, though, attempting to throw pitches that violate the spirit but not the letter of the rule. During the 2012 regular season, there were 14,709 separate occasions on which a pitcher failed to provide a hittable pitch and was penalized with a walk.

However, two additional details illustrate just how bad this problem has become. The first is that every single pitcher who logged a non-negligible number of innings issued at least one walk. The disregard for this rule and the attempt to deceive batters has infected the entire population of major league pitchers.

The second is that no fewer than 1,055 times did a pitcher intentionally violate this rule and make no attempt to provide a hittable pitch to the batter. This almost always occurred with expressed consent and even on direct order of the manager. Blatantly thumbing their nose at the code of the game, these pitchers and managers engaged in unsporting activity. The penalties simply must be increased to stamp out this behavior.

* Even more shockingly, there were 1,494 instances of a pitcher hitting a batter with a pitch. This act is expressly prohibited by the rules of baseball and the deleterious nature of this action is not limited to simply breaking rules. Hit batters have been linked to numerous cases of injury and even death. In choosing to play baseball, players should not be forced to make any decisions that could have an impact on their health, but batters risk extreme injury every game as they are forced to bat against these recalcitrant pitchers.

It has also become apparent that these violent acts are sometimes intentionally committed, often in a bizarre meld of revenge and tribal grudges that have more in common with gang warfare than gentlemanly sport. MLB has left the penalties for hit batters so toothless that these events continue, risking the health of batters and setting an awful example for the children of America.

* Any excuses regarding physical rather than moral failings go out the window when it comes to the matter of ejections. Umpires are given the power to remove disrespectful and violent offenders from the game. Such an awesome power should never have to be used in a civil game, but MLB’s product is anything but civil. 179 times an umpire had no choice but to remove a participant from the game for bad behavior. Again, the titular authority figures known as managers were frequently involved in these violations.

It is a matter of simple common sense that when rules are violated, it means that the associated penalties are insufficiently strong. This simple truth has been illustrated time and time again throughout human society. Any time draconian penalties are instituted, the associated behavior ceases. Examples include the lack of murders and non-existence of drug use in America, the strict adherence to all bylaws of the NCAA, and, of course, the complete lack of PED use in Olympic sport. MLB needs to learn from these examples and curb the culture of rule-breaking that prospers on the field.

Monday, January 21, 2013

Meanderings

* I don’t have anything of substance to add on the deaths of Earl Weaver and Stan Musial, but since both were favorites of mine I feel compelled to write a little something. Weaver of course is a managerial hero to many in the sabermetric community. He predated my time as a baseball fan by a significant number of years, but he still was an influence on me through the writing of Bill James and Thomas Boswell, his own book Weaver on Strategy, and Earl Weaver Baseball, which in its DOS form was the first baseball game I played (even though I’m sure he wasn’t writing the code).

The paragraph that follows is the kind of unsupported-by-evidence blurb I try to avoid writing, because in many cases you can get destroyed with a little cursory fact-checking. Weaver is famed for utilizing his roster to its fullest, particularly his bench--finding specialists whose strengths could help a club. The value of the bench has been greatly reduced in today’s game thanks to the roster crunch--the extra spots have gone to pitchers. Some of this is a natural result of the never-ending progression towards lighter pitching workloads, but some of it may be traceable to an attempt to counter the Weaver school. Once the bench was smartly utilized with specialists, it was necessary to have a counter-stocked bullpen. The verdict of the powers that be in baseball has been to prioritize stopping the other manager from gaining an offensive advantage through substitution rather than leaving one’s self with the tools to do so. One could argue that Tony LaRussa is the anti-Weaver in this regard.

Stan Musial was of course a great player, one who has gotten the short end of the stick--among his contemporaries in either bordering generation, there has been more celebration reserved for DiMaggio, Williams, Mantle, Mays, and Aaron among outfielders. He’s always been a favorite of mine, though--in fact, I have two framed baseball pictures hanging on my wall (the selection is more due to happenstance than a design to pick these particular two, but I wouldn’t hang a picture on my wall if I didn’t like the subject). One is of Babe Ruth’s sixtieth home run, and the other is a picture of Stan Musial and various related Musial memorabilia (bats, uniform, ball, etc.).

* The various sports stories of last week (thankfully, not baseball-related) were a perfect reminder of why I have so little respect for sportswriters as a class (certainly I judge people as individuals, but it so happens that I have little use for the majority of individual sportswriters).

The best attribute of sportswriters is how ignorant they tend to be. When someone writes invective against sabermetrics, or displays a complete lack of understanding of statistics or economics or probability, it is easy to simply laugh them off. A huge number of sportswriters fall into this easily dismissed category.

As an aside, if I didn’t come to the table with a pre-conceived dim view of the world view held by most mainstream journalists (non-sports), it would be difficult for me to believe how ignorant they are. When I read news stories about topics on which I am well-informed, it is rare to go through an article that does not contain an outright falsehood, a statement of surprise at something that is blindingly obvious, or a quote from a clearly biased source that is allowed to pass without noting that bias. And when I see this occur in articles about a topic about which I know more than the journalist, it naturally gives me great pause about what I read about topics on which I am seeking to learn more.

Unlike many people inclined to interest in sabermetrics, I am not at all looking forward to the rapidly approaching day in which any aspiring young mainstream baseball scribe will be fluent in sabermetrics and not prone to dismissing non-traditional viewpoints. While this will have a limited positive effect of reducing the amount of idiocy we are all exposed to, it will make it that much harder to simply ignore a writer with cause.

My biggest problem with sportswriters is not that they are ignorant--it's that they are self-righteous, prone to pop psychology, and often downright nasty to their subjects. The new breed of baseball writer will still display all these traits, but without the casual ignorance of logic when it comes to strategy and evaluation of players. The perfect symbol of this new breed is Jeff Passan. Passan is as smarmy and as prone to being a jackass as your garden variety Murray Chass-era hack. But because Passan incorporates sabermetric statistics and thinking as appropriate, he is much more likely to get a pass for being a jerk than is a sabermetric ignoramus.

* The Armstrong and Te’o stories are also worthwhile as an illustration of how different my interest in sports is from the interest of the fictional public for which the stories are written. I have essentially zero interest in the private lives of athletes. I do not pick which athletes to root for because they seem like nice people, or because they have overcome some tragedy in their personal lives--I pick which athletes to root for because they play(ed) for/support the teams that I do, or because I enjoy watching them play the game.

When sportswriters single out a human interest story, it is their way of telling you who to root for. One could look at the roster of any Division I college football team with its 85 players and find someone who has been through a traumatic experience. Frankly, the notion that the death of a college-aged person’s grandmother would be a trauma worthy of making into a story is laughable. Certainly such an event is a terrible thing for the affected person and family, but it is also a fact of life and something that the majority of people in that age group have experienced. But journalists decided that Te’o was special, and that you should root for him--and they almost gave him an absurd Heisman trophy for it.

More broadly, I don’t care if Player X is a jerk to fans (and I certainly don’t care if he is a jerk to the media). I might care if I had any reason to interact with Player X--but I don’t, and the odds that I will ever interact with him are infinitesimal. Sportswriters are often incapable of realizing that the rest of us aren’t affected or interested by whatever inconveniences or petty issues Player X creates for them.

If I discover that Player X isn’t too friendly to fans who come up to him and ask for his autograph, I feel no reason to change my opinion of him. I know that when people who want something that I have approach me, I do my best to ignore them altogether (obviously no one is asking me for my autograph, but we all deal with panhandlers, charities calling for money, family members who need a favor, and the like). How can I fault Player X for behaving exactly as I would behave? Why would I factor this into the degree to which I like Player X, when the only reason I am even aware of his existence rather than that of another 1/7,000,000,000 of the world’s population is his ability to play baseball?

Of course, this can be easily looped back to the big baseball story of the month, the Hall of Fame voting. While the Hall of Fame was broken beyond repair before steroids made it a complete joke, the principle holds: I want to view baseball players as baseball players, nothing more and nothing less. Whatever else they are outside of that is of equal consequence to me as that of a mailman in Topeka.

Tuesday, January 15, 2013

Crude Team Ratings, 2012

For the last few years I have published a set of team ratings that I call “Crude Team Ratings”. The name was chosen to reflect the nature of the ratings--they have a number of limitations, of which I documented several when I introduced the methodology. Crude or not, this was a banner year in sabermetrics in which to be a purveyor of a team rating system, and I wouldn’t want to miss out on the fun.

The silliness of the Fangraphs power rankings and the eventual decision to modify them (while shifting blame to defensive shifts for odd results rather than the logic of the method) offered an opportunity to consider the nature of such systems. When you think about it, team ratings may actually be the most controversial and important objective methods used in sports analysis. As sabermetricians it is easy to overlook this because they don’t play a large role in baseball analysis. But for college sports, rating systems are not just a way to draw up lists of teams--they help determine which teams are invited to compete for the national championship. And while most teams with a chance to win the championship in sports with large tournaments are comfortably in the field by any measure, in college football ranking systems are asked to make distinctions between multiple teams that would be capable of winning the title if permitted to compete for it.

There are any number of possible ways to define a team rating system, but to simplify things I will propose two broad questions which should be asked before such a system is devised:

1. Do you wish to rank teams based on their bottom line results (wins and losses), or include other distinguishing factors (underlying performance, generally in terms of runs/runs allowed or predictors thereof)?

I would contend that if you are using team ratings to filter championship contenders, it is inappropriate to consider the nature of wins and losses, only the binary outcomes. If you are attempting to predict how teams will perform in the future, then you’d be a fool not to consider other factors.

2. Do you wish to incorporate information about the strength of the team at any given moment in time, or do you wish to evaluate the team on its entire body of work?

I would contend that for use as a championship filter, the entire body of work should be considered, with no adjustments made for injuries, trades, performance by calendar, etc. If you are using ratings to place bets, then ignoring these factors means that you must consider sports books to be the worthiest of charities.

Obviously my two questions and accompanying answers are painted in broad strokes. But defining what you are setting out to measure in excessively broad strokes is always preferable to charging ahead with no underlying logic and no attempt to justify (or even define) that logic. Regardless of how big your website is, how advanced your metrics are, how widely used your underlying metric is for other purposes, how much self-assuredness you make your pronouncements with, or who is writing the blurbs for each team, if you don’t address basic questions of this nature, your final product is going to be an embarrassing mess. Fangraphs learned that the hard way.

Of the two basic questions above, CTR offers some flexibility on the first. It can only use team win ratio as an input, but that win ratio can be estimated. In this post I’ll present four variations--based on actual wins and losses, based on runs/runs allowed, based on game distribution-adjusted runs/runs allowed, and based on runs created/runs created allowed. You could think up other inputs or any number of permutations thereof (such as actual wins/losses regressed 25% or a weighted average of actual and Pythagorean record, etc.). On the second question, CTR has no real choice but to use the team’s entire body of work.

I explained how CTR is figured in the post linked at the top of this article, but in short:

1) Start with a win ratio figure for each team. It could be actual win ratio, or an estimated win ratio.

2) Figure the average win ratio of the team’s opponents.

3) Adjust for strength of schedule, resulting in a new set of ratings.

4) Begin the process again. Repeat until the ratings stabilize.
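The four steps above can be sketched in a few lines of Python. This is only an illustration of the iterative scheme with made-up teams and schedules, not the author's actual implementation:

```python
def crude_team_ratings(win_ratios, schedules, iterations=100):
    """win_ratios: team -> win ratio (W/L, actual or estimated).
    schedules: team -> list of opponents faced (repeats for multiple games)."""
    ratings = dict(win_ratios)                    # step 1: seed with raw win ratios
    for _ in range(iterations):                   # step 4: repeat until stable
        new = {}
        for team, opps in schedules.items():
            # step 2: average rating of the team's opponents
            sos = sum(ratings[o] for o in opps) / len(opps)
            # step 3: adjust the team's win ratio for strength of schedule
            new[team] = win_ratios[team] * sos
        # renormalize each pass so the ratings don't drift
        avg = sum(new.values()) / len(new)
        ratings = {t: r / avg for t, r in new.items()}
    return {t: 100 * r for t, r in ratings.items()}   # 100 = average team
```

In a toy three-team round robin, seeding with raw win ratios and repeating the schedule adjustment settles into stable ratings after a few dozen passes, with the average team pinned at 100.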

First, CTR based on actual wins and losses. In the table, “aW%” is the winning percentage equivalent implied by the CTR and “SOS” is the measure of strength of schedule--the average CTR of a team’s opponents. The rank columns provide each team’s rank in CTR and SOS:



While Washington had MLB’s best record at 98-64, they only rank fifth in CTR. aW% suggests that their 98 wins were equivalent to 95 wins against a perfectly balanced schedule, while the Yankees’ 95 wins were equivalent to 99 wins.

Rather than comment further about teams with CTRs that diverge from their records, we can just look at the average CTR by division and league. Since schedules are largely tied to division, just looking at the division CTRs explains most of the differences. A bonus is that once again they provide an opportunity to take gratuitous shots at the National League:



This is actually a worse performance for the NL than in 2011. Going back to 2009, the NL’s CTR has been 89, 93, 97, 89. The NL Central remained the worst division, dragged down by the Astros and Cubs to a dismal rating of 81, the lowest divisional rating I’ve encountered in the four years I’ve figured these ratings. This explains why the best teams in the NL Central have the lowest SOS figures. 2012 marks the first time in those four seasons that the AL East has not graded out as the top division, although its 121 CTR is higher than in some of the years in which it ranked #1.

This year it may be worth considering how this breakout would look if Houston was counted towards the ratings for the AL (West) as they will in 2013. It helps the NL cause, naturally, but it isn’t enough to explain away the difference in league strength:



I will present the rest of the charts with limited comment. This one is based on R/RA:



This set is based on gEW% as explained in this post. Basically, gEW% uses each team’s independent distributions of runs scored and runs allowed to estimate an overall W% based on the empirical winning percentage for teams scoring x runs in a game in 2012:



The last set is based on PW%--that is, runs created and runs created allowed run through Pythagenpat:



By this measure, two of the top four teams in the game didn’t even make the playoffs, and the third was unceremoniously dumped after one game.
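For reference, the Pythagenpat conversion underlying PW% can be sketched as below. The exponent constant of .29 is a commonly cited value; exact constants vary between implementations:

```python
def pythagenpat(r_per_g, ra_per_g, z=0.29):
    # the Pythagorean exponent floats with total scoring: x = RPG^z
    x = (r_per_g + ra_per_g) ** z
    return r_per_g ** x / (r_per_g ** x + ra_per_g ** x)
```

A team scoring 5 and allowing 4 runs per game comes out around .60; for the chart above, runs created and runs created allowed per game are the inputs rather than actual runs.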

I will now conclude this piece by digressing into some theoretical discussion regarding averaging these ratings. CTR returns a result which is expressed as an estimated win ratio, which as I have explained is advantageous because these ratios are Log5-ready, which makes them easy to work with during and after the calculation of the ratings. However, the nature of win ratios makes anything based on arithmetic averages (including the average division and league ratings reported above) non-kosher mathematically.

These distortions are more apparent in the higher-standard-deviation W% world of the NFL (whether due to the nature of the sport or the sample size), so let me use that league as an example. A 15-1 team and a 1-15 team obviously average to 8-8, which can be seen by averaging their wins, losses, or winning percentages. However, their respective win ratios of 15 and .07 average to 7.53.

Since the win ratios are intended to be used multiplicatively, the correct way to average in this case is to use the geometric average (*). For the NFL example above, the geometric average of the win ratios is in fact 1.
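The NFL example, along with the Log5 convenience of win ratios mentioned above, can be verified in a few lines:

```python
import math

wr_a = 15 / 1     # a 15-1 team's win ratio
wr_b = 1 / 15     # a 1-15 team's win ratio

# the arithmetic average badly overstates the pair: ~7.53, implying a great team
arith = (wr_a + wr_b) / 2

# the geometric average recovers the obvious answer: 1, i.e. a .500 team
geom = math.sqrt(wr_a * wr_b)

# Log5 falls straight out of the ratios: P(A beats B) = wrA / (wrA + wrB)
p_a_beats_b = wr_a / (wr_a + wr_b)
```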

So here are the divisional ratings for actual wins based CTR using the geometric average rather than the arithmetic average:



The fact that all of the ratings decline is not a surprise; it is a certainty. By definition the geometric average is less than or equal to the arithmetic average. There really is no reason to use the arithmetic average other than laziness, which I have always found to be an unacceptable excuse when committing a clear sabermetric or mathematical faux pas (as opposed to making simplifying assumptions, working with a less-than-perfect model, or focusing on easily available and standardized data), and so going forward any CTR averages I report will be geometric means rather than arithmetic.

(*) The GEOMEAN function in Excel can figure this for you, but it’s really pretty simple. If you have n values and you want to find the geometric mean, take the product of all of these, then take the n-th root. The geometric average of 3, 4, and 5 is the cube root of (3*4*5). The Bill James Power/Speed number is the geometric average of home runs and stolen bases, although I don't think he realized it at the time it was introduced.

Tuesday, January 08, 2013

Run Distribution and W%, 2012

A couple of caveats apply to everything that follows in this post. The first is that there are no park adjustments anywhere. There's obviously a difference between scoring 5 runs at Petco and scoring 5 runs at Coors, but if you're using discrete data there's not much that can be done about it unless you want to use a different distribution for every possible context. Similarly, it's necessary to acknowledge that games do not always consist of nine innings; again, it's tough to do anything about this while maintaining your sanity.

All of the conversions of runs to wins are based only on 2012 data. Ideally, I would use an appropriate distribution for runs per game based on average R/G, but I've taken the lazy way out and used the empirical data for 2012 only.

The first breakout is record in blowouts versus non-blowouts. I define a blowout as a margin of five or more runs. This is not really a satisfactory definition of a blowout, as many five-run games are quite competitive--"blowout” is just a convenient label to use, and expresses the point succinctly. I use these two categories with wide ranges rather than more narrow groupings like one-run games because the frequency and results of one-run games are highly biased by the home field advantage. Drawing the focus back a little allows us to identify close games and not so close games with a margin built in to allow a greater chance of capturing the true nature of the game in question rather than a disguised situational effect.

In 2012, 74.9% of games were non-blowouts (and thus 25.1% were blowouts). Here are the teams sorted by non-blowout record:



Records in blowouts:



This chart is sorted by differential between blowout and non-blowout W% and also displays blowout/non-blowout percentage:



As you can see, the Phillies had the highest percentage of non-blowouts (and also went exactly .500 in both categories) while the Angels had the highest percentage of blowouts. This is the second consecutive season in which Cleveland has had the most extreme W% differential (in either direction). Coincidentally, both pennant winners posted the same blowout minus non-blowout W% differential of -.012.

A more interesting way to consider game-level results is to look at how teams perform when scoring or allowing a given number of runs. For the majors as a whole, here are the counts of games in which teams scored X runs:



The “marg” column shows the marginal W% for each additional run scored. In 2012, the fourth run was both the most marginally valuable and the cutoff point between winning and losing (on average).

I use these figures to calculate a measure I call game Offensive W% (or Defensive W% as the case may be), which was suggested by Bill James in an old Abstract. It is a crude way to use each team’s actual runs per game distribution to estimate what their W% should have been by using the overall empirical W% by runs scored for the majors in the particular season.
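A minimal sketch of that calculation, using a made-up three-row table in place of the full empirical distribution:

```python
def game_ow_pct(team_games_by_runs, league_wpct_by_runs):
    """Expected W% from a team's runs-scored distribution, weighting each
    runs-scored level by the league-wide empirical W% at that level."""
    games = sum(team_games_by_runs.values())
    exp_wins = sum(n * league_wpct_by_runs[r]
                   for r, n in team_games_by_runs.items())
    return exp_wins / games

# toy numbers: league W% when scoring 2/4/6 runs, and a team's game counts
league = {2: 0.20, 4: 0.50, 6: 0.80}
team = {2: 10, 4: 20, 6: 10}
```

Defensive W% works the same way, substituting the runs-allowed distribution and the league W% when allowing each run total.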

Theoretical run per game distribution was a major topic on this blog in 2012, and so I will digress for a moment and talk about what I found. The major takeaway is that a zero-modified negative binomial distribution provides a pretty good model of runs per game (I called my specific implementation of that model Enby so that I didn’t have to write “zero-modified negative binomial” a hundred times, but that’s what it is. This is important to point out both to avoid giving the impression that I created a unique distribution out of thin air and to assure you that said distribution is a real thing that you could read about in a textbook).
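For the curious, a generic zero-modified negative binomial pmf can be sketched as below. This is not the author's Enby implementation (which adds a fitted variance estimate and its own parameter fudges); it only shows the shape of the underlying textbook model:

```python
from math import gamma

def neg_binomial_pmf(k, r, p):
    # P(K = k) = Gamma(k + r) / (Gamma(r) * k!) * p^r * (1 - p)^k
    # (r need not be an integer, which is why gamma is used instead of comb)
    return gamma(k + r) / (gamma(r) * gamma(k + 1)) * p ** r * (1 - p) ** k

def zero_modified_pmf(k, r, p, p0):
    """Pin P(0) at p0 (e.g. to match an observed shutout rate) and rescale
    the k >= 1 probabilities so the distribution still sums to 1."""
    if k == 0:
        return p0
    return neg_binomial_pmf(k, r, p) * (1 - p0) / (1 - neg_binomial_pmf(0, r, p))
```

The zero modification exists because runs per game pile up at zero more often than a plain negative binomial predicts.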

However, the Enby distribution is not ready to be used to estimate winning percentages. In order to use Enby, you have to estimate the three parameters of the zero-modified negative binomial distribution at a given R/G mean. I do this by estimating the variance of runs scored and fudging (there is no direct way to solve for these parameters--at least none published anywhere that I can make heads or tails of). The estimate of variance is quite crude, although it appears to work fine for modeling the run distribution of a single team. But as Tango Tiger has shown in his work with the Tango Distribution (which considers the runs per inning distribution), the distribution must be modified when two teams are involved, as is the case when considering W%, since that simultaneously involves the runs scored and runs allowed distributions. I have not yet been able to apply a similar correction to Enby, although I have an idea of how to do so on my to-do list. Perhaps by the time I look at the 2013 data, I’ll have a theoretical distribution to use. Here are three reasons why theoretical would be superior to empirical for this application:

1. The empirical distribution is subject to sample size fluctuations. In 2012, teams that scored 11 runs won 96.9% of the time while teams that scored 12 runs won 95.9% of the time. Does that mean that scoring 11 runs is preferable to scoring 12 runs? Of course not--it's a small sample size fluke (there were 65 games in which 11 runs were scored and 49 games in which 12 runs were scored). Additionally, the marginal values don’t necessarily make sense even when W% increases from one runs scored level to another--for instance, the marginal value of a ninth run is implied to be .030 wins while the marginal value of a tenth run is implied to be .063 wins. (In figuring the gEW% family of measures below, I lumped all games with 11+ runs into one bucket, which smooths any illogical jumps in the win function, but leaves the inconsistent marginal values unaddressed and fails to make any differentiation between scoring 20 runs and scoring 11).

2. Using the empirical distribution forces one to use integer values for runs scored per game. Obviously the number of runs a team scores in a game is restricted to integer values, but not allowing theoretical fractional runs makes it very difficult to apply any sort of park adjustment to the team frequency of runs scored. (Enby doesn’t allow for fractional runs either, which makes sense given that runs are indeed discrete, but you can park adjust Enby by park adjusting the baseline).

3. Related to #2 (really its root cause, although the park issue is important enough from the standpoint of using the results to evaluate teams that I wanted to single it out), when using the empirical data there is always a tradeoff that must be made between increasing the sample size and losing context. One could use multiple years of data to generate a smoother curve of marginal win probabilities, but in doing so one would lose centering at the season’s actual run scoring rate. On the other hand, one could split the data into AL and NL and more closely match context, but you would lose sample size and introduce quirks into the data.

Before leaving the topic of the Enby distribution, I was curious to see how it performed in estimating the major league run distribution for 2012. The major league average was 4.324 R/G, which corresponds to Enby distribution parameters of (r = 4.323, B = 1.0116, z = .0594). This graph truncates scoring at 15 runs per game to keep things manageable, and there’s very little probability in the far right tail:



From my (admittedly biased) vantage point, Enby does a fairly credible job of estimating the run scoring distribution. Enby is too low on zero and one run and too high on 2-4 runs, which is fairly common and thus an area for potential improvement to the model.
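For concreteness, here is a minimal sketch of the zero-modified negative binomial with the 2012 parameters quoted above. The parameterization (and the rescaling of the probabilities for one or more runs after pinning the zero point at z) reflects my reading of the model, not necessarily the exact implementation:

```python
import math

R, B, Z = 4.323, 1.0116, 0.0594  # the 2012 MLB parameters quoted above

def neg_binomial(k, r=R, b=B):
    """Plain negative binomial pmf (mean r*b), via log-gamma for stability."""
    return math.exp(math.lgamma(r + k) - math.lgamma(r) - math.lgamma(k + 1)
                    + k * math.log(b / (1 + b)) - r * math.log(1 + b))

def enby(k, r=R, b=B, z=Z):
    """Zero-modified: P(0) is pinned at z; k >= 1 is rescaled to sum to 1 - z."""
    if k == 0:
        return z
    return (1 - z) * neg_binomial(k, r, b) / (1 - neg_binomial(0, r, b))

mean = sum(k * enby(k) for k in range(60))  # recovers roughly 4.32 R/G
```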

I will not go into the full details of how gOW%, gDW%, and gEW% (which combines both into one measure of team quality) are calculated, as those details were provided here. The “use” column is the coefficient applied to each game to calculate gOW%, while “invuse” is the coefficient used for gDW%. For comparison, I have included OW%, DW%, and EW% (Pythagenpat record) for each team; none of these have been adjusted for park, to maintain consistency with the g-family of measures, which are not park-adjusted.

For most teams, gOW% and OW% are very similar. Teams whose gOW% is higher than OW% distributed their runs more efficiently (at least to the extent that the methodology captures reality); the reverse is true for teams with gOW% lower than OW%. The teams that had differences of +/- 2 wins between the two metrics were (all of these are the g-type less the regular estimate):

Positive: CIN, DET, KC, MIA
Negative: BOS, TEX, ARI, OAK

Last year, the Red Sox gOW% was 6.2 wins lower than their OW%, by far the largest discrepancy I’ve seen since I started tracking this. Boston once again led the majors in this department, but only with a 2.5 win discrepancy. Of course, last year their gOW% was a still-excellent .572, while this year it was down to a near average .507.

As I’ve noted in an earlier post, Cincinnati’s offense was much worse than one would have expected given the names in the lineup and their recent performances. Historically bad leadoff hitters certainly didn’t help, but on the bright side, the Reds distributed their runs as efficiently as any team in MLB. CIN had a .479 OW% (which would be a little lower, .470, if I were park-adjusting), but their .498 gOW% was essentially league average. To see how this came about, the graph below considers Cincinnati’s runs scored distribution, the league average for 2012, and the Enby distribution expectation for a team averaging 4.15 runs per game (CIN actually averaged 4.13). The graph is cut off at 15 runs; the Reds’ highest single-game total was 12:



The Reds were shut out much less frequently than an average team (or the expectation for a team with their average R/G), but they gave back much of this advantage by scoring exactly one run more frequently than expected. In total, CIN scored one or fewer runs 16.7% of the time, compared to a ML average of 17.4% and an Enby expectation of 17.8%. They also scored precisely two runs less often than expected. Where Cincinnati made hay was in games of moderate runs scored--the Reds exceeded expectations for 3, 4, 5, and 6 runs scored. As you can see if you look at the chart from earlier in the post, the most valuable marginal runs in 2012 were the second through the fourth, and the Reds did a decent job of clustering their runs in the sweet spot where an extra run has a significant impact on win expectancy.

From the defensive side, the biggest differences between gDW% and DW% were:

Positive: TEX, CHN, BAL
Negative: MIN, WAS, TB, NYA, CIN

The Reds and the Rangers managed to offset favorable/unfavorable offensive results with the opposite on defense. For the Twins to have the largest negative discrepancy was just cruel, considering that only COL (.386) and CLE (.411) had worse DW%s than Minnesota’s .418. In gDW%, Minnesota’s .400 was better only than Colorado’s .394, a gap that would be wiped out by any reasonable park adjustment.

gOW% and gDW% are combined via Pythagenpat math into gEW%, which can be compared to a team’s standard Pythagenpat record:

Positive: DET, CHN, KC, NYN, BAL, MIA
Negative: MIN, ARI, OAK, WAS, STL
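The exact Pythagenpat math isn't reproduced in this post, but one plausible sketch of the combination--backing equivalent R and RA out of the two percentages against a league-average opponent, then recombining with a Pythagenpat exponent--looks like this (the 0.29 exponent and this recombination scheme are my assumptions, not necessarily the math behind the tables):

```python
def pythagenpat_exponent(rpg):
    # Pythagenpat: the exponent grows with total runs per game
    return rpg ** 0.29

def combine_ow_dw(ow, dw, league_rg=4.324):
    """Back out equivalent R and RA from the two W%s (each taken against
    a league-average opponent), then recombine via Pythagenpat."""
    x1 = pythagenpat_exponent(2 * league_rg)
    r = league_rg * (ow / (1 - ow)) ** (1 / x1)
    ra = league_rg * ((1 - dw) / dw) ** (1 / x1)
    x2 = pythagenpat_exponent(r + ra)
    return r ** x2 / (r ** x2 + ra ** x2)
```

A .500 offense paired with a .500 defense comes back as exactly .500, and improving either component pushes the combined figure above .500, as it should.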

The table below is sorted by gEW%:

Tuesday, January 01, 2013

Crude NFL Ratings, 2012

Since I have a ranking system for teams and am somewhat interested in the NFL, I don’t see any reason not to take a once-a-year detour into ranking NFL teams (even if I’d much rather have something useful to contribute regarding the second best pro sport, thoroughbred racing).

As a brief overview, the ratings are based on each team’s win ratio for the season, adjusted over the course of several iterations for opponents’ win ratios. They know nothing about injuries, about where games were played, or about the distribution of points from game to game; nothing beyond the win ratio of every team in the league and each team’s opponents. The final result is presented in a format that can be directly plugged into Log5. I call them “Crude Team Ratings” to avoid overselling them, but they tend to match the results of systems that are not undersold fairly well.
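A minimal sketch of this kind of iteration, under my own assumptions about the update rule (the actual CTR code may well differ in its details):

```python
from statistics import geometric_mean  # Python 3.8+

def crude_team_ratings(records, schedule, iterations=50):
    """Iterative, opponent-adjusted ratings, normalized so the league
    geometric average is 100 (and thus directly usable in Log5).

    records: {team: (wins, losses)}; schedule: {team: [opponents faced]}.
    Half a win and half a loss are added to each record to avoid
    divide-by-zero on perfect or winless seasons, as described above."""
    win_ratio = {t: (w + 0.5) / (l + 0.5) for t, (w, l) in records.items()}
    rating = dict(win_ratio)
    for _ in range(iterations):
        # multiply each raw win ratio by the geometric-average opponent rating
        new = {t: win_ratio[t] * geometric_mean([rating[o] for o in schedule[t]])
               for t in records}
        # rescale so the league's geometric average rating is 100
        scale = 100 / geometric_mean(new.values())
        rating = {t: v * scale for t, v in new.items()}
    return rating
```

On a toy three-team round robin, the ratings come out in the same order as the records but spread multiplicatively around 100.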

First, I’ll offer ratings based on actual wins and losses, but I would caution against putting too much stock in them given the nature of the NFL. Win ratios from records like the 2-14 and 15-1 marks that pop up in the NFL are not easily handled by the system. In order to ensure that there are no divide-by-zero errors, I add half a win and half a loss to each team’s record. This is not an attempt at regression, which would require much more than one game of ballast. This year the most extreme records were 2-14 and 13-3, so the system produced fairly reasonable results:



In the table, aW% is an adjusted W% based on CTR. The rank order will be exactly the same, but I prefer the CTR form due to its Log5 compatibility. SOS is the average CTR of a team’s opponents, rk is each team’s CTR rank, and s rk is each team’s SOS rank.

The rankings that I actually use are based on a Pythagorean estimated win ratio from points and points allowed:



Seattle’s #1 ranking was certainly a surprise, but last year Seattle’s 92 CTR ranked 13th in the league, a bit better than their 7-9 record would suggest. When I posted weekly updates on Twitter, I got a few comments on the high ranking of the Bears. CTR may like Chicago more than some systems, but comparable systems with comparable inputs also hold them in high regard. Wayne Winston ranks them #5; Andy Dolphin #7; Jeff Sagarin #7; and Football-Reference #6. Chicago ranked sixth in the NFL in P/PA ratio, which is the primary determinant of CTR, and played an above-average schedule (they rank 10th in SOS at 116, which means that their average opponent was roughly as good as the Vikings). The NFC North was the second-strongest division in the league, with Green Bay ranking #6, Minnesota #9, and Detroit #17. They played the AFC South, which didn’t help, although it was marginally better for SOS than playing the West. Their interdivisional NFC foes were Arizona (#24), Carolina (#16), Dallas (#19), Seattle (#1), San Francisco (#3), and St. Louis (#13), which is a pretty strong slate.

Obviously the Bears did not close the season strong, but the system doesn’t know the sequence of games and weights everything equally. Still, their losses came to #1 Seattle, #3 San Francisco, twice to #6 Green Bay, #7 Houston, and #9 Minnesota. I didn’t check thoroughly, but I believe that no other team save Denver was undefeated against the bottom two-thirds of the league (the Broncos’ losses came to #2 New England, #7 Houston, and #8 Atlanta). Even the other top teams had worse losses--for instance, Seattle and New England both lost to #24 Arizona, San Francisco lost to #13 St. Louis, Green Bay and Houston lost to #23 Indianapolis, and Atlanta lost to #20 Tampa Bay.

Last year I figured the CTR for each division and conference as the arithmetic average of the CTRs of each member team, but that approach is flawed. Since the ratings are designed to be used multiplicatively, the geometric average provides a better means of averaging. However, given the properties of the geometric average, the arithmetic average of the geometric averages does not work out to the nice result of 100:



The NFC’s edge here is huge--it implies that the average NFC team should win 64% of the time against an average AFC team. The actual interconference record was 39-25 in favor of the NFC (.609). The NFC’s edge is naturally reflected in the team rankings; 7 of the top 10 teams are from the NFC with 7 of the bottom 8 and 10 of the bottom 12 from the AFC.

This exercise wouldn’t be a lot of fun if I didn’t use it to estimate playoff probabilities. First, though, we need regressed CTRs. This year, I’ve added 12.2 games of .500 to each team’s raw win ratio based on the approach outlined here. That produces this set of ratings, which naturally compresses the range between the top and bottom of the league and shuffles a few teams’ positions:



The rank orders differ not because the regression changes the order of the estimated win ratios fed into the system (it doesn’t), but because the magnitude of the strength of schedule adjustment is reduced.
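A minimal sketch of the regression step, assuming "12.2 games of .500" means adding 6.1 wins and 6.1 losses to the record before forming the ratio:

```python
def regressed_win_ratio(wins, losses, ballast=12.2):
    # add 12.2 games of .500 ball (6.1 wins, 6.1 losses) before the ratio
    return (wins + ballast / 2) / (losses + ballast / 2)

regressed_win_ratio(13, 3)  # about 2.10, vs. a raw ratio of 13/3 = 4.33
```

This compresses extreme records toward 1.0 (a .500 win ratio) while leaving the order of teams' win ratios unchanged.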

Last year I included tables listing probabilities for each round of the playoffs, but I will limit my presentation here to the first round and the probabilities of advancement. After each round of the playoffs, the CTRs should be updated to reflect the additional data on each team, and thus the extensive tables will be obsolete (although I will share a few nuggets). This updating might not be particularly important for MLB, since a five or seven game series adds little information when we already have a 162 game sample on which to evaluate a team. But for the more limited sample available for the NFL, each new data point helps.

In figuring playoff odds, I assume that having home field advantage increases a team’s CTR by 32.6% (this is equivalent to assuming that the average home W% is .570). Here is what the system thinks about the wildcard round:



The home team is a solid favorite in each game except for Washington, which faces the top-ranked team in the league. Houston is the weakest favorite; the Texans would be estimated to have a 54% chance on a neutral field and 47% at Cincinnati.
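The Log5 and home field arithmetic can be checked in a couple of lines; a 32.6% boost applied to one of two equal teams does indeed yield a .570 home W%:

```python
def log5(ctr_a, ctr_b):
    """P(team A beats team B) given two CTR values (the ratings' Log5 form)."""
    return ctr_a / (ctr_a + ctr_b)

# home field: multiply the home team's CTR by 1.326;
# two average (CTR = 100) teams give the home side a .570 W%
log5(100 * 1.326, 100)  # about 0.570
```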

The overall estimated probabilities for teams to advance to each round are as follows:



San Francisco, Denver, and New England are all virtually even at 20% to win the Super Bowl. The Patriots are the highest ranked of the three, but San Francisco benefits from the weak NFC and Denver from home field advantage. CTR would naturally pick Seattle to win it all if they weren’t at a seeding disadvantage; however, their probability of winning the Super Bowl given surviving the first round is 14%, greater than Atlanta’s 12%.

The most likely AFC title game is Denver/New England (48% chance), with Denver given a 54% chance to win (it would be 47% on a neutral field and 40% at New England); the least likely AFC title game is Indianapolis/Cincinnati (1% chance). The most likely NFC title game is Atlanta/San Francisco (34%), with a 53% chance of a 49ers road win; the least likely matchup is Washington/Minnesota (2%). The most likely Super Bowl matchup is Denver/San Francisco (14% likelihood and a 54% chance of a 49er win); the least likely is Indianapolis/Washington (.1%). The NFC is estimated to have a 51% chance of winning the Super Bowl, lower than one might expect given the NFC’s dominance in the overall rankings. However, the NFC’s best team would have to win three games on the road (barring a title game against Minnesota), while the probability of New England or Denver carrying the banner for the AFC is estimated to be 77%.

Of course, all of these probabilities are just estimates based on a fairly crude rating system, and last year the Giants were considered quite unlikely to win the Super Bowl (although I didn’t regress enough in calculating the playoff probabilities last year, resulting in overstating the degree of that unlikelihood).