Tuesday, April 23, 2013

Crude WBC Ratings

What follows is an attempt to use the WBC results over the first three tournaments to produce ratings for each team. Obviously there is a lengthy list of caveats that must be included with such a procedure, enough that I’ll begin with a basic disclaimer: these ratings are simply meant to provide some insight on which countries have had the best aggregate performances over the first three tournaments. They are not to be taken seriously as an analytical endeavor. They are not in any way intended to produce accurate ratings of the strength of each team in any given tournament or at any moment in time.

Other issues:

* Due to the very small sample sizes inherent in a tournament like the WBC, I have chosen to aggregate results from all three tournaments. Obviously the makeup of the teams is nowhere near the same between 2006 and 2013, and while we can expect national team strength to be more stable than that of major league franchises, it is by no means a constant. Even if national baseball strength were constant, the quality of the WBC roster would be highly variable. And I don’t think you would get a lot of disagreement if you posited that, say, the Netherlands has a much-improved talent level from the inaugural tournament.

* Even after aggregating three tournaments, the sample sizes are still tiny. Japan leads WBC entrants with 24 all-time games, which represents less than a month of major league play. Thus any inputs will need to be heavily regressed to avoid ridiculous results. I am regressing by adding 69 games of .500 level performance (or some other W% as discussed later) to each team’s record, but this figure is based on the standard deviation of W% between major league teams rather than between WBC teams (given the limited sample of WBC play, I didn't want to make an assumption regarding the standard deviation of W% between WBC teams). Of course, we should expect a higher standard deviation of W% for WBC teams, and the higher the standard deviation, the less regression is necessary. Assuming that the regression needed for major league play also holds for the WBC thus potentially results in excessive regression.

* Expected W% (that is, based on runs scored and runs allowed) is not as useful a tool for WBC play as it is for MLB due to the higher frequency of blowout games (particularly as observed in earlier tournaments), the mercy rules, and the like. I could have done some sort of capping on run differential from a particular game, but I chose to keep it simple and just use unadulterated runs scored and allowed.
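The regression described in the second bullet can be sketched in a few lines. The function names are mine, and `regression_games` is the textbook shortcut (binomial luck variance divided by talent variance), not necessarily how the 69-game figure was derived. The two talent spreads at the bottom are assumed values, chosen only to illustrate that a wider spread calls for less regression.

```python
def regress_record(wins, losses, prior_games=69.0, prior_wpct=0.5):
    """Blend an observed record with `prior_games` games played at `prior_wpct`."""
    games = wins + losses
    return (wins + prior_games * prior_wpct) / (games + prior_games)


def regression_games(true_talent_sd, mean_wpct=0.5):
    """Prior games at which luck variance equals true-talent variance.

    Assumption: a standard shortcut, not a formula taken from the post.
    """
    return mean_wpct * (1.0 - mean_wpct) / true_talent_sd ** 2


# Records quoted in the post, each regressed toward .500 with 69 games.
spain = regress_record(0, 3)     # 0-3
usa = regress_record(10, 10)     # 10-10
cuba = regress_record(13, 7)     # 13-7

# Hypothetical talent spreads (assumed, for illustration only).
mlb_like = regression_games(0.060)       # close to 69 games
wider_spread = regression_games(0.120)   # far fewer games needed

print(round(spain, 3), round(usa, 3), round(cuba, 3))
print(round(mlb_like, 1), round(wider_spread, 1))
```

Even a winless three-game record lands within about twenty points of .500 here, which is why the regressed ratings bunch together so tightly.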

So please take the results with a grain of salt. First, here are the raw records for the three tournaments combined, sorted by W%:

The Dominican sweep in 2013 enables them to slide past Korea for the best overall tournament record, although Japan’s EW% still towers over the other nations. There is a pretty clear top seven that stands out, ending with the 10-10 United States; these seven countries make a nice group as they represent the only seven teams with a .500 or better W%, the only seven teams with a .500 or better EW%, and also pretty clearly represent the top baseball countries in the world (with apologies to Mexico and Canada).

On to the ratings, I’ll start with a set based on regressed actual W%:

As you can see, the regression is very strong, serving to create a range of implied strength that is clearly more narrow than reality. There are also some important disclaimers to be made regarding countries that have only played in one tournament like Brazil and Spain--they rate much better than one would expect due to having only three games of actual experience to feed into the formula. Obviously a ranking that puts the 0-3 Spanish record ahead of the 10-10 United States leaves much to be desired.

We can of course use regressed EW%, but the results are quite similar:

The key problems with these ratings are the sample size itself and over-regression. There’s nothing that can be done about the first issue, but there are two easy ways I could seek to address the latter problem. One is to use a regression weight other than 69 games. This would be appropriate due to the presumed higher variance in team strength between WBC teams compared to that observed in major league play. However, as I touched on earlier, it is difficult to develop a good estimate for WBC variance.

Furthermore, simply changing the amount of regression does nothing to solve the larger problem--I am regressing all teams to .500. For analysis involving major league players and teams, this is a reasonable course of action since we can generally assume that all players are drawn from the same talent pool. In the case of the WBC, it is not an appropriate assumption. There is no reason we should assume that Spain’s underlying true talent is equivalent to that of Puerto Rico given what we know about baseball in these two nations. If we had the ability to observe enough WBC games, we could ignore this and still be alright, since eventually Spain’s .200 W% would overpower the regression weight. Spain and Puerto Rico still should not be drawn towards the same center, but the distortion would be limited.
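To see how the sample would eventually overpower the regression weight, here is a small sketch. The game counts are hypothetical, and the team is assumed to win exactly 20% of its games at every sample size, which is a simplification.

```python
PRIOR_GAMES = 69     # regression weight used in the post
PRIOR_WPCT = 0.500   # the shared center being questioned


def regressed_wpct(true_wpct, games):
    """Regressed W% for a team winning exactly `true_wpct` of `games`."""
    wins = true_wpct * games
    return (wins + PRIOR_GAMES * PRIOR_WPCT) / (games + PRIOR_GAMES)


# Hypothetical sample sizes for a .200 team.
for games in (5, 50, 500, 5000):
    print(games, round(regressed_wpct(0.200, games), 3))
```

With five games the estimate is still about .480; by 5,000 games it is about .204. The WBC is nowhere near the sample sizes at which the choice of center stops mattering.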

Given the sample size available here, though, the easiest way to handle this issue is to regress each team to a different level. Unfortunately, this introduces a great deal of subjectivity, as I must make these judgments.

I have used a very simple approach--I have grouped teams into three buckets. The first is a .650 assumed W% and includes the group of seven countries mentioned earlier (Cuba, Dominican Republic, Japan, Korea, Puerto Rico, United States, Venezuela). The composite record of these seven in WBC play is 91-48 (.655).

The second group includes mid-tier teams who will be regressed to .500. This group includes Canada, Italy, Mexico, the Netherlands, and Taiwan. The composite record of these five is 23-38 (.377). The final group is the other countries who will be regressed to .250; this group is Australia, Brazil, China, Panama, South Africa, and Spain. The actual record of this group is 3-31 (.088).

The values I’ve chosen are fairly arbitrary, but I believe the groupings are reasonable and match both WBC performance to date and, more importantly, our pre-existing knowledge of the baseball strength of each country. You’ll note that the regression targets don’t average out to .500; this is not really a problem because the rating system inherently centers the average team to .500. Obviously this approach could be refined, but it should offer a reasonable adjustment, particularly when used with a system for which the “crude” disclaimer is already applied.
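A rough sketch of the bucketed regression, using the groupings and composite records given above. The function and variable names are illustrative, and the sketch assumes the same 69-game weight is applied in every bucket.

```python
PRIOR_GAMES = 69  # assumed to be the same weight in every bucket

BUCKETS = {
    0.650: ["Cuba", "Dominican Republic", "Japan", "Korea",
            "Puerto Rico", "United States", "Venezuela"],
    0.500: ["Canada", "Italy", "Mexico", "Netherlands", "Taiwan"],
    0.250: ["Australia", "Brazil", "China", "Panama", "South Africa", "Spain"],
}
PRIOR_BY_TEAM = {team: wpct for wpct, teams in BUCKETS.items() for team in teams}


def regress_to_bucket(team, wins, losses):
    """Regress a record toward the assumed W% of the team's bucket."""
    prior = PRIOR_BY_TEAM[team]
    return (wins + PRIOR_GAMES * prior) / (wins + losses + PRIOR_GAMES)


# Composite records quoted in the post, keyed by bucket target.
composite = {0.650: (91, 48), 0.500: (23, 38), 0.250: (3, 31)}
composite_wpct = {k: w / (w + l) for k, (w, l) in composite.items()}

# Unweighted mean of the regression targets across all 18 teams.
mean_target = sum(PRIOR_BY_TEAM.values()) / len(PRIOR_BY_TEAM)

print(round(regress_to_bucket("Spain", 0, 3), 3))
print(round(regress_to_bucket("United States", 10, 10), 3))
print(round(regress_to_bucket("Netherlands", 7, 10), 3))
print(round(mean_target, 3))
```

The unweighted mean of the targets comes out to .475 rather than .500, which is the mismatch mentioned above.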

I have used the average of W% and EW% as the basis for the ratings--given the issues regarding blowouts and mercy rules in WBC play, I think that considering both is appropriate:
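For illustration, the blend of W% and EW% might look like the sketch below. The post does not say which expected-W% formula is used, so the Pythagorean form with an exponent of 2 is an assumption, and the run totals in the example are made up. The subsequent regression step is not shown.

```python
def pythagorean_wpct(runs_scored, runs_allowed, exponent=2.0):
    """Expected W% from runs. Assumed formula; the post does not specify one."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def blended_wpct(wins, losses, runs_scored, runs_allowed):
    """Simple average of actual W% and expected W%."""
    wpct = wins / (wins + losses)
    ew_pct = pythagorean_wpct(runs_scored, runs_allowed)
    return (wpct + ew_pct) / 2.0


# Hypothetical team: 10-10 with 100 runs scored and 90 allowed (made-up runs).
example = blended_wpct(10, 10, 100, 90)
print(round(example, 3))
```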

Given that the regression weight still drives much of the adjusted W% used to fuel the ratings, it should come as no surprise that the stratifications imposed above cause the teams to fall out into three clear groups. Among the power seven, the US ranks last, as it does in W%, but is essentially even with Cuba despite the latter’s 13-7 WBC record. Within the power seven, the SOS numbers are fairly close, with two countries standing out a bit lower--the Dominican Republic and Cuba. It should come as no surprise that Spain has the highest SOS, as their three games have come against #3 DR, #4 PR, and #5 VEN. Among other SOS figures, the Netherlands’ 116 stands out. In WBC play, the Dutch have played a meat grinder of a schedule:

2 games with #1 Japan
1 game with #2 Korea
3 games with #3 Dominican Republic
3 games with #4 Puerto Rico
1 game with #5 Venezuela
3 games with #6 Cuba
1 game with #7 United States
1 game with #10 Taiwan
1 game with #14 Panama
1 game with #17 Australia

Thus, 14 of the Netherlands’ 17 WBC contests have been against the power seven, making their 7-10 record all the more impressive. Were I to bump them up to the .650 group and recalculate the ratings, the Dutch would move to #7 in the ratings, sliding just ahead of the United States. For now I’m more comfortable with them in the .500 group, but if the current wave of Curacao talent continues to develop and more follows, it may be past time to reassess the Netherlands’ place in the global baseball pecking order.
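The schedule tally is easy to check from the list above. This sketch only counts games by opponent rank; it does not attempt to reproduce the SOS figure, whose calculation is not described here.

```python
# (opponent rank, opponent, games) as listed in the post.
NETHERLANDS_SCHEDULE = [
    (1, "Japan", 2),
    (2, "Korea", 1),
    (3, "Dominican Republic", 3),
    (4, "Puerto Rico", 3),
    (5, "Venezuela", 1),
    (6, "Cuba", 3),
    (7, "United States", 1),
    (10, "Taiwan", 1),
    (14, "Panama", 1),
    (17, "Australia", 1),
]

total_games = sum(games for _, _, games in NETHERLANDS_SCHEDULE)
top_seven_games = sum(games for rank, _, games in NETHERLANDS_SCHEDULE if rank <= 7)
top_seven_share = top_seven_games / total_games

print(top_seven_games, total_games, round(top_seven_share, 3))
```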
