Walk Like a Sabermetrician: Crude Team Ratings, 2018

For the last several years I have published a set of team ratings that I call "Crude Team Ratings". The name was chosen to reflect the nature of the ratings--they have a number of limitations, of which I documented several when I introduced the methodology.

I explain how CTR is figured in the linked post, but in short:

1) Start with a win ratio figure for each team. It could be actual win ratio, or an estimated win ratio.

2) Figure the average win ratio of the team’s opponents.

3) Adjust for strength of schedule, resulting in a new set of ratings.

4) Begin the process again. Repeat until the ratings stabilize.

The resulting rating, CTR, is an adjusted win/loss ratio rescaled so that the majors’ arithmetic average is 100. The ratings can be used to directly estimate W% against a given opponent (without home field advantage for either side); a team with a CTR of 120 should win 60% of games against a team with a CTR of 80 (120/(120 + 80)).

First, CTR based on actual wins and losses. In the table, “aW%” is the winning percentage equivalent implied by the CTR and “SOS” is the measure of strength of schedule--the average CTR of a team’s opponents. The rank columns provide each team’s rank in CTR and SOS:

The playoff teams all finished in the top twelve, with the third-place teams from the top-heavy AL East and West being denied spots in the dance despite having the fifth/sixth most impressive records in the majors (and damn the .3 CTR that separated us from #6org). The AL also had four of the bottom five teams; the bifurcated nature of the AL is something that was well observed and noted from the standings but also is evident when adjusting for strength of schedule. Note the hellish schedule faced by bad AL teams; Baltimore, with the worst CTR in MLB, had the toughest SOS at 118 - an average opponent at the level of the Cubs. Those Cubs had the easiest schedule, playing an average opponent roughly equivalent to the Pirates.

Next are the division averages. Originally I gave the arithmetic average CTR for each division, but that’s mathematically wrong--you can’t average ratios like that. Then I switched to geometric averages, but really what I should have done all along is just give the arithmetic average aW% for each division/league. aW% converts CTR back to an “equivalent” W-L record, such that the average across the major leagues will be .50000:

The AL once again was markedly superior to the NL; despite the sorry showing of the Central, the West was almost as good as it was bad, and the East was strong as well. Given the last fifteen years of AL dominance, you may have glossed over the last sentence, but if you are familiar with the results of 2018 interleague play, it may give you pause. The NL went 158-142 against the AL, so how does the average AL team rank ahead? It may be counter-intuitive, but one can easily argue that the NL should have performed better than it did. The NL’s best division got the benefit of matching up with the AL’s worst division (the Centrals). The AL Central went 38-62 (.380), but the East went 54-46 (.540) and the West 50-50 (.500).

Of course, the CTRs can also use theoretical win ratios as a basis, and so the next three tables will be presented without much comment. The first uses gEW%, which is a measure I calculate that looks at each team’s runs scored distribution and runs allowed distribution separately to calculate an expected winning percentage given average runs allowed or runs scored, and then uses Pythagorean logic to combine the two and produce a single estimated W% based on the empirical run distribution:

The next version utilizes EW%, which is to say standard Pythagenpat based on actual runs scored and allowed:

And one based on PW%, which is Pythagenpat but using runs created and runs created allowed in place of actual runs totals:

Everything that I’ve shared so far has been based on regular season data only. Of course, ten teams play additional games for keeps, and we could consider these in ratings as well. Some points to consider when it comes to incorporating postseason data:

1. It would be silly to pretend that these additional games don’t give us any insight on team quality. Of course every opportunity we have to observe a team in a competitive situation increases our sample size and informs our best estimate of that team’s quality.

2. Postseason games are played under a different set of conditions and constraints than regular season games, particularly when it comes to how pitchers are used. This is not a sufficient reason to ignore the data, in my opinion.

3. A bigger problem, and one that causes me to present ratings that include postseason performance only half-hardheartedly, is the bias introduced to the ratings by the playoff structure. The nature of postseason series serves to exaggerate the difference in team performance observed during the series. Take the Astros/Indians ALCS. The Astros dominated the Indians over three games, which certainly provides additional evidence vis-a-vis the relative strength of the two teams. Based on regular season performance, Houston looked like a superior club (175 to 113 CTR, implying that they should win 61% of their games against the Indians), and the sweep provided additional evidence. However, because the series terminated after three games, all won by the Astros, it overstated the difference. If the series was played out to completion (assuming you can imagine a non-farcial way in which this could be done), we would expect to see Cleveland pull out a win, and even adding 1-4 and 4-1 to these two team’s ratings would decrease the CTR gap between the two (although still correctly increasing it compared to considering only the regular season).

This is one of those concepts that seems very clear to me when I think about it, and yet is extremely difficult to communicate in a coherent manner, so let me simply assert that I think bias is present when the number of observations is dependent on the outcome of the previous observations (like in a playoff series) that is not present when the number of observations is independent of the outcome of previous observations (as is the case for a regular season in which all teams play 162 games regardless of whether they are mathematically eliminated in August).

Still, I think it’s worth looking at CTRs including the post-season; I only will present these for actual wins and losses, but of course if you were so inclined you could base them on estimated winning percentages as well:

Here is a comparison of CTR including postseason (pCTR) to the regular season-only version, and the difference between the two:

I’ve been figuring CTRs since 2011 and playoff-inclusive versions since 2013, so Boston’s rating in both stood out when I saw it. I thought it might be interesting to look at the leader in each category each season. The 2018 Red Sox are the highest-rated team of the past eight seasons by a large margin (of course, such ratings do nothing to estimate any kind of underlying differences in MLB-wide quality between seasons):

I didn’t realize that the 2017 Dodgers were the previous leaders; I would have guessed it was the 2016 Cubs, although they would be much farther down the list. It is also worth noting that this year’s Astros would have been ranked #1 of the period were it not for the Red Sox. Boston’s sixteen point improvement including the playoffs was easily the best for any team that had been ranked #1, and that does make sense intuitively: 3-1 over the #3 ranked (regular season) Yankees, 4-1 over #2 ranked Houston, and 4-1 over the #9 Dodgers is one impressive playoff showing.

1 comment:

UnknownDecember 18, 2018 at 3:01 PM
I could see there was a problem with Bill James' "log5" formula ever since I was a teenager. Suppose a .600 team plays a .400 team. His formula says the better team should have a winning ratio of
(.6*.6)/(.4*.4) = 36/16 win ratio. That sounds plausible, but for a 2 team league, one team .600 and the other .400, the calculated result SHOULD be .6/.4

Obviously there should be an adjustment depending on number of opponents. Prior to inter division and inter league play, the obvious correction was the square root of James' formula for a 2 team league.

A close approximation was (winning percentage -.500)*(n-1/n) + .500

so for a 3 team league with
team A 6-4
team B 5-5
team C 4-6

a good approximation of A would be .500 + 2/3(100) = .566
team B would stay at .500
team C would have a rating of .509 +2/3(-100) = .433

Check : A beats B (.567*.5)/(.433*.5) = 567 wins to 433 losses
A beats C (.567^2)/(.433^2) = 633 wins to 367 losses

This gives .600 as the overall average for A( it would have been slightly off if not for rounding error), just as expected.

This adjustment to Bill James' log 5 formula worked great for 8 team and 10 team leagues with no division or interleague play, but I'm not sure about what adjustments to make for inerleague, interdivision play.

I reserve the right to reject any comment for any reason.

Walk Like a Sabermetrician

Tuesday, December 18, 2018

Crude Team Ratings, 2018

1 comment:

Me, Elsewhere

Analysis Links

Reference Links

Blog Archive

OSU Baseball

End of Season Statistics

Win Shares Walkthrough

NL 1876-1881 Series

Labels

About Me