## Monday, January 31, 2011

### Crude Team Ratings

This methodology is really unnecessary, and very possibly has a flaw somewhere inside that I was unable to spot. However, one night in September the Rays/Yankees game I was watching went into a rain delay, and not wanting to do any real work, I used the time to fiddle around with a rating system for teams that incorporated strength of schedule. Never one to simply let an idle experiment rest in peace without milking a couple of blog posts out of it, I am compelled to describe the Crude Team Rating (CTR) here.

There is nothing novel about the idea--similar ratings which are likely based on better theory are published by Baseball Prospectus, Beyond the Box Score, Andy Dolphin, Baseball-Reference, and others. The concept is simple; the execution in this case is likely muddled.

Let me offer an example of how this works with a hypothetical four-team league composed of the Alphas, Bravos, Charlies, and Ekos. The 90-game schedule is not balanced; the Alphas/Bravos and Charlies/Ekos are in "divisions" and play each other 40 times, playing their cross-divisional foes 25 times each. The Alphas go 25-15 against the Bravos and 17-8 against each of the Charlies and Ekos; the Bravos go 14-11 against each of the Charlies and Ekos; the Charlies go 24-16 against the Ekos. We wind up with these standings:

| Team | W | L | W% |
|---|---|---|---|
| Alphas | 59 | 31 | .656 |
| Bravos | 43 | 47 | .478 |
| Charlies | 43 | 47 | .478 |
| Ekos | 35 | 55 | .389 |

The Bravos and Charlies are equal in the standings, but we have every reason to believe that the Bravos are a better team--they won their season series against both teams from the other division, and had to play forty games against the Alphas, who are clearly the league's dominant team. Obviously strength of schedule worked against the Bravos.

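Let's start by looking at each team's win ratio (W/L, which of course is also W%/(1 - W%)):

| Team | Win ratio |
|---|---|
| Alphas | 1.903 |
| Bravos | .915 |
| Charlies | .915 |
| Ekos | .636 |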

These do not average to 1, because of the nature of working with ratios rather than percentages (the average W% is .500 in this league--don't worry, I made sure my standings added up). The average win ratio for the league will always be greater than one, unless all of the teams are .500. What we can do now is calculate the average win ratio for each team's opponents. The Alphas' opponents had an average win ratio of (40*.915 + 25*.915 + 25*.636)/90 = .838. This is the initial strength of schedule, S1.
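The win ratios and the initial strength of schedule can be reproduced with a few lines of Python. This is only a sketch: the records are the ones implied by the win ratios quoted in the text, and the names `records`, `games`, and `schedule_strength` are my own, not anything from the original spreadsheet.

```python
# (wins, losses) implied by the example; every team plays 90 games.
records = {
    "Alphas": (59, 31),
    "Bravos": (43, 47),
    "Charlies": (43, 47),
    "Ekos": (35, 55),
}

# games[team][opponent] = games between the two (40 in-division, 25 otherwise).
games = {
    "Alphas": {"Bravos": 40, "Charlies": 25, "Ekos": 25},
    "Bravos": {"Alphas": 40, "Charlies": 25, "Ekos": 25},
    "Charlies": {"Ekos": 40, "Alphas": 25, "Bravos": 25},
    "Ekos": {"Charlies": 40, "Alphas": 25, "Bravos": 25},
}


def schedule_strength(ratings, games):
    """Games-weighted average of the opponents' ratings, per team."""
    return {
        team: sum(n * ratings[opp] for opp, n in opponents.items())
        / sum(opponents.values())
        for team, opponents in games.items()
    }


w0 = {team: wins / losses for team, (wins, losses) in records.items()}
s1 = schedule_strength(w0, games)
avg_s1 = sum(s1.values()) / len(s1)

for team in records:
    print(f"{team:9s} W0 = {w0[team]:.3f}   S1 = {s1[team]:.3f}")
print(f"Avg(S1) = {avg_s1:.3f}")
```

The Alphas' S1 comes out at roughly .838, matching the hand calculation above.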

Next, we can adjust each team's win ratio for their strength of schedule. First, though, I need to point out a flaw that is inherent in the CTR approach--it does not recognize the degree to which a team's SOS is affected by its own quality. The Alphas will take a hit on this count, because all of the teams they play against have poor records. Of course, based on what we know, the Alphas actually play the second-toughest schedule in the league because of the forty games with the Bravos. To the algorithm used here, they have a weak schedule because all the teams they play have losing records. That is true, but the reason the Bravos have a losing record is that they play the Alphas so often.

I could attempt to adjust for this in some way, but I've chosen to let it slide--that's one of many reasons why these are self-proclaimed crude ratings. Of course, the effect is much more pronounced in this hypothetical when teams are playing 44% of their games against the same opponent, and is much less pronounced in a 30 team league in which the most frequent opponents play only 12% of the time.

Once we have the initial strength of schedule S1, we need to find avg(S1)--the average S1 for all teams in the league. In this example it is 1.092. To adjust each team's initial win ratio (W0) for SOS, we multiply by the ratio of S1 to avg(S1). This gives us the first-iteration adjusted win ratio, which I'll call W1:

W1 = W0*(S1/Avg(S1))

For the Alphas: W1 = 1.903*.838/1.092 = 1.459

Now that we have an adjusted win ratio, we can re-estimate SOS in the same manner as before, producing S2. This is necessary because we now have a better estimate of the quality of each team, and that knowledge should be reflected in the SOS estimate. S2 for the Alphas is .916.

In order to find W2, we need to compare S2 to S1. We can't simply apply S2 to W1, because W1 already includes an adjustment for SOS. S2 supersedes S1; it doesn't act upon it as a multiplier or addition. We also can't apply S2 to W0, the initial win ratio, because the schedule adjustment of S2 is based on each team's W1. At each step in the process, the previous iteration has to be seen as the new starting point, and the adjustment has to be a comparison between the new iteration and the one that directly preceded it.

So W2 is figured in the same manner as W1, except the ratio of S2 to the average is compared to the same for S1:

W2 = W1*(S2/Avg(S2))/(S1/Avg(S1))

For the Alphas, W2 = 1.459*(.916/1.029)/(.838/1.092) = 1.692

Then we use W2 to figure S3, and use S3 to figure W3, and the process continues through as many iterations as you feel like setting up in Excel (at least for me). For the purpose of this example, I did nine; for the actual spreadsheet for ML teams, I went a little overboard and did thirty iterations. That results in a "final" estimate of win ratio, W9. It is not of course truly final as that would be W(infinity). The "final" estimate of SOS is S10, so that the last estimate of SOS is based on the last estimate of team quality.
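For anyone who would rather not build this in Excel, here is a rough Python sketch of the loop. It assumes the records implied by the example, and the function names (`schedule_strength`, `relative_to_average`, `iterate`) are made up for illustration. Treating the "previous" relative SOS as 1 for every team before the first pass lets the W1 formula and the W(n) formula share one line of code.

```python
RECORDS = {"Alphas": (59, 31), "Bravos": (43, 47), "Charlies": (43, 47), "Ekos": (35, 55)}
GAMES = {
    "Alphas": {"Bravos": 40, "Charlies": 25, "Ekos": 25},
    "Bravos": {"Alphas": 40, "Charlies": 25, "Ekos": 25},
    "Charlies": {"Ekos": 40, "Alphas": 25, "Bravos": 25},
    "Ekos": {"Charlies": 40, "Alphas": 25, "Bravos": 25},
}


def schedule_strength(ratings):
    """Games-weighted average of opponents' current ratings (S for each team)."""
    return {
        team: sum(n * ratings[opp] for opp, n in opps.items()) / sum(opps.values())
        for team, opps in GAMES.items()
    }


def relative_to_average(values):
    """Divide each team's value by the league average, e.g. S(n)/Avg(S(n))."""
    avg = sum(values.values()) / len(values)
    return {team: v / avg for team, v in values.items()}


def iterate(w0, iterations):
    """Return [W0, W1, ..., Wn], one dict of adjusted win ratios per step."""
    history = [dict(w0)]
    prev_rel = {team: 1.0 for team in w0}  # W0 has no SOS adjustment baked in
    for _ in range(iterations):
        current = history[-1]
        rel = relative_to_average(schedule_strength(current))
        history.append(
            {team: current[team] * rel[team] / prev_rel[team] for team in current}
        )
        prev_rel = rel
    return history


w0 = {team: w / l for team, (w, l) in RECORDS.items()}
history = iterate(w0, 9)
for n in (1, 2, 9):
    print(f"W{n}:", {team: round(w, 3) for team, w in history[n].items()})
```

Run without rounding the intermediate values, the Alphas' W2 lands a hair above the 1.692 shown earlier (about 1.694), which is just the effect of carrying more decimal places.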

One undesirable effect of the iterative process is that the average W9 is no longer equal to the actual average win ratio for the league, and the distribution is not the same. Thus, when the adjusted win ratios are converted into winning percentages (W% = WR/(1 + WR)), they are not guaranteed to average to .500, which of course is a logical must.

In order to convert W9 into an adjusted winning percentage, first figure an initial W% for each team:

iW% = W9/(1 + W9)

Dividing by the average of iW% will force the new adjusted W% to average to .500 for the league:

aW% = iW%/Avg(iW%)*.5
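As a quick illustration of the two formulas, here is a small Python sketch. The win ratios fed into it are placeholders I made up to show the mechanics; they are not the W9 values from the example league, and `adjusted_win_pct` is a hypothetical helper name.

```python
def adjusted_win_pct(final_ratios):
    """Convert final win ratios to winning percentages that average .500."""
    # iW% = W/(1 + W)
    initial = {team: w / (1 + w) for team, w in final_ratios.items()}
    avg_initial = sum(initial.values()) / len(initial)
    # aW% = iW%/Avg(iW%)*.5
    return {team: p / avg_initial * 0.5 for team, p in initial.items()}


# Placeholder win ratios, for illustration only.
placeholder_ratios = {"Team A": 1.6, "Team B": 1.0, "Team C": 0.9, "Team D": 0.7}
adjusted = adjusted_win_pct(placeholder_ratios)

for team, pct in adjusted.items():
    print(f"{team}: {pct:.3f}")
print("league average:", round(sum(adjusted.values()) / len(adjusted), 3))
```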

The results for the theoretical league are illustrative of the strength of schedule problem I touched on earlier--the Alphas' aW% is lower than their actual W%, as they did not have to play against themselves. It really doesn't make sense to consider what a team's record would be if they played themselves, but thankfully this distortion becomes more trivial as the number of teams in a league increases.

aW% might seem like a logical measure to use as the final outcome of the system, but I actually prefer a scale which remains grounded in win ratio. So I convert W9 into the final product, Crude Team Rating, by dividing W9 by the average W9. This is different from aW%, which forces the league average to equal .500, the actual average winning percentage; this adjustment forces the average to 1 (or 100 when the decimal point is dropped), which is a nice property to have for the overall rating:

CTR = W9/Avg(W9)

In the same manner, a final strength of schedule metric that is centered at 1 is:

SOS = S10/Avg(S10)
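Putting the pieces together, a compact Python sketch of the whole calculation for the four-team league might look like the following. It assumes the records implied by the example and nine iterations; `crude_team_ratings` and its parameters are illustrative names, not a reference implementation.

```python
def crude_team_ratings(records, games, iterations=9):
    """Return (CTR, SOS) dicts, each centered so the league average is 1."""
    teams = list(records)

    def average(values):
        return sum(values.values()) / len(values)

    def schedule_strength(ratings):
        return {
            t: sum(n * ratings[o] for o, n in games[t].items()) / sum(games[t].values())
            for t in teams
        }

    w = {t: records[t][0] / records[t][1] for t in teams}  # W0 = W/L
    prev_rel = {t: 1.0 for t in teams}
    for _ in range(iterations):
        s = schedule_strength(w)
        avg_s = average(s)
        rel = {t: s[t] / avg_s for t in teams}
        w = {t: w[t] * rel[t] / prev_rel[t] for t in teams}
        prev_rel = rel

    s_final = schedule_strength(w)  # one more SOS pass, based on the last W
    avg_w, avg_s_final = average(w), average(s_final)
    ctr = {t: w[t] / avg_w for t in teams}
    sos = {t: s_final[t] / avg_s_final for t in teams}
    return ctr, sos


records = {"Alphas": (59, 31), "Bravos": (43, 47), "Charlies": (43, 47), "Ekos": (35, 55)}
games = {
    "Alphas": {"Bravos": 40, "Charlies": 25, "Ekos": 25},
    "Bravos": {"Alphas": 40, "Charlies": 25, "Ekos": 25},
    "Charlies": {"Ekos": 40, "Alphas": 25, "Bravos": 25},
    "Ekos": {"Charlies": 40, "Alphas": 25, "Bravos": 25},
}

ctr, sos = crude_team_ratings(records, games)
for team in records:
    print(f"{team:9s} CTR = {100 * ctr[team]:5.0f}   SOS = {100 * sos[team]:5.0f}")
```

The Bravos and Charlies enter with identical records but come out with different ratings, which is the whole point of the exercise.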

The CTR and SOS for the four hypothetical teams are:

This post is already a mess to read, so at this point I'm going to break it up into sections on particular aspects of the methodology that I wish to expound upon:

Why is win ratio used throughout the process rather than W%?

I use win ratio because it is much easier to work with; for instance, the ratios between the various SOS estimates can be multiplied by the win ratio to find an adjusted win ratio. The math would not be anywhere near as straightforward if W% were used. Using win ratio also allows for a larger spread in the final CTRs; I could use CTR = aW%/Avg(aW%), but the range would be narrower.

Most important, though, is that win ratios can be plugged directly into Odds Ratio (equivalent to Log5) to estimate W% for matchups between teams. If a team with a win ratio of 1.2 plays a team with a win ratio of .9, they can be expected to win 57.1% of the time--1.2/(1.2 + .9). There would be equivalent but messier math if working with W%.

Due to this property, we can use CTR to estimate head-to-head winning percentages. CTR is not a true win ratio, since it has been re-centered at 100, but the re-centering is done with a common scalar and so it has no effect on ratios of team CTRs. So a team with a CTR of 130 can be expected to win 52% of the time against a team with a 120 CTR--130/(130 + 120).
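The odds ratio calculation is a one-liner. The sketch below (with a hypothetical helper name, `expected_win_pct`) reproduces both examples and shows that rescaling both ratings by the same constant leaves the estimate unchanged.

```python
def expected_win_pct(rating_a, rating_b):
    """Odds ratio estimate of how often team A beats team B."""
    return rating_a / (rating_a + rating_b)


print(round(expected_win_pct(1.2, 0.9), 3))  # 0.571
print(round(expected_win_pct(130, 120), 3))  # 0.52

# A common scalar applied to both ratings cancels out of the ratio.
same_matchup_rescaled = expected_win_pct(1.30, 1.20)
print(round(same_matchup_rescaled, 3))       # 0.52
```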

I estimated W% for the Alphas against each of their opponents using CTR. The fact that they are close to the actual results should not be taken as any type of indication that the head-to-head estimates are accurate, as I obviously came up with the numbers to follow logically. They are offered just to show how the ratings can be used to estimate W% in a head-to-head matchup:

Your ranking system is novel and unique, right?

Wrong. There's nothing new about it. It is really quite similar to the Simple Ranking System used by the Sports-Reference sites, except it operates on ratios rather than differentials. As mentioned above, there are a number of similar and likely more refined approaches utilized by other analysts.

It's really quite simple:

1. Assign each team an initial ranking
2. Use those initial rankings to estimate SOS for each team
3. Compare SOS to the average schedule and adjust initial ranking accordingly
4. Repeat until rankings stabilize

Have you accounted for home field advantage?

No, I haven't, either in the process of estimating team strength (I don't break down a team's schedule into home and road games) or by producing adjustments for CTR at home and on the road (i.e. having a formula that tells you that a 130 overall CTR team has an equivalent 110 CTR on the road and 150 at home). The former falls outside the scope of an admittedly crude rating system; the latter is something that is easy enough to account for on the fly.

While there is a lot that could be said about incorporating home field advantage (writing about some of it is on my to-do list), the simplest thing to do is to incorporate a home field edge into the odds ratio calculation. The long-term major league average is for the home team to win 54% of the time, which is a win ratio of 1.174. I call the square root of that win ratio "h" (1.083). If a team is away, divide their CTR by h; if they are home, multiply it by h.

Suppose that we have a 125 CTR team hosting a 110 CTR team. Absent home field advantage, we'd expect the home team to win 125/(125 + 110) = 53.2% of the time. But the 125 team is now an effective 135.4 team (125*1.083), and the 110 team is now an effective 101.6 team (110/1.083), and so the expected outcome is now a home win 57.1% of the time.

Equivalently, one can figure the odds ratio probability by first dividing the two CTRs (125/110 = 1.136), and dividing by that ratio plus one (1.136/2.136 = 53.2%). If you approach it in this manner, you need to multiply the ratio by h^2 (1.174) rather than h (125/110*1.174 = 1.334 and 1.334/2.334 = 57.1%). I prefer the former method, because it produces a distinct new rating for each team based on whether they are home or away rather than accounting for it all in one step, but it is a matter of preference and has no computational impact.
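Both routes can be checked against each other in a short Python sketch. The 54% home winning percentage is the figure quoted above; the function names are hypothetical.

```python
import math

HOME_WIN_PCT = 0.54                           # long-term home winning percentage
H_SQUARED = HOME_WIN_PCT / (1 - HOME_WIN_PCT)  # home win ratio, about 1.174
H = math.sqrt(H_SQUARED)                       # about 1.083


def home_win_prob_two_step(home_ctr, away_ctr):
    """Adjust each team's CTR by h first, then apply the odds ratio."""
    home_effective = home_ctr * H
    away_effective = away_ctr / H
    return home_effective / (home_effective + away_effective)


def home_win_prob_one_step(home_ctr, away_ctr):
    """Multiply the ratio of the CTRs by h^2, then divide by ratio plus one."""
    ratio = home_ctr / away_ctr * H_SQUARED
    return ratio / (ratio + 1)


print(round(home_win_prob_two_step(125, 110), 3))  # 0.572
print(round(home_win_prob_one_step(125, 110), 3))  # 0.572
```

Carried to full precision the answer is about .5716, so it shows as 57.1% or 57.2% depending on how the intermediate values are rounded.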

CTR is based on actual win ratio. Why don't you use expected win ratio from runs and runs allowed or runs created and runs created allowed?

One can very easily substitute expected win ratio for actual win ratio. I have three variations: eCTR, gCTR, and pCTR. eCTR uses win ratio estimated from actual runs scored and allowed, gCTR uses "Game EW%" (based on runs scored and allowed distribution taken independently in this post), and pCTR uses win ratio estimated from runs created and runs created allowed.

The discussion of what inputs to use helps to illustrate another flaw in the methodology--there is no regression built in to the system. For the purpose of ranking a team's actual W-L results, there is no need for regression, but if one is using the system to estimate a team's W% against an opponent, it is incorrect to assume that a team's sample W% is equal to the true probability of them winning a game. Even if one did not want to regress the W/L ratio of the team being rated, it would make sense to regress the records of their opponents in figuring strength of schedule. I've done neither.

Are there any other potential applications of CTR?

One can always think up different ways to use a method like this (and since methods similar in spirit but superior in execution to CTR already exist, many of them have been implemented already); the question is whether the results are robust enough to provide value in a given function. I'll offer one possible use here, which is using a team's strength of schedule adjustment to estimate an adjustment factor for their players' performance.

There are some perfectly good reasons why one would not want to adjust individual performance for the strength of his opponents, but if that is something in which you are interested, CTR might be useful. If a team's opponents have an expected win ratio of .9, then based on the Pythagorean formula with an exponent of two, their equivalent run ratio should be sqrt(.9) = .949. Custom exponents could be used as well, of course, but two will suffice for this example.

So a pitcher on a team with a .9 SOS could have his ERA adjusted by dividing by .949 to account for the weaker quality of opposition. This approach assumes that the team's opponents are evenly balanced between offense and defense. One could put together a CTR-like system that broke down runs scored and allowed separately, but that would require the use of park factors or home/road breakdowns, and would greatly complicate matters.
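In code, the adjustment might be sketched as follows. The 3.50 ERA is a made-up input for illustration, and the function names are hypothetical; the exponent defaults to two as in the example but can be swapped for a custom one.

```python
def opponent_run_ratio(sos_win_ratio, exponent=2.0):
    """Run ratio implied by the opponents' expected win ratio (Pythagorean)."""
    return sos_win_ratio ** (1 / exponent)


def sos_adjusted_era(era, sos_win_ratio, exponent=2.0):
    """Divide ERA by the opponents' run ratio, assuming the opponents are
    evenly balanced between offense and defense."""
    return era / opponent_run_ratio(sos_win_ratio, exponent)


print(round(opponent_run_ratio(0.9), 3))  # 0.949
# Hypothetical pitcher: a 3.50 ERA against a .9 SOS schedule.
print(round(sos_adjusted_era(3.50, 0.9), 2))
```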

Speaking of park factors, one could use an iterative approach to calculate park factors (Pete Palmer's PF method takes this path). Instead of simply comparing a team's home and road RPG as I do, you could look at the team's road games in each park, calculate an initial adjustment, and iterate until the final park factors stabilized. At some point I’ll cover this application in detail.

Wait, there's something messed up with the 100 scale. A .500 team will not get a 100 CTR.

You're right, they won't, but it's not a problem with the scale--it's a feature of it. (You are free to prefer a different scale, of course.) This issue is hinted at in the section about why win ratio is used, but I didn't address it explicitly there. The scale is designed so that a team with an average win ratio gets a 100 rating, not so that an average team (which would by definition play .500) gets a 100 rating.

Others have pointed out the potential dangers in working with ratios rather than percentages--assuming that ratios work as percentages can result in mathematical blunders. Suppose we have two football teams that comprise a league, one which goes 1-15 and the other that goes 15-1. Obviously, the average record for a team in this league is 8-8 with a winning percentage of .500. Such a team would have a win ratio of 1.

But what is the average win ratio for a team in this league? Not the win ratio for a hypothetical average team--the arithmetic mean of the win ratios of the teams in this league. It is (15/1 + 1/15)/2 = 7.533. It's not even close to 1.

Obviously this is an extreme example, but the principle holds--teams that are an equal distance from .500 will not see their win ratios balance to 1. The average win ratio for a real league will always be >= 1. The effect is stronger in leagues in which win ratios deviate more from 1. In the 2009 majors, for instance, the average win ratio was 1.039. In the 2009 NFL, it was 1.386.
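The two-team football example is easy to verify. The short sketch below (helper names are my own) computes both averages from the same records to show how far apart they can be.

```python
def average_win_pct(records):
    """Arithmetic mean of W/(W + L) across teams."""
    return sum(w / (w + l) for w, l in records) / len(records)


def average_win_ratio(records):
    """Arithmetic mean of W/L across teams."""
    return sum(w / l for w, l in records) / len(records)


league = [(15, 1), (1, 15)]  # the two-team football league from the text
print(round(average_win_pct(league), 3))    # 0.5
print(round(average_win_ratio(league), 3))  # 7.533
```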

I could have set up the ratings so that a team with a .500 record (i.e. 1 win ratio) was assured of receiving a ranking of 100 by simply not dividing what is called W(f) below by Avg[W(f)], but it also would have ensured that the average of all of the team ratings would not be 100. In the first spreadsheet I put together, that's exactly what I did, but I decided that it was more annoying to have to remember what the league average of the ratings was (especially when looking at aggregate rankings for divisions and leagues) than it was to remember that a .500 team would have a ranking of 96 or so. It's purely a matter of aesthetics and personal preference.

Generic Formulas

W0 = W/L for aCTR; gEW%/(1 - gEW%) for gCTR; (R/RA)^x for eCTR; (RC/RCA)^y for pCTR
where x = ((R + RA)/G)^z and y = ((RC + RCA)/G)^z, where z is the Pythagenpat exponent (I use .29 out of habit)
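The starting win ratios for the different flavors could be sketched in Python as below. The function names and the 800/700 run totals are hypothetical; the .29 default for z is the value mentioned above. The same Pythagenpat function serves eCTR (runs) and pCTR (runs created), since only the inputs differ.

```python
def ratio_from_record(wins, losses):
    """aCTR starting point: W0 = W/L."""
    return wins / losses


def ratio_from_pct(win_pct):
    """gCTR starting point: W0 = gEW%/(1 - gEW%)."""
    return win_pct / (1 - win_pct)


def pythagenpat_win_ratio(scored, allowed, games, z=0.29):
    """eCTR/pCTR starting point: W0 = (R/RA)^x with x = ((R + RA)/G)^z."""
    x = ((scored + allowed) / games) ** z
    return (scored / allowed) ** x


print(round(ratio_from_record(59, 31), 3))  # 1.903
# Hypothetical team: 800 runs scored, 700 allowed over 162 games.
print(round(pythagenpat_win_ratio(800, 700, 162), 3))
```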

for a league of t teams, where G is total games played for a team and g(i) is the number of games against a particular opponent:

S1 = (1/G)*{SUM(i = 1 to t)[g(i)*W0(i)]}

W1 = W0*(S1/Avg(S1))

S(n) for n > 1 = (1/G)*{SUM(i = 1 to t)[g(i)*W(n-1)(i)]}

W(n) for n > 1 = W(n-1)*(S(n)/Avg(S(n)))/(S(n-1)/Avg(S(n-1)))

For final win iteration (f) in the specific implementation (f = 30 in my spreadsheet):

S(f + 1) = (1/G)*{SUM(i = 1 to t)[g(i)*W(f)(i)]}

iW% = W(f)/[1 + W(f)]

aW% = iW%/Avg(iW%)*.5

CTR = W(f)/Avg(W(f))

SOS = S(f + 1)/Avg(S(f + 1))