Wednesday, January 20, 2021

Akousmatikoi Win Estimators, pt. 3: Differential-Based Simplifications

Simplifying the Pythagorean estimate by focusing on run differential is not as intuitive as using run ratio, since of course Pythagorean constructs are based on the latter rather than the former. The upfront calculus is messier, the relationships harder to explain – I’ve covered all this before, and so I went back to my previous work rather than go through the hassle of re-deriving it. However, while the calculus is messier, the end result is simpler, and gives you relationships that you might actually choose to use in place of the full Pythagorean treatment if you want something quick and simple to punch into a calculator.

The easiest way I’ve found to demonstrate this approach (which is not to say that a simpler derivation doesn’t exist) is to use the following definitions. To make this easier to follow, I’m going to define R as R/G and RA as RA/G:

RR = R/RA

RD = R - RA

RPG = R + RA

Given these relationships, we can relate run ratio and run differential using RPG:

RR = (RD + RPG)/(RPG – RD)

If you need a proof of that, replace RD and RPG with the equations above and you will see that:

RR = (R – RA + R + RA)/(R + RA – (R – RA)) = (2*R)/(2*RA) = R/RA

In the last installment, we differentiated Pythagorean win ratio with respect to run ratio; here, I want to differentiate Pythagorean winning % with respect to run ratio, which will look slightly messier. Starting from the Pythagorean relationship:

W% = RR^x/(RR^x + 1)

we differentiate to get:

dW%/dRR = ((RR^x + 1)*(x*RR^(x – 1)) – RR^x*(x*RR^(x – 1)))/(RR^x + 1)^2

= (x*RR^(x – 1))*((RR^x + 1) – RR^x)/(RR^x + 1)^2

dW%/dRR = x*RR^(x – 1)/(RR^x + 1)^2

That’s well and good, but it doesn’t tell us anything about the relationship between Pythagorean W% and run differential. To bridge that gap, we can differentiate run ratio with respect to run differential (holding RPG constant) and multiply this result by dW%/dRR, which we just derived:

(dW%/dRR)*(dRR/dRD) = dW%/dRD

Since we know that RR = (RD + RPG)/(RPG – RD), we get:

dRR/dRD = ((RPG – RD)*1 – (RD + RPG)*(-1))/(RPG – RD)^2

= 2*RPG/(RPG – RD)^2

If you slogged through any of my previous treatments of this topic, I must apologize – I missed some simplifications of both of these formulas before. The final math worked out the same, but it was needlessly difficult to follow. In any event, we now have:

dW%/dRD = (x*RR^(x – 1)/(RR^x + 1)^2) * (2*RPG/(RPG – RD)^2)

= 2*RPG*x*RR^(x – 1)/((RR^x + 1)^2*(RPG – RD)^2)

This ends up being expressed in terms of marginal wins per marginal run. The classic sabermetric presentation is marginal runs per marginal win (Runs Per Win, à la the rule of thumb that 10 runs = 1 win). So we can take the reciprocal to get this formula for Runs Per Win from Pythagorean:

Pythagorean RPW = (RR^x + 1)^2*(RPG – RD)^2/(2*RPG*x*RR^(x - 1))
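
As a sanity check on the chain-rule result, here is a minimal Python sketch that codes the formula above and compares it with a central finite difference of Pythagorean W%, nudging RD while holding RPG fixed. The function names and the step size h are illustrative choices, not part of the derivation.

```python
def pyth_wpct(r, ra, x=2.0):
    """Pythagorean W% from per-game runs scored (r) and allowed (ra)."""
    return r**x / (r**x + ra**x)


def dwpct_drd(r, ra, x=2.0):
    """dW%/dRD from the chain-rule formula, with RPG held constant."""
    rr = r / ra
    rd = r - ra
    rpg = r + ra
    return 2 * rpg * x * rr**(x - 1) / ((rr**x + 1)**2 * (rpg - rd)**2)


def pyth_rpw(r, ra, x=2.0):
    """Runs per win: the reciprocal of dW%/dRD."""
    return 1.0 / dwpct_drd(r, ra, x)


def numeric_dwpct_drd(r, ra, x=2.0, h=1e-6):
    """Central difference: move RD by +/-h while RPG stays fixed."""
    up = pyth_wpct(r + h / 2, ra - h / 2, x)
    down = pyth_wpct(r - h / 2, ra + h / 2, x)
    return (up - down) / (2 * h)


print(round(dwpct_drd(5, 4), 4))  # 0.1071
print(round(pyth_rpw(5, 4), 4))   # 9.3389
print(abs(dwpct_drd(5, 4) - numeric_dwpct_drd(5, 4)) < 1e-6)  # True
```

Holding RPG fixed in the numerical check matters: the derivative of RR with respect to RD used above treats RPG as a constant, so R and RA each move by half of the change in RD.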

Before moving forward, one thing I should note is that this function does not allow us to match the Pythagenpat W% at a given point for a set of inputs. For example, if you plug in 5 runs scored and 4 runs allowed, you will get a dW%/dRD of .1071. You might then reasonably assume that if you take the team’s run differential of 1 times .1071 plus a y-intercept (which by definition would be .5 since Pythagorean will estimate a .500 W% when R = RA), you will get a restatement of the team’s Pythagorean W%. But in fact you will get .6071, while Pythagorean would estimate 5^2/(5^2 + 4^2) = .6098. The differences will be more extreme if you put in more extreme teams.
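
To see the gap concretely, here is a minimal sketch (function names are illustrative) that multiplies the slope at a team's own point by its run differential, adds the .500 intercept, and compares the result with the Pythagorean estimate. The 6 R/3 RA team is a made-up extreme case, included only to show the gap widening.

```python
def pyth_wpct(r, ra, x=2.0):
    """Pythagorean W% from per-game runs scored (r) and allowed (ra)."""
    return r**x / (r**x + ra**x)


def dwpct_drd(r, ra, x=2.0):
    """dW%/dRD at the team's own R and RA, with RPG held constant."""
    rr, rd, rpg = r / ra, r - ra, r + ra
    return 2 * rpg * x * rr**(x - 1) / ((rr**x + 1)**2 * (rpg - rd)**2)


def linear_from_slope(r, ra, x=2.0):
    """.5 + RD * (slope evaluated at the team's own point)."""
    return 0.5 + (r - ra) * dwpct_drd(r, ra, x)


for r, ra in [(5, 4), (6, 3)]:
    print(r, ra, round(linear_from_slope(r, ra), 4), round(pyth_wpct(r, ra), 4))
# 5 4 0.6071 0.6098
# 6 3 0.74 0.8
```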

Alas, I do not have a simple mathematical explanation for why this is the case. However, I will note that we don’t need calculus to calculate the actual Runs Per Win value from Pythagorean for any given set of R, RA, and x that we input. We can simply calculate this by noting that:

W% = RD/RPW + .5

Plugging in Pythagorean relationships and solving for RPW:

R^x/(R^x + RA^x) = RD/RPW + .5

R^x/(R^x + RA^x) - .5 = RD/RPW

RPW = (R – RA)/(R^x/(R^x + RA^x) - .5)

For our 5 R/4 RA team, this results in (5 – 4)/(5^2/(5^2 + 4^2) - .5) = 9.1111 RPW or .1098 wins/run, which of course is the right answer. In terms of simplifying the Pythagorean relationship, though, this is useless – all we’ve done is rearrange terms to calculate runs per win for a given set of inputs. To use this to produce a flatter win estimator, we can drop the team’s R and RA figures and replace them with a function that only considers the scoring level (i.e. RPG).
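
The rearranged formula is easy to put into code. A minimal sketch, with an illustrative function name and an explicit guard for the R = RA case, where the expression has no defined value:

```python
def exact_rpw(r, ra, x=2.0):
    """RPW implied by W% = RD/RPW + .5 for a given R, RA and exponent x."""
    if r == ra:
        # Both RD and (W% - .5) are zero here, so the ratio is 0/0.
        raise ValueError("RPW is undefined (0/0) when R = RA")
    wpct = r**x / (r**x + ra**x)
    return (r - ra) / (wpct - 0.5)


rpw = exact_rpw(5, 4)
print(round(rpw, 4))      # 9.1111
print(round(1 / rpw, 4))  # 0.1098
```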

This is what the rule of thumb that 10 runs = 1 win does, substituting a general rule for specifics about the team’s actual location on a run/win curve with respect to the marginal value of an additional run scored or allowed. As such, since it’s establishing a rule that will be applied to all teams, it makes sense to center it at the point which will be closest to an average team – at the point where R = RA.

In other words, we will be developing a RPW equation that can be applied generally, but will be defined based on the relationship at the point where R = RA for a given RPG. Using our formula above for RPW based on rearrangement of terms in the Pythagenpat relationship, we can substitute R = RA wherever we see one of those terms and...reduce the equation to 0/0, as the numerator R – RA equals 0 when R = RA, and the denominator R^x/(R^x + RA^x) - .5 = 0 when R = RA.

However, this is where the equation for RPW derived using calculus can step in, and tell us what the theoretical RPW value is at that point. Recall from above that:

RPW = (RR^x + 1)^2*(RPG – RD)^2/(2*RPG*x*RR^(x – 1))

If we assume that R = RA, then RR = 1 and RD = 0, and this simplifies nicely to:

RPW = (1^x + 1)^2*(RPG – 0)^2/(2*RPG*x*1^(x – 1))

= 2^2*RPG^2/(2*RPG*x) = 4*RPG^2/(2*RPG*x) = 2*RPG/x
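
One way to convince yourself that 2*RPG/x is the right value at R = RA is to watch the rearranged (non-calculus) RPW approach it as the run differential shrinks. A minimal sketch; the 9 RPG scoring level and the function names are assumptions chosen for illustration:

```python
def exact_rpw(r, ra, x):
    """RPW from rearranging W% = RD/RPW + .5 (requires r != ra)."""
    wpct = r**x / (r**x + ra**x)
    return (r - ra) / (wpct - 0.5)


def rpw_at_even(rpg, x):
    """Calculus-based RPW at the point where R = RA."""
    return 2 * rpg / x


rpg, x = 9.0, 2.0
for rd in (1.0, 0.1, 0.01, 0.001):
    # Split the same RPG into R and RA with a shrinking differential.
    r, ra = (rpg + rd) / 2, (rpg - rd) / 2
    print(rd, exact_rpw(r, ra, x))

print(rpw_at_even(rpg, x))  # 9.0
```

The first line of the loop reproduces the 5 R/4 RA team from earlier, and the later lines settle toward 2*RPG/x.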

The first immediate implication is that for our special Pythagorean case where x = 2, RPW = RPG. Since the general case is:

W% = (R/G – RA/G)/RPW + .5

RPW = RPG is equivalent to saying that (after all of the game denominators cancel out):

W% = (R – RA)/(R + RA) + .5

What if x is a constant other than 2, like the value of x = 1.847 that minimizes RMSE for expansion-era major league teams? Then RPW = 2*RPG/1.847 = 1.083*RPG, and we could say that:

W% = (R/G – RA/G)/(1.083*(R/G + RA/G)) + .5

= (1/1.083)*(R/G – RA/G)/(R/G + RA/G) + .5

= .923*(R – RA)/(R + RA) + .5

More generally:

W% = (x/2)*(R – RA)/(R + RA) + .5
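
A minimal sketch of this estimator next to the full Pythagorean calculation, using the same exponent for both. The three teams are hypothetical and the function names are illustrative:

```python
def pyth_wpct(r, ra, x):
    """Full Pythagorean W% with exponent x."""
    return r**x / (r**x + ra**x)


def linear_wpct(r, ra, x=1.847):
    """W% = (x/2)*(R - RA)/(R + RA) + .5; x/2 is about .923 for x = 1.847."""
    return (x / 2) * (r - ra) / (r + ra) + 0.5


x = 1.847
for r, ra in [(4.5, 4.3), (5.0, 4.0), (6.0, 3.0)]:
    print(r, ra, round(linear_wpct(r, ra, x), 4), round(pyth_wpct(r, ra, x), 4))
```

Because only the ratio (R – RA)/(R + RA) enters the linear formula, season totals work just as well as per-game figures.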

This form is one that was proposed by Ben Vollmayr-Lee as .91*(R – RA)/(R + RA) + .5 (I’ve rewritten his formula to match the format I’m using), which would imply a Pythagorean x = 1.82. I would suggest that the Kross equations and the Vollmayr-Lee equation are the ultimate in terms of simplified win estimators from the Akousmatikoi family (again, Kross and Vollmayr-Lee did not start from Pythagorean as we have; by including these estimators in the Akousmatikoi family, I only mean to suggest that they are mathematically related to Pythagorean, not that their creators didn’t independently discover them).

Remember that for the expansion era, the average RPG is 8.83, which would imply that the long-term RPW value is approximately 1.083*8.83 = 9.56; close enough to ten that you can see why we might have a rule of thumb, although ten runs would imply a 4.5% higher scoring context (10/1.083 = 9.23) than observed in the expansion era.

We could also use a hybrid approach, in which we allow each team’s RPW according to the formula that applies when R = RA to vary based on their RPG, but not on how that RPG breaks down into runs scored and allowed. In order to do this, we’d return to RPW = 2*RPG/x, but instead of setting x equal to a constant, use a custom value for x. Of course, my suggested value would be the Pythagenpat estimate of x, namely:

x = RPG^z, where z = .282 for now (value that minimizes RMSE for the expansion era)

Substituting this equation for x, we find that for the general case of a variable z:

RPW = 2*RPG/(RPG^z) = 2*RPG^(1 – z)

Or for the specific case that z = .282:

RPW = 2*RPG^.718
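
A minimal sketch of the Pythagenpat-based version, evaluated at the expansion-era average of 8.83 RPG. The function names are illustrative, and z = .282 is the value quoted above:

```python
def pythagenpat_x(rpg, z=0.282):
    """Pythagenpat exponent: x = RPG^z."""
    return rpg**z


def pythagenpat_rpw(rpg, z=0.282):
    """RPW at R = RA with a Pythagenpat exponent: 2*RPG^(1 - z)."""
    return 2 * rpg**(1 - z)


print(round(pythagenpat_x(8.83), 3))    # 1.848
print(round(pythagenpat_rpw(8.83), 3))  # 9.555
```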

We could further flatten this equation by approximating it with a linear function. Recall from the last section that we can write a tangent line in the form:

y – y1 = m(x – x1) where x1 and y1 are the x and y values for the point in question, and m is the slope of the curve at x1.

To apply this approach to develop a linear approximation of the above equation, we first need the slope of the RPW function 2*RPG^(1 – z). Differentiating with respect to RPG yields 2*(1 – z)*RPG^(-z).

Let’s center this at the point corresponding to our expansion-era averages, so x = 1.847. (For the eagle-eyed readers or those checking my math (always welcomed!), I’m choosing to use the value that minimizes RMSE to be consistent with earlier applications rather than the value of 1.848 that corresponds to 8.83 RPG using the equation directly.) In this case x1 will be 8.83 RPG, and y1 = 2*8.83^.718 = 9.555. At 8.83 RPG, m will be 2*(1 - .282)*8.83^(-.282) = .777, so we have:

RPW – 9.555 = .777*(RPG – 8.83)

which simplifies to:

RPW = .777*RPG + 2.694
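
The tangent-line construction can be sketched in a few lines of Python. The function names are illustrative, and the 7 and 10 RPG contexts are arbitrary points chosen to show how far the line drifts from the curve away from the center:

```python
def rpw_curve(rpg, z=0.282):
    """RPW at R = RA with a Pythagenpat exponent: 2*RPG^(1 - z)."""
    return 2 * rpg**(1 - z)


def tangent_line(rpg0=8.83, z=0.282):
    """Slope and intercept of the tangent to the RPW curve at rpg0."""
    slope = 2 * (1 - z) * rpg0**(-z)
    intercept = rpw_curve(rpg0, z) - slope * rpg0
    return slope, intercept


slope, intercept = tangent_line()
print(round(slope, 3))  # 0.777
print(intercept)

for rpg in (7.0, 8.83, 10.0):
    print(rpg, rpw_curve(rpg), slope * rpg + intercept)
```

The intercept computed at full precision may differ from 2.694 in the last digit, since the figure in the text comes from subtracting already-rounded values.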

We’ve now developed two RPW estimates, using only RPG as an independent variable, one with a y-intercept and one without, by trying to flatten the Pythagorean relationships wherever possible. Which is more accurate? One would assume that it’s the version with y-intercept, but even if it is, how much more accurate for normal teams, and how does this tangent line based approach compare with the best fit for an equation of the form RPW = m*RPG + b? Those are questions we’ll explore in the final installment.

References

Ben Vollmayr-Lee’s article on win estimation formulas:

http://www.eg.bucknell.edu/~bvollmay/baseball/pythagoras.html

Ralph Caola published multiple articles on using differentiation with the Pythagorean formula, as well as an article on double the edge, unpublished to the best of my knowledge, that he shared with me.

His articles can be found in the 11/2003, 2/2004, and 5/2004 issues of By the Numbers.

https://sabr.org/research/statistical-analysis-research-committee-newsletters/

Kevin D. Dayaratna and Steven J. Miller explored the relationship that RPW = 2*RPG/x in the 5/2012 issue of BTN. I had known and used that one for a long time, thanks originally to a post by David Glass on rec.sport.baseball. Unfortunately a quick search did not yield a live link to Glass’ post.

2 comments:

  1. Interesting. The one I proposed to Fangraphs is 1.5*RPG+3, where RPG is for ONE team.

    Patriot has derived for BOTH teams:
    RPW = .777*RPG + 2.694
    Which if we do it for one team is:
    RPW = 1.554*RPG + 2.694

    So a slightly higher multiplier for Patriot, and a slightly higher intercept for me, to compensate.

    I really love round numbers, but rounding 1.554 to either 1.5 or 1.6 really is a big jump! And that would cause the intercept to change greatly as well.

    Good stuff!

  2. Thanks. I remember your formula and think I may have referenced it previously when covering this topic, although it slipped my mind this time. I didn't try a best fit with actual team data, but this formula has a pretty good RMSE and tracks Pythagenpat for normal teams as well as any other estimator. And hard to beat for simplicity as everyone can follow y = mx + b
