Wednesday, January 06, 2021

Akousmatikoi Win Estimators, pt. 2: Ratio-Based Simplifications

We will begin our endeavor to simplify/“flatten” the Pythagenpat exponent by looking at approaches that keep run ratio as the chief independent variable in the W% estimate. Before jumping into that, I should note that we could think of the first flattening as the move from a variable exponent like Pythagenport/Pythagenpat to a fixed exponent. However, since the fixed exponent came first historically and is easier to explain conceptually, I didn’t approach it in that manner.

We could also make flattening the Pythagenpat exponent itself the first step. My definition of “flatten” for the sake of this discussion is to replace exponents with multiplication where possible. We could start by trying to convert z = RPG^.282 into a linear formula. I’ve skipped this step because we would still be left with exponents when we go to calculate the winning percentage. While simplifying the equations will generally cost us some theoretical and a tiny bit of empirical accuracy, it will gain us ease of calculation. Replacing RPG^.282 with a linear equation wouldn’t really make the calculation any easier, but more importantly I don’t think it would result in an interesting alternative methodology to estimate W%. It would just result in a Pythagenpat equation that is very slightly easier to calculate and less accurate.
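For reference, here is the unflattened starting point: the Pythagenpat exponent computed directly. This is a minimal Python sketch, assuming RPG means runs scored plus runs allowed per game, (R + RA)/G, as in the usual Pythagenpat definition; the function names are my own and purely illustrative.

```python
def pythagenpat_exponent(rpg: float, power: float = 0.282) -> float:
    """Pythagenpat exponent: RPG raised to the .282 power."""
    return rpg ** power


def pythagenpat_wpct(r: float, ra: float, games: float, power: float = 0.282) -> float:
    """Pythagorean W% using the Pythagenpat exponent.

    Assumes RPG = (R + RA) / G, i.e. total runs by both teams per game.
    """
    x = pythagenpat_exponent((r + ra) / games, power)
    return r ** x / (r ** x + ra ** x)


# In a 9 RPG environment the exponent comes out at roughly 1.86.
print(round(pythagenpat_exponent(9.0), 3))
print(round(pythagenpat_wpct(800, 700, 162), 3))
```

Note that the exponent itself is already the only non-linear piece of the calculation that varies by run environment; linearizing it would still leave R^x and RA^x to compute.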

I previously wrote the general Pythagorean relationship as:

W% = R^x/(R^x + RA^x)

but note that we could equivalently define win ratio (W/L = WR) as:

WR = RR^x where RR = run ratio = R/RA
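A quick numerical check that the two ways of writing the relationship agree. This is a minimal sketch with hypothetical function names; it converts win ratio back to winning percentage with W% = WR/(WR + 1), which holds because WR = W/L.

```python
import math


def pythag_wpct(r: float, ra: float, x: float) -> float:
    """W% = R^x / (R^x + RA^x)."""
    return r ** x / (r ** x + ra ** x)


def pythag_win_ratio(r: float, ra: float, x: float) -> float:
    """WR = RR^x, where RR = R/RA."""
    return (r / ra) ** x


def wpct_from_win_ratio(wr: float) -> float:
    """Convert a win ratio W/L into a winning percentage W/(W + L)."""
    return wr / (wr + 1)


# A team scoring 5 and allowing 4 with x = 2: both routes give 25/41, about .610.
direct = pythag_wpct(5, 4, 2)
via_ratio = wpct_from_win_ratio(pythag_win_ratio(5, 4, 2))
print(math.isclose(direct, via_ratio))
```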

I will alternate between these two ways of writing the equation depending on which is more convenient for what we’re trying to do. In this case, I want to see what happens if we get rid of the exponent. The approach I will take is to replace the current function with a simplified function that produces the same result at a particular point. Of course, we cannot replace the function with another that will produce the same results at all points, or even expect to find one that produces the same results at multiple points. But we will be able to find a function that produces the same result at a given point.

Mathematically, this will be the tangent line to the curve at that point. At that point, the tangent line touches the curve and has the same slope as the curve. We will determine the slope by differentiating the function, and we will then determine the tangent line using the point-slope equation for the line as a starting point (to me, this is the most intuitive way to write the equation of a line, and if necessary we can simplify later). The point-slope equation of a line is:

y – y1 = m(x – x1)

where x1 and y1 are the x and y values for the point in question, and m is the slope of the curve at x1.

I’m going to switch to referring to the Pythagorean exponent as “a”, so that it doesn’t get confused with x, our independent variable (which is run ratio). So if we want the tangent line for the equation WR = RR^a, we first differentiate with respect to run ratio to get:

dWR/dRR = a*RR^(a – 1)
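The derivative can be sanity-checked numerically with a central finite difference. This is a minimal sketch; the step size h and the sample values of a and RR are arbitrary choices of mine, not anything from the original derivation.

```python
def win_ratio(rr: float, a: float) -> float:
    """WR = RR^a."""
    return rr ** a


def slope_analytic(rr: float, a: float) -> float:
    """dWR/dRR = a * RR^(a - 1)."""
    return a * rr ** (a - 1)


def slope_numeric(rr: float, a: float, h: float = 1e-6) -> float:
    """Central-difference approximation of dWR/dRR."""
    return (win_ratio(rr + h, a) - win_ratio(rr - h, a)) / (2 * h)


for a in (1.83, 2.0):
    for rr in (0.8, 1.0, 1.25):
        print(a, rr, slope_analytic(rr, a), slope_numeric(rr, a))
```

At RR = 1 the analytic slope reduces to a itself, which is the fact used below.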

Now we just need to determine x1 and y1. Since we are going to be applying simplified win estimation formulas across the entire spectrum of possible team performance, it makes the most sense to look at a team with R = RA, which we expect to have a .500 W%. Picking the average will likely result in the most accurate simplified equation over the entire spectrum of teams.

Of course, by simplifying the equation, we will lose accuracy (at least when the result of our simplified equation is compared to the “parent” equation – we hope in this case that the Pythagorean form is more accurate or else the entire premise of Akousmatikoi win estimators is moot). However, the simplified equation will match the parent equation precisely at the chosen point, and will produce very similar results near that point, so picking a point in the center of the distribution should maximize accuracy.

So, if R = RA, then RR equals one, and so our slope is simply equal to a, the Pythagorean exponent (since 1^(a – 1) = 1). Our x value is RR, which is 1, and our y is the WR corresponding to a RR of 1, which is 1 for any value of a, as WR = RR^a. So in point-slope form:

y – 1 = a*(x – 1)

which can simplify to

y – 1 = a*x – a

y = a*x – a + 1

Remembering what y and x represent in this case:

WR = a*RR – a + 1
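To see how closely the tangent line tracks the exponent form, here is a minimal sketch comparing WR = RR^a with WR = a*RR – a + 1 at a few run ratios. The RR values are arbitrary illustrations, not real team data. For a ≥ 1 the line never rises above the curve (Bernoulli’s inequality), so away from RR = 1 the linear form slightly understates the parent equation’s WR.

```python
def wr_power(rr: float, a: float) -> float:
    """Parent equation: WR = RR^a."""
    return rr ** a


def wr_tangent(rr: float, a: float) -> float:
    """Tangent line at RR = 1: WR = a*RR - a + 1."""
    return a * rr - a + 1


a = 2.0
for rr in (0.8, 0.9, 1.0, 1.1, 1.25):
    print(f"RR={rr:.2f}  power={wr_power(rr, a):.4f}  tangent={wr_tangent(rr, a):.4f}")
```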

For a fixed Pythagorean exponent a = 2:

WR = 2*RR – 2 + 1 = 2*RR - 1

This relationship suggests that if a team scores 10% more runs than it allows, it should win 20% more games than it loses. In the 1984 Baseball Abstract, Bill James wrote:

Another method that I have never tested but which I suspect would work as well as the others would be just to “double the edge”; that is, if a team scores 10% more runs than their opponents, they should win 20% more games than their opponents. If they score 1% more runs, they should win 2% more games. That method would probably work as well or better than the Pythagorean approach.

To my knowledge that’s the extent of James’ writings on this subject, so I can’t say whether he inferred “double the edge” from the Pythagorean formula, explicitly or implicitly, or came across it some other way. Either way, it can be directly related back to his own Pythagorean method.

If WR = a*RR – a + 1, and we already know that by definition W% = WR/(WR + 1), then we can convert this into a W% estimate as:

W% = (a*RR – a + 1)/(a*RR – a + 1 + 1) = (a*RR – a + 1)/(a*RR – a + 2)
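A minimal sketch of this linear W% estimate, with a hypothetical function name. One consequence worth flagging: the linear win ratio reaches zero at RR = (a – 1)/a (RR = .5 when a = 2), and below that the estimate no longer makes sense, so this sketch raises an error there rather than returning a nonsensical value. That guard is my own design choice, not part of the formula.

```python
def linear_wpct(rr: float, a: float) -> float:
    """W% = (a*RR - a + 1) / (a*RR - a + 2), from the tangent-line win ratio."""
    wr = a * rr - a + 1
    if wr <= 0:
        # The linear WR is non-positive once RR <= (a - 1)/a; no sensible W% there.
        raise ValueError("run ratio too low for the linear approximation")
    return wr / (wr + 1)


# Scores 5, allows 4 (RR = 1.25) with a = 2: (2.5 - 1)/2.5 = .600
print(f"{linear_wpct(5 / 4, 2):.3f}")
```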

For the special case of a = 2, this becomes:

W% = (2*RR – 2 + 1)/(2*RR – 2 + 2) = (2*RR – 1)/(2*RR) = 1 – 1/(2*RR) = 1 – 1/(2*R/RA)

= 1 - RA/(2*R)

This special case was noted by Bill Kross, and got a brief callout in The Hidden Game of Baseball. Kross also noticed that this method would not produce mirror-image results for two teams whose runs scored and runs allowed were reversed. A team that scores 5 and allows 4 runs would have an estimated W% of 1 - 4/(2*5) = .600, but a team that scores 4 and allows 5 would have an estimated W% of 1 – 5/(2*4) = .375.

So Kross proposed that for the case in which runs scored < runs allowed, the W% would be estimated as R/(2*RA), which produces 4/(2*5) = .400 for the team scoring 4/allowing 5. Not only is it satisfying to get a consistent result for the two sides of the same coin, this modification significantly improves accuracy when estimated W%s are compared empirically to actual W%s.
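A minimal sketch of the Kross rule (the function names are my own), showing the asymmetry of the uncorrected a = 2 formula and how Kross’ inversion restores it, using the 5/4 and 4/5 teams from the text:

```python
def simple_wpct(r: float, ra: float) -> float:
    """Uncorrected a = 2 estimate: W% = 1 - RA/(2*R)."""
    return 1 - ra / (2 * r)


def kross_wpct(r: float, ra: float) -> float:
    """Kross: 1 - RA/(2*R) when R >= RA, otherwise R/(2*RA)."""
    if r >= ra:
        return 1 - ra / (2 * r)
    return r / (2 * ra)


print(f"{simple_wpct(5, 4):.3f} {simple_wpct(4, 5):.3f}")  # 0.600 0.375
print(f"{kross_wpct(5, 4):.3f} {kross_wpct(4, 5):.3f}")    # 0.600 0.400
```

With the Kross version, the two teams’ estimates sum to exactly one, as they should for two sides of the same matchup.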

Expressing this inversion in terms of the general case above, in a case where R < RA, the estimated WR would be:

WR = 1/(a*1/RR – a + 1) = 1/(a/RR – a + 1)

and the W% would be:

W% = 1/(a/RR – a + 1)/(1/(a/RR – a + 1) + 1)
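For a general exponent a, the inverted approach can be sketched as a single function that picks the branch based on whether R or RA is larger. This is a minimal sketch with a hypothetical function name; a = 1.83 appears in the check only as an example value. When RR < 1 the denominator a/RR – a + 1 is always greater than 1, so unlike the uninverted line, this version never breaks down for low run ratios.

```python
def inverted_linear_wpct(r: float, ra: float, a: float) -> float:
    """Linear (tangent-line) W% with the Kross-style inversion for R < RA."""
    rr = r / ra
    if r >= ra:
        wr = a * rr - a + 1          # WR = a*RR - a + 1
    else:
        wr = 1 / (a / rr - a + 1)    # WR = 1/(a/RR - a + 1)
    return wr / (wr + 1)             # W% = WR/(WR + 1)


# With a = 2 this reproduces the Kross results.
print(f"{inverted_linear_wpct(5, 4, 2):.3f} {inverted_linear_wpct(4, 5, 2):.3f}")  # 0.600 0.400
```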

There are some ways to make that look nicer, but I don’t think any of them are sufficiently nice to bother with here. For the specific case when a = 2, Ralph Caola has suggested this formula as a clean way to boil the Kross equations down to one line:

W% = (R - RA)/(R + RA + ABS(R - RA)) + .5
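Caola’s one-liner works because the denominator R + RA + |R – RA| equals 2*R when R ≥ RA and 2*RA when R < RA, which recovers the two Kross branches. A minimal sketch that checks this over an arbitrary grid of run totals (the function names and grid are my own):

```python
def kross_wpct(r: float, ra: float) -> float:
    """Kross: 1 - RA/(2*R) when R >= RA, otherwise R/(2*RA)."""
    if r >= ra:
        return 1 - ra / (2 * r)
    return r / (2 * ra)


def caola_wpct(r: float, ra: float) -> float:
    """Caola: W% = (R - RA)/(R + RA + |R - RA|) + .5"""
    return (r - ra) / (r + ra + abs(r - ra)) + 0.5


pairs = [(r, ra) for r in range(500, 951, 50) for ra in range(500, 951, 50)]
worst = max(abs(kross_wpct(r, ra) - caola_wpct(r, ra)) for r, ra in pairs)
print(worst)  # differences are floating-point noise only
```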

You might be reading this and objecting “I thought you were going to simplify the Pythagorean relationship, but nothing about the equation with all of those reciprocals above looks simpler”. That is true – other than the special case when a = 2 and the Kross equations apply, this is not an easier way to calculate an estimated winning percentage provided you have a modern calculator or computer. However, it is “simpler” mathematically in the sense that we have eliminated exponents. Of course, in so doing we have lost some accuracy, particularly for extreme cases. Next time, instead of starting with run ratio, we’ll start with run differential and see what shakes out of Pythagorean and how it compares to methods that have been developed independently of Pythagorean.
