Saturday, January 21, 2006

Runs Per Win

Recently I started reading the 2006 Hardball Times Baseball Annual. I will review the book at some point, but for now it will suffice to say that you should probably get it. Today I just have some comments on a technical issue raised by Dan Fox’s article “Are You Feeling Lucky?” In addition to writing for the Hardball Times, Mr. Fox has an excellent blog, Dan Agonistes (linked on side of page).

Anyway, the article examines teams’ runs scored and allowed versus their BsR estimates, and runs scored and allowed versus W% by using Pythagenpat. There is a typo in the Pythpat formula--they have it as RPG^2.85, when it should be RPG^.285. But obviously the formula was applied correctly in the article, and it’s just a production mistake. There is also an error in the Indians’ and Mariners’ runs allowed that leads to a faulty conclusion about who “should have” won the AL Central (which this Indians fan just happened to notice). The Indians were not actually “lucky”--in fact, Bill James’ analysis in his Handbook, based on RC and RC Allowed, shows that the Indians were the best team in baseball, and easily the “unluckiest” or “least efficient”. Dan told me that he will post an updated version of the article on their website with that minor snafu fixed.

The main point here is not to criticize the article, because it’s a fine article, but to mention that there is a simple way to use the Pythagenpat relationship to estimate Runs Per Win. What Fox does is take, say, a seven game margin above Pythpat expectation, and multiply this by an RPW factor to give an equivalent number of runs. This is not technically precise, since RPW is a linear concept and Pythpat is not, but of course the linear approximation works very well and so this does not really present a problem in the analysis. Fox uses Palmer’s RPW = 10*sqrt(RPG/9). This formula is fine, but I would just like to point out there is a similar formula that comes directly from Pythpat. David Smyth, in the past, has published a formula that gives the RPW for any team, from Pyth:
RPW = 2*(R-RA)*(R^x + RA^x)/(R^x - RA^x)
Where R and RA are per game, and x is the exponent. You can check and verify that this formula works. However, at R = RA, it is undefined because the denominator will be zero. And this is a shame, because it is the point where R = RA that we would want to examine in order to conclude that in a context with an RPG of X, RPW is Y.
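
As a quick check of Smyth's formula, here is a minimal Python sketch. The function names (pyth_wpct, smyth_rpw) and the sample team (5.0 runs scored and 4.5 allowed per game, exponent 1.85) are made up for illustration and do not come from Fox's article. It compares the per-game run differential divided by RPW against Pythagorean W% minus .500, and shows the breakdown at R = RA.

```python
def pyth_wpct(r, ra, x):
    """Pythagorean W% from per-game runs scored and allowed."""
    return r**x / (r**x + ra**x)


def smyth_rpw(r, ra, x):
    """Smyth's RPW; r and ra are per game, x is the Pythagorean exponent."""
    return 2 * (r - ra) * (r**x + ra**x) / (r**x - ra**x)


# Hypothetical team (illustrative values only).
r, ra, x = 5.0, 4.5, 1.85
rpw = smyth_rpw(r, ra, x)

# Run differential per game divided by RPW should match W% minus .500.
print(round((r - ra) / rpw, 6))
print(round(pyth_wpct(r, ra, x) - 0.5, 6))

# At R = RA the denominator is zero, so the formula is undefined.
try:
    smyth_rpw(4.5, 4.5, x)
except ZeroDivisionError:
    print("undefined at R = RA")
```

The two printed values should agree, since the formula is just the run differential divided by the Pythagorean wins above .500, rearranged.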

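However, if we differentiate PW% with respect to RD and take the reciprocal (the derivative itself is wins per run), we will find a formula that gives the correct result at the R = RA point. Here RR is the run ratio R/RA, RD is the run differential R - RA, and RPG is R + RA, all per game. This formula is: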
RPW = (2*RPG*(RR^x + 1)^2*(.5 - RD/(2*RPG))^2)/(x*RR^(x-1))

That’s confusing as heck, but remember, we want to evaluate it at R = RA. So RR = 1 and RD = 0. One raised to any power is one, so we can simplify to:
RPW = (2*RPG*(1 + 1)^2*(.5 - 0/(2*RPG))^2)/x
= (2*RPG*(2)^2*(.5)^2)/x
= (2*RPG)/x
And what is x? We’ve set x equal to RPG to some power. Various people use different values for that power--I originally published it as .29, David Smyth originally published it as .287, Davenport and Fox used .285, Tango Tiger found that .28 would probably provide the best combination of accuracy with extreme and regular teams. I’ll leave the power as a variable, z, so that the result is applicable to any of these choices.
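
The simplification can be checked numerically. The sketch below is only an illustration: the function names, the step size h, and the sample values (4.5 or 5.0 runs per game, exponent 1.85) are assumptions of mine. It compares the derivative-based formula with the reciprocal of a finite-difference slope of Pythagorean W% against RD (holding RPG fixed), and confirms the formula collapses to 2*RPG/x at R = RA.

```python
def pyth_wpct(r, ra, x):
    """Pythagorean W% from per-game runs scored and allowed."""
    return r**x / (r**x + ra**x)


def derivative_rpw(r, ra, x):
    """RPW from the derivative-based formula; r and ra are per game."""
    rpg = r + ra
    rr = r / ra
    rd = r - ra
    top = 2 * rpg * (rr**x + 1) ** 2 * (0.5 - rd / (2 * rpg)) ** 2
    return top / (x * rr ** (x - 1))


def numeric_rpw(r, ra, x, h=1e-6):
    """Reciprocal of a central-difference estimate of dW%/dRD at fixed RPG.

    Shifting r up by h/2 and ra down by h/2 raises RD by h and leaves RPG
    unchanged, so the two evaluation points are 2*h apart in RD.
    """
    hi = pyth_wpct(r + h / 2, ra - h / 2, x)
    lo = pyth_wpct(r - h / 2, ra + h / 2, x)
    return (2 * h) / (hi - lo)


x = 1.85  # illustrative exponent

# At R = RA the formula should reduce to 2*RPG/x.
print(derivative_rpw(4.5, 4.5, x), 2 * 9.0 / x)

# Away from R = RA it should still track the numerical slope.
print(derivative_rpw(5.0, 4.5, x), numeric_rpw(5.0, 4.5, x))
```

Each printed pair should agree to several decimal places.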

Since x = RPG^z, we have this equation for RPW:
RPW = (2*RPG)/RPG^z
And this can be rewritten as:
RPW = 2*RPG^(1 - z)
So this is somewhere around 2*RPG^.72 (taking z = .28). At 9.18 RPG, the 2005 average value, Palmer gives 10*sqrt(9.18/9) = 10.10 and Pythpat gives 2*9.18^.72 = 9.87. In case you are curious how these work with some real teams, with 1984-2003 teams, Palmer’s formula gives a RMSE of 3.938 and the one presented here gives 3.895. So you do not have to sacrifice accuracy with the run-of-the-mill teams.
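
The two figures at 9.18 RPG can be reproduced with a few lines of Python. The function names are mine, and the default z = .28 is just the value that gives the .72 exponent used above; any of the other published values can be passed in instead.

```python
import math


def palmer_rpw(rpg):
    """Palmer's RPW = 10*sqrt(RPG/9)."""
    return 10 * math.sqrt(rpg / 9)


def pythpat_rpw(rpg, z=0.28):
    """Pythpat-based RPW = 2*RPG^(1 - z) at R = RA."""
    return 2 * rpg ** (1 - z)


rpg = 9.18  # 2005 average
print(f"{palmer_rpw(rpg):.2f}")   # 10.10
print(f"{pythpat_rpw(rpg):.2f}")  # 9.87

# The other published exponents give slightly different values.
for z in (0.29, 0.287, 0.285, 0.28):
    print(z, round(pythpat_rpw(rpg, z), 2))
```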

The known point discovered by Smyth, that at RPG = 1, x must equal 1, also by definition states that RPW must equal 2 when RPG = 1. If you have a team that scores 100 runs and allows 62 runs in 162 games, every game ends 1-0, so they will go 100-62. Their RD is 38, and 38/2 = 19. 19 is your estimate of wins above .500, and .500 is 81 wins, so 81+19 = 100. So the RPW must be two when the RPG is one. The Pythpat-based formula of course returns this result. The Palmer RPW gives 3.33.
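
Here is the same 100-62 example as a short Python sketch. The function names and the z = .28 default are my own choices; at RPG = 1 the value of z does not matter, since one raised to any power is one.

```python
import math


def palmer_rpw(rpg):
    """Palmer's RPW = 10*sqrt(RPG/9)."""
    return 10 * math.sqrt(rpg / 9)


def pythpat_rpw(rpg, z=0.28):
    """Pythpat-based RPW = 2*RPG^(1 - z) at R = RA."""
    return 2 * rpg ** (1 - z)


# The team from the example: 100 runs scored, 62 allowed, 162 games.
games, runs, runs_allowed = 162, 100, 62
rpg = (runs + runs_allowed) / games  # exactly 1.0
rd = runs - runs_allowed             # 38

for name, rpw in (("Pythpat", pythpat_rpw(rpg)), ("Palmer", palmer_rpw(rpg))):
    est_wins = games / 2 + rd / rpw
    print(name, round(rpw, 2), round(est_wins, 1))
# Pythpat 2.0 100.0
# Palmer 3.33 92.4
```

With an RPW of 3.33, the linear estimate for this team comes out several wins short of the 100 it must win.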

As a final note, one thing I cannot quite figure out from Fox’s article is whether he is using Pythpat and Palmer to find an overall value for the league, and then using that value for each team, or whether he is using the specific value for each team. The second approach would again be more precise, but the first is a reasonable simplification.

4 comments:

  1. Thanks for the mention, and yes I have made the corrections we emailed about at http://www.hardballtimes.com/main/article/second-look-at-luck/.

    As to your last question, if I understand you right, I'm using the 10.10 value from Palmer and then applying that value to each team. You're right that the other would be more accurate.

    Thanks for reminding me of the simplified formula. I had seen it somewhere before but forgot all about it.

  2. Being a subject close to my heart, I found “Runs per Win” very interesting. I have the following comments and questions:

    1. I also enjoyed Dan Fox’s article. However, I thought his use of PythagenPat was overkill. I think PythagenPat always overcomplicates calculations at a team level. A Pythagorean calculation using an exponent of 2 or 1.88 is plenty accurate enough.
    2. I also thought Fox’s use of Palmer’s “square root” formula for RPW was fine, but, being biased, I would have used the formula I derived (By The Numbers, November 2003). It is RPW=2*RPG/x, where x is the Pythagorean exponent. When x=2, RPW is simply RPG.
    3. Can you provide the reference for David Smyth’s formula for RPW?
    4. You mention that Smyth’s formula is undefined when R=Ra and I understand why you think “it’s a shame”. But, when R=Ra, W% is theoretically 0.500, and wins above 0.500 are theoretically zero, also. So, RPW will be 0/0, and, therefore, also undefined (unless you can apply L’Hopital’s Rule).
    5. A little bit of algebra on Smyth’s formula at x=2, yields RPW = 2*(R^2+Ra^2)/(R+Ra). With this formula, you never have the problem of division by zero.
    6. You wrote “However, if we differentiate PW% with respect to RD … we find … RPW = ((2*RPG*(RR^x + 1)^2*(.5 - RD/(2*RPG))^2)/(x*RR^(x-1))” Is PW% Pythagorean winning percentage? If so, what formula did you differentiate to get this? Maybe it doesn’t matter because when I did it, it also distilled down to RPW=2*RPW/x.

    Ralph Caola

  3. 1. I don't disagree that it is ok to use 2, but I also don't see any reason why not to use a better estimate if you are inclined to do so.
    3. It was posted on a FanHome thread sometime in the past, but I don't think it is one that is currently on the board.
    4. I didn't think of it that way, that's a good point.
    6. Yes, I was using PW% to abbreviate Pythagorean W%. I differentiated Run Ratio with respect to Run Differential (dRR/dRD) and then PW% with respect to RR (dPW%/dRR). I multiplied those to get dPW%/dRD, and the reciprocal of that is RPW. It's good to know that we got the same result.

  4. Correction:

    In the last line of my previous comment I wrote:

    RPW=2*RPW/x

    It should be:

    RPW=2*RPG/x.

