Friday, March 10, 2006

Rate Stat Series, pt. 6

We have seen previously that R/PA does not properly take into account the effect of avoiding outs or creating more PA, and that R/O overstates the importance of avoiding outs for an individual by treating an individual as a complete lineup. With the two most intuitive candidates for a proper individual rate stat disqualified, where do we turn next?

It seems only natural that some sabermetricians would go back to where they started, with R/PA, and try to adjust it to correct its problem. The first published work I saw that took this approach was by the poster “Sibelius” on FanHome in 2000. Sibelius saw the problems with both R/PA and R/O, but was frustrated that others did not share his concerns about R/O. So he published his own method, a modification of R/PA that includes the effect of extra PA.

His approach began with the truism that each additional out a player avoids saves the team from losing the average number of runs per out. So by comparing the player's out rate to the team's, and multiplying by the player's PA, you have a measure of how many outs he has avoided. Each of these is then valued at the team runs/out figure:
Runs Saved = (NOA - TmNOA)*PA*TmR/O

In Part 3 of the series, I used a hypothetical team with a .330 NOA, .12 R/PA, and .179 R/O. Suppose we had a player on this team with a .400 NOA in 550 PA. He would make (.4-.33)*550 = 38.5 fewer outs than an average hitter, and these would be worth 38.5*.179 = 6.89 runs.
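For readers who want to check the arithmetic, here is a minimal Python sketch of the runs-saved formula. The function and parameter names are my own labels for illustration, not Sibelius's notation, and the team figures are the hypothetical ones from Part 3.

```python
def runs_saved(noa: float, pa: int, tm_noa: float, tm_r_per_out: float) -> float:
    """Runs Saved = (NOA - TmNOA) * PA * TmR/O."""
    outs_avoided = (noa - tm_noa) * pa  # outs avoided relative to the team rate
    return outs_avoided * tm_r_per_out  # each avoided out is worth TmR/O runs


# Hypothetical Part 3 team: .330 NOA, .179 R/O.
# Hypothetical player: .400 NOA in 550 PA.
saved = runs_saved(noa=0.400, pa=550, tm_noa=0.330, tm_r_per_out=0.179)
print(f"{saved:.2f}")  # 6.89
```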

To incorporate these runs into a rate stat, Sibelius simply added them to the basic Runs Created figure and divided by Plate Appearances. So this stat is just R/PA PLUS the effect of avoiding outs, and I will call it R+/PA.

Incidentally, I independently developed this approach shortly after Sibelius posted it. “Independently” is probably a bit of a stretch, because I had read his work and agreed with his ideas; I just did not realize that the specific approach I developed was mathematically equivalent to his. My approach was to calculate the number of extra PA the player had generated (through a technique like the one described in Part 2 of this series) rather than the number of outs he avoided, and then to value each extra PA at the team R/PA. But as Sibelius pointed out to me, this produced results identical to those of his simpler approach.
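Why the two approaches agree can be sketched algebraically. This sketch assumes that each avoided out is converted into extra PA at a rate of 1/(1 - TmNOA), which is a natural way to estimate extra PA but may not match the exact technique from Part 2. It also uses the identity TmR/PA = TmR/O*(1 - TmNOA), which holds for actual team totals because a team's outs per PA is 1 - TmNOA:

```latex
\begin{aligned}
\text{Extra PA} &= \frac{(\mathrm{NOA} - \mathrm{TmNOA}) \cdot \mathrm{PA}}{1 - \mathrm{TmNOA}} \\[4pt]
\text{Extra PA} \cdot \mathrm{TmR/PA}
  &= \frac{(\mathrm{NOA} - \mathrm{TmNOA}) \cdot \mathrm{PA}}{1 - \mathrm{TmNOA}} \cdot \mathrm{TmR/O} \cdot (1 - \mathrm{TmNOA}) \\[4pt]
  &= (\mathrm{NOA} - \mathrm{TmNOA}) \cdot \mathrm{PA} \cdot \mathrm{TmR/O} = \text{Runs Saved}
\end{aligned}
```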

So how does this do with the hypothetical players we have looked at before? In Part 1, we found that R/O rated a player whose addition to an otherwise average team produced 5.046 R/G ahead of a player whose addition produced 5.523 R/G. The first player (Player A) draws 200 walks and makes 100 outs, while the second (Player B) hits 150 homers and makes 350 outs. They are added to a team with a .330 OBA, .12 R/PA, and .179 R/O, as above.

In this case, Player A has a .667 OBA and will save (.667-.33)*300*.179 = 18.08 runs, while Player B, with his .300 OBA, will save (.3-.33)*500*.179 = -2.69 runs. Player A had 54 RC to begin with, so he has 72.08 R+, or .240 R+/PA. Player B had 184 RC to begin with, for 181.31 R+, or .363 R+/PA. This is the “right” decision, as Player B’s team scored more runs. R/O comes to the opposite conclusion, that Player A was more valuable.
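Here is a small Python sketch of R+ and R+/PA for Players A and B, using the same hypothetical team (.330 NOA, .179 R/O). The `Batter` class and its method names are my own illustration, and the RC figures of 54 and 184 are simply taken from Part 1 as given.

```python
from dataclasses import dataclass

TM_NOA = 0.330        # hypothetical team not-out average
TM_R_PER_OUT = 0.179  # hypothetical team runs per out


@dataclass
class Batter:
    name: str
    rc: float  # Runs Created, taken as given
    pa: int    # plate appearances
    outs: int

    @property
    def noa(self) -> float:
        """Share of PA that do not end in an out."""
        return (self.pa - self.outs) / self.pa

    def r_plus(self) -> float:
        """RC plus runs saved by avoiding outs relative to the team."""
        return self.rc + (self.noa - TM_NOA) * self.pa * TM_R_PER_OUT

    def r_plus_per_pa(self) -> float:
        return self.r_plus() / self.pa


player_a = Batter("A", rc=54, pa=300, outs=100)   # 200 walks, 100 outs
player_b = Batter("B", rc=184, pa=500, outs=350)  # 150 homers, 350 outs

for p in (player_a, player_b):
    print(f"{p.name}: R+ = {p.r_plus():.1f}, R+/PA = {p.r_plus_per_pa():.3f}")
# A: R+ = 72.1, R+/PA = 0.240
# B: R+ = 181.3, R+/PA = 0.363
```

Carried to more decimals, R+ comes out to 72.079 and 181.315, which match the figures above up to rounding.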

I do not want to give the impression that because R+/PA meshes with our logic in this case, it will do so in all cases. Take a batter who draws 499 walks in 500 PA. His team will have an OBA of around .404 and score 6.075 R/G. This player, whom I’ll call Player C, has a “+” figure of 59.79 runs on top of 499*(1/3) = 166.33 RC, for an R/PA of .333, an R/O of 166.3, and an R+/PA of .452.

Suppose that we have another player, D, who hits 170 home runs and makes 330 outs in 500 PA. At 1.4 runs per homer, we’ll credit him with 238 RC, but the extra PA he generates are worth just .895 runs. He winds up with .476 R/PA, .721 R/O, and .478 R+/PA. But his team will have an OBA of “only” .331, and we expect them to score about 6.011 R/G.
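The same bookkeeping for Players C and D can be sketched in Python, again assuming the .330 team NOA and .179 team R/O from above, plus the 1/3 run per walk and 1.4 runs per homer used in the text. The helper `rate_stats` is a hypothetical name of my own.

```python
TM_NOA = 0.330        # hypothetical team not-out average
TM_R_PER_OUT = 0.179  # hypothetical team runs per out


def rate_stats(rc: float, pa: int, outs: int) -> dict[str, float]:
    """Return R/PA, R/O, the '+' runs from avoiding outs, and R+/PA."""
    noa = (pa - outs) / pa
    plus = (noa - TM_NOA) * pa * TM_R_PER_OUT
    return {
        "R/PA": rc / pa,
        "R/O": rc / outs,
        "plus": plus,
        "R+/PA": (rc + plus) / pa,
    }


# Player C: 499 walks valued at 1/3 run each, 1 out in 500 PA.
player_c = rate_stats(rc=499 / 3, pa=500, outs=1)
# Player D: 170 homers valued at 1.4 runs each, 330 outs in 500 PA.
player_d = rate_stats(rc=170 * 1.4, pa=500, outs=330)

for name, stats in (("C", player_c), ("D", player_d)):
    print(name, ", ".join(f"{k} {v:.3f}" for k, v in stats.items()))
# C R/PA 0.333, R/O 166.333, plus 59.786, R+/PA 0.452
# D R/PA 0.476, R/O 0.721, plus 0.895, R+/PA 0.478
```

The team run totals quoted in the text (6.075 vs. 6.011 R/G) come from the team-level estimates in earlier parts of the series and are not computed by this sketch.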

So Player C is more valuable in this case, but has the lower R+/PA, although admittedly both the R/G and R+/PA differences are fairly small. R/O does come to the “right” decision here, since Player C’s R/O is wildly ahead of Player D’s, but the gap between the two players is way out of proportion to the impact they have on their team’s offense.

From these results, perhaps you will agree that R+/PA is a sort of third way between R/PA and R/O, one that combines their strengths and weaknesses. But I would not claim that it is the “correct” rate stat. A correct stat should always agree with the result of adding a player to a team, because that is how I defined the term “correct” in Part 5.

But then again, the rate stat is just one component of our evaluation of a batter. The other is our value stat, which we have assumed is Runs Above Average for the sake of this discussion. So how do the RAA figures based on R+/PA differ from those based on R/O? Taking the team as the base entity, RAA based on R/O is (R/O - TmR/O)*O, and RAA based on R+/PA is (R+/PA - TmR/PA)*PA. So, based on R/O:
Player A has RAA = (54/100 - .179)*100 = +36.1
Player B has RAA = (184/350 - .179)*350 = +121.35

Based on R+/PA, we have:
Player A: RAA = (.240 - .12)*300 = +36
Player B: RAA = (.363 - .12)*500 = +121.5
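The two RAA calculations can be compared directly in Python. One addition of my own: besides the rounded .12 team R/PA used above, the sketch also tries a team R/PA derived as TmR/O*(1 - TmNOA) = .179*.67 = .11993, so that the three team figures are mutually consistent. The function and constant names are hypothetical.

```python
TM_NOA = 0.330
TM_R_PER_OUT = 0.179
TM_R_PER_PA_ROUNDED = 0.12                       # the figure used in the text
TM_R_PER_PA_EXACT = TM_R_PER_OUT * (1 - TM_NOA)  # .11993, consistent with the others


def raa_r_per_out(rc: float, outs: int) -> float:
    """RAA = (R/O - TmR/O) * O"""
    return (rc / outs - TM_R_PER_OUT) * outs


def raa_r_plus_per_pa(rc: float, pa: int, outs: int, tm_r_per_pa: float) -> float:
    """RAA = (R+/PA - TmR/PA) * PA, where R+ = RC + (NOA - TmNOA) * PA * TmR/O"""
    noa = (pa - outs) / pa
    r_plus = rc + (noa - TM_NOA) * pa * TM_R_PER_OUT
    return (r_plus / pa - tm_r_per_pa) * pa


players = {"A": (54, 300, 100), "B": (184, 500, 350)}  # (RC, PA, outs)
for name, (rc, pa, outs) in players.items():
    print(name,
          f"{raa_r_per_out(rc, outs):.3f}",
          f"{raa_r_plus_per_pa(rc, pa, outs, TM_R_PER_PA_ROUNDED):.3f}",
          f"{raa_r_plus_per_pa(rc, pa, outs, TM_R_PER_PA_EXACT):.3f}")
# A 36.100 36.079 36.100
# B 121.350 121.315 121.350
```

Carrying full precision instead of the rounded .240 and .363 already narrows the gap, and with a consistent team R/PA the two methods agree up to floating-point noise.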

As you can see, the figures are nearly identical, for two pretty extreme players! They would be even closer, if not identical, had I not rounded the figures along the way. So the only difference between rating players on R/O and R+/PA, at least against average, is the form and value that the rate stat takes; the value portions are equivalent.
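A short derivation shows why the value portions match. It takes NOA as the share of PA that do not end in an out, so that (1 - NOA)*PA = O, and it assumes the team figures are mutually consistent, TmR/PA = TmR/O*(1 - TmNOA), which holds for actual team totals. (The .12 and .179 used above are rounded; .179*.67 = .11993 rather than .12.)

```latex
\begin{aligned}
\mathrm{RAA}_{\mathrm{R{+}/PA}}
  &= \left(\frac{\mathrm{R{+}}}{\mathrm{PA}} - \mathrm{TmR/PA}\right)\mathrm{PA}
   = \mathrm{R{+}} - \mathrm{TmR/PA}\cdot\mathrm{PA} \\
  &= \mathrm{RC} + (\mathrm{NOA} - \mathrm{TmNOA})\,\mathrm{PA}\cdot\mathrm{TmR/O}
     - \mathrm{TmR/O}\,(1 - \mathrm{TmNOA})\,\mathrm{PA} \\
  &= \mathrm{RC} - \mathrm{TmR/O}\,(1 - \mathrm{NOA})\,\mathrm{PA} \\
  &= \mathrm{RC} - \mathrm{TmR/O}\cdot O
   = \left(\frac{\mathrm{RC}}{O} - \mathrm{TmR/O}\right) O
   = \mathrm{RAA}_{\mathrm{R/O}}
\end{aligned}
```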

But if two procedures yield identical values, shouldn’t they yield identical rates as well? The player has been to the plate the same number of times and made the same number of outs whether we calculate his value based on R/O or R+/PA. So why should his rate stat be different?

If you agree with this line of thinking, then you are forced to conclude that we are using the wrong rate stat. Of course, you could argue that neither R/O nor R+/PA forms the proper framework for assessing value. But even if we accept that these frameworks are flawed, we can still accept that within that faulty framework, there is a better way to express the rate stat. This is the road that we will go down in the next installment.

