## Saturday, December 03, 2005

### Rate Stat Series, pt. 2

Before we proceed with examining options for individual offensive rate stats, we need to establish some theoretical parameters explaining how teams generate Plate Appearances. Every team begins the game with 27 guaranteed Plate Appearances. By avoiding outs, they generate additional PAs throughout the course of the game. Using RC/PA, we can calculate how many runs a team should score per PA. If we know how many PA they will have in a game, we can calculate how many runs they should score per game.

Of course, we already know how many runs a team should score per game, because they have 27 outs and we can just multiply their R/O by the number of outs per game (O/G). But if we know the number of PA the team will have, we can calculate the same figure through R/PA.

Our first step is to calculate what I call the Not Out Average (NOA). It is very similar to OBA, except that it subtracts from times on base all outs recorded in the official statistics that occur on the bases (i.e., CS and DP). These events cancel the out-avoidance effect of the hits, walks, etc. that are credited in the OBA numerator. So, depending on how much detail you are using from the statistics, the formula for NOA is:
NOA = (H + W + HB - CS - DP)/(AB + W + HB + SH + SF)

From NOA, we can easily calculate PA per game as PA/G = (O/G)/(1-NOA). So a team with a .330 NOA, in a league averaging 25.5 outs/game (about the average if you count just batting outs and CS), will get 25.5/(1-.330) = 38.06 PA/G.
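The two formulas above translate directly into code. This is a minimal sketch in Python; the function and parameter names (`noa`, `pa_per_game`, lowercase stat abbreviations) are hypothetical choices for illustration:

```python
def noa(h, w, hb, cs, dp, ab, sh, sf):
    """Not Out Average: (H + W + HB - CS - DP) / (AB + W + HB + SH + SF)."""
    return (h + w + hb - cs - dp) / (ab + w + hb + sh + sf)


def pa_per_game(outs_per_game, team_noa):
    """PA/G = (O/G) / (1 - NOA)."""
    return outs_per_game / (1 - team_noa)


# Worked example from the text: 25.5 O/G and a .330 NOA.
print(round(pa_per_game(25.5, 0.330), 2))  # 38.06
```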

Another, more intuitive way to think about this is to consider how many extra PAs are generated by each guaranteed PA that the team begins the game with. The probability that any given PA generates another PA is the NOA: a .330 NOA means a 33% chance of avoiding an out on each guaranteed PA. The probability that a second extra PA is generated is NOA times NOA, i.e., the probability that the first batter avoids an out times the probability that the second batter avoids an out. The probability of a third is NOA times NOA times NOA. Summing these probabilities gives the expected number of extra PAs per guaranteed PA. Writing this in a formula:
Extra PA/PA = NOA + NOA*NOA + NOA*NOA^2 + NOA*NOA^3 + … + NOA*NOA^n
where n goes to infinity, because in theory an infinite number of PA could be generated. We can factor NOA out of the above equation to get:
Extra PA/PA = NOA(1 + NOA + NOA^2 + NOA^3 + … + NOA^n)
The sum 1 + x + x^2 + … + x^n, as n goes to infinity and with 0 < x < 1 (as is the case for NOA), is 1/(1-x). So our final equation is:
Extra PA/PA = NOA/(1-NOA)
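To see that the infinite series really does settle at NOA/(1-NOA), a quick numerical check helps. This sketch (hypothetical helper names) compares partial sums of NOA + NOA^2 + ... against the closed form for the .330 team:

```python
def extra_pa_series(noa, terms):
    """Partial sum NOA + NOA^2 + ... + NOA^terms (expected extra PA per guaranteed PA)."""
    return sum(noa ** k for k in range(1, terms + 1))


def extra_pa_closed_form(noa):
    """Limit of the series, valid for 0 < NOA < 1."""
    return noa / (1 - noa)


for terms in (1, 2, 5, 20):
    print(terms, round(extra_pa_series(0.330, terms), 4))
print("limit", round(extra_pa_closed_form(0.330), 4))
```

The partial sums approach the limit quickly because each added term is another factor of NOA smaller.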
To convert this to PA/G, we know that O/G number of PA are guaranteed, and that each of those guaranteed PAs will generate (O/G)*(Extra PA/PA) more PA. So:
PA/G = O/G + (O/G)*(Extra PA/PA)
Or:
PA/G = O/G*(1 + Extra PA/PA)
The denominator of Extra PA/PA is 1-NOA. 1 can be rewritten as (1-NOA)/(1-NOA), making the part in parentheses equal to ((1-NOA) + NOA)/(1-NOA), which simplifies to 1/(1-NOA). That makes the whole thing equal to O/G*1/(1-NOA) = (O/G)/(1-NOA). That is a lot of math gyrations to show why this formula works, rather than just stating that it does.
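The algebra can also be checked numerically: computing PA/G through O/G*(1 + Extra PA/PA) and through (O/G)/(1-NOA) should give the same number for any NOA between 0 and 1. A minimal sketch with hypothetical function names, using the 25.5 O/G figure from above:

```python
def pa_per_game_from_epa(outs_per_game, noa):
    """PA/G = O/G * (1 + Extra PA/PA), where Extra PA/PA = NOA/(1 - NOA)."""
    epa = noa / (1 - noa)
    return outs_per_game * (1 + epa)


def pa_per_game_direct(outs_per_game, noa):
    """PA/G = (O/G) / (1 - NOA)."""
    return outs_per_game / (1 - noa)


for noa in (0.300, 0.330, 0.400):
    a = pa_per_game_from_epa(25.5, noa)
    b = pa_per_game_direct(25.5, noa)
    print(noa, round(a, 4), round(b, 4))
```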

Now we want to see how a team’s number of PA will change when we add a player to the team. We will assume that this guy gets a certain fixed percentage of the team PA which we call PA%. Then the new team NOA is:
NewTmNOA = PA%*NOA + (1 - PA%)*TmNOA
So suppose we add a player with a .400 NOA to the .330 team and give him 1/9 of the team PAs. Before adding this guy, the team had 38.06 PA/G (figured above). With him, their NOA will be (1/9)*.400 + (8/9)*.330 = .338, and their PA/G will be 38.51.
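The example above can be reproduced in a few lines. This sketch assumes, as the text does, that the new player's PA% is fixed at 1/9 and that O/G stays at 25.5; the function names are hypothetical:

```python
def new_team_noa(player_noa, team_noa, pa_share):
    """NewTmNOA = PA% * NOA + (1 - PA%) * TmNOA."""
    return pa_share * player_noa + (1 - pa_share) * team_noa


def pa_per_game(outs_per_game, noa):
    """PA/G = (O/G) / (1 - NOA)."""
    return outs_per_game / (1 - noa)


O_PER_G = 25.5  # league outs per game used in the text
before = pa_per_game(O_PER_G, 0.330)
blended = new_team_noa(0.400, 0.330, 1 / 9)
after = pa_per_game(O_PER_G, blended)
print(round(before, 2), round(blended, 3), round(after, 2))  # 38.06 0.338 38.51
```

Note that the 38.51 comes from the unrounded blended NOA (.33778); plugging in the rounded .338 gives a slightly higher figure.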

We can also approach this problem through the Extra PA/PA approach. Let me abbreviate Extra PA/PA from here on out as EPA. If we add a player with a given NOA to a team, his NOA will generate the first PA, and the subsequent PAs will be generated at the NewTmNOA rate. So his EPA in this context (which I’ll abbreviate as EPAp) is:
EPAp = NOA + NOA*NewTmNOA + NOA*NewTmNOA^2 + NOA*NewTmNOA^3 + ... + NOA*NewTmNOA^n = NOA(1 + NewTmNOA + NewTmNOA^2 + NewTmNOA^3 + … + NewTmNOA^n) = NOA/(1 - NewTmNOA)
Likewise, the rest of the team(EPAr) will generate the first PA at TmNOA, and the subsequent PAs at NewTmNOA:
EPAr = TmNOA + TmNOA*NewTmNOA + TmNOA*NewTmNOA^2 + TmNOA*NewTmNOA^3 + ... + TmNOA*NewTmNOA^n = TmNOA(1 + NewTmNOA + NewTmNOA^2 + NewTmNOA^3 + ... + NewTmNOA^n) = TmNOA/(1 - NewTmNOA)
Then the NewTmEPA = PA%*EPAp + (1 - PA%)*EPAr
This is mathematically equivalent to the approach based directly on NewTmNOA: since PA%*NOA + (1 - PA%)*TmNOA is just NewTmNOA, NewTmEPA = NewTmNOA/(1 - NewTmNOA).
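The equivalence of the two routes is easy to confirm for the same .400 player and .330 team. A sketch with hypothetical function names; it computes NewTmEPA from EPAp and EPAr, then compares it with NewTmNOA/(1 - NewTmNOA):

```python
def new_team_noa(player_noa, team_noa, pa_share):
    """NewTmNOA = PA% * NOA + (1 - PA%) * TmNOA."""
    return pa_share * player_noa + (1 - pa_share) * team_noa


def epa_player(player_noa, tm_noa_new):
    """EPAp: first extra PA at the player's NOA, later ones at NewTmNOA."""
    return player_noa / (1 - tm_noa_new)


def epa_rest(team_noa, tm_noa_new):
    """EPAr: first extra PA at TmNOA, later ones at NewTmNOA."""
    return team_noa / (1 - tm_noa_new)


def new_team_epa(player_noa, team_noa, pa_share):
    """NewTmEPA = PA% * EPAp + (1 - PA%) * EPAr."""
    tm_noa_new = new_team_noa(player_noa, team_noa, pa_share)
    return (pa_share * epa_player(player_noa, tm_noa_new)
            + (1 - pa_share) * epa_rest(team_noa, tm_noa_new))


epa = new_team_epa(0.400, 0.330, 1 / 9)
tm_noa_new = new_team_noa(0.400, 0.330, 1 / 9)
direct = tm_noa_new / (1 - tm_noa_new)
print(round(epa, 4), round(direct, 4), round(25.5 * (1 + epa), 2))
```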

We can estimate how many runs a team will score with an added player by calculating the team’s new R/PA and new PA/G:
NewTmR/PA = PA%*(R/PA) + (1 - PA%)*(TmR/PA)
NewTmR/G = NewTmR/PA*(O/G)/(1 - NewTmNOA)
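Putting the pieces together gives the new team R/G. This is a sketch only: the R/PA inputs are treated as fixed, which is exactly the simplification the text warns about, and the R/PA values in the usage lines are hypothetical placeholders rather than real data:

```python
def new_team_runs_per_game(player_r_pa, team_r_pa, player_noa, team_noa,
                           pa_share, outs_per_game):
    """NewTmR/G = NewTmR/PA * (O/G) / (1 - NewTmNOA).

    R/PA inputs are treated as fixed, ignoring how the added player
    shifts the team's linear weight values.
    """
    new_r_pa = pa_share * player_r_pa + (1 - pa_share) * team_r_pa
    new_noa = pa_share * player_noa + (1 - pa_share) * team_noa
    return new_r_pa * outs_per_game / (1 - new_noa)


# Hypothetical R/PA values, chosen only to exercise the formula.
baseline = new_team_runs_per_game(0.120, 0.120, 0.330, 0.330, 1 / 9, 25.5)
with_player = new_team_runs_per_game(0.160, 0.120, 0.400, 0.330, 1 / 9, 25.5)
print(round(baseline, 2), round(with_player, 2))
```

The added player raises R/G through both channels at once: a higher blended R/PA and more PA per game from the higher blended NOA.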

We get into some problems here because, as covered in other articles on this site and in many other sources, adding a player to a team changes the linear weight values. So the linear weight RC figures for both the team and the player that we put into that formula will be different than what they would produce in the new context. But these differences are often small enough that we can use this procedure to understand some basic concepts, if not to achieve technical precision.

This problem is somewhat similar to another problem that we will encounter in discussing rate stats, which is how the runs created figures for players are calculated. We can divide RC methods into three classes:
1) Team methods (RC, BsR)
2) Linear methods
3) Theoretical team methods based on RC and BsR
Class 3 methods figure a team’s RC with the player minus the team’s RC without the player to estimate the runs added by the player. Class 1 methods, by contrast, treat players as if they were their own team, so using R/O may be justified with them. Class 2 and class 3 methods look at the player as a member of a team, so R/O is an inappropriate choice, at least in theory. But class 3 methods take into account the effect that the player has on the team LW values, while class 2 methods do not.
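The class 3 idea, team RC with the player minus team RC without him, can be sketched generically by passing in any run estimator. Everything here is an assumption for illustration: the stat lines are hypothetical, the linear estimator's weights are made up, and `basic_bsr` is one commonly quoted simple version of Base Runs (A = H + W - HR, B = (1.4*TB - .6*H - 3*HR + .1*W)*1.02, C = AB - H, D = HR), not necessarily the version used elsewhere on this site:

```python
from typing import Callable, Dict

Stats = Dict[str, float]  # counting stats keyed by abbreviation, e.g. "AB", "H"


def combine(a: Stats, b: Stats) -> Stats:
    """Add two stat lines key by key."""
    return {k: a.get(k, 0) + b.get(k, 0) for k in set(a) | set(b)}


def runs_added(rest_of_team: Stats, player: Stats,
               estimator: Callable[[Stats], float]) -> float:
    """Class 3 style: estimator(team with player) - estimator(team without him)."""
    return estimator(combine(rest_of_team, player)) - estimator(rest_of_team)


def basic_bsr(s: Stats) -> float:
    """One commonly quoted simple Base Runs version (assumed coefficients)."""
    a = s["H"] + s["W"] - s["HR"]
    b = (1.4 * s["TB"] - 0.6 * s["H"] - 3 * s["HR"] + 0.1 * s["W"]) * 1.02
    c = s["AB"] - s["H"]
    d = s["HR"]
    return a * b / (b + c) + d


def toy_linear(s: Stats) -> float:
    """Hypothetical linear estimator with made-up weights."""
    return 0.5 * s["H"] + 0.3 * s["W"] - 0.1 * (s["AB"] - s["H"])


# Hypothetical stat lines for illustration only.
rest = {"AB": 9, "H": 3, "W": 1, "HR": 0, "TB": 3}
player = {"AB": 4, "H": 2, "W": 1, "HR": 1, "TB": 5}

# Linear estimator: runs added equals the player's own value (no interaction).
print(runs_added(rest, player, toy_linear), toy_linear(player))
# BsR: runs added differs from the player treated as his own team (class 1).
print(runs_added(rest, player, basic_bsr), basic_bsr(player))
```

With a linear estimator the "with minus without" difference collapses to the player's own linear value, which is why class 2 methods miss the interaction that a class 3 method picks up.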

This also relates to methods which convert individual run value to win value. In Jim Furtado’s Extrapolated Wins (XW), he uses XR, a linear method, to estimate individual run creation. But given that the player in XW is operating within a team, a class 3 method might be more appropriate. The moral of the story is that when choosing a rate stat, it is also important to consider how the RC estimate is generated. The proper rate stat may differ depending on which class of RC method you are using. Mixing contexts may give “good” results, but not the highest theoretical accuracy.

1. Patriot, I didn't follow you on what I think is the key point of your article. Why is R/O theoretically inappropriate for class 2 and 3 estimators?

2. What I was trying to convey is that the class 1 estimators, like RC and BsR, are designed to estimate team runs, incorporating interactive effects. Therefore, if you apply these directly to a player, you are asking the question "how many runs would he score if he was a team". And for teams, R/O is clearly the correct rate stat.

In the class 2 and 3, you are trying to measure the change this guy has on his team's runs scored, and therefore your rate stat should reflect how he affects team runs scored. R/O is answering the team question.

Of course, the distortion involved here is pretty small. I use R/O on my website to find RAA or RAR, and so does just about everybody else, and I don't think it's really a problem, especially for real players. Even the Bondses, Ruths, and Williamses, as far as I know, don't really cause any problems. But in the interest of a rate that is fully applicable to any possible hitter, I don't think R/O is technically correct with LW or TT RC, etc. But using it does not really screw things up, as far as I can tell. The rest of the "rate stat series" is pretty low on practical value.

3. So RC/O overvalues OBA, and RC/PA undervalues OBA.

So why not combine them in some fashion? I remember that when I did this it turned out that equal weighting worked best. So we have outs plus PAs, which is (AB-H)+(AB+BB), or simply 2*AB+BB-H. To convert to "offensive games", find the empirical divisor to make the conversion, which turns out to be around 62.5.

So RC/G = RC/((2*AB+BB-H)/62.5)
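The commenter's combined rate is simple to compute. A minimal sketch, assuming outs = AB - H and PA = AB + BB as in the comment, with the empirical 62.5 divisor as a default parameter; the function name and the sample line in the usage are hypothetical:

```python
def rc_per_game_combined(rc, ab, bb, h, divisor=62.5):
    """RC/G = RC / ((2*AB + BB - H) / divisor), weighting outs and PA equally."""
    outs = ab - h                   # batting outs as defined in the comment
    pa = ab + bb                    # PA as defined in the comment
    games = (outs + pa) / divisor   # (2*AB + BB - H) scaled to "offensive games"
    return rc / games


# Hypothetical stat line: 5 RC in 30 AB with 3 BB and 8 H.
print(round(rc_per_game_combined(5, 30, 3, 8), 2))
```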