Monday, August 04, 2008

Run Estimation Stuff, pt. 6

Last time I wrote about the approach of applying BsR to individuals by using static intrinsic linear weights from some entity. This time, we will look at the differential and theoretical team approaches, each of which allows for some interaction between the player and the team and causes the weights to differ for each individual player.

As I mentioned last time, these two approaches are equivalent if we use the same team totals for each. The TT formulas in use generally assume that the player gets exactly 1/9 of team PA, but this is not an inevitable choice.

Let’s first define the team’s A, B, C, and D factors without the player as T_A, T_B, T_C, and T_D. Then, we can estimate the number of runs the team will score with the player as:

(A + T_A)*(B + T_B)/(B + C + T_B + T_C) + D + T_D

The number of runs the team would score without the player is:

T_A*T_B/(T_B + T_C) + T_D

Thus, DBsR (Differential BsR) is the difference between the two, which simplifies a bit to:

DBsR = (A + T_A)*(B + T_B)/(B + C + T_B + T_C) + D - T_A*T_B/(T_B + T_C)
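The with/without calculation is easy to check numerically. Below is a minimal Python sketch; the function names bsr and dbsr are introduced here for illustration, and the (A, B, C, D) totals are hypothetical, made up only to exercise the formula. It computes DBsR from the simplified expression and confirms that it equals the team's BsR with the player minus its BsR without him, with T_D cancelling as shown above.

```python
def bsr(a, b, c, d):
    """Basic Base Runs form: A*B/(B+C) + D."""
    return a * b / (b + c) + d


def dbsr(player, rest):
    """Differential BsR from (A, B, C, D) tuples.

    `rest` is the team without the player; its D term cancels,
    so it never appears in the result.
    """
    A, B, C, D = player
    TA, TB, TC, _ = rest
    return (A + TA) * (B + TB) / (B + C + TB + TC) + D - TA * TB / (TB + TC)


# Hypothetical factor totals, for illustration only (not real player data).
player = (250.0, 260.0, 420.0, 40.0)
rest = (1800.0, 1900.0, 3700.0, 150.0)

with_player = bsr(*(p + r for p, r in zip(player, rest)))
without_player = bsr(*rest)
print(round(dbsr(player, rest), 3), round(with_player - without_player, 3))
```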

All a standard theoretical team formula does is assume that the player gets 1/9 of team PA, and thus define T_A, T_B, and T_C as some entity’s A/PA, B/PA, and C/PA multiplied by eight times the individual’s PA (T_D is not needed, since it cancels). I like to call A/PA “ROBA” (Runners On Base Average), B/PA “AF” (Advancement Factor), and C/PA “OA” (Out Average). With PA indicating individual PA, the generalized TT formula is:

TT BsR = (A + LgROBA*8*PA)*(B + LgAF*8*PA)/(B + C + (LgAF + LgOA)*8*PA) + D - LgROBA*LgAF/(LgAF + LgOA)*8*PA
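A minimal Python sketch of the generalized formula follows; the function name tt_bsr and the sample batter are illustrative assumptions. One useful check falls out of the algebra: a batter whose A/PA, B/PA, and C/PA exactly equal the entity's ROBA, AF, and OA gets exactly his own BsR, since he is simply one ninth of a nine-man team of identical hitters.

```python
def tt_bsr(A, B, C, D, pa, roba, af, oa, teammates=8):
    """Theoretical-team BsR for one batter.

    The rest of the team is `teammates` * pa plate appearances of an entity
    with the given ROBA (A/PA), AF (B/PA) and OA (C/PA).
    """
    ta = roba * teammates * pa
    tb = af * teammates * pa
    tc = oa * teammates * pa
    # Team runs with the batter (his D added directly; the entity's D cancels).
    with_batter = (A + ta) * (B + tb) / (B + C + tb + tc) + D
    # Runs the rest of the team scores alone: ta*tb/(tb+tc), simplified.
    without_batter = roba * af / (af + oa) * teammates * pa
    return with_batter - without_batter


# Hypothetical batter: 650 PA with exactly the 1961-2005 rates quoted above.
pa = 650
A, B, C = .301 * pa, .306 * pa, .676 * pa
D = 20.0  # hypothetical home run total
print(tt_bsr(A, B, C, D, pa, .301, .306, .676))
print(A * B / (B + C) + D)  # his own BsR; the two lines agree
```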

Using the basic BsR equation spelled out in Part 5, the 1961-2005 major league averages are: .301 ROBA, .306 AF, .676 OA, producing this TT equation:

TT BsR = (A + 2.41PA)*(B + 2.45PA)/(B + C + 7.86PA) + D - .75PA

The 2007 AL averages are .308/.331/.666, giving this equation:

TT BsR = (A + 2.46PA)*(B + 2.65PA)/(B + C + 7.97PA) + D - .82PA
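The numeric coefficients are just the entity's rates multiplied through: 8*ROBA, 8*AF, 8*(AF + OA), and 8*ROBA*AF/(AF + OA). The short Python check below (tt_coefficients is a name introduced here) reproduces the long-term equation from the three-digit rates quoted above. For 2007 it gives 7.98 rather than 7.97 for the B + C coefficient, presumably because that equation was built from unrounded league averages.

```python
def tt_coefficients(roba, af, oa, teammates=8):
    """Per-PA coefficients of the generalized TT BsR formula for one entity."""
    return {
        "A": teammates * roba,           # added to A
        "B": teammates * af,             # added to B
        "B+C": teammates * (af + oa),    # added to B + C in the denominator
        "subtract": teammates * roba * af / (af + oa),  # rest-of-team runs per PA
    }


for label, rates in {"1961-2005": (.301, .306, .676), "2007 AL": (.308, .331, .666)}.items():
    coeffs = tt_coefficients(*rates)
    print(label, {k: round(v, 2) for k, v in coeffs.items()})
```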

Here is each player’s TT BsR figured by each approach; “Long” uses the long-term weights and “2007A” the 2007 AL weights:

Once again, you can see how little difference this makes, and thus why Bill James can get away with using the same TT formula for all of baseball history (incidentally, I did not round to two decimal places in those calculations, so your results may differ slightly if you use the formulas above).

Now let’s apply the differential method, figuring the difference between the BsR of the player’s team with and without them. I am using each player’s actual team. We could look at a player on any other team, but we would have to do it in TT fashion, or by weighting the player’s PA when figuring the “rest of team” BsR. I realize that is a poorly explained point, so let me try an example. Suppose we subtracted ARod’s stats from those of the White Sox, pretending that he was actually a member of that team. Now the team we’re adding him to is not just the worst offense in the league; it is the worst offense in the league without the presence of the best hitter in the league, who never actually contributed to that team in the first place.

The point is that if you use the differential method, you need to use it with the entity the player actually belongs to, or you need to use a TT approach to scale the “rest of team” stats properly. (In the ARod/CHA case, the “rest of team” should hit just as well as the White Sox did, just with 600 fewer PA, or however many PA ARod actually had.)
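The two ways of building the “rest of team” can be sketched in Python. rest_of_actual_team subtracts the player from the team he really played for; rest_of_other_team is the TT-style scaling just described, keeping the team's per-PA rates but giving it only the PA left after the player takes his. All function names and totals here are hypothetical, for illustration only.

```python
def rest_of_actual_team(team, player):
    """The player really played here: remove his factors from the team totals."""
    return tuple(t - p for t, p in zip(team, player))


def rest_of_other_team(team, team_pa, player_pa):
    """The player did not play here: keep the team's per-PA rates, but give the
    rest of the team only the PA left over once the player takes his share."""
    scale = (team_pa - player_pa) / team_pa
    return tuple(t * scale for t in team)


def dbsr(player, rest):
    """Differential BsR from (A, B, C, D) tuples; the rest-of-team D cancels."""
    A, B, C, D = player
    TA, TB, TC, _ = rest
    return (A + TA) * (B + TB) / (B + C + TB + TC) + D - TA * TB / (TB + TC)


# Hypothetical (A, B, C, D) totals and PA, for illustration only.
player, player_pa = (250.0, 260.0, 420.0, 40.0), 650
other_team, other_team_pa = (1950.0, 2000.0, 4100.0, 170.0), 6100

print(dbsr(player, rest_of_other_team(other_team, other_team_pa, player_pa)))
```

Subtracting a player's line from a team he never played for (the White Sox example) would amount to calling rest_of_actual_team on the wrong data; the scaled version keeps that team hitting just as well as it did.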

So these figures are for the players on each of their actual teams:

Again, you can see that no matter which approach we use, no matter which reasonable team we put the player on, we get very similar final estimates of individual runs created.

Some methods for estimating run contribution (most notably Dick Cramer’s Batter Win Average) have subtracted the player’s stats from those of the league and then found the difference. Cramer did not use an average team, but rather the composite statistics of all the teams in the league. Here is what our differential figures for each player would look like using that approach:

I do not endorse this approach as a proper method to apply BsR to individuals (recognizing of course that the practical differences between this approach and the ones that I do endorse are minuscule). Runs are not created on a league level; they are created on a team level. If you want to take the player out of the context of a particular or theoretical team, and just get a “global” estimate of the player’s contributions, it is preferable IMO to apply the linear weight values. Pretending that runs are created on the league level and estimating the player’s contribution thusly is wrong in theory, and an inefficient use of time in practice.

So far I’ve shown you the final runs created estimate from each of these approaches. Now I will show the intrinsic weights that led to some of those estimates for Alex Rodriguez. Since many of the differential approaches produce very similar results, there is no need to clutter this up with the weights for each approach. The first column, “BsR”, shows the BsR intrinsic weights for his individual statistics. The second column, “LBsR”, shows the linear weights for the 2007 AL as a whole. The third column, “LYanks”, shows the linear weight values based on the 2007 Yankees as a whole. The fourth column, “TT Long”, shows the intrinsic weights for ARod with the long-term TT formula, and “TT 2007” shows the intrinsic weights for ARod with the 2007 AL TT formula. The last column, titled “Big Diff”, is the difference between the maximum and minimum weight for each event across the five approaches:

Here we see that while the final estimate stays within a two-run range or so, the coefficients that get us there vary a little more. As expected, the home run and the out are the most stable of the events. You can decide whether these differences make it worth going through a theoretical team procedure, or whether you just want to stick with the linear weights. Personally, while I love the concept of a theoretical team and have done as much work with it as anyone other than the aforementioned pioneers of the approach (at least as far as I can tell), I think that it is probably better to stick with league-level linear weights for most purposes. In stating that preference, I am not claiming that such an approach is “better” or any such thing. It is just my opinion that the extra effort put into the theoretical team calculation is not justified by the differences in the final estimates.
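Treating an intrinsic weight as the partial derivative of an estimate with respect to each event count, weights like the ones discussed above can be approximated numerically. This Python sketch assumes the commonly published “basic” BsR factors (A = H + W - HR, B = 1.02*(1.4*TB - .6*H - 3*HR + .1*W), C = AB - H, D = HR), which may not match Part 5’s equation exactly; it ignores HBP and other events, and the season line is hypothetical. For TT BsR each added event also adds a PA, which enlarges the theoretical teammates, so the derivative picks that up too. “Big Diff” here is max minus min across only the three approaches included.

```python
# ASSUMED "basic" BsR factors: the change (a, b, c, d) in A, B, C, D when one
# more of each event occurs, with A = H + W - HR, B = 1.02*(1.4*TB - .6*H
# - 3*HR + .1*W), C = AB - H, D = HR. Part 5's equation may differ.
EVENTS = {
    "1B": (1.0, 1.02 * (1.4 * 1 - 0.6), 0.0, 0.0),
    "2B": (1.0, 1.02 * (1.4 * 2 - 0.6), 0.0, 0.0),
    "3B": (1.0, 1.02 * (1.4 * 3 - 0.6), 0.0, 0.0),
    "HR": (0.0, 1.02 * (1.4 * 4 - 0.6 - 3.0), 0.0, 1.0),
    "W": (1.0, 1.02 * 0.1, 0.0, 0.0),
    "OUT": (0.0, 0.0, 1.0, 0.0),
}


def factors(counts):
    """Return (A, B, C, D, PA); PA here is simply the sum of all event counts."""
    A = B = C = D = 0.0
    for event, n in counts.items():
        a, b, c, d = EVENTS[event]
        A, B, C, D = A + a * n, B + b * n, C + c * n, D + d * n
    return A, B, C, D, sum(counts.values())


def plain_bsr(counts):
    A, B, C, D, _ = factors(counts)
    return A * B / (B + C) + D


def make_tt(roba, af, oa, teammates=8):
    """TT BsR for an entity's rates; each event also adds a PA to the batter."""
    def tt(counts):
        A, B, C, D, pa = factors(counts)
        ta, tb, tc = roba * teammates * pa, af * teammates * pa, oa * teammates * pa
        return (A + ta) * (B + tb) / (B + C + tb + tc) + D - ta * tb / (tb + tc)
    return tt


def intrinsic_weights(estimator, counts, h=1e-4):
    """Central-difference derivative of the estimate w.r.t. each event count."""
    weights = {}
    for event in EVENTS:
        up = {**counts, event: counts[event] + h}
        down = {**counts, event: counts[event] - h}
        weights[event] = (estimator(up) - estimator(down)) / (2 * h)
    return weights


# Hypothetical season line, for illustration only.
counts = {"1B": 100, "2B": 30, "3B": 3, "HR": 35, "W": 80, "OUT": 420}
approaches = {
    "BsR": plain_bsr,
    "TT Long": make_tt(.301, .306, .676),
    "TT 2007": make_tt(.308, .331, .666),
}
table = {name: intrinsic_weights(f, counts) for name, f in approaches.items()}
for event in EVENTS:
    values = [table[name][event] for name in approaches]
    print(event, [round(v, 3) for v in values], "Big Diff", round(max(values) - min(values), 3))
```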

There is one issue hanging on the periphery of any theoretical team discussion that I would like to acknowledge, although I do not want to go into it here, as it is really less about run estimation than about moving from run contribution to win contribution. That is the effect that a batter has on his team by creating additional opportunities for his teammates. Traditional runs created estimates, as well as the theoretical team variations presented here, do not account for this. It is pretty easy to tack what David Smyth called PAR (PA ratio) onto TT BsR. Whether you want to or not (and how the PAR approach compares to the old FanHome poster Sibelius’ R+/PA) depends on exactly what you are trying to measure. David Tate’s Marginal Lineup Value was the first method in this vein, and you could also adapt it to use BsR instead of RC.

That’s a topic for another day, but keep in mind that the TT BsR estimate as discussed here for any player measures his direct contribution only.


  1. Patriot,

    How do you incorporate the PAR adjustment into Baseruns when you figure the Baseruns estimates for the league with the player included, then figure the Baseruns estimate without the player, and take the difference between the two? On your website, you incorporated PAR into a Theoretical Team. I was wondering if you knew how to do this using the method of subtracting the player's batting line from the league batting line.

    How did you get the margins wider on your blog?

  2. PAR is certainly a little tricky to apply differentially, and doesn't fit nicely into the actual league figures. I have never actually sat down and figured out exactly how I would go about adjusting for it, since I never actually use the differential method. I'll think about it and see if I can come up with a good answer, as opposed to my default answer which would be to multiply the player's differential BsR by PAR.

    You can change the margins by clicking on "Customize" at the top of your blog (if you are logged in). You can edit the template and fiddle with it. I am by no means an HTML expert, but I did some trial and error and got this. If you want, I can post/email the settings that I used.

  3. I was just curious how you would incorporate PAR into the differential method. Don't put a lot of effort into thinking about how to do this. Your default answer sounds like the easiest thing to do.

    I viewed the HTML of your blog, and was able to expand the margins. Thanks!

  4. Patriot,

    Does this TT equation you posted on Tango's blog yield the same results as the differential method:

    TT BsR = (A + a*PA)(B + b*PA)/(B + C + (b+c)*PA) + D - (a*b/(b+c))*PA

  5. No, because that one redefines the rest of the team as having 8*PA, rather than the actual value. If, in place of "a*PA", "b*PA", etc. you used the actual rest of team, then it is the same thing.

    So that equation is the same as the differential in principle, but not in practice.

  6. The only way I'm going to understand the differences between the different theoretical team methods is to start experimenting with them. Maybe then I could follow everything that is going on with your Theoretical Team spreadsheet.

  7. Patriot,

    You have stated that Theoretical Team methods take into account the effect that a hitter has on the team LW values. If you use a league average theoretical team, or the league differential approach, does the individual affect the league LW values?

  8. The player's weights will reflect the difference between the league with him and the league without him; however, the effect will be smaller, since you are diluting the player's influence by putting him in a context of 14 teams instead of just 1.

    If you removed the player's stats, then figured the LW for the league, they would obviously change a little bit too.

  9. Of course, linear methods take into account the effect a hitter has on the League LW values. I'm not quite sure what I was asking you in my last question.

  10. Is PAR R+/O+?

    I believe that I have been confusing PAR with R+/PA. I have been using PAR, thinking that it was R+/PA.

    Is it OK to apply PAR to an ERP-style linear weights formula, or is PAR only meant to be applied to theoretical team run estimators?

  11. As David Smyth defined it, PAR is the PA ratio: the ratio of new team PA with our batter to old team PA.

    R+/PA incorporates added PA by adding them in as additional runs created. If the batter generated 10 extra PA, for instance, he would get credit for 10*Lg(R/PA) additional runs, which are added to his RC.

    For LW, the "+" adjustment should be enough. PAR was designed to be the parallel to the "+" adjustment for theoretical teams, since there you don't just assume that the additional PA add a fixed run value; they add a dynamic run value that depends on the team context (which of course BsR gives you).

    So you answered your own question correctly, I think. Use the "+" with a linear weights RC formula, use nothing with a linear weights above average formula, and use PAR with a TT formula.
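The adjustments discussed in this thread can be sketched in a few lines of Python. R+ adds 10*Lg(R/PA) for 10 extra PA, as in the reply above, and PAR is new team PA over old team PA. Two things here are assumptions rather than anything from the post: team_pa_at_fixed_outs, a hypothetical way to estimate team PA by holding outs fixed, and the choice to multiply DBsR by PAR, which is only the "default answer" from the second reply, not a worked-out method. All inputs are made up.

```python
def r_plus(rc, extra_pa, lg_runs_per_pa):
    """R+ style credit: each extra team PA the batter generates is worth the
    league's runs per PA, added straight onto his runs created."""
    return rc + extra_pa * lg_runs_per_pa


def par(team_pa_with, team_pa_without):
    """PA ratio: team PA with the batter over team PA without him."""
    return team_pa_with / team_pa_without


def team_pa_at_fixed_outs(outs, outs_per_pa):
    """ASSUMPTION (not from the post): team PA = outs available / outs per PA."""
    return outs / outs_per_pa


# Hypothetical inputs, for illustration only.
ratio = par(team_pa_at_fixed_outs(4100, 0.660), team_pa_at_fixed_outs(4100, 0.676))
dbsr_value = 55.0
print(round(ratio, 4), round(dbsr_value * ratio, 1))  # "default answer": DBsR * PAR
print(r_plus(100.0, 10, 0.12))  # 10 extra PA at a hypothetical .12 league R/PA
```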

