Wednesday, September 01, 2021

Rate Stat Series, pt. 11: Rate Stats for the Theoretical Team Framework II

Once PAR has been incorporated, it should be clear that a different approach will be needed as our run estimate already includes the batter’s secondary contribution – it is starting out on a similar basis to wRC.  We also can fall back on our necessary test for a rate stat – it must produce the same RAA as the RAA produced by the full implementation of the framework we are looking at.

To argue why this should be the gateway criteria in a slightly different way than I did previously, consider the place of RAA in the theoretical team framework. The theoretical team framework is an attempt to value the batter by estimating the difference between the runs scored for a team on which he accumulates an even share of the plate appearances, and the runs scored for a team on which he does not play. While we have constructed an absolute estimate of runs created using this approach, it inherently is screaming out for a marginal approach – taking the difference between the team with the player and the team without the player. That is exactly what RAA is supposed to represent – and by calculating TT_RAA, we have (to the best of our ability) captured the batter’s primary, secondary, and tertiary contributions. TT_RAA is the marginal impact of the player on a team, and any rate stat that starts by using TT_BsRP should produce the same RAA figure as TT_RAA.

So let’s look at the key figures for our hitters:


Here, “RAA” is what we calculated above, based on TT_BsR/O or TT_BsR+/PA. As you can see, it does not match TT_RAA. If it did, we wouldn’t need to worry about a special set of rate stats for the TT framework – we could just piggyback on the same methodology used for linear weights. It’s clear that we need to apply something different to TT_BsRP when we use the full-blown theoretical team approach including PAR. I would also argue that this is evidence that the theoretical team approach should not be viewed simply as an alternative path to using linear weights, but rather a third unique framework for evaluating a batter’s contribution.

I would now go through a tortured explanation of the logic behind finding a denominator that achieves the desired result, but there is no need – David Smyth developed the answer and explained it clearly and concisely in a June 21, 2001 FanHome post:

On the subject of a rate stat such as R+/PA or R+/O, etc....

The whole idea behind this method is to compute the impact of a player on a theoretical team. On the team level (even a theoretical one), impact is in terms of runs and outs. The R+ generated by the procedure on this thread [NOTE: R+ is theoretically equivalent to what I am calling TT_BsRP] tells us the difference in runs between the theoretical team without the indicated batter (that is, a team of 8 average hitters), and the theoretical team with the indicated batter added. We can also compute the difference in the OUTS between these two teams, and use that total as the rate stat denominator. Call it O+.

And it's easy to calculate. You simply multiply the batter PA by the out percentage (1-OBA) of the reference team...It seems to me that the preferred rate stat for the R+ framework is not R+/PA nor R+/O; it's R+/O+. 

Frank Thomas had 149.9 TT_BsRP in 508 PA. The reference team had a .3433 OBA (same as the league average). Thus, his TT_BsRP/O+ (a specific implementation of Smyth’s R+/O+) would be:

O+ = PA*(1 – LgOBA) = 508*(1 - .3433) = 333.6

TTBsRP/O+ = TT_BsRP/O+ = 149.9/333.6 = .4494

What does this represent? It is not easy to explain it in a sentence, but I will try: Frank Thomas contributed .4494 runs to the theoretical team’s offense for each of its outs distributed to him. When Smyth said that PA*(1 – RefOBA) was equal to the difference in the team’s outs, he was referring to the theoretical team construct, in which the difference in team plate appearances is the batter’s own PA, since in the left hand term (T_BsR) the team has 9*PA, and in the right hand term (R_BsR), the team has 8*PA. The difference in outs is this difference (1*PA) times the team’s out rate.

In this sense, O+ can be thought of in terms of “freezing team outs”. We know that for a team (excluding the list of complicating circumstances like rainouts, foregone batting in the bottom of the ninth, and walkoffs), outs are fixed quality. Regardless of whether we add Frank Thomas or Matt Walbeck to a reference team, the team’s outs are fixed. In the TT/PAR approach, we start by capturing the batter in question’s secondary contribution through the PAR multiplier, but we never directly change the number of PA we start from. Thus, Thomas’ 508 PA and Walbeck’s 355 PA will turn into outs at the same rate for the purpose of the calculation. In reality, Thomas’ team will compile more PA thanks to his greater secondary contribution, but the equation handles this with a multiplier, freezing the original value.

You may not find that explanation convincing – I am struggling to articulate a concept that I may be fooling myself into believing I understand. You might be more convinced by the demonstration of what Thomas’ RAA is under this approach:

TT_RAA = (TT_BSRP/O+ - LgR/O)*O+

which for Thomas = (.4494 - .2075)*333.6 = 80.7 which is the same as his TT_RAA

In fact, I jumped the gun by calling it TT_RAA before I proved it was equal to what we previously called TT_RAA. It is in fact:


This leaves unanswered the question of what the rate stat using TT_RAA should be. Similar to how we had R+/PA and RAA/PA when working with linear weights, we should have an appropriate rate stat for the RAA figure. By manipulating the TT_RAA equation, we can see that TT_RAA/O+ should be consistent with R+/O+:

TT_RAA = (R+/O+ - LgR/O)*O+ 

so divide both sides by O+ to get:

TT_RAA/O+ = R+/O+ - LgR/O

Thus TT_RAA/O+ and R+/O+ are analogous to RAA/PA and R+/PA, except that the difference is LgR/O rather than LgR/PA.

So we’re done, right? Or is there something vaguely unsatisfying about all this, that after an entire series in which I argued that R/O was a team measure and not an individual one, does it bother you that we have left our rate stat for an individual in the form of R/O?

On one hand, it absolutely shouldn’t – R/O is the correct choice of rate stat for a team, and in this case we have modeled the player’s impact on the team and its R/O, and so there’s nothing wrong with expressing a final result in terms of R/O. The problem was not with R/O per se – it was with the inputs that we were putting in for individual batters. Additionally, R+/O+ allows our team and individual rate stats to converge. Team R+/O+ will reduce to team R/O if accept the premise that the proper “reference team” for a team is itself, and that it’s R+ is equal to its actual runs scored since actual runs scored obviously accounts for the team’s primary, secondary, and tertiary offensive actions. Not to mention any quaternary actions we could dream up or the fact that using those terms doesn’t even make sense when talking about teams. 

On the other hand, O+ is not in any way an intuitive metric – just re-read my tortured explanation of what it represents. A batter’s R+/O+ can be contextualized in any number of meaningful ways, just as regular old R/O can, but I’m not sure even those presentations (ala Runs Created/25.2 Outs) are truly relatable to a player’s performance except as a scaling device.

There is a very simple and equivalent alternative, though, and you may have already noticed what it might be from the formulas. We defined O+ as PA*(1 – LgOBA), which is really just plate appearances times a constant. If we just divide by that constant (1 – LgOBA), we can restate everything on the basis of PA, with no loss of ratio comparability. 

Doing this, we now have:

(TT_BsRP/PA – LgR/PA)*PA = TT_RAA

and TT_BsRP/PA = R+/O+ * (1 – LgOBA)

R+/PA+ = TTBsRP/PA

I am going to call TT_BsRP/PA (R+/PA+) even though PA+ is just equal to PA. I’m doing this primarily for ease of discussion, so that R+/PA+ will represent the theoretical team calculation, while R+/PA will represent the linear weights equivalent—and they are equivalent (which also means that R+/O+ could be applied to the linear weights framework – more on this in a later installment). My other justification is that in theoretical team framework, the actual number of plate appearances we plug in is not of importance when dealing with a rate stat, as we are always defining the reference team’s PA to be eight times the batter’s PA. We use the batter’s PA because it allows estimates like TT_BsR to reflect his actual playing time, but if all we care about is the rate at the end, we could use 1 plate appearance or 650 or any number we wanted. Thus I prefer to think about this quantity of plate appearances as the player’s share of the reference team’s PA rather than really being tied to his own, and feel justified in distinguishing it through the abbreviation PA+.

This approach to developing a rate stat for the TT framework can be thought of as “freezing plate appearances” as compared to the “freezing outs” approach of R+/O+. I first became aware of it through a FanHome post in 2007 by David Smyth (surprise). By that time it was over five years since Smyth had published the R+/O+ methodology and I had adopted it myself, so by the time we discussed what I am calling R+/PA+ I was in the interesting position of advocating one Smyth construct against the newer. We quickly verified their equivalence and left it there, which is the position I’m taking now. 

By definition, R+/O+ and R+/PA+ are perfectly correlated since the only difference between the two is multiplying by a constant. One allows us to express a rate stat in terms of R/O, which is consistent with how we would state a team rate stat; the other allow us to express it in terms of R/PA, which is how we would state a rate state from the linear weight framework. Both can be compared using ratios, which will be equivalent; both can be compared using differences, with the question of what the denominator for that difference should be up to the user. Both denominators can be used with RAA as well for RAA+/O+ or RAA+/PA+ (using RAA+ to refer RAA calculated within the full-blown theoretical team framework), and since R+/O+ and R+/PA+ can both be used to calculate RAA, they will also be consistent with RAA+ per O+ or PA+.

I’ll close with a sample calculation. Frank Thomas had 149.9 TT_BsRP in 508 PA, which is .2951 R+/PA+. The league average R/PA was .1363, so Thomas’ TT_RAA was (.2951 - .1363)*508 = 80.7, same as calculated previously using R+/O+ or directly from the original TT_RAA formula. The league OBA was .3433, so Thomas’ R+/O+  is equal to .2951/(1 - .3433) = .4494 as calculated previously. 

No comments:

Post a Comment

I reserve the right to reject any comment for any reason.