Wednesday, April 28, 2021

Rate Stat Series, pt. 2: PA Generation

This is a little bit of a detour and certainly nothing new (I don’t know who originally laid out this logic/math – the earliest use I’m aware of was in 1960 by D’Esopo & Lefkowitz as part of their Scoring Index model), but I think a discussion of it is appropriate in the context of this series, and I will later make use of these formulas.  It’s also ground I covered in the original series, but I think my explanation this time is slightly more coherent.

Each batting team starts each inning (excluding scenarios where a walkoff is possible) with three plate appearances guaranteed. Thus each team starts each game with twenty-seven plate appearances guaranteed (excluding scenarios where the home team forgoes batting the bottom of the ninth, rainouts, post-2020 doubleheaders, etc.). Any plate appearances beyond that must be earned by batters avoiding outs. Since it’s more natural to think of a positive outcome rather than the avoidance of a negative outcome, I will simplify and say that each extra plate appearance must be earned by a batter reaching base (and not being subsequently retired on the basepaths).

For the sake of discussion (and keeping with the simple set of statistics being used in the metrics in this series), I’m going to ignore the existence of baserunning outs, including caught stealing, pickoffs, outs stretching, outs advancing, and runners retired on double/triple plays (although not on fielder’s choices, since the batter is charged with an out in that case). I’m going to assume that the out rate is the complement of on base average, which in this series will be defined simply as (H + W)/(AB + W). In reality, considering all the ways in which outs can be made, it would be a more involved equation (I’ve used the acronym NOA for Not Out Average and OA for the complement, Out Average) which would look something like this, although I still don’t think I’ve accounted for every possible event (you try incorporating fielders’ choices without complicating the equation significantly):

NOA = (H + W + HB + CI + ROE – CS – DP – Outs Stretching – Outs Advancing – Pickoffs – 2*TP)/(AB + W + HB + SF + SH + CI)

Alternatively, for a team when LOB data is available (and ignoring the walkoff situation), you could have OA = (Plate Appearances – Runs Scored – Left On Base)/Plate Appearances. All of this is just an attempt to calculate, as best we can from the available statistics we have restricted ourselves to, Outs/Plate Appearances. NOA or OA as appropriate could be substituted for OBA in the equations that follow as long as the appropriate corresponding adjustments are made to the numerator.
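To make the simplified definitions concrete, here is a minimal Python sketch of the series' OBA and of the LOB-based out average. The function names and the sample team line are hypothetical, made up only to show the arithmetic; they are not real data.

```python
def simple_oba(h, w, ab):
    """On base average as defined in this series: (H + W)/(AB + W)."""
    return (h + w) / (ab + w)


def out_average_from_lob(pa, r, lob):
    """Team out average when LOB is available: (PA - R - LOB)/PA.

    Ignores the walkoff situation, as in the text.
    """
    return (pa - r - lob) / pa


# Made-up team line for illustration only (NOT real data).
h, w, ab = 1400, 500, 5500
oba = simple_oba(h, w, ab)

# With only H, W, and AB available, outs are AB - H, so the out
# average is exactly the complement of OBA.
simple_oa = (ab - h) / (ab + w)

# Made-up PA, R, and LOB totals, again purely illustrative.
lob_oa = out_average_from_lob(pa=6000, r=700, lob=1100)
```

Under the simplified definitions the out average and OBA sum to one by construction, which is why the later equations can use 1 – OBA in place of Outs/Plate Appearances.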

Let’s assume for the purpose of developing an equation for team plate appearances that the OBA is constant across each of the nine batters in the lineup and doesn’t vary for any other reason (this is obviously never true, but it is a fine simplifying assumption for modeling PA generation). Then a team will start an inning with three plate appearances. For each of those three guaranteed PAs, there is a probability (equal to OBA, given our assumption) that the batter avoids an out (reaches base, given that there are no baserunning outs). This increases the expected number of plate appearances by OBA.

It doesn’t stop there, though. Each additional PA that is generated also has an OBA chance of creating an additional PA, which itself has an OBA chance of creating an additional PA. Thus, for each of the guaranteed PA, the expected number of additional PA generated is:

OBA + OBA*OBA + OBA*OBA*OBA + … = OBA + OBA^2 + OBA^3 + … + OBA^n

which, as n goes to infinity and provided OBA is below 1 (it must lie between 0 and 1 by definition), converges to:

OBA/(1 – OBA)
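A quick numerical check of this result can be sketched in Python. The snippet compares a partial sum of the series, the closed form, and a small simulation in which each PA reaches base with probability OBA and the chain stops at the first out. The function names, the OBA value of .330, the trial count, and the seed are all arbitrary assumptions for illustration.

```python
import random


def extra_pa_partial_sum(oba, n):
    """OBA + OBA^2 + ... + OBA^n."""
    return sum(oba ** k for k in range(1, n + 1))


def extra_pa_closed_form(oba):
    """Limit of the series, valid for 0 <= OBA < 1."""
    return oba / (1 - oba)


def simulate_extra_pa(oba, trials, seed=0):
    """Average number of extra PA generated per guaranteed PA.

    Each batter reaches base with probability `oba`; every time on
    base earns one more PA, and the chain ends at the first out.
    """
    rng = random.Random(seed)
    extra = 0
    for _ in range(trials):
        while rng.random() < oba:
            extra += 1
    return extra / trials


oba = 0.330  # arbitrary illustrative value
closed = extra_pa_closed_form(oba)
partial = extra_pa_partial_sum(oba, 50)
simulated = simulate_extra_pa(oba, trials=100_000)
```

The partial sum should agree with the closed form almost exactly after a few dozen terms, while the simulated average should only land close to it, since it carries sampling noise.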

The 1994 AL had an OBA of .343. Thus, each guaranteed plate appearance should have generated .343/(1 - .343) = .522 additional plate appearances. In an average inning, starting with three guaranteed PA, we would expect 3 + 3*.522 = 3*(1 + .522) = 4.566 PA, and thus in a game we would expect 9*4.566 = 41.09 PA. Note that instead of calculating the .522 additional PA, we can simplify this to 3/(1 – OBA) for an inning or 27/(1 – OBA) for a game. In reality there were 39.24 PA, so we have an unacceptable 4.7% error. What went wrong?

I mixed definitions of plate appearances and OBA that don’t correspond to each other, and I also ignored that the three guaranteed PA are really just the three outs permitted in the inning. To estimate the number of plate appearances per inning or game consistently, we need to divide the average number of outs per game by 1 – OBA:

PA/G = (O/G)/(1 – OBA)

The definition of outs that corresponds to our simple (H + W)/(AB + W) complement of out average is AB – H. In the 1994 AL there were 25.19 outs/game using this definition, so our expected PA/G is:

25.19/(1 - .343) = 38.34
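The two calculations can be put side by side in a short Python sketch. The inputs (.343 OBA, 25.19 outs/game, 39.24 actual PA/G) are the 1994 AL figures quoted in the text; the function and variable names are my own, chosen for illustration.

```python
def expected_pa_per_game(outs_per_game, oba):
    """PA/G = (O/G)/(1 - OBA)."""
    return outs_per_game / (1 - oba)


OBA_1994_AL = 0.343          # (H + W)/(AB + W), from the text
OUTS_PER_GAME = 25.19        # (AB - H) per game, from the text
ACTUAL_PA_PER_GAME = 39.24   # actual PA/G quoted in the text

# Naive version: assumes 27 outs per game, a count that does not
# match the AB - H definition of outs behind this OBA.
naive = expected_pa_per_game(27, OBA_1994_AL)
naive_error = naive / ACTUAL_PA_PER_GAME - 1

# Consistent version: outs defined as AB - H, matching the OBA used.
consistent = expected_pa_per_game(OUTS_PER_GAME, OBA_1994_AL)

print(round(naive, 2), round(naive_error, 3), round(consistent, 2))
```

Without intermediate rounding, the naive figure comes out near 41.10 rather than 41.09; the small gap comes from rounding the per-inning value to 4.566 before multiplying by nine.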

The actual average was 38.35. The small difference is due to rounding, since this is now just a mathematical truism: by our simplified definitions, plate appearances = outs + times on base. Using this equation to estimate team PA/G from their OBA for the 1994 AL, the RMSE is .259, which is about .7% of the average PA/G. We shouldn’t expect perfect accuracy at the team level, since team PA will be affected by differing quantities of all the statistical categories we’re ignoring that affect the actual number of PA a team generates, as well as by differences in the number of extra-inning games, forgone bottoms of the ninth, and walkoff-shortened innings.
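For anyone who wants to repeat the team-level check, a minimal Python sketch of the RMSE calculation follows. It assumes the league outs/game figure is applied to every team's OBA, which is my reading of the procedure rather than something spelled out above. The (OBA, PA/G) pairs are placeholders and are NOT real 1994 AL team data; substitute actual team lines to reproduce the .259 figure.

```python
import math


def expected_pa_per_game(outs_per_game, oba):
    """PA/G = (O/G)/(1 - OBA)."""
    return outs_per_game / (1 - oba)


def rmse(estimates, actuals):
    """Root mean squared error between two equal-length sequences."""
    squared = [(e - a) ** 2 for e, a in zip(estimates, actuals)]
    return math.sqrt(sum(squared) / len(squared))


LEAGUE_OUTS_PER_GAME = 25.19  # league figure from the text

# Placeholder (team OBA, actual team PA/G) pairs; NOT real data.
teams = [(0.330, 37.9), (0.343, 38.2), (0.360, 39.6)]

estimates = [expected_pa_per_game(LEAGUE_OUTS_PER_GAME, oba) for oba, _ in teams]
actuals = [pa for _, pa in teams]
team_rmse = rmse(estimates, actuals)
```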

The key points to keep in mind as we move forward in discussing rate stats are:

1.      The number of plate appearances a team will get is a function of its out rate, and with simplified terms we can estimate team PA very accurately as a function of on base average

2.      Since players have an impact on the number of plate appearances their team gets, and thus the number of plate appearances they get, a proper rate stat for measuring overall offensive productivity must account for that impact

2 comments:

  1. While the number of HR is low, it's also true that a batter can't be put out on the bases after hitting one. So, depending on how far you want to complicate things, we should remove HR. In addition, the chance of being put out after reaching on a triple is less than after a double, which in turn is less than after a single, walk, or HBP. Again, depending on how far you want to complicate things, a different out factor for each non-HR safe event can be considered.

  2. Good points. The "NOA" equation in the middle of the post was my attempt to capture baserunning outs, but it is a theoretical exercise that runs well ahead of the traditional data (absent the Palmerian fudge of "outs on base").
