Thursday, December 03, 2020

Palmerian Park Factors

The first sabermetrician to publish extensive work on park effects was Pete Palmer. His park factors appeared in The Hidden Game of Baseball and Total Baseball and as such became the most-widely used factors in the field. They continue in this role thanks to their usage at Baseball-Reference.com.

Broadly speaking, all ratio-based run park factors are calculated in the same manner. The starting point is the ratio of runs at home to runs on the road. There are a number of different possible variations; some methods use runs scored by both teams, while others (including Palmer’s original methodology in The Hidden Game use only the runs scored by one of the teams (home or road)). The opportunity factor can also vary slightly; many people just use games, which is less precise than innings or outs. The variations are mostly technical rather than philosophical in nature, so they rarely get a lot of attention. Park factor calculations are accepted at face value to an extent that other classes of sabermetric tools (like run estimators) are not.

Among park factors in use, the actual computations in Palmer’s are the most unique, so I thought it would be worthwhile to walk through his methodology (as stated in Total Baseball V) and discuss its properties.

In the course of this discussion, I will primarily focus on the actual calculations and not the inputs. Where Palmer has made choices about what inputs to use, how to weight multiple seasons, and the like, I will not dwell, as I’m more interested in the aspects of the approach that can be applied to one’s own choice of inputs. Palmer uses separate park factors for batting and pitching (more on this later); I’ll focus on the batting ones here.

Palmer generally uses three years of data, unweighted, as the basis for the factors. There are some rules about which years to use when teams change parks, but those are not relevant to this discussion. The real meat of the method starts by finding the total runs scored and allowed per game at home and on the road, which I’ll call RPG(H) and RPG(R).

i (initial factor) = RPG(H)/RPG(R)

I will be using the 2010 Colorado Rockies as the example team here, considering just one year of data to keep things simple. Colorado played 81 games both home and away, scoring 479 and allowing 379 runs at home and scoring 291 and allowing 338 on the road. Thus, their RPG(H) = (479 + 379)/81 = 10.593 and RPG(R) = (291 + 338)/81 = 7.765. That makes i = 10.593/7.765 = 1.364 (I am rounding to three places throughout the post, which will cause some rounding discrepancies with the spreadsheet from which I am reporting the results).

The next step is to adjust the initial factor for the number of innings actually played rather than just using games as the denominator. This step can be ignored if you begin with innings or outs as the denominator rather than using games. The Innings Pitched corrector is:

IPC = (18.5 - Home W%)/(18.5 - (1 - Road W%))

Palmer explains that 18.5 is the average number of innings batted per game if the home team always bats in the ninth inning. Teams that win a higher percentage of games at home bat in less innings due to skipping the bottom of the ninth. The IPC seems to assume that in all games won by the home team, they do not bat in the bottom of the ninth.

Colorado was 52-29 (.642) at home and 31-50 on the road (.383), so their IPC is:

IPC = (18.5 - .642)/(18.5 - (1 - .383)) = .999

The initial factor is divided by the IPC to produce what the explanation refers to as Run Factor:

RF = i/IPC

For the Rockies, RF = 1.364/.999 = 1.366

The next step is the Other Parks Corrector (OPC). The OPC “[corrects] for the fact that the other road parks’ total difference from the league average is offset by the park rating of the club that is being rated.” The glossary explanation may be confusing, but the thought process behind it is pretty straightforward--a team’s own park makes up part of the league road average, but none of the team’s own road games are played there. Without accounting for this, all parks would be estimated to be more extreme than they are in reality.

The OPC is figured in the same manner as I do it in my park factors; I borrowed it from Craig Wright’s work without realizing that Palmer had done the same mathematical operation, but its derivation is fairly obvious and the equivalent appears in multiple park factor approaches. Let T equal the number of teams in the league:

OPC = T/(T - 1 + RF)

Basically, OPC assumes a balanced schedule, so each team’s schedule is made up half of games in its own park (hence the use of RF) and half in the other parks, of which there are T - 1. For a sixteen team league (and thus Colorado 2010):

OPC = 16/(16 - 1 + 1.366) = .978

The next step is to multiply RF by OPC, producing scoring factor:

SF = RF*OPC

For Colorado, SF = 1.366*.978 = 1.335

If all you were interested in was an adjustment factor of the park effect’s on scoring, rather than specific adjustments for the team’s batters and pitchers, this would be your stopping point. The scoring factor is the final park factor in that case, and with the exception of the Innings Pitched Corrector, it is equivalent to the approach used by Craig Wright, myself, and many others. (My park factors are then averaged with one to account for the fact that only half of the games for a given team are at home, but that only obscures the fact that the underlying approach is identical, and Palmer accounts for that consideration later in his process).

It is when the other factors are adjusted for that the math gets a little more involved. The first step is to calculate SF1, which is an adjustment to scoring factor:

SF1 = 1 - (SF - 1)/(T - 1)

For the 2010 Rockies:

SF1 = 1 - (1.335 - 1)/(16 - 1) = .978

While I am writing this explanation, I must stress that it is a walkthrough of another person’s method. I cannot fully explain the thought process behind it and justify every step. I decided to include that disclaimer at this point because I don’t understand what the purpose of SF1 is, as it is mathematically equivalent to OPC. Why it needed to be defined again and in a more obtuse way is beyond me.

In any event, the purpose of SF1 is to serve as a road park factor. If a team plays a balanced schedule and we take as a given that the overall league park factor should be equal to one, but we determine that its own park has a PF greater than one (favors hitters), then it must be the case that the road parks they play in have a composite PF less than one. That’s the function of the OPC, and of SF1. For the rest of this post, I will refer to OPC rather than SF1 when it is used in formulas.

Palmer uses separate factors for batters and pitchers to account for the fact that a player does not have to face his teammates. By doing so, Palmer’s park factors make their name something of a misnomer as they adjust for things other than the park. (One could certainly make the case that the park factor name is constantly misapplied in sabermetrics, as we can never truly isolate the effect of the park, and the sample data we have is affected by personnel and other decisions. Palmer takes it a step further, though, by accounting for things that have nothing to do with the park.) The park effect is generally stronger than the effect of not facing one’s own teammates, since a team plays in its park half the time but only misses out on facing its own pitchers 1/T percent of the time assuming a balanced schedule.

One can argue about the advisability of adjusting for the teammate factor at all, and if so it certainly is debatable whether it should be included in the park factor or spun off as a separate adjustment. I would find the later to be a much better choice that would result in a much more meaningful set of factors (both for the park and teammates), but Palmer chose the former.

The separate factors are calculated through an iterative process. One must know the R/G scored and allowed at home and away for the team (I’ll call these RG(H), RG(R), RAG(H), and RAG(R) for R/G at home, R/G on the road, RA/G at home, and RA/G on the road respectively). Additionally, one must know the average RPG for the entire league (which I’ll call 2*N, since I often use N to denote league average runs/game for one team). For the Rockies, we can determine from the data above that RG(H) = 5.913, RG(R) = 3.593, RAG(H) = 4.679, RAG(R) = 4.173, and N = 4.33.

The iterative process is used to calculate a team batter rating (TBR) and a team pitcher rating (TPR, not to be confused with Total Player Rating, another Palmer acronym). These steps are necessary since the strength of each unit cannot be determined wholly independently of the other. Just as the total impact of park goes beyond the effect on home games and into road games (necessitating the OPC, but not being of strong enough magnitude to swamp over the home factor), so the influence of the two units on another are codependent.

The first step of the process assumes that the pitching staff is average (TPR = 1). Then the TBR can be calculated as:

TBR = [RG(R)/OPC + RG(H)/SF]*[1 + (TPR - 1)/(T - 1)]/(2*N)

For Colorado:

TBR = [3.593/.978 + 5.913/1.335]*[1 + (1 - 1)/(16 - 1)]/(2*4.33) = .936

The first bracketed portion of the equation adds the teams’ adjusted (for park) R/G at home and on the road, using the home (SF) or road (OPC) park factor as appropriate. The second set of brackets multiplies the first by the adjustment for not facing the team’s pitching staff. The difference between the team pitching and average (TPR - 1) is divided by (T - 1) since playing a true balanced schedule, the team would only face those pitchers in 1/(T - 1) percent of the games anyway. If the team’s pitchers are above average (TPR < 1), then the correction will increase the estimate of team batting strength.

Then the entire quantity is divided by double the league average of runs scored per game by a single team. This may seem confusing at first glance, but it is only because the first bracketed portion did not weight the home and road R/G by 50% each. The formula could be written as the equivalent:

TBR = [.5*RG(R)/OPC + .5*RG(H)/SF]*[1 + (TPR - 1)/(T - 1)]/N

In this case, it is much easier to see that the first bracket is the average runs scored per game by the team, park-adjusted. Dividing this by the league average results in a very straightforward rating of runs scored relative to the league average. Both TBR and TPR are runs per game relative to the league average, but in both cases they are constructed as team figure/league figure. This means that the higher the TBR, the better, with the opposite being true for TPR.

The pitcher rating can then be estimated using the actual TBR just calculated, rather than assuming that the batters are average:

TPR = [RAG(R)/OPC + RAG(H)/SF]*[1 + (TBR - 1)/(T - 1)]/(2*N)

For the Rockies, this yields:

TPR = [4.173/.978 + 4.679/1.335]*[1 + (.936 - 1)/(16 - 1)]/(2*4.33) = .894

The first estimate is that the Rockies batters, playing in a neutral park and against a truly balanced schedule, would score 93.6% of the league average R/G. Rockie pitchers would be expected to allow 89.4% of the league average R/G. At first when I was performing the sample calculations, I thought I might have made a mistake since the Colorado offense was evaluated as so far below average, but I was forgetting that they would appear quite poor using a one-year factor for Coors Field. My instinct as to what Colorado’s TBR should look like was informed by my knowledge of the five-year PF that I use.

Now the process is repeated for three more iterations, each time using the most recently calculated value for TBR or TPR as appropriate. The second iteration calculations are:

TBR = [3.593/.978 + 5.913/1.335]*[1 + (.894 - 1)/(16 - 1)]/(2*4.33) = .929

TPR = [4.173/.978 + 4.679/1.335]*[1 + (.929 - 1)/(16 - 1)]/(2*4.33) = .893

Repeating the loop two more times should ensure pretty stable values. Theoretically, there’s nothing to stop you from setting up an infinite loop. I won’t insult your intelligence by spelling the second pair of iterations suggested by Palmer, but the final results for Colorado are a TBR of .929 and a TPR of .993.

So far, all we’ve done is figured teammate-corrected runs ratings for team offense and defense. Our estimate of the park factor remains stuck back at scoring factor. All that’s left now is correcting scoring factor for 1) the teammate factor and 2) the fact that a team plays half its games at home and half on the road. This will require two separate formulas--a batter’s PF (BPF) and a pitcher’s PF (PPF), which are twins in the same way that TBR and TPR are:

BPF = (SF + OPC)/[2*(1 + (TPR - 1)/(T - 1)]

PPF = (SF + OPC)/[2* (1 + (TBR - 1)/(T - 1)]

For Colorado:

BPF = (1.335 + .978)/[2*(1 + (.893 - 1)/(16 - 1)] = 1.165

PPF = (1.335 + .978)/[2*(1 + (.929 - 1)/(16 - 1)] = 1.162

The logic behind the two formulas should be fairly obvious at this point. Again, the multiplication by two in the denominator arises because the two factors in the numerator were not weighted at 50% each. You can write the formulas as:

BPF = (.5*SF + .5*OPC)/[1 + (TPR - 1)/(T - 1)]

PPF = (.5*SF + .5*OPC)/[1 + (TBR - 1)/(T - 1)]

Here, you can see more clearly that the numerator averages the home park factor (SF) and the road park factor (OPC). The numerator could be your final park factor if you wanted to account for road park but didn’t care about teammate effects. The denominator is where the teammate effect is taken into account, and in the same manner as in TBR and TPR. If TPR > 1, BPF goes down because the batters did not benefit from facing the team’s poor pitching. If TBR > 1, PPF goes down pitchers benefitted from not facing their batting teammates.

If one does not care to incorporate the teammate effect, then Palmer’s park factors are pretty much the same as any other intelligently designed park factors. This should not come as a surprise, because as with many implementations of classical sabermetrics, Palmer’s influence is towering. The iterative process used to generate the batting and pitching ratings is pretty clever, and incorporates a real albeit small effect that many of us (myself included) often gloss over.

No comments:

Post a Comment

I reserve the right to reject any comment for any reason.