Monday, March 23, 2020

Tripod: Theoretical Team Base Runs

See the first paragraph of this post for an explanation of this series.

While Base Runs is an incredibly flexible run estimator when it comes to working across a wide range of contexts, as a multiplicative formula it is not directly applicable to individual batters. However, there are a number of ways that you can use Base Runs to assist in your evaluation of batters. One way is to use Base Runs to calculate Linear Weights for your entity, and then apply these weights to the individual batters in the entity. You can find the weights for the 1978 AL and calculate Reggie Jackson's linear weights from this. Or you could find the weights for the Yankees and get a measure of Jackson's run creation in his own team context. Or you could find the weights for the Red Sox and see how many runs Jackson would have created in that context. The possibilities are close to limitless.

However, when you calculate Jackson's value in the Red Sox's context, you have not accounted for the fact that if Jackson played for the Red Sox, he would change that context. If you want to include this effect, things get a bit more complicated.

The basic ideas in this area were pioneered by David Tate, who published a method called Marginal Lineup Value, which used Runs Created in a similar way. Keith Woolner also played an important role in the development of MLV. While the options I am detailing here are not directly adapted from Marginal Lineup Value, many of the ideas are, and their work set off the light in my head and in those of others who have laid out similar techniques, so their contributions must be recognized.

Bill James' "new" Runs Created, introduced in the STATS All-Time Major League Handbook and used in their other publications since then (as well as the Bill James Handbook from Baseball Info Solutions), also incorporates many of these ideas and introduced an ingenious way to state absolute results: that is, the number of total runs created rather than runs above some baseline, as the Tate/Woolner method did.

The first step in applying this method is to assume that we have a team of 8 average players each getting an equal number of Plate Appearances. Then we add the player in question to this team, with the same number of PAs as the other eight players, which we will make equal to the player in question's actual PA. Then we calculate the new A, B, C, and D factors for this team.

Let's use Mark McGwire's 1998 season as an example of how this works. We will put him on a team that performs at the level of the 1961-2002 composite data discussed in the BsR article. This league has a ROBA of .3007, AF of .3047, OA of .6763, and HRPA of .0230. McGwire personally compiled an A factor of 244, a B factor of 267.69, a C factor of 357, and a D factor of 70 (there will be rounding differences with the spreadsheet throughout this essay).

The non-McGwire portion of the team will have an A factor of 8*PA*LgROBA, where PA is McGwire's PA and LgROBA is the ROBA for the entity in question. We will call the 8*LgROBA portion E. From there:

E = 8*LgROBA
F = 8*LgAF
G = 8*LgOA
H = 8*LgHRPA

For the 1961-2002 data (which I will call from here on out the "standard" or "reference" league) these values are E = 2.41, F = 2.44, G = 5.41, and H = .184.
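As a quick check on the arithmetic, the four constants can be recomputed from the league rates quoted earlier. This is only a sketch; the function and variable names are illustrative and are not taken from the spreadsheet.

```python
# League rates for the 1961-2002 "standard" league, as quoted in the text.
LG_ROBA, LG_AF, LG_OA, LG_HRPA = 0.3007, 0.3047, 0.6763, 0.0230

def team_constants(lg_roba, lg_af, lg_oa, lg_hrpa, n_teammates=8):
    """Return (E, F, G, H): the per-PA contribution of the average teammates
    to the team A, B, C, and D factors."""
    return tuple(n_teammates * rate for rate in (lg_roba, lg_af, lg_oa, lg_hrpa))

E, F, G, H = team_constants(LG_ROBA, LG_AF, LG_OA, LG_HRPA)
print(round(E, 2), round(F, 2), round(G, 2), round(H, 3))
```

Because the league rates above are already rounded to four places, the last digit can differ slightly from values carried at full precision in a spreadsheet.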

Then, the new A factor for the team with McGwire will be A + E*PA, where A is McGwire's personal A and PA is, once again and for the last time, his personal PA. Then:

TmA = A + E*PA
TmB = B + F*PA
TmC = C + G*PA
TmD = D + H*PA

We then put these together to estimate the number of runs this team will score with McGwire as TmA*TmB/(TmB + TmC) + TmD, and subtract from this the number of runs the eight players would score without McGwire. Without McGwire, the team will score LgROBA*LgAF/(LgAF + LgOA) + LgHRPA times eight times PA. We can make a formula for I:

I = 8*(LgROBA*LgAF/(LgAF + LgOA) + LgHRPA)

For the standard league, I = .93. Then we can make a big equation for the difference between the team with McGwire and without McGwire:

TT BsR = (A + E*PA)*(B + F*PA)/((B + F*PA) + (C + G*PA)) + (D + H*PA) - I*PA

Which algebraically simplifies to:

TT BsR = (A + E*PA)*(B + F*PA)/(B + C + (F+G) * PA) + D - (I - H)*PA

Which, for the standard league is:

TT BsR = (A+ 2.41PA)*(B + 2.44PA)/(B + C + 7.86PA) + D - .75PA

For McGwire, we get a value of 169.03. This can be compared to his personal BsR, calculated through the team formula (A*B/(B + C) + D), of 174.56, or to his LBsR of 168.38, which uses the linear weights derived from BsR for the standard league. So you can see that since McGwire was a high-production player, his personal BsR is higher than what you get if you put him on a standard team. But since McGwire personally alters the run environment of the team he is added to, his TT BsR is higher, although only slightly, than his LBsR.
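The calculation for McGwire can be sketched in a few lines of Python. The A, B, C, and D factors and the league rates are the ones quoted in the text; the PA total of 681 is not stated in the text and is an assumption here (his 1998 AB + W + HB + SH + SF). The function name is illustrative.

```python
# League rates for the 1961-2002 "standard" league, as quoted in the text.
LG_ROBA, LG_AF, LG_OA, LG_HRPA = 0.3007, 0.3047, 0.6763, 0.0230

def tt_bsr(A, B, C, D, PA):
    """Absolute Theoretical Team BsR: runs scored by eight average hitters
    plus the player, minus runs scored by the eight average hitters alone."""
    E, F, G, H = (8 * r for r in (LG_ROBA, LG_AF, LG_OA, LG_HRPA))
    I = 8 * (LG_ROBA * LG_AF / (LG_AF + LG_OA) + LG_HRPA)
    with_player = (A + E * PA) * (B + F * PA) / (B + C + (F + G) * PA) + D + H * PA
    return with_player - I * PA

# McGwire 1998: factors from the text; PA = 681 is an assumption (see lead-in).
mcgwire_tt = tt_bsr(A=244, B=267.69, C=357, D=70, PA=681)
print(round(mcgwire_tt, 2))
```

With the rounded league rates this lands within a few hundredths of a run of the 169.03 figure, which is the kind of rounding difference the text warns about.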

We can also find McGwire's TT BsR above baselines other than absolute. I will use average here, and below I will (tentatively) sketch out a procedure to use replacement level (or any other baseline, for that matter). To apply an average baseline, all we have to do is compare McGwire to a team of 9 average players rather than 8 average players. We can use the same formulas as above, except that for this team I will be figured as:

I = 9*(LgROBA*LgAF/(LgAF + LgOA) + LgHRPA)

I = 1.05 for the standard league, which gives this equation for TT BsR Above Average:

TT BsRAbvAvg = (A+ 2.41PA)*(B + 2.44PA)/(B + C + 7.86PA) + D - .87PA

For McGwire, this gives a value of +90.76 runs above average.

These formulas are very long and confusing. One thing we can do is differentiate them and state them as a new set of custom LW for the team now that we have added the player. The formula for this is:

LW = ((B + C + (F + G)*PA)*((A + E*PA)*(b + F*p) + (B + F*PA)*(a + E*p)) - (A + E*PA)*(B + F*PA)*(b + c + F*p + G*p))/((B + C + (F + G)*PA)^2) + d - I*p + H*p

In this formula, the lowercase a, b, c, and d are the derivatives of the A, B, C, and D factors with respect to each event, and p is the derivative of the plate appearance function for each event, where PA = AB + W + HB + SH + SF. In the case of McGwire, we know that the LBsR weights for the standard league (displayed as S, D, T, HR, W, O) are .476, .806, 1.136, 1.495, .320, -.095, which gives him 168.38 runs. Using the formula above, we get .490, .823, 1.157, 1.499, .331, -.103, which produces 169.03. These results are similar to what you get by calculating the new rate stats for the team with our player added (for example, TmROBA = 1/9*ROBA + 8/9*LgROBA, and so on in this fashion) and then using the classic LW-from-BsR formula to find the LW, with TmROBA as A, TmAF as B, etc.
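The derivative formula can be checked for the one event whose coefficients are unambiguous from the text alone, the batting out. The sketch below assumes an out adds 1 to C and 1 to PA and nothing to A, B, or D (so a = b = d = 0, c = 1, p = 1); that assumption, the PA total of 681, and the function name are not from the text.

```python
# League rates for the 1961-2002 "standard" league, as quoted in the text.
LG_ROBA, LG_AF, LG_OA, LG_HRPA = 0.3007, 0.3047, 0.6763, 0.0230

def tt_linear_weight(A, B, C, D, PA, a, b, c, d, p):
    """Custom linear weight for one event, from differentiating absolute TT BsR.
    a, b, c, d, p are the event's coefficients in A, B, C, D, and PA."""
    E, F, G, H = (8 * r for r in (LG_ROBA, LG_AF, LG_OA, LG_HRPA))
    I = 8 * (LG_ROBA * LG_AF / (LG_AF + LG_OA) + LG_HRPA)
    tm_a = A + E * PA
    tm_b = B + F * PA
    denom = B + C + (F + G) * PA
    # Quotient rule on TmA*TmB/(TmB + TmC).
    numer = denom * (tm_a * (b + F * p) + tm_b * (a + E * p)) \
        - tm_a * tm_b * (b + c + F * p + G * p)
    return numer / denom ** 2 + d - I * p + H * p

# Out event for McGwire (PA = 681 and the out coefficients are assumptions).
out_lw = tt_linear_weight(244, 267.69, 357, 70, 681, a=0, b=0, c=1, d=0, p=1)
print(round(out_lw, 3))
```

Under those assumptions the out value comes out at about -.103, in line with the last entry of the custom weights listed above. The other events need the B-factor coefficients from the BsR article, which are not repeated here.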

You can also use the above formula with the Above Average TT formula; the only difference is that you have to use the different I for the nine-man lineup. For McGwire, this gives these LW: .373, .707, 1.040, 1.383, .215, -.220. The effect of this technique is to subtract the league R/PA (as figured by BsR) from each event that accounts for a PA (or, in the lingo of the method above, has p = 1), and to make no change to any event that does not account for a PA (p = 0). This happens because the only difference between the two formulas is the difference in the I values. The above-average I value is 9*LgR/PA, and the absolute I value is 8*LgR/PA. So the difference is LgR/PA, but this is only multiplied by PA. So an event like a steal that does not account for a PA does not lose any value at all between the two formulas. This probably illustrates that the TT Average technique is a shortcut but not a solution, because the difference is based on subtracting PA rather than comparing to outs or team outs, etc. The best way to find the TT BsR above some baseline would probably be to first find the Absolute TT BsR and then apply some baseline comparison as you would with any other runs created estimate.

When the Theoretical Team procedure is applied to Runs Created, it just so happens that TT RC = 1/9*Traditional RC + 8/9*Linear RC. In the past, I have incorrectly taken this fact as proof in my own mind and said that the same was true for Base Runs. It is not true. I am not quite sure of the technical reasons for this, but I believe it is because the RC formula is pure multiplication: A*B*(1/C), if you will. But BsR involves two additions (B + C, and adding D to the whole thing), and I think this eliminates the property. Anyway, it still comes pretty close to this. You can set up this equation:

TT BsR = x(BsR) + (1 - x)(LBsR)

If you solve for x:

x = (TT BsR - LBsR)/(BsR - LBsR)

If you do this for McGwire, you find that his TT BsR is made up 10.6% of his Straight BsR and 89.4% of his Linear BsR.

So far we have assumed that the player keeps the same number of PAs he had in actuality when we move him onto a new team. But we know that this, too, is a simplification. Just as the batter changes the run values of the team he is on by changing the context, his ability to avoid outs (or, equivalently if we ignore outs made on the basepaths, to get on base) will directly impact the number of Plate Appearances his team will have in which to score runs. To account for this, we will add a new factor called PAR to the Theoretical Team BsR formulas.

Before we do this, though, it should be pointed out that when we do this we are leaving the realm of attempting to estimate the number of runs the player has actually created and are trying to estimate the number of runs the player would theoretically create if added to an otherwise average team. For one thing, the player's actual PA already incorporate the effect of the extra PAs he adds by getting on base. So we can easily overstate his impact by allowing him to further inflate his PA on an average team after inflating his own PA on his own team. If the team he actually plays for has an above average rate of getting on base with him included, we will overstate the PA he will wind up with on his theoretical team. What we could do is find the actual percentage of his actual team's PAs that he used, convert this to an equivalent percentage on an average team, and plug that into the formula.

However we choose to do this, we will have some number for PA and go from there. The first step will be to calculate what I will call Not Out Average (NOA). NOA is simply the percentage of Plate Appearances that do not result in outs as recorded in the official statistics. NOA = (H + W + HB - CS - DP)/(AB + W + HB + SH + SF). We will further say that the denominator AB + W + HB + SH + SF = P (replacing PA in the formulas to come), and that the numerator H + W + HB - CS - DP = N. The derivatives of these (each event that is counted in P or N has a p or n of 1, respectively) will be called p and n.

We will first calculate the NOA for the team with our player added as TmNOA = NOA*(1/9) + LgNOA*(8/9). We know that PA/G can be estimated as X/(1 - NOA), where X is the number of outs/game in the league that are accounted for in the official statistics. So we want the ratio between the PA/G for the team with our player and the PA/G without our player, which we will call PAR for PA Ratio (this is a term I have borrowed from David Smyth). PAR = (X/(1 - TmNOA))/(X/(1 - LgNOA)). Simplifying this results in PAR = (1 - LgNOA)/(1 - TmNOA). Running through this with McGwire, the LgNOA = .3150, NOA = .4680, TmNOA = .3320, and PAR = 1.0254. So an average team with McGwire getting 1/9 of their PA will wind up with 2.54% more PA than a totally average team.
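The PAR arithmetic for McGwire is short enough to verify directly. A minimal sketch, using the NOA figures quoted above; the function name and the `share` parameter are illustrative.

```python
def pa_ratio(noa, lg_noa, share=1 / 9):
    """Return (TmNOA, PAR) for a team where the player takes `share` of the PA
    and the rest go to league-average hitters."""
    tm_noa = share * noa + (1 - share) * lg_noa
    par = (1 - lg_noa) / (1 - tm_noa)
    return tm_noa, par

tm_noa, par = pa_ratio(noa=0.4680, lg_noa=0.3150)
print(round(tm_noa, 4), round(par, 4))
```

The outs-per-game constant X never appears in the function because it cancels out of the ratio, exactly as in the simplification above.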

We then need to change each factor that we put in the BsR equation to account for PAR. For example, we started with TmA = A + E*PA. When PAR is incorporated, this is now TmA = A*PAR + E*PA*PAR, which can be rewritten as TmA = (A + E*PA)*PAR. The TmB, TmC, and TmD calculations are analogous. We then simply substitute these formulas into the original TT BsR formulas to get:

TT BsR w/ PAR = PAR*((A + E*P)*(B + F*P)/(B + C + (F + G)*P) + (D + H*P)) - I*P

Remember, we are now using P as the abbreviation for our player's Plate Appearances. As you can see, the I*P portion is not multiplied by PAR. This is because this part represents the number of runs the team would score without our player. PAR measures the effect of our player on the team PA/G, so it is irrelevant to how many runs the team would score if he did not play for them.

Just as with the original formula, we can easily compare to average by changing the I value as done previously. With PAR, we find McGwire's absolute TT BsR as 189.26 and +110.99 above average.
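Here is a hedged sketch of the PAR version of the calculation for McGwire, built from the rounded inputs quoted in the text. The PA total of 681 is an assumption (it is not stated in the text), and the function name is illustrative.

```python
# League rates for the 1961-2002 "standard" league, as quoted in the text.
LG_ROBA, LG_AF, LG_OA, LG_HRPA = 0.3007, 0.3047, 0.6763, 0.0230

def tt_bsr_with_par(A, B, C, D, P, noa, lg_noa):
    """Absolute TT BsR with the team's PA scaled by PAR.
    The 'without the player' term I*P is deliberately not scaled."""
    E, F, G, H = (8 * r for r in (LG_ROBA, LG_AF, LG_OA, LG_HRPA))
    I = 8 * (LG_ROBA * LG_AF / (LG_AF + LG_OA) + LG_HRPA)
    tm_noa = noa / 9 + lg_noa * 8 / 9
    par = (1 - lg_noa) / (1 - tm_noa)
    with_player = (A + E * P) * (B + F * P) / (B + C + (F + G) * P) + D + H * P
    return par * with_player - I * P

# McGwire 1998; P = 681 is an assumption, the other inputs are from the text.
mcgwire_par_tt = tt_bsr_with_par(244, 267.69, 357, 70, 681, noa=0.4680, lg_noa=0.3150)
print(round(mcgwire_par_tt, 1))
```

Because every input here is rounded, the result should be expected to land within a few tenths of a run of the absolute figure quoted above rather than match it to the hundredth.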

Just as we have done previously, we can differentiate this equation to see the intrinsic linear weights that it uses. It is a long formula with an even longer derivative, so I will break the derivative up into two pieces.

The first step is to find the derivative of PAR with respect to each event. This is done by first differentiating NOA with respect to each event to get dNOA/dX, where X is S, D, T, HR, etc. Then we differentiate PAR with respect to NOA to get dPAR/dNOA. From here, (dPAR/dNOA)*(dNOA/dX) = dPAR/dX. This results in this formula:

dPAR/dX = (1/9)*(1 - LgNOA)/((1 - TmNOA)^2)*(P*n - N*p) /(P^2)
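One way to gain confidence in a derivative like this is to compare it against a finite-difference estimate. The sketch below does that for an event that counts in both N and P (n = 1, p = 1). The inputs N and P are reconstructed from McGwire's quoted NOA and an assumed PA of 681, and the function names are illustrative.

```python
LG_NOA = 0.3150  # league Not Out Average quoted in the text

def par(N, P):
    """PA Ratio for a team that is 1/9 this player and 8/9 league average."""
    tm_noa = (N / P) / 9 + LG_NOA * 8 / 9
    return (1 - LG_NOA) / (1 - tm_noa)

def dpar_dx(N, P, n, p):
    """Analytic derivative of PAR with respect to an event that adds n to N and p to P."""
    tm_noa = (N / P) / 9 + LG_NOA * 8 / 9
    return (1 / 9) * (1 - LG_NOA) / (1 - tm_noa) ** 2 * (P * n - N * p) / P ** 2

P = 681.0            # assumed PA total (not stated in the text)
N = 0.4680 * P       # N implied by the quoted NOA

h = 1e-4             # small step for the central difference
numeric = (par(N + h, P + h) - par(N - h, P - h)) / (2 * h)
analytic = dpar_dx(N, P, n=1, p=1)
print(analytic, numeric)
```

The two values agree to many decimal places, which is what the chain-rule derivation predicts.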

We can then differentiate the entire PAR TT BsR equation to get the formula for the linear weights there. In the equation below, dPAR/dX represents the derivative of PAR, figured by the above formula, with respect to whatever event we are differentiating the PAR TT BsR formula for:

LW = PAR*(((B + C + (F + G)*P)*((A + E*P)*(b + F*p) + (B + F*P)*(a + E*p)) - (A + E*P)*(B + F*P)*(b + c + F*p + G*p))/((B + C + (F + G)*P)^2) + d + H*p) + ((A + E*P)*(B + F*P)/(B + C + (F + G)*P) + D + H*P)*(dPAR/dX) - I*p

Yes, that is the longest sabermetric equation I have ever published on this website, or anywhere else for that matter. When we do this for Big Mac, we find (in the same S, D, T, HR, W, O order) .633, .975, 1.317, 1.669, .471, -.176. Again, by changing I to the average value we can get the LW for TT BsR Above Average w/ PAR, and again the difference is to subtract LgR/PA from each event where p = 1.
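Since the closed-form derivative is so long, a numerical derivative of the TT BsR w/ PAR formula itself makes a useful cross-check. The sketch below does this for the batting out only, assuming an out adds 1 to C and 1 to P and leaves A, B, D, and N unchanged. That assumption, the PA total of 681, and the function name are not from the text.

```python
# League rates and NOA for the "standard" league, as quoted in the text.
LG_ROBA, LG_AF, LG_OA, LG_HRPA = 0.3007, 0.3047, 0.6763, 0.0230
LG_NOA = 0.3150

def tt_bsr_par(A, B, C, D, N, P):
    """TT BsR w/ PAR as a function of the player's raw factors, N, and P."""
    E, F, G, H = (8 * r for r in (LG_ROBA, LG_AF, LG_OA, LG_HRPA))
    I = 8 * (LG_ROBA * LG_AF / (LG_AF + LG_OA) + LG_HRPA)
    tm_noa = (N / P) / 9 + LG_NOA * 8 / 9
    par = (1 - LG_NOA) / (1 - tm_noa)
    team = (A + E * P) * (B + F * P) / (B + C + (F + G) * P) + D + H * P
    return par * team - I * P

A, B, C, D, P = 244, 267.69, 357, 70, 681.0   # P = 681 is an assumption
N = 0.4680 * P                                 # N implied by the quoted NOA

# Central difference: nudge C and P together, as one extra out would.
h = 1e-4
out_lw = (tt_bsr_par(A, B, C + h, D, N, P + h)
          - tt_bsr_par(A, B, C - h, D, N, P - h)) / (2 * h)
print(round(out_lw, 3))
```

Under those assumptions the out value comes out at about -.176, matching the last of the weights listed above.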

Applying Replacement Level

This is a real pain to calculate, and I don't use it, but I think it is a useful discussion to have for a number of reasons. If I wanted to apply a replacement level to TT BsR, I would calculate Absolute TT BsR and then apply the baseline from there. But we will look at the alternative.

To calculate Absolute TT BsR Above Replacement, all we would have to do is find an I value that would represent runs/PA for a team with 8 average players and 1 replacement player. The 8 average players part is easy, but in order to figure the replacement player in, we need to know how he will hit in terms of ROBA, AF, OA, and HRPA. Usually, though, we set replacement level as some percentage or linear difference of run production(be it in terms of per out or per PA, or Wins Above Average per PA, or R+/O+, or R+PA, etc.). But those assumptions don't tell us how the player will hit in terms of basic offensive events, just total production.

I will use 73% of the league runs/out as the baseline in this article(see the "Baselines" article for discussion of this), although you can apply a different baseline and still use the outlines of my procedure to do it. The first step will be to understand the Linear Weights Ratio(LWR). There are probably alternative ways to do this, but I have done it this way and it suits my purposes.

LWR is a great tool invented by Tango Tiger that takes the LW coefficients and converts them into a ratio of positive run production to outs. I have linked his little article on it at the bottom of the page, but will cover the basics again here. Before I start, I should discuss the treatment of various events in my concept of replacement level here. I am assuming that a replacement level player is a replacement level player because of his hitting performance (S, D, T, HR, W, outs). He will steal bases, bunt, hit sac flies, hit into DPs, etc., at a league average rate. There are certainly debatable assumptions in there, but you have to keep things reasonably simple.

To establish LWR, we put the positive value of S, D, T, HR, and W in the numerator. We then set the single weight to one and rescale all of the other coefficients based on their ratio to singles. So let d = LW(double)/LW(single), and t = LW(triple)/LW(single), etc. Then we have this formula(all of the terms in the formulas that follow unless otherwise marked apply to league statistics):

LWR = (S + d*D + t*T + hr*HR + w*W)/(AB-H)

For the standard league:

LWR = (S + 1.693*D + 2.386*T + 3.139*HR + .671*W)/(AB - H)

Once we have this, we can use this fact about LWR:

Runs/Out = LW(single)*LWR + LW(out)

For our league, the Runs/Out from LWR is .172 (the LWR itself is .562), and the LW out value is -.095. 73% of .172 is .126. What LWR will produce a R/O of .126? First, let x be the replacement rate (73%). Then RepLWR is given by the equation:

RepLWR = (x*(LgR/O) - out value)/LW(single)

This results in .464, which converts back to .126 runs/out.
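The round trip from league LWR to replacement LWR and back can be sketched as follows, using the standard-league values quoted above. The variable and function names are illustrative.

```python
LW_SINGLE = 0.476   # linear weight of a single, standard league
LW_OUT = -0.095     # linear weight of an out, standard league
LG_LWR = 0.562      # league Linear Weights Ratio quoted in the text

def runs_per_out(lwr):
    """Runs/Out implied by a given LWR."""
    return LW_SINGLE * lwr + LW_OUT

def replacement_lwr(x, lg_lwr):
    """LWR that produces x times the league runs/out (x = 0.73 in the text)."""
    return (x * runs_per_out(lg_lwr) - LW_OUT) / LW_SINGLE

lg_ro = runs_per_out(LG_LWR)
rep_lwr = replacement_lwr(0.73, LG_LWR)
rep_ro = runs_per_out(rep_lwr)
print(round(lg_ro, 3), round(rep_lwr, 3), round(rep_ro, 3))
```

Swapping in a different value of x is all it takes to apply another percentage-of-league baseline.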

So we know that a replacement player will put up a LWR of .464. Now we need to convert this relationship back into the effect on his component stats. What we do first is find a value that I will call Y. Y is the ratio of the quantity of "positive" in the LWR that the league has generated from a given event divided by the quantity of "positive" it has generated from singles. To illustrate, the standard league has a single per PA of .166. On a per PA basis, the positive LWR contribution of singles is 1*.166 = .166. The league has double/PA of .041. The positive LWR contribution of doubles per PA is 1.693*.041 = .069. .069/.166 = .418 is the Y value for doubles. Sum up the Y values for all events(including singles). Or if you prefer a formula:

Y = 1 + (d*D/P)/(S/P) + (t*T/P)/(S/P) + (hr*HR/P)/(S/P) + (w*W/P)/(S/P)

Y is 2.290 for the standard league.

We also need another quantity, Z. Z is simply the ratio of the rate of a given event divided by the rate of singles. So Z for doubles is .041/.166 = .244, and the formula for the summed Z values is:

Z = 1 + (D/P)/(S/P) + (T/P)/(S/P) + (HR/P)/(S/P) + (W/P)/(S/P)

Z is 1.950 for the standard league.

What exactly have these Y and Z steps done? They have converted all of the contribution of doubles, triples, home runs, and walks into an equivalent number of singles. What we are saying is that for the standard league, the quantity of positive LWR is equivalent to 2.290 times the number of singles(this is Y), and the number of runners on base is equivalent to 1.950 times the number of singles(this is Z). This procedure is in a similar spirit to the "Willie Davis method" introduced by Bill James in the New Historical Baseball Abstract, in which he expresses everything in terms of an equivalent number of hits. Why does he do this? Because it allows you to have one variable to solve for in an equation instead of five. Once we find the value of S that we are looking for, we can convert this back into D, T, HR, and W values.

What we are after is the rate at which a replacement player would hit singles to produce a .464 LWR. We have this equation:

RepLWR = Y*X/(1 - Z*X)

Where X is the S/PA for the replacement player. The equation to solve for X is:

X = RepLWR/(Y + RepLWR*Z)

So for the standard league, X = .145. The replacement player will get a single in 14.5% of his PAs compared to 16.6% for an average player. Since we have assumed that S, D, T, HR, and W will all be reduced by the same percentage, we divide .145 by .166 to get the "Multiplier". So Multiplier = X/(S/P) and is .875 for the standard league. So the replacement level player in the standard league will hit singles, doubles, triples, homers, and draw walks, at 87.5% of the rate that an average player would. Just to be absolutely clear, Rep(D/P) = (D/P)*Multiplier, and so on.
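The solve for X and the resulting Multiplier can be reproduced from the summed Y and Z values and the league single rate quoted above. This is a sketch with illustrative names; it also plugs X back into the RepLWR equation to confirm the algebra.

```python
Y = 2.290        # summed Y for the standard league
Z = 1.950        # summed Z for the standard league
REP_LWR = 0.464  # replacement-level LWR from the text
LG_S_PER_PA = 0.166  # league singles per PA

def replacement_single_rate(rep_lwr, y, z):
    """Solve RepLWR = Y*X/(1 - Z*X) for X, the replacement player's S/PA."""
    return rep_lwr / (y + rep_lwr * z)

x = replacement_single_rate(REP_LWR, Y, Z)
multiplier = x / LG_S_PER_PA

# Plug X back into the original equation as a check on the algebra.
check_lwr = Y * x / (1 - Z * x)
print(round(x, 3), round(multiplier, 3), round(check_lwr, 3))
```

The same multiplier is then applied to each of the league's D/P, T/P, HR/P, and W/P rates to get the replacement player's line.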

For the out value, there are two mathematically equivalent techniques. One is to find Rep(O/P) as 1 - Rep(S/P) - Rep(D/P) - Rep(T/P) - Rep(HR/P) - Rep(W/P). The second is to figure Rep(O/P) as 1 - (1 - O/P)*Multiplier. The second equation is essentially equivalent to saying that the OBA for the replacement player will be 87.5% of the OBA for the average player as well.

Once we have calculated the S, D, T, HR, W, and O per PA for a replacement player, we can calculate the ROBA, AF, OA, and HRPA for him (ignoring all terms other than those we have for the replacement player: S, D, T, HR, W, and O). We then calculate rtROBA as (1/9)*RepROBA + (8/9)*LgROBA (rtROBA is "replacement team" ROBA; that is, the ROBA of a team that is 8/9 average and 1/9 replacement). We calculate the other terms similarly and then figure the I value for the replacement comparison as:

I = (rtROBA*rtAF/(rtAF + rtOA) + rtHRPA)*9

For the standard league, I = 1.02 and McGwire is +108.38 runs above replacement. We can also apply PAR using the same formulas as above.

Let me now just briefly discuss the method I used to find stats for a replacement player. One major weakness that I already mentioned was limiting the difference between the replacement player and an average player to only the basic hitting events. Another is that I assume that among the basic hitting events, all deflate equally. The replacement player in the standard league has a rate of 12.5% fewer singles, 12.5% fewer doubles, etc. I have not studied the issue, but I would assume that replacement-type players lose more in secondary offensive skills (power and walks) than they do in singles. Of course, you also get into an issue of whether the replacement player should be based on the various definitions of replacement level that have been offered, or whether it should be theoretical. If you are looking for a theoretical approach, assuming equal deflation of all basic offensive events can be justified.

Another concern is how to define replacement level, or baseline to be more general. I have used a default of 73% of league runs/out which corresponds to a .350 Offensive Winning Percentage which was used by Bill James and continues to be used by many analysts. Then I have used Linear Weight Ratio to estimate how their component stats would turn out. However, it might actually be more appropriate to set replacement level as a percentage of league LWR or some other approach. The method I have laid out here could be modified for other choices of definition, but it is not ready to handle another definition as is.

I will also point out that the replacement definition method has some broader applications than just replacement level. Suppose you have positional adjustments defined as a percentage of league R/O, as I do elsewhere on this site. If first basemen perform at 115% of the league average R/O, what should their BA/OBA/SLG be? You can use the replacement level method here to get an estimate for that. Or suppose you know that a park inflates runs by 10%. How much should it inflate OBA by? (If it affects all events equally, which it probably doesn't. But it could tell you what a theoretical park would do. Or maybe you know it won't affect walks, so you could hold those constant. You get the idea.) I'm sure you could think up other uses as well. But that's another article.


Tango Tiger's LWR Page

Base Runs Spreadsheet
