In this installment I will derive Base Runs (BsR) equations from the 1960-2004 linear weights (built on Tom Ruane’s work) introduced in the last segment. I will present four different BsR versions here, which is admittedly a bit of overkill. Each follows a slightly different approach to the BsR model. First, we need to recall that BsR is A*B/(B + C) + D. The A factor is baserunners, the B factor is advancement, the C factor is outs, and the D factor is guaranteed runs.
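As a minimal sketch, the core formula can be written as a small Python function. The function and argument names are illustrative labels, not anything from the original series, and the numbers in the call are made up purely to show usage.

```python
def base_runs(a, b, c, d):
    """Return A*B/(B + C) + D for the four BsR factors."""
    return a * b / (b + c) + d


# Made-up factor totals (not real team data), just to show the call.
estimate = base_runs(a=2000, b=1500, c=4000, d=150)
print(round(estimate, 1))
```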
There are a number of different ways to define each factor. Starting with A, we could look at A as “initial baserunners” or “final baserunners”. Initial baserunners would include all runners who reached base, or, since we are only dealing with the official categories here, H + W - HR + HB. This counts all of the runners that we know have reached base, and ignores any knowledge that we have about what happened once they were on base.
The final baserunner approach starts from the same count, but then removes the baserunners we know were put out on base. There are two categories in the official stats that tell us a runner was retired on base: CS and DP (grounded into double play, but I have referred to them simply as DP throughout the series). So final baserunners would be figured as H + W - HR + HB - CS - DP.
Now we can move on to C, a factor that is usually defined as batting outs, which in this case would be AB - H + SH + SF. It could also be set up as all outs, which would be AB - H + SH + SF + CS + DP. Then you could mix and match the two versions of C with the two versions of A. So there will be one version that is initial baserunners with batting outs, one that is initial baserunners with all outs, etc. Again, this most definitely is overkill, but I think it will be interesting to take a look at the results either way, and you can choose whichever one you want (most people who have gone before have used the initial baserunner, batting out approach, and if I had to recommend just one, I would agree with that choice).
D as guaranteed runs is usually glossed over as simply being equal to home runs, but that is not inevitable. It is true, for instance, that SF are a case in the official stats for which we know for a fact that a run was scored, and SF could be included in guaranteed runs. This approach could eventually lead to some absurdities; if the record-keeping was as detailed for all events as it is for SF, we could eventually have categories like “1-RBI singles”, “2-RBI doubles”, “1-RBI triples”, and we would end up with an equation that said Runs = Runs.
Taking the opposition to SF in the D factor a step further, the same general argument can be used against including a situational event like a SF or DP at all. After all, sacrifice flies are simply a subcategory of flyouts, and double plays of groundouts. The record keepers have not deigned to give us such minute breakdowns for other categories--as Tango Tiger has pointed out, events like caught stealing include cases in which there is actually no out recorded, and batting outs include reached on error, etc.
However, given that the data does exist, there are some people who want to utilize it. Also, the technical versions of Runs Created, which ideally would be supplanted by Base Runs, use this data, and so those users would presumably want a replacement equation based on the same inputs. For those who prefer the more granular approach, Tango Tiger’s full BsR version has already done the heavy lifting for you.
In addition to the potential for using SF as a D input, one could also go through and add fractional values for guaranteed runs on other events. For example, there will be some proportion of triples that result in runs due to an error that allows the batter to score. One could have these fractional categories in the A, C, or D factors. For example, Tango Tiger’s full BsR version counts 8% of SH towards A, as around 8% of batters credited with a sacrifice reach base safely. Another example is that one could put a fractional weight on CS in C, because not all CS result in outs. Leaving aside the question of how an estimate of “X% of triples result in runs” fits into a factor I’ve billed as “guaranteed runs” (obviously the words used to define the category can be finessed), I will just sidestep the whole issue by saying I have not dealt with fractional weights anywhere (except of course in B). That doesn’t mean that it would be illegitimate to do so.
Finally, we come back around to B, which I glossed over the first time. B is usually considered to be the nebulous “advancement”, but it can also be looked at as the balancing part of the formula, where the values of events are forced into line with what we know them to be for an average team. Since we have already established the formulas and values for A, C, and D, we can calculate the B factor necessary to force the linear weights to the long-term averages derived in the last post, and repeated here:
LW = .460S + .756D + 1.037T + 1.405HR + .304(W - IW) + .174IW + .329HB + .192SB - .260CS - .068(AB - H - DP - K) - .107K - .459DP + .071SH + .154SF
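For readers who want to apply the target weights to a stat line, here is a hedged Python sketch that simply evaluates the LW expression above. The dictionary keys are my own naming choice (note that "D" here means doubles, not the D factor), and singles are taken as H - D - T - HR.

```python
def linear_weight_runs(stats):
    """Evaluate the LW expression above; stat keys missing from the dict count as 0."""
    def g(key):
        return stats.get(key, 0)

    singles = g("H") - g("D") - g("T") - g("HR")
    other_outs = g("AB") - g("H") - g("DP") - g("K")  # outs that are neither K nor DP
    return (.460 * singles + .756 * g("D") + 1.037 * g("T") + 1.405 * g("HR")
            + .304 * (g("W") - g("IW")) + .174 * g("IW") + .329 * g("HB")
            + .192 * g("SB") - .260 * g("CS")
            - .068 * other_outs - .107 * g("K") - .459 * g("DP")
            + .071 * g("SH") + .154 * g("SF"))


# A lone single should come back as the single coefficient.
print(linear_weight_runs({"AB": 1, "H": 1}))
```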
I have described the process for doing this several times, and will not further clutter this page by doing it again, but here is a link to the BsR page in Tango Tiger’s wiki, where it is covered. Now, let me define four different BsR formulas (with their naming style in homage to Bill James) that we are going to look at, and the components they use (A, B, C, D):
Full-1: iA, B, bC, HR
Full-2: iA, B, aC, HR
Full-3: fA, B, bC, HR
Full-4: fA, B, aC, HR
iA = H + W - HR + HB (initial baserunners)
fA = H + W - HR + HB - CS - DP (final baserunners)
bC = AB - H + SH + SF (batting outs)
aC = AB - H + SH + SF + CS + DP (all outs)
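The four A and C pairings above can be sketched in Python as follows. The function names, the dictionary layout, and the sample stat line are all illustrative assumptions (the stat line is invented, not a real team); only the formulas come from the definitions above.

```python
def initial_baserunners(s):
    return s["H"] + s["W"] - s["HR"] + s["HB"]


def final_baserunners(s):
    return initial_baserunners(s) - s["CS"] - s["DP"]


def batting_outs(s):
    return s["AB"] - s["H"] + s["SH"] + s["SF"]


def all_outs(s):
    return batting_outs(s) + s["CS"] + s["DP"]


# Each version pairs one A definition with one C definition; D is HR in all four.
VARIANTS = {
    "Full-1": (initial_baserunners, batting_outs),
    "Full-2": (initial_baserunners, all_outs),
    "Full-3": (final_baserunners, batting_outs),
    "Full-4": (final_baserunners, all_outs),
}


def a_and_c(version, s):
    """Return the (A, C) pair for the named version."""
    a_func, c_func = VARIANTS[version]
    return a_func(s), c_func(s)


# Invented stat line, for illustration only.
sample = {"AB": 5500, "H": 1500, "W": 550, "HR": 170, "HB": 50,
          "CS": 50, "DP": 130, "SH": 60, "SF": 45}
for name in VARIANTS:
    print(name, a_and_c(name, sample))
```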
Writing out the formulas for each of the resulting B factors would be very cumbersome, so I have put them in chart form:
You may notice some oddities in the B weights as you peruse the table. Most problematic is that the walk coefficient is negative. This obviously will cause a whole bunch of problems for theoretical situations with extreme walk rates. Thus, I have also presented a “corrected” version (in the style of Full-1, with initial baserunners and batting outs as the definitions for the A and C factors respectively; it is “F-1W” in the table) in which the walk is given a coefficient of .025--I chose this number haphazardly, and you could very well improve it. However, my primary objective here is just to clean up the obvious problems caused by assigning a negative advancement value to the walk. This requires some juggling of the other B coefficients, and the linear weights that it generates when applied to our long-term stats will no longer be the same as the target linear weights (the modified Ruane weights). This is unfortunate, but it also is necessary in this case to avoid having a negative weight for the walk:
A = H + W - HR + HB
B = .719S + 2.098D + 3.408T + 1.887HR + .025(W - IW) - .613IW + .109HB + .895SB - 1.211CS + .121(AB - H - K - DP) - .061K - 1.701DP + .769SH + 1.155SF
C = AB - H + SH + SF
D = HR
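Here is a hedged Python sketch of the F-1W version. The coefficients are copied from the equations above, and D is taken as HR because the version follows the Full-1 style; the function names and dictionary keys (with "D" meaning doubles) are my own labels.

```python
def f1w_factors(stats):
    """Return (A, B, C, D) for the F-1W version; missing stat keys count as 0."""
    def g(key):
        return stats.get(key, 0)

    singles = g("H") - g("D") - g("T") - g("HR")
    a = g("H") + g("W") - g("HR") + g("HB")
    b = (.719 * singles + 2.098 * g("D") + 3.408 * g("T") + 1.887 * g("HR")
         + .025 * (g("W") - g("IW")) - .613 * g("IW") + .109 * g("HB")
         + .895 * g("SB") - 1.211 * g("CS")
         + .121 * (g("AB") - g("H") - g("K") - g("DP"))
         - .061 * g("K") - 1.701 * g("DP") + .769 * g("SH") + 1.155 * g("SF"))
    c = g("AB") - g("H") + g("SH") + g("SF")
    d = g("HR")  # guaranteed runs, as in Full-1
    return a, b, c, d


def f1w_base_runs(stats):
    a, b, c, d = f1w_factors(stats)
    return a * b / (b + c) + d


# Sanity check: a home run with nobody else on base is worth exactly one run.
print(f1w_base_runs({"AB": 1, "H": 1, "HR": 1}))
```

The home run case is a useful spot check because A is zero there, so the whole estimate comes from the D factor.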
Here is a comparison of the target weights (“Ruane”), the weights generated by this equation (“Result”), and the difference (Result-Ruane):
Thus, we still have a pretty decent match for the target weights, with the walk not surprisingly as the biggest source of error.
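One way to read "the weights generated by this equation" is as the marginal value of one more event, that is, the partial derivative of A*B/(B + C) + D at the long-term totals. The post does not spell the calculation out here, so treat the following Python sketch as an assumption about the method rather than a record of it. The totals in the example call are invented placeholders, not the 1960-2004 figures.

```python
def implied_weight(A, B, C, a, b, c, d=0.0):
    """Marginal runs from one more event that adds a, b, c, d to the A, B, C, D factors.

    This is the derivative of A*B/(B + C) + D with respect to the event count,
    evaluated at the current factor totals A, B, C.
    """
    return (a * B * (B + C) + A * (b * C - B * c)) / (B + C) ** 2 + d


# Placeholder totals (NOT the real long-term data), just to show the call.
A_tot, B_tot, C_tot = 1930.0, 1400.0, 4105.0

# An unintentional walk in F-1W adds 1 to A, .025 to B, and nothing to C or D.
walk_weight = implied_weight(A_tot, B_tot, C_tot, a=1, b=.025, c=0)
print(round(walk_weight, 3))
```

Running the same function over every event, with the real totals, would produce a "Result" column to set against the target weights.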
Let me close this out by also presenting a formula that only looks at the basic events, and matches these weights that we derived last time:
LW = .473S + .769D + 1.050T + 1.418HR + .304W + .192SB - .265CS - .088(AB - H)
I have two versions here; one is with initial baserunners (which I will call B1, where iA is H + W - HR) and one with final baserunners (which I will call B2, where fA is H + W - HR - CS):
B1 = .764S + 2.169D + 3.503T + 1.985HR - .039W + .912SB - 1.258CS + .036(AB - H)
B2 = .762S + 2.247D + 3.658T + 2.098HR - .087W + .964SB + .283CS + .032(AB - H)
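As a sketch, the A and B factors for the two basic versions can be computed like this in Python. The function names and dictionary keys are my own labels ("D" is doubles), and singles are taken as H - D - T - HR.

```python
def _singles(g):
    return g("H") - g("D") - g("T") - g("HR")


def b1_factors(stats):
    """Return (A, B) for the basic initial-baserunner version."""
    def g(key):
        return stats.get(key, 0)

    a = g("H") + g("W") - g("HR")
    b = (.764 * _singles(g) + 2.169 * g("D") + 3.503 * g("T") + 1.985 * g("HR")
         - .039 * g("W") + .912 * g("SB") - 1.258 * g("CS")
         + .036 * (g("AB") - g("H")))
    return a, b


def b2_factors(stats):
    """Return (A, B) for the basic final-baserunner version."""
    def g(key):
        return stats.get(key, 0)

    a = g("H") + g("W") - g("HR") - g("CS")
    b = (.762 * _singles(g) + 2.247 * g("D") + 3.658 * g("T") + 2.098 * g("HR")
         - .087 * g("W") + .964 * g("SB") + .283 * g("CS")
         + .032 * (g("AB") - g("H")))
    return a, b


print(b1_factors({"AB": 1, "H": 1}))
print(b2_factors({"AB": 1, "H": 1}))
```

To turn these into a run estimate, plug A and B into A*B/(B + C) + D. The C and D factors are not restated for the basic versions, so a choice such as C = AB - H and D = HR would be the reader's assumption.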
Finally, I apologize for any formatting errors in this post. Blogger seems to have changed its text editor, and when I copy and paste from Word, it leaves just one space after a period. This annoys me as a reader, but it is a real pain in the burro to go back and add all of the necessary spaces.
EDIT: Well isn't this lovely? Now it messed up the formatting for the entire front page. If Blogger can't make their editor workable for people who aren't HTML experts, then I'm going to have to go elsewhere. I'm a poor enough writer as it is; I don't need to have a blogging platform make my posts look like they were formatted by a future Michigan alum trying to pass kindergarten on his third attempt.