This series revolves around my attempt to build a new Base Runs formula (for those living in a sabermetric cave, Base Runs is the multiplicative run estimator developed by David Smyth) that is based only on the official offensive stats, but that includes all of them. As such, it breaks no new ground, regurgitates a bunch of stuff about run estimators that you already know, and will generally infuriate those people who say “Enough with all of the run estimators already! Quit beating that dead horse and get on to more important stuff like whatever my personal matter of interest is.” I don’t see that as a bad thing; while it is true that a lot of progress has been made on run estimators and offensive evaluation in general, and that what I am doing here is definitely old hat, it just happens to be one of the things that I am interested in, so I will continue to dabble in it. If you’re not interested, or think you already know what you need to know on the subject, great.
Incidentally, I’ve had this sitting here for a while now, but I decided to run it today after reading Tango’s post on BsR, RC, and Bill James at Inside the Book.
Part one will focus on establishing a long-term set of Linear Weight values that can be used as a basis for the Base Runs formulas. Part two will discuss building those Base Runs formulas, and part three will look briefly at their accuracy.
To start out with, we need a good set of Linear Weight values. The best way to generate LW values, particularly as a starting point (we can’t use Base Runs to estimate them, since we are trying to build a BsR formula based on them), is empirically by measuring the change in run expectancy for each play. The linear weight of a given event is then just the average change in run expectancy for plays of that type.
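As a rough illustration of the empirical approach, here is a minimal Python sketch. The function name, the tuple layout, and the sample plays are all made up for illustration; I am also assuming the usual convention that a play's run value is the change in run expectancy plus any runs that scored on the play.

```python
from collections import defaultdict


def empirical_linear_weights(plays):
    """Average the run value of each event type.

    plays: iterable of (event, re_before, re_after, runs_scored) tuples.
    Assumed convention: run value = re_after - re_before + runs_scored.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for event, re_before, re_after, runs_scored in plays:
        totals[event] += re_after - re_before + runs_scored
        counts[event] += 1
    return {event: totals[event] / counts[event] for event in totals}


# Hypothetical plays, not real Retrosheet data.
sample_plays = [
    ("S", 0.50, 0.90, 0),  # single, nobody scores
    ("S", 0.25, 0.50, 1),  # single, one run scores
    ("K", 0.50, 0.25, 0),  # strikeout
]
weights = empirical_linear_weights(sample_plays)
```

With real play-by-play data the same averaging would run over every play in a league-season rather than three toy rows.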
One can figure LW in this manner from Retrosheet data with the requisite technical skills, but luckily, we don’t need to do that. Tom Ruane has published empirical LW for each league, 1960-2004 ("The Value Added Approach to Evaluating Performance" by Tom Ruane, available from the "Research Papers" link under Features). He used only the official categories, and he used all of them, which means that his weights are perfect for our purposes.
To get a long-term set of coefficients, I weighted each year’s figures by the league PA for that year. This process produces these weights (I carried the coefficients to three decimal places as that is how Ruane presented them):
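The weighting step can be sketched in a few lines of Python. The function name and the two league-seasons below are hypothetical stand-ins, not Ruane's actual figures; the point is only to show the PA-weighted average rounded to three decimals.

```python
def pa_weighted_average(league_seasons):
    """Combine per-league-season weights into one long-term set.

    league_seasons: list of (league_pa, {event: weight}) pairs.
    Each event's weight is averaged across seasons, weighted by league PA.
    """
    total_pa = sum(pa for pa, _ in league_seasons)
    events = league_seasons[0][1].keys()
    return {
        event: round(sum(pa * w[event] for pa, w in league_seasons) / total_pa, 3)
        for event in events
    }


# Hypothetical league-seasons for illustration only.
seasons = [
    (60000, {"S": 0.450, "HR": 1.400}),
    (90000, {"S": 0.470, "HR": 1.410}),
]
long_term = pa_weighted_average(seasons)
```

The season with more PA pulls the combined value toward its own weights, which is the whole idea of weighting by PA.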
LW = .458S + .754D + 1.035T + 1.403HR + .302(W - IW) + .172IW + .327HB + .192SB - .427CS - .232(AB - H - K - DP) - .271K - .785DP - .093SH - .010SF
There is a slight problem, which is that these weights do not add up to zero when applied to the composite stats for the period. In order to fix this, I found the shortfall per PA, and then added this to all of the events that represent a PA (in other words, all except stolen base attempts). This adds about .002 to each event, so the new weights look quite similar:
LW = .460S + .756D + 1.037T + 1.405HR + .304(W - IW) + .174IW + .329HB + .192SB - .427CS - .230(AB - H - K - DP) - .269K - .783DP - .091SH - .008SF
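For anyone who wants to see the rebalancing mechanically, here is a small Python sketch. The function name and the toy totals are assumptions of mine (I am not reproducing the composite totals for the period here); the logic is just "find the net runs, divide by PA, and add that to every PA event."

```python
def rebalance_to_zero(weights, totals, non_pa_events=("SB", "CS")):
    """Shift PA-event weights so total linear weight runs sum to zero.

    weights: {event: run value}; totals: {event: count over the period}.
    Events in non_pa_events (stolen base attempts) are left alone.
    """
    net_runs = sum(weights[e] * totals[e] for e in weights)
    pa = sum(totals[e] for e in weights if e not in non_pa_events)
    adjustment = -net_runs / pa  # shortfall per PA
    adjusted = {
        e: (w if e in non_pa_events else w + adjustment)
        for e, w in weights.items()
    }
    return adjusted, adjustment


# Toy league with only four event types, for illustration.
toy_weights = {"H": 0.5, "OUT": -0.25, "SB": 0.2, "CS": -0.4}
toy_totals = {"H": 300, "OUT": 700, "SB": 50, "CS": 25}
balanced, per_pa = rebalance_to_zero(toy_weights, toy_totals)
net_after = sum(balanced[e] * toy_totals[e] for e in balanced)
```

In the toy league the weights are 25 runs short over 1,000 PA, so each PA event gets .025 added and the total comes out to zero.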
These now sum to zero for the period as a whole as we would expect (I have included a chart at the bottom of the post with the weights as well as the frequency of each event in the period). Obviously, these are baselined against an average player; for our purposes we need the absolute runs formula. Converting them is a very simple process. First, we need to find the runs per out (AB - H + DP + SH + SF in this case) for the period. The R/O is .162.
The R/O is then added to all of the events that include an out, and twice to DPs, since obviously those include two outs. We now have these coefficients:
LW = .460S + .756D + 1.037T + 1.405HR + .304(W - IW) + .174IW + .329HB + .192SB - .265CS - .068(AB - H - DP - K) - .107K - .459DP + .071SH + .154SF
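The conversion from the average baseline to absolute runs is easy to script. This Python sketch uses the zero-sum weights and the .162 R/O from above; the dictionary keys (for instance "UBB" for W - IW and "OUT" for AB - H - K - DP) and the function name are my own labels.

```python
RUNS_PER_OUT = 0.162

# Zero-sum (average-baseline) weights from the formula above.
AVERAGE_BASELINE = {
    "S": 0.460, "D": 0.756, "T": 1.037, "HR": 1.405,
    "UBB": 0.304, "IW": 0.174, "HB": 0.329,
    "SB": 0.192, "CS": -0.427,
    "OUT": -0.230, "K": -0.269, "DP": -0.783,
    "SH": -0.091, "SF": -0.008,
}

# Outs recorded on each event; events not listed involve no out.
OUTS_PER_EVENT = {"CS": 1, "OUT": 1, "K": 1, "DP": 2, "SH": 1, "SF": 1}


def to_absolute(weights, runs_per_out, outs_per_event):
    """Add runs per out once for each out the event produces."""
    return {
        event: round(w + runs_per_out * outs_per_event.get(event, 0), 3)
        for event, w in weights.items()
    }


absolute = to_absolute(AVERAGE_BASELINE, RUNS_PER_OUT, OUTS_PER_EVENT)
```

Hits, walks, hit batsmen, and steals pass through unchanged, while every out event moves up by .162 (and the double play by .324).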
Finally, I will use Ruane’s weights to work out a formula that includes only the basic offensive data (AB, H, D, T, HR, W, SB, CS). First, we can eliminate IW by finding the weighted average of IW and unintentional walks (.292 runs). The outs are easily taken care of by finding the weighted average for the three AB-H events (K, DP, and neither), which is -.088.
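Folding two categories into one is just a frequency-weighted average. In this Python sketch the walk counts are invented for illustration (a 91/9 split between unintentional and intentional walks), not the actual frequencies for the period; the same helper would apply to the three AB - H events.

```python
def weighted_average(values_and_counts):
    """Frequency-weighted average of (run value, count) pairs."""
    total = sum(count for _, count in values_and_counts)
    return sum(value * count for value, count in values_and_counts) / total


# Hypothetical counts: 910 unintentional walks and 90 intentional walks.
walk_value = weighted_average([(0.304, 910), (0.174, 90)])
walk_value_rounded = round(walk_value, 3)
```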
Still left to be dealt with are HB, SH, and SF. Since we are treating these as non-events, there is no pre-existing category that they can be folded into, and we are short a bunch of runs because we have pretended they don’t exist. The way I will address this is by adding the shortfall per PA (under our new definition of PA, AB + W) to each PA event (again, everything but stolen base attempts). This adds .004 to each event and gives these weights:
LW = .464S + .760D + 1.041T + 1.409HR + .296W + .192SB - .265CS - .084(AB - H)
Alternatively, we could add the shortfall only to the hits and walks, which actually works just a little better in practice (in terms of RMSE for the teams in our sample), so here are those weights as well:
LW = .473S + .769D + 1.050T + 1.418HR + .304W + .192SB - .265CS - .088(AB - H)
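To see how the two basic versions compare on a single stat line, here is a short Python sketch. The team line is hypothetical and the function and dictionary names are mine; singles are derived as H - D - T - HR.

```python
# The two basic formulas above: shortfall spread over all PA events,
# or spread over hits and walks only.
ALL_PA_WEIGHTS = {"S": 0.464, "D": 0.760, "T": 1.041, "HR": 1.409,
                  "W": 0.296, "SB": 0.192, "CS": -0.265, "OUT": -0.084}
HITS_WALKS_WEIGHTS = {"S": 0.473, "D": 0.769, "T": 1.050, "HR": 1.418,
                      "W": 0.304, "SB": 0.192, "CS": -0.265, "OUT": -0.088}


def basic_lw_runs(line, weights):
    """Estimate runs from AB, H, D, T, HR, W, SB, CS."""
    events = {
        "S": line["H"] - line["D"] - line["T"] - line["HR"],
        "D": line["D"], "T": line["T"], "HR": line["HR"],
        "W": line["W"], "SB": line["SB"], "CS": line["CS"],
        "OUT": line["AB"] - line["H"],
    }
    return sum(weights[e] * events[e] for e in weights)


# Hypothetical team season, for illustration only.
team = {"AB": 5500, "H": 1450, "D": 280, "T": 30, "HR": 160,
        "W": 520, "SB": 100, "CS": 45}
runs_all_pa = basic_lw_runs(team, ALL_PA_WEIGHTS)
runs_hits_walks = basic_lw_runs(team, HITS_WALKS_WEIGHTS)
```

For a roughly average line like this one the two versions land within about a run of each other; they will diverge more for teams with unusually high or low on-base rates.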