Walk Like a Sabermetrician: Occasional commentary on baseball and sabermetrics<br /><br />2020-03-23 | Tripod: Theoretical Team Base Runs<br /><br /><i>See the first paragraph of <a href="https://walksaber.blogspot.com/2020/01/tripod-ability-v-value.html">this post</a> for an explanation of this series.</i><br /><br />While Base Runs is an incredibly flexible run estimator when it comes to working across a wide range of contexts, as a multiplicative formula it is not directly applicable to individual batters. However, there are a number of ways that you can use Base Runs to assist in your evaluation of batters. One way is to use Base Runs to calculate Linear Weights for your entity, and then apply these weights to the individual batters in the entity. You can find the weights for the 1978 AL and calculate Reggie Jackson's linear weights from this. Or you could find the weights for the Yankees and get a measure of Jackson's run creation in his own team context. Or you could find the weights for the Red Sox and see how many runs Jackson would have created in that context. The possibilities are close to limitless.<br /><br />However, when you calculate Jackson's value in the Red Sox's context, you have not accounted for the fact that if Jackson played for the Red Sox, he would change that context. If you want to include this effect, things get a bit more complicated.<br /><br />The basic ideas in this area were pioneered by David Tate, who published a method called Marginal Lineup Value which used Runs Created in a similar way. Keith Woolner also played an important role in the development of MLV.
While the options I am detailing here are not directly adapted from Marginal Lineup Value, many of the ideas are, and their work set off the light bulb in my head and in the heads of others who have laid out similar techniques, so their contributions must be recognized.<br /><br />Bill James' "new" Runs Created, introduced in the STATS All-Time Major League Handbook and used in their other publications since then (as well as the Bill James Handbook from Baseball Info Solutions), also incorporates many of these ideas and introduced an ingenious way to state absolute results--that is, the number of total runs created rather than runs above some baseline as the Tate/Woolner method did.<br /><br />The first step in applying this method is to assume that we have a team of 8 average players, each getting an equal number of Plate Appearances. Then we add the player in question to this team, with the same number of PAs as the other eight players, which we will make equal to the player in question's actual PA. Then we calculate the new A, B, C, and D factors for this team.<br /><br />Let's use Mark McGwire's 1998 season as an example of how this works. We will put him on a team that performs at the level of the 1961-2002 composite data discussed in the BsR article. This league has a ROBA of .3007, AF of .3047, OA of .6763, and HRPA of .0230. McGwire personally compiled an A factor of 244, a B factor of 267.69, a C factor of 357, and a D factor of 70 (there will be rounding differences with the spreadsheet throughout this essay).<br /><br />The non-McGwire portion of the team will have an A factor of 8*PA*LgROBA, where PA is McGwire's PA and LgROBA is the ROBA for the entity in question. We will call the 8*LgROBA portion E.
From there:<br /><br />E = 8*LgROBA<br />F = 8*LgAF<br />G = 8*LgOA<br />H = 8*LgHRPA<br /><br />For the 1961-2002 data (which I will call from here on out the "standard" or "reference" league) these values are E = 2.41, F = 2.44, G = 5.41, and H = .184.<br /><br />Then, the new A factor for the team with McGwire will be A + E*PA, where A is McGwire's personal A and PA is, again, his personal PA. Then:<br /><br />TmA = A + E*PA<br />TmB = B + F*PA<br />TmC = C + G*PA<br />TmD = D + H*PA<br /><br />We then put these together to estimate the number of runs this team will score with McGwire as TmA*TmB/(TmB + TmC) + TmD, and subtract from this the number of runs the eight players would score without McGwire. Without McGwire, the team will score LgROBA*LgAF/(LgAF + LgOA) + LgHRPA times eight times PA. We can make a formula for I:<br /><br />I = 8*(LgROBA*LgAF/(LgAF + LgOA) + LgHRPA)<br /><br />For the standard league, I = .93. Then we can make a big equation for the difference between the team with McGwire and without McGwire:<br /><br />TT BsR = (A + E*PA)*(B + F*PA)/((B + F*PA) + (C + G*PA)) + (D + H*PA) - I*PA<br /><br />Which algebraically simplifies to:<br /><br />TT BsR = (A + E*PA)*(B + F*PA)/(B + C + (F + G)*PA) + D - (I - H)*PA<br /><br />Which, for the standard league, is:<br /><br />TT BsR = (A + 2.41PA)*(B + 2.44PA)/(B + C + 7.85PA) + D - .75PA<br /><br />For McGwire, we get a value of 169.03. This can be compared to his personal BsR, calculated through the team formula, of 174.86, or his LBsR of 168.38, calculated with the linear weights derived from BsR for the standard league. So you can see that since McGwire was a high-production player, his personal BsR is higher than what you get if you put him on a standard team.
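To make the arithmetic concrete, here is a short Python sketch (mine, not part of the original article) of the simplified TT BsR formula, with the constants computed from the quoted league rates. McGwire's 1998 PA total of 681 is my assumption, so expect small rounding differences from the figures in the text.

```python
# Theoretical Team Base Runs from a player's A, B, C, D factors and PA,
# using the standard-league (1961-2002) rates quoted in the text.
ROBA, AF, OA, HRPA = 0.3007, 0.3047, 0.6763, 0.0230

E = 8 * ROBA                              # ~2.41
F = 8 * AF                                # ~2.44
G = 8 * OA                                # ~5.41
H = 8 * HRPA                              # ~0.184
I = 8 * (ROBA * AF / (AF + OA) + HRPA)    # ~0.93: eight average players' runs per PA

def tt_bsr(A, B, C, D, PA):
    """Runs added by a player joining a team of eight average hitters."""
    return (A + E*PA) * (B + F*PA) / (B + C + (F + G)*PA) + D - (I - H)*PA

# McGwire 1998: A = 244, B = 267.69, C = 357, D = 70; PA of 681 is assumed
print(tt_bsr(244, 267.69, 357, 70, 681))
```

Running this lands within rounding distance of the 169.03 quoted above.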
But since McGwire personally alters the run environment of the team he is added to, his TT BsR is higher, although only slightly, than his LBsR.<br /><br />We can also find McGwire's TT BsR above baselines other than absolute. I will use average here, and below will (tentatively) sketch out a procedure to use replacement level (or any other baseline for that matter). To apply an average baseline, all we have to do is compare McGwire to a team of 9 average players rather than 8 average players. We can use the same formulas as above, except that for this team I will be figured as:<br /><br />I = 9*(LgROBA*LgAF/(LgAF + LgOA) + LgHRPA)<br /><br />I = 1.05 for the standard league, which gives this equation for TT BsR Above Average:<br /><br />TT BsRAbvAvg = (A + 2.41PA)*(B + 2.44PA)/(B + C + 7.85PA) + D - .87PA<br /><br />For McGwire, this gives a value of +90.76 runs above average.<br /><br />These formulas are very long and confusing. One thing we can do is differentiate them and state them as a new set of custom LW for the team now that we have added the player. The formula for this is:<br /><br />LW = ((B + C + (F + G)*PA)*((A + E*PA)*(b + F*p) + (B + F*PA)*(a + E*p)) - (A + E*PA)*(B + F*PA)*(b + c + F*p + G*p))/((B + C + (F + G)*PA)^2) + d - I*p + H*p<br /><br />In this formula, a, b, c, and d are the derivatives of the player's A, B, C, and D factors with respect to the event in question, and p is the derivative of the plate appearance function for each event, where PA = AB + W + HB + SH + SF. In the case of McGwire, we know that the LBsR weights for the standard league (displayed as S, D, T, HR, W, O) are .476,.806,1.136,1.495,.320,-.095, which gives him 168.38 runs. Using the formula above, we get .490,.823,1.157,1.499,.331,-.103, which produces 169.03. These results are similar to calculating the new rate stats for the team with our player added.
For example, TmROBA = 1/9*ROBA + 8/9*LgROBA, and so on in this fashion; then use the classic LW-from-BsR formula to find the LW (TmROBA serves as A, TmAF as B, etc.)<br /><br />You can also use the above formula with the Above Average TT formula--the only difference is that you have to use the different I for the nine-man lineup. For McGwire, this gives these LW: .373,.707,1.040,1.383,.215,-.220. The effect of this technique is to subtract the league R/PA (as figured by BsR) from each event that accounts for a PA (or, in the lingo of the method above, has p = 1), and to make no change to any event that does not account for a PA (p = 0). This happens because the only difference in the two formulas is the difference in the I values. The above-average I value is 9*LgR/PA, and the absolute I value is 8*LgR/PA. So the difference is LgR/PA, but this is only multiplied by PA. So an event like a steal that does not account for a PA does not lose any value at all between the two formulas. This probably illustrates that the TT Average technique is a shortcut but not a solution, because the difference is based on subtracting PA rather than comparing to outs or team outs, etc. The best way to find the TT BsR above some baseline would probably be to first find the Absolute TT BsR and then apply some baseline comparison as you would with any other runs created estimate.<br /><br />When the Theoretical Team procedure is applied to Runs Created, it just so happens that TT RC = 1/9*Traditional RC + 8/9*Linear RC. In the past, I incorrectly took this fact as proof in my mind that the same was true for Base Runs. It is not. I am not quite sure of the technical reason why this is, but I believe it is because the RC formula is pure multiplication--A*B*(1/C), if you will. But BsR involves two additions (B+C, and adding D to the whole thing), and I think this eliminates the property. Anyway, it still comes pretty close to this.
You can set up this equation:<br /><br />TT BsR = x(BsR) + (1 - x)(LBsR)<br /><br />If you solve for x:<br /><br />x = (TT BsR - LBsR)/(BsR - LBsR)<br /><br />If you do this for McGwire, you find that his TT BsR is made up of 10.6% of his Straight BsR and 89.4% of his Linear BsR.<br /><br />So far we have assumed that the player keeps the same number of PAs he had in actuality when we move him onto a new team. But we know that this, too, is a simplification. Just as the batter changes the run values of the team he is on by changing the context, his ability to avoid outs (or, equivalently, ignoring outs made on the basepaths, to get on base) will directly impact the number of Plate Appearances his team will have in which to score runs. To account for this, we will add a new factor called PAR to the Theoretical Team BsR formulas.<br /><br />Before we do this, though, it should be pointed out that when we do this we are leaving the realm of attempting to estimate the number of runs the player has actually created and are instead trying to estimate the number of runs the player would theoretically create if added to an otherwise average team. For one thing, the player's actual PA already incorporate the effect of the extra PAs he adds by getting on base. So we can easily overstate his impact by allowing him to further inflate his PA on an average team after inflating his own PA on his own team. If the team he actually plays for has an above-average rate of getting on base with him included, we will overstate the PA he will wind up with on his theoretical team. What we could do is find the percentage of his actual team's PAs that he used, convert this to an equivalent percentage on an average team, and plug that into the formula.<br /><br />However we choose to do this, we will have some number for PA and go from there. The first step will be to calculate what I will call Not Out Average (NOA).
NOA is simply the percentage of Plate Appearances that do not result in outs as recorded in the official statistics. NOA = (H + W + HB - CS - DP)/(AB + W + HB + SH + SF). We will further say that the denominator AB + W + HB + SH + SF = P (replacing PA in the formulas to come), and that the numerator H + W + HB - CS - DP = N. The derivatives of these will be called p and n (each event that is counted in P or N has a p or n of 1, respectively).<br /><br />We will first calculate the NOA for the team with our player added as TmNOA = NOA*(1/9) + LgNOA*(8/9). We know that PA/G can be estimated as X/(1 - NOA), where X is the number of outs/game in the league that are accounted for in the official statistics. So we want the ratio between the PA/G for the team with our player and PA/G without our player, which we will call PAR, for PA Ratio (this is a term I have borrowed from David Smyth). PAR = (X/(1 - TmNOA))/(X/(1 - LgNOA)). Simplifying this results in PAR = (1 - LgNOA)/(1 - TmNOA). Running through this with McGwire, the LgNOA = .3150, NOA = .4680, TmNOA = .3320, and PAR = 1.0254. So an average team with McGwire getting 1/9 of their PA will wind up with 2.54% more PA than a totally average team.<br /><br />We then need to change each factor that we put in the BsR equation to account for PAR. For example, we started with TmA = A + E*PA. When PAR is incorporated, this is now TmA = A*PAR + E*PA*PAR, which can be rewritten as TmA = (A + E*PA)*PAR. The TmB, TmC, and TmD calculations are analogous. We then simply substitute these formulas into the original TT BsR formulas to get:<br /><br />TT BsR w/ PAR = PAR*((A + E*P)*(B + F*P)/(B + C + (F + G)*P) + (D + H*P)) - I*P<br /><br />Remember, we are now using P as the abbreviation for our player's Plate Appearances. As you can see, the I*P portion is not multiplied by PAR. This is because this part represents the number of runs the team would score without our player.
PAR measures the effect of our player on the team PA/G, so it is irrelevant to how many runs the team would score if he did not play for them.<br /><br />Just as with the original formula, we can easily compare to average by changing the I value as done previously. With PAR, we find McGwire's absolute TT BsR as 189.26 and +110.99 above average.<br /><br />Just as we have done previously, we can differentiate this equation to see the intrinsic linear weights that it uses. It is a long formula with an even longer derivative, so I will break the derivative up into two pieces.<br /><br />The first step is to find the derivative of PAR with respect to each event. This is done by first differentiating NOA with respect to each event to get dNOA/dX, where X is S, D, T, HR, etc. Then we differentiate PAR with respect to NOA to get dPAR/dNOA. From here, (dPAR/dNOA)*(dNOA/dX) = dPAR/dX. This results in this formula:<br /><br />dPAR/dX = (1/9)*(1 - LgNOA)/((1 - TmNOA)^2)*(P*n - N*p)/(P^2)<br /><br />We can then differentiate the entire PAR TT BsR equation (by the product rule, since PAR itself depends on each event) to get the formula for the linear weights. In the equation below, dPAR/dX represents the derivative of PAR, figured by the above formula, with respect to whatever event we are differentiating the PAR TT BsR formula for:<br /><br />LW = PAR*((B + C + (F + G)*P)*((A + E*P)*(b + F*p) + (B + F*P)*(a + E*p)) - (A + E*P)*(B + F*P)*(b + c + F*p + G*p))/((B + C + (F + G)*P)^2) + PAR*(d + H*p) + ((A + E*P)*(B + F*P)/(B + C + (F + G)*P) + D + H*P)*(dPAR/dX) - I*p<br /><br />Yes, that is the longest sabermetric equation I have ever published on this website, or anywhere else for that matter. When we do this for Big Mac, we find .633,.975,1.317,1.669,.471,-.176.
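As a sanity check on these long derivatives, the intrinsic weights can also be recovered by numeric differentiation. The sketch below is mine, not from the article, and it builds the factors from raw events using Smyth's most basic BsR version (A = H + W - HR, B = (1.4*TB - .6*H - 3*HR + .1*W)*1.02, C = AB - H, D = HR) rather than the full version behind the quoted numbers, so the results come out close to, but not exactly, the published values.

```python
# TT BsR with PAR, built from raw events (S, D, T, HR, W, batting outs)
# via the basic BsR version; constants are the standard league from the text.
E, F, G, H_, I_ = 2.4056, 2.4376, 5.4104, 0.184, 0.9312
LG_NOA = 0.3150

def tt_bsr_par(s, d, t, hr, w, o):
    h, tb = s + d + t + hr, s + 2*d + 3*t + 4*hr
    a = h + w - hr
    b = (1.4*tb - 0.6*h - 3*hr + 0.1*w) * 1.02
    c, dd = o, hr
    p = s + d + t + hr + w + o
    noa = (h + w) / p                   # no HB/CS/DP in the basic version
    tm_noa = noa/9 + 8*LG_NOA/9
    par = (1 - LG_NOA) / (1 - tm_noa)
    return par * ((a + E*p)*(b + F*p)/(b + c + (F + G)*p) + dd + H_*p) - I_*p

def weights(stats, eps=1e-4):
    """Numeric partial derivative of TT BsR w/ PAR for each event."""
    out = []
    for i in range(6):
        up = list(stats); up[i] += eps
        dn = list(stats); dn[i] -= eps
        out.append((tt_bsr_par(*up) - tt_bsr_par(*dn)) / (2*eps))
    return out  # order: S, D, T, HR, W, O

mcgwire = (61, 21, 0, 70, 162, 357)  # 1998 line, basic official categories only
print(round(tt_bsr_par(*mcgwire), 1))
print([round(x, 3) for x in weights(mcgwire)])
```

The total lands in the high 180s and the weights follow the expected pattern (singles worth less than doubles, doubles less than triples, triples less than homers, outs negative), in the neighborhood of the .633/.975/1.317/1.669/.471/-.176 quoted above.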
Again, by changing I to the average value we can get the LW for TT BsR Above Average w/ PAR, and again the difference is to subtract LgR/PA from each event where p = 1.<br /><br /><b>Applying Replacement Level</b><br /><br />This is a real pain to calculate, and I don't use it, but I think it is a useful discussion to have for a number of reasons. If I wanted to apply a replacement level to TT BsR, I would calculate Absolute TT BsR and then apply the baseline from there. But we will look at the alternative.<br /><br />To calculate Absolute TT BsR Above Replacement, all we would have to do is find an I value that would represent runs/PA for a team with 8 average players and 1 replacement player. The 8 average players part is easy, but in order to figure the replacement player in, we need to know how he will hit in terms of ROBA, AF, OA, and HRPA. Usually, though, we set replacement level as some percentage or linear difference of run production (be it in terms of per out or per PA, or Wins Above Average per PA, or R+/O+, or R+PA, etc.). But those assumptions don't tell us how the player will hit in terms of basic offensive events, just total production.<br /><br />I will use 73% of the league runs/out as the baseline in this article (see the "Baselines" article for discussion of this), although you can apply a different baseline and still use the outlines of my procedure to do it. The first step will be to understand the Linear Weights Ratio (LWR). There are probably alternative ways to do this, but I have done it this way and it suits my purposes.<br /><br />LWR is a great tool invented by Tango Tiger that uses the LW coefficients and converts them into a ratio of positive run production to outs. I have linked his little article on it at the bottom of the page, but will cover the basics again here. Before I start, I should discuss the treatment of various events in my concept of replacement level here.
I am assuming that a replacement level player is a replacement level player because of his hitting performance (S, D, T, HR, W, outs). He will steal bases, bunt, hit sac flies, hit into DPs, etc., at a league average rate. There are certainly debatable assumptions in there, but you have to keep things reasonably simple.<br /><br />To establish LWR, we put the positive value of S, D, T, HR, and W in the numerator. We then set the single weight to one and rescale all of the other coefficients based on their ratio to singles. So let d = LW(double)/LW(single), and t = LW(triple)/LW(single), etc. Then we have this formula (all of the terms in the formulas that follow, unless otherwise marked, apply to league statistics):<br /><br />LWR = (S + d*D + t*T + hr*HR + w*W)/(AB - H)<br /><br />For the standard league:<br /><br />LWR = (S + 1.693*D + 2.386*T + 3.139*HR + .671*W)/(AB - H)<br /><br />Once we have this, we can use this fact about LWR:<br /><br />Runs/Out = LW(single)*LWR + LW(out)<br /><br />For our league, the Runs/Out from LWR is .172 (the LWR itself is .562), and the LW out value is -.095. 73% of .172 is .126. What LWR will produce a R/O of .126? First, let x be the replacement rate (73%). Then RepLWR is given by the equation:<br /><br />RepLWR = (x*(LgR/O) - out value)/LW(single)<br /><br />This results in .464, which converts back to .126 runs/out.<br /><br />So we know that a replacement player will put up a LWR of .464. Now we need to convert this relationship back into the effect on his component stats. What we do first is find a value that I will call Y. Y is the ratio of the quantity of "positive" in the LWR that the league has generated from a given event divided by the quantity of "positive" it has generated from singles. To illustrate, the standard league has a single per PA of .166. On a per PA basis, the positive LWR contribution of singles is 1*.166 = .166. The league has double/PA of .041. The positive LWR contribution of doubles per PA is 1.693*.041 = .069.
.069/.166 = .418 is the Y value for doubles. Sum up the Y values for all events (including singles). Or if you prefer a formula:<br /><br />Y = 1 + (d*D/P)/(S/P) + (t*T/P)/(S/P) + (hr*HR/P)/(S/P) + (w*W/P)/(S/P)<br /><br />Y is 2.290 for the standard league.<br /><br />We also need another quantity, Z. Z is simply the ratio of the rate of a given event divided by the rate of singles. So Z for doubles is .041/.166 = .244, and the formula for the summed Z values is:<br /><br />Z = 1 + (D/P)/(S/P) + (T/P)/(S/P) + (HR/P)/(S/P) + (W/P)/(S/P)<br /><br />Z is 1.950 for the standard league.<br /><br />What exactly have these Y and Z steps done? They have converted all of the contribution of doubles, triples, home runs, and walks into an equivalent number of singles. What we are saying is that for the standard league, the quantity of positive LWR is equivalent to 2.290 times the number of singles (this is Y), and the number of runners on base is equivalent to 1.950 times the number of singles (this is Z). This procedure is in a similar spirit to the "Willie Davis method" introduced by Bill James in the New Historical Baseball Abstract, in which he expresses everything in terms of an equivalent number of hits. Why does he do this? Because it allows you to have one variable to solve for in an equation instead of five. Once we find the value of S that we are looking for, we can convert it back into D, T, HR, and W values.<br /><br />What we are after is the rate at which a replacement player would hit singles to produce a .464 LWR. We have this equation:<br /><br />RepLWR = Y*X/(1 - Z*X)<br /><br />where X is the S/PA for the replacement player. The equation to solve for X is:<br /><br />X = RepLWR/(Y + RepLWR*Z)<br /><br />So for the standard league, X = .145. The replacement player will get a single in 14.5% of his PAs, compared to 16.6% for an average player.
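The solve for X can be condensed into a few lines; this sketch is mine, using the standard-league values quoted in the text:

```python
# Solve RepLWR = Y*X/(1 - Z*X) for X, the replacement player's singles per PA,
# using the standard-league values from the text.
Y, Z = 2.290, 1.950                  # positive-LWR and baserunner singles-equivalents
lw_single, lw_out = 0.476, -0.095    # standard-league linear weights
lg_lwr = 0.562

lg_runs_per_out = lw_single * lg_lwr + lw_out            # ~ .172
rep_lwr = (0.73 * lg_runs_per_out - lw_out) / lw_single  # ~ .464
X = rep_lwr / (Y + rep_lwr * Z)                          # ~ .145
print(rep_lwr, X)
```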
Since we have assumed that S, D, T, HR, and W will all be reduced by the same percentage, we divide .145 by .166 to get the "Multiplier". So Multiplier = X/(S/P), and is .875 for the standard league. So the replacement level player in the standard league will hit singles, doubles, triples, and homers, and draw walks, at 87.5% of the rate that an average player would. Just to be absolutely clear, Rep(D/P) = (D/P)*Multiplier, and so on.<br /><br />For the out value, there are two mathematically equivalent techniques. One is to find Rep(O/P) as 1 - Rep(S/P) - Rep(D/P) - Rep(T/P) - Rep(HR/P) - Rep(W/P). The second is to figure Rep(O/P) as 1 - (1 - O/P)*Multiplier. The second equation is essentially equivalent to saying that the OBA for the replacement player will be 87.5% of the OBA for the average player as well.<br /><br />Once we have calculated the S, D, T, HR, W, and O per PA for a replacement player, we can calculate the ROBA, AF, OA, and HRPA for him (ignoring all terms other than the S, D, T, HR, W, and O we have for the replacement player). We then calculate rtROBA as (1/9)*RepROBA + (8/9)*LgROBA (rtROBA is "replacement team" ROBA; that is, for a team that is 8/9 average and 1/9 replacement). We calculate the other terms similarly and then figure the I value for the replacement comparison as:<br /><br />I = (rtROBA*rtAF/(rtAF + rtOA) + rtHRPA)*9<br /><br />For the standard league, I = 1.02 and McGwire is +108.38 runs above replacement. We can also apply PAR using the same formulas as above.<br /><br />Let me now just briefly discuss the method I used to find stats for a replacement player. One major weakness that I already mentioned was limiting the difference between the replacement player and an average player to only the basic hitting events. Another is that I assume that among the basic hitting events, all deflate equally. The replacement player in the standard league has a rate of 12.5% fewer singles, 12.5% fewer doubles, etc.
I have not studied the issue, but I would assume that replacement type players lose more in secondary offensive skills (power and walks) than they do in singles. Of course, you also get into the issue of whether the replacement player should be based on the various definitions of replacement level that have been offered, or whether it should be theoretical. If you are looking for a theoretical approach, assuming equal deflation of all basic offensive events can be justified.<br /><br />Another concern is how to define replacement level, or the baseline to be more general. I have used a default of 73% of league runs/out, which corresponds to the .350 Offensive Winning Percentage that was used by Bill James and continues to be used by many analysts. Then I have used Linear Weights Ratio to estimate how their component stats would turn out. However, it might actually be more appropriate to set replacement level as a percentage of league LWR or some other approach. The method I have laid out here could be modified for other choices of definition, but it is not ready to handle another definition as is.<br /><br />I will also point out that the replacement definition method has some broader applications than just replacement level. Suppose you have positional adjustments defined as a percentage of league R/O, as I do elsewhere on this site. If first basemen perform at 115% of the league average R/O, what should their BA/OBA/SLG be? You can use the replacement level method here to get an estimate for that. Or suppose you know that a park inflates runs by 10%. How much should it inflate OBA by? (If it affects all events equally, which it probably doesn't. But it could tell you what a theoretical park would do. Or maybe you know it won't affect walks, so you could hold those constant. You get the idea.) I'm sure you could think up other uses as well.
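As a sketch of the positional-adjustment idea just mentioned (my illustration; the 115% figure comes from the text, the resulting multiplier does not), the same machinery can invert any target percentage of league R/O into an across-the-board event multiplier:

```python
# Invert a target percentage of league runs/out into an event multiplier,
# using the standard-league Y, Z, and linear weight values from the text.
Y, Z = 2.290, 1.950
lw_single, lw_out = 0.476, -0.095
lg_runs_per_out = 0.172
lg_singles_rate = 0.166   # league S/PA

def event_multiplier(pct_of_league_r_per_out):
    target_lwr = (pct_of_league_r_per_out * lg_runs_per_out - lw_out) / lw_single
    x = target_lwr / (Y + target_lwr * Z)   # singles per PA at the target
    return x / lg_singles_rate

print(event_multiplier(1.15))  # first basemen at 115% of league R/O
print(event_multiplier(0.73))  # the replacement baseline from above
```

With the standard-league inputs, the replacement call recovers the ~.875 multiplier derived earlier, and the 115% call comes out a bit over 1.06.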
But that's another article.<br /><br /><a href="http://www.tangotiger.net/lwr.html">Tango Tiger's LWR Page</a><br /><a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vTdGXFd54H4cnrTTskAWPQ7iYJmhSOnnn3t6NQpUjjCuTrrdPg97tYhkxjAzu4VSQ/pub?output=xlsx">Base Runs Spreadsheet</a><br /><br />2020-03-10 | Tripod: Base Runs II<br /><br /><i>See the first paragraph of <a href="https://walksaber.blogspot.com/2020/01/tripod-ability-v-value.html">this post</a> for an explanation of this series.</i><br /><br />This is the second page on Base Runs that I have written up for this site. It makes no attempt to cover new concepts that weren't addressed in the original page. What it does try to do is write up the information from the first page in a more accessible way. I have seen comments that the Base Runs page on this site is hard to understand. Unfortunately, the main cause of this problem is probably my writing style and skill (or more appropriately the lack thereof). However, it is true that the original page was created by adding on new concepts as time passed, and therefore is somewhat of a hodge-podge of different ideas, written at different times, without a comprehensive master plan in mind. This page will attempt to address this.<br /><br /><b>Philosophy and Origins of Base Runs</b><br /><br />Base Runs is a run estimator developed in the early 1990s by David Smyth. Like Runs Created, BsR is designed to estimate the number of runs that a team would score. Methods of this type attempt to incorporate the interactive effect of offensive events. A linear weights formula like Extrapolated Runs or Estimated Runs Produced (or even, essentially, Clay Davenport's Equivalent Runs) applies a static run value to each event. Usually these formulas weight a walk at around 1/3 of a run.
And in most circumstances, this is a good estimate of the number of runs that will result from a walk. But in a game in which a team draws a walk and makes 27 outs, the walk will not have the same value. In fact, since estimators like ERP apply a value of about -1/10 of a run for every out, they will predict somewhere in the neighborhood of -2.4 runs for that game. This answer is obviously wrong.<br /><br />The reason is that linear formulas are designed to work with a certain range of data that corresponds to the range in which normal major league teams perform. When you apply the method in these contexts, it will give very accurate estimates. But when you attempt to take the method outside of the context for which it was developed, problems will result. None of this is meant to put down linear formulas, which are very useful in sabermetrics. It only stands to illustrate the much more difficult task that BsR or RC attempt to perform. Ideally, they should be models of run scoring that work over a wide range of contexts and can give an accurate estimate for unusual or extreme situations. Another way to look at this is that Base Runs generates custom linear weights that are intrinsically generated and then applied in all situations.<br /><br />Unfortunately, Runs Created does a very poor job of estimating in extreme contexts--in fact, in many cases a poorer one than linear methods! The reason for this is that while RC is constructed based on reasonable principles of how an offense works, it does not recognize certain constraints on the number of runs that will be scored.<br /><br />For example, a home run will always produce at least one run. It does not matter if every other batter has made an out; the team will get a run if they go deep. The Basic RC formula of (H+W)*TB/(AB+W) would predict (1+0)*4/(28) = .14 runs for a team that hit a homer and made 27 outs.
But we know that they must score at least one run.<br /><br />Furthermore, if all you do is hit home runs, each home run will produce just one run. Suppose a team entered the bottom of the ninth trailing by two runs, and the first two batters hit home runs. RC would predict (2+0)*8/2 = 8 runs, or 4 from each home run. This is another impossibility.<br /><br />Another "known point" is the case of all outs. You will score zero runs if all you do is make outs, but you cannot wind up with negative runs. RC correctly predicts zero runs, but all linear methods must predict negative runs below a certain level of production in order to have any accuracy at normal levels.<br /><br />Base Runs gives much more reasonable estimates in these extreme circumstances. This is because it starts with a true model of how runs are scored. Each batter that comes to the plate will eventually do one of three things: make a batting out, hit a home run, or reach base. Once he has reached base, there are three more potential outcomes: he will score, make an out on the bases, or be left on base at the end of the inning. Simplifying further, an identity for the number of runs scored can be written as Baserunners * % of baserunners who score + Home Runs. This is an undeniably true statement. BsR uses this model to derive an estimate of runs scored.<br /><br />Although the identity is undeniably true, the estimates that the formula uses are not. If we are given a team's offensive statistics but not their runs scored, we can never know for sure what percentage of baserunners will score--if we knew this, we would have a method with 100% accuracy. We do know for sure the number of home runs, and we do have a very good estimate of the number of baserunners (but we don't know, for instance, how many runners will be retired stretching doubles into triples).
It is the percentage of baserunners who score that involves an estimate that is not assured of being almost 100% correct, and therefore this component is a crucial determinant of the accuracy of the estimate.<br /><br />Smyth broke his formula into four factors denoted as A, B, C, and D. A is simply the number of baserunners. D is simply the number of home runs. B is the "advancement factor", representing the advance of baserunners towards scoring. C is the number of outs. B/(B + C) serves as the estimate of the percentage of baserunners that will score. Putting it all together, the construct for BsR is:<br /><br />BsR = A*B/(B + C) + D<br /><br />Or if it's easier for you to see this way:<br /><br />BsR = A*(B/(B + C)) + D<br /><br />An important note here is that the use of B/(B+C) is not an inevitable one. Any formula that accurately estimates the percentage of baserunners that will score could be used. However, the basic B/(B+C) model developed by Smyth is the most accurate currently known. It may well be possible to improve the accuracy, but it would probably involve a much more confusing or expansive formula. The important point is that B/(B+C) is used because it has been empirically shown to work.<br /><br />Other run estimators have incorporated the idea of Runs = baserunners*% who score + HR, such as Eric Van's Contextual Runs. Van modeled the scoring percentage as B/C, where B was advancement (although with radically different weights) and C was outs. Using this ratio results in poorer estimates when the number of outs is low, though. But BsR potentially could be improved if a more accurate model of the percentage of baserunners who score was found.<br /><br /><b>Base Runs formulas</b><br /><br />Many different formulas for Base Runs have been created and used. This has led to some confusion about what the "true" or "official" formula was. One of the great beauties of BsR is that it is very flexible and the basic construct can lead to many different versions.
But in the interest of alleviating some of the confusion, Smyth published three versions, each designed to work with different datasets. The most basic of these is:<br /><br />A = H + W - HR<br />B = (1.4*TB - .6*H - 3*HR + .1*W)*1.02<br />C = AB - H<br /><br />Another version included all of the offensive events (contained in the official statistics) with the exception of sacrifices:<br /><br />A = H + W + HB - HR - .5*IW<br />B = (1.4*TB - .6*H - 3*HR + .1*(W + HB - IW) + .9*(SB - CS - DP))*1.1<br />C = AB - H + CS + DP<br /><br />Finally, a version applicable with official pitching statistics:<br /><br />A = H + W - HR<br />B = (1.4*TBe - .6*H - 3*HR + .1*W)*1.1<br />C = 3*IP<br />where TBe = 1.12*H + 4*HR<br /><br />Another important version of the formula was published by Tango Tiger. He developed it from 1974-1990 play-by-play data from Retrosheet, so it includes many categories that aren't included in the official statistics. It is best that you read Tango's explanation of this formula if you are interested, so please visit <a href="http://www.tangotiger.net/bsrexpl.html">his article</a>.<br /><br /><b>Applying BsR</b><br /><br />There are many different ways to apply BsR, and this section does not purport to examine all of the possibilities. There are some basic principles to lay out, though. The key is that BsR should NOT be applied to individual hitters. Base Runs models the run scoring of a team. Individual players do not act as entire teams--they act as one part out of nine in a team. Barry Bonds' walks do not interact directly with his home runs--they interact with the home runs of his entire team. So it is wrong to apply BsR to individual hitters, just as it is to apply RC to individual hitters. It is true that applying BsR to individual hitters will often result in a decent estimate and will do better than RC (because the flaws of RC combine with the incorrect application to produce even worse results), but it is not recommended.
In general, it will overrate good offensive players' run production and underrate that of bad players.<br /><br />But by the same logic, you should apply Base Runs to individual pitchers, because when a pitcher is in the game, his performance interacts with no other pitchers. He is the lone pitcher for his team, and he dramatically affects the run environment he pitches in--in fact, he alone determines it (to the extent that a pitcher can). Linear Weight formulas, as discussed earlier, are designed to work in normal major league team contexts. Replacing an average hitter with even an extreme hitter like Bonds does change the run environment, but generally not enough to severely impact the accuracy of the LW estimate. This is not at all true for pitchers. A team that hit the way the average batter does against Johan Santana would be laughed out of the league, and it would fall outside the range of best accuracy for LW formulas. Base Runs does a good job of adapting to these extreme circumstances and should be applied to pitchers and teams, but not individual batters.<br /><br /><b>Versions of BsR Used on This Site</b><br /><br />On this site, I use three BsR formulas: one that incorporates only the basic offensive events, one that incorporates SB and CS, and a third that incorporates all of the official offensive categories. I do not claim these formulas to be more accurate or "better" than others--in fact, they are probably less accurate than other formulas. However, they still are very accurate at estimating runs scored and can be used without too much concern. I have used them for the examples of other concepts involving BsR on this page and in the accuracy test published here.<br /><br />While the A, B, C, and D factors all have straightforward definitions, this does not make the choice of which events to put in them inevitable.
David Smyth, Tango Tiger, Robert Dudek, myself, and possibly others have developed BsR versions and have used different philosophies to guide what to include in each factor. For instance, Smyth once published a version with D = HR + SF, since like HR, SF are guaranteed runs. Another common quandary is whether CS should be a loss of a baserunner, an additional out, or both. As we will see later, there are also advantages to giving each event a B value, even if it has been included in the other factors.<br /><br />In the versions presented here, I have used the following thinking to guide my choices. I don't claim these are the correct or best choices, but to me, they are the most logical and easiest to work with.<br /><br />The A factor represents "final" baserunners. What I mean by this is that it is the number of baserunners that, as far as we can tell from the official statistics, were not retired once they reached base. So, in versions that utilize those stats, caught stealings and double plays are removed because those runners are known to have been out.<br /><br />The B factor, which represents advancement as always, includes all events with the exception of outs in the first two versions. However, the "full" version that incorporates all of the official offensive categories puts every event in the B factor, as this greatly helps to balance the formula and makes it much easier to construct.<br /><br />The C factor includes batting outs; outs made by BATTERS. So CS and DP are not batting outs; the baserunner was caught stealing, and the fact that the batter was retired on the double play was already accounted for in his AB-H total. But SH and SF are batting outs.<br /><br />The D factor is home runs, always. While it is true that we know a run will score for each SF, I consider this an accident of the official statistics and not a fundamental fact of baseball. For instance, we could also easily have an official statistic for "RBI Groundouts". But we do not.
And suppose the statistics broke down each hit type into "RBI Singles" or "Non-RBI Triples", etc. If we put each of these events into D, we would eventually wind up with a formula that just said that Runs = Runs. For this reason, I do not consider SF as a "guaranteed run" under the BsR definition. Maybe you could define D as "guaranteed runs created without the use of a baserunner", since the batter who hits a home run does not become a baserunner and the SF requires a runner on third base to result in a run.<br /><br />Based on these underpinnings, here are the formulas used on this site for Base Runs (I should point out that the basic and SB versions were actually originally published by David Smyth a few years ago, but I have continued to employ them):<br /><br />BASIC<br /><br />A = H + W - HR<br />B = (2*TB - H - 4*HR + .05*W)*.78 = .78*S + 2.34*D + 3.9*T + 2.34*HR + .039*W<br />C = AB - H<br /><br />STOLEN BASE<br /><br />A = H + W - HR - CS<br />B = (2*TB - H - 4*HR + .05*W + 1.5*SB)*.76 = .76*S + 2.28*D + 3.8*T + 2.28*HR + .038*W + 1.14*SB<br />C = AB - H<br /><br />FULL<br /><br />A = H + W + HB - HR - CS - DP<br />B = .777*S + 2.61*D + 4.29*T + 2.43*HR + .03*(W + HB - IW) - .747*IW + 1.30*SB + .13*CS + 1.08*SH + 1.81*SF + .70*DP - .04*(AB - H)<br />C = AB - H + SH + SF<br /><br />An alternate B factor incorporates a different value for strikeouts than other outs:<br /><br />B = .781*S + 2.61*D + 4.28*T + 2.42*HR + .034*(W + HB - IW) - .741*IW + 1.29*SB + .125*CS + 1.07*SH + 1.81*SF + .69*DP - .029*(AB - H) - .086*K<br /><br /><b>Determining the B Factor</b><br /><br />Since the B factor is where the most estimation is involved (in fact, if you follow a strict definition of the factors as I did above, it is the only place where you have any choices to make in developing a formula), it is often possible to improve accuracy by tweaking it.
Also, if one wishes to perform a regression to find B coefficients, he would need to know the actual B value necessary to equal runs scored for the entity (team or league generally, but an individual player or any combination of baseball data could be considered an entity as well) in question. Here are two equivalent methods to determine what I will call ActB, the actual B factor.<br /><br />The first is just to do algebra to rearrange the formula R = A*B/(B + C) + D to solve for B. You wind up with B = (R - D)*C/(A - R + D). A second way is to determine the actual percentage of baserunners that score, which I'll denote as Z. Z = (R - D)/A, which leads to B = Z*C/(1 - Z).<br /><br />To adjust the B factor of a given formula, just find the value of your formula B for the entity in question and call it EstB. Then ActB/EstB is multiplied by the B coefficients you have, and you wind up with a new B formula for your entity.<br /><br /><b>Accuracy of BsR</b><br /><br />Various questions have been raised about the accuracy of BsR. Some people have claimed that since Base Runs purports to be accurate in extreme contexts, it must necessarily give up accuracy with normal teams. Other people are caught in the "accuracy trap"--they claim that the best run estimator is the one with the lowest Root Mean Square Error (RMSE) when applied to normal team data.<br /><br />I will address the latter viewpoint first. Almost by definition, the highest accuracy in terms of RMSE will come from using a linear multiple regression equation for runs. However, regression is a purely statistical tool and does not consider the fundamental facts of baseball as BsR does, or even to the lesser extent that the human developers of other run estimators have. Related to this, regression equations are tailored specifically to idiosyncrasies within their dataset and will not hold up when applied to a different dataset (although a larger sample size does help).
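Whether one regresses or tweaks by hand, the ActB computation described above is the starting point. A sketch (function names are mine, for illustration):

```python
def actual_b(R, A, C, D):
    """Solve R = A*B/(B + C) + D for B: the B that reproduces actual runs scored."""
    return (R - D) * C / (A - R + D)

def adjusted_multiplier(R, A, C, D, est_b):
    """Ratio to apply to an existing set of B coefficients so the formula
    hits actual runs for this entity: ActB/EstB."""
    return actual_b(R, A, C, D) / est_b
```

The score-rate route gives the same answer: with Z = (R - D)/A, the expression Z*C/(1 - Z) is algebraically identical to (R - D)*C/(A - R + D).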
While regression equations can be useful, very few people who are in the camp of "lowest RMSE" advocate using regression equations. This causes me to question their true adherence to this belief. Methods like Extrapolated Runs attempt to blend results from regression equations, skeletons, and empirical linear weights (see the "Linear Weights" article on this site), often sacrificing theoretical accuracy for results. A hybrid method like XR does test with greater accuracy than BsR in general, but at what cost?<br /><br />Other run estimators simply cannot be trusted in their estimations in extreme contexts. Base Runs has its flaws too, but is generally a much better estimator across the entire spectrum of production. Methods like XR may be slightly more accurate on normal teams, but are far, far less accurate on extreme teams. This may be a trade that you are willing to make, depending on your needs, but it is not one that every sabermetrician must make.<br /><br />As to the claim that Base Runs does not have comparable accuracy when applied to regular teams as other run estimators, this is simply not true. The Stolen Base version of BsR presented above has a lower RMSE when applied to 1961-2004 data (excluding the strike-shortened seasons of 1981 and 1994) than does Stolen Base RC, ERP, Equivalent Runs, or Ugly Weights. The only methods which beat it in the test, which can be seen on the "Accuracy" page on this site, were a regression equation based on those teams, and XR. Base Runs' accuracy with actual teams is comparable to that of any of the other run estimators that have been published, and in many cases, better.<br /><br /><b>Writing Base Runs as a Rate</b><br /><br />It is often helpful to be able to write Base Runs or other run estimators in terms of rates rather than raw numbers. The easiest way to do this is to calculate BsR/PA.
To do this, simply divide the A, B, C, and D factors by plate appearances (figure PA using whatever data is used in the specific BsR equation you are using). I call A/PA "Runners On Base Average" (ROBA), B/PA "Advancement Factor" (AF), C/PA "Out Average" (OA), and D/PA simply Home Runs per Plate Appearance (HRPA). Then BsR/PA is very simple:<br /><br />BsR/PA = ROBA*AF/(AF + OA) + HRPA<br /><br />One advantage of the Basic BsR employed on this page is that it is written without knowing singles, doubles, and triples specifically, just hits, total bases, and home runs. This may not result in the most precise equation, but it does allow the rate stats above to be written in terms of BA, OBA, SLG, and HRPA (where OBA is just (H+W)/(AB+W)):<br /><br />ROBA = OBA - HRPA<br />AF = ((2*SLG - BA)*(1 - OBA)/(1 - BA) - 4*HRPA + .05*(OBA - BA)/(1 - BA))*.78<br />OA = 1 - OBA<br /><b><br />Linear BsR</b><br /><br />Base Runs, as already discussed, is a multiplicative formula. However, there are many advantages to Linear Weight formulas, including their ability to be used as a measure of individual hitter performance. Since Base Runs is an accurate estimator of run scoring across a wide range of contexts, we can use it to estimate linear weight values across a similarly wide range of contexts.<br /><br />When Base Runs, or any other estimator, evaluates a certain set of data, it intrinsically weights the various events. Because BsR is a multiplicative formula, the intrinsic weights will vary from entity to entity and from context to context. This gives us custom, dynamic linear weights--if we can find the intrinsic weighting used in the estimation of each entity's runs scored.<br /><br />One way to do this is to take the data that we have for an entity, add a certain number of events to it, recalculate BsR, find the difference between our new estimate and the original estimate, and divide by the number of events added to find the value of each event added.
That is a mouthful, so I will spell it out more clearly with an example.<br /><br />Take the famed 1961 Yankees as an example. They had 987 singles, 194 doubles, 40 triples, 240 home runs, 543 walks, and 4098 outs. Plug this into the basic BsR formula:<br /><br />A = H + W - HR = 1764<br />B = (2*TB - H - 4*HR + .05*W)*.78 = 1962.597<br />C = AB - H = 4098<br />D = HR = 240<br /><br />Plugging this into BsR, we get 1764*1962.597/(1962.597 + 4098) + 240 = 811.2343368<br /><br />Now suppose we added 10 singles. A would increase to 1774, B would increase to 1970.397, and C and D would remain the same. Our new BsR estimate would be 816.0144364, a difference of 4.780099617. Since we added 10 singles, each single would be worth .4780099617 runs. That is the LW per single of 10 added singles for the 1961 New York Yankees.<br /><br />This doesn't truly isolate the value of a single in the Yankees' true context, though, because when we add ten singles, we change the context and we affect all of the other values. The larger the change in context, the further we get from an estimate that relates to the actual context. If we added 1000 singles, for example, we would raise the Yankees' Batting Average from .263 to .375. This would radically change the context, and the estimate of the additional value of each single would have almost no connection to the original context we wanted to evaluate.<br /><br />Now, the differences for ten singles will probably not be that bad. But if we want more precision, we should add fewer events. So let's add just one single. This is what I and others in the past have used to evaluate linear weights from multiplicative formulas and called the "+1 method". If we add one single to the Yankees, we find a LW value of .477405417.<br /><br />Adding one single still changes the context, though. So let's add progressively fewer singles and see what happens. If we add .1 singles, the LW value is .477344886. If we add .01, it is .477338832.
If we add .00001, it is .477338142. As you can see, the values are changing less and less each time. But we still have not completely isolated the value of a single for the 1961 Yankees, because we are still changing the context, albeit by a very small amount.<br /><br />What we really want to do is add the smallest number of singles that we possibly can; we want an infinitesimal number of singles. We want to find the change in BsR per event added as the number of events added approaches zero. What we want, mathematically speaking, is the limit of the change in BsR, divided by X, as X approaches zero, where X is the number of events we add. This concept is called the derivative in calculus.<br /><br />Since Base Runs has multiple variables, we need multivariable calculus to find this limit. This is done through a technique called "partial differentiation". I am not a calculus teacher, and so I cannot explain all of the details of how to do this with BsR. What I can do is give you a formula that you can apply.<br /><br />Let A, B, C, and D be the totals calculated for our entity from the A, B, C, and D formulas, and let a, b, c, and d be the coefficients for each event in the A, B, C, and D formulas we are using (zero if the event is not included). Then the Linear Weight of a given event is equal to:<br /><br />LW = ((B + C)*(A*b + B*a) - (A*B)*(b + c))/((B + C)^2) + d<br /><br />When you find the coefficient of each event in each factor, you need to look at the full, expanded equation for each factor. Take A for example. A = H + W - HR. But H = S + D + T + HR, so if you expand it, A = S + D + T + W. The coefficient of each of those events is 1, and the HR coefficient is 0. The HR coefficient is NOT -1, because H + W - HR is just an easy way to write what we actually mean, which is S + D + T + W, a total in which home runs do not appear at all.
This can be tricky, so you need to fully expand each factor to find a, b, c, and d for each event.<br /><br />Anyway, applying this to the 1961 Yankees, we find that the LW value of a single is (technical note: the linear weights I am referring to here are absolute linear weights, not the kind that are calculated directly from run expectancy) .4773381. This is the value that our estimates were converging towards.<br /><br />With this formula in hand, we can calculate the linear weight values for any entity with any BsR version. I have provided a spreadsheet to do this with the official offensive statistics (the older BsR article on this site provides a spreadsheet to use with Tango's expanded BsR formula using Retrosheet data). I have already entered the coefficients for my basic version coupled with composite 1961-2004 data (excluding 1981 and 1994). I do not have all of the event frequency information, but you could fill that in for that dataset or any other if you desire.<br /><br />Using the basic formula from this page on the 1961-2004 data gives these LW for S, D, T, HR, W, O: .475, .805, 1.135, 1.494, .319, -.095. We will use these values later.<br /><br />There is another very useful application of this concept. As discussed previously, after we have defined what goes in A, C, and D, the B coefficients are in most cases the only ones left to find by trial. Sometimes, though, you know what Linear Weights you would like the formula to generate for the entity as a whole. If you do, you can find the exact B coefficient needed to produce them for each event through this formula:<br /><br />b = ((B + C)^2*(L - d) - B^2*a - B*C*a + A*B*c)/(A*C)<br /><br />Where L is the Linear Weight value you want to get for the event in question. B here is the Exact B that you calculate from actual runs scored, A, C, and D, as you do not yet have B coefficients for each event and therefore cannot compute B from them.
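Both of these formulas are easy to put into code. A sketch (function names are mine), checked against the 1961 Yankees single value found above:

```python
def linear_weight(A, B, C, a, b, c, d):
    """Intrinsic (absolute) linear weight of an event: the partial derivative
    of BsR = A*B/(B + C) + D with respect to that event.

    A, B, C are the entity's factor totals; a, b, c, d are the event's
    coefficients in the fully expanded factor formulas (0 if absent)."""
    return ((B + C) * (A * b + B * a) - (A * B) * (b + c)) / (B + C) ** 2 + d

def solve_b(A, B, C, a, c, d, L):
    """B coefficient an event needs in order to carry linear weight L.
    B here is the exact B computed from actual runs scored, A, C, and D."""
    return ((B + C) ** 2 * (L - d) - B ** 2 * a - B * C * a + A * B * c) / (A * C)
```

For the Yankees single (a = 1, b = .78, c = 0, d = 0 in the Basic version), `linear_weight` returns about .4773, and feeding that value back into `solve_b` recovers the .78 coefficient, so the two formulas round-trip as they should.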
I have provided a spreadsheet which you can use to do this as well.<br /><br />Unfortunately, all of the events included in any of the factors must be included in the B factor in order to properly reconcile. This can be cumbersome, as you often don't want to include outs or some other event in B, but it is a necessity if you want the precise B coefficients.<br /><br /><b>Known Limitations of Base Runs</b><br /><br />This section is not meant to be comprehensive; it is just a quick discussion of a few of the problems that have been discovered in Base Runs. While it is the author's strong opinion that Base Runs is the most powerful run estimator yet created, because of its applicability across a wide range of contexts, its ease of customization, and its accuracy with regular teams that is comparable to that of any other method, it would be dishonest and unhelpful to pretend that the method is without flaws. These are just a few of the KNOWN issues with Base Runs that may or may not be symptoms of the same underlying problem.<br /><br />Both of these were discovered by Tango Tiger. The first was detailed in his three-part series on run estimators. Base Runs overestimates the run value of events in the approximate range of .500-.800 OBA. The second flaw is that at certain extreme levels of offense, Base Runs fails to follow the obvious baseball truth that the number of runners left on base must be capped at 3.<br /><br />No advocates of Base Runs claim that it is perfect. However, it does have a logical construction that follows known "laws" of baseball. The area where the accuracy of Base Runs could be enhanced is through a better estimator of the score rate, though a future solution would almost certainly increase the complexity of the formula. B/(B+C) is a very simple but very effective estimator. Still, I look forward to the day when some sabermetrician might correct some of the flaws in Base Runs through a more complex score rate estimate.
Whatever the future holds for Base Runs, David Smyth should be remembered for providing the first real new advance in run estimators in over a decade.<br /><br /><b>Tripod: Base Runs</b> (posted 2020-03-02)<br /><br /><i>See the first paragraph of <a href="https://walksaber.blogspot.com/2020/01/tripod-ability-v-value.html">this post</a> for an explanation of this series.</I><br /><br /><u>Breaking Down BsR</u><br /><br />It is sometimes useful to write a stat like Base Runs in rate form. It helps greatly in making the Theoretical Team equations, for one thing, and it is also useful to be able to write BsR completely in terms of BA, OBA, SLG, and HR/PA. To do this, you need to start with each component and divide it by PA. So, A/PA, B/PA, C/PA, and D/PA. (Since I am using a basic version of Base Runs, you need PA = AB + W). You can call these, respectively, Runners On Base Average (ROBA), Advancement Factor (AF), 1-OBA, and HR/PA. Then<br /><br />BsR/PA = ROBA*AF/(AF+1-OBA)+HR/PA<br /><br />For the Basic version I use, these are the equations for each component:<br /><br />ROBA = (H+W-HR)/(AB+W) = OBA-HR/PA<br />AF = (2*TB-H-4*HR+.05*W)*.78/PA = ((2*SLG-BA)*(1-OBA)/(1-BA)-4*HR/PA+.05*(OBA-BA)/(1-BA))*.78<br />1-OBA = 1-(H+W)/(AB+W)<br />HR/PA = HR/(AB+W)<br /> <br />In the Base Runs article linked above, I gave the equations that I use for each factor in this basic version. The B multiplier is based on the composite MLB stats of 1946-1995. In this period, the averages for each component are:<br /><br />ROBA AF OBA HR/PA<br />.303 .308 .325 .0222<br /> <br />You can use these to put together the Theoretical Team factors.
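Before moving on, the rate form and those long-term averages can be sanity-checked in a couple of lines. A sketch (the function name is mine):

```python
def bsr_per_pa(roba, af, oba, hrpa):
    """Rate form of Base Runs: BsR/PA = ROBA*AF/(AF + 1 - OBA) + HR/PA."""
    return roba * af / (af + 1 - oba) + hrpa
```

With the 1946-1995 averages (.303, .308, .325, .0222), this gives a league BsR/PA of about .117 runs per plate appearance.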
The TT concept, which I will not explain here in every detail, is that since Base Runs (or Runs Created) is a run estimator devised for estimating team runs, there is an interactivity between the values of the offensive events. As the offensive production increases, the value of each event goes up (with the exception of the special case, HR). So applying BsR to Babe Ruth gives him an unfair advantage because he is not playing on a team by himself; he is playing on a team with 8 other players. So the TT formula puts the player on a team with 8 average players, and we assume that each player on the theoretical team gets the same number of PA as our player. The team's new A factor can then be calculated as (A+LgROBA*PA*8), where A is the individual's A factor. You apply this technique to the B, C, and D terms as well, using the long term averages above (you really should have a separate version each year, but small changes in ROBA, AF, etc. don't significantly change the results of the formula).<br /> <br />Then, to see how much the player has helped this team, we compare him to a team of 8 average players in his number of PA each. If we wanted to compare the player to the league average, we would compare him to 9 average players. If you work all this out and simplify, you get this equation for TT BsR, which I like to call Individual Base Runs (IBR).<br /> <br />IBR = (A+2.42PA)(B+2.46PA)/(B+C+7.86PA)+HR-.76PA<br /> <br />Lest it seem as if I am taking credit for coming up with all of this, the pioneering TT work was done by Dave Tate and Bill James, and the application of the TT concept to BsR was also the work of David Smyth.<br /> <br /><u>Stolen Base BsR</u><br /> <br />It is useful and necessary to get some more categories into a Runs Created formula, and so here we'll put SB and CS in (this is again based on Smyth's work). The other categories we could add, like SF, SH, and DP, I choose to ignore.
For one, they are very situation-dependent, and therefore I'm not 100% comfortable including them in an individual formula; secondly and more importantly, I am lazy and don't want to deal with them. Anyway, for BsR including SB:<br />A = H + W - HR - CS<br />B = (2*TB - H - 4*HR + .05*W + 1.5*SB)*.76<br />C = AB-H<br /> <br />The IBR formula for the standard league is:<br /><br />IBR = (A+2.34PA)(B+2.58PA)/(B+C+7.98PA)+HR-.76PA<br /> <br />ROBA and AF are no longer the same rate stats as before; I call the new versions AROBA and AAF, for "advanced". The long term averages are:<br />AROBA AAF OBA HR/PA<br />.293 .323 .325 .0222<br /> <br /><u>Full BsR</u><br /><br />Here is a version of the BsR formula that you can use if you have all of the minor (SH, SF, DP, etc.) offensive stats. It is not as clean and nice looking as the other versions on this page, but there needs to be more of a give-and-take between the various events when you include the other stats. It is also not straightforward which events should be placed in which factor(s). I took the convention that A is final baserunners: baserunners less those who we know have been thrown out on the bases or taken out on a DP. Everything goes in B to balance everything out and produce good linear weights, while C is batting outs. D remains home runs. There are other ways to define these terms, and Smyth, Tango Tiger, and Robert Dudek have all done so in different ways than I have.
There are certainly arguments to be made for all of the different approaches, but a discussion of that will have to wait for another day.<br /><br />A = H + W + HB - HR - CS - DP<br />B = .777S + 2.61D + 4.29T + 2.43HR + .03(W + HB - IW) - .747IW + 1.30SB + .13CS + 1.08SH + 1.81SF + .70DP - .04(AB-H)<br />C = AB - H + SH + SF<br /><br />If you want to include strikeouts, they go in this B factor, which is coupled with the A and C factors given above: B = .781S + 2.61D + 4.28T + 2.42HR + .034(W + HB - IW) - .741IW + 1.29SB + .125CS + 1.07SH + 1.81SF + .69DP - .029(AB-H-K) - .086K<br /><br /><u>Finding the B Multiplier</u><br /> <br />The B multiplier is designed so that the BsR formula will produce the correct number of runs for the entity you are using. This is possible because the formulas for A (baserunners), C (outs), and D (home runs) are all straightforward and obvious.<br /> <br />You can calculate, based on A, C, and D, the actual B factor required to equate BsR with R, by this formula: (R-D)*C/(A-R+D). What can you do with the actual B value? For one thing, if you already have a set formula for B (ignoring the multiplier), you can divide actual B by estimated B to get the correct multiplier. Another thing you can do is run a regression to find weights for TB, H, etc. by using those stats to predict actual B, or use other approaches like trial and error. All of these approaches had a role in finding the B component used in the official versions of BsR.<br /> <br />An alternate way to find B is to calculate Z = (R-D)/A, then B = Z*C/(1-Z). It is longer and more complicated, but it is equivalent. (I include it because it was the way I did it until I took the time to work out the algebra to derive the other formula.)<br /> <br /><u>Building the TT BsR Formula</u><br /> <br />Here are the technical steps for building the TT formula.
These are not very interesting for most people, but hard core sabermetricians may find them useful (although hard core sabermetricians probably already know how to do it themselves):<br /><br />IBR can be written as:<br />(A+X*PA)(B+Y*PA)/((B+Y*PA)+(C+Z*PA))+HR+T(PA)-(V)PA, which simplifies to:<br />(A+X*PA)(B+Y*PA)/(B+C+(Y+Z)PA)+HR-(V-T)PA<br />where X is the remainder of team ROBA<br /> Y is the remainder of team AF<br /> Z is the remainder of team 1-OBA<br /> T is the remainder of team HRPA<br /> V is the R/PA for the comparison lineup multiplied by the number of players<br /> in the comparison lineup<br /> <br />OK, since we always add the player to a team with 8 average players:<br />X = LgROBA*8 Y = LgAF*8 Z=(1-LgOBA)*8 T = LgHR/PA*8<br /> <br />Depending on what baseline we use, though, V will vary. For absolute runs, we compare the player to a team with 8 average hitters. For runs above average, we compare the player to a team with 9 average hitters. For runs above replacement, we compare the player to a team with 8 average hitters plus one replacement level hitter. So, it is very straightforward to find V for absolute: 8*LgBsR/PA. For average, V = 9*LgBsR/PA.<br /> <br />For replacement, we need to first set a replacement level, and then determine what ROBA, AF, OBA, and HRPA a replacement player will have. I assume 25 batting outs (AB-H)/G, and use BsR/PA to calculate the R/G for the league: (BsR/PA)/(1-OBA)*25, since BsR/O = (BsR/PA)/(1-OBA). (Keeping in mind that BsR/PA = ROBA*AF/(AF+1-OBA) + HRPA.) Then, I assume the replacement rate is 1 run/game below average, so I take that R/G, subtract 1, and divide by 25. This is the replacement player's R/O. In the standard league we are using, the BsR/PA = .117, R/O = .173, and RepR/O (R/O for the replacement) = .133. Then we need to find the value, X, by which each component stat for the league (ROBA, AF, OBA, and HRPA) needs to be deflated for R/O to equal .133.
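Finding that deflation value doesn't require anything fancy; a simple bisection search does it. A sketch (function names mine, using the standard-league rates from above):

```python
def deflated_r_per_out(x, roba=0.303, af=0.308, oba=0.325, hrpa=0.0222):
    """R/O for a league whose ROBA, AF, OBA, and HRPA are all deflated by x."""
    bsr_pa = (roba * x) * (af * x) / (af * x + 1 - oba * x) + hrpa * x
    return bsr_pa / (1 - oba * x)

def solve_deflator(target, lo=0.0, hi=1.0, iters=60):
    """Bisection: R/O rises as x rises on [0, 1], so close in on the x
    whose deflated R/O equals the target replacement rate."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if deflated_r_per_out(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For a target R/O of .133 this lands at roughly .89, matching the TI-83 result quoted below for the standard league.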
We multiply each component in the BsR/O formula by X. This, when simplified, gives this equation:<br />(RAX^2/(1+X(A-O)) + HX)/(1-OX) = Rep R/O<br /> <br />R is LgROBA, A is LgAF, O is LgOBA, and H is LgHRPA. I have no idea how to solve for X by hand, but my TI-83 calculator will do it, and it gives .89 for the standard league (this will all vary based on the league offensive levels, and of course how you personally choose to define the replacement rate). Anyway, we then multiply each component by .89 to find what we expect our replacement to hit:<br />ROBA AF OBA HRPA<br />.269 .274 .289 .02<br /> <br />So this gives him a BsR/PA of .095. We then calculate the V value for the replacement baseline as 8*LgBsR/PA+RepBsR/PA. Here is a chart showing the values you need to fill in for the TT components at each baseline in the standard league:<br />BASELINE X Y Z T V V-T<br />Absolute 2.42 2.46 5.40 .178 .937 .759<br />Average " " " " 1.054 .876<br />Replacement " " " " 1.031 .853<br /> <br />If you want to get more complex, there is something that we have failed to address. That is that if you really add a player to a team, he will change the number of PA everyone in the lineup gets. A player with a higher OBA than his teammates will generate more PA; one with a lower OBA will generate fewer. In the TT formula above, we have held PA constant. What if we let them vary? We can calculate the OBA the team would have with the player as 8/9*LgOBA+1/9*OBA. Call this Q. Then, figure (1-LgOBA)/(1-Q). Call this PAR, or PA-added ratio. Then, multiply every individual term (the new A, the new B, the new C, and the new D) by PAR, and proceed as usual.<br /> <br />Is this worth it? Who knows. Some of these bells and whistles might wash out when you convert them to win values. Maybe they don't. A straight linear system, though, might be just as correct, and it will help you keep your sanity.<br /> <br /><u>Fundamental Structure of BsR</u><br /> <br />The fundamental structure of BsR is its key asset.
That fundamental structure is based on the simple, undeniable truth that runs scored = baserunners*% of baserunners who score + home runs. "Baserunners" here does not include home runs. In BsR, the A factor represents baserunners and the D factor represents home runs. The % of baserunners who score, which we'll call the score rate, is estimated as B/(B+C), where B is advancement and C is outs.<br /> <br />Other run estimators are not backed up by a fundamental theory of how runs are scored. Runs Created's downfall is its failure to account for the unique nature of the HR (that it always produces at least one run, and if it occurs by itself, it will produce only one run). Static LW formulas fail to account for the fact that the value of each event varies based on the context. BsR is based on a true equation of how runs are scored. That does not mean, though, that BsR is the one true correct run estimator by any stretch. The use of B/(B+C) to estimate score rate has good empirical accuracy, but it has also been found not to work very well in some circumstances (such as OBA between .500 and .800--see Tango's article on Primer about this). Maybe score rate should be estimated in a totally different way. But the structure of the BsR equation is sound. If we want a better run estimator, we need a better estimator of score rate.<br /> <br /><u>Linear BsR</u><br /> <br />You can figure how a non-linear RC formula values each event in the context you are interested in (it can be the league, a specific team, or even a hypothetical lineup of the same player over and over again). All you have to do is calculate BsR for the entity, then add one single, recompute BsR, and subtract the first figure. This is the value of one additional single. Then you do the same with every other event, and you'll have LBsR. You have to be careful to account for everywhere the event is involved; for example, a single not only adds a hit but also a Total Base and an At Bat.
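That recipe is easy to script. A sketch (illustrative only, names mine) using the Basic version and the 1961 Yankees totals quoted earlier (987 singles, 194 doubles, 40 triples, 240 home runs, 543 walks, 4098 outs):

```python
def basic_bsr(S, D, T, HR, W, outs):
    """Basic Base Runs from counting stats; outs = AB - H."""
    H = S + D + T + HR
    TB = S + 2 * D + 3 * T + 4 * HR
    A = H + W - HR
    B = (2 * TB - H - 4 * HR + 0.05 * W) * 0.78
    return A * B / (B + outs) + HR

def plus_n_weight(stats, event, n=1.0):
    """+n method: add n of an event, recompute BsR, and take the
    per-event difference. Smaller n moves toward the true derivative."""
    before = basic_bsr(**stats)
    bumped = dict(stats, **{event: stats[event] + n})
    return (basic_bsr(**bumped) - before) / n
```

For the Yankees, adding one single gives a weight of about .4774 runs; shrinking the increment to .00001 or less converges toward the limit near .4773, exactly the behavior described in the text.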
If you run the LBsR for the long-term stats, you get these values:<br /><br />LBsR = .48S+.81D+1.14T+1.50HR+.32W-.096(AB-H)<br />LBsR(sb) = .47S+.77D+1.07T+1.45HR+.33W+.23SB-.41CS-.093(AB-H)<br /> <br />Of course, you could add something other than one. You could subtract one, or add 10, or add 15. The further you get away from 0, the more the results will vary. Adding 1000 singles will have a much different effect, even per single, than adding 1 single. Really, as Tango has pointed out, we want to get as close to adding 0 singles as possible. Adding .00001 singles changes the run environment and the values of the other events very little, and that is what we are looking to do. It is sort of like a limit in calculus. Actually, I guess that's exactly what it is. We want to find the limit of (new BsR minus old BsR) divided by X, as X approaches 0, where X is the number of the event that we are adding. Somebody who knows a lot about calculus could probably tell me if I'm right about that, and if so, come up with a formula to calculate the limit precisely instead of having to do trial and error in a spreadsheet.<br /> <br />I have included a spreadsheet which runs through this approach for the 1979 Pirates. You can change the data in cells B2 to G2 to whatever you want to do this with other entities. Anyway, I show the LW generated by adding 10 of each event, 1 of each event, .1 of each event, etc. and the same for -10, -1, -.1, etc. I have highlighted in pink the positive and negative points at which the convergence, the limit, occurs. If you go past that(I put it at one ten-millionth, 10^-7), the values start fluctuating again. My suspicion is that this is because of the spreadsheet not having perfect accuracy, internal rounding and the like, but I could be wrong. Anyway, you can see there is not a lot of difference. The +10 weight for a Pirate single for instance is .4898824, the +1 is .4892998, and the limit is around .4892350. 
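The limit can be approximated numerically with the same (assumed) BsR variant as before; as the added amount shrinks, the per-single value settles toward the limit, just as the spreadsheet shows:

```python
def bsr(s, d, t, hr, w, outs):
    # same simple BsR variant as before (assumed coefficients)
    h, tb = s + d + t + hr, s + 2*d + 3*t + 4*hr
    a = h + w - hr
    b = (1.4*tb - 0.6*h - 3.0*hr + 0.1*w) * 1.02
    return a * b / (b + outs) + hr

# illustrative team totals: S, D, T, HR, W, AB-H
team = (1000, 250, 50, 150, 500, 4100)

def single_weight(x):
    """Value per single when x singles are added: (new BsR - old BsR)/x."""
    s, d, t, hr, w, o = team
    return (bsr(s + x, d, t, hr, w, o) - bsr(*team)) / x

# the secant slope drifts less and less as x shrinks toward 0
w10, w1, w_tiny = single_weight(10), single_weight(1), single_weight(1e-6)
```

The gap between the +1 weight and the near-limit weight is roughly a tenth of the gap for +10, mirroring the Pirates example in the spreadsheet.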
So you really don't need to do that, but it is nice to illustrate the property.<br /> <br />Added 4/7/04: Using calculus, you can figure this precisely using partial derivatives. The value of the single for instance is equal to the partial derivative of the BsR function with respect to singles. You can still do this even if you don't know calculus, because the math works out simply with BsR. The formula winds up being:<br /><br />((B+C)*(A*b+B*a)-(A*B)*(b+c))/((B+C)^2)+d<br /><br />Let A, B, C, and D be the respective total factors for the entity you are interested in. Let a, b, c, and d be the A, B, C, and D coefficients of the event you are interested in. That's it. Thank goodness all of the formulas for the pieces of BsR are linear. <br /> <br />There is a spreadsheet linked at the bottom that shows this. It is based on Tango's full BsR, which is <a href="http://www.tangotiger.net/bsrexpl.html">available here</a>.<br /><br />If you don't want to deal with a category, just set the coefficients to 0. You can change the coefficients for the other events to use any BsR equation you want all with this spreadsheet. Of course you can also change the "#" column, which is the frequency of the event for the dataset you're using. Enjoy.<br /> <br /><u>Matching LW Values</u><br /> <br />Based on the formula above to calculate the Linear Weight value of a certain event using BsR, you can also fix the B coefficients so that they produce desired LWs. For example, on my LW page there is the ERP formula that I use, based on 1951-1998 composite major league data. Suppose I want to force my BsR formula to produce the same LW as are used in ERP. How do I go about doing this?<br /> <br />Well, first, I have to clearly define which events are in the A, C, and D factors, and what coefficient they have there. For my case, I will use S, D, T, HR, W, and O as the only events. 
S, D, T, and W each have a coefficient of 1 in A; O has a coefficient of 1 in C; and HR has a coefficient of 1 in D. <br /><br />Now, we need to calculate the A, C, and D factors for the entity I am working with(in my case, all teams 1951-1998). Then, I use these to calculate what we will call ActB--the actual B value required for BsR to equal runs scored. The formula for ActB is (R-D)*C/(A-R+D), where R is the actual runs scored we want to match. <br /><br />So, now we have everything we need. a, b, c, and d are still the coefficients for the given event in the respective factors. And we can calculate b, where L is the desired linear weight value, as:<br /><br />b = ((B+C)^2*(L-d)-B^2*a-B*C*a+A*B*c)/(A*C)<br /><br />Voila. So, let's look at my ERP equation. It is (TB+W+.5H-.3(AB-H))*.324, which as LW for S, D, T, HR, W, O is .486, .81, 1.134, 1.458, .324, -.0972. The B that I use for BsR((2TB-H-4HR+.05W)*.78) is:<br /><br />B = .78S+2.34D+3.9T+2.34HR+.039W<br /><br />Now, with all of this data, we can force the LW values. When we do this(which you can do with the spreadsheet linked at the bottom of the page, the same one that gives the actual LW values), it seems to give a result that's decent to .001 or so. It might be rounding error, or it might be something else, but either way, it's pretty close. So, to match the linear weight values I wanted, my B would be:<br /><br />B = .833S+2.360D+3.888T+2.159HR+.0692W-.010(O)<br /><br />Yes, the outs have to be included as well. That's kind of cumbersome if you don't want outs in B, but it's necessary to force the values. Are you sufficiently confused yet? 
I am.<br /><br /><a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vSN9W6awYVdV7dhDyApk0usGQ4tpQviq5AUizD8IfKP8yBv6DdMBG9Qxu4lGuH0Dg/pub?output=xlsx">1979 Pirates </a><br /><br /><a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vSB2qG4s99qH1_TfqA1CrcVOQVNjiwfqsSToNPH2zGgIu6lqKe-P08KtXBgmsRWuw/pub?output=xlsx">Full BsR LW</a><br /><br />Tripod: Runs Created (2020-02-25)<br /><br /><i>See the first paragraph of <a href="https://walksaber.blogspot.com/2020/01/tripod-ability-v-value.html">this post</a> for an explanation of this series.</I><br /><br />Bill James' Runs Created remains the most used run estimator, although there is no good reason for that being the case. It is odd that sabermetricians, generally a group inclined to fight preconceived notions and to not worship tradition for the heck of it, continue to use a method like RC.<br /><br />Let me be clear: Bill James is my favorite author, and he is the most influential and important(and one of the very best) sabermetrician of all time. When he developed RC, it was just about as good as anything that anybody else had developed to estimate team runs scored, and the thought process that went into developing it was great. But the field moves forward and it has left RC behind.<br /><br />I will now go into the theory of RC, and will get back to the alternative methods and the deficiencies of the method later. The basic structure of Runs Created is that runs are scored by first getting runners on base and then driving them in, all occurring within an opportunity space. 
Since getting runners on base and advancing them is an interactive process(if there is no one on base to drive in, all the advancement in the world will get you nowhere, and getting runners on base but not driving them in will not score many runs either), the on base component and the advancement component are multiplied and divided by the opportunity component. A represents on base, B represents advancement, and C represents opportunity. The construct of RC is A*B/C.<br /><br />No matter how many elements are introduced into the formula, it maintains the A*B/C structure. The first version of the formula, the basic version, is very straightforward. A = H+W, B = TB, and C = AB+W, or RC = (H+W)*TB/(AB+W). This simple formula is fairly accurate in predicting runs, with an RMSE in the neighborhood of 25(when I refer to accuracy right now I'm talking solely about predicting runs for normal major league teams).<br /><br />The basic form of RC has several useful properties. The math simplifies so that it can be written as OBA*SLG*AB, which is also OBA*TB. Or if you define TB/PA as Total Base Average, you can write it as OBA*TBA*(AB+W). Also, RC/(AB-H), runs/out, is OBA*SLG/(1-BA).<br /><br />The basic rate rewrite for RC is useful, (A/C)*(B/C)*C, which is easily seen to be A*B/C. If you call A/C modified OBA(MOBA) and B/C modified TBA(MTBA), you can write all versions of RC as MOBA*MTBA*C and as we will see, this will come in handy later.<br /><br />James' next incarnation was to include SB and CS in the formula as they are fairly basic offensive stats. A became H+W-CS, B became TB+.7*SB, and C became AB+W+CS.<br /><br />A couple years later (in the 1983 Abstract to be precise), James introduced an "advanced" version of the formula that included just about all of the official offensive statistics. This method was constructed using the same reasoning as the stolen base version. 
Baserunners lost are subtracted from the A factor, events like sacrifice flies that advance runners are credited in the B factor, and all plate appearances and extra outs consumed(like CS and DP) are counted as opportunity in the C factor.<br /><br />A = H+W+HB-CS<br />B = TB+.65(SB+SH+SF)<br />C = AB+W+HB+SH+SF+CS+DP<br /><br />In his 1984 book, though, James rolled out a new SB and technical version, citing their higher accuracy and structural problems in his previous formulas. The key structural problem was including outs like CS and DP in the C factor. This makes a CS too costly. As we will see later in calculating the linear weights, the value of a CS in the original SB version is -.475 runs(using the 1990 NL for the event frequencies). The revision cuts this to -.363 runs. That revision is:<br /><br />A = H+W-CS<br />B = TB+.55*SB<br />C = AB+W<br /><br />In addition to being more accurate and more logical, the new version is also simpler. The revision to the technical formula would stand as the state of RC for over ten years and was figured thusly:<br /><br />A = H+W+HB-CS-DP<br />B = TB+.26(W+HB-IW)+.52(SB+SH+SF)<br />C = AB+W+HB+SH+SF<br /><br />Additionally, walks are introduced into the B factor; obviously walks have advancement value, but including them in the basic version would have ruined the elegance of OBA*TB. With the added complexity of the new formula, James apparently saw no reason not to include walks in B.<br /><br />The technical formula above is sometimes called TECH-1 because of a corresponding series of 14 technical RC formulas designed to give estimates for the majors since 1900.<br /><br />Around 1997, James made additional changes to the formula, including strikeouts in the formula for the first time, introducing adjustments for performance in two "clutch" hitting situations, reconciling individual RC figures to equal team runs scored, and figuring individual RC within a "theoretical team" context. 
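The basic A*B/C construct and the rate rewrites described above can be checked numerically. The team totals below are round illustrative numbers; note that for the basic version MOBA coincides with OBA, since A and C are exactly H+W and AB+W:

```python
def rc_basic(h, w, tb, ab):
    """Basic Runs Created: A*B/C with A = H+W, B = TB, C = AB+W."""
    return (h + w) * tb / (ab + w)

# illustrative team totals
h, w, tb, ab = 1400, 550, 2200, 5500

a, b, c = h + w, tb, ab + w
moba, mtba = a / c, b / c              # modified OBA and modified TBA

# the rate rewrite MOBA*MTBA*C and the OBA*SLG*AB simplification
# both reduce algebraically to exactly A*B/C
oba, slg = (h + w) / (ab + w), tb / ab
```

Both identities hold to floating-point precision, which is why all later versions of RC can be written as MOBA*MTBA*C.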
James also introduced 23 other formulas to cover all of major league history. The modern formula is also known as HDG-1(for Historical Data Group). The changes to the regular formula itself were quite minor and I will put them down without comment:<br /><br />A = H+W+HB-CS-DP<br />B = TB+.24(W+HB-IW)+.5(SH+SF)+.62SB-.03K<br />C = AB+W+HB+SH+SF<br /><br />Whether or not the clutch adjustments are appropriate is an ability v. value question. Value-wise, there is nothing wrong with taking clutch performance into account. James gives credit for hitting homers with men on base at a higher rate than for overall performance, and for batting average with runners in scoring position against overall batting average. The nature of these adjustments seems quite arbitrary to this observer--one run for each excess home run or hit. With all of the precision in the rest of the RC formula, hundredth place coefficients, you would think that there would be some more rigorous calculations to make the situational adjustments. These are added to the basic RC figure--except the basic RC no longer comes from A*B/C; it comes from (A+2.4C)(B+3C)/(9C)-.9C(more on this in a moment). That figure is rounded to a whole number, the situational adjustments are added, then the figures for each hitter on the team are summed. This sum is divided into the team runs scored total to get the reconciliation factor, which is then multiplied by each individual's RC, which is once again rounded to a whole number to get the final Runs Created figure.<br /><br />Quite a mouthful. Team reconciliation is another area that falls into the broad ability v. value decision. It is certainly appropriate in some cases and inappropriate in others. For Bill James' purpose of using the RC figures in a larger value method(Win Shares), in this observer's eyes they are perfectly appropriate. 
Whether they work or not is a question I'll touch on after explaining the theoretical team method.<br /><br />The idea behind the theoretical team is to correct one of the most basic flaws of Runs Created, one that Bill James had noticed at least as early as 1985. In the context of introducing Paul Johnson's ERP, a linear method(although curiously it is an open question whether James noticed this at the time, as he railed against Pete Palmer's Batting Runs in the Historical Abstract), James wrote: "I've known for a little over a year that the runs created formula had a problem with players who combined high on-base percentages and high slugging percentages--he is certainly correct about that--and at the time that I heard from him I was toying with options to correct these problems. The reasons that this happens is that the players' individual totals do not occur in an individual context...the increase in runs created that results from the extension of the one[on base or advancement ability] acting upon the extension of the other is not real; it is a flaw in the run created method, resulting from the player's offense being placed in an individual context."<br /><br />The basic point is that RC is a method designed to estimate team runs scored. By putting a player's statistics in a method designed to estimate team runs scored, you are introducing problems. Each member of the team's offensive production interacts with the other eight players. But Jim Edmonds' offense does not interact with itself; it interacts with that of the entire team. A good offensive player like Edmonds, who has superior OBA and TBA, benefits by having them multiplied. But in actuality, his production should be considered within the context of the whole team. 
The team OBA with Edmonds added is much smaller than Edmonds' personal OBA, and the same for TBA.<br /><br />So the solution(one which I am quite fond of and which, following the lead of James, David Tate, Keith Woolner, and David Smyth among others, I have applied to Base Runs) that James uses is to add the player to a team of fairly average OBA and TBA, and to calculate the difference between the number of runs scored with the player and the runs scored without the player, and call this the player's Runs Created. This introduces the possibility of negative RC figures. This is one of those things that is difficult to explain but has some theoretical basis. Mathematically, negative RC must be possible in any linear run estimation method. It is beyond the scope of this review of Runs Created to get into this issue in depth.<br /><br />The theoretical team is made up of eight players plus the player whose RC we are calculating. The A component of the team is (A+2.4C). This is the player's A, plus 2.4/8=.3 A/PA for the other players. Remember, A/PA is MOBA(and B/PA is MTBA). So the eight other players have a MOBA of .300. The B component of the team is (B+3C), so 3/8=.375 B/PA or a .375 MTBA for the remainder of the team. Each of the eight players has C number of plate appearances(or the player in question's actual PA), so the team has 9C plate appearances, and their RC estimate is (A+2.4C)(B+3C)/(9C). The team without the player has an A of 2.4C, a B of 3C, and a C of 8C, giving 2.4C*3C/8C=.9C runs created. Without adding the ninth player, the team will score .9C runs. So this is subtracted, and the difference is Runs Created.<br /><br />James does not do this, but it is easy to change the subtracted value to give runs above average(just use nine players with MOBA .300 and MTBA .375, or adjust these values to the league or some other entity's norms, and then run them through the procedure above). 
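A sketch of the theoretical team calculation in code. The .300/.375 reference team is James's; the function itself is just my paraphrase of the arithmetic, not his implementation:

```python
def rc_tt(a, b, c, ref_moba=0.300, ref_mtba=0.375):
    """Theoretical team RC: the player (factors a, b, c) joins eight
    reference players, each with the player's PA (c)."""
    with_player = (a + 8*ref_moba*c) * (b + 8*ref_mtba*c) / (9*c)
    without = (8*ref_moba*c) * (8*ref_mtba*c) / (8*c)  # = .9c at .300/.375
    return with_player - without

# a .400 MOBA / .500 MTBA player in 500 PA creates about 94.4 runs
player = rc_tt(0.400*500, 0.500*500, 500)
```

At the default .300/.375 reference team the subtracted term reduces to exactly .9C, matching the derivation above.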
Generally, we can write TT RC as:<br /><br />(A+LgMOBA*8C)(B+LgMTBA*8C)/(9C)-LgMOBA*LgMTBA*8C(or 9C for average)<br /><br />This step of the RC process is correct in my opinion, or at least justifiable. But one question that I do have for Mr. James is why always .300/.375? Why not have this value vary by the actual league averages, or some other criteria? It is true that slight changes in the range of major league MOBA and MTBA values will not have a large effect on the RC estimates, but if everything is going to be so precise, why not put precision in the TT step? If we are going to try to estimate how many runs Jim Edmonds created for the 2004 Cardinals, why not start the process by measuring how Jim Edmonds would affect a team with the exact offensive capabilities of the 2004 Cardinals? Then when you note the amount of precision(at least computationally if not logically) in Win Shares, you wonder even more. Sure, it is a small thing, but there are a lot of small things that are carefully corrected for in the Win Share method.<br /><br />Just to illustrate the slight differences, let's take a player with a MOBA of .400 and a MTBA of .500 in 500 PA and calculate his TT RC in two situations. One is on the team James uses--.300/.375. His RC will be (.400*500+.300*500*8)(.500*500+.375*500*8)/(9*500)-.9*500, or 94.44. On a .350/.425 team(a large difference of 32% more runs/plate appearance), his RC figured analogously will be 98.33. A difference of less than four runs for a huge difference in teams. So while ignoring this probably does not cause any noticeable problems for either RC or WS estimates, it does seem a little inconsistent.<br /><br />But while the TT procedure is mathematically correct and sabermetrically justifiable, it does not address the larger problem of RC construction. Neither does Bill's latest tweak to the formula, published in the 2005 Bill James Handbook. 
He cites declining accuracy of the original formula in the current high-home run era and proposes this new B factor:<br /><br />B = 1.125S+1.69D+3.02T+3.73HR+.29(W-IW+HB)+.492(SB+SH+SF)-.04K<br /><br />None of these changes corrects the most basic, most distorting flaw of Runs Created. That is its treatment of home runs. David Smyth developed Base Runs in the 1990s to correct this flaw. He actually tried to work with the RC form to develop BsR, but couldn't get it to work. So instead he came up with a different construct(A*B/(B+C)+D) that was still inspired by the idea of Runs Created. Once again, James' ideas have been an important building block for run estimation thinking. RC was fine in its time. But its accuracy has been surpassed and its structure has been improved upon.<br /><br />A home run always produces at least one run, no matter what. In RC, a team with 1 HR and 100 outs will be projected to score 1*4/101 runs, a far cry from the one run that we know will score. And in an offensive context where no outs are made, all runners will eventually score, and each event, be it a walk, a single, a home run--any on base event at all--will be worth precisely one run. In a 1.000 OBA context, RC puts a HR at 1*4/1 = 4 runs. This flaw is painfully obvious at that kind of extreme point, but the distorting effects begin long before that. The end result is that RC is too optimistic for high OBA, high SLG teams and too pessimistic for low OBA, low SLG teams. The home run flaw is one of the reasons why James proposed the new B factor in 2004--but that may cause more problems in other areas as we will see.<br /><br />One way to evaluate Runs Created formulas is to see what kind of inherent linear weights they use. We know, based on empirical study, very good values for the linear weight of each offensive event. Using calculus, we can find precisely, for the statistics of any entity, the linear weights that any RC formula is using in that case. 
I'll skip the calculus, but for those who are interested, it involves partial derivatives.<br /><br />LW = (C(Ab + Ba) - ABc)/C^2<br /><br />Where A, B, and C are the total calculated A, B, and C factors for the entity in question, and a, b, and c are the coefficients for the event in question(single, walk, out, etc.) in the RC formula being used. This can be written as:<br /><br />LW = (B/C)*a + (A/C)*b - (A/C)*(B/C)*c<br />= MTBA(a) + MOBA(b) - MOBA*MTBA*c<br /><br />Take a team with a .350 MOBA and a .425 MTBA. For the basic RC formula, the coefficients for a single in the formula are a = 1, b = 1, c = 1, so the linear weight of a single is .425*1 + .350*1 - .425*.350*1 = .626 runs. Or a batting out, which is a = 0, b = 0, c = 1 is worth -.425*.350*1 = -.149 runs.<br /><br />Let's use this approach with a fairly typical league(the 1990 NL) to generate the Linear Weight values given by three different RC constructs: basic, TECH-1, and the 2004 update.<br /><br />Single: .558, .564, .598<br />Double: .879, .855, .763<br />Triple: 1.199, 1.146, 1.150<br />Home Run: 1.520, 1.437, 1.356<br />Walk/Hit Batter: .238, .348, .355<br />Intentional Walk: N/A, .273, .271<br />Steal: N/A, .151, .143<br />Caught Stealing: N/A, -.384, -.382<br />Sacrifice Hit: N/A, .039, .032<br />Sacrifice Fly: N/A, .039, .032<br />Double Play: N/A, -.384, -.382<br />Batting Out(AB-H): -.112, -.112, N/A<br />In Play Out(AB-H-K): N/A, N/A, -.111<br />Strikeout: N/A, N/A, -.123<br /><br />Comparing these values to empirical LW formulas and other good linear formulas like ERP, we see, starting with the Basic version, that all of the hits are overemphasized while walks are severely underemphasized. The TECH-1 version brings the values of all hit types in line(EXCEPT singles), and fixes the walk problems. The values generated by TECH-1, with the glaring exception of the single, really aren't that bad. However, the 2004 version grossly understates the impact of extra base hits. 
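A sketch of this inherent-LW calculation, checked against a small numerical nudge of A*B/C. The .350/.425 team is the example above, expressed as per-PA rates so that C = 1:

```python
def inherent_lw(A, B, C, a, b, c):
    """LW of an event inside an A*B/C estimator, via the partial
    derivative: MTBA*a + MOBA*b - MOBA*MTBA*c."""
    moba, mtba = A / C, B / C
    return mtba*a + moba*b - moba*mtba*c

# .350 MOBA / .425 MTBA team, per-PA rates (C = 1)
A, B, C = 0.350, 0.425, 1.0
single = inherent_lw(A, B, C, 1, 1, 1)   # about .626 runs
out = inherent_lw(A, B, C, 0, 0, 1)      # about -.149 runs

# numerical check: add a tiny fraction of a single (a = b = c = 1)
eps = 1e-7
numeric = ((A + eps) * (B + eps) / (C + eps) - A * B / C) / eps
```

The closed form and the numerical nudge agree, which is the same limit idea as the +1 technique taken to an infinitesimal addition.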
I don't doubt James' claim that it gives a lower RMSE for normal major league teams than the previous versions, but theoretically, it is a step backwards in my opinion.<br /><br />You can use these linear values as a traditional linear weight equation if you want, but they are at odds in many cases with empirical weights and those generated through a similar process by BsR. One good thing is that Theoretical Team RC is equal to 1/9 times traditional RC plus 8/9 of linear RC. Traditional RC is the classic A*B/C construct, whereas the linear RC must be appropriate for the reference team used in the TT formula.<br /><br />Tripod: Linear Weights (2020-02-18)<br /><br /><i>See the first paragraph of <a href="https://walksaber.blogspot.com/2020/01/tripod-ability-v-value.html">this post</a> for an explanation of this series.</I><br /><br />I certainly am no expert on Linear Weight formulas and their construction--leave that to people like Tango Tiger and Mickey Lichtman. However, I do have some knowledge of LW methods and thought I would explain some of the different methods of generating LW that are in use.<br /><br />One thing to note before we start is that every RC method is LW. If you use the +1 technique, you can see the LWs that are used in a method like RC, BsR, or RPA. A good way to test non-linear RC formulas is to see how they stack up against LW methods in the context the LW are for. LW will vary widely based on the context. In normal ML contexts, though, the absolute out value is close to -.1, and the HR value stays close to 1.4. 
David Smyth provided the theory(or fact, I guess you could say) that as the OBA moves towards 1, the LW values of all events converge towards 1.<br /><br />Now what I understand of how LW are generated:<br /><br /><b>Empirical LW</b><br /><br />Empirical LW have been published by Pete Palmer and Mickey Lichtman. They can be considered the true Linear Weight values. Empirical LW are based on finding the value of each event with the base/out table, and then averaging the value for all singles, etc. This is the LW for the single. Another way to look at it is that they calculate the value of an event in all 24 base/out situations, and then multiply that by the proportion of that event that occurs in that situation, and then sum those 24 values.<br /><br />Palmer's weights were actually based on simulation, but as long as the simulation was well-designed it shouldn't be an issue. One way you could empirically derive different LW is to assume that the events occur randomly, i.e. assuming that the proportion of overall PAs in each base/out situation is the same as the proportion of the event that occurs in this situation. For instance, if 2% of PA come with the bases loaded and 1 out, then you assume that 2% of doubles occur with the bases loaded and 1 out as well. This is an interesting idea for a method. If you see a double hit in a random situation, you could make the argument that this method would give you the best guess weight for this event. But that is only if you assume that the base/out situation does not affect the probability of a given event. Does it work out that way?<br /><br />Tango Tiger told me that the only event that comes up with a significantly different LW value by the method I have just described is the walk. This is another way of saying that walks tend to occur in lower leverage situations than most events. But the difference is not that large.<br /><br /><b>Modeling</b><br /><br />You can also use mathematical modeling to come up with LW. 
Tango Tiger and David Smyth have both published methods on FanHome.com that approach the problem from this direction. Both are approximations and are based on some assumptions that will vary slightly in different contexts. Tango, though, has apparently developed a new method that gives an accurate base/out table and LW based on mathematical modeling and does it quite well.<br />The original methods published by the two are very user-friendly and can be done quickly. Smyth also published a Quick and Dirty LW method that works well in normal scoring contexts and only uses the number of runs/game to estimate the value of events.<br /><br /><b>Skeletons</b><br /><br />Another way to do this is to develop a skeleton that shows the relationships between the events, and then find a multiplier to equate this to the actual runs scored. The advantage of this method is that you can focus on the long-term relationships between walks v. singles, doubles v. triples, etc., and then find a custom multiplier each season, by dividing runs by the result of the skeleton for the entity(league, team, etc.) you are interested in. Recently, I decided to take a skeleton approach to a LW method. Working with data for all teams, 1951-1998, I found that this skeleton worked well: TB+.5H+W-.3(AB-H), with a required multiplier of .324. Working SB and CS into the formula, I had: TB+.5H+W-.3(AB-H)+.7SB-CS, with a multiplier of .322. When I took a step back and looked at what I had done though, I realized I had reproduced Paul Johnson's Estimated Runs Produced method. If you look at Johnson's method:<br /><br />(2*(TB+W)+H-.605*(AB-H))*.16<br /><br />If you multiply my formula by 2, you get:<br /><br />(2*(TB+W)+H-.6*(AB-H))*.162<br /><br />As you can see, ERP is pretty much equal to my unnamed formula. Since it is so similar to ERP, I will just consider it to be ERP. 
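The near-equivalence is easy to check numerically. The team totals below are round illustrative numbers, not any particular team:

```python
def skeleton_runs(h, tb, w, outs):
    # skeleton from the text: (TB + .5H + W - .3(AB-H)) * .324
    return (tb + 0.5*h + w - 0.3*outs) * 0.324

def johnson_erp(h, tb, w, outs):
    # Paul Johnson's ERP: (2*(TB+W) + H - .605*(AB-H)) * .16
    return (2*(tb + w) + h - 0.605*outs) * 0.16

# illustrative team totals: H, TB, W, AB-H
h, tb, w, outs = 1400, 2200, 550, 4100
mine, erp = skeleton_runs(h, tb, w, outs), johnson_erp(h, tb, w, outs)
```

On these totals the two formulas land within about two percent of each other; the remaining gap comes from the slightly different multipliers (.162 vs. .16) and out coefficients (-.6 vs. -.605).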
You can then find the resulting LW by expanding the formula; for example, a double adds 2 total bases and 1 hit, so it has a value of (2*2+1)*.162=.81.<br /><br />Working out the full expansion of my ERP equations, we have:<br /><br />ERP = .49S+.81D+1.13T+1.46HR+.32W-.097(AB-H)<br />ERP = .48S+.81D+1.13T+1.45HR+.32W+.23SB-.32CS-.097(AB-H)<br /><br />I have recently thrown together a couple of versions that encompass all of the official offensive stats:<br /><br />ERP = (TB+.5H+W+HB-.5IW+.3SH+.7(SF+SB)-CS-.7DP-.3(AB-H))*.322<br />ERP = (TB+.5H+W+HB-.5IW+.3SH+.7(SF+SB)-CS-.7DP-.292(AB-H)-.031K)*.322<br /><br />Or:<br /><br />ERP = .483S+.805D+1.127T+1.449HR+.322(W+HB)-.161IW+.225(SB+SF-DP)+.097*SH-.322CS-.097(AB-H)<br />ERP = .483S+.805D+1.127T+1.449HR+.322(W+HB)-.161IW+.225(SB+SF-DP)+.097*SH-.322CS-.094(AB-H-K)-.104K<br /><br />Here are a couple versions you can use for past eras of baseball. For the lively ball era, the basic skeleton of (TB+.5H+W-.3(AB-H)) works fine, just use a multiplier of .33 for the 1940s and .34 for the 1920s and 30s. For the dead ball era, you can use a skeleton of (TB+.5(H+SB)+W-.3(AB-H)) with a multiplier of .341 for the 1910s and .371 for 1901-1909. Past that, you're on your own. While breaking it down by decade is not exactly optimal, it is an easy way to group them. The formulas are reasonably accurate in the dead ball era, but not nearly as much as they are in the lively ball era.<br /><br /><b>Regression</b><br /><br />Using the statistical method of multiple regression, you can find the most accurate linear weights possible for your dataset and inputs. However, when you base a method on regression, you often lose the theoretical accuracy of the method, since there is a relationship or correlation between various stats, like homers and strikeouts. 
Therefore, since teams that hit lots of homers usually strike out more than the average team, strikeouts may be evaluated as less negative than other outs by the formula, while they should have a slightly larger negative impact. Also, since there is no statistic available to measure baserunning skills, outside of SB, CS, and triples(for instance we don't know how many times a team gets 2 bases on a single), these statistics can have inflated value in a regression equation because of their relationship with speed. Another concern that some people have with regression equations is that they are based on teams, and they should not be applied to individuals. Anyway, if done properly, a regression equation can be a useful method for evaluating runs created. In their fine book, Curve Ball, Jim Albert and Jay Bennett published a regression equation for runs. They based it on runs/game, but I went ahead and calculated the long term absolute out value. With this modification, their formula is:<br /><br />R = .52S+.66D+1.17T+1.49HR+.35W+.19SB-.11CS-.094(AB-H)<br /><br />A discussion last summer on FanHome was very useful in providing some additional ideas about regression approaches(thanks to Alan Jordan especially). You can get very different coefficients for each event based on how you group them. For instance, I did a regression on all teams 1980-2003 using S, D, T, HR, W, SB, CS, and AB-H, and another regression using H, TB, W, SB, CS, and AB-H. Here are the results:<br /><br />R = .52S+.74D+.95T+1.48HR+.33W+.24SB-.26CS-.104(AB-H)<br /><br />The value for the triple is significantly lower than we would expect. But with the other dataset, we get:<br /><br />R = .18H+.31TB+.34W+.22SB-.25CS-.103(AB-H)<br /><br />which is equivalent to:<br /><br />R = .49S+.80D+1.11T+1.42HR+.34W+.22SB-.25CS-.103(AB-H)<br /><br />which are values more in line with what we would expect. So the way you group events(this can also be seen with things like taking HB and W together or separately. 
Or if there was a set relationship you wanted(like CS being twice as bad as SB are good), you could use a category like SB-2CS and regress against that) can make a large difference in the resulting formulas.<br /><br />An example I posted on FanHome drives home the potential pitfalls in regression. I ran a few regression equations for individual 8 team leagues and found this one from the 1961 NL:<br /><br />R = 0.669 S + 0.661 D - 1.28 T + 1.05 HR + 0.352 W - 0.0944 (AB-H)<br /><br />Obviously an 8 team league is too small for a self-respecting statistician to use, but it serves the purpose here. A double is worth about the same as a single, and a triple is worth NEGATIVE runs. Why is this? Because the regression process does not know anything about baseball. It just looks at various correlations. In the 1961 NL, triples were correlated with runs at r=-.567. The Pirates led the league in triples but were 6th in runs. The Cubs were 2nd in T but 7th in runs. The Cards tied for 2nd in T but were 5th in runs. The Phillies were 4th in triples but last in runs. The Giants were last in the league in triples but led the league in runs. If you, too, knew nothing about baseball, you could easily conclude that triples were a detriment to scoring runs.<br /><br />While it is possible that people who hit triples were rarely driven in that year, it's fairly certain an empirical LW analysis from the PBP data would show a triple is worth somewhere around 1-1.15 runs as always. Even if such an effect did exist, there is likely far too much noise in the regression to use it to find such effects.<br /><br /><b>Trial and Error</b><br /><br />This is not so much its own method as a combination of all of the others. Jim Furtado, in developing Extrapolated Runs, used Paul Johnson's ERP, regression, and some trial and error to find a method with the best accuracy. However, some of the weights look silly, like the fact that a double is only worth .22 more runs than a single. 
ERP gives .32, and Palmer's Batting Runs gives .31. So, in trying to find the highest accuracy, it seems as if the trial and error approach compromises theoretical accuracy, much as regression does.<br /><br />Skeleton approaches, of course, use trial and error in many cases in developing the skeletons. The ERP formulas I publish here certainly used a healthy dose of trial and error.<br /><br /><b>The +1 Method/Partial Derivatives</b><br /><br />Using a non-linear RC formula, you add one of each event and see what the difference in estimated runs would be. This will only give you accurate weights if you have a good method like BsR; if you use a flawed method like RC, take the custom LWs with a grain of salt or three.<br /><br />Using calculus, and taking the partial derivative of runs with respect to a given event, you can determine the precise LW values of each event according to a non-linear run estimator. See my BsR article for some examples of this technique.<br /><br /><b>Calculating the Out Value</b><br /><br />You can calculate a custom out value for whatever entity you are looking at. There are three possible baselines: absolute runs, runs above average, and runs above replacement. The first step for any of these is to find the sum of all the events in the formula other than AB-H; call this value X. AB-H is called O for outs, and O could also include other out events (like CS) whose value you want to allow to vary, but in my ERP formula the O component is just AB-H. Then, with actual runs being R, the necessary formulas are:<br /><br />Absolute out value = (R-X)/O<br /><br />Average out value = -X/O<br /><br />For the replacement out value, there is another consideration. First you have to choose how you define replacement level, and calculate the number of runs your entity would score, given the same number of outs, but replacement level production.
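The first two out values can be expressed directly in code. This is a minimal sketch; the function names and the example numbers are mine, chosen purely for illustration:

```python
# Out values as defined above: X is the sum of the non-out terms of the
# linear weights formula, O is outs (AB-H), R is actual runs scored.

def absolute_out_value(R, X, O):
    """Out value that makes the formula estimate total runs scored."""
    return (R - X) / O

def average_out_value(X, O):
    """Out value that makes the formula estimate runs above average."""
    return -X / O

# e.g. a hypothetical team with 700 runs, X = 1100, and 4000 outs:
print(absolute_out_value(700, 1100, 4000))  # -0.1
print(average_out_value(1100, 4000))        # -0.275
```

The average-baseline value is more negative than the absolute one by exactly R/O, because it also charges the out for consuming playing time at the entity's scoring rate.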
I set replacement level as 1 run below the entity's average, so I find the runs/out for a team 1 run/game below average and multiply this by the entity's outs. This is Replacement Runs, or RR. Then you have:<br /><br />Replacement out value = (R-RR-X)/O<br /><br /><b>All I Have to Say About the Astros</b> (2020-02-17)<br /><br />I have always been completely unable to relate to people who are so weak-minded that they demand that history literally be rewritten to fit their own value judgments. If you want to judge/discount the accomplishments of the Astros, you have complete freedom of conscience to do so. Why do you need some authority figure to tell you how to think?<br /><br />Of course, most of the people who demand asterisks and vacated games and forfeits and all of the other Stalinist trappings of the NCAA, IOC, and other contemptible organizations are already quite busy making their own value judgments about every damn thing in the entire world, thank you very much. If they just wanted some authority to tell them what to think, they would be sad, pathetic little creatures, worthy of the pity of free-thinking individuals and nothing more. But that's not what they want - they want some authority to tell <b>me</b> what to think. They seek to shift the burden of proof, as it were, from those who would deny the objective facts of reality to those who would uphold them. <br /><br />I didn't call it "Stalinist" lightly - in a different cultural environment, the illiberal nature of the entire endeavor would be breathtaking in its audacity and its chutzpah. Instead, it is just another day in a world of creeping totalitarianism where the acceptable avenues of thought are controlled by the armed guards of some authority or the other.
Historians of the future will learn much more about the America of 2020 from the response to the Astros scandal than they could ever hope to glean from the fact that it happened.<br /><br /><b>Tripod: Clay Davenport's Equivalent Runs</b> (2020-02-13)<br /><br /><i>See the first paragraph of <a href="https://walksaber.blogspot.com/2020/01/tripod-ability-v-value.html">this post</a> for an explanation of this series. The content of this article is also the topic of <a href="https://walksaber.blogspot.com/search?q=equivalent+runs">better</a>, more <a href="https://walksaber.blogspot.com/2008/05/analysis-of-clay-davenports-eqr-and-eqa.html">recent</a> posts.</i><br /><br />Equivalent Runs and Equivalent Average are offensive evaluation methods published by Clay Davenport of <u>Baseball Prospectus</u>. Equivalent Runs (EQR) is an estimator of runs created. Equivalent Average (EQA) is the rate stat companion: EQR/out transposed onto a batting average scale.<br /><br />There seems to be a lot of misunderstanding about the EQR/EQA system. Although I am not the inventor of the system and don't claim to speak for Davenport, I can address some of the questions I have seen raised as an objective observer. The first thing to get out of the way is how Davenport adjusts his stats. Using Davenport Translations, or DTs, he converts the stats of everyone in organized baseball to a common major league. All I know about DTs is that Davenport says that the player retains his value (EQA) after translating his raw stats (except, of course, that minor league stats are converted to Major League equivalents). <br /><br />But the DTs are not the topic here; we want to know how the EQR formula works.
So here are Clay's formulas, as given in the 1999 BP:<br /><br />RAW = (H+TB+SB+1.5W)/(AB+W+CS+.33SB)<br />EQR(absolute) = (RAW/LgRAW)^2*PA*LgR/PA<br />EQR(marginal) = (2*RAW/LgRAW-1)*PA*LgR/PA<br />EQA = (.2*EQR/(AB-H+CS))^.4<br /><br />where PA is AB+W<br /><br />When I refer to various figures here, like what the league RAW was or what the RMSE of a formula was, they are based on data for all teams 1980-2000. Now, RAW is the basis of the whole method. It has a good correlation with runs scored, and it is an odd formula that Davenport has said is based on what worked rather than on a theory.<br /><br />Both the absolute and marginal EQR formulas lay out a relationship between RAW and runs. The absolute formula is designed to work for teams, where offensive interaction compounds and increases scoring (thus the exponential function). The marginal formula is designed to estimate how much a player has added to the league (and is basically linear). Both formulas, though, try to relate the Adjusted RAW (ARAW, RAW/LgRAW) to the Adjusted Runs/PA (aR/PA). This brings up one of the most misunderstood issues in EQR.<br /><br />Many people have said that Davenport "cheated" by including LgRAW and LgR/PA in his formula. By doing this, they say, you reduce the potential error of the formula by honing it in on the league values, whereas a formula like Runs Created is estimating runs from scratch, without any knowledge of anything other than the team's basic stats. This is true to some extent: if you are doing an accuracy test, EQR has an unfair advantage. But every formula was developed with empirical data as a guide, so they all have a built-in consideration. To put EQR on a level playing field, just take a long term average for LgRAW and LgR/PA and plug those into the formula. For the 1980-2000 period we are testing, the LgRAW is .746 and the LgR/PA is .121.
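Coded up with those long-term constants, the system looks like this (a sketch only; the function and variable names are mine, not Davenport's):

```python
# Davenport's EQR/EQA formulas as printed above, using the long-term
# 1980-2000 constants from the text in place of single-season league values.
LG_RAW = 0.746       # long-term league RAW
LG_R_PER_PA = 0.121  # long-term league runs per plate appearance

def raw(AB, H, TB, W, SB, CS):
    return (H + TB + SB + 1.5 * W) / (AB + W + CS + 0.33 * SB)

def eqr_absolute(AB, H, TB, W, SB, CS):
    PA = AB + W
    return (raw(AB, H, TB, W, SB, CS) / LG_RAW) ** 2 * PA * LG_R_PER_PA

def eqr_marginal(AB, H, TB, W, SB, CS):
    PA = AB + W
    return (2 * raw(AB, H, TB, W, SB, CS) / LG_RAW - 1) * PA * LG_R_PER_PA

def eqa(EQR, AB, H, CS):
    return (0.2 * EQR / (AB - H + CS)) ** 0.4

# When RAW equals the league RAW, both EQR versions reduce to PA * LgR/PA,
# i.e. league-average scoring for the playing time.
```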
If we use these as constants, the accuracy test will be fair.<br /><br />One of the largest (and most widely read) errors in this area is an accuracy test written up by Jim Furtado in the 1999 Big Bad Baseball Annual. Furtado tests EQR both in the way prescribed by Davenport and in the way he converts all rate stats to runs: he takes RAW/LgRAW*LgR/O*O. He also does this for OPS, Total Average, and the like. Davenport railed against this test in the 2000 BP, and he was right to do so. First of all, most stats will have better accuracy if the comparison is based on R/PA, which is why Davenport uses R/PA in his EQR statistic in the first place. In all fairness to Furtado, though, he was just following the precedent set by Pete Palmer in The Hidden Game of Baseball, where he based the conversion of rate stats on innings batted, essentially outs/3. Unfortunately, Furtado did not emulate a good part of Palmer's test. Palmer used this equation to relate rate stats to runs:<br /><br />Runs = (m*X/LgX+b)*IB*LgR/IB<br /><br />where X is the rate stat in question and IB is Innings Batted. m and b are, respectively, the slope and intercept of a linear regression relating the adjusted rate stat to the adjusted scoring rate. This is exactly what Davenport did; he uses m=2 and b=-1. Why is this necessary? Because the relationship between RAW and runs is not 1:1. For most stats the relationship isn't; OBA*SLG is really the only one for which it is, and that is the reason it scores so high in the Furtado study. So Furtado finds RAW worse than Slugging Average just because of this issue. The whole study is a joke, really; he finds OPS worse than SLG too! However, when EQR's accuracy comes up, people will invariably say, "Furtado found that..." It doesn't matter; the study is useless.<br /><br />Now let's move on to a discussion of the Absolute EQR formula. It states that ARAW^2 = aR/PA, and uses this fact to estimate runs. How well does it estimate runs?
In the period we are studying, RMSE = 23.80. For comparison, RC comes in at 24.80 and BsR is at 22.65. One thing that is suspicious about the formula is that the exponent is a simple 2. Could we get better results with a different exponent? We can determine the perfect exponent for a team by taking (log aR/PA)/(log ARAW). The median value for our teams is 1.91, and plugging that in gives a RMSE of 23.25.<br /><br />In the BsR article, I describe how you can find linear values for a non-linear formula. Using the long term stats we used in the BsR article (1946-1995), this is the resulting equation for Absolute EQR: <br />.52S+.83D+1.14T+1.46HR+.36W+.24SB-.23CS-.113(AB-H)<br /><br />Those weights are fairly reasonable, but unfortunately, the Absolute EQR formula isn't. We can demonstrate using BsR that as the OBA approaches 1, the run values of the offensive events converge toward 1. We can see the flaw in Absolute EQR by finding the LW for Babe Ruth's best season, 1920:<br /><br />EVENT BsR EQR<br />S .68 .74<br />D 1.00 1.28<br />T 1.32 1.82<br />HR 1.40 2.36<br />W .52 .47<br />O -.22 -.33<br />SB .24 .31<br />CS -.52 -.68<br /><br />As you can see, absolute EQR overestimates the benefit of positive events and the cost of negative events. The reason for this is that the compounding effect in EQR is wrong. When a team has a lot of HR, it also means that runners are taken off base, reducing the potential impact of singles, etc. that follow. Absolute EQR seems to assume that once a runner gets on base, he stays there for a while; thus the high value for the HR. Besides, the Absolute EQR formula is supposed to work better for teams, but the Marginal EQR formula has a RMSE of 23.23, better than Absolute EQR. So the entire Absolute EQR formula should be scrapped (incidentally, I haven't seen it in print since 1999, so it may have been).<br /><br />The Marginal formula can also be improved.
If we run a linear regression of ARAW to predict aR/PA for our sample, we get:<br /><br />EQR=(1.9*ARAW-.9)*PA*LgR/PA, which improves the RMSE to 22.89. <br /><br />Some misunderstanding has also been perpetuated about the linearity of Marginal EQR. Basically, Marginal EQR is technically not linear but it is very close to it. If the denominator for RAW was just PA, it would be linear because it would cancel out with the multiplication by PA. But since SB and CS are also included in the denominator, it isn't quite linear. However, since most players don't have high SB or CS totals, the difference is hard to see. So Marginal EQR is essentially linear. Some, myself included, would consider it a flaw to include SB and CS in the denominator. It would have been better, for linearity's sake, to put just PA in the denominator and everything else in the numerator. But Davenport apparently was looking to maximize accuracy, and it may be the best way to go for his goals. One possible solution would be to use the RAW denominator as the multiplier in place of PA, and multiply this by LgR/Denominator. However, I tried this, and the RMSE was 23.04. I'll publish the formula here: EQR = (1.92*RAW/LgRAW-.92)*(AB+W+CS+.33SB)*LgR/(AB+W+CS+.33SB)<br /><br />Now, back to the material at hand, Davenport's EQR. If we find the linear weights for the marginal equation we get:<br /><br />.52S +.84D+1.16T+1.48HR+.36W+.24SB-.23CS-.117(AB-H)<br /><br />As was the case with the Absolute formula, I generated these weights through Davenport's actual formula, not my proposed modification using 1.9 and .9 rather than 2 and 1 for the slope and intercept. I wondered what difference this would make if any, so I tried it with my formula:<br /><br />.50S+.80D+1.11T+1.41HR+.35W+.23SB-.22CS-.105(AB-H)<br /><br />These values seem to be more in line with the "accepted" LW formulas. 
However, EQR does not seem to properly penalize the CS; it should be more harmful than the SB is helpful.<br /><br />Finally, we are ready to discuss EQA. Most of the complaints about EQA are along the lines of taking an important value, like runs/out, and putting it on a scale (BA) which has no organic meaning. Also mentioned is that it dumbs people down: in trying to reach out to non-sabermetricians and give them standards that they understand easily, you fail to educate them about what is really important. Both of these arguments have merit. But ultimately, it is the inventor's call. You can convert between EQA and R/O, so if you don't like how Clay publishes it, you can convert it to R/O yourself: R/O = EQA^2.5*5.<br /><br />Personally, I don't like EQA because it distorts the relationship between players:<br /><br />PLAYER R/O EQA<br />A .2 .276<br />B .3 .325<br /><br />Player B has a R/O 1.5x that of player A, but his EQA is only 1.18x player A's (the 2.5th root of 1.5). <br /><br />But again, this is a quick thing you can change if you so desire, so I think it is wrong to criticize Davenport for his choice of scale; it is his method.<br /><br /><b>Tripod: Appraised Runs</b> (2020-02-10)<br /><br /><i>See the first paragraph of <a href="https://walksaber.blogspot.com/2020/01/tripod-ability-v-value.html">this post</a> for an explanation of this series. The content of this article is also the topic of the more recent post <a href="https://walksaber.blogspot.com/2006/01/another-superfluous-run-estimator.html">here</a>.</i><br /><br />Mike Gimbel's stat, Run Production Average, is a unique look at runs created that, though published almost a decade ago, has gotten very little attention from other sabermetricians.
RPA uses an initial RC formula based on what Gimbel calls Run Driving values, which underweight walks and overrate extra base hits. But Gimbel accounts for this with a set-up rating, which evaluates the added impact of extra base hits in removing baserunners for the following batters. Gimbel's method has tested accuracy, with real teams, very similar to that of Runs Created, Base Runs, and Linear Weights. That it does not hold up at the extremes like Base Runs prevents it from being the best structure for RC we have, but it is an interesting alternative to RC. This is a compilation of some posts from FanHome on my knockoff of RPA, Appraised Runs. RPA uses some categories, like balks and wild pitches, that we do not have readily available. So I rigged up, following Gimbel's example, a similar formula using the typical categories. In doing so, I probably lost some of the true nature of Gimbel's creation. Gimbel obviously is the expert on his own stat, but hopefully AR is not too flawed to be useful in looking at the concept of RPA. This is <a href="https://www.baseballthinkfactory.org/btf/scholars/visiting/articles/RPA_explanation.htm">Gimbel's RPA article</a>.<br /><br />Here is a compilation of posts from a thread on AR on FanHome. You can see the errors I made the first time, although I am not sure that the second version is much of an improvement.<br /><br />Patriot - Dec 31, 2000<br /><br />Mike Gimbel has a stat called Run Production Average. It is basically an R:PA method, but the way he gets runs is unlike any other construct I've seen. He starts by using a set of LW that reflect advancement values (or Run Driving), not total run scoring like most LW formulas. Then he adjusts half of this for the batter's Set Up ability, representing runners on base for the following batters. It is an interesting concept, but his formula has all sorts of variables that aren't available, like ROE, Balks, and WPs.
So I tried to replicate his work.<br /><br />As a starting point I used the Runs Produced formula laid out by Steve Mann in the 1994 Mann Fantasy Baseball Guide. The weights are a little high compared to other LW formulas, but oh well:<br /><br />RP = .575S+.805D+1.035T+1.265HR+.345W-.115(AB-H)<br /><br />Working with this formula, I saw that Gimbel's weights were similar to (event value - walk value), and that Gimbel's walk value was similar to (RP run value/2). The HR value seems to be kept. This gives Run Driving, RD, as: .23S+.46D+.69T+1.265HR+.138W<br /><br />The set-up values were similar to (1 - Run Driving value), so the Set-Up Rating, which I'll call UP, is (.77S+.54D+.31T-.265HR+.862W)/(AB-H). Gimbel used (AB+W) in the denominator, but outs works better.<br /><br />Then Gimbel would take UP/LgUP*RD*.5+RD*.5, thus weighting half of the RD by the adjusted UP. But I found that UP correlated better with runs scored than RD, so we get:<br /><br />AR = UP/LgUP*RD*.747+RD*.390<br /><br />where AR is Appraised Runs, the name I gave to this thing. LgUP can be held constant @ .325 if you like it better.<br /><br />Anyway, this had an AvgE in predicting team runs of 18.72, which is a little bit better than RC. So it appears as if Gimbel's work can be taken seriously as an alternative run production formula, like RC, LW, or BsR.<br /><br />Please note that I am not endorsing this method. I'm just playing with it.<br /><br />David Smyth - Jan 1, 2001<br /><br />There is no doubt that Gimbel's method was ahead of its time, and that it can, properly updated, be as accurate as any other RC method. It has a unique advantage in being equally applicable to any entity (league, team, or individual), I think.<br /><br />I support your effort to work on it a bit, and get rid of the odd categories he includes.<br /><br />Basically what he was saying is that part of scoring is linear, and part is not.
This is in between all-linear formulas such as XR, and all non-linear ones such as RC and BsR. The new RC is 89% linear and 11% non-linear, I recall. I'm not sure what the percentage is for RPA. As a team formula, it's certainly not perfect; at theoretical extremes it will break down. The only team formula I'm aware of which doesn't have that problem is BsR. There is probably a 'compromise' between RPA and BsR which would be great. IOW, you could probably use the fixed drive-in portion from RPA, and a modification of BsR for the non-linear part.<br /><br />My position on these things is that both parts of the complete method--the run part and the win part--should be consistent with each other. For example, XR is linear and XW is non-linear. BsR is non-linear and BsW is linear. That bothers me, so I've chosen to go with a linear run estimator and BsW. Linear-linear. It's not so much a question of which is 'right'; it's a question of which frame of reference is preferable. If you want an individual frame of reference, go with RC or BsR, OWP, Off. W/L record, etc. If you want a team frame of reference, go with RPA or the new RC and XW. If you want a global (league or group of leagues) frame of reference, go with an XR-type formula and BsW. IMO, global has a simplicity and elegance which is unmatchable. Global would also include the Palmer/mgl LWts, using the -.30 type out value--another excellent choice.<br /><br />There are also methods with enhanced accuracy such as Value Added Runs, and Base Production (Tuttle). These methods require tons of data. It's all a question of where to draw the line between accuracy, the amount of work, and what you're trying to measure. 
I tend to draw the line in favor of simplicity, because I've yet to be convinced that great complexity really pays off.<br /><br />Patriot - Jan 2, 2001 (clipped)<br /><br />Anyway, since I have it here, this is the AR stolen base version:<br />RD = .23S+.46D+.69T+1.265HR+.138W+.092SB<br />UP = (.77S+.54D+.31T-.265HR+.862W+.092SB-.173CS)/(AB-H+CS)<br />AR = UP/LgUP*RD*.737+RD*.381<br />LgUP can be held constant @ .325<br /><br />Patriot - Jun 13, 2001 (clipped)<br /><br />I have been working with this again, not because I endorse the construct or method but because the first time I did one amazingly crappy job.<br /><br />For example, Ruth in 1920 has 205 RC, 191 BsR, and 167 RP. And 248 AR! Now, we don't know for sure how many runs Ruth would have created on his own, but anything that's 21% higher than RC makes me immediately suspicious.<br /><br />Anyway, the problem comes mostly from the UP term. Gimbel used AB+W as the denominator and I used AB-H. Neither of us was right. Gimbel's method doesn't give enough penalty for outs, and mine overemphasizes out making and puts too much emphasis on a high OBA. The solution is to subtract .115 (the out value from RP, which I based everything on) times outs from the UP numerator, because every out (or at least every third out) reduces the number of runners on base to zero.<br /><br />Gimbel's RD values were also meant to estimate actual runs scored. So I applied a fudge factor to my RD to make it do the same. Anyway, this is the new Appraised Runs method:<br /><br />RD = .262S+.523D+.785T+1.44HR+.157W<br />UP = (.77S+.54D+.31T-.265HR+.862W-.115(AB-H))/(AB+W)<br />AR = UP/AvgUP*RD*.5+RD*.5<br /><br />AvgUP can be held @ .145<br /><br />This decreases the RMSE of the formula and also makes a better estimate IMO for extreme teams.
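Transcribed into code, the revised (non-stolen base) version reads as follows. This is just a sketch of the formulas above; the function name is mine, and S means singles (H-D-T-HR) separated out:

```python
AVG_UP = 0.145  # the constant league set-up rating suggested above

def appraised_runs(S, D, T, HR, W, AB, H):
    """Revised Appraised Runs, non-SB version, as given in the text."""
    RD = 0.262 * S + 0.523 * D + 0.785 * T + 1.44 * HR + 0.157 * W  # run driving
    UP = (0.77 * S + 0.54 * D + 0.31 * T - 0.265 * HR + 0.862 * W
          - 0.115 * (AB - H)) / (AB + W)                            # set-up rating
    return UP / AVG_UP * RD * 0.5 + RD * 0.5
```

Half of the run driving total is scaled by the adjusted set-up rating and half is taken straight, mirroring the AR = UP/AvgUP*RD*.5+RD*.5 structure.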
Ruth now has 205 AR, more in line with the other estimators, although if you wanted to apply this method TT is the way to go.<br /><br />The new AR stolen base version is:<br /><br />RD = .262S+.523D+.785T+1.44HR+.157W+.079SB-.157CS<br /><br />UP = (.77S+.54D+.31T-.265HR+.862W-.115(AB-H)+.262SB-CS)/(AB+W)<br /><br />AR = UP/AvgUP*RD*.5+RD*.5 AvgUP can be held @ .140<br /><br />Corrections - July 2002<br /><br />I have had those Appraised Runs formulas for over a year now, and never bothered to check and see if they held up to the LW test. Here are the LW for AR from the +1 method for the long term ML stats(the display is S,D,T,HR,W,SB,CS,O): .52,.69,.86,1.28,.46,.19,-.57,-.106<br /><br />You can see that we have some serious problems. The single, steal, and out are pegged pretty much perfectly. But extra base hits are definitely undervalued and the CS is wildly overvalued. So, I tried to revise the formula to improve these areas.<br /><br />And I got nowhere. Eventually I scrapped everything I had, and went back to Gimbel's original values, and just corrected it for the fact that we didn't have some of his data. His RD portion worked fine, but I couldn't get his UP to work at all. Finally, I scrapped UP altogether. I decided instead to focus on the UP ratio(UP/AvgUP). This value is multiplied by half of the RD, and added to the other half of the RD to get AR. We'll call the UP/AvgUP ratio X. If you know RD, which I did based on Gimbel's work(I used his RD exactly except with a fudge factor to make it equate with runs scored, and dropping the events I didn't want/have), you have this equation:<br /><br />R = RD*.5+RD*.5*X<br /><br />Rearranging this equation to solve for X, you have:<br /><br />X = R/(RD*.5)-1<br /><br />So, with the actual X value for each team known, I set off to find a good way to estimate X. 
I didn't want to compare to the average anymore; if you think about it, it doesn't matter what the LgUP is: the number of baserunners should depend only on the team's stats. So I did some regressions, found one that worked well, streamlined and edited the numbers, and wound up with these equations for AR:<br /><br />RD1 = .289S+.408D+.697T+1.433HR+.164W<br /><br />UP1 = (5.7S+8.6(D+T)+1.44HR+5W)/(AB+W)-.821<br /><br />AR1 = UP1*RD1*.5 + RD1*.5<br /><br />RD2 = .288S+.407D+.694T+1.428HR+.164W+.099SB-.164CS<br /><br />UP2 = (5.7S+8.6(D+T)+1.44HR+5W+1.5SB-3CS)/(AB+W)-.818<br /><br />AR2 = UP2*RD2*.5 + RD2*.5<br /><br />These equations had RMSEs on the data for 1970-1989 of 22.64 and 21.79 respectively. For comparison, Basic RC was at 24.93 and Basic ERP was at 23.08, so the formulas are quite accurate when used for real teams. The linear values were: .51, .80, 1.09, 1.42, .35, .187, -.339, -.106<br /><br />When applied to Babe Ruth, 1920, he had 205 AR, which is a reasonable value for an RC-like formula. Hopefully this new version of AR will turn out to be one that I can actually keep; maybe the third time is a charm.<br /><br /><b>The Beals Goes On</b> (2020-02-08)<br /><br />Greg Beals is entering his tenth year at the helm of the OSU baseball program, having somehow parlayed a second run to the Big Ten Tournament title in three years into a three-year contract extension. This despite his overall record over those nine seasons being the worst for the program in thirty years. At some point, it becomes an exercise in masochism even to repeat these facts. Greg Beals is apparently the coach for life. <br /><br />This year, expectations are high. <u>Baseball America</u> ranked OSU #24 in their preseason Top 25, with only the forces of darkness joining them from the Big Ten at #8.
<a href="https://www.baseballamerica.com/stories/2020-ncaa-top-25-preview-no-24-ohio-state/">Their explanation</a>: “Ohio State brings its entire rotation back from that team and has a star behind the plate in Dingler”. That rotation will be fronted by redshirt sophomore lefty Seth Lonsway, who is likely to be a high draft pick come June. His strikeouts (12.3 per nine) outshone his overall performance (a good but not great +9 RAA) but offer the promise of an ace-level breakout. Fellow soph Garrett Burhenn was just as effective in 2019 (+10 RAA), but with a much more pedestrian strikeout rate. Junior lefty Griffan Smith was above average (+3 RAA) and should be a solid #3. It’s easy to see why this rotation (all of whom made at least fifteen starts and topped ninety innings) is highlighted as a strength.<br /><br />The same cannot be said for the bullpen, which is filled with significant question marks after Andrew Magno’s graduation. Last year, weekday starts were handled by committee; only Jake Vance, now a senior, made more than three starts, and he logged just 41 innings over his 11 appearances (9 starts) and was not effective in doing so (7.90 RA). Sophomore Will Pfennig may be used as the relief ace, but he is also a potential starter, as he pitched 58 innings over 24 appearances in 2019.<br /><br />Grad transfer lefty Patrick Murphy pitched sparingly during his time at Marshall, and in 18 innings last year allowed 7 runs with a troubling 11/15 K/W, but Beals loves deploying lefty specialists and he may fit the bill. A couple of sophomore righties threw hard but didn’t know where it was going (Bayden Root with 11.8 K/7.2 W over 35 innings and TJ Brock with 6.7/5.8 over 31), and their lefty classmate Mitch Milheim allowed 23 runs in as many innings (Milheim is another potential starter). Senior Joe Gahm logged only 19 innings; he was effective with a 4.26 RA but his peripherals tell a different story (6.63 eRA).
He is one of only four returning Buckeye pitchers who had an RA better than the conference average in 2019; the three starters are the others, which explains my concern about the bullpen. A cadre of freshman righties (Ethan Hammerberg, Cam Hubble, Tyler Kean, Wyatt Loncar, and Yianni Skeriotis) could be in the mix, and if there’s any justice in baseball then Ethan Hammerberg is a future lockdown closer. <br /><br />Junior Dillon Dingler will handle the catching, and was <u>Baseball America</u>’s choice as preseason Big Ten Player of the Year. He did everything at the plate but hit for power last year (.291/.391/.424). His primary backup will be junior Brent Todys, who hit well enough last year (.256/.345/.462) to get at bats at DH, where he is also penciled in as the starter for 2020. The four backstops on the roster are all juniors, as Dingler and Todys are joined by transfers Ronnie Allen and Archer Brookman.<br /><br />Senior Conor Pohl is the incumbent at first, coming off a very consistent two-year run of middling averages and power but solid walk rates (.279/.377/.393 in 2018 and .264/.350/.396 in 2019). Senior Matt Carpenter emerged from the bench as a Beals favorite at second base, but his production (.257/.300/.324) left much to be desired. Sophomore Zach Dezenzo did an admirable job with an average offensive performance (.250/.316/.440) despite being stretched at shortstop due to an injury to now-senior Noah West. He will be counted on to be a middle of the order hitter for this squad. The aforementioned West is a solid fielder who was average at the plate in 86 PA before his injury, an improvement on his first two campaigns. Sophomore Nick Erwin will be a key backup; he struggled to a .235/.288/.272 line after being pressed into duty at the hot corner when Dezenzo slid over to short.
Junior transfers Colton Bauer and Sam Wilson, sophomore Aaron Hughes, and freshman Avery Fisher round out the roster.<br /><br />The outfield will have to be rebuilt, as OSU’s top two offensive performers from 2019 (LF Brady Cherry and RF Dominic Canzone) are gone; they combined for a whopping 60 RAA. Also gone is center fielder Ridge Winand, although he will be easier to replace (-2 RAA). The only returning player with any significant experience is sophomore Nolan Clegg, who is penciled in to play right (.286/.348/.476 in 47 PA). The other spots are slated to go to freshmen Mitchell Okuley (left) and Nate Karaffa (center), but there could be opportunities for a number of other players, including juniors Jake Ruby and Scottie Seymour, redshirt freshman Alec Taylor, and true freshmen Joey Aden and Caden Kaiser.<br /><br />OSU will open the season next weekend against lower-tier northern teams (St. Joe’s, Pitt, and Indiana State) in Port Charlotte, FL, then go to Georgia Tech and Lipscomb for true road series before facing Stetson, Harvard, and Fairfield at neutral sites and North Florida on the road. March 13 is the home opener at Bill Davis Stadium with a weekend series against Liberty, with the succeeding weekend opponents being Rutgers, @ Indiana, MSU, @ the forces of darkness, Illinois, The Citadel, @ Nebraska, Maryland, and @ Northwestern. Mid-week opponents include Wright State (away), Bowling Green, Toledo, Morehead State, Dayton, Miami, Ohio University, Cincinnati (away), and Xavier (away).<br /><br />Far be it from me to question <u>Baseball America</u>, but this does not look anything like a top 25 national team to me. The offense is not likely to be good; only Dingler and Dezenzo figure to be well above average performers, and the entire outfield is a question mark. The starting pitching is strong, but the depth behind it and the bullpen give less reason for optimism than the outfield, where at least hope can be placed on the shoulders of freshmen.
Most of the non-weekend pitchers have already struggled; while we should expect a couple to take a step forward, they aren’t a blank slate on which to project hopes and dreams.<br /><br />And there’s nothing in the world of Buckeye baseball further from such a blank slate than Greg Beals. Beals is what he is – a coach running a middle-tier Big Ten program in perpetuity, lucking his way to Big Ten Tournament titles that satiate his apathetic athletic director and even occasionally fool the wise folks at <u>Baseball America</u>. Coach for life despite having never won a conference title – it’s good work if you can get it, but it doesn’t make for a good fan experience.<br /><br /><br /><br />phttp://www.blogger.com/profile/18057215403741682609noreply@blogger.com0tag:blogger.com,1999:blog-12133335.post-11181036903276613642020-02-06T19:07:00.000-05:002020-02-06T19:11:18.977-05:00Tripod: Run Estimators & Accuracy<i>See the first paragraph of <a href="https://walksaber.blogspot.com/2020/01/tripod-ability-v-value.html">this post</a> for an explanation of this series.</i><br /><br />This page covers some run estimators. It by no means includes all of the run estimators, of which there are dozens. I may add some more descriptions at a later time. Anyway, Base Runs and Linear Weights are the most important and relevant. Equivalent Runs is often misunderstood. Appraised Runs is my twist on the funny-looking, flawed (but no more so than Runs Created) method of Mike Gimbel.<br /><br />I guess I'll also use this page to make some general comments about run estimators that I may expand upon in the future. 
I posted these comments on Primer in response to an article by Chris Dial saying that we should use RC (or at least that it was ok as an accepted standard) and in which he mentioned something or other about it being easy to understand for the average fan:<br /><br />If you want a run statistic that the general public will understand, wouldn't it be better to have one whose structure you can explain?<br /><br />Any baseball fan should be able to understand that runs = baserunners * % of baserunners who score + home runs. Then you can explain that baserunners and home runs are known, and that we have to estimate % who score, and the estimate we have for it may not look pretty, but it's the best we've been able to do so far, and that we are still looking for a better estimator. So, you've given them:<br /><br />1. an equation that they can understand and know to be true<br /><br />2. an admission that we don't know everything<br /><br />3. a better estimator than RC<br /><br />And I think the "average" fan would have a much easier time understanding that the average value of a single is 1/2 a run, the average value of a walk is 1/3 of a run, and the average value of an out is -1/10 of a run than the complicated and fatally flawed RC equation. But to each his own I suppose.<br /><br />I will also add that the statement that "all RC methods are right" is simply false IMO. It is true that there is room for different approaches. But, for instance, RC and BsR both purport to model team runs scored in a non-linear fashion. They can't both be equally right. The real answer is that neither of them is "right"; but one is more "right" than the other, and that is clearly BsR. But which is more right, BsR or LW? Depends on what you are trying to measure.<br /><br />********<br /><br />When I started this page, I didn't intend to include anything about the accuracy of the various methods other than mentioning it while discussing them. 
An RMSE test done on a large sample of normal major league teams really does not prove much. There are other concerns which are more important IMO, such as whether or not the method works at the extremes, whether or not it is equally applicable to players as to teams, etc. However, I am publishing this data in response to the continuing assertion I have seen from numerous people that BsR is more accurate at the extremes but less accurate with normal teams than other methods. I don't know where this idea got started, but it is apparently prevalent among uninformed people, so I wanted to present a resource where people could go and see the data disproving this for themselves. <br /><br />I used the Lahman database for all teams 1961-2002, except 1981 and 1994 for obvious reasons. I tested 10 different RC methods, with the restriction that they use only AB, H, D, T, HR, W, SB, and CS, or stats that can be derived from those. This was for three reasons: one, I personally am not particularly interested in including SH, SF, DP, etc. in RC methods if I am not going to use them on a team; two, I am lazy and that data is not available and I didn't feel like compiling it; three, some of the methods don't have published versions that include all of the categories. As it is, each method is on a fair playing field, as all of them include all of the categories allowed in this test. 
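To make the mechanics of the test concrete, here is a minimal Python sketch of such an RMSE comparison. This is only an illustration, not the spreadsheet actually used for the study; the two team stat lines are invented for demonstration, and Bill James's basic RC serves as the example estimator.

```python
import math

def basic_rc(ab, h, d, t, hr, w, sb, cs):
    """Bill James's basic Runs Created: (H+W-CS)*(TB+.55SB)/(AB+W)."""
    # Total bases: H already counts one base per hit, so add one more for
    # each double, two for each triple, and three for each home run.
    tb = h + d + 2 * t + 3 * hr
    return (h + w - cs) * (tb + 0.55 * sb) / (ab + w)

def rmse(estimates, actuals):
    """Root mean squared error of estimated vs. actual team runs scored."""
    return math.sqrt(sum((e - a) ** 2 for e, a in zip(estimates, actuals)) / len(estimates))

# Invented team-season lines: (AB, H, D, T, HR, W, SB, CS), with actual runs alongside.
teams = [(5500, 1450, 280, 30, 180, 550, 90, 40),
         (5450, 1380, 260, 25, 140, 480, 110, 50)]
actuals = [780, 690]

estimates = [basic_rc(*line) for line in teams]
print(round(rmse(estimates, actuals), 2))  # prints 20.49
```

Any of the estimators listed below could be swapped in for basic_rc and compared over the full 1961-2002 sample in the same way.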
Here are the formulas I tested:<br /><br />RC: Bill James, (H+W-CS)*(TB+.55SB)/(AB+W)<br /><br />BR: Pete Palmer, .47S+.78D+1.09T+1.4HR+.33W+.3SB-.6CS-.090(AB-H)<br />.090 was the proper absolute out value for the teams tested<br /><br />ERP: originally Paul Johnson, version used in "Linear Weights" article on this site<br /><br />XR: Jim Furtado, .5S+.72D+1.04T+1.44HR+.34W+.18SB-.32CS-.096(AB-H)<br /><br />EQR: Clay Davenport, as explained in "Equivalent Runs" article on this site<br /><br />EQRme: my modification of EQR, using 1.9 and -.9, explained in same article<br />For both EQR versions, the LgRAW for the sample was .732 and the LgR/PA was .117--these were held constant<br /><br />BsR: David Smyth, version published in "Base Runs" article on this site<br /><br />UW: Phil Birnbaum, .46S+.8D+1.02T+1.4HR+.33W+.3SB-.5CS-(.687BA-1.188BA^2+.152ISO^2-1.288(WAB)(BA)-.049(BA)(ISO)+.271(BA)(ISO)(WAB)+.459WAB-.552WAB^2-.018)*(AB-H)<br />where WAB = W/AB<br /><br />AR: based on Mike Gimbel concept, explained in "Appraised Runs" article on this site<br /><br />Reg: multiple regression equation for the teams in the sample, .509S+.674D+1.167T+1.487HR+.335W+.211SB-.262CS-.0993(AB-H)<br /><br />Earlier I said that all methods were on a level playing field. This is not exactly true. EQR and BR both take into account the actual runs scored data for the sample, but only to establish constants. BsR's B component should have this advantage too, but I chose not to give it, so that the scales would not be tipped in favor of BsR, since the whole point is to demonstrate BsR's accuracy. Also remember that the BsR equation I used is probably not the most accurate that you could design; it is one that I have used for a couple of years now and am familiar with. Obviously the regression equation has a gigantic advantage. <br /><br />Anyway, what are the RMSEs for each method? 
<br /><br />Reg-------22.56<br />XR--------22.77<br />BsR-------22.93<br />AR--------23.08<br />EQRme-----23.12<br />ERP-------23.15<br />BR--------23.29<br />UW--------23.34<br />EQR-------23.74<br />RC--------25.44<br /><br />Again, you should not use these figures as the absolute truth, because there are many other important factors to consider when choosing a run estimator. But the important things to recognize IMO are:<br /><br />* all of the legitimate published formulas have very similar accuracy with real major league teams' seasonal data<br /><br />* if accuracy on team seasonal data is your only concern, throw everything away and run a regression (the reluctance to do this among people who claim to be totally concerned with seasonal accuracy suggests IMO that they aren't really as stuck on seasonal team accuracy as they claim to be)<br /><br />* RC is way behind the other methods, although I think if it included W in the B factor as the Tech versions do it would be right in the midst of the pack<br /><br />* BsR is just as accurate with actual team seasonal data as the other run estimators<br /><br />Anyway, the spreadsheet is available <a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vQkR2ZShL-XnygE2IaBVVaXkejlFcTkj5rdhFPMRWA3r5RlV2XdesJaUnWHCGPmiQ/pub?output=ods">here</a>, and you can plug in other methods and see how they do. But here is the evidence; let the myths die.<br /><br />Here are some other accuracy studies that you may want to look at. One is by John Jarvis. My only quibble with it is that he runs a regression of runs on each RC estimator, but it is a very interesting article that also applies the methods to defense as well, and is definitely worth reading (NOTE: sadly this link is dead)<br /><br />And this is <a href="https://www.baseballthinkfactory.org/btf/scholars/furtado/articles/accuracy.htm">Jim Furtado's article</a> as published in the 1999 BBBA. He uses both RMSE and regression techniques to evaluate the estimators. 
Just ignore his look at rate stats--it is fatally flawed by assuming there is a 1:1 relationship between rate stats and run scoring rate. That is pretty much true for OBAxSLG only, and that is why it comes in so well in his survey.phttp://www.blogger.com/profile/18057215403741682609noreply@blogger.com0tag:blogger.com,1999:blog-12133335.post-18121129684575323992020-01-28T18:21:00.002-05:002020-01-28T18:21:42.719-05:00Tripod: Common Fallacies<i>See the first paragraph of <a href="https://walksaber.blogspot.com/2020/01/tripod-ability-v-value.html">this post</a> for an explanation of this series.</i><br /><br />Here I deal with some misinformation that is sometimes spread about sabermetrics, or poorly designed statistical methods that are against sabermetric principles. The most important things to remember about sabermetrics are 1) that it is not the numbers themselves that matter, it is what the numbers mean and 2) that the only thing that matters is wins, and the only things that lead to wins are runs and outs. Those two principles serve to explain most of the folly behind these fallacies.<br /><br /><u>The "Bases" Fallacy</U><br /><br />There are many methods proposed, by many different people, that use bases and outs as the two main components. These include Boswell's Total Average, the Offense Ratio, and Codell's Base-Out Percentage. There are others too, either looking at bases/out or bases/PA. Not all of the people who have designed these methods fall into the fallacy. Specifically I'll look at John McCarthy and his 1994 book from Betterway Books, <u>Baseball's All-Time Dream Team</u>.<br /><br />McCarthy rates the great players of all time by what he calls the Earned Bases Average. EBA = (TB + W + SB - CS)/(AB + W). McCarthy mentions that he has read the sabermetric research, but that the sabermetric work is too difficult for the average fan to understand. 
He goes on to talk about how Linear Weights puts a HR as 3.15 times more valuable than a single, a triple 2.2, a double 1.7, and so on. He then says, "I believe that the value of a baseball game is more than just runs and winning. Winning is the player's aim, but there is also a transcendent beauty to great hits. It is that beauty that puts fans into the seats and visions of grandeur into kids' fantasies. A home run can immeasurably lift the spirits of the team, or take the wind out of opponents. So I challenge a mathematical concept which devalues the extra bases earned by sluggers and speedsters."<br /><br />Now, Mr. McCarthy may indeed have a point when he speaks of "grandeur" and stuff like that. It is OK if you want to design a method to measure the grandeur of players. Just don't get that confused with what actually wins baseball games. He later explains that the estimated values are not "tangible or real", and that "they are too complicated and many times are just clearly wrong." Sorry, buddy, it is you who are clearly wrong. A baseball game is not played in a vacuum. A player must interact with his teammates. The situations created by runners and outs affect the value of offensive events. Sure, they are not always constant. That is why you must decide what you are measuring, be it ability or value, and choose value-added runs or context-neutral runs. But the fact is, a home run is not four times more valuable than a single. It just isn't. And a stolen base is clearly not as valuable as a single, because it advances just one baserunner by one base, whereas a single advances the hitter by one base, and advances most runners by at least one and sometimes two bases. Plus it gives an extra Plate Appearance to the team's offense. A stolen base does none of this.<br /><br />The basic problem with McCarthy's thinking is that bases are not what matters. The game may be called baseball, but the winner is not the one with the most bases but the one with the most runs. 
You must relate everything to run scoring eventually if you want to really approximate its value. And TA and EBA and the like can be decent estimators of runs. But all bases are not created equal. A SB is always worth exactly one base, and a HR at least four. But while a SB can only ever be worth one base, a HR can be worth as many as ten. The EBA concept assumes that the only bases that matter are the ones that the individual generates for himself, but again, no player is an island. Everything eventually comes down to runs and outs, not bases.<br /><br /><u>The Right-Handed Hitter Adjustment Fallacy</U><br /><br />This is one that you can try to sneak by people. After all, sabermetricians seemingly like to adjust for everything, whether or not it needs to be adjusted for, right? So, since there are more right-handed pitchers than southpaws, and righties hit worse versus righties, shouldn't they get credit for dealing with this disadvantage? No way, Jose.<br /><br />Well, I suppose that if you want to measure literal ability, you want a right-handed adjustment. But literal ability has nothing to do with winning baseball games. It has to do with batting practice and skills competitions, and jaw-dropping, but not winning. Just as, because of the dynamics of baseball, not all bases are created equal, a lefty hitter is worth more than a righty of the same literal ability, assuming the normal left/right effect holds for them both. I view this extra credit for righties as tantamount to giving credit for ability to play the banjo. I mean, if I had a clone, the same as me in every way, except he could play the banjo and I couldn't, that would make him a more interesting guy than me, no? Sure. What does playing the banjo have to do with winning baseball games? 
About the same amount as being right-handed.<br /><br />Seriously, being a right-handed hitter in baseball is a small handicap, just as being unable to hit home runs is a handicap, and having an 85 mph fastball is not as good as a 90 mph fastball. It is a great deal like if we gave Muggsy Bogues extra credit for being 5'3". That certainly hurts his stats, so why don't we adjust for it? Because it's a fact of life that these things are disadvantages, and the goal of baseball is to win games, not to look good.<br /><br />Here is an example of a biased man who manipulates the numbers in this way. Giving Jim Rice 73% of his PAs vs. lefties is stupid, because 73% of the plate appearances in baseball are not against lefty pitchers. <br /><br /><u>The Fallacy of the Ecological Fallacy</U><br /><br />From time to time, someone who has a background in formal statistics will claim that applying various measures tested at the team level to individual players (usually a run estimator) is falling prey to the Ecological Fallacy and is thus invalid.<br /><br />Since I do not have a formal statistics background, it may be hazardous for me to talk about something that I don’t fully understand. But I can tell you that to the extent that I understand the ecological fallacy, the idea that it applies to individual runs created estimates is hokum.<br /><br />According to this link, the ecological fallacy occurs when “making an unsupported generalization from group data to individual behavior”. They then use an example of voting. One community has 25% who make over $100K a year, and 25% who vote Republican. Another has 75% who make over $100K and 75% who vote Republican. To use this data to conclude that there is a perfect correlation between individuals voting Republican and making over $100K would be the ecological fallacy. 
In fact, they show how the data could be distributed so that the correlation between individuals voting Republican and making over $100K is actually negative.<br /><br />People will then go on to claim that since Runs Created methods are tested on teams, it is wrong to apply them to individuals and assume accuracy. It is true that multiplicative methods like Runs Created and Base Runs make assumptions about how runs are created that are true when applied to teams but cannot be applied to individuals (the well-documented problem of driving yourself in; Barry Bonds’ high on base factor interacts with his high advancement factor in RC, but in reality interacts with the production of his teammates). It is also true that regression equations have many potential pitfalls when applied to teams, let alone taking team regressions and applying them to individuals. However, these limitations are well known by most sabermetricians (although some stubbornly continue to use James’ RC for individual hitters).<br /><br />The ecological fallacy claim, though, is extended by some to every run estimator that is verified against team data. The claim is that there “need be little to no connection between team-level functions and player-level functions”. I also saw a critic point out once that run estimators did not do a good job of predicting individual runs scored.<br /><br />My retort was that the low temperature today in Mozambique did not do a good job of predicting individual runs scored either. To assume that the team runs scored function and the individual runs scored function are the same is to be ignorant of the facts of baseball. A walk and a single have an equal run-scoring value for an individual, and a home run will always have an individual run-scoring value of 1. This is not true for a team, because, except in the case of the home run, it takes another player to come along and drive his teammate in. In the team case, all of these individual stats are aggregated. 
The home run by one batter not only scores him, it scores any teammates on base. And therefore the act of scoring runs, for a team, incorporates advancement value as well. A single will create more runs, in average circumstances, than will a walk.<br /><br />Therefore, when we have a formula that estimates runs scored for a team, it does not estimate the same function as runs scored for a player. It instead approximates another function that we choose to call “runs created” or “runs produced” or what have you. Now it could be claimed, I suppose, that the runs created function cannot be applied to individuals. But why not? If a double creates .8 runs for a team, and a hitter hits a double, why can’t we credit him with creating .8 of the team’s runs? All we are doing is assigning what we know are properly generated coefficients for the team to the player who actually delivered them. Or you can look at it, in the case of theoretical team RC, that we are isolating the player’s contribution by comparing team runs scored with him to team runs scored without him.<br /><br />Furthermore, the individual runs created function and the team runs scored function are the same function. They have to be. Who causes the team to score runs, the tooth fairy? In the case of the voting situation which was said to be the ecological fallacy, you are artificially forming groups of people that don’t actually interact with each other. I can vote Republican, and you can vote Republican, but we’re not working together in that. You can vote Democrat and I can still vote Republican; our choices are independent. Then you make this group that voted Republican, and look at their income, and yes, you can reach misleading conclusions.<br /><br />The point I’m trying to make is that voting is not a community-level function, and therefore it is wrong to attribute the community-level data pattern to individuals. People vote as individuals, not as communities. 
But scoring runs is a team-level function. People create runs as teams, each contributing. If we use a different voting analogy, that of the electoral college, people cast electoral votes as states. And therefore we can break down how much of the electoral vote of Montana each citizen was responsible for (one share of however many if they voted for the winning candidate, zero if they did not). And that’s what we are doing by looking at individual runs created.<br /><br />I think the problem, and I don’t mean this to apply to all statisticians who dabble in sabermetrics, but to some, particularly those who don’t have a strong traditional sabermetric background to go along with their statistical knowledge, is that they tend to take all of the things they know can often happen in statistical practice and apply them to sabermetrics, without seeing whether the conditions are in place. In the same way, they will use statistical methods like regression when they are not necessary. If you are studying a phenomenon that you don’t have a good theory on, then regression can be a great tool. But if you are studying a baseball offense, you’re better off constructing a logical expression of the run scoring process like Base Runs or using the base/out table to construct Linear Weights. You don’t need a regression to ascertain the run values of events--baseball offenses are complex, but they are not nearly as complex as many of the other phenomena in the world.<br /><br /><a href="http://www.janda.org/c10/Lectures/topic08/L30-ecological%20fallacy/"><br />Explanation of Ecological Fallacy</a><br /><br /><a href="https://www.baseballthinkfactory.org/newsstand/discussion/espncom_mlb_mariners_give_boone_a_three_year_deal/">Ec. Fallacy claim applied to RC</a>phttp://www.blogger.com/profile/18057215403741682609noreply@blogger.com0tag:blogger.com,1999:blog-12133335.post-87288806028828455162020-01-27T17:12:00.000-05:002020-02-08T10:11:07.055-05:00Tripod: Ability v. 
Value<i>Since I haven't been producing much in the way of new content, I've decided to re-publish some of the articles I posted on my old Tripod website (see link on the side of the page if interested, or just wait for all of the content to show up here). I don't know how long that platform will exist, so the objective is to move stuff over to this blog to preserve it for myself. It's all old - most of it was written between 2001 and 2005, as when this blog started I switched to posting here. When I first started this blog, I had the crazy notion that the content would flow the other way - that I would convert blogposts into "article" format and move them to the Tripod site. This piece may include the only successful such migration in the addendum, which first appeared on this blog. I've since written about most of these topics again here, and I certainly think my later work is better, more correct, etc. than the old stuff. I have not done any editing, so there are typos and "thens" in the places of "thans" and the like. I will be putting "Tripod:" in the title of these re-posts. There will certainly be some statements that didn't age well - see "literal ability" below for a prime example.</i><br /><br />This is a topic that gets brought up all the time, both directly and indirectly, in sabermetric circles. If you are discussing the rankings of players, Park Factors, era adjustments, or clutch hitting, this debate will quickly become an issue. Each definition of what we are trying to measure has certain things that should and shouldn't go with it, and so you need to clearly define what you are looking for before you start arguing about it. All of the different definitions are valid and useful things; but which one you are most interested in depends on your preferences and opinions. I personally am most interested in performance, with ability and value both being things I like to look at as well. 
Literal value or ability does not interest me at all (actually, literal ability probably doesn't interest many sabermetricians at all, because that's what scouts are for and they can probably do it better than we can, although not objectively). So here are the five definitions that I consider:<br /><br /><u>Literal Value</u><br /><br />In a literal value method, you are looking to find the actual value of the player to his team. This means that if the player gets lucky in clutch situations or is used by his manager in a way that enhances his value beyond that of a player with identical basic stats who works in a less valuable situation (like being a closer v. setup man), you take this into account. Literal value is best measured through calculating the player's impact on the Win Expectancy of the team, although Run Expectancy methods can also fall under this category. Examples of literal value stats include the Mills brothers' Player Win Average and Tom Ruane's Value Added Batting Runs.<br /><br /><u>Value</u><br /><br />A value method uses conventional statistics, but attempts to do a similar thing with those as the literal value method did: determine how much the player has actually contributed to his team in terms of winning. The basic difference is the lack of a play-by-play database. It is impossible to implement a literal value system for, say, 1934 because the data that is required just doesn't exist. But in this category, if you have data like batting with runners in scoring position, you can include this. Or considering saves instead of just innings and runs allowed. Many value stats will try to reconcile the individual contributions with those of the team. Some examples of value stats are Bill James' Win Shares and Linear Weights modified for the men-on-base situations as Tango Tiger does.<br /><br /><u>Performance</u><br /><br />Performance is the category that I am most interested in. 
In a performance method, you try to ascertain the player's performance, based on his basic stats and with no consideration for what game situation they occurred under. A home run in a 15-2 game is just as valuable as a home run in a 2-2 game. A solo home run is equal in value to a game-winning grand slam. This is clearly wrong if you want to determine the player's actual value, but many sabermetricians believe that clutch hitting effects are luck, so the method will correlate better from year to year if you look at all events equally. An appropriate Park Factor to couple with a performance measure is a run-based park factor, although the line between performance and ability is somewhat blurred, so you could also use a specific-event park factor. Some examples of performance measures are Pete Palmer's Total Player Rating, Keith Woolner's VORP, and Jim Furtado's Extrapolated Wins.<br /><br /><u>Ability</u><br /><br />An ability method attempts to remove the player from his actual context completely and put him on an average team. The only proper park factors for an ability method are those that deal with each event separately, since a player can be hurt by playing in a park that doesn't fit his skills, like Juan Pierre with the Rockies. Here, you account for that. Other than that, an ability method will wind up being very similar to performance measures. I can't think of a pure ability method that is commonly used.<br /><br /><u>Literal Ability</u><br /><br />Tango Tiger has called this skill, and that is a good description as well. Literal ability is not really quantifiable in sabermetrics. You can attempt to find a player's literal ability in a certain area of his game, like using Speed Score or Isolated Power. But a player's total literal ability is hard to put your finger on. This is what scouts measure - they don't pay attention to the actual results the players put up, but rather how they look while doing it. 
Actually, if you wanted to do a sabermetric measure for literal ability, there are a host of other factors to consider. For example, I write about the silliness of adjusting for whether a player is right- or left-handed. This is all assuming you are measuring something other than literal ability. In a literal ability sense, a right-handed hitter could be better than a left-handed hitter in terms of their pure skills like speed and power, but be less valuable on the field because of the dynamics of the game.<br /><br />If we can all decide which of the five we are interested in measuring, a lot of silly arguments can be prevented. People frequently criticize the Park Factors in Total Baseball because they are uniformly applied to all players, regardless of whether they hit lefty or righty or whether they have power or not. In terms of literal value, value, or performance, this is a proper decision. But if you want to measure ability, it is an incorrect Park Factor to use.<br /><br /><u>Additional Thoughts (added 12/05)</u><br /><br />In the above article, I defined two classes of value, "Literal Value" and the regular "Value". Literal value, as I define it, involves only methods that track actual changes in run and win expectancy, like Value-Added Batting Runs or Win Probability Added. Value includes methods which use composite season statistics, but give credit for things like hitting with runners in scoring position or a pitcher who pitches in a lot of high-leverage situations.<br /><br />I also broke down ability into "Ability" and "Literal Ability". Ability is defined as "theoretical value", i.e. the value that a player would be expected to accumulate, on average, if he played in a given set of circumstances. 
Usually this would be our expectation for a player in a neutral park, but it could be "ability to help the team win games in Coors Field" or "in 1915" or "batting fifth in a lineup with A, B, C, and D hitting ahead of him and E, F, G, and H hitting behind him". There are all sorts of different ways you could define ability, but the mathematical result you get will be specific for the context you choose.<br /><br />Literal ability goes even further, and attempts to distill the player's skill in a given area of the game (such as power, or speed, or drawing walks), or his "overall ability". This is very tricky, because nothing happens in a vacuum, everything happens in some sort of context, and so divorcing a metric from context is pretty much impossible. Therefore literal ability is more of a theoretical concept and not a measurable quantity (although methods like Speed Score are an attempt to measure literal ability in speed, but of course are acknowledged by their creators as approximations).<br /><br />Anyway, to generalize, value is backwards-looking, and ability is forwards-looking (or at least what might have happened in a different context given the same production in a given timeframe).<br /><br />The recent signing of BJ Ryan to a large contract by the Blue Jays has put the issue of when to time the value measurement into my head. Literal value methods like Win Probability Added measure value on a real-time basis. If at a given moment the probability of winning is 60%, and after the next play it increases to 62%, then the player responsible for that play is said to have added .02 wins. So a closer, who pitches at the highest-leverage times, will come out with a higher WPA than a starter who had the same performance in the same number of innings.<br /><br />But if we are ascertaining value after the fact, why do we have to do it in real time? Suppose that Scott Shields is called in to pitch on the road in the bottom of the seventh inning with a one-run lead. 
According to Tango Tiger's <a href="http://www.tangotiger.net/welist.html">WE chart</a>, the win probability is .647. He retires the side and at the end of the inning, the probability is .732, so he is +.085. He starts the eighth inning with a probability of .704, retires the side, and leaves with a probability of .842, so he is +.138 for the inning and +.223 for the game. In the bottom of the ninth, it is still a one-run game and Francisco Rodriguez is summoned with a win probability of .806. He finishes it off and of course the win probability is then 1, so he is +.194 wins. So Shields, for two innings of scoreless work, only gets .029 more wins than Rodriguez did in one inning. Is this fair? Sure, if you define value in real time. Rodriguez pitched in a more critical situation and his performance did more to increase the real-time win probability.<br /><br />But since we are looking backwards, why can't we step back and, now omniscient about what happened in the game, ascertain what value the events actually had? Each out in the game had a win value of 1/27, and since neither allowed any runs or anything else, we don't have to consider that. So Shields should have added 6/27 wins and Rodriguez 3/27. Viewed from the post-game perspective, Shields' performance is much more valuable than Rodriguez's. Now you could also argue that if you took this perspective far enough, any event that didn't lead to a run in the end (like a hit that does not lead to a run) has no value. And that's a possible outcome of this school of thought.<br /><br />Now the point is not that real-time value determinations are incorrect or invalid. They are simply a different way of defining literal value. But I would contend that they are not the only way to define literal value. It is one of the easiest to explain and define, and it certainly makes sense. I'm not arguing against it, just arguing that it is not an undeniable choice for what I have called "literal value". 
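The Shields/Rodriguez arithmetic above is simple enough to verify with a short Python sketch of the two accounting schemes (the win probabilities are the ones quoted from the WE chart; the 1/27-per-out valuation is the after-the-fact alternative described in this post, not standard WPA):

```python
# Real-time accounting: credit each pitcher with the change in win probability
# across the innings he pitched (probabilities quoted from the WE chart above).
shields_realtime = (0.732 - 0.647) + (0.842 - 0.704)  # 7th + 8th innings
rodriguez_realtime = 1.0 - 0.806                       # 9th inning
print(round(shields_realtime, 3), round(rodriguez_realtime, 3))  # 0.223 0.194

# After-the-fact accounting: in a completed win, value each of the 27 outs
# at 1/27 of a win, since neither pitcher allowed any runs or baserunners.
shields_postgame = 6 / 27    # two scoreless innings = six outs
rodriguez_postgame = 3 / 27  # one scoreless inning = three outs
print(round(shields_postgame - rodriguez_postgame, 3))  # 0.111
```

The real-time margin between the two is only .029 wins, while the post-game accounting gives Shields twice Rodriguez's credit.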
Of course, you can define "value" or "literal value" reasonably, in such a way as to make it an obvious choice.<br /><br /><i>Run Distribution and W%, 2019</i> (2020-01-13)<br /><br />In 2019, the major league average was 4.83 runs/game. It was distributed thusly:<br /><br /><a href="https://2.bp.blogspot.com/-sM-iJN-iUrg/Xh0FpJ07ujI/AAAAAAAACu0/2ml7K3mnYv8VxjefTy7Y9aBcqzcoxGfvwCLcBGAsYHQ/s1600/rd19a.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://2.bp.blogspot.com/-sM-iJN-iUrg/Xh0FpJ07ujI/AAAAAAAACu0/2ml7K3mnYv8VxjefTy7Y9aBcqzcoxGfvwCLcBGAsYHQ/s400/rd19a.JPG" width="400" height="357" data-original-width="576" data-original-height="514" /></a><br /><br />The “marg” column shows the marginal W% for each additional run scored. The mode of runs scored was four, and the fourth run was also the most valuable marginal run; 4.83 is a fairly high scoring environment and not surprisingly these are both higher than the comparable figures from recent seasons.<br /><br />The Enby distribution (shown below for 4.85 R/G) did its usual decent job of estimating that distribution given the average; underestimating shutouts and one run games while overestimating the frequency of games with two to four runs scored is par for the course, but I dare say it’s still a respectable model:<br /><br /><a href="https://1.bp.blogspot.com/-Lj7i_mWtlCk/Xh0GGIEzXQI/AAAAAAAACu8/M8fubPUrwYIMvEjSlgWnmDcvLXVe3ILMwCLcBGAsYHQ/s1600/rd19b.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://1.bp.blogspot.com/-Lj7i_mWtlCk/Xh0GGIEzXQI/AAAAAAAACu8/M8fubPUrwYIMvEjSlgWnmDcvLXVe3ILMwCLcBGAsYHQ/s400/rd19b.JPG" width="208" height="400" data-original-width="248" data-original-height="476" /></a><br /><br /><a
href="https://2.bp.blogspot.com/-_BJYanjZBAg/Xh0GQrlpBfI/AAAAAAAACvA/3uRIlc8JgE4GtFN6oIbolTzVVKvM3AC1gCLcBGAsYHQ/s1600/rd19c.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://2.bp.blogspot.com/-_BJYanjZBAg/Xh0GQrlpBfI/AAAAAAAACvA/3uRIlc8JgE4GtFN6oIbolTzVVKvM3AC1gCLcBGAsYHQ/s400/rd19c.JPG" width="400" height="228" data-original-width="1077" data-original-height="615" /></a><br /><br />One way that you can use Enby to examine team performance is to use the team’s actual runs scored/allowed distributions in conjunction with Enby to come up with an offensive or defensive winning percentage. The notion of an offensive winning percentage was first proposed by Bill James as an offensive rate stat that incorporated the win value of runs. An offensive winning percentage is just the estimated winning percentage for an entity based on their runs scored and assuming a league average number of runs allowed. While later sabermetricians have rejected restating individual offensive performance as if the player were his own team, the concept is still sound for evaluating team offense (or, flipping the perspective, team defense).<br /><br />In 1986, James sketched out how one could use data regarding the percentage of the time that a team wins when scoring X runs to develop an offensive W% for a team using their run distribution rather than average runs scored as used in his standard OW%. I’ve been applying that concept since I started writing this annual post, and last year I was finally able to implement an Enby-based version. I will <a href="https://walksaber.blogspot.com/2019/09/enby-distribution-pt-11-game-expected-w.html">point you here</a> if you are interested in the details of how this is calculated, but there are two main advantages to using Enby rather than the empirical distribution:<br /><br />1.
While Enby may not perfectly match how runs are distributed in the majors, it sidesteps sample size issues and data oddities that are inherent when using empirical data. Use just one year of data and you will see things like teams that score ten runs winning less frequently than teams that score nine. Use multiple years to try to smooth it out and you will no longer be centered at the scoring level for the season you’re examining.<br /><br />2. There’s no way to park adjust unless you use a theoretical distribution. These are now park-adjusted by using a different assumed distribution of runs allowed given a league-average RA/G for each team based on their park factor (when calculating OW%; for DW%, the adjustment is to the league-average R/G).<br /><br />I call these measures Game OW% and Game DW% (gOW% and gDW%). One thing to note about the way I did this, with park factors applied on a team-by-team basis and rounding park-adjusted R/G or RA/G to the nearest .05 to use the table of Enby parameters that I’ve calculated, is that the league averages don’t balance to .500 as they should in theory. The average gOW% is .489 and the average gDW% is .510. <br /><br />For most teams, gOW% and OW% are very similar. Teams whose gOW% is higher than OW% distributed their runs more efficiently (at least to the extent that the methodology captures reality); the reverse is true for teams with gOW% lower than OW%. The teams that had differences of +/- 3 wins between the two metrics were (all of these are the g-type less the regular estimate, with the teams in descending order of absolute value of the difference): <br /><br />Positive: None<br />Negative: HOU, OAK, LA<br /><br />I used to show +/- 2, but with the league gOW% being .490, there’s nothing abnormal about a two win difference (at least on the negative side). 
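To make the gOW% idea concrete, here is a minimal sketch: pair a team's runs-scored distribution against an assumed league runs-allowed distribution and sum the win probabilities over all score pairs. The distributions below are made-up toy numbers (not Enby output or 2019 data), and ties are split 50/50 rather than resolved by extra innings, so this illustrates the concept rather than reproducing the actual calculation:

```python
# A Game OW% sketch: the probability that the team's runs scored exceed
# a league-average team's runs allowed, summed over all score pairs.
def game_ow(team_scored_freq, lg_allowed_freq):
    ow = 0.0
    for s, p_s in team_scored_freq.items():
        for a, p_a in lg_allowed_freq.items():
            if s > a:
                ow += p_s * p_a          # outscored the opponent
            elif s == a:
                ow += 0.5 * p_s * p_a    # split ties (simplification)
    return ow

# Hypothetical per-game probabilities of scoring/allowing runs:
team_scored = {2: 0.3, 4: 0.4, 7: 0.3}
league_allowed = {3: 0.5, 5: 0.5}
print(round(game_ow(team_scored, league_allowed), 6))  # 0.5
```

A gDW% works the same way with the roles flipped, and the park adjustment described above enters by shifting the assumed league distribution.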
Were I more concerned with analysis rather than the concept, I would take some stronger efforts to clean up this issue with more precise application of park factors and Enby coefficients, but I consider this post to be more of an annual demonstration of concept.<br /><br />Teams with differences of +/- 3 defensive wins were:<br /><br />Positive: PIT, MIL, SEA, BAL<br />Negative: None<br /><br />I usually run a graph showing the actual v. Enby run distribution of the team with the biggest gap on offense or defense, which was Houston’s offense. However, I don’t find their situation particularly compelling, as they had a handful of games with a lot of runs scored, which is easy to understand. More interesting is Pittsburgh, whose defensive run distribution produced a .445 gDW% but only a .416 DW% (the graph shows the Enby probabilities for a team that allowed 5.6 R/G using c = .852, which is used to calculate the gOW%/gDW% estimates):<br /><br /><a href="https://2.bp.blogspot.com/-dmRXV2fEX1k/Xh0HAtHfg5I/AAAAAAAACvQ/GYpN_fjUr7c62ahyNAjPzKwATkhVnPuGACLcBGAsYHQ/s1600/rd19d.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://2.bp.blogspot.com/-dmRXV2fEX1k/Xh0HAtHfg5I/AAAAAAAACvQ/GYpN_fjUr7c62ahyNAjPzKwATkhVnPuGACLcBGAsYHQ/s400/rd19d.JPG" width="400" height="228" data-original-width="1054" data-original-height="601" /></a><br /><br />Even as someone who has looked at a lot of these, it's hard to articulate why this was a good thing for Pittsburgh (good in the sense that their runs allowed distribution should have resulted in more wins than a typical runs allowed distribution for a team allowing 5.6 per game). 
The Pirates had slightly more shutouts and one run allowed games than you’d expect given their overall RA/G, but they had many more two run games, which are games that a team with an average offense in a 4.75 R/G environment (which is the league average after adjusting for the PIT park factor) should have won 82.0% of the time. They gave some of this advantage back by giving up three runs more often (still a good amount to allow with a .686 W%), but they also had more four run games (still a .543 W%). They allowed 5-9 runs less often than expected, and those are games that they would be expected to lose (especially 6+, as the expected W% drops from .409 when allowing five to .296 when allowing six).<br /><br />I don’t have a good clean process for combining gOW% and gDW% into an overall gEW%; instead I use Pythagenpat math to convert the gOW% and gDW% into equivalent runs and runs allowed and calculate an EW% from those. This can be compared to EW% figured using Pythagenpat with the average runs scored and allowed for a similar comparison of teams with positive and negative differences between the two approaches:<br /><br />Positive: MIL, BAL<br />Negative: OAK, BOS, LA<br /><br />The table below has the various winning percentages for each team:<br /><br /><a href="https://2.bp.blogspot.com/-tNQYdQuJq4g/Xh0HosfFxxI/AAAAAAAACvY/pNd2SJn5Lw0iqo1LlbPLX5YuJNeLnvjhACLcBGAsYHQ/s1600/rd19e.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://2.bp.blogspot.com/-tNQYdQuJq4g/Xh0HosfFxxI/AAAAAAAACvY/pNd2SJn5Lw0iqo1LlbPLX5YuJNeLnvjhACLcBGAsYHQ/s400/rd19e.JPG" width="302" height="400" data-original-width="444" data-original-height="589" /></a><br /><br /><i>Crude Team Ratings, 2019</i> (2019-12-23)<br /><br />Crude Team Rating (CTR) is my name for a simple methodology of ranking teams based
on their win ratio (or estimated win ratio) and their opponents’ win ratios. A full explanation of the methodology is <a href="http://walksaber.blogspot.com/2011/01/crude-team-ratings.html">here</a>, but briefly:<br /><br />1) Start with a win ratio figure for each team. It could be actual win ratio, or an estimated win ratio.<br /><br />2) Figure the average win ratio of the team’s opponents.<br /><br />3) Adjust for strength of schedule, resulting in a new set of ratings.<br /><br />4) Begin the process again. Repeat until the ratings stabilize.<br /><br />The resulting rating, CTR, is an adjusted win/loss ratio rescaled so that the majors’ arithmetic average is 100. The ratings can be used to directly estimate W% against a given opponent (without home field advantage for either side); a team with a CTR of 120 should win 60% of games against a team with a CTR of 80 (120/(120 + 80)).<br /><br />First, CTR based on actual wins and losses. In the table, “aW%” is the winning percentage equivalent implied by the CTR and “SOS” is the measure of strength of schedule--the average CTR of a team’s opponents. 
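As an aside, steps 1-4 can be sketched as a simple fixed-point iteration. This is a toy three-team league with hypothetical win ratios, where each team's schedule is just the other two teams played equally; the real calculation weights each opponent by games actually played:

```python
# Crude Team Ratings sketch: iterate win ratio x strength of schedule,
# rescaling so the arithmetic average rating is 100.
def crude_team_ratings(win_ratio, opponents, iters=100):
    ratings = dict(win_ratio)
    for _ in range(iters):
        # strength of schedule: average rating of each team's opponents
        sos = {t: sum(ratings[o] for o in opps) / len(opps)
               for t, opps in opponents.items()}
        ratings = {t: win_ratio[t] * sos[t] for t in ratings}
        # rescale so the league arithmetic average is 100
        scale = 100 * len(ratings) / sum(ratings.values())
        ratings = {t: r * scale for t, r in ratings.items()}
    return ratings

# Hypothetical win ratios (wins/losses) for a toy three-team league:
wr = {"A": 1.5, "B": 1.0, "C": 0.667}
opp = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
ctr = crude_team_ratings(wr, opp)
# Head-to-head estimate, as in the text: P(A beats B) = A/(A + B)
p_a_beats_b = ctr["A"] / (ctr["A"] + ctr["B"])
```

Team A's rating ends up below its raw win ratio would suggest, because its schedule consists entirely of the two weaker teams; that is the strength-of-schedule correction at work.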
The rank columns provide each team’s rank in CTR and SOS: <br /><br /><a href="https://1.bp.blogspot.com/-P8ivEa81XfE/Xf5RU1m5TnI/AAAAAAAACuA/u11CquMj_nUsa0GkWJq0bVTWFZnzUicMgCLcBGAsYHQ/s1600/CTR19A.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://1.bp.blogspot.com/-P8ivEa81XfE/Xf5RU1m5TnI/AAAAAAAACuA/u11CquMj_nUsa0GkWJq0bVTWFZnzUicMgCLcBGAsYHQ/s400/CTR19A.JPG" width="274" height="400" data-original-width="402" data-original-height="586" /></a><br /><br />The ten playoff teams almost occupied the top ten spots, with Cleveland just barely edging out Milwaukee.<br /><br />I’ve switched how I aggregate for division/league ratings over time, but I think I’ve settled on the right approach, which is just to take the average aW% for each:<br /><br /><a href="https://1.bp.blogspot.com/-a0skgLGTZ0Y/Xf5RZm9HINI/AAAAAAAACuE/Qki-ar_htckr0K9yaZZamyrxbfw1yPH1wCLcBGAsYHQ/s1600/CTR19B.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://1.bp.blogspot.com/-a0skgLGTZ0Y/Xf5RZm9HINI/AAAAAAAACuE/Qki-ar_htckr0K9yaZZamyrxbfw1yPH1wCLcBGAsYHQ/s400/CTR19B.JPG" width="349" height="400" data-original-width="150" data-original-height="172" /></a><br /><br />It was finally the NL’s year to top the AL, with the latter dragged down by the three worst teams, including Detroit, which I believe turned in the lowest CTR since I’ve been calculating these. Amazingly, the AL Central was actually better than in 2018, increasing their average aW% from .431. This was because the Indians were slightly better in 2019 (116 to 113 CTR), the Twins shot from 85 to 140, and the White Sox graduated from horrible to merely bad (58 to 74).<br /><br />The CTRs can also use theoretical win ratios as a basis, and so the next three tables will be presented without much comment.
The first uses gEW%, which is a measure I calculate that looks at each team’s runs scored distribution and runs allowed distribution separately to calculate an expected winning percentage given average runs allowed or runs scored, and then uses Pythagorean logic to combine the two and produce a single estimated W% based on the empirical run distribution: <br /><a href="https://4.bp.blogspot.com/-FWVbni4430Y/Xf5ReWpMpxI/AAAAAAAACuI/2PIJ8wvLvBUQIeil27cWW1yzNDYYUxR3wCLcBGAsYHQ/s1600/CTR19C.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://4.bp.blogspot.com/-FWVbni4430Y/Xf5ReWpMpxI/AAAAAAAACuI/2PIJ8wvLvBUQIeil27cWW1yzNDYYUxR3wCLcBGAsYHQ/s400/CTR19C.JPG" width="271" height="400" data-original-width="400" data-original-height="591" /></a><br /><br />Next EW% based on R and RA:<br /><br /><a href="https://4.bp.blogspot.com/-PbslevlH-6k/Xf5RorG6o3I/AAAAAAAACuQ/_311M9T5OPQXSC1ylAZ1X13AV1D69OQmgCLcBGAsYHQ/s1600/CTR19D.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://4.bp.blogspot.com/-PbslevlH-6k/Xf5RorG6o3I/AAAAAAAACuQ/_311M9T5OPQXSC1ylAZ1X13AV1D69OQmgCLcBGAsYHQ/s400/CTR19D.JPG" width="273" height="400" data-original-width="402" data-original-height="590" /></a><br /><br />And PW% based on RC and RCA:<br /><br /><a href="https://1.bp.blogspot.com/-JMYkaaR0D9M/Xf5RuUqq8tI/AAAAAAAACuY/5mhw4vU9OCkgmOrYJFh0jLs-PrgGYqjVgCLcBGAsYHQ/s1600/CTR19E.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://1.bp.blogspot.com/-JMYkaaR0D9M/Xf5RuUqq8tI/AAAAAAAACuY/5mhw4vU9OCkgmOrYJFh0jLs-PrgGYqjVgCLcBGAsYHQ/s400/CTR19E.JPG" width="273" height="400" data-original-width="402" data-original-height="589" /></a><br /><br />The final set of ratings is based on actual wins and losses, but includes the playoffs. 
I am not crazy about this view; while it goes without saying that playoff games provide additional data regarding the quality of teams, I believe that the playoff format biases the ratings against teams that lose in the playoffs, particularly in series that end well before the maximum number of games. It’s not that losing three straight games in a division series shouldn’t hurt a team’s rating, it’s that terminating the series after three games and not playing out the remaining games creates bias. Imagine what would happen to CTRs based on regular season games if a team’s regular season terminated when they fell ten games out of a playoff spot. Do you think this would increase the variance in team ratings? The difference between the playoffs and regular season on this front is that the length of the regular season is independent of team performance, but the length of the playoffs is not.<br /><br />My position is not that the playoffs should be ignored altogether, but I don’t have a satisfactory suggestion on how to correct the playoff-inclusive ratings for this bias without injecting a tremendous amount of my own subjective approach into the mix (one idea would be to add in the expected performance over the remaining games of the series based on the odds implied from the regular season CTRs, but of course this is begging the question to a degree).
So I present here the ratings including playoff performance, with each team’s regular season actual CTR and the difference between the two:<br /><br /><a href="https://2.bp.blogspot.com/-qaNqzk2HXXs/Xf5RyuyC7bI/AAAAAAAACuc/r1xIneBtzwAPlwXmf1kz712OHzlgAr3zQCLcBGAsYHQ/s1600/CTR19F.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://2.bp.blogspot.com/-qaNqzk2HXXs/Xf5RyuyC7bI/AAAAAAAACuc/r1xIneBtzwAPlwXmf1kz712OHzlgAr3zQCLcBGAsYHQ/s400/CTR19F.JPG" width="359" height="400" data-original-width="531" data-original-height="591" /></a><br /><br /><i>Hitting by Position, 2019</i> (2019-12-13)<br /><br />The first obvious thing to look at is the positional totals for 2019, with the data coming from Baseball-Reference.com. “MLB” is the overall total for MLB, which is not the same as the sum of all the positions here, as pinch-hitters and runners are not included in those. “POS” is the MLB totals minus the pitcher totals, yielding the composite performance by non-pitchers. “PADJ” is the position adjustment, which is the position RG divided by the total for all positions, including pitchers (but excluding pinch hitters). “LPADJ” is the long-term positional adjustment that I am now using, based on 2010-2019 data (see more below).
The rows “79” and “3D” are the combined corner outfield and 1B/DH totals, respectively: <br /><br /><a href="https://4.bp.blogspot.com/-Zdxh7YiqKok/XfP61J_xS9I/AAAAAAAACtE/VJH4C3-lxmk7ZP_wH0Oq48Fyu-f8ftNNQCLcBGAsYHQ/s1600/hitpos19a.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://4.bp.blogspot.com/-Zdxh7YiqKok/XfP61J_xS9I/AAAAAAAACtE/VJH4C3-lxmk7ZP_wH0Oq48Fyu-f8ftNNQCLcBGAsYHQ/s400/hitpos19a.JPG" width="400" height="122" data-original-width="1002" data-original-height="305" /></a><br /><br />There’s nothing too surprising here, although third basemen continue to hit above their historical norm and corner outfielders outhit 1B/DH ever so slightly.<br /><br />All team figures from this point forward in the post are park-adjusted. The RAA figures for each position are baselined against the overall major league average RG for the position, except for left field and right field which are pooled. NL pitching staffs by RAA (note that the runs created formula I use doesn’t account for sacrifice hits, which matters more when looking at pitcher’s offensive performance than any other breakout you can imagine):<br /><br /><a href="https://4.bp.blogspot.com/-jqjCHpJZkJo/XfP7CDk5WFI/AAAAAAAACtI/ayUklF5xBEgfcHxQMxL_qbGiYKuD272lwCLcBGAsYHQ/s1600/hitpos19b.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://4.bp.blogspot.com/-jqjCHpJZkJo/XfP7CDk5WFI/AAAAAAAACtI/ayUklF5xBEgfcHxQMxL_qbGiYKuD272lwCLcBGAsYHQ/s400/hitpos19b.JPG" width="400" height="354" data-original-width="345" data-original-height="305" /></a><br /><br />This range is a tad narrower than the norm which is around +/- 20 runs; no teams cost themselves a full win at the plate. 
This is the second year in a row in which this is the case; of course as innings pitched by starters decline, the number of plate appearances for pitchers does as well.<br /><br />The teams with the highest RAA at each position were:<br /><br />C—SEA, 1B—NYN, 2B—MIL, 3B—WAS, SS—HOU, LF—WAS, CF—LAA, RF—MIL<br /><br />Usually the leaders are pretty self-explanatory, although I did a double-take on Seattle catchers (led by Omar Narvaez, with a quietly excellent 260 plate appearances from Tom Murphy) and Milwaukee second basemen (a combination of Keston Hiura and Mike Moustakas). I always find the list of positional trailers more interesting (the player listed is the one who started the most games at that position; they usually are not solely to blame for the debacle):<br /><br /><a href="https://3.bp.blogspot.com/-lAMBlU05U9s/XfP7M56SQYI/AAAAAAAACtQ/mNt65-NRkZwZxoOKLAXvLsFoIGa-tRUzQCLcBGAsYHQ/s1600/hitpos19c.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://3.bp.blogspot.com/-lAMBlU05U9s/XfP7M56SQYI/AAAAAAAACtQ/mNt65-NRkZwZxoOKLAXvLsFoIGa-tRUzQCLcBGAsYHQ/s400/hitpos19c.JPG" width="400" height="177" data-original-width="431" data-original-height="191" /></a><br /><br />Four teams hogged eight spots to themselves, kindly leaving one leftover for another AL Central bottom feeder. Moustakas featured prominently in Milwaukee’s successful second base and dreadful third base performances, but unfortunately Travis Shaw was much more responsible for the latter (a 503 OPS in 66 games! against Moustakas’ solid 815 in 101 games). Also deserving of a special shoutout for his contributions to the two moribund White Sox positions is Daniel Palka, who was only 0-7 as a DH but had a 421 OPS in 78 PA as a right fielder.
His total line for the season was a 372 OPS in 93 PA; attending a September White Sox/Indians series, it was hard to take one’s eyes off his batting line on the scoreboard (his pre-September line was .022/.135/.022 in 52 PA).<br /><br />The next table shows the correlation (r) between each team’s RG for each position (excluding pitchers) and the long-term position adjustment (using pooled 1B/DH and LF/RF). A high correlation indicates that a team’s offense tended to come from positions that you would expect it to:<br /><br /><a href="https://3.bp.blogspot.com/-etj-2Aro7gc/XfP7YcBxVPI/AAAAAAAACtY/6wovC6YRWu4TUmF9EQ192GI40HaSIOBCACLcBGAsYHQ/s1600/hitpos19d.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://3.bp.blogspot.com/-etj-2Aro7gc/XfP7YcBxVPI/AAAAAAAACtY/6wovC6YRWu4TUmF9EQ192GI40HaSIOBCACLcBGAsYHQ/s400/hitpos19d.JPG" width="330" height="400" data-original-width="253" data-original-height="307" /></a><br /><br />The following tables, broken out by division, display RAA for each position, with teams sorted by the sum of positional RAA.
Positions with negative RAA are in red, and positions that are +/-20 RAA are bolded: <br /><br /><a href="https://3.bp.blogspot.com/-dM3M3tBp684/XfP7mxkg7pI/AAAAAAAACtg/z1FZTRxNS8QqV_xw9azoFaJEZFDsghdEwCLcBGAsYHQ/s1600/hitpos19e.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://3.bp.blogspot.com/-dM3M3tBp684/XfP7mxkg7pI/AAAAAAAACtg/z1FZTRxNS8QqV_xw9azoFaJEZFDsghdEwCLcBGAsYHQ/s400/hitpos19e.JPG" width="400" height="277" data-original-width="552" data-original-height="382" /></a><br /><br /><a href="https://4.bp.blogspot.com/-ck-_pM087w8/XfP7xgpo6rI/AAAAAAAACto/NeuLpDg4EY8Ka9dBamwMZpDDZmA6Jn_FwCLcBGAsYHQ/s1600/hitpos19f.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://4.bp.blogspot.com/-ck-_pM087w8/XfP7xgpo6rI/AAAAAAAACto/NeuLpDg4EY8Ka9dBamwMZpDDZmA6Jn_FwCLcBGAsYHQ/s400/hitpos19f.JPG" width="400" height="303" data-original-width="503" data-original-height="381" /></a><br /><br />A few observations:<br /><br />* The Tigers were below-average at every position; much could (and has) been written about Detroit’s historic lack of even average offensive players, but a positional best of -9 kind of sums it up<br /><br />* The Indians had only one average offensive position, which was surprising to me as I would have thought that even while not having his best season, Francisco Lindor would have salvaged shortstop (he had 17 RAA personally). Non-Lindor Indian shortstops had only 92 PA, but they hit .123/.259/.173 (unadjusted).<br /><br />* That -30 at third base for the Angels; I wonder what they’ll do to address that<br /><br />* Houston had 109 infield RAA; the next closest team was the Dodgers with 73. The Dodgers had the best outfield RAA with 77; the Astros were fifth with 46.<br /><br />Finally, I alluded to an update to the long-term positional adjustments I use above.
You can see my <a href="https://walksaber.blogspot.com/2019/10/end-of-season-statistics-2019.html">end of season stats post</a> for some discussion about why I use offensive positional adjustments in my RAR estimates. A quick summary of my thinking:<br /><br />* There are a lot of different RAR/WAR estimates available now. If I can offer a somewhat valid but unique perspective, I think that adds more value than a watered down version of the same thing everyone else is publishing.<br /><br />* When possible, I like to publish metrics that I have had some role in developing (please note, I’m not saying that any of them are my own ideas, just that it’s nice to be able to develop your own version of a pre-existing concept). I don’t publish my own defensive metrics, and while defensive positional adjustments are based on more than simply players’ comparative performance across positions using fielding metrics, they are a basic starting point for that type of analysis.<br /><br />* While I do not claim that the relationship is or should be perfect, at the level of talent filtering that exists to select major leaguers, there should be an inverse relationship between offensive performance by position and the defensive responsibilities of the position. Not a perfect one, but a relationship nonetheless. An offensive positional adjustment thus allows for a more objective approach to setting a position adjustment. Again, I have to clarify that I don’t think subjectivity in metric design is a bad thing--any metric, unless it’s simply expressing some fundamental baseball quantity or rate (e.g.
“home runs” or “on base average”) is going to involve some subjectivity in design (e.g. linear or multiplicative run estimators, any of a myriad of ways to design park factors, whether to include a category like sacrifice flies that is more teammate-dependent).<br /><br />I use the latest ten years of data for the majors (2010 – 2019), which should smooth out some of the randomness in positional performance. Then I simply calculate RG for each position and divide by the league average of positional performance (i.e. excluding pinch-hitters and pinch-runners). I then pool 1B/DH and LF/RF. Looking only at positional performance is necessary because the goal is not to express the position average relative to the league, but rather to the other positions for the purpose of determining their relative performance. If pinch-hitters perform worse than position players, I don’t want them to bring down the league average and thus raise the offensive positional adjustment, because pinch-hitters will not be subject to the offensive positional adjustment when calculating their RAR.
(I suppose if you were so inclined, you could include them, and use that as your backdoor way of accounting for the pinch-hitting penalty in a metric, but I assign each player to a primary position (or some weighted average of their defensive positions) and so this wouldn’t really make sense, and would result in positional adjustments that are too high when they are applied to the league average RG.)<br /><br />For 2010-2019, the resulting PADJs are:<br /><br /><a href="https://4.bp.blogspot.com/-suYzbhduYBs/XfP794pqLaI/AAAAAAAACtw/rlXFj5g7CSoO3LZUeHA5r_PKl8a5DPsgQCLcBGAsYHQ/s1600/hitpos19g.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://4.bp.blogspot.com/-suYzbhduYBs/XfP794pqLaI/AAAAAAAACtw/rlXFj5g7CSoO3LZUeHA5r_PKl8a5DPsgQCLcBGAsYHQ/s400/hitpos19g.JPG" width="371" height="400" data-original-width="266" data-original-height="287" /></a><br /><br /><i>Leadoff Hitters, 2019</i> (2019-12-04)<br /><br />In the past I’ve wasted time writing in a structured format, instead of just explaining how the metrics are calculated and noting anything that stands out to me. I’m opting for the latter approach this year, both in this piece and in other “end of season” statistics summaries.<br /><br />I’ve always been interested in leadoff hitter performance, while not claiming that it holds any particular significance beyond the obvious. The linked spreadsheet includes a number of metrics, and there are three very important caveats:<br /><br />1. The data is from Baseball-Reference and includes the performance of anyone who hit in the leadoff spot during a game. I’ve included the names and number of games starting at leadoff for all players with twenty or more starts.<br /><br />2. Many of the metrics shown are descriptive, not quality metrics<br /><br />3.
None of this is park-adjusted<br /><br />The metrics shown in the spreadsheet are:<br /><br />* Runs Scored per 25.5 outs = R*25.5/(AB – H + CS)<br /><br />Runs scored are obviously influenced heavily by the team, but it’s a natural starting point when looking at leadoff hitters.<br /><br />* On Base Average (OBA) = (H + W + HB)/(AB + W + HB)<br /><br />If you need this explained, you’re reading the wrong blog.<br /><br />* Runners On Base Average (ROBA) = (H + W + HB – HR – CS)/(AB + W + HB)<br /><br />This is not a quality metric, but it is useful when thinking about the run scoring process as it’s essentially a rate for the Base Runs “A” component, depending on how you choose to handle CS in your BsR variation. It is the rate at which a hitter is on base for a teammate to advance.<br /><br />* “Literal” On Base Average (LOBA) = (H + W + HB – HR – CS)/(AB + W + HB – HR)<br /><br />This is a metric I’ve made up for this series that I don’t actually consider of any value; it is the same as ROBA except it doesn’t “penalize” homers by counting them in the denominator. I threw scare quotes around “penalize” because I don’t think ROBA penalizes homers; rather it recognizes that homers do not result in runners on base. It’s only a “penalty” if you misuse the metric.<br /><br />* R/RBI Ratio (R/BI) = R/RBI<br /><br />A very crude way of measuring the shape of a hitter’s performance, with much contextual bias.<br /><br />* Run Element Ratio (RER) = (W + SB)/(TB – H)<br /><br />This is an old Bill James shape metric which is a ratio between events that tend to be more valuable at the start of an inning to events that tend to be more valuable at the end of an inning. As such, leadoff hitters historically have tended to have high RERs, although recently they have just barely exceeded the league average as is the case here. 
Leadoff hitters were also just below the league average in Isolated Power (.180 to .183) and HR/PA (.035 to .037)<br /><br />* Net Stolen Bases (NSB) = SB – 2*CS<br /><br />A crude way to weight SB and CS, not perfectly reflecting the run value difference between the two<br /><br />* 2OPS = 2*OBA + SLG<br /><br />This is a metric that David Smyth suggested for measuring leadoff hitters, just an OPS variant that uses a higher weight for OBA than would be suggested by maximizing correlation to runs scored (which would be around 1.8). Of course, 2OPS is still closer to ideal than the widely-used OPS, albeit with the opposite bias.<br /><br />* Runs Created per Game – see End of Season Statistics post for calculation<br /><br />This is the basic measure I would use to evaluate a hitter’s rate performance.<br /><br />* Leadoff Efficiency – This is a theoretical measure of linear weights runs above average per 756 PA, assuming that every plate appearance occurred in the quintessential leadoff situation of no runners on, none out. 756 PA is the average PA/team for the leadoff spot this season. See <a href="https://walksaber.blogspot.com/2010/12/leadoff-hitters-2010.html">this post</a> for a full explanation of the formula; the 2019 out & CS coefficients are -.231 and -.598 respectively.<br /><br />A couple things that jumped out at me:<br /><br />* Only six teams had just one player with twenty or more starts as a leadoff man. Tampa Bay was one of those teams; Austin Meadows led off 53 times, while six other players lead off (this feels like it should be one word) between ten and twenty times.<br /><br />* Chicago was devoid of quality leadoff performance in either circuit, but the Cubs OBA woes really stand out; at .296, they were fourteen points worse than the next-closest team, which amazingly enough was the champion of their division. The opposite was true in Texas, where the two best teams in OBA reside.
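The descriptive formulas defined above translate directly into code; the sketch below is just a transcription of those definitions (W = walks, HB = hit batters, TB = total bases), run on a hypothetical season line rather than any real player's stats:

```python
# Straight transcriptions of the leadoff-hitter formulas in the text.
def runs_per_25_5_outs(r, ab, h, cs):
    return r * 25.5 / (ab - h + cs)

def oba(h, w, hb, ab):
    return (h + w + hb) / (ab + w + hb)

def roba(h, w, hb, hr, cs, ab):          # rate of being on base for a teammate
    return (h + w + hb - hr - cs) / (ab + w + hb)

def loba(h, w, hb, hr, cs, ab):          # "literal" OBA: HR out of the denominator
    return (h + w + hb - hr - cs) / (ab + w + hb - hr)

def run_element_ratio(w, sb, tb, h):     # start-of-inning vs end-of-inning events
    return (w + sb) / (tb - h)

def net_stolen_bases(sb, cs):
    return sb - 2 * cs

def two_ops(oba_value, slg):
    return 2 * oba_value + slg

# Hypothetical line: 150 H, 60 W, 5 HB, 25 HR, 8 CS, 550 AB
print(round(oba(150, 60, 5, 550), 3), round(roba(150, 60, 5, 25, 8, 550), 3))
```

Note how ROBA is always at most OBA for the same line: the home runs and caught stealings it removes are exactly the times the batter reached base but was not left on base for a teammate.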
<br /><br />See the link below for the spreadsheet; if you change the end of the URL from “HTML” to “XLSX”, you can download an Excel version:<br /><br /><a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vQOqIyPG9YGKRpjBUe18Y_TstfjjYABZOqWrXvG1Kf2L41tU7WSqNem8nSdHlbWcIjDBeh0kQtpjUKH/pub?output=html">2019 Leadoff Hitters</a><br /><br /><i>Hypothetical Award Ballots, 2019</i> (2019-11-11)<br /><br />In the past I’ve split these up into three separate posts, but it’s dawned on me that maybe if combined they will be long enough to actually merit a post. I should note that this is something I write not because I think anyone will be interested in it, but because I enjoy having a record of what I thought about these things years later. In reviewing some of those posts from prior years, I’ve concluded that they had way too many numbers in an attempt to justify every ballot spot. I publish the RAR figures that are the starting point for any retrospective player valuation exercise I engage in -- I no longer see a need to regurgitate them all unless it’s important to a point.<br /><br />AL ROY:<br /><br />1. DH Yordan Alvarez, HOU<br />2. SP John Means, BAL<br />3. SP Zach Plesac, CLE<br />4. 2B Brandon Lowe, TB<br />5. 2B Cavan Biggio, TOR<br /><br />Alvarez is an easy choice – while he only had 367 PA, the only AL hitter with a better RG was Mike Trout. The only real competition is John Means, who turned in a fine season pitching for Baltimore, although his peripherals were far less impressive than his actual results, which was also true for Zach Plesac. I slid Brandon Lowe just ahead of Cavan Biggio on the basis of fielding, which is also why they got the nod over Eloy Jimenez and Luis Arraez.<br /><br />NL ROY:<br /><br />1. 1B Pete Alonso, NYN<br />2. SP Mike Soroka, ATL<br />3. SS Fernando Tatis, SD<br />4.
LF Bryan Reynolds, PIT<br />5. SP Chris Paddack, SD<br /><br />Any of the first three would top my AL ballot. On a pure RAR basis, Soroka would edge out Alonso, but Soroka’s peripherals were not as strong as his actual runs allowed, which drops him a bit. It’s worth noting that on a rate basis Fernando Tatis was better than Alonso -- he had 40 RAR in 84 games, which over a 150 game season would have put him squarely in the MVP race. Of course, he was unlikely to have kept up that pace, and his underlying performance may not have been the equal of those numbers. But on the other hand, he is four years younger than Alonso and much more likely to be a long-term star. Bryan Reynolds had a quietly good season, but there were other strong position player candidates including Keston Hiura, Kevin Newman, Tommy Edman, and Christian Walker, any of whom would have edged out the second basemen on my AL ballot. The same is also true of pitchers -- I went with Chris Paddack over Sandy Alcantara, Dakota Hudson, and Zac Gallen. Gallen was brilliant over 80 innings (2.63 RRA with lesser but still strong peripherals like a 3.70 dRA), but it’s not enough when Paddack tossed 140 innings with 10.6 K/2.1 W per game.<br /><br />AL Cy Young:<br /><br />1. Justin Verlander, HOU<br />2. Gerrit Cole, HOU<br />3. Shane Bieber, CLE<br />4. Lance Lynn, TEX<br />5. Charlie Morton, TB<br /><br />I expect Cole to win, but my vote would go to Verlander. Verlander threw ten more innings with a better RRA and the same eRA, although Cole does better in dRA as Verlander’s BABIP was low (.226 to Cole’s .279). I give that some weight, but not enough to overcome Verlander’s lead, and one could argue that Verlander’s high home run rate should offset his low BABIP when making adjustments for peripherals. Sam Miller pointed out on <u>Effectively Wild</u> that Verlander has had a disproportionate number of second-place finishes in Cy voting. 
I concur, and while none of them were cases in which the actual choice was a poor one, for my money Verlander was the AL’s top pitcher in 2011, 2012, 2016, 2018, and 2019. Mike Minor’s high dRA knocked him off my ballot in favor of teammate Lance Lynn and Charlie Morton.<br /><br />NL Cy Young:<br /><br />1. Jacob deGrom, NYN<br />2. Stephen Strasburg, WAS<br />3. Max Scherzer, WAS<br />4. Jack Flaherty, STL<br />5. Hyun-Jin Ryu, LA<br /><br />deGrom was an easy choice for the top of the ballot, but after that I used a fair amount of judgment. Strasburg had the most consistent RAR figures, whether using RRA, eRA, or dRA; Flaherty and Ryu both had significantly worse dRAs, which dropped them behind the Nationals on my ballot. There also should be some recognition of Zack Greinke; had he spent his entire season in the NL he would have ranked second here, but if it’s an NL award I don’t think AL performance should get any credit, and so he doesn’t rank in the top five.<br /><br />AL MVP:<br /><br />1. CF Mike Trout, LAA<br />2. 3B Alex Bregman, HOU<br />3. SP Justin Verlander, HOU<br />4. SP Gerrit Cole, HOU<br />5. SP Shane Bieber, CLE<br />6. SP Lance Lynn, TEX<br />7. SP Charlie Morton, TB<br />8. SS Marcus Semien, OAK<br />9. SP Mike Minor, TEX<br />10. CF George Springer, HOU<br /><br />Had Mike Trout not been sidelined by a foot issue in September, this wouldn’t even be a question. I still think Trout is the clear (if not inarguable) choice; he starts ahead of Bregman by just a single run in RAR, and if you give full credit to fielding metrics, Bregman could be ahead, as Trout’s BP/UZR/DRS fielding runs saved were (7, -1, -1) compared to Bregman’s (11, 2, 7). However, I only give half-credit, as the uncertainty regarding fielding performance means an estimated fielding run saved is not as conclusive evidence of value as an estimated offensive run contributed. 
The other major area of the game not taken into account in my RAR estimates is baserunning, and using BP’s figures, Trout was +3 runs and Bregman -4 (removing basestealing runs, which I already take into account). That wipes out any advantage Bregman might have in the field, and all things being equal I would take the player who contributes equal RAR in less playing time -- just because I think that if I’ve erred in setting replacement level, I’ve erred by setting it too low. The slotting of position players otherwise follows RAR except that Xander Bogaerts had dreadful fielding metrics (-21, 1, -21) which knocks him out.<br /><br />If you just look at RAR, Verlander could rank ahead of either of the hitters, but while I have absolutely no problem supporting a pitcher as MVP, I do think in such a case that they should have better RAR not just when using their actual runs allowed, but when using peripherals as well. Verlander has 91, 83, or 64 RAR depending on the inputs you use; I have Trout at 80 when considering fielding and baserunning, and that sixteen run gap using Verlander’s dRA is too large for me to put him on top.<br /><br />I’ve never put six pitchers on a hypothetical MVP ballot before, and as you’ll see with the NL, a full half of my MVP ballot spots went to pitchers. One thing I should revisit is the replacement level I’m using for starters, which is 128% of the league average RA; I had previously used 125%, and with the continual decline in the share of innings borne by starters and the 2019 development that starters had a better overall eRA than relievers, it’s worth considering adjusting it downward. <br /><br />NL MVP:<br /><br />1. CF Cody Bellinger, LA<br />2. RF Christian Yelich, MIL<br />3. SP Jacob deGrom, NYN<br />4. 3B Anthony Rendon, WAS<br />5. SP Stephen Strasburg, WAS<br />6. SP Max Scherzer, WAS<br />7. 1B Pete Alonso, NYN<br />8. CF Ronald Acuna, ATL<br />9. 
LF Juan Soto, WAS<br />10. SP Jack Flaherty, STL<br /><br />Bellinger and Yelich were very close in RAR, but this is a case where fielding gives Bellinger (15, 10, 19) a clear edge over Yelich (-1, 0, -3). That’s pretty much the only place that needs explanation beyond just perusing the RAR figures, except that Starling Marte’s (-12, -1, -1) fielding puts him behind the young outfielders of the NL East.<br /><br />End of Season Statistics, 2019 (2019-10-04)<br /><br />The spreadsheets are published as Google Spreadsheets, which you can download in Excel format by changing the extension in the address from "=html" to "=xlsx", or in open format as "=ods". That way you can download them and manipulate things however you see fit.<br /><br />The data comes from a number of different sources. Most of the data comes from Baseball-Reference. KJOK's park database is extremely helpful in determining when park factors should reset. <br /><br />The basic philosophy behind these stats is to use the simplest methods that have acceptable accuracy. Of course, "acceptable" is in the eye of the beholder, namely me. I use Pythagenpat not because other run/win converters, like a constant RPW or a fixed exponent, are not accurate enough for this purpose, but because it's mine and it would be kind of odd if I didn't use it.<br /><br />If I seem to be a stickler for purity in my critiques of others' methods, I'd contend it is usually in a theoretical sense, not an input sense. 
So when I exclude hit batters, I'm not saying that hit batters are worthless or that they *should* be ignored; it's just easier not to mess with them and not that much less accurate (note: hit batters are actually included in the offensive statistics now).<br /><br />I also don't really have a problem with people using sub-standard methods (say, Basic RC) as long as they acknowledge that they are sub-standard. If someone pretends that Basic RC doesn't undervalue walks or cause problems when applied to extreme individuals, I'll call them on it; if they explain its shortcomings but use it regardless, I accept that. Take these last three paragraphs as my acknowledgment that some of the statistics displayed here have shortcomings as well, and I've at least attempted to describe some of them in the discussion below.<br /><br />The League spreadsheet is pretty straightforward--it includes league totals and averages for a number of categories, most or all of which are explained at appropriate junctures throughout this piece. The advent of interleague play has created two different sets of league totals--one for the offense of league teams and one for the defense of league teams. Before interleague play, these two were identical. I do not present both sets of totals (you can figure the defensive ones yourself from the team spreadsheet, if you desire), just those for the offenses. The exception is for the defense-specific statistics, like innings pitched and quality starts. The figures for those categories in the league report are for the defenses of the league's teams. However, I do include each league's breakdown of basic pitching stats between starters and relievers (denoted by "s" or "r" prefixes), and so summing those will yield the totals from the pitching side. 
The one abbreviation you might not recognize is "N"--this is the league average of runs/game for one team, and it will pop up again.<br /><br />The Team spreadsheet focuses on overall team performance--wins, losses, runs scored, runs allowed. The columns included are: Park Factor (PF), Home Run Park Factor (PFhr), Winning Percentage (W%), Expected W% (EW%), Predicted W% (PW%), wins, losses, runs, runs allowed, Runs Created (RC), Runs Created Allowed (RCA), Home Winning Percentage (HW%), Road Winning Percentage (RW%) [exactly what they sound like--W% at home and on the road], Runs/Game (R/G), Runs Allowed/Game (RA/G), Runs Created/Game (RCG), Runs Created Allowed/Game (RCAG), and Runs Per Game (the average number of runs scored and allowed per game). Ideally, I would use outs as the denominator, but for teams, outs and games are so closely related that I don’t think it’s worth the extra effort.<br /><br />The runs and Runs Created figures are unadjusted, but the per-game averages are park-adjusted, except for RPG which is also raw. Runs Created and Runs Created Allowed are both based on a simple Base Runs formula. The formula is:<br /><br />A = H + W - HR - CS<br />B = (2TB - H - 4HR + .05W + 1.5SB)*.76<br />C = AB - H<br />D = HR<br />Naturally, BsR = A*B/(B + C) + D.<br /><br />I have explained the methodology used to figure the PFs before, but the cliff’s notes version is that they are based on five years of data when applicable, include both runs scored and allowed, and they are regressed towards average (PF = 1), with the amount of regression varying based on the number of years of data used. There are factors for both runs and home runs. The initial PF (not shown) is:<br /><br />iPF = (H*T/(R*(T - 1) + H) + 1)/2<br />where H = RPG in home games, R = RPG in road games, T = # teams in league (15 for each league). 
Then the iPF is converted to the PF by taking x*iPF + (1-x), where x = .6 if one year of data is used, .7 for 2, .8 for 3, and .9 for 4+.<br /><br />It is important to note, since there always seems to be confusion about this, that these park factors already incorporate the fact that the average player plays 50% on the road and 50% at home. That is what the adding one and dividing by 2 in the iPF is all about. So if I list Fenway Park with a 1.02 PF, that means that it actually increases RPG by 4%.<br /><br />In the calculation of the PFs, I did not take out “home” games that were actually at neutral sites (of which there were a rash this year).<br /><br />There are also Team Offense and Defense spreadsheets. These include the following categories:<br /><br />Team offense: Plate Appearances, Batting Average (BA), On Base Average (OBA), Slugging Average (SLG), Secondary Average (SEC), Walks and Hit Batters per At Bat (WAB), Isolated Power (SLG - BA), R/G at home (hR/G), and R/G on the road (rR/G). BA, OBA, SLG, WAB, and ISO are park-adjusted by dividing by the square root of park factor (or the equivalent; WAB = (OBA - BA)/(1 - OBA), ISO = SLG - BA, and SEC = WAB + ISO).<br /><br />Team defense: Innings Pitched, BA, OBA, SLG, Innings per Start (IP/S), Starter's eRA (seRA), Reliever's eRA (reRA), Quality Start Percentage (QS%), RA/G at home (hRA/G), RA/G on the road (rRA/G), Battery Mishap Rate (BMR), Modified Fielding Average (mFA), and Defensive Efficiency Record (DER). BA, OBA, and SLG are park-adjusted by dividing by the square root of PF; seRA and reRA are divided by PF.<br /><br />The three fielding metrics I've included are limited to metrics that a) I can calculate myself and b) are based on the basic available data, not specialized PBP data. 
The three metrics are explained in this post, but here are quick descriptions of each:<br /><br />1) BMR--wild pitches and passed balls per 100 baserunners = (WP + PB)/(H + W - HR)*100<br /><br />2) mFA--fielding average removing strikeouts and assists = (PO - K)/(PO - K + E)<br /><br />3) DER--the Bill James classic, using only the PA-based estimate of plays made. Based on a suggestion by Terpsfan101, I've tweaked the error coefficient. Plays Made = PA - K - H - W - HR - HB - .64E and DER = PM/(PM + H - HR + .64E)<br /><br />Next are the individual player reports. I defined a starting pitcher as one with 15 or more starts. All other pitchers are eligible to be included as a reliever. If a pitcher has 40 appearances, then they are included. Additionally, if a pitcher has 50 innings and less than 50% of his appearances are starts, he is also included as a reliever (this allows some swingmen type pitchers who wouldn’t meet either the minimum start or appearance standards to get in). This would be a good point to note that I didn't do much to adjust for the opener--I made some judgment calls (very haphazard judgment calls) on which bucket to throw some pitchers in. This is something that I should definitely give some more thought to in coming years.<br /><br />For all of the player reports, ages are based on simply subtracting their year of birth from 2019. I realize that this is not compatible with how ages are usually listed and so “Age 27” doesn’t necessarily correspond to age 27 as I list it, but it makes everything a heckuva lot easier, and I am more interested in comparing the ages of the players to their contemporaries than fitting them into historical studies, and for the former application it makes very little difference. The "R" category records rookie status with an "R" for rookies and a blank for everyone else; I've trusted Baseball Prospectus on this. 
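As a sketch (not my actual spreadsheet code), the three fielding metrics described above translate directly to Python; the inputs are team-season totals and the example numbers are made up:

```python
def bmr(wp, pb, h, w, hr):
    # battery mishaps per 100 baserunners
    return (wp + pb) / (h + w - hr) * 100

def mfa(po, k, e):
    # fielding average with strikeouts stripped out (assists excluded)
    return (po - k) / (po - k + e)

def der(pa, k, h, w, hr, hb, e):
    # PA-based estimate of plays made, with the tweaked .64 error coefficient
    pm = pa - k - h - w - hr - hb - .64 * e
    return pm / (pm + h - hr + .64 * e)

# hypothetical team-season totals
print(round(bmr(wp=55, pb=10, h=1400, w=550, hr=200), 2))                 # 3.71
print(round(mfa(po=4300, k=1400, e=90), 3))                               # 0.97
print(round(der(pa=6100, k=1400, h=1400, w=550, hr=200, hb=60, e=90), 3)) # 0.659
```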
Also, all players are counted as being on the team with whom they played/pitched (IP or PA as appropriate) the most.<br /><br />For relievers, the categories listed are: Games, Innings Pitched, estimated Plate Appearances (PA), Run Average (RA), Relief Run Average (RRA), Earned Run Average (ERA), Estimated Run Average (eRA), DIPS Run Average (dRA), Strikeouts per Game (KG), Walks per Game (WG), Guess-Future (G-F), Inherited Runners per Game (IR/G), Batting Average on Balls in Play (%H), Runs Above Average (RAA), and Runs Above Replacement (RAR).<br /><br />IR/G is per relief appearance (G - GS); it is an interesting thing to look at, I think, in lieu of actual leverage data. You can see which closers come in with runners on base, and which are used nearly exclusively to start innings. Of course, you can’t infer too much; there are bad relievers who come in with a lot of people on base, not because they are being used in high leverage situations, but because they are long men being used in low-leverage situations already out of hand.<br /><br />For starting pitchers, the columns are: Wins, Losses, Innings Pitched, Estimated Plate Appearances (PA), RA, RRA, ERA, eRA, dRA, KG, WG, G-F, %H, Pitches/Start (P/S), Quality Start Percentage (QS%), RAA, and RAR. RA and ERA you know--R*9/IP or ER*9/IP, park-adjusted by dividing by PF. The formulas for eRA and dRA are based on the same Base Runs equation and they estimate RA, not ERA.<br /><br />* eRA is based on the actual results allowed by the pitcher (hits, doubles, home runs, walks, strikeouts, etc.). It is park-adjusted by dividing by PF.<br /><br />* dRA is the classic DIPS-style RA, assuming that the pitcher allows a league average %H, and that his hits in play have a league-average S/D/T split. 
It is park-adjusted by dividing by PF.<br /><br />The formula for eRA is:<br /><br />A = H + W - HR<br />B = (2*TB - H - 4*HR + .05*W)*.78<br />C = AB - H = K + (3*IP - K)*x (where x is figured as described below for PA estimation and is typically around .93) = PA (from below) - H - W<br />eRA = (A*B/(B + C) + HR)*9/IP<br /><br />To figure dRA, you first need the estimate of PA described below. Then you calculate W, K, and HR per PA (call these %W, %K, and %HR). Percentage of balls in play (BIP%) = 1 - %W - %K - %HR. This is used to calculate the DIPS-friendly estimate of %H (H per PA) as e%H = Lg%H*BIP%.<br /><br />Now everything has a common denominator of PA, so we can plug into Base Runs:<br /><br />A = e%H + %W<br />B = (2*(z*e%H + 4*%HR) - e%H - 5*%HR + .05*%W)*.78<br />C = 1 - e%H - %W - %HR<br />dRA = (A*B/(B + C) + %HR)/C*a<br /><br />z is the league average of total bases per non-HR hit (TB - 4*HR)/(H - HR), and a is the league average of (AB - H) per game.<br /><br />Also shown are strikeout and walk rate, both expressed as per game. By game I mean not nine innings but rather the league average of PA/G. I have always been a proponent of using PA and not IP as the denominator for non-run pitching rates, and now the use of per PA rates is widespread. Usually these are expressed as K/PA and W/PA, or equivalently, percentage of PA with a strikeout or walk. I don’t believe that any site publishes these as K and W per equivalent game as I am doing here. This is not better than K%--it’s simply applying a scalar multiplier. I like it because it generally follows the same scale as the familiar K/9.<br /><br />To facilitate this, I’ve finally corrected a flaw in the formula I use to estimate plate appearances for pitchers. Previously, I’ve done it the lazy way by not splitting strikeouts out from other outs. 
I am now using this formula to estimate PA (where PA = AB + W):<br /><br />PA = K + (3*IP - K)*x + H + W<br />Where x = league average of (AB - H - K)/(3*IP - K)<br /><br />Then KG = K/PA*Lg(PA/G) and WG = W/PA*Lg(PA/G).<br /><br />G-F is a junk stat, included here out of habit because I've been including it for years. It was intended to give a quick read of a pitcher's expected performance in the next season, based on eRA and strikeout rate. Although the numbers vaguely resemble RAs, it's actually unitless. As a rule of thumb, anything under four is pretty good for a starter. G-F = 4.46 + .095(eRA) - .113(K*9/IP). It is a junk stat. JUNK STAT JUNK STAT JUNK STAT. Got it?<br /><br />%H is BABIP, more or less--%H = (H - HR)/(PA - HR - K - W), where PA was estimated above. Pitches/Start includes all appearances, so I've counted relief appearances as one-half of a start (P/S = Pitches/(.5*G + .5*GS)). QS% is just QS/GS; I don't think it's particularly useful, but Doug's Stats include QS so I include it.<br /><br />I've used a stat called Relief Run Average (RRA) in the past, based on Sky Andrecheck's article in the August 1999 By the Numbers; that one only used inherited runners, but I've revised it to include bequeathed runners as well, making it equally applicable to starters and relievers. One thing that's become more problematic as time goes on for calculating this expanded metric is the sketchy availability of bequeathed runner data for relievers. As a result, only bequeathed runners left by starters (and "relievers" when pitching as starters) are taken into account here. I use RRA as the building block for baselined value estimates for all pitchers. I explained RRA in this article, but the bottom line formulas are:<br /><br />BRSV = BRS - BR*i*sqrt(PF)<br />IRSV = IR*i*sqrt(PF) - IRS<br />RRA = ((R - (BRSV + IRSV))*9/IP)/PF<br /><br />The two baselined stats are Runs Above Average (RAA) and Runs Above Replacement (RAR). 
Starting in 2015 I revised RAA to use a slightly different baseline for starters and relievers as described here. The adjustment is based on patterns from the last several seasons of league average starter and reliever eRA. Thus it does not adjust for any advantages relief pitchers enjoy that are not reflected in their component statistics. This could include runs allowed scoring rules that benefit relievers (although the use of RRA should help even the scales in this regard, at least compared to raw RA) and the talent advantage of starting pitchers. The RAR baselines do attempt to take the latter into account, and so the difference in starter and reliever RAR will be more stark than the difference in RAA.<br /><br />RAA (relievers) = (.951*LgRA - RRA)*IP/9<br />RAA (starters) = (1.025*LgRA - RRA)*IP/9<br />RAR (relievers) = (1.11*LgRA - RRA)*IP/9<br />RAR (starters) = (1.28*LgRA - RRA)*IP/9<br /><br />All players with 250 or more plate appearances (official, total plate appearances) are included in the Hitters spreadsheets (along with some players close to the cutoff point who I was interested in). Each is assigned one position, the one at which they appeared in the most games. The statistics presented are: Games played (G), Plate Appearances (PA), Outs (O), Batting Average (BA), On Base Average (OBA), Slugging Average (SLG), Secondary Average (SEC), Runs Created (RC), Runs Created per Game (RG), Speed Score (SS), Hitting Runs Above Average (HRAA), Runs Above Average (RAA), Hitting Runs Above Replacement (HRAR), and Runs Above Replacement (RAR).<br /><br />Starting in 2015, I'm including hit batters in all related categories for hitters, so PA is now equal to AB + W + HB. Outs are AB - H + CS. BA and SLG you know, but remember that without SF, OBA is just (H + W + HB)/(AB + W + HB). Secondary Average = (TB - H + W + HB)/AB = SLG - BA + (OBA - BA)/(1 - OBA). 
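The hitter definitions above (PA, outs, and the rate stats with HB included and no SF) can be sketched in Python; the season line in the example is hypothetical:

```python
def rate_stats(ab, h, tb, w, hb, cs=0):
    # PA = AB + W + HB; outs = AB - H + CS, per the definitions above
    pa = ab + w + hb
    outs = ab - h + cs
    ba = h / ab
    oba = (h + w + hb) / pa          # no SF available
    slg = tb / ab
    sec = (tb - h + w + hb) / ab     # equals SLG - BA + (OBA - BA)/(1 - OBA)
    return pa, outs, ba, oba, slg, sec

# hypothetical season: 500 AB, 150 H, 250 TB, 60 W, 5 HB
pa, outs, ba, oba, slg, sec = rate_stats(500, 150, 250, 60, 5)
print(pa, outs)   # 565 350
print(round(ba, 3), round(oba, 3), round(slg, 3), round(sec, 3))
```

Note that the two expressions given for SEC agree, which the last comment relies on: (TB - H + W + HB)/AB expands algebraically to SLG - BA plus the walks-and-hit-batters term.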
I have not included net steals as many people (and Bill James himself) do, but I have included HB which some do not.<br /><br />BA, OBA, and SLG are park-adjusted by dividing by the square root of PF. This is an approximation, of course, but I'm satisfied that it works well (I plan to post a couple articles on this some time during the offseason). The goal here is to adjust for the win value of offensive events, not to quantify the exact park effect on the given rate. I use the BA/OBA/SLG-based formula to figure SEC, so it is park-adjusted as well.<br /><br />Runs Created is actually Paul Johnson's ERP, more or less. Ideally, I would use a custom linear weights formula for the given league, but ERP is just so darn simple and close to the mark that it’s hard to pass up. I still use the term “RC” partially as a homage to Bill James (seriously, I really like and respect him even if I’ve said negative things about RC and Win Shares), and also because it is just a good term. I like the thought put in your head when you hear “creating” a run better than “producing”, “manufacturing”, “generating”, etc. to say nothing of names like “equivalent” or “extrapolated” runs. None of that is said to put down the creators of those methods--there just aren’t a lot of good, unique names available.<br /><br />For 2015, I refined the formula a little bit to:<br /><br />1. include hit batters at a value equal to that of a walk<br />2. value intentional walks at just half the value of a regular walk<br />3. recalibrate the multiplier based on the last ten major league seasons (2005-2014)<br /><br />This revised RC = (TB + .8H + W + HB - .5IW + .7SB - CS - .3AB)*.310<br /><br />RC is park adjusted by dividing by PF, making all of the value stats that follow park adjusted as well. RG, the Runs Created per Game rate, is RC/O*25.5. I do not believe that outs are the proper denominator for an individual rate stat, but I also do not believe that the distortions caused are that bad. 
(I still intend to finish my rate stat series and discuss all of the options in excruciating detail, but alas you’ll have to take my word for it now).<br /><br />Several years ago I switched from using my own "Speed Unit" to a version of Bill James' Speed Score; of course, Speed Unit was inspired by Speed Score. I only use four of James' categories in figuring Speed Score. I actually like the construct of Speed Unit better as it was based on z-scores in the various categories (and amazingly a couple other sabermetricians did as well), but trying to keep the estimates of standard deviation for each of the categories appropriate was more trouble than it was worth.<br /><br />Speed Score is the average of four components, which I'll call a, b, c, and d:<br /><br />a = ((SB + 3)/(SB + CS + 7) - .4)*20<br />b = sqrt((SB + CS)/(S + W))*14.3<br />c = ((R - HR)/(H + W - HR) - .1)*25<br />d = T/(AB - HR - K)*450<br /><br />James actually uses a sliding scale for the triples component, but it strikes me as needlessly complex and so I've streamlined it. He looks at two years of data, which makes sense for a gauge that is attempting to capture talent and not performance, but using multiple years of data would be contradictory to the guiding principles behind this set of reports (namely, simplicity. Or laziness. Your pick.) I also changed some of his division to mathematically equivalent multiplications.<br /><br />There are a whopping four categories that compare to a baseline; two for average, two for replacement. Hitting RAA compares to a league average hitter; it is in the vein of Pete Palmer’s Batting Runs. RAA compares to an average hitter at the player’s primary position. Hitting RAR compares to a “replacement level” hitter; RAR compares to a replacement level hitter at the player’s primary position. 
The formulas are:<br /><br />HRAA = (RG - N)*O/25.5<br />RAA = (RG - N*PADJ)*O/25.5<br />HRAR = (RG - .73*N)*O/25.5<br />RAR = (RG - .73*N*PADJ)*O/25.5<br /><br />PADJ is the position adjustment, and it has now been updated to be based on 2010-2019 offensive data. For catchers it is .92; for 1B/DH, 1.14; for 2B, .99; for 3B, 1.07; for SS, .95; for LF/RF, 1.09; and for CF, 1.05. As positional flexibility takes hold, fielding value is better quantified, and the long-term evolution of the game continues, it's right to question whether offensive positional adjustments are even less reflective of what we are trying to account for than they were in the past. I have a general discussion about the use of offensive positional adjustments below that I wrote a decade ago, but I will also have a bit more to say about this and these specific adjustments in my annual post on Hitting by Position which hopefully will actually be published this year. <br /><br />That was the mechanics of the calculations; now I'll twist myself into knots trying to justify them. If you only care about the how and not the why, stop reading now.<br /><br />The first thing that should be covered is the philosophical position behind the statistics posted here. They fall on the continuum of ability and value in what I have called "performance". Performance is a technical-sounding way of saying "Whatever arbitrary combination of ability and value I prefer".<br /><br />With respect to park adjustments, I am not interested in how any particular player is affected, so there is no separate adjustment for lefties and righties for instance. The park factor is an attempt to determine how the park affects run scoring rates, and thus the win value of runs.<br /><br />I apply the park factor directly to the player's statistics, but it could also be applied to the league context. 
The advantage to doing it my way is that it allows you to compare the component statistics (like Runs Created or OBA) on a park-adjusted basis. The drawback is that it creates a new theoretical universe, one in which all parks are equal, rather than leaving the player grounded in the actual context in which he played and evaluating how that context (and not the player's statistics) was altered by the park.<br /><br />The good news is that the two approaches are essentially equivalent; in fact, they are precisely equivalent if you assume that the Runs Per Win factor is equal to the RPG. Suppose that we have a player in an extreme park (PF = 1.15, approximately like Coors Field pre-humidor) who has an 8 RG before adjusting for park, while making 350 outs in a 4.5 N league. The first method of park adjustment, the one I use, converts his value into a neutral park, so his RG is now 8/1.15 = 6.957. We can now compare him directly to the league average:<br /><br />RAA = (6.957 - 4.5)*350/25.5 = +33.72<br /><br />The second method would be to adjust the league context. If N = 4.5, then the average player in this park will create 4.5*1.15 = 5.175 runs. Now, to figure RAA, we can use the unadjusted RG of 8:<br /><br />RAA = (8 - 5.175)*350/25.5 = +38.77<br /><br />These are not the same, as you can obviously see. The reason for this is that they take place in two different contexts. The first figure is in a 9 RPG (2*4.5) context; the second figure is in a 10.35 RPG (2*4.5*1.15) context. Runs have different values in different contexts; that is why we have RPW converters in the first place. If we convert to WAA (using RPW = RPG, which is only an approximation, so it's usually not as tidy as it appears below), then we have:<br /><br />WAA = 33.72/9 = +3.75<br />WAA = 38.77/10.35 = +3.75<br /><br />Once you convert to wins, the two approaches are equivalent. 
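The worked example above can be checked with a few lines of Python (same hypothetical player: 8 RG, 350 outs, PF of 1.15, N = 4.5, and RPW taken to equal RPG):

```python
PF, RG, OUTS, N = 1.15, 8.0, 350, 4.5

# method 1: restate the player's rate in a neutral park
raa1 = (RG / PF - N) * OUTS / 25.5
# method 2: inflate the league context by the park factor instead
raa2 = (RG - N * PF) * OUTS / 25.5
print(round(raa1, 2), round(raa2, 2))  # 33.72 38.77

# the run totals differ, but dividing each by its own RPG yields the same wins
waa1 = raa1 / (2 * N)        # 9 RPG context
waa2 = raa2 / (2 * N * PF)   # 10.35 RPG context
print(round(waa1, 2), round(waa2, 2))  # 3.75 3.75
```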
The other nice thing about the first approach is that once you park-adjust, everyone in the league is in the same context, and you can dispense with the need for converting to wins at all. You still might want to convert to wins, and you'll need to do so if you are comparing the 2019 players to players from other league-seasons (including between the AL and NL in the same year), but if you are only looking to compare Christian Yelich to Matt Carpenter, it's not necessary. WAR is somewhat ubiquitous now, but personally I prefer runs when possible--why mess with decimal points if you don't have to?<br /><br />The park factors used to adjust player stats here are run-based. Thus, they make no effort to project what a player "would have done" in a neutral park, or account for the different effects parks have on specific events (walks, home runs, BA) or types of players. They simply account for the difference in run environment that is caused by the park (as best I can measure it). As such, they don't evaluate a player within the actual run context of his team's games; they attempt to restate the player's performance as an equivalent performance in a neutral park.<br /><br />I suppose I should also justify the use of sqrt(PF) for adjusting component statistics. The classic defense given for this approach relies on basic Runs Created--runs are proportional to OBA*SLG, and OBA*SLG/PF = OBA/sqrt(PF)*SLG/sqrt(PF). While RC may be an antiquated tool, you will find that the square root adjustment is fairly compatible with linear weights or Base Runs as well. I am not going to take the space to demonstrate this claim here, but I will some time in the future.<br /><br />Many value figures published around the sabersphere adjust for the difference in quality level between the AL and NL. I don't, but this is a thorny area where there is no right or wrong answer as far as I'm concerned. 
I also do not make an adjustment in the league averages for the fact that the overall NL averages include pitcher batting and the AL does not (not quite true in the era of interleague play, but you get my drift).<br /><br />The difference between the leagues may not be precisely calculable, and it certainly is not constant, but it is real. If the average player in the AL is better than the average player in the NL, it is perfectly reasonable to expect the average AL player to have more RAR than the average NL player, and that will not happen without some type of adjustment. On the other hand, if you are only interested in evaluating a player relative to his own league, such an adjustment is not necessarily welcome.<br /><br />The league argument only applies cleanly to metrics baselined to average. Since replacement level compares the given player to a theoretical player that can be acquired on the cheap, the same pool of potential replacement players should by definition be available to the teams of each league. One could argue that if the two leagues don't have equal talent at the major league level, they might not have equal access to replacement level talent--except such an argument is at odds with the notion that replacement level represents talent that is truly "freely available".<br /><br />So it's hard to justify the approach I take, which is to set replacement level relative to the average runs scored in each league, with no adjustment for the difference in the leagues. The best justification is that it's simple and it treats each league as its own universe, even if in reality they are connected.<br /><br />The replacement levels I have used here are very much in line with the values used by other sabermetricians. 
This is based on my own "research", my interpretation of other people's research, and a desire not to stray from consensus and render the values unhelpful to the majority of people who may encounter them.<br /><br />Replacement level is certainly not settled science. There is always going to be room to disagree on what the baseline should be. Even if you agree it should be "replacement level", any estimate of where it should be set is just that--an estimate. Average is clean and fairly straightforward, even if its utility is questionable; replacement level is inherently messy. So I offer the average baseline as well.<br /><br />For position players, replacement level is set at 73% of the positional average RG (since there's a history of discussing replacement level in terms of winning percentages, this is roughly equivalent to .350). For starting pitchers, it is set at 128% of the league average RA (.380), and for relievers it is set at 111% (.450).<br /><br />I am still using an analytical structure that makes the comparison to replacement level for a position player by applying it to his hitting statistics. This is the approach taken by Keith Woolner in VORP (and some other earlier replacement level implementations), but the newer metrics (among them Rally and Fangraphs' WAR) handle replacement level by subtracting a set number of runs from the player's total runs above average in a number of different areas (batting, fielding, baserunning, positional value, etc.), which for lack of a better term I will call the subtraction approach.<br /><br />The offensive positional adjustment makes the inherent assumption that the average player at each position is equally valuable. I think that this is close to being true, but it is not quite true. The ideal approach would be to use a defensive positional adjustment, since the real difference between a first baseman and a shortstop is their defensive value. 
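The winning percentage equivalents quoted above (.350, .380, .450) can be sanity-checked by converting each run ratio to a W% with a Pythagorean formula. This minimal sketch uses a fixed exponent of 2 for simplicity (the exact exponent is not the point here, and the function names are mine, for illustration):

```python
def offensive_wpct(rg_ratio, exp=2.0):
    """W% of a team scoring rg_ratio times the league average runs
    with average run prevention (Pythagorean, fixed exponent)."""
    return rg_ratio ** exp / (rg_ratio ** exp + 1)

def pitching_wpct(ra_ratio, exp=2.0):
    """W% of a team allowing ra_ratio times the league average runs
    with an average offense."""
    return 1 / (1 + ra_ratio ** exp)

offensive_wpct(0.73)   # position players: roughly .35
pitching_wpct(1.28)    # starting pitchers: roughly .38
pitching_wpct(1.11)    # relievers: roughly .45
```

Rounding to the published precision recovers .350, .380, and .450.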
When you bat, all runs count the same, whether you create them as a first baseman or as a shortstop.<br /><br />That being said, using "replacement hitter at position" does not cause too many distortions. It is not theoretically correct, but it is practically powerful. For one thing, most players, even those at key defensive positions, are chosen first and foremost for their offense. Empirical research by Keith Woolner has shown that the replacement level hitting performance is about the same for every position, relative to the positional average.<br /><br />Figuring what the defensive positional adjustment should be, though, is easier said than done. Therefore, I use the offensive positional adjustment. So if you want to criticize that choice, or criticize the numbers that result, be my guest. But do not claim that I am holding this up as the correct analytical structure. I am holding it up as the most simple and straightforward structure that conforms to reality reasonably well, and because while the numbers may be flawed, they are at least based on an objective formula that I can figure myself. If you feel comfortable with some other assumptions, please feel free to ignore mine.<br /><br />That still does not justify the use of HRAR--hitting runs above replacement--which compares each hitter, regardless of position, to 73% of the league average. Basically, this is just a way to give an overall measure of offensive production without regard for position with a low baseline. It doesn't have any real baseball meaning.<br /><br />A player who creates runs at 90% of the league average could be above-average (if he's a shortstop or catcher, or a great fielder at a less important fielding position), or sub-replacement level (DHs that create 3.5 runs per game are not valuable properties). Every player is chosen because his total value, both hitting and fielding, is sufficient to justify his inclusion on the team. 
HRAR fails even if you try to justify it with a thought experiment about a world in which defense doesn't matter, because in that case the absolute replacement level (in terms of RG, without accounting for the league average) would be much higher than it is currently.<br /><br />The specific positional adjustments I use are based on 2002-2011 data. I stick with them because I have not seen compelling evidence of a change in the degree of difficulty or scarcity between the positions between now and then, and because I think they are fairly reasonable. The positions for which they diverge the most from the defensive position adjustments in common use are 2B, 3B, and CF. Second base is considered a premium position by the offensive PADJ (.99), while third base and center field have larger adjustments in the opposite direction (1.05 and 1.07).<br /><br />Another flaw is that the PADJ is applied to the overall league average RG, which is artificially low for the NL because of pitchers' batting. When using the actual league average runs/game, it's tough to just remove pitchers--any adjustment would be an estimate. If you use the league total of runs created instead, it is a much easier fix.<br /><br />One other note on this topic is that since the offensive PADJ is a stand-in for average defensive value by position, ideally it would be applied by tying it to defensive playing time. I have done it by outs, though.<br /><br />The reason I have taken this flawed path is that 1) it ties the position adjustment directly into the RAR formula rather than leaving it as something to subtract on the outside and, more importantly, 2) there’s no straightforward way to do it. The best would be to use defensive innings--set the full-time player to X defensive innings, figure how Derek Jeter’s innings compared to X, and adjust his PADJ accordingly. Games in the field or games played are dicey because they can cause distortion for defensive replacements. 
Plate Appearances avoid the problem that outs have of being highly related to player quality, but they still carry the illogic of basing it on offensive playing time. And of course the differences here are going to be fairly small (a few runs). That is not to say that this way is preferable, but it’s not horrible either, at least as far as I can tell.<br /><br />To compare this approach to the subtraction approach, start by assuming that a replacement level shortstop would create .86*.73*4.5 = 2.825 RG (or would perform at an overall level of equivalent value to being an average fielder at shortstop while creating 2.825 runs per game). Suppose that we are comparing two shortstops, each of whom compiled 600 PA and played an equal number of defensive games and innings (and thus would have the same positional adjustment using the subtraction approach). Alpha made 380 outs and Bravo made 410 outs, and each ranked as dead-on average in the field.<br /><br />The difference in overall RAR between the two using the subtraction approach would be equal to the difference between their offensive RAA compared to the league average. Assuming the league average is 4.5 runs, and that both Alpha and Bravo created 75 runs, their offensive RAAs are:<br /><br />Alpha = (75*25.5/380 - 4.5)*380/25.5 = +7.94<br /><br />Similarly, Bravo is at +2.65, and so the difference between them will be 5.29 RAR.<br /><br />Using the flawed approach, Alpha's RAR will be:<br /><br />(75*25.5/380 - 4.5*.73*.86)*380/25.5 = +32.90<br /><br />Bravo's RAR will be +29.58, a difference of 3.32 RAR, which is two runs off of the difference using the subtraction approach.<br /><br />The downside to using PA is that you really need to consider park effects if you do, whereas outs allow you to sidestep park effects. Outs are constant; plate appearances are linked to OBA. Thus, they not only depend on the offensive context (including park factor), but also on the quality of one's team. 
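The Alpha/Bravo arithmetic above can be verified with a short script; the 4.5 R/G league, 25.5 outs per game, 73% replacement multiplier, and .86 shortstop adjustment are the assumptions from the example, and the names are of course hypothetical:

```python
OUTS_PER_GAME = 25.5
LG_RG = 4.5        # assumed league average runs per game
REPL = 0.73        # replacement level: 73% of positional average RG
SS_PADJ = 0.86     # offensive positional adjustment for shortstops

def rg(runs_created, outs):
    """Runs created per game, scaled to a full game of outs."""
    return runs_created * OUTS_PER_GAME / outs

def raa(runs_created, outs):
    """Offensive runs above an average hitter (subtraction approach)."""
    return (rg(runs_created, outs) - LG_RG) * outs / OUTS_PER_GAME

def rar(runs_created, outs, padj):
    """Runs above a replacement-level hitter at the position."""
    return (rg(runs_created, outs) - LG_RG * REPL * padj) * outs / OUTS_PER_GAME

# Alpha: 75 RC in 380 outs; Bravo: 75 RC in 410 outs
alpha_raa, bravo_raa = raa(75, 380), raa(75, 410)                    # +7.94, +2.65
alpha_rar, bravo_rar = rar(75, 380, SS_PADJ), rar(75, 410, SS_PADJ)  # +32.90, +29.58
```

The two differences (5.29 vs. 3.32 runs) fall straight out of `alpha_raa - bravo_raa` and `alpha_rar - bravo_rar`.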
Of course, attempting to adjust for team PA differences opens a huge can of worms which is not really relevant; for now, the point is that using outs for individual players causes distortions, sometimes trivial and sometimes bothersome, but almost always makes one's life easier.<br /><br />I do not include fielding (or baserunning outside of steals, although that is a trivial consideration in comparison) in the RAR figures--they cover offense and positional value only. This in no way means that I do not believe that fielding is an important consideration in player evaluation. However, two of the key principles of these stat reports are 1) not incorporating any data that is not readily available and 2) not simply including other people's results (of course I borrow heavily from other people's methods, but only adapting methodology that I can apply myself).<br /><br />Any fielding metric worth its salt will fail at least one of these criteria--they use zone data or play-by-play data which I do not have easy access to. I do not have a fielding metric that I have stapled together myself, and so I would have to simply lift other analysts' figures.<br /><br />Setting the practical reason for not including fielding aside, I do have some reservations about lumping fielding and hitting value together in one number because of the obvious differences in reliability between offensive and fielding metrics. In theory, they absolutely should be put together. But in practice, I believe it would be better to regress the fielding metric to a point at which it would be roughly equivalent in reliability to the offensive metric.<br /><br />Offensive metrics have error bars associated with them, too, of course, and in evaluating a single season's value, I don't care about the vagaries that we often lump together as "luck". 
Still, there are errors in our assessment of linear weight values and players that collect an unusual proportion of infield hits or hits to the left side, errors in estimation of park factor, and any number of other factors that make their events more or less valuable than an average event of that type.<br /><br />Fielding metrics offer up all of that and more, as we cannot be nearly as certain of true successes and failures as we are when analyzing offense. Recent investigations, particularly by Colin Wyers, have raised even more questions about the level of uncertainty. So, even if I was including a fielding value, my approach would be to assume that the offensive value was 100% reliable (which it isn't), and regress the fielding metric relative to that (so if the offensive metric was actually 70% reliable, and the fielding metric 40% reliable, I'd treat the fielding metric as .4/.7 = 57% reliable when tacking it on, to illustrate with a simplified and completely made up example presuming that one could have a precise estimate of nebulous "reliability").<br /><br />Given the inherent assumption of the offensive PADJ that all positions are equally valuable, once RAR has been figured for a player, fielding value can be accounted for by adding on his runs above average relative to a player at his own position. If there is a shortstop that is -2 runs defensively versus an average shortstop, he is without a doubt a plus defensive player, and a more valuable defensive player than a first baseman who was +1 run better than an average first baseman. Regardless, since it was implicitly assumed that they are both average defensively for their position when RAR was calculated, the shortstop will see his value docked two runs. This DOES NOT MEAN that the shortstop has been penalized for his defense. 
The whole process of accounting for positional differences, going from hitting RAR to positional RAR, has benefited him.<br /><br />I've found that there is often confusion about the treatment of first basemen and designated hitters in my PADJ methodology, since I consider DHs to be in the same pool as first basemen. The fact of the matter is that first basemen outhit DHs. There are any number of potential explanations for this; DHs are often old or injured, players hit worse when DHing than they do when playing the field, etc. This actually helps first basemen, since the DHs drag the average production of the pool down, thus resulting in a lower replacement level than I would get if I considered first basemen alone.<br /><br />However, this method does assume that a 1B and a DH have equal defensive value. Obviously, a DH has no defensive value. What I advocate to correct this is to treat a DH as a bad defensive first baseman, and thus knock another five or so runs off of his RAR for a full-time player. I do not incorporate this into the published numbers, but you should keep it in mind. However, there is no need to adjust the figures for first basemen upward--the only necessary adjustment is to take the DHs down a notch.<br /><br />Finally, I consider each player at his primary defensive position (defined as where he appears in the most games), and do not weight the PADJ by playing time. This does shortchange a player like Ben Zobrist (who saw significant time at a tougher position than his primary position), and unduly boost a player like Buster Posey (who logged a lot of games at a much easier position than his primary position). For most players, though, it doesn't matter much. 
I find it preferable to make manual adjustments for the unusual cases rather than add another layer of complexity to the whole endeavor.<br /><br /><a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vR_acLogCW22NoOw7_rfiCXn85Ft6oDHP5ZXC7sr9tklxAwNhWeIbr84IH9f9Mk3lHqFRCjBRVlQM8Y/pub?output=HTML">2019 League</a><br /><br /><a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vTrHwr8h9xH7oiyrGBLuoFuqfkRjMopi_0LYN_y8k6nWwyWMds73hLsgwQ6vMXcqHd4GQUix3Tmqjp-/pub?output=html">2019 PF</a><br /><br /><a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vTfJPLaAzwXiO6AfXEcVN0YgmdM2-eiiwh4iNr4GVufqa09lF5tiNh1AR4aEO8g4rGGOQNjoaz81Ect/pub?output=html">2019 Teams</a><br /><br /><a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vQILe7gamRtgQpEcuVtbgnzyj2rlzM40Ne9fLDxfK5nTBXYYzepoHqwNfg3SYrOy6W1VkYAM4fZXVes/pub?output=html">2019 Team Defense</a><br /><br /><a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vRL6QYSzpgFmrBojCCtOy6NqF6jyrRHzFIPKZtJpP3H6MfpCNdGs78ast1_VjAQ3Sv_OOGUFWGqj1ic/pub?output=html">2019 Team Offense</a><br /><br /><a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vRnyG9VJ7woC_5a5-rNzsZ2XADRCBH1JhoCi4-WXKyyX1dPJEz8O4l0TQ-Lq6hM_v2YY5B0OkUh9jkj/pub?output=html">2019 AL Relievers</a><br /><br /><a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vSmV5PxNU0T3RP_0_XVH5MeTOQlQvvsryjxNLmmdjRrXeoG4IyIXFNOJwPogxvgMnhghT-oMoPeFUxh/pub?output=html">2019 NL Relievers</a><br /><br /><a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vQdulwbcP7GxDHbbGxRcNRgcKwyNlNcHU4B6tVZ5HYeWgym6DmjJYreo-M78EaG9fU20J6KrQkCJUDW/pub?output=html">2019 AL Starters</a><br /><br /><a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vRd3bNTV7Lq4w8QVv5dq3EnEGFWpKN4VIoB3vW-ECF_b_W4N5iR2X-mxlySY4bcepO50hVmwdtiWXwb/pub?output=html">2019 NL Starters</a><br /><br /><a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vS9oiEWKUlISU-d43WyWHdiWfyD3taSGlyUTz6MA9MAuLIP576KIbAPQstoYaexCrfimnODFRwZ-Jah/pub?output=html">2019 AL Hitters</a><br /><br /><a 
href="https://docs.google.com/spreadsheets/d/e/2PACX-1vRGzl7_LApCqbiLCJ51sRTpzfZ2jRVbEjdf--OFeXN5_p1J41jBZka1ryRxbXJnE7-pSBGstZrd3GF8/pub?output=html">2019 NL Hitters</a>phttp://www.blogger.com/profile/18057215403741682609noreply@blogger.com0tag:blogger.com,1999:blog-12133335.post-18538015217922491452019-09-30T18:33:00.000-04:002019-09-30T18:33:44.772-04:00Crude Playoff Odds -- 2019These are very simple playoff odds, based on my crude rating system for teams using an equal mix of W%, EW% (based on R/RA), PW% (based on RC/RCA), and 69 games of .500. They account for home field advantage by assuming a .500 team wins 54.2% of home games (major league average 2006-2015). They assume that a team's inherent strength is constant from game-to-game. They do not generally account for any number of factors that you would actually want to account for if you were serious about this, including but not limited to injuries, the current construction of the team rather than the aggregate seasonal performance, pitching rotations, estimated true talent of the players, etc.<br /><br />The CTRs that are fed in are:<br /><br /><a href="https://4.bp.blogspot.com/-qmpxo6zrJ_A/XZKA7iYiHwI/AAAAAAAACsA/O8Tyw6wAKj0_B6UKuTK3LA0nubbZIH-fACLcBGAsYHQ/s1600/19odds1.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://4.bp.blogspot.com/-qmpxo6zrJ_A/XZKA7iYiHwI/AAAAAAAACsA/O8Tyw6wAKj0_B6UKuTK3LA0nubbZIH-fACLcBGAsYHQ/s400/19odds1.JPG" width="368" height="400" data-original-width="173" data-original-height="188" /></a><br /><br />Wildcard game odds (the least useful since the pitching matchups aren’t taken into account, and that matters most when there is just one game):<br /><br /><a href="https://4.bp.blogspot.com/-epVSBrupjD4/XZKBAW_Wk2I/AAAAAAAACsE/Qh-MGUqEML0HFE_PXTNl4yxHYMKZoAkqwCLcBGAsYHQ/s1600/19odds2.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" 
src="https://4.bp.blogspot.com/-epVSBrupjD4/XZKBAW_Wk2I/AAAAAAAACsE/Qh-MGUqEML0HFE_PXTNl4yxHYMKZoAkqwCLcBGAsYHQ/s400/19odds2.JPG" width="400" height="81" data-original-width="291" data-original-height="59" /></a><br /><br />LDS:<br /><br /><a href="https://3.bp.blogspot.com/-2_D1KQYFBu8/XZKBLiktZpI/AAAAAAAACsQ/7YnYkdfmpTAr3R7Tj3S8vT35VbI2F0laQCLcBGAsYHQ/s1600/19odds3.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://3.bp.blogspot.com/-2_D1KQYFBu8/XZKBLiktZpI/AAAAAAAACsQ/7YnYkdfmpTAr3R7Tj3S8vT35VbI2F0laQCLcBGAsYHQ/s400/19odds3.JPG" width="400" height="108" data-original-width="501" data-original-height="135" /></a><br /><br />LCS:<br /><br /><a href="https://1.bp.blogspot.com/-QCM7IBnM3X0/XZKBF0jitgI/AAAAAAAACsI/uq9KHrSn_7Ez2UGEE5cee8FizcNRgx_FQCLcBGAsYHQ/s1600/19odds4.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://1.bp.blogspot.com/-QCM7IBnM3X0/XZKBF0jitgI/AAAAAAAACsI/uq9KHrSn_7Ez2UGEE5cee8FizcNRgx_FQCLcBGAsYHQ/s400/19odds4.JPG" width="400" height="197" data-original-width="501" data-original-height="247" /></a><br /><br />WS:<br /><br /><a href="https://4.bp.blogspot.com/-RdDxmKHj4-A/XZKBQjuQ0WI/AAAAAAAACsU/iu5tXqUtuBIhy9f0hOvPco_yttiIoAN7QCLcBGAsYHQ/s1600/19odds5.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://4.bp.blogspot.com/-RdDxmKHj4-A/XZKBQjuQ0WI/AAAAAAAACsU/iu5tXqUtuBIhy9f0hOvPco_yttiIoAN7QCLcBGAsYHQ/s400/19odds5.JPG" width="400" height="394" data-original-width="502" data-original-height="495" /></a><br /><br />It was easier to run this when World Series home field advantage was determined by league rather than team record. The record approach is not as arbitrary as alternating years or as silly as using the All-Star game result, but it does produce its own share of undesirable outcomes. 
Houston would have home field over Los Angeles, but given that the NL was finally stronger than the AL this year, the Astros' one-game edge suggests an inferior record to that of the Dodgers, not a superior one. Even worse are the tiebreakers - after head-to-head, the edge goes to the team with the better intradivisional record, which favors teams from weak divisions, who likely performed less well than their raw win-loss record would suggest. The same is true of intraleague record, which is the next tiebreaker. If some division/league breakout is the criterion of choice, it should be inter-, not intra-.<br /><br />Putting it all together:<br /><br /><a href="https://2.bp.blogspot.com/-3LkQKQ5-UUE/XZKCnIAwUFI/AAAAAAAACsk/wopO19Fxo20a9WNnVdWI6-HnoyOdxfQsQCLcBGAsYHQ/s1600/19odds6.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://2.bp.blogspot.com/-3LkQKQ5-UUE/XZKCnIAwUFI/AAAAAAAACsk/wopO19Fxo20a9WNnVdWI6-HnoyOdxfQsQCLcBGAsYHQ/s400/19odds6.JPG" width="400" height="248" data-original-width="337" data-original-height="209" /></a><br /><br />phttp://www.blogger.com/profile/18057215403741682609noreply@blogger.com0tag:blogger.com,1999:blog-12133335.post-32236641781504133352019-09-25T08:43:00.000-04:002019-09-30T18:44:30.673-04:00Enby Distribution, pt. 11--Game Expected W%This is (finally!) the last post in this series, at least for now.<br /><br />In the Mets essay for the 1986 <u>Baseball Abstract</u>, Bill James focused on data he was sent by a man named Jeffrey Eby on the frequency of teams scoring and allowing X runs in a game, and their winning percentage when doing so. After some discussion of this data, and a comparison of the Mets and Dodgers offense (the latter was much more efficient at clustering its runs scored in games to produce wins), he wrote:<br /><br />“One way to formalize this approach would be to add up the ‘win expectations’ for each game. 
That is, since teams which score one run will win 14.0% of the time, then for any game in which a team scores exactly one run, we can consider them to have an ‘offensive winning percentage’ for that game of .140. For any game in which the team scores five runs, they have an offensive winning percentage of .695. Their offensive winning percentage for the season is the average of their offensive wining [sic] percentages for all the games.”<br /><br />It struck James at the time, and me reading it many years later, as a very good way to take the data we have about team runs scored by game and boil it down into a single number that gets to the heart of the matter – how efficient was a team at clustering their runs to maximize their expected wins? James (in the essay) and I (for the last eight seasons or so on this blog) used the empirical data on the average winning percentage of teams when scoring or allowing X runs to calculate the winning percentage he described. I have called these gOW% and gDW%, for “game” offensive and defensive W%. However, there are a number of drawbacks to using empirical data.<br /><br />To repeat myself from my 2016 review of the data, these include:<br /><br />1. The empirical distribution is subject to sample size fluctuations. In 2016, all 58 times that a team scored twelve runs in a game, they won; meanwhile, teams that scored thirteen runs were 46-1. Does that mean that scoring 12 runs is preferable to scoring 13 runs? Of course not--it's a quirk in the data. Additionally, the marginal values (i.e. the change in winning percentage from scoring X runs to X+1 runs) don’t necessarily make sense even in cases where W% increases from one runs scored level to another.<br /><br />2. Using the empirical distribution forces one to use integer values for runs scored per game. 
Obviously the number of runs a team scores in a game is restricted to integer values, but not allowing theoretical fractional runs makes it very difficult to apply any sort of park adjustment to the team frequency of runs scored.<br /><br />3. Related to #2 (really its root cause, although the park issue is important enough from the standpoint of using the results to evaluate teams that I wanted to single it out), when using the empirical data there is always a tradeoff that must be made between increasing the sample size and losing context. One could use multiple years of data to generate a smoother curve of marginal win probabilities, but in doing so one would lose centering at the season’s actual run scoring rate. On the other hand, one could split the data into AL and NL and more closely match context, but you would lose sample size and introduce more quirks into the data.<br /><br />Given these constraints, I have always promised to use Enby to develop estimated rather than empirical probabilities of winning a game when scoring X runs, given some fixed average runs allowed per game (or the complement from the defensive perspective). Suppose that the major league average is 4.5 runs/game. Given this, we can use Enby to estimate the probability of scoring X runs in a game (since the goal here is to estimate W%, I am using Enby with a Tango Distribution c parameter = .852, which is used for head-to-head matchups):<br /><br /><a href="https://1.bp.blogspot.com/-l193ny9CSQI/XYrjE1A-FyI/AAAAAAAACrQ/Xf9OGMS9XLogi6jy73ChgDDfU0xMb3uZwCLcBGAsYHQ/s1600/gow1.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://1.bp.blogspot.com/-l193ny9CSQI/XYrjE1A-FyI/AAAAAAAACrQ/Xf9OGMS9XLogi6jy73ChgDDfU0xMb3uZwCLcBGAsYHQ/s400/gow1.JPG" width="139" height="400" data-original-width="131" data-original-height="376" /></a><br /><br />From here, the logic to estimate the probability of winning is fairly straightforward. 
If you score zero runs, you always lose. If you score one run, you win if you allow zero runs. If you allow one run, then the game goes to extra innings (I’m assuming that Enby represents per nine inning run distributions, just as we did for the Cigol estimates. Since the major league average innings/game is pretty close to nine, this is a reasonable if slightly imprecise assumption), in which case we’ll assume you have a 50% chance to win (we’re not building any assumptions about team quality in as we do in Cigol, necessitating an estimate of winning in extra innings that reflects expected runs and expected runs allowed). So a team that scores 1 run should win 5.39% + 10.11%/2 = 10.44% of those games.<br /><br />If you score two runs, you win all of the games where you allow zero or one, and half of the games where you allow 2, so 5.39% + 10.11% + 13.53%/2 = 22.26%. This can be very easily generalized:<br /><br />P(win given scoring X runs) = sum (from n = 0 to n = x - 1) of P(n) + P(x)/2<br /><br />Where P(y) = probability of allowing y runs<br /><br />Thus we get this chart:<br /><br /><a href="https://4.bp.blogspot.com/-k0SNZ2pDkw4/XYrjp_3HEdI/AAAAAAAACrY/mli98Gy_cT0owXelYRkVvuOADHAdw176gCLcBGAsYHQ/s1600/gow2.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://4.bp.blogspot.com/-k0SNZ2pDkw4/XYrjp_3HEdI/AAAAAAAACrY/mli98Gy_cT0owXelYRkVvuOADHAdw176gCLcBGAsYHQ/s400/gow2.JPG" width="380" height="400" data-original-width="356" data-original-height="375" /></a><br /><br />It should be evident that the probability of winning when allowing X runs is the complement of the probability of winning when scoring X runs, although this could also be calculated directly from the estimated run distribution.<br /><br />Now, instead of using the empirical data for any given league/season to calculate gOW%, we can use Enby to generate the expected W%s, eliminating the sample size concerns and enabling us to customize the run environment 
under consideration. I did just that for the 2016 majors, where the average was 4.479 R/G (Enby distribution parameters are r = 4.082, B = 1.1052, z = .0545):<br /><br /><a href="https://2.bp.blogspot.com/-_rxB8XZ8QQ0/XYrj6l5ZvJI/AAAAAAAACrg/1KDvlUdSXoIhv5QiVXRvpM8-_T5QBwLDQCLcBGAsYHQ/s1600/gow3.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://2.bp.blogspot.com/-_rxB8XZ8QQ0/XYrj6l5ZvJI/AAAAAAAACrg/1KDvlUdSXoIhv5QiVXRvpM8-_T5QBwLDQCLcBGAsYHQ/s400/gow3.JPG" width="200" height="400" data-original-width="196" data-original-height="392" /></a><br /><br />The first two columns compare the actual 2016 run distribution to Enby. The next set compares the empirical probability of winning when scoring X runs (I modified it to use a uniform value for games in which 12+ runs were scored, for the purpose of calculating gOW% and gDW%) to the Enby estimated probability. The Enby probabilities are generally consistent with the observed probabilities for 2016, but as expected there are some differences; note also that Enby assumes independence of runs scored and runs allowed in a single game, an assumption that environmental conditions alone ensure can be most positively described as “simplifying”.<br /><br />The resulting gOW% and gDW% from using the Enby estimated probabilities:<br /><br /><a href="https://1.bp.blogspot.com/-VQxjpG1ELPQ/XYrklgpWcfI/AAAAAAAACro/v30lbAHmbMcqrkZKuIMCo138Rc1DDx7TACLcBGAsYHQ/s1600/gow4.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://1.bp.blogspot.com/-VQxjpG1ELPQ/XYrklgpWcfI/AAAAAAAACro/v30lbAHmbMcqrkZKuIMCo138Rc1DDx7TACLcBGAsYHQ/s400/gow4.JPG" width="333" height="400" data-original-width="327" data-original-height="393" /></a><br /><br />There is not a huge difference between these and the empirical figures. One thing that is lost by switching to theoretical values is that the league does not necessarily balance to .500. 
In 2016 the average gOW% was .497 and the average gDW% was .502.<br /><br />However, the real value of this approach is that we no longer are forced to pretend that runs are equally valuable in every context. Note that Colorado had the second-highest OW% and third-lowest DW% in the majors. Anyone reading this blog knows that this is mostly a park illusion. If you look at park-adjusted R/G and RA/G, Colorado ranked seventeenth and nineteenth-best respectively, with 4.42 and 4.50 (again the league average R/G was 4.48), so the Rockies were slightly below average offensively and defensively. While we certainly don’t expect our estimate of their offensive or defensive quality using aggregate season runs to precisely match our estimate when considering their run distributions on a game basis (if they did, this whole exercise would be a complete waste of time), it would be quite something if a single team managed to be wildly efficient on offense and wildly inefficient on defense. <br /><br />When we consider that Colorado’s park factor was 1.18, in order to compute gOW%/gDW% in the run environment in which they played, we need to take the league average of 4.479 R/G x 1.18 = 5.29. (We could of course use the NL average R/G here as well; I’m intending this post as an example of how to do the calculations, not a full implementation of the metrics. For the same reason, I will round that park adjusted average up a tick to 5.3 R/G, since I already have the Enby distribution parameters handy at increments of .05 R/G). With c = .852, we have an Enby distribution with r = 5.673, B = .9363, z = .0257. 
The resulting Enby estimates of scoring frequency and W% scoring/allowing X runs are:<br /><br /><a href="https://2.bp.blogspot.com/-HaGoxvLWUcI/XYrk0BtbViI/AAAAAAAACrs/Apz1ZwLJpzwyi09mHLr0YEV4rHaGKzzsACLcBGAsYHQ/s1600/gow5.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://2.bp.blogspot.com/-HaGoxvLWUcI/XYrk0BtbViI/AAAAAAAACrs/Apz1ZwLJpzwyi09mHLr0YEV4rHaGKzzsACLcBGAsYHQ/s400/gow5.JPG" width="379" height="400" data-original-width="356" data-original-height="376" /></a><br /><br />Using these estimated W%s, the Rockies gOW% drops from .560 to .485 and their gDW% increases from .437 to .508. As suggested by their park-adjusted R/G figures, Colorado’s offense and defense were both about average; their defense fares a little better when looking at the game distribution than when using aggregate totals, and the opposite for the offense.<br /><br />Some readers are doubtlessly thinking that by aggregating at the season level, we’ve lost some key detail. We could have looked at Colorado home and road games separately, each with a distinct set of Enby parameters and corresponding probabilities of winning when scoring X runs rather than lumping it altogether and applying the park factor that considers that half of the games are on the road. This of course is true; you can slice and dice however you’d like. I find the team seasonal level to be a reasonable compromise.<br /><br />This is beyond the scope of this series, so I will mention it briefly and move on. I have previously combined gOW% and gDW% into a single W% estimate by converting each into an equivalent run ratio using Pythagenpat math, then using Pythagenpat to convert those ratios into a W% estimate. This makes theoretical sense, although it loses sight of using the actual runs scored and allowed distributions of a team in a season and rearranging them (“bootstrapping” if you must). 
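A minimal sketch of that Pythagenpat combination, assuming a 9 RPG environment and an x = RPG^.29 exponent (the function names and the .29 constant are assumptions for illustration): convert each W% to the run ratio that would produce it, multiply the offensive and defensive ratios, and convert back.

```python
def pyth_exponent(rpg):
    """Pythagenpat exponent; .29 is an assumed constant here."""
    return rpg ** 0.29

def implied_run_ratio(w, x):
    """Run ratio that would produce W% w under Pythagenpat."""
    return (w / (1 - w)) ** (1 / x)

def gEW(gow, gdw, rpg=9.0):
    """Combine gOW% and gDW% into a single W% estimate."""
    x = pyth_exponent(rpg)
    ratio = implied_run_ratio(gow, x) * implied_run_ratio(gdw, x)
    return ratio ** x / (1 + ratio ** x)
```

By construction, a team with gOW% = gDW% = .500 comes out at .500, and an offense and defense that are mirror images of each other cancel out.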
It occurred to me in writing this post that I could just use the same logic I use to convert Enby probabilities of scoring X runs into an estimated W% for the team. For example, we could use the Rockies’ runs scored distribution to estimate how often they would win when allowing X runs, and combine this with their runs allowed distribution to estimate a W% given their runs allowed distribution. Then we could do the same with their runs scored/runs allowed reversed to estimate a W% given their runs scored distribution. Averaging these two estimates would, in essence, put together every possible combination of their actual runs distribution from the season and calculate the average expected wins. For a simple example that avoids “ties”, if a team played two games, winning one 3-1 and the other 7-5, we would make every possible combination (3-1, 3-5, 7-1, 7-5) and estimate a .750 gEW%, compared to a 1.000 W% and a .720 Pythagenpat W%.<br /><br />Here’s an example for the 2016 Rockies:<br /><br /><a href="https://2.bp.blogspot.com/-cVpvlHjc3DE/XYrlc4T23MI/AAAAAAAACr0/wIM6k070OZsvt4yLhCQCveuaB13NaCu-ACLcBGAsYHQ/s1600/gow6.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://2.bp.blogspot.com/-cVpvlHjc3DE/XYrlc4T23MI/AAAAAAAACr0/wIM6k070OZsvt4yLhCQCveuaB13NaCu-ACLcBGAsYHQ/s400/gow6.JPG" width="400" height="299" data-original-width="456" data-original-height="341" /></a><br /><br />The first two columns tell us that the Rockies scored two runs in 16 games and allowed two in 15 games. After converting these to frequencies, we can easily calculate the probability of winning given that the team scores X runs in the same manner as we did above with Enby probabilities. For example, when the Rockies score two runs, they will win if they allow zero (5.56%) or one (8.64%), and half of games in which they allow two (9.26%), for a win probability of 5.56% + 8.64% + 9.26%/2 = 18.8%. 
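The cross-matchup averaging described above can be sketched in a few lines (hypothetical helper names, not the actual spreadsheet used for the Rockies figures); it reproduces the .750 result from the two-game example:

```python
from collections import Counter

def win_prob_given_score(x, allow_dist):
    """P(win | scoring x runs): opponent allows fewer, plus half of ties."""
    return (sum(p for ra, p in allow_dist.items() if ra < x)
            + 0.5 * allow_dist.get(x, 0.0))

def cross_matchup_wpct(scored, allowed):
    """Pair every runs-scored game with every runs-allowed game, counting
    ties as half a win, and return the average expected W%."""
    n = len(scored)
    allow_dist = {k: v / n for k, v in Counter(allowed).items()}
    return sum(win_prob_given_score(x, allow_dist) for x in scored) / n

# the two-game example from the text: a team that won 3-1 and 7-5
print(cross_matchup_wpct([3, 7], [1, 5]))  # combos 3-1 W, 3-5 L, 7-1 W, 7-5 W -> 0.75
```

Feeding the Rockies' full scored/allowed frequencies through the same helper is all the table below is doing.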
Figured this way, Colorado’s gOW% is .494, their gDW% is .496, and thus their gEW% is .495. Please note that I’m not suggesting that using the team’s actual frequencies of scoring/allowing X runs is preferable to using league averages or Enby. Furthermore, the gOW% and gDW% components are not useful, since they make the estimate of the quality of the offense or defense dependent on its counterpart. <br /><br /><b>A Most Pyrrhic Victory</b> (August 21, 2019)<br /><br />It’s never fun to be in a position where you feel like your team’s short-term success will hamper its long-term prospects. For one, it is inherently an arrogant thought - holding that you can perceive something that often the professionals that run the team cannot (although one of the most common occurrences of this phenomenon in sports, rooting against current wins with an eye to draft position, doesn’t fit). It feels like a betrayal of the loyalty you supposedly hold as a fan, specifically with the players that you like who are caught in the crossfire. Most significantly, it’s just not fun - sports are fun when your team wins, not when they lose, even if you rationalize those losses as just one piece of a grander design.<br /><br />It is even harder when the team in question represents your alma mater, an institution to which you feel an immense loyalty and pride, one far deeper than anything you feel towards any kind of social or religious institution, professional organization, or (of course) a government. Such is the predicament that I find myself in when following the fortunes of Ohio State baseball. It is a position I have never been in before as a fan of OSU sports - I have rarely been part of the rabble calling for a regime change in any sport, and in the one case I can recall in which I was, it wasn’t with any kind of glee or malice. 
I believed that the coach in question wanted to win, was trying their best, was a worthy representative of the university, might even succeed in turning it around if given the opportunity - but that it was probably time to reluctantly pull the plug.<br /><br />None of this holds when considering the position of Greg Beals. Beals’ tenure at OSU now stretches, incredibly, over nine seasons, nine seasons that are much worse than any nine-season stretch that preceded it in the last thirty years of OSU baseball. A stretch of nine seasons in which a Big Ten regular season title has rarely been more than a pipe dream. I don’t feel like recounting the depressing details in this space - the season preview posts for the next four seasons will provide ample opportunity. That’s right - Beals now holds a three-year extension that takes him through 2023.<br /><br />How has he managed to pull this off? Apparently with another well-timed run in the Big Ten Tournament, winning the event for the second time and thus earning an automatic bid to the NCAA tournament. It’s not as if the Buckeyes were on the bubble before the tournament - well actually, they were. They were squarely on the bubble for the <i>Big Ten</i> tournament. OSU’s overall season record ended up at 36-27, but if you look deeper it was worse than that. At 12-12 in the Big Ten, they finished in a three-way tie for sixth place, needing help on the final day to qualify for the eight-team field. Then they turned around and won it.<br /><br />In the NCAA tournament, the Buckeyes were thumped by Vanderbilt, eked out a thirteen-inning victory over McNeese to stay alive, then fell to Indiana State. To add insult to injury, another Big Ten team, the one from the heart of darkness, also had an unlikely tournament run. 
Except that outfit, channeling the spirit of departed basketball coach/practitioner of the dark arts John Beilein, made their run in the NCAA tournament, all the way to a 1-0 lead in the final series before the aforementioned Commodores restored order to the universe. <br /><br />The Buckeyes were actually outscored by one run on the season, averaging 5.56 runs scored and 5.57 runs allowed per game. Compared to the average Big Ten team, the Bucks were +10 runs offensively and -15 runs defensively. However, this obscures some promising developments on the pitching side. The weekend rotation of Seth Lonsway (9 RAA, 12.3 K/5.8 W), Garrett Burhenn (10, 6.8/3.1), and Griffan Smith (3, 8.9/3.8) was surprisingly effective given its youth (sophomore, sophomore, freshman respectively). Relief ace Andrew Magno was absolutely brilliant (22, 10.4/5.0) with some heroic and perhaps ill-advised extended appearances in the tournaments; he was popped in the fifteenth round by Detroit. Outside of them, there was a group of relievers clustered between 2 and -3 RAA (Joe Gahm, Thomas Waning, Will Pfenig, and TJ Root) and a few rough lines - midweek starter Jake Vance had a 7.90 RA in 41 innings for -11 RAA, and three relievers (Mitch Milheim, TJ Brock, and usual position player Brady Cherry) combined for 57 innings and a whopping 65 runs allowed for -31 RAA. Thankfully most of these were low-leverage innings. <br /><br />The pitching was also not done any favors by the defense, as Ohio recorded a DER of just .641 compared to a conference average of .656. The good news is that the offense made up for it at the plate; the bad news is that the best hitters have exhausted or foregone their remaining eligibility. The biggest exception was sophomore Dillon Dingler, who returned to his natural position behind the plate after a freshman year spent in center and hit .291/.391/.424 for 9 RAA. 
Junior Connor Pohl was just an average hitter playing first base, but is a solid defender and was durable. Senior Kobie Foppe lost the second base job as he struggled mightily over his 118 PA (.153/.284/.194); junior utility man Matt Carpenter assumed the role but only hit .257/.300/.324 himself. Sophomore Noah West started the season at shortstop and was much improved offensively (.284/.318/.420), but his injury led to a reshuffling of the defensive alignment, with freshman Zack Dezenzo moving over from third (he hit a solid .256/.316/.440 with 10 longballs) and classmate Marcus Ernst assuming the hot corner (.257/.316/.300 over 76 PA) before yielding to yet another freshman, Nick Erwin (.235/.288/.272 over 147 PA). <br /><br />Senior Brady Cherry finally fulfilled his potential, mashing .314/.385/.563 for 23 RAA in left field. Little-used fifth-year senior Ridge Winand wound up as the regular center fielder, although his bat did not stand out (.243/.335/.347). In right field, junior Dominic Canzone had one of the finest seasons ever by a Buckeye hitter, parlaying a .345/.432/.620 (37 RAA) line into an eighth-round nod from Arizona. Sophomore backup catcher Brent Todys eventually assumed DH duties thanks to his power (.256/.345/.462); his .206 ISO trailed only Canzone and Cherry, who each blasted sixteen homers.<br /><br />So the Beals era rolls on, and at least another Big Ten tournament title has been added to the trophy case. When official SID releases after the season-ending NCAA tournament loss to Indiana State say “Buckeyes Championship Season Comes to an End”, you wonder whether there is some sarcasm even amongst people who are paid to provide favorable coverage. And then you realize no, it’s not even spin, they really believe it. 
Once #BealsBall takes root, it is nigh impossible to make it just go away.<br /><br /><b>Enby Distribution, pt. 10: Behavior Near 1 RPG</b> (May 29, 2019)<br /><br />Even for this series, this is an esoteric topic, but I wanted to specifically explore how Enby, Cigol, runs per win, Pythagorean exponent, etc. behaved around 1 RPG. 1 RPG is not a particularly interesting point from a real-world baseball perspective. Take 20 RPG. This is an outlandish level of scoring for teams, but one can easily imagine a theoretical scenario constructed from real players, and using the types of constructs that have sometimes been used by sabermetricians (for instance, a team of Babe Ruths with average pitching playing a team of Ty Cobbs with average pitching) in which 20 RPG would be the context. But 1 RPG? Maybe if you have a team of Rey Ordonezes facing Pedro Martinez 1999, but Pedro Martinez 1999 is backed by a team of Bill Bergens and they have to face Lefty Grove 1931?<br /><br />Still, 1 RPG is of interest in the world of win estimators, as it is the point that led to Pythagenpat (and thus my own intense interest in win estimators). As you know, 1 RPG is the minimum possible scoring level since a game doesn’t end until at least one run is scored. This insight, which to my knowledge was first proffered by David Smyth, led to my discovery of the Pythagenpat exponent (and I believe Smyth’s as well). So it will always hold a special interest to me, regardless of how impractical any application may be.<br /><br />In order to facilitate this, I expanded my list of Enby and Cigol parameters (the difference is that Enby uses c = .767 in the Tango Distribution and Cigol uses c = .852) to look at each .05 RPG interval from .05 - 1.95. 
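Since the rest of the post leans on the Pythagenpat construction, here is a minimal sketch of it (z = .28 is just an illustrative value from the range discussed in this series):

```python
def pythagenpat(r_pg, ra_pg, z=0.28):
    """Pythagenpat W%: exponent x = RPG^z, then W% = R^x / (R^x + RA^x)."""
    x = (r_pg + ra_pg) ** z  # the exponent floats with the total scoring context
    return r_pg ** x / (r_pg ** x + ra_pg ** x)

def pythagenpat_rpw(rpg, z=0.28):
    """Runs per win implied by Pythagenpat: 2 * RPG^(1 - z)."""
    return 2 * rpg ** (1 - z)

# At exactly 1 RPG the exponent is 1^z = 1, so W% collapses to R/(R+RA) --
# Smyth's insight that a 1 RPG team wins exactly as many games as runs it scores:
print(pythagenpat(0.6, 0.4))  # -> 0.6
```

At a normal modern context of about 9 RPG, pythagenpat_rpw(9) gives roughly 9.7 runs per win.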
First, using the Enby parameters, here is a graph of the estimated probability of scoring X runs for teams that average .5, 1, 1.5, and 2 R/G:<br /><br /><a href="https://1.bp.blogspot.com/-HXkaNGw3qAY/XO3PJSPsgsI/AAAAAAAACpw/jxT452PUB8Y-jGDWWEUFjjjzwubL0GsQgCLcBGAs/s1600/enby10-1.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://1.bp.blogspot.com/-HXkaNGw3qAY/XO3PJSPsgsI/AAAAAAAACpw/jxT452PUB8Y-jGDWWEUFjjjzwubL0GsQgCLcBGAs/s400/enby10-1.JPG" width="400" height="287" data-original-width="995" data-original-height="713" /></a><br /><br />I deliberately cut off the .5 R/G team’s probability of being shutout, which is 68.7%, in order to increase the space available for other points by about 40%. One thing that should stand out if you’ve looked at any of the other graphs of this type I’ve posted is that the distinctive shape (which for lack of a more precise term I’ll call a left tail truncated, extremely elongated right tail bell curve) is not present. For all of these teams except the 2 R/G, the probability of scoring x+1 runs is always lower than the probability of scoring x runs. The 2 R/G team is actually the first at .05 intervals that achieves this modest success; teams that average 1.95 R/G are expected to be shutout in 25.1% of games and score one run in 25.0%. At 2, it is 24.3% and 24.7% respectively.<br /><br />My real interest with these teams is how RPW and the Pythagenpat exponent might behave at such low levels of scoring. In order to test this, I generated a Cigol W% for each possible matchup between teams averaging .05 - 2 R/G at intervals of .05. I included inverse matchups (e.g. 1.25 R/G and 2 RA/G as well as 2 R/G and 1.25 RA/G), but eliminated cases where R = RA (obviously W% is .500 at these points). 
I also eliminated cases in which R + RA < 1, since these are impossible: <a href="https://3.bp.blogspot.com/-Rf9YPahtFEU/XO3P3H4E-YI/AAAAAAAACp4/be2QrwgkZlMEoESdQwZz1Is27cRtDQC0QCLcBGAs/s1600/enby10-2.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://3.bp.blogspot.com/-Rf9YPahtFEU/XO3P3H4E-YI/AAAAAAAACp4/be2QrwgkZlMEoESdQwZz1Is27cRtDQC0QCLcBGAs/s400/enby10-2.JPG" width="400" height="287" data-original-width="989" data-original-height="709" /></a><br /><br />The relationship between RPG and RPW, even in this extremely low scoring context, is generally as we’d expect. The power regression line is a decent fit and takes a very satisfying form, as Pythagenpat RPW <a href="https://walksaber.blogspot.com/2009/01/runs-per-win-from-pythagenpat.html">can be shown</a> to be equal to 2*RPG^(1 - z). The implied z value here is lower than the .27 - .29 used for more normal environments, but close enough to suggest that Pythagenpat, which is correct by definition at 1 RPG, remains a useful tool at slightly higher RPGs.<br /><br />To test that more directly, we can look at the required Pythagorean exponents for these teams plotted against RPG as well:<br /><br /><a href="https://4.bp.blogspot.com/-Jtc0MjZOH6g/XO3Qee3C2ZI/AAAAAAAACqA/4_tvawj0bio1VJdGc2Zo4TFsBrxE0rvDwCLcBGAs/s1600/enby10-3.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://4.bp.blogspot.com/-Jtc0MjZOH6g/XO3Qee3C2ZI/AAAAAAAACqA/4_tvawj0bio1VJdGc2Zo4TFsBrxE0rvDwCLcBGAs/s400/enby10-3.JPG" width="400" height="287" data-original-width="989" data-original-height="709" /></a><br /><br />This graph is less encouraging. At first glance the most disturbing thing is that the power regression doesn’t do a great job of fitting the data, as it produces Pythagorean exponents too low for the higher scoring contexts. 
The only way to achieve a RPG approaching 4 given how I defined this dataset is to have teams that are fairly evenly matched, while wide gaps in team quality can pop up at low RPG (for example, we could get 1 RPG from .05 R/.95 RA at one extreme of imbalance or .5 R/.5 RA at the other). This again suggests that the imbalance between the two teams has a material impact on the needed Pythagorean exponent, but one that I’ve as yet been unable to successfully capture in a satisfactory equation.<br /><br />The more alarming thing about these results is they show a fraying of the Cigol W% estimates from Smyth’s logical conclusion that underpins Pythagenpat--namely that a 1 RPG team will win the same number of games as runs they score. For the nine unique pairs of R/RA (not counting their inverses), the Cigol W% is off slightly, as you can see the needed Pythagorean exponents at 1 RPG are not equal to 1:<br /><br /><a href="https://1.bp.blogspot.com/-xRLwBhp2UmQ/XO3QsIGda_I/AAAAAAAACqE/IUqJvB2K-CMqADuyqAEj21eeFbIpoiVJACLcBGAs/s1600/Enby10-4.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://1.bp.blogspot.com/-xRLwBhp2UmQ/XO3QsIGda_I/AAAAAAAACqE/IUqJvB2K-CMqADuyqAEj21eeFbIpoiVJACLcBGAs/s400/Enby10-4.JPG" width="400" height="175" data-original-width="507" data-original-height="222" /></a><br /><br />True W% is equal to R/G, and the error/162 is (Cigol W% - True W%)*162. The errors are not horrible, all well within one standard deviation of the typical Pythagenpat error for normal major league teams, but they still could call into question the theoretical validity of the Cigol estimates in extremely low scoring contexts.<br /><br />I redid the graph by replacing the Cigol estimates for these nine teams and their inverses with the True W%. 
This only corrects the W% for cases where we think for the moment that by definition Cigol is wrong; if that is so, Cigol is likely causing significant distortions at scoring levels just above 1 RPG as well, which are not corrected. I never expected Cigol to be a perfect model (or, to phrase it more precisely, I never expected any actual implementation of Cigol to be a perfect model; the mathematical underpinnings of Cigol, given the assumption of independence of runs scored and allowed, are true by definition), but I have written much of this series as if Cigol and the previously unnamed “True W%” were one and the same. This is not the case, but it is always a bit disappointing when you find a blemish in your model.<br /><br />With these corrections, we have this graph and regression equations:<br /><br /><a href="https://2.bp.blogspot.com/-z8cLKvKCZ0s/XO3Q8XGpHGI/AAAAAAAACqQ/B-HL6Tfp3c4SSjnHvLxxIOjwBf-x2iWeQCLcBGAs/s1600/Enby10-5.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://2.bp.blogspot.com/-z8cLKvKCZ0s/XO3Q8XGpHGI/AAAAAAAACqQ/B-HL6Tfp3c4SSjnHvLxxIOjwBf-x2iWeQCLcBGAs/s400/Enby10-5.JPG" width="400" height="286" data-original-width="987" data-original-height="706" /></a><br /><br />This doesn’t do much to change the regression equations (changing eighteen observations out of 1,398 generally will not), but at least it looks better to have observations at (1, 1). I don’t have any correction to offer to Enby/Cigol itself to solve this problem; my inclination is to assume there are two problems at play:<br /><br />1) that the estimated probability of being shutout, the Enby parameter z, which I estimate with the Tango Distribution, doesn’t hold up at these extremely low scoring levels. 
Maybe the Tango Distribution c parameter, which varies based on whether the question revolves around one team’s runs per inning scoring distribution or a matchup between two teams, inherently assumes covariance between R and RA that doesn’t hold when only one team scores in a game by definition (at 1 RPG; many other games between teams for which RPG is slightly greater than 1 would end 1-0 as well). But that is just a guess, and one that might appear to a reader to throw the other method under the bus. I don’t mean it in that way at all, of course; the Tango Distribution was not developed to be an input into a runs/game distribution. <br /><br />2) Regardless of the z parameter, Cigol assumes that runs scored and runs allowed are independent between the two teams and from game to game. But when I say that a team that scored .6 R/G and allowed .4 must have a .600 W%, I am referring to a team that has actually put up those figures over some period of time. This is still not the same as saying that the team is a true .6/.4 team. And so there is not necessarily a flaw in Cigol at all. Enby (using the c = .852 parameters) expects a true talent .6 R/G team to score more than one run in 13.9% of their games. So it would be extremely unlikely that any team, even at these ridiculously low scoring levels, could ever produce a 1 RPG over a period of several games or longer.<br /><br />But redefining the question in terms of true talent means that you could have a true talent .3 R/.4 RA team, for instance. I unceremoniously tossed these teams out of the dataset earlier, but they should have been included. 
So I will quickly look at Cigol’s estimate of the necessary Pythagorean exponent for these teams (these are teams scoring and allowing .05 - .9 runs per game at intervals of .05 with a total R+RA < 1): <a href="https://3.bp.blogspot.com/-mu7Sk5c88do/XO3RnVIAbbI/AAAAAAAACqY/OkF3ovCXWx83V9Wm7vf0upYQuZSjJnhxwCLcBGAs/s1600/Enby10-6.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://3.bp.blogspot.com/-mu7Sk5c88do/XO3RnVIAbbI/AAAAAAAACqY/OkF3ovCXWx83V9Wm7vf0upYQuZSjJnhxwCLcBGAs/s400/Enby10-6.JPG" width="400" height="294" data-original-width="983" data-original-height="722" /></a><br /><br />This isn’t interesting except as confirmation that the lower bound for the exponent is 1, which means that Pythagenpat fails for these teams. Pythagenpat will allow these teams to have exponents below 1. For example, .5 RPG yields a Pythagenpat exponent around .5^.28 = .824.<br /><br />For the sake of the rest of this discussion, I will no longer hew to a strict requirement that the exponent be equal to 1 at any point (only that it never dip below 1). In its place, let me propose an alternate set of rules for an equation to estimate the Pythagorean exponent to be valid:<br /><br />1) the exponent must always increase with RPG if R = RA (or, the equation need not be strictly limited to using RPG; however, it must strictly increase with RPG for a theoretically average team. I don’t know for sure that this is a theoretical imperative, but I want to preclude the use of a quadratic model that might appear to be a good fit but with a negative coefficient for the x^2 term which results in a negative derivative when x is large)<br /><br />2) the exponent must be close to 1 at 1 RPG. If we came up with a power regression that said the exponent = 1.02*RPG^.272, for instance, that would be fine. 
It’s close to 1.<br />Once I decided that I didn’t need to adhere to the constraint that x = 1 when RPG = 1, I tried a number of forms of x = RPG^z plus some other term that incorporated run differential. Here are a handful of the more promising ones:<br /><br />x = 1.03841*RPG^.265 + .00114*RD^2 (RMSE = 4.0084)<br />x = 1.04567*RPG^.2625 + .00113*RD^2 (RMSE = 4.0082)<br />x = 1.05299*RPG^.26 + .00113*RD^2 (RMSE = 4.0080)<br />x = 1.05887*RPG^.258 + .00113*RD^2 (RMSE = 4.0077)<br />x = 1.03059*RPG^.27 + .16066*(RD/RPG)^2 (RMSE = 4.0076)<br />x = 1.04561*RPG^.265 + .15274*(RD/RPG)^2 (RMSE = 4.0076)<br />x = 1.01578*RPG^.275 + .16862*(RD/RPG)^2 (RMSE = 4.0080)<br /><br />I must have run thirty regressions, looking for some formula that would beat 4.0067 (the minimum RMSE for an optimized Pythagenpat for 1961-2014 major league teams). Just to give you an idea of how silly I got, I tried this equation to estimate x (the Pythagorean exponent, eschewing the Pythagenpat construct):<br /><br />x = 10^(.30622 * log(RPG) + .0091*log(RD^2/RPG) - .01342) (RMSE = 4.011)<br /><br />Abandoning for a moment the attempt to get a lower RMSE with major league teams, how do those equations fare with the full Cigol dataset compared to Pythagenpat? In this case the RMSE is comparing the estimated W% from the formula in question to the Cigol estimate. Using z = .2867 (the value that optimizes RMSE for the 1961-2014 major league teams), the RMSE (per 162 games) is .46784. Using z = .2852 (the value that optimized RMSE for the full Cigol dataset), the RMSE is .46537. 
For each of the equations above:<br /><br />x = 1.03841*RPG^.265 + .00114*RD^2 (RMSE = .37791)<br />x = 1.04567*RPG^.2625 + .00113*RD^2 (RMSE = .40180)<br />x = 1.05299*RPG^.26 + .00113*RD^2 (RMSE = .42551)<br />x = 1.05887*RPG^.258 + .00113*RD^2 (RMSE = .44487)<br />x = 1.03059*RPG^.27 + .16066*(RD/RPG)^2 (RMSE = .56590)<br />x = 1.04561*RPG^.265 + .15274*(RD/RPG)^2 (RMSE = .60852)<br />x = 1.01578*RPG^.275 + .16862*(RD/RPG)^2 (RMSE = .52524)<br /><br />At least we can do better with the full Cigol dataset with a more esoteric construct than just using a fixed z value. But the practical impact is very small, and as we’ve seen these formulas add nothing to the accuracy of estimates for normal major league teams and sacrifice a bit of theoretical grounding. <br /><br /><b>2019 Predictions</b> (March 27, 2019)<br /><br />This is a blog intended for sabermetrically-inclined readers. I shouldn’t have to spell out a list of caveats about the for-entertainment-purposes-only content that follows, and I won’t.<br /><br />AL EAST<br /><br />1. New York<br />2. Boston (wildcard)<br />3. Tampa Bay<br />4. Toronto<br />5. Baltimore<br /><br />I usually don’t actually show the numbers that come out of my “system” such as it is - it is not as robust a system as PECOTA or Fangraphs’ or Clay Davenport’s predictions, simplifying where the others are more rigorous and fed by other people’s player projections, because why bother reinventing that wheel when others have already done it so well? But in the case of the 2019 AL I think the estimates for the top four teams are illustrative of my failure to commit to any of this:<br /><br />NYA 822/653, 100<br />HOU 814/653, 99<br />BOS 850/683, 99<br />CLE 783/634, 98<br /><br />That’s (R/RA, Wins) in case it wasn’t obvious. 
So I can make bland statements like “the Red Sox appear to have a little better offense but worse defense than the Yankees”, but beyond that there’s not much to say other than it should be another entertaining season. It does appear to me that the Yankees and Astros have more surplus arms sitting around than the other contenders, and that’s certainly not a bad thing and something that the crude projection approach I take ignores. I’d expect Tampa Bay to take a step back from 2018 with a subpar offense. The Blue Jays are interesting as a sleeper, especially if the prospects show up and play more to their potential than their 2019 baseline expectation. Baltimore has two things going for them - I have Miami as worse on paper, and at least they’re trying a new approach. Actually three, because Camden Yards is awesome.<br /><br />AL CENTRAL<br /><br />1. Cleveland<br />2. Minnesota <br />3. Detroit<br />4. Kansas City<br />5. Chicago<br /><br />The Indians are still the easy divisional favorite, to an extent that surprised me when I actually put the numbers to it. They are closer to the big three in the AL (in fact, right behind by my reckoning) than they are to the Twins. It’s easy to look at the negatives – a borderline embarrassing outfield, an unsettled bullpen with little attempt to add high upside depth, a clustering of the team’s excellence in starting pitching which is more prone to uncertainty. But it’s worth keeping in mind that Cleveland underplayed their peripherals last year (although less their PW% than their EW%) - they have some room to decline while still projecting to win 90 as they did last year. Names like Sano and Buxton both make the Twins offense look better than it actually figures to be while also giving it more upside than a typical team, but they look like a slightly above average offense and slightly below average defense. 
You can throw a blanket over the three teams at the bottom - the order I’ve picked them for 2019 is the reverse order of the optimism I’d hold for 2020 as a fan of those teams.<br /><br />AL WEST<br /><br />1. Houston<br />2. Los Angeles (wildcard)<br />3. Oakland<br />4. Seattle<br />5. Texas<br /><br />Houston is an outstanding team once again, a World Series contender with room for misfortune. The Angels are my tepid choice for second wildcard - the Rays are in a tough division, the Twins could feast on the Central underlings but look like about as .500 of a team on paper as you can get, while the A’s can expect some regression on both offense and the bullpen. The Angels have huge rotation question marks, but all of these teams are flawed. The Mariners and Rangers both strike me as teams that could easily outplay projections; alas, it would take a surfeit of that to get into the race.<br /><br />NL EAST<br /><br />1. Philadelphia<br />2. Washington (wildcard)<br />3. New York<br />4. Atlanta<br />5. Miami<br /><br />This should be interesting. It’s easy to overrate the Phillies given that they were in the race last year when they really shouldn’t have been as close. It would be easy to overrate the Braves, who arrived early. It would be easy to underrate the Nationals, losing their franchise icon while bringing in another ace and graduating another potential outfield star. It would be easy to underrate the Mets, who are generally a disaster but still have talent. The only thing that wouldn’t be easy to do is trick yourself into thinking the Marlins are going to win.<br /><br />NL CENTRAL<br /><br />1. Chicago<br />2. Milwaukee (wildcard)<br />3. St. Louis<br />4. Cincinnati<br />5. Pittsburgh<br /><br />I have this about dead even on paper, but I give a slight edge to the Cubs with a bounce back from Kris Bryant and a more settled (if aging) rotation. 
The Brewers are legit, and their rotation should benefit from some arms that were used as swingmen last year getting a shot at starting. But the bullpen will likely be worse and some offensive regression shouldn’t come as a surprise. The Cardinals and Reds are a bit further back on paper, but close enough that it wouldn’t be that surprising if they played themselves into the mix. As a semi-Reds fan I’m a little skeptical about the chances of the quick transitional rebuild actually paying off. The Pirates look easily like the best team that I’ve picked last; the start of 2018 is a good reminder that teams like this can find themselves in the race.<br /><br />NL WEST<br /><br />1. Los Angeles<br />2. Colorado<br />3. Arizona<br />4. San Diego<br />5. San Francisco<br /><br />The Dodgers run in the NL West is underappreciated due to their failure to win the World Series and people inclined to write it off because of their payroll. I like their divisional chances better in 2019 as only the Rockies are real challengers. I’d put Colorado in the second tier of NL contenders with Cincinnati, St. Louis, New York, and Atlanta. If you can figure out if Arizona is starting a rebuild or trying to do one of those on-the-fly retools, let me know. Maybe let Mike Hazen know too. The Padres are interesting in that the prospects that have shown up so far haven’t lived up to expectations yet, but there are more and LOOK MANNY MACHADO. The Giants with Machado or Harper would have been the opposite of the Padres, more or less, which is considerably less interesting.<br /><br />WORLD SERIES<br /><br />Houston over Los Angeles<br /><br />Or Houston or Boston. They’re basically interchangeable. 
<br /><br />AL MVP: CF Mike Trout, LAA<br />AL Cy Young: Trevor Bauer, CLE<br />AL Rookie of the Year: LF Eloy Jimenez, CHA<br />NL MVP: 3B Nolan Arenado, COL<br />NL Cy Young: Aaron Nola, PHI<br />NL Rookie of the Year: CF Victor Robles, WAS<br /><br /><b>Pitching Optional?</b> (February 9, 2019)<br /><br />What happens when you take a team that got into the NCAA tournament despite finishing in the middle of the pack in its conference and relying on a makeshift pitching staff and remove the few reliable pitchers while leaving much of the offense intact? Does this sound interesting to you, like an experiment cooked up in the lab of a mad sabermetrician (or more likely a resident of that state up north)? If so, you may be interested in the 2019 Buckeyes.<br /><br />In the ninth season of the seemingly never-ending Greg Beals regime, he once again has an entire unit with next to no returning experience. Sometimes this is unavoidable in college sports, but it happens to Beals with regularity as player development does not appear to be a strong suit of the program. Players typically either make an impact as true freshmen or are never heard from, while JUCO transfers are a roster staple to paper over the holes. The only difference with this year’s pitching situation is that holes are largely being plugged with freshmen rather than transfers.<br /><br />The three pitchers penciled in as the rotation have precious little experience, with two true freshmen and a junior with 24 appearances and 11 starts in his career. Lefty Seth Lonsway was a nineteenth-round pick of Cincinnati and will be joined by classmate Garrett Burhenn, with Jake Vance as the junior veteran. 
Vance was +3 RAA in 36 innings last year, which doesn’t sound like much until you consider the dearth of returning performers on the rest of the staff.<br /><br />Midweek starts and long relief could fall to sophomore lefty Griffan Smith, who was not effective as a freshman (-7 RAA in 32 innings). The other veteran relievers are junior Andrew Magno (sidelined much of last season with an injury, but Beals loves his lefty specialists, so if healthy he will see the mound) and senior sidearmer Thomas Waning, who was promising as a sophomore but coughed up 18 runs in 16 frames in 2018. A trio of freshmen righties (Bayden Root, TJ Brock, Will Pfenning) are said to throw 90+ MPH, joined by fellow freshmen Cole Niekamp and lefty Mitch Milheim. Joe Gahm is a junior transfer from Auburn via Chattahoochee Valley Community College and, given his experience and BA ranking as a top 30 Big Ten draft prospect, should find a role. Senior Brady Cherry will also apparently get a chance to pitch this season, something he has yet to do in his Buckeye career.<br /><br />The Buckeye offense is more settled, and unless the pitchers exceed reasonable expectations, it will have to carry the team in 2019. Sophomore Dillon Dingler moves in from center field (that’s nothing, as recent OSU catcher Jalen Washington moved to shortstop) to handle the catching duties; he was raved about by the coaches last season, so big things are expected despite a .244/.325/.369 line. He’ll be backed up by sophomore transfer Brent Todys from Andrew College, with senior Andrew Fishel, junior Sam McClurg and freshman Mitchell Smith rounding out the roster.<br /><br />First base will belong to junior Conner Pohl after he switched corners midway through 2018; he also played the keystone as a freshman, so he’s been all over the infield. While his production was underwhelming for first base, at 3 RAA he was a contributor last season and looks like a player who should add power as he matures. 
Senior Kobie Foppe got off to a slow start last year, flipped from shortstop to second base, and became an ideal leadoff man (.335/.432/.385); even with some BABIP regression he should be solid. Third base will go to true freshman Zach Dezenzo, while junior shortstop Noah West needs to add something besides walks to his offensive game (.223/.353/.292). The main infield backups are freshman Nick Erwin at short, sophomore Scottie Seymour and freshmen Aaron Hughes and Marcus Ernst at the corners, and junior Matt Carpenter everywhere, just like his MLB namesake (albeit without the offensive ability).<br /><br />I’ll describe the outfield backwards from right to left, since junior right fielder Dominic Canzone is the team’s best offensive player (.323/.396/.447, which was a step back from his freshman campaign) and will be penciled in as the #3 hitter. The other two spots are not as settled as one would hope given the imperative of productive offense for this team. A pair of seniors will battle for center: Malik Jones did nothing at the plate as a JUCO transfer last year besides draw walks (.245/.383/.286 in 63 PA), while Ridge Winand has barely seen the field. In left, senior Nate Romans has served as a utility man previously, although he did contribute in 93 PA last year (.236/.360/.431). Senior Brady Cherry completes his bounce around the diamond, which has included starting at third and second; in 2018 he hit just .226/.321/.365, a step back from 2017. While he could get time in left, it’s more likely he’ll DH, since the plan is to use him out of the bullpen as well. Other outfield backups are freshman Nolan Clegg in the corners and Alec Taylor in center.<br /><br />OSU opens the season this weekend with an odd three-game series against Seton Hall in Pt. Charlotte, Florida. 
It is the start of a very lackluster non-conference schedule that doesn’t figure to help the Buckeyes’ cause come tournament time as the schedule did last year (although, as you can probably tell, I unfortunately tend to think the resume will be beyond help). There are no games against marquee names, although OSU will play MSU in a rare non-conference Big Ten matchup. The home schedule opens March 15 with a three-game series against Lipscomb, a one-off with Northern Kentucky, and a four-game series against Hawaii, whose players will probably be wondering what they did to wind up in Columbus in mid-March when they could be home.<br /><br />Big Ten play opens March 29 at Rutgers, with the successive weekends home to Northwestern and the forces of darkness, at Maryland, home to Iowa, at Minnesota, home to PSU, and at Purdue. Midweek opponents are the typical fare of local nines, including Toledo, Cincinnati, Ohio University (away), Dayton, Xavier, Miami (away), Wright State, and Youngstown State (away). The Big Ten tournament will be played May 22-26 in Omaha.<br /><br />It’s hard to be particularly optimistic that another surprise trip to the NCAA tournament is in the cards. Even some of the best pitchers who have come through OSU have struggled as freshmen, so it’s hard to project the starting pitching to be good, and while there are productive returnees at multiple positions, only Canzone is a proven excellent hitter and a couple of positions are occupied by players who must make serious improvement to be average. The non-conference schedule may be soft enough to keep the record respectable, but there are few opportunities to grab wins that will help come selection time. Aspiring to qualify for the Big Ten tournament seems a more realistic goal. 
Beals is the longest-tenured active coach at OSU in any of the four sports that I follow rabidly, which on multiple levels is concerning (although two of the three other programs have coaches in place who have demonstrated their value at OSU, and the third did well in a three-game trial). Yet somehow Beals marches on, floating aimlessly in the middle of an improved Big Ten.<br /><br />Note: This preview is always a combination of my own knowledge and observation along with the <a href="https://ohiostatebuckeyes.com/2019-baseball-season-outlook/">official season outlook</a> released by the program, especially as pertains to position changes and newcomers about which I have next to no direct knowledge. That reliance was even greater this year due to the turnover on the mound.phttp://www.blogger.com/profile/18057215403741682609noreply@blogger.com0tag:blogger.com,1999:blog-12133335.post-64645848492143471322019-02-04T08:03:00.000-05:002019-02-04T08:03:12.227-05:00Enby Distribution, pt. 9: Cigol at the Extremes--Pythagenpat ExponentIn the last installment, I explored using the Cigol dataset to estimate the Pythagorean exponent. Alternatively, we could sidestep the attempt to estimate the exponent and try to directly estimate the z parameter in the Pythagenpat equation x = RPG^z.<br /><br />The positives of this approach include avoiding the scalar multipliers that move the estimator away from a result of 1 at 1 RPG, and maintaining a form that has been found useful by sabermetricians in the last decade or so. The latter is also the biggest drawback to this approach--it assumes that the form x = RPG^z is correct, and foregoes the opportunity of finding a form that provides a better fit, particularly with extreme datapoints. It’s also fair to question my objectivity in this matter, given that a plausible case could be made that I have a vested interest in “re-proving” the usefulness of Pythagenpat. 
That’s not my intent, but I would be remiss in not raising the possibility of my own (unintentional) bias influencing this discussion.<br /><br />Given that we know the Pythagorean exponent x as calculated in the last post, it is quite simple to compute the corresponding z value:<br /><br />z = log(x)/log(RPG)<br /><br />For the full dataset I’ve used throughout these posts, a plot of z against RPG looks like this:<br /><br /><a href="https://3.bp.blogspot.com/-0joxK_2uIkE/XFDQijU2SoI/AAAAAAAACos/2Fzg5hJoEWEe9xZJJs3SHV7f9Ei4ouvYwCLcBGAs/s1600/cigol9a.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://3.bp.blogspot.com/-0joxK_2uIkE/XFDQijU2SoI/AAAAAAAACos/2Fzg5hJoEWEe9xZJJs3SHV7f9Ei4ouvYwCLcBGAs/s400/cigol9a.JPG" width="400" height="267" data-original-width="1048" data-original-height="699" /></a><br /><br />A quick glance suggests that it may be difficult to fit a clean function to this plot, as there is no clear relationship between RPG and z. It appears that in the 15-20 RPG range, there are a number of R/RA pairs for which a higher z is necessary than for the pairs at 20-30 RPG. While I have no particular reason to believe that the z value should necessarily increase as RPG increases, I have strong reason to doubt that the dataset I’ve put together allows us to conclude otherwise. Based on the way the pairs were chosen, extreme quality differences are overrepresented in this range. For example, there are pairs in which a team scores 14 runs per game and allows only 3. The more extreme high RPG levels are only reached when both teams are extremely high scoring; the most extreme difference captured in my dataset at 25 RPG is 15 R/10 RA.<br /><br />The best fit to this graph comes from a quadratic regression equation, but the negative coefficient for RPG^2 (the equation is z = -.0002*RPG^2 + .0062*RPG + .2392) makes it unpalatable from a theoretical perspective. 
The apparent quadratic shape may well be an accident of the data points used as described in the preceding paragraph. Power and logarithmic functions fail to produce the upward slope from 5-10 RPG, as does a linear equation. The latter has a very low r^2 (just .022) but results in an aesthetically pleasing, gently increasing exponent as RPG increases (equation of .2803 + .00025*RPG). The slope is so gentle as to result in no meaningful difference when applying the equation to actual major league teams, leaving it as useless as the r^2 suggests it would be (RMSE of 4.008 for 1961-2014, with the same result if using the z value based on plugging in the average RPG of 8.805 for that period).<br /><br />It’s tempting to assume that z is higher in cases in which there is a large difference in runs scored and runs allowed. This could potentially be represented in an equation by run differential or run ratio, and such a construct would not be without sabermetric precedent, as other win estimators have been proposed that explicitly consider the discrepancy between the two teams (explicitly, as in beyond the obvious truth that as you score more runs than you allow, you will win more games). 
(See the discussion of Tango’s old win estimator in <a href="https://walksaber.blogspot.com/2018/05/enby-distribution-pt-7-cigol-at.html">part 7</a>).<br /><br />First, let’s take a quick peek at the z versus RPG plot we’d get for the limited dataset I’ve used throughout the series (W%s between .3 and .7 with R/G and RA/G between 3 and 7):<br /><br /><a href="https://4.bp.blogspot.com/-_-dNMBMbDqY/XFDRffss4II/AAAAAAAACo4/5Tp2YkOz7JQPC7748mC0_FrzQIPOdiNNgCLcBGAs/s1600/cigol9b.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://4.bp.blogspot.com/-_-dNMBMbDqY/XFDRffss4II/AAAAAAAACo4/5Tp2YkOz7JQPC7748mC0_FrzQIPOdiNNgCLcBGAs/s400/cigol9b.JPG" width="400" height="273" data-original-width="1038" data-original-height="709" /></a><br /><br />The relationship here is more in line with what we might have expected--z levels out as RPG increases, but there is no indication that z decreases with RPG (which, assuming my reasoning above is correct, reflects the fact that the teams in this dataset are much more realistic and matched in quality than are the oddballs in the full dataset). Again, the best fit comes from a quadratic regression, but the negative coefficient for RPG^2 is disqualifying. A logarithmic equation fits fairly well (r^2 = .884), but again fails to capture the behavior at lower levels of RPG, not as damaging to the fit here because of the more limited dataset. The logarithmic equation is z = .2484 + .0132*ln(RPG), but this produces a worse RMSE with the 1961-2014 teams (4.012) than simply using a fixed z.<br /><br />Returning to the full dataset, what happens if we run a regression that includes abs(R - RA) as a variable alongside RPG? 
We get this equation for z:<br /><br />z = .26846 + .00025*RPG + .00246*abs(R - RA)<br /><br />This is interesting as it is the same slope for RPG as seen in the equation that did not include abs(R - RA), but the intercept is much lower, which means that for average (R = RA) teams, the estimated z will be lower. This equation implies that differences between a team and its opponents really drive the behavior of z in the data.<br /><br />Applying this equation to the 1961-2014 data fails to improve RMSE, raising it to 4.018. So while this may be a nice idea and seem to fit the theoretical data better, it is not particularly useful in reality. I also tried a form with an RPG^2 term as well (and for some reason liked it when initially sketching out this series), but the negative RPG^2 coefficient dooms the equation to theoretical failure (and with a 4.017 RMSE it does little better with empirical data):<br /><br />z = .24689 - .00011*RPG^2 + .00378*RPG + .00183*abs(R - RA)<br /><br />One last idea I tried was using (R - RA)^2 as a variable rather than abs(R - RA). Squaring run differential eliminates any issue with negative numbers, and perhaps it is extreme quality imbalances that really drive the behavior of z. Alas, an RMSE of 4.014 is only slightly better than the others:<br /><br />z = .27348 + .00025*RPG + .00020*(R - RA)^2<br /><br />If you are curious, using the 1961-2014 team data, the minimum RMSE for Pythagenpat is achieved when z = .2867 (4.0067). The z value that minimizes RMSE for the full dataset is .2852. 
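<br /><br />The fixed-z RMSE minimization just mentioned can be reproduced with a simple grid search. This is only a sketch: the Pythagenpat form x = RPG^z is as given in the text, but the team records below are hypothetical stand-ins, not the 1961-2014 data.

```python
import math

def pythagenpat_wpct(r_pg, ra_pg, z):
    """W% = R^x / (R^x + RA^x), where x = (R/G + RA/G)^z."""
    x = (r_pg + ra_pg) ** z
    return r_pg ** x / (r_pg ** x + ra_pg ** x)

def best_z(teams, z_grid):
    """teams: list of (R/G, RA/G, actual W%) tuples.
    Returns the z in z_grid that minimizes the RMSE of predicted W%."""
    def rmse(z):
        errors = [pythagenpat_wpct(r, ra, z) - w for r, ra, w in teams]
        return math.sqrt(sum(e * e for e in errors) / len(errors))
    return min(z_grid, key=rmse)

# Hypothetical team-seasons: (R/G, RA/G, W%)
teams = [(4.8, 4.2, 0.556), (4.0, 4.6, 0.444), (5.3, 5.0, 0.512)]
z_grid = [0.25 + 0.001 * i for i in range(101)]  # 0.250 through 0.350
print(round(best_z(teams, z_grid), 3))
```

With the full 1961-2014 team set in place of the toy sample, this search is what yields the z = .2867 minimum cited above.<br /><br />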
This may be noteworthy in its own right -- when a dataset based on major league team seasons and one based on theoretical teams of wildly divergent quality and run environment come to essentially the same result, it is an indication that extreme efforts to refine z may be a fool's errand.<br /> <br />You may be wondering why, after an entire series built upon my belief in the importance of equations that work well for theoretical data, I’ve switched in this installment to largely measuring accuracy based on empirical data. My reasoning is as follows: in order for a more complex Pythagenpat equation to be worthwhile, it has to have a material and non-harmful effect in situations in which Pythagenpat is typically used. If no such equation is available (which is admittedly a much higher hurdle to clear than me simply not being able to find a suitable equation in a week or so of messing around with regressions), then it is best to stick with the simple Pythagenpat form. If one a) is really concerned with accuracy in extreme circumstances and b) thinks that Cigol is a decent “gold standard” against which to attempt to develop a shortcut that works in those circumstances, then one should probably just use Cigol and be done with it. Without a meaningful “real world” difference, and as the functions needed become more and more complex, it makes less sense to use any sort of shortcut method rather than just using Cigol. 
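<br /><br />As a sketch of what using one of these z functions looks like in practice, the snippet below applies both a constant z and the abs(R - RA) regression equation from above. The regression coefficients come from the text; the team's runs scored and allowed are hypothetical.

```python
def pythagenpat_wpct(r_pg, ra_pg, z):
    """W% = R^x / (R^x + RA^x), where x = (R/G + RA/G)^z."""
    x = (r_pg + ra_pg) ** z
    return r_pg ** x / (r_pg ** x + ra_pg ** x)

def z_from_regression(r_pg, ra_pg):
    """z as a function of RPG and abs(R - RA), coefficients per the text."""
    return 0.26846 + 0.00025 * (r_pg + ra_pg) + 0.00246 * abs(r_pg - ra_pg)

# Hypothetical team: 5.0 R/G, 4.0 RA/G
for z in (0.2867, z_from_regression(5.0, 4.0)):
    print(round(z, 4), round(pythagenpat_wpct(5.0, 4.0, z), 4))
```

For realistic teams like this one, the two z values produce estimated W%s within a few points of each other, which is the "no meaningful real-world difference" conclusion reached above.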
<br /><br />Thus I will for the moment leave the Pythagenpat z function as a humble constant, and hold Cigol in reserve if I’m ever really curious to make my best guess at what the winning percentage would be for a team that scores 1.07 runs and allows 12.54 runs per game (probably something around .0051).<br /><br />The “full” dataset I’ve used in the last few posts is available <a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vSmpX1FeQjcDjUWYYnZZBgaKOzRprmz8SD89L7qTNM3cn3GSh8BT95yc3VYIECMt3NFr0jRa2cb79U9/pub?output=xlsx">here</a>.phttp://www.blogger.com/profile/18057215403741682609noreply@blogger.com0