Tuesday, February 18, 2020

Tripod: Linear Weights

See the first paragraph of this post for an explanation of this series.

I certainly am no expert on Linear Weight formulas and their construction; leave that to people like Tango Tiger and Mickey Lichtman. However, I do have some knowledge of LW methods and thought I would explain some of the different methods of generating LW that are in use.

One thing to note before we start is that every RC method implies a set of linear weights. If you use the +1 technique, you can see the LW that are embedded in a method like RC, BsR, or RPA. A good way to test non-linear RC formulas is to see how they stack up against LW methods in the context those LW were derived for. LW values vary widely based on the context, but in normal major league contexts the absolute out value stays close to -.1 and the HR value close to 1.4. David Smyth provided the theory (or fact, I guess you could say) that as OBA moves towards 1, the LW values of all events converge towards 1.

Now, here is what I understand of how LW are generated:

Empirical LW

Empirical LW have been published by Pete Palmer and Mickey Lichtman, and they can be considered the true Linear Weight values. Empirical LW are based on finding the run value of each event from the base/out table, and then averaging those values over all singles, all doubles, etc.; that average is the LW for the single (or double, and so on). Another way to look at it is that they calculate the value of an event in each of the 24 base/out situations, multiply each by the proportion of that event that occurs in that situation, and then sum those 24 products.
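To make the mechanics concrete, here is a rough Python sketch of the empirical calculation. The run expectancy values and the sample plays are made-up placeholders, and the function names are mine, not anything Palmer or Lichtman published. The run value of one play is the change in run expectancy plus any runs that score, and the empirical weight is just the average of those values over every occurrence of the event.

# Run expectancy by base/out state: (runners, outs) -> expected runs to the
# end of the inning. The values here are rough illustrations; a real table
# covers all 24 states and comes from play-by-play data.
run_exp = {
    ("---", 0): 0.48, ("1--", 0): 0.85, ("-2-", 0): 1.10,
    ("--3", 0): 1.30, ("1-3", 0): 1.75,
}

def event_run_value(start_state, end_state, runs_scored):
    # Value of one play: change in run expectancy plus runs scored on the play.
    # A play that ends the inning has an ending expectancy of zero.
    return run_exp.get(end_state, 0.0) - run_exp[start_state] + runs_scored

def empirical_weight(plays):
    # Average run value over every occurrence of one event type (all singles,
    # all doubles, etc.); this average is the empirical LW for that event.
    values = [event_run_value(s, e, r) for (s, e, r) in plays]
    return sum(values) / len(values)

# Two hypothetical singles: a leadoff single, and a single with a runner on
# second that sends him to third.
singles = [
    (("---", 0), ("1--", 0), 0),
    (("-2-", 0), ("1-3", 0), 0),
]
print(empirical_weight(singles))   # about .51 with these made-up numbers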

Palmer's weights were actually based on simulation, but as long as the simulation was well-designed that shouldn't be an issue. One way you could derive a different set of LW is to assume that the events occur randomly, i.e. that the proportion of an event that occurs in each base/out situation is the same as the proportion of overall PA that come in that situation. For instance, if 2% of PA come with the bases loaded and 1 out, then you assume that 2% of doubles occur with the bases loaded and 1 out as well. This is an interesting idea for a method. If you see a double hit in a random situation, you could make the argument that this method would give you the best-guess weight for that event. But that holds only if you assume that the base/out situation does not affect the probability of a given event. Does it work out that way?

Tango Tiger told me that the only event that comes up with a significantly different LW value by the method I have just described is the walk. This is another way of saying that walks tend to occur in lower-leverage situations than most events. But the difference is not that large.

Modeling

You can also use mathematical modeling to come up with LW. Tango Tiger and David Smyth have both published methods on FanHome.com that approach the problem from this direction. Both are approximations and are based on some assumptions that will vary slightly in different contexts. Tango, though, has apparently developed a new method that generates an accurate base/out table and LW from mathematical modeling, and does it quite well.
The original methods published by the two are very user-friendly and can be done quickly. Smyth also published a Quick and Dirty LW method that works well in normal scoring contexts and only uses the number of runs/game to estimate the value of events.

Skeletons

Another way to do this is to develop a skeleton that shows the relationships between the events, and then find a multiplier to equate this to actual runs scored. The advantage of this method is that you can focus on the long-term relationships between the events (walks v. singles, doubles v. triples, etc.), and then find a custom multiplier each season by dividing runs by the result of the skeleton for the entity (league, team, etc.) you are interested in.

Recently, I decided to take a skeleton approach to a LW method. Working with data for all teams, 1951-1998, I found that this skeleton worked well: TB+.5H+W-.3(AB-H), with a required multiplier of .324. Working SB and CS into the formula, I had: TB+.5H+W-.3(AB-H)+.7SB-CS, with a multiplier of .322. When I took a step back and looked at what I had done, though, I realized I had reproduced Paul Johnson's Estimated Runs Produced method. If you look at Johnson's method:

(2*(TB+W)+H-.605*(AB-H))*.16

If you multiply my formula by 2, you get:

(2*(TB+W)+H-.6*(AB-H))*.162

As you can see, ERP is pretty much equal to my unnamed formula. Since it is so similar, I will just consider it to be ERP. You can then find the resulting LW by expanding the formula; for example, a double adds 2 total bases and 1 hit, so it has a value of (2*2+1)*.162=.81.

Working out the full expansion of my ERP equations, we have:

ERP = .49S+.81D+1.13T+1.46HR+.32W-.097(AB-H)
ERP = .48S+.81D+1.13T+1.45HR+.32W+.23SB-.32CS-.097(AB-H)
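If you want to play with this yourself, here is a short Python sketch of the skeleton approach: compute the skeleton for a set of team-seasons, fit the multiplier as total runs over total skeleton, and expand the result into per-event weights. The data-handling details (a list of dicts of team totals) are just my own illustration, not a prescribed format.

def skeleton(ab, h, d, t, hr, w):
    # TB + .5H + W - .3(AB-H); singles are hits minus extra-base hits.
    s = h - d - t - hr
    tb = s + 2 * d + 3 * t + 4 * hr
    return tb + 0.5 * h + w - 0.3 * (ab - h)

def fit_multiplier(team_seasons):
    # team_seasons: list of dicts with keys AB, H, D, T, HR, W, R.
    total_runs = sum(ts["R"] for ts in team_seasons)
    total_skel = sum(skeleton(ts["AB"], ts["H"], ts["D"], ts["T"], ts["HR"], ts["W"])
                     for ts in team_seasons)
    return total_runs / total_skel

def expand_weights(m):
    # Per-event LW implied by the skeleton times the multiplier m.
    return {"1B": 1.5 * m, "2B": 2.5 * m, "3B": 3.5 * m,
            "HR": 4.5 * m, "W": 1.0 * m, "Out": -0.3 * m}

print(expand_weights(0.324))
# roughly .49, .81, 1.13, 1.46, .32, -.097, matching the first expansion above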

I have recently thrown together a couple of versions that encompass all of the official offensive stats:

ERP = (TB+.5H+W+HB-.5IW+.3SH+.7(SF+SB)-CS-.7DP-.3(AB-H))*.322
ERP = (TB+.5H+W+HB-.5IW+.3SH+.7(SF+SB)-CS-.7DP-.292(AB-H)-.031K)*.322

Or:

ERP = .483S+.805D+1.127T+1.449HR+.322(W+HB)-.161IW+.225(SB+SF-DP)+.097*SH-.322CS-.097(AB-H)
ERP = .483S+.805D+1.127T+1.449HR+.322(W+HB)-.161IW+.225(SB+SF-DP)+.097*SH-.322CS-.094(AB-H-K)-.104K

Here are a couple of versions you can use for past eras of baseball. For the lively ball era, the basic skeleton of (TB+.5H+W-.3(AB-H)) works fine; just use a multiplier of .33 for the 1940s and .34 for the 1920s and 30s. For the dead ball era, you can use a skeleton of (TB+.5(H+SB)+W-.3(AB-H)) with a multiplier of .341 for the 1910s and .371 for 1901-1909. Past that, you're on your own. While breaking it down by decade is not exactly optimal, it is an easy way to group the seasons. The formulas are reasonably accurate in the dead ball era, though not nearly as accurate as they are in the lively ball era.
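Here is how those era-specific skeletons and multipliers might be wired together in Python; the era cutoffs and multipliers come straight from the text above, while the function itself is just my own plumbing.

def erp_by_era(year, ab, h, d, t, hr, w, sb):
    s = h - d - t - hr
    tb = s + 2 * d + 3 * t + 4 * hr
    if 1901 <= year <= 1919:
        # Dead ball era: TB + .5(H + SB) + W - .3(AB - H)
        skel = tb + 0.5 * (h + sb) + w - 0.3 * (ab - h)
        mult = 0.371 if year <= 1909 else 0.341
    else:
        # Lively ball era: TB + .5H + W - .3(AB - H)
        skel = tb + 0.5 * h + w - 0.3 * (ab - h)
        if 1920 <= year <= 1939:
            mult = 0.34
        elif 1940 <= year <= 1949:
            mult = 0.33
        else:
            mult = 0.324   # the 1951-1998 fit from earlier in the post
    return skel * mult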

Regression

Using the statistical method of multiple regression, you can find the most accurate linear weights possible for your dataset and inputs. However, when you base a method on regression, you often lose theoretical accuracy, because various stats are correlated with each other, like homers and strikeouts. Since teams that hit lots of homers usually strike out more than the average team, strikeouts may be evaluated as less negative than other outs by the formula, when they should actually have a slightly larger negative impact. Also, since there is no statistic available to measure baserunning skills outside of SB, CS, and triples (for instance, we don't know how many times a team takes two bases on a single), those statistics can have inflated values in a regression equation because of their relationship with speed. Another concern that some people have with regression equations is that they are based on team data and should not be applied to individuals.

Anyway, if done properly, a regression equation can be a useful method for evaluating runs created. In their fine book, Curve Ball, Jim Albert and Jay Bennett published a regression equation for runs. They based it on runs/game, but I went ahead and calculated the long-term absolute out value. With this modification, their formula is:

R = .52S+.66D+1.17T+1.49HR+.35W+.19SB-.11CS-.094(AB-H)

A discussion last summer on FanHome was very useful in providing some additional ideas about regression approaches (thanks to Alan Jordan especially). You can get very different coefficients for each event based on how you group them. For instance, I did a regression on all teams 1980-2003 using S, D, T, HR, W, SB, CS, and AB-H, and another regression using H, TB, W, SB, CS, and AB-H. Here are the results:

R = .52S+.74D+.95T+1.48HR+.33W+.24SB-.26CS-.104(AB-H)

The value for the triple is significantly lower than we would expect. But with the other grouping, we get:

R = .18H+.31TB+.34W+.22SB-.25CS-.103(AB-H)

which is equivalent to:

R = .49S+.80D+1.11T+1.42HR+.34W+.22SB-.25CS-.103(AB-H)

which are values more in line with what we would expect. So the way you group events can make a large difference in the resulting formulas. The same issue shows up with choices like taking HB and W together or separately. And if there is a set relationship you want to impose (like CS being twice as bad as SB are good), you can build a category like SB-2CS and regress against that.
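As a sketch of how you might run these regressions and convert a grouped result back to per-event weights, here is some Python using numpy's least-squares routine (one of several reasonable tools for this); the column layout of the input matrix is just my assumption for illustration.

import numpy as np

def fit_weights(X, runs):
    # Ordinary least squares with no intercept: runs ~ X @ coefs.
    # X would be a (team-seasons x events) matrix of counts, e.g. columns for
    # H, TB, W, SB, CS, AB-H; runs is the vector of team runs scored.
    coefs, *_ = np.linalg.lstsq(X, runs, rcond=None)
    return coefs

def per_event_from_grouped(b_h, b_tb):
    # Convert coefficients on H and TB into weights for each hit type: a single
    # is 1 hit plus 1 total base, a double 1 hit plus 2 total bases, and so on.
    return {"1B": b_h + b_tb, "2B": b_h + 2 * b_tb,
            "3B": b_h + 3 * b_tb, "HR": b_h + 4 * b_tb}

# Using the grouped coefficients quoted above (.18 on H, .31 on TB):
print(per_event_from_grouped(0.18, 0.31))
# roughly .49, .80, 1.11, 1.42 -- the expanded equation given above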

An example I posted on FanHome drives home the potential pitfalls of regression. I ran a few regression equations for individual eight-team leagues and found this one from the 1961 NL:

R = 0.669 S + 0.661 D - 1.28 T + 1.05 HR + 0.352 W - 0.0944 (AB-H)

Obviously an eight-team league is far too small a sample for a self-respecting statistician to use, but it serves the purpose here. A double is worth about the same as a single, and a triple is worth NEGATIVE runs. Why is this? Because the regression process does not know anything about baseball; it just looks at correlations. In the 1961 NL, triples were correlated with runs at r=-.567. The Pirates led the league in triples but were 6th in runs. The Cubs were 2nd in triples but 7th in runs. The Cards tied for 2nd in triples but were 5th in runs. The Phillies were 4th in triples but last in runs. The Giants were last in the league in triples but led the league in runs. If you, too, knew nothing about baseball, you could easily conclude that triples were a detriment to scoring runs.

While it is possible that people who hit triples were rarely driven in that year, it's fairly certain an empirical LW analysis from the PBP data would show a triple is worth somewhere around 1-1.15 runs as always. Even if such an effect did exist, there is likely far too much noise in the regression to use it to find such effects.

Trial and Error

This is not so much its own method as a combination of all of the others. Jim Furtado, in developing Extrapolated Runs, used Paul Johnson's ERP, regression, and some trial and error to find the method with the best accuracy. However, some of the weights look silly, like the fact that a double is only worth .22 more runs than a single; ERP gives .32, and Palmer's Batting Runs gives .31. So, in chasing the highest accuracy, the trial and error approach seems to compromise theoretical accuracy, much as regression does.

Skeleton approaches, of course, use trial and error in many cases in developing the skeletons. The ERP formulas I publish here certainly used a healthy dose of trial and error.

The +1 Method/Partial Derivatives

Using a non-linear RC formula, you add one of each event and see what the difference in estimated runs would be. This will only give you accurate weights if you have a good method like BsR, but if you use a flawed method like RC, take the custom LWs with a grain of salt or three.
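To illustrate the +1 technique, here is a small Python example using the basic Runs Created formula, RC = (H+W)*TB/(AB+W); the league totals are round, made-up numbers, and the helper names are mine.

def basic_rc(ab, h, tb, w):
    return (h + w) * tb / (ab + w)

def plus_one_weight(stats, deltas):
    # LW of an event = change in estimated runs from adding one of that event.
    # 'deltas' says how one occurrence of the event changes each input stat.
    bumped = {k: stats[k] + deltas.get(k, 0) for k in stats}
    return basic_rc(**bumped) - basic_rc(**stats)

league = {"ab": 5500, "h": 1450, "tb": 2200, "w": 550}
print(plus_one_weight(league, {"ab": 1, "h": 1, "tb": 2}))  # a double: about .90
print(plus_one_weight(league, {"ab": 1}))                   # an out: about -.12

With these particular totals the double comes out around .90, higher than the empirical value of roughly .8, which is the kind of distortion the grain of salt is for.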

Using calculus, and taking the partial derivative of runs with respect to a given event, you can determine the precise LW values of each event according to a non-linear run estimator. See my BsR article for some examples of this technique.
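For the calculus version, a symbolic math package can take the partial derivatives for you. Here is a sketch with sympy, again using basic RC rather than BsR just to keep the example self-contained; the weight of an event is the sum of the partial derivatives with respect to each input stat, times how much one occurrence of the event changes that stat.

import sympy as sp

AB, H, TB, W = sp.symbols("AB H TB W", positive=True)
RC = (H + W) * TB / (AB + W)

# A double changes the inputs by +1 AB, +1 H, +2 TB; an out by +1 AB only.
double_w = sp.diff(RC, AB) + sp.diff(RC, H) + 2 * sp.diff(RC, TB)
out_w = sp.diff(RC, AB)

league = {AB: 5500, H: 1450, TB: 2200, W: 550}
print(float(double_w.subs(league)), float(out_w.subs(league)))
# about .90 and -.12, essentially matching the +1 results above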

Calculating the Out Value

You can calculate a custom out value for whatever entity you are looking at. There are three possible baselines: absolute runs, runs above average, and runs above replacement. The first step for any of them is to find the sum of the values of all the events in the formula other than the outs; call this value X. The out term is called O; here it is just AB-H, although O could also include other out events (like CS) whose value you want to vary along with it. Then, with actual runs being R, the necessary formulas are:

Absolute out value = (R-X)/O

Average out value = -X/O

For the replacement out value, there is another consideration. First you have to choose how you define replacement level, and calculate the number of runs your entity would score with the same number of outs but replacement-level production. I set replacement level at 1 run per game below the entity's average, so I find the runs/out for a team 1 run/game below average and multiply this by the entity's outs. This is Replacement Runs, or RR. Then you have:

Replacement out value = (R-RR-X)/O
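Putting the three together in Python, as a sketch (the sample totals below are round numbers chosen only to illustrate, and the function names are mine):

def replacement_runs(runs, games, outs):
    # Runs/out for a team 1 run/game below the entity's average, times the
    # entity's outs.
    repl_runs_per_out = (runs / games - 1.0) / (outs / games)
    return repl_runs_per_out * outs

def out_values(runs, outs, x, rr):
    # x is X above, the summed value of the non-out events; rr is RR.
    absolute = (runs - x) / outs
    average = -x / outs
    replacement = (runs - rr - x) / outs
    return absolute, average, replacement

# e.g. 750 runs, 162 games, 4100 outs, and X = 1150 from the rest of the formula:
rr = replacement_runs(750, 162, 4100)
print(out_values(750, 4100, 1150, rr))
# roughly -.098 absolute, -.28 average, -.24 replacement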
