Thursday, September 29, 2011

Playoff Meanderings

I always like to put down some of my thoughts about the playoffs each year, but it’s a challenge to say anything even remotely close to being meaningful. Predicting the outcome of short series is folly (although I’ll engage in a little of this folly later), and you can read that anywhere. So I always try to come up with a different angle to illustrate why the playoffs are subject to such uncertainty.

I’ve certainly had some more interesting illustrations in the past; this one is pretty lame, but for some reason when it crossed my mind in August I thought it was a lot more interesting than I do now. What is the value of each playoff game or series in terms of a regular season game? In asking this, I’m not talking about the weight that should be applied to playoff performance for evaluating individual value, or any such thing...I’m just asking what the implied value is, given the assumption that the regular season standings carry over to the playoffs.

Of course, that’s not how it works--it's a cliché, but every team starts every series out at 0-0. However, I’ll assume that regular season standings carry over (resetting with the start of each additional round to keep things manageable) and the playoff games are weighted in a manner such that at the end of the series, the team that wins the series has a better overall record than its opponent.

There are at least two different ways to approach this increasingly silly scenario, which will be best illustrated by example--treating the series outcome as a binary, or considering the games individually. Suppose that the Alphas enter a five-game division series with a record of 92-70 while their opponents the Betas are 90-72.

First, from the series outcome perspective, if the Alphas win, the series was unnecessary, since the Alphas already led in the standings. If the Betas win, however, the series must be given a weight in games such that adding that many wins to the Betas and losses to the Alphas gives the Betas the better record. Leaving things in terms of whole games, the answer in this case is three. Giving the Betas three additional wins leaves them at 93-72; three additional losses for the Alphas would make them 92-73. The series could have gone three, four, or five games, making the effective value of each of those games equal to 1, .75, or .6 regular season games respectively.

You can also consider this from the game perspective--that is, actually looking at the outcome of each game in the series rather than treating the series as a binary win or loss. If the Betas win the above series 3-0, this is pretty straightforward given the two game margin in the standings--treating playoff games as equivalent to regular season games leaves the Alphas 92-73 and the Betas 93-72. Suppose the Betas had been 88-74 instead of 90-72, though. In order to bring the Betas ahead of the Alphas (on a whole wins basis), they need five additional wins, so each win (and thus each game) has to be worth 5/3 = 1.67 times a regular season game. Now the Alphas have 3*1.67 + 70 = 75 losses and the Betas have 3*1.67 + 88 = 93 wins, so the Alphas' record is 92-75 and the Betas' 93-74.

You can see that if the final margin of the series is 3-2 in favor of the 88-74 Betas, the weight on each playoff game would have to be roughly four times that of a regular season game, since the Betas only pick up one net win per unit of weight when the whole playoff series is considered. A 4x weight brings the Alphas and Betas together at 100-82.
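Under these assumptions, the bookkeeping is simple enough to sketch in a few lines of Python; the function and the Alphas/Betas records are just the hypothetical example from above:

```python
# Fold a weighted playoff series into both teams' regular season records.
# Records are (wins, losses) tuples; the series result is given from the
# Betas' (series winner's) side.

def combined_records(alpha, beta, beta_wins, beta_losses, weight):
    aw, al = alpha
    bw, bl = beta
    # The Betas' series wins are the Alphas' series losses, and vice versa.
    new_alpha = (aw + beta_losses * weight, al + beta_wins * weight)
    new_beta = (bw + beta_wins * weight, bl + beta_losses * weight)
    return new_alpha, new_beta

# The 88-74 Betas sweep 3-0 with each game worth 5/3 of a regular game:
# the Alphas end up 92-75 and the Betas 93-74, as in the text.
print(combined_records((92, 70), (88, 74), 3, 0, 5/3))

# A 3-2 Betas win with a 4x weight pulls both teams together at 100-82.
print(combined_records((92, 70), (88, 74), 3, 2, 4))
```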

This is all just a silly digression, but given the assumptions it is a simple way to think about how the implied value of a playoff game compares to that of a regular season game.

Getting to the 2011 playoffs, let me offer some quick thoughts. I'll leave the detailed handicapping to those who are better suited for it and who enjoy quixotic quests. The marginal value of more in-depth analysis is limited, but if that's what you seek, you won't find it here.

The probabilities that follow assume nothing about home field advantage or pitching matchups, or even true talent for that matter. They are simply based on my crude team rankings, fueled by 25% actual W%, 25% expected W% (from R/RA), 25% predicted W% (from RC/RCA), and 25% from .500.
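For the curious, the blend is trivial to compute; here is a sketch, paired with a log5-style head-to-head estimate. Note that log5 is my choice for illustration here, not necessarily the exact mechanism behind the probabilities that follow:

```python
# Blend of 25% actual W%, 25% expected W% (R/RA), 25% predicted W%
# (RC/RCA), and 25% .500, per the weights described above.
def blended_wpct(actual, expected, predicted):
    return 0.25 * actual + 0.25 * expected + 0.25 * predicted + 0.25 * 0.500

# log5 (odds ratio) estimate of the chance that a team of true W% p_a
# beats a team of true W% p_b in a single game.
def log5(p_a, p_b):
    return p_a * (1 - p_b) / (p_a * (1 - p_b) + p_b * (1 - p_a))

# Two identically blended teams split their games, as expected:
print(log5(blended_wpct(0.5, 0.5, 0.5), blended_wpct(0.5, 0.5, 0.5)))

# A .430 single-game probability over a 162 game schedule comes to
# about 70 expected wins.
print(round(0.43 * 162))
```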

That formula is also arbitrary. The results should be fairly reasonable, but I'm also eager to disown them at the same time, as something of a commentary on the futility of the exercise...and most especially on the bloviating that is done without any logic at all. I'm sure that there are many scribes across the country furiously writing about how certain teams have no chance, never learning the lesson that the differences between major league teams simply aren't that great, especially after eight of the best have been selected from a 162 game sample.



This method considers all of the playoff teams to be in the top ten in MLB; only Boston (#3) and the Angels (#8) are on the outside looking in. The Yankees, Phillies, and Rangers are near co-favorites to win it all; NYA and TEX are ranked about evenly, while PHI benefits from the weaker NL field and has the highest odds of winning a first round series and the pennant. Overall, the AL has an estimated 57% chance of winning the World Series. The most likely matchup in the Series is NYA/PHI (11%); the least likely is DET/STL (4%). The rankings imply that the worst playoff team (ARI) would beat the best playoff team (NYA) 43% of the time, which over 162 games works out to seventy wins. If you strictly equated true probability to actual 2011 record, the Diamondbacks winning a seven game series against the Yankees would be roughly as likely as the Padres winning one against the Indians.

As far as my personal rooting interests go, New York and Tampa Bay are my top two choices, followed by Milwaukee and St. Louis. I would be happy to see any of those teams win, have no particularly strong feelings about Arizona or Detroit, and be mildly disappointed if it’s Philadelphia or Texas. But there are no White Sox in this group.

Tuesday, September 20, 2011

A Quick Look at Negro League W-L Records

I wrote this about a year ago and wasn’t sure if I’d ever post it. With the recent publication of some Negro League data at Seamheads, I figured I’d better post it now before it became completely dated. The data I used was compiled by Chris Cobb and posted on the Hall of Merit site, with John Holway's research as his source data.

I need to admit upfront that I know very little about the Negro Leagues. My knowledge level of the Negro Leagues peaked at about age eleven when I read Only the Ball Was White, and has only gone downhill since then. That is one of the reasons for this post--as a (very limited) education for me on the great pitchers of the Negro Leagues.

I am going to be applying the Neutral Win-Loss record approach introduced by Rob Wood, which I have written about several times. It is a way to contextualize a pitcher's W-L record using only the win-loss record of the pitcher's team. This post applies it to several Negro League pitchers.

The basic idea behind Wood's approach is that an average team's deviation from .500 is due in equal parts to its offense and its defense. The portion of a team's deviation from .500 that arises from the defense (excepting the fielders and the relievers who appear in the pitcher's own games) doesn't do anything to increase the pitcher's expected W% in reality, but if you compare his W% directly to that of his teammates, he will suffer for it.

The formula is simple and linear; instead of comparing a pitcher to his team's W% when he does not get a decision (Mate), the comparison is to the average of Mate and .500. The neutral W% is easy to figure:

NW% = W% - Mate/2 + .25

From NW%, one can figure Neutral Wins and Losses:

NW = NW%*(W + L), NL = W + L - NW

It is also very easy to combine NW% and the number of decisions into wins above some baseline. Wins Above Team is traditionally defined as wins above .500:

WAT = (NW% - .5)*(W + L)

I also use Wins Compared to Replacement, with the assumption that a replacement level starter will have a .380 W%:

WCR = (NW% - .38)*(W + L)
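The whole chain from W-L record and Mate to NW%, WAT, and WCR fits in one small function; this is just a restatement of the formulas above, with an invented 20-10, .550-Mate pitcher as the example:

```python
# Rob Wood's Neutral W-L method: compare the pitcher's W% to the
# average of Mate (his team's W% in his non-decisions) and .500.

def neutral_record(w, l, mate, replacement=0.38):
    decisions = w + l
    wpct = w / decisions
    nwpct = wpct - mate / 2 + 0.25           # NW% = W% - Mate/2 + .25
    nw = nwpct * decisions                   # neutral wins
    nl = decisions - nw                      # neutral losses
    wat = (nwpct - 0.5) * decisions          # wins above a .500 team
    wcr = (nwpct - replacement) * decisions  # wins compared to replacement
    return nwpct, nw, nl, wat, wcr

# A hypothetical 20-10 pitcher whose teams played .550 without him:
# NW% = .667 - .275 + .25 = .642, for 19.25-10.75, 4.25 WAT, 7.85 WCR.
print(neutral_record(20, 10, 0.550))
```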

There are a number of weaknesses to the Neutral W-L approach, and there are a number of additional complications that arise when applying it to the Negro League data. This is an incomplete list of the methodological issues that are present even when looking at major league data:

* It does not isolate performance when the pitcher actually pitches; some will receive lousy run support despite pitching for good offensive teams.

* While the approach assumes that the team is balanced between offense and defense, this is not always the case. It is a decent assumption for a pitcher's entire career, but there are still going to be cases in which a pitcher is predominantly on teams skewed one way or the other. Those on offensive teams will benefit unfairly in the metric, while those who are on teams with otherwise strong starting pitching staffs will be hurt.

* All of the problems with the definition and concept behind pitcher wins and losses themselves are still present.

With respect to the Negro League results included in this post, several additional problems arise from the data itself:

* The records themselves are incomplete (missing seasons, team records only published for half seasons, etc.) and sometimes contradictory (individual totals that don't add up to the team total, etc.). These kinds of errors exist even in major league data from the period, so it's no surprise that they are present in the more chaotic, less-organized Negro League data.

To deal with the gaps in the specific data I used, if I couldn't find the team's record, I assumed that the team was .500 when the pitcher in question's decisions were removed. If a pitcher split time between teams and no breakdown of his W-L record with the two teams was provided, I used the average of the two teams' records. For seasons in which Cobb did not include the team's record and I had to look it up from another source, I used the ESPN Baseball Encyclopedia. In that case, if the team's record was only available for a half-season, I assumed that the full season record was double the half-season record.

* I only used the results from domestic Negro League games. The world of the Negro Leagues encompassed a lot more than that; players went to the Caribbean to play, teams barnstormed extensively, played games against major league opponents, etc. Limiting the analysis to league games makes it workable, but it does omit a lot of relevant performances.

In this regard the Negro Leagues were similar to the early NA/NL days, in which the league schedule constituted only a small fraction of total games played, and independent teams often compared favorably to league opponents.

* I am way out of my area of knowledge, but even I feel comfortable asserting that the NeL pitching rotations looked more like the early majors than the contemporary majors. Pitchers got a higher percentage of their team's decisions, reducing the sample size from which Mate is drawn and weakening the assumption that the other pitchers are average. I have also read that teams would purposefully match their aces against one another to create gate attractions, whereas our normal assumption is that teams will try to match their pitchers up in whatever manner creates the highest number of expected wins.

* The league structure was less stable from year-to-year, which makes it harder to compare NeL pitchers from one time period to another. For twentieth-century major league pitchers, we can be confident that, regardless of when they pitched, they were facing the highest level of competition available (with the obvious exception of the players locked out of the majors due to their skin color). We also know that they pitched in seasons of roughly equal length, and so their career records represent a fair sample of their performance at different ages.

We don't have that confidence when dealing with the NeL data. For example, Satchel Paige gets no credit for 1935 here, but the adjacent seasons of 1934 and 1936 appear to be among his best. Then he gets no credit for 1937-39, as he was not pitching in official league games. You will see that Paige doesn't come out as impressively as might be expected in the career totals, but the gaps in league play might well be the major cause.

* I have listed WCR figures using a .380 replacement level, but in actuality I have no idea where the NeL replacement level should be set.

From all of the caveats, it may seem as if I am declaring the NW-L statistics to be useless. That is not my intention; I simply don't want to oversell them or fail to acknowledge their biases. Many of the issues with the NW-L records are issues that would arise with any statistical analysis of Negro League pitchers. Consider what a logistical nightmare it would be to try to look at runs allowed, needing innings pitched, league averages, and park factors.

As sabermetricians we all know the flaws of pitcher W-L records, but there are a few benefits. Among them is the ease in determining them, at least if complete games are the norm. All you need to know is who the starting pitcher was and which team won the game, and you've got it. No need for box scores or play-by-play. No need for park factors or league averages--the average in every league and every park for all of time is .500.

These properties are most useful when dealing with incomplete data, and we can refine the raw W-L record further by incorporating team record and producing NW-L. Are the results perfect? Absolutely not. Are they likely to give us a better indication of the quality of these pitchers than raw W-L records or uncontextualized ERAs? I say yes.

The pitchers for whom data was available were: Chet Brewer, Dave Brown, Ray Brown, Bill Byrd, Andy Cooper, Leon Day, Willie Foster, Leroy Matlock, Satchel Paige, Dick Redding, Bullet Joe Rogan, Hilton Smith, Smokey Joe Williams, and Nip Winters.

Since I am out of my area of knowledge when discussing the Negro League stars, I'm not going to make a lot of comments--I'll leave interpretation up to the reader. Here are the actual career W-L records for the pitchers, along with Mate. The list is sorted by career wins above .380:



Only one of the pitchers (Chet Brewer) had a worse record than that of his teams. If one figures Wins Above Team by the traditional method, Brewer would rate as a below-average pitcher. It's far more likely, though, that a pitcher with a .591 W% who was regarded as an excellent pitcher was in fact an excellent pitcher. The fact that his teams played .624 baseball without him indicates that they probably had above-average pitching, which while good for the team did absolutely nothing to increase Brewer's expected W%. Brewer still takes a hit, of course, when his record is neutralized by the Wood approach, but he comes out as an above-average performer.
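The Brewer case can be checked directly from the figures quoted above:

```python
# Brewer: .591 career W%, with his teams at .624 (Mate) without him.
# NW% = W% - Mate/2 + .25
nwpct = 0.591 - 0.624 / 2 + 0.25
print(round(nwpct, 3))  # 0.529 -- still comfortably above .500
```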

Here are the career neutral W-L records for the pitchers, sorted by WCR:



Here is a link to the spreadsheet containing the complete yearly breakdowns for each pitcher. You can see exactly what I inputted and which seasons I didn't have team records for (you'll see blanks in the TW and TL columns):

https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AnPJbQnlHhRHdEl5TjRzUEVscjNQVy1naDY0ODVtZlE&output=html

Again, this is obviously a very incomplete examination of the careers of a limited number of Negro League stars, and I certainly would not advocate placing too much stock in the results.

Monday, September 12, 2011

Scoring Self-Indulgence, pt. 5: Baserunner Advances

Last time I covered my scoring codes to recognize a batter reaching base; this time I’ll discuss what I record once he gets there. I’ll start with advances made by the runner independent of the actions of a subsequent batter on his team--things like stolen bases and advancing on wild pitches. Most of the codes that follow are pretty straightforward. In each case, I’ll show the advance as a runner going from first to second, but the same concepts apply to advancing to third and scoring. In each case, I’ll assume that the batter reached first by being hit by a pitch.

For every advancement that occurs during the course of a plate appearance, I record both the lineup slot of the batter at the plate and the pitch on which the event occurs (or which pitches it is between if applicable). The pitch is indicated by the same letter used in the batter’s box, except in lower case--the first pitch of a PA is “a”, the second pitch is “b”, etc.

There are several exceptions. An event on the last pitch of the plate appearance (which is never given a letter in the batter's scorebox) is labeled "lp". If an event happens before the first pitch of a plate appearance, I use "bfp". Finally, if an event occurs between pitches, it is labeled "a!", where ! is replaced by the letter of the last pitch before the event. Suppose the event occurs between pitches two and three of the plate appearance; in this case, the pitch code for the event is "ab", because the second pitch (b) was the last one thrown. "ab" can be read as "after b".
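A little helper function captures the labeling rules; since this is my own notation, the function and its flags are of course invented purely for illustration:

```python
# Label when an event occurred within a plate appearance:
# pitch 1 -> "a", pitch 2 -> "b", ...; the PA's last pitch -> "lp";
# before the first pitch -> "bfp"; between pitches -> "a" + last pitch.

def pitch_label(pitch_num=0, last_pitch=False, between=False):
    if last_pitch:
        return "lp"
    if pitch_num == 0:
        return "bfp"
    letter = chr(ord("a") + pitch_num - 1)
    return "a" + letter if between else letter

print(pitch_label(3))                # "c" -- on the third pitch
print(pitch_label(2, between=True))  # "ab" -- after the second pitch
```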

The code for a stolen base is the obvious “SB”. If it occurred on the third pitch of a plate appearance taken by the #6 batter in the lineup, the scoring would be:



As you can see from the example, the pitch information is written above the advancement symbol, in smaller type.

Wild pitches and passed balls are separated by a distinction I’d wipe out of the rule book if given the chance, but I do record them differently: “WP” and “PB” are the obvious codes. In the examples, the wild pitch occurs on the last pitch to the #6 batter, and the passed ball occurs on the first pitch to the #2 hitter:





The code for a balk is “BK”; this one comes between the third and fourth pitches to the cleanup hitter:



I don’t like the scoring distinction between a stolen base and defensive indifference, but I do make note of it on my scoresheets because SB is such a common category, and it’s easier to keep track of the ones that are really scored as steals and add in fielder’s indifference if one chooses than it is to try to divine after the fact what the scoring was. I refer to it as Fielder’s Indifference (FI), because it is a subset of fielder’s choice by definition, and the symbol seems more consistent. This one occurs on the sixth pitch to the #5 hitter:



A runner could advance on an error between pitches, which almost always would be a throwing error. If the pitcher throws the ball away on a pickoff attempt before the first pitch to the #2 hitter, the scoring looks like this:



Sometimes, the extra bases are gained before the batter-runner becomes a runner; that is, on the same play on which he reaches base. Suppose a batter dribbles a hit to the pitcher, but in his haste to make the play, the pitcher hurls the ball down the right field line, allowing the batter to move up to second. The scoring looks like this:



If there is no additional information included with the notation, it is assumed that the advance occurred on the same play as the on-base event. The other common way a batter-runner moves up is when he is able to advance on a throw to another base made in an attempt to retire another runner. In this example, the batter singles to right, then advances to second on a throw home. The code “ATx” means advanced on throw, with x standing in for the base to which the throw is made (2 for second, 3 for third, and H for home):



I have yet to touch on the means by which most bases are gained: advances on plays initiated by subsequent batters. I mark these by writing and circling the batting order position of the batter responsible in the quadrant of the runner’s scorebox corresponding to the base he wound up at. Suppose the runner from first advances to third on a play initiated by the #7 hitter. I would score it:



If the runner scores, then I use a box instead of a circle, so that it's easy to see how many runs a team has scored. In this case, the runner who moved to third on a play initiated by the #7 hitter ends up scoring on a play initiated by the #9 hitter:



A runner can also score due to an event not initiated by another batter. The most common is scoring on a wild pitch. In this example, the runner from third scores on a wild pitch, with the wild pitch coming on the second pitch to the #9 hitter:



As you can see, I allow the box that indicates a run scored to vary in shape and size as appropriate to allow the necessary space for recording the event.

Sometimes, the event that advances the baserunner occurs while the ball is in play, and no special notation is needed in the runner's box, because referring to the relevant batter's scorebox will show how the advancement occurred. Suppose that there is a runner on first, and the batter singles to right, advancing the runner to second. Then the right fielder boots the ball, allowing both to move up one base (the batter-runner to second and the runner to third). In this case, I would simply record the appropriate batter number circled in the runner's third base quadrant. The batter's scorebox will include the error by the right fielder, implying that it occurred during his plate appearance, and thus implying that the single plus the error enabled the runner to advance from first to third.

However, there are also cases in which the runner advances but the batter stays put. Suppose that the same play occurs as described above, except the batter-runner (who happens to be the #4 hitter) stops at first base. Now I would score the runner’s advancement as:



In this case, the use of a small circled 4 above the error indicates that the error occurred during the PA of the cleanup hitter.

Tuesday, August 30, 2011

A Completely Unnecessary Pitching Metric

There are a number of methods available to evaluate pitchers' starts on a game-by-game basis rather than with the more traditional full season method. There are Game Scores, Support Neutral records, Win Values, and a number of other approaches. There really is no need to add another approach to the mix, and I'm not really going to here--I'm simply going to take a conventional approach for evaluating a full season pitching line and apply it to individual starts.

Of course, I’m not going to claim that this approach is better than the others, because it’s not. It is relatively easy for me to implement, though, and I thought it would be nice to be able to offer a category on my year end stat report for starting pitchers that would consider distribution of performance rather than just aggregate performance as is the case for the rest of the metrics.

The idea is basically to estimate the winning percentage that a team should have over the long haul given the runs allowed and innings pitched of the starting pitcher. I am not using any sort of component RA estimate, and will not bother to explain the implications of this, which I’ll assume you’re well aware of (and thus can also decide for yourself whether you still have any interest in the results). To do this, I use Pythagenpat and assume that the performance of everyone other than the starting pitcher is league average. That is, the offense scores an average number of runs, and the bullpen allows an average number of runs. For the latter, I’m not going to account for the difference between the RA allowed by relievers and the overall league average.

That is not at all an inevitable choice, and if I was trying to construct a perfect metric I wouldn’t do it. But this is so obviously not a perfect metric that the extra effort would be of questionable value. Making that adjustment would also highlight the fact that this approach does not attempt to account for the effect of the number of innings the starter is able to log on the subsequent performance of the bullpen, despite research that suggests there is such an effect. Of course, using the league average doesn’t do anything to address that issue, but it also keeps things simple.

An additional benefit of not making any adjustment for the lower RA of the bullpen is that it allows this metric to be more easily compared to other metrics that benchmark the performance of starting pitchers directly against the overall league average--and a sizeable number of metrics do just that. The overall expected winning percentage for the team of a league average starting pitcher at the end of this road will be sub-.500, which while obviously false does in fact match the results of many full season-type metrics.

One thing that cannot be ignored is park effects; the question is how to apply them. One option is to only apply them to the elements of the team other than the pitcher--the bullpen and the offense. I’ll call that option A; option B is to apply the park adjustment only to the pitcher himself.

Option A is a little harder to implement, since there are two adjustments that need to be made. On the other hand, it has some appeal because it allows us to keep the actual run environment of the game rather than recasting it in an imaginary neutral park. I’ve decided to go with Option B because simplicity is a guiding principle here, and because it is more consistent with the way I apply park adjustments to full season metrics. Again, it’s far from an inevitable choice. I’m also assuming that all games are nine innings.

With the thought process out of the way, this isn’t a particularly hard metric to demonstrate. I’ll start simple, with a pitcher throwing a complete game in a neutral park in which he allows zero runs. His expected winning percentage for the game is 1.000.

Seriously, let’s consider a pitcher in a neutral park working seven innings and allowing two runs. I’ll assume it’s an AL pitcher, so we need to know that the 2010 AL average R/G was 4.45 (this is the constant N later). The pitcher’s team can thus be expected to allow 2 + (9 - 7)*4.45/9 = 2.99 runs and score 4.45 runs. This is a 2.99 + 4.45 = 7.44 RPG environment, which has a Pythagenpat exponent of 7.44^.29 = 1.79, and thus the pitcher’s team has an expected W% of 4.45^1.79/(4.45^1.79 + 2.99^1.79) = .671.

We could go through and count up the wins (.671) and the losses (1 - .671 = .329), but I’d rather keep it in rate terms, so the final result will just be the average expected winning percentage across a pitcher’s starts.

To generalize the formula, let N be the league average R/G, with R and IP as the runs allowed and innings pitched by the starting pitcher in a particular start. Let dPF be the park factor with the dilution removed, so that it applies to a single game in the park rather than to full season statistics combining home and road games. The park factors I publish (which are the ones I'll use here, naturally) include the dilution: a 1.03 PF does not mean that the park inflates scoring by 3%--it means that the park inflates scoring by 6%, averaged with 1.00 (a neutral park) so that it can be applied to full season statistics, which are, at least in theory, comprised of one-half home games and one-half road games. For such a park, dPF = 2*1.03 - 1 = 1.06.

Then:
A (team RA for game) = R/dPF + (9 - IP)*N/9
X (Pythagenpat exponent) = (A + N)^.29
gW% = N^X/(N^X + A^X)

There’s really not much to it when you write it in math rather than English.
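To make "not much to it" concrete, here is the whole thing in a few lines, with dPF defaulting to a neutral park:

```python
# gW%: expected W% of the starter's team for one start, assuming an
# average offense and an average bullpen, via Pythagenpat.

def game_wpct(r, ip, n, dpf=1.0):
    a = r / dpf + (9 - ip) * n / 9  # A: team RA for the game
    x = (a + n) ** 0.29             # X: Pythagenpat exponent
    return n ** x / (n ** x + a ** x)

# The worked example above: 7 IP, 2 R in a 4.45 R/G league -> about .671.
print(round(game_wpct(2, 7, 4.45), 3))

# And the trivial case: a nine-inning shutout is 1.000.
print(game_wpct(0, 9, 4.45))
```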

I’m very tempted to cap it off by unscrambling it from a W% back into an estimated run average, but I’d rather not deal with the implications of aggregating multiple Pythagorean exponents. One of the advantages of a game-by-game approach is that you’re able to better match performance with the run environment in which it actually occurred, and thus avoid some of the distortions that are inevitable when performance is aggregated across different run environments.

I intend to implement this fully for 2011 starting pitchers (although I might change my mind on that depending on how I feel about the effort/usefulness tradeoff in October), but for now I ran the top five AL starting pitchers from 2010 (IMHO) through the process. For comparison, I’ve included a column called sW% which is based on a traditional use of a pitcher’s full season line to estimate the theoretical W% of his team (albeit without making any adjustment for innings/start):

X = (RA/PF + N)^.29
sW% = N^X/(N^X + (RA/PF)^X)



If you use R and IP as the criteria, Felix Hernandez turned in the best performance of any AL starter, whether you aggregate or consider each game separately. Sabathia and Weaver come out about the same either way, but Lee and Price move in opposite directions when you look at the game level. This implies that Lee’s distribution of runs allowed and innings was such that it would figure to produce more wins than the averages would suggest, with Price the opposite.

This is more of a freak show stat than anything else, but it does provide a relatively simple way to compare starters at the game level on their bottom line results, and if a Cy Young race is particularly close, you may want to consider it. Or you may not; I don’t have a lot of conviction about this, and there are more rigorous approaches available, but there you go.

Tuesday, August 16, 2011

Ramblings on the Percentage of Runs Scored via Home Run

During the early part of the season, the perceived high dependence of the Yankees’ offense on home runs was often fodder for discussion in the mainstream baseball media. The implication was that while the Yankees were scoring a lot of runs, the fact that many of the runs were being scored on home runs was a sign that the performance was either unsustainable or would not be duplicable against quality pitching. The statistic most commonly cited in these discussions was the percentage of runs that scored on home runs. This post is not intended to comment on the sustainability or quality of pitching issues, but rather to offer a quick critique of the “percentage of runs scored on home runs” figure.

If one broadly divides approaches to credit runs to various individuals or events into two categories, there are those that only consider the final outcome (whether the run scored or not, and who did the scoring or the driving in) and those that attempt to assign credit at all steps in the process, even incremental steps that don’t directly push a run across the plate. The former class includes runs scored and RBI, of course, while the second class includes methods like linear weights and Base Runs.

The percentage of runs a team scores on homers obviously falls into the first class, and in fact when you think about it, you will realize that the statistic is founded on an RBI perspective. The way a run gets tossed into the "resulting from a home run" bucket is to score on a home run--that is, to be driven in by a home run. Of course, one could also look at the question from the runs scored perspective--what percentage of the runners that score reached base on a home run? This is also very easy to compute, as it is simply the ratio of home runs to runs scored, data that is readily available for any team (whereas the number of runs actually scored on homers is harder to come by).

The RBI-based approach is subject to possible distortions in a manner somewhat similar to the issues with earned runs. If a home run is involved at all, the entire run is chalked up to the homer. Often, the home run is the key event enabling a run, but outside of the batter that actually hits the home run, it is never the only contributing event. In some cases, it is even relatively insignificant. If a home run scores a runner from third with no one out, it really didn't have a large marginal value with respect to scoring the runner from third--the probability of that runner scoring was already very high, and any number of other events would have allowed him to score. On the flip side, when a runner on first base with two outs is driven home by a home run, the home run is much more vital to scoring the runner.

The discussion in the last paragraph makes the case for transitioning from the outcome perspective to the run expectancy perspective of linear weights. Using play-by-play data, one can calculate the actual linear weight value of the home runs hit by a team. Such an approach will still be subject to sequencing fluctuations and arguably may not be as predictive as a more context-neutral approach.

One obvious context-neutral approach is to use standard linear weight values applied uniformly to all events to estimate the number of runs contributed by home runs. This figure can then be compared to the total number of runs scored. Using fixed linear weight values, though, this approach ends up boiling down to the ratio of home runs to runs scored, times a constant. For example, if the linear weight value of a home run is 1.4 runs, the result of that calculation will just be 1.4 times the simple home run to run ratio.
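A quick sketch of that point, assuming a fixed home run coefficient of 1.4 (the exact coefficient is an assumption; any fixed value just rescales the HR/R ratio):

```python
# Fixed-linear-weight share of offense attributed to home runs.
# With a constant HR coefficient, this is just a scaled HR-to-runs ratio.
def hr_share_fixed_lw(hr, runs, hr_weight=1.4):
    """Estimated runs contributed by HR, divided by actual runs scored."""
    return hr_weight * hr / runs

# Hypothetical team: 150 HR, 700 runs scored
share = hr_share_fixed_lw(150, 700)
print(share)  # 1.4 * (150/700) = 0.3
```

Whatever coefficient you pick, the team rankings it produces are identical to those from the raw HR/R ratio.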

The next refinement is to not use actual runs scored at all; this post is going to be way too dry as is, so I won’t even bother trying to explain why mixing actual runs scored with estimated run contributions is a bad idea--it should be relatively obvious. Instead, you can compare the estimated run value of the team’s home runs to the run value of all of its offensive events.

There is a complicating factor in using linear weights (or intrinsic weights derived from a dynamic run estimator as I will in a moment) in this manner--the negative run value of the out. Simply taking Number of Event * Coefficient of Event for every event and dividing by the estimate of runs scored will result in percentages that sum to more than 100%, until outs are subtracted (and outs will have a negative percentage). This means that you can’t use the value literally--if the ratio of 1.4*HR/estimated runs scored is 25%, it doesn’t mean that 25% of the runs were scored because of home runs. Alternatively, one could look only at positive events, but then the denominator is no longer runs at all. As long as the number is viewed as a ratio and not a true percentage contribution, the result can still be useful in measuring the contribution of the home run to the offense.

Using a dynamic run estimator like Base Runs has the advantage of attempting to take into account the interaction between the offensive events rather than just assuming a fixed value. However, in the case of the home run, the additional value of considering dynamism is less than it might be for some other events because the value of a home run stays relatively fixed. The intrinsic value of a home run in BsR is:

((B + C)*A*b - A*B*b)/(B + C)^2 + 1

Where A, B, and C are the total A, B, and C factors for the team, and b is the B coefficient of the home run (the home run’s C coefficient is zero, since a homer increases AB and H equally and leaves C unchanged).

Take this BsR equation:

A = H + W - HR
B = .82S + 2.24D + 3.67T + 2.04HR + .1W
C = AB - H
D = HR

The formula for the intrinsic weight of the HR is:

((B + C)*A*2.04 - A*B*2.04)/(B + C)^2 + 1
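To make the mechanics concrete, here is a sketch that evaluates the intrinsic HR weight for a hypothetical team batting line (the totals are made up purely for illustration) and checks it against the brute-force alternative of simply adding one home run to the line and re-running the estimator:

```python
# Intrinsic linear-weight value of a home run under the BsR equation above.
# The team batting line below is hypothetical, for illustration only.
def bsr(singles, doubles, triples, hr, w, ab, h):
    A = h + w - hr
    B = .82*singles + 2.24*doubles + 3.67*triples + 2.04*hr + .1*w
    C = ab - h
    return A*B/(B + C) + hr           # BsR = A*B/(B+C) + D, with D = HR

line = dict(singles=950, doubles=300, triples=30, hr=160, w=550,
            ab=5600, h=1440)          # h = singles + doubles + triples + hr

A = line['h'] + line['w'] - line['hr']
B = (.82*line['singles'] + 2.24*line['doubles'] + 3.67*line['triples']
     + 2.04*line['hr'] + .1*line['w'])
C = line['ab'] - line['h']
b = 2.04                              # B coefficient of the HR

intrinsic = ((B + C)*A*b - A*B*b)/(B + C)**2 + 1

# Brute force: add one HR to the line (one more hit, one more at bat)
plus_one = dict(line, hr=line['hr'] + 1, h=line['h'] + 1, ab=line['ab'] + 1)
brute = bsr(**plus_one) - bsr(**line)

print(round(intrinsic, 3), round(brute, 3))  # the two agree closely
```

The two values differ only because the derivative-based formula evaluates the slope at the current team line, while the brute-force version takes a full one-HR step.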

I’ve also figured the intrinsic weights for the other events so that I can also show you the percentage of the positive intrinsic linear weight total contributed by home runs (“POS” in the chart below).

With this, we can look at the four different approaches I’ve discussed for 2010. In the chart below, “hr” is the intrinsic LW of the home run, “RonHR” is the number of runs that actually scored on home runs, “%onHR” is the RBI-perspective figure that gets a lot of media play (RonHR/R), “HR/R” is the runs-scored perspective figure, “BsR%” is hr*HR/BsR, and “Pos%” is hr*HR divided by the sum, over all positive events, of event count times intrinsic weight.



I’m not going to add much comment on these figures. The list is sorted by BsR%, which I think is the best measure of how large a share of the offense the home run represented. Toronto was in its own world, of course, with respect to home runs hit and the share of offense contributed by the homer, no matter how one estimates it. Also note that the estimated intrinsic linear weight value of the home run for every major league team falls in the [1.401, 1.444] range except for the Jays, 3.7 standard deviations below the mean at 1.355.

Wednesday, August 10, 2011

Sample Simple Limited Input BsR ERA Estimator

In my last post on ERA estimators, I described how my philosophy towards constructing those metrics is predicated on starting with a solid model to estimate runs. By using a solid foundation, you can be confident that, at the very least, your metric will adhere to the fundamental constraints of the run scoring process. The designer retains freedom to experiment and estimate when it comes to selecting the inputs into the model (i.e., what gets filled in for hits, walks, home runs, etc.).

In the article I sort of asserted that this could be done, and while I granted that it might be a more difficult process, I didn’t demonstrate how it could be done. This post will offer an (admittedly simple) estimator using BsR with limited inputs and a lot of estimation. The point is not to develop a metric that anyone will actually use.

I’m going to define plate appearances as AB + W, which can be approximated by IP*2.84 + H + W (it can also of course be calculated from the horribly named BFP column), but I’ll just refer to it as PA in the equations. The BsR equation I’ll be using as a basis is:

A = H + W - HR
B = (2TB - H - 4HR + .05W)*.78 = 1.56TB - .78H - 3.12HR + .039W
C = AB - H
D = HR

We only have direct knowledge of walks. Everything else will have to be filled in using estimation, for which I’ll use the 2010 major league totals. I’m not going to attempt to state any interrelationships between strikeouts, walks, and the events to be estimated--everything will simply be based on a scalar times (PA - W - K), a quantity which I’ll call N (the estimate of N based on IP is IP*2.84 + H - K).

In 2010, the ratio of hits to N was .369; the ratio of homers to N was .04; the ratio of total bases to N was .578; and the ratio of (AB - H - K) to N was .768. Thus:

A =.369N + W - .04N = W + .329N
B = 1.56(.578N) - .78(.369N) - 3.12(.04N) + .039W = .039W + .489N
C = K + .768N
D = .04N
BsR = (W + .329N)(.039W + .489N)/(.039W + .489N + K + .768N) + .04N
= (W + .329N)(.039W + .489N)/(.039W + K + 1.257N) + .04N

To convert to RA, multiply by 9 and divide by (C/2.84), which serves as a rough estimate of innings pitched (ideally, you would separate strikeouts from outs in play for this estimate). This is equivalent to multiplying by 25.56 and dividing by C:

Estimated RA = ((W + .329N)(.039W + .489N)/(.039W + K + 1.257N) + .04N)*25.56/(K + .768N)
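The full chain, from plate appearances, walks, and strikeouts down to estimated RA, can be sketched as a function (the 800 PA/60 W/180 K line below is a hypothetical pitcher, not a real one):

```python
# The limited-input BsR RA estimator above as a function.
# Inputs: plate appearances (AB + W), walks, strikeouts.
def estimated_ra(pa, w, k):
    n = pa - w - k                  # non-K, non-W plate appearances (N)
    a = w + .329*n                  # A factor: estimated baserunners
    b = .039*w + .489*n             # B factor: estimated advancement
    c = k + .768*n                  # C factor: estimated outs
    bsr = a*b/(b + c) + .04*n       # BsR = A*B/(B+C) + D, with D = .04N
    return bsr * 25.56 / c          # 9 * 2.84, per the conversion above

# Hypothetical pitcher: 800 PA, 60 W, 180 K
print(round(estimated_ra(800, 60, 180), 2))
```

As the text notes, with only strikeouts and walks carrying real information, the estimates cluster much more tightly than actual RA does.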

The range for the estimated RA when applied is not as wide as the range for actual RA, which shouldn’t be a surprise since I intentionally took everything except strikeouts and walks out of the equation and didn’t do anything to amplify their value. For example, the top five starters in the AL in 2010 according to this formula were:



Again, the point is not to offer this as an equation that should be used. It’s simply an illustration of constructing a Base Runs equation while restricting the list of available inputs, yet still estimating each component separately. The same idea can be expanded upon (adding home runs to walks and strikeouts, for instance, would result in a standard DIPS-style estimator, and there are many other possible combinations of inputs) to produce a metric that is grounded in the foundation of the Base Runs model. As I mentioned in the previous post, one need not tie oneself to the “dumb” kind of estimation on display here (i.e. assuming that the allowed variables have no ability to improve the prediction of the missing variables).

Sunday, August 07, 2011

JB Shuck, #52

On Friday night, JB Shuck made his major league debut with the Astros. He entered the game as part of a double switch in the top of the fifth, singling in the bottom of the inning and later grounding out to second. On Saturday night, he again entered in a double switch, drawing a walk in the seventh and singling off (literally) John Axford in the ninth. Unfortunately, he also made the last out at third base, trying to advance after Axford’s throw to first ended up in right field.

Shuck is not a top prospect in any respect; he’s a 24 year-old left-handed hitting outfielder who would be stretched in center and doesn’t have any power (career .085 minor league ISO). Were he not in an organization with little talent to begin with that just traded 2/3 of its outfield, he wouldn’t be getting a chance at the majors, and he probably won’t have much of a career. Of course, I would love to be wrong about all that.

Shuck was notable during his OSU career (2006-08) as a two-way player, a type that was always quite rare on Bob Todd-coached teams. Shuck was a very good left-handed pitcher for OSU (as you can probably guess, he didn’t have great stuff, but for a Big Ten left-hander he had plenty) and played left or center as well, often batting third.

However Shuck’s career ends up, he has helped to make this a banner year for Buckeyes in the major leagues. Two of his former teammates, Eric Fryer and Matt Angle, also broke in this year, making it the first season with three OSU debuts since 1969 (Steve Arlin, Chuck Brinkman and Fred Scherman). Three Bucks also debuted in 1961 (Galen Cisco, Johnny Edwards and Ron Nischwitz) and 1927 (Arlie Tarbert, Marty Karow and Russ Miller). To put the size of this crop into perspective, during 2000-2009 only three Buckeyes made the majors in total (Nick Swisher, Josh Newman and Scott Lewis).

Along with Nick Swisher (2004 debut) and Cory Luebke (2010), five OSU products have appeared in the majors in 2011. The last time that many Buckeyes played in the majors was 1974; we have a lot of work left to do to reach the high-water mark of nine in 1969. While the three newbies are not really prospects (Fryer probably has the best outlook by virtue of being a catcher), Swisher is an established quality contributor and Luebke is on his way to establishing himself as one, and the streak of at least one major league Buckeye (which dates to 1990, re-established after a three-year drought from 1987-89) appears to be safe for some time to come.

Saturday, July 23, 2011

Saying Nothing About ERA Estimators

If you follow the sabermetric blog/Twittersphere at all (and if you don’t, why on earth are you wasting time here?), I’m sure you can figure out what prompted this post. However, I’m not going to name the metric that has generated discussion about this general topic because this post is not meant to be targeted at anyone, or to be a debunking of a particular metric, or anything other than me expressing my opinion about the construction of ERA estimators. Others have different philosophies and they are welcome to them. This is mine:

First, I find it helpful to classify the inputs and construction of each metric. This is not necessary, but the reason I find it helpful is that the ERA estimators out there are relatively diverse. Compared to the sabermetric metrics that exist for evaluating offense, they are extremely diverse. Almost all offensive rates are built around an estimate of runs created and divided by either outs or plate appearances. Almost all of them start with the traditional results-based batting line.

ERA estimators, on the other hand, are all over the place. Some follow the lead of their batting cousins and use a run estimator as their base, but some are regression-based. Some use actual results, while some use batted ball data. Some use batted ball data but decide to combine the four standard categories (flyballs, line drives, groundballs, and popups) in some manner. Some assume that the pitcher has no control over what happens once the ball is put into play. Some have implicit or explicit regression built-in with regard to balls in play. Some limit themselves only to what happens when the ball is not put into play. Some estimate ERA, and some estimate total runs allowed.

You personally probably don’t need more than one overall batting metric. That doesn’t mean there shouldn’t be diversity across the sabermetric community--there's nothing wrong with having a number of intelligently designed choices, but as an individual you don’t need both wOBA and True Average--one will suffice. That is not necessarily the case with ERA estimators--sometimes you might be interested in one that is results-based, sometimes you might be interested in DIPS, sometimes you might want to venture out into the uncertain world of batted ball metrics…even when using a common construction (BsR or LW, for example), there is arguably a place for two or three or more different variations based on the inputs.

I believe that the most logical place to start with an ERA estimator is estimating runs. That is intentionally written to sound a little silly but it is not a philosophy shared by all developers of these metrics. Some put formulas down on the page that they would never consider using to try to estimate how many runs a team would score. I say that the place to start is with a logical run estimator. Given the team-level nature of the task, that suggests to me the use of Base Runs or another dynamic estimator, but I’m not going to argue too strenuously if you start with linear weights.

This is a path that is not necessarily going to minimize your RMSE, or give the best correlation with future ERA. With respect to the latter, if your goal is to provide the best possible estimate of future ERA, your metric is not attempting to measure how well the pitcher actually performed; it’s trying to forecast how well he will perform in the future. Certain constructions will by their nature be less accurate at estimating ERA in the same period. Every step you take down the path from outcome inputs (hits, walks, home runs, etc.) to component-based inputs (ignoring the actual outcomes of balls in play, or looking at batted ball types, etc.) will cost you accuracy when the standard is same-period ERA. However, accuracy at predicting same-period ERA can still be used to compare methods of the same class.

Beginning the construction of the metric with a model of run scoring avoids some of the problems inherent in using actual pitcher runs allowed. I’m going to gloss over the fact that the number of runs a pitcher allows, regardless of whether it’s from a base period or a future period, is always dependent upon his defense and other factors outside of his control. There are still other concerns that do not apply when looking at true team-level data. The way runs are charged to individual pitchers is biased towards pitchers who inherit baserunners at the expense of those who bequeath baserunners. In practice, that means favoring relievers at the expense of starters, although depending on the performance of the relievers who inherit baserunners, individual bequeathers might actually benefit.

Thus, whenever a reliever ERA advantage is detected, some of it is attributable to the way runs are assigned and not to the actual effectiveness of the pitchers. It might even be possible to increase the accuracy of a metric by giving a bonus to relievers, but it is entirely unclear to me what benefit this provides other than lowering RMSE. It doesn’t tell you anything about how well the pitchers performed, and it certainly doesn’t help you measure “true talent” any better--if that is the objective, an adjustment in the opposite direction could be warranted.

Another advantage of modeling runs is that you can easily move between RA and ERA. Most sabermetricians prefer RA because of the biases present in ERA and the distortions created by reconstructing imaginary innings sans errors. It’s easy to rescale from RA to ERA by multiplying by a constant like .91. While it’s also easy to divide by .91 to go the other way, if the metric has been tailored to match ERA, you’ve baked the biases of ERA into your metric. This could potentially be most problematic for a regression-based estimator that uses batted ball data. Even if this bias is small, it’s still completely unnecessary.

Finally, the issue of dynamism is one that is often misunderstood with respect to ERA estimators. SIERA trumpets its “interactive” nature in its name (which does distinguish it from FIP and other linear methods) but any metric based on the foundation of a dynamic run estimator is by nature interactive. Instead of the interactivity being limited to target categories, though, every event interacts with every other event. Singles interact with triples, walks interact with home runs, doubles interact with triples, home runs interact with outs, outs interact with themselves...you get the idea (and I think that’s enough talk of events interacting with themselves).

Building your metric around a run estimator does not necessarily restrict you to simply plugging in the numbers in the appropriate place. Suppose you wanted to construct a metric based on batted ball types, strikeouts, and walks. One way to go about it would be to simply go through and estimate singles, doubles, triples, homers, and outs in play based on the percentage of each batted ball type that wind up as each. So, you would end up with equations that might look something like this:

Singles = .057FB + .217GB + .516LD + .017PU

However, if you believe that you have gleaned some other insights into the relationships between events that could improve your metric (such as strikeout pitchers having lower HR/FB rates), you could still build that into your formula for estimated home runs, and plug those into the run estimator. It’s more difficult than running a regression, and a more delicate balancing act (at least in terms of developing the formula), but it allows you to stay grounded in a model that estimates runs by taking a first step of, well, estimating runs.
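A sketch of that construction: the singles coefficients are the ones from the text, while the HR/FB rate and its strikeout adjustment are made-up placeholders standing in for whatever relationship the designer believes in:

```python
# Estimate hit components from batted-ball counts, then feed a run estimator.
# Only the singles coefficients come from the text; every rate in
# estimate_hr is a hypothetical placeholder.
def estimate_singles(fb, gb, ld, pu):
    """Singles from flyballs, groundballs, line drives, popups."""
    return .057*fb + .217*gb + .516*ld + .017*pu

def estimate_hr(fb, k_per_pa, base_hr_per_fb=.10, k_adjust=.2):
    # Hypothetical built-in insight: higher-strikeout pitchers
    # are credited with a lower HR/FB rate.
    return fb * base_hr_per_fb * (1 - k_adjust*k_per_pa)

# Hypothetical batted-ball line
singles = estimate_singles(fb=200, gb=400, ld=150, pu=50)
hr_low_k = estimate_hr(200, k_per_pa=.15)
hr_high_k = estimate_hr(200, k_per_pa=.30)
print(round(singles, 1), round(hr_low_k, 1), round(hr_high_k, 1))
```

The estimated components would then be dropped into the A, B, C, and D factors of a Base Runs equation (or a linear weights formula) in place of the actual totals.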

Again, I want to make it clear that I was attempting to explain where I’m coming from when I examine metrics of this type. There is room for legitimate philosophical differences and I’m not trying to state that sabermetricians who deviate from the way I’d do it are engaging in poor practice. It would certainly be possible to develop a lousy metric based on a run estimator and following some of the other suggestions.