Monday, August 27, 2007

Early NL Series: Batter Evaluation

We have a Runs Created formula, the backbone of just about any offensive evaluation system. But what about the other details, like park adjustments, performance by position, baseline, and conversion to wins?

I’ll deal with these one-by-one. Park adjustments in this time would be a real pain to figure. Parks change rapidly, they may have the same name but be a new structure from year-to-year, teams are jumping in and out of the league (which radically alters the “road” context for any given team from year-to-year), sample size is reduced because of shorter seasons, etc. I suppose that I could look at all of this and say it’s not worth doing the work, and just use Total Baseball’s PFs. But that is not the option I have chosen, because Total Baseball’s park factors are subject to the same problems that the ones I would calculate would be.

So instead I have decided to simply use each team’s actual RPG as the standard to which a hitter is compared. This is problematic in some sense, if your goal is building a performance or ability metric--a player on one team may be valued more highly because his team has a good pitching staff, while another may be hurt if the opposite holds. On the other hand, from a value perspective, as Bill James argued way back in 1985, the results of other games don’t define the value of the player in question--his contributions to wins and losses come only in the context of the games that his team actually plays.

Additionally, there is a huge gorilla in the room in the whole discussion of early major league baseball that I am ignoring, and that is fielding. Fielding is a pain to evaluate in any time, and it is not my specialty in any case, so I am not going to even begin to approach an evaluation of the second baseman in the 1879 NL. So of course my ratings will only cover offense, and while you should take offense-only ratings with some skepticism in today’s game, that is even more true in a game where the ball is put in play about 90% of the time and there are 5 errors/game, etc. But by using the team’s actual RPG figure, we do capture some secondary effects of, if nothing else, the whole team’s fielding skill (fewer runs allowed means a lower baseline RG for players, as well as fewer runs per win). In the last paragraph I slipped in the argument about a “good pitching staff” potentially inflating a batter’s value. But in this game it is even harder to separate pitching from fielding than it is today, and that “good pitching staff” is more likely “good pitching and fielding”, to which our player has contributed at least a tiny bit. This is not an attempt to say that the RPG approach would be better than using stable PFs if they existed, but just a small point in its favor that would be less obvious in 2006.

I am still going to apply a positional adjustment to each player’s expected runs created, based on his primary position. Looking at all hitters 1876-1883, the league had a RG of 5.36. Here are the RG and Adjusted RG (just RG divided by the prevailing RPG, in this case 5.36) for each position:

This actually matches the modern defensive spectrum if you throw out the quirky rightfield happenings. One potential explanatory factor I stumbled upon after writing this came from an article by Jim Foglio entitled “Old Hoss” (about Radbourn) in the SABR National Pastime #25:

“When Radbourn did not start on the mound, he was inserted into right field, like many of his contemporary hurlers. In 19th-century ball, non-injury substitutions were prohibited. The potential relief, or ‘exchange’ pitcher, was almost always placed in right, hence the bad knock that right fielders have received all the way down to the little league game…[the practice] surely had its origins in [pitchers’] arm strength, given the distance of the throws when compared to center and left.”

However, I’m not sure this explains the right field problem satisfactorily, because I did not look at offensive production when actually playing a given position; it was the composite performance of all players who were primarily of a given position. Change pitchers out in RF, if they got in more games as pitchers than as right fielders, would still be considered pitchers for the purpose of the classifications used to generate the data above. Of the men classified by Nemec as the primary RF, in the 1876-1883 NL there are 60 players. Of those 60, 23 (38%) pitched at some point during that season, but only 8 (13%) pitched more than 30 IP.

Another possibility is that if left-handed hitters are more scarce, there are fewer balls hit to right field. This might actually be the most satisfying explanation; I did not actually check the breakdown on left-handed and right-handed batters, but it figures that there were fewer lefties than in the modern game. I do know for a fact that there were very few left-handed pitchers at this time.

Excepting right field, the degree of the positional adjustments is not the same as it is today, but the order is essentially the same. Notably, Bill James has found that at some point around 1930, second and third base jumped to their current positions on the spectrum. But here, third basemen are still creating more runs than second basemen. Of course, the difference is not that large, but it is possible that a later change in the game caused the jump, and then in 1930 it just jumped back to where it had been at the dawn of the majors. On the other hand, it could just be an insignificant margin or a result of a poor RC formula, or what have you. Perhaps the increase in bunting, particularly in the 1890s, made third base a premium defensive position, and the birth of home run ball in the 20s and 30s and the corresponding de-emphasis of the bunt changed that. And of course it is always possible, as is likely the case for RF, that the offensive positional adjustment does not truly reflect the dynamics in play, and is a poor substitute for a comprehensive defensive evaluation, or at the very least a defensive positional adjustment. What I have done is treat all outfielders equally, using the overall outfield PADJ for the WAR estimates below.

Then we have the issue of baseline. I am reluctant to even approach it here, since it always opens up a big can of worms, and most of the ways you can try to set a “replacement level” baseline are subject to selective sampling concerns. But I went ahead and fooled around with some stuff anyway. In modern times, some people like to use the winning percentages of the league’s worst teams as an idea of where the replacement level is. In this period, the average last-place team played .243 ball, while the second-worst was .363 and the average of the two .303. So this gives us some idea of where we might be placing it.

I also took the primary starting player at each position for each team, and looked at the difference in RG between the total, the starters, and the non-starters. I also tossed out all pitchers. All players created 5.37 RG, while the non-pitchers were at 5.55. Starting non-pitchers put up 5.78, while the non-starters were 3.68. Comparing the non-starters to all non-pitchers, we get an OW% of .305 (while I hate OW%, it is a common standard for this type of discussion). This is very close to the .303 W% of the two worst teams in the league (yes, this could be due to coincidence, and yes, I acknowledge selective sampling problems in the study). And so I decided to set the offensive replacement level at .300.

In our day I go along with the crowd and use .350--despite the fact that I think various factors (chaining and selective sampling chief among them) make that baseline too low. Here I’ve gone even lower, based on the same kind of faulty study. Why? Well, this is subjective as all get out, but in this time, in which there is no minor league system to speak of, when you have to send a telegraph to communicate between Boston and Chicago, when some teams come from relatively small towns by today’s major league standards, and when the weight of managerial decision-making was almost certainly heavier on fielding than it is today, I think that a lower baseline makes sense. If Cleveland loses one of its players, it doesn’t have a farm club in Buffalo it can call and get a replacement from. They may very well just find a good local semi-pro/amateur/minor league player and thrust him into the lineup.

On the other hand (I’m using that phrase a lot in the murky areas we are treading in), the National League, while clearly the nation’s top league, is not considered the only sun in the baseball solar system as it is now. Many historians believe that the gap between a “minor” league and the National League was much, much smaller than it would be even in the time of Jack Dunn’s clubs in Baltimore and a strong and fairly autonomous Pacific Coast League. So perhaps a .300 NL player is really just not a very good ballplayer at all, and a team stuck with one could easily find a better one, if only they were willing to pay him or find him somewhere.

It would take a better mathematical and historical mind than I have to sort this all out. I am going to use .300, or 65% of the league RG, as the replacement level for this time.
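As a rough check on the figures above, the sketch below converts a ratio of runs created per game into an offensive winning percentage. It assumes the plain Pythagorean form with an exponent of 2; that exponent is an assumption on my part rather than something stated here, although it does reproduce the .305 figure. The function name is only illustrative.

```python
def offensive_win_pct(player_rg, baseline_rg, exponent=2.0):
    """Pythagorean OW%: W% of a team that scores player_rg and allows baseline_rg.

    The exponent of 2 is an assumption, not something specified in the text.
    """
    ratio = (player_rg / baseline_rg) ** exponent
    return ratio / (1.0 + ratio)


# Non-starters (3.68 RG) against all non-pitchers (5.55 RG), as quoted above.
bench_ow = offensive_win_pct(3.68, 5.55)

# A hitter at 65% of the league RG, the replacement level adopted above.
repl_ow = offensive_win_pct(0.65, 1.0)

print(f"non-starters vs. non-pitchers: {bench_ow:.3f}")
print(f"65% of league RG:              {repl_ow:.3f}")
```

Under that assumption, 65% of the league RG comes out just under .300, which is why the two descriptions of the baseline are treated as interchangeable.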

So now the only issue remaining in the evaluation of batters is how we convert their runs to wins.

In 1876, Boston’s RPG (runs scored and allowed per game) was 13.16. Pythagenpat tells us that the corresponding RPW is 12.47. In the same league, Louisville’s RPG was 9.04, for 9.55 RPW. This is a sizeable difference, one that we must account for. A Boston player's runs, even compared to a baseline, are not as valuable as those of a Louisville player.

However, my method to account for this will not be as involved as the Pythagenpat. I will simply set RPW = RPG. This has been proposed, at least for simple situations, by David Smyth in the past. And as Ralph Caola has shown, this is the implicit result of a Pythagorean with exponent 2. That is not to say that it is right, but when we have all of the imprecision floating around already, I prefer to keep it simple. Also, it doesn’t make that much of a difference. The biggest difference between the two figures is .37 WAR, and there are only twelve seasons for which it makes a difference of .20. For most, it is almost as negligible of a difference as an extra run created.
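To make the comparison concrete, here is a small sketch of the two runs-per-win conventions. Differentiating the Pythagenpat win estimate at a point where runs scored equal runs allowed gives RPW = 2 * RPG^(1 - z), where z is the Pythagenpat exponent parameter. The value z = .29 is my assumption; it is not stated above, but it does reproduce the 12.47 and 9.55 figures quoted for Boston and Louisville.

```python
def pythagenpat_rpw(rpg, z=0.29):
    """Runs per win implied by Pythagenpat for an average team.

    z = 0.29 is an assumed exponent parameter (Pythagenpat exponent = RPG ** z).
    """
    return 2.0 * rpg ** (1.0 - z)


def simple_rpw(rpg):
    """The shortcut adopted in this series: runs per win equals RPG."""
    return rpg


for team, rpg in [("Boston", 13.16), ("Louisville", 9.04)]:
    print(f"{team}: RPG {rpg:.2f}, "
          f"Pythagenpat RPW {pythagenpat_rpw(rpg):.2f}, "
          f"simple RPW {simple_rpw(rpg):.2f}")
```

The shortcut overstates RPW for high-scoring teams and understates it for low-scoring ones, but the gap is small relative to a player's total runs above replacement.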

Let me at this point walk you, from start to finish, through a batter calculation, so that everything is explained. Let’s look at Everett Mills, 1876 Hartford, who ends up as the #3 first baseman in the league that season.

Mills’ basic stats are 254 AB, 66 H, 8 D, 1 T, 0 HR, 1 W, and 3 K. First, we estimate the number of times he reached on an error. In 1876, the average was .1531 estimated ROE per AB-H-K (as given in the last installment). So we give Mills credit for .1531*(254-66-3) = 28.3 errors. This allows us to figure his outs as AB-H-E, or 254-66-28.3 = 159.7.

Then we plug his stats into the Runs Created formula as given in the last installment:
RC = .588S + .886D + 1.178T + 1.417HR + .422W + .606E - .147O
= .588(66-8-1-0) + .886(8) + 1.178(1) + 1.417(0) + .422(1) + .606(28.3) - .147(159.7) = 35.9 RC.
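The steps so far (estimated errors, outs, and Runs Created) can be sketched in a few lines. The coefficients and the 1876 error rate are the ones given above; the function and variable names are mine and purely illustrative.

```python
# 1876 league rate of estimated reached-on-error per (AB - H - K), from the text.
ROE_RATE_1876 = 0.1531


def estimate_roe(ab, h, k, rate=ROE_RATE_1876):
    """Estimated times reached on error: league rate times outs in play."""
    return rate * (ab - h - k)


def runs_created(ab, h, d, t, hr, w, k, rate=ROE_RATE_1876):
    """Return (RC, estimated ROE, estimated outs) using the linear formula above."""
    e = estimate_roe(ab, h, k, rate)
    outs = ab - h - e
    s = h - d - t - hr  # singles
    rc = (0.588 * s + 0.886 * d + 1.178 * t + 1.417 * hr
          + 0.422 * w + 0.606 * e - 0.147 * outs)
    return rc, e, outs


# Everett Mills, 1876 Hartford.
rc, e, outs = runs_created(ab=254, h=66, d=8, t=1, hr=0, w=1, k=3)
print(f"ROE {e:.1f}, outs {outs:.1f}, RC {rc:.1f}")
```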

Next we calculate Runs Created per Game, which I abbreviate as RG. The formula is RG = RC*OG/O, where OG is the league average of estimated outs per game, and O is the estimated outs for Mills (159.7 as seen above). In 1876, there are 24.22 OG, so Mills’ RG is 35.9*24.22/159.7 = 5.44.

Now we are going to calculate his Runs Above Replacement, specifically versus a replacement first baseman, who we assume creates runs at 121% of the prevailing contextual average. What is that contextual average? Well, it is half the RPG of Mills’ team. Hartford scored 429 and allowed 261 runs in 69 games, so their RPG is (429+261)/69 = 10. Half of that is 5.

So we expect an average player in Mills’ situation to create 5 runs. But an average first baseman will create 21% more than that, or 1.21*5 = 6.05. So we rate Mills as a below average hitter for his position. But remember that we assume a replacement player will hit at 65% of that, which is .65*6.05 = 3.93. So Mills will definitely have value above replacement; in fact 5.44-3.93 = 1.51 runs per game above replacement.

And Mills made 159.7 outs, which is equivalent to 159.7/24.22 = 6.59 games, so Mills is 1.51*6.59 = 9.95 runs better than a replacement level first baseman. If you want that all as a drawn out formula:
RAR = (RG - (RPG/2)*PADJ*.65)*O/OG
= (5.44 - (10/2)*1.21*.65)*159.7/24.22 = 9.95

Now we just need to convert to Wins Above Replacement. WAR = RAR/RPG. So 9.95/10 = 1.00 WAR.
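Here is the same chain from RC to WAR as a short sketch, starting from the rounded RC and outs figures for Mills. The constant names are mine. Because the code carries the unrounded RG through, it lands a hair above the 9.95 shown above (roughly 9.97), which still rounds to 1.00 WAR.

```python
OG_1876 = 24.22      # league estimated outs per game, from the text
PADJ_1B = 1.21       # first-base positional adjustment
REPLACEMENT = 0.65   # replacement level as a share of the positional average

rc, outs = 35.9, 159.7        # Mills' rounded RC and estimated outs
team_rpg = (429 + 261) / 69   # Hartford runs scored plus allowed, per game

rg = rc * OG_1876 / outs                            # runs created per game
baseline = (team_rpg / 2) * PADJ_1B * REPLACEMENT   # replacement 1B, runs per game
rar = (rg - baseline) * outs / OG_1876              # runs above replacement
war = rar / team_rpg                                # RPW is set equal to RPG

print(f"RG {rg:.2f}, baseline {baseline:.2f}, RAR {rar:.2f}, WAR {war:.2f}")
```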

So our estimate is that Everett Mills was one win better than a replacement level first baseman in 1876. At times, when I get into looking at the player results, I may discuss WAR without the positional adjustment or Wins Above Average, with or without the positional adjustment. If there’s no PADJ, just leave it out of the formula or use 1. If it’s WAA, leave out the .65.

Also, I may refer to Adjusted RG, which will just be 200*RG/RPG.
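The variants mentioned above differ only in which multipliers are applied to the baseline, so one function with two optional parameters covers all of them. This is a sketch with names of my own choosing; the Mills inputs are the rounded figures from the walkthrough.

```python
def wins_above(rg, team_rpg, outs, og, padj=1.0, baseline_share=1.0):
    """Wins above a baseline of (team_rpg / 2) * padj * baseline_share runs per game.

    padj=1.0 drops the positional adjustment; baseline_share=1.0 gives WAA,
    while baseline_share=0.65 gives WAR.
    """
    baseline = (team_rpg / 2) * padj * baseline_share
    return (rg - baseline) * (outs / og) / team_rpg


def adjusted_rg(rg, team_rpg):
    """Adjusted RG: RG relative to half the team RPG, on a scale where 100 is average."""
    return 200 * rg / team_rpg


mills = dict(rg=5.44, team_rpg=10.0, outs=159.7, og=24.22)

war_pos = wins_above(**mills, padj=1.21, baseline_share=0.65)  # WAR with PADJ
war_raw = wins_above(**mills, baseline_share=0.65)             # WAR without PADJ
waa_pos = wins_above(**mills, padj=1.21)                       # WAA with PADJ
waa_raw = wins_above(**mills)                                  # WAA without PADJ
arg = adjusted_rg(5.44, 10.0)

print(f"WAR {war_pos:.2f} (no PADJ {war_raw:.2f}), "
      f"WAA {waa_pos:.2f} (no PADJ {waa_raw:.2f}), ARG {arg:.0f}")
```

With the first-base adjustment Mills comes out below average, and without it above average, which is the tension discussed in the side note that follows.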

One little side note about Mr. Mills. You may have noticed that I said he is the third-best first baseman in 1876 in WAR, but is in fact rated as below average. Granted, there are only eight teams in the league, but we would still expect the third-best first baseman to be above average. In fact, only two of the 1876 first basemen have positive WAA, and the total across the eight starters plus two backups who happen to be classified as first basemen (with a combined total of just 29 PA) is -5.1. Apparently, these guys hit nowhere near 121% of the league average. In fact, they had a 6.03 RG v. 5.90 for the league, only a 102 ARG. Whether this is just an aberration or whether first basemen as a group had not developed their hitting prowess yet, I have not done enough checking to hazard a guess. It is clear though that by the 1880s, with the ABC trio (Anson, Brouthers, and Connor), first base was where the big boppers played.

During one of my earlier abortive attempts to analyze National Association stats, I found that in 1871 at least, first base was one of the least productive positions on the diamond. Did first base shift position on the spectrum, or are we dealing with random fluctuations or the inherent flaws in offensive positional adjustment? Historians are probably best equipped to address this conundrum.

Monday, August 20, 2007

The Audacity of OPS

I have intended for some time to write a post or a series of posts discussing all of the various means of combining OBA and SLG into a more complete stat that are floating around out there. This is not that piece; I have no intentions of discussing all the various OBA/SLG formulas or discussing the technical implications of the ones I do discuss. This is more of a rant, to get this off of my chest.

The title is a little misleading; OPS isn’t really audacious, although perhaps some of its supporters do fall into “recklessness”, which is one definition my dictionary gives. I was just too impressed with myself for coming up with the clever phrase (do you have to be a political junkie to get it? More likely, it’s bad and even if you do get it, it’s more likely to elicit a groan than a chuckle).

I should also make it clear before I start: OPS is not a bad statistic. It certainly beats the pants off of looking at the triple crown stats, and there is absolutely nothing wrong with using OPS for quick comparisons or for studies involving large groups of players, or any such thing. But I do believe it is important to keep in mind what OPS is and is not--it is a decent, quick way to evaluate a hitter. But it is not a stat denoted in any sort of useful unit or estimated unit; it is not a stat that was constructed based on a theory about how runs are scored; and it is not a stat that you should go out of your way to use if you have other alternatives available.

First, let’s talk about the components of OPS themselves. On Base Average is a fundamental baseball measurement, because it is essentially the rate of avoiding batting outs. You don’t need me to explain to you why avoiding outs is so important, and that is the point. OBA is a statistic that we would want to invent if we did not already happen to have it sitting around.

Slugging Average is not a fundamental baseball measurement. SLG may be fairly intuitive, and it certainly is venerable, but it is not something that obviously is an important measurement to have on its own. After all, slugging average doesn’t really measure power, because it includes singles. So then what does it measure? It is bases gained on hits by the batter per at bat. But what is the greater significance of bases gained by the batter per at bat?

It really has none. Certainly it is good for batters to gain bases on hits; but that, in and of itself, is not a meaningful measurement. You can even look at the game in such a way that the goal is to gain bases--but in that case, the goal is not for the batter to gain bases, it is for the team to gain bases. And a team doesn’t gain one base for each single on average, nor four bases for each homer, nor do the ratios between one base for a single, two for a double, etc. hold when talking about the bases gained by the team.

The point is not that Slugging Average is meaningless or stupid; the point is that it just is. It is one way of attempting to quantify the value of hits other than counting them all equally as batting average does. It is a crude way of doing so, but it does have a fairly strong correlation with runs and it is a nice thing to know.

But if we didn’t already have SLG, would somebody have to invent it? I think not. There are ways to use the same inputs that would be more useful and would better reflect the run creation value of those inputs.

My message in all of that is that it is not as if OBA and SLG are both (again, my claim is that OBA is, SLG is not) obvious, fundamental things that you would want to know about a team or player’s hitting. They just happen to be two statistics that are the most telling of the widely-available stats.

And it just so happens that when you add them together, you get a measure that is very highly correlated with runs scored, easy to explain computationally, and widely accessible due to the proliferation of OBA and SLG data.

But if you did not have OBA and SLG available to you, would you think of going about creating them so that you could add them together into some uberstat? I would certainly hope not. And how simple is OPS, really? It is simple to compute, in a way, and it is simple to explain, but is it simple to explain why those two things should be added together other than that “it works”? If somebody asks you, “How do you know that just adding them together weights them properly?”, how do you respond?

I said above that OPS is simple to compute, in a way. What I meant by this is that OPS is simple to compute, if you already have OBA and SLG computed for you. Then it is just a simple addition. What if somebody has not already computed them for you? Well now you have (H+W)/(AB+W) + TB/AB, which is not nearly that simple, and not a whole lot simpler than (TB+.8H+W-.3AB)*.32/(AB-H), which gives you a much better rate stat (runs created per out). I guess it can be said that it avoids multiplication--if somebody has already figured total bases for you. If you have to do that yourself, now you have (H+W)/(AB+W) + (H+D+2T+3HR)/AB, which is not that much simpler than (1.8H+D+2T+3HR+W-.3AB)*.32/(AB-H). I guess it can be said that it only includes whole coefficients, if you want to argue on its behalf.
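To put the two calculations side by side, here is a sketch that computes OPS and the runs-per-out formula from raw counting stats, and confirms that the total-bases and spelled-out versions of the latter agree. The batting line is hypothetical, invented only for illustration, and the function names are mine.

```python
def ops(ab, h, d, t, hr, w):
    """OPS from counting stats, using the simple (H+W)/(AB+W) form of OBA."""
    tb = h + d + 2 * t + 3 * hr
    return (h + w) / (ab + w) + tb / ab


def runs_per_out_tb(ab, h, tb, w):
    """Runs created per out, written with total bases."""
    return (tb + 0.8 * h + w - 0.3 * ab) * 0.32 / (ab - h)


def runs_per_out_full(ab, h, d, t, hr, w):
    """The same estimate with total bases expanded into its components."""
    return (1.8 * h + d + 2 * t + 3 * hr + w - 0.3 * ab) * 0.32 / (ab - h)


# Hypothetical batting line (not a real player).
line = dict(ab=550, h=160, d=30, t=4, hr=20, w=60)
tb = line["h"] + line["d"] + 2 * line["t"] + 3 * line["hr"]

ops_value = ops(**line)
rpo_tb = runs_per_out_tb(line["ab"], line["h"], tb, line["w"])
rpo_full = runs_per_out_full(**line)

print(f"OPS {ops_value:.3f}, runs per out {rpo_tb:.3f}")
```

Neither function is meaningfully harder to write than the other, which is the point being made about the supposed simplicity of OPS.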

What if we think about what OPS looks like if you write it with a common denominator? Now we have:
OPS = ((H+W)*AB + TB*(AB+W))/(AB*(AB+W))

Not so simple anymore, and can anyone possibly explain the logic behind multiplying those things together like that, other than that “it works”?
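For anyone who wants to confirm that the common-denominator form really is the same number, a quick check with exact fractions will do it. The batting line here is hypothetical and the function names are my own.

```python
from fractions import Fraction


def ops_as_sum(ab, h, tb, w):
    """OPS as OBA + SLG, kept as an exact fraction."""
    return Fraction(h + w, ab + w) + Fraction(tb, ab)


def ops_common_denominator(ab, h, tb, w):
    """OPS written over the single denominator AB * (AB + W)."""
    return Fraction((h + w) * ab + tb * (ab + w), ab * (ab + w))


# Hypothetical batting line (not a real player).
ab, h, tb, w = 550, 160, 258, 60

same = ops_as_sum(ab, h, tb, w) == ops_common_denominator(ab, h, tb, w)
print(same, float(ops_as_sum(ab, h, tb, w)))
```

Using `Fraction` rather than floats keeps the comparison exact, so the equality is not an artifact of rounding.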

Then there is the matter of OPS+. Some people are really shocked to learn that OPS+ is calculated as OBA/LgOBA + SLG/LgSLG - 1. “This is not a true relative OPS!”, they exclaim. “It doesn’t really mean that a 120 OPS+ batter was 20% better in OPS than a 100 OPS+ batter!” While these statements are true, and there is a legitimate complaint to be lodged about the naming of the statistic, the horror at the sacred construction of OPS being violated is somewhat audacious.

When somebody tells you that a stat is adjusted, or has a “+” suffix on the end of it, you expect it to be the ratio of the player’s stat to the league average, perhaps with a park adjustment thrown in. You don’t expect it to be a similar but different statistic. So the measurement that is labeled OPS+ does mislead. Give Pete Palmer a slap on the wrist for this, and move on.

Then when you do move on, give Pete Palmer a pat on the back. Why? Simply for the fact that the measure he has given you is more telling than the measure that you were expecting. I’m not going to get into the math here, since I plan on covering that in my later series, and I will ask you to take this on faith (at least with regards to what is presented here; this is not new information and it has been shown by other sabermetricians in other places). Let’s call OPS/LgOPS “SOPS+” for “straight OPS” plus. People think that OPS+ is SOPS+, and it is not.

The real effect of OPS+, other than adjusting for the league average, is to give more weight to the OBA portion of OPS. Not quite enough weight, but around 1.2 times as much as it is given under OPS or SOPS+. The other thing it does, in addition to correlating better with run scoring, is to express itself in a meaningful estimated baseball unit. OPS+ can be viewed as an approximation, an estimate, of runs per out relative to the league average, which is what you really want to know (or at least is a lot closer to what you really want to know than the ratio of OPS to lgOPS is). Since OPS is unitless, SOPS+ is unitless as well. You can of course use SOPS+ to approximate relative runs/out as well. However, in order to do it, you have to take two times SOPS+, minus one.
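A small sketch may help show how the two measures diverge. The league and player figures below are hypothetical, and the function names are mine. Factoring 1/LgSLG out of the OPS+ formula shows that OBA is implicitly multiplied by LgSLG/LgOBA relative to SLG, which appears to be where a figure of around 1.2 comes from.

```python
def ops_plus(oba, slg, lg_oba, lg_slg):
    """OPS+ as defined above, on the usual scale where 100 is league average."""
    return 100 * (oba / lg_oba + slg / lg_slg - 1)


def sops_plus(oba, slg, lg_oba, lg_slg):
    """'Straight' relative OPS: OPS divided by league OPS, times 100."""
    return 100 * (oba + slg) / (lg_oba + lg_slg)


def relative_runs_per_out_from_sops(sops):
    """Two times SOPS+ minus one, expressed on the 100 scale."""
    return 2 * sops - 100


# Hypothetical league and player (not real data).
lg_oba, lg_slg = 0.330, 0.400
oba, slg = 0.396, 0.500

op = ops_plus(oba, slg, lg_oba, lg_slg)
sp = sops_plus(oba, slg, lg_oba, lg_slg)
approx = relative_runs_per_out_from_sops(sp)
oba_weight = lg_slg / lg_oba  # implied weight on OBA relative to SLG in OPS+

print(f"OPS+ {op:.0f}, SOPS+ {sp:.0f}, 2*SOPS+ - 100 = {approx:.0f}, "
      f"OBA weight {oba_weight:.2f}")
```

For this made-up hitter, SOPS+ says "23% better in OPS" while OPS+ and the doubled SOPS+ both land in the mid-140s, which is much closer to a statement about runs per out.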

So when people complain that OPS+ distorts the ratio between players’ OPS, they are right. But this distortion is a good thing, since it puts it in terms of a meaningful standard instead of a ratio of a contrived, not theoretically-based statistic (OPS). SOPS+ wouldn’t tell these folks what they think it would. A 120 SOPS+ hitter would be 20% above the league average in OPS. That does not in any way, shape, or form mean anything other than that. It does not mean that they created 20% more runs per plate appearance than an average player. It does not mean that they created 20% more runs per out than an average player. It does not mean that they were 20% better than an average player. It does not mean that they are 20% more talented than an average player.

To me, it is a parody of sabermetrics when people complain about OPS+ not being SOPS+, for any reason other than the confusion caused by its name. We sabermetricians have used OPS, and now we will complain about something that is no longer pure OPS, even though it is a more meaningful statistic with clearer units that correlates better with winning baseball games. What is inherently superior about adding OBA to SLG and then comparing to the league average versus comparing OBA to the average, SLG to the average, and adding the results? Considering that OPS doesn’t have any units to begin with and doesn’t correlate better with runs scored, nothing that I can see.

I apologize for the rambling nature of this, but I warned you when I started that it was a rant. OPS is a fine, quick way to measure a hitter. That does not mean that its units are meaningful, that does not mean that it has meaningful units when it is divided by the league average, or that it is a statistic that has any inherent logic behind it other than adding together two things because it works, or that another metric that combines OBA and SLG in a different way is necessarily inferior or incorrect. As long as you keep those things in mind, there’s not really anything audacious about OPS.

Monday, August 13, 2007

Early NL Series: Intro and Run Estimation

My major interest in baseball research is theoretical sabermetrics. The “theoretical” label sounds a bit arrogant, but what I mean is that I am interested particularly in questions of what would happen at extremes that do not occur with the usual seasonal Major League data that many people analyze (for instance, RC works fine for normal teams, and so does 10 runs = 1 win as a rule of thumb. You don’t really need BsR or Pythagenpat for those types of situations--they can help sharpen your analysis, but you won’t go too far off track without them.) Thus my interest in run and win estimation at the extremes, as well as evaluation of extreme batters (yes, I still have about five installments in the Rate Stat series to write, and yes, I will get around to it, but when, I don’t know). Secondary to that is using sabermetrics to increase my understanding of the baseball world around me (for example, how valuable is Chipper Jones? What are the odds that the Tigers win the World Series? Who got the better of the Brewers/Rangers trade?). I don't do this a whole lot here because there are dozens and dozens of people who do that kind of stuff, and I wouldn't be able to add any insight. But a close third is using sabermetrics to evaluate the players and teams of the past. Particularly, I am interested in applying sabermetric analysis to the earliest days of what we now call major league baseball.

A few years ago, and again recently, I turned my attention to the National Association, the first loose major league of openly professional players that operated from 1871-1875. However, this league, as anyone who has attempted to statistically analyze it will know, was a mess. Teams played 40 games in a season; some dropped out after 10, some were horrifically bad, Boston dominated the league, etc. All of these factors make it difficult to develop the kind of sabermetric tools (run estimators, win estimators, baselines) that we use in present day analysis. So I finally threw my hands up and gave up (Dan Rosenheck came up with a BsR formula that worked better for the NA than anything I did, but there are limitations of the data that are hard to overcome). For now, it is probably best to eyeball the stats of NA players and teams and use common sense, as opposed to attempting to apply rigorous analytical structures to them.

Anyway, when things start to settle down, you have the National League, founded in 1876. I should note at this point that while I am interested in nineteenth-century baseball, I am by no means an expert on it, and so you should not be too surprised if I butcher the facts or make faulty assumptions, or call Cap Anson “Cap Anderson”. If you want a great historical presentation of old-time baseball, the best place to go is David Nemec’s The Great Encyclopedia of Nineteenth Century Major League Baseball. I believe that a revised edition of this book has been published recently, but I have the first edition. It is really a great book, similar in format to my favorite of the 20th century baseball encyclopedias, The Sports Encyclopedia: Baseball (or Neft/Cohen if you prefer). Like that work, only basic statistics are presented (no OPS+ or Pitching Runs, etc.), but you get the complete roster of each team each year, games by position, etc. And just like Neft/Cohen, there is a text summary of every season’s major stories, although Nemec writes these over the course of four or five pages, with pictures and trivial anecdotes, as opposed to the several paragraphs in the Neft/Cohen book. I wholeheartedly recommend the Nemec encyclopedia to anybody interested in the 19th century game.

That digression aside, the 1876 National League is still a different world than what we have today. The season is 60 games long, one team goes 9-56, pitchers are throwing from a box 45 feet away from the plate, it takes a zillion balls to draw a walk, overhand pitching is illegal, etc. But thankfully, you can make some sense of the statistics of this league, and while our tools don’t work as well, due to the competitive imbalance, the lack of important data that we have for later seasons, the smaller sample sizes as a result of a shorter season, etc., they can work to a level of precision that makes me comfortable to present their findings, with repeated caveats about how inaccurate they are compared to similar tools today. For the National Association, I could never reach that level of confidence.

What I intend to do over the course of this series is to look at the National League each season from 1876-1881. I chose 1881 for a couple reasons, the first being that during those six seasons the NL had no other contenders to “major league” status (although many historians believe that other teams in other leagues would have been competitive with them--it's not like taking today’s Los Angeles Dodgers against the Vero Beach Dodgers). Also, in Bill James’ Historical Data Group Runs Created formulas, 1876-1881 is covered under one period (although 1882 and 1883 are included as well). That James found that he could put these seasons under one RC umbrella led me to believe that the same could be done for BsR and a LW method as well. I will begin by looking at the runs created methodology here.

Run estimation is a little tricky as you go back in time. Unfortunately, there is no play-by-play database that we can use to determine empirical linear weights, and some important data is missing (SB and CS particularly). The biggest missing piece of the offensive puzzle though is reached base on error, which for simplicity’s sake I will just refer to as errors from here on. In the 1880 NL, for instance, the fielding average was .901, and there were 8.67 fielding errors per game (for both teams). One hundred years later, the figures were .978 and 1.74. So you have something like five times as many errors being made as you do in the modern game.

When looking at modern statistics, you can ignore the error from an offensive perspective pretty safely. It will undoubtedly improve the accuracy of your run estimator if you can include it, but only very slightly, and the data is not widely available so we just ignore it, as we sometimes ignore sacrifice hits and hit batters and other minor events. But when there are as many errors as there were in the 1870s, you can’t ignore that. If you use a modern formula like ERP, and find the necessary multiplier, you will automatically inflate the value of all of the other events, because there has to be compensation somewhere for all of the runs being created as a result of errors.

So far as I know, there is only one published run estimator for this period. Bill James’ HDG-1 formula covers 1876-1883, and is figured as:
RC = (H + W)*(TB*1.2 + W*.26 + (AB-K)*.116)/(AB + W)
Bill decided to leave base runners as the modern estimate of H+W, and then try to account somewhat for errors by giving all balls in play extra advancement value. If you use the total offensive stats of the period to find the implicit linear weights, this is what you get:
RC = .730S + 1.066D + 1.402T + 1.739HR + .434W - .1081(AB - H - K) - .1406K
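For reference, here are the two formulas above written as functions. The stat line is Everett Mills' 1876 line as quoted in the batter evaluation installment; applying league-level weights to one batter is only meant as an illustration of the arithmetic, and the function names are mine.

```python
def hdg1_rc(ab, h, tb, w, k):
    """Bill James' HDG-1 Runs Created, as given above."""
    return (h + w) * (tb * 1.2 + w * 0.26 + (ab - k) * 0.116) / (ab + w)


def hdg1_linear_rc(ab, h, d, t, hr, w, k):
    """The implicit linear weights of HDG-1 at the period's overall totals."""
    s = h - d - t - hr  # singles
    return (0.730 * s + 1.066 * d + 1.402 * t + 1.739 * hr + 0.434 * w
            - 0.1081 * (ab - h - k) - 0.1406 * k)


# Everett Mills, 1876 Hartford.
ab, h, d, t, hr, w, k = 254, 66, 8, 1, 0, 1, 3
tb = h + d + 2 * t + 3 * hr

multiplicative = hdg1_rc(ab, h, tb, w, k)
linear = hdg1_linear_rc(ab, h, d, t, hr, w, k)

print(f"HDG-1 RC {multiplicative:.1f}, linearized RC {linear:.1f}")
```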

As you can see, the value of each event is inflated against our modern expectation of what they should be. I should note here that, of course, we don’t expect the 1870s weights to be the same as or even that similar to the modern weights. The coefficients do and should change as the game changes. That said, though, we have to be suspicious of a homer being valued at 1.74 runs and a triple at 1.40. The home run has a fairly constant value and it would take a very extreme context to lift its value so high. Scoring is high in this period (5.4 runs/game), but a lot of that logically has to be due to the extra errors. Three and a half extra errors per team game is like adding another 3.5 hits--it's going to be a factor in increased scoring.

To test RMSE for run estimators, I figured the error per (AB - H). I did this because I did not want the ever-changing schedule length to unduly affect the RMSE. Of course, this does introduce the potential for problems because AB-H is a much worse proxy for outs in this period than it is today, as I will discuss shortly. I then multiplied the per out figure by 2153 (the average number of AB-H for a team in the 1876-1883 NL). In any case, doing this versus just taking the straight RMSE against actual runs scored did not make a big difference. Bill’s formula came in at 35.12 while the linearization was 30.65.
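One way to read the RMSE procedure just described, as a rough Python sketch. It assumes each team's error is divided by that team's own AB-H and then scaled by 2153 before squaring; the function name and the tuple layout are hypothetical.

```python
import math

AVG_BATTING_OUTS = 2153  # average team AB-H in the 1876-1883 NL, per the text


def scaled_rmse(teams, scale=AVG_BATTING_OUTS):
    """teams: list of (actual_runs, estimated_runs, ab, h) tuples."""
    squared = [((runs - est) / (ab - h) * scale) ** 2
               for runs, est, ab, h in teams]
    return math.sqrt(sum(squared) / len(squared))
```

Scaling every team to the same number of batting outs keeps a 60-game season and an 98-game season from being weighted by schedule length alone.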

Of course what I wanted to do was figure out a Base Runs formula that worked for this period, as BsR is the most flexible and theoretically sound run estimator out there. What I decided to do was use Tango Tiger’s full modern formula and attempt to estimate some data that was missing and throw out other categories that would be much more difficult to estimate. I wound up estimating errors, sacrifice hits, wild pitches, and passed balls but throwing out steals, CS, intentional walks, hit batters, etc. Some of those events were subject to constantly changing rules and strategy (stolen bases and sacrifices were not initially a big part of the professional game) or didn’t even yet exist (Did teams issue intentional walks when it took 8 balls to give the batter first base? I am not a historian, but I doubt it. Hit batters did not result in a free pass until 1887 in the NL). In the end, I came up with these estimates:

ERRORS: In modern baseball, approximately 65% of all errors result in a reached base on error for the offense. I (potentially dubiously) assumed that a similar percentage held in the 1870s, and used 70%. Then I simply figured x as 70% of the league fielding errors, per out in play (AB-H-K). x was allowed to be a different value for each season. Some may object to this as it homes in too much on the individual year and I certainly can understand such a position. However, the error rates were fluctuating during this period. In 1876 the league FA was .866; in 1877 it was up to .884; then .893, .892, .901, .905, .897, and .891. These differences are big enough to suggest that fundamental changes in the game may have been occurring from year-to-year.
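The error estimate can be sketched in a few lines of Python. The 70% share and the AB-H-K denominator come from the paragraph above; the function names and any totals you feed in are hypothetical.

```python
ROE_SHARE = 0.70  # assumed share of fielding errors that put the batter on base


def roe_rate(league_errors, league_ab, league_h, league_k, share=ROE_SHARE):
    """x: estimated reached-on-error per out in play for one league season."""
    return share * league_errors / (league_ab - league_h - league_k)


def estimated_roe(ab, h, k, x):
    """Estimated times reached on error for a team or player, E = x(AB-H-K)."""
    return x * (ab - h - k)
```

Because x is a league-wide rate, every batter is treated as equally likely to reach on an error once the ball is in play.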

James’ method had no such yearly correction, and if you force the BsR formula I will present later to use a constant x value of .134 (i.e. 13.4% of outs in play resulted in ROE), its RMSE will actually be around a run and a half higher than that of the linearization of RC. I still think that there are plenty of good reasons to use the BsR formula instead, but in the interests of intellectual honesty, I did not want to omit that fact.

It is entirely possible that a better estimate for errors could be found; there is no reason to assume that every batter is equally likely to reach on an error once they’ve made an out in play. In fact, I am sure that some smart mind could come along and come up with better estimates than I have in a number of different areas, and blow my formula right out of the water. I welcome further inquiry into this by others and look forward to my formula being annihilated. So don’t take any of this as a finished product or some kind of divine truth (not that you should with my other work either).

SACRIFICES: The first league to record sacrifices, so far as I can tell, was the American Association in 1883 and 1884. In those leagues, there was .0323 and .0327 SH per single, walk, and estimated ROE. So I assumed SH = .0325*(S + W + E) would be an acceptable estimate in the early NL. NOTE: Wow, did I screw the pooch on this one. The AA DID NOT track sacrifices in '83 and '84. I somehow misread the HB column as SH. We do not have SH data until 1895 in the NL. So the discussion that follows is of questionable accuracy.

I did this some time ago without thinking it through completely; in early baseball, innovations were still coming quickly, and it is possible that in the seven year interval, the sacrifice frequency changed wildly. George Wright recalled in 1915 (quoted in Bill James’ New Historical Baseball Abstract, pg. 10): “Batting was not done as scientifically in those days as now. The sacrifice hit was unthought of and the catcher was not required to have as good a throwing arm because no one had discovered the value of the stolen base.”

On the other hand, 1883 is pretty close to the end of our period, so while the frequency may well have increased over time, the estimate should at least be pretty good near the end of the line. One could also quibble with the choice of estimating sacrifices as a percentage of times on first base when, if sacrifices are not recorded, they are in actuality a subset of AB-H-K. Maybe an estimate based both on times on first and outs in play would work best. Again, there are a lot of judgment calls that go into constructing the formula, and so there are lots of areas for improvement.

WP and PB: These were kept by the NL, and there were .0355 WP per H+W-HR+E and .0775 PB per the same. So, the estimates are WP = .0355*(H + W - HR + E) and PB = .0775*(H + W - HR + E).

Then I simply plugged these estimates into Tango’s BsR formula. D of course was home runs, while A = H + W - HR + E + .08SH and C = AB - H - E + .92SH. The encouraging thing about this exercise was that the B factor only needed a multiplier of 1.087 (after including a penalty of .05 for outs) to predict the correct number of total runs scored. Ideally, if Base Runs was a perfect model of scoring (obviously it is not), we could use the same formula with any dataset, given all of the data, and not have to fudge the B component. The fact that we only had to fudge by 1.087 (compared to Bill James who to make his Basic RC work had to add walks into the B factor, take 120% of total bases, and add 11.6% of balls in play to B), could indicate that the BsR formula holds fairly well for this time when we add important, more common events like SH, errors, WP, and PB. Of course, perhaps Bill could get similar results using a more technical RC formula + estimation. The bottom line is, a fudge of only 1.087 will keep the linear weights fairly close to what we expect today. I don’t know for sure that they should be, but I’d rather err on the side of our expectations as opposed to a potentially quixotic quest to produce the lowest possible RMSE for a sample of sixty teams playing an average of 78 games each.

So the B formula is:
B = (.726S + 1.948D + 3.134T + 1.694HR + .052W + .799E + .727SH + 1.165WP + 1.174PB - .05(AB - H - E))*1.087

The RMSE of this formula by the standard above is 28.18. I got as low as 24.61 by increasing the outs weight to -.2, but I was not comfortable with the ramifications of this. As mentioned before, if one does not allow each year to have a unique ROE per OIP ratio, the RMSE is a much worse 32.20. Again, I feel a different yearly factor is appropriate, but can certainly see if some feel this is an unfair advantage for this estimator when comparing it to others. The error of approximately 30 runs is a far cry from the errors around 23 in modern baseball, plus the season was shorter and the teams in this period averaged only 421 runs/season, so the raw number makes it seem smaller than it actually is. As I said before, you should always be aware of the inaccuracies when using any sabermetric method, but those caveats are even more important to keep in mind here.
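Putting the pieces together, here is a sketch of the whole estimator in Python. It assumes the usual Base Runs structure of A*B/(B + C) + D, which the text implies but does not write out, and it carries along the SH estimate despite the caveat noted above. The function names and the sample inputs are hypothetical.

```python
def estimate_events(s, w, h, hr, ab, k, x):
    """Estimated E, SH, WP, PB from the rates given in the text."""
    e = x * (ab - h - k)            # reached on error
    sh = 0.0325 * (s + w + e)       # sacrifice hits (questionable, see NOTE)
    runners = h + w - hr + e
    wp = 0.0355 * runners           # wild pitches
    pb = 0.0775 * runners           # passed balls
    return e, sh, wp, pb


def base_runs(s, d, t, hr, w, ab, k, x):
    """Early NL Base Runs, assuming BsR = A*B/(B + C) + D."""
    h = s + d + t + hr
    e, sh, wp, pb = estimate_events(s, w, h, hr, ab, k, x)
    a = h + w - hr + e + 0.08 * sh
    b = (0.726 * s + 1.948 * d + 3.134 * t + 1.694 * hr + 0.052 * w
         + 0.799 * e + 0.727 * sh + 1.165 * wp + 1.174 * pb
         - 0.05 * (ab - h - e)) * 1.087
    c = ab - h - e + 0.92 * sh
    return a * b / (b + c) + hr


# Hypothetical team line (not real data), using the 1879 x value of .1345
print(base_runs(s=620, d=120, t=50, hr=10, w=100, ab=3000, k=200, x=0.1345))
```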

Another way to consider the error is as a percentage of the runs scored by the team. This is figured as ABS(R-RC)/R. For sake of comparison, basic ERP, when used on all teams 1961-2002 (except 1981 and 1994), has an average absolute error of 2.7%. The BsR formula here, applied to all NL teams 1876-1883, has an AAE of 5.4%, twice that value. So once again I will stress that the methods used here are nowhere near as accurate as the similar methods used in our own time. Just for kicks, the largest error is a whopping 24.2% for the 1876 Cincinnati entry, which scored 238 runs but was projected to score 296. The best estimate is for Buffalo in 1882; they actually scored 500 versus a prediction of 501.
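The percentage error is simple enough to sketch in Python. The function names are hypothetical; the two team figures are the rounded ones quoted above.

```python
def abs_pct_error(runs, estimate):
    """ABS(R - RC)/R for a single team."""
    return abs(runs - estimate) / runs


def average_abs_error(pairs):
    """Mean of ABS(R - RC)/R over a list of (runs, estimate) pairs."""
    return sum(abs_pct_error(r, est) for r, est in pairs) / len(pairs)


print(abs_pct_error(238, 296))  # 1876 Cincinnati
print(abs_pct_error(500, 501))  # 1882 Buffalo
```

Note that plugging in the rounded 296 for Cincinnati gives a figure a touch above the 24.2% quoted, presumably because the unrounded estimate was slightly under 296.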

Before I move on too far, I have a little example that will illustrate the enormous effect of errors in this time and place. In modern baseball, there are pretty much exactly 27 outs per game, and approximately 25.2 of these are AB-H. We recognize, of course, that ROE in our own time are included in this batting out figure, and should not be, but any distortion is small and can basically be ignored.

Picking a random year, in the 1879 NL, we know that there were 27.09 outs/game since we have the innings pitched figure. How many batting outs were there per game? Well, if the modern rule of thumb held, there should be just about 25.2. There were 28.01. So there are more batting outs per game than there are total outs in the game. With our error estimate subtracted (so that batting outs = AB - H - E), we estimate 24.60. Now this may well be too low, or just right, or what have you. Maybe it should have been 50% of errors put a runner on first base instead of 70%. I don’t know. What I do know is that if you pretend errors do not exist, you are going to throw all of your measures for this time and place out of whack. Errors were too big of a factor in the game to just be ignored as we can do today.

Let’s take a look at the linear values produced by the Base Runs formula, as applied to the entire period:
Runs = .551S + .843D + 1.126T + 1.404HR + .390W + .569E + .081SH + .280PB + .278WP - .145(AB - H - E)

This is why I felt much more comfortable with the BsR formula I chose, despite the fact that there were versions with better accuracy. These weights would not be completely off-base if we found them for modern baseball. Whether or not they are the best weights for 1876-1883, we will have to wait for when brighter minds tackle the problem or when PBP data is available and we can empirically see what they are. But to me, it is preferable to accept greater error in team seasonal data but keep our common sense knowledge of what events are worth rather than to chase greater accuracy but distort the weights.

This is still not the formula that I am going to apply to players, though. For that, I will use the linear version for that particular season. Additionally, for players, SH, PB, and WP will be broken back down into their components. What I mean is that we estimate that a SH is worth .081 runs, and we estimated that there are .0325 SH for every S, W, and E. .081*.0325 = .0026, and therefore, for every single, walk, and error we’ll add an additional .0026 runs. So a single will be worth .551+.0026 = .554 runs. We’ll also distribute the PB and WP in a similar way.

There are some drawbacks to doing it this way. If Ross Barnes hits 100 singles, his team may in fact lay down 3.25 more sacrifices. But it will be his teammates doing the sacrificing, not him. And we would assume that good hitters would sacrifice less than poor hitters, and this method assumes they are all doing it equally.

On the other hand, though, we are just doing something similar in spirit to what a theoretical team approach does--crediting the change in the team’s stats as a direct result of the player to the player. Besides, there’s really no other fair way to do it (we don’t want to get into estimating SH as a function of individual stats, and even if we did, we have no individual SH data for this period to test against). Also, in the end, the extra weight added to each event will be fairly small, and I am much more comfortable doing it with the battery errors which should be fairly randomly distributed with regards to which particular player is on base when they occur.

Then there is the matter of the error. Since the error is done solely as a function of AB-H-K, we could redistribute it, and come up with a different value for a non-K out and a K out, and write errors out of the formula, and have a mathematically equivalent result. However, I am not going to do this because I believe that, as covered previously, errors are such an important part of this game that we should recognize them, and maybe even include them in On Base Average (I have not in my presentation here, but I wouldn’t object if someone did) in order to remember that they are there. I think that keeping errors in the formula gives a truer picture of the linear weight value of each event as well, as it allows us to remember that the error is worth a certain number of runs and that outs, actual outs, have a particular negative value. Hiding this by lowering the value of an out seems to erase information to me.

I mentioned earlier that each year will have a different x to estimate errors in the formula x(AB-H-K). They are: 1876 = .1531, 1877 = .1407, 1878 = .1368, 1879 = .1345, 1880 = .1256, 1881 = .1184.

At this point, let me present the weights for the league as a whole in each year in 1876-1881, and then the ones with SH, PB, and WP stripped out and reapportioned across the other events. The first set is presented as (S, D, T, HR, W, E, AB-H-E, SH, PB, WP). The second is presented as (S, D, T, HR, W, E, AB-H-E).

1876: .552, .853, 1.146, 1.417, .386, .570, -.147, .085, .289, .287
1876: .588, .886, 1.178, 1.417, .422, .606, -.147
1877: .563, .862, 1.152, 1.414, .398, .581, -.153, .079, .287, .285
1877: .598, .894, 1.184, 1.414, .433, .616, -.153
1878: .546, .846, 1.138, 1.417, .380, .564, -.144, .087, .289, .287
1878: .581, .879, 1.171, 1.417, .415, .599, -.144
1879: .543, .830, 1.108, 1.397, .385, .560, -.140, .082, .275, .273
1879: .577, .861, 1.139, 1.397, .419, .594, -.140
1880: .537, .825, 1.105, 1.400, .378, .554, -.137, .086, .277, .275
1880: .571, .856, 1.136, 1.400, .412, .588, -.137
1881: .560, .859, 1.149, 1.415, .395, .578, -.151, .080, .287, .285
1881: .595, .891, 1.182, 1.415, .430, .613, -.151
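The reapportioning step can be sketched in Python. The text spells out only the SH part; spreading WP and PB over every H + W - HR + E event at the .0355 and .0775 rates is one reading of "in a similar way", and it reproduces the 1876 figures above to within rounding. The function name and dictionary keys are hypothetical.

```python
SH_PER_FIRST = 0.0325   # SH per single, walk, and error
WP_PER_RUNNER = 0.0355  # WP per H + W - HR + E
PB_PER_RUNNER = 0.0775  # PB per H + W - HR + E


def reapportion(weights):
    """Fold the SH, PB, and WP values back into the events that generate them.

    weights: dict with keys S, D, T, HR, W, E, OUT, SH, PB, WP.
    """
    runner_bonus = (WP_PER_RUNNER * weights["WP"]
                    + PB_PER_RUNNER * weights["PB"])
    first_bonus = SH_PER_FIRST * weights["SH"]
    folded = {"HR": weights["HR"], "OUT": weights["OUT"]}
    for event in ("S", "D", "T", "W", "E"):
        folded[event] = weights[event] + runner_bonus
    for event in ("S", "W", "E"):
        folded[event] += first_bonus
    return folded


w1876 = {"S": .552, "D": .853, "T": 1.146, "HR": 1.417, "W": .386,
         "E": .570, "OUT": -.147, "SH": .085, "PB": .289, "WP": .287}
print(reapportion(w1876))
```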

Next installment, I’ll talk a little bit about replacement level, the defensive spectrum, and park factors.

Tuesday, August 07, 2007

Career Walk Rates (an excuse to make a quick point)

Putting aside my attempt at being an objective analyst for a moment, my favorite offensive event is the walk, and my favorite kind of players are those that walk a lot. So I’m just going to do a quick look at some walk-focused derived career statistics for those major leaguers with 5000 AB between 1901 and 2005 as a vehicle to talk about a couple of them. Also, please note that there are no new insights in this post; these are not ideas that originated with me or are unique. And it is really just a space-filler, to justify this post’s existence, so that I can complain about a certain quickie stat that people use.

Right off the bat, I’m going to ignore hit batters and sacrifices. So the only events being considered are at bats and walks in this analysis. Now, if you want to determine a player’s propensity to walk, what is the first possible statistic that comes to mind? I think if you’re like most people, you would say the percentage of plate appearances in which the hitter walked. And I think you would be right. So here are the top and bottom ten in Walk Percentage, W/(AB + W):
1. Ted Williams (20.8)
2. Barry Bonds (20.2)
3. Babe Ruth (19.7)
4. Eddie Yost (18.0)
5. Mickey Mantle (17.6)
6. Mark McGwire (17.6)
7. Jim Thome (17.5)
8. Frank Thomas (17.4)
9. Joe Morgan (16.7)
10. Rickey Henderson (16.7)
591. Kitty Bransfield (4.2)
592. Manny Sanguillen (4.2)
593. Tim Foli (4.2)
594. Enos Cabell (4.2)
595. Everett Scott (4.0)
596. Hal Chase (3.6)
597. Art Fletcher (3.5)
598. Ozzie Guillen (3.5)
599. Shawon Dunston (3.3)
600. George Stovall (3.2)

Obviously, we could get into league adjustments, park adjustments, or at least equivalent run values (as I did with the “Gavvys” for home runs), but that would be defeating the point of this post, which is not to look at the career lists as much as it is to discuss the construction of the stats themselves.

Another measure you’ll see people look at sometimes is what is sometimes called isolated walks, or on base extension, or other names, but is simply OBA-BA. It seems reasonable enough at first glance; OBA measures times on base by hits and walks, BA measures the frequency of hits per at bat, so the difference should tell you something about walk frequency. What kind of list does that make?
1. Barry Bonds (.141)
2. Ted Williams (.136)
3. Eddie Yost (.134)
4. Babe Ruth (.130)
5. Mark McGwire (.129)
6. Jim Thome (.126)
7. Mickey Mantle (.124)
8. Joe Morgan (.122)
9. Earl Torgeson (.121)
10. Frank Thomas (.121)

Obviously, this is a similar list, but not in the same order. Why is this? Well, what is OBA-BA?
(H + W)/(AB + W) - H/AB. BA and OBA have different denominators, so it is not exactly clear what this is measuring. But with a little bit of algebra, you can write “ISW” as:
ISW = W*(AB - H)/(AB*(AB + W))

As you can probably now see, this is a statistic that doesn’t really measure anything. Why are walks multiplied by outs in the numerator, and at bats multiplied by plate appearances in the denominator? Is there any logical explanation for this?

No, there isn’t. OBA-BA is just something that people use because they are lazy, and it obviously tracks the true walk rate well. But like OPS, it is a statistic that doesn’t have units; arguably unlike OPS, it doesn’t make even a bit of sense, despite generally tracking a useful thing (walking rate/frequency/ability).

You can fiddle with that ISW equation to further see what it winds up doing; if I rewrite it as the mathematically equivalent W/(AB + W)*(AB - H)/AB, and then rewrite (AB - H)/AB as the equivalent one minus batting average, you can see that:
ISW = W%*(1 - BA)

In other words, the more base hits you get, the worse you look in OBA-BA, despite an equal walk rate. I have never really seen it used for serious analysis, which is good because it never should be, and I don’t think there is any reason to ever use it.
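A quick numerical check of that identity in Python, using two hypothetical hitters with identical walk rates but different batting averages (the function names and stat lines are made up for illustration):

```python
def walk_pct(ab, w):
    """W% = W/(AB + W), ignoring HB and sacrifices as in the text."""
    return w / (ab + w)


def isw(ab, h, w):
    """'Isolated walks': OBA - BA."""
    return (h + w) / (ab + w) - h / ab


# Same 500 AB and 100 W; only the hit totals differ
for hits in (150, 100):
    ba = hits / 500
    print(hits, isw(500, hits, 100), walk_pct(500, 100) * (1 - ba))
```

The .300 hitter comes out with the lower OBA-BA even though both walk in exactly the same share of their plate appearances.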

If you just have the three major rate stats at your disposal, then you can calculate W% as (OBA - BA)/(1 - BA). Another quick walk stat you can use is walks per at bat. Now you have to be incredibly lazy to not want to add walks back into the denominator, but walks per at bat (which I’ll call WAB) does have a useful property, in that it ties into what I have always considered the “fourth rate stat”, secondary average, which is equal to ISO + WAB if you ignore stolen bases. What kind of career list does WAB give?
1. Ted Williams (.262)
2. Barry Bonds (.253)
3. Babe Ruth (.246)
4. Eddie Yost (.220)
5. Mickey Mantle (.214)
6. Mark McGwire (.213)
7. Jim Thome (.212)
8. Frank Thomas (.211)
9. Joe Morgan (.201)
10. Rickey Henderson (.200)

As you can see, this is the exact same order as the W% list. And if you understand the math, this is no surprise. Remember, we eliminated all extraneous categories (HB, SH, SF, INT) from consideration, so there are only at bats and walks. Therefore, W/AB is just the ratio form of the stat, where W% is the percentage version. That may be unclear; if so, consider:
Winning % = W/(W + L) Win Ratio = W/L
Walk % = W/(W + AB) WAB = W/AB

Win % is to win ratio as Walk % is to WAB. This is true of all ratios and percentages, mathematically, but I thought that using another example from baseball might help people see this. Therefore, W% and WAB are directly related--W% = WAB/(1 + WAB) and WAB = W%/(1 - W%). They will always produce the same list in order, and while W% is probably a more intuitive and useful form, WAB is the ratio of walks to non-walks, which can be a legitimate form of the stat. And if for some reason you want to get WAB from the basic rate stats, WAB = (OBA - BA)/(1 - OBA).
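The conversions above are easy to sketch in Python (function names are hypothetical). As a spot check, Ted Williams' 20.8 W% converts to a WAB of about .262, in line with the list above.

```python
def wab_from_walk_pct(walk_pct):
    """WAB = W%/(1 - W%)."""
    return walk_pct / (1 - walk_pct)


def walk_pct_from_wab(wab):
    """W% = WAB/(1 + WAB)."""
    return wab / (1 + wab)


def walk_pct_from_rates(oba, ba):
    """W% = (OBA - BA)/(1 - BA)."""
    return (oba - ba) / (1 - ba)


def wab_from_rates(oba, ba):
    """WAB = (OBA - BA)/(1 - OBA)."""
    return (oba - ba) / (1 - oba)


print(wab_from_walk_pct(0.208))
```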

Anyway, one other walk stat that I do like is an estimate of the percentage of runs created derived from walks. You can do this pretty easily with a linear RC formula, and it has been done by many other sabermetricians in the past. It is sort of a junk stat, but it is a fun way to find players who may not have historically high walk rates but depended on the walk to contribute to their teams.

I am going to use one RC formula for the whole century; essentially Paul Johnson’s ERP (TB + .8H + W - .3AB)*.324. You can of course expand that out to see the weight on each event, but all that we care about here is that a walk is worth .32 runs. So .32*W/RC = percentage of RC derived from walks, which I will call %W:
1. Donie Bush, 44.5 %W, 13.8 W%
2. Eddie Yost, 44.5, 18.0
3. Miller Huggins, 44.3, 15.3
4. Eddie Joost, 42.0, 15.7
5. Mark Belanger, 38.1, 9.1
6. Earl Torgeson, 37.5, 16.5
7. Elmer Valo, 37.1, 15.8
8. Rickey Henderson, 36.9, 16.7
9. Joe Morgan, 36.9, 16.7
10. Burt Shotton, 36.6, 12.6

As you can see, the top of this list is largely made up of middling or poor hitters, as the great walkers like Ruth and Bonds and Mantle created a lot of runs by walking, but also many by getting base hits and hitting for power. Mark Belanger’s 9.1 W% is quite pedestrian, but he brought little else to the table as a hitter, and so a large share of his value did come from walking.

591. Manny Sanguillen, 11.6, 4.2
592. George Stovall, 11.6, 3.2
593. George Sisler, 11.5, 5.4
594. Heinie Zimmerman, 11.4, 4.4
595. Art Fletcher, 11.3, 3.5
596. Dante Bichette, 11.3, 5.3
597. Joe Medwick, 11.0, 5.4
598. Hal Chase, 10.3, 3.6
599. Garret Anderson, 10.2, 4.5
600. Shawon Dunston, 9.4, 3.3

Some good players, some bums, some crooks, a Coors Field fraud, a Michigians; all in all, a group of players that I love to hate.
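For reference, the %W figure used in the lists above can be sketched in Python. This version takes the walk weight straight from the ERP multiplier (.324), which the text rounds to .32; the function names and the career line are hypothetical.

```python
ERP_SCALE = 0.324  # multiplier in Paul Johnson's ERP as given in the text


def erp(tb, h, w, ab):
    """ERP = (TB + .8H + W - .3AB)*.324."""
    return (tb + 0.8 * h + w - 0.3 * ab) * ERP_SCALE


def walk_share(tb, h, w, ab):
    """Share of estimated runs created that comes from walks."""
    return ERP_SCALE * w / erp(tb, h, w, ab)


# Hypothetical career line, NOT a real player
print(walk_share(tb=2000, h=1500, w=900, ab=6000))
```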

The takeaway here is that if you’re going to figure a walk rate stat from BA and OBA, then for pete’s sake (Rose? Alexander? Schourek?), please use (OBA - BA)/(1 - OBA) or (OBA - BA)/(1 - BA), not the unitless and nonsensical OBA-BA.