Monday, August 27, 2007

Early NL Series: Batter Evaluation

We have a Runs Created formula, the backbone of just about any offensive evaluation system. But what about the other details, like park adjustments, performance by position, baseline, and conversion to wins?

I’ll deal with these one-by-one. Park adjustments in this time would be a real pain to figure. Parks change rapidly, they may have the same name but be a new structure from year-to-year, teams are jumping in and out of the league (which radically alters the “road” context for any given team from year-to-year), sample size is reduced because of shorter seasons, etc. I suppose that I could look at all of this and say it’s not worth doing the work, and just use Total Baseball’s PFs. But that is not the option I have chosen, because Total Baseball’s park factors are subject to the same problems that the ones I would calculate would be.

So instead I have decided to simply use each team’s actual RPG as the standard to which a hitter is compared. This is problematic in some sense, if your goal is building a performance or ability metric--a player on one team may be valued more highly because his team has a good pitching staff, while another may be hurt if the opposite holds. On the other hand, from a value perspective, as Bill James argued way back in 1985, the results of other games don’t define the value of the player in question--his contributions to wins and losses come only in the context of the games that his team actually plays.

Additionally, there is a huge gorilla in the room in the whole discussion of early major league baseball that I am ignoring, and that is fielding. Fielding is a pain to evaluate in any time, and it is not my specialty in any case, so I am not going to even begin to approach an evaluation of the second baseman in the 1879 NL. So of course my ratings will only cover offense, and while you should take offense-only ratings with some skepticism in today’s game, that skepticism is even more warranted in a game where the ball is put in play about 90% of the time and there are 5 errors/game, etc. But by using the team’s actual RPG figure, we do capture some secondary effects of, if nothing else, the whole team’s fielding skill (fewer runs allowed means a lower baseline RG for players, as well as fewer runs per win). In the last paragraph I slipped in the argument about a “good pitching staff” potentially inflating a batter’s value. But in this game it is even harder to separate pitching from fielding than it is today, and that “good pitching staff” is more likely “good pitching and fielding”, to which our player has contributed at least a tiny bit. This is not an attempt to say that the RPG approach would be better than using stable PFs if they existed, but just a small point in its favor that would be less obvious in 2006.

I am still going to apply a positional adjustment to each player’s expected runs created, based on his primary position. Looking at all hitters 1876-1883, the league had an RG of 5.36. Here are the RG and Adjusted RG (just RG divided by the prevailing RPG, in this case 5.36) for each position:

This actually matches the modern defensive spectrum if you throw out the quirky rightfield happenings. One potential explanatory factor I stumbled upon after writing this came from an article by Jim Foglio entitled “Old Hoss” (about Radbourn) in the SABR National Pastime #25:

“When Radbourn did not start on the mound, he was inserted into right field, like many of his contemporary hurlers. In 19th-century ball, non-injury substitutions were prohibited. The potential relief, or ‘exchange’ pitcher, was almost always placed in right, hence the bad knock that right fielders have received all the way down to the little league game…[the practice] surely had its origins in [pitchers’] arm strength, given the distance of the throws when compared to center and left.”

However, I’m not sure this explains the right field problem satisfactorily, because I did not look at offensive production when actually playing a given position; it was the composite performance of all players who were primarily of a given position. Change pitchers out in RF, if they got in more games as pitchers than as right fielders, would still be considered pitchers for the purpose of the classifications used to generate the data above. Nemec classifies 60 players as primary right fielders in the 1876-1883 NL. Of those 60, 23 (38%) pitched at some point during that season, but only 8 (13%) pitched more than 30 IP.

Another possibility is that if left-handed hitters are more scarce, there are fewer balls hit to right field. This might actually be the most satisfying explanation; I did not actually check the breakdown on left-handed and right-handed batters, but it figures that there were fewer lefties than in the modern game. I do know for a fact that there were very few left-handed pitchers at this time.

Excepting right field, the degree of the positional adjustments is not the same as it is today, but the order is essentially the same. Notably, Bill James has found that at some point around 1930, second and third base jumped to their current positions on the spectrum. But here, third basemen are still creating more runs than second basemen. Of course, the difference is not that large, but it is possible that a later change in the game caused the jump, and then in 1930 it just jumped back to where it had been at the dawn of the majors. On the other hand, it could just be an insignificant margin or a result of a poor RC formula, or what have you. Perhaps the increase in bunting, particularly in the 1890s, made third base a premium defensive position, and the birth of home run ball in the 20s and 30s and the corresponding de-emphasis of the bunt changed that. And of course it is always possible, as is likely the case for RF, that the offensive positional adjustment does not truly reflect the dynamics in play, and is a poor substitute for a comprehensive defensive evaluation, or at the very least defensive positional adjustment. What I have done is treat all outfielders equally, using the overall outfield PADJ for the WAR estimates below.

Then we have the issue of baseline. I am reluctant to even approach it here, since it always opens up a big can of worms, and most of the ways you can try to set a “replacement level” baseline are subject to selective sampling concerns. But I went ahead and fooled around with some stuff anyway. In modern times, some people like to use the winning percentages of the league’s worst teams as an idea of where the replacement level is. In this period, the average last-place team played .243 ball, while the second-worst was .363 and the average of the two .303. So this gives us some idea of where we might be placing it.

I also took the primary starting player at each position for each team, and looked at the difference in RG between the total, the starters, and the non-starters. I also tossed out all pitchers. All players created 5.37 RG, while the non-pitchers were at 5.55. Starting non-pitchers put up 5.78, while the non-starters were 3.68. Comparing the non-starters to all non-pitchers, we get an OW% of .305 (while I hate OW%, it is a common standard for this type of discussion). This is very close to the .303 W% of the two worst teams in the league (yes, this could be due to coincidence, and yes, I acknowledge selective sampling problems in the study). And so I decided to set the offensive replacement level at .300.
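
For readers who want to check the .305 figure, here is a minimal Python sketch. It assumes OW% is computed with a plain Pythagorean exponent of 2 applied to the ratio of the two RG figures (the post does not state the exponent, but this choice reproduces the number); the function name is made up for illustration.

```python
def offensive_win_pct(rg, context_rg, exponent=2.0):
    """Pythagorean-style offensive winning percentage.

    Assumption: exponent 2, which reproduces the .305 quoted in the text.
    """
    ratio = rg / context_rg
    return ratio ** exponent / (1.0 + ratio ** exponent)


# Non-starters (3.68 RG) compared to all non-pitchers (5.55 RG)
ow = offensive_win_pct(3.68, 5.55)
print(f"OW% = {ow:.3f}")  # roughly .305
```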

In our day I go along with the crowd and use .350--despite the fact that I think various factors (chaining and selective sampling chief among them) make that baseline too low. Here I’ve gone even lower, based on the same kind of faulty study. Why? Well, this is subjective as all get out, but in this time, in which there is no minor league system to speak of, when you have to send a telegraph to communicate between Boston and Chicago, when some teams come from relatively small towns by today’s major league standards, and when the weight of managerial decision-making was almost certainly heavier on fielding than it is today, I think that a lower baseline makes sense. If Cleveland loses one of its players, it doesn’t have a farm club in Buffalo it can call and get a replacement from. They may very well just find a good local semi-pro/amateur/minor league player and thrust him into the lineup.

On the other hand (I’m using that phrase a lot in the murky areas we are treading in), the National League, while clearly the nation’s top league, is not considered the only sun in the baseball solar system as it is now. Many historians believe that the gap between a “minor” league and the National League was much, much smaller than it would be even in the time of Jack Dunn’s clubs in Baltimore and a strong and fairly autonomous Pacific Coast League. So perhaps a .300 NL player is really just not a very good ballplayer at all, and a team stuck with one could easily find a better one, if only they were willing to pay him or find him somewhere.

It would take a better mathematical and historical mind than I have to sort this all out. I am going to use .300, or 65% of the league RG, as the replacement level for this time.
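
The link between a .300 winning percentage and 65% of league RG can be sketched by inverting the Pythagorean relation. This assumes an exponent of 2, which is not stated here but gives a ratio that matches the 65% figure; the function name is illustrative only.

```python
def run_ratio_for_win_pct(win_pct, exponent=2.0):
    """Run ratio (player RG / league RG) implied by a given winning percentage.

    Assumption: Pythagorean exponent of 2.
    """
    return (win_pct / (1.0 - win_pct)) ** (1.0 / exponent)


ratio = run_ratio_for_win_pct(0.300)
print(f"A .300 player creates about {ratio:.1%} of league RG")  # about 65%
```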

So now the only issue remaining in the evaluation of batters is how we convert their runs to wins.
In 1876, Boston’s RPG (runs scored and allowed per game) was 13.16. Pythagenpat tells us that the corresponding RPW is 12.47. In the same league, Louisville’s RPG was 9.04, for 9.55 RPW. This is a sizeable difference, one that we must account for. A Boston player's runs, even compared to a baseline, are not as valuable as those of a Louisville player.
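
As a rough check on those two figures, the sketch below uses the marginal runs-per-win implied by Pythagenpat for a .500 team, RPW = 2*RPG^(1-z). The value z = .29 is an assumption on my part; it is not given in this post, but it reproduces both numbers quoted above.

```python
def pythagenpat_rpw(rpg, z=0.29):
    """Runs per win for an average team under Pythagenpat.

    With exponent x = rpg**z, the marginal value at R = RA works out to
    RPW = 2 * rpg / x = 2 * rpg ** (1 - z).
    Assumption: z = 0.29 (chosen because it matches the quoted figures).
    """
    return 2.0 * rpg ** (1.0 - z)


boston_rpw = pythagenpat_rpw(13.16)
louisville_rpw = pythagenpat_rpw(9.04)
print(f"Boston: {boston_rpw:.2f}, Louisville: {louisville_rpw:.2f}")
```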

However, my method to account for this will not be as involved as the Pythagenpat. I will simply set RPW = RPG. This has been proposed, at least for simple situations, by David Smyth in the past. And as Ralph Caola has shown, this is the implicit result of a Pythagorean with exponent 2. That is not to say that it is right, but when we have all of the imprecision floating around already, I prefer to keep it simple. Also, it doesn’t make that much of a difference. The biggest difference between the two figures is .37 WAR, and there are only twelve seasons for which it makes a difference of .20. For most, it is almost as negligible a difference as an extra run created.
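
The claim that an exponent of 2 implies RPW = RPG can be checked numerically. The sketch below takes a .500 team, nudges its runs scored per game up and down by a tiny amount, and measures how many runs it takes to buy one win. It is an illustration of the idea under those assumptions, not a reproduction of Caola's derivation, and the function names are hypothetical.

```python
def pyth_win_pct(r, ra, exponent=2.0):
    """Pythagorean winning percentage from runs scored and allowed per game."""
    return r ** exponent / (r ** exponent + ra ** exponent)


def marginal_rpw(rpg, exponent=2.0, eps=1e-6):
    """Runs per win for an average team, via a central finite difference."""
    half = rpg / 2.0  # an average team scores and allows RPG/2 per game
    d_wpct = (pyth_win_pct(half + eps, half, exponent)
              - pyth_win_pct(half - eps, half, exponent))
    return 2.0 * eps / d_wpct


for rpg in (9.04, 10.0, 13.16):
    print(rpg, round(marginal_rpw(rpg), 3))  # each RPW comes out close to RPG
```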

Let me at this point walk you, from start to finish, through a batter calculation, so that everything is explained. Let’s look at Everett Mills, 1876 Hartford, who ends up as the #3 first baseman in the league that season.

Mills’ basic stats are 254 AB, 66 H, 8 D, 1 T, 0 HR, 1 W, and 3 K. First, we estimate the number of times he reached on an error. In 1876, the average was .1531 estimated ROE per AB-H-K (as given in the last installment). So we give Mills credit for .1531*(254-66-3) = 28.3 errors. This allows us to figure his outs as AB-H-E, or 254-66-28.3 = 159.7.
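
The same step as a short Python sketch, using only the numbers given above. The function names are illustrative, not part of any existing library.

```python
def estimated_roe(ab, h, k, roe_rate):
    """Estimated times reached on error: league rate applied to AB - H - K."""
    return roe_rate * (ab - h - k)


def estimated_outs(ab, h, roe):
    """Estimated outs: AB - H - E."""
    return ab - h - roe


roe = estimated_roe(ab=254, h=66, k=3, roe_rate=0.1531)
outs = estimated_outs(ab=254, h=66, roe=roe)
print(f"ROE = {roe:.1f}, outs = {outs:.1f}")
```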

Then we plug his stats into the Runs Created formula as given in the last installment:
RC = .588S + .886D + 1.178T + 1.417HR + .422W + .606E - .147O
= .588(66-8-1-0) + .886(8) + 1.178(1) + 1.417(0) + .422(1) + .606(28.3) - .147(159.7) = 35.9 RC.
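
Here is the same calculation as a minimal Python sketch. The weights are copied from the formula above, the ROE and outs inputs are the rounded values from the text (28.3 and 159.7), and the function and dictionary names are my own.

```python
# Linear weights from the Runs Created formula above
RC_WEIGHTS = {
    "S": 0.588, "D": 0.886, "T": 1.178, "HR": 1.417,
    "W": 0.422, "E": 0.606, "O": -0.147,
}


def runs_created(h, d, t, hr, w, roe, outs):
    """Runs Created from basic stats, estimated ROE, and estimated outs."""
    singles = h - d - t - hr
    events = {"S": singles, "D": d, "T": t, "HR": hr,
              "W": w, "E": roe, "O": outs}
    return sum(RC_WEIGHTS[name] * count for name, count in events.items())


rc = runs_created(h=66, d=8, t=1, hr=0, w=1, roe=28.3, outs=159.7)
print(f"RC = {rc:.1f}")  # about 35.9
```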

Next we calculate Runs Created per Game, which I abbreviate as RG. The formula is RG = RC*OG/O, where OG is the league average of estimated outs per game, and O is the estimated outs for Mills (159.7 as seen above). In 1876, there are 24.22 OG, so Mills’ RG is 35.9*24.22/159.7 = 5.44.
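
In code, the conversion is a one-liner. This sketch plugs in the rounded values from the text; the function name is illustrative.

```python
def runs_per_game(rc, outs, outs_per_game):
    """RG = RC * OG / O: runs created scaled to one game's worth of outs."""
    return rc * outs_per_game / outs


rg = runs_per_game(rc=35.9, outs=159.7, outs_per_game=24.22)
print(f"RG = {rg:.2f}")  # about 5.44
```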

Now we are going to calculate his Runs Above Replacement, specifically versus a replacement first baseman. We assume an average first baseman creates runs at 121% of the prevailing contextual average, and a replacement first baseman at 65% of that. What is that contextual average? Well, it is half the RPG of Mills’ team. Hartford scored 429 and allowed 261 runs in 69 games, so their RPG is (429+261)/69 = 10. Half of that is 5.

So we expect an average player in Mills’ situation to create 5 runs per game. But an average first baseman will create 21% more than that, or 1.21*5 = 6.05. So we rate Mills as a below average hitter for his position. But remember that we assume a replacement player will hit at 65% of that, which is .65*6.05 = 3.93. So Mills will definitely have value above replacement; in fact 5.44-3.93 = 1.51 runs per game above replacement.
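
The three baselines in this step (contextual average, positional average, replacement level) can be laid out as a small sketch. The inputs are Hartford's 1876 totals and the 1.21 and .65 factors from the text; the variable names are mine.

```python
# Hartford, 1876
runs_scored, runs_allowed, games = 429, 261, 69

team_rpg = (runs_scored + runs_allowed) / games  # 10 runs per game, both teams
context_avg = team_rpg / 2                       # average hitter in this context
padj_1b = 1.21                                   # first base positional adjustment
replacement_pct = 0.65                           # replacement level vs. average

avg_first_baseman = padj_1b * context_avg
replacement_first_baseman = replacement_pct * avg_first_baseman

mills_rg = 5.44
print(f"average 1B: {avg_first_baseman:.2f}")
print(f"replacement 1B: {replacement_first_baseman:.2f}")
print(f"Mills above replacement per game: {mills_rg - replacement_first_baseman:.2f}")
```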

And Mills made 159.7 outs, which is equivalent to 159.7/24.22 = 6.59 games, so Mills is 1.51*6.59 = 9.95 runs better than a replacement level first baseman. If you want that all as a drawn out formula:
RAR = (RG - (RPG/2)*PADJ*.65)*O/OG
= (5.44 - (10/2)*1.21*.65)*159.7/24.22 = 9.95
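
The drawn out formula translates directly into a function. One caveat for anyone checking the arithmetic: the 9.95 above comes from multiplying the rounded intermediate values (1.51 and 6.59), so carrying full precision gives a figure that differs slightly in the second decimal. The function name and keyword arguments here are illustrative.

```python
def runs_above_replacement(rg, team_rpg, padj, outs, outs_per_game,
                           replacement_pct=0.65):
    """RAR = (RG - (RPG/2) * PADJ * .65) * O / OG."""
    baseline_rg = (team_rpg / 2) * padj * replacement_pct
    games_of_outs = outs / outs_per_game
    return (rg - baseline_rg) * games_of_outs


rar = runs_above_replacement(rg=5.44, team_rpg=10.0, padj=1.21,
                             outs=159.7, outs_per_game=24.22)
print(f"RAR = {rar:.2f}")  # within a few hundredths of the 9.95 in the text
```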

Now we just need to convert to Wins Above Replacement. WAR = RAR/RPG. So 9.95/10 = 1.00 WAR.

So our estimate is that Everett Mills was one win better than a replacement level first baseman in 1876. At times, when I get into looking at the player results, I may discuss WAR without the positional adjustment or Wins Above Average, with or without the positional adjustment. If there’s no PADJ, just leave it out of the formula or use 1. If it’s WAA, leave out the .65.
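
The variants described here can be sketched as one function with two switches: set `padj` to 1 to drop the positional adjustment, and set `baseline` to 1 to compare against average instead of replacement. This is only an illustration of the formula as stated, using the RPW = RPG conversion; the function and parameter names are hypothetical.

```python
def wins_above_baseline(rg, team_rpg, outs, outs_per_game,
                        padj=1.0, baseline=0.65):
    """Wins above a baseline, converting runs to wins with RPW = RPG.

    padj=1.0     -> no positional adjustment
    baseline=1.0 -> Wins Above Average (the .65 is left out)
    """
    context_avg = team_rpg / 2
    runs_above = (rg - context_avg * padj * baseline) * outs / outs_per_game
    return runs_above / team_rpg


mills = dict(rg=5.44, team_rpg=10.0, outs=159.7, outs_per_game=24.22)

war = wins_above_baseline(**mills, padj=1.21)                 # about 1.0
war_no_padj = wins_above_baseline(**mills)                    # higher, no 1B penalty
waa = wins_above_baseline(**mills, padj=1.21, baseline=1.0)   # negative

print(f"WAR = {war:.2f}, WAR (no PADJ) = {war_no_padj:.2f}, WAA = {waa:.2f}")
```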

Also, I may refer to Adjusted RG, which will just be 200*RG/RPG.
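
As a quick illustration, the factor of 200 is there because the average hitter creates half of the team RPG, so 200*RG/RPG is the same as 100*RG/(RPG/2). The Mills figure printed below is just arithmetic on the numbers given earlier (5.44 RG, Hartford RPG of 10), and the function name is my own.

```python
def adjusted_rg(rg, team_rpg):
    """Adjusted RG: RG as a percentage of the contextual average (RPG / 2)."""
    return 200.0 * rg / team_rpg


mills_arg = adjusted_rg(5.44, 10.0)
print(f"Mills ARG = {mills_arg:.1f}")  # 100 would be exactly average
```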

One little side note about Mr. Mills. You may have noticed that I said he is the third-best first baseman in 1876 in WAR, but is in fact rated as below average. Granted, there are only eight teams in the league, but we would still expect the third-best first baseman to be above average. In fact, only two of the 1876 first basemen have positive WAA, and the total across the eight starters plus two backups who happen to be classified as first basemen (with a combined total of just 29 PA) is -5.1. Apparently, these guys hit nowhere near 121% of the league average. In fact, they had a 6.03 RG v. 5.90 for the league, only a 102 ARG. Whether this is just an aberration or whether first basemen as a group had not developed their hitting prowess yet, I have not done enough checking to hazard a guess. It is clear though, by the 1880s, with the ABC trio (Anson, Brouthers, and Connor) that first base was where the big boppers played.

During one of my earlier abortive attempts to analyze National Association stats, I found that in 1871 at least, first base was one of the least productive positions on the diamond. Did first base shift position on the spectrum, or are we dealing with random fluctuations or the inherent flaws in offensive positional adjustment? Historians are probably best equipped to address this conundrum.


  1. Hi,

    I somehow completely overlooked this series. But it's very interesting to me because (once I get some career stuff taken care of) I'm keen to do some statistical reviews of past winning Reds teams, including the 1882 team. The issue of a replacement baseline is one that I've been thinking about, as one would certainly think that replacement level has increased over time simply because the pool of talent from which teams can draw is so much bigger now than it used to be.

    I'm obviously pretty new to these discussions, but I'm curious as to why the average winning percentage of the bottom-two teams would be a good indicator of replacement level. If anything, I'd think the average worst team would be your best indicator...and even then, I'd expect that few teams are actually as bad as an all replacement-level team unless incompetence is a more widespread problem than I think it is. I do see that those winning percentages match up reasonably well to the bench vs. starters comparison studies, but as you mention here, selection bias (among other problems) is a major issue with those studies...

    Anyway, I'd just be interested in hearing your views on this.


  2. As I've tried to express, I'm not particularly enchanted with any of the means of estimating replacement level. Anyway, I have been sketching out some stuff for post-1883, and I am going to switch to using .350 like I do today for those seasons.

    The NL, after expelling PHI and NY in 1876, went through a period of several years in which they had to desperately look for teams from places like Indianapolis, Milwaukee (both much less impressive then than now), Worcester, Syracuse, Troy,...These teams often were run on shoestring budgets and had some pretty lousy players playing for them. It wasn't until 1879 that historian David Nemec says that playing in the NL really became seen as a status symbol, any more prestigious than playing for a strong independent club, of which there were many. The Buffalo Bisons in 1878 went 10-7 (IIRC) against their NL exhibition opponents, then joined the NL in 1879 and finished a solid third.

    My point in all of that is that often the teams at the bottom of the standings were pretty weak, and one could argue that they both represented replacement level talent--relative to the world of high-level professional baseball as a whole. This is something we don't have to deal with today, since obviously the best talent all the way up the ladder is controlled by the major league clubs.

    I think that treating the NL's weaklings, who moved through the league like it was a revolving door, as the baseline of talent that Chicago or Boston could reasonably acquire in an emergency is probably a mistake.

    But of course I could be way off base about all of that.

    But for your study, starting with the Reds in '82, I would suggest sticking with defining repl level however you do it now. And of course I would include a measure versus an average baseline as well so you have something on firmer ground. (Incidentally, I have already run my numbers and scrounged up some anecdotes for the 1882-83 AA if you ever want to compare notes--you'll probably get around to writing your series before I get around to posting my write-ups for those seasons here. Also, please be sure to explain to the Reds fans that the modern day Reds started with the 1882 team and not with the 1869 team. :-)

  3. Thanks for that, it is helpful.

    I did quickly check out how a moving 10-year average winning percentage of last place teams varied over time, and it does decline from 0.380 or so in the 80's and 90's to 0.330 in the 20's, and down to 0.300 or so in the early 1900's... So, there would seem to be some basis for dropping the cutoff.

    One approach that I am considering might be to set the modern cutoff to 0.350, and then use the decline in average last place team winning percentage between today and whatever historical date I'm looking at to adjust the replacement level decline. That approach, however, would drop it to 0.300 by 1945 or so, and that might be too much of an adjustment based on your work. I guess it could reflect that modern day replacement level is set too low, of course...

    Like you, I'm not entirely convinced that the modern day replacement level is built upon a particularly solid foundation. I view it as more of a convenient, reasonably low baseline that seems to do a good job of balancing the value of playing time vs. performance. So I can see that there might not be anything wrong with using modern day replacement level going back quite a ways into the past as well.

    I dunno, I'll keep thinking on it. :)

