## Monday, May 20, 2013

### Baseline Matters

One of the common traps I fall into when trying to come up with topics for this blog is latching on to a sentiment expressed by one or two random people on a message board somewhere and proceeding as if this viewpoint is widely held and needs to be debunked. Hopefully, the resulting missives have some general value as a discussion of the underlying topic rather than simply as a response to a strawman.

A sentiment of this type that has caught my eye a few times recently is the notion that the actual baseline or reference point used in a total value metric doesn’t really matter. This seems so self-evidently wrong that it hardly merits a response, but perhaps it would be useful to take a step back from the question of what the baseline should be (i.e. average v. replacement, or where exactly the replacement level line is drawn), or the theoretical justification for the baseline of choice (such as a concept of “freely available talent”, a way to peg the value of a minimum salary veteran, the level of an average bench player, etc.), and instead ask why so many are compelled to use a baseline at all when looking at baseball statistics.

The most basic reason to use a baseline is to strike a balance between quantity and quality. One can rank batters by total hits (quantity) or by batting average (quality), but both of these are incomplete on their own as they consider only one of the dimensions of the observed performance. Sometimes, you might only be interested in one or the other, and that’s okay--there's certainly nothing wrong with looking only at a total or only at a rate, as the question under investigation demands. But for the majority of questions people ask that involve player comparisons, both quantity and quality must enter into the equation.

A baseline is a very simple way to accomplish this. Define some floor below which additional playing time is not deemed helpful, or possibly even should be penalized. Use this floor as a reference point and you have now incorporated both quality and quantity. Again, I’ll stress that this is the simplest rationale for using a baseline that I could offer--I've said nothing about ancillary benefits (such as theoretical purity or a reference point that models a meaningful real world concept). Such ancillary benefits are often the reason behind advocates’ choice of a particular baseline, but they are not the focus of this post.
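The mechanism can be sketched in a few lines. This is a minimal illustration, assuming batting average as the rate stat and a hypothetical .230 floor; the two stat lines are invented for the example.

```python
# Value above a baseline: subtract what a baseline-level hitter
# would have produced in the same playing time.
def value_above_baseline(hits, ab, baseline):
    """Hits beyond what a baseline-level hitter would collect in the same AB."""
    return hits - baseline * ab

# A .315 hitter in 400 AB now outranks a .260 hitter in 600 AB,
# even though the full-timer has more total hits (156 to 126):
part_timer = value_above_baseline(126, 400, 0.230)  # roughly 34 hits above the floor
full_timer = value_above_baseline(156, 600, 0.230)  # roughly 18 hits above the floor
```

Neither raw hits nor batting average alone would rank these two this way; the baseline is what lets the one number reflect both dimensions at once.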

That all should seem simple enough, but there are a few other points to make, which will work best in bullet format.

* People will sometimes object to the use of a baseline by stating that the use of a particular baseline makes the exact value used as a baseline the focal point of one’s player evaluation system. I disagree with this characterization, although it may be true of some extreme baseline choices (a baseline of 1.000, for instance, would have incredibly distortive effects). As long as a reasonable baseline is selected, the performance of the players in the segment of the game under consideration remains the focal point of the metric. That is not to say that the baseline is of no consequence (more on this below), but if sensible it does not greatly distort a comparison between two players--it facilitates it.

* Baselined metrics are a natural target for criticism since it admittedly can be difficult to explain exactly why the baseline has been set where it is (average is an exception to this since it is intuitive). Even if one can eloquently articulate the definition of replacement level (or alternative) that has been used, the process that led to a given value is not so easy to define. Usually there is a fair amount of judgment involved, and that means that the sabermetrician’s subjective judgment has come into play.

The good news is that the minute differences between baselines are rarely material--using a .340 baseline or a .360 baseline will produce very similar results, especially for single seasons. So if one can agree on the logic used to home in on the baseline, the actual point at which the line is drawn is not as important.
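A quick sketch makes the point concrete. Treating the baseline as a winning percentage applied to a player's games (the three stat lines here are hypothetical), moving the line from .340 to .360 shifts every player's value by exactly .020 per game, so players with comparable playing time keep nearly the same gaps and the rank order doesn't budge:

```python
# Hypothetical (win_pct, games) lines; .340 vs. .360 are the two
# candidate replacement levels compared in the text.
players = {"A": (0.550, 25), "B": (0.510, 30), "C": (0.480, 20)}

def wins_above(baseline):
    """Wins above a baseline-level performer over the same games."""
    return {name: (pct - baseline) * g for name, (pct, g) in players.items()}

low, high = wins_above(0.340), wins_above(0.360)

# Moving the line costs each player exactly 0.020 * games, so the
# ordering is identical at both baselines:
order_low = sorted(players, key=lambda n: low[n], reverse=True)
order_high = sorted(players, key=lambda n: high[n], reverse=True)
```

Only when playing time differs substantially can a twenty-point shift in the line reshuffle players, and even then the movement is small over a single season.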

* The notion that the choice of baseline doesn’t matter at all is another matter. It’s difficult to respond to this claim, since it’s obviously wrong on its face. It’s more interesting to try to figure out the perspective(s) from which it might be true. I have two suggestions, one well-developed and one a pseudo-psychological mess:

a) Baselines don’t matter if you have two players with equal playing time. For example, let’s suppose that the league average BA is .260 and “replacement level” is .230. Player A hits .315 and Player B hits .260, each in 600 AB. Relative to average, Player A is +33 hits and Player B is +0 hits. Relative to replacement level, Player A is +51 hits and Player B is +18 hits. In each case, the difference between the two players is 33 hits relative to baseline (including the zero baseline, raw hits, since Player A had 189 and Player B had 156).

But as soon as you change playing time, the baseline comes into play. If Player A had 400 AB, and Player B had 600 AB, then relative to average Player A is +22 hits and Player B is +0 hits. Relative to replacement, Player A is +34 hits and Player B remains +18 hits. We’ve gone from a difference of 22 baselined hits to a difference of 16. This example is not extreme enough to change the rank order, which is precisely why I chose it--to illustrate subtle differences even in cases in which the conclusion appears obvious before applying a baseline at all.
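The arithmetic of both scenarios can be verified in a few lines, using the figures from the example above (.260 league average, .230 replacement level):

```python
# Hits above a baseline-level hitter given the same at bats.
def above(hits, ab, baseline):
    return hits - baseline * ab

# Equal playing time (600 AB each, 189 vs. 156 hits):
# the gap is 33 hits at every baseline, including zero.
gaps_equal = [round(above(189, 600, b) - above(156, 600, b), 9)
              for b in (0.260, 0.230, 0.0)]

# Unequal playing time (Player A now 126 hits in 400 AB):
# the gap depends on where the line is drawn.
gap_avg = round(above(126, 400, 0.260) - above(156, 600, 0.260), 9)  # 22 hits
gap_rep = round(above(126, 400, 0.230) - above(156, 600, 0.230), 9)  # 16 hits
```

With equal playing time the baseline terms cancel, which is exactly why the claim "the baseline doesn't matter" holds in that special case and nowhere else.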

b) A disproportionate number of assertions that baseline doesn’t matter seem to come from people who consider themselves the smartest guy in the room. Perhaps these people don’t need a baseline to balance quantity and quality. But the rest of us need some help, a systematic manner in which to compress two dimensions of information onto one scale for comparison. Doing so does not require one to discard the rate and playing time components, nor does it require one to follow in lockstep with the conclusions implied by the chosen baseline.