Monday, December 21, 2009

A Caution on the Use of Baselined Metrics per PA

I threw this together based on a Twitter discussion I had last week that included Justin (@jinazreds), Matt (@devilfingers), Joshua (@JDSussman), and Erik (@Erik_Manning)--hopefully I didn't miss anybody. Said discussion was going just fine among the other parties until I stepped in and said the opposite of what I meant, so I need to clarify my point. The end result will be that I obfuscate my point, but that's par for the course around here.

I really should just get around to writing the rate stat series that I have been promising since I started this blog, and then I could give my thoughts on this topic from A-Z in one place. But this is a lot easier, and the rate stat series would have eight parts and be remarkably dry reading as I go around in circles.

Suppose we want to express a baselined measure of value as a rate stat. In this case, I'll work with something similar to Palmer's Batting Wins--wins above average, considering only offensive production--but the theory behind it has wider applications.

The standard way of doing this (incidentally, one of the few things that Tango Tiger, David Smyth, and I ever fully agreed upon on the topic of rate stats in our many discussions at FanHome--at least at the time; I certainly don't presume to speak on behalf of those gentlemen) is to look at BW/PA. If we were working with a standard runs created method, we would look at RC/out. But when our metric has already been baselined to average, we have already incorporated the run value of avoiding outs/generating PA. RAA/Out will double-count that aspect of offense, more or less.

Of course, we all recognize that the value of a run varies depending on the context in which the hitter plays, so we convert RAA to WAA, and we have something like Batting Wins. Let's look at two players credited with a similar number of BW, but in very different contexts with a big difference in PA:

Nap Lajoie, 1903 AL: 5.8 BW in 509 PA
Frank Thomas, 1996 AL: 6.2 BW in 636 PA

Incidentally, the BW figures here are my rough estimates; for the purposes of this discussion, it doesn't really matter how they reflect specifically on Lajoie and Thomas--I don't care to compare them to see who was better, I just needed a good example. They actually differ fairly substantially from those published elsewhere, but that's not important. There will also be some rounding discrepancies from using just one decimal place throughout the post, but the purpose of this exercise is not a precise examination of the two players.

Figuring BW/650 PA, we come up with Lajoie at 7.4 and Thomas at 6.3. From this, we can conclude that Lajoie was significantly more productive on a rate basis as an offensive player, right?
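If you want to check the arithmetic, here it is as a quick Python sketch (the BW and PA inputs are my rounded estimates from above, so the results carry some rounding slop):

    # Raw rate: Batting Wins per 650 PA, with no context adjustment.
    lajoie_bw, lajoie_pa = 5.8, 509   # Lajoie, 1903 AL
    thomas_bw, thomas_pa = 6.2, 636   # Thomas, 1996 AL

    print(round(lajoie_bw / lajoie_pa * 650, 1))   # 7.4
    print(round(thomas_bw / thomas_pa * 650, 1))   # 6.3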

Let's get a second opinion first. If the stat we wanted to put on a rate basis was standard Runs Created, we'd generally do that by taking RC/Out and comparing it to the league average (with 100 representing a league-average hitter). My estimates have Lajoie at 207 and Thomas at 195. One need not be an expert on the relationship between the scales of the two metrics to realize that 207-195 is a much narrower gap than 7.4-6.3.
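For reference, "comparing it to the league average" just means dividing the player's RC per out by the league's rate and multiplying by 100. A minimal sketch--the run and out totals below are made-up placeholders for illustration, not my actual estimates for either player:

    def rel_rc_per_out(rc, outs, lg_rc, lg_outs):
        # Player's RC per out as a percentage of the league rate (100 = average).
        return (rc / outs) / (lg_rc / lg_outs) * 100

    # Hypothetical inputs: 120 RC over 400 outs in a league that created
    # 10,000 runs while making 55,000 outs.
    print(round(rel_rc_per_out(120, 400, 10000, 55000)))   # 165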

What is the cause of this discrepancy? It's not the RC/RAA inputs, since they are based on the same formulas. It's not a case of the metrics being incompatible--RC/Out and RAA/PA (or BW/PA) correlate very highly when the samples are drawn from similar contexts.

The problem is that Plate Appearances (which are obviously the denominator for BW/PA) are not constant across contexts. Outs are, more or less. No matter what era the game is played in, what park it's played in, how many runs are scored, or anything else, there are still three outs per inning. And (approximately) 27 outs per game. Even if you had five-inning games in one league and thirteen-inning games in another, it would all wash out (or close to it) when you look at runs per out.

On the other hand, plate appearances are not constant across environments. In 1903, AL teams averaged 35.8 PA/G (actually AB+W only), while in 1996 AL teams averaged 38.7. Therefore, 650 PA in 1903 are not equivalent to 650 PA in 1996. 650 PA in 1903 represent the number that an average offense would generate in 18.2 games, but in 1996 they represent just 16.8 games' worth.

Getting back to the actual PA used by Larry and the Big Hurt, one would think that since Thomas came to the plate 127 more times, he participated in a much larger share of his team's PA (even when we recognize the difference in schedule length). However, Lajoie's 509 PA are equivalent to 14.2 games; Thomas' 636 to 16.4 (*). Thomas had 15% more opportunities when you adjust for context, versus 25% more when only raw PA is considered (and this is without considering the difference in season length).
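Here's that conversion in the same sketch form:

    # PA restated as games' worth of an average offense: PA / (league PA/G).
    lajoie_games = 509 / 35.8   # ~14.2 games (1903 AL)
    thomas_games = 636 / 38.7   # ~16.4 games (1996 AL)

    print(round(636 / 509, 2))                    # 1.25 -> 25% more raw PA
    print(round(thomas_games / lajoie_games, 2))  # 1.16 -> ~15% more, adjusted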

In a higher PA environment, players will get more raw opportunities, but each PA has less impact on wins and losses, as each represents a smaller portion of a game. We can adjust for this by normalizing Plate Appearances to some "reference level" common to all leagues.

So let's instead look at BW/650 PA, except we'll normalize PA to an average of 37.2/game (this is roughly the post-1901 major league average). Lajoie will now be credited with (5.8/509)*(35.8/37.2)*650 = 7.1 BW/650 and Thomas with (6.2/636)*(38.7/37.2)*650 = 6.6 BW/650. The gap is .5 BW, whereas before normalizing PA it was 1.1.

If you'd like a formula:

(Baselined metric/PA)*[(League PA/G)/(Reference PA/G)] = baselined metric/normalized PA

or

baselined metric/normalized PA = [(Baselined metric)*(League PA/G)]/[PA*(Reference PA/G)]

where "reference PA/G" is simply the fixed PA/G value everything is being scaled to (37.2 in the Lajoie/Thomas example)

When looking at players within the same league, one doesn't have to worry about this issue--in that situation, one doesn't even have to convert from runs to wins unless so inclined.

Let me circle back and explain the underlying premise of this post again, as I'm pretty sure I've been too verbose and may have distracted from it. Basically, the point I am trying to make is that a batter's contribution occurs within the context of his team's games (or, if we'd like to divorce the player from his actual team, the idealized games of a league average team). What matters is not the raw number of plate appearances a batter gets, but the proportion of his team's plate appearances that he gets. That's the point, in a nutshell.

So we could look at Lajoie/Thomas from that perspective as well, making it explicit with the use of percentages. Lajoie played in a league in which there were 140 games in a season and 35.8 PA/G, so the average team would get 140*35.8 = 5,012 PA, of which he was given 10.2% (509/5012). Thomas played in a league with 162 games and 38.7 PA/G, so the idealized team would get 162*38.7 = 6,269 PA, of which he was given 10.1% (636/6269).
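The same percentages in sketch form:

    # Share of an average team's PA: player PA / (schedule * league PA/G).
    lajoie_share = 509 / (140 * 35.8)   # 1903 AL: 140-game schedule
    thomas_share = 636 / (162 * 38.7)   # 1996 AL: 162-game schedule

    print(round(lajoie_share * 100, 1))   # 10.2
    print(round(thomas_share * 100, 1))   # 10.1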

Therefore, their opportunity as measured in PA was essentially equal. Thomas actually had 127 more plate appearances because he played in an environment in which there were a lot more to go around in each game, and because he played in a league in which there were 22 extra games played. We want to adjust for the former cause when looking at BW/PA; the latter is not a problem because Thomas also had 22 extra games in which to increase his raw number of BW (it might be something you want to consider, in Lajoie's favor, if you are comparing raw BW totals).

(Incidentally, one can use this principle to try to adjust for the differing numbers of PA players get as a result of being on good or bad offensive teams, even within the same league. The most notable metric to incorporate this factor is David Tate's Marginal Lineup Value. I'll leave a full discussion of the pros and cons of that approach for another time).

When expressing an individual batter's productivity as a rate, there are legitimate reasons not to use outs. I've written about some of them before. The good news, though, is that using outs does not cause an excessive amount of distortion on the player level, as long as you don't take it too far (as Bill James' old system of Offensive Won-Loss Records did). If I had to present just one rate stat and it had to be the most accurate estimate of individual offensive performance I could possibly offer, it would not be runs/out--it would be something like the WAA/Normalized PA presented here, or something even more complex. (Just to be clear: if you use outs as a denominator, the numerator should be absolute runs; if you use PA as a denominator, then you can put your baselined metric in the numerator.)

But the nice thing about working with outs (and I am fully aware that I'm repeating myself) is that outs are constant across all contexts. Outs are fixed at three per inning whether you play in the Baker Bowl in 1930 or in Dodger Stadium in 1968. Avoiding a lot of headaches that come from making sure you've considered all of the variables when using PA as your denominator might well be worth the tiny bit of distortion that comes with using outs. I know it is for me.

(*) If you really want to get cute, you could argue that we want to look at PA/Out as the number of outs is not constant across all league-seasons due to factors like extra inning games, home teams that don't bat in the bottom of the ninth, rainouts, etc. I wouldn't waste my time but I wanted to acknowledge it.

4 comments:

  1. Why, you may ask, the emphasis in the title and throughout the post on "baselined metrics"? If you for some reason were working with RC/PA, wouldn't you want to consider league context in some way as well?

    Of course you would. But everyone with any serious sabermetric inclinations realizes that reflexively. It is nowhere near as obvious when you are dealing with baselined metrics, because the numerator (RAA or WAR or whatever) already includes a comparison to some sort of contextual baseline. But when you make it a rate, you need an additional adjustment to be precise, and that can sneak up on people.

  2. A (very well-qualified) reader pointed out that I used two seasons separated by more than 90 years, and that the differences in PA/G between seasons in the same era will not be nearly as large. My response:

    You're right, of course, and I should have mentioned that the difference will be small in most cases.

    On the other hand, even between adjacent recent seasons (the 2005 and 2006 AL) there is a difference of .3 PA/G and a ratio of .992. That's comparable in magnitude to applying a park factor for a nearly neutral park, which sabermetricians do all the time. And unlike a park factor, it's a pretty straightforward adjustment that doesn't rely on estimating an effect that can never be precisely quantified.

    To play devil's advocate against myself: if all parks had PFs in the .99-1.01 range, they'd probably be ignored for the most part.

  3. Great stuff. Quick question: I keep seeing you post 19th century AA numbers. Do you have that decade's minor league numbers in SQL form? I've been trying to find it, but I've been unsuccessful.

  4. JD, this particular AA was a major league (1882-1891). However, I don't believe there is a downloadable DB with 19th century minors. SABR's minor league database is not downloadable, which SABR addresses on its website:

    The statistical history of minor league baseball is very poorly documented. We view most of the statistics we currently display as being provisional, and we anticipate this will be the case for some time. We believe it is unwise to release downloadable datasets which are immature and have not been cross-checked for quality. It is our plan to offer downloads of a full year's worth of statistics (for all leagues) once all leagues in that year have completed the proofing process. We plan to release statistics for the 2009 season this fall or winter, and will proceed backwards in time from there.

    We may be able to offer to run specific queries against the dataset for research projects. Please contact John Zajc in the SABR office at jzajc@sabr.org for information on custom querying of the database.

