Tuesday, December 06, 2005

Rate Stat Series, pt. 3

I had intended this to be the second installment, but as I was writing it I realized that it relies on the concepts of PA generation (which I was planning to cover a little later in the series), so I reordered the installments so that what is here makes sense.

Since we know that Runs/Out is the proper rate stat for a team, the obvious solution is to carry that over and apply it to individual batters. We first put it to a logical test. Does it, at first blush, make sense to rate players based on R/O? Yes, it does. Runs and outs are the two crucial factors in an offense. Individuals, as well as teams, want to create runs and avoid outs.

We should also apply a more substantive logical test. What are we trying to measure with a batter’s rate stat? His theoretical individual ability as a batter, or his ability to help his team score runs? The first definition is feasible, but what we really want to know is what this batter will do for a team. An individual is valuable if he helps his team win (or, from an offense-only standpoint, helps his team score runs). Therefore, our rate stat, along with the quantity stat that we are converting into a rate stat, should reflect his ability to help a team score runs.

The difference between “theoretical individual ability” and “ability to help a team score runs” is very thin; you might even say razor thin. However, it does exist. We see the problem with multiplicative run estimators like RC and BsR. BsR attempts to model the run scoring process; therefore, it represents theoretical offensive ability. But the run scoring function is a team function, and therefore it is inappropriate to apply BsR directly to individual batters. We instead use Linear Weights or a theoretical team method to model the batter’s impact on the team run scoring process. Of course, the results we get from applying BsR directly to an individual player are not too far off from the results of the more proper approaches. This is because the difference between the team function and the player function is very, very small.

We need to keep this principle in mind when evaluating R/O. Just because R/O is the right rate stat for teams does not mean that it is the right rate stat for individuals (and of course the opposite is true as well; that R/O is right for teams does not disqualify it from being right for players).

R/O passes our rudimentary logical test. Now we should see if it properly reflects the run impact an individual creates within a team context. Suppose we consider two players, admittedly extreme. Player A has 300 PA, makes 100 outs, and draws 200 walks. His Runs Created is approximately .32*(200) - .1*(100) = 54, for .540 R/O. Player B has 500 PA, hits 150 home runs, and makes 350 outs. His RC is about 1.46*(150) - .1*(350) = 184, for .526 R/O. So Player A has a higher R/O and therefore is the “better” hitter.
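The arithmetic above is easy to check with a few lines of Python. This is only a sketch: the three weights are the ones quoted in the paragraph, while the function name runs_created and the constant names are made up for illustration and cover only walks, home runs, and outs.

```python
# Linear-weight values quoted in the text (per walk, per home run, per out).
BB_WEIGHT = 0.32
HR_WEIGHT = 1.46
OUT_WEIGHT = -0.10


def runs_created(walks, home_runs, outs):
    """Approximate runs created from the three events used in the example."""
    return BB_WEIGHT * walks + HR_WEIGHT * home_runs + OUT_WEIGHT * outs


# Player A: 300 PA = 200 walks + 100 outs.
# Player B: 500 PA = 150 home runs + 350 outs.
rc_a = runs_created(walks=200, home_runs=0, outs=100)
rc_b = runs_created(walks=0, home_runs=150, outs=350)

# Runs per out for each player.
ro_a = rc_a / 100
ro_b = rc_b / 350

print(f"Player A: RC = {rc_a:.0f}, R/O = {ro_a:.3f}")
print(f"Player B: RC = {rc_b:.0f}, R/O = {ro_b:.3f}")
```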

But suppose that we add each of these players to an otherwise average team, giving him 1/9 of the team plate appearances. This team has a .330 OBA and scores .12 R/PA and .179 R/O (approximately 4.5 runs/25.2 outs). If we add Player A, the team will now have an OBA of .330*(8/9) + (200/300)*(1/9) = .3674, which based on the reasoning in the previous installment will result in 25.2/(1 - .3674) = 39.84 PA/G. Its R/PA will be .12*(8/9) + (54/300)*(1/9) = .1267, and its R/G will be 39.84*.1267 = 5.046.

Following the same procedure for Player B, we find that his team will have an OBA of .3267, will generate 37.43 PA/G, score .1476 R/PA, and score 5.523 runs/game. Player B’s team will score .477 R/G more than Player A’s team, and yet Player A had a slight edge in R/O. What is going on here?
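The team calculation can be sketched the same way. The inputs (.330 OBA, .12 R/PA, 25.2 outs per game, a 1/9 share of plate appearances) come straight from the text; the function team_runs_per_game and its parameter names are hypothetical, and the sketch assumes, as the text does, that PA/G equals outs per game divided by (1 - OBA).

```python
OUTS_PER_GAME = 25.2   # team outs per game, from the text
TEAM_OBA = 0.330       # the average team's on-base average
TEAM_R_PER_PA = 0.12   # the average team's runs per plate appearance


def team_runs_per_game(player_oba, player_r_per_pa, share=1 / 9):
    """Runs per game for the average team when one hitter takes `share` of its PA."""
    # Blend the player's rates with those of his eight average teammates.
    new_oba = TEAM_OBA * (1 - share) + player_oba * share
    new_r_per_pa = TEAM_R_PER_PA * (1 - share) + player_r_per_pa * share
    # Fewer outs per PA means more PA before the game's outs are used up.
    pa_per_game = OUTS_PER_GAME / (1 - new_oba)
    return pa_per_game * new_r_per_pa


# Player A: .667 OBA, 54 runs in 300 PA. Player B: .300 OBA, 184 runs in 500 PA.
rg_a = team_runs_per_game(player_oba=200 / 300, player_r_per_pa=54 / 300)
rg_b = team_runs_per_game(player_oba=150 / 500, player_r_per_pa=184 / 500)

print(f"Team with Player A: {rg_a:.3f} R/G")
print(f"Team with Player B: {rg_b:.3f} R/G")
print(f"Difference: {rg_b - rg_a:.3f} R/G")
```

Because the code carries full precision through every step, its last digit can differ slightly from the hand-rounded figures in the text.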

What is going on is that we have seen the flaw in applying R/O to individuals. Player A has a tremendous ability to avoid outs, with a .667 OBA. A team with an OBA of .667 would score an oodle of runs, because it would make so few outs, and the impact of each PA generated would compound many times because the next batter coming up would also have a stratospheric OBA. Player B is sub-par at reaching base, with a .300 OBA; he reduces the number of PA his team will generate. But he also has a 1.200 SLG, and will drive in tons of runs. A team made up entirely of Player Bs would still be formidable, but there his low OBA would be a bigger impediment.

The point to take away from this example is that a player added to a team who avoids an out does not generate subsequent PAs at his OBA--he generates them at his teammates’ OBA (or more precisely, the new OBA of the team with himself included). R/O implicitly assumes that he generates PA at the rate of his own OBA, overstating the importance of avoiding outs.

(A technical note: if you plug Player A’s stats into the BsR version on my site, you will not get an “oodle” of runs--you will get about 3.65 runs/game. There are a couple of reasons for this. One, the BsR formula I’m using gives walks a B coefficient of .039, whereas others go as high as .1, which will make a big difference in such an extreme case. Two, the linear weight formula we used to estimate his RC in a team context will be a little off because he is very extreme, and will have a significant impact on team LW--of course, Player B will change his team’s values as well. But even if taking these technical issues into account would put Player B’s R/O ahead of Player A’s, it would still be close, and the point here is to illustrate the huge discrepancy between R/O and the impact they actually have on team runs scored.)

If R/O does not work, one very obvious alternative is R/PA. It gives the “correct” answer for our players above--.18 for Player A and .368 for Player B. Let’s look at a different set of players and see how R/PA stacks up against what we expect to happen. These two are the 2005 seasons of Chad Tracy and Todd Helton. Tracy had a .346 OBA, and created 88 runs in 538 PA and 349 outs, while Helton had a .411 OBA and created 100 runs in 615 PA and 346 outs. Tracy’s R/PA was .1636 and Helton’s was .1626. But in R/O, Helton was ahead .2890 to .2521.

If we add the players to the same team as above (.330 OBA and .12 R/PA), Tracy’s team would score 4.708 runs/game, while Helton’s would score 4.756 runs/game. So now R/PA is not matching our expectations.

What has happened is that with R/PA, we do not account for the player’s avoidance of outs (read: generating additional PA) at all. When Tracy comes to the plate, he creates about as many runs as Helton. But he gets to the plate less often, because he uses up more of his team’s outs.
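To see the three measures side by side for Tracy and Helton, here is a rough sketch using only the figures quoted above. The helper team_runs_per_game and the dictionary layout are illustrative choices, not anything from a published tool, and the team calculation again assumes PA/G = 25.2/(1 - OBA) with the player taking 1/9 of the PA.

```python
OUTS_PER_GAME = 25.2
TEAM_OBA = 0.330
TEAM_R_PER_PA = 0.12

# 2005 lines as quoted in the text: (OBA, runs created, PA, outs).
players = {
    "Tracy": (0.346, 88, 538, 349),
    "Helton": (0.411, 100, 615, 346),
}


def team_runs_per_game(player_oba, player_r_per_pa, share=1 / 9):
    """Runs per game for the average team when one hitter takes `share` of its PA."""
    new_oba = TEAM_OBA * (1 - share) + player_oba * share
    new_r_per_pa = TEAM_R_PER_PA * (1 - share) + player_r_per_pa * share
    return OUTS_PER_GAME / (1 - new_oba) * new_r_per_pa


results = {}
for name, (oba, rc, pa, outs) in players.items():
    results[name] = {
        "R/PA": rc / pa,
        "R/O": rc / outs,
        "team R/G": team_runs_per_game(oba, rc / pa),
    }

for name, row in results.items():
    print(f"{name}: R/PA = {row['R/PA']:.4f}, "
          f"R/O = {row['R/O']:.4f}, team R/G = {row['team R/G']:.3f}")
```

R/PA puts Tracy a hair ahead, while both R/O and the team calculation favor Helton, which is the mismatch described above.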

While we needed an extreme case to show the failure of R/O for an individual, here we took two fairly ordinary players from this season. There are extremes that can be created or more unusual cases in baseball history that show a much larger discrepancy.

In the end, R/PA is useful as a descriptive stat (it is, after all, the number of direct runs he creates every time he hits), and R/O is useful as a theoretical measure of how good the player’s offense would be as a team, and is actually a pretty decent shortcut rate statistic. But neither R/PA nor R/O fulfills everything we want our rate stat to do. Next time, we’ll look at how the flaws of R/O can get really out of hand when you try to stretch it further down the path of batter analysis.
