Walk Like a Sabermetrician: Meanderings

* When the Indians played at the Phillies in June, multiple outlets reported that this was the Indians' first-ever regular season game in Philadelphia. It seems that the source of this was the Indians' own media notes on the game, and many of the outlets simply passed along this information. That's reasonable enough--one should have a reasonable degree of confidence in the information put out by the team, and not have to fact check everything.

On the other hand, I would hope that someone would recognize that the claim the Indians had never before played in Philadelphia is absurd on its face. The Indians shared the American League with the Philadelphia A's for fifty-three seasons.

Of course, the note could have been easily corrected by changing it from "in Philadelphia" to "at the Phillies". My incredulity, mild as it is, comes not from the incorrect tidbit of trivia, but from the ignorance of history on the part of people who report on baseball for a living.

Granted, it's not particularly relevant to reporting on modern baseball to know about a team that hasn't played in that city in nearly sixty years, and it could just be a simple oversight, which we are all prone to at one time or another (quick: find a factual error in this post!). On the other hand, it has always seemed to me that people who make their living writing about, commenting on, or compiling data about baseball would be big enough baseball fans to be aware of a franchise that won five World Series and featured luminaries like Connie Mack, Lefty Grove, Eddie Collins, Home Run Baker, ...

This phenomenon is not limited to the specific example of the A's--it's something that gnaws at me occasionally when I hear people talk. The Indians' play-by-play man, Tom Hamilton, is sometimes revealed to be somewhat ignorant of National League rookies or Korean and Taiwanese baseball, for instance. The best defense of professionals might be that they do this for a living, and so rather than being a fun diversion, it's a job. It still seems a little odd to me.

* I could be way off base on this next observation--it certainly wouldn't be the first time that I made a faulty generalization. (Also, I don't want to get bogged down in the identities of the players discussed. If you think I've mischaracterized the career of George Brett, then feel free to substitute someone else in his place.) However, it seems to me as if the average fan, when evaluating great players, is not drawn to the extreme poles of peak or career but rather to extreme performances on either pole. So he won't pick Sandy Koufax as one of his top pitchers, with his career-preferring double picking Nolan Ryan--he'll pick both Koufax and Ryan, because he's impressed by the extreme peak and the extreme career.

The result of this is a bewildering middle ground, in which Pete Rose and Sandy Koufax are simultaneously voted onto the All-Century team. (I am not saying that the silly All-Century vote confirms my theory; that vote would be completely consistent with people belonging to extreme career or extreme peak camps, and simply balancing each other out in the public at large. It also did not boast a particularly well designed voting system or an informed electorate. I may be using it as evidence, but am admitting that it could easily be used by someone arguing against me as well, or dismissed as meaningless.) If you are an extreme career voter, then there's no way in hell you can believe that Sandy Koufax was one of the ten best pitchers of all-time. And if you're in the extreme peak camp, it makes no sense to believe that Pete Rose is the sixth-best outfielder of all-time.

However, if what is going on is that fans are impressed by one extreme or the other, then picking Koufax and Rose can be explained. I still don't think it makes a lot of sense, because taking a dual-extremes position excludes the players who were really good in their peaks and really good in their careers--which is the bulk of the great players in history.

The other complicating factor is that the average fan, when evaluating a career, probably looks at bulk totals rather than baselined value. When I talk about career value, I almost always mean career value above replacement. Just staying around and compiling only counts to the extent that you can exceed replacement level, and so the last few years of Pete Rose's career have no impact on my evaluation of him. But for those who are looking at career through the lens of totals, Rose's last years often are a strong positive (4,000 hits!).

The very greatest, of course, had tremendous peaks and tremendous career totals--Cobb, Ruth, Bonds, etc. There are some great players who had very good peaks but extraordinary career totals--Hank Aaron, for instance. There are some great players who had great peaks but only good career totals--say Pedro Martinez. And then there is the bulk of great players--guys like Mel Ott or George Brett. If you draw up your list by either extreme, these guys are not going to be at the very top. But if you evaluate by some combination of peak and career, these guys will rank comfortably ahead of one-trick ponies like Koufax.

* The five no-hitters thrown continue to be one of the driving factors behind the "Year of the Pitcher" storyline that the media has run with. But what is the probability of observing five or more no-hitters in a season?

I was going to write about applying the Poisson distribution to this question, but happily discovered that there have been multiple pieces that already covered that ground (see Bob Brown's article "No-Hitter Lollapaloosas Revisited" in the 1996 Baseball Research Journal, this post at Bayes Ball (great blog name, BTW), this one at Tom Flesher's blog, and this paper by some folks from Middlebury College's Econ department).

Since these folks have already done the legwork, I'm not going to offer a justification for this approach (they are much better qualified to do so in any event, so I'll refer you there). Based on the data in Flesher's post, the observed probability of a no-hitter from 1961-2009 is 120/201506 = .0006.

So far this season, there have been 3,166 games played in the majors (through 8/2, and counting each game twice since each is an opportunity for a no-hitter). The mean is .0006*3166 = 1.9 no-hitters. The Poisson probability for x observations is:

P(x) = e^(-mean)*mean^x/x!

So:

The first column gives the probability of observing x no-hitters; the second column gives the probability of observing at least x no-hitters. You can see that there is a 3.1% chance of observing exactly five no-hitters and a 4.4% chance of observing at least five no-hitters, at least based on this Poisson model with a .0006 probability of a no-hitter in any individual game.
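For anyone who wants to rebuild the two columns described above, here is a minimal sketch of the calculation. The function names (`poisson_pmf`, `poisson_at_least`, `no_hitter_table`) and the cutoff of seven rows are my own choices for illustration; the only inputs taken from the post are the .0006 per-game probability and the 3,166 team-games.

```python
import math


def poisson_pmf(x, mean):
    """P(exactly x events) under a Poisson model: e^(-mean) * mean^x / x!"""
    return math.exp(-mean) * mean ** x / math.factorial(x)


def poisson_at_least(x, mean):
    """P(x or more events) = 1 - P(fewer than x events)."""
    return 1.0 - sum(poisson_pmf(k, mean) for k in range(x))


def no_hitter_table(mean, max_x=7):
    """Rows of (x, P(exactly x), P(at least x)) for x = 0..max_x.

    max_x is an arbitrary illustrative cutoff, not a value from the post.
    """
    return [(x, poisson_pmf(x, mean), poisson_at_least(x, mean))
            for x in range(max_x + 1)]


if __name__ == "__main__":
    # Inputs from the post: .0006 per team-game, 3,166 team-games through 8/2.
    mean = 0.0006 * 3166
    print(" x   P(x)     P(>=x)")
    for x, exact, at_least in no_hitter_table(mean):
        print(f"{x:>2}  {exact:7.2%}  {at_least:7.2%}")
```

The "at least" column is computed as one minus the lower tail, which avoids having to decide where to truncate the upper tail of the distribution.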

Of course, this model assumes that the probability of a no-hitter is fixed at the observed level over the last fifty or so seasons, regardless of changes in league environment. To crudely estimate a more 2010-specific probability, consider that the overall ML BA is .2594; the 1961-2009 average was .2597. This season is pretty much in line with the average BA from which the sample data comes.

Of course, the sample is just that, so we can figure a rough probabilistic estimate of no-hitter frequency. If a pitcher needs to record y outs to get a no-hitter, and each at bat is treated as independent, then the probability of a no-hitter is (1 - BA)^y. Of course each at bat is not truly independent, and each batter doesn't have the same BA, and you can add some other objections.

If you use 27 as y, you will definitely underestimate the frequency of no-hitters, as many no-hitters don't actually require 27 batting outs--there are outs made on the bases, outs made by sacrifices which don't figure into BA, etc. If you do what I am going to do, and set y = 25.2, which is the average number of batting outs per game, you're not considering that there are generally fewer non-batting outs when there are fewer baserunners.

Using .2594 and y = 25.2, the estimated probability of a no-hitter is (1 - .2594)^25.2 = .00052, which over 3,166 games yields a mean of 1.64 no-hitters. Using that mean, the Poisson probabilities are:

Neither of these approaches is foolproof, but they both indicate that it is not extremely unlikely to see five no-hitters over 3,166 games.
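The batting-average approach can be sketched in a few lines as well. This is only an illustration under the same simplifying assumptions stated above (independent at bats, one league-wide BA, y = 25.2 batting outs); the helper names are hypothetical, not anything from the linked articles.

```python
import math


def no_hitter_prob(batting_avg, batting_outs):
    """Crude per-game no-hitter probability: (1 - BA)^y.

    Assumes every at bat is independent and every batter hits the league BA.
    """
    return (1.0 - batting_avg) ** batting_outs


def poisson_pmf(x, mean):
    """P(exactly x events) under a Poisson model."""
    return math.exp(-mean) * mean ** x / math.factorial(x)


def poisson_at_least(x, mean):
    """P(x or more events), computed as one minus the lower tail."""
    return 1.0 - sum(poisson_pmf(k, mean) for k in range(x))


if __name__ == "__main__":
    # Inputs from the post: 2010 ML BA of .2594, 25.2 batting outs per game,
    # 3,166 team-games through 8/2.
    p = no_hitter_prob(0.2594, 25.2)
    mean = p * 3166
    print(f"per-game probability: {p:.5f}")
    print(f"expected no-hitters:  {mean:.2f}")
    print(f"P(exactly 5):         {poisson_pmf(5, mean):.2%}")
    print(f"P(at least 5):        {poisson_at_least(5, mean):.2%}")
```

Changing `batting_outs` to 27 shows how much the estimate drops when every out is assumed to be a batting out.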

As an aside, it's well-known that the Mets have never had a no-hitter in franchise history, covering 7,742 games (again, through 8/2). Using the Poisson approach and a .0006 probability (which ignores the quality of Mets pitching, park effects, etc.), the mean is 4.65, and the probability of zero is just .961%.

Using a binomial approach, the probability of zero is (1 - .0006)^7742 = .959%, and you can see that the Poisson matches the binomial very well. The Poisson is much easier to work with, though, especially when dealing with non-zero observations. If we wanted to know the binomial probability that the Mets had pitched seven no-hitters, we'd have to compute C(7742, 7)*(.0006)^7*(1-.0006)^(7742-7), which a spreadsheet can handle but it's a big mess. The binomial estimate for the probability of seven Met no-nos is 8.90%, which is the same as the Poisson estimate.
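To see the binomial and Poisson figures side by side, here is a small sketch that computes both for the Mets' 7,742 games. The function names are mine, and `math.comb` requires Python 3.8 or later; the .0006 probability and the game count are the ones used above.

```python
import math


def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials): C(n, x) * p^x * (1-p)^(n-x)."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)


def poisson_pmf(x, mean):
    """P(exactly x events) under a Poisson model with the given mean."""
    return math.exp(-mean) * mean ** x / math.factorial(x)


if __name__ == "__main__":
    # Inputs from the post: 7,742 Mets games, .0006 per-game probability.
    games, p = 7742, 0.0006
    mean = games * p
    print(" x  binomial  Poisson")
    for x in range(11):  # 0..10 is an arbitrary display range
        print(f"{x:>2}  {binomial_pmf(x, games, p):8.3%}  {poisson_pmf(x, mean):8.3%}")
```

With n this large and p this small, the two columns should agree to within a few thousandths of a percentage point, which is the point of using the Poisson as a stand-in.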

I'll leave you with that; probability of x no-hitters for the Mets franchise:

1 comment:

Tom, August 3, 2010 at 11:19 PM
Interesting work. I had considered crunching some numbers to estimate any year-specific effects but, well, you can pretend I have some real rationale for failing to do so.

It was nice to see you working out a few things that I'd thought about but didn't pursue. The mean number of batting outs approach is a perfectly reasonable way to attack it, particularly since I doubt there's much payoff for the extra finesse.


Tuesday, August 03, 2010
