Monday, August 27, 2012

Sabermetric Generations

Allow me the indulgence of going meta-sabermetric here. I don't really like doing that, as this is the point at which you are writing about sabermetrics itself rather than baseball, and baseball is a heckuva lot more interesting than sabermetrics. However, I think there are some things about the field that those of us who fancy ourselves as sabermetricians should consider. This post is a half-developed missive on some of those things.

My basic premise here is that sabermetric people can be divided into three generations--not perfectly, of course, but as a general classification. I say "people" because I don't want to make it about who is or is not a "sabermetrician" (Do you have to publish your own run estimator to be a sabermetrician? Write a blog? Do any research at all?), but just about people who would consider themselves to be either practitioners or consumers of sabermetric research, or both.

These three generations are not defined strictly by age, but rather by when you came of age as a saberperson (or how, but the when and the how elements are very closely related). My three groups are:

1. Pioneers--This is by far the most restrictive group, as it only includes those who actually did pioneering sabermetric research (whether they called it by that name or not). Earnshaw Cook, George Lindsey, Pete Palmer, Bill James, and the like are the pioneers.

2. Second Wave--These are the folks who came to sabermetrics largely through the work of the pioneers--reading the Baseball Abstract or The Hidden Game or various SABR publications or The Diamond Appraised, and the like. They may or may not have gone on to become researchers themselves; they may just be consumers of research. It is also possible that their own inquisitiveness led them to sabermetrics without a firm push from Bill James or another pioneer, but they still came onto the scene after the work of the pioneers had been published. Many, many people fall into this group, and even listing a few would be foolish. I consider myself in this group although I also share a few traits with the third group, as I explained before.

3. Internet generation--These are people who have come to sabermetrics in the last 10-15 years and may have done so without ever reading the work of the pioneers. Their interest in sabermetrics is young enough to have been fueled by reading the work of second wavers (or of course their own inquisitiveness). A typical path to sabermetrics for a member of the Internet generation would have been to read Rob Neyer. From there, they sought out BP or Bill James.

Before I use these classifications to make a point, I need to issue a few disclaimers. The first is that I am not an evangelist for sabermetrics. I don't go to the airport and hand out flowers, or go to people's doors and hand them tracts. I don't really care if you are interested in sabermetrics or not, and I don't tailor my writing to appeal to folks who are on the fence.

So when I express my concern about something below, it's not born out of any fear of what overzealous internet posters will do to the reputation of sabermetrics or anything like that; it is simply out of an ordinary desire for intelligent and factual discourse.

The second disclaimer is that this is not a get-off-my-lawn post. As I stated in the piece linked above, I straddle the fence between the second wave and the internet generation, and while I might wish to place myself in the former group, you could make a reasonable case that I am in fact a member of the latter group. No group is inherently better or worse than any other; this post is about certain negative traits of some members of the internet generation, but there are many positive things that can be said about the internet generation.

The third disclaimer is that this whole matter of putting saberpeople into groups and then describing those groups is obviously dangerous for the same reason that forming any sort of artificial groups of people is. So it should go without saying that I am not claiming that all members of a generation share certain characteristics or behave exactly the same way.

My concern is about the fact that with the wealth of information available today, particularly through sites like Baseball-Reference, Fangraphs, and StatCorner, it has become quite possible for members of the internet generation of saberpeople to cite statistics without really understanding them at all. Of course, second wavers also had this ability, but it wasn't so instantaneous. You had to wait for new books to be published in the spring or you had to go through the trouble of figuring statistics yourself.

Of course, there are many members of the internet generation who do fantastic research, develop their own statistics, or figure stats themselves. They are not who I am talking about. I am talking about the subset of folks who don't do their own research, don't endeavor to truly understand what the numbers mean, and yet still talk about them authoritatively.

With sabermetrics prominent on the internet baseball scene, it is much easier to learn about the field. This is on the whole a very welcome improvement, but it is also easier for people to get indoctrinated into sabermetric principles without fully understanding them. There is a sabermetric brand of conventional wisdom which can be just as misleading as the conventional wisdom of the traditionalists when it is wielded by an individual who has not done his own legwork, but has simply read it or been told it and believed it to be true.

The ubiquity of sabermetric ideas in online discussions of baseball makes it quite possible for members of the internet generation to be introduced to sabermetrics almost simultaneously with the moment at which they become serious baseball fans. This pretty much happened to me as recounted in the earlier linked post, although it was primarily through printed works of pioneers rather than through the internet. That being the case, I can't criticize this path of discovery. However, I do think that there might be a critical thinking advantage to having first accepted the conventional wisdom, gradually looking at it skeptically, and then having those questions reinforced by the discovery of sabermetrics. Today it is quite feasible for young baseball fans to skip the conventional wisdom altogether and jump right into sabermetric ideas.

A specific example of the kind of thing I'm talking about is the notion that any pitching statistic like ERA that does not incorporate DIPS principles is worthless, and that only FIP or tRA or a similar metric is appropriate. The problem is not with the truth that ERA has a lot of biases which have often been overlooked, or that FIP is a better predictor of future performance. The problem comes when a relatively good measure like ERA (and one that measures actual runs allowed, which are unquestionably important if not wholly attributable to the pitcher) is thrown in the dustbin as if it is no more useful or telling than Batting Average or raw RBI count, and that anyone who even considers it is classified as a dinosaur.

In defending ERA, I am not saying that it is inherently wrong to come to the conclusion that FIP or Metric XYZ is the best tool for measuring pitcher performance--just that it is wrong to reflexively come to that conclusion, and that sometimes the advocates of such a position can take on a zealous tone. This tone often echoes that of reflexive anti-sabermetric screeds.
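To make the contrast concrete, here is a minimal Python sketch of the two metrics. ERA is earned runs per nine innings. FIP is shown with its commonly published weights (13 for home runs, 3 for walks and hit batters, -2 for strikeouts) plus a league constant. The pitcher line and the constant of 3.10 are assumptions invented for illustration; the actual constant is recomputed each season so that league FIP matches league ERA.

```python
def era(earned_runs, innings):
    """Earned runs allowed per nine innings."""
    return 9.0 * earned_runs / innings


def fip(hr, bb, hbp, so, innings, constant=3.10):
    """Fielding Independent Pitching with the commonly published weights.

    `constant` is an assumed placeholder; in practice it is set each
    season so that league-average FIP equals league-average ERA.
    """
    return (13.0 * hr + 3.0 * (bb + hbp) - 2.0 * so) / innings + constant


# Hypothetical pitcher line, invented for illustration only.
line = {"innings": 200.0, "earned_runs": 80,
        "hr": 20, "bb": 50, "hbp": 6, "so": 180}

pitcher_era = era(line["earned_runs"], line["innings"])
pitcher_fip = fip(line["hr"], line["bb"], line["hbp"], line["so"],
                  line["innings"])

print(f"ERA: {pitcher_era:.2f}")
print(f"FIP: {pitcher_fip:.2f}")
```

The ERA figure reflects runs that actually scored, while the FIP figure ignores everything that happened on balls in play. Working both out from the same stat line is a reminder that they answer different questions, which is a reason to be wary of discarding either one reflexively.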

You may be thinking to yourself that I am attacking a strawman, and that no one actually thinks like that. I didn't quote/link anyone specifically, because singling out message board posters is not the point of this discussion--but sentiments of this sort are out there.

This is where my admission that this post is half-developed becomes painfully clear, because I really don't have anything to offer about what can or should be done about this (given that I am not a sabermetric evangelist, my developed answer would most likely be centered around the premise that individuals are responsible for their own rhetoric). At this point, it will devolve into a paean to do-it-yourself sabermetrics.

There are now at least four large-scale implementations of WAR floating out there--Rally/Baseball-Reference, Fangraphs, BP's WARP, and the Baseball Gauge's WAR. As a result, people will sometimes wonder why sabermetrics can't have a meeting of the minds, hash out all of the differences, and produce one unified version of WAR that can be presented to the wider world.

There are a number of reasons why I think this is a bad idea (the danger of presenting any one metric as *the* uberstat; the legitimate uses of alternate baselines, run estimators, park factors, league adjustments, position adjustments, and all of the other components that go into WAR; the notion that consensus is a positive for its own sake), but that's irrelevant, because it will never happen. The hypothetical moment that it did happen would prove Gary Huckabay right--sabermetrics would be dead. Enforcing standardization for problems in which the answer is often subjective would discourage innovation and discredit alternate views on the question of ability v. value (among others).

Additionally, the existence of one version of WAR, widely accepted and presumably published by major websites, would in my estimation do more to discourage people from figuring their own statistics than anything else in the history of the field--more than Total Baseball or Baseball-Reference or Fangraphs. If you were a new convert to sabermetric thinking, and were told that there was one metric that was the best and was freely available, what would be your most likely reaction?

1) Awesome, this sabermetrics thing is not nearly as complicated as all of the screeds suggested it would be.

2) Darnit, I was hoping to find the unified theory of everything myself.

3) Darnit, I can't believe I don't get to try to figure out how to use a slide rule, or make a spreadsheet, or do SQL coding, or however these saberwhatevers get their results.

There is a lot of value in figuring your own sabermetric statistics, even by just applying other people's metrics. There's no better way to learn about the inputs and how they are combined than by actually walking through the process yourself.
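As a small illustration of what walking through someone else's metric looks like, here is a Python sketch of the basic version of Bill James's Runs Created, (H + BB) * TB / (AB + BB). The team totals are hypothetical numbers made up for the example, and the function names are my own.

```python
def total_bases(singles, doubles, triples, home_runs):
    """Total bases: one per single, two per double, and so on."""
    return singles + 2 * doubles + 3 * triples + 4 * home_runs


def basic_runs_created(hits, walks, tb, at_bats):
    """Basic Runs Created: (on-base factor * advancement factor) / opportunities."""
    return (hits + walks) * tb / (at_bats + walks)


# Hypothetical team season totals, invented for illustration only.
at_bats, hits, walks = 5500, 1450, 550
doubles, triples, home_runs = 280, 30, 170

# Singles are not usually listed, so back them out of the hit total.
singles = hits - doubles - triples - home_runs

tb = total_bases(singles, doubles, triples, home_runs)
rc = basic_runs_created(hits, walks, tb, at_bats)

print(f"Total bases: {tb}")
print(f"Runs Created: {rc:.1f}")
```

Even a formula this simple forces a few decisions (how to get singles, whether walks belong in the denominator) that you never have to think about when the number is simply handed to you on a leaderboard.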

At one time, in order to have up-to-date access to sabermetric stats, you pretty much had no choice but to figure them yourself. This resulted in a waste of a lot of man-hours of sabermetricians, but it also meant a group of people that were better informed about the construction and therefore the objectives and philosophy of the metrics they were using. The easy availability of statistics today is a great boon to the field, to be sure, and it is particularly great for serious practitioners who already have a good understanding of metric construction. I am not a luddite--the downside of a potentially less-informed average consumer of sabermetrics does not outweigh the benefits--but it also should serve as an impetus for transparency in computational explanations and for continuing reminders of the why in addition to the numerical results themselves.


  1. A boulder of salt because I forgot all about Baseball Gauge, but this post seems to indicate that their version of WAR is dead as far as being an ongoing published-on-the-internet metric.

    Anyway, as I said on the Twitter, this is fascinating (Colin just stole my word on Twitter) and I don't think I disagree with any of it.

  2. As I attempted to say on Twitter, in my opinion you're confusing having an interest in sabermetrics with being a sabermetrician. Think of someone having an interest in a science vs. someone who is a scientist. They're different things.

    In this case, people interested in sabermetrics are interested in learning about the discoveries in baseball, or about what those discoveries seem to say, but they aren't interested in making further discoveries themselves and taking that extra step.

    Sabermetricians (Saberists, whatever you want to call them) are those who try and make the new discoveries, or apply those next steps, etc. etc.

    Those interested in sabermetrics are more likely to make errors and misinterpret or misuse sabermetric principles because well, A. they don't have that same devotion as sabermetricians most of the time (not always) and B. they don't have the itch to try and figure out why a surprising result they think they've read might not be true and instead just accept it as gospel.

    I don't think there's a generational issue here at all - it's just that the internet made sabermetric findings so much more accessible to the public that the group of those who are just INTERESTED in the subject has grown tremendously; after all before the internet, it was a lot harder to find the information, and if you were one of the people seeking the info, you were more likely to be a sabermetrician yourself (or you wished to become one).

  3. I think that what you're saying makes some sense, but I do *kind of* agree with your own comment that you might be fighting against a strawman a bit. What's the problem with writers (who aren't researchers) building off what others have done? Is it that we can't write about sabermetric stats without doing our own research, because that means we don't understand what goes into the stats the same way?

    I'm not sure that's fair. I've spent a lot (read: A LOT) of time trying to familiarize myself with the stats and the methodology of different saber approaches and stats. Just because I haven't done a lot of "hard" research doesn't necessarily make my take on said stats less valid, does it?

    Expecting everybody (or most everybody) to take their fandom to a level where they are sabermetricians (or explore theory) themselves is kinda unrealistic, I think. There's value in changing the discussion, too, so that the casual fans talk about the sport in a way that's more descriptive of what's actually going on as well.

    (But I don't think I'm disagreeing with you on that particular point, from your piece.)

    I'm not sure what sparked this - are there people that you're reading who are "too dismissive" of regular stats? Are people not backing up assertions of sabermetric statistical validity? I guess I have trouble understanding the problem that's being addressed here -- maybe with that in mind, this would make a little more sense to me.

    I will say, however, that this seems to be a thoughtful post. I hope that it sparks some reasoned discussion about this issue, because I'm sure there are valid arguments surrounding this.

  4. Jason, I did not realize that the Baseball Gauge WAR was dead. The perils of posting things one wrote over a year ago without a fact check.

    garik, I agree with your point to some extent. The paragraph that begins "Of course" attempts to draw a distinction along those lines. Jason put it well on Twitter when he described my post as dealing with "sabermetric users and consumers". I think where we disagree is that there are now a lot of people writing about baseball infusing sabermetric statistics or principles. Whether these people would consider themselves sabermetricians or not, or whether they would be considered sabermetricians by other sabermetricians or not, they do represent sabermetrics to the general public. Many of these people have excellent understanding. Some of them don't. And the followers that these folks tend to inspire often don't understand at all and simply parrot numbers.

    Bryan, no, your take is not less valid because you don't do research, and I did not mean to imply that it was. But the reason your take is valid is because you have put in the time to familiarize yourself with the methodology. There are lots of people who haven't done that, and those people are not doing themselves or sabermetricians any favors.

    I don't think that people are necessarily too dismissive of regular stats--but they are often too accepting of one particular sabermetric approach. That's the point I was trying to make in the ERA/FIP example.

  5. I'd put the third wave not at internet, but at post-Moneyball, the period where the popularity of sabermetrics really took off.

    Usenet was a key to the second wave and a sabermetrics history that doesn't have it is missing a big part. Usenet (which essentially died off post-Moneyball as blogs took over) is the source for Baseball Prospectus, Baseball-Reference, DIPS, PECOTA, ZiPS, and a ton of writers and analysts (Huckabay, Kahrl, Law, Woolner, McCracken, Forman, etc).

  6. Dan, that is a fair point (although I certainly didn't intend this to be a sabermetric history). rsbb is easy to overlook for those of us who were not a part of it, largely because the archives are so poorly preserved (I write this as someone who has spent a fair amount of time trying to find stuff in rsbb archives).

    In any event, I don't think that rsbb necessarily falls outside of the second wave classification, as (at least from my impression) it seems as if the dominant outside influences on the participants were James and Palmer.

  7. I find myself contradicting myself in my thought process in trying to make up my mind if I agree with you or not. On the one hand I agree completely that it's important to learn about anything before giving your opinion about something. It would be weird for a random person to say that medicine a is better than medicine b just because he read this somewhere online. He doesn't know any of the mechanisms that are involved. And such is the same case with advanced stats. Being pro FIP and mindlessly discarding ERA would be wrong if you don't know why FIP is better than ERA.

    On the other hand, if medicine a is proven to be better than medicine b, and is widely accepted as such, then I don't see the problem with stating this, even though you might not know the mechanisms involved.
    ERA has many flaws and has been proven quite a few times that it isn't a good predictive or a describing stat (not that FIP is perfect, but it is better). So I don't really see the harm if a kid or anyone else for that matter, states that FIP is a better stat than ERA.

    Anyhow, I hope you understand the point I'm trying to make. And sorry for the typos, typing on a smartphone is real crappy.

  8. I would contend that it hasn't been proven that ERA is less of a "describing" stat than FIP (I should have used RA as the example rather than ERA, since I use RA all the time and have never really used ERA). I'm not sure exactly what you mean by "describing", but from the way I interpret it, it's almost a tautology that ERA is a better describing stat than FIP.

    In any event though, the issue isn't really that someone might claim that FIP is better than ERA, but that the actual number of runs a pitcher allows would be tossed aside as a completely worthless piece of information, one as flawed as pitcher wins. That type of extremism to a particular evaluation tool is almost always unwarranted, but is certainly unwarranted if the person making the statement doesn't have a firm grasp of how FIP works.

