Tuesday, March 13, 2007

Historical Offense by Position

One empirical topic I have always been interested in is how offensive production varies by fielding position. I have written a number of pieces for this blog on that topic. This is another one, but it examines every season from 1901 to 2005.

I got the data from the National Pastime Almanac, which is basically a repackaging of the Lahman database. For those who aren’t database whizzes, it provides a decent number of filtering options that make it easy, for instance, to get the hitting totals of all players at a certain primary position. And that is exactly what I did. The positional assignment for each player-season was based on the position that the player played most often.

One quirk that this program apparently has is classifying players with no field time as catchers. For example, Eddie Collins in 1929 appeared in 9 games with 9 PA, and never played the field--the almanac lists him as a catcher. This is obviously an incorrect classification, and I did not realize that this problem was there until I had compiled all the data. Hopefully, though, the small numbers of players affected (with small numbers of plate appearances) will not be enough to affect the results.

UPDATE: I got a nice note from Ron Gudykunst, the creator of the National Pastime Almanac. There is now an updated version including 2006 data which has fixed the flaw I mentioned. So it remains a problem with the data here, but not with the program.

I used Estimated Runs Produced as the run estimator, and AB-H+CS for outs (or AB-H if CS was not recorded). Then I took the ten-year moving average of each position’s R/O relative to the overall average. The complete table of these offensive positional adjustments is available as a Google Spreadsheet. In addition to each position separately, I included the combinations of 1B/DH and LF/RF. Again, these are ten-year averages labeled by their first year, so “1913” represents 1913-1922. The table thus ends at “1996”, because that entry covers 1996-2005.
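For anyone who wants to reproduce this kind of table, here is a rough Python sketch of the outs formula and the windowing described above. It is not the spreadsheet's actual calculation: the function names are my own, the run estimate is taken as an input rather than computed, and I am assuming that the ten-year figure is a simple mean of the ten yearly ratios (it could just as reasonably be built from ten-year totals).

```python
from typing import Dict, Optional


def outs(ab: int, h: int, cs: Optional[int] = None) -> int:
    """Outs as AB - H + CS, falling back to AB - H when CS was not recorded."""
    return ab - h + (cs if cs is not None else 0)


def relative_rate(pos_runs: float, pos_outs: int,
                  all_runs: float, all_outs: int) -> float:
    """A position's runs per out divided by the overall runs per out."""
    return (pos_runs / pos_outs) / (all_runs / all_outs)


def forward_moving_average(by_year: Dict[int, float],
                           window: int = 10) -> Dict[int, float]:
    """Average each year with the following window-1 years.

    The result is keyed by the first year of each window, so with
    window=10 the entry for 1913 covers 1913-1922. Windows that would
    run past the available data are skipped.
    """
    result = {}
    for start in sorted(by_year):
        span = range(start, start + window)
        if all(year in by_year for year in span):
            result[start] = sum(by_year[year] for year in span) / window
    return result


# Illustrative input only: a position that hits exactly at the overall
# rate in every season from 1901 through 2005.
yearly = {year: 1.0 for year in range(1901, 2006)}
averaged = forward_moving_average(yearly)
first_label, last_label = min(averaged), max(averaged)  # 1901 and 1996
```

With data running through 2005, the last complete ten-year window starts in 1996, which is why the table stops there.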

I also made a few graphs to illustrate some of the interesting relationships. On the graphs, “1” is 1901, “4” is 1904, etc. The first includes all of the positions and may be a little hard to read:

Next, we can see how pitcher hitting has gone through the century:

Looking at the three outfield positions together is interesting; today, there is a fairly large gap between the production at the corners and the production out of center, but in mid-century this gap was much smaller, and in the early part of the century, center fielders were more productive hitters than their compatriots:

Finally, second and third base are two positions that are often cited as having flip-flopped on the defensive spectrum. This data gives the point of no return as 1928, but the gap has closed significantly in recent years:
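One way to pin down a "point of no return" from two series like these is to look for the earliest year after which one position stays ahead of the other for good. The sketch below assumes that reading of the phrase; the function name and the sample numbers are made up for illustration and are not taken from the spreadsheet.

```python
from typing import Dict, Optional


def point_of_no_return(a: Dict[int, float],
                       b: Dict[int, float]) -> Optional[int]:
    """Earliest year from which a[year] > b[year] holds in every later
    year the two series share, or None if a is not ahead at the end."""
    years = sorted(set(a) & set(b))
    answer = None
    # Walk backwards from the most recent year while a stays ahead.
    for year in reversed(years):
        if a[year] > b[year]:
            answer = year
        else:
            break
    return answer


# Made-up values: series_a trails, leads briefly, falls back, then leads for good.
series_a = {1: 0.90, 2: 1.10, 3: 0.95, 4: 1.05, 5: 1.10}
series_b = {1: 1.00, 2: 1.00, 3: 1.00, 4: 1.00, 5: 1.00}
crossover = point_of_no_return(series_a, series_b)  # 4
```

Note that the brief lead in year 2 is ignored, since the other series retakes the lead afterwards; only the final, permanent change of lead counts.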

I have not endeavored to interpret a lot of these results because I don’t have any particular insight to share. Hopefully, though, they will be of interest to some readers.

  Thanks for this info!

    Those graphs are AMAZING. I'll be mining and referring to them for months.

