Thursday, May 26, 2011

Thursday, May 19, 2011


Print Baseball Encyclopedias

As I grow older, I try to stay alert to warning signs of old-fogeyism. One or two such signs are not particularly concerning--they can just be written off as personal quirks/eccentricities, which we all possess to one degree or another. A prime example for me is cell phones. I hate the things, and I always have. I finally got one, only because it was cheaper than paying for a landline, and if there's one thing I hate more than cell phones, it's spending money on any type of phone.

When it comes to baseball, one of the possible signs I've noticed is my continuing love for print encyclopedias. I think it's great that we have Baseball-Reference, Retrosheet, the National Pastime Almanac, the Baseball-Databank, and the like, and obviously there are countless advantages to computerized data that you and I take advantage of every day. Still, I have yet to warm up to the idea of going to Baseball-Reference, clicking on a page, following a link somewhere else, and wasting an hour or two just wandering in the statistical record of the game. I still do this all the time with print encyclopedias. This post is a tribute/review of them.

Of course, the print encyclopedia is a dinosaur. It always was a bit of a wonder that one could publish a multi-thousand page book, carrying a hefty hardcover price, and sell enough of them to make it a worthwhile business endeavor, especially with annual or semi-annual editions. Perhaps they never really earned their keep anyway, but they should have.

The advent of computerized equivalents has driven the print encyclopedia out of existence (although apparently the erstwhile ESPN Baseball Encyclopedia is still being shopped to publishers). If that is the inevitable cost of progress, then so be it--I wouldn't give up my Lahman database to get a new edition of Total Baseball if that was what it would take. Still, I miss the print encyclopedias--and it seems as if other people do too.

As I write this (New Year's Eve), the current cheapest prices listed for a copy of the final edition of each of the printed encyclopedias (new or used) are:

* Macmillan (10th edition, 1996): $44.99

The 9th edition is available for as little as $25.

* Sports Encyclopedia: Baseball (2007 edition): $123.08

The 2006 edition is available for as little as $3.31.

* Total Baseball (8th edition, 2004): $99.65

The 7th edition is available for as little as $3.58.

* ESPN Baseball Encyclopedia (5th edition, 2008): $95.80

The 4th edition is available for as little as $1.73.

* STATS All-Time Major League Handbook (2nd edition, 2000): $3.99

The STATS book is the exception, and it is not really an iconic book, as it only went through two editions and presumably had the most limited print run of any of the five.

I'm not sure if these prices reflect actual demand for the books in question, or whether sellers think they have something valuable and are setting the price above the intersection of the demand and supply curves. Assuming that it is a real phenomenon, it suggests that there are a fair number of people who miss the print encyclopedias so much that they are willing to pay a high price just to have the final update.

I have at least one copy of each of the big four (excluding the STATS book from that designation) on my bookshelf at all times. Of the four, the two that I use most are ESPN and Sports Encyclopedia: Baseball. Of all of the encyclopedias, I have to count SE:BB as my favorite. It's certainly not the most statistically complete or the best-edited, but it's the only one of the four that breaks from the career register format and instead presents a season rosters format.

I've always felt that the season rosters lend themselves better to browsing than the career registers. (This is the part where the readers scream, "With a computer you can have both!") Not only does it allow one to look at team composition and track changes from year-to-year, it allows one to view an entire league-season on 2-4 pages, making it much easier to get the big picture for a season.

The SE:BB is not without flaws, of course. The book is filled with typos, many of which were presumably there from the first edition to the last. Two quick examples, both from the 1994 edition (although I'd be very surprised if they were corrected in later updates):

* Johnny Kling is listed as "Johnny King" with the roster for the 1901 Cubs, and in the 1901-19 Batter Register (later Cub seasons correctly list him as "Kling").

* The header for the 1972 NLCS says "Cincinnati (west) 3 Pittsburg (East) 2". Perhaps if this was a listing for 1882, it could be considered authentic to the times.

There have to be dozens of similar errors throughout the book, none of which are damning to its utility as a baseball reference but all of which do build up to an uneasy feeling of neglect. Still, the charms of the book overcome that for my money.

Like its cousin, the Macmillan, the statistical selection in SE:BB was formed at its first publication (1969 for Big Mac, 1974 for SE:BB). OBA is nowhere to be found, nor are CS or pitcher home runs allowed. Fractional innings pitched are rounded, and the typesetting varies throughout the book, making some sections more difficult to read. Sometimes space requires severe truncating of batting lines--Dick McAuliffe went 7-27 as a 20 year old left-handed hitter for the 1960 Tigers, but that's all you can find out.

The ESPN encyclopedia, edited by Gary Gillette and Pete Palmer, is my favorite of the three career register works. Mostly this is because it is the most recent, superseding Total Baseball. For the most part, the statistical selection is the same as TB. In both cases, I'd love to have a better offensive rate than OPS+, and I think they tried too hard with respect to fielding categories, but both give the basic categories necessary to build standard statistics.

Total Baseball is unique because of the volume of the text that accompanies the statistics--short biographies of notable players, team histories, a history of sabermetrics, and a bunch of other articles that changed from edition-to-edition. More than any of the other encyclopedias, the article turnover created a reason to buy each new edition (other than, of course, the updated statistics).

The Macmillan must be given respect due to its status as the pioneer; the research that went into producing it has been incorporated by every serious baseball historical work of any stripe since that time. As an encyclopedia, though, its heyday was the first edition. It soon had SE:BB as a competitor, and with the two including essentially the same basic data, the (IMO) superior format of SE:BB made it an unfair fight. Macmillan also played fast and loose with changing statistics for silly ends. Later editions cut this out, and added some interesting data like team home/road splits and sketchy Negro League records, but by that time Total Baseball was on the scene.

The STATS All-Time Major League Handbook was the most thorough encyclopedia for individual statistics, but as such it is the one that has taken the biggest hit from the existence of Baseball-Reference. No other encyclopedia offered complete batting, pitching, and fielding data (including all of the minor categories like GDP and sacrifice hits allowed), but the sheer volume of data sapped the book of any character it might have otherwise had. While Big Mac has standings and playoff records and the like, and Total Baseball had all of that and the articles, there was no room in the Handbook for anything other than the player career register. The ancillary material was shuffled off into an equally large All-Time Sourcebook.

While the massive print encyclopedia may be something of a relic, I do think it would be wonderful if it could live on. Obviously I know nothing about the real-world feasibility of what I am about to spout, but it would be great to see an organization like SABR step up to the plate and subsidize an updated print encyclopedia (even if it had to be in PDF format, as SABR has done with the Emerald Guide) every half-decade or so. Eventually the desire for such a tome might be foreign to even the crustiest old baseball historians, but I think it's safe to say that day is still several decades off into the future.

Standard Deviation of Franchise W%

Speaking of electronic encyclopedias, this is the type of exercise that they make a breeze, which previously would have been an arduous chore. I figured these a while ago with the intent of using them in some other discussion, but that never materialized so I'll dump them here.

These charts simply show the standard deviation of full-decade W% for each major league franchise. I have criticized the use of decades as a line of demarcation for baseball statistics in the past, but this is not a thorough analytical endeavor and they do provide an easy, straightforward manner of categorization. I have defined the decade here as 1901-1910, 2001-2010, etc., not because I have any particularly strong feelings on the matter of decade division but because it works better since 1) it includes 2010 and 2) the first decade thus defined corresponds with the American League's 1901 ascension to major league status.

There are four different standard deviations shown for each decade--"whole" is the StD for teams that completed the entire decade. This is fairly arbitrary, as it allows the 1961 AL expansion teams but excludes the 1962 NL expansion teams (the four 1969 expansion teams are obviously excluded as well). "All" is the StD for all franchises that played in the decade, even if it was for as little as one season (actually, the shortest in-decade tenure is two years for the 1969 expansion teams). "1901s" is the StD for the sixteen franchises that have played continuously since 1901. While they now make up just over half of MLB, they at least provide a constant frame of reference throughout the century. "Expan", as you might figure, is the StD for whichever of the fourteen expansion franchises competed in a given decade.
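For anyone who wants to repeat the exercise, here is a rough sketch of how the four groupings could be computed. The team names and records are made up purely for illustration (they are not real franchise totals), and two details are my assumptions rather than anything stated above: decade W% is taken as total wins over total decisions, and the population standard deviation is used.

```python
from statistics import pstdev

# Hypothetical decade totals, NOT real records:
# (wins, losses, seasons played in the decade, playing continuously since 1901?)
teams = {
    "Team A": (900, 700, 10, True),
    "Team B": (800, 800, 10, True),
    "Team C": (700, 900, 10, False),
    "Team D": (120, 200, 2, False),
}

def win_pct(wins, losses):
    """Decade W% as total wins over total decisions (an assumption)."""
    return wins / (wins + losses)

def spread(teams, keep):
    """Population StD of decade W% over the teams selected by `keep`."""
    pcts = [
        win_pct(w, l)
        for (w, l, seasons, charter) in teams.values()
        if keep(seasons, charter)
    ]
    return pstdev(pcts)

# The four groupings described above, written as filters.
groups = {
    "whole": lambda seasons, charter: seasons == 10,
    "all": lambda seasons, charter: True,
    "1901s": lambda seasons, charter: charter,
    "expan": lambda seasons, charter: not charter,
}

for name, keep in groups.items():
    print(name, round(spread(teams, keep), 3))
```

Swapping `pstdev` for `stdev` would give the sample standard deviation instead; with only a handful of teams per group the difference is noticeable, so it is worth settling on one before comparing decades.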

By this measure, the 1980s and 90s stand out as very competitive periods in the game, and the 2000s were a step back from that. However, the standard deviation of franchise W% in the last decade was essentially the same as in the 1950s and 60s, and still well under the norm for most of history.

The next chart gives the average W% for teams by decade broken down into 1901s and expansion teams. It also lists the best and worst franchise W%s for the decade, but those lists include only the teams that played ten seasons in each decade:

In the 1980s, expansion teams actually had a slightly better record than the 1901s, but they have lost ground in the last twenty years. Of course, most of the big city teams are 1901s, with the major exception being the Angels. The spread between the best team W% and worst was higher in the 2000s than it had been since the 1960s, but I wouldn't attempt to make anything out of it.

Two Team Cities

During a bout of encyclopedia browsing, I noticed that the two Boston teams both had dreadful 1906 seasons. The Braves were 49-102, but the now-Red Sox were even worse, losing three more games (49-105). I made the mistake of pointing this out on Twitter and saying that it "had to be the worst" such record.

Of course, it didn't have to be anything, and it isn't. It is only the third-worst combined record by teams in the same city since 1901. While I'm sure someone has done this before, a quick search turned up nothing. I considered Brooklyn to be New York (meaning that from 1903-1957 New York had three teams), and I considered the Angels/Dodgers and Giants/A's as sharing a city (when applicable). The ten worst single season records for the two or three teams combined:

At least Boston 1906 was the worst in something, as it was the worst non-Philadelphia combined record. Philly has seen some bad records over the years, but none worse than 1919 when the Phillies were 47-90 and the A's were 36-104. The worst years for each of the two-team cities other than Boston and Philadelphia were St. Louis 1913 (108-195, .356), Chicago 1948 (115-191, .376), Bay Area 1979 (125-199, .386), New York 1965 (127-197, .392), and Los Angeles 1992 (135-189, .417).
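The combined figures are nothing more than summed wins and losses, which a few lines of code can confirm. This sketch uses only the team records quoted in this post; the function name `combined_record` is my own invention for illustration.

```python
def combined_record(*records):
    """Sum (wins, losses) pairs for teams sharing a city; return W, L, W%."""
    wins = sum(w for w, _ in records)
    losses = sum(l for _, l in records)
    return wins, losses, wins / (wins + losses)

# Phillies 47-90 and A's 36-104, as quoted above.
philly_1919 = combined_record((47, 90), (36, 104))

# Braves 49-102 and the now-Red Sox 49-105, as quoted above.
boston_1906 = combined_record((49, 102), (49, 105))

for city, (w, l, pct) in [("Philadelphia 1919", philly_1919),
                          ("Boston 1906", boston_1906)]:
    print(f"{city}: {w}-{l}, {pct:.3f}")
```

Since the function takes any number of records, the three-team New York seasons work the same way as the two-team cities.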

The best records are:

Four of these top ten featured a crosstown World Series, led by the 1906 victory by the White Sox over the Cubs; the others are St. Louis 1944, New York 1951 (Giants/Yankees as the Dodgers dropped the three-game NL playoff), and New York 1952 (this time Dodgers/Yankees). The banner years for the other cities were Boston 1915 (184-119, .607), Philadelphia 1913 (184-120, .605) and Los Angeles 2009 (192-132, .593).

The overall records for each city (for years in which they had multiple teams) are:

The cities in which the combined record has been good still have two teams; the ones in which they were poor do not. Shocking but true.

Saturday, May 07, 2011

Great Moments in New York Post Horse Racing Info

Great Moments in Yahoo! Box Scores

They'd better make sure this issue doesn't carry over to my fantasy team.

Wednesday, May 04, 2011

Great Moments in Yahoo! Box Scores

I guess the Dodgers' trouble meeting payroll came to fruition sooner than anticipated.

Tuesday, May 03, 2011

Scoring Self-Indulgence, pt. 2: Scoring Pitches and Strikeouts

When I score a game, I almost always keep a pitch-by-pitch record, unless I have to juggle watching the game with some other task and will not be able to accurately record each and every pitch. Even when I set out intending to skip the pitches, I often find myself unconsciously scoring them anyway.

My system for tracking pitches only records the basics--whether it is a ball or a strike, and any of the basic subgroups contained within those two categories (intentional balls/pitchouts, swinging strikes, called strikes, fouls). Some people attempt to keep track of pitch locations or pitch types; of course, Pitchf/x has rendered this even more of a chore than it was previously, and some people (hello, nice to meet you) just aren’t good enough at observing pitch locations and distinguishing pitch types to even attempt to put in this level of effort.

The final pitch of a plate appearance is not recorded separately--it is implied by whatever event follows. For example, if a batter draws a walk, I don’t record the fourth ball independently of noting the walk. If a pitch is hit into play, you’ll see a symbol for a base hit, or a groundout, or whatever the case may be. I don’t see any reason to waste another pencil stroke on spelling this out.

The left side of the empty scorebox is used to record balls; the right side is reserved for strikes, and the very top (and on the very rare occasions when necessary, the very bottom), with much smaller letters, is where two-strike fouls are recorded. The order of pitches is indicated by letters of the alphabet--the first pitch is “A”, the second pitch “B”, and so forth.

Balls usually don’t need any elaboration--intentional balls/pitchouts are the only common subcategory. I do not distinguish between the two; it is usually pretty obvious which is being employed if you consider the context of the plate appearance and the pitch sequence. An intentional ball of any stripe is simply circled.

The other, much less common alteration needed to balls is the automatic ball, on the rare occasion that the umpire makes that call. The symbol for this is simply a lower case “a” in front of the usual symbol. For example “aD” would indicate an automatic ball called on what would have been the fourth pitch.

There are more modifiers needed for strikes. Called strikes receive no alteration, while a left bracket “[” is put around the outside of a foul and a left brace “{” is put around the outside of a swinging strike. The foul symbol is not used with two-strike fouls, since by definition they could be nothing else. Another modifier I use which can be applied to strikes of all kinds (except two-strike fouls) is circling the letter, which is used in case of a bunt attempt.

A couple of examples will hopefully make this pretty clear:

The first pitch (A) is a garden variety ball. The second pitch (B) is a called strike. The pitcher is called for a rare automatic ball (aC) before what would have been the third pitch. The fourth pitch (albeit the third actually delivered) is a foul. The fifth pitch (E) is another foul. The sixth pitch (F) is a ball, and the seventh pitch (G) another foul. Finally, the batter flies to right on the eighth pitch, for which the pitch is not explicitly noted--the occurrence of a flyout is sufficient to demonstrate its existence.

In this plate appearance, the batter shows bunt on the first pitch (A) but takes a strike. The second pitch is a standard ball (B), but the third pitch is a pitchout (C). The batter swings and misses on the fourth pitch (D), and does it again on the fifth pitch for a strikeout.

The batter attempts to bunt the first pitch, but he bunts through it for a swinging strike (A). He attempts to bunt again on the second pitch, but this time he fouls it off (B). He then takes a ball (C), fouls off a pitch (D), and eventually grounds back to the box.
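For the computer-inclined, the same three sequences can be written out as data and the count tallied mechanically. To be clear, this is only a sketch: my system lives on paper, and the text labels (`"ball"`, `"automatic"`, and so on), the `Pitch` class, and the function name are all invented here for illustration. As on the scoresheet, the final pitch of the plate appearance is left out because the result implies it.

```python
from dataclasses import dataclass

BALLS = {"ball", "intentional", "automatic"}   # intentional covers pitchouts
STRIKES = {"called", "swinging", "foul"}

@dataclass
class Pitch:
    kind: str           # one of the labels in BALLS or STRIKES
    bunt: bool = False  # circled strike: the batter showed or attempted a bunt

def count_before_last_pitch(pitches):
    """Return (balls, strikes) after the recorded pitches."""
    balls = strikes = 0
    for p in pitches:
        if p.kind in BALLS:
            balls += 1
        elif p.kind == "foul" and strikes == 2:
            pass  # a two-strike foul does not change the count
        elif p.kind in STRIKES:
            strikes += 1
        else:
            raise ValueError(f"unknown pitch kind: {p.kind}")
    return balls, strikes

# First example: ball, called strike, automatic ball, foul, foul, ball, foul.
first = [Pitch("ball"), Pitch("called"), Pitch("automatic"), Pitch("foul"),
         Pitch("foul"), Pitch("ball"), Pitch("foul")]

# Second example: bunt shown on a called strike, ball, pitchout, swing and miss.
second = [Pitch("called", bunt=True), Pitch("ball"), Pitch("intentional"),
          Pitch("swinging")]

# Third example: missed bunt, fouled bunt, ball, foul.
third = [Pitch("swinging", bunt=True), Pitch("foul", bunt=True),
         Pitch("ball"), Pitch("foul")]

for name, seq in [("first", first), ("second", second), ("third", third)]:
    print(name, count_before_last_pitch(seq))
```

The counts come out as full for the flyout, 2-2 before the strikeout, and 1-2 before the comebacker, which matches what you would read off the scoreboxes.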

There are several different possible symbols for a strikeout that actually becomes an out in my system--a swinging strikeout, a called strikeout, a strikeout where the putout is something other than catcher unassisted, a strikeout on a missed bunt (that is a swinging strikeout on a bunt), and a strikeout on a two-strike foul bunt. In these examples, I will not include the pitch sequence, since that has already been explained and it would just clutter the scorebox and distract from the out itself.

This is a standard swinging strikeout. The solid dot is my universal symbol for an out; any time the batter-runner is retired at any point, the dot will appear somewhere within his scorebox. This makes it much easier to quickly see how many outs there are in the inning, and also eliminates some potential confusion in cases in which a certain code could indicate an out or could indicate something else.

And the inscrutable backwards K for a called third strike.

Sometimes the scoring on a strikeout is something other than the standard catcher unassisted. By far the most common is the catcher throwing to first for the out (23), although there are other possible and weird ways for this to occur.

This is the symbol I use for a foul bunt with a two-strike count, resulting in a strikeout. As I’ll show later, the squiggly line is my symbol for a bunt on a ball in play, so the symbol applied to that of the strikeout has a clear meaning.

If a bunt is attempted but missed for a third strike (that is, the batter offered but did not make any contact), then the brace that indicates a swing for a non-third strike is included in the symbol above to distinguish it from the more common third strike bunted foul.