Walk Like a Sabermetrician: February 2006

Tuesday, February 28, 2006

World Baseball Classic

It seems from reading various baseball sites and blogs that I am part of a minority of people who are actually looking forward to the World Baseball Classic, and trying to figure out where to fit in sleep between trying to watch Korea/Taiwan and Japan/Red China. Yes, a lot of marquee players are not playing. Yes, the format of single elimination playoffs is far from ideal for baseball. Yes, the players are away from their training camps. I got the point. Those things diminish the tournament a little bit, I admit, but it’s still a heckuva lot better to watch glorified exhibition games than actual exhibition games in the first weeks of March. And I still believe that there is a chance that the event will be far more relevant than glorified exhibition games.

Before I get to the baseball aspect, I am going to touch briefly on the political aspects, which I am loath to do, simply because when you write about politics you will probably have fifty percent of your audience disagree with you, and maybe a quarter of them get infuriated at you. Luckily for me, I don’t have much of an audience to alienate.

Some people seem to be opposed to the WBC because it fosters nationalism or jingoism or what have you, and we should all move past those petty little feelings and view ourselves as citizens of the world or something. Of course, this kind of attitude would also cast disdain on the Olympics, the World Cup, and any number of other international sporting events. I don’t think much of the Olympics, but that is partly because they try to foster an image of the pureness of competition and have Yoko Ono reading peace poems during the opening ceremonies, and mostly because you can’t slap USA on a curler’s shirt and make me care any more about it than two old men in panama hats playing shuffleboard on a cruise ship in the middle of Nassau. I disdain the World Cup because soccer is the most boring sport known to man. It’s not because I have a problem with “our country” vs. “their country”.

Of course, the element of this that is a little ugly is when the allegations about people’s heritage and patriotism start flying. Enter baseball’s most annoying manager, Ozzie Guillen, and his slams on Nomar and A-Rod. First of all, I wonder what somebody who every time he’s interviewed during the World Series feels the need to stick his fist up and say “Venezuela” as if he’s practicing for the rise of Hugo Chavez’ worker’s paradise cares about the Dominican Republic and Mexico. It would be one thing if a Dominican questioned A-Rod, who was after all born and raised in Miami, about his decision, but for a Venezuelan to take it another step up from a patriotism issue by saying “He was kissing Latino people’s asses” is absurd. My dictionary defines Latino as “A person of Hispanic, especially Latin-American, descent, often one living in the United States”. I think that describes Alex Rodriguez to a tee, Ozzie. I guess if I ever have a hard time choosing between Cleveland and Cincinnati, which both have large white populations, Ozzie will accuse me of kissing white people’s asses.

And then there’s the Cuba issue--they're in, the State Department will not allow them to play, now they will, yaddayadda. Now I’m as anti-communist as anybody (probably more so than most--the favorite quotation I gave for my high school yearbook was “Better dead than red”), but I think that the best way to spread capitalism is to practice it, and go ahead and trade with the Cubans even if they don’t want to reciprocate. Now I do understand why those who have fled Castro have a problem with this sort of position, and that is why it is such a political hot potato in the US, whereas half of the stuff you own was made in China and nobody gives a flip. But the State Department should not have done that, and flip-flopping on it just made them look weak and gave Castro a propaganda victory.

That does not mean, though, that Major League Baseball should let them play. Unfortunately, it is co-sponsored by the International Baseball Federation, and they would probably not go along with it if Cuba were banned. So the ideal solution is again complicated by political realities. My reason for wanting Cuba banned is because of guys like Jose Contreras, and El Duque, and Yuniesky Betancourt. These are men without a country in the WBC. And I consider them all to be courageous and admirable, and it infuriates me that they are not allowed to play for their country because they didn’t enjoy living under a communist thug. I am very sympathetic to Ariel Prieto’s position: “I'm angry now that I can't play in the tournament. I'm really ticked off. Cuba has no place in that tournament because it has always criticized professional baseball. If Fidel Castro doesn't want professional baseball, that's his problem. Those of us that have deserted because of political reasons could have come up with a team for the World Classic. The United States and MLB are at fault.”

The only thing I disagree with him about is that the “U.S.” is at fault. But it is a shame that he and his countrymen cannot play. What should have been done is to allow them to enter a “Free Cuba” team in the tournament, and let the Cuban government decide if they want to share the field with “traitors”. They should have established a game of chicken with Fidel Castro, not Ariel Prieto and Osvaldo Fernandez (I don’t think he’s still around. But I think Osvaldo is a great name).

Finally, Jose Contreras had some great thoughts along similar lines that are really worth reading:

Politics over, back to baseball. Some have criticized the inclusion of lesser nations like Red China, South Africa, and Italy, but including these countries, particularly South Africa and Italy, is a necessary sop to the notion of a “true” world championship. Red China is a little shakier on those grounds because Asia will still be represented by its three true baseball-playing countries regardless. But China is a huge untapped market and baseball would be silly to not pursue it, and hopefully the WBC will cause greater exposure for the game in Italy and other such places.

As for the tournament itself, it is difficult to predict winners and the like because the participating players may change eighteen times in the next week and a half and who knows how rusty they will be, how the pitch limits will affect the game, etc. And of course the sample size of a three game round robin pool leading up to single-elimination playoffs is next to nothing.

But looking broadly at the groups, you have to like Japan and either Korea or Taiwan, probably Korea, to come out of the Asian group. Group B features Canada, Mexico, South Africa, and the US, and the US is the obvious favorite here, with Canada probably second. Mexico could make it interesting though, and South Africa…I pity them. Group C of Cuba, Holland, Panama, and Puerto Rico is an interesting one. I just cannot envision Cuba as being better than the national teams of the DR, Venezuela, and the US. The success of their defectors in the majors does not point to that. Heck, it’s only one game, but they lost to the Dutch at the last Olympics, and that Dutch team did not include any of their major leaguers (not that they have a lot of them, but there is a Mr. Jones who plays center who I hear is a ballplayer). I would not be surprised to see Cuba really struggle, although I would also not be surprised if they won any of these matchups. But I’m going to go ahead and consider PR and Panama the favorites in that one. Then Group D, which features Italy, Australia, the Dominican Republic, and Venezuela. Whoever was balancing these groups didn’t think that one through, it looks like--Venezuela and DR could very well be the #2 and #3 favorites in the tournament. If that pool’s games are going to be played on neutral turf in Orlando anyway, why not send Venezuela to Puerto Rico and bring Cuba to Orlando? Oh yeah, that’s right, the State Department. Never mind, I expected logic from MLB AND government. Big mistake.

So you have Korea, Japan, US, Canada, DR, PR, Panama, and Venezuela. Now without looking at the actual rosters, you have to like the red, white, and blue there. But then you look at the US roster, and you see that the rotation is Clemens, Sabathia, Willis, and Peavy? Not exactly Johnson and Halladay and Carpenter and Oswalt and Smoltz. Now Peavy and Willis are arguably as good as those guys, but they don’t have the same star power and CC Sabathia, Chief Wahoo love him, is not in that class. However, a bullpen that features Street, Wagner, Nathan, Lidge, and Cordero is more in line with expectations. Varitek catching…Lee at first…Utley or Young at second…Jeter at short…Rodriguez at third…sounds great. Outfielders: Damon, Francoeur, Griffey, Wells, Winn, Matt Holliday!!? Did anybody call Adam Dunn? How about Grady Sizemore? Brian Giles? Pat Burrell? Carl Crawford would look good compared to Matt Holliday. I cannot wrap my head around the idea that you are going to make a US National Baseball team and put Matt Holliday on it.

Anyway, the US pitching is probably still the best, and the US should probably be the favorite. But if it comes down to Johan Santana against anybody, I think I might bet with the Venezuelans. Their infield is not spectacular, but Abreu, Cabrera, and Ordonez make that US outfield look like a bunch of rookies. Wait, Jeff Francoeur was a rookie. The Dominican offense is scary too, but not quite as much now that Manny Ramirez has pulled out, and their pitching depth doesn’t compare. It really is amazing I guess that I could list all of those guys who were left off the US staff, and it still looks like the best in the tournament.

The wildcards will be Japan and Cuba, just because you can’t really tell how these guys will do against major leaguers. Based on the performances of Japanese and Cuban players in the majors, I have a great deal more respect for Japan’s chances. In the end though, I’d say the playoff format makes it a crapshoot between the US, Venezuela, the Dominican Republic, and Japan, in that order, with respect to Puerto Rico. And what exactly is on ESPN on Friday night that is so friggin’ important that they have to show Korea/Taiwan on tape delay?

Friday, February 24, 2006

A Review of "The Hardball Times Baseball Annual 2006"

The second edition of the annual from the good folks at the Hardball Times is much improved over their first effort. Not that the first was bad, but most of the articles were reprinted from what had appeared on their site, and the stat section seemed to be the main focus. This year, the stat section is back with all of the data it had previously, plus stuff (such as Win Shares for example) that did not appear last year.

There are reviews of all six divisional races as well as the playoffs. The playoff reviews incorporate WE graphs, which is a very nice touch, although I don’t completely agree with the games they selected for this treatment (but I should note that all World Series games are shown). This is followed by a commentary section on the 05 season, of which I especially liked the articles on the business side of the game and the World Baseball Classic.

The next section includes essays on baseball history and is highlighted by Bill James’ guest appearance to analyze Bert Blyleven’s win-loss records (there is also a guest piece by Rob Neyer in the book). As with anything else, the essay sections include some that have topics that interest me and some that don’t; some that I like and some that I think are overly pretentious. But on the whole, they are well worth reading.

The next section is entitled “Analysis” and is the kind of sabermetric stuff that is probably right up the alley of people who read this blog. The first is an article by John Dewan on the significance of the 100 pitch level. He finds that pitchers who average >100 pitches in the first half of the season perform better in the second half than pitchers who average <100 pitches in the first half but put up similar ERAs. I have a few problems with this study. The first is that pitchers who average more pitches have probably pitched more innings, and therefore given us more evidence that they are quality pitchers than others who throw fewer pitches. Dewan’s data may suggest this as the gap in second half performance is less noticeable for the groups with high first half ERAs. I don’t want to overstate this point though, as pitches probably have a loose correlation with innings, since pitches are determined both by the amount of work and the style of the pitcher.

But the second problem I have is that managers are more likely to allow better pitchers to make >100 pitches than poorer pitchers. If you have Randy Johnson and Aaron Small on your staff, you do not start the season with a clean slate on your opinion of their ability. If Johnson is roughed up in the early innings, you still feel he is a quality pitcher, and you are likely to have a slower hook with him. It is the same problem that makes pitch counts, batters faced, etc. so hard to study--the best pitchers pitch more, just because they are the best pitchers.

The second article is Dan Fox’s look at “luck” which I discussed previously here. Then Studes checks in with a piece that shows the empirical LW values of each event as well as for batted ball type. The only thing I would question here is that he uses the overall values for each event (S, D, T, etc.) times the percentage of line drives (or GB, etc.) that result in that event. It is possible that when a certain type of ball results in a hit, the run value is higher or lower than the overall value for that event. Perhaps more infield flies become singles when the infield is in for instance, and have a higher run value. I don’t know this to be true, and I don’t believe the effect would be big at all, and I’m not saying it calls into question the data presented at all--just that it would be interesting to see if certain types of events on a given type of batted ball occurred more or less often.
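To make the calculation I am describing concrete, here is a rough sketch of a batted ball run value built as the overall linear weight of each event times the share of that batted ball type that becomes the event. Every number below is made up for illustration (the weights and the outcome shares are my assumptions, not figures from the book), and the function name is hypothetical.

```python
# Illustrative linear weights per event (assumed round numbers, not the book's).
LW = {"1B": 0.47, "2B": 0.77, "3B": 1.04, "HR": 1.40, "OUT": -0.27}


def batted_ball_run_value(outcome_probs, lw=LW):
    """Expected run value of one batted ball type.

    outcome_probs maps each event to the share of that batted ball type
    resulting in the event; the shares should sum to 1.
    """
    return sum(p * lw[event] for event, p in outcome_probs.items())


# Hypothetical outcome shares for two batted ball types.
line_drive = {"1B": 0.50, "2B": 0.18, "3B": 0.02, "HR": 0.02, "OUT": 0.28}
ground_ball = {"1B": 0.22, "2B": 0.02, "3B": 0.00, "HR": 0.00, "OUT": 0.76}

print(round(batted_ball_run_value(line_drive), 4))
print(round(batted_ball_run_value(ground_ball), 4))
```

The quibble above amounts to asking whether the single entry in `LW` should really be the same number for a line drive single as for an infield fly single. This sketch assumes it is.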

Studes then looks at parks, including a look at park factors for batted ball type. This article is very interesting, and I don’t say that just because he mentions the park factors I publish annually on my website. The questionable thing here is the reference to yours truly as “an internet baseball wonk”. Don’t get me wrong, I am not offended, but I have never been described as a “wonk” before. I have always associated the term “wonk” with a specific type of nerd or geek or what have you--one who studies the details of political stuff (i.e. “policy wonk”). I suppose that the word has broader applications, and that the implication I just mentioned is simply the most common. Regardless, being a “wonk” is much more flattering than some other terms that could be applied.

Studes goes for a trifecta with an article looking at DER and a runs above average approach that considers batted ball type. The encouraging thing to me about this article is that the DER actually seems to track the more accurate measure fairly well, so it is still presumably a decent gauge of fielding performance for periods before the more advanced data is available.

JC Bradbury of Sabernomics fame and David Gassko investigate the correlations of various event and batted ball rates for individuals from year-to-year, and in a related article Bradbury uses batted ball data to estimate OPS. Dan Fox then reviews his baserunning analysis methodology (Incremental Runs), and Studes finishes the section with a look at player contracts using Win Shares. To me, the analysis section was definitely the highlight of the book and contained a lot of interesting and thought-provoking stuff.

The stats section includes a lot of the basic stats, plus RC, Win Shares, Incremental Runs, and some more basic data that you can’t find anywhere else, like errors broken down into fielding and throwing, double plays started, and percentages of PAs resulting in various batted ball types.

All-in-all, the folks at the Hardball Times have vastly improved their book from a year ago. And last year’s edition is not one that I regret purchasing. So I can definitely recommend this edition.

Friday, February 17, 2006

Near the Banks of the Olentangy

Thanks to the bizarre scheduling practices employed by NCAA baseball, in one week the Ohio State Buckeyes will open their 123rd season. Of course, the opening game will be played in Gainesville, Florida, and then the team will go to Jacksonville, Clearwater, and Bradenton for a total of sixteen games before ever setting foot on their home field. I will not go into a digression on the inanity of NCAA baseball scheduling, which gives an enormous advantage to southern teams, but suffice it to say, I am not a big fan of it.

Coach Bob Todd will enter his nineteenth season at OSU firmly entrenched with a school record 726 wins (.666 W%), 335 Big Ten wins (.657), and seven each of Big Ten regular season and tournament titles. He is still searching for an elusive trip to Omaha, where the Buckeyes have not advanced since 1967 and no Big Ten team has reached since 1984. OSU has had several close calls, most recently losing Super Regionals in 1999 and 2003.

The Buckeyes are coming off a 40-20 season in which they went 17-12 in the Big Ten. That was good for a disappointing fifth place, but the team went on to win the Big Ten tournament for the third time in four years, and got the automatic NCAA tournament berth that goes with it. The Buckeyes lost the opening game to #2 Oregon State, giving up a run in the eighth and ninth, but rebounded to defeat Virginia before the season came to an end with a loss to St. John’s. While the early Big Ten regular season performance was disappointing, ultimately with a Big Ten title and NCAA tournament appearance, it was a successful campaign.

Overall, the Buckeyes' .667 W% was second in the B10 behind that team up north (.689). In EW%, based on runs scored and allowed, the Bucks’ .676 was second to those who shall not be named as well (.684). In PW% based on runs created and allowed, Ohio’s .678 edged out those other guys’ .675. So the performance in the W-L column was very much in line with what you would have expected given the inputs.
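For anyone who wants to see the shape of an estimate like EW%, here is a minimal sketch using the classic Pythagorean form. The exponent of 2 is an assumption on my part for illustration, the run totals are invented, and the function name is hypothetical. Feeding it runs created and runs created allowed instead of actual runs gives a PW%-style figure.

```python
def pythag_w_pct(runs_for, runs_against, exponent=2.0):
    """Pythagorean expected winning percentage.

    The exponent of 2 is the textbook default and is assumed here; other
    exponents are in common use.
    """
    rf = runs_for ** exponent
    ra = runs_against ** exponent
    return rf / (rf + ra)


# Invented season totals, purely for illustration.
print(round(pythag_w_pct(400, 280), 3))
```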

Of course, in college sports, the changes in team composition from year to year can be radical. The Buckeyes’ key losses are pitchers Mike Madsen and Trent Luyster, first baseman Paul Farinacci, and outfielders Steve Caravati and Mike Rabin. Madsen ranked third on the team with +17 runs saved against an average B10 pitcher, and was second among the starting four in RAA (coincidentally, he was also the team’s designated #2 starter). He was drafted by the A’s and pitched well in the Northwest League last year (80 IP, 1.69 ERA, 7.7 K/9). Luyster was the designated #3 starter, and finished sixth on the team and fourth among starters with +8 RAA. Drafted by the Blue Jays, he did not pitch in pro ball last season.

Farinacci had a reputation as an excellent defender at first, but was a slightly below average hitter. Rabin was the leadoff hitter, and after a poor -14 RAA, .317 OBA junior season put up +2 RAA and a .396 OBA a year ago. He had very little power (ISO of .039 as a junior, .060 as a senior). Caravati was B10 Player of the Year in 2004, but never really got on track last year--despite this, his +17 RAA led the Buckeyes by a wide margin. For a team that struggled offensively last year (-12 runs vs. the B10 average), he is quite a large loss.

Only one player of note transferred out, pitcher/first baseman Jeff Carroll, who has joined the ranks of the preppy frat boys down I-71 at Miami. Carroll was average as a long man out of the pen last year, and didn’t do anything in his approximately 40 career PA. However, he wanted an opportunity to play the field, and apparently felt Miami would be a better place for this.

The Buckeyes welcome a freshman class of thirteen players, on whom I have no real insight, except for the one game I saw them play in fall practice, and did not bring in any transfers from other schools.

Looking at how the defense will fill out, sophomore Eric Fryer will return as the starting catcher. Fryer had a great freshman season (323/390/409, +2), a huge improvement over the production from the catcher’s position in 2004. He did not participate in fall practice due to an injury sustained in summer ball, but apparently will be ready to go in the spring. His backup, Kelly Houser, has graduated, leaving a muddled situation. Redshirt freshman Josh Hula and true freshman Justin Miller are the top candidates. Junior Adam Schneider has some pop, but is not great defensively and did not catch as much last year as he did as a freshman. He’ll pop up again in the DH discussion, and is more of an emergency or pinch-hit option as a catcher.

At first base, there is no clear favorite with the transfer of Carroll. Apparently, Miller or Fryer could play first when they are not catching, and true freshman JB Schuck could also figure in. From what I know of his high school exploits and his performance in fall practice, Schuck is a real talent and will force his way into the lineup somewhere, and will probably pitch as well. To me, he looks like an outfielder, but the outfield depth is pretty good, so maybe he’ll be on the Nick Swisher plan.

Junior Jason Zoeller will be back at second base, off a 302/353/451 dead average season. He is one of the top power threats on the team and good defensively (at least to these eyes). Senior Jedidiah Stephen is the incumbent at shortstop and is a similar offensive player to Zoeller, although perhaps not as good defensively at his position. He played third base as an underclassman. Junior Ronnie Bourquin also returns at third base. Slowed by a thumb injury last year, he slumped to -6 RAA after a +8 freshman campaign. He will likely be the cleanup hitter for OSU.

The infield backups will include the runners-up in the first base derby and sophomore Chris Macke on the corners, and a number of candidates in the middle (sophomore Tony Kennedy, redshirt freshman Michael Arp, and true freshmen Ben Toussant and Matt Currant). Senior Kris Moorman is another corner backup, and could figure in at first base, although I'll believe it when I see it. An intriguing option at first base is Seth Sanders, a true freshman from Ann Arbor, the home base of the enemy. Sanders looks like a power hitter, and showed a good eye in the one game that I watched (obviously not a good sample size). Macke filled in last year when Bourquin was injured, and hit 375/412/469 in 34 PA, but was not really seen again after Ronnie returned. While it does not appear to be in the cards, Macke at third with Bourquin moving over to first base would be a move worth looking into from my perspective.

In the outfield, junior leftfielder Jacob Howell is firmly entrenched. The B10 Freshman of the Year in 2004, he struggled with nagging injuries and faded to 270/323/322, and a team low -8 RAA. If he can return to form, he will be an important cog getting on base out of the #2 slot. Sophomore Matt Angle began to take over the right field job last year, and will be the center fielder and leadoff man in 06. Angle showed little power, but good BA and plate discipline last year, and I believe he will be a Rabin-type performer with a higher OBA due to the walks. He also made some fabulous grabs in the outfield. Junior Wes Schirtzinger is the leading candidate in right, although he has done little the last two years after suffering a wrist injury after a promising freshman campaign. The outfield backups will be senior Cody Caughenbaugh, sophomore Jonathon Zizzo, and true freshmen Chris Griffin and Zach Hurley.

At DH, I expect to see a platoon, with Schneider against lefties and Caughenbaugh against righties. Caughenbaugh flashed solid power last year, but his BA was in the 270s which kept his run creation below average.

The starting rotation will be anchored by junior lefty Dan DeLucia, who rebounded from a tough freshman season to become OSU’s ace last year (3.49 RA, 3.54 eRA, +31). DeLucia is the real deal, and could contend for B10 pitcher of the year honors. Sophomore lefty Corey Luebke, the #4 guy a year ago, started out pitching great but faded as the season wore on and briefly lost his rotation slot. In the end, though, Luebke’s 4.68, 4.06, +13 season was very solid for a freshman and if he can improve with experience as many freshmen do, Ohio State will have two scary left-handed starters. According to the official preview on the OSU website, junior Trey Fausnaugh will move into the rotation after being a reliever the past two years. He was the closer as a freshman and pitched very well, but last year he was rocked for a 7.93 RA and 5.43 eRA in 27 innings out of the pen. A member of the team told me that his biggest problem last year was a lack of break in his curveball. If that is the case, then perhaps it is an encouraging sign as coaching may be able to rectify that--he did not appear to lose any velocity.

The fourth spot will likely fall to sophomore Dan Barker, who pitched brilliantly in long relief last year, with a 2.75 RA in 30 innings. His .94 eRA was even better, but he only allowed hits on 19.1% of the balls put in play against him. So he was very “hit lucky” it appears, and should not be expected to duplicate his performance. But I think he is a very solid bet to be an effective #3 pitcher (I remain a little skeptical of Fausnaugh as the #3 starter).

Other candidates will likely get a shot at cracking the rotation, although it is difficult to say who, since there are seven freshman or redshirt freshman pitchers on the roster, not including Schuck. Often, with the compact schedule of games on the spring training trip, a number of pitchers will get the opportunity to work a few innings, and then when the team comes back north, hopefully one has emerged. Mid-week games also are an opportunity throughout the season for continued auditions.

One intriguing candidate for the job is fifth-year senior Chris Hanners, a lefty who was roughed up in his only real starting shot as a sophomore and was decently effective as a reliever his freshman year. However, he has made just one appearance each of the last two years due to constant shoulder troubles.

The bullpen will be anchored by sophomore Rory Meister, who was dominant last season with a 2.23 eRA and second on the staff at +22 RAA. His $H was around 22%, so he may have been a little fortunate on that account. The bullpen will be filled out by the other pitchers, and hopefully two or three effective relievers will emerge as the season progresses--if I had to guess, I’d say Schuck, freshman righty Jake Hale (a 24th round pick of the Indians, who had some shoulder issues in the fall), and freshman lefties Josh Barrera and Eric Best. The insight behind this is that they are the only freshman pitchers whose profiles on the team website include quotes from Coach Todd. Other freshman pitchers include redshirt lefty Matthew Selhorst, true freshman righties Taylor Barnes and Jake Weber, and true freshman lefty Brad Hays. One thing that is for sure is that OSU will have no shortage of left-handed pitching.

The schedule begins with three games in Gainesville against Wake Forest, Florida, and Coach Todd’s alma mater, Missouri, the final weekend of February. The first weekend of March will be in Jacksonville against UNC-Greensboro, Jacksonville, and Western Michigan. The next weekend in Clearwater will pit the Bucks against Lehigh, Northern Illinois, and Bethune-Cookman. Starting March 19th, the Buckeyes will go to Bradenton to play UMass (1 game), Cornell (3), and Vermont (2), before finally opening at home March 30th against Toledo. Other midweek games, all at home, will feature Miami, Central Michigan, Oakland, Cleveland State, Eastern Michigan, and Pittsburgh.

The Big Ten schedule is very favorable from an OSU fan’s perspective, because the final two weekends of the year will be at home when the weather is most likely to be acceptable for baseball watching. The Buckeyes open at Iowa, come home for Illinois, go up north for the second year in a row, then to Indiana, home for Purdue, at Michigan State, and home for Minnesota (for the second year in a row as well) and Penn State. Iowa’s appearance on the schedule is their first since 2003, while Northwestern is off the schedule for the first time since 2001. All Big Ten series are nine inning games on Friday and Sunday and a pair of seven inning games in a Saturday doubleheader.

It is difficult for me to forecast how this team will do in the conference race, because I honestly don’t know a whole lot about the other teams, other than reputation. Illinois won the regular season title last year with a hot start, while the team up north was the favorite but did not live up to that billing. The perennial contenders, OSU and Minnesota, both struggled at times, but wound up as the top two finishers in the tournament.

Baseball America picked Ohio State to win the Big Ten. Boyd Nation of the Boyd’s World college site found that OSU returns the most RAA of any Big Ten team, although he also found that this measurement was not a great predictor of the upcoming season. My thinking is that there is no hitter on the team, with the exception of Eric Fryer, who I would expect to have a worse year at the plate than they did a year ago. That does not mean that I expect that none will, just that no name jumps out at me. With the short season of college baseball and the inherent variability of sports, it is almost a certainty that some will. But there are at least two guys in Jacob Howell and Ronnie Bourquin who I anticipate will be much better than they were a year ago, if they are fully recovered from their injuries. The struggle for the offense will be power, as it was last year.

The pitching staff loses two senior starters, but I think the top two in the rotation are pretty solid bets. The back two are a little more shaky, but the bullpen at least has a clear anchor. I think the pitching will probably not be quite as good as a year ago, but time will tell. In short, I think this is definitely a Big Ten Tournament team, and a team with a very good shot at the Big Ten title, but I’m not ready to say they should win it. The good news is that March 30th is just forty-one days away.

Friday, February 10, 2006

Rate Stat Series, pt. 5

This series is pretty disjointed, and this entry will be no exception. I have realized that in this piece I will discuss some of the assumptions that influence my thinking on the other issues I have discussed, so this probably should have come first.

The major point of this installment is that we should state our assumptions and our goals before we begin so that we can make the right choices. The right choice will be different, potentially wildly different, depending on what we are setting out to do. This series purports to choose which rate stat is best to use for an individual batter. But what is best?

It depends on what you are trying to measure of course. A frivolous example is that ISO is a good metric if you are trying to measure power, but a horrible metric if you are trying to measure on base ability. We first must define what the properties of a good rate stat for a batter would be. If we use a different definition, we will get different answers.

What I have used as my definition throughout this series, without explicitly stating it (which was a mistake), is that the true measure of a batter’s production is how many runs an otherwise average team would score if we added the batter to it. Actually, wins instead of runs ideally, but adding more runs will almost always add more wins for a team that begins at average.

What I want to do is look at a team that scored, say, 750 runs in a league where the average team scored 750 runs, and add one player to that team, and give him one-ninth of the team plate appearances, and see how many runs that team will score with the player added. We will account for both the player’s impact on the team scoring rate and the number of plate appearances that he has. A “good” rate stat will be one that accurately reflects the rank order of players when using this criterion and, as a bonus, accurately reflects the magnitude of the players’ contributions. In other words, if we find that Batter A will add 50 runs to a team and Batter B will add 45 runs, our ideal rate stat would rank A ahead of B, but not by an enormous amount--in fact, by a margin that if converted to runs above average would be about five.
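Here is a rough sketch of that thought experiment in code. Several things in it are assumptions for illustration only: it uses a simple version of Base Runs as a stand-in run estimator (this post does not name one), the per-PA event rates are invented, the team is held to a fixed number of batting outs (4100, an arbitrary figure), and all of the function names are hypothetical.

```python
def base_runs_per_pa(r):
    """Base Runs per plate appearance from per-PA event rates (assumed estimator)."""
    hits = r["1B"] + r["2B"] + r["3B"] + r["HR"]
    total_bases = r["1B"] + 2 * r["2B"] + 3 * r["3B"] + 4 * r["HR"]
    a = hits + r["BB"] - r["HR"]  # baserunners
    b = (1.4 * total_bases - 0.6 * hits - 3 * r["HR"] + 0.1 * r["BB"]) * 1.02  # advancement
    c = r["OUT"]  # batting outs
    d = r["HR"]   # automatic runs
    return a * b / (b + c) + d


def team_runs(rates, team_outs=4100):
    """Season runs for a team with these per-PA rates and a fixed supply of outs."""
    plate_appearances = team_outs / rates["OUT"]  # fewer outs per PA means more PA
    return base_runs_per_pa(rates) * plate_appearances


def runs_added(player, league, share=1 / 9):
    """Runs gained when the player takes `share` of an average team's PA."""
    mixed = {k: (1 - share) * league[k] + share * player[k] for k in league}
    return team_runs(mixed) - team_runs(league)


# Invented per-PA rates; each dict sums to 1.
league = {"1B": 0.155, "2B": 0.045, "3B": 0.005, "HR": 0.027, "BB": 0.085, "OUT": 0.683}
star = {"1B": 0.160, "2B": 0.060, "3B": 0.005, "HR": 0.050, "BB": 0.130, "OUT": 0.595}

print(round(runs_added(star, league), 1))
```

Because the team's outs are fixed, the star helps twice in this sketch: he raises the scoring rate per PA, and his lower out rate gives the whole lineup more plate appearances. Those are the two effects the definition above asks a rate stat to capture.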

If you start with different assumptions, you may get different answers. For example, if your goal was to find out how many runs a batter would add to a team filled with replacement-level players, you will likely reach similar conclusions about which rate stat is superior, but you may not, especially for close calls. If your goal is to estimate a batter’s contribution if he does not affect the team’s run environment, you will potentially get different answers. If your goal is to estimate an individual’s ability on a realistic range of different teams, you will have a much more complicated probabilistic function and get potentially different answers. And on and on. But for my purposes, I have defined the goal above, and every comment I make about a rate stat being “right” or “wrong”, “better” or “worse”, etc., is based on that assumption.

With that out of the way, we can start to tackle the issue of comparing players to different baselines. I have written a long article on my site entitled “Baselines” which talks at length about various ways people have set baselines, why they have done so, which I prefer, etc., so I will not repeat that here. Instead, I will point out that based on the assumption I gave above, about adding a player to an average team, the most obvious choice is to compare to the league average. This would be .500 in OW% terms (while most sabermetricians acknowledge OW% as faulty, it is still convenient to use the terminology, so long as we understand it is just a shorthand and do not start building bridges based on it).

So the baseline I will look at is average. This is also convenient because the other baselines are not as straightforward to apply. Later we will see a rate stat, R+/PA, that requires not only R/PA but also a comparison of OBA to the league average. If you want to apply a replacement baseline to this, all sorts of sticky problems arise. First of all, when most sabermetricians say the “replacement level” is a .350 OW%, they are defining OW% by R/O as we did in the last installment. So you need to convert the .350 OW% into a runs/PA ratio. And then you still have to deal with the OBA. Do you still use league average OBA in the R+/PA formula, and then compare an individual’s R+/PA to a replacement player’s R+/PA, or do you compare the player’s OBA directly to a replacement player’s OBA? And what is a replacement player’s OBA anyway? That answer is tied directly to your answer to what is a replacement player’s R/PA, since R/O = (R/PA)/(1 - OBA). But how did you answer that? What assumptions are you making about a replacement player? Is he a certain X% below the league average in hitting singles, doubles, etc.? Or is he around 95% of the league average in terms of singles with bigger losses in secondary skills? And how does his sacrifice bunt rate compare to the league? Is it higher? Does he hit into more double plays, or does he strike out more and hit into fewer?
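To make the R/O identity above concrete, here is a minimal sketch. The function names and the sample figures (.120 R/PA, .330 OBA) are invented for illustration, and it assumes that every plate appearance that does not reach base is an out, which is exactly what the (1 - OBA) term implies.

```python
def runs_per_out(r_per_pa, oba):
    """Convert runs per PA to runs per out via R/O = (R/PA) / (1 - OBA)."""
    if not 0 <= oba < 1:
        raise ValueError("OBA must be at least 0 and less than 1")
    return r_per_pa / (1 - oba)


def runs_per_pa(r_per_out, oba):
    """Inverse conversion: R/PA = (R/O) * (1 - OBA)."""
    if not 0 <= oba < 1:
        raise ValueError("OBA must be at least 0 and less than 1")
    return r_per_out * (1 - oba)


# Hypothetical league-average hitter: .120 R/PA with a .330 OBA.
league_r_per_out = runs_per_out(0.120, 0.330)
print(round(league_r_per_out, 3))  # 0.179
```

The inverse function shows why the replacement question is sticky: going from a replacement-level R/O back to R/PA requires an OBA, and the formula does not say which one to use.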

Those are all useful questions to ask if you are serious about applying a replacement level type analysis. But they make life a lot more complicated. Average, while it may well be flawed, has the advantage of being very clean. It is a mathematically defined fact rather than a calculated value based on a series of assumptions.

So we’re using average, if for nothing else than to make this discussion manageable. This does not mean that I advocate using average as the baseline for all of the types of questions you want to answer, or even many types of questions. But I do think that average is a good starting point for theoretical discussion, especially since, again, it is the only choice for which we know all of the parameters we need to know. Now what do I mean by applying a baseline anyway?

All it means is that we compare the player’s performance to the performance of a baseline (in this case average) player. If a player creates 100 runs in 400 outs, and an average player would create 75 runs in 400 outs, then he is +25 runs above average (or +.0625 RAA per out). Now since this is a series primarily about rate stats, the second format is a rate, and is more useful to us. But if you want to go from “rate” to “value” or include playing time, then you are going to want to use some sort of baseline.
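The arithmetic in that example can be written out as a short sketch. The function name is made up for illustration. Note that the playing time in the example is given in outs, so dividing +25 by 400 gives a rate per out.

```python
def runs_above_baseline(runs, outs, baseline_runs_per_out):
    """Runs created minus what a baseline player would create in the same outs."""
    return runs - baseline_runs_per_out * outs


# Figures from the example: 100 runs in 400 outs vs. 75 runs in 400 outs.
average_rate = 75 / 400                                  # baseline runs per out
raa = runs_above_baseline(100, 400, average_rate)        # total above average
raa_per_out = raa / 400                                  # the same thing as a rate

print(raa)          # 25.0
print(raa_per_out)  # 0.0625
```

Swapping in a different baseline only changes the third argument, which is why the hard part of a replacement-level analysis is choosing that rate rather than doing the subtraction.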

Sometimes, it may be useful to use the baseline even if we do not convert our rate stat to take playing time into account. For example, suppose we have determined that a team would score 800 runs with our player and 750 without him. We could leave that as +50, which would be a rate if we have made some constant assumption about how much playing time he will get. For example, the simplest version of Marginal Lineup Value (link) assumes that the player got 1/9 of the team plate appearances, and was expressed as the number of runs he would add over the course of a full season. While it is not a format that one usually sees, a +50 MLV is still a rate--it's a rate of runs added/season.

And that leads to another point about rates. Since people are used to seeing a rate expressed as runs per out, or runs per PA, they will sometimes have a negative initial reaction to a rate which does not look like that. Like the MLV rate. Or like RAA/PA, a very important rate stat we’ll discuss later. That can have negatives, of course (as can MLV). And you can no longer divide them. For example, a player with 6 R/G in a 4.5 R/G league is often written as 1.33. The relative stat in this way is instantly adjusted for league context, and people like percentages. But if you are working with RAA/PA, you can’t express it as a percentage of the league, because the league is zero. You can’t say a player who is +.08 RAA/PA is -4 times better than a player who is -.02 RAA/PA. RAA/PA must be compared relatively as differences.
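A small sketch of the contrast, using the figures from the paragraph above. The function names are invented for illustration.

```python
def relative_to_league(player_rate, league_rate):
    """Ratio format: meaningful only when the league rate is positive."""
    return player_rate / league_rate


def difference(rate_a, rate_b):
    """Difference format: works for rates that can be zero or negative."""
    return rate_a - rate_b


# R/G is always positive, so the ratio to the league is well defined.
print(round(relative_to_league(6.0, 4.5), 2))   # 1.33

# RAA/PA: the league average is zero by construction, so the ratio fails.
try:
    relative_to_league(0.08, 0.0)
except ZeroDivisionError:
    print("no ratio to a league average of zero")

# The difference is still meaningful: about .10 RAA/PA separates the two players.
print(round(difference(0.08, -0.02), 2))        # 0.1
```

Dividing the two players' RAA/PA directly would give a negative number, which has no sensible reading as "times better", so the difference is the only comparison that carries meaning here.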

I will expound on that topic more in an upcoming installment. The point here is just that a figure like that is every bit as much a potential choice of rate stat as the formats that people are used to seeing. And that when you bring the baseline into it, the difference is just the total above the baseline divided by some unit of playing time, while a ratio needs to be manipulated to be in that kind of format. So if your total stat of choice is RAR, you might want your rate stat of choice to be expressed in the same units. The difference allows you to do what the ratio cannot.

Thursday, February 02, 2006

A Review of "The Book on the Book"

The last book review I did was on a seventeen-year-old book; this one is a little less than a year old, so maybe by the next time I do one it will actually be timely. Anyway, The Book on the Book by Bill Felber is subtitled “A Landmark Inquiry into Which Strategies in the Modern Game Actually Work”. I hate to rip on these, because I realize that the publisher wants to make the book sound dynamic and exciting, but this book is not by any means a definitive book on the Book, nor is it a landmark inquiry into anything.

If you are well-read in sabermetric theory, you will not learn anything from this book. If you are not, you still may not learn anything from this book. The studies Felber presents are fraught with questionable methodology and selective sampling problems. For example, a study on how pitchers who pitch a lot of innings in a given season are affected by fatigue studies the late season performances of pitchers who pitched 250 innings. 250 is a lot of innings to pitch in the modern era. In order to reach 250 IP, you generally not only have to stay healthy, but also pitch effectively. Pitchers who have their effectiveness sapped by fatigue do not reach 250 IP. It’s like studying the people who finish the marathon and concluding that they didn’t collapse in the last five miles.

There are also some misrepresentations of sabermetric concepts, such as Run Expectancy, which Felber at one point refers to as “odds”. The standard RE table does not present odds; it presents expected values. There are tables, of course, that do print the odds of scoring x runs in a given situation, and these tables are actually more valuable than a standard RE table, because you can derive the RE from them. But the table Felber presents is a standard table. Actually, the discussion on using RE to value strategic decisions that follows really isn’t bad--but it’s at about the same level as what was published twenty years ago in The Hidden Game. So if you’ve read that, you’re not going to learn anything, and if you are new to the field, you would be better served by reading the original.

The part of the book in which Felber evaluates front office decision making has a number of annoying parts to it, like the use of linear regression to estimate W% from payroll. It is never specified whether he is using all the data in his regression or doing a separate regression for each year; whether he uses the absolute dollar figure or compares it to the league average; and most incredibly, what the formula is in case you wanted to use it yourself.

Then a look at what a player “should” be paid, based on TPR (modified to be above replacement), ensues. It does not consider such things as the players’ free agency/arbitration status--this is forgivable to me, because you can still attempt to make a general statement that x RAR equal y dollars or something. But he assumes that salaries should be distributed based on the linear difference between their TPR and the position average TPR. Now maybe they should be, but to evaluate contracts without studying the reality of how the market actually works is a little silly. Furthermore, it appears as if he calculates these values specifically for every season. This could cause a huge change, like ARod getting $25 million/year, to totally skew the evaluation of dollar values of the other players at his position.

There is a look at park factors which points out that they are not very stable from year-to-year and discusses some possible explanations for this…without really touching on the most likely culprit, random chance. There is a bizarre digression on how if Sammy Sosa would catch a flyball with two hands instead of one he might be able to save fifteen runs. He criticizes Win Shares because they compare to a low standard and because they don’t include Loss Shares. I agree with this; however, it seems as if Felber paints the choice as between the WS baseline and the average baseline, when of course there are choices somewhere in the middle that most sabermetricians make.

Even more bizarre than the Sosa discussion is one on switch-hitting, in which he claims that even the 1999 Ken Griffey would have benefited from “judicious platooning”. He shows that Griffey had +57 Batting Runs, but was +55.43 from the right side, leaving him around +2 from the left. He remarkably defines a replacement level player as “[one] who would have hit 5 percent below the average performance level for the league”, and then claims that this replacement level player would have been +4 runs in Griffey’s left-handed PA! This leads to the ridiculous (and incorrect, as somebody 5% below average can’t have positive LW) conclusion that Griffey could have been platooned with some bum versus lefties and the Mariners would have been better off.

On the positive side, I think Felber is a talented writer. I don’t want you to get the impression that I think he’s an idiot--I just don’t think he is qualified to write a book of this sort. Some of the studies with selective sampling issues and the like would be ok as blog entries or something, to generate discussion or provoke thought, but they don’t make for a good book. I cannot recommend The Book on the Book.
