Monday, March 31, 2008

... Be a Believer (?)

The older I get, the more I care about baseball (or at the very least, my interest has leveled off at its peak), the more I know about baseball, the less invested I am in the performance of my team, the Indians.

Some of that is surely natural. After all, as a kid my levels of attachment were clearly unhealthy. I was despondent for at least a week after the World Series losses in 1995 and 1997. Only kids or mentally unstable adults have such reactions to mere baseball games; fortunately, I was the former, so I grew out of it (and started obsessing over much more important sporting events, in which there truly is an epic struggle between good and evil).

The last point is actually one that I think may help explain it. When one is a child, he does not control his own destiny, and he is not free to form his own associations. Latching on to the sports team that is geographically closest or that his parents root for may be an act of convenience, or an act of conformity. In my case it was the former. Rooting for some other team has the potential to serve as an act of rebellion, a small outlet of individuality that can be expressed without any real repercussions.

When one becomes an adult, though, choices can be made (more) freely, and one can form his own identity. What are the first choices of importance that many young Americans make in adulthood? Choosing a college to attend is one for many people. I may have been born in the Cleveland area, through an act of God or parents or however you’d like to phrase it, but I chose where to go to college. I could have gone anywhere (not really, of course; intelligence, academic accomplishments, and certainly not least of all finances have bearing on that matter) or nowhere, but I chose my school. Perhaps that is why college sports loyalty runs so much deeper than pro (and if you went to a university with a Division I athletics program and disagree with the premise of that sentence, then I would humbly suggest that you went to the wrong school).

However, my love for the game itself has never waned--it has only gotten stronger with time. Stronger despite quickly abandoning any notion that I could actually play at a level above the backyard (my baseball career highlight is either a catch made on the street or…never mind, it’s too pathetic to detail here. Suffice it to say, I suck). Stronger despite recognizing that the league and its teams are businesses with an eye on the bottom line just like any other (of course, that helps the decline in devotion to any one team, but it doesn’t explain why I follow that league more closely than before).

The recognition of the business nature of professional sports and the college/free will explanation are the best explanatory factors that I can come up with. As a capitalist, recognizing that all businesses are out to maximize their profits in no way causes me to revile them all. However, it does compel me to choose between them on a rational basis (Which will be the best investment on the stock market? Which will offer the best career opportunity?), rather than on a matter of where they are located (What would one think of an investor who says "American Greetings is from Cleveland, I should buy their stock"?).

In the same way, while I can still pull for the Indians because they are/were my “hometown” team, I can also look around the league and see teams that I admire for other reasons, like the philosophy of their GM (Oakland) or the players they happen to employ at the moment (the Chicago frickin White Sox of all teams). The Indians are still #1, and will always be, (and they are helped by the fact that I like Mark Shapiro and Eric Wedge and Chris Antonetti), but there’s no blind devotion to the cause anymore, and there’s no hating whoever knocks them out of the playoffs just because.

In the final analysis, the childhood favorite team served as the hook that got me to bite on baseball. Had the Indians not opened a new ballpark in 1994 and emerged as a contender, who knows when or if I would have become a baseball nut? But they served their purpose in that regard, and then I somewhat discarded them.

On the other hand, there are folks who are older, well-rounded baseball fans that remain fanatical followers of their team. Perhaps I’m just too cynical or too wrapped up in singing Hang on Sloopy to relate to the average baseball fan. Either way, it is the general prospect of Major League Baseball being played that has me smiling, rather than the fact that the Cleveland Indians will open their campaign with a decent chance to return to the playoffs.

NOTE: For those who are not familiar with official Indians team jingles (hopefully 99% of the audience), the title of this piece and the Indians preview are derived from a 1980s entry in that genre. Unfortunately, I cannot find the lyrics to this song online, so you will have to make do with my memory of what they are (also, I did not hear this song when it was actually in use):

Indian Fever; it’s catching fire with everyone
Indian Fever; you can be part of the fun
Go to (winner?) at every game, that’s when the excitement begins
So catch Indian Fever, be a believer with the Cleveland Indians

Indian Fever; it starts from the very first inning
Indian Fever; each game is a brand new beginning
It’s the hits, the homers, the double plays, it’s how you feel when we win
So catch Indian Fever, be a believer with the Cleveland Indians

Friday, March 28, 2008

Housekeeping

This is a low-quality post (Reader: “Tell me something I don’t know”), here mostly to mention that I am trying to figure out the new Blogger sidebar options, and getting that set up (they actually made it a lot easier to customize the sidebar for those of us who aren’t HTML experts, but I’ll still have to figure it out). So the links or the archives might disappear for a while, but they’ll be back eventually.

Anyway, I wanted to briefly touch on the Opening Day in Tokyo “issue”. I use quotation marks because I don’t understand what all the fuss is about. There are two strains of commentary on the issue: one is from intelligent people, and another is from those who appear to be jingoists. I’m not opposed to a little jingoism here and there, but not in the baseball world (unless you’re one of those fools who claim that rooting for the US in the World Baseball Classic constitutes jingoism).

Dealing with the intelligent criticisms, I just don’t understand what all the fuss is about. MLB is attempting to increase its worldwide profile, and I think that is a great thing. So there were some Japanese league games going on at the same time that may have been overshadowed? So what? It’s a long season, there’ll be plenty of time for Japanese fans to focus on their league (and of course, no one was stopping them from doing so if they wanted to ignore the MLB games).

I direct the same point about it being a long season to those who are disturbed that A’s fans lost two home games. In the long run, the difference between 79 and 81 games is negligible. I don’t believe that a season ticket holder would even notice the difference if the season was cut from 162 to 158 games, unless they went out of their way to count how many tickets they had.

Some say that Opening Day is a special day, and it is. But home Opening Day is special too, and the Red Sox and A’s will still hold one of those. So the A’s started their home schedule in a road park. Excuse me if I am not overcome with sympathy. The Indians have not opened the season at home since 2001 (including three years in a row at beautiful Comiskey Park). Last year, their first three scheduled home games had to be moved to Milwaukee. I think the A’s fans can live with waiting a week for their home opener, and having it not be the season opener.

The Japanese fans have seen a large number of talented players leave their teams for major league teams. The situation is a bit different than those of players from other countries that immigrate to MLB since the Japanese leagues represent a much higher level of play and professionalism than can be found in Latin America. I don’t want to suggest that this means MLB owes them in any way; after all, baseball players are not national assets except in the twisted mind of Fidel Castro, and Hideki Matsui can choose to go play in the US if he pleases. However, is it too much to ask that the Japanese occasionally be rewarded with a major league game in their country? Perhaps one that actually counts, rather than patronizing them with an exhibition as if they were a bunch of baseball neophytes like the Chinese?

I am no fan of Bud Selig, but it must be said that MLB is doing well right now. You don’t necessarily have to give him or the current owners or the MLBPA or any other group credit for it, but likewise some go over-the-top in their criticisms of the man. Personally, I am no fan of interleague play, unbalanced schedules, a 16 team NL and a 14 team AL that spits in the face of all historical precedent, five-game playoff series, or the “This time it counts” All-Star Game. But I can also give credit where credit is due, and sharing baseball with Japan and other countries of comparable baseball passion is an idea that deserves credit. That in no way implies that Selig or anything MLB has done is the cause of that passion. Embracing it is what I applaud.

Monday, March 24, 2008

2008 Predictions

Disclaimers first: these are offered in the spirit of fun and BS-ing, not in the nature of serious analysis. If I was trying to be serious, there would be confidence intervals and probabilities of winning the division and all sorts of other stuff. I have no interest in doing any of that.

Last year, I successfully predicted six of the eight playoff teams (the Mets and the Padres let me down, losing out to the Diamondbacks and Rockies). I also correctly tabbed the Red Sox as the World Champions. This is remarkable, since in the previous two years of this blog, a team that I picked fourth in one of the Central divisions went on to the pennant (and in 2005, two of them did). The point: I make no claim to being accurate, I make no claim that you should pay attention to what you read here. They are for my own personal amusement, and you can laugh if you want, but please don’t sneer, because if you sneer, you’re taking it way too seriously. Also, there are enough spaces in this darn thing as it is, so I use one paragraph per division, which results in a couple of run-on paragraphs.

I’m going to switch my usual order and begin with the Neanderthal League:

EAST

1. Atlanta
2. New York
3. Philadelphia
4. Washington
5. Florida

New York will be the popular choice, but Atlanta cannot be overlooked. All three of the top teams were very close in EW% and PW% last year. The Braves may not have improved themselves by losing Renteria and Jones, but the trade for Santana has obscured the fact that the Mets offseason was considered very shaky. The Mets’ pitching has a lot of questions, outside of Johan--Pedro's health, El Duque’s age, and while I like John Maine and Oliver Perez, they’re not quite proven commodities at their 2007 effectiveness levels. The Phillies can certainly be a factor again as well. The Nationals and the Marlins are overmatched, of course.

CENTRAL

1. Milwaukee
2. Chicago
3. Cincinnati
4. Houston
5. Pittsburgh
6. St. Louis

Milwaukee will score a whole bunch of runs, and I like their pitching well enough. Their defense should be improved with the addition of Cameron and the consequent reshuffling of the other positions. That does not mean it will be easy for them to beat out the Cubs, who were better on paper than they were on the field last year. Those two stand out in this division, and I think that they will get some separation on the middle two teams, the Reds and the Astros. I am hesitant to put Cincinnati third, since I have little faith that guys like Votto and Bruce will actually get a fair shot this season. The Astros made a bunch of ill-advised moves that really did not do much for their prospects in the short term, and can’t possibly be justified in the long-term. The Pirates remain the Pirates, but the Cardinals are just a bad baseball team. A rotation that is counting on Looper, Pineiro, Mulder, Reyes, Lohse, etc.? I’m sure they’ll be able to come up with a couple of effective guys behind Wainwright, but they look like a longshot to have a solid starting five. With Pujols’ health in question, look out below. Incidentally, in the last two seasons, neither World Series participant has returned to the playoffs in the following season (Houston and Chicago in ’05, St. Louis and Detroit in ’06). Those two instances are the first time in the history of the wildcard that neither pennant winner qualified for the playoffs a year later (although the ’93 Phillies and Blue Jays were in big trouble when the plug got pulled). Before that, you have to go back to 1990, when Cincinnati and Oakland failed to qualify in 1991. Two years in a row? 1986 (Mets/Red Sox) and 1987 (Twins/Cardinals).

WEST

1. Los Angeles
2. Colorado (wildcard)
3. Arizona
4. San Diego
5. San Francisco

Once again, I don’t see a lot to separate the top four teams. I am going with the Dodgers because I truly think that they have the most talent. Unfortunately, Joe Torre does not inspire optimism that guys like LaRoche, Kemp, and Ethier will get playing time over Garciaparra and Pierre. Their rotation has the potential to be really good. Colorado was not a fluke last year; the Diamondbacks may have been, but they also have the young talent to make a real improvement. San Diego appears to be the weakest of the four, but it is not hard to imagine them being in the mix down to the wire. The Giants are a different kettle of fish. They would fit in much better with the aimless mediocrity of the NL Central.

Now, the home of truth, justice, and the Designated Hitter, long live the American League:

EAST

1. New York
2. Boston (wildcard)
3. Toronto
4. Tampa Bay
5. Baltimore

The division of bays, Jays, and Rays may look the same on paper as it has for a while, but it’s a lot more interesting than a cursory glance might indicate. The popular consensus seems to give the BoSox the edge over the Yankees, but I’m not so sure. These two weren’t that far removed last year, and I think people have overrated Boston’s offense based on their October performance. They’re very good, no doubt, but the Yankees’ is better, and if they can avoid all the pitching problems of last year, this could be very interesting. It does seem like Boston has more depth to plug any injury problems, but the front-line talent for these two is close enough that I’m going to be a contrarian and pick New York. Toronto remains a team that could, with good fortune and others’ misfortune, find their way into October, but also remains a team that has to be picked third. I would nominate “Rays” as the worst team name in the post-1900 history of the game, but the Boston Bees might have something to say about that. Of course, they would stand no chance against the giants of the nineteenth century--the Brooklyn Bridegrooms, the Cincinnati Porkers, the Cleveland Infants. And of course I’ve stuck with big city teams--no Union Association teams in Altoona, or National Association teams in Rockford (those two didn’t have bad names, but the Elizabeth Resolutes is another matter). Their name may be historically bad, but the team is making strides, and their moves should shore up their pitching and defense. I would tag them as a team that might be able to stay in the hunt in September if they did not play in this killer division. The Orioles are my choice as the worst team in the AL.

CENTRAL

1. Detroit
2. Cleveland
3. Minnesota
4. Chicago
5. Kansas City

As much as I hate to do it, I am picking the Tigers to win, but I think they will be overrated in the public perception. Their offense is being touted as a juggernaut, and adding Cabrera obviously helps, but Polanco, Granderson, and Ordonez are all good bets to be less productive than they were a year ago. Predicting them to score 1,000 runs is downright silly. However, their pitching figures to be better than last year, and Cleveland’s figures to be worse. The Indians have a good shot, but I just can’t tab them as the favorites. Minnesota should be able to get better production from third base, left field, and shortstop, which will help keep them afloat despite some serious losses. Chicago seems to think that they are a contender, but they also seem to think that Ozzie Guillen is the kind of guy who you give a long term contract to. Unfortunately for my dislike of them, the acquisition of Nick Swisher means that I have to cut them a little slack. The Royals are still the Royals, at least for now. But they have a real shot at beating out Minnesota and/or Chicago, and that’s progress.

WEST

1. Los Angeles
2. Seattle
3. Texas
4. Oakland

The Angels have the clearest path to a division crown in all of baseball. I have never been wild about their team, but they definitely outclass their division rivals, although their recent injuries in the rotation give one pause. The Mariners are a team that I am hesitant to comment about, since there is a very vocal group of supporters who get upset and go to extreme lengths to justify the Mariners’ outperformance of their basic component statistics. Let’s just say that I don’t think they are as good as they think they are, and leave it at that. I was the idiot who thought Texas would be a factor in this race last year. It saddens me to see Nick Swisher cast off from the A’s and sent to the White Sox of all teams, but I think the rebuild was a wise choice. It is an interesting rebuild, trading pieces that are still young and under contract for a few years, rather than those who are about to become free agents. It could pay dividends, though, since the return ratio is greater than if you wait (compare the package the A’s got for Haren to the one the Twins got for Santana).

WORLD SERIES

New York (A) over Los Angeles (N)

The World Series pick should be taken far less seriously than the divisional picks (which puts it pretty low on the serious scale). I think last year was the first time in my prognosticating life (going back to 1995) that I correctly identified the World Series champion.

I can’t justify picking the NL champ to win, and by picking the Yankees to win the East, I painted myself into a corner and have to pick them to win it all.

NL ROY: Kosuke Fukudome, CHN
NL Cy Young: Johan Santana, NYN
Take a guy who was the fifth-best pitcher in the AL during an off-year in 2007 and move him to a moderate pitcher’s park in the NL.
NL MVP: Mark Teixeira, ATL
Being a relatively new face, he’ll get a lot of credit when the Braves are a mild surprise.
AL ROY: Evan Longoria, TB
AL Cy Young: Daisuke Matsuzaka, BOS
Just as Josh Beckett took a big step forward in his second year in Boston, so will Matsuzaka.
AL MVP: Miguel Cabrera, DET
I’m staying with the “credit the new guy” approach to prognosticating the MVP winner.

Most annoying story of the year: Steroids and related witch-hunts
Most predictable stories of the year: “White Sox continue slide, Guillen upset”; “Tigers’ offense not as dynamic as anticipated”; “ARod chokes in /insert situation here/”
First manager fired: Tony LaRussa quits in St. Louis, but as far as a firing goes, I don’t have a good candidate. Of course, I should note that these predictions on the bottom involve even less thought and are even more in the spirit of fun than the standings predictions.

Over/under on posts on this blog during the season: 20

Monday, March 17, 2008

Indian Fever...

In the past, I have written a preview for the Indians, since they are the team I follow most closely. I’m not going to do a full-blown preview this year, but this mini-preview will suffice. This is in the vein of a preview magazine, running down the positions and who is expected to fill them, rather than an introspective look at the organization that you would get from Baseball Prospectus and other such outlets, to be clear upfront. It really will give you no insight into the team if you know anything about them, just some vague predictions from me on which players will have better seasons.

The Indians rotation has the first four set with CC Sabathia, Fausto Carmona, Jake Westbrook, and Paul Byrd. Behind them, three lefties battled for the fifth spot. As I am writing this, none of them has been particularly effective in the spring, so one assumes that Cliff Lee, with the most experience and the largest contract, will get the job, while Jeremy Sowers and Aaron Laffey will head to Buffalo.

This is an area that actually concerns me as a fan. I guess if you are a supporter of the Orioles or the Cardinals or some other team, this is easy to scoff at, and I understand that…but the Byrd is always walking a tight rope, and he isn’t getting any younger. Westbrook is your standard issue groundball pitcher. Sabathia is a legitimate ace, but he was the best pitcher in the AL last year, and it would be asking a lot for him to repeat that performance. Carmona was almost certainly over his head last year; the question is how far he will drop off. On the plus side, his strikeout rate improved in the second half; on the negative side, he made a big jump in workload last year, especially when you consider the playoffs.

The story in the bullpen is similar, where Rafael Betancourt and Rafael Perez cannot reasonably be expected to repeat their performance. Jensen Lewis should be serviceable as a middle reliever, and hopefully Masa Kobayashi can pitch in there. Aaron Fultz and Joe Borowski are what they are. It should be a good group, but I hesitate to predict anything beyond that with the unstable nature of bullpens.

If you add all that up, one has to conclude that the Indians will allow more runs this season. The good news is that it is reasonable to expect that they will score as many or more than they did in 2007. Dividing the regulars into categories of likely to do better, worse, and about the same, I would say:

Better: DH Travis Hafner, CF Grady Sizemore
Same: C Victor Martinez, 1B Ryan Garko, SS Jhonny Peralta, LF Platoon, RF Franklin Gutierrez
Worse: 2B Asdrubal Cabrera, 3B Casey Blake

While Cabrera probably played over his head during his major league stint, the Indians’ overall production from second base (3.25 RG) was poor, and should be improved by Cabrera or previous starter Josh Barfield, who is ticketed for Buffalo.

The bench will consist of Kelly Shoppach as catcher, Jamey Carroll as utility infielder (an acquisition I am befuddled by), Jason Michaels/David Dellucci as the inactive part of the left field platoon, and Andy Marte as the out of options must-carry. Were I in charge, I would try to move Michaels and let Ben Francisco into the platoon, and if Shin-Soo Choo comes back from his elbow issues healthy, I wouldn’t mind seeing him take over the lefty part. Regardless, Francisco deserves to be on the team on merit, and has little to prove at Buffalo, but the Marte situation will preclude that.

I will post my predictions for the major league standings in a couple of weeks, but I tend to think as most observers do that the Indians will be in the mix for the playoffs, fighting for the division and the wildcard. Ultimately, though, I have to give Detroit a slight edge in the division and New York/Boston the edge in the wildcard hunt. It should be another fun season, though, and that’s all you can really ask for.

Tuesday, March 11, 2008

The Horse is Not Dead

Joe Posnanski blogs on "stats he likes", one of which is OPS+. I disagree, but that's neither here nor there. One of the things he writes is: "a player with a 113 OPS+, for instance, performed 13 percent better than league average". As I have demonstrated before, this is essentially true, at least within the limitations of the restriction of using just relative OBA and relative SLG.

Then of course, a know-it-all poster on BTF chimes in. It's one thing to be snarky and conceited if you know what you're talking about. This guy doesn't; he thinks he is showing how wrong Posnanski is, but in fact it is just the opposite.

#8 (Greg Maddux School of Reflexive Profanity):

The preposterous decision to subtract 100 rather than divide by two bites another rational person on the ass.

There are several more ignorant comments that follow, but they are in the spirit of asking questions, not of being a know-it-all. So I have no problem with them; they just need to be educated.

When I did my "Audacity of OPS" post, there were some BTF posters who scoffed at the notion that people were irrationally tied to OPS and rejected OPS+ on silly pretenses. I thought it was true then, and threads like that one make me believe it more than ever.

This is not rocket science, folks. OPS+ comes closer to measuring what it is we actually want to measure (batter productivity relative to the league average, preferably expressed in either R/O or RAA/PA) than do OPS/LgOPS, (aOBA + aSLG)/2, or any of the other alternatives that are mentioned there. Pete Palmer knew what he was doing.
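For anyone who wants to check the arithmetic, here is a rough sketch of the three indices at issue. It leaves out park adjustments, it assumes that aOBA and aSLG simply mean OBA and SLG divided by the league figures, and the batter and league numbers are invented for illustration; the function names are my own labels.

```python
def ops_plus(oba, slg, lg_oba, lg_slg):
    """OPS+ without park adjustment: 100 * (OBA/LgOBA + SLG/LgSLG - 1)."""
    return 100 * (oba / lg_oba + slg / lg_slg - 1)


def relative_ops(oba, slg, lg_oba, lg_slg):
    """OPS/LgOPS, scaled so that league average is 100."""
    return 100 * (oba + slg) / (lg_oba + lg_slg)


def average_of_relatives(oba, slg, lg_oba, lg_slg):
    """(aOBA + aSLG)/2, scaled so that league average is 100."""
    return 100 * (oba / lg_oba + slg / lg_slg) / 2


# Hypothetical batter in a hypothetical league (not real data)
batter = (0.370, 0.480)
league = (0.333, 0.420)

for index in (ops_plus, relative_ops, average_of_relatives):
    print(index.__name__, round(index(*batter, *league), 1))
```

All three return 100 for a league-average hitter. The only difference between OPS+ and the average of the relatives is the "subtract 100 rather than divide by two" step, which doubles the distance from 100.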

Monday, March 10, 2008

Other Ventures

This post will be little more than an advertisement for some of the other baseball things I have been working on lately. First, I have a new blog, Weekly Scoresheet. This new blog will in no way supersede this one, although it may cause a slight reduction in the number of posts here. It is solely devoted to scorekeeping, and particularly to displaying scoresheets that I have kept over the years (and, eventually, if anyone is interested, some scoresheets from readers of the blog). I intend to post one scoresheet per week, and occasionally there may be other posts that just deal with my thoughts about keeping score.

It is in many ways a self-indulgent site, since I just post my own scoresheets. Why would anyone want to look at those? Fair question, and I don’t expect to have a large readership. Personally, though, I enjoy looking at other people’s scoresheets and think there is a dearth of information on scorekeeping on the web. If someone else had a similar site, I would check it out from time to time. So whether anyone reads it or not, I will enjoy writing it.

Secondly, about a month ago I contributed a number of pages to Tangotiger’s Sabermetrics Wiki. I have not contributed much lately, but eventually I’d like to add some more. Hopefully, though, that won’t be necessary, as other people add their work.

I am sure that I do not have many readers who do not also read Inside the Book, and thus are not familiar with the wiki project. But on the off chance that there are those of you out there who fit into that category, I wanted to do another post touting the effort. There are a number of pre-existing pages that need more explanation; in particular, may I suggest the pages on park factors, Win Shares, DIPS (does not exist, although there is a page on BABIP), and fielding metrics?

I am going to reproduce the page on linear weights here, since I wrote most of it, and it is a heckuva lot better than the LW article on my website, which is embarrassingly bad (I keep intending to replace it, but never actually sit down and do it). As you can see, while this article is longer than some of the others I suggested needed work, it still only scratches the surface of what could be discussed about linear weights, focusing on how the weights are derived. So don’t feel intimidated by the fact that a page has already been started.

Linear Weights

Linear Weights (LW) is a term used broadly to refer to any linear run estimator, and also to the analytical system of Pete Palmer (see Linear Weights System). The pioneer of Linear Weights was Canadian sabermetrician George Lindsey, but the concept was expanded upon and popularized with Palmer's Batting Runs.

Methods for Generating Linear Weights

EMPIRICAL APPROACH

The empirical approach to Linear Weights is closely related to the concept of Run Expectancy. To generate the weights, some sample of data (often all plays in a given league-year, or over the course of several years) is analyzed. The change in run expectancy on each play is calculated as follows:

Change in RE = Final RE - Initial RE + Runs Scored on play

For example, take the case of a grand slam with 2 outs, using this RE Table. The initial RE is for the bases loaded, 2 out state (.815 runs). The final RE is for the new state, which is bases empty, 2 outs (.117 runs). Four runs scored on the play, and thus the value of the play was .117 - .815 + 4 = 3.302 runs.
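The grand slam calculation can be sketched in a few lines. The function name is only an illustrative label, and the two RE values are the ones quoted in the example above.

```python
def play_value(initial_re, final_re, runs_scored):
    """Change in RE = Final RE - Initial RE + Runs Scored on play."""
    return final_re - initial_re + runs_scored


# Grand slam with the bases loaded and 2 outs:
# initial state is worth .815 runs, final state (bases empty, 2 outs) .117 runs
grand_slam = play_value(initial_re=0.815, final_re=0.117, runs_scored=4)
print(round(grand_slam, 3))  # 3.302
```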

After doing this process for each play, the results are averaged to produce the Linear Weight values. The values produced by this procedure estimate runs above average (in other words, the sum of the product of the coefficients and the frequencies of each event will be zero). In order to estimate the total number of runs scored, 1/3 of the expected run total for the inning (equivalent to the bases empty, no outs run expectancy) must be added, for each out on the play, to the coefficients of events which include outs (e.g. a strikeout, caught stealing, or double play).
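A minimal sketch of the averaging step and of the conversion from runs above average to total runs follows. The play list is invented for illustration: apart from the two grand slam values quoted earlier, the run expectancies (including the .5 used for bases empty, no outs) are hypothetical placeholders rather than entries from a real RE table.

```python
from collections import defaultdict


def average_weights(plays):
    """Average the change in RE for each event type.

    plays: iterable of (event, initial_re, final_re, runs_scored).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for event, initial_re, final_re, runs in plays:
        totals[event] += final_re - initial_re + runs
        counts[event] += 1
    return {event: totals[event] / counts[event] for event in totals}


def to_absolute(weights, outs_per_event, inning_re):
    """Add 1/3 of the inning run expectancy for each out on the play."""
    return {
        event: value + outs_per_event.get(event, 0) * inning_re / 3
        for event, value in weights.items()
    }


# Hypothetical sample of plays (RE values are placeholders)
plays = [
    ("HR", 0.815, 0.117, 4),  # grand slam, 2 outs
    ("HR", 0.500, 0.500, 1),  # solo home run, bases empty, no outs
    ("K", 0.500, 0.250, 0),   # strikeout, bases empty, no outs
]

above_average = average_weights(plays)
absolute = to_absolute(above_average, outs_per_event={"K": 1}, inning_re=0.500)
print(above_average)
print(absolute)
```

Events that involve no outs, such as the home run, keep the same coefficient in both versions; only the out events change.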

INTRINSIC WEIGHTS BASED ON DYNAMIC ESTIMATORS

Dynamic run estimators differ from linear run estimators in that they do not place a fixed coefficient on each event, but rather attempt to model the run scoring process. Thus, the value of each event varies based on the frequency of other events.

However, for any given set of input statistics, the intrinsic linear weight that the dynamic estimator places on a given event can be determined. If one trusts that the dynamic estimator being used is a good model, then the linear weights it generates for the inputs could be valuable.

Various approaches can be used to determine the intrinsic weights. The so-called "+1 method" adds one of a given event (e.g. one walk or one double). The difference between the output of the estimator with the additional event minus the output without it is the linear weight for that event. More precise estimates can be generated by adding smaller increments (for example 1/100th of a walk), finding the change in estimated runs scored, and dividing by the size of the increment added. The smaller the increment, the more accurate the estimate because each change in the inputs changes the system ever so slightly.
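As an illustration, the "+1 method" can be sketched with the basic version of Runs Created, (H + W) * TB / (AB + W), standing in as the dynamic estimator. The team totals are hypothetical, the helper names are illustrative, and the sketch assumes that an added hit also adds an at bat (so that outs are held constant).

```python
def runs_created(s, d, t, hr, w, ab):
    """Basic Runs Created: (H + W) * TB / (AB + W)."""
    h = s + d + t + hr
    tb = s + 2 * d + 3 * t + 4 * hr
    return (h + w) * tb / (ab + w)


# Hypothetical team totals (not real data)
TEAM = dict(s=1000, d=280, t=30, hr=160, w=550, ab=5500)


def intrinsic_weight(event, increment=1.0, team=TEAM):
    """Change in estimated runs per unit of the added event."""
    bumped = dict(team)
    bumped[event] += increment
    if event != "w":
        # Assumption: a hit is also an at bat, so outs stay fixed
        bumped["ab"] += increment
    return (runs_created(**bumped) - runs_created(**team)) / increment


for event in ("s", "d", "t", "hr", "w"):
    plus_one = intrinsic_weight(event, 1.0)
    fine = intrinsic_weight(event, 0.01)
    print(event, round(plus_one, 4), round(fine, 4))
```

The two columns differ only slightly for a team in the normal range, which is why the +1 method is usually good enough in practice.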

For dynamic estimators that can be written as simple formulas, the formula for the intrinsic weights can be found by partially differentiating the equation with respect to each event. The partial derivative is a calculus concept which finds the change that would be created by adding an infinitesimal amount, eliminating the effect of changing the system.
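A sketch of the partial derivative approach, again using basic Runs Created (RC = A * B / C, with A = H + W, B = TB, C = AB + W) purely as an example estimator. The team totals are hypothetical, and each event is described by how much it adds to A, B, and C.

```python
def rc_partial(a, b, c, A, B, C):
    """Partial derivative of RC = A * B / C with respect to an event
    that adds a to A, b to B, and c to C:

        dRC = (a * B + b * A) / C - c * A * B / C**2
    """
    return (a * B + b * A) / C - c * A * B / C ** 2


# Hypothetical team totals: A = H + W, B = TB, C = AB + W
A, B, C = 2020, 2290, 6050

# (a, b, c) for each event; every event here adds one to A and one to C
EVENTS = {
    "walk": (1, 0, 1),
    "single": (1, 1, 1),
    "double": (1, 2, 1),
    "triple": (1, 3, 1),
    "home run": (1, 4, 1),
}

for name, (a, b, c) in EVENTS.items():
    print(name, round(rc_partial(a, b, c, A, B, C), 3))
```

Because the derivative is taken at the team's actual totals, the weights are specific to that team and will shift as the inputs change.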

The intrinsic weights found through the Base Runs estimator, as well as those from Markov models of run scoring, are the ones that are most often used by sabermetricians, since those models work over a wider range of contexts than other dynamic estimators like Runs Created.

MULTIPLE LINEAR REGRESSION

Linear weights are sometimes generated by running multiple linear regressions to predict runs from the various offensive inputs. This is usually done on team seasonal data, although it could be done on game or inning level data too.

The drawback of regression is that it is a purely mathematical procedure, and the results do not always conform to what logic or other means (such as empirical linear weights) tell us to be true about baseball. The correlation between an event and runs sometimes does not reflect the impact that it has upon runs. For example, take this regression on team season data from 1954-1999 found in Jim Albert and Jay Bennett's Curve Ball:

R/G = (.49S + .61D + 1.14T + 1.50HR + .33W + .14SB + .73SF)/G

A double is only seen to contribute .61 runs, well below the .8 usually found through other procedures. Additionally, sacrifice flies are valued at .73 runs. This result is not surprising when one considers that sacrifice flies always result in runs. However, as observers we know that while the sacrifice fly contributes to the run, the more important element was the events that allowed a runner to reach third base with less than two outs. Albert and Bennett explain that sacrifice flies are a "carrier" category, meaning "[They] may carry more information than their literal name implies."
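To see how the regression equation is applied, here is a small sketch that uses the coefficients quoted above. The team line is hypothetical and the function name is an illustrative label.

```python
# Coefficients from the Albert and Bennett regression quoted above
COEFFS = {"S": 0.49, "D": 0.61, "T": 1.14, "HR": 1.50,
          "W": 0.33, "SB": 0.14, "SF": 0.73}


def regression_runs_per_game(stats, games):
    """R/G = (.49S + .61D + 1.14T + 1.50HR + .33W + .14SB + .73SF)/G"""
    return sum(COEFFS[event] * stats[event] for event in COEFFS) / games


# Hypothetical team season (not real data)
team = {"S": 1000, "D": 280, "T": 30, "HR": 160, "W": 550, "SB": 100, "SF": 45}
print(round(regression_runs_per_game(team, games=162), 2))

# One extra sacrifice fly moves the season estimate more than one extra double
print(COEFFS["SF"] > COEFFS["D"])  # True
```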

The choice of categories in a regression often affects the coefficients as well. It is not unusual for a regression using Total Bases and Hits to give coefficients for hit types more in line with our expectations than a regression using singles, doubles, triples, and home runs as separate inputs.

SIMPLE MODELS

In lieu of play-by-play data or using intrinsic weights, several methods for producing approximate linear weights for different contexts have been created. These approaches rely on assumptions about the relationships between the value of offensive events that are fairly valid within the normal range of team contexts. While they may not work well when applied to theoretically extreme teams (for example, nine Babe Ruths), they can be used to generate reasonable weights for normal teams and leagues.

Both David Smyth and Tangotiger have published these types of models. Smyth's begins with the premise that each on base event is worth the average number of runs scored per baserunner (approximated as (R - HR)/(H + W - HR)), and proceeds to use various assumptions to estimate each event's value in terms of advancing baserunners. Combining these two values gives an overall coefficient for each event.
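The starting premise of Smyth's model, the average number of runs scored per baserunner, is simple enough to sketch in a few lines of Python. The function name and the totals are hypothetical, and the later steps of the model (the advancement values) are not attempted here because the text does not spell them out.

```python
def runs_per_baserunner(r, h, w, hr):
    """Approximate runs scored per baserunner: (R - HR) / (H + W - HR).

    Home runs are removed from both sides because the batter scores himself
    and never becomes a baserunner in the usual sense.
    """
    return (r - hr) / (h + w - hr)

# Hypothetical team totals, for illustration only.
value = runs_per_baserunner(r=760, h=1450, w=550, hr=160)
print(f"Runs per baserunner: {value:.3f}")
```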

SKELETONS AND TRIAL AND ERROR

A skeleton is an equation crafted from the relative weighting of offensive events, which is then multiplied by a constant in order to estimate runs. An example of an estimator developed by a skeleton approach is Paul Johnson's Estimated Runs Produced. Johnson used play-by-play data to determine the average number of bases gained on hits and walks, then experimented to find a value for outs and found a constant (.16) which would bring his equation in line with runs scored.
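The final step, finding the constant that brings a skeleton in line with runs scored, might be sketched as follows. The relative weights and team totals below are placeholders invented for illustration; they are not Johnson's actual values, and taking the ratio of total runs to total skeleton value is just one simple way to pick the constant.

```python
# Placeholder relative weights for a skeleton (hypothetical, not ERP's own).
SKELETON = {"TB": 2.0, "W": 2.0, "H": 1.0, "OUT": -0.6}

def skeleton_value(team):
    """Unscaled skeleton total: the relative weights applied to a team's events."""
    return sum(weight * team[event] for event, weight in SKELETON.items())

def fit_constant(teams):
    """Multiplier that makes total estimated runs equal total actual runs."""
    total_runs = sum(team["R"] for team in teams)
    total_skeleton = sum(skeleton_value(team) for team in teams)
    return total_runs / total_skeleton

# Hypothetical team seasons.
TEAMS = [
    {"TB": 2300, "W": 550, "H": 1450, "OUT": 4050, "R": 760},
    {"TB": 2100, "W": 480, "H": 1380, "OUT": 4120, "R": 650},
]

constant = fit_constant(TEAMS)
for team in TEAMS:
    estimate = constant * skeleton_value(team)
    print(f"actual {team['R']}  estimated {estimate:.1f}")
```

With the constant fitted this way, the estimates match actual runs in total across the sample, while individual teams can still miss in either direction.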

In the case of ERP, the logic used to create the skeleton was similar to that of the Run Expectancy approaches described above, since both relied on examination of play-by-play data. However, Jim Furtado's approach in developing Extrapolated Runs was a hybrid. Using ERP as a starting point, he also considered regression results and experimented until he found a formula that he felt made common sense and had superior accuracy when applied to his sample data.

This family of techniques is often criticized because it is seen as forsaking the theoretical soundness of empirical approaches in pursuit of more accurate predictions with sample data.

Examples of Linear Weight Estimators

Below is a list of linear run estimators commonly used by sabermetricians. However, it should be noted that linear weight methods often do not have unique names as they are tailored to a specific environment or context. The methods below use long-term average values and are generally not designed for any specific context. (lists Batting Runs, Estimated Runs Produced, and Extrapolated Runs).

Monday, March 03, 2008

Baseball Previews: The Bad and the Ugly

As I have explained before, I am an unabashed fan of pre-season baseball magazines. Not because the writing is great, or because they give any great insight. What they are is a sure sign of spring, and an opportunity to review the rosters of each team, which is a daunting task during the off-season.

Unfortunately, the sheer number of them available continues to dwindle. One year, sometime in 1998-2000, I recall having copies of a low-budget one (no color inside at all) whose name I cannot remember, Spring Training, Athlon, Sporting News, Sports Illustrated, Bill Mazeroski’s, Ultimate Sports, and Street and Smith’s. This year, at least in my market, I can only find four: Lindy’s (this one is a descendant of the aforementioned Ultimate Sports), Beckett, Athlon, and Sporting News (apparently the parent company of S&S has purchased TSN, and so they are now combined under one banner).

While I don’t think I ever read all of them cover to cover, they were nice to have. Of course, time marches on, and the internet has certainly cut into the market for these things. While I love the internet, I do think it’s a shame that print publications are being lost, but that’s just the market at work, and I love the market. The biggest casualty is the beloved Sports Encyclopedia: Baseball, which has always been my favorite baseball encyclopedia for general, flip-through-the-pages amusement. Thankfully, we still have the ESPN Baseball Encyclopedia as a career register print publication--all is not lost.

While modern technology may be doing its damage to the baseball preview market, I’m sad to report that the magazines may just be doing it to themselves. So far I have read Lindy’s and Beckett. As usual, Lindy’s is filled with silly comments from “scouts” and an odd fixation with closers, but minus a few errors I noticed, it seems to be well-proofed and factually accurate.

The Beckett preview is another matter. Before I criticize it though, I should point out that proofreading and correct grammar/spelling are not exactly my strong suits (the jackass commentator on my OPS post did have a point). However, in fairness to me, I do not write for a living, nor do I charge you to read this blog, nor is it put out in print, where mistakes are permanent.

The Beckett preview is filled with mistakes and poor use of the language. For example, “blurprint” instead of blueprint (pg 22). “Tejada, who averaged 26 HR in each of his four seasons in Baltimore.” (pg 22). “but they are not actively shopping the soon-to-be 29-year-old, but if they got an offer they couldn’t refuse…” (pg 24). And that’s just from the first team preview in the magazine, the Orioles.

That may come across as nitpicking, and to some extent it is…but what is not nitpicking is pointing out the bizarre baseball comments, since it is a baseball magazine after all. For each team, “3 Keys to Win” are listed. What is left unexplained is what the teams are vying to win--the division, the World Series, 81 games? It would seem that the bar for winning is different for the Orioles than it is for the Red Sox. Anyway, these “keys” are often hilarious. For example, for Boston, “Curt Schilling needs to win a minimum of 12 games this year, to justify his salary, and his roster spot.” (pg 26). The obvious question is why 12 games? Why not 11 or 13? Deeper than that though, suppose Schilling wins zero games (which now seems to be a pretty decent bet). That may mean that he did not justify his salary, but is it really a key for the Red Sox to win? He won nine games last year and I seem to recall that Boston won. If Schilling fails to win 12 games, does that mean the Red Sox will not win the Series? A playoff berth? 75 games?

How about “Danks must prove to be a reliable No. 4 starter that accumulates a minimum of 18 quality starts this season.” (pg 30). “DH Travis Hafner needs to start hitting like ‘Travis Hafner’ again.” (pg 34)--that was a serious LOL moment for me. And on it goes. I’ve only drawn on the first four AL teams profiled for material here.

The best comment might be one about the Royals on pg 42: “The results may not take the Royals to the World Series this year, but a lot of things would have to go wrong for this team not to finish at .500.” Yet, they predict that Kansas City will finish fifth in the AL Central.

For pure “We really need a different editor/proofreader” fun, it’s tough to top the comment about the Brewers on pg 110: “they were watchting the playoffs from thier couch.” When I type that in Word, it automatically changes the position of the i and the e in “their”--I had to type it a second time to make the program understand that I wanted it. Maybe that’s all these guys need--a modern word processor.

If you think I’m telling you not to buy this magazine, though, think again. I’m not going to do anything to encourage the further demise of this once-proud genre of baseball commentary. Unfortunately, with this kind of writing, they’re going to need a lot more than my moral support.