Monday, December 17, 2007

Providing Zero Insight, but Filling Space Nonetheless

In Bill James’ early Abstracts, he had a little box entitled “Talent Analysis” for each team that estimated the composite value of all of its players (as estimated by the Approximate Value method), what percentage of it was acquired through various means (trades, free agency, development), what percentage of it fell into defined age categories (“young”, “prime”, “past-prime”, and “old”), and how much total value had been produced by the team’s farm system. This article is going to be in the spirit of those and look solely at the offensive players for 2007, and without regard to how players were acquired, only whether they were products of the farm system or not. Also, I have omitted the age breakdowns, although I may look at that in the future.

I should note that I do not consider my examination here to be particularly insightful, and certainly it is not unique. This is the kind of stuff that I sometimes figure myself and keep to myself, but since I still can’t (or, more accurately, don’t want to go through the effort to) access my pre-written articles, I have to fill this space up with something.

I also would be remiss if I did not point out that this strain of analysis did not die with the Abstract, but is in fact being practiced in other places, most notably by Steve Treder at the Hardball Times. So not only am I just ripping off James’ idea, I’m covering old ground.

Now, for a long list of caveats. One is that I have only considered hitters, and furthermore only hitters with 300 PA. So there are guys who were injured or who were part-time players or what have you and really did have value that are being ignored here.

Second is that I have used my own personal WAR figures, except I have multiplied them all by seven, and put a floor of zero on them. I did this because I didn’t want to get caught up in the numbers as WAR, but wanted something with a direct linear relationship to WAR. The result is a number that looks kind of like a fantasy dollar value--it's not, don’t go out and bid $65 for ARod in your fantasy league because he gets 65 points here, but the scale at least resembles fantasy dollar values.
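A minimal sketch of the scaling just described, for anyone who wants to reproduce it. The function name is mine, and since nothing above says whether the figures were rounded, no rounding is applied here:

```python
def points_from_war(war: float) -> float:
    """Scale WAR by seven and floor the result at zero, as described above."""
    return max(0.0, 7.0 * war)


# A 3-WAR season comes out to about 21 points.
print(points_from_war(3.0))
# A below-replacement season is floored at zero rather than counted as negative.
print(points_from_war(-1.2))
```

Run in reverse, ARod's 65 points work out to a little over 9 WAR (65 / 7).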

If you read all of my stuff, you might be thinking “That’s awfully hypocritical, considering he doesn’t like unitless numbers like OPS or EQA”. Duly noted. However, the distinction as I see it is that WAR*7 has a direct, one-step relationship to a meaningful unit (just as Win Shares does, in theory). OPS needs an addition and a multiplication to be a decent run estimator, while EQA differences and ratios are unitless.

Third, I did not factor in defensive value, so everyone is assumed to be an average fielder at his position, thus making Hanley Ramirez the most valuable player in the NL, which I obviously don’t agree with.

Fourth, the system for classifying the producing organization of each player is not optimal. I credited each player to the franchise for which he made his major league debut. Obviously, players often change hands in the minors, and thus you might want to credit Grady Sizemore to the Nationals rather than the Indians. First major league organization is easy to do, though, and I don’t think it’s too much worse than going by signing organization. I believe that in Treder’s analysis, he tries to identify the organization most responsible for the player’s development, which could be either. This is a better approach, but I kept it simple here.

Fifth, the method of assigning players to teams was the same I used in my end of season stat reports, which means each player is credited to just one team even if he played a significant amount with two teams (e.g. Saltalamacchia, whose name I find easier to spell than the star first baseman he was traded for).

Finally, I already slipped this in, but I only considered offensive players. So pitching, both on the team and produced by the system, has been completely ignored.

There are a number of different ways to look at the data, and what I am going to do is discuss a few of the interesting things, and then post a big chart at the bottom with all of the data.

Looking at the total (TOT) is not very interesting, because on the team level this just points out teams with talented position players. More interesting is the HG column, which measures homegrown talent retained by the team this year (actually, it includes anyone who made their major league debut with the team and played for it this year, so Sammy Sosa is considered homegrown for the Rangers despite the fact that he had been gone from the organization for more than fifteen years).

The leader in homegrown value was the Marlins, with 193 points. The next four teams on the list (Brewers, Phillies, Rockies, Braves) are all Neanderthal League outfits as well, while the Yankees top the AL, with a wide gap from those six teams back to the Indians.

The fact that the Yankees have a lot of homegrown value (James called it talent, and I may slip up and use that term too, but I want to stress that this is a measure of 2007 value and not talent) at first seems surprising, but consider that Jorge Posada, Derek Jeter, and Robinson Cano all contributed significant value. Their total is boosted by the presence of Hideki Matsui, who in reality is a free agent signing, but here is treated as a Yankee product since he debuted in the majors with them.

There are a lot of good teams at the top of the homegrown list, but there are some pretty solid teams near the bottom too. One of those is the Cubs, with just 13 points of homegrown value (all contributed by Ryan Theriot). They are beaten out by the Giants, though, whose seven points were all contributed by Pedro Feliz.

A logical jump is HG%, which is the percentage of total value produced by the team’s system. As you would expect, this has a strong correlation with the raw HG figure. Milwaukee led the way with 94% homegrown, with Florida, Minnesota, Colorado, and Philadelphia rounding out the top five. The Brewers got just 12 points from imported players (Johnny Estrada and Kevin Mench).

On the flip side, the Cubs (9%) and the Giants (7%) are the trailers. Teams as a whole got 55% of their offensive production from players they had developed.

Moving on, we have the “PROD” column, which measures the amount of value produced by the system. The leaders are the Indians with 278, just edging out the Marlins’ 273. I’ll look at the Tribe more closely in a bit, but Florida has produced six 20-point players (~3 WAR), of which they retain(ed) five (Miguel Cabrera is now a goner). Only Edgar Renteria is gone. This may seem surprising considering the fire sales they have held, but a lot of the players they gave up in those trades were imports to begin with (Sheffield, Alou, Lowell), or no longer are around to produce any value.

The mean production is 161, pegging the Astros (162) and the Mets (158) as the most average organizations. The standard deviation is 66. I say this to set up that the z-scores range from -1.77 (Cubs) to +1.77 (Indians), with one exception, which sits another half a standard deviation (-2.26) away from any other team. That team is the Giants.
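For anyone checking the arithmetic, here is a quick sketch of the z-score calculation using the mean, standard deviation, and production totals quoted in this post. The function and constant names are mine:

```python
MEAN_PROD = 161  # mean system production quoted above
SD_PROD = 66     # standard deviation quoted above


def z_score(prod: float, mean: float = MEAN_PROD, sd: float = SD_PROD) -> float:
    """Number of standard deviations a production total sits from the mean."""
    return (prod - mean) / sd


# Production totals as quoted in the post: Indians 278, Cubs 44, Giants 12.
for team, prod in [("Indians", 278), ("Cubs", 44), ("Giants", 12)]:
    print(f"{team}: {z_score(prod):+.2f}")
```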

When you see that the Cubs have only produced 44 points (around 6 WAR) of value, you can see that this is pretty bad. Their most notable contribution came from Brendan Harris (22), with the aforementioned Theriot next and just Corey Patterson and Ross Gload to chip in. But the Giants are on a whole different plane with a pitiful 12. Only two San Fran products batted 300 times with positive WAR this season--Feliz (7 points) and Yorvit Torrealba (5). I realize that this analysis overlooks a lot, especially pitchers, of which the Giants have a promising crop and a few good exiles out there. But it still strikes me as absurd that they rewarded Brian Sabean with a contract extension. Sabean got just two mediocre (for a playoff team) playoff teams out of four seasons of the greatest offensive force in baseball history, and his team has not been a real factor for a few years now. He has built an impossibly old team (although in his defense he has traded no prospects of offensive value to get it). You’re going to tell him “Nice job, we’d like another five years of this?”

Moving on, I have a column “%Retain”, which is the percentage of value produced by the system retained by the system (HG/PROD). The Rockies lead the way at 91%--only the Juans, Pierre and Uribe, are no longer members of the organization. They are followed by Philadelphia, Cincinnati, Detroit, and Milwaukee. The Tigers’ system has not produced much (61), but they retain 47 of it, and I doubt they’re too broken up about not having Juan Encarnacion, Frank Catalanotto, and Nook Logan. The major league average is 55%, which if you think about it makes sense--it has to be the same as the HG% on the league level.
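The retention figure is just a ratio, sketched here with the Tigers' numbers from the paragraph above. The function name is mine, and the zero check is my own addition for a system that produced nothing:

```python
def retention(hg: float, prod: float) -> float:
    """Share of system-produced value still with the producing team (HG / PROD)."""
    if prod == 0:
        raise ValueError("system produced no value, so retention is undefined")
    return hg / prod


# The Tigers' system produced 61 points, of which they retain 47.
print(f"Tigers: {retention(47, 61):.0%}")
```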

On the other side of things, the White Sox stick out like a sore thumb with just 12% retention (the Padres are next at 26%). Magglio Ordonez, Carlos Lee, Aaron Rowand, Mike Cameron, and Frank Thomas are all 20 point players who have taken their services elsewhere by whatever means, while their most valuable retained product is a Japanese exception, Tadahito Iguchi. Josh Fields (10) is the most valuable true White Sock standing.

The “#” column gives the total number of players in the sample produced by each team, and “per #” is the per player average value produced by a system. The top three in producing players are Atlanta with 14, then Cleveland and Florida with 13. The average is eight and a third. The bottom three are San Francisco with 2, the Cubs with 3, and Detroit and Baltimore with five. Of course these lists are similar to the value produced lists.

In terms of value per farm product, Florida leads the way at 30, followed by the Yankees (27), Seattle, Philadelphia, and Colorado (26). Detroit has just 12 per player, Chicago 11, and San Francisco 6. So again, not only are the Cubs and Giants last in total value and players produced, the players that they have come up with are the least valuable.

You can play around with a lot of different combinations of the columns, but the last I will present is “Surplus”, which is the raw difference between Total and Production. A positive surplus means that the team had more value in 2007 than its system had produced. The average of course is zero, with the Astros (-4) being the closest. They, most notably, have lost Bobby Abreu, Luis Gonzalez, and Kenny Lofton, but they have also brought in Carlos Lee, Mark Loretta, and Mike Lamb with offsetting value (at least for 2007--the three that got away would have been a much bigger drain in, say, 2001).

The team with the biggest surplus (149) is Detroit, which has imported all of its notable offensive players except Curtis Granderson (Ordonez, Guillen, Polanco, Sheffield, Rodriguez). The flip side of the coin is their divisional foes, Cleveland. The Indians are short 124 points of value. You could make a pretty good team out of Indian exports (C: Josh Bard 1B: Sean Casey 2B: Brandon Phillips 3B: Kevin Kouzmanoff LF: Manny Ramirez CF: Coco Crisp RF: Brian Giles DH: Jim Thome). Even without a shortstop (and John McDonald is probably at least close to replacement level when you consider his defense), this team would have 157 points, which would rank it eighth in baseball (just ahead of the real Indians at 154).
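The surplus column can be sketched the same way. The Indians' figures are the ones quoted in this post. The Tigers' total of 210 is not stated anywhere above; I have inferred it here from their production (61) plus their surplus (149), so treat it as an assumption:

```python
def surplus(total: float, prod: float) -> float:
    """Value on the 2007 roster minus value produced by the team's own system."""
    return total - prod


# Indians: 154 points on the roster, 278 produced by the system.
print("Indians:", surplus(total=154, prod=278))

# Tigers: total inferred as production plus surplus (61 + 149), not quoted directly.
tigers_total = 61 + 149
print("Tigers:", surplus(total=tigers_total, prod=61))
```

Since every player in the sample is credited to exactly one producing system, the totals and the production figures sum to the same number across the majors, which is why the league-wide surplus has to net out to zero.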

Which feat do you find more impressive: that the Tigers have built a playoff contender on the basis of players brought in from elsewhere, or that the Indians have built a playoff team despite losing all of those players? It helps, I guess, that both teams have significant homegrown pitching (on one hand Verlander, Zumaya, Bonderman, Robertson; on the other, Sabathia, Carmona, Betancourt).

Here’s a frivolous question for you: which team possessed the most value produced by another team? My off the cuff guess is the Yankees from the Mariners, on the strength of Alex Rodriguez. And that is indeed the answer. However, the second place finish is based on three players instead of just one--the Padres have 58 points of value produced by the Indians in Josh Bard, Kevin Kouzmanoff, and Brian Giles.

Here is the complete chart, which I sorted by total value produced:

Monday, December 10, 2007

Hitting by Position, 2007

This is a good once a year, mail-it-in type of post. As always, remember that we are dealing with just one year of data here, so it is not particularly significant, and any surprising findings should be viewed in that light. Nonetheless, it is a topic that I am always interested in and have fun looking at.

The data came from the Baseball Direct Scoreboard, which gets its data from STATS. I entered it into the spreadsheet by hand, so I may have made a few errors, and those are my fault, not those of the Baseball Direct Scoreboard (I initially had Marlins shortstops down for 500+ walks, rather than the correct 53, and didn’t catch this until I looked at the position totals and shortstops came out as above average).

The first chart I have for you is the composite hitting by position. In addition to the standard positions, I like to look at 1B and DH together, and corner outfielders together. The “MLB” row is the MLB totals; they are not the same as the result you would get for all of the positions because of the way STATS compiles the data (pinch hitters don’t count, and there might be some other stuff). The “POS” row is the composite of the non-pitcher positions. “RC” is the basic version of ERP, as I did not include SB or CS data, and “PADJ” is the one-year offensive positional adjustment for each position, figured by dividing the position’s RG by the average position RG. “HPADJ” is the ten-year PADJ from 1992-2001 as a sort of baseline to compare against:

Again, I don’t want to read too much into the one-year data, but you can see that the range between positions is smaller than in the 1992-2001 period.
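As a rough sketch of how PADJ is figured: the RG values below are made up purely for illustration and are not the 2007 figures, and the function name is mine. I am also assuming the "average position RG" is passed in as a single baseline number:

```python
def padj(position_rg: float, baseline_rg: float) -> float:
    """Positional adjustment: a position's RG divided by the average position RG."""
    return position_rg / baseline_rg


# Hypothetical RG figures for illustration only, NOT the actual 2007 data.
example_rg = {"1B": 5.6, "SS": 4.4}
baseline = 5.0  # hypothetical average position RG

for position, rg in example_rg.items():
    print(position, round(padj(rg, baseline), 2))
```

A PADJ above 1 marks a position that hit better than the average position, and one below 1 marks a position that hit worse.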

Insert obligatory comments on pitcher hitting and inflammatory comments about the Neanderthal League here. Pitchers created runs at a whopping 8% clip compared to position players. The top group of pitchers in terms of RAA compared to an average pitcher was the Cardinals, who took this coveted title for the second year in a row with a .195/.223/.238 line, +8 runs, three runs better than the Diamondbacks, Mets, and Dodgers. The worst was the Nationals at .112/.135/.138, -7 runs, just edging out the -6 turned in by the Astros, Giants, and Reds. Toronto pitchers had fun during interleague play, turning in 8 singles and a double in 22 PA for an above-total MLB average 5.1 RG.

I thought it would be fun to run a chart this year of the worst hitting teams at each position, which I have not done before. A list of the best hitting teams at each position is boring, because the best players play almost all the time, and they usually play the same position. So it’s not at all interesting to report that the best hitting third base outfit was in the Bronx. However, the trailer list is a little more interesting, since bad players don’t usually take all 600 PAs themselves. So here are the worst at each position (no park or league adjustments here, BTW). The “RAA” column is against the average RG for the position in 2007:

Teams managed to overcome one bad position, as Cleveland and Arizona made the playoffs (although Arizona’s overall offense was poor; Cleveland was a bit above average). However, I can’t recommend being like the White Sox and having black holes at two positions.

A junk final thing I like to look at is the correlation, on a team level, between the long-term PADJ and the positional RGs. A positive correlation indicates that the team got its biggest offensive contributions from the positions at the left end of the defensive spectrum, as you would expect; a negative correlation indicates the opposite. Here are the team correlations (pitchers are not included for anyone and DH not for the NL). “AVG” is the average of the team figures, while “MLB” is the correlation between PADJ and RG for each position in the majors, individually:

I will show the data for three teams: the Astros, who had the strongest positive correlation; the Orioles, who had the weakest correlation; and the Yankees, who had the strongest negative correlation. The chart shows the position’s RG, the position’s ARG against the overall team RG (for the positions considered, i.e. no pitchers or NL DHs), and the 1992-2001 PADJ for each position as a benchmark:



The Astros repeat this honor; they led last year at +.91. You can see that they are weak up the middle, but get production out of the corners. Their team offensive spectrum goes 1B, LF, CF, RF, 3B, 2B, C, SS. Only CF is really misplaced relative to the defensive spectrum.



The Orioles’ correlation of +.03 is the lowest absolute correlation of any team, and you can see that by glancing at the numbers--they are all over the map. The middle infielders are much better than the norm and their left fielders were the worst in baseball, but most of the other positions are fairly close to where one would expect.



The Yankees represent another repeat leader, as the correlation was the same -.34 in 2006. Six of their nine positions went the opposite way of what you would expect (i.e. you would expect first basemen to be above average; theirs were below average).
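The team correlations discussed above can be sketched as a plain Pearson correlation between the long-term PADJ and a team's RG at each position. I am assuming that is the correlation meant, and the numbers below are invented for illustration; they are not any real team's figures:

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical values for six positions, for illustration only.
long_term_padj = [1.15, 1.10, 1.05, 1.00, 0.95, 0.90]
team_rg = [6.0, 5.5, 5.4, 4.8, 4.9, 4.0]

print(round(pearson(long_term_padj, team_rg), 2))
```

A team whose best bats play the traditionally strong offensive positions lands near +1, while a team like the Yankees, with the usual ordering partly reversed, comes out negative.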

To end frivolously, I was impressed with the similarity between the production of the Cardinals’ center and right fielders. There may be better matches out there, but this one just happened to catch my eye. CF had 637 AB, RF 638. They each rapped out 170 hits, but RF had a little more power, winning in doubles (34-30) and homers (20-19). Triples went to center fielders 3-2, and the walk column was in their favor 56-49. Adding it all up, CF made 464 outs, RF 465. The center fielders created 87 runs and right fielders 86, giving them a 4.71 to 4.66 RG edge.

Raw data (stats by position for each team)

Monday, December 03, 2007

The Classes of 2003 and 2008

I felt that it was a very nice little coincidence this week when the two blockbuster trades both involved a young outfielder who first made a name for himself as a big high school prospect in the 2003 amateur draft. It is especially noteworthy to me, as I myself am a member of the class of 2003. Now of course I am not and never have been a major league prospect, but it is not hard to feel a bit of a connection to the players in the exact same age range as you, who are reaching adulthood at the same time as you. In basketball we had LeBron James to bear our standard, and in baseball, Delmon Young and Lastings Milledge were two of our very best prospects.

On the trades themselves, my opinion is not particularly interesting, since I have no special insight on the players and so many other voices have already weighed in. I think that the Rays-Twins trade was just a great baseball trade, one that will be fun to watch over the years to see who emerges on top. Were I running things in Tampa Bay, I would have found it very difficult to trade away a talent like Delmon Young, and I’m not usually one impressed by toolsy players with pitiful walk rates. If I had to guess, I’d say Matt Garza is able to provide more early value, but eventually the Twins win the deal. It’s a great trade, though, because eminently reasonable people can view it any number of ways.

The same will not be said of the Milledge deal. There is very little that can be said in defense of it, it seems; most comments that are not made by people with jaws on the floor tend to stress that Milledge is not that good of a prospect, and that the trade is not an all-time debacle. If that is the best that can be said for a deal, then it almost certainly shouldn’t have been made.

I have also seen the point raised that critics of the trade are overstating not only Milledge’s prospects but his trade value, and that Minaya obviously found that the value was only Schneider and Church. I have two problems with this argument, the first of which is simply that because Minaya felt that Schneider and Church was the most valuable package he could receive does not make it so.

The second is that the argument treats the relationship between a ballplayer and his general manager in the same manner as the relationship between a gallon of milk and a store manager. Even if we suppose that the Nationals’ package represents the extent of Milledge’s value, he’s not a commodity with an expiration date. He doesn’t need to be cashed in.

If I may make an even more ridiculous analogy, the relationship between the GM and the player is more like the relationship between me and my car. I have a car, and it has a certain resale value; which, in the case of my lovely early 1990s automobile, is not a whole heckuva lot, but someone would take it at some price. If I sold the car, the best I might be able to get for it is $1,000. So if I do sell it for $1,000, have I made a good deal? After all, I got fair market value for it.

My answer is: depends. I was under no obligation to sell the car, so we have to consider other factors, like how much it will cost to replace my car. A better example is probably the stock market. If I sell a stock for $15, I have by definition received fair market value--there's not even a question about it. But if you were an investment advisor, would you just tell your clients to feel free to buy and sell stocks willy-nilly, just because by definition any stock transaction is a fair deal?

My point is that just because Milledge has X trade value doesn’t justify a decision to trade him for X. This is a trade for which it is very hard to see the upside for the Mets. If Brian Schneider and Ryan Church are going to outperform whoever the Mets’ other alternatives for their roster spots would have been, and doing so will make a significant impact on their fortunes, then I would submit that it is going to be a rough year in Queens.

Moving on, there is the Class of 2008. The potential Hall of Fame class, that is. As I have written before, I don’t really care about who goes into the HOF, because I don’t believe that the HOF has any capacity to honor the truly great players anymore (and “anymore” is not a new condition; the situation dates back to the 1970s at least). I care a little bit, to about the same extent as I care about who wins NBA games. If Bert Blyleven is finally elected, it will still not be an honor to tell him that he is in the same class as Rube Marquard. As far as I am concerned, they can only dishonor him by waiting a dozen years before considering him worthy of standing aside Marquard.

If it was just Marquard, that would be one thing. But it’s not--it's Pop Haines, and Catfish Hunter, and Bob Lemon, and Chief Bender, and Dizzy Dean, and Jack Chesbro, and Lefty Gomez. None of whom should flatter Blyleven, or Tommy John for that matter, as company.

So I try to stay out of the HOF debates; while I like the “who was better than who” exercise as much as any baseball fan, it’s a lot more interesting to make your own lists, or follow along with something like the Hall of Merit or to just argue about players on a message board. So I’m not going to write an essay begging and pleading for the induction of Alan Trammell, Bert Blyleven, Tommy John, Goose Gossage, Mark McGwire, and Tim Raines, or bemoaning the fact that the BBWAA voters didn’t even give Lou Whitaker a second chance on the ballot. I’m going to write a sentence that does that, and move on with my life. Now excuse me; the Bobcats might be playing the Clippers right now.