Walk Like a Sabermetrician: July 2020

Tuesday, July 28, 2020

Chutzpah

Step 1: Advocate tirelessly for baseball to be shut down, along with anything else that your political masters deem “non-essential” (all while being completely oblivious to how totalitarian this all is, as you not only no longer claim to be guided by liberal values, ideals, or principles, you and the political movement you follow have completely lost the ability to even think in terms of liberal values, ideals, or principles).

Step 2: Baseball (and other “non-essential” means of voluntary economic cooperation between individuals that provide the livelihoods for the people who buy your subscriptions and advertise on your website) gets shut down.

Step 3: Shockingly, your revenues from subscriptions and advertisements decline.

Step 4: Ask me for money so that you can cover the activity that you tirelessly advocated to be shut down.

Hard pass. I’d wish you good luck, but I wouldn’t mean it.

Thursday, July 23, 2020

2020 Predictions

I really should eschew doing predictions this year – the whole point of an exercise like this (other than fun, which is the main point) is to predict what will happen over a reasonably large sample. I don’t predict the outcomes of playoff series in the same manner, because I contend they are inherently unpredictable as binary outcomes with any level of accuracy that makes it worthwhile. The practice of making rank-order predictions is already a simplification of the reality of what is actually being predicted, and when applied to a season that is slated to be less than 40% the normal length, it is a foolhardy exercise indeed. Add on extra uncertainty due to player availability variability beyond the normal injuries, an extended gap since the last time we actually had the opportunity to observe players’ talent levels on the field, a severely unbalanced schedule, etc. ad nauseam, and there’s no good reason to do it.

Except that it’s fun, and I’ve been doing it in an unbroken chain since 1995, and if there's ever been a season in which to try to embrace the fun elements of baseball, this is it. So why not? I didn’t put a lot of my own effort into this – usually I use the Marcel or ZIPS or Steamer projections as a starting point, but make my own tweaks to both players’ performance and their likely playing time. Here I just used the team win estimates published by Baseball Prospectus, Fangraphs, and Clay Davenport as a starting point rather than building up from player-level performance. I also made some of my own judgment calls on team-level performance more aggressively than I normally would – it’s easier to disbelieve someone’s team-level prediction when you haven’t dug in at the player level yourself. I have not in any way, though, inserted randomness, for what I hope would be obvious reasons if you are reading this blog.

AL EAST

1. Tampa Bay
2. New York (wildcard)
3. Toronto
4. Boston
5. Baltimore

AL CENTRAL

1. Minnesota
2. Cleveland (wildcard)
3. Chicago
4. Kansas City
5. Detroit

AL WEST

1. Oakland
2. Houston
3. Los Angeles
4. Texas
5. Seattle

NL EAST

1. New York
2. Atlanta (wildcard)
3. Washington
4. Philadelphia
5. Miami

NL CENTRAL

1. Chicago
2. Cincinnati (wildcard)
3. Milwaukee
4. St. Louis
5. Pittsburgh

NL WEST

1. Los Angeles
2. Arizona
3. San Diego
4. Colorado
5. San Francisco

WORLD SERIES

Los Angeles over Tampa Bay

Wednesday, July 15, 2020

Hallmarks of Quality Metrics

Another old post I never published, probably because it was repetitive of sentiments I'd written before. I'm guessing I must have encountered a metric that really annoyed me and this was written as a responsive missive.

In a previous article I discussed some of the shortcomings of OPS as an advanced metric, which naturally leads to the question: “What are the characteristics of good advanced metrics?” While the relative importance that one places on each criterion is up for debate (the list that follows is in no particular order), the following considerations should be relatively non-controversial. I’ve used the term “metric” to refer to any statistic or derived category, which is not precise terminology:

1. Clear purpose

Before one designs a metric or uses it to answer a question, it’s imperative that the question of interest be defined. What is the metric setting out to measure? Most metrics in use, even those that are not in favor with sabermetricians, do fairly well on this score. Counting statistics, regardless of their ultimate utility, are largely clear in terms of definition and meaning. Some, like hits or strikeouts, are inherently obvious. Those with more involved definitions often still have a clear purpose even if the execution of that idea is somewhat muddled (like errors).

2. Developed with a theory in mind

This criterion is closely related to a clear purpose, but takes it a step further by questioning the thought process that went into developing the metric. OPS doesn’t fail, as it is based on the reasonable notion that hitting can be broken down into the broad categories of getting runners on base (OBA) and advancing them (SLG). However, due to the somewhat arbitrary nature in which the two statistics are combined, OPS does not match up to metrics like wOBA and True Average which have as their basis a linear weight model of the run scoring process. Some proposed metrics fail spectacularly, though, as they simply combine statistical columns without any particular rhyme or reason. Thankfully, most of these fail to gain traction, but some fail to gain traction yet still have their own Wikipedia pages. Metrics of this type may appear to “work” as they will generally produce reasonable leader boards, but the same could be said for any haphazard combination of positive events and categories.

3. Accurate

A metric should result in an accurate estimate of whatever it is designed to measure. For instance, a metric that attempts to measure offensive productivity should have a strong correlation with team runs scored, as scoring runs is the prime objective for an offense. The best-performing models for estimating team runs scored tend to be based on either dynamic models of the run scoring process (such as David Smyth’s Base Runs) or linear weight models (pioneered by George Lindsey and Pete Palmer and now in wide use). Thus it stands to reason that metrics built on linear weights (such as wOBA) are a better tool to use when evaluating offensive production than alternatives that do not correlate as well with runs scored.

Sometimes, though, it is not easy to measure accuracy due to a lack of data to verify against or a desire to use the metric to address a similar but subtly distinct question. For example, metrics validated against team results are often used to measure individual performance, which leads to the next criterion.

4. Adaptable over a wide range of contexts

While there is nothing inherently wrong with a metric that is designed to work only under a limited set of conditions--so long as said metric is not stretched beyond its capabilities--it is better still to be confident that the metric will produce reasonable results for a broader set of questions.

Sometimes metrics work well over normal ranges of performance and thus provide reasonable answers for most questions. For example, the common rule of thumb that 10 runs = 1 win is quite accurate at predicting the win totals of major league teams from their runs scored and allowed. However, the actual relationship between runs and wins is not linear—it only appears to be linear because the conversion is calibrated over a narrow set of possible outcomes. When the model is applied to more extreme conditions (which in this case could be an average level of runs scored per game much different than major league norms or teams with very low or very high run differentials), the accuracy will suffer. A dynamic model of estimated winning percentage (such as Pythagenport) can maintain accuracy over a wider range of scenarios.
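The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration: the linear rule is written here as .500 plus run differential divided by ten runs per win per game, the Pythagenport exponent uses the commonly cited form 1.5 × log10(runs per game for both teams) + 0.45, and the two team lines are made-up inputs rather than real seasons.

```python
import math


def linear_win_pct(runs, runs_allowed, games, runs_per_win=10.0):
    """Rule of thumb: every `runs_per_win` runs of differential is one win."""
    return 0.5 + (runs - runs_allowed) / (runs_per_win * games)


def pythagenport_win_pct(runs, runs_allowed, games):
    """Pythagorean W% with an exponent that floats with the run environment.

    Assumes the commonly cited exponent 1.5 * log10(RPG) + 0.45, where RPG
    is total runs per game scored by both teams.
    """
    rpg = (runs + runs_allowed) / games
    exponent = 1.5 * math.log10(rpg) + 0.45
    scored = runs ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


# Hypothetical team lines: (runs scored, runs allowed, games).
typical = (750, 700, 162)    # ordinary major league run differential
extreme = (1500, 700, 162)   # far outside the range the rule was fit on

for label, team in (("typical", typical), ("extreme", extreme)):
    print(label,
          round(linear_win_pct(*team), 3),
          round(pythagenport_win_pct(*team), 3))
```

For the ordinary team the two estimates land within a couple of points of each other; for the extreme team the linear rule runs far ahead of the dynamic estimate, and with a large enough differential it would exceed 1.000, which no winning percentage can do.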

A related but slightly different issue occurs when some metrics that are designed for use with team data are applied to individuals. A classic example is Bill James’ original version(s) of Runs Created, which recognizes the dynamic relationship between getting runners on base and advancing them. When applied to an individual’s statistics, though, the implication is that the player is reaching base, then advancing himself around the bases, whereas he actually interacts with his teammates. The resulting distortion requires that caution be used when interpreting Runs Created estimates for individual players.
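A small sketch shows the self-interaction at work. It assumes the basic form of Runs Created, (H + BB) × TB / (AB + BB); the two stat lines are hypothetical, chosen so that one hitter mostly gets on base and the other mostly supplies power.

```python
def basic_runs_created(ab, h, bb, tb):
    """Basic Runs Created: (on-base events * total bases) / opportunities."""
    return (h + bb) * tb / (ab + bb)


# Hypothetical stat lines: (AB, H, BB, TB).
on_base_guy = (500, 150, 100, 180)   # reaches base often, little power
power_guy = (550, 140, 30, 300)      # lots of total bases, fewer times on base

rc_separate = basic_runs_created(*on_base_guy) + basic_runs_created(*power_guy)

# Pool the two lines as if they were one offense and apply the same formula.
pooled = tuple(a + b for a, b in zip(on_base_guy, power_guy))
rc_pooled = basic_runs_created(*pooled)

print(round(rc_separate, 1), round(rc_pooled, 1))
```

Evaluated separately, each hitter's on-base events are multiplied only by his own advancement. Pooled, the on-base hitter's baserunners are driven in by the power hitter's total bases, so the pooled estimate is higher than the sum of the individual ones. The formula is not additive across players, which is why individual Runs Created figures need careful interpretation.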

5. Expressed in meaningful units

Ideally, the metric should return a result that has a logical, interpretable baseball meaning. Metrics expressed in terms of runs and wins are ideal since the connection to the objective of the game is made clear, but there are any number of other expressions that can be meaningful. On Base Average, for instance, represents the percentage of plate appearances in which a batter reaches safely, which is easy to explain and easy to think about in terms of on-field implications.

In some rare instances, it is next to impossible to express a result in meaningful units and so a nebulous value must suffice. One example is Bill James’ Speed Score, which endeavors to estimate a player’s speed skill by taking into account a number of categories related to speed (such as stolen base attempt frequency, rate of triples per ball in play, defensive range, etc.) Since there is no single manifestation of speed on the field and no obvious units to capture baseball speed, James uses an abstract scale.

6. Not needlessly complex

It is certainly tempting to say that metrics should be simple, but in my opinion simplicity need not be a goal unto itself. What is important is that the metric not make things more complicated than they need to be.

However, describing complex processes sometimes necessitates the use of complex models. The key is to avoid complexity for its own sake and phony precision. The end use and user of the metric should also be considered: if a “quick and dirty” estimate will do, then a simple metric may suffice, but a more complex metric can be used when a true best estimate is needed.

7. Catchy Name

This final entry is somewhat tongue-in-cheek, as it is irrelevant to the quality of a metric, but there’s no denying that when it comes to mainstream acceptance, marketing matters. To bring things full circle, a good name succinctly references the intended purpose and use of the metric while providing a minimum amount of ammunition to those looking to mock the field. Whether any sabermetric measures score particularly well on this front will be left as a rhetorical question for the reader.

Wednesday, July 08, 2020

April 4, 1994 pt. 2

I’ve previously written about the Indians/Mariners opening day game of April 4, 1994 that made me a baseball fan. I won’t rehash all that in detail again, but I recently was able to watch a replay of this game for the first time. I’d never actually seen any of it before, except for highlights – in real time I listened to about the seventh inning forward on the radio.

For the rewatch, I kept a scoresheet, which is reproduced at the bottom of the post. A few observations:

* Chris Berman and Buck Martinez called the game on ESPN. Berman was not as terrible as I remember him being, but most of my exposure was later. That is not to say that he was good. Martinez is a middling announcer with a terrible voice, and was in 1994 as well. It would be a real treat to be able to watch this game with the local radio call of Herb Score and Tom Hamilton that I would have enjoyed in place of the national guys.

* Randy Johnson had a no-hitter through seven, which was noteworthy for reasons beyond the obvious. As all Indians fans know, Bob Feller is the only pitcher to throw an Opening Day no-hitter, and here was a threat to no-hit the Indians on Opening Day in their first game in their new park with Feller on hand. Plus Randy Johnson, while not yet the legend that he would be, was obviously a legitimate no-hit candidate. He’d already thrown one in 1990, and 1993 had been his breakout year, in which he finished second in the Cy Young voting and recorded his third straight season with over 10 K/9.

So not having seen the game and filling in the details in my mind given what I knew about the Big Unit later, I assumed that he had spent the first seven innings carving Cleveland up. But that was not the case at all; it more resembled what you would have expected a Greg Hibbard no-hit bid to look like. Through seven, Johnson had walked four and fanned two on 94 pitches. His twenty-one outs were distributed as:

12 on groundouts (including 2 DPs)
5 on flyouts
2 on strikeouts
1 popout
1 caught stealing

His opposite number, Dennis Martinez, was pitching a similar game from a DIPS perspective with one big exception – the two-out solo shot he yielded to Eric Anthony in the third. Otherwise, through seven Martinez had struck out four, walked four, and hit Edgar Martinez in the first inning (providing an early injury scare as Mike Blowers pinch-ran, all this after Edgar had appeared in just 42 games in 1993; he’d appear in only three more games the rest of April).

* Two future stars were languishing down in the Indians lineup – Manny Ramirez batting eighth, and Jim Thome on the bench. It would be some time before Thome was trusted to start against left-handed pitchers, and so Mark Lewis was the ninth-place hitter and third baseman. Ramirez provided a Manny being Manny moment. After Candy Maldonado walked to open the eighth and Sandy Alomar singled to break up the no-no, Manny clanged a 1-0 Johnson offering off the big wall in left for a game-tying double. With Mark Lewis looking to advance the go-ahead run to third base, Ramirez strayed too far off second and was picked off by a Dan Wilson throwback to second on the first pitch.

Ramirez and Thome were never in the game simultaneously; with the Indians down a run with one out in the tenth, Ramirez drew a walk and was replaced by pinch-runner Wayne Kirby. It was then that Thome batted for Lewis, which brought on lefty reliever King for Seattle. Thome pulled a double down the right field line to put runners at second and third, and Kirby would later score when Vizquel hit into a fielder’s choice. It would work out in the end too, as Kirby walked it off in the eleventh with a two-out, line drive single to left to score Eddie Murray with the winning run.

* Despite what I’m about to say below, this was a good game for star power as these two teams would emerge as top AL contenders of the latter half of the nineties: Hall of Famers Randy Johnson, Ken Griffey, Edgar Martinez, Eddie Murray, Jim Thome, future Hall of Famer Omar Vizquel, would have been Hall of Famer Manny Ramirez, should be Hall of Famer Kenny Lofton, could have been Hall of Famer Albert Belle, a former Rookie of the Year in Sandy Alomar, and other memorable names including Jay Buhner, Carlos Baerga, Tino Martinez, and Jose Mesa.

* One thing that struck me in re-watching it is what an ordinary game it was. Granted, given the circumstances (opening day and opening game of a new park) it was extremely memorable for Indians fans, but if you strip all that out and just evaluate it as a game, it wouldn’t be the most exciting of most major league teams’ seasons. I have personally attended at least six Indians games in the last four seasons that were more compelling, and I’ve only been to about sixty games in that time and I’m making that list from the top of my head. I had built it up in my head as a kind of epic, and in some senses it disappointed upon rewatch.

On the other hand, that disappointment reminded me of what a great game baseball is. I have now watched twenty-six seasons of major league baseball and perhaps become jaded about just how interesting and exciting baseball inherently is. That this game wouldn’t rank in the top 10% of games I’ve attended recently speaks to what an amazing game baseball is. Since this game was sufficient to almost instantly turn me into a baseball nut, I suspect that a much less exciting contest would have done the trick. And it should have...I’m repeating myself again, and as I write this we are still two and a half weeks from even the possibility of baseball in 2020, and that too reminds me that baseball is just the best in every way.

* More generally on the franchise that I yoked myself to on April 4, 1994, I have no comment on the fact that the Indians will likely soon be changing their name itself. I do have two strains of thought on possible future names:

1. I think “Expos” is the logical choice, which is a snarky way of saying that my suspicion is that this name will be changing again in the relatively near future as the franchise settles into its new home in Montreal, Nashville, Portland, Las Vegas, Charlotte, etc.

2. “Spiders” is a dreadful option. First of all, as a general philosophy, I believe that baseball team names should be non-threatening. Most baseball team names are – I would contend that the only exceptions among the sixteen teams dating to 1901 or earlier are Pirates and Tigers, depending on what you think (very carefully) about Braves and Indians. Cubs are not an animal I would wish to encounter, but the name suggests cute and cuddly teddy bears rather than miniature grizzlies. Among expansion team names, the only one that I would classify as even mildly threatening is Rangers, and I would suspect the desired effect is strength and honor rather than menace.

The exception is the 1998 expansion. The Devil Rays and the Diamondbacks both sound threatening, although the former is actually generally harmless (to humans at least, and I think that’s all we should consider lest all the bird names become threatening) and was later downgraded to the double meaning “Rays” anyway. The latter is a scary animal, but is also in my opinion a contender for best expansion team name, due to the baseball tie-in (my other contenders for best expansion team name would be Brewers (although that was recycled), Colt .45s/Astros, and Pilots/Mariners. The last of these was a case in which the city had a great name and then got a similar yet superior one eight years later).

So I would contend that Spiders is contrary to the spirit of baseball nicknames. The history of the name is also quite problematic (although quite appropriate if my misgivings about the future of the franchise are founded). The original Spiders represented Cleveland in the National League from 1887-1899, never winning a pennant. In the early 1890s they were a strong outfit, finishing second three times and even capturing a Temple Cup (which I do not in any way deem to be comparable to a regular season pennant) with names like Cy Young and Jesse Burkett, but were soon a victim of the systemic corruption of the 1890s NL, with owner Stanley Robison siphoning off talent for the St. Louis now-Cardinals in which he also had a stake. As you probably know, this culminated in the 20-134 debacle of 1899 before the team joined Detroit, Louisville, and Washington on the chopping block, leaving Cleveland open for Ban Johnson’s play at major status for the American League two years later. I would contend that this is quite an ignominious history and nothing to be celebrated or emulated.

If Cleveland’s major league history must be the first source of inspiration, the Indians’ prior unofficial names won’t cut it: Blues is boring, Cleveland isn’t supporting a team called the Broncos, Naps would be fine with me but doesn’t sell and the headlines write themselves. The Players League outfit was referred to as the Infants. The Negro Leagues don’t provide much in the way of an option, as Cleveland’s proudest entry was the Buckeyes, a name of which THE sports team of only one entity is worthy.

I do think there is one Cleveland major league name that would work – the first, the Forest City club which represented the city in the National Association during 1871-72. This team actually participated in the NA’s first league game on May 4, 1871. Maybe you’d have to rework it to Foresters (or even Sawyers), but it’s a name I could get behind.

Best non-historical choice, although semi-violating my own suggested rule about nice namesakes: Buzzards.

Wednesday, July 01, 2020

"Replacement Level" Managers

This is an old post that I never published. It's not good, as it just presents something of a freak show stat, but I was mildly interested by it when I re-read it so maybe someone out there will be as well. All of the facts/figures are through 2009 and I did not update them at all. I did note one factual error which is also not corrected - Billy Southworth was inducted into the HOF in 2008.

I put quotes around "replacement level" in the title because this article is not really about establishing a replacement level for managers in the same sense as the phrase would imply when discussing players. It is rather about establishing a baseline for crude comparisons of managerial records, in the same vein as WAR--but without any claim that the baseline represents the point at which talent is freely available.

After all, it's folly to hold up a manager's W-L record as the sole evidence of his quality as a manager. Even the most ardent believers in the importance of managers to a team's record cannot possibly believe that they can separate the manager's contribution from all of the other noise that goes into a team's record.

If you want a crude method to compare managerial W-L records, there are few options that come to mind. Conventional approaches would include just looking at total wins, winning percentage, and games over .500, just as one might do with pitcher W-L records.

Of course, my own initial thought as a sabermetrician is to turn to a baseline that values longevity to some extent. If a manager is allowed to direct 3,942 major league games, yet has a sub-.500 record, it would be silly to assign him a negative number and move on (Gene Mauch). Managers are obviously employable even with losing records, and there are many factors well outside the manager's control that contribute to a team's record.

So my natural inclination is to look at a manager's wins above replacement, which inevitably leads to a decision about how to define managerial replacement level. There are a lot of ways to estimate replacement level for players, but one of the simplest is to look at the aggregate performance of players given very little playing time. The analogous solution would be to look at managerial records for those managers that were replacements, managing less than a full season of games.

When using this approach for players, one must be careful to consider the selective sampling issues involved--players that fail in an initial trial are less likely to receive future playing time, even though it is possible that their true talent is greater (the opposite is also true to some extent). The same is also likely true to some extent for managers--managers whose teams do not perform well in an initial interim role are not as likely to be retained. However, since my application here is just establishing a rough baseline to use for ultimately unimportant comparisons of managerial records, I am simply going to proceed as if these concerns are irrelevant.

The goal is not to devise a rating system for managers; it is to find a crude baseline to use for comparing un-contextualized managerial records. The freak show nature of the exercise is evident, and hopefully will serve to excuse my playing fast and loose with proper research procedure.

What I did was look at career records for all managers with fewer than 154 games managed from 1901-2009. (I then removed from the list managers who served full-season stints in seasons with fewer than 154 games, as well as Cubs managers from the early 60s who were part of the College of Coaches experiment, and Stanley Robison and Ted Turner, who owned their teams and weren't real managers.) This is my group of "replacement-level" managers. There are 109 such managers, serving in a total of 135 different team-seasons. Their career totals of games managed range from one (ten managers, with either Rudy York or Eddie Yost as the biggest name) to 149 (Tom Runnells with the 1991-92 Expos).

Overall, they managed 5530 games (an average of 41 games per team-season), going 2322-3208 for a .420 W%. So that will be my baseline for managerial records--.420.

By using .420 as a baseline, I don't mean to imply that it is a replacement-level in the traditional sense. It is quite possible that interim managers generally don't keep their jobs if they don't manage at least a .420 W%, but I don't mean to imply that replacement managers are ".420 managers".

If one was to attempt to measure a manager's replacement level in terms of actual effect on a team attributable to the skipper, my intuition is that it would be close to .500. There are simply too many possible candidates for managerial positions for me to think otherwise. Regardless, though, this "study" in no way indicates that the managers lowered .500 teams to .420.

The teams had a total aggregate record (with both the replacement and non-replacement managers) of 9334-11752, a .443 W%. This comparison does not take into account that the games managed by replacements ranged from one to over 140.

A crude way to compare team performance with and without the replacement level manager is to weight each team-season by the minimum of games managed by the replacement and other games. Using this approach, the weighted average of (W% with replacement manager - W% otherwise) is -.019.

Another crude approach is to weight by the harmonic mean of games managed by the replacement and others, rather than the minimum of the two. The weighted average difference is -.025 when using the harmonic mean. Those results should not be used to draw any conclusions, but without any regression or significance testing they imply that a replacement-level manager might lower a .500 team to .480 or .475, a difference in the range of four games a year. I am not claiming that is true, for the selective sampling reasons discussed previously among a myriad of other reasons.
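For anyone who wants to replicate the two weighting schemes, here is a minimal Python sketch. The team-season records in it are invented placeholders (the underlying data is not reproduced in this post), so it will not return the -.019 and -.025 figures above; it only shows the mechanics.

```python
def harmonic_mean(a, b):
    """Harmonic mean of two positive game counts."""
    return 2 * a * b / (a + b)


def weighted_w_pct_diff(team_seasons, weight_fn):
    """Weighted average of (W% under replacement - W% under everyone else).

    Each team-season is (rep_wins, rep_losses, other_wins, other_losses).
    `weight_fn` takes the two game counts and returns that season's weight.
    Assumes both the replacement and the other manager(s) managed at least
    one game, which holds for less-than-full-season stints.
    """
    numerator = 0.0
    denominator = 0.0
    for rep_w, rep_l, oth_w, oth_l in team_seasons:
        rep_g = rep_w + rep_l
        oth_g = oth_w + oth_l
        weight = weight_fn(rep_g, oth_g)
        numerator += weight * (rep_w / rep_g - oth_w / oth_g)
        denominator += weight
    return numerator / denominator


# Hypothetical team-seasons, for illustration only.
sample = [
    (20, 30, 50, 62),
    (2, 6, 70, 76),
    (40, 60, 25, 29),
]

diff_min = weighted_w_pct_diff(sample, min)
diff_harmonic = weighted_w_pct_diff(sample, harmonic_mean)
print(round(diff_min, 3), round(diff_harmonic, 3))
```

Both weights keep a one-game stint from counting as much as a 140-game stint. The minimum caps a season's weight at the shorter of the two samples, while the harmonic mean gives somewhat more credit when the longer sample is much longer.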

With that out of the way, I will present some data on managerial records above .420 for managers, 1901-2009. I'll call this Austin Rating in honor of Jimmy Austin, who is the only man to serve three such stints as manager (all with the Browns) without reaching 154 career games. Austin's player-manager career started with St. Louis in 1913, replacing George Stovall temporarily (2-6) before Branch Rickey took over permanently. He also did a stint in 1918 (7-9) in relief of Fielder Jones before Jimmy Burke stepped in. His final and longest experience at the helm was in 1923, when he was 22-29 replacing Lee Fohl. His career 31-44 mark (.413) is a little below the .420 baseline, so his own Austin Rating is -.5.
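The arithmetic is simple enough to sketch in Python. The formula below (wins minus .420 times games managed) is my reading of "records above .420" rather than something spelled out above, but it does reproduce Austin's own figure; the function and constant names are just for illustration.

```python
BASELINE_W_PCT = 0.420  # rounded aggregate W% of the replacement group


def austin_rating(wins, losses, baseline=BASELINE_W_PCT):
    """Wins above what a baseline-W% manager would post in the same games."""
    return wins - baseline * (wins + losses)


# The baseline itself: the replacement group's aggregate 2322-3208 record.
group_w_pct = 2322 / (2322 + 3208)

# Jimmy Austin's three stints with the Browns: 1913, 1918, 1923.
stints = [(2, 6), (7, 9), (22, 29)]
career_wins = sum(w for w, _ in stints)
career_losses = sum(l for _, l in stints)

print(round(group_w_pct, 3))
print(career_wins, career_losses,
      round(austin_rating(career_wins, career_losses), 1))
```

Austin's 31-44 record comes out to about half a win below the baseline, matching the -.5 above.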

Here are the top 25 career managers (again, through 2009):

There are sixteen Hall of Fame managers from this period; fourteen are in the top 25 for Austin Rating, with Whitey Herzog (270, 28th) and Wilbert Robinson (224, 34th) just missing the top 25. This is not offered as an indication that Austin Rating tracks HOF managerial choices or that it correctly identifies good managers, as any reasonable system based on career wins and losses would likely produce similar results for Hall of Fame skippers.

Going down the list, the non-Hall of Famers are either active or recently retired (Cox, LaRussa, Torre, Piniella) or in the Hall of Fame as a player (Clarke) until you get to Billy Southworth (Clark Griffith is also in the Hall, with a noteworthy career in the areas of playing, managing, and ownership). Southworth does not have wins in bulk (which seem to be the true indicator of HOF selection), but his .597 W% results in a very strong Austin Rating.

Here are the bottom ten managers:

Most of these guys served in the early part of the twentieth century, when competitive balance was less pronounced and multiple franchises had long walks in the wilderness. Prothro brings up the rear for managing three teams in the Phillies' dreadful pre-War stretch (1939-41). The only manager on the list who commanded over half of his games post-1950 was Roy Hartsfield, original skipper of the expansion Blue Jays. Extending the list down to 13th would include Alan Trammell, while Manny Acta ranks 18th lowest, but including 2010 would give him a slight bump as the Indians scraped over the .420 mark.

Finally, here is the leader in Austin Rating for each current team in their current city (except Washington which doesn't have much of a history; record with that franchise only):
