Saturday, July 23, 2011

Saying Nothing About ERA Estimators

If you follow the sabermetric blog/Twittersphere at all (and if you don’t, why on earth are you wasting time here?), I’m sure you can figure out what prompted this post. However, I’m not going to name the metric that has generated discussion about this general topic because this post is not meant to be targeted at anyone, or to be a debunking of a particular metric, or anything other than me expressing my opinion about the construction of ERA estimators. Others have different philosophies and they are welcome to them. This is mine:

First, I find it helpful to classify the inputs and construction of each metric. This is not necessary, but the reason I find it helpful is that the ERA estimators out there are relatively diverse. Compared to the sabermetric metrics that exist for evaluating offense, they are extremely diverse. Almost all offensive rates are built around an estimate of runs created and divided by either outs or plate appearances. Almost all of them start with the traditional results-based batting line.

ERA estimators, on the other hand, are all over the place. Some follow the lead of their batting cousins and use a run estimator as their base, but some are regression-based. Some use actual results, while some use batted ball data. Some use batted ball data but decide to combine the four standard categories (flyballs, line drives, groundballs, and popups) in some manner. Some assume that the pitcher has no control over what happens once the ball is put into play. Some have implicit or explicit regression built-in with regard to balls in play. Some limit themselves only to what happens when the ball is not put into play. Some estimate ERA, and some estimate total runs allowed.

You probably don’t personally need more than one overall batting metric. That doesn’t mean there shouldn’t be diversity across the sabermetric community--there's nothing wrong with having a number of intelligently designed choices, but as an individual you don’t need both wOBA and True Average--one will suffice. That is not necessarily the case with ERA estimators--sometimes you might be interested in one that is results-based, sometimes you might be interested in DIPS, sometimes you might want to venture out into the uncertain world of batted ball metrics…even when using a common construction (BsR or LW for example), there is arguably a place for two or three or more different variations based on the inputs.

I believe that the most logical place to start with an ERA estimator is estimating runs. That is intentionally written to sound a little silly but it is not a philosophy shared by all developers of these metrics. Some put formulas down on the page that they would never consider using to try to estimate how many runs a team would score. I say that the place to start is with a logical run estimator. Given the team-level nature of the task, that suggests to me the use of Base Runs or another dynamic estimator, but I’m not going to argue too strenuously if you start with linear weights.

This is a path which is not necessarily going to minimize your RMSE, or give the best correlation with future ERA. With respect to the latter, if your goal is to provide the best possible estimate of future ERA, your metric is not attempting to measure how well the pitcher actually performed, it’s trying to forecast how well he will perform in the future. Certain constructions will by their nature be less accurate at estimating ERA in the same period. Every step you take down the path from outcome inputs (hits, walks, home runs, etc.) to component-based inputs (ignoring the actual outcomes of balls in play, or looking at batted ball types, etc.) will cost you accuracy when the standard is same period ERA. However, one can still use accuracy at predicting same period ERA for methods of similar classes.

Beginning the construction of the metric with a model of run scoring avoids some of the problems inherent in using actual pitcher runs allowed. I’m going to gloss over the fact that the number of runs a pitcher allows, regardless of whether it’s from a base period or a future period, is always dependent upon his defense and other factors outside of his control. There are still other concerns that do not apply when looking at true team-level data. The way runs are charged to individual pitchers is biased towards pitchers who inherit baserunners at the expense of those who bequeath baserunners. In practice, that means favoring relievers at the expense of starters, although depending on the performance of the relievers who inherit baserunners, individual bequeathers might actually benefit.

Thus, whenever a reliever ERA advantage is detected, some of it is attributable to the way runs are assigned and not to the actual effectiveness of the pitcher. It might even be possible to increase the accuracy of a metric by giving a bonus to relievers. It is entirely unclear to me what benefit this provides other than lowering RMSE. It doesn’t tell you anything about how well the pitchers performed, and it certainly doesn’t help you measure “true talent” any better--if that is the objective, an adjustment in the opposite direction could be warranted.

Another advantage of modeling runs is that you can easily move between RA and ERA. Most sabermetricians prefer RA because of the biases present in ERA and the distortions created by reconstructing imaginary innings sans errors. It’s easy to rescale from RA to ERA by multiplying by a constant like .91. While it’s also easy to divide by .91 to go the other way, if the metric has been tailored to match ERA, you’ve baked the biases of ERA into your metric. This could potentially be most problematic for a regression-based estimator that uses batted ball data. Even if this bias is small, it’s still completely unnecessary.
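The rescaling described above is a single multiplication or division. A minimal sketch, assuming the .91 constant quoted in the text (the function and variable names are just illustrative):

```python
# Ratio of ERA to RA; .91 is the example constant from the text,
# not a value fitted to any particular season.
ERA_PER_RA = 0.91


def ra_to_era(ra, factor=ERA_PER_RA):
    """Rescale a run average (all runs per nine innings) to the ERA scale."""
    return ra * factor


def era_to_ra(era, factor=ERA_PER_RA):
    """Rescale an ERA-scale figure back to the RA scale."""
    return era / factor


if __name__ == "__main__":
    estimated_ra = 4.00  # hypothetical output of a run-based estimator
    print(f"RA {estimated_ra:.2f} is ERA {ra_to_era(estimated_ra):.2f}")
```

The arithmetic works in both directions, but dividing by .91 only changes the scale of a metric that was fitted to ERA; it does nothing to remove whatever ERA-specific biases were absorbed in the fitting.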

Finally, the issue of dynamism is one that is often misunderstood with respect to ERA estimators. SIERA trumpets its “interactive” nature in its name (which does distinguish it from FIP and other linear methods) but any metric based on the foundation of a dynamic run estimator is by nature interactive. Instead of the interactivity being limited to target categories, though, every event interacts with every other event. Singles interact with triples, walks interact with home runs, doubles interact with triples, home runs interact with outs, outs interact with…you get the idea (and I think that’s enough talk of events interacting with themselves).
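To make the interaction concrete, here is a rough sketch using one commonly circulated simple version of Base Runs. The coefficients are an assumption for illustration (the text does not specify a formula), and both pitcher lines are hypothetical. The point is only that the marginal value of a home run depends on how many runners the other events put on base:

```python
def base_runs(ab, h, tb, hr, bb):
    """Estimate runs with a simple Base Runs form: A*B/(B+C) + D.

    The B coefficients below are one widely used simple version,
    assumed here for illustration only.
    """
    a = h + bb - hr                                       # baserunners
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02   # advancement
    c = ab - h                                            # outs
    d = hr                                                # automatic runs
    return a * b / (b + c) + d


def marginal_home_run(ab, h, tb, hr, bb):
    """Change in estimated runs from adding one home run to the line."""
    with_hr = base_runs(ab + 1, h + 1, tb + 4, hr + 1, bb)
    return with_hr - base_runs(ab, h, tb, hr, bb)


if __name__ == "__main__":
    # Two hypothetical pitchers who differ only in walks allowed.
    low_walk = dict(ab=750, h=190, tb=300, hr=20, bb=40)
    high_walk = dict(ab=750, h=190, tb=300, hr=20, bb=90)
    print(f"Extra HR, low-walk line:  {marginal_home_run(**low_walk):.3f}")
    print(f"Extra HR, high-walk line: {marginal_home_run(**high_walk):.3f}")
```

In a linear method the extra home run would be worth the same fixed amount in both lines; here it is worth more for the pitcher who allows more walks, with no explicit interaction term written anywhere.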

Building your metric around a run estimator does not necessarily restrict you to simply plugging in the numbers in the appropriate place. Suppose you wanted to construct a metric based on batted ball types, strikeouts, and walks. One way to go about it would be to simply go through and estimate singles, doubles, triples, homers, and outs in play based on the percentage of each batted ball type that wind up as each. So, you would end up with equations that might look something like this:

Singles = .057FB + .217GB + .516LD + .017PU

However, if you believe that you have gleaned some other insights into the relationship between events that could improve your metric (such as strikeout pitchers having lower HR/FB rates), you could still build that into your formula for estimated home runs, and plug those into the run estimator. It’s more difficult than running a regression, and a more delicate balancing act (at least in terms of developing the formula), but it allows you to stay grounded in a model that estimates runs by taking a first step of, well, estimating runs.
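As a rough illustration of that first step, the singles equation above can be applied to a pitcher's batted ball counts. Only the singles weights come from the text (and they are described there as what the equation "might look something like"); the batted ball counts are hypothetical, and doubles, triples, homers, and outs in play would each need their own equation:

```python
# Share of each batted ball type that becomes a single, taken from the
# example equation in the text (illustrative, not fitted here).
SINGLES_WEIGHTS = {"FB": 0.057, "GB": 0.217, "LD": 0.516, "PU": 0.017}


def estimated_singles(batted_balls, weights=SINGLES_WEIGHTS):
    """Estimate singles allowed from counts of each batted ball type."""
    return sum(weights[kind] * batted_balls.get(kind, 0) for kind in weights)


if __name__ == "__main__":
    # Hypothetical season: flyballs, groundballs, line drives, popups.
    pitcher = {"FB": 200, "GB": 250, "LD": 100, "PU": 30}
    print(f"Estimated singles: {estimated_singles(pitcher):.1f}")
```

The estimated singles, together with the other estimated events, would then be plugged into the run estimator in place of the actual totals.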

Again, I want to make it clear that I was attempting to explain where I’m coming from when I examine metrics of this type. There is room for legitimate philosophical differences and I’m not trying to state that sabermetricians who deviate from the way I’d do it are engaging in poor practice. It would certainly be possible to develop a lousy metric based on a run estimator and following some of the other suggestions.

Tuesday, July 19, 2011

Scoring Self-Indulgence, pt. 4: Reaching Base

Before I begin demonstrating how plays that result in a batter reaching base are scored, I need to make clear exactly how I divide the scorebox. It’s nothing special; the box is divided into quadrants, one for each base, with first base beginning in the lower right, and the trip around the bases is recorded counter-clockwise from that point (just as in the game itself). The areas for balls and strikes (as well as two-strike fouls, which are not included in the diagram) are ignored if not needed, as are the base quadrants--there are no actual lines in the scorebox, it’s just a way to organize the way the boxes are filled in:

I’ll begin with the on base events that are very simple to score--those where the ball is not put into play. My favorite baseball event, and the reason for the silly name of this blog:

If it happens to be an intentional walk, then I circle the walk symbol (this matches how the ball symbol is circled for an intentional ball):

The close cousin of the walk is the hit batter, which I record so:

Their distant and often overlooked cousin is catcher’s interference. I simply score that play as “INT”:

This is as good a time as any to discuss one of the minor tenets of my scoring philosophy. As you know, catcher’s interference is also scored as an error on the catcher, which I do not make note of on my sheet. I do not generally see the need to take up space repeating information that can be inferred from other markings. Interference is always an E2, and so interference suffices for me.

Now that stance can definitely be a pain in the butt if you want to go back through the scoresheet and count errors, and in doing so you might overlook the “INT”. But my concern is not in data compilation after the game. If it were, I would use a completely different system of scoring than this one.

I use one of the most common symbolic means of recording hits--the use of dashes in proportion to the number of bases the hit is worth. Thus, the base symbol for a single is a simple dash. I complicate matters a bit by including a hit location code and a symbol for trajectory after the hit; I won’t discuss those here just yet. Suffice it to say that the following is a flyball single to right field:

I use a slightly different symbol for an infield hit; it looks like a plus sign, but really it’s supposed to be the standard horizontal dash for a single with a vertical line of equal length running through it. I use this vertical line to modify the other hit symbols for special cases, as you’ll see below. This is an infield hit on a groundball in the vicinity of the second baseman:

Standard doubles feature two horizontal dashes. This one happens to be a flyball to right-center field:

A vertical dash through the standard double symbol indicates that it is a ground-rule double. This one came on a flyball to left field:

If the reason for the batter being awarded second base was fan interference, I draw a little flag at the top of the ground-rule double symbol, creating an “F” for fan interference; this example is on a flyball down the left field line:

You can figure out the symbol for triple; this one is on a flyball to center-left:

For the very rare occasion on which an automatic triple occurs, I’d draw a vertical line through the three horizontal lines, but that’s not even worth rendering. Moving on to home runs, they feature four horizontal lines. Since a home run means there won’t be any stops at the bases, the symbol is written large enough to take up the whole box. I denote a run scored by boxing the event that allows the runner to score in his box, so the home run is boxed. I also denote an RBI with an empty circle, so we’ll assume this is a two-run homer on a flyball to right:

A vertical line added to the home run symbol indicates an inside-the-park home run. This example is a solo inside-the-parker on a flyball to right field:

The other main way to reach base is on errors. My basic symbol for an error is the letter “E”, preceded by one of four letter codes: “F” for fielding, “T” for throwing, “C” for catching and “R” for receiving. The quadrant in which the error is recorded is the one for the base on which the batter-runner ends up. A fielding error by the first baseman that results in the batter-runner stopping at first is marked as:

A throwing error by the third baseman which allows the batter to reach second base:

If there is no indication to the contrary, you can assume that the throw was intended for first base. However, sometimes, a batter will reach on a throwing error when the intent of the fielder was to make a play on some other runner. In such a case, I use an arrow and the number of the base (2, 3, or H for home) that the fielder was trying to throw to. In this case a third baseman tried for a force at second, but threw the ball away instead. The batter may have reached anyway, and technically he is considered to have reached on a fielder’s choice, but this is a perfect example of the scoring legalese that I endeavor to avoid:

A catching error by the center fielder which allows the batter-runner to advance to third base:

A receiving error occurs when a fielder mishandles a throw from another. When allowing a batter to reach base, this almost always means that the fielder who made the throw is given credit for an assist. I note this by recording his position number first, then the position number of the player who (literally) dropped the ball. In this case, the shortstop gets credit for an assist and the first baseman is charged with an error:

In the rare event of a four-base error, it would be written across the scorebox and boxed in a similar fashion as the home run.

I will now look at the miscellaneous means of reaching base. One is a strikeout plus a wild pitch or passed ball. While such strikeouts are almost always swinging, it is possible to have a passed ball on a called strike three, in which case the K is backwards:

If a batter reaches on a fielder’s choice, I use the obvious code “FC”. Some people make scoring legalese distinctions between fielder’s choices and forceouts, but as you can guess by now, I don’t consider that necessary. It’s helpful to record the initial fielder, since that indicates where the ball was hit. This example is a fielder’s choice initiated by the shortstop:

If appropriate, a hit trajectory modifier (like bunt or chop) can be added to the fielder’s choice code above the fielder’s number.

A similar case is the rare double play that allows a batter to reach base. The most common type of this play is a failed attempt to turn a triple play on a groundball to third. If the third baseman tags the bag, throws to second for the force, and the batter still reaches at first, you could have:

Sacrifice hits and sacrifice flies can also occur in tandem with a runner reaching base, sometimes without an error in the case of a sacrifice hit. Suppose the pitcher makes an unsuccessful attempt at retiring the runner at second on a bunt attempt, but no error is charged and the scorer credits a sacrifice:

There could be an error as well; suppose that the catcher attempts to make a play on the lead runner and throws the ball into center field, with an error charged for allowing the batter to reach second and the runner to reach third, but the SH credited as well:

Finally, a batter might get credit for a sacrifice fly and reach safely when an outfielder fails to make a catch. Suppose that happened with the right fielder:

Sunday, July 17, 2011

Matt Angle, #51

The Orioles beat the Indians 8-3 today, but overall the game was great for me as Buckeye product Matt Angle made his major league debut, leading off and playing left field. Angle grounded out three times against Jeanmar Gomez, then drew a walk against Joe Smith. Angle is now the fifty-first OSU product to play in the majors, although the unconfirmed list I maintain is now up to sixty. Angle played at OSU from 2005-2007 and was a seventh round pick in '07 by the Orioles.

Unfortunately, Angle's upside is probably fifth outfielder. At OSU, he was a good center fielder and leadoff hitter, getting on base a ton but not hitting for much power. In the minors, he has been a similar type of player with a career .285/.372/.350 line. Angle can certainly help a team as a pinch-runner/defensive replacement, but he's yet to display a consistent ability to get on base at the highest levels of the minors (his AAA OBA is .336 in 765 PA).

Whether he has much of a career or not, he's in the encyclopedia forever now, and that's an awesome thing. Jack Shuck might be the next best hope for a Buckeye in the majors, but his AAA line is similar to Angle's without the speed (.267/.375/.321). Of course, in the Astros system...

Tuesday, July 12, 2011

Crude Team Ratings at the All-Star Break

Crude Team Ratings are a system I put together last year to adjust team records for strength of schedule. The resulting value is expressed on a scale where an average team gets 100, and the numbers themselves can be plugged directly into an odds ratio calculation. If a team with a rating of 120 plays a team with a rating of 90, they should win about 120/(120 + 90) = 57% of the time. Because of the way the ratings are calculated (explained in the linked article), a rating of 100 does not mean a .500 team--a .500 team will actually be a shade below 100 in a normal league.
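The odds ratio calculation above is short enough to write out directly. A minimal sketch (the function name is illustrative, and the ratings are the ones from the example in the text):

```python
def win_probability(rating_a, rating_b):
    """Expected W% for team A against team B from their ratings."""
    return rating_a / (rating_a + rating_b)


if __name__ == "__main__":
    # The example from the text: a 120 team against a 90 team.
    print(f"{win_probability(120, 90):.3f}")  # about 0.571
```

Two teams with equal ratings come out at .500 against each other, whatever the ratings are.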

The ratings are similar in theory to those published elsewhere, and so there’s nothing particularly unique or interesting about them. But I felt that the All-Star break was a logical point to stop and take a look at the ratings as they stand, both because interleague play is now complete and we can get a better read on the difference between the leagues, and because having some idea about strength of schedule to date (and in the future, although I haven’t figured that here) is helpful when handicapping the pennant races.

I will run through three sets of CTRs--one based on win/loss record (CTR), one based on R/RA (eCTR), and one based on RC/RC Allowed (pCTR). I prefer the latter two, especially at this point in the season, but you could also do a combination or factor in projections. I slapped the “crude” label on them in the name for a reason.

First, here are the CTRs based on actual win/loss record:

The league ratings (which are simply the average rating of the division or league’s members) show the AL with a much smaller advantage over the NL than in recent years. However, as you’ll see, the AL advantage grows as we move further away from actual record towards component record.

CTRs based on expected record (runs scored and allowed):

The top three teams remain the same for the three approaches, but each time a different club is ranked #1. Houston actually starts to look a little better as you go and only shares last place on the predicted list:

Boston’s component record has easily been the most impressive in MLB to date when adjusted for schedule. The Giants have played the weakest schedule by any measure and here only appear to be an average club. Pittsburgh has both exceeded its component record and benefitted from a weak schedule. Cleveland comes out better, as essentially an average team, and I was surprised to see that the Tribe has actually played a tough schedule.

Actual record gave the NL East the distinction of best division, but here the AL East returns to its customary position. The Centrals and the NL West are the weak divisions, the most notable element of which is that the NL Central is not alone at the bottom of the barrel as they were in 2010.

Finally, here is a freak show way of comparing the performances of teams so far in 2011 to what they did in 2010. The first column shows each team’s 2011 pCTR to date; the second column is their 2010 pCTR; and the third column is the implied winning percentage of the 2011 team against their 2010 predecessors. Of course, I’m comparing full seasons to (slightly more than) half seasons, not applying any regression, and presenting it as a W% is just a cute trick. (Taking the result seriously would also imply that an average team is equally good in 2010 and 2011):

The Pirates and Indians being near the top of the list won’t surprise anyone, but the Red Sox, while really good last year, have been great so far. The range (if not the standard deviation; I’m not going to bother) of theoretical this year v. last year W% is pretty close to what you’d expect for a range of team W% in the current season.