Walk Like a Sabermetrician: April 2012

Wednesday, April 18, 2012

Scoring Self-Indulgence, pt. 7: Field Locations and Ball Trajectories

When the ball is put into play, I track its location using a set of field location codes. What I use is not nearly as intricate as the system developed by Project Scoresheet and used or altered by STATS and subsequent data compilers. The entire exercise is inherently subject to error, and it is next to impossible to determine accurately whether a ball was hit within the borders of a zone when it is close. One’s perspective can also skew judgment even in cases where the location or trajectory appears to be clear. Still, I find it a useful thing to record, recognizing the inherent biases and the particular fallibility of my judgment.

For the infield, I do not break the zones up at all. Everything on the infield is 1, 2, 3, 4, 5, or 6, corresponding to the fielder closest to the play (although often in practice this winds up being the fielder that makes or attempts to make the play, regardless of where the ball was actually hit). I do use combinations for balls that are truly in between--64 would be a ball right up the middle, perhaps a bit to the left of second base, while 46 would be just to the right of second base; 13, 15, 23 and 25 (or the reverse) are often helpful for bunts that are in between the area of responsibility for the two positions. Other than that, the location codes stay simple.

If the ball is hit where you’d expect it to be hit given the outcome of the play as seen on the scoresheet, I make no redundant location note. A “63” does not need a note explaining that the ball was hit in the vicinity of the shortstop unless something out of the ordinary happened. If there is something that needs to be noted, I tack the location on as a subscript.

Suppose for instance that there is a popup on the pitcher’s mound; generally, the pitcher will move out of the way and let a real infielder take it. If it is the third baseman, the scoring would look like this:

In the outfield, I divide things up more extensively. In terms of depth, there is “S” for shallow, “D” for deep and “W” for warning track/wall. Medium depth is the default if none of the other options is listed, but could be indicated by “M” if one were so inclined.

From left to right, I have 7l (left field line; written with a cursive “l”, which makes it much easier to distinguish on the scoresheet), 7 (left), 78 (left-center), 87 (center-left), 8 (center), 89 (center-right), 98 (right-center), 9 (right), and 9l (right field line, again with the cursive). This diagram is poorly drawn, but should give you the gist of it:
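For anyone who keeps this kind of data electronically, the outfield vocabulary can be captured as a small lookup. This is a minimal sketch, not part of the paper system itself: the plain-text rendering (zone code followed by the depth letter, with a lowercase "l" standing in for the cursive line marker) and the function name are assumptions for illustration, since on the scoresheet the depth is written by hand as a modifier.

```python
# Outfield zones from left to right. A lowercase "l" stands in for the
# cursive "l" used on the scoresheet for the foul lines.
OUTFIELD_ZONES = ("7l", "7", "78", "87", "8", "89", "98", "9", "9l")

# Depth modifiers. Medium is the default and is normally left off.
DEPTHS = {"S": "shallow", "M": "medium", "D": "deep", "W": "warning track/wall"}


def outfield_location(zone: str, depth: str = "M", show_medium: bool = False) -> str:
    """Render a zone plus depth as plain text.

    The zone-then-depth ordering is an assumed convention for illustration.
    """
    if zone not in OUTFIELD_ZONES:
        raise ValueError(f"unknown outfield zone: {zone!r}")
    if depth not in DEPTHS:
        raise ValueError(f"unknown depth modifier: {depth!r}")
    # Omit the implied medium depth unless the scorer wants it spelled out.
    if depth == "M" and not show_medium:
        return zone
    return zone + depth


print(outfield_location("8"))        # 8
print(outfield_location("8", "S"))   # 8S
print(outfield_location("9l", "W"))  # 9lW
```

Keeping the zones in a tuple ordered left to right also preserves the fact that "78" and "87" are neighbors rather than two names for the same spot.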

The difference between center-right and right-center (or between center-left and left-center) is razor thin to begin with, and I’m positive that I’m biased by the identity of the fielder who makes the play. That is, I’m much more likely to record the location of a ball as “89” if the center fielder makes the play; if the right fielder makes the play, it’s more likely to go down as “98”.

Let me offer a few examples of flyouts scored using this system, which hopefully will be sufficient for anyone who actually cares (and I know full well that no one does) to understand. A flyball caught by the center fielder in medium-depth center would simply be “8”. But suppose he made the play in shallow center:

A catch made by the right fielder on the warning track along the right field line would be:

A catch made in deep center-left by the left fielder:

A foul fly caught by the left fielder in deep left:

Hopefully, those examples are sufficient to allow one to figure out the many possible combinations themselves. The other element that needs to be added before completing the scoring of hits is the ball trajectory. The biases here are even worse than those for hit location, but as long as you remember it’s just one observation from a particular vantage point, I don’t see what the harm is in taking a stab at this on your scoresheet.

For hits, I combine the location code with a trajectory code. The basic trajectory categories are: bunt (seen earlier with outs), chop (seen earlier, and only used on infield hits), flyball, groundball, and line drive. An infield hit is always assumed to be a groundball unless otherwise noted; an outfield hit is always assumed to be a flyball unless otherwise noted. Using default trajectory types helps to remove some of the clutter that otherwise would be present on the sheet.
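The default-trajectory rule amounts to "only write the trajectory when it differs from what the location implies." A minimal Python sketch of that rule; the function name and the string labels for trajectories are hypothetical choices for illustration.

```python
from typing import Optional

TRAJECTORIES = {"bunt", "chop", "flyball", "groundball", "line drive"}


def trajectory_to_record(infield_hit: bool, trajectory: str) -> Optional[str]:
    """Return the trajectory to write on the sheet, or None if it is the default."""
    if trajectory not in TRAJECTORIES:
        raise ValueError(f"unknown trajectory: {trajectory!r}")
    # The chop symbol is reserved for infield hits.
    if trajectory == "chop" and not infield_hit:
        raise ValueError("chop is only used on infield hits")
    # Infield hits default to groundballs; outfield hits default to flyballs.
    default = "groundball" if infield_hit else "flyball"
    return None if trajectory == default else trajectory


print(trajectory_to_record(True, "groundball"))  # None (nothing written)
print(trajectory_to_record(False, "line drive"))  # line drive
```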

The trajectory for a hit is generally noted above or below the field location of the hit. The symbol for a bunt is a squiggly line below; for a chop, a “v” shape below; for a flyball, an arch above; for a groundball, a straight line below; and for a line drive, a straight line above. This is best shown through some examples. First, examples of infield hits, where a groundball is assumed unless otherwise noted. Thus, the following play is a groundball single to the first baseman:
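Since groundballs and line drives share the same mark, it can help to see the symbols laid out as a table keyed on mark and placement. A small Python lookup transcribing the descriptions above; the names are illustrative only.

```python
# (mark, placement relative to the location code), per the descriptions above.
TRAJECTORY_MARKS = {
    "bunt": ("squiggly line", "below"),
    "chop": ("v shape", "below"),
    "flyball": ("arch", "above"),
    "groundball": ("straight line", "below"),
    "line drive": ("straight line", "above"),
}


def describe_mark(trajectory: str) -> str:
    mark, placement = TRAJECTORY_MARKS[trajectory]
    return f"{mark} {placement} the location code"


# Groundballs and line drives use the same mark, so placement is what
# distinguishes them.
assert TRAJECTORY_MARKS["groundball"][0] == TRAJECTORY_MARKS["line drive"][0]
print(describe_mark("line drive"))  # straight line above the location code
```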

The next example is a single on a ball chopped to third base. I use the chop symbol sparingly; if a ball is a tweener between a chop and a grounder, I go with the latter description:

This is a bunt single right in front of the plate to the catcher:

This is a line drive hit to the pitcher; picture the pitcher knocking the ball down but having no play (it’s better than picturing Herb Score or Willie Blair):

A flyball infield hit (remember, I do not distinguish between a flyball and a popup) can happen when the ball drops between fielders and no error is assigned; this one drops on the pitcher’s mound:

Scoring outfield hits is a little more involved because there is a wider variety of locations and a greater diversity of trajectories. Again, if there is no trajectory included for an outfield hit, it is assumed to be a flyball, so this is a flyball single to medium left, along/near the line:

Here is a flyball single to shallow center-right:

There is no need to use any depth modifiers for groundball hits to the outfield. This one is a classic grounder right up the middle:

A line drive single to left-center:

A flyball double to the warning track or wall down the left field line:

A line drive double to deep center:

A flyball triple to deep right-center:

I usually don’t use any trajectory codes on home runs--they are generally considered to be flyballs, but some real laser beams get a line drive tag. This one is hit down the right field line:

Of course, inside-the-park home runs provide more fodder for location codes. A flyball off the wall in deep center-left that ends up a homer:

Tuesday, April 10, 2012

Scoring Self-Indulgence, pt. 6: Outs on Base

I previously demonstrated how I score batting outs, but did not touch on outs that occur once there is a runner on base. Again, the examples will assume that the runner reached first by being hit by a pitch. I will begin by looking at plays in which the runner is thrown out in his attempt to advance before the ball is put into play, then look at those that occur after the ball has been put in play (which often are forces for which the runner is not culpable).

Caught stealing is given the obvious code “CS”. The most common is the catcher-to-shortstop putout, although there are obviously many possible combinations. This runner was nailed on the last pitch of a PA (indicating that the PA must have ended with a strikeout; otherwise the runner would have been free to advance on a free pass event or the ball would have been in play) taken by the #3 hitter:

The CS symbol can pop up when an out is not actually recorded; for example, "2CSRE6" is a play in which the catcher gets an assist and the runner is charged with a caught stealing, but is safe due to a receiving error by the shortstop.

I divide pickoffs into three different classes: those that are pure pickoffs (PO), those that are caught stealing/pickoffs on which the play is made at the subsequent base (CP), and those that are caught stealing/pickoffs on which the play is made at the original base (PC). The first is the most straightforward--the runner is picked off when not attempting to advance, in this case on the fourth pitch to the #7 hitter, with the play going 13:

Caught stealing/pickoffs are common when the runner breaks for the next base before the pitcher begins his motion, and the pitcher steps off and throws. This example is a 136 putout made on the first pitch to the #9 hitter:

I score it as a pickoff/caught stealing when the runner is charged with a caught stealing because he was leaning, but the play is made at the base from which he started. This should be pretty easy to visualize--the runner gets just a touch too far away from the bag, the pitcher whirls to throw, the runner dives back...and gets tagged out. On the 7th pitch to the #5 hitter:
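The three pickoff classes reduce to two yes/no questions: was the runner charged with a caught stealing, and was the play made at the next base or at the original one? A Python sketch of that decision, with the function and parameter names invented for illustration.

```python
def classify_pickoff(charged_with_cs: bool, play_at_next_base: bool) -> str:
    """Classify a pitcher-initiated out on the bases as PO, CP, or PC.

    charged_with_cs: the runner was going (or leaning) and is charged with a
        caught stealing.
    play_at_next_base: the putout is made at the base the runner was heading
        toward rather than the base he started from.
    """
    if not charged_with_cs:
        # Under the definitions above, a pure pickoff happens at the original base.
        if play_at_next_base:
            raise ValueError("a pure pickoff is made at the runner's original base")
        return "PO"
    # Caught stealing/pickoff: the code depends on where the play is made.
    return "CP" if play_at_next_base else "PC"


print(classify_pickoff(False, False))  # PO
print(classify_pickoff(True, True))    # CP
print(classify_pickoff(True, False))   # PC
```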

One extra note I sometimes include, which can be applicable to a number of different outs on base, is the use of the symbols OVS and OVR. OVS is for overslide, and OVR for overrun. This just gives the reader of the scoresheet a better idea of what actually happened on the play. If a runner is caught stealing, but it’s because he popped off the base after initially being safe, you can include the OVS symbol:

The most common ways of getting retired once the ball is put into play are on fielder’s choices and double plays. I do not score every play that technically is considered a fielder’s choice as such; I only use “FC” for a forceout. I’ll show an example of those cases in a minute; first, a standard fielder’s choice. This one goes pitcher to shortstop and forces the runner at second. There is no need to record which batter was responsible, because it will be evident from an examination of the scoresheet:

If the runner is retired as part of a double play when forced (or when doubled off his base on a flyout), I use “DP”. In the latter case, it is necessary to note which batter was the catalyst for the play in question, as I’ll demonstrate in a moment. In the first example, the runner is forced at second, third baseman to second baseman, as part of a double play:

Of course, there is always the possibility of a triple play, but those are so rare I won’t bother with an example, and the scoring is conceptually similar to that of a double play. I use the obvious symbol “TP” in those cases.

This example is a double play in which the runner is doubled off first, right fielder to first baseman, after a flyout hit by the #8 hitter:

When a runner chooses to advance of his own volition, I record it as a fielder’s choice provided that the batter does not receive credit for a hit on the play and the play is not an error or a flyout. Suppose that a runner at second is thrown out at third by the shortstop when he attempts to advance, unforced, on a groundball. Since the batter will not be credited with a hit, I record the runner’s out as “FC65”. In the corresponding batter’s scorebox, the play will be scored as “FC6”:

If the runner is thrown out attempting to take an extra base on a hit (or on an error), then I score it as an out advancing. It is not necessary to note which batter’s PA the out occurred on as long as there is some other evidence of it in the runner’s scorebox. In this example, the runner advanced on some play initiated by the #1 hitter, but then tried to take an extra base and was thrown out at third, center fielder to third baseman. Since the advancement to second is already noted as having occurred during the PA of the leadoff hitter, it is assumed that the out at third also occurred on that play.

If the advancement occurred on a play in which the runner did not take another base noted on the scoresheet, then there would have to be some note of it. In this case, the runner at third was thrown out at the plate by the left fielder while trying to advance on a flyball hit by the ninth hitter:
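The split between a fielder’s choice and an out advancing for a runner thrown out while advancing unforced can be sketched as a small decision function. This is a Python sketch under stated assumptions: the function name and the batter-result labels are hypothetical, "out advancing" is a placeholder label rather than the symbol actually written on the sheet, and crediting the batter with "FC" plus the first fielder generalizes the single "FC6" example above.

```python
from typing import Optional, Tuple

BATTER_RESULTS = {"hit", "error", "flyout", "out"}


def unforced_runner_out(batter_result: str, fielders: str) -> Tuple[str, Optional[str]]:
    """Return (runner's mark, batter's FC mark or None) for a runner thrown out
    while advancing unforced.

    batter_result: "hit", "error", "flyout", or "out" (e.g., a grounder on which
        no hit is credited). fielders: the throw sequence, such as "65".
    """
    if batter_result not in BATTER_RESULTS:
        raise ValueError(f"unknown batter result: {batter_result!r}")
    if batter_result in ("hit", "error", "flyout"):
        # The batter's box records his own hit, error, or flyout; the runner's
        # box records an out advancing (placeholder label, not the sheet symbol).
        return ("out advancing", None)
    # Otherwise a fielder's choice: the runner gets the full sequence and the
    # batter gets "FC" plus the fielder who handled the ball (assumed rule).
    return ("FC" + fielders, "FC" + fielders[0])


print(unforced_runner_out("out", "65"))  # ('FC65', 'FC6')
```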

When a batter is thrown out attempting to take an extra base after reaching safely on a hit or an error, I score it as “OS” (Out Stretching). In this example, the batter singled to right, then was gunned out right fielder to shortstop attempting to take second:

One unusual way in which a runner can be wiped out is when he is hit by a batted ball. For the batter, this automatically becomes a single. Let’s suppose that a runner on first is hit by a ground ball hit by the batter in the vicinity of the second baseman. First, the scoring for the batter: he gets credit for a groundball infield single to second base, with the fact that his ball hit a runner noted with the use of the code “HBB” (Hit by Ball):

For the runner, the out is noted as HBBx, where x is the position number of the fielder that gets credit for the putout (so, in this case, HBB4). There is no need to note which batter hit the ball because it will be clear from examining the other scoreboxes:

Another modifier along the lines of OVR/OVS is the use of OBL for “Out of Baseline” (not the world’s most wanted man). This code is used when the runner is ruled out for going out of the baseline. There will still be standard scoring for this play, so the notation is made parenthetically (I use brackets). Suppose the play is scored catcher to first base:

The above example was for a batter-runner, but it could also be used for runners already on base.

Other modifier codes that are rarely used but that can pop up are “LE” (left early--I use this if a runner is called out for tagging before the ball is caught, followed by the appropriate credit for the putout and a circled indication of which batter initiated the play) and “MB” (missed base, used when a runner is called out for failing to touch one of the bases, and again followed by the putout credit and indication of which batter was at the plate if necessary).

Tuesday, April 03, 2012

2012 Predictions

Every year I try to disavow these predictions as a serious exercise--this is just me as a baseball fan, having fun. Every year I stress that picking a team first in a rank order doesn’t mean that I necessarily think they have even a 30% chance to actually win their division. Every year I try to disassociate these predictions from sabermetrics--sure, my thinking on them is influenced by sabermetrics, as is everything else I think about baseball--but these are decidedly not “sabermetric” predictions, not even in the sense that the PECOTA or CAIRO or Davenport projected standings might be. But despite my efforts, every year some random yahoo on the internet links this post and demonstrates no understanding of any of these points.

So this year I will dispense with all of it, recognizing the lost cause for what it is, and get down to business--no, strike that, fun:

AL EAST

1. Boston
2. New York (wildcard)
3. Tampa Bay
4. Toronto
5. Baltimore

I’m sure the Yankees will be the consensus pick, but I’ll be a contrarian and stick with the Red Sox. Yes, their starting pitching is shaky, but at least the guys at the back of the rotation have some upside. Their offense is as good as anyone’s on paper. The Yankees are certainly a force to be reckoned with; however, I think one could very easily overstate the pitching difference between the two teams. Sabathia and Pineda is a good duo (oops!), but the rest of New York’s rotation is hardly rock solid with Kuroda’s age and Nova’s ordinariness. It’s an edge, but it isn’t overwhelming. Tampa Bay remains a threat, and the second wildcard certainly brightens their 2012 outlook, but their offense does not inspire confidence. If Toronto were in the Central or the NL West, I’d pick them second at worst, but such is the nature of the AL East. If things go right for them (Bautista maintains his level, Lawrie plays at a high level, they get some production out of first and left, rotation potential in Alvarez, Morrow, and/or Drabek steps forward), they could surprise. Baltimore...yeah.

AL CENTRAL

1. Detroit
2. Chicago
3. Cleveland
4. Minnesota
5. Kansas City

I would not pick Detroit to win any other division save the NL West, and even there I’d consider them vulnerable. But this is not any other division, and their foes appear incapable of mounting the mid-to-high 80s win total that could topple the Tigers. A lot of things went right for Detroit last year, and outside of Verlander the rotation isn’t special. Outside of Fielder and Cabrera (which is admittedly a big qualifier given the fact that they are both among the best hitters in the game), the offense doesn’t feature any proven high-end performers, so overall it projects as good, not great. Alas, no one in the division appears up for the challenge. Chicago has been written off as rebuilding, but there’s still a pretty good pitching staff there, and you can always dream on Dunn and Beckham coming to life and boosting the offense to contention level. Cleveland is a team that went from solid rebuilder to adrift without a plan in the span of about ten days, although admittedly some of that sentiment may be my overemotionalism as a fan. I’ve written about them more in depth, and while I’m not downgrading them over their spring training woes, no new information has come to light since I wrote that piece that has given me reason for optimism (in fact, the Sizemore injury, Chisenhall’s flailing, various minor pitching injuries, and a bullpen that looks shakier than I’d thought have had the opposite effect). I think it’s more likely the Tribe finishes last than first. I’m picking Minnesota ahead of Kansas City on the hope that Mauer and Morneau return to even 75% of their 2010 production, but the Royals certainly have the brighter looking future. Then again, if there’s going to be a team that comes out of nowhere in MLB this year, this is the division that offers the best opportunity.

AL WEST

1. Texas
2. Los Angeles (wildcard)
3. Seattle
4. Oakland

Texas is a terrific team, of course. They lack any huge stars (sorry, Josh Hamilton isn’t consistent enough for this label, and Ian Kinsler’s 2010 power outage makes me pause) but are solid everywhere except perhaps center field. They have enough minor league depth that they should be able to plug leaks as they emerge about as well as anyone in the league. Los Angeles got rid of Tony Reagins (a definite plus), then finally made the huge splash. And it was huge. They arguably have the league’s best pitching staff, shaky fifth starter notwithstanding, and should make this an interesting race. I’m not really sure why I picked Seattle over Oakland, but it shouldn’t matter--neither team has much of a chance. I’d guess that Oakland has a higher variance of expected wins.

NL EAST

1. Miami
2. Philadelphia (wildcard)
3. Atlanta
4. Washington
5. New York

My predictions make no claims to accuracy, but there are two divisions I have been consistently wrong about--the AL Central and the NL East. Assuming that there’s a cause for those mistakes other than chance, I’ve chalked up the former to the fact that I’m a Cleveland fan and tend to pick them when I think it’s defensible (which does not include 2012). In the NL East, the reason would be a tendency to predict the demise of the dynasty too early. I picked against the Braves consistently near the end of their run, then picked them over the Phillies in recent years.

I am not learning from past mistakes and am picking Miami to win their first division title. The Marlins are really easy for me to hate, with the uniforms and the home run fountain and Jeff Loria and Ozzie Guillen. The top three are all very close and so I am picking what would annoy me the most. I also think they are the most balanced between offense and defense, which doesn’t translate to wins but also means it’s harder to point out the Achilles heel. Do you like that segue? Ryan Howard is the least of the Phillies’ concerns, as Chase Utley is and always has been a more valuable player, and now a bigger loss to injury. The offense is old and was only average in 2011. The starting pitching is tremendous, but the bullpen is nothing special. Atlanta would be easier to like if they had a shortstop or another big bat, but I wouldn’t count out Jason Heyward in the latter role. They should be in the hunt. Washington still looks more like a .500 team than a contender to me, but they’re close enough that good fortune could put them in the playoffs. New York remains a mediocre team more than a bad one, but that won’t stop us from having to read the lamentations of Mets fans. I realize it’s tough to see the crosstown Yankees win consistently and the Dodgers escape ownership purgatory, but toughen up, guys.

NL CENTRAL

1. Cincinnati
2. St. Louis (wildcard)
3. Milwaukee
4. Chicago
5. Pittsburgh
6. Houston

The Reds stood pat after making the playoffs in 2010, which not surprisingly resulted in a step backwards. This year, they decided to go for it, trading for Mat Latos and signing Ryan Madson. The latter move has flopped through no fault of the team, but in this case, it really is the thought that counts. They have a capable offense and if they are willing to make hard choices (like sticking with Aroldis Chapman and relegating Bronson Arroyo to long reliever if need be), I think they can do it. In other words, I’m putting my division pick in the hands of Dusty Baker. Gulp. St. Louis lost Pujols, but signing Beltran is about as good of a response as one could expect, and I wasn’t penciling in Carpenter and Wainwright for more than 350 combined innings anyway. Milwaukee is obviously a weaker offense without Prince Fielder, but in this division they remain firmly in the contenders tier. I was (relatively) bullish on Chicago in 2011; that was a mistake, but mediocrity is good for the top of the second division in the NL Central. It must be really frustrating to be a Pirates fan; not for the obvious reasons, but for the little things. The team had a hot three and a half months last year which gave their fans a semblance of hope and fun, and they finally have a divisional rival that is much worse off than they are. So of course MLB strongarms that rival to move to the other league. The Astros’ new front office is easy to like, but would be more so if they hadn’t moved Brett Myers to the bullpen, a move that I don’t understand on any level.

NL WEST

1. San Francisco
2. Arizona
3. Colorado
4. Los Angeles
5. San Diego

Picking the Giants feels wrong, as I object to picking an organization that seems to view scoring runs with contempt. But this division isn’t very strong and the terrific pitching has overcome this punchless offense before. Arizona’s starting pitching has the potential for serious regression from Kennedy, Collmenter, or Cahill, and their offense, while solid, doesn’t seem to offer a lot of upside. Jamie Moyer is a great story and I wish him all the best (who wouldn’t love to see a legitimate 50-year-old non-knuckleball pitcher in 2013?), but his presence in the rotation really encapsulates what you need to know about Colorado. The Dodgers exceeded expectations last year and the ownership change should foster optimism for the future, but Ned Colletti’s bizarre offseason does the opposite for the immediate future. San Diego is not a horrible team, and the trade for Carlos Quentin indicated that Josh Byrnes may not be as committed to a rebuild as Jed Hoyer was.

WORLD SERIES

Boston over Miami

AL Rookie of the Year: SP Matt Moore, TB
AL Cy Young: David Price, TB
AL MVP: 1B Albert Pujols, LAA
NL Rookie of the Year: C Devin Mesoraco, CIN
NL Cy Young: Zack Greinke, MIL
NL MVP: 3B Hanley Ramirez, MIA

First manager fired: Jim Tracy, COL...just kidding, he’s manager for life (Dan O’Dowd’s life, at least). So, in a mercy firing, Brad Mills, HOU.
Best pennant race: NL East
Worst pennant race: AL Central
Worst team in each league: BAL, HOU
Most likely to go .500 in each league: CHA, WAS
Team in each league most likely to disappoint mainstream consensus: CLE, ARI
Team in each league most likely to surprise mainstream consensus: BOS, MIL

Monday, April 02, 2012

Great Moments in Yahoo! Standings

Sunday, April 01, 2012

Ubaldo and Tulo

In the first inning of today’s Cleveland/Colorado game, Ubaldo Jimenez hit Troy Tulowitzki with his first pitch. The two have had some sort of silly squabble in the press this spring, which you can read about elsewhere.

Jimenez claims that hitting Tulo was an accident. Should we believe him? We can’t know for sure, but this is a fun application of some simple Bayesian estimates. We are interested in estimating the probability that Jimenez was intentionally throwing at Tulo given that he hit him; I’ll call this P(I|HB).

Based on Bayes’ theorem, we can write:

P(I|HB) = P(HB|I)*P(I)/(P(HB|I)*P(I) + P(HB|NI)*P(NI))

So there are four unknowns we need to estimate:
* P(HB|I) -- the probability of a hit batter given that Jimenez was intentionally throwing at Tulo. I’ll estimate this as 50%; my intuition is that it’s higher, but Ubaldo’s control this spring has been terrible and the lower this is set, the better the end probability will look for him.
* P(I) -- the probability that Jimenez was intentionally throwing at Tulowitzki. Obviously, we can’t know this. Let’s be very generous and assume that it was only 1%.
* P(HB|NI) -- the probability that Jimenez would hit Tulo given that he was not intentionally throwing at him. In his MLB career, Jimenez has hit 44 batters and thrown 15,218 pitches, which is about .3%. Some of those may have been intentional, and his control is not a constant, but I’ll use .003 as the estimate here.
* P(NI) -- the probability that Jimenez was not throwing at Tulo. This is unknowable, but it is just the complement of P(I), so we’ll start it out at 99%.

Given these assumptions:

P(I|HB) = .5*.01/(.5*.01 + .003*.99) = .627

So given that we observed Jimenez hitting Tulo and the other assumptions, there is a 62.7% chance that he intended to hit him.
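The arithmetic is easy to check in a few lines of Python. This sketch simply codes the formula above with the same assumed inputs; the function name is illustrative.

```python
def posterior_intent(p_hb_given_i: float, p_i: float, p_hb_given_ni: float) -> float:
    """P(I|HB) by Bayes' theorem, with P(NI) taken as the complement of P(I)."""
    p_ni = 1.0 - p_i
    numerator = p_hb_given_i * p_i
    return numerator / (numerator + p_hb_given_ni * p_ni)


# Assumed inputs from the list above.
print(round(posterior_intent(0.5, 0.01, 0.003), 3))  # 0.627

# Using the unrounded career rate (44 hit batters in 15,218 pitches) instead of
# .003 nudges the estimate up slightly, to about 0.636.
print(round(posterior_intent(0.5, 0.01, 44 / 15218), 3))
```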

The following chart varies P(I) and presents the associated probabilities for three P(HB|I) values--50%, 25%, and 75%. As you can see, P(I) is the dominant factor here; once you establish a reasonable probability of intent, the probability of succeeding in plunking Tulo doesn’t matter much. Of course, once you have a very high estimate of intent, you are pretty confident and the observation that Tulo was actually hit isn’t that important:
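The chart’s table can be regenerated with a short loop. The grid of P(I) values below is an illustrative assumption, not the exact points from the original chart; P(HB|NI) is held at .003 as above.

```python
def posterior_intent(p_hb_given_i: float, p_i: float, p_hb_given_ni: float = 0.003) -> float:
    """P(I|HB) by Bayes' theorem, with P(NI) = 1 - P(I)."""
    numerator = p_hb_given_i * p_i
    return numerator / (numerator + p_hb_given_ni * (1.0 - p_i))


# Illustrative grid of P(I) values (assumed, not the original chart's points).
PRIORS = [0.001, 0.005, 0.01, 0.02, 0.05, 0.10, 0.25, 0.50]
HIT_GIVEN_INTENT = [0.25, 0.50, 0.75]

print("P(I)    " + "".join(f"P(HB|I)={p:<6}" for p in HIT_GIVEN_INTENT))
for p_i in PRIORS:
    row = "".join(f"{posterior_intent(p, p_i):<14.3f}" for p in HIT_GIVEN_INTENT)
    print(f"{p_i:<8}{row}")
```

Reading down a column moves the estimate far more than reading across a row, which is the pattern described above.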

One can get carried away with this type of analysis, though. As you can see, any assumption that there may have been intent involved will result in a very high probability that intent was in fact present. I have no doubt that sometimes pitchers with grudges hit batters by accident, and wouldn’t want to presume that such innocent coincidences are beyond the realm of possibility. When it’s a direct hit on the first offering in a spring training game...I’m with Bayes.

EDIT: See this thread on Inside the Book for some additional points. MGL's language in #1 does a much better job of expressing what P(I) represents than I did.
