## Sunday, April 01, 2012

### Ubaldo and Tulo

In the first inning of today’s Cleveland/Colorado game, Ubaldo Jimenez hit Troy Tulowitzki with his first pitch. The two have had some sort of silly squabble in the press this spring, which you can read about elsewhere.

Jimenez claims that hitting Tulo was an accident. Should we believe him? We can’t know for sure, but this is a fun application of some simple Bayesian estimates. We are interested in estimating the probability that Jimenez was intentionally throwing at Tulo given that he hit him; I’ll call this P(I|HB).

Based on Bayes' theorem, we can write:

P(I|HB) = P(HB|I)*P(I)/(P(HB|I)*P(I) + P(HB|NI)*P(NI))

So there are four unknowns we need to estimate:
* P(HB|I) -- the probability of a hit batter given that Jimenez was intentionally throwing at Tulo. I’ll estimate this as 50%; my intuition is that it’s higher, but Ubaldo’s control this spring has been terrible and the lower this is set, the better the end probability will look for him.
* P(I) -- the probability that Jimenez was intentionally throwing at Tulowitzki. Obviously, we can’t know this. Let’s be very generous and assume that it was only 1%.
* P(HB|NI) -- the probability that Jimenez would hit Tulo with a given pitch if he was not intentionally throwing at him. In his ML career, Jimenez has hit 44 batters and thrown 15,218 pitches, which is about 0.3% of pitches. Some of those may have been intentional, and his control is not a constant, but I’ll use .003 as the estimate here.
* P(NI) -- the probability that Jimenez was not throwing at Tulo. This is unknowable, but it is just the complement of P(I), so we’ll start it out at 99%.
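As a quick check on the P(HB|NI) estimate, here is a minimal Python sketch of the per-pitch rate from the career totals quoted above. The variable names are my own, and treating every pitch as an equal chance to hit a batter is the same simplification the estimate already makes.

```python
# Career totals quoted in the post.
hit_batters = 44
pitches = 15_218

# Per-pitch rate of hitting a batter, used as the estimate of P(HB|NI).
p_hb_given_ni = hit_batters / pitches

print(f"{p_hb_given_ni:.4f}")  # 0.0029, which rounds to the .003 used in the post
```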

Given these assumptions:

P(I|HB) = .5*.01/(.5*.01 + .003*.99) = .627

So given that we observed Jimenez hitting Tulo and the other assumptions, there is a 62.7% chance that he intended to hit him.
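The arithmetic is easy to check in a few lines of Python. This is only a sketch of the formula above; the function name `posterior_intent` and its argument names are mine, and P(NI) is computed as 1 - P(I).

```python
def posterior_intent(p_hb_given_i, p_i, p_hb_given_ni):
    """Return P(I|HB) via Bayes' theorem, with P(NI) = 1 - P(I)."""
    p_ni = 1.0 - p_i
    numerator = p_hb_given_i * p_i
    return numerator / (numerator + p_hb_given_ni * p_ni)


# The estimates from the list above: P(HB|I) = 50%, P(I) = 1%, P(HB|NI) = .003.
p = posterior_intent(p_hb_given_i=0.5, p_i=0.01, p_hb_given_ni=0.003)
print(f"{p:.3f}")  # 0.627
```

Changing any one argument shows how much the answer depends on that particular assumption.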

The following chart varies P(I) and presents the associated probabilities for three P(HB|I) values: 25%, 50%, and 75%. As you can see, P(I) is the dominant factor here; once you establish a reasonable probability of intent, the probability of succeeding in plunking Tulo doesn’t matter much. Of course, once you have a very high estimate of intent, you are already pretty confident, and the observation that Tulo was actually hit isn’t that important:

[Chart: P(I|HB) as a function of P(I), for P(HB|I) = 25%, 50%, and 75%]

One can get carried away with this type of analysis, though. As you can see, even a small assumed probability of intent results in a high probability that intent was in fact present. I have no doubt that sometimes pitchers with grudges hit batters by accident, and I wouldn’t want to presume that such innocent coincidences are beyond the realm of possibility. But when it’s a direct hit on the first offering in a spring training game...I’m with Bayes.
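The sensitivity analysis described above can be reproduced as a small table. This is a rough sketch: the grid of P(I) values is my own choice for illustration (not necessarily the range the chart covered), P(HB|NI) is held at .003, and the function name is hypothetical.

```python
def posterior_intent(p_hb_given_i, p_i, p_hb_given_ni=0.003):
    """Return P(I|HB) via Bayes' theorem, with P(NI) = 1 - P(I)."""
    numerator = p_hb_given_i * p_i
    return numerator / (numerator + p_hb_given_ni * (1.0 - p_i))


# Hypothetical grid of priors, chosen for illustration only.
priors = [0.001, 0.005, 0.01, 0.05, 0.10, 0.25, 0.50]
# The three P(HB|I) values named in the post.
likelihoods = [0.25, 0.50, 0.75]

# Posterior P(I|HB) for every (prior, likelihood) combination.
table = {
    (p_i, p_hb_i): posterior_intent(p_hb_i, p_i)
    for p_i in priors
    for p_hb_i in likelihoods
}

# Print one row per prior, one column per P(HB|I) value.
print("P(I)    " + "  ".join(f"P(HB|I)={l:.2f}" for l in likelihoods))
for p_i in priors:
    cells = "  ".join(f"{table[(p_i, l)]:12.3f}" for l in likelihoods)
    print(f"{p_i:<8.3f}{cells}")
```

Reading down a column shows how quickly the posterior climbs as P(I) grows, while reading across a row shows the effect of P(HB|I).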

EDIT: See this thread on Inside the Book for some additional points. MGL's language in #1 does a much better job of expressing what P(I) represents than I did.