Comments on Walk Like a Sabermetrician: Pseudo-SIERA Using BsR

Another little modification (one that has almost n...

2010-02-18T09:29:03.333-05:00

Another little modification (one that has almost no effect on RMSE) is using distinct fudge factors to estimate earned runs--one applied to B to make Base Runs equal runs allowed, and another to estimated runs allowed to make it equal earned runs. That results in:

B = (2*(eS + 2*eD + 3*eT + 4*eHR) - (eS + eD + eT + eHR) - 4*eHR + .05*W)*.768

Pseudo-SIERA = (A*B/(B+C) + eHR)*.927*9/(E/3)
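For anyone who wants to play with the two fudge factors, here is a minimal Python sketch of the two formulas above. It assumes A and C are computed as in the original post (they are not redefined in these comments, so they are plain inputs here), and the function and parameter names are made up for illustration.

```python
def b_factor(eS, eD, eT, eHR, W, b_fudge=0.768):
    """B term: (2*TB - H - 4*HR + .05*W), scaled by the first fudge factor."""
    total_bases = eS + 2 * eD + 3 * eT + 4 * eHR
    hits = eS + eD + eT + eHR
    return (2 * total_bases - hits - 4 * eHR + 0.05 * W) * b_fudge


def pseudo_siera(A, B, C, eHR, E, er_fudge=0.927):
    """Base Runs estimate of runs allowed, scaled to earned runs per nine innings.

    A and C are assumed to come from the original post's definitions.
    E is estimated outs, so E/3 is estimated innings.
    """
    runs_allowed = A * B / (B + C) + eHR
    earned_runs = runs_allowed * er_fudge  # second fudge factor
    innings = E / 3
    return earned_runs * 9 / innings
```

Keeping the two multipliers as separate arguments makes it easy to refit either one without touching the other.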

While changes in event frequencies for types of ba...

2010-02-17T17:40:34.734-05:00

While changes in event frequencies for types of batted balls are quite possible, I don't think it's unreasonable to expect that they'd be stable enough to make an equation of this sort. Formulas like SIERA do not reset every year, and I think it stands to reason that the

The bigger problem, as far as I can tell, is that Colin's data source and BP's data source are probably not the same, and so they have different criteria for what makes a line drive or a flyball, etc. I don't have the exact figures in front of me right now, but there was a noticeable difference in the distribution of batted ball types between Colin's data and the BP data.

I think the problem comes from the data Colin prov...

2010-02-17T15:56:28.667-05:00

I think the problem comes from the fact that the data Colin provided is from 2003-2008. I'm guessing you are testing with 2009 data, which will have different outcomes for each batted ball type (HR being the biggest effect) than 2003-2008, which was a much more offensive period. Scaling all the outcomes to get the right numbers is, I guess, the best we can do if we don't have any other info.

Bryan, great catch. I somehow neglected to force ...

2010-02-17T09:18:23.606-05:00

Bryan, great catch. I somehow neglected to force home runs to equal the league total (again, this is admittedly cheating for the purposes of testing a formula on even terms against those calibrated on a different dataset). It should be:

eHR = .0677*nG

There are also not enough outs, so introduce a new term E:

E = .8008*G + .6366*nG + K

And change Pseudo-SIERA to (I had left out the factor of 9 needed to convert runs to an ERA, although I had it in my spreadsheet all along):

Pseudo-SIERA = (A*B/(B+C) + eHR)*.914*9/(E/3)
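The corrected pieces can be sketched in Python as follows. This is only an illustration under a few assumptions: G and nG are read here as ground balls and non-ground balls, K as strikeouts, and A, B and C are taken as already computed from the original post's formulas. All function names are hypothetical.

```python
def estimated_home_runs(nG):
    """eHR forced to the league total: .0677 home runs per non-grounder."""
    return 0.0677 * nG


def estimated_outs(G, nG, K):
    """E: estimated outs from grounders, non-grounders and strikeouts."""
    return 0.8008 * G + 0.6366 * nG + K


def pseudo_siera_corrected(A, B, C, G, nG, K, fudge=0.914):
    """Base Runs estimate plus home runs, scaled and expressed per nine innings.

    A, B and C are assumed inputs from the original post's definitions.
    """
    eHR = estimated_home_runs(nG)
    innings = estimated_outs(G, nG, K) / 3
    runs = (A * B / (B + C) + eHR) * fudge
    return runs * 9 / innings
```

Dividing E by 3 gives estimated innings, which is what the times-9 term needs in order to turn a run total into an ERA-scaled figure.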

I believe this evens things out, although I certainly could have made another mistake. At this point the specific formula is even more of a jury-rigged atrocity than it was initially; I want to emphasize that this exercise is about the concept, not this specific implementation. This fix does lower the RMSE ever-so-slightly to 1.041.
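For reference, the RMSE figure is the usual root mean squared error between the estimated and actual ERAs. A minimal sketch, assuming two equal-length lists of per-pitcher values (the function name and inputs are illustrative, not taken from the spreadsheet):

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```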

As you suggest, the best fix would be to use a new hit type outcome database. I don't have the capacity to do that right now, so hopefully someone else will pick up the ball from here (and never fear, I know of at least two people who seem to be doing just that).

So I tried to do this with the same data you used....

2010-02-16T20:19:14.585-05:00

So I tried to do this with the same data you used. I'm finding that I'm overestimating the runs scored in the league by quite a bit. I think there are too many home runs, and too many hits as well. Any ideas for quick fixes? I can always just use a heavy hand and normalize it. New hit type outcome data would probably fix this, but that is harder (aka I don't know how to do that).
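One way to read the "heavy hand" option is to rescale every pitcher's estimate by a single factor so that the estimated league total matches the actual one. A rough sketch of that idea, with a hypothetical function name and no claim that this is the right calibration:

```python
def normalize_to_league(estimates, league_actual):
    """Scale each estimate so the estimates sum to the actual league total."""
    total = sum(estimates)
    if total == 0:
        raise ValueError("estimated league total is zero; cannot rescale")
    factor = league_actual / total
    return [x * factor for x in estimates]
```

The same function could be applied to estimated home runs or hits rather than runs. It fixes the league total but leaves the relative differences between pitchers untouched.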

Michael, feel free to post your formulas here or p...

2010-02-15T19:10:49.881-05:00

Michael, feel free to post your formulas here or post a link to any article you might write about them.

I figured this was what you were doing when you asked me about BsR--I started working on this Thurs morning but didn't get it finished until Sunday.

Patriot, I talked to you about this a bit on Twit...

2010-02-15T09:32:16.206-05:00

Patriot,

I talked to you about this a bit on Twitter (MarlinManiac here). I attempted to do this as well, calibrating B to the 2003-2007 ML data, since that was the timeline for the dataset Colin posted. I like the idea of splitting the batted ball data into grounders and nongrounders, somewhat like SIERA; I think I'll tinker with that as well.