Comments on Walk Like a Sabermetrician: Run Estimation Stuff, pt. 2

I used the 1960-2004 data for Ruane's LW. I'll go ...

2008-11-21T17:28:00.000-05:00

I used the 1960-2004 data for Ruane's LW. I'll go back and enter a column for the totals. We could have used different sources for the totals. I used the BDB database.

Thanks for posting the link!

Did you use the totals that I included in the firs...

2008-11-21T10:03:00.000-05:00

Did you use the totals that I included in the first part of the series, or did you use the 1954-2007 totals (those are the only ones I see in your spreadsheet)?

It seems like your values are pretty close to mine, so the potential difference in input would be sufficient to explain the differences.

You have a good point about attempting to estimate errors and including a fractional SH. My guiding principle behind these formulas was not to estimate any missing data. That choice makes the formula look nicer, but it admittedly does make it less than optimal from an accuracy standpoint.

This comment is from terpsfan101...don't worry, I'...

2008-11-21T09:49:00.000-05:00

This comment is from terpsfan101...don't worry, I'm not writing questions to myself :)

Patriot,

I was bored so I fooled around with your Baseruns equations. For some reason I couldn't get Ruane's LW to reconcile to your values. I added the shortfall per PA like you did.

I'm pretty sure Ruane included ROE's under AB-H-K-GIDP, because I get approximately a .04 RC run differential between AB-H-K-GIDP and SO when I include ROE under AB-H-K-GIDP.

If you include partial baserunners for SH and AB-H-K-GIDP to account for ROE's, then you don't get negative B coefficients for the walk when using initial baserunners. You still get negative B coefficients using final baserunners.

Here's the link:

Link

You would add it to each event that included a PA ...

2008-07-04T20:16:00.000-04:00

You would add it to each event that included a PA (which would include batting outs and Ks). Or you could leave it in the B factor as x*PA, which would be mathematically equivalent.

Suppose that we had:

B = .7S + 2D + 3T + 2HR + .01W + .1(AB-H-K) + .02K

Suppose that the needed B was 1000, the actual B was 970, and there were 3000 PA. (1000-970)/3000 = .01. So we would have the new B factor:

.71S + 2.01D + 3.01T + 2.01HR + .02W + .11(AB-H-K) + .03K
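As a rough sketch of the arithmetic above (not code from the original post), the shift can be applied in Python like this. The event counts are made up so that they sum to 3000 PA and give B = 970 under the original weights; "OUT" is just a label chosen here for AB-H-K.

```python
# Weights from the example above; "OUT" stands for AB-H-K.
weights = {"S": 0.7, "D": 2.0, "T": 3.0, "HR": 2.0,
           "W": 0.01, "OUT": 0.1, "K": 0.02}

def additive_shift(needed_b, actual_b, pa):
    """Per-PA shortfall, (B' - B) / PA."""
    return (needed_b - actual_b) / pa

def shift_weights(weights, shift):
    """Add the same per-PA shift to every event weight."""
    return {event: w + shift for event, w in weights.items()}

def b_factor(weights, counts):
    """B as the weighted sum of event counts."""
    return sum(weights[event] * counts[event] for event in weights)

shift = additive_shift(1000, 970, 3000)
new_weights = shift_weights(weights, shift)

# Hypothetical counts (an assumption for illustration, not real data).
# They cover every PA, so sum(counts) == PA.
counts = {"S": 490, "D": 130, "T": 15, "HR": 80,
          "W": 290, "OUT": 1490, "K": 505}
pa = sum(counts.values())

old_b = b_factor(weights, counts)
new_b = b_factor(new_weights, counts)

# Shifting every weight is the same as leaving B alone and adding shift * PA.
print(old_b, new_b, old_b + shift * pa)
```

The equivalence with x*PA only holds when the events in B between them account for every PA, as they do in this made-up line.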

Would you use this (B'-B)*PA method for batting ou...

2008-07-04T19:34:00.000-04:00

Would you use this (B'-B)*PA method for batting outs and strikeouts?

That is an excellent question. Allow me the indul...

2008-07-04T18:30:00.000-04:00

That is an excellent question. Allow me the indulgence of answering by way of a digression.

One of the problems with just multiplying the B factor by a scalar (like 1.03) is the effect that has on the negative events. The 1.03 multiplier means that all of the weights will be increased, which will make the outs more costly, even though we are already UNDERestimating runs scored.

So an alternative you could try is this: find the B shortfall for your input data (in other words, B'-B instead of B'/B). Then, distribute this evenly over all events. You could do (B'-B)/PA, and if that was .004 then you would add .004*PA to the B factor. If you want to get SB and CS in there, you could use (PA + SB + CS) as the denominator instead; hopefully the error won't be large enough that it will make much of a difference either way.

I would not recommend applying the change only to the out coefficients. But you could try the addition method rather than the multiplication method, and for all I know it might work better (I have not tested it).
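To make the difference between the two adjustments concrete, here is a small Python sketch. The 1.03 multiplier and the .004 per-PA shift come from the discussion above; the three weights are hypothetical placeholders, not values from any of the formulas in the post.

```python
# Hypothetical B weights (assumed for illustration only).
weights = {"S": 0.8, "W": 0.03, "OUT": -0.03}

def scale(weights, multiplier):
    """Multiplicative fix: every weight is multiplied by B'/B."""
    return {event: w * multiplier for event, w in weights.items()}

def shift(weights, per_pa):
    """Additive fix: every weight gets (B' - B)/PA added to it."""
    return {event: w + per_pa for event, w in weights.items()}

scaled = scale(weights, 1.03)
shifted = shift(weights, 0.004)

for event in weights:
    print(event, weights[event], scaled[event], shifted[event])
```

Under the multiplier the out weight moves further below zero, while under the shift it moves toward zero along with everything else, which is the behavior described above.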

Is it OK to adjust the out values in the B factor ...

2008-07-04T18:07:00.000-04:00

Is it OK to adjust the out values in the B factor to force your equation to fit a specific dataset? For instance, if I calculate a multiplier of 1.03, instead of multiplying everything in B by 1.03, could I just adjust the out value to bring it closer to 1?

The F1-W formula and the F1 formula are both based...

2008-07-04T10:27:00.000-04:00

The F1-W formula and the F1 formula are both based on:

A = initial baserunners = H + W - HR + HB

C = batting outs = AB - H + SH + SF

Double plays are accounted for in the B factor of F1-W; they have their own weight of -1.7.

In a version like F2, which is based on initial baserunners and all outs in C (AB - H + DP + SH + SF + CS), the DP has a lower (absolute) B weight (-1.36). The balancing occurs for all of the events in the B factor.

If you put in the actual event frequencies for the 1960-2004 period, the F1-W formula predicts 787,363 runs, the exact number that was scored. (That is not meant as an accuracy claim, since the formula is designed to do just that, but it does show that it balances.)

Perhaps the confusion is from the fact that GIDPs are already counted in AB-H. Adding them in again counts both outs--the out recorded on the baserunner and the batter being retired.
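For anyone who wants to check the balancing themselves, here is a minimal Python sketch of the pieces. The A and C definitions are the ones given above; the overall form A*B/(B+C) + D with D = HR is the usual BaseRuns structure and is assumed here rather than quoted from this thread. The season line is invented for illustration.

```python
def a_factor(h, w, hr, hb):
    """Initial baserunners: H + W - HR + HB."""
    return h + w - hr + hb

def c_factor_f1(ab, h, sh, sf):
    """Batting outs (F1 and F1-W): AB - H + SH + SF."""
    return ab - h + sh + sf

def c_factor_f2(ab, h, dp, sh, sf, cs):
    """All outs (F2): AB - H + DP + SH + SF + CS."""
    return ab - h + dp + sh + sf + cs

def base_runs(a, b, c, d):
    """Assumed standard BaseRuns form: A*B/(B+C) + D."""
    return a * b / (b + c) + d

def needed_b(a, c, d, runs):
    """Solve runs = A*B/(B+C) + D for B (the B' used for balancing)."""
    return (runs - d) * c / (a - runs + d)

# Hypothetical team line (made up, not from the 1960-2004 data).
ab, h, w, hr, hb = 5500, 1450, 550, 160, 50
sh, sf, dp, cs, runs = 40, 45, 125, 45, 750

a = a_factor(h, w, hr, hb)
c1 = c_factor_f1(ab, h, sh, sf)
c2 = c_factor_f2(ab, h, dp, sh, sf, cs)
d = hr  # assumption: D is home runs

b_prime = needed_b(a, c1, d, runs)
print(a, c1, c2, b_prime, base_runs(a, b_prime, c1, d))
```

The two C factors differ by exactly DP + CS, so the B weights have to absorb that difference, which is why the DP weight changes between F1-W and F2.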

I also should probably use "GDP" instead of "DP", since DP implies all double plays, but since I said upfront I am only dealing with the official offensive categories and the only such category is GIDP, I find the extra letters unnecessary, just like the P in "HBP".

Hopefully that clears this up despite its rambling nature :-)

I took a look at the F1-W version of the formula, ...

2008-07-04T04:14:00.000-04:00

I took a look at the F1-W version of the formula, and to get everything to balance, you have to use a C factor of AB-H-DP+SH+SF. For the F1-W version, you stated you were using AB-H+SH+SF for the C factor. Am I missing something here? Since you used AB-H-K-DP in B, you would have to use DP in C, or the linear weight value of a DP would be understated.

Thanks, I did not know that was a default conversi...

2008-06-23T17:58:00.000-04:00

Thanks, I did not know that was a default conversion.

Re: the walk coefficient, the first potential explanation as always is random variation with this particular sample (although 1960-2004 is pretty large, so it shouldn't be an issue). But the walk coefficient is always low to begin with, so it moving by .05 can take it into negative territory, whereas if I had gotten a coefficient of 1.9 for a double and some previous formula had 1.85 I probably wouldn't have even noticed.

The missing data probably doesn't help either. Tango's full version accounts for just about everything that could possibly happen...balks, wild pitches, runners left on base in innings that aren't played to completion...and he gets .05 or so for the walk.

Then there is Tango's explanation, which is true and could act in conjunction with either of the first two explanations as well. (It is that BsR assumes that all baserunners will score with the same frequency, but of course this is not true...and you can go there to read the rest.)

The good news is that the .025 walk rate version only overrates the walk by .007 runs, and is only a couple hundredths of a run worse in RMSE.

Great stuff. Any ideas as to why the walk coeffic...

2008-06-23T15:21:00.000-04:00

Great stuff. Any ideas as to why the walk coefficient flipped into the negative? I agree that it's probably correct to force it to be a positive number. Just seems surprising to me that this would happen.

Also, FWIW, the single space thing is default HTML convention (and European style, of course) and it takes a special HTML code to get a second space after a period. No idea why Blogger isn't doing that for you, though. -j