Comments on Walk Like a Sabermetrician: On Run Distributions, pt. 3: Zero Modification

I can agree with everything in that statement. So...

2012-06-26T16:05:35.716-04:00

I can agree with everything in that statement.

Something I overlooked in Miller's article: he's using the three-parameter Weibull. I just assumed he was using the regular Weibull. When people use the term Weibull, they usually mean the two-parameter Weibull from survival analysis. Fitting the three-parameter version to data is problematic, as the maximum likelihood estimate sometimes doesn't exist in closed form. I'm not even sure how to fit the three-parameter Weibull to data.
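
To make the fitting difficulty concrete, here is a rough sketch of a three-parameter Weibull log-likelihood and a brute-force grid search over it. This is not Miller's procedure. The function names, the grid ranges, and the simulated sample are all hypothetical choices for illustration. The point it shows is that the location parameter has to sit strictly below the smallest observation, which is one reason the fit is awkward.

```python
import math
import random

def weibull3_loglik(data, shape, scale, loc):
    """Log-likelihood of a three-parameter Weibull (shape, scale, location)."""
    if shape <= 0 or scale <= 0:
        return -math.inf
    total = 0.0
    for x in data:
        z = (x - loc) / scale
        if z <= 0:  # every observation must lie strictly above the location
            return -math.inf
        total += math.log(shape / scale) + (shape - 1) * math.log(z) - z ** shape
    return total

def grid_fit(data, shapes, scales, locs):
    """Coarse grid search; returns (loglik, shape, scale, loc) at the best grid point."""
    return max(
        (weibull3_loglik(data, k, s, t), k, s, t)
        for k in shapes for s in scales for t in locs
    )

random.seed(0)
# Hypothetical sample: scale 5.0, shape 1.8, shifted left by 0.5 (illustration only)
sample = [random.weibullvariate(5.0, 1.8) - 0.5 for _ in range(500)]

shapes = [1.0 + 0.1 * i for i in range(21)]    # 1.0 .. 3.0
scales = [3.0 + 0.25 * i for i in range(17)]   # 3.0 .. 7.0
locs = [-1.5 + 0.25 * i for i in range(7)]     # -1.5 .. 0.0

best = grid_fit(sample, shapes, scales, locs)
print("best grid point (loglik, shape, scale, loc):", best)
```

A grid search like this is crude; a numerical optimizer would be the usual next step, but the constraint on the location parameter stays the same.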

I agree completely that he hasn't derived a wa...

2012-06-22T17:17:08.986-04:00

I agree completely that he hasn't derived a way to calculate a custom exponent. However, the fact that the parameter varies by context indicates that it is picking up on a similar phenomenon to what drives custom exponents.

Or to put it another way, I don't see any reason to believe that the Tango-based method of estimating R/G distribution is any more accurate than Weibull (perhaps it is, but that has not yet been demonstrated). And since there's no real evidence that it's a better fit to the R/G distribution, I don't see any reason to believe that it would produce a better estimator of W%. Which is certainly not to say that one shouldn't try and see if it in fact does.

My understanding is that they are fixed for a leag...

2012-06-22T00:45:46.652-04:00

My understanding is that they are fixed for a league in a given time period. If your exponent is 1.74 (maximum likelihood), then it's 1.74 for the league in a given time period (AL 2004?). If it's 1.79 (least squares), then it's 1.79 for the whole league.

He hasn't derived the Pythagenpat or Pythagenport. He's simply derived Bill James' fixed-exponent form, with the exponent resulting from means and variances. Deriving the Pythagenpat or Pythagenport would require a distribution (at least somewhat) different from the Weibull.
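
For anyone following along, the difference between a fixed exponent and a custom one can be sketched as below. The 1.79 is the least squares figure mentioned above. The Pythagenpat constant z = 0.287 is a commonly quoted value that I am assuming here, and the team totals are made up for illustration.

```python
def pythagorean_wpct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean W% = R^x / (R^x + RA^x) for a given exponent x."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

def pythagenpat_exponent(runs_scored, runs_allowed, games, z=0.287):
    """Custom exponent x = RPG^z, where RPG counts both teams' runs per game.

    z = 0.287 is an assumed, commonly quoted constant, not a value from this thread.
    """
    return ((runs_scored + runs_allowed) / games) ** z

# Hypothetical team: 800 runs scored, 700 allowed, 162 games
rs, ra, g = 800, 700, 162
fixed = pythagorean_wpct(rs, ra, 1.79)  # one exponent for the whole league
custom = pythagorean_wpct(rs, ra, pythagenpat_exponent(rs, ra, g))  # varies with RPG
print(fixed, custom)
```

With a fixed exponent, every team in the league is run through the same x; with Pythagenpat, x moves with the scoring context of each team.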

I think the convolutions of the Tango distribution are a road to an estimator. I just don't have the skills to do it.

What method do you have in mind when you say we ca...

2012-06-21T17:00:30.897-04:00

What method do you have in mind when you say we can already do better than Pythagorean? I'm not aware of anything more accurate than Pythagorean with a custom exponent. The value of the exponent in Miller's derivation is not fixed, but rather one of the parameters that defines the shape of his Weibull distribution (if my recollection is correct--it's been a while since I read it and the math was challenging for me anyway).

I've read that paper, in BTN I believe. The p...

2012-06-20T23:08:00.600-04:00

I've read that paper, in BTN I believe. The problem with Weibull is that it leads to the pyth and we can already do better than the pyth.

I believe the track to follow is BVL's algorithm:
F(n)=dr^n * f(0)^9 * q

where q is the sum from i=1 to min(n,9) of (9 i)*(n-1 i-1)*(c/f(0))^i, with (a b) denoting the binomial coefficient "a choose b"

i=1 gets you a constant (power=0)
i=2 gets you a line (power=1)
i=3 gets you a quad (power=2)
...
i=9 gets you (power=8)

The sum (q) is therefore an 8th-degree polynomial in n (once n is at least 9).

It's a hairy function, but it seems continuous, which would take us out of the realm of discrete math and into calculus.

(9 i)*(n-1 i-1) is the same regardless of RI and c.
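
A quick numerical check of this kind of closed form against a direct nine-inning convolution might look like the sketch below. It assumes one parameterization of the per-inning distribution: f(0) = RI/(RI + c*RI^2), d = 1 - c*f(0), and f(k) = (1 - f(0))*(1 - d)*d^(k-1) for k >= 1. Under that assumption the ratio raised to the power i comes out as (1 - f(0))*(1 - d)/(f(0)*d), which may not match the symbols written above. RI = 0.5 and c = 0.767 are illustrative inputs only.

```python
from math import comb

def tango_inning(ri, c, max_runs):
    """Per-inning run probabilities under the assumed parameterization."""
    f0 = ri / (ri + c * ri ** 2)
    d = 1 - c * f0
    probs = [f0] + [(1 - f0) * (1 - d) * d ** (k - 1) for k in range(1, max_runs + 1)]
    return f0, d, probs

def game_closed_form(f0, d, n, innings=9):
    """P(n runs in `innings` innings) as d^n * f0^innings * q."""
    if n == 0:
        return f0 ** innings
    z = (1 - f0) * (1 - d) / (f0 * d)
    q = sum(comb(innings, i) * comb(n - 1, i - 1) * z ** i
            for i in range(1, min(n, innings) + 1))
    return d ** n * f0 ** innings * q

def convolve(a, b, max_runs):
    """Distribution of the sum of two independent run counts, truncated at max_runs."""
    out = [0.0] * (max_runs + 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            if i + j <= max_runs:
                out[i + j] += pa * pb
    return out

def game_by_convolution(probs, innings=9):
    """Convolve the per-inning distribution with itself `innings` times."""
    max_runs = len(probs) - 1
    total = [1.0] + [0.0] * max_runs
    for _ in range(innings):
        total = convolve(total, probs, max_runs)
    return total

f0, d, inning = tango_inning(ri=0.5, c=0.767, max_runs=30)  # illustrative inputs
by_conv = game_by_convolution(inning)
closed = [game_closed_form(f0, d, n) for n in range(31)]
max_gap = max(abs(a - b) for a, b in zip(by_conv, closed))
print("largest difference between the two methods:", max_gap)
```

The binomial product depends only on n and i, so it can be tabulated once and reused for any RI and c.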

Yes, it would be wonderful if R/I followed the geo...

2012-06-20T17:08:11.009-04:00

Yes, it would be wonderful if R/I followed the geometric distribution, as then runs in any N innings would just be a negative binomial with r = N. For whatever reason, in reality the R/I and R/G distributions don't easily relate, which I think is one of the reasons why there's been relatively so little published on estimating it.

With respect to a continuous approximation, there are some articles out there on using the Weibull distribution and a paper by Steven Miller on using Weibull to approximate the Pythagorean relationship. I believe I linked a couple in the first installment of this series if you are interested.

If the c value of the Tango distribution were 1, t...

2012-06-20T15:12:41.489-04:00

If the c value of the Tango distribution were 1, then RI would be geometrically distributed (NB distributed with r=1). It would then be easy to calculate RG: just use NB with r=9 (the sum of geometric or NB variables with the same B is NB with r equal to the sum of the r's). Unfortunately, that doesn't work with zero-inflated NBs. I don't know why r seems to be around 4 for the RG distribution.
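
The geometric-to-NB step can be checked numerically with a short sketch like this one. It uses the "failures before the first success" form of the geometric with success probability p, and takes p = 1/(1 + RI) so that the mean per inning is RI. That choice of p and RI = 0.5 are my assumptions for illustration.

```python
from math import comb

def geometric_pmf(k, p):
    """P(k) for a geometric counting failures before the first success."""
    return p * (1 - p) ** k

def negbin_pmf(k, r, p):
    """P(k) for a negative binomial with integer r and success probability p."""
    return comb(k + r - 1, k) * p ** r * (1 - p) ** k

def convolve(a, b, max_runs):
    """Distribution of the sum of two independent counts, truncated at max_runs."""
    out = [0.0] * (max_runs + 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            if i + j <= max_runs:
                out[i + j] += pa * pb
    return out

ri = 0.5              # illustrative runs per inning
p = 1 / (1 + ri)      # assumed so that the geometric mean (1 - p) / p equals ri
max_runs = 30

inning = [geometric_pmf(k, p) for k in range(max_runs + 1)]
game = [1.0] + [0.0] * max_runs
for _ in range(9):    # nine independent innings
    game = convolve(game, inning, max_runs)

nb9 = [negbin_pmf(k, 9, p) for k in range(max_runs + 1)]
max_diff = max(abs(a - b) for a, b in zip(game, nb9))
print("largest difference from NB(r=9):", max_diff)
```

The same convolution loop works for any per-inning distribution, which is what makes the zero-modified case computable even though it no longer collapses to a single NB.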

It would be nice to find a simple easy to work with continuous approximation to the NB that would allow analytic derivatives so a win estimator could be derived.