Comments on Walk Like a Sabermetrician: End of Season Statistics, 2008 Patriot:

I look forward to this post every year, and you didn't disappoint.

I would love to have the spreadsheets:

KJOKBASEBALL
AT
YAHOO.COM

---

That's not a very well-written article. It just piles on idea after idea without any kind of master plan on how to organize it or tie it all together.

It has all the hallmarks of the writings of a mad genius, except the genius part.

Patriot,

I was just frustrated at myself for not being able to follow your section on Win% Estimators on your other website.

Z-scores, derivatives, slopes, logarithms, etc... My head felt like it was going to explode.

---

Hear that sound? That's the sound of the mathematical geniuses of the world laughing at the suggestion that I might be in their ranks.

There's no doubt that I have mathematical skills and training at a level greater than the average member of the population--but I assume that you and just about everybody drawn to sabermetrics do too.

I don't use any kind of math beyond what I learned in Calculus II or III in college (which is good, because I didn't go far beyond that level), but I don't explain it particularly well, so it probably seems a lot more complicated than it is. "Partial differentiation" sounds a lot scarier than the actual process of taking a derivative is.

---

Patriot,

Thanks for posting the link.

I am not trying to be sarcastic here, but are you some sort of mathematical genius? Reading some of your works, especially the mathematically heavy ones on your other website, I end up getting lost most of the time. Of course, this is not your fault, but a testament to my lack of intelligence. If you don't consider yourself a mathematical genius, then where did you get your mathematical training? If you say that it was mostly self-taught, then you are a mathematical genius.

---

In the spirit of shameless self-promotion, I've created a blog. My first sabermetric post will be about Linear Weights. Look for it sometime over the weekend. I did write an introductory post:

http://thehumanraindelay.blogspot.com/

Note that I stole Mike Hargrove's nickname for the title of my blog.

---

OK, I forgot that you were including CS in your ERP formula. There are a ton of great ideas in this post. I've re-read it at least 3 times so far.

---

I use outs = AB - H + CS, which averages around 25.5 per game (25.45 and 25.53 in the AL and NL this year, respectively).

I do use 25.2 for AB-H; if you look at the "dRA" formula in the post, there is a multiplication by 25.2 for that very reason.
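As a rough illustration of the two definitions being compared here, the arithmetic can be sketched in Python. The function name and the totals below are hypothetical, picked only to show how adding CS raises the per-game figure; they are not actual league data.

```python
def outs_per_game(ab, h, cs, games):
    """Average batting outs per team game, with outs = AB - H + CS."""
    if games <= 0:
        raise ValueError("games must be positive")
    return (ab - h + cs) / games


# Hypothetical team totals (not real data), for illustration only.
ab, h, cs, games = 5500, 1450, 40, 162

with_cs = outs_per_game(ab, h, cs, games)    # outs = AB - H + CS
without_cs = outs_per_game(ab, h, 0, games)  # outs = AB - H only

print(round(with_cs, 2), round(without_cs, 2))
```

The gap between the two results is just CS per game, which is why the two conventions differ by a few tenths of an out.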

---

Patriot,

I get 25.2 Outs/Game when using AB-H from 1993-2007. Any reason you use 25.5?

---

Patriot,

aidenbdud@yahoo.com.

thanks so much. I appreciate all you guys do and I love to read it.

Terpsfan, I agree with p. Having a place where all your work is organized would, I imagine, be very useful. Hey, I would read it.

---

I agree with not using the one year empirical data.

You should start a blog, if for nothing else just as a place to keep all your research in one place for your own reference. Any readership etc. is a bonus. I used to do all of the stuff I do here even before I was posting it on the internet.

---

After looking at your 2008 batting data and seeing how well your ERP formula worked, you have convinced me to switch to a simpler method. I've thrown out 8 categories (XI,PkO,PkE,BK,PB,WP,DI,OA) and combined one (ROE+RFC). When throwing out categories I need to be careful when reconciling the events that I'm including so that the Linear Weights and Runs Created values remain in proportion with one another. This involves making a slight adjustment to the Runs Per Out (the Total Runs Per Out, not the rate stat) for each event. Using the data for all seasons (see the "LW All Seasons sheet" in my spreadsheet), here is what the new LW and RC look like:

    LW      RC  Event
 0.464   0.467  1B
 0.767   0.770  2B
 1.051   1.052  3B
 1.404   1.404  HR
 0.310   0.310  UIBB
 0.164   0.164  IBB
 0.337   0.337  HBP
 0.493   0.495  ROE
-0.087   0.111  SH
-0.271  -0.098  OUT
-0.273  -0.109  SO
 0.192   0.192  SB
-0.419  -0.267  CS

These will be the LW I plug into my BsR equation used to estimate LW for pre-1954 seasons.
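For readers who want to try the LW column from the table above on a batting line, here is a minimal Python sketch. The weights are copied from the table; the function name and the sample stat line are hypothetical and exist only for illustration.

```python
# LW column from the table above (runs per event, all seasons).
LW = {
    "1B": 0.464, "2B": 0.767, "3B": 1.051, "HR": 1.404,
    "UIBB": 0.310, "IBB": 0.164, "HBP": 0.337, "ROE": 0.493,
    "SH": -0.087, "OUT": -0.271, "SO": -0.273,
    "SB": 0.192, "CS": -0.419,
}


def linear_weight_runs(line, weights=LW):
    """Sum weight * count over every event in a batting line."""
    unknown = set(line) - set(weights)
    if unknown:
        raise KeyError(f"no weight for events: {sorted(unknown)}")
    return sum(weights[event] * count for event, count in line.items())


# Hypothetical batting line (not a real player), for illustration only.
sample_line = {
    "1B": 100, "2B": 30, "3B": 3, "HR": 25,
    "UIBB": 60, "IBB": 5, "HBP": 4, "ROE": 6,
    "SH": 1, "OUT": 290, "SO": 100, "SB": 10, "CS": 4,
}

print(round(linear_weight_runs(sample_line), 1))
```

Swapping in the RC column only requires passing a different dictionary as `weights`.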

[Note on what follows below. In my head I understand the process I'm going to explain, but don't know how to express it clearly.]

The empirical LW will form the backbone of the BsR equations used to estimate LW for years after 1953. You might be saying, why not use the empirical data for each season? Well, I don't feel quite comfortable using only 1 season's worth of data. But I do feel comfortable combining the empirical data and then plugging that into Baseruns to generate single season Linear Weights. I will combine the empirical LW in this manner to form BsR equations used to estimate the single-season LW for post-1953 years:

ML: 1954-1962
ML: 1963-1972 
AL: 1973-1985
NL: 1973-1985
AL: 1986-1992
NL: 1986-1992
AL: 1993-2007
NL: 1993-2007
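One way to picture the grouping above is the Python sketch below. The era boundaries come from the list; everything else is an assumption. In particular, the comment does not say how the seasons inside a group are combined, so the frequency-weighted average in `pooled_weight` is only one plausible choice, and the function names are hypothetical.

```python
# (label, leagues covered, first year, last year), taken from the list above.
ERAS = [
    ("ML 1954-1962", {"AL", "NL"}, 1954, 1962),
    ("ML 1963-1972", {"AL", "NL"}, 1963, 1972),
    ("AL 1973-1985", {"AL"}, 1973, 1985),
    ("NL 1973-1985", {"NL"}, 1973, 1985),
    ("AL 1986-1992", {"AL"}, 1986, 1992),
    ("NL 1986-1992", {"NL"}, 1986, 1992),
    ("AL 1993-2007", {"AL"}, 1993, 2007),
    ("NL 1993-2007", {"NL"}, 1993, 2007),
]


def era_for(league, year):
    """Return the group label for a league-season, or None if outside 1954-2007."""
    for label, leagues, first, last in ERAS:
        if league in leagues and first <= year <= last:
            return label
    return None


def pooled_weight(seasons):
    """Frequency-weighted mean of one event's per-season LW values.

    seasons: iterable of (lw_value, event_count) pairs.
    Assumption: weighting by event count; the source does not specify this.
    """
    seasons = list(seasons)
    total = sum(count for _, count in seasons)
    if total == 0:
        raise ValueError("no events to pool")
    return sum(value * count for value, count in seasons) / total


print(era_for("AL", 1980), era_for("NL", 1960))
```

The pooled value for each event in a group would then stand in for the empirical LW when fitting that group's BsR equation.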

Thanks for listening to my ideas here. Maybe this project will give me an excuse to start my own blog.

---

My goal here was to calculate a set of LW that I could apply to individual hitters. So the uncertainty on my part is probably due to the fact that I'm including categories you really can't apply to individual batters. Although, I like the idea of giving the baserunner credit for PB and WP advances like you did in your 1876-1881 series. I could probably get rid of Balks, Defensive Indifference, and Other Advance, but then the LW and RC wouldn't reconcile without fudging them. Of course, they're not going to reconcile anyway when I apply them since the LW were derived from a dataset that excluded partial and home-half of the ninth and later innings. Once I iron out the details, I'm going to attempt to use Baseruns to estimate Linear Weights back to 1876. So if I don't want to include the 3 categories I listed above (BK,DI,OA), then I probably shouldn't include them in my Full Baseruns formula. Ignoring Balks definitely won't be an issue since they were hardly ever called prior to the Play-By-Play era (1954-2008).

---

Terps, my suggestion would be to just present the results. You seem to have included all of the calculations in the spreadsheet. I would just hide more columns so that only the real good stuff (LW, BsR formula, RE) is visible. For example, in your RE sheet, I would just display the situation, the frequency, and the RE, and hide all of the other columns.

---

Patriot,

Thanks for posting the links. Consider the spreadsheets "beta versions" as I can't quite seem to find a decent way to present the data. I'd be interested to hear your suggestions about how to clean things up a bit.

---

The reason I use Google is storage space. I used to post them on my free Tripod site, but that only has 20 MB of space. Google has (essentially for my purposes) unlimited space. 

Also, I do like that people can view the results without having to download the file. Some people are interested in tinkering, but a lot just want to know the results.

That said, I will be happy to email you the Excel versions of whichever spreadsheet you want if you send me an email.

---

OK, I figured out that I was using the wrong link. I needed to post the link to the published version.

For the Linear Weights:

http://spreadsheets.google.com/pub?key=pzy9IhjJPqas3SLH5qvVTYg&hl=en

For the Run Expectancy data:

http://spreadsheets.google.com/pub?key=pzy9IhjJPqasczX-d6q_eUA&hl=en

---

Aww, any reason you chose to use Google Spreadsheets? I absolutely love being able to investigate the structure of stats and formulas and being able to experiment with them in Excel. It's great for learning both Excel and sabermetrics. That's why I love your sites so much. Darn it.

Oh well then, anyway thanks so much for putting this together.

---

Thanks for the offer, but if I'd wanted a better formula, I'd have put some initiative in on it myself. I do look forward to seeing your spreadsheet though--feel free to leave a link in the comments here (or in a related post) if you'd like.

When I first learned Excel, one of the first big things I did was enter Brock2 by hand from the back of the '85 Abstract (this was a good 12-13 years after it was actually published, mind you). I never did actually play around with it too much...I remember putting in Manny and Thome and my other Indian heroes of the time.

The reason I've used the Willie Davis approach here is that I want to keep a run park factor approach while getting the same ERP whether you figure ERP from the park-adjusted BA/OBA/SLG or figure ERP first and then park-adjust it. I'm not crazy about its application to the questions of moving players into radically different contexts, but the vast majority of the PFs are within +/- .05 of 1, so I'm a lot more confident in it.

---

The "Willie Davis" method has always intrigued me. Before I knew any better, I used it with Runs Created. I eventually set it up to use Baseruns, but never got back to it. In fact, I substituted Baseruns for RC in Brock2, and the projections look a lot more reasonable. Brock2 with RC would give insanely high predictions for high O.P.S. players. 

If you tell me what years you'd like to include in your dataset, I could derive a simple ERP equation for you. I just calculated empirical LW for each season, similar to what Ruane did. Before, I used the entire Retrosheet era for the Run Expectancy chart. Obviously, the sample sizes are small for single season LW, but now I can combine them together any which way I want. I'll post them on Google Docs shortly. I'm still fooling around with the pivot-tables.