Walk Like a Sabermetrician: Offensive Percentages and Overall Productivity

This one is really dated, so I’ll just point out that it was written in 2010.

Joe Mauer had the biggest power season of his career in 2009, and it was, not surprisingly, also his best overall offensive campaign. Still, Mauer was a very productive batter in 2006 and 2008 with much less power. How unusual was that? How good a hitter would we expect someone of his relative H/W/P profile to be? Those are the kinds of questions that this discussion touches upon.

I was inspired to look into this by a Twitter conversation I had in mid-to-late June (2010, mind you). I had read a comment on BTF about how the Twins must be terrified by Mauer's power drop this year, given the huge investment they had made in him. This led me to tweet that Mauer had arguably been the best position player in the league in '06 and '08 while hitting for relatively little power, and that his value was not necessarily dependent on power. In addition to the power drop in '10, his walk and singles rates had declined, and that was sapping his value as much as the loss of power.

Someone responded by saying that 2009 was the first season in which Mauer's power relative to the league was equal to his overall value relative to the league. I responded that his ISO relative to the league was much lower than his wRC relative to the league, and this led to a tangent about the slope of ISO versus runs.

However, the question that the conversation brought to mind was the typical relationship between overall offensive value and the share of that value that is derived from hits, walks, and power.

I'm going to look at all players 2000-2009 with 400 or more at bats in a season, and compare their H/W/P to their RG. Then I'm going to run some cringe-worthy regressions (but be comforted by the borderline freak-show nature of the topic itself), and then we can all find something more productive to do.

The strongest correlation between any of H/W/P and RG belongs to H%, which has an r of -.64 (P% is +.54 and W% is +.34). H% is negatively correlated with RG: the higher the proportion of positive linear weight contribution that comes from hits, the lower the RG. (I'm going to stop using that mouthful and start calling it "value", but please remember that what I really mean is positive linear weight contribution.)

The best way I found to estimate H/W/P from RG is to start by simply estimating H%. The best correlation for a simple regression comes from using the natural log of RG:

eH% = -.1883*ln(RG) + .9242

where RG = (TB + .8H + W - .3AB)*.324*25.2/(AB - H)
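The two formulas above can be sketched in Python as follows. This is a minimal illustration, assuming W means walks only (the text doesn't say whether HBP is folded in) and that eH% is returned as a fraction rather than a percentage; the function names are hypothetical.

```python
import math

def runs_per_game(ab: float, h: float, tb: float, w: float) -> float:
    """RG from the text: (TB + .8H + W - .3AB)*.324 runs, scaled to 25.2 outs, with outs = AB - H."""
    runs = (tb + 0.8 * h + w - 0.3 * ab) * 0.324
    outs = ab - h
    return runs * 25.2 / outs

def expected_hit_share(rg: float) -> float:
    """eH% from the log regression, returned as a fraction (0.62 means 62%)."""
    return -0.1883 * math.log(rg) + 0.9242

# eH% falls as RG rises: roughly 72% at RG 3, 62% at RG 5, 49% at RG 10
for rg in (3.0, 5.0, 10.0):
    print(f"RG {rg:4.1f} -> eH% {expected_hit_share(rg):.1%}")
```

Because the regression is in ln(RG), each doubling of RG lowers eH% by the same amount (about 13 percentage points), rather than by a fixed amount per run.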

I'm a little hesitant to even mess with logs in such a trivial application, but it gives a slightly better fit and it does a better job of matching the high RG outliers (read: Barry Bonds). Fretting about those outlier Bonds seasons may be problematic from a statistical perspective, but I think it has some grounding in baseball logic. It makes intuitive sense that H% will be lower as RG increases; the upper bound of observed seasonal BA is around .420. A .420 hitter with little power (.08 ISO for a .500 SLG) and moderate walks (.475 OBA, which means .1 W/AB in this case) will only have a 9 RG. In order to be a historic-level performer, one has to excel in both batting average and secondary average. The log regression seems to strike a balance between the two.
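The arithmetic behind the hypothetical .420 hitter can be reproduced by scaling his rates to a season. The 500 AB figure below is an arbitrary assumption (RG comes out the same at any AB, since the formula is a ratio). OBA here is the simple (H + W)/(AB + W), ignoring HBP and sacrifice flies, which gives about .473 for the round .475 used above.

```python
def runs_per_game(ab, h, tb, w):
    # RG per the formula in the text: (TB + .8H + W - .3AB)*.324 runs, scaled to 25.2 outs (outs = AB - H)
    return (tb + 0.8 * h + w - 0.3 * ab) * 0.324 * 25.2 / (ab - h)

# Hypothetical season matching the example: .420 BA, .500 SLG (.080 ISO), .1 W/AB
ab = 500
h = 0.420 * ab    # about 210 hits
tb = 0.500 * ab   # 250 total bases
w = 0.10 * ab     # 50 walks

ba = h / ab
slg = tb / ab
iso = slg - ba
oba = (h + w) / (ab + w)  # walks only; HBP and SF ignored
rg = runs_per_game(ab, h, tb, w)

print(f"BA {ba:.3f}  OBA {oba:.3f}  SLG {slg:.3f}  ISO {iso:.3f}  RG {rg:.2f}")
```

Even with a batting average at the top of the observed range, the thin secondary average holds this line to an RG just under 9.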

After H% is removed, it's hard to find much of a correlation between RG and P%/W%. I figured the percentage of non-hit value contributed by power (P%/(P% + W%)), and its r with RG is just +.06. So I decided to keep it simple and use the average for everyone: 63% of non-hit value comes from P%, 37% from W%:

eW% = (1 - eH%)*.37
eP% = (1 - eH%)*.63
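Putting the pieces together, a full H/W/P estimate for a given RG can be sketched as below. This assumes the coefficients and the flat 63/37 split exactly as stated in the text; the function and constant names are hypothetical, and since the regression was fit on 400+ AB seasons from 2000-2009, values far outside that population's RG range shouldn't be trusted.

```python
import math

# Flat split of non-hit value from the text (no meaningful correlation with RG was found)
NON_HIT_POWER_SHARE = 0.63
NON_HIT_WALK_SHARE = 0.37

def estimate_hwp(rg: float) -> tuple[float, float, float]:
    """Return (eH%, eW%, eP%) as fractions that sum to 1 for a given RG."""
    e_h = -0.1883 * math.log(rg) + 0.9242
    e_w = (1 - e_h) * NON_HIT_WALK_SHARE
    e_p = (1 - e_h) * NON_HIT_POWER_SHARE
    return e_h, e_w, e_p

for rg in (3, 4, 5, 6, 7, 8, 9, 10):
    e_h, e_w, e_p = estimate_hwp(rg)
    print(f"RG {rg:>2}: eH% {e_h:.0%}  eW% {e_w:.0%}  eP% {e_p:.0%}")
```

Because eW% and eP% are fixed fractions of 1 - eH%, the three estimates always sum to 100%, and all of the variation with RG flows through the single eH% regression.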

These estimators work pretty well for players when grouped by RG. In the chart below, "2" indicates players with RG from 2 to 2.99; "4" from 4 to 4.49; "4.5" from 4.5 to 4.99; "7" from 7 to 7.99; and so on:

Really, I could have dispensed with the estimators and just used the chart to estimate H/W/P for players of different ability levels, but where would be the fun in that?

Here is Joe Mauer's actual and estimated H/W/P breakdown for 2005-2009 and the first half of 2010 (which is current, as of the moment I actually wrote this):

To this point, Mauer's 2010 has been just about his worst offensive season (without an adjustment for league scoring context). Mauer has always had a higher H% than the average player with his RG. Even with his 2009 power surge, he had a lower P% than expected (24% actual versus 31% expected).

Mauer's career high P% is 24%. That is the typical value for a player with a RG of 4.8-5.3. So even in Mauer's best power season, his P% is below a typical P% for a player with a RG lower than that in Mauer's worst overall season (yes, that is an awful sentence).
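The link between a 24% P% and an RG of roughly 5 can be checked against the estimators by solving eP% = (1 - eH%)*.63 for RG. This is a minimal sketch, assuming the fitted coefficients and the flat 63% power share from the text; the function name is hypothetical. Under those assumptions, 24% maps to an RG of about 5.06, inside the 4.8-5.3 band.

```python
import math

def rg_for_expected_power_share(target_p: float) -> float:
    """Invert eP% = (1 - eH%)*.63, with eH% = -.1883*ln(RG) + .9242, to solve for RG."""
    e_h = 1 - target_p / 0.63                 # eH% implied by the target eP%
    return math.exp((0.9242 - e_h) / 0.1883)  # undo the log regression

rg = rg_for_expected_power_share(0.24)
print(f"eP% of 24% corresponds to an RG of about {rg:.2f}")
```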

While Mauer has an unusual profile, I wouldn't describe it as extremely unusual. The four largest deltas between H% and eH% in 2000-09 all belong to Ichiro Suzuki, with actual H% over 75% against expectations in the high 50s. Juan Pierre and Placido Polanco are two other players whose names pop up on that list. Limiting the group to players with RG > 7, Mauer is the only batter whose name appears twice in the top ten deltas.

Looking at the P% deltas for players with RG > 7, Mauer's 2006 was the largest (19% actual, 29% expected), and his 2009 was tenth. Barry Bonds' 2002 even manages to rank fifth despite 45 homers, because 22% of his value came from walks.

Whether Mauer is able to approach the value projection implied by his contract without retaining some of his power gains is a question best left for the projection mavens. However, just looking at his career to this date, Mauer's power has always made up much less of his value than a typical player at his level of offensive productivity, and his 2009 was no exception (albeit slightly less extreme). At least to this point, Mauer has been one of the most valuable hitters in the game while relying on power to the same extent as a league-average performer.