Wednesday, December 08, 2021

Rate Stat Series, pt. 15: Mixing Frameworks

Thus far I have employed what I’ve described as a “puritanical” approach to matching run estimator, denominator, and win conversion for each of the three frameworks for evaluating an individual’s offensive performance. While I think my logic for this approach is sound, I do not think it is necessarily wrong to mix components in a different manner than I’ve described. This will be a brief discussion of which of these potential hybrids make more sense than others, and a few issues to keep in mind if you choose to do so.

For the player as a team framework, there are many places in the process in which a batter’s value (at least to the extent that we can define and model it) as a member of a real team is distorted. At the very beginning of the process, a dynamic run estimator is applied directly to individual statistics. This creates distortion. Then the rate stat is runs/out; this doesn’t create a tremendous amount of distortion, as runs/out is a defensible if not perfect choice for an individual rate stat even if you don’t use the player as a team framework. Then if we convert to win value, we create a tremendous amount of distortion by essentially multiplying the run distortion by nine – instead of just mis-estimating an individual’s run contribution, we compound the problem by assuming that the entire team hitting like him would shape the run environment to be something radically different from the league average.

This is an opportune juncture to make a point that I should have made earlier in the series – for league average hitters, the distortion will be very small, and the differences across all of the various methodologies we’ve discussed will be lesser. I have focused in this series on a small group of hitters, basically the five most productive and five least productive in each league-season. These are of course the hitters most impacted by different methodological decisions. A league average hitter would by definition have Base Runs = LW_RC = TT_BsR (at least as we’ve linked these formulas in this series), would by definition have 0 RAA or 0 WAA, would by definition have a 100 relative rate, regardless of which approach you take. This series, and more generally my sabermetric interests, tend to focus on the extreme cases, and evaluating methods with an eye to applicability to a wide range of performance levels. Additionally, extreme players are the ones people in general tend to care the most about – you can imagine people debating who had the better offensive season, Frank Robinson in 1966 or Frank Thomas in 1994. I cannot imagine many people doing the same for Earl Battey and Gary Gaetti.

We could avoid compounding the issues in the player as a team approach, but why bother? The framework is inferior to linear weights or theoretical team in every possible way, except one could argue ease of calculation gives it an advantage over TT. The only value to be had in evaluating a player as a team is as a theoretical exercise, and if you lose commitment to that theoretical exercise it ceases to have any value at all.

For the theoretical team framework, you could just calculate TT_BsR, and then treat it in the same way as linear weights, or calculate TT_BsRP and treat it in the same way as R+. However, it would be pretty silly to go through the additional effort needed to calculate the TT run estimates instead of their linear weight analogues only to use them in the same way. Unlike in the case of player as team, there is no argument to be made here that you would be making the results more reasonable by doing so, as the TT estimates can be combined with team-appropriate rate stat denominators and team-appropriate conversions from runs to wins. Here the result of mixing frameworks would be extra work coupled with less pure results.

The only mixture of frameworks that makes sense, then, is to mix the run estimation components of the linear weights framework with the rate stat and win conversion components of the theoretical team framework. A logical path that might defend this approach would be: It is questionable whether the valuations of tertiary offensive contributions claimed by the theoretical team approach are accurate or material. Thus our best estimate of a player’s run contribution to a theoretical team remains his linear weights RC or RAA. When it comes to win estimation, we are on much firmer ground in understanding how team runs and runs allowed translate to wins than we are in measuring individual contributions to team runs. We shouldn’t refuse to use this knowledge in the name of methodological consistency, but rather we should use the best possible estimates for each component of the framework. That means using the full Pythagenpat approach coupled with linear weights run estimates.

Convinced? I’m not, but let me walk through an example of how we could apply this hybrid approach to the Franks. We can start with our wRC estimate, which we will now use in place of TT_BsRP as “R+” going forward for this hybrid linear-theoretical team framework. Then we can use any of our TT rate stats – I’ll show R+/O+ rather than R+/PA+ here, as I think it’s the former that might serve to make this hybrid framework an attractive option. R+/O+ allows us to express the individual and team rate on the same basis and, better yet, to do so while using the most fundamental of rate stats (R/O) as that basis.

O+ remains equal to PA*(1 – LgOBA), and these relative R+/O+ figures are the same as our relative R+/PA, which makes sense – the numerators are the same, and the only difference in the denominators is multiplying PA by (1 – LgOBA). So an alternative way of expressing the relationship is:

R+/O+ = (R+/PA)/(1 – LgOBA)
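As a quick check on the identity above, here is a minimal Python sketch. The input values are invented for illustration and are not taken from either Frank's season; the function names are my own.

```python
def o_plus(pa, lg_oba):
    """Team-equivalent outs: PA * (1 - LgOBA)."""
    return pa * (1.0 - lg_oba)


def r_plus_per_o_plus(r_plus, pa, lg_oba):
    """R+/O+ computed directly from its definition."""
    return r_plus / o_plus(pa, lg_oba)


# Hypothetical inputs (not from the post)
r_plus, pa, lg_oba = 120.0, 650, 0.330

direct = r_plus_per_o_plus(r_plus, pa, lg_oba)
via_pa = (r_plus / pa) / (1.0 - lg_oba)  # the alternative expression

print(round(direct, 4), round(via_pa, 4))
```

Both routes give the same number, which is all the identity claims: the only thing separating R+/PA from R+/O+ is the constant factor (1 – LgOBA).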

I will skip some steps here, since they were all covered in the last installment – no need to convert this to a W% and then convert back to a relative adjusted R+/O+.  

TT_R/O = (R+/O+)*(1/9) + LgR/O*(8/9)

TT_x = ((TT_R/O)*LgO/G + LgR/G)^.29

TT_WR = ((TT_R/O)/Lg(R/O))^TT_x

RelAdj R+/O+ = (TT_WR^(1/r) - 1)*9 + 1
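The four formulas above chain together naturally, so here is a hedged Python sketch of them. The league inputs are invented for illustration. The exponent r is not defined in this installment; I pass it in as a parameter and, purely as an assumption, set it to the Pythagenpat exponent of the reference environment.

```python
def rel_adj_r_per_o(r_per_o_plus, lg_r_per_o, lg_o_per_g, lg_r_per_g, r):
    """Relative adjusted R+/O+ via the theoretical team steps above."""
    # The batter fills 1/9 of the lineup; the other 8/9 is league average.
    tt_r_per_o = r_per_o_plus * (1 / 9) + lg_r_per_o * (8 / 9)
    # Pythagenpat exponent: (team R/G + opponent R/G)^.29
    tt_x = (tt_r_per_o * lg_o_per_g + lg_r_per_g) ** 0.29
    # Win ratio of the theoretical team
    tt_wr = (tt_r_per_o / lg_r_per_o) ** tt_x
    # Restate the win ratio as a run ratio using r, then undo the 1/9 dilution
    return (tt_wr ** (1 / r) - 1) * 9 + 1


# Hypothetical league (not from the post)
lg_r_per_g, lg_o_per_g = 4.5, 25.5
lg_r_per_o = lg_r_per_g / lg_o_per_g
r = (2 * lg_r_per_g) ** 0.29  # assumption: reference exponent equals the league's

average = rel_adj_r_per_o(lg_r_per_o, lg_r_per_o, lg_o_per_g, lg_r_per_g, r)
star = rel_adj_r_per_o(0.30, lg_r_per_o, lg_o_per_g, lg_r_per_g, r)
print(round(average, 4), round(star, 4))
```

A league average hitter comes out at a relative rate of 1 (i.e. 100) no matter what r is, since his theoretical team's win ratio is exactly 1.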

It might be helpful to take a step back and look at our results for the relative adjusted metric (whether R+/PA or R+/O+) for each of the four options we’ve considered, which are:

A: linear run estimate and fixed RPG based on league average (final rate based on R+/PA)

B: linear run estimate and dynamic RPG based on player’s impact on team (final rate based on R+/PA)

C: linear run estimate and Pythagenpat theoretical team win estimate (hybrid approach discussed in this installment; final rate based on R+/O+)

D: theoretical team run estimate (full theoretical team approach; final rate based on R+/O+)

I included a third row which shows the percentage by which Thomas’ figure exceeded Robinson’s. The TT (D) approach maximizes each player’s value and also the difference between them. The hybrid approach (C) falls in between the pure linear approach (A) and the TT approach (D). One thing that I did not fully expect to see is that the linear approach that varies RPW based on the estimated RPG of a team with the player added (B) produces a lower estimate than any of the other approaches, and is the outlier of the bunch. I didn’t make enough of this in part 13, but we are actually better off assuming that the individual has no impact on RPW than adjusting RPW based on his own impact on the theoretical team’s RPG.

Note that when we translate to a reference league, we are fixing each theoretical team's RPG to the same level. In reality, the Frank Thomas TT will have a higher RPG than the Matt Walbeck TT for any given reference environment. However, this is not an issue, because the runs value we're reporting is not real. It is intended to be an equivalent run value that reflects the player's win contribution for a common frame of reference. When we use the full Pythagenpat approach, the theoretical team's winning percentage is preserved, so the runs are now an abstract quantity and do not need to tie out to any actual number of runs in this environment. In this sense, when we convert TT to a win-equivalent rate, we're doing something similar to what I said the linear weights framework was doing - we're making the run environments equal after adding the player. The difference is that in this case we are capturing the disparate impact of the players on team wins first, then restating a win ratio as an equivalent run ratio given the assumption of a stable run environment. Thus the runs are an abstraction, but the value they represent is preserved.

When we do the same with a dynamic RPW approach (i.e. when we use the Palmerian approach of adding the batter's RAA/G to the RPG and then calculating RPW), we run into difficulties because while we have fixed a WAA total, and can then translate that WAA to an equivalent for the reference environment, we have not taken into account that the batter would need to contribute more runs to actually produce the same number of wins. This is not a problem for the full Pythagenpat approach because we used a run ratio that modified the batter's abstract runs in concert with the run environment. 

Now there is a way we could address this, but it results in a mess of an equation that I'm not sure has an algebraic solution (whether it does or not, I have no interest in trying to solve it). Basically, we can solve for the RAA that would be needed to preserve the batter's original WAA in the reference environment. To do this, we need to remember to apply the reference PA adjustment. For this application, I think it’s easiest to apply directly to the batter’s original WAA, so WAA*Ref(PA/G)/Lg(PA/G):

We could then say that adjWAA = X/(2*(refRPG + X/G)^.71) where X = the batter's RAA and G could be the batter's actual games or some kind of PA-equivalent games as long as we’re consistent with what was used in the original WAA calculation. I used the Goal Seek function in Excel to solve the following equations for the Franks:

for Robinson: 8.552 = X/(2*(8.83 + X/155)^.71), X = 85.555

for Thomas: 7.217 = X/(2*(8.83 + X/113)^.71), X = 71.176
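For anyone without Excel handy, the same kind of numeric solve can be sketched in a few lines of Python. This swaps Goal Seek for plain bisection, which is my choice and not what was used above; the inputs are the Thomas figures from the equation, and the search bracket of 0 to 300 runs is an assumption that suits a positive WAA.

```python
def waa_from_raa(x, ref_rpg, games):
    """adjWAA implied by RAA = x, with RPW = 2 * (refRPG + x/G)^.71."""
    return x / (2.0 * (ref_rpg + x / games) ** 0.71)


def solve_raa(adj_waa, ref_rpg, games, lo=0.0, hi=300.0, tol=1e-9):
    """Bisection for X. Assumes the answer lies in [lo, hi] and that
    waa_from_raa is increasing there, which holds for positive x."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if waa_from_raa(mid, ref_rpg, games) < adj_waa:
            lo = mid  # too few wins at mid, so X must be larger
        else:
            hi = mid
    return (lo + hi) / 2.0


# Thomas: adjWAA = 7.217, refRPG = 8.83, G = 113
x_thomas = solve_raa(7.217, 8.83, 113)
print(round(x_thomas, 2))
```

Plugging the solved X back into waa_from_raa should return the target WAA, which is an easy way to confirm the solve converged.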

With these new estimates of RAA, we can get relative adjusted R+/PA by first dividing by each batter’s PA, then adding in the reference R/PA, and then dividing by the reference R/PA:
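The arithmetic in that last step is short enough to sketch in Python. The RAA is the Thomas solution quoted above; the PA and reference R/PA are placeholder values, since this installment does not list them, and PA should be on whatever basis matches the RAA.

```python
def rel_adj_r_per_pa(raa, pa, ref_r_per_pa):
    """(RAA/PA + reference R/PA) / reference R/PA."""
    return (raa / pa + ref_r_per_pa) / ref_r_per_pa


# PA = 500 and reference R/PA = 0.115 are hypothetical placeholders
example = rel_adj_r_per_pa(71.176, 500, 0.115)
print(round(example, 3))
```

A batter with 0 RAA comes out at exactly 1, as a league average hitter should.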

This approach allows us to more correctly calculate a win-equivalent rate stat for the linear weights framework when allowing RPW to vary based on the batter’s impact on his team’s run environment. However, given that I don't know how to solve for X algebraically, and my pre-existing philosophical issues with using this approach with the linear weights framework to begin with, I think that at this point there are two better choices:

1. If you want to maintain methodological purity, stick with the fixed RPW for the linear weights framework like I advocate

2. Use the hybrid framework, which embraces the "real" run to win conversion and actually makes this math easier

With "B" out of the picture, what we see for A, C, and D makes sense in relation to each other. C produces a lower final relativity for great hitters like the Franks because it recognizes that when they inflate their team's run environments, RPW increases. D is higher than both because the initial run estimate is higher, due to using TT_BsR rather than LW, and thus giving each batter credit for their estimated tertiary contributions. And our revised linear weight approach falls somewhere in between.

Of course, it’s possible that I’m wrong about this, and the hybrid or theoretical team approaches that make use of Pythagenpat are overstating the impact of the player on the runs to win conversion. I can’t prove that I am right about this, but I would offer two rejoinders:

1. The full Pythagenpat approach produces the same W% for the theoretical team across different environments by definition, and that is the most important number at the end of the line if we are trying to build a win-equivalent rate stat.

2. We should expect the results from the linear and the Pythagenpat approach to be similar (which they are after making proper adjustments), as our RPW formula is consistent with Pythagenpat for .500 teams. While the relationship between the RPW formula and Pythagenpat will fray as we insert more extreme teams, the theoretical team approach doesn’t produce any extremes for even great hitters like the Franks. For example, if we use the Palmer approach, Thomas’ 77.8 RAA in 113 games takes an average 1994 AL team that scores 5.23 R/G up to 5.92 R/G, which would rank second in the league, even with the Yankees. This is not an extreme team performance when it comes to applying the win estimation formulas: such a team would be expected to have a .5623 W% using the RPW formula and a .5620 W% using Pythagenpat. So we shouldn’t expect the results to diverge too much when applying the approaches to derive win-equivalent rate stats.
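The Thomas comparison in the second point can be reproduced with a short Python sketch. It uses the RPW form from earlier in this installment, 2*RPG^.71, and the rounded inputs quoted in the text, so I would not expect the last digit to match the figures above.

```python
def rpw(rpg, z=0.29):
    """Runs per win: 2 * RPG^(1 - z), i.e. 2 * RPG^.71."""
    return 2.0 * rpg ** (1.0 - z)


def wpct_rpw(r_g, ra_g):
    """W% from run differential per game divided by RPW."""
    return 0.5 + (r_g - ra_g) / rpw(r_g + ra_g)


def wpct_pythagenpat(r_g, ra_g, z=0.29):
    """W% from Pythagenpat with exponent RPG^z."""
    x = (r_g + ra_g) ** z
    return r_g ** x / (r_g ** x + ra_g ** x)


lg_rg = 5.23                   # 1994 AL R/G from the text
team_rg = lg_rg + 77.8 / 113   # average team plus Thomas' RAA per game

linear = wpct_rpw(team_rg, lg_rg)
pyth = wpct_pythagenpat(team_rg, lg_rg)
print(round(team_rg, 2), round(linear, 4), round(pyth, 4))
```

Both estimates land near .562 and sit within a fraction of a point of each other, which is the point being made: a team like this is nowhere near extreme enough for the two win estimators to part ways.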

And with that, we have come to the end of everything that I wanted to say about rate stats. I will close out the series with one final installment that attempts to briefly summarize my main points with limited math.

Wednesday, December 01, 2021

Hitting by Position, 2021

The first obvious thing to look at is the positional totals for 2021, with the data coming from Baseball-Reference. “MLB” is the overall total for MLB, which is not the same as the sum of all the positions here, as pinch-hitters and runners are not included in those. “POS” is the MLB totals minus the pitcher totals, yielding the composite performance by non-pitchers. “PADJ” is the position adjustment, which is the position RG divided by the total for all positions, including pitchers (but excluding pinch hitters). “LPADJ” is the long-term offensive positional adjustment, based on 2010-2019 data (see more below). The rows “79” and “3D” are the combined corner outfield and 1B/DH totals, respectively:

Having reviewed this data annually for about fifteen seasons, I would strongly caution about drawing any conclusions about shifting norms from single season results, which are better treated as curiosities (and as strong warnings against using single-year or other short time period averages in developing any kind of positional adjustment for use in a player value system). The most interesting curiosity then is shortstops outhitting left fielders, essentially even with third basemen and even DHs. The bumper crop of free agent shortstops has accentuated recognition of the current strength of the position, and their collective 2021 offensive performance lives up to the hype.

However, I think the most interesting group is the pitchers. The lowest PADJ ever recorded by pitchers was -5 in 2018; they rebounded to 0 in 2019 and now fell back to -4 after taking a year off. This on its face should not be surprising, but remember that pitcher performance was buoyed by Shohei Ohtani. What would it look like sans Ohtani?

While Ohtani had only 65 PA as a pitcher (1.48% of all pitcher PA), he accounted for 6% of their doubles, 5% of their runs, RBI, and walks, and 17% of their home runs. Ohtani’s hitting as a pitcher paled in comparison to what he did as a DH, although he still created runs at a 13% better rate than the MLB non-pitcher positional average. Without Ohtani, pitchers would have set a new low with a -6 PADJ. It’s interesting to consider that if the DH is made universal in the new CBA, future seasons’ data will likely show pitchers combine for competent offensive performances.

My next table is usually total pitcher performance for NL teams, but such a display would be incomplete without including the Angels. All team figures from this point forward in the post are park-adjusted. The RAA figures for each position are baselined against the overall major league average RG for the position, except for left field and right field which are pooled:

The teams with the highest RAA by position were:

C--SF, 1B--SF, 2B--LA, 3B--ATL, SS--SD, LF--CIN, CF--BAL, RF--PHI, DH--LAA

Those are pretty self-explanatory, although it’s fun that Shohei Ohtani is essentially responsible for two of his team’s positions ranking #1--let’s see Buster Posey or Brandon Belt do that!

I find it more entertaining to gawk at the teams that had the lowest RAA at each position (the listed player is the one who started the most games at the position, which does not always mean they were most responsible for the dreadful performance):

Hunter Dozier pulled the reverse Ohtani as the leading starter at third and right; he hit .213/.286/.390 with 3.7 RG, so it wasn’t really his fault. It’s kind of sad to see Miguel Cabrera leading the Tigers’ DHs to oblivion, and it kind of was his fault. Cabrera’s overall line (.256/.321/.386 for 4.2 RG) wasn’t that bad, but in 180 PA as a first baseman he posted an 844 (unadjusted) OPS while in 335 PA as a DH his OPS was just 617.

The next table shows the correlation (r) between each team’s RG for each position (excluding pitchers) and the long-term position adjustment (using pooled 1B/DH and LF/RF). A high correlation indicates that a team’s offense tended to come from positions that you would expect it to:
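For readers who want to run the same check on their own data, here is a minimal Python sketch of the correlation. The LPADJ and team RG values below are invented for illustration and are not the figures from the spreadsheet; the pooling is represented by giving 1B/DH and LF/RF the same adjustment.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Order: C, 1B, 2B, 3B, SS, LF, CF, RF, DH (all values hypothetical)
lpadj = [92, 113, 98, 105, 96, 104, 102, 104, 113]
team_rg = [3.9, 5.6, 4.4, 4.9, 4.1, 4.8, 4.6, 5.0, 5.3]

r_team = pearson_r(team_rg, lpadj)
print(round(r_team, 2))
```

A team whose production lines up with the usual positional pecking order gets a high r; a team like the one described below, with its offense coming from up-the-middle spots, would come out negative.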

I didn’t dig through years of these posts to check, but Kansas City’s negative correlation may be the lowest I’ve ever seen. The Royals’ only above-average positions were catcher, second base, and shortstop.

The following tables, broken out by division, display RAA for each position, with teams sorted by the sum of positional RAA. Positions with negative RAA are in red, and positions that are +/-20 RAA are bolded:

A few notes:

* Only five of the fifteen AL teams had positive RAA from their position players, while each NL division had three teams with positive RAA.

* Baltimore’s infield production was the worst in the majors at -91 runs, and only Cedric Mullins’ center field prevented them from having below average performance at all positions. Texas was saved from the same fate only by their right fielders, who were just +1 run; the Rangers’ poor production is impressive for how consistently bad it was across the board.

* Their Texas neighbors were the opposite in displaying consistently good production across the board; Houston’s outfield was the best in the majors at +63 runs, while their infield was second in the AL to Toronto. 

* San Francisco and Los Angeles were nearly mirror images of each other; the Dodgers narrowly edged the Giants for the top infield in MLB (+89 to +87), while their outfielders were both slightly above average (+5 to +4). LA catchers were outstanding (their 25 RAA tied for second in the majors with the Blue Jays and ChiSox), but the Giants were better at 32 RAA, giving them a three run edge for the majors’ top total positional RAA.

A spreadsheet with full data is available here.