Monday, January 27, 2020

Tripod: Ability v. Value

Since I haven't been producing much in the way of new content, I've decided to re-publish some of the articles I posted on my old Tripod website (see link on the side of the page if interested, or just wait for all of the content to show up here). I don't know how long that platform will exist, so the objective is to move stuff over to this blog to preserve it for myself. It's all old - most of it was written between 2001 and 2005, since I switched to posting here when this blog started. When I first started this blog, I had the crazy notion that the content would flow the other way - that I would convert blogposts into "article" format and move them to the Tripod site. This piece may include the only successful such migration in the addendum, which first appeared on this blog. I've since written about most of these topics again here, and I certainly think my later work is better, more correct, etc. than the old stuff. I have not done any editing, so there are typos and "thens" in the places of "thans" and the like. I will be putting "Tripod:" in the title of these re-posts. There will certainly be some statements that didn't age well - see "literal ability" below for a prime example.

This is a topic that gets brought up all the time, both directly and indirectly, in sabermetric circles. If you are discussing the rankings of players, Park Factors, era adjustments, or clutch hitting, this debate will quickly become an issue. Each definition of what we are trying to measure has certain things that should and shouldn't go with it, and so you need to clearly define what you are looking for before you start arguing about it. All of the different definitions are valid and useful, but which one you are most interested in depends on your preferences and opinions. I personally am most interested in performance, with ability and value both being things I like to look at as well. Literal value or ability does not interest me at all (actually, literal ability probably doesn't interest many sabermetricians at all, because that's what scouts are for and they can probably do it better than we can, although not objectively). So here are the five definitions that I consider:

Literal Value

In a literal value method, you are looking to find the actual value of the player to his team. This means that if the player gets lucky in clutch situations or is used by his manager in a way that enhances his value beyond that of a player with identical basic stats who works in a less valuable situation (like being a closer v. setup man), you take this into account. Literal value is best measured through calculating the player's impact on the Win Expectancy of the team, although Run Expectancy methods can also fall under this category. Examples of literal value stats include the Mills brothers' Player Win Average and Tom Ruane's Value Added Batting Runs.

Value

A value method uses conventional statistics, but attempts to do a similar thing with those as the literal value method did: determine how much the player has actually contributed to his team in terms of winning. The basic difference is the lack of a play-by-play database. It is impossible to implement a literal value system for, say, 1934 because the data that is required just doesn't exist. But in this category, if you have data like batting with runners in scoring position, you can include this. The same goes for considering saves instead of just innings and runs allowed. Many value stats will try to reconcile the individual contributions with those of the team. Some examples of value stats are Bill James' Win Shares and Linear Weights modified for the men on base situations as Tango Tiger does.

Performance

Performance is the category that I am most interested in. In a performance method, you try to ascertain the player's performance, based on his basic stats and with no consideration for what game situation they occurred under. A home run in a 15-2 game is just as valuable as a home run in a 2-2 game. A solo home run is equal in value to a game-winning grand slam. This is clearly wrong if you want to determine the player's actual value, but many sabermetricians believe that clutch hitting effects are luck, so the method will correlate better from year to year if you look at all events equally. An appropriate Park Factor to couple with a performance measure is a run-based park factor, although the line between performance and ability is somewhat blurred, so you could also use a specific event park factor. Some examples of performance measures are Pete Palmer's Total Player Rating, Keith Woolner's VORP, and Jim Furtado's Extrapolated Wins.

Ability

An ability method attempts to remove the player from his actual context completely and put him on an average team. The only proper park factors for an ability method are those that deal with each event separately, since a player can be hurt by playing in a park that doesn't fit his skills, like Juan Pierre with the Rockies. Here, you account for that. Other than that, an ability method will wind up being very similar to performance measures. I can't think of a pure ability method that is commonly used.

Literal Ability

Tango Tiger has called this skill, and that is a good description as well. Literal ability is not really quantifiable in sabermetrics. You can attempt to find a player's literal ability in a certain area of his game, like using Speed Score or Isolated Power. But a player's total literal ability is hard to put your finger on. This is what scouts measure: they don't pay attention to the actual results the players put up, but rather how they look while doing it. Actually, if you wanted to do a sabermetric measure for literal ability, there are a host of other factors to consider. For example, I write about the silliness of adjusting for whether a player is right- or left-handed. This is all assuming you are measuring something other than literal ability. In a literal ability sense, a right-handed hitter could be better than a left-handed hitter in terms of their pure skills like speed and power, but be less valuable on the field because of the dynamics of the game.

If we can all decide which of the five we are interested in measuring, a lot of silly arguments can be prevented. People frequently criticize the Park Factors in Total Baseball because they are uniformly applied to all players, regardless of whether they hit lefty or righty or whether they have power or not. In terms of literal value, value, or performance, this is a proper decision. But if you want to measure ability, it is an incorrect Park Factor to use.

Additional Thoughts (added 12/05)

In the above article, I defined two classes of value, "Literal Value" and the regular "Value". Literal value, as I define it, involves only methods that track actual changes in run and win expectancy, like Value-Added Batting Runs or Win Probability Added. Value includes methods which use composite season statistics, but give credit for things like hitting with runners in scoring position or a pitcher who pitches in a lot of high-leverage situations.

I also broke down ability into "Ability" and "Literal Ability". Ability is defined as "theoretical value", i.e. the value that a player would be expected to accumulate, on average, if he played in a given set of circumstances. Usually this would be our expectation for a player in a neutral park, but it could be "ability to help the team win games in Coors Field" or "in 1915" or "batting fifth in a lineup with A, B, C, and D hitting ahead of him and E, F, G, and H hitting behind him". There are all sorts of different ways you could define ability, but the mathematical result you get will be specific for the context you choose.

Literal ability goes even further, and attempts to distill the player's skill in a given area of the game (such as power, or speed, or drawing walks), or his "overall ability". This is very tricky, because nothing happens in a vacuum; everything happens in some sort of context, and so divorcing a metric from context is pretty much impossible. Therefore literal ability is more of a theoretical concept and not a measurable quantity (methods like Speed Score are an attempt to measure literal ability in speed, but of course are acknowledged by their creators as approximations).

Anyway, to generalize, value is backwards-looking, and ability is forwards-looking (or at least what might have happened in a different context given the same production in a given timeframe).

The recent signing of BJ Ryan to a large contract by the Blue Jays has put the issue of when to time the value measurement into my head. Literal value methods like Win Probability Added assign value on a real-time basis. If at a given moment the probability of winning is 60%, and after the next play it increases to 62%, then the player responsible for that play is said to have added .02 wins. So a closer, who pitches at the highest-leverage time, will come out with a higher WPA than a starter who had the same performance in the same number of innings.

But if we are ascertaining value after the fact, why do we have to do it in real time? Suppose that Scott Shields is called in to pitch on the road in the bottom of the seventh inning with a one-run lead. According to Tango Tiger's WE chart, the win probability is .647. He retires the side and at the end of the inning, the probability is .732, so he is +.085. He starts the eighth inning with a probability of .704, retires the side, and leaves with a probability of .842, so he is +.138 for the inning and +.223 for the game. In the bottom of the ninth, it is still a one-run game and Francisco Rodriguez is summoned with a win probability of .806. He finishes it off and of course the win probability is then 1, so he is +.194 wins. So Shields, for two innings of scoreless work, only gets .029 more wins than Rodriguez did in one inning. Is this fair? Sure, if you define value real-time. Rodriguez pitched in a more critical situation and his performance did more to increase the real-time win probability.
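The real-time bookkeeping in that example can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not anyone's official WPA implementation: the win probabilities are the ones quoted above, while the `appearances` layout and the `win_probability_added` helper are names made up for this sketch.

```python
# Win probabilities quoted in the example above (taken from the WE chart cited there).
# Each tuple is (win probability when the half-inning starts, when it ends).
# The dictionary layout and function name are hypothetical, for illustration only.
appearances = {
    "Shields": [(0.647, 0.732), (0.704, 0.842)],   # bottom 7th, bottom 8th
    "Rodriguez": [(0.806, 1.000)],                 # bottom 9th
}


def win_probability_added(half_innings):
    """Sum the real-time change in win probability over a pitcher's half-innings."""
    return sum(end - start for start, end in half_innings)


# Round to three places to match the precision of the chart values.
wpa = {name: round(win_probability_added(h), 3) for name, h in appearances.items()}
gap = round(wpa["Shields"] - wpa["Rodriguez"], 3)

print(wpa)   # Shields about .223, Rodriguez about .194
print(gap)   # about .029
```

Note that the eighth inning starts from .704 rather than the .732 where the seventh ended; what happened in the top of the eighth moved the win probability while Shields was not pitching, so that change is not credited to him.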

But since we are looking backwards, why can't we step back and, now omniscient about what happened in the game, ascertain what value the events actually had? Each out in the game had a win value of 1/27, and since neither allowed any runs or anything else, we don't have to consider that. So Shields should have added 6/27 wins and Rodriguez 3/27. Viewed from the post-game perspective, Shields' performance is much more valuable than Rodriguez's. Now you could also argue that if you took this perspective far enough, any event that didn't lead to a run in the end (like a hit that does not score) has no value. And that's a possible outcome of this school of thought.
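For comparison, here is the post-game accounting as a minimal sketch. It assumes exactly what the paragraph above assumes: a 27-out game in which the pitcher allowed nothing, so each out is simply worth 1/27 of a win. The `post_game_value` helper is a hypothetical name used only for this illustration.

```python
from fractions import Fraction

OUTS_PER_GAME = 27  # nine innings of three outs, as assumed in the text


def post_game_value(outs_recorded):
    """Credit 1/27 of a win per out; only meaningful for scoreless appearances."""
    return Fraction(outs_recorded, OUTS_PER_GAME)


shields = post_game_value(6)     # two scoreless innings
rodriguez = post_game_value(3)   # one scoreless inning

print(shields, float(shields))       # 2/9, roughly .222
print(rodriguez, float(rodriguez))   # 1/9, roughly .111
print(shields / rodriguez)           # Shields gets exactly twice the credit
```

Under this definition Shields is credited with twice as much as Rodriguez, where the real-time method had him ahead by only .029 wins. A fuller version would have to handle runs allowed, which the example sidesteps because neither pitcher gave up anything.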

Now the point is not that real-time value determinations are incorrect or invalid. They are simply a different way of defining literal value. But I would contend that they are not the only way to define literal value. It is one of the easiest to explain and define, and it certainly makes sense. I'm not arguing against it, just arguing that it is not an undeniable choice for what I have called "literal value". Of course, you can define "value" or "literal value" reasonably, in such a way as to make it an obvious choice.
