Tuesday, March 19, 2013

Calculus

Those of us who are well-versed (or at least like to delude ourselves into thinking we are well-versed) in mathematics have doubtless encountered someone who uses "calculus" as shorthand for "complicated math that I don’t understand". For many people, calculus is the last branch of mathematics they were exposed to in school (the progression might generally be arithmetic, algebra, geometry, trigonometry, "pre-calculus"), and it is thus a convenient term under which to lump any sort of math beyond one’s grasp. (I do realize that there are broad dictionary definitions of calculus that extend beyond the specific branch of math to which the term generally applies.)

These sorts of statements are harmless enough. While I certainly don’t want to encourage even more ignorance about math than already exists in popular discourse, a flippant, factually incorrect reference to calculus never really hurt anything. On a particularly boring conference call, I’ve even found one to be a welcome bit of levity for those of us in the room who know a little calculus.

It can be a little more obnoxious when "calculus" is brought out as a jab against those using quantitative analysis; in this case, of course, I’m thinking about sabermetrics. Opponents of sabermetrics sometimes make snide references to calculus in the same vein as references to slide rules or spreadsheets. Still, these are easy enough to brush off or laugh at: bragging about one’s own ignorance is not impressive.

What prompted this post was not a calculus barb directed at sabermetrics, but one of the reactions to such a barb in a Baseball Think Factory thread: a flat-out statement that "calculus has no place in baseball statistics". On one hand, I really should just ignore this. The statement itself is so outlandish as to be difficult to respond to. It’s akin to saying that "cymbals have no place in music" or that "rice has no place in one’s diet". Calculus is obviously not used directly by most sabermetricians, and one can certainly practice high-level sabermetrics without using any calculus. But to write off an entire branch of mathematics as having no possible use in the discipline is absurd.

At this point I need to issue a disclaimer regarding my own use of calculus. I am by no means an expert on the topic; my calculus education consists of two years of coursework in high school and college. Calculus is an extensive branch of math, and what I know only scratches the surface. The sabermetric applications I use may likewise only scratch the surface of what is possible, but I’m not well positioned to speculate on what other applications might exist.

Fundamentally, though, the type of calculus that I use is nothing more complex than examining changes in functions. Any time you have a mathematical function that varies according to one or more independent variables, a change in those variables results in a change in the output of the function. Calculus provides a systematic way to study that change.

It should be noted that calculus does not inherently involve complex computations. It certainly can, but for many simple functions, taking a derivative is a piece of cake. It is certainly easier to take the derivative of the function Y = 3X^2 + 6X - 2 with respect to X than it is to divide 309 by 15, yet the former is something done by college students and the latter something done by fourth graders. Granted, understanding why the derivative of that function is 6X + 6 requires a deeper level of knowledge (by “why” I mean the reasoning behind the rules of differentiation, not the application of those rules), and one can easily cite functions that are impossible to differentiate cleanly. Still, the level of difficulty need not be very high at all.
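As a small illustration (a Python sketch of my own, not anything drawn from a particular sabermetric source), the power rule gives 6X + 6 for the example function, and a purely numerical estimate of the rate of change, made by nudging X slightly in each direction, agrees with it. The helper names f, f_prime, and central_difference are hypothetical.

```python
def f(x):
    """The example function from the text: Y = 3X^2 + 6X - 2."""
    return 3 * x**2 + 6 * x - 2


def f_prime(x):
    """Derivative by the power rule: d/dX (3X^2 + 6X - 2) = 6X + 6."""
    return 6 * x + 6


def central_difference(g, x, h=1e-5):
    """Numerical estimate of g'(x) from a small symmetric change in x."""
    return (g(x + h) - g(x - h)) / (2 * h)


# Compare the rule-based derivative with the numerical estimate at a few points.
for x in (-2.0, 0.0, 1.5, 10.0):
    print(x, f_prime(x), round(central_difference(f, x), 6))
```

For a quadratic the symmetric difference is exact apart from floating-point rounding, so the two columns match; the point is simply that the derivative is a rate of change, not a mysterious computation.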

If you don’t think there are any applications of studying rates of change in functions to sabermetrics, then you must not use any mathematical functions in your practice of sabermetrics. (Which would mean that you would not really be a sabermetrician at all, but hey, let’s not get judgmental about it.) In fact, questions of this nature come up all the time. For example, how do OPS and OPS+ value singles relative to walks? While there are a number of ways to infer an answer to such a question (many of which involve pseudo-calculus, such as the "+1 method"), it can be answered definitively and precisely using calculus.
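Here is a rough Python sketch of that calculation under stated assumptions: OBA is simplified to (H + W)/(AB + W) with hit batsmen and sacrifice flies ignored, SLG is TB/AB, OPS+ is taken as 100 * (OBA/lgOBA + SLG/lgSLG - 1) with league values held fixed, and the team totals and league figures are made-up numbers for illustration only. A single adds one to H, AB, and TB, while a walk adds one to W, so the quotient rule gives the partial derivatives directly. The "+1 method" is included for comparison.

```python
# Hypothetical team totals, made up for illustration (not real data).
AB, H, TB, W = 5500, 1450, 2300, 500


def ops_components(ab, h, tb, w):
    """Simplified OBA and SLG: HBP and sacrifice flies are ignored."""
    oba = (h + w) / (ab + w)
    slg = tb / ab
    return oba, slg


def ops_partials(ab, h, tb, w):
    """Partial derivatives of OBA and SLG with respect to singles and walks.

    A single adds 1 to H, AB, and TB; a walk adds 1 to W only.
    Returns (dOBA/dS, dSLG/dS, dOBA/dW, dSLG/dW).
    """
    pa = ab + w
    d_oba_single = (ab - h) / pa**2   # quotient rule on (H + W) / (AB + W)
    d_slg_single = (ab - tb) / ab**2  # quotient rule on TB / AB
    d_oba_walk = (ab - h) / pa**2     # same result: numerator and denominator both rise by 1
    d_slg_walk = 0.0                  # walks do not appear in SLG
    return d_oba_single, d_slg_single, d_oba_walk, d_slg_walk


def single_to_walk_ratio(ab, h, tb, w, lg_oba=None, lg_slg=None):
    """Value of a single relative to a walk in OPS, or in OPS+ if league values are given.

    OPS+ is taken as 100 * (OBA/lgOBA + SLG/lgSLG - 1); the constant factor
    of 100 cancels in the ratio, and league values are held fixed.
    """
    s_oba, s_slg, w_oba, w_slg = ops_partials(ab, h, tb, w)
    if lg_oba is None or lg_slg is None:
        return (s_oba + s_slg) / (w_oba + w_slg)
    return (s_oba / lg_oba + s_slg / lg_slg) / (w_oba / lg_oba + w_slg / lg_slg)


def plus_one_ratio(ab, h, tb, w):
    """The "+1 method": add one single or one walk and compare the change in OPS."""
    base = sum(ops_components(ab, h, tb, w))
    after_single = sum(ops_components(ab + 1, h + 1, tb + 1, w))
    after_walk = sum(ops_components(ab, h, tb, w + 1))
    return (after_single - base) / (after_walk - base)


print(f"OPS, calculus:  {single_to_walk_ratio(AB, H, TB, W):.3f}")
print(f"OPS, +1 method: {plus_one_ratio(AB, H, TB, W):.3f}")
print(f"OPS+ (hypothetical league .325/.418): "
      f"{single_to_walk_ratio(AB, H, TB, W, lg_oba=0.325, lg_slg=0.418):.3f}")
```

With these invented totals, both approaches put a single at roughly 1.94 walks in OPS. Under this simplified OBA, a single and a walk move OBA by exactly the same amount, so the gap between them comes entirely from SLG, and OPS+ narrows that gap whenever league SLG exceeds league OBA.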

Sometimes, the lengths to which authors of sabermetric articles will go to avoid using calculus defy belief. Take for example this article published at The Hardball Times. I do not wish to make a punching bag of the authors of this piece, but there’s no need for me to make up straw men when a perfect example was published so recently.

You may read this and say to yourself, "Sure, I concede that calculus is useful for understanding how metrics behave and how they weigh various inputs. But those aren’t baseball questions--those are just math questions involving baseball metrics. Understanding how OPS values doubles tells us something about OPS, but OPS is itself a construct that provides a simplified model of the effectiveness of a baseball offense. Understanding how OPS works does not actually teach us anything about baseball."

To this, my rejoinder is that we create metrics to further our understanding of baseball and to distill the things that we know into a usable format. We know that there is a relationship between runs and wins, and Pythagorean formulas are a way of formally expressing that relationship. If we establish that the Pythagorean formula is a useful model that captures some degree of the real baseball relationship between runs and wins, then by extension we can learn more about that baseball relationship by understanding the equation. If we can’t learn anything about baseball from studying the behavior of the equation, then the equation is by definition not useful and we need to go back to the drawing board.
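To make that concrete, here is a Python sketch assuming the simplest Pythagorean form, W% = R^x / (R^x + RA^x), with a fixed exponent x; R and RA are runs scored and allowed per game, and the exponents 2 and 1.83 are just example values. Differentiating with the quotient rule gives dW%/dR = x R^(x-1) RA^x / (R^x + RA^x)^2. For an average team with R = RA = r, that reduces to x / (4r), so the marginal runs per win is 4r / x. With x = 2, that is 2r, which equals the total runs per game scored by both teams. So the derivative turns a formula into a baseball statement.

```python
def pythag_wpct(r, ra, x=2.0):
    """Pythagorean winning percentage with a fixed exponent x (an assumption here)."""
    return r**x / (r**x + ra**x)


def d_wpct_d_runs(r, ra, x=2.0):
    """Derivative of W% with respect to runs scored per game (quotient rule)."""
    return x * r**(x - 1) * ra**x / (r**x + ra**x) ** 2


def runs_per_win(r, ra, x=2.0):
    """Marginal runs per win: scoring one more run in each of G games adds G runs
    and about G * dW%/dR wins, so the ratio is 1 / (dW%/dR) regardless of G."""
    return 1.0 / d_wpct_d_runs(r, ra, x)


# An average team scoring and allowing 4.5 runs per game (hypothetical).
for x in (2.0, 1.83):
    print(x, round(runs_per_win(4.5, 4.5, x), 3))
```

For the hypothetical 4.5-run environment this gives 9 runs per win with x = 2 and 4(4.5)/1.83 with x = 1.83. In either case, the result lands in the neighborhood of the familiar ten runs per win.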

I do not wish to give the impression that I think the application of calculus is central to the current practice of sabermetrics. Clearly it is not, given the paucity of work applying it to sabermetric questions. But it is another tool at our disposal, and one that is perfectly suited to the types of sabermetric questions that have always interested me. Calculus certainly has vast applications in understanding the mathematical relationships between sabermetric formulas. Why can you predict team runs scored fairly accurately (at least in a normal team context) using either a dynamic equation like Base Runs or a linear weights equation? Why does any variant of the Pythagorean family of win estimators match up so well in practice with linear equations that follow the rule that ten runs = one win? Calculus is also inherent in any exercise involving hypothesis testing, even if only implicitly. After all, probabilities under the normal distribution are obtained by integrating its density function.
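To show the integral hiding inside an everyday significance test, here is a short Python sketch that integrates the standard normal density numerically with composite Simpson's rule and compares the result with the closed form built on the error function, Phi(z) = (1 + erf(z / sqrt(2))) / 2. The lower cutoff of -10 and the number of intervals are arbitrary choices for the sketch. The area between -1.96 and 1.96 comes out to about 0.95, which is where the familiar two-sided 5% cutoff comes from.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def cdf_by_simpson(z, lower=-10.0, n=2000):
    """Integrate the density from `lower` to z with composite Simpson's rule.

    n must be even; the area below -10 is negligibly small.
    """
    h = (z - lower) / n
    total = normal_pdf(lower) + normal_pdf(z)
    for i in range(1, n):
        # Odd interior points get weight 4, even interior points weight 2.
        total += (4 if i % 2 else 2) * normal_pdf(lower + i * h)
    return total * h / 3


def cdf_closed_form(z):
    """Standard normal CDF via the error function from the math module."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


for z in (-1.96, 0.0, 1.0, 1.96):
    print(z, round(cdf_by_simpson(z), 6), round(cdf_closed_form(z), 6))
print("P(-1.96 < Z < 1.96):", round(cdf_closed_form(1.96) - cdf_closed_form(-1.96), 4))
```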

I will close with a list of links to articles on this blog that have used calculus in some manner. As you will see, the scope of topics to which I have applied calculus is fairly limited: mostly understanding how events are valued in various offensive measures and estimating runs per win from non-linear win estimators. Hopefully those of you with more imagination and a broader range of research interests can come up with other applications. Even if what I’ve written about did represent the full extent of possible applications of calculus in sabermetrics, it should be clear that there is a place for it. And if there weren’t a place in sabermetrics for a branch of mathematics with countless applications in the sciences, statistics, and probability, I’d suggest it would be time to re-evaluate how we practice sabermetrics.

Intrinsic Weights for Steve Mann’s RPA

Intrinsic Weights for Mike Gimbel’s RPA

Runs Per Win from Pythagenpat

Bill Kross’ W% Estimator

Fibonacci Win Points

Intrinsic Weights for OPS and OPS+

Intrinsic Weights for “dq”’s OTSE

Intrinsic Weights for Bases per PA and Out

Intrinsic Weights for Equivalent Runs

Intrinsic Weights for Runs Created

Intrinsic Weights for Base Runs
