“Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: ‘There are three kinds of lies: lies, damned lies and statistics’.” Mark Twain(1)
Felix Hernandez, the workhorse ace of the hapless Seattle Mariners, was awarded the 2010 American League Cy Young Award despite having, at 13-12, by far the worst winning percentage among the candidate starting pitchers and the fewest wins by a starter in a full season in the history of the award. Hernandez won much more easily at the ballot box than on the mound, collecting 21 of the 28 first-place votes cast by baseball writers. The election set off a debate about the purpose of the award and the appropriate bases for selecting the winner. Many traditionalists, while acknowledging Felix’s enormous talent, are uncomfortable declaring that a barely .500 pitcher is the best in the league. Others, many of them in the sabermetric community, are trumpeting the selection as a triumph of scientific analysis.
For our purposes, I will accept the notion that the Cy Young Awards are intended to honor the best pitcher in each league, and I will also concede that the voters honorably and conscientiously seek to interpret the candidates’ records to determine who is best. I do think, however, that they have become overly mesmerized by the sabermetricians’ efforts to distill the pure essence of individual performance.
Historically, our definition of the best pitcher has emphasized the number of wins credited to him. For context, I have reproduced below the ESPN Cy Young Predictor developed by Bill James and Rob Neyer, based on a statistical analysis of past winners(2). As you can see, Hernandez tied or led all of the leading starting pitchers in every major category except won-lost record.
The N-J formula projected the Yankees’ CC Sabathia as the winner, followed closely by Tampa Bay’s David Price, with Hernandez a distant sixth. I am not endorsing the N-J predictions; I am using them because their research reflects the historical pattern of voting and is reasonably representative of the kind of calculation, explicit or implicit, that most of us fans perform when picking our favorites for the award. My preferred candidates were David Price and Jon Lester. (I do not consider relievers valid Cy Young candidates, so I never examined the three listed by N-J.) My impression, based on listening to sports radio and TV, was that Sabathia was the leading choice among commentators. So I was more than mildly surprised when Hernandez won by a landslide.
The N-J formula is a weighted sum of various statistics: Cy Young Points (CYP) = ((5*Innings Pitched/9) - Earned Runs) + (Strikeouts/12) + (Saves*2.5) + Shutouts + ((Wins*6) - (Losses*2)) + VB. (3)
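To make the weighting concrete, here is a minimal Python sketch of the formula as written above. The function and parameter names are my own illustrative choices, not anything published with the ESPN predictor. VB defaults to zero, and innings are assumed to be entered as true decimals (box scores write 200 1/3 innings as "200.1"). The sample season is hypothetical, not any actual pitcher's line.

```python
def cy_young_points(ip, er, so, sv, sho, w, l, vb=0.0):
    """Cy Young Points (CYP) for one pitcher-season, per the N-J formula.

    ip  : innings pitched as a true decimal (200 1/3 innings -> 200 + 1/3)
    er  : earned runs        so : strikeouts      sv : saves
    sho : shutouts           w, l : wins and losses
    vb  : the VB bonus term (0 when it does not apply)
    """
    innings_term = (5 * ip / 9) - er      # (5 * IP / 9) - ER
    strikeout_term = so / 12              # SO / 12
    save_term = sv * 2.5                  # Saves * 2.5
    decision_term = (w * 6) - (l * 2)     # (Wins * 6) - (Losses * 2)
    return innings_term + strikeout_term + save_term + sho + decision_term + vb


# Hypothetical season (not any real pitcher): 180 IP, 60 ER, 120 SO, 0 SV, 1 ShO, 12-10.
print(cy_young_points(ip=180, er=60, so=120, sv=0, sho=1, w=12, l=10))
# 103.0
```

Note how much the won-lost term dominates: in this made-up line it supplies 52 of the 103 points.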
Wins are the single most important factor in the N-J formula, with a multiplier of 6, while innings pitched (divided by nine) is next with a multiplier of 5. Yet wins seem to have been the least important factor to the Cy Young voters. Using the N-J formula, had Hernandez been able to convert four of his losses to wins (17-8), he would have gained 32 points and passed Sabathia. With three conversions (16-9), he would have passed Price and been only 3 points behind CC. Either a three- or four-game swing would have put Hernandez in the same general range as Tim Lincecum, who won the 2009 NL award with a 15-7 record, and Zack Greinke, who won the 2009 AL award with a 16-8 season. But 13-12? To the extent that N-J accurately captures the historical consensus weighting of performance factors, what would justify such a large departure from traditional patterns? If Felix had gone 12-13 with otherwise identical statistics, would the voters still have given him the award? I can’t prove it, but I suspect that a losing record for a starting pitcher would be a non-starter. But if we accept, as many do, the superiority of the granular statistics over the won-lost record, why not give the award to a losing pitcher?
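The loss-to-win arithmetic is easy to check. Each converted game adds one win (+6) and removes one loss (giving back the 2-point penalty), so every swing is worth 8 points with all other statistics held fixed. A short sketch using only the won-lost term of the formula (the helper names are hypothetical):

```python
WIN_WEIGHT = 6   # N-J formula: + Wins * 6
LOSS_WEIGHT = 2  # N-J formula: - Losses * 2


def decision_points(wins, losses):
    """The won-lost component of Cy Young Points."""
    return WIN_WEIGHT * wins - LOSS_WEIGHT * losses


def swing_gain(wins, losses, converted):
    """Points gained by turning `converted` losses into wins, all else equal."""
    before = decision_points(wins, losses)
    after = decision_points(wins + converted, losses - converted)
    return after - before


for n in (3, 4):
    print(f"13-12 -> {13 + n}-{12 - n}: +{swing_gain(13, 12, n)} points")
# 13-12 -> 16-9: +24 points
# 13-12 -> 17-8: +32 points
```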
The voters seem to have concluded that Hernandez should be judged only on, as an AP writer put it, “things he could fully command.”(4) So the voters discounted (in effect, ignored) his won-lost results and focused on the other measures, weighting ERA, innings pitched and strikeouts most heavily. The sabermetric community is reported to have emphasized Wins Above Replacement Player (WARP), a measure of the wins a player is calculated to add relative to a replacement-level player: one of slightly below-average major league quality, often a Triple-A player of the sort likely to be called up to replace him (a pitcher, in this case).
I love to compare and debate player statistics as much as anyone, but I want to point out something that is too often neglected: baseball statistics don’t really measure individual performance. The pitcher has total command over absolutely none of the factors measured by statistics, though his influence is relatively greater in some areas than in others. For example, the pitcher is relatively more responsible for his strikeout and walk numbers than for his won-lost record. In this sense, emphasis on granular statistics such as ERA can provide some insight into the relative performance of pitchers.
No one will ever be able to develop a perfect measure of a player’s contribution to his team’s success. This is not because teams win or lose together, as your old coach told you (although he was correct), nor because it is a difficult analytical challenge, but because it is logically and mathematically impossible. A baseball team is engaged in what economists call joint or cooperative production, and it is impossible to determine precisely the marginal value of an individual team member in a joint production setting.
Pitchers are the most obvious examples of the joint production of outs and runs. Baseball is unusual in that outputs are produced jointly by the defense and the offense. A hit or an out is produced jointly by the batter, the pitcher and, in most cases, the pitcher’s fielders(5). In fact, in the early days of the game the pitcher’s explicit function was to put the ball in play by throwing it “for the bat,” and later to throw either above or below the belt as directed by the batter. Today, some batter-pitcher combinations are more “efficient” at producing hits (to the chagrin of the pitcher), which is why a manager is interested in the historical rates of success his batters have had against that day’s opposing pitcher.
The team’s final output is a win or a loss. Wins and losses result from the intermediate products called runs and outs. These final and intermediate outputs are produced jointly by the team members whose efforts make up the team production function. A team production function involves at least two inputs and is not separable into a sum of functions of each input taken alone. In other words, the output of team members is not additive, and this has serious implications for estimating functional relationships. This point, made by Armen A. Alchian and Harold Demsetz in 1972, is not a trivial or esoteric technicality but a fundamental basis of our modern understanding of the nature of firms and the role of managers.
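A toy example may help make non-separability concrete. The two functions below are assumptions chosen for simple arithmetic, not models of baseball. In the additive one, the marginal product of input x is the same whatever y is; in the joint (multiplicative) one, x’s contribution depends entirely on y, and crediting each input with its quantity times its marginal product attributes twice the output actually produced.

```python
def additive(x, y):
    """Separable toy function: each input contributes on its own."""
    return 2 * x + 3 * y


def joint(x, y):
    """Non-separable toy function: output requires both inputs together."""
    return x * y


def marginal_x(f, x, y):
    """Extra output from one more unit of x, holding y fixed."""
    return f(x + 1, y) - f(x, y)


def marginal_y(f, x, y):
    """Extra output from one more unit of y, holding x fixed."""
    return f(x, y + 1) - f(x, y)


# Marginal product of x (at x = 3) for several levels of the other input.
for y in (1, 2, 4):
    print(f"y={y}: additive {marginal_x(additive, 3, y)}, joint {marginal_x(joint, 3, y)}")
# y=1: additive 2, joint 1
# y=2: additive 2, joint 2
# y=4: additive 2, joint 4

# Credit each input with (quantity x marginal product) and sum the credits.
x, y = 3, 4
credits = x * marginal_x(joint, x, y) + y * marginal_y(joint, x, y)
print(f"joint output {joint(x, y)}, summed credits {credits}")
# joint output 12, summed credits 24
```

For the additive function the same bookkeeping adds up exactly (3 × 2 + 4 × 3 = 18, its output at that point); only the joint function breaks it.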
Alchian and Demsetz observed further, “In team production, marginal products of cooperative team members are not so directly and separably (i.e. cheaply) observable. What a team offers to the market can be taken as the marginal product of the team but not of the team members (emphasis added). Clues to each input’s productivity can be secured by observing the behavior of individual inputs…” The impossibility of perfect measures of individual performance and contribution in a team setting is the primary reason that human managers are important; judgment based on experience and observation is a necessary ingredient.
Team or joint output, whether runs or wins, cannot be attributed precisely to the efforts of individual team members. The fact that the batting order matters is evidence of the joint nature of producing runs: the manager seeks the batting order that maximizes the joint production of his hitters. Responsibility for output (runs) is therefore shared. Branch Rickey recognized long ago that runs batted in (RBI) could be a very misleading measure of individual contribution because it depends greatly on the success of the preceding batters.
Statisticians and sabermetricians using regression analysis must generally assume the separability (additivity) of baseball production functions. (Even non-linear regression often amounts in practice to a set of linear approximations covering portions of the observed data.) The typical multivariate regression formula seeks to explain the dependent variable, Y, as a weighted sum of the independent variables, xi, plus a residual: Y = B0 + B1x1 + B2x2 + ... + Bnxn + e, where the weights or coefficients, Bi, represent the influence of each independent variable. If the underlying function is not additive, then using a linear or additive regression technique introduces bias into the estimates. The severity of the bias depends on the underlying form of the production function. Another layer of distortion comes about when sabermetricians use regression coefficients to build composite models such as Wins Above Replacement Player. The true WARP (if such a thing exists) is almost certainly non-linear, so constructing a linear representation of it from biased coefficients, themselves based on linear representations of other non-linear functions, seems capable of introducing serious distortions into the analysis. I am not saying that the modern statistics are not useful; many are intriguing. But all contain a measure of distortion, their use involves some bias, and it cannot be said that they are necessarily more objective than some of the old standbys.
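To illustrate the bias argument, the sketch below fits the additive formula Y = B0 + B1x1 + B2x2 by ordinary least squares to data generated by a deliberately non-additive toy process, Y = x1 × x2, on a small grid. The data and the closed-form two-regressor solver are my own assumptions for illustration, written in plain Python so no statistics library is needed.

```python
from itertools import product


def ols_two_regressors(x1, x2, y):
    """Ordinary least squares for Y = b0 + b1*x1 + b2*x2, via the closed-form
    solution of the normal equations on mean-centered data."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2          # non-zero unless x1 and x2 are collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Toy data: every (x1, x2) pair on a 3 x 3 grid; the true process is x1 * x2.
grid = list(product(range(3), repeat=2))
x1 = [a for a, _ in grid]
x2 = [b for _, b in grid]
y = [a * b for a, b in grid]

b0, b1, b2 = ols_two_regressors(x1, x2, y)
print(f"fitted: Y = {b0:.2f} + {b1:.2f}*x1 + {b2:.2f}*x2")
# fitted: Y = -1.00 + 1.00*x1 + 1.00*x2

# The additive fit gives x1 one effect, but the true effect of x1 equals x2.
for level in range(3):
    print(f"x2 = {level}: true effect of x1 = {level}, fitted effect = {b1:.2f}")
```

The fit reports a single effect of 1.0 for x1, while the true effect of x1 ranges from 0 to 2 depending on x2, and the residuals, (x1 - 1)(x2 - 1), are systematic rather than random noise. I could repair this particular fit by adding an x1·x2 term only because I built the process and know its form; the true form of the baseball production function is not known.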
My other, and in many ways more important, concern about discounting wins and losses is that a pitcher often contributes to his team’s wins in ways that are not evident even in the most granular statistics. Winning a ball game in the major leagues is very hard, and winning around twenty games is extremely hard; to do so, a pitcher must overcome a number of specific moments or threats where a single pitch can have significant consequences. Some pitchers seem to make those pitches when they most need to and so win more games than pitchers with similar skills and statistics. Pitchers’ effort is somewhat elastic; a pitcher with a big lead will tend to bear down less than one in a tight game. This is rational: the return on full effort, adjusted for the cost of potential injury or arm fatigue, is not as high with a five-run lead as with a one-run lead and a man on second. The same is true across games. A pitcher who wins late in the season with a postseason spot on the line, or in the postseason with a title on the line, often has to perform at a higher level to overcome the extra intensity of his opposition at those times. Those wins, in my opinion, reveal more about the pitcher than do wins earlier in the year. But if we discount all wins as “being beyond the control of the pitcher,” we sacrifice that value and information. So, until we have a rigorous and unbiased model of the baseball production function and a means of imputing individual performances into that model, I think we should continue to give a lot of consideration to the actual results of games played.
Posted by Bob
Footnotes (for the appearance of scientific rigor):
1. There is some evidence that Twain misattributed the origin of the line to Disraeli; see http://www.twainquotes.com/Statistics.html, where Courtney and perhaps Carlyle are suggested as possible antecedents.
2. The Neyer-James model seeks to predict the award winner based on historical evidence and patterns; it does not necessarily reflect either observer’s own judgment as to who should win.
3. Saves almost always accrue only to relief pitchers. The VB term is a bonus reflecting the historical tendency to recognize pitchers from league-leading teams, but it did not come into play this year.
4. “Felix Hernandez wins AL Cy Young,” November 18, 2010, http://sports.espn.go.com/mlb/news/story?id=5820623
5. The catcher is involved in the sense that he is necessary for the pitcher to perform his part of the process, so it would be reasonable to say a hit involves three players. An out, whether by strikeout, force-out, throw-out or caught fly, involves at least three players. Other forms of out, such as pick-offs, usually involve three or more players.