That’s Why They Play the Games

“Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: ‘There are three kinds of lies: lies, damned lies and statistics.’” Mark Twain (1)

Felix Hernandez, the workhorse ace of the hapless Seattle Mariners, was awarded the 2010 American League Cy Young Award despite having, at 13-12, by far the worst winning percentage among the candidate starting pitchers and the fewest wins by a starter in the history of the award. Hernandez won a lot more easily at the ballot box than on the mound, collecting 21 of the 28 first-place votes cast by baseball writers. The election set off a debate about the purpose of the award and the appropriate basis for selecting the winner. Many traditionalists, while acknowledging Felix’s enormous talent, are uncomfortable declaring that a barely .500 pitcher is the best in the league. Others, many of them in the sabermetric community, are trumpeting the selection as a triumph of scientific analysis.

For our purposes, I will accept the notion that the Cy Young Awards are intended to honor the best pitcher in each league, and I will also concede that the voters honorably and conscientiously seek to interpret the candidates’ records to determine who is best. I do think, however, that they have become overly mesmerized by the sabermetricians’ efforts to distill the pure essence of individual performance.

Historically, our definition of the best pitcher emphasized the number of wins credited to him. For context, I have reproduced below the ESPN Cy Young Predictor developed by Bill James and Rob Neyer (hereafter N-J, for Neyer-James), based on a statistical analysis of past winners (2). As you can see, Hernandez tied or led all of the leading starting pitchers in every major category except won-lost record.

The N-J formula projected the Yankees’ CC Sabathia as the winner, followed closely by Tampa Bay’s David Price, with Hernandez a distant sixth. I am not endorsing the N-J predictions; I am using them because their research reflects the historical pattern of voting and is reasonably representative of the kind of calculation, explicit or implicit, that most of us fans perform when picking our favorites for the award. My preferred candidates were David Price and Jon Lester. (I do not consider relievers valid Cy Young candidates, so I never examined the three listed by N-J.) My impression, based on listening to sports radio and TV, was that Sabathia was the leading choice among commentators. So I was more than mildly surprised when Hernandez won by a landslide.

The N-J formula is a weighted sum of various statistics: Cy Young Points (CYP) = ((5*Innings Pitched/9) - Earned Runs) + (Strikeouts/12) + (Saves*2.5) + Shutouts + ((Wins*6) - (Losses*2)) + VB. (3)
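To make the weighting concrete, here is a minimal Python sketch of the formula exactly as quoted above. The function name, parameter names and the sample pitching line are hypothetical, chosen only for illustration; they are not any pitcher’s actual 2010 numbers. VB is simply passed in by the caller and defaults to 0, since, per footnote 3, it did not come into play this year.

```python
def cy_young_points(ip, er, so, saves=0, shutouts=0, wins=0, losses=0, vb=0.0):
    """Neyer-James Cy Young Points (CYP), following the formula as quoted in the post.

    ip must be true innings (249 2/3 innings is 249 + 2/3), not the box-score
    shorthand "249.2". vb is the formula's bonus term, supplied by the caller.
    """
    innings_term = 5 * ip / 9 - er          # (5 * IP / 9) - ER
    strikeout_term = so / 12                # SO / 12
    save_term = saves * 2.5                 # almost always zero for a starter
    decision_term = wins * 6 - losses * 2   # (W * 6) - (L * 2)
    return innings_term + strikeout_term + save_term + shutouts + decision_term + vb


# Hypothetical starter: 225 IP, 75 ER, 180 SO, 1 shutout, 15-9 record.
print(cy_young_points(ip=225, er=75, so=180, shutouts=1, wins=15, losses=9))  # -> 138.0
```

In this made-up line the innings term contributes 50 points and the won-lost term 72, which shows how heavily the formula leans on wins.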

Wins are the single most important factor in the N-J formula, with a multiplier of 6, while innings pitched (divided by nine) is next, with a multiplier of 5. Wins seem to have been the least important factor to the Cy Young voters. Using the N-J formula, had Hernandez been able to convert four of his losses to wins (17-8), he would have gained 32 points and passed Sabathia. With three conversions of losses to wins (16-9), he would have passed Price and been only 3 points behind CC. Either a three- or four-game swing would have put Hernandez in the same general range as Tim Lincecum, who won the 2009 NL award with a 15-7 record, and Zack Greinke, the winner of the 2009 AL award with a 16-8 season. But 13-12? To the extent that N-J accurately captures the historical consensus weighting of performance factors, what would justify such a large departure from traditional patterns? If Felix had been 12-13 with otherwise identical statistics, would the voters still have given him the award? I can’t prove it, but I suspect that a losing record for a starting pitcher would be a non-starter. But if we accept, as many do, the superiority of the granular statistics over the won-lost record, why not give the award to a losing pitcher?
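The arithmetic behind the losses-into-wins scenario can be checked with a short sketch. Because the scenario holds every other statistic fixed, only the won-lost term of the N-J formula (6 points per win, minus 2 per loss) changes, so each conversion is worth 6 + 2 = 8 points. The helper names below are hypothetical.

```python
def decision_points(wins, losses):
    # Won-lost component of the N-J formula: +6 per win, -2 per loss.
    return wins * 6 - losses * 2


def gain_from_conversions(wins, losses, k):
    # Change in CYP if k losses had instead been wins, all other stats unchanged.
    # A negative k converts wins into losses (e.g. 13-12 becoming 12-13).
    return decision_points(wins + k, losses - k) - decision_points(wins, losses)


for k in (4, 3, -1):
    print(f"{13 + k}-{12 - k}: {gain_from_conversions(13, 12, k):+d} points")
# 17-8: +32 points
# 16-9: +24 points
# 12-13: -8 points
```

The first line reproduces the 32-point figure in the text; the last shows the 12-13 hypothetical costs 8 points under the formula.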

The voters seem to have concluded that Hernandez should be judged only on, as an AP writer put it, “things he could fully command” (4). So voters discounted (in effect, ignored) his won-lost results and focused on the other measures, weighting ERA, innings pitched and strikeouts most heavily. The sabermetric community is reported to have emphasized Wins Above Replacement Player (WARP), a measure of a player’s calculated wins relative to those credited to a player of slightly below-average major league quality, often a triple-A player of the sort likely to be called up as a replacement (a replacement pitcher, in this case).

I love to compare and debate player statistics as much as anyone, but I want to point out something that is too often neglected: baseball statistics don’t really measure individual performance. The pitcher has total command over absolutely none of the factors measured by statistics, though his influence is relatively greater in some areas than in others. For example, the pitcher is more responsible for his strikeout and walk numbers than for his won-lost record. In this sense, emphasis on granular statistics such as ERA can provide some insight into the relative performance of pitchers.

No one will ever be able to develop a perfect measure of a player’s contribution to his team’s success. This is not because teams win or lose together, as your old coach told you (although he was correct), and not because it is a difficult analytical challenge, but because it is logically and mathematically impossible. A baseball team is engaged in what economists call joint, or cooperative, production, and it is impossible to determine precisely the marginal value of an individual team member in a joint production setting.

Pitchers are the most obvious examples of the joint production of outs and runs. Baseball is fairly unusual in that outputs are produced jointly by the defense and the offense. A hit or an out is produced jointly by the batter and the pitcher and, in most cases, the pitcher’s fielders (5). In fact, in the early days of the game the pitcher’s explicit function was to put the ball in play by throwing it “for the bat,” and later to throw either above or below the belt as directed by the batter. Today, some batter-pitcher combinations are more “efficient” at producing hits (to the chagrin of the pitcher), which is why the manager is interested in the historical rates of success his batters have had against that day’s opposing pitcher.

The team’s final output is a win or a loss. Wins and losses are the result of the intermediate products called runs and outs. These final and intermediate outputs are produced jointly by the team members whose efforts make up the team production function. Any team production function involves at least two inputs and cannot be separated into a sum of functions of each input taken alone. In other words, the output of team members is not additive, and this has serious implications for estimating functional relationships. This point, made by Armen A. Alchian and Harold Demsetz in 1972, is not a trivial or esoteric technicality but a fundamental basis of our modern understanding of the nature of firms and the role of managers.

Alchian and Demsetz observed further, “In team production, marginal products of cooperative team members are not so directly and separably (i.e. cheaply) observable. What a team offers to the market can be taken as the marginal product of the team but not of the team members (emphasis added). Clues to each input’s productivity can be secured by observing the behavior of individual inputs…” The impossibility of perfect measures of individual performance and contribution in a team setting is the primary reason that human managers are important; judgment based on experience and observation is a necessary ingredient.

Team or joint output, whether runs or wins, cannot be attributed precisely to the efforts of individual team members. The fact that the batting order matters is evidence of the joint nature of producing runs: the manager seeks the batting order that maximizes the joint production of his hitters. Responsibility for output, that is, for runs, is therefore shared. Branch Rickey recognized long ago that runs batted in (RBI) could be a very misleading measure of individual contribution because it depends greatly on the success of the preceding batters.

Statisticians and sabermetricians using regression analysis must assume the separability (additivity) of the baseball production functions. (Even non-linear regression is really a set of linear regressions covering portions of the observed data.) The typical multivariate regression seeks to explain the dependent variable, Y, as a weighted sum of the independent variables, xi, plus a residual, e: Y = B0 + B1x1 + B2x2 + ... + Bnxn + e, where the weights or coefficients, Bi, represent the proportional influence of each independent variable. If the underlying function is not additive, then using a linear or additive regression technique introduces bias into the estimates. The severity of the bias depends on the underlying form of the production function. Another layer of distortion comes about when sabermetricians use regression coefficients to build composite models such as Wins Above Replacement Player. The true WARP (if such a thing exists) is almost certainly non-linear, so constructing a linear representation of it from biased coefficients, themselves based on linear representations of other non-linear functions, seems capable of introducing serious distortions into the analysis.

I am not saying that the modern statistics are not useful; many are intriguing. But all contain a measure of distortion, their use involves some bias, and it cannot be said that they are necessarily more objective than some of the old stand-bys.
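A toy example may make the bias argument concrete. The sketch below assumes a hypothetical, deliberately non-additive “team production function” in which output is the product of two inputs, Y = x1 * x2, so the true marginal product of x1 is x2 (it depends on the teammate). It then fits the standard additive regression Y = B0 + B1*x1 + B2*x2 by ordinary least squares over a small grid of input levels. The function names and the grid are illustrative assumptions, not a model of any actual baseball statistic.

```python
from itertools import product


def ols(rows, ys):
    """Ordinary least squares: solve the normal equations (X'X) b = X'y
    by Gaussian elimination with partial pivoting. Each row of `rows`
    must already include a leading 1.0 for the intercept."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= factor * a[col][c]
    b = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b


def team_output(x1, x2):
    # Hypothetical non-additive production: neither input has a fixed marginal product.
    return x1 * x2


levels = [1, 2, 3]
points = list(product(levels, levels))
rows = [[1.0, x1, x2] for x1, x2 in points]
ys = [team_output(x1, x2) for x1, x2 in points]

b0, b1, b2 = ols(rows, ys)
print(f"additive fit: Y = {b0:.2f} + {b1:.2f}*x1 + {b2:.2f}*x2")
for x1, x2 in points:
    fitted = b0 + b1 * x1 + b2 * x2
    print(f"x1={x1} x2={x2} true={team_output(x1, x2)} fitted={fitted:.2f} "
          f"true marginal product of x1={x2}")
```

On this grid the fit comes out as Y = -4 + 2*x1 + 2*x2: each input is credited with a fixed 2 units, even though the true marginal product of x1 ranges from 1 to 3 depending on x2, and the left-over interaction shows up as residuals of plus or minus 1 at the corners of the grid.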

My other, and in many ways more important, concern about discounting wins and losses is that the pitcher often contributes to a team’s wins in ways that are not evident even in the most granular statistics. Winning a ball game in the major leagues is very hard; winning around twenty games is extremely hard, and to do so a pitcher must overcome a number of specific moments or threats where a single pitch can have significant consequences. Some pitchers seem to make those pitches when they most need to, and so win more games than pitchers with similar skills and statistics. Pitchers’ effort is somewhat elastic; a pitcher with a big lead will tend to bear down less than one in a tight game. This is rational: the return on full effort, adjusted for the cost of potential injury or arm fatigue, is not as high with a five-run lead as with a one-run lead and a man on second. The same is true of games: a pitcher who wins late in the season with a postseason spot on the line, or in the postseason with a title on the line, often has to perform at a higher level to overcome the extra intensity of his opposition at those times. Those wins, in my opinion, reveal more about the pitcher than wins earlier in the year do. But if we discount all wins as “being beyond the control of the pitcher,” we sacrifice that value and information. So, until we have a rigorous and unbiased model of the baseball production function and a means of imputing personal performances into that model, I think we should continue to give a lot of consideration to the actual results of games played.


Posted by Bob

Footnotes (for the appearance of scientific rigor):

1. There is some evidence that Twain misattributed the origin of the line to Disraeli; see where Courtney and perhaps Carlyle are suggested as possible antecedents.

2. The Neyer-James model seeks to predict the award winner based on historical evidence and patterns; it does not necessarily reflect either observer’s own judgment as to who should win.

3. Saves almost always accrue only to relief pitchers. The VB term is a bonus reflecting the historical tendency to recognize pitchers from league-leading teams, but it did not come into play this year.

4. “Felix Hernandez wins AL Cy Young,” November 18, 2010.

5. The catcher is involved in the sense that he is necessary for the pitcher to perform his part of the process, so it would be reasonable to say a hit involves three players. An out, whether by strikeout, force-out, throw-out or caught fly, involves at least three players. Other forms of out, such as pick-offs, usually involve three or more players.


To Extend or Not to Extend

As I write (Nov. 30, 2010), Congress is about to start debating whether to extend all of the Bush-era tax cuts or just a subset of them in order to deal with the deficit. The Republicans want all of the current tax rates to be continued indefinitely, that is, until a real overhaul of the tax code can be tackled. They appear to be willing to settle for a minimum of a two-year extension. The Democrats, led by President Obama, want the current rates continued except for those making over $250,000. Senator Chuck Schumer of NY put forth the option of extending the rates for everyone making under $1,000,000. The public (that is, we) prefers to extend them for everyone.


The reasoning underlying the Democrats’ views is suspect at best. Obama says we can’t afford them, and that letting them expire will affect only 2% of taxpayers. These 2% pay about 45% of all income taxes; the bottom 50% pay 3.5%. Looked at another way, the top 2% are paying about 13 times as much as the bottom 50%. As it stands, the top 2% are certainly contributing heroically to funding the government. Let’s look at the charge that we can’t afford not to raise the rates. As is usual in government, when someone is trying to make a point, the cost or savings is quoted over multiple years because that balloons the figure. By their reasoning, we can’t “afford” to maintain the current rates for everyone else either. One factor, and it is a big one, that the Dems have overlooked is that those in the top bracket aren’t going to sit there to be shorn. They will alter their behavior and reduce their taxable income. In effect, the increased tax revenue will be less than estimated, by a long shot. Further, in their efforts to reduce taxable income, these individuals will spend resources on avoiding taxes rather than devoting those resources to growing output.


We have the spectacle of columnist Froma Harrop shrilly saying:

“And what business is it of the chairmen — Erskine Bowles, a Democrat, and former Wyoming Sen. Alan Simpson, a Republican — to set an arbitrary (and low) maximum percentage on the tax revenue relative to gross domestic product that our society is allowed to collect? Their job is to find ways to bring down deficits. Period.”

She then goes on to say,

“For all the talk of the painful, painful(!) sacrifices needed to achieve the chairmen’s goal of reducing the federal deficit by $4 trillion through 2020, one thing should be kept in mind: Simply ending all the George W. Bush tax cuts would do the same thing. No one starved in the Clinton era. In fact, people did darn well then. That’s something for Democrats to think about now, before Republicans take over the House and start the fiscal voodoo dance all over again.”

You can’t make this stuff up. I’m always amused by the liberal/progressive belief that 50.1% of us should be able to tell the other 49.9% what to do when it suits them. The Bill of Rights was enacted precisely because the Founders recognized that the likes of Ms. Harrop were lurking out there. Survey after survey shows that most Americans, 70% or more, believe that an individual’s total (state, local, and federal) tax burden shouldn’t exceed 25%. These results hold for every subgroup out there. Well, almost all. I’m sure that the polls don’t have subgroups for clergy (of any denomination), college English professors, carping liberal columnists, or unionized government employees. These groups would demand the expropriation of all income from those who made more than they did. This is typically their definition of “the rich.”


Turning to Ms. Harrop’s comparison with the 1990s: it is totally inappropriate. Ms. Harrop confuses correlation with causation. To start with, the economy was still in the glow of the Reagan years. The benefits of increased investment were still accruing. Laurence Meyer, whom Clinton appointed to the Federal Reserve Board, had an economic consulting firm that analyzed the Clinton tax increases. Its conclusion was that the economy grew more slowly and total taxes collected were lower than they otherwise would have been. So, in fact, Clinton’s higher tax rates weren’t a boon to the economy. They didn’t appear to be a bad thing only because of other decisions that were being made at the time. The two most important were the election of Republican majorities in 1994, which slowed the growth of government spending, and the slashing of the capital gains tax rate, against Clinton’s wishes, by the way. These two events, against the backdrop of the Reagan growth agenda of the 1980s, more than swamped the negative effects of the Clinton tax increases.


Sen. Schumer defends his proposal by falling back on the most naïve version of Keynesian economics. He still believes in the concept of the marginal propensity to consume out of current income, some fifty years after Milton Friedman showed that people consume out of permanent income. He seems to believe that if we just put more money into the pockets of people with high average propensities to consume, the economy will grow. There are two problems with that: first, and most obviously, we have been doing that for two years and have nothing to show for it; and second, what Congress is discussing is not a tax cut but the prevention of a tax increase, which even Schumer realizes would be a disaster.


The Republicans’ push for maintaining the current rates for everyone reflects the understanding that investment and job creation come from those making more than $250,000. From a supply-side perspective, the response to incentives comes very disproportionately from higher-income small businessmen and other entrepreneurs. Lowering marginal tax rates typically doesn’t cause a bank clerk, say, to increase his or her level of economic activity, while it will cause a business owner, or potential business owner, to do so.


Obama’s attitude was put on display during the campaign when he responded to Joe the Plumber by saying that he was for redistribution. Raising rates for the so-called rich (many two-income families in NYC would fall into this definition of rich) appeals to his political orthodoxy, which trumps his obligation to create an environment in which the economy can grow.


When the dust settles, where will we be? I expect that the tax rates for all will be extended for at least two years and that, as a quid pro quo, unemployment benefits will also be extended for another 26 weeks. It is important to keep in mind that this will only prevent things from getting worse than they are. For the economy to gain real traction, the plethora of mandates and regulations spewing out of the Executive branch must stop, and many need to be repealed. Simply put, regulations have the same effect as taxes but don’t get run through the government income statement. The EPA, Health and Human Services, and the Dept. of the Interior are loose cannons that are circumscribing our daily lives to our detriment. The health care bill needs to be rescinded and begun anew. The financial reform legislation, another unread 2,000-plus-page monstrosity, needs to be put in abeyance while cooler heads revisit every provision. The energy drilling moratoria across the country need to be reassessed.


The vote on the tax rates will give a clear picture of whether Congress got the message that the “Tea Party” sent on November 2. If it didn’t get the message, be prepared for another housecleaning in 2012.


Posted by Jim