Statistics and Winning Baseball
John F. McDonald
Grinnell College ’65; Grinnell Baseball Team, 1962-65
Professor Emeritus of Economics, University of Illinois at Chicago

‘Moneyball’ by M. Lewis Popularizes Sabermetrics
• The book (2003) explains the success of the Oakland A’s in spite of a low payroll (1/3 of the Yankees’).
• The A’s strategy (Billy Beane) includes:
  - Draft only college players (good data on college players): hitters based on on-base % and slugging %, pitchers based on getting outs.
  - Draft hitters, keep them 6 years (draft rules), trade them for more draft choices (J. Giambi).
  - Sacrifice bunts: a bad idea for producing runs.
  - Stolen bases: don’t matter much.

Winning Games – Scoring Runs
American League Central, 2005 (W* = wins predicted from run differential)

Team        W-L      Runs   Opp. Runs   W*
Chicago     99-63    741    645         91
Cleveland   93-69    790    642         96
Minnesota   83-79    688    662         84
Detroit     71-91    723    788         75
KC          56-106   701    935         58

Conclusion: Figure out how to outscore opponents by 150 runs. For example, have batters who will score 750 runs (4.6 per game) and pitchers who will hold opponents to under 600 runs (3.7 per game).

Basic Sabermetrics: Score Runs to Win Games
• Winning percentage is a function of run differential over the season. A simple version, with R = runs scored and Ra = runs allowed, is
    W = 81 + (R - Ra)/10
  White Sox 2005: R = 741, Ra = 645, so W* = 81 + (741 - 645)/10 = 90.6, or about 91 (actual wins: 99). They were good, but they were also lucky.
• More complex version (Pythagorean theorem):
    Pct wins = 1/[1 + (Ra/R)^1.83]

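The two formulas on this slide can be compared side by side. The following is a minimal sketch, not part of the original talk; the function names are made up for illustration, and the 162-game season used to convert a percentage into wins is an assumption.

```python
def linear_wins(r, ra):
    """Simple rule from the slide: W = 81 + (R - Ra)/10."""
    return 81 + (r - ra) / 10


def pythagorean_pct(r, ra, exponent=1.83):
    """Pythagorean winning percentage: 1 / (1 + (Ra/R)^exponent)."""
    return 1 / (1 + (ra / r) ** exponent)


# 2005 American League Central: (runs scored, runs allowed, actual wins)
teams = {
    "Chicago": (741, 645, 99),
    "Cleveland": (790, 642, 93),
    "Minnesota": (688, 662, 83),
    "Detroit": (723, 788, 71),
    "KC": (701, 935, 56),
}

GAMES = 162  # assumption: full regular season

for name, (r, ra, actual) in teams.items():
    simple = linear_wins(r, ra)
    pyth = pythagorean_pct(r, ra) * GAMES
    print(f"{name:10s} actual {actual:3d}  simple {simple:5.1f}  Pythagorean {pyth:5.1f}")
```

For the 2005 White Sox both versions give about 91 wins, which is why the 99 actual wins look partly like luck.
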
Estimates for 1980, 1982, 1984*
Pct Win = .4998 + .1093 (Run Diff/Game)
          (167)   (19)      t values; R sq. = .83
[Pct Win = .6 requires Run Diff = 150 over the season]

Pct Win = .5065 + .1087 (Runs/Game) - .1102 (Opponents Runs/Game)
          t values: 12, 16, 13; R sq. = .83

Pct Win = .019 + .964 × 1/[1 + (Ra/R)^2]
          (.74)  (19)       R sq. = .83

* Study conducted by McDonald.

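The bracketed note (a .600 winning percentage requires a run differential of about 150) follows from inverting the first regression. A small sketch, assuming a 162-game season; the names are illustrative only.

```python
INTERCEPT = 0.4998   # from the first regression on this slide
SLOPE = 0.1093       # winning pct per run of differential per game
GAMES = 162          # assumption: full regular season


def season_run_diff_needed(target_pct):
    """Season run differential implied by a target winning percentage."""
    per_game = (target_pct - INTERCEPT) / SLOPE
    return per_game * GAMES


print(round(season_run_diff_needed(0.600), 1))
```

The result is a little under 150 runs, which the slide rounds to 150.
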
How to Produce Runs: Operations Research Study of Baseball
There are 24 possible situations (8 base states × 3 out counts). Mean runs scored from each (2004 major leagues) are:

Bases        No outs   One out   Two outs
Empty        .54       .29       .11
1st          .93       .55       .25
2nd          1.16      .71       .34
1st & 2nd    1.47      .96       .46
3rd          1.45      .97       .36
1st & 3rd    1.85      1.22      .52
2nd & 3rd    2.13      1.47      .62
Loaded       2.25      1.59      .81

G. Lindsey Study from 1963
The numbers in the previous table are remarkably stable over time. The original study by George Lindsey (Operations Research, 1963) found basically the same expected-run figures for the 1959-1960 seasons. His conclusions include:
- intentional walks are a bad idea
- sacrificing a runner from 1st to 2nd is a bad idea
- slugging percentage is a good measure
So this stuff is not exactly “news.”

What the Table Shows (on average)
• Leadoff man getting on 1st is worth 0.64 runs vs. making an out (.93 vs. .29).
• Sacrificing a runner from 1st to 2nd reduces expected runs (.93 to .71).
• Stealing 2nd increases expected runs by .23 (.93 to 1.16), but comes at a risk.
• Intentional walk increases expected runs.
• Leadoff man getting on 2nd is worth an additional .23 runs (compared to leadoff man on 1st).
• What else do you see?

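The comparisons on this slide are differences between cells of the 2004 expected-runs table, so they can be checked with a lookup. This is an illustrative sketch: the dictionary keys and helper name are invented here, and the break-even steal rate rests on the simplifying assumption that a steal attempt either succeeds cleanly or costs the runner and one out.

```python
# Expected runs for the rest of the inning (2004 major leagues).
# Each tuple is (no outs, one out, two outs).
RUN_EXP = {
    "empty": (0.54, 0.29, 0.11),
    "1st": (0.93, 0.55, 0.25),
    "2nd": (1.16, 0.71, 0.34),
    "1st&2nd": (1.47, 0.96, 0.46),
    "3rd": (1.45, 0.97, 0.36),
    "1st&3rd": (1.85, 1.22, 0.52),
    "2nd&3rd": (2.13, 1.47, 0.62),
    "loaded": (2.25, 1.59, 0.81),
}


def change(before, after):
    """Change in expected runs between two (bases, outs) situations."""
    (b0, o0), (b1, o1) = before, after
    return RUN_EXP[b1][o1] - RUN_EXP[b0][o0]


# Leadoff man on 1st with no outs, compared with bases empty and one out.
leadoff_on_first = change(("empty", 1), ("1st", 0))
sacrifice = change(("1st", 0), ("2nd", 1))
steal_success = change(("1st", 0), ("2nd", 0))
steal_failure = change(("1st", 0), ("empty", 1))

# Assumption: only two outcomes (safe at 2nd, or caught with one more out).
break_even = -steal_failure / (steal_success - steal_failure)

print(f"leadoff on 1st vs. out: {leadoff_on_first:+.2f}")
print(f"sacrifice 1st to 2nd:   {sacrifice:+.2f}")
print(f"steal 2nd, safe:        {steal_success:+.2f}")
print(f"steal 2nd, caught:      {steal_failure:+.2f}")
print(f"break-even success rate: {break_even:.0%}")
```

Under that two-outcome assumption, a runner on 1st with nobody out needs to be safe roughly three times in four for the attempt to pay off.
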
Run Production Functions
• Many econometric studies model a season’s run production as a function of batting (singles, doubles, triples, homers, etc.).
• A simple equation (J. C. Bradbury) says on-base percentage and slugging percentage tell the story.
    OBP = (hits + walks + hit by pitch) / (at-bats + walks + hbp + sac flies)
    SLG = [(1 × 1b) + (2 × 2b) + (3 × 3b) + (4 × HR)] / (at-bats)

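The two definitions translate directly into code. A minimal sketch; the season line used in the example is hypothetical, not a real player.

```python
def obp(hits, walks, hbp, at_bats, sac_flies):
    """On-base percentage: times on base divided by plate appearances counted."""
    return (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)


def slg(singles, doubles, triples, homers, at_bats):
    """Slugging percentage: total bases divided by at-bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    return total_bases / at_bats


# Hypothetical season line, for illustration only.
singles, doubles, triples, homers = 100, 30, 5, 15
hits = singles + doubles + triples + homers
at_bats, walks, hbp, sac_flies = 500, 60, 5, 5

print(f"OBP = {obp(hits, walks, hbp, at_bats, sac_flies):.3f}")
print(f"SLG = {slg(singles, doubles, triples, homers, at_bats):.3f}")
```
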
Bradbury’s Results for Run Production: 1998-2004

                   NL               AL
Batting average    3.51             -.32
                   (not signif.)    (not signif.)
On-base pct        17.25            21.14
Slugging pct       9.27             9.01
R sq.              .91              .93

Conclusions from Bradbury’s Study
• Batting average does not matter, given on-base percentage and slugging percentage.
• On-base percentage is worth 1.86 times (NL) or 2.35 times (AL) slugging percentage.
• On-base percentage was an undervalued skill that the Oakland A’s exploited in the draft and in trades (e.g., Scott Hatteberg in ‘Moneyball’).

Hakes and Sauer Study*
Hakes and Sauer (2006) took a direct approach, estimating the effects of OBP and slugging pct on winning percentage. Their final model (1999-2003) is:

                 Coeff.   Std. Error
Constant         .500     .005
OBP              2.14     .296
OBP against      -1.89    .291
Slugging pct     .802     .149
SLG against      -1.00    .152
R sq.            .885

* Journal of Economic Perspectives, Summer 2006.

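To see what the coefficients imply, the model can be evaluated for a team. This is a rough sketch under a stated assumption: that the four variables enter as raw decimals (e.g., .340), which the slide does not spell out. The team figures are hypothetical.

```python
# Coefficients from the Hakes and Sauer final model (1999-2003).
COEF = {
    "const": 0.500,
    "obp": 2.14,
    "obp_against": -1.89,
    "slg": 0.802,
    "slg_against": -1.00,
}


def predicted_win_pct(obp, obp_against, slg, slg_against):
    """Predicted winning percentage; inputs assumed to be raw decimals."""
    return (COEF["const"]
            + COEF["obp"] * obp
            + COEF["obp_against"] * obp_against
            + COEF["slg"] * slg
            + COEF["slg_against"] * slg_against)


# Hypothetical league-average team: offense equals what it allows.
average = predicted_win_pct(0.340, 0.340, 0.430, 0.430)
# Same team with 10 more points of on-base percentage.
better_obp = predicted_win_pct(0.350, 0.340, 0.430, 0.430)

print(f"average team:      {average:.3f}")
print(f"+.010 OBP:         {better_obp:.3f}")
print(f"extra wins in 162: {(better_obp - average) * 162:.1f}")
```

With these inputs the average team comes out very close to .500, and ten points of OBP are worth about .021 in winning percentage.
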
More Detailed Run Production Function
Example of a more detailed study (Albert & Bennett):
    Runs = .52(1b) + .66(2b) + 1.17(3b) + 1.49(HR) + .35(BB) + .19(Stolen bases) - .11(Caught stealing)
(Note: slugging pct is not quite the correct aggregate.)
About .33 of steal attempts end in caught stealing, so runs per steal attempt can be computed as
    .67(.19) - .33(.11) = .127 - .036 = .09

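The steal calculation generalizes to any caught-stealing rate. A short sketch using only the two stolen-base coefficients from the equation above; the function names are illustrative.

```python
SB_VALUE = 0.19   # runs added per stolen base (from the equation above)
CS_COST = 0.11    # runs lost per caught stealing (from the equation above)


def runs_per_steal_attempt(caught_rate):
    """Expected runs per attempt when a share `caught_rate` is thrown out."""
    return (1 - caught_rate) * SB_VALUE - caught_rate * CS_COST


def break_even_success_rate():
    """Success rate at which an attempt is worth exactly zero runs."""
    return CS_COST / (SB_VALUE + CS_COST)


print(f"runs per attempt at .33 caught: {runs_per_steal_attempt(0.33):.3f}")
print(f"break-even success rate:        {break_even_success_rate():.3f}")
```

These figures depend entirely on the regression coefficients; a different run production equation would give a different break-even point.
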
How to Evaluate Pitchers
• Batters can be evaluated on their on-base percentage and slugging percentage, but what about pitchers?
• The current sabermetric method is called DIPS (defense-independent pitching statistics), which consists of
  - strikeouts per nine innings
  - walks per nine innings
  - home runs per nine innings
  [Pitchers have little control over fielding.]

How Well Does DIPS Work?
• The DIPS variables for this season predict a pitcher’s earned run average for next season better than this season’s ERA does.
• Why? Because the “noise” in earned runs caused by balls put in play hinders discovery of the pitcher’s true ability.

Bradbury’s Study of DIPS (all pitchers with 100 innings, 1980-2004)
Dependent variable: ERA for next season

                                    Coefficient
Strikeouts per 9 inn.               -.18
Walks per 9 inn.                    .14
Home runs per 9 inn.                .20
Batting ave., balls put in play     -1.63
  (for team, decimal)
R sq.                               .28

All coefficients statistically significant.
Note: BABIP (batting average on balls put in play) has the “wrong” sign.
R sq. of ERA and future ERA is .16 (not stat. signif.)

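Because the slide reports no constant term, the coefficients cannot produce an ERA level, but they can compare two pitchers or two seasons. A sketch of that use, with illustrative names and a hypothetical example.

```python
# Coefficients for next season's ERA, from the table above.
NEXT_ERA_COEF = {
    "k9": -0.18,          # strikeouts per 9 innings
    "bb9": 0.14,          # walks per 9 innings
    "hr9": 0.20,          # home runs per 9 innings
    "team_babip": -1.63,  # team batting average on balls in play (decimal)
}


def next_era_change(**deltas):
    """Predicted change in next-season ERA for the given changes in inputs.

    Only differences are computed, since the intercept is not reported.
    """
    return sum(NEXT_ERA_COEF[name] * delta for name, delta in deltas.items())


# Hypothetical: one more strikeout and one fewer walk per nine innings.
print(round(next_era_change(k9=1.0, bb9=-1.0), 2))
```
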
Bradbury’s Results for Current Season’s ERA

                                    Coefficient
Strikeouts per 9 inn.               -.17
Walks per 9 inn.                    .30
Home runs per 9 inn.                1.42
Hit batters                         .34
Batting ave., balls put in play     18.16
  (team figure, as decimal; e.g., .280)
R sq.                               .77

All coefficients statistically significant.

Hakes and Sauer Test the ‘Moneyball’ Hypothesis
• Hypothesis: The valuation of batting skills was grossly inefficient; salaries did not reflect contribution to winning games.
• The undervalued skill was the ability to get on base, the skill which has the largest effect on runs and winning percentage.
• The hypothesis was confirmed; on-base percentage was not a statistically significant factor in salaries for 2000-03 (but slugging was).

Market Inefficiency Is Corrected Quickly
• In 2004 salaries reflected OBP and slugging from the previous year, with the OBP coefficient 1.69 times the size of the slugging coefficient.
• Value of a one-std.-deviation increase (in millions of dollars):

               2000-03   2004
  On-base      .165      .49
  Slugging     .62       .61

But Why Didn’t the A’s Win the World Series under Billy Beane?
• The Oakland A’s have averaged 96 wins per season since 2001, but have had no success in the postseason.
• A study by Silver and Perry (2006) shows that postseason success is a result of pitching and defense: strength of the closer, strikeout rate of the pitching staff, and “fielding runs above average” (FRAA). Regular-season offense has no correlation with postseason success.
• FRAA is the number of runs a fielder saved for his team, compared to an average player at the same position.

Lessons for Hitters
• Go up to bat with a plan. Know which pitches you can drive, and where you can drive them. (I liked to drive pitches on the outside half of the plate, between thighs and letters, to right-center.) With fewer than two strikes, don’t try to hit pitches you don’t hit well.
• Practice hitting foul balls so you can “protect the plate” with two strikes.
• Know the strike zone so you can take balls. Take a walk; avoid making an out.
• The lead-off man in an inning needs to get on base rather than make an out. It’s worth 0.64 runs (in the major leagues).

Lessons for Pitchers
• Get the first batter out.
• Find out which pitches your guys hit well; the opponent’s guys probably are similar. Then avoid throwing those pitches…
• Find out which pitches your guys can’t hit. Throw those.
• Avoid walks.

Does This Stuff Work for Other Levels of Baseball?
Research Project
Use your own game box scores to compile the Operations Research table. Are the results similar to those for the major leagues? For example, what are average runs scored for:
- nobody on, nobody out
- runner on first, nobody out
- nobody on, one out
And so on…
My guess is that the patterns are similar to the major leagues, except that stealing a base may be a better option in Division III baseball because fewer catchers can throw out a fast runner.

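One way to start the project is to record, for every plate appearance, the situation before it and the runs that scored on it, then average the runs scored from that point to the end of the inning. The sketch below assumes such play-by-play records are available (a scorebook rather than a summary box score); the record layout and the sample innings are made up for illustration.

```python
from collections import defaultdict


def run_expectancy(innings):
    """Mean runs scored from each (bases, outs) situation to the end of the inning.

    `innings` is a list of half-innings. Each half-inning is a list of plays,
    and each play is (bases_before, outs_before, runs_scored_on_play).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for plays in innings:
        inning_runs = sum(runs for _, _, runs in plays)
        scored_so_far = 0
        for bases, outs, runs in plays:
            # Runs still to come from this situation, including this play.
            totals[(bases, outs)] += inning_runs - scored_so_far
            counts[(bases, outs)] += 1
            scored_so_far += runs
    return {state: totals[state] / counts[state] for state in counts}


# Hypothetical sample: "---" means bases empty, "1--" means runner on 1st.
sample = [
    [("---", 0, 0), ("1--", 0, 2), ("---", 0, 0), ("---", 1, 0), ("---", 2, 0)],
    [("---", 0, 0), ("---", 1, 0), ("---", 2, 0)],
]

for state, mean_runs in sorted(run_expectancy(sample).items()):
    print(state, round(mean_runs, 2))
```

With a full season of games, the resulting 24 averages can be laid out in the same grid as the major-league table and compared cell by cell.
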
References
Bennett, J. and J. Flueck, “An evaluation of major league baseball offensive performance models,” American Statistician, Feb. 1983.
Bradbury, J. C., The Baseball Economist. Dutton, 2007.
By the Numbers, on-line sabermetrics journal.
Hakes, J. and R. Sauer, “An economic evaluation of the moneyball hypothesis,” Journal of Economic Perspectives, Summer 2006, pp. 173-185.
Keri, J., ed., Baseball Between the Numbers. Basic Books, 2006.
Lewis, Michael, Moneyball. Norton, 2003.
Lindsey, George, “An investigation of strategies in baseball,” Operations Research, Jul.-Aug. 1963, pp. 477-501.
Silver, N. and D. Perry, “Why doesn’t Billy Beane’s s…work in the playoffs?” in Keri, ed., pp. 352-368.