AP Statistics Final Project: Comparing Pitchers & Hitters in the 2017 Season

Samuel K. Haney, AP Statistics 2017-18, Period 6

Background & Introducing Research Question
• Major League Baseball has two sides: there are fierce hitters who attempt to get on base and score runs for their team, and there are pitchers who try to get these hitters out with crafty tactics and skills. But how can the two be compared?
• Baseball fanatics will know that it is uncommon for a pitcher to maintain an earned run average (ERA = 9 × earned runs given up / innings pitched) below 3.50 over a full season, and that above-average hitters will have a batting average above .275 (hits in at least 27.5% of their at-bats) over a whole season.
• So, with these two sides in one iconic sport, how do they compare in terms of excellent performance?
• MAIN RESEARCH QUESTION: How does the proportion of hitters who hit at/above .275 compare to the proportion of pitchers with an ERA of 3.50 or lower in the MLB?
Photo: Boston Red Sox slugger Mookie Betts

Collecting Data (Intro)
• For this research question, the data needs to satisfy three conditions: RANDOM, NORMAL, and INDEPENDENT.
• To make this data as RANDOM as possible, I plan on utilizing an MLB team randomizer to randomly select a team, and then randomly selecting a starting pitcher/hitter on that team from the 2017 campaign from a random “drawing” (without replacement of players). I will do this 25 times for the pitchers, and then 25 times for the hitters. A sample of 25 is small enough relative to each population to support INDEPENDENCE (10% Condition), and NORMALITY will be checked with the Large Counts Condition.
• I will construct a 95% confidence interval (one-proportion z interval) for the proportion of hitters who hit at/above a .275 average, and another for the proportion of pitchers who pitch at/below a 3.50 earned run average, and then compare the two intervals to analyze the “excellent performance” on both sides.
Photo: LA Angels legend Albert Pujols
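The two-stage random "drawing" described above can be sketched in Python. This is only an illustration under stated assumptions: the rosters below are hypothetical placeholders (not real 2017 data), and the function name `draw_sample` is made up for this example; the actual project used an online team randomizer.

```python
import random

def draw_sample(rosters, n, seed=None):
    """Pick a random team, then a random not-yet-chosen player on it, n times."""
    rng = random.Random(seed)
    # Copy the rosters so removing drawn players does not change the input.
    remaining = {team: list(players) for team, players in rosters.items()}
    sample = []
    while len(sample) < n:
        # Only teams that still have undrawn players are eligible.
        eligible = [team for team, players in remaining.items() if players]
        if not eligible:
            raise ValueError("not enough players to draw n without replacement")
        team = rng.choice(eligible)
        # pop() removes the player, so nobody can be drawn twice.
        player = remaining[team].pop(rng.randrange(len(remaining[team])))
        sample.append((team, player))
    return sample

# Hypothetical placeholder rosters, not real 2017 data.
rosters = {
    "Team A": ["A1", "A2", "A3"],
    "Team B": ["B1", "B2", "B3"],
    "Team C": ["C1", "C2", "C3"],
}
sample = draw_sample(rosters, n=5, seed=1)
print(sample)
```

One thing this sketch makes visible: because a team is chosen first, a player on a team with fewer eligible players has a slightly higher chance of being drawn than a player on a team with more.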

Data Collection (five starting pitchers, page 1/5) Number of starting pitchers who have an ERA of 3.50 or lower here: 4

Graphical Analysis of Starting Pitching Data (Histogram) 40% of the pitchers that I randomly selected satisfied the 3.50 or lower earned run average in the 2017 season, whereas the other 60% did not.

Analysis of Starting Pitching Data (moving into “P.A.N.I.C.”)
★ PARAMETER: “The proportion of all Major League Baseball pitchers who started at least one game in 2017 and had an ERA of 3.50 or under.”
★ CHECKLIST FOR ASSUMPTIONS:
1) Is the data randomly selected/randomized? YES
2) Is the sampling distribution approximately Normal (Large Counts Condition)? YES
(25)(.4) = 10 (at least 10)
(25)(1 - .4) = 15 (at least 10)
3) Is this data independent (10% Condition)? YES
Fangraphs.com: There were 270 pitchers who made at least one start in 2017, and the sample of 25 is less than 10% of 270 (27).
Photo: Ageless starting pitcher Bartolo Colón
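The Large Counts and 10% checks above are simple arithmetic, so they can be written as a small helper. This is a minimal sketch; the function name `check_conditions` is made up for illustration, and the Random condition is not included because it depends on how the data was collected, not on the numbers.

```python
def check_conditions(successes, n, population, threshold=10):
    """Check the Large Counts and 10% conditions for a one-proportion z interval."""
    failures = n - successes
    return {
        # Large Counts: at least 10 successes and at least 10 failures.
        "large_counts": successes >= threshold and failures >= threshold,
        # 10% Condition: the sample is at most one tenth of the population.
        "ten_percent": 10 * n <= population,
    }

# Pitching sample from the slides: 10 of 25 under the cutoff, 270 starters in 2017.
pitchers = check_conditions(successes=10, n=25, population=270)
print(pitchers)  # {'large_counts': True, 'ten_percent': True}
```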

Carrying out “P.A.N.I.C.” for Starting Pitching Data
★ NAMING THE APPROPRIATE PROCEDURE: Since all the conditions are met, we can utilize a one-proportion z interval to estimate the proportion of all MLB pitchers that made at least one start in 2017 that had an ERA of 3.50 or under.
★ INTERVAL: Since I am constructing a 95% confidence interval, the critical value is 1.96.
.4 ± 1.96 × √(.4(1 - .4)/25) = .4 ± .192
With subtracting, the lower value of the interval is approximately .208 (20.8%), whereas with adding, the upper value of the interval is approximately .592 (59.2%).

Concluding the Starting Pitching “P.A.N.I.C.” Process
★ CONCLUSION IN CONTEXT: “We are 95% confident that the interval from 20.8% to 59.2% captures the true proportion of all Major League Baseball pitchers who started at least one game in 2017 and had an ERA of 3.50 or under.”
Quick evaluation: This is a wide interval (about 38.4 percentage points). Next time, a bigger sample of starting pitchers would narrow the interval and estimate the parameter more precisely.
Photo: J.A. Happ, Toronto Blue Jays starting pitcher

Beginning Data Collection for Hitters (ten hitters, page 1/3)

Data Collection for Hitters (ten hitters, page 2/3)

Data Collection for Hitters (five hitters, page 3/3) Total number of hitters who batted at/above .275 in this data: 11 out of 25

Graphical Analysis of Hitters’ Data (Histogram) 44% of the hitters that I randomly selected (11 of 25) had a batting average of .275 or higher in the 2017 season, whereas the other 56% did not.

Analysis of Hitters’ Data (moving into “P.A.N.I.C.”)
★ PARAMETER: “The proportion of all Major League Baseball hitters with at least one MLB at bat in 2017 who had a batting average of .275 or higher.”
★ CHECKLIST FOR ASSUMPTIONS:
1) Is the data randomly selected/randomized? YES
2) Is the sampling distribution approximately Normal (Large Counts Condition)? YES
(25)(.44) = 11 (at least 10)
(25)(1 - .44) = 14 (at least 10)
3) Is this data independent (10% Condition)? YES
It is safe to assume that at least 250 hitters had an at bat in the 2017 season, so the sample of 25 is at most 10% of the population.
Photo: Former Twins outfielder Carlos Gómez

Carrying out “P.A.N.I.C.” for Hitters’ Data
★ NAMING THE APPROPRIATE PROCEDURE: Since all the conditions are met, we can utilize a one-proportion z interval to estimate the proportion of all MLB hitters with at least one MLB at bat in 2017 that had a batting average of .275 or higher.
★ INTERVAL: Since I am constructing a 95% confidence interval, the critical value is 1.96.
.44 ± 1.96 × √(.44(1 - .44)/25) = .44 ± .1946
With subtracting, the lower value of the interval is approximately .2454 (24.54%), whereas with adding, the upper value of the interval is approximately .6346 (63.46%).

Concluding the Hitters’ “P.A.N.I.C.” Process
★ CONCLUSION IN CONTEXT: “We are 95% confident that the interval from 24.54% to 63.46% captures the true proportion of all Major League Baseball hitters with at least one MLB at bat in 2017 who had a batting average of .275 or higher.”
Quick evaluation: This is another wide interval (38.92 percentage points). Next time, a bigger sample of hitters would narrow the interval and estimate the parameter more precisely.
Photo: 2017 MLB American League MVP Jose Altuve
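The interval arithmetic for the hitters can be reproduced with a few lines of Python. This is a minimal sketch, not the tool used in the project; the function name `one_proportion_z_interval` is made up for illustration, and 1.96 is the 95% critical value quoted in the slides.

```python
import math

def one_proportion_z_interval(successes, n, z_star=1.96):
    """Return (lower, upper) for a one-proportion z interval."""
    p_hat = successes / n                           # sample proportion
    std_error = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of p-hat
    margin = z_star * std_error                     # margin of error
    return p_hat - margin, p_hat + margin

# Hitting sample from the slides: 11 of 25 batted at/above .275.
low, high = one_proportion_z_interval(11, 25)
print(f"{low:.4f} to {high:.4f}")  # 0.2454 to 0.6346
```

The same function applies to the pitching sample by passing 10 successes out of 25.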

Comparison of the Two Intervals P = Pitching interval, H = Hitting interval
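One informal way to compare the two intervals is to compute the range of values they share. The sketch below rebuilds both intervals from the sample counts in the slides (10 of 25 pitchers, 11 of 25 hitters); the helper names `z_interval` and `overlap` are made up for illustration. Overlap is only a rough check, not a formal test of whether the two proportions differ.

```python
import math

def z_interval(successes, n, z_star=1.96):
    """One-proportion z interval as a (lower, upper) tuple."""
    p_hat = successes / n
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def overlap(a, b):
    """Return the shared range of two intervals, or None if they are disjoint."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low <= high else None

pitching = z_interval(10, 25)  # P: pitchers with ERA of 3.50 or lower
hitting = z_interval(11, 25)   # H: hitters batting .275 or higher
shared = overlap(pitching, hitting)
print("P:", pitching)
print("H:", hitting)
print("Shared range:", shared)
```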

Evaluation/Analysis of Findings & Conclusion
From this data, the proportion of all Major League Baseball hitters with at least one MLB at bat in 2017 who had a batting average of .275 or higher and the proportion of all Major League Baseball pitchers who started at least one game in 2017 and had an ERA of 3.50 or under appear to be very similar. Both intervals contain .4, the two intervals overlap heavily, and the two samples had nearly the same number of successes (11 hitters and 10 pitchers, each out of 25). However, both intervals are very wide, so these samples estimate the true proportions only imprecisely, and a larger sample size should have been used. If there were a “next time” for this study, I would instead randomly select, perhaps, 100 hitters to see how the intervals compare then.
Photo: NY Mets’ ace Jacob deGrom
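To see roughly how much a larger sample would help, the margin of error can be computed for several sample sizes. This sketch assumes, purely for illustration, that the sample proportion would stay at .44 in a bigger sample; the function name `margin_of_error` is made up for this example.

```python
import math

def margin_of_error(p_hat, n, z_star=1.96):
    """Margin of error for a one-proportion z interval."""
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)

# Assumption: p-hat stays at 0.44 (the hitters' sample proportion) as n grows.
for n in (25, 100, 400):
    print(n, round(margin_of_error(0.44, n), 4))
# 25 0.1946
# 100 0.0973
# 400 0.0486
```

Quadrupling the sample size cuts the margin of error in half, so a sample of 100 hitters would give an interval about half as wide as the one from 25.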

Sources (URLs)
• https://www.baseball-reference.com
• https://www.randomlists.com/random-mlb-team
• https://www.mlb.com
• https://www.fangraphs.com/leaders.aspx?pos=all&stats=sta&lg=all&qual=10&type=8&season=2017&month=0&season1=2017&ind=0&team=0&rost=0&age=0&filter=&players=0&page=9_30
