## Chapters 1-9


**AP Statistics Exam Review**
Chapters 1-9

**Chapter 1-2 Topics**
• Data types: categorical vs. quantitative data
• Displaying quantitative data
  • with graphs
  • with numbers
• Distributions
  • percentile, frequency / relative frequency
• Measure of center: mean vs. median
• Normal distributions (Z-tables)

**Data Types**
• Categorical variable: groups or categories
• Quantitative variable: numerical values
  • Discrete: finite set of options, countable
  • Continuous: measurable
• Note: not ALL variables that include numerical values are quantitative (e.g., a zip code is categorical; you don’t “average” a zip code)

**Displaying Data**
• Ways to display data
  • Pie charts, bar graphs, histograms
  • Dot plots, stemplots
  • Frequency (cumulative relative frequency) tables
• Know the differences as well as the PROS and CONS of each type of chart

**Displaying Categorical Data**
• Pie charts
  • Good for aggregate comparison
  • Quick visual glance among categories
  • Very little detail on individual values
  • Can be difficult to quickly determine 2nd-4th place values
• Bar graphs
  • Good for showing max/min values
  • Quick, simplistic view of data
  • Does not show trends/shape, because data is categorical

**Displaying Quantitative Data**
• Histograms
  • Good for showing max/min values
  • Shows data trends, shape
  • Shows frequencies for data values
  • Can be time consuming to create
  • Can only approximate IQR values
• Dot plots
  • Good for showing mode, max/min values
  • Shows data trends, shape
  • Quick to create
  • Does not show specific values
  • Cannot truly determine IQR values

**Displaying Quantitative Data**
• Stemplots, side-by-side stemplots
  • Shows specific values, gaps, outliers
  • Shows data trends, shape
  • Side-by-side stemplots are good for aggregate and detailed comparisons of two variables
  • Able to calculate all key data values (e.g., IQR)
  • Large sets of data require a split stemplot, which can present a misleading conclusion of shape
  • Must sequence the data to calculate IQR, which is time consuming (if not done with
calculator)

**Displaying Quantitative Data**
• Frequency table vs. cumulative relative frequency table
  • Shows complete set of data (aggregate 0 to 100% cumulative)
  • Can identify IQR values (or estimates)
  • Can be time consuming to create
• Reading the example
  • Cumulative relative frequency: 11 breaks occurs at 20%, 14 breaks occurs at 40%
  • Relative frequency between 11 and 14 breaks is 20% (40% - 20%)

**Distributions**
• Cumulative relative frequency distributions
• Considerations:
  • Cumulatively adds to 100%
  • Can be presented with a bar graph or histogram, or displayed in a table
  • May have to calculate cumulative values, then graph

**Displaying Quantitative Data**
• Considerations:
  • Overall pattern
  • Shape: unimodal/bimodal, uniform, symmetric, skewed right/left
  • Center: median, mean (which is better?)
  • Spread: values (range) of data
  • Outliers, gaps
  • Single vs. back-to-back stemplots
  • Descriptive phrases, qualifying adjectives (somewhat, slightly) => try to state in one cohesive, tight sentence
• CUSS: Center, Unusual features, Shape, Spread

**Quantitative Data with Numbers**
Mean and median comparison applet, from the book: http://bcs.whfreeman.com/tps4e/#628644__666394__
• Mean vs.
Median
  • Mean (average), median (true midpoint) of data set
  • Note: the median does not need to be a value in the data set
  • Median: arrange values in order of size, small to large
  • More on calculating Q2, IQR on the following slides
• Depending on the data, either the mean or the median can be the better indicator of the ‘center’
  • Skewed right: mean > median
  • Skewed left: mean < median
  • Remember: the mean “follows the tail…”

**Quantitative Data with Numbers**
• Measuring the spread: quartiles
  • 1st quartile (Q1), median (Q2), 3rd quartile (Q3)
  • Interquartile range, IQR = Q3 - Q1
  • 5-number summary: min, Q1, median, Q3, max
• Check for outliers: “1.5 x IQR” rule…
  • Add 1.5 x IQR to Q3 to determine the cutoff for max outliers
  • Subtract 1.5 x IQR from Q1 to determine the cutoff for min outliers
• Notes:
  • Must arrange numbers from small to large
  • Q1: median of the 1st half of the values (left of the median)
  • Q3: median of the 2nd half of the values (right of the median)
  • Calculator can do the entire 5-number summary (1-variable statistics)

**Quantitative Data with Numbers**
• Calculating IQR…
• With an even set of numbers: 1 2 3 4 5 6
  • Middle data point (median) is between the 3rd and 4th terms of 6, so the median is 3.5
  • Divide the data set into halves; 3.5 is NOT included in either half
  • 1st half median (Q1) is 2
  • 2nd half median (Q3) is 5
• With an odd set of numbers: 1 2 3 4 5 6 7
  • Median is the 4th term (4) of 7
  • Divide the data set into halves; 4 is NOT included in either half
  • 1st half median (Q1) is the 2nd term of that half, 2
  • 2nd half median (Q3) is the 2nd term of that half, 6
• Note: if the number of terms is divisible by 4, then Q1/Q2/Q3 fall after the 1st, 2nd, and 3rd quarters of the numbers (e.g., with 8 values, after the 2nd, 4th, and 6th values)

**Quantitative Data with Numbers**
• Interpreting boxplots
• Considerations:
  • Many ways to display boxplots
  • Legend, scale
  • Median line, labeled IQR values (box plot)
  • Actual data points
  • “Whiskers”, no whiskers, outliers

**Quantitative Data with Numbers**
• Creating boxplots
  • Data set: 40 68 77 80 87 89 90 90 91 92 93 93 95 98
  • Calculate 5-number summary, and IQR
  • Min = 40, Q1 = 80, Q2
= 90, Q3 = 93, Max = 98
  • IQR = 93 - 80 = 13
• [Boxplot figure: whiskers from 40 to 98, box from 80 to 93, median line at 90]

**Quantitative Data with Numbers**
• Creating modified boxplots
  • Data set: 40 68 77 80 87 89 90 90 91 92 93 93 95 98
  • Use the IQR of 13 to calculate the Outlier-Min/Max values
  • Outlier-Min = 80 - 1.5*13 = 60.5; 40 is below Outlier-Min, so it is considered an outlier
  • Outlier-Max = 93 + 1.5*13 = 112.5; no data points lie above Outlier-Max, so there are no max outliers
  • Choose the 1st min value inside the Outlier-Min limit (68)
  • 68 becomes your modified MIN
• Note: only the MIN changed, not the remaining 5-number summary values
• [Modified boxplot figure: outlier marked * at 40, whiskers from 68 to 98, box from 80 to 93, median line at 90]

**Distributions**
• Density curves
  • Describes the overall pattern of a distribution
  • Area under the curve is the proportion of all observations that fall in the interval
• Considerations:
  • Always on/above the horizontal axis
  • Area under curve = 100% (or 1)

**Normal Distributions (Z-tables)**
• Density curves
  • Know these VALUES: 68%, 95%, 99.7%!
  • Z-scores are multiples of standard deviations away from the mean

**Normal Distributions (Z-tables)**
• Density curves
  • Helpful to know the specific band values (34%, 13.5%, 2.35%), but you can derive them from the 68/95/99.7 empirical values
  • Note: these are ‘rounded’ percentages; exact values are found from the Z-tables or calculator
  • Be able to calculate any combination of standard deviation values (using the empirical rule, 68/95/99.7%)
  • e.g., from Z = -2 to Z = 1: 13.5% + 34% + 34% = 81.5%
  • e.g., from Z = -1 to Z = 3: 34% + 34% + 13.5% + 2.35% = 83.85%

**Normal Distributions (Z-tables)**
• Remember to note whether the question is asking for the probability of LESS or GREATER than a particular value…
• Z-table values always give you the area to the LEFT of the z-score line.

**Chapter 3-5 Topics**
• Data correlation
  • Scatter plots, residual plots
  • Least-squares regression line (LSRL)
• Sampling and surveys
  • Sampling methods
  • Table of random digits
  • Issues with sampling
• Experiments vs.
Observational studies
  • Lurking, confounding variables
  • Experimental design: block, matched pair
  • Placebo effects

**Data Correlation**
• Scatter plot diagrams
  • Data plot among 2 quantitative variables
  • Describe with: Direction, Strength, Form (e.g., moderately strong, positive correlation)
  • More on this with the discussion of R and R²

**Data Correlation**
• Scatter plot diagrams
  • Overlay a Least-Squares Regression Line (LSRL) on the scatterplot: ŷ = a + bX, e.g., ŷ = 27.996 + 1.1616X
  • a: y-intercept, b: slope
  • Note: put the LSRL equation in the full context of the problem, e.g., at 0 years of experience, income = $20,000; for every year of experience, income goes up $1800
• [Scatterplot figure: Income vs. Years of experience]

**Data Correlation**
• Scatter plot diagrams
  • Once an LSRL is established, a residual plot can be created
  • Look for no recognizable pattern in the residual plot (suggests a linear equation is a good model for this data set)
• [Residual plot figure: Income vs. Years of experience]

**Data Correlation**
• Residuals
  • Residual = Observed - Predicted (Expected)
  • Residual = y - ŷ
  • Note: residuals can be + or - values, e.g., Residual = -9 for years = 4, Residual = +5 for years = 6
• [Scatterplot figure: Income vs. Years of experience]

**Data Correlation**
• Correlation coefficient, R, measures how well the 2 quantitative variables are linearly correlated (strength and direction)
  • R = +0.858: strong positive correlation
  • Note: R can be +/-, depending on the slope of the line
• Coefficient of determination, R², measures how much of the variation (data variance) is described by the line
  • R² = 0.7363: 73.63% of the data variance can be described by the line
• [Scatterplot figure: Income vs. Years of experience]

**Regression and Correlation**
• When doing linear regression, or establishing data correlation (among 2 variables), be sure to check the residuals
  • Calculate residuals for each point (if asked)
  • Create a residual plot (provided by the calculator)
  • Use the residual plot as 1 of your ‘checking points’ to determine if a linear regression (i.e., line) is the appropriate equation for the data set

**Regression and
Correlation**
• What if the residual plot has a pattern?
  • This indicates that a linear equation is possibly not the best fit for the data set
• In this scenario, try additional equation types:
  • ŷ = a + b*Log(X)
  • Log(ŷ) = a + b*X
  • Log(ŷ) = a + b*Log(X)
  • And others…
• For each equation type, check the following:
  • Scatterplot
  • R and R² values
  • Residual plot
• Compare these to the original equation, values, and charts, and determine which is the optimal equation type

**Sampling and Surveys**
• Population vs. sample
  • Population: the entire data set of interest
  • Sample: a subset of the data set
  • Must read the problem carefully to assess what the population and the sample are
• Parameter vs. statistic
  • Parameters are associated with populations
  • Statistics are associated with samples

**Sampling and Surveys**
• Types of sampling
  • Convenience: “closest to you”
  • Voluntary response: people voluntarily participate
  • Simple random sample: organized, randomness
  • Stratified sample: groupings with logic
  • Clustered: grouping typically by area
  • Systematic: grouping with a consistent numeric sampling (e.g., every 7th person)

**Sampling and Surveys**
• How could you conduct sampling of the Toyota Center via the following methods?
  • Convenience
  • Voluntary response
  • Simple random sample
  • Stratified sample
  • Clustered
  • Systematic
• [Figure: Toyota Center seating chart]

**Simulation**
• Steps to define and run a simulation
  • Define variables
  • Define criteria for random sampling
  • Use criteria to go through the Random Digits table
  • Calculate observed sampling values (probability of the sampled event)
  • Compare to expected values, and make a concluding statement with respect to the simulation

Random digits table:
Row 1: 10480 15011 01536 02011 81647 91646 69179 14194 62590 36207 20969 99570 91291 90700
Row 2: 22368 46573 25595 85393 30995 89198 27982 53402 93965 34095 52666 19174 39615 99505
Row 3: 24130 48360 22527 97265 76393 64809 15179 24830 49340 32081 30680 19655 63348 58629
Row 4: 42167 93093 06243 61680 07856 16376 39440 53537 71341 57004 00849 74917 97758 16379

**Simulation**
• Example: flipping a coin. Steps:
  • Define variables: digits 0-4 = Heads, digits 5-9 = Tails
  • Define criteria for random sampling: start at row 1, analyze 1 row of data
  • Use criteria to go through the Random Digits table: in the 1st row, 39 digits in the 0-4 range are observed
  • Calculate observed sampling values (probability of the sampled event): 39/70 = 0.557 (55.7%)
  • Compare to expected values, concluding statement: we would expect to see 50%. The observed value varies because of randomness and sample size.
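
The coin-flip tally can be checked with a few lines of Python. This is only a minimal sketch: the names `ROW_1` and `count_heads` are illustrative and not from the slides, and the row of digits is copied from row 1 of the random digits table used in the example.

```python
# Row 1 of the random digits table used in the coin-flip example.
ROW_1 = ("10480 15011 01536 02011 81647 91646 69179 "
         "14194 62590 36207 20969 99570 91291 90700")


def count_heads(row):
    """Treat digits 0-4 as Heads and 5-9 as Tails; return (heads, total digits)."""
    digits = [int(ch) for ch in row if ch.isdigit()]
    heads = sum(1 for d in digits if d <= 4)
    return heads, len(digits)


heads, total = count_heads(ROW_1)
proportion = heads / total
print(f"{heads}/{total} = {proportion:.3f}")  # prints "39/70 = 0.557"
```

Running the same function on a different row would give a different proportion, which is the point of the concluding statement: the observed value drifts around the expected 50% because of randomness and the small sample.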

**Random Digit Tables**
• Random digit tables can be used in many different types of problems
  • Sampling, surveying
  • Experimental (block) design
  • Probability
  • Inference (including confidence intervals)
• If a problem asks for ‘designing a random’ situation, describe all the steps (previous slide)

**Sampling and Surveys**
• Potential sampling issues
  • Under-coverage, non-response
  • Bias, wording of the question

**Chapter 6 Topics**
• Probability
  • Law of averages, Law of Large Numbers
  • Randomness, simulations
• Probability rules
  • Probability model: sample space S { }, event
  • Two events: mutually exclusive (disjoint), union/intersection, addition rule
  • Two-way tables, Venn diagrams
  • Conditional probability
  • Tree diagrams, multiplication rule

**Probability**
• Probability: the likelihood (chance) of a particular outcome occurring (0-1 or 0-100%)
• Short-run (independent, random) vs.
Long-run (law of large numbers)
• Myth of the ‘law of averages’

**Probability**
• Simulation: imitation of chance behavior, based on a model that accurately reflects the situation
• State (the question of interest) -> Plan (determine chance device) -> Do (repetitions) -> Conclude
• e.g., use of a Table of Random Digits

**Probability Rules**
• Probability model
  • Event: any collection of outcomes from some chance process
  • Sample space, S { }: all possible outcomes
  • Probability of each outcome, P(A)
  • Mutually exclusive (disjoint) events: P(A or B) = P(A) + P(B)
• Example: rolling 2 dice
  • Sample space, S {1-1, 1-2, …, 5-6, 6-6}
  • 36 possible outcomes, each equally likely
• Comments:
  • Make sure to identify the ENTIRE sample space; can shorten by use of “…” if a pattern is formed
  • Remember: all possible outcomes = 100%
  • Note: it may be easier to calculate the complement of an event, P(Aᶜ) = 1 - P(A)

**Probability Rules**
• Two-way tables, probability
• Example: How common are pierced ears among college students?
• Comments:
  • Calculate totals (if needed)
  • May need to convert to %
• Counts used below (from the two-way table): Male/Yes 19, Male/No 71, Female/Yes 84, Female/No 4; totals Male 90, Yes 103, all students 178
• Calculations:
  • Intersection (AND), e.g., P(Male and Yes) = P(Male ∩ Yes) = 19/178 = 10.7%; reads… Male AND Yes => “inside the table”
  • Not mutually exclusive (general addition rule), e.g., P(Male or Yes) = P(Male ∪ Yes) = P(Male) + P(Yes) - P(Male ∩ Yes) = (90 + 103 - 19) / 178 = 174 / 178 = 97.8%; same as 1 - P(Female ∩ No) = 1 - (4/178); reads… Male OR Yes => subtract the double-count of totals

**Venn Diagrams**
• Graphical displays of events, with probabilities
• [Venn diagram figures: mutually exclusive (disjoint), intersection (A ∩ B), union]

**Venn Diagrams**
• Continuing with our example…
• Note: you cannot diagram ALL of the combinations
• Diagram the topic of interest (Males / ears)
• Notes:
  • Male is comprised of 71 (male only, no pierced ears) + 19 (male AND pierced) = 90
  • All 3 values (71, 19, 84) will be included INSIDE the table
• [Venn diagram figure: circles “Male” and “Pierced Ears?” with 71 (male only), 19 (overlap), 84 (pierced only)]

**Conditional Probability**
• Conditional probability: the likelihood of an event GIVEN another event is already known (or has occurred)
• Examples: find the likelihood…
  • Has pierced ears given that he is male: P(Yes | Male) = 19 / 90 = 21.1%
  • Is female given that she has pierced ears: P(Female | Yes) = 84 / 103 = 81.6%

**Tree Diagrams**
• Know how to create a tree diagram, and calculate % values from a 2-way table
• [Tree diagram figure: first branches Male 50.7% / Female 49.3%; second branches P(Yes | Male) = 21.1%, P(No | Male) = 78.9%, P(Yes | Female) = 95.4%, P(No | Female) = 4.6%; joint probabilities 10.7%, 40.0%, 47.2%, 2.1%]

**Chapter 7 Topics**
• Discrete vs.
Continuous variables
• Discrete random variables
  • Discrete: fixed set of possible values
  • Expected value (mean, μ)
  • Variance (σ²)
  • Standard deviation (σ)
• Continuous random variables
  • Continuous: all values within an interval (e.g., the set of all numbers between 0 and 1)
  • Z-curves, Z-table distributions
• Probability distributions
  • All possible values and their probabilities (extension of the probability model: sample space, event)
• Transforming random variables
  • Add/subtract/multiply values for mean, variance, standard deviation

**Variables and Distributions**
• Probability distribution
  • All possible values + their probabilities (extension of the probability model: sample space, event)
  • P(0, 1, 2) = P(0) + P(1) + P(2) = 0.01 + 0.05 + 0.11 = 0.17
  • P(X > 4) = P(5) + P(6) + P(7) = 0.23 + 0.17 + 0.10 = 0.50
  • P(X > 1) = 1 - P(X ≤ 1) = 1 - [0.01 + 0.05] = 1 - 0.06 = 0.94

**Discrete Variables**
• Discrete random variables
  • Expected value (mean)
  • Variance
  • Standard deviation
• Expected value is…
• Believe it or not, they might ask you to calculate these! (or, you can use your calculator)

**Discrete Variables**
• Discrete random variables
  • Expected value (mean)

**Continuous Variables**
• Continuous random variables
  • Area under the curve = probability of an event
• Remember:
  • X values are ALONG the curve (on the horizontal axis)
  • Z-values are ‘normalized’ standard deviation values
  • The +/- 1, 2, 3 values are standard deviations away from the mean
  • Percentiles correspond to the area under the curve (50% is the mean; 84% is +1 standard deviation from the mean)

**Continuous Variables**
• Z-values…
  • Z-scores are ‘mirrored’ +/- about the mean
  • Z-table values give the area to the left of the line
  • For P(Z > 1.4), look up the table value for z = 1.4, then subtract it from 1
  • To find a probability between 2 values, calculate the two z-scores, look up both table values, and subtract to find the difference
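
The two Z-table moves in the last slide (subtract from 1 for "greater than", subtract two table values for "between") can be sketched in Python. This is an illustration rather than anything from the slides: the function names `phi`, `prob_greater`, and `prob_between` are made up here, and the left-tail area is computed with the standard library's `math.erf` instead of being read from a printed table.

```python
import math


def phi(z):
    """Area to the LEFT of z under the standard normal curve (what a Z-table lists)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_greater(z):
    """P(Z > z): take the left-tail area and subtract it from 1."""
    return 1.0 - phi(z)


def prob_between(z_low, z_high):
    """P(z_low < Z < z_high): difference of the two left-tail areas."""
    return phi(z_high) - phi(z_low)


print(round(prob_greater(1.4), 4))      # roughly 0.0808
print(round(prob_between(-2, 1), 4))    # roughly 0.8186
```

The second result is close to, but not exactly, the 81.5% obtained from the 68/95/99.7 empirical rule for Z = -2 to Z = 1, which is consistent with the earlier note that the empirical percentages are rounded.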