Chapter 5.1 Producing Data
Producing Data • 5.1 Designing Samples • Populations vs. Samples, Survey inquiries • Types of samples: Voluntary Response, Convenience sampling, Simple random samples (SRS), Stratified samples, Clustered, Systematic • Simulations, use of Table B (Random Digit Table) • Cautions with sampling • 5.2 Designing Experiments • Observational Studies vs. Experiments • Experimental Design – Subjects, Factors, Levels • Block Design, Matched Pair Design • 5.3 Simulating Experiments
Designing Samples • If one were curious about… • What % of the adults in the city of Houston consider themselves “conservative”? • What is the average family size in Texas? • What % of the college applicant pool is female? • What is the average starting salary for undergraduate STEM (Science, Technology, Engineering, Math) majors? • How many hours per week do teachers spend grading? • What is your favorite color? Etc. • What are the challenges to answering these questions? • How can we work around these data-gathering ‘limitations’? • Sampling…
Surveys • A survey is a systematic method to collect information. • Surveys can take on many different methods and approaches • In-person, self-administered, conducted via phone/mail • Individual, small group, large group, aggregate population • Research-based, questionnaire, general inquiry • Open-ended surveys • Question/inquiry which leaves responses open to the responder • Generally straightforward questions with no multiple parts • Close-ended surveys • Question/inquiry which forces individuals to choose from a fixed set of responses • Different types • Surveys can include multiple parts, some open/closed • Source: “Designing Surveys that Count”, Therese Seibert, PhD
Surveys • Close-ended surveys can be: • Multiple choice, Yes/No responses • Categorical (nominal; i.e., word categories) • e.g., How many times have you been to the movies this year? • Ordinal (i.e., rankings), examples: • Strongly Agree / Agree / Disagree / Strongly Disagree • Excellent / Good / Fair / Poor • Always / Very Often / Fairly Often / Sometimes / Almost Never / Never • Completely satisfied, Very satisfied, Somewhat satisfied, Somewhat dissatisfied, Very dissatisfied, Completely dissatisfied • Once/day, once/week, several times/week
Surveys • From our In-class Activity…
Designing Samples • Population vs. Sample • Population – the entire group of individuals that we want information about • Sample – a part (or subset) of the population that we actually examine in order to gather information • Sampling involves studying a smaller group (a part), in order to gain information about the entire population (the whole) • A census attempts to contact EVERY individual in the entire population • used when complete accuracy is required and sampling would not be adequate (e.g., the 2010 U.S. population census) • Simulation – an alternative method to sampling the population, often used when polling the population is logistically impossible or inconvenient
Designing Samples • Types of Sampling • Voluntary response – sampling technique where individuals choose to participate by responding to a general appeal (i.e., they volunteered to participate) • Convenience sample – sampling technique which chooses the individuals easiest to reach (i.e., convenient to sample) • Simple Random Sample (SRS) – sampling technique where every possible group of n individuals has an equal chance to be the sample selected (so each individual is also equally likely to be chosen) • Stratified random sample – the population is first divided into logical groups of similar individuals (called strata), then a separate SRS is taken within each stratum • Clustered random sample – the population is first divided into groups (clusters), then whole clusters are selected at random and the individuals in the chosen clusters are sampled • Systematic sample – sampling technique which follows a consistent, systematic approach (e.g., every 10th person)
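Three of the sampling types above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the population of 100 students, the two strata, and the function names are all hypothetical, and Python's `random` module stands in for whatever chance device is actually used.

```python
import random

def simple_random_sample(population, n, rng):
    """SRS: every possible group of n individuals is equally likely."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take a separate SRS inside each stratum, then combine them."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def systematic_sample(population, step, rng):
    """Pick a random starting point, then take every `step`-th individual."""
    start = rng.randrange(step)
    return population[start::step]

rng = random.Random(1)  # fixed seed so the run is repeatable
students = [f"student_{i:03d}" for i in range(100)]              # hypothetical population
by_grade = {"juniors": students[:50], "seniors": students[50:]}  # hypothetical strata

srs = simple_random_sample(students, 10, rng)
strat = stratified_sample(by_grade, 5, rng)
syst = systematic_sample(students, 10, rng)
print(srs, strat, syst, sep="\n")
```

Voluntary response and convenience samples are left out on purpose: they involve no chance mechanism, which is exactly why they tend to be biased.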
Common Technique Used in Sampling • One approach to sampling, used frequently in simulations, is to use a Random Digit Table • Referred to as Table B (located inside the back cover of your text) • Structure: Groups of randomly listed digits in many rows • Purpose: Scan the list of random digits to simulate a particular experiment, study, or survey • Steps to use the Random Digit Table… • Understand the question at hand (i.e., desired sampling), and the possible outcomes • Define and design ‘variables’ appropriate to mimic the total possible outcomes, and the desired sampling results • Define the criteria to scan through the list of values (e.g., 1 digit at a time, 2 digits at a time) • Once defined, run the simulation using the defined variables and calculate the observed percentages
Random Digit Table Example • Simulate picking a Texas senator from the U.S. Senate • Assess problem: 100 senators in total, 2 from Texas • Define variables: • Use the entire data range from 00-99 to represent the 100 senators • Set: 00-01 = Texas senators, 02-99 = all other states’ senators (alternatively, 01-02 = Texas and 03-99, 00 = non-Texas; starting with 00 is more common though) • Since variables are defined with 2 digits, look at the set of numbers within the table in groups of 2 • Run the simulation • Looking at the 1st 2 rows from the Random Digit Table (e.g., lines 101, 102), scan the rows looking for the 00-01 and 02-99 numbers above • Record each occurrence of 00-01 (representing Texas senators) and 02-99 (non-Texas senators) • When done with the 1st 2 rows, calculate the overall % of occurrences of 00-01 (Texas) • Compare with the expected result of 2% (2/100)
Random Digit Table Example • Simulate picking a Texas senator from the U.S. Senate • Counting the number of occurrences of 00-01 on the previous page… • Result: ONE occurrence found (on 2nd row)… • 40 pairs of possible 2-digit combinations in the 1st 2 rows • Simulation percentage: 1/40 = 2.5% (interestingly, fairly close to the expected value of 2%!)
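The senator example can be tried on a computer as well. The sketch below is an assumption-laden stand-in: it does not read the textbook's Table B, it generates its own rows of 40 random digits (two rows give the 40 pairs mentioned above), and the function names are made up for illustration.

```python
import random

def random_digit_row(n_digits, rng):
    """Stand-in for one row of a random digit table."""
    return "".join(str(rng.randrange(10)) for _ in range(n_digits))

def count_texas_pairs(digits):
    """Read digits in groups of 2: 00-01 = Texas senator, 02-99 = other."""
    pairs = [digits[i:i + 2] for i in range(0, len(digits) - 1, 2)]
    texas = sum(1 for pair in pairs if int(pair) <= 1)
    return texas, len(pairs)

rng = random.Random(2024)  # arbitrary seed, only for repeatability
digits = random_digit_row(40, rng) + random_digit_row(40, rng)  # two rows
texas, total = count_texas_pairs(digits)
print(f"{texas} Texas pairs out of {total}: {texas / total:.1%} (expected 2.0%)")
```

With only 40 pairs the simulated percentage can only be a multiple of 2.5%, so it will jump around from run to run; scanning more rows brings it closer to 2%.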
Random Digit Table Example • Simulate picking a member from Texas out of all of the US House of Representatives • Note: If you have forgotten, there are 435 US HofR members! (Texas has 36) • Steps: • Establish your simulation: • Define variables • Set your variable criteria (including omitted numbers, if applicable) • Run the simulation: • Establish starting point and rows used to simulate • Run simulation • Identify number of occurrences of your defined set • Calculate simulated percentage, compare with expected results
Random Digit Table Example • Simulate picking a US House of Representatives member • Establish your simulation: • Define variables (note: 435 US HofR members) • Define: US HofR member (435), or not… Within that, Texas HofR member (36), or not… • We’ll use 870 (2 x 435) as our ‘factor’ for criteria, omitting numbers 870 and above • Set your variable criteria (including omitted numbers, if applicable) • Define: 000-071 = Texas HofR member (72 numbers = 2 x 36) • 072-869 = Not from Texas, but a US HofR member • Omit 870-999
Random Digit Table Example • Simulate picking a US House of Representatives member • Run the simulation: • Establish starting point and rows used to simulate: start on row 1, look at groups of 3 consecutive digits, search 3 rows (3 rows for 3 digits is a good rule of thumb) • Run simulation • Identify number of occurrences of your defined set: looking at the 1st 3 rows, I found 2 occurrences of 000-071, 32 occurrences of 072-869, and 6 omissions • Calculate simulated percentage, compare with expected results: 2 / 34 = 5.88% (compared to expected 36 / 435 = 8.28%) • Note: did not include the 6 omissions in the calculation
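The bookkeeping for this example, including the skipped groups, can be sketched as follows. The ranges come from the setup slide (000-071 Texas, 072-869 other members); since 3-digit groups only run up to 999, everything from 870 to 999 is treated as an omission. The function names are hypothetical, and the digit string passed in is assumed to come from a table or a random number generator.

```python
def classify(group):
    """Map one 3-digit group to an outcome."""
    value = int(group)
    if value <= 71:
        return "texas"   # 000-071: 72 numbers = 2 x 36 Texas members
    if value <= 869:
        return "other"   # 072-869: the remaining 2 x 399 numbers
    return "omit"        # 870-999: skipped, not part of the 870 used

def simulate_house(digits):
    """Count outcomes over consecutive 3-digit groups, ignoring omissions."""
    groups = [digits[i:i + 3] for i in range(0, len(digits) - 2, 3)]
    counts = {"texas": 0, "other": 0, "omit": 0}
    for group in groups:
        counts[classify(group)] += 1
    used = counts["texas"] + counts["other"]
    share = counts["texas"] / used if used else float("nan")
    return counts, share

counts, share = simulate_house("000071072869870999")  # tiny hand-made input
print(counts, f"{share:.2%}")
print(f"expected: {36 / 435:.2%}")
```

Dropping the omitted groups from the denominator is what keeps the simulated share comparable to 36/435; counting them would pull the percentage down.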
Cautions Related to Sampling • Things to look out for during sampling… • Bias – when a sampling process systematically favors certain outcomes • Impact: Removes the randomness of the sampling, affects overall results • Undercoverage – occurs when some groups in the population are left out of the process of choosing the sample • Impact: Under-representation of some/many groups • Non-response – occurs when an individual chosen for the sample cannot be contacted or does not cooperate • Impact: Skews the results of the data • Response bias – when the behavior of or response by the respondent is affected, due to the circumstances of the sampling => Responder could lie in an effort to avoid a sensitive topic or unpopular behavior • Wording of the question – how the question/survey is worded can impact the effectiveness of the sampling technique and results => Poorly worded questions, leading questions
Producing Data • 5.1 Designing Samples • 5.2 Designing Experiments • Components of an Experiment – • Experimental units, Subjects, Treatments • Factors, Levels • The Placebo Effect • Principles of Experimental Design • Block Design, Matched Pairs Design • 5.3 Simulating Experiments
Observational Studies vs. Experiments • Observational Study – observes individuals and measures variables of interest but does not attempt to influence the responses • Characteristics: • Curiosity for information • Quick data gathering • Can cover a wide range of data, from a number of sources • Typically results in ‘observing’ or asking questions (ex. surveying) • Experiment – imposes some type of treatment on individuals in order to observe their responses • Characteristics: • An objective is to ‘impose change’ (by experimenting) • Typically more formal – i.e., has a structured approach/design • Interested in observing the response (to an experiment)
Experiment Design Terms • A study becomes an experiment when something is actually done (witnessed, tested, analyzed) to people, animals, or objects. • Individuals on which the experiment is conducted are the experimental units (called subjects when they are humans) • A specific experimental condition applied to the units is a treatment. • The explanatory variable in an experiment is called a factor • When several factors are considered, a specific value (level) can be administered for each factor
Subjects, Factors, Levels • An example… Does regularly taking aspirin help protect people against heart attacks? (example 5.9, p. 290) • The Physicians’ Health Study looked at the effects of two drugs – aspirin and beta carotene – administered to 21,996 physicians • Subjects – 21,996 physicians • Factors (2) – aspirin, beta carotene • Levels (2) – yes/no (for each factor) • [Table: grid of treatments, Factor 1: Aspirin by Factor 2: Beta Carotene]
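With 2 factors at 2 levels each, every combination of levels is one treatment, so there are 2 x 2 = 4 treatments. A minimal sketch that lists them, using the factor and level names from the slide (the variable names are just for illustration):

```python
from itertools import product

# Each factor maps to its possible levels.
factors = {
    "aspirin": ["yes", "no"],
    "beta carotene": ["yes", "no"],
}

# One treatment = one choice of level for every factor.
treatments = [dict(zip(factors, levels)) for levels in product(*factors.values())]

for number, treatment in enumerate(treatments, start=1):
    print(number, treatment)
```

In practice a "no" level would usually mean taking a placebo in place of that drug, so that all subjects follow the same routine.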
Experiment Design Terms • Experiment terms • Control group – group of subjects who receive an alternative treatment (possibly a placebo), to control the effects of outside variables • A placebo (a ‘fake pill’ with no active ingredient, given to simulate taking a pill or medicine) can be introduced so that all groups have the same experience of being treated • Placebo Effect – a response by subjects to the placebo itself, even though it contains no active treatment • Statistically significant – an observed effect so large that it would rarely occur by chance • Double-blind Experiment • An experiment conducted where neither the subjects nor the people who have contact with them know which treatment a subject received • Why would one run a double-blind experiment? • Avoids unconscious bias by the doctor (who may feel a placebo can’t benefit a patient) and by the subject (who is unaware of what has been administered)
Experimental Design • Features of Experiments • Give good evidence of causation • Allow us to study specific factors while controlling the effects of lurking variables • Principles of Experimental Design • Control – control lurking variables by comparing 2+ treatments • Randomize – use chance to assign experimental units to treatments • Replicate – repeat the procedure on many units to reduce chance variation in results
Block Design • Outline of a Block Design: • Subjects are first split into blocks: Men, Women • Men → Random Assignment → Group 1: Therapy 1, Group 2: Therapy 2, Group 3: Therapy 3 • Women → Random Assignment → Group 1: Therapy 1, Group 2: Therapy 2, Group 3: Therapy 3 • ** At the end, analyze all results…
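The outline above (split into men and women first, then randomize to three therapies inside each block) can be sketched as below. The subject IDs, block sizes, and function name are hypothetical; the only point is that the shuffle happens separately within each block.

```python
import random

def assign_within_blocks(blocks, treatments, rng):
    """Shuffle each block on its own, then deal units out to treatments in turn."""
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = units[:]      # copy so the input lists are left untouched
        rng.shuffle(shuffled)
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block_name, treatments[i % len(treatments)])
    return assignment

rng = random.Random(7)  # fixed seed so the run is repeatable
blocks = {              # hypothetical subject IDs
    "men": [f"M{i}" for i in range(1, 7)],
    "women": [f"W{i}" for i in range(1, 7)],
}
therapies = ["Therapy 1", "Therapy 2", "Therapy 3"]

plan = assign_within_blocks(blocks, therapies, rng)
for unit, (block, therapy) in sorted(plan.items()):
    print(unit, block, therapy)
```

Because each block is dealt out separately, every therapy gets the same number of men and the same number of women, which a single shuffle of all 12 subjects would not guarantee.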
Block Design example • Suppose the SLHS cafeteria wanted to determine if students liked a new chicken menu item, as compared to an original chicken item. Additionally, they were interested to learn about the specific responses from male and female students at SLHS. • Create a block design for this example…
Block Design • Outline of a Block Design: • Subjects are first split into blocks: SLHS Men, SLHS Women • SLHS Men → Random assignment of groups → Group 1: Original chicken then new chicken, Group 2: New chicken then original chicken • SLHS Women → Random assignment of groups → Group 1: Original chicken then new chicken, Group 2: New chicken then original chicken • ** At the end, analyze all results…
Block Design examples • In your group, create the following Block Designs… Properly label all starting and ending points, as well as all decision points (i.e., tree nodes) • Test a new type of TIDE detergent on your family’s clothes • Introduce a new set of Firestone tires on Honda Accords • Create block designs for these 2 examples…
Block Design • Test a new type of TIDE detergent on your family’s clothes • Clothes are first split into blocks: TIDE used 1st, TIDE used 2nd • TIDE used 1st → Random assignment of groups (of clothes) → Group 1: Clothes washed with TIDE, Group 2: Clothes washed with original brand • TIDE used 2nd → Random assignment of groups (of clothes) → Group 1: Clothes washed with original brand, Group 2: Clothes washed with TIDE • ** At the end, analyze all results…
Block Design • Introduce a new set of Firestone tires on Honda Accords • Honda Accords are first split into blocks: Firestone tires used 1st, Firestone tires used 2nd • Firestone tires used 1st → Random assignment of groups (of cars) → Group 1: Accords with Firestone tires, Group 2: Accords with other tire brands • Firestone tires used 2nd → Random assignment of groups (of cars) → Group 1: Accords with other tire brands, Group 2: Accords with Firestone tires • ** At the end, analyze all results…
Matched Pair Design • Matched Pair Design • Compares just 2 treatments, ideally on 2 units that are as closely matched as possible • Is an example of a block design (i.e., blocks of 2 units) • Within each pair, randomly decide which unit gets which treatment, then conduct the experiment on both units • Alternatively, a block may consist of 1 subject who gets 2 different treatments (one after the other, in random order) • Block – group of experimental units or subjects known before the experiment to be similar in some way that is expected to affect the response to treatments • Block Design – the random assignment of units to treatments, carried out separately within each block
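A matched pair design is a block design with blocks of 2, so the randomization reduces to one coin flip per pair. The sketch below assumes hypothetical pairs of cars and treatment labels; it is one simple way to do the assignment, not the only one.

```python
import random

def assign_matched_pairs(pairs, treatments, rng):
    """For each pair, a coin flip decides which unit gets which treatment."""
    first, second = treatments
    plan = []
    for unit_a, unit_b in pairs:
        if rng.random() < 0.5:
            plan.append({unit_a: first, unit_b: second})
        else:
            plan.append({unit_a: second, unit_b: first})
    return plan

rng = random.Random(11)  # fixed seed so the run is repeatable
pairs = [("car_1", "car_2"), ("car_3", "car_4"), ("car_5", "car_6")]  # hypothetical matched units
plan = assign_matched_pairs(pairs, ("new tires", "other tires"), rng)
for pair_plan in plan:
    print(pair_plan)
```

When a block is a single subject who receives both treatments, the same coin flip can decide the order of the two treatments instead.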
Producing Data • 5.1 Designing Samples • 5.2 Designing Experiments • 5.3 Simulating Experiments • Steps for a Simulation • Simulation examples • Surveys (open/closed, types of surveys)
Simulation • The imitation of chance behavior, based on a model that accurately reflects the experiment under consideration, is called a Simulation. • Simulations need to use independent events (i.e., one step has no effect on the next) • Steps for a Simulation: • State the problem, describe the experiment/simulation • State the assumptions • Define your variables (assign digits to represent outcomes) • Simulate many repetitions • State your conclusions
Simulation • Assigning digits… you can use numbers 0-9 or 00-99 (or other sets of numbers/combinations) to randomly assign digits • Suggest that you pick the number range that best fits the data (i.e., have the fewest omitted/unused numbers) • Examples: How would you assign the simulation digits? • It has been stated that a computer chip has an overall 5% failure rate. You want to conduct a simulation of randomly selecting a computer chip. • For an organization, 70% of the employees are female. You want to conduct a simulation of randomly selecting an employee… • Let’s suppose a “smaller” SLHS has roughly the following class sizes: Seniors 700, Juniors 700, Sophomores 800, Freshmen 800. You want to conduct a simulation of randomly selecting a student…
Simulation • Examples: How would you assign the simulation digits? • It has been stated that a computer chip has an overall 5% failure rate. You want to conduct a simulation of randomly selecting a computer chip. • 00-04 Failed chip, 05-99 Satisfactory chip • For an organization, 70% of the employees are female. You want to conduct a simulation of randomly selecting an employee… • 00-69 Female employees, 70-99 Male employees
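The chip example can be run for many repetitions with a short script. This is only a sketch: it uses Python's `random` module as the source of two-digit numbers (an assumption, in place of a random digit table), and the function name and repetition count are arbitrary.

```python
import random

def simulate_chips(repetitions, rng):
    """Draw one two-digit number per chip; 00-04 means the chip failed."""
    failures = 0
    for _ in range(repetitions):
        two_digits = rng.randrange(100)  # 00-99, each equally likely
        if two_digits <= 4:              # 00-04 = failed chip
            failures += 1
    return failures / repetitions

rng = random.Random(5)  # fixed seed so the run is repeatable
rate = simulate_chips(100_000, rng)
print(f"simulated failure rate: {rate:.2%} (model says 5%)")
```

The employee example works the same way with the cutoff changed: 00-69 counts as a female employee and 70-99 as a male employee.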
Simulation • Examples: How would you assign the simulation digits? • Let’s suppose SLHS has roughly the following class sizes: Seniors 900, Juniors 900, Sophomores 1000, Freshmen 1100. You want to conduct a simulation of randomly selecting a student… • 3900 total students… could use the ratio of 9/9/10/11 x2 factor => 18/18/20/22 (for a total of 78) • 00-17 Seniors, 18-35 Juniors, 36-55 Sophomores, 56-77 Freshmen • Omit 78-99
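Assigning consecutive digit ranges from a set of counts is mechanical, so it can be sketched as a small helper. The function names `digit_ranges` and `fmt` are hypothetical; the counts 18/18/20/22 are the ones worked out above.

```python
def digit_ranges(counts, width=2):
    """Give each outcome a consecutive block of labels, starting at 0."""
    total_labels = 10 ** width           # 100 labels for 2 digits (00-99)
    if sum(counts.values()) > total_labels:
        raise ValueError("not enough labels for these counts")
    ranges, start = {}, 0
    for outcome, count in counts.items():
        ranges[outcome] = (start, start + count - 1)
        start += count
    # Whatever is left over is omitted (skipped) during the simulation.
    omitted = (start, total_labels - 1) if start < total_labels else None
    return ranges, omitted

def fmt(bounds, width=2):
    """Format a (low, high) pair with leading zeros, e.g. 00-17."""
    low, high = bounds
    return f"{low:0{width}d}-{high:0{width}d}"

classes = {"Seniors": 18, "Juniors": 18, "Sophomores": 20, "Freshmen": 22}
ranges, omitted = digit_ranges(classes)
for outcome, bounds in ranges.items():
    print(fmt(bounds), outcome)
print("Omit", fmt(omitted))
```

The same helper covers the earlier examples: counts of 5 and 95 give 00-04 and 05-99 with nothing omitted.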