
Chapter 5.1 Producing Data


juro

Presentation Transcript


  1. Chapter 5.1 Producing Data

  2. Producing Data • 5.1 Designing Samples • Populations vs. Samples, Survey inquiries • Types of samples: Voluntary Response, Convenience sampling, Simple random samples (SRS), Stratified samples, Clustered, Systematic • Simulations, use of Table B (Random Digit Table) • Cautions with sampling • 5.2 Designing Experiments • Observational Studies vs. Experiments • Experimental Design – Subjects, Factors, Levels • Block Design, Matched Pair Design • 5.3 Simulating Experiments

  3. Designing Samples • If one were curious about… • What % of the adults in the city of Houston consider themselves “conservative”? • What is the average family size in Texas? • What % of the college applicant pool is female? • What is the average starting salary for undergraduate STEM (Science, Technology, Engineering, Math) majors? • How many hours per week do teachers spend grading? • What is your favorite color?! Etc., etc. • What are the challenges to answering these questions? • How can we work around these data gathering ‘limitations’? • Sampling…

  4. Surveys • A survey is a systematic method to collect information. • Surveys can take on many different methods and approaches • In-person, self-administered, conducted via phone/mail • Individual, small group, large group, aggregate population • Research-based, questionnaire, general inquiry • Open-ended surveys • Question/inquiry which leaves responses open to the responder • Generally straightforward questions with no multiple parts • Close-ended surveys • Question/inquiry which forces responses from individuals • Different types • Surveys can include multiple parts, some open/closed • Source: “Designing Surveys that Count”, Therese Seibert, PhD

  5. Surveys • Close-ended surveys can be: • Multiple choice, Yes/No responses • Categorical (nominal; i.e., word categories) • e.g., How many times have you been to the movies this year? • Ordinal (i.e., rankings), examples: • Strongly Agree / Agree / Disagree / Strongly Disagree • Excellent / Good / Fair / Poor • Always / Very Often / Fairly Often / Sometimes / Almost Never / Never • Completely satisfied, Very satisfied, Somewhat satisfied, Somewhat dissatisfied, Very dissatisfied, Completely dissatisfied • Once/day, once/week, several times/week

  6. Surveys • From our In-class Activity…

  7. Surveys • From our In-class Activity…

  8. Designing Samples • Population vs. Sample • Population – the entire group of individuals that we want information about • Sample – a part (or subset) of the population that we actually examine in order to gather information • Sampling involves studying a smaller group (a part), in order to gain information about the entire population (the whole) • A census attempts to contact EVERY individual in the entire population • Used when complete accuracy is required and sampling would not be adequate (e.g., 2010 USA population census) • Simulation – an alternative method to sampling the population, often used when polling the population is not logistically possible or is inconvenient

  9. Designing Samples • Types of Sampling • Voluntary response – sampling technique where individuals choose to participate by responding to a general appeal (i.e., they volunteered to participate) • Convenience sample – sampling technique which chooses the individuals easiest to reach (i.e., convenient to sample) • Simple Random Sample (SRS) – sampling technique where each individual of the population has an equally likely chance to be selected, determined by a specific SRS sampling process • Stratified random sample – where samples are taken from segmented/divided subsets of the population (called strata), grouped into logical units • Clustered random sample – where samples are taken from segmented/divided subsets of the population, grouped into equal numbered units • Systematic sample – sampling technique which follows a consistent, systematic approach (i.e., every 10th person)
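The random sampling types above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the roster of 100 students, the two strata, and the function names are hypothetical and do not come from the slides.

```python
import random

def simple_random_sample(population, n, rng):
    """SRS: every individual has the same chance of being chosen."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take a separate SRS inside each stratum (dict of name -> members)."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def systematic_sample(population, step, rng):
    """Pick a random starting point, then take every `step`-th individual."""
    start = rng.randrange(step)
    return population[start::step]

rng = random.Random(5)  # fixed seed so the run is repeatable
students = [f"student_{i:03d}" for i in range(100)]  # hypothetical roster
strata = {"juniors": students[:50], "seniors": students[50:]}  # hypothetical strata

srs = simple_random_sample(students, 10, rng)
strat = stratified_sample(strata, 5, rng)
syst = systematic_sample(students, 10, rng)  # e.g., every 10th person

print(srs)
print(strat)
print(syst)
```

Voluntary response and convenience samples are left out on purpose: they are not driven by chance, so there is nothing to randomize.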

  10. Common Technique Used in Sampling • One approach to sampling, used frequently in simulations, is to use a Random Digit Table • Referred to as Table B (located in the back cover of your text) • Structure: Groups of randomly listed numbers in many rows • Purpose: Scan the list of random digits to simulate a particular experiment, study, or simulation • Steps to use the Random Digit Table… • Understand the question at hand (i.e., desired sampling), and the possible outcomes • Define and design ‘variables’ appropriate to mimic the total possible outcomes, and the desired sampling results • Define the criteria to scan thru the list of values (e.g., 1 digit, 2) • Once defined, run the simulation using the defined variables approach and calculate the observed percentages
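Scanning a row of the table in groups of 1, 2, or 3 digits can be sketched as follows. The digits here are generated by the computer as a stand-in; they are NOT the actual Table B values, and the helper names are hypothetical.

```python
import random

def make_row(rng, n_digits=40):
    """Build a stand-in row: random digits printed in blocks of 5."""
    blocks = ["".join(str(rng.randrange(10)) for _ in range(5))
              for _ in range(n_digits // 5)]
    return " ".join(blocks)

def read_groups(row, k):
    """Ignore the spacing and read the row in consecutive groups of k digits."""
    digits = row.replace(" ", "")
    usable = len(digits) - len(digits) % k  # drop a leftover partial group
    return [digits[i:i + k] for i in range(0, usable, k)]

rng = random.Random(10)
row = make_row(rng)              # stand-in for one line of the table
pairs = read_groups(row, 2)      # criteria: 2 digits at a time
triples = read_groups(row, 3)    # criteria: 3 digits at a time

print(row)
print(pairs)
print(triples)
```

The spaces in a printed table only make it easier to read, which is why the sketch strips them before grouping.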

  11. Random Digit Table

  12. Random Digit Table Example • Simulate picking a Texas senator from the U.S. Senate • Assess problem: 100 senators in total, 2 from Texas • Define variables: • Use the entire data range from 00-99 to represent the 100 senators • Set: 00-01 = Texas senators, 02-99 = all other states’ senators (alternatively, 01-02 = Texas and 03-99, 00 = non-Texas; starting with 00 is more common though) • Since variables are defined with 2 digits, look at the set of numbers within the table in groups of 2 • Run the simulation • Looking at the 1st 2 rows from the Random Digit Table (e.g., lines 101, 102), scan the rows looking for the 00-01 and 02-99 numbers above • Record each occurrence of 00-01 (representing Texas senators) and 02-99 (non-Texas senators) • When done with the 1st 2 rows, calculate the overall % of occurrences of 00-01 (Texas) • Compare with the expected result of 2% (2/100)

  13. Random Digit Table

  14. Random Digit Table Example • Simulate picking a Texas senator from the U.S. Senate • Counting the number of occurrences of 00-01 on the previous page… • Result: ONE occurrence found (on 2nd row)… • 40 pairs of possible 2-digit combinations in the 1st 2 rows • Simulation percentage: 1/40 = 2.5% (interestingly, fairly close to the expected value of 2%!)
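The same senator simulation can be run on a computer instead of by hand. The sketch below is an assumption-laden stand-in: Python's random number generator replaces Table B, and the function name is hypothetical. It keeps the slide's coding of 00-01 = Texas and 02-99 = everyone else.

```python
import random

def simulate_texas_share(n_pairs, rng):
    """Draw n_pairs two-digit numbers (00-99) and return the share that are 00-01."""
    hits = 0
    for _ in range(n_pairs):
        pair = rng.randrange(100)  # plays the role of reading one 2-digit group
        if pair <= 1:              # 00 or 01 -> Texas senator
            hits += 1
    return hits / n_pairs

rng = random.Random(101)
small = simulate_texas_share(40, rng)        # same size as the 2 rows scanned by hand
large = simulate_texas_share(100_000, rng)   # many more repetitions

print(f"40 pairs:      {small:.1%}")
print(f"100,000 pairs: {large:.1%}")
```

With only 40 pairs the simulated percentage can only land on multiples of 2.5%, so it will bounce around; with many more pairs it should settle near the expected 2%.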

  15. Random Digit Table Example • Simulate picking a member from Texas out of all of the US House of Representatives • Note: If you have forgotten, there are 435 US HofR members! (Texas has 36) • Steps: • Establish your simulation: • Define variables • Set your variable criteria (including omitted numbers, if applicable) • Run the simulation: • Establish starting point and rows used to simulate • Run simulation • Identify number of occurrences of your defined set • Calculate simulated percentage, compare with expected results

  16. Random Digit Table Example • Simulate picking a US House of Representatives member • Establish your simulation: • Define variables (note: 435 US HofR members) • Define: US HofR member (435), or not… Within that, Texas HofR member (36), or not… • We’ll use 870 (435 × 2) as our ‘factor’ for criteria, omitting numbers 870 and above • Set your variable criteria (including omitted numbers, if applicable) • Define: 000-071 = Texas HofR member (36 × 2 = 72 numbers) • 072-869 = Not from Texas, but a US HofR member • Omit 870-999

  17. Random Digit Table Example • Simulate picking a US House of Representatives member • Run the simulation: • Establish starting point and rows used to simulate • Start on row 1, look at groups of 3 consecutive digits, search 3 rows (using 3 rows for 3 digits is a good rule of thumb) • Run simulation • Identify number of occurrences of your defined set • Looking at the 1st 3 rows, I found the following: 2 occurrences of 000-071, 32 occurrences of 072-869, and 6 omissions • Calculate simulated percentage, compare with expected results • 2 / 34 = 5.88% (compared to expected 36 / 435 = 8.28%) • Note: did not include the 6 omissions in the calculation
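A sketch of the House simulation with omitted numbers, using the coding defined on the previous slide (000-071 = Texas, 072-869 = other members, 870-999 omitted). Python's random number generator stands in for Table B, and the function name is hypothetical.

```python
import random

def simulate_texas_house(n_groups, rng):
    """Read n_groups three-digit numbers and tally Texas / other / omitted."""
    texas = other = omitted = 0
    for _ in range(n_groups):
        group = rng.randrange(1000)  # one 3-digit group, 000-999
        if group <= 71:
            texas += 1
        elif group <= 869:
            other += 1
        else:
            omitted += 1             # 870-999 carries no meaning, skip it
    used = texas + other             # omissions are left out of the percentage
    share = texas / used if used else float("nan")
    return texas, other, omitted, share

# Counts reported on the slide: 2 Texas, 32 other, 6 omitted.
slide_share = 2 / (2 + 32)
expected = 36 / 435

rng = random.Random(17)
texas, other, omitted, share = simulate_texas_house(200_000, rng)

print(f"slide:     {slide_share:.2%}")
print(f"expected:  {expected:.2%}")
print(f"simulated: {share:.2%} ({omitted} groups omitted)")
```

The hand count of 34 usable groups is small, which is one reason the slide's result sits a few points away from the expected value.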

  18. Cautions Related to Sampling • Things to look out for during sampling… • Bias – when a sampling process systematically favors certain outcomes • Impact: Removes the randomness of the sampling, affects overall results • Undercoverage – occurs when some groups in the population are left out of the process of choosing the sample • Impact: Under-representation of some/many groups • Non-response – occurs when an individual chosen for the sample cannot be contacted or does not cooperate • Impact: Skews the results of the data • Response bias – when the behavior of or response by the respondent is affected, due to the circumstances of the sampling => Responder could lie in an effort to avoid a sensitive topic or unpopular behavior • Wording of the question – how the question/survey is worded can impact the effectiveness of the sampling technique and results => Poorly worded questions, leading questions

  19. Producing Data • 5.1 Designing Samples • 5.2 Designing Experiments • Components of an Experiment – • Experimental units, Subjects, Treatments • Factors, Levels • The Placebo Effect • Principles of Experimental Design • Block Design, Matched Pairs Design • 5.3 Simulating Experiments

  20. Chapter 5.2 Designing Experiments

  21. Observational Studies vs. Experiments • Observational Study: Observes individuals and measures variables of interest but does not attempt to influence the responses. • Characteristics: • Curiosity for information • Quick data gathering • Can cover a wide range of data, from a number of sources • Typically results in ‘observing’ or asking questions (ex. surveying) • Experiment: Imposes some type of treatment on individuals in order to observe their responses • Characteristics: • An objective is to ‘impose change’ (by experimenting) • Typically more formal – i.e., has a structured approach/design • Interested in observing the response (to an experiment)

  22. Experiment Design Terms • A study becomes an experiment when something is actually done (witnessed, tested, analyzed) to people, animals, objects. • Individuals on which the experiment is conducted are the experimental units (called subjects when humans) • A specific experimental condition applied to the units is a treatment. • The explanatory variable in an experiment is called a Factor • When several factors are considered, a specific value (level) can be administered for each factor

  23. Subjects, Factors, Levels • An example… Does regularly taking aspirin help protect people against heart attacks? (example 5.9, p. 290) • The Physicians’ Health Study looked at the effects of two drugs – aspirin and beta carotene – administered to 21,996 physicians • Subjects – 21,996 physicians • Factors (2) – aspirin, beta carotene • Levels (2) – yes/no (for each factor) • (Diagram: grid of treatments, Factor 1: Aspirin by Factor 2: Beta Carotene)

  24. Experiment Design Terms • Experiment terms • Control group – group of patients who receive an alternative treatment (possibly a placebo), to control the effects of outside variables • A placebo (‘fake pill’ given to simulate taking a pill or medicine) can be introduced to further control the use of particular medicines • Placebo Effect – effect on an experiment of using a placebo in a controlled setting • Statistically significant – an observed effect so large that it would rarely occur by chance • Double-blind Experiment • An experiment conducted where neither the subjects nor the people who have contact with them know which treatment a subject received • Why would one run a double-blind experiment? • Avoids unconscious bias by the doctor (who maybe feels a placebo can’t benefit a patient) and the subject (unaware of what has been administered)

  25. Experimental Design • Features of Experiments • Give good evidence of causation • Allow us to study specific factors while controlling the effects of lurking variables • Principles of Experimental Design • Control – control lurking variables by comparing 2+ treatments • Randomize – use chance to assign experimental units to treatments • Replicate – repeat the procedure to reduce chance variation in results
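The "Randomize" principle, using chance to assign experimental units to treatments, can be sketched as below. The 30 subjects, the two treatment labels, and the function name are hypothetical and only serve the illustration.

```python
import random

def randomize(units, treatments, rng):
    """Shuffle the units by chance, then deal them out to the treatments in turn."""
    shuffled = list(units)       # copy so the original list is left alone
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

rng = random.Random(25)
subjects = [f"subject_{i:02d}" for i in range(1, 31)]  # hypothetical subjects
groups = randomize(subjects, ["treatment", "control"], rng)

for name, members in groups.items():
    print(name, len(members), members[:3])
```

Dealing the shuffled units out in turn keeps the group sizes as equal as possible, and comparing the two groups is what provides the "Control" part.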

  26. Block Design • Outline of a Block Design: • Subjects → Men → Random Assignment → Group 1: Therapy 1; Group 2: Therapy 2; Group 3: Therapy 3 • Subjects → Women → Random Assignment → Group 1: Therapy 1; Group 2: Therapy 2; Group 3: Therapy 3 • ** At the end, analyze all results…

  27. Block Design example • Suppose the SLHS cafeteria wanted to determine if students liked a new chicken menu item, as compared to an original chicken item. Additionally, they were also interested in learning about the specific responses from males and females at SLHS. • Create a block design for this example…

  28. Block Design • Outline of a Block Design: • Subjects → SLHS Men → Random assignment of groups → Group 1: Original chicken then new chicken; Group 2: New chicken then original chicken • Subjects → SLHS Women → Random assignment of groups → Group 1: Original chicken then new chicken; Group 2: New chicken then original chicken • ** At the end, analyze all results…

  29. Block Design examples • In your group, create the following Block Designs… Properly label all starting and ending points, as well as all decision points (i.e., tree nodes) • Test a new type of TIDE detergent on your family’s clothes • Introduce a new set of Firestone tires on Honda Accords • Create block designs for these 2 examples…

  30. Block Design • Test a new type of TIDE detergent on your family’s clothes • Clothes → TIDE used 1st → Random assignment of groups (of clothes) → Group 1: Clothes washed with TIDE; Group 2: Clothes washed with original brand • Clothes → TIDE used 2nd → Random assignment of groups (of clothes) → Group 1: Clothes washed with original brand; Group 2: Clothes washed with TIDE • ** At the end, analyze all results…

  31. Block Design • Introduce a new set of Firestone tires on Honda Accords • Honda Accords → Firestone tires used 1st → Random assignment of groups (of cars) → Group 1: Accords with Firestone tires; Group 2: Accords with other tire brands • Honda Accords → Firestone tires used 2nd → Random assignment of groups (of cars) → Group 1: Accords with other tire brands; Group 2: Accords with Firestone tires • ** At the end, analyze all results…

  32. Matched Pair Design • Matched Pair Design • Compare just 2 treatments, ideally 2 units that are as closely matched as possible • Is an example of a block design (i.e., blocks of 2 units) • Select which block is assessed 1st randomly, then conduct the experiment on both blocks • Alternatively, a block may consist of 1 subject who gets 2 different treatments (one after the other) • Block – group of experimental units or subjects known before the experiment to be similar in some way, and expected to affect the response to treatments • Block Design – the random assignment of units to treatments, carried out separately within each block
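The definition of a block design, random assignment carried out separately within each block, can be sketched with the men/women therapy outline from earlier in the deck. The subject labels, block sizes, and function name are hypothetical.

```python
import random

def assign_within_blocks(blocks, treatments, rng):
    """Randomize separately inside each block (dict of block name -> members)."""
    design = {}
    for block_name, members in blocks.items():
        shuffled = list(members)
        rng.shuffle(shuffled)    # chance decides the order within this block only
        design[block_name] = {
            t: shuffled[i::len(treatments)] for i, t in enumerate(treatments)
        }
    return design

rng = random.Random(32)
blocks = {
    "men": [f"M{i:02d}" for i in range(1, 13)],    # 12 hypothetical subjects
    "women": [f"W{i:02d}" for i in range(1, 16)],  # 15 hypothetical subjects
}
therapies = ["Therapy 1", "Therapy 2", "Therapy 3"]
design = assign_within_blocks(blocks, therapies, rng)

for block_name, groups in design.items():
    for therapy, members in groups.items():
        print(block_name, therapy, members)
```

Because the shuffle happens inside each block, no man can end up in a women's group or the reverse, which is what separates this from one randomization over all subjects.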

  33. Producing Data • 5.1 Designing Samples • 5.2 Designing Experiments • 5.3 Simulating Experiments • Steps for a Simulation • Simulation examples • Surveys (open/closed, types of surveys)

  34. Simulation • The imitation of chance behavior, based on a model that accurately reflects the experiment under consideration, is called a Simulation. • Simulations need to use independent events (i.e., no effect on each other, from 1 step to another) • Steps for a Simulation: • State the problem, describe the experiment/simulation • State the assumptions • Define your variables (assign digits to represent outcomes) • Simulate many repetitions • State your conclusions

  35. Simulation • Assigning digits… you can use numbers 0-9 or 00-99 (or other sets of numbers/combinations) to randomly assign digits • Suggest that you pick the number range that best fits the data (i.e., have the fewest omitted/unused numbers) • Examples: How would you assign the simulation digits? • It has been stated that a computer chip has an overall 5% failure rate. You want to conduct a simulation of randomly selecting a computer chip. • For an organization, 70% of the employees are female. You want to conduct a simulation of randomly selecting an employee… • Let’s suppose a “smaller” SLHS has roughly the following class sizes: Seniors 700, Juniors 700, Sophomores 800, Freshmen 800. You want to conduct a simulation of randomly selecting a student…

  36. Simulation • Examples: How would you assign the simulation digits? • It has been stated that a computer chip has an overall 5% failure rate. You want to conduct a simulation of randomly selecting a computer chip. • 00-04 Failed chip, 05-99 Satisfactory chip • For an organization, 70% of the employees are female. You want to conduct a simulation of randomly selecting an employee… • 00-69 Female employees, 70-99 Male employees
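Assigning two-digit ranges from whole-number percentages is mechanical enough to sketch in code. The function name is hypothetical; the two examples are the chip and employee cases from the slide.

```python
def digit_ranges(percentages):
    """Map each outcome to a (low, high) range of two-digit numbers, 00-99.

    `percentages` maps outcome -> whole-number percent; they must total 100.
    """
    if sum(percentages.values()) != 100:
        raise ValueError("percentages must add up to 100")
    ranges = {}
    start = 0
    for outcome, pct in percentages.items():
        ranges[outcome] = (start, start + pct - 1)  # pct consecutive numbers
        start += pct
    return ranges

chips = digit_ranges({"failed": 5, "satisfactory": 95})
staff = digit_ranges({"female": 70, "male": 30})

for label, ranges in (("chips", chips), ("staff", staff)):
    for outcome, (low, high) in ranges.items():
        print(f"{label}: {low:02d}-{high:02d} {outcome}")
```

Starting at 00 means an outcome with probability p% gets exactly p of the 100 two-digit numbers, so nothing has to be omitted.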

  37. Simulation • Examples: How would you assign the simulation digits? • Let’s suppose SLHS has roughly the following class sizes: Seniors 900, Juniors 900, Sophomores 1000, Freshmen 1100. You want to conduct a simulation of randomly selecting a student… • 3900 total students… could use the ratio of 9/9/10/11 x2 factor => 18/18/20/22 (for a total of 78) • 00-17 Seniors, 18-35 Juniors, 36-55 Sophomores, 56-77 Freshmen • Omit 78-99
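The ratio-and-factor reasoning above can be checked with a short sketch: reduce the class sizes to their simplest ratio, scale by the largest whole factor that still fits in 00-99, and omit whatever is left over. The function name is hypothetical.

```python
from functools import reduce
from math import gcd

def class_digit_ranges(sizes, n_codes=100):
    """Return (ranges, omitted) for group sizes, using codes 0..n_codes-1."""
    common = reduce(gcd, sizes.values())                  # 100 for 900/900/1000/1100
    ratio = {name: size // common for name, size in sizes.items()}  # 9/9/10/11
    factor = n_codes // sum(ratio.values())               # 100 // 39 = 2
    if factor == 0:
        raise ValueError("ratio does not fit; use more digits")
    ranges = {}
    start = 0
    for name, parts in ratio.items():
        width = parts * factor
        ranges[name] = (start, start + width - 1)
        start += width
    omitted = (start, n_codes - 1) if start < n_codes else None
    return ranges, omitted

sizes = {"Seniors": 900, "Juniors": 900, "Sophomores": 1000, "Freshmen": 1100}
ranges, omitted = class_digit_ranges(sizes)

for name, (low, high) in ranges.items():
    print(f"{low:02d}-{high:02d} {name}")
print("Omit", omitted)
```

Picking the largest factor that fits is one way to follow the earlier suggestion of having the fewest omitted numbers.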
