Chapter 1

davissamuel

Presentation Transcript


  1. Chapter 1 Data Collection -- Where, Why and How BUS304 – Data Collection

  2. Content
     • Concept of statistics; tools for data collection
     • Populations and samples; four sampling techniques
     • Data types and measurement levels

  3. A decision process (example: the risks of using a cell phone while driving)
     • Data Collection (Chapter 1): risks of using a cell phone while driving, public reaction, the police department's ability to enforce, etc.
     • Data Presentation and Characterization (Chapters 2 and 3): list of risks, tables, graphs, data measurements
     • Making Inference and Making Decision (Chapter 4 and after + Decision Models): inference consequences, evaluate probabilities, evaluate alternatives, evaluate costs

  4. Basic Concepts
     • Business statistics: a collection of tools and techniques used to convert data into meaningful information in a business environment
     • Descriptive statistics: tools that collect, present, and describe data
     • Inferential statistics: tools that draw conclusions and/or make decisions about a population based only on sample data
        • Estimation: based on a sample's responses to the product, how many people in the population will buy it?
        • Hypothesis testing: should I change my marketing strategy?
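To make the descriptive/inferential split concrete, here is a minimal Python sketch of the estimation question above. The survey responses and the population size of 50,000 are made-up assumptions, and scaling a sample proportion up to the population is only the simplest point estimate; it says nothing yet about how uncertain that estimate is.

```python
def estimate_buyers(responses: list[int], population_size: int) -> float:
    """Point estimate of how many people in the population would buy.

    responses: hypothetical survey answers, 1 = "would buy", 0 = "would not buy".
    """
    if not responses:
        raise ValueError("need at least one response")
    sample_proportion = sum(responses) / len(responses)  # descriptive: summarizes the sample
    return sample_proportion * population_size            # inferential: generalizes to the population


sample = [1, 0, 0, 1, 0, 1, 0, 0, 0, 1]  # made-up data: 4 of 10 say "yes"
print(estimate_buyers(sample, 50_000))   # 20000.0
```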

  5. Data Collection Methods
     • Experiments
        • e.g. evaluating the reliability and gas consumption of a hybrid car
     • Telephone surveys
        • e.g. market research with existing customers, after-sales queries
     • Written questionnaires and surveys
        • e.g. a survey mailed to existing customers; a written survey distributed on the street
     • Direct observation and personal interviews
        • Widely used in consulting and IT development

  6. Survey: Have you ever been
     • asked to fill out a written survey?
     • called to answer a phone survey?
     What questions did they ask? Why did they ask those questions? What will they do with your answers? Do you always answer all the questions? Do you always provide the “truth”?

  7. Survey Design Steps
     • Define the issue
        • What are the purpose and objectives of the survey?
     • Define the population of interest
        • Whom do you want to ask the questions?
     • Formulate survey questions
        • Make questions clear and unambiguous
        • Use universally accepted definitions
        • Limit the number of questions
     • Pretest the survey
        • Pilot test with a small group of participants
        • Assess clarity and length
     • Determine the sample size and sampling method
     • Select the sample and administer the survey

  8. Exercise
     • What could be some potential problems with the following survey questions?
        • Do you agree with most other reasonably minded people that the city should spend more money on neighborhood parks?
        • To what extent would you support paying a small increase in your property taxes if it would allow poor and disadvantaged children to have food and shelter?
        • How much money do you make at your current job?
        • After trying the new product, please provide a rating from 1 to 10 to indicate how you like its taste and freshness?
        • Do you agree that the ambiance was divine?
        • Do you agree that the service was impeccable?
     Some of these problems are obvious, but they are not always so obvious when you write the questions yourself, so pretesting is very important!

  9. Data Collection Bias
     Bias is generally unavoidable. We try to reduce it, but some usually remains.
     • Interviewer bias
     • Nonresponse bias
     • Selection bias
     • Observer bias
     • Measurement error

  10. A statistical study always starts with a question:
     • What is the average starting salary for a business major nationwide? For CSUSM?
     • Are house prices in San Diego unaffordable?
     • Are college textbooks too expensive?
     • Is Dr. Fang a nice person?
     • Will Obama win the presidential election?
     • Lots of others…
     [Figure: a population of items labeled a through z, and a sample of items b, c, g, i, n, o, r, u, y drawn from it]
     Population and Samples
     • Population: all the items that are of interest
        • Exercise: determine the population for each question above
     • Sample: a subset of the population
     • How do you determine the population? Check whether it covers all the items of interest.

  11. Sampling
     • Techniques for selecting only part of the population for the study
     • You inevitably lose some accuracy in the answer
     • But sometimes using a sample is more reasonable than using the whole population:
        • Less time-consuming
        • Lower cost
        • Sometimes the study is destructive, e.g. car durability tests, testing matches
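A quick simulation can show the accuracy cost of sampling. This is a sketch under assumptions: the population is randomly generated (so its true mean is known, which is never the case in a real study), and the seed, the distribution, and the sizes are arbitrary choices.

```python
import random
import statistics

# Hypothetical population: 1,000 made-up values standing in for, e.g., weights in lb
rng = random.Random(304)
population = [rng.gauss(220, 25) for _ in range(1_000)]

population_mean = statistics.mean(population)  # the "true" answer from studying everyone
sample = rng.sample(population, 30)            # study only 30 items instead
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.1f}")
print(f"sample mean:     {sample_mean:.1f}")
print(f"sampling error:  {sample_mean - population_mean:+.1f}")
```

Rerunning with different seeds gives different sample means; that spread is the accuracy you trade for lower time and cost.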

  12. Sampling Techniques
     • Non-statistical sampling
        • Samples are selected for convenience
        • Results will be subject to bias
        • Examples:
           • Asking a friend, a neighbor, etc.
           • Judgment (e.g. picks by judges)
     • Statistical sampling
        • Uses probability theory to guide the selection
        • Ensures that the sample is very likely (or at least has a measurable probability) to represent the population
        • Sampling error can be estimated (as we will learn later in the semester)

  13. Four Statistical Sampling Techniques (1): simple random, systematic, stratified, cluster
     • Simple random sampling
        • The most basic statistical sampling method
        • Select at random
        • Tools: dice, cards, a random number generator (calculator, Excel)
     Exercise:
     • Use the random number generator in Excel to select a sample of ten NBA players and find their average weight.
        • “NBA Roster” file
        • Tutorial: RNG PPT
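The slide uses Excel's random number generator; as a swapped-in alternative, the sketch below uses Python's `random.sample`, which draws without replacement so that every subset of ten players is equally likely. The roster is a hypothetical stand-in for the “NBA Roster” file, with invented names and weights.

```python
import random
import statistics


def simple_random_sample(items, n, seed=None):
    """Simple random sample of n items, drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(items, n)


# Hypothetical roster: (player, weight in lb); names and weights are made up
roster = [(f"Player {i:03d}", 180 + (i * 37) % 90) for i in range(1, 451)]

sample = simple_random_sample(roster, 10, seed=1)
avg_weight = statistics.mean(w for _, w in sample)
print(f"average weight of sample: {avg_weight:.1f} lb")
```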

  14. Four Statistical Sampling Techniques (2)
     • Systematic sampling
        • A simplified version of simple random sampling
        • Select a random start, then proceed at an equal interval
        • Question: how do you determine the interval so that everyone has a chance to be selected?
     Formula: Interval = Population size / Sample size
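A minimal sketch of systematic sampling in Python, assuming the interval is the slide's formula rounded down. It uses one common variant, circular systematic sampling: the random start can be any position and the count wraps past the end of the list, so every item can be selected even when the population size is not an exact multiple of the sample size. The roster of 450 names is hypothetical.

```python
import random


def systematic_sample(items, n, seed=None):
    """Circular systematic sampling: one random start, then every k-th item.

    k = population size // sample size (the interval formula, rounded down).
    Wrapping around the end keeps every item selectable.
    """
    N = len(items)
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and the population size")
    k = N // n                                # the interval
    start = random.Random(seed).randrange(N)  # the only random number needed
    # Offsets 0, k, ..., (n-1)k are all below N, so the n positions are distinct
    return [items[(start + j * k) % N] for j in range(n)]


players = [f"Player {i:03d}" for i in range(1, 451)]  # hypothetical roster of 450 names
print(systematic_sample(players, 10, seed=7))
```

Only one random number (the start) is drawn; every other selection follows from the interval.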

  15. Systematic Sampling Exercise
     • Use the systematic sampling technique to select 10 NBA players and find their average weight.
     • Think: how many random numbers do you need to generate?

  16. Four Statistical Sampling Techniques (3)
     • Stratified sampling
        • Divide the population into subgroups (strata)
        • Use simple random sampling (or systematic sampling) to select from each subgroup
        • Combine the selections into one larger sample
     • Think: what is the benefit of using stratified sampling?
        • The sample is more representative

  17. Stratified Sampling Exercise
     • Use the stratified sampling technique to select a sample of 10 NBA players: 2 power forwards, 2 small forwards, 2 shooting guards, 2 point guards, and 2 centers.
     • Find their average weight.
     • Why do we want to control the proportion for each position?
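The stratified exercise can be sketched as follows: group the players by position, draw a simple random sample of two from each group, and combine the draws. The roster, the position abbreviations (PF, SF, SG, PG, C), and all weights are invented for illustration.

```python
import random
import statistics
from collections import defaultdict


def stratified_sample(items, stratum_of, allocation, seed=None):
    """Draw a simple random sample of allocation[s] items from each stratum s, then combine."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)  # step 1: divide into subgroups
    sample = []
    for stratum, n in allocation.items():
        sample.extend(rng.sample(strata[stratum], n))  # step 2: random draw within each
    return sample  # step 3: one combined sample


positions = ["PF", "SF", "SG", "PG", "C"]
# Hypothetical roster: (name, position, weight in lb); all values are made up
roster = [(f"Player {i:03d}", positions[i % 5], 180 + (i * 37) % 90) for i in range(450)]

sample = stratified_sample(roster, lambda p: p[1], {pos: 2 for pos in positions}, seed=2)
print(statistics.mean(w for _, _, w in sample))
```

Fixing two players per position guarantees that every position is represented, which a simple random sample of ten does not.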

  18. Four Statistical Sampling Techniques (4)
     • Cluster sampling
        • Divide the population into subgroups, called “clusters”
        • Randomly select some subgroups (not all!)
        • In each selected subgroup, use a random sampling technique to select a sub-sample
        • Combine the sub-samples into one aggregate sample
     • Think: when do we use cluster sampling? (e.g. market research: select towns first)

  19. Cluster Sampling Exercise
     • Use each NBA team as a cluster
     • Randomly select 5 teams for the study
     • In each selected team, select 2 players
     • Combine them into an aggregate sample of ten
     • Think: how many times do you need to use the random number generator?
     • Discuss the difference between the cluster and stratified sampling techniques.
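A sketch of the two-stage cluster exercise, assuming a hypothetical league of 30 teams with 15 invented players each: first pick 5 teams at random, then pick 2 players within each selected team. Unlike stratified sampling, where every subgroup contributes, most teams here never appear in the sample at all.

```python
import random
import statistics


def cluster_sample(clusters, n_clusters, per_cluster, seed=None):
    """Two-stage cluster sampling: pick some clusters at random, then sample within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1: random teams (not all!)
    sample = []
    for name in chosen:
        sample.extend(rng.sample(clusters[name], per_cluster))  # stage 2: players per team
    return chosen, sample


# Hypothetical league: 30 teams of 15 players, (name, weight in lb); values are made up
clusters = {
    f"Team {t:02d}": [(f"T{t:02d}-P{p:02d}", 180 + (t * 7 + p * 11) % 90) for p in range(15)]
    for t in range(1, 31)
}
teams, sample = cluster_sample(clusters, n_clusters=5, per_cluster=2, seed=3)
print(teams)
print(len(sample), statistics.mean(w for _, w in sample))
```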

  20. Comparing Different Techniques
     • Simple random sampling and systematic sampling:
        • Need to know the population size
        • Do not use information about the composition of the population
     • Stratified sampling:
        • Uses information about the population's composition to control the sample
        • The sample can be more representative of the population
     • Cluster sampling:
        • Generally used when the population is geographically dispersed
        • Divide the population into several geographical areas
        • Randomly select some areas (not all) to study, which saves cost
     • Sometimes a combination of techniques is used.

  21. Discussion
     • Which sampling techniques should be used for (or are used in) the following studies? Discuss the potential bias of each technique.
        • NBC wants to conduct an opinion poll to understand people’s views on Hillary Clinton’s chance of being elected president in 2008.
        • CSUSM wants to collect opinions about how junior faculty members teach their classes.
        • Police want to detect drunk drivers to prevent potential accidents.
        • Oscar judges determine the best pictures of the year.
        • Fans vote for the NBA All-Star team.
        • American citizens vote for president.

  22. Data Sources
     • Primary sources: observations, surveys, experiments
     • Secondary sources: books and CDs, newspapers and magazines, the Internet
     • Difference: whether or not you collect the data yourself

  23. Think
     • For the NBA players’ weight exercise, is the data source primary or secondary? Why?
     • Are the data collected from student evaluations primary or secondary? Why?

  24. Discussion
     • What are the benefits of using primary data?
     • What are the benefits of using secondary data?

  25. Data Types
     • Quantitative data
        • Numerical data (all numbers)
        • e.g. the number of hours that students work at a paying job
     • Qualitative data
        • Non-numerical (e.g. categories or labels)
        • e.g. students judge the quality of education: very poor, poor, fair, good, or very good
        • Note: such answers are often recorded as 1, 2, 3, 4, 5, but the codes indicate a quality level; translate them back to their meaning and treat them as qualitative data
        • “2” (poor) + “3” (fair) ≠ “5” (very good)
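The following Python sketch illustrates why coded answers should be read as labels. The responses are hypothetical; the point is that summaries such as counts per category and the most common answer stay meaningful, while arithmetic on the codes does not.

```python
from collections import Counter

# Meaning of the 1-5 codes on the quality-of-education scale
LABELS = {1: "very poor", 2: "poor", 3: "fair", 4: "good", 5: "very good"}

responses = [2, 3, 3, 4, 5, 3, 2]  # hypothetical coded answers

# Adding codes produces a valid code, but "poor" + "fair" is not "very good"
label_of_sum = LABELS[2 + 3]  # "very good": arithmetically possible, meaningless

# Summaries that respect the categories: counts per label and the most common answer
counts = Counter(LABELS[r] for r in responses)
print(counts)
print(counts.most_common(1)[0][0])  # fair
```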

  26. Data Measurement Levels
     • Ratio/Interval data (highest level; complete analysis): measurements, numerical values
     • Ordinal data (higher level; mid-level analysis): rankings, ordered categories
     • Nominal data (lowest level; basic analysis): categorical codes, ID numbers, category names

  27. Nominal Data
     • The lowest form of data, yet you will always encounter it
     • Mostly “qualitative”, non-numerical
        • Student names, addresses, majors, customer preferences, marital status, payment methods, etc.
     • Sometimes numerical, but normally used only to identify an individual; such numbers cannot be grouped or aggregated to provide more information
        • Student ID numbers
        • Bank account numbers
        • Social Security numbers, etc.
     • Nominal data are the most basic data measurement level; we generally cannot do much analysis with them.
     • In designing a survey, you should try to avoid all “nominal data.”

  28. Ordinal Data
     • Many students confuse “ordinal” with “interval”
     • Also called “rank data”: observations can be rank-ordered on the basis of some relationship among them.
     • Examples:
        • Income brackets: “under $20,000”, “$20,000 to $40,000”, “over $40,000”
        • GPA ranges: “<2.0”, “2.0 to 3.0”, “>3.0”
        • Professor rank: “adjunct professor”, “assistant professor”, “associate professor”, “professor”
     • Note: a ranking is sometimes valid only within a certain context; extending it beyond that context can be controversial (e.g. social ranking in some countries)
     • Such data allow the decision maker to equate two or more observations or to rank-order the observations.

  29. Ratio/Interval Data
     • Numerical data values: temperature, grade, income, age, etc.
     • If “0” ≠ nothing → interval data (e.g. temperature)
        • You can find the interval between two values: today is 2°F warmer than yesterday
        • You cannot find a ratio: 80°F is not twice as warm as 40°F
     • If “0” = nothing → ratio data (income, age, grade, etc.)
        • You can find both intervals and ratios: I earn $15,000 a year more than you do, and I earn twice as much as he does.
     • Both can be used for mathematical and statistical analysis, e.g. averaging.
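The temperature example can be checked numerically. The sketch converts Fahrenheit to Kelvin, an absolute scale whose zero does mean “nothing” (no thermal energy). The ratio 80/40 = 2 exists only because of where the Fahrenheit scale happens to put its zero; on the absolute scale the same two days differ by a factor of only about 1.08, while the 40-degree difference remains a genuine interval.

```python
def fahrenheit_to_kelvin(f: float) -> float:
    """Convert to an absolute scale, where 0 means 'no thermal energy'."""
    return (f - 32) * 5 / 9 + 273.15


today, yesterday = 80.0, 40.0
print(today - yesterday)  # 40.0: the interval is meaningful
print(today / yesterday)  # 2.0: an artifact of the arbitrary Fahrenheit zero
print(fahrenheit_to_kelvin(today) / fahrenheit_to_kelvin(yesterday))  # about 1.08
```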

  30. A matrix used to determine data measurement level
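The matrix itself did not survive in this transcript. As a rough substitute, and not the slide's own matrix, the level definitions on slides 26 to 29 can be read as a sequence of yes/no questions. The function name, the parameter names, and the `Level` enum below are hypothetical.

```python
from enum import Enum


class Level(Enum):
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    RATIO = "ratio"


def measurement_level(ordered: bool, equal_intervals: bool, true_zero: bool) -> Level:
    """Hypothetical decision rule built from the level definitions on slides 26-29."""
    if not ordered:
        return Level.NOMINAL   # names or ID codes: no meaningful order
    if not equal_intervals:
        return Level.ORDINAL   # rankings: order only, differences not meaningful
    if not true_zero:
        return Level.INTERVAL  # differences meaningful, but "0" is not "nothing"
    return Level.RATIO         # "0" means "nothing": differences and ratios meaningful


# Temperature in degrees F: ordered, equal intervals, no true zero
print(measurement_level(ordered=True, equal_intervals=True, true_zero=False))  # Level.INTERVAL
```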

  31. Exercise
     • Determine the data measurement levels: Nominal, Ordinal, Ratio

  32. More Exercises
     • For each of the following variables, indicate the level of data measurement:
        • marital status {single, married, divorced, other}
        • home ownership {own, rent, other}
        • product rating {1=excellent, 2=good, 3=fair, 4=poor, 5=very poor}
        • unemployment rates of CA
        • monthly sales
        • student gender

  33. Data Measurement Level
     • Identifying the data measurement level is the starting point of data analysis, presentation, and characterization. It tells you what you can do with the data you have collected!

  34. Summary
     • Basic concepts of statistics
     • Data collection methods
     • Sampling techniques:
        • Concepts: population vs. sample
        • Four sampling techniques: processes, pros and cons
     • Data sources: primary or secondary
        • Whether you collect the data yourself or use someone else’s
     • Data types: quantitative or qualitative
        • Whether the data are purely numerical
     • Data measurement levels: nominal / ordinal / interval / ratio

  35. Concepts checklist
