Speaking of Statistics (13) Instructor: Prof. Ken Tsang T.A.: Ms Lisa Liu Office: E409 Tel: 362 0606 (office), 362 0630 (T.A.) Email: email@example.com (Instructor), firstname.lastname@example.org (T.A.)
What is Statistics all about? • The subject of statistics involves the study of how to collect, summarize, analyze and interpret data. • Data are numerical facts and figures from which conclusions can be drawn. Such conclusions are important to the decision-making processes of many professions and organizations.
Data • Some sources of data are: • Data distributed by an organization or an individual • A designed experiment • A survey • An observational study • Web, telephone • Data can take many forms: • Curves, figures • Sounds • Papers, books • Web and telephone records
Distinguished Statisticians in History! Sir R. A. Fisher 1890-1962 Karl Pearson 1857-1936
Data Scientist: The Sexiest Job of the 21st Century
CRA (Criterion-Referenced Assessment) • Adoption of Criterion-Referenced Assessment (CRA) for evaluating students' performance • OBTL (Outcome-Based Teaching and Learning) syllabus • The CRA model is directly compatible with the OBTL philosophy.
UIC Regulations on CRA • We will use rubrics for the following assessments: • Oral Presentation and Group Presentation (20%) • Final Examination (40%)
Oral and Group Presentation (1) • Choose your teammates • 4-5 members per team • Submit your group form before November • Study the rubric for the oral presentation (1)
Choose a topic for your team • Prepare your PPT slides • Oral presentations will be given (roughly) in the 12th week (Nov-Dec 2014)
Suggested Grade Distribution Assessment grade system: • A (not more than 5%) • A and A- combined (not more than 15%) • A and B grades combined, i.e. A, A-, B+, B and B- (not more than 75%) • Grades below C, not including C (no limit).
Some Notices on This Course • Assignments must be handed in before the deadline. Late assignments will not be accepted! • For the mid-term test and final examination, you may bring only stationery and water! Mobile phones are not allowed. • For the final examination, we cannot tell you your score before the AR announces the official results. If you have any questions about your score, you can check your marked script via the AR.
General Information • Textbook: Essentials of Business Statistics, Bowerman/O'Connell/Murphree/Orris, McGraw Hill, International Edition, ISBN 978-0-07-131471-8 • Advantages • A unified textbook for all year-one students • More applications
General Information • References • Basic Statistics for Business & Economics, 5th Ed. D.A. Lind, W.G. Marchal and S.A. Wathen. 2006, McGraw Hill, International Edition. • Business Statistics: A First Course, 4th Ed. D.M. Levine, T.C. Krehbiel and M.L. Berenson. 2006, Pearson Prentice Hall, New Jersey. • Statistics for Business and Economics, 9th Ed. J.T. McClave, P.G. Benson and T. Sincich. 2005, Pearson Prentice Hall, New Jersey. • Modern Elementary Statistics, 11th Ed. J.E. Freund. 2004, Prentice Hall. • Statistics for the Behavioral Sciences, 8th Ed. F.J. Gravetter and L.B. Wallnau. December 10, 2008, Wadsworth Publishing.
Chapter 1 An Introduction to Business Statistics • Populations and Samples • Sampling a Population of Existing Units • Sampling a Process • An Introduction to Survey Sampling
Section 1.1 Populations(总体) and Samples(样本) Population: A set of existing units (people, objects, or events). Examples: • All of last year's graduates of Dartmouth College's Master of Business Administration program. • All Lincoln Town Cars that were produced last year. • All accounts receivable invoices accumulated last year by The Procter & Gamble Company. • All fires reported last month to the Tulsa, Oklahoma, fire department.
Variable(变量): A measurable characteristic of the population. We carry out a measurement to assign a value of a variable to each population unit. • The variable is said to be quantitative(定量的) if its measurements represent quantities (for example, “how much” or “how many”). For example, annual starting salary is quantitative; age and number of children are also quantitative. • The variable is said to be qualitative(定性的) or categorical(属性的) if each measurement is a descriptive category to which a population unit belongs. For example, a person's gender, the make of an automobile, and whether a person who purchases a product is satisfied with the product are qualitative.
There are two types of qualitative variables: • Nominative(无顺序分类的): • Identifier or name • Unranked categorization • Example: gender, car color • Ordinal(顺序的): • All characteristics of nominative plus… • Rank-order categories • Ranks are relative to each other • Example: Low (1), moderate (2), or high (3) risk
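One way to see the difference between the three kinds of variables is to encode them in code. The sketch below is purely illustrative: the class names `CarColor` and `Risk` and the salary values are hypothetical, not from the course material. A plain `Enum` supports only "same or different" checks (like a nominative variable), an `IntEnum` also supports ranking (like an ordinal variable), and quantitative values support arithmetic such as averaging.

```python
from enum import Enum, IntEnum

# Nominative variable (hypothetical example): names only, no ranking between categories.
class CarColor(Enum):
    RED = "red"
    BLUE = "blue"
    SILVER = "silver"

# Ordinal variable (hypothetical example): Low (1) < Moderate (2) < High (3).
class Risk(IntEnum):
    LOW = 1
    MODERATE = 2
    HIGH = 3

# Quantitative variable: measured quantities, so arithmetic is meaningful (hypothetical values).
starting_salaries = [52000, 61000, 58500]

print(Risk.HIGH > Risk.LOW)           # True: ordinal categories can be ranked
print(CarColor.RED == CarColor.BLUE)  # False: nominative categories are only equal or not
# CarColor.RED < CarColor.BLUE would raise TypeError: plain Enum members have no order.
print(sum(starting_salaries) / len(starting_salaries))  # an average only makes sense here
```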
Census(普查): An examination of the entire population of measurements. Note: A census is usually too expensive, too time-consuming, and too much effort for a large population. Sample: A selected subset of the units of a population.
For example, a university graduated 8,742 students. • This is too large for a census. • So, we select a sample of these graduates and learn their annual starting salaries. Sample of measurements: • Measured values of the variable of interest for the sample units. • For example, the actual annual starting salaries of the sampled graduates.
Descriptive statistics: The science of describing the important aspects of a set of measurements. • For example, for a set of annual starting salaries, we want to know: • How much to expect • What is a high versus a low salary • How much the salaries differ from each other • If the population is small enough, we could take a census and would not need to sample or make statistical inferences • But if the population is too large, then …
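The three questions about starting salaries map onto standard descriptive summaries. The sketch below uses Python's built-in `statistics` module on a hypothetical list of seven salaries (invented for illustration, not real graduate data): the mean and median describe "how much to expect", the minimum and maximum frame "high versus low", and the sample standard deviation measures how much the salaries differ from each other.

```python
import statistics

# Hypothetical annual starting salaries (dollars) for a small sample of graduates.
salaries = [48000, 52500, 55000, 57000, 61000, 63500, 72000]

center = statistics.mean(salaries)        # how much to expect, on average
middle = statistics.median(salaries)      # a typical value, less affected by extreme salaries
spread = statistics.stdev(salaries)       # how much the salaries differ from each other
low, high = min(salaries), max(salaries)  # what counts as a low versus a high salary

print(f"mean={center:.0f}, median={middle}, sd={spread:.0f}, range={low}-{high}")
```

With a census these numbers would describe the population exactly; with a sample they are estimates of the population values.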
The key criterion for choosing a sample: the information contained in the sample should accurately reflect the population under study.
The Lady Tasting Tea: A lady claimed that tea tastes different depending on whether the tea was poured into the milk or the milk was poured into the tea. Let us test the proposition!
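The slide does not fix a design, so this sketch assumes the classic one associated with Fisher: 8 cups, 4 with milk poured first and 4 with tea poured first, and the lady must pick out the 4 milk-first cups. If she is only guessing, every choice of 4 cups out of 8 is equally likely, so the number she gets right follows a hypergeometric distribution. The function name `prob_correct` is hypothetical.

```python
from math import comb

CUPS, MILK_FIRST = 8, 4  # assumed classic design: 8 cups, 4 of each kind

def prob_correct(k: int) -> float:
    """P(exactly k of the 4 milk-first cups are picked) if the lady is only guessing."""
    ways = comb(MILK_FIRST, k) * comb(CUPS - MILK_FIRST, MILK_FIRST - k)
    return ways / comb(CUPS, MILK_FIRST)

print(comb(CUPS, MILK_FIRST))                       # 70 equally likely ways to choose 4 cups
print(round(prob_correct(4), 4))                    # 0.0143, i.e. 1/70
print(round(prob_correct(3) + prob_correct(4), 4))  # 0.2429, i.e. 17/70
```

Getting all four right by pure guessing has probability 1/70, so a perfect score is strong evidence for her claim, while three out of four happens about 24% of the time by chance alone.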
Section 1.2 Sampling a Population of Existing Units Random sample(随机样本): A random sample is a sample selected from a population so that: • Each population unit has the same chance of being selected as every other unit • Each possible sample (of the same size) has the same chance of being selected • For example, to randomly pick two different people from a group of 15: • Number the people from 1 to 15 and write their numbers on 15 different slips of paper • Thoroughly mix the slips and randomly pick two of them • The numbers on the chosen slips identify the people in the sample
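The slips-of-paper procedure can be mimicked in software. This is a minimal sketch, assuming Python's `random` module as the "mixing" device; the function name `draw_slips` is hypothetical. Shuffling puts the slips in a random order, so taking the first two is like drawing two slips from a well-mixed hat.

```python
import random

def draw_slips(n_people: int, n_draw: int, rng: random.Random) -> list:
    """Simulate writing 1..n_people on slips, mixing them, and drawing n_draw slips."""
    slips = list(range(1, n_people + 1))  # number the people 1..n_people
    rng.shuffle(slips)                    # thoroughly mix the slips
    return slips[:n_draw]                 # the drawn numbers identify the sampled people

rng = random.Random(2024)  # fixed seed only so the draw is reproducible
print(draw_slips(15, 2, rng))
```

Because a shuffle makes every ordering of the slips equally likely, every pair of people has the same chance of ending up in the first two positions, which is the second property of a random sample listed above.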
Sample with replacement(有放回抽样): Replace each sampled unit before picking the next unit • The unit is placed back into the population for possible reselection • However, if the same unit is selected again, it contributes no new information Sample without replacement(无放回抽样): A sampled unit is withheld from possibly being selected again in the same sample • Guarantees a sample of different units • Each sampled unit contributes different information • Sampling without replacement is the usual and customary sampling method
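Python's standard `random` module happens to offer both schemes directly: `random.choices` draws with replacement and `random.sample` draws without replacement. The ten-unit population below is hypothetical and only illustrates the difference in what each call can return.

```python
import random

rng = random.Random(7)            # fixed seed only for reproducibility
population = list(range(1, 11))   # ten hypothetical units numbered 1..10

# With replacement: each unit is "put back", so the same unit may appear more than once.
with_repl = rng.choices(population, k=8)

# Without replacement: every sampled unit is different.
without_repl = rng.sample(population, k=8)

print(with_repl, "distinct units:", len(set(with_repl)))        # repeats are possible
print(without_repl, "distinct units:", len(set(without_repl)))  # always 8
```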
Example 1.1 The Cell Phone Case: Estimating Cell Phone Costs The bank has 2,136 employees on a 500-minute-per-month plan with a monthly cost of $50. The bank will estimate its cellular cost per minute for this plan by examining the number of minutes used last month by each of 100 randomly selected employees on this 500-minute plan. According to the cellular management service, if the cellular cost per minute for the random sample of 100 employees is over 18 cents per minute, the bank should benefit from automated cellular management of its calling plans.
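The slide does not spell out how "cost per minute" is computed. A natural reading, assumed here, is the total plan cost of the sampled employees divided by the total minutes they used. Since every employee pays the same $50, this equals 50 divided by the average minutes used, so the 18-cent threshold is crossed when average usage falls below 50 / 0.18 ≈ 277.8 minutes. The function name and usage figures below are hypothetical, not the Table 1.2 data.

```python
def cost_per_minute(minutes_used, monthly_cost=50.0):
    """Cost per minute for sampled users on the same flat-rate plan.

    Assumption: total plan cost of the sampled users / total minutes they used.
    """
    total_cost = monthly_cost * len(minutes_used)
    return total_cost / sum(minutes_used)

break_even = 50.0 / 0.18          # average minutes at which cost per minute is exactly $0.18
print(round(break_even, 1))       # 277.8

# Hypothetical usage (NOT Table 1.2): 100 employees averaging 250 minutes each.
sample = [250.0] * 100
cpm = cost_per_minute(sample)
print(round(cpm, 3), cpm > 0.18)  # 0.2 True: automated management would pay off
```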
In order to randomly select the sample of 100 cell phone users, the bank will make a numbered list of the 2,136 users on the 500-minute plan. This list is called a frame(设计框架). • The bank can use a random number table, such as Table 1.1(a), or output from a computer software package, such as Table 1.1(b), to select the needed sample. • The 100 cellular-usage figures are given in Table 1.2.
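As a software stand-in for a random number table, the frame can be represented as the numbers 1 to 2,136 and 100 distinct numbers drawn from it. This is a minimal sketch using Python's `random.sample`; it is not the package that produced Table 1.1(b), and the seed is arbitrary.

```python
import random

N_USERS, N_SAMPLE = 2136, 100

# The frame: a numbered list of every user on the 500-minute plan (IDs 1..2136).
frame = list(range(1, N_USERS + 1))

# Draw 100 distinct IDs; each possible set of 100 users is equally likely.
rng = random.Random(12345)  # arbitrary seed so the selection can be reproduced
selected_ids = sorted(rng.sample(frame, N_SAMPLE))
print(selected_ids[:10])    # first few selected user numbers
```

The bank would then look up last month's minutes for the users with these numbers.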
Approximately Random Samples Sometimes it is not possible to list and thus number all the units in a population. In such a situation we often select a systematic sample, which approximates a random sample. A Systematic Sample(系统抽样) Randomly enter the population and systematically sample every kth unit.
Example 1.2 The Marketing Research Case: Rating a New Bottle Design To study consumer reaction to a new bottle design, the brand group will use the “mall intercept method,” in which shoppers at a large metropolitan shopping mall are intercepted and asked to participate in a consumer survey. The questionnaire is shown in Figure 1.1. Each shopper will be exposed to the new bottle design and asked to rate the bottle image using a 7-point “Likert scale.” We select a systematic sample: every 100th shopper passing a specified location in the mall will be invited to participate in the survey. During a Tuesday afternoon and evening, a sample of 60 shoppers is selected using this systematic sampling process. The 60 composite scores are given in Table 1.3. From this table, we can estimate that 95 percent of all shoppers would give the new bottle design a composite score of at least 25.
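Both steps of this case can be sketched in a few lines: inviting every 100th passer-by, and estimating the population share as the sample proportion of scores at or above 25. The function names, the count of 6,000 passers-by, and the scores below are all hypothetical; they are chosen only so that 57 of 60 scores reach 25, giving 57/60 = 0.95. They are not the Table 1.3 data.

```python
def intercepted(shopper_numbers, every=100):
    """Invite every `every`-th shopper passing the specified location."""
    return [s for s in shopper_numbers if s % every == 0]

def share_at_least(scores, cutoff=25):
    """Sample proportion of composite scores that are at least `cutoff`."""
    return sum(1 for s in scores if s >= cutoff) / len(scores)

# Hypothetical: 6,000 shoppers pass the location during the afternoon and evening.
invited = intercepted(range(1, 6001))
print(len(invited))            # 60

# Hypothetical composite scores (NOT Table 1.3): 57 of the 60 are at least 25.
scores = [30] * 57 + [20] * 3
print(share_at_least(scores))  # 0.95
```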
Section 1.3 Sampling a Process [Diagram: Inputs → Process → Outputs] Process(过程): A sequence of operations that takes inputs (labor, raw materials, methods, machines, and so on) and turns them into outputs (products, services, and the like)
Processes produce output over time • The “population” from a process is all output produced in the past, present, and yet-to-occur future. • For example, all automobiles of a particular make and model, such as the Lincoln Town Car • Cars of this model will continue to be made over time
Example 1.3 The Coffee Temperature Case: Monitoring Coffee Temperatures This case concerns coffee temperatures at a fast-food restaurant. To monitor them, restaurant personnel measure the temperature of the coffee being dispensed (in degrees F) at half-hour intervals from 10 A.M. to 9:30 P.M. on a given day. The data are listed in Table 1.7. • A process is in statistical control if it does not exhibit any unusual process variations. • To determine whether a process is in control, sample the process often enough to detect unusual variations. • A runs plot is a graph of individual process measurements over time. Figure 1.3 shows a runs plot of the temperature data.
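Sampling every half hour from 10:00 A.M. through 9:30 P.M. gives 24 measurement times. The sketch below generates that schedule and prints a crude text runs plot, one row per measurement in time order; a real runs plot such as Figure 1.3 puts time on the horizontal axis, so this text version is rotated. The function names and the four temperatures are hypothetical, not the Table 1.7 data.

```python
from datetime import datetime, timedelta

def half_hour_times(start="10:00", end="21:30"):
    """Return sampling times every 30 minutes from start to end, inclusive."""
    t = datetime.strptime(start, "%H:%M")
    stop = datetime.strptime(end, "%H:%M")
    times = []
    while t <= stop:
        times.append(t.strftime("%H:%M"))
        t += timedelta(minutes=30)
    return times

def runs_plot(times, temps, low=150):
    """Print a text runs plot: the '*' is indented one space per degree F above `low`."""
    for time, temp in zip(times, temps):
        print(f"{time} |{' ' * max(0, round(temp - low))}* {temp}")

times = half_hour_times()
print(len(times), times[0], times[-1])  # 24 10:00 21:30
# Hypothetical temperatures (degrees F) for the first four times only, NOT Table 1.7.
runs_plot(times[:4], [163, 158, 166, 161])
```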
Figure 1.3 Runs Plot of Coffee Temperatures: The Process is in Statistical Control.
Results • Over time, temperatures appear to have a fairly constant amount of variation around a fairly constant level • The temperature is expected to be at the constant level shown by the horizontal blue line • Sometimes the temperature is higher and sometimes lower than the constant level • About the same amount of spread of the values (data points) around the constant level • The points are as far above the line as below it • The data points appear to form a horizontal band • So, the process is in statistical control • Coffee-making process is operating “consistently”
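The visual checks above (a roughly constant level, points about as far above the line as below it) can be backed up with a few simple numbers. This sketch is a hypothetical descriptive summary, not a formal control-chart test: it takes the sample mean as the center line, counts points above and below it, and reports the largest deviation on each side. The function name `band_summary` and the twelve temperatures are invented for illustration, not taken from Table 1.7.

```python
import statistics

def band_summary(temps):
    """Summarize a runs plot around its center line (taken here as the sample mean)."""
    center = statistics.mean(temps)
    above = sum(1 for t in temps if t > center)
    below = sum(1 for t in temps if t < center)
    max_up = max(temps) - center    # farthest point above the line
    max_down = center - min(temps)  # farthest point below the line
    return center, above, below, max_up, max_down

# Hypothetical temperatures in degrees F (NOT the Table 1.7 data).
temps = [160, 164, 157, 162, 159, 165, 156, 161, 163, 158, 161, 154]
print(band_summary(temps))
```

Roughly equal counts above and below, and similar maximum deviations on both sides, are consistent with the horizontal band described above; a drift or a few extreme points would show up as lopsided numbers.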
Remark • Because the coffee temperature has been and is presently in control, it will likely stay in control in the future • If the coffee-making process stays in control, then coffee temperature is predicted to be between 152°F and 170°F • In general, if the process appears from the runs plot to be in control, then it will probably remain in control in the future • The sample of measurements was approximately random • Future process performance is predictable
Section 1.4 An Introduction to Survey Sampling • We already know some sampling methods, also called sampling designs: • Random sampling (the focus of this book) • Systematic sampling • Voluntary response sampling • But there are other sampling designs: • Stratified random sampling(分层随机抽样) • Cluster sampling(分块抽样)
Stratified Random Sample • Divide the population into non-overlapping groups of similar units, called strata • Separately, select a random sample from each and every stratum • Combine the random samples from each stratum to make the full sample • Appropriate when the population consists of two or more different groups so that: • The groups differ from each other with respect to the variable of interest • Units within a group are similar to each other • For example, divide the population into strata by age, gender, income, etc.
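The three steps (split into strata, sample each stratum separately, combine) translate directly into code. This is a minimal sketch: the function name, the three age strata, their sizes, and the choice of a 10% sample from each stratum (proportional allocation) are hypothetical assumptions for illustration.

```python
import random

def stratified_sample(strata, sizes, rng):
    """Draw a simple random sample (without replacement) from every stratum, then combine."""
    sample = []
    for name, units in strata.items():
        sample.extend(rng.sample(units, sizes[name]))
    return sample

# Hypothetical population split into non-overlapping age strata.
strata = {
    "under 30": [f"A{i}" for i in range(1, 41)],  # 40 units
    "30-49": [f"B{i}" for i in range(1, 51)],     # 50 units
    "50+": [f"C{i}" for i in range(1, 31)],       # 30 units
}
sizes = {"under 30": 4, "30-49": 5, "50+": 3}     # 10% of each stratum
print(stratified_sample(strata, sizes, random.Random(3)))
```

Every stratum is guaranteed to be represented, which a single simple random sample of 12 units from all 120 would not ensure.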
Cluster Sampling • “Cluster” or group a population into subpopulations • Cluster by geography, time, and so on… • Each cluster should be a representative small-scale version of the population (i.e., a heterogeneous group) • A simple random sample of the clusters is chosen • All units in the selected clusters make up the full sample (in two-stage cluster sampling, a random sample of units is taken from each selected cluster instead) • Appropriate for populations spread over a large geographic area that can be divided into different sections or regions (the clusters)
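The key difference from stratified sampling is what gets randomized: here the clusters themselves are sampled. This sketch shows both one-stage cluster sampling (keep every unit in the chosen clusters) and two-stage cluster sampling (randomly sample units within each chosen cluster). The function name and the population of 20 hypothetical city blocks with 10 households each are illustrative assumptions.

```python
import random

def cluster_sample(clusters, n_clusters, rng, per_cluster=None):
    """Randomly choose clusters; keep all their units (one-stage) or
    a simple random sample of `per_cluster` units from each (two-stage)."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        units = clusters[name]
        sample.extend(units if per_cluster is None else rng.sample(units, per_cluster))
    return chosen, sample

# Hypothetical population: 20 city blocks, each with 10 households.
clusters = {f"block{b}": [f"b{b}-h{h}" for h in range(1, 11)] for b in range(1, 21)}

chosen, one_stage = cluster_sample(clusters, 3, random.Random(5))
_, two_stage = cluster_sample(clusters, 3, random.Random(5), per_cluster=4)
print(chosen, len(one_stage), len(two_stage))
```

Only the chosen blocks need to be visited, which is why cluster sampling suits geographically spread-out populations.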
More on Systematic Sampling • We want a sample containing n units from a population containing N units • Take the ratio N/n and round down to the nearest whole number • Call the rounded result k • Randomly select one of the first k elements from the population list • Step through the population from the first chosen unit and select every kth unit • This method approximates a simple random sample, especially if the list of population elements is in random order
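The steps above can be sketched as a short function. One assumption to flag: stepping all the way to the end of the list can yield slightly more than n units when N/n is not a whole number (for N = 2,136 and n = 100, k = 21), so this hypothetical `systematic_sample` stops after exactly n units. Because k = floor(N/n), the last index start + (n - 1)k never exceeds N - 1.

```python
import random

def systematic_sample(frame, n, rng):
    """Systematic sample of n units: k = floor(N/n), random start among the first k, then every kth."""
    N = len(frame)
    k = N // n                 # round N/n down to a whole number
    start = rng.randrange(k)   # random position among the first k units (0-based)
    return [frame[start + i * k] for i in range(n)]

# Example: the 2,136-user frame from the cell phone case, sample of 100 (k = 21).
frame = list(range(1, 2137))
s = systematic_sample(frame, 100, random.Random(42))
print(s[:5], len(s))
```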
Sampling Problems • Random sampling is intended to eliminate bias • But even a random sample may not be representative because of: • Under-coverage • Some part of the population is excluded from, or under-represented in, the sample • Non-response • A sampled unit cannot be contacted or refuses to participate • Response bias • Responses of selected units are not truthful