
Statistics 02


remy

Presentation Transcript


  1. Statistics 02

  2. Normal distribution • Also called the normal curve or the Gaussian curve. • Any variable whose value comes about as the result of summing the values of several independent, or almost independent, components can be modeled successfully as a normal distribution.

  3. Normal distribution

  4. Normal distribution • Three features of the normal distribution • 1. The histogram is symmetrical. • 2. The mean of the sample means is very close to the mean of the original population. • 3. The standard deviation of the set of sample means is very close to the original population standard deviation divided by the square root of the sample size, n.

  5. Z score • A raw score converted on the basis of the standard deviation. We convert a raw score to a Z score to determine how many standard deviation units that raw score lies above or below the mean. • Z=(X-M)/s, where X is the raw score, M the mean, and s the standard deviation.

  6. Application of Z score • Comparison of two scores from two tests • Conversion to a standardized score (T score): T=50+10Z • Determining the proportion of scores below a particular raw score: X < score • Statistical inference: range (interval) estimation

  7. Case • Student A takes 2 tests with the following data: • Test 1: Raw score=67, Mean=63, Standard deviation=3 • Test 2: Raw score=56, Mean=51, Standard deviation=4 • Question: What possible information can we obtain?

  8. Case • Two students take two different tests of English. • Student A: RS=67, M=63, s=3 • Student B: RS=56, M=51, s=4 • Question 1: Which student is better in English? • Question 2: Their T scores?
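Both questions on this slide can be checked with a short calculation. The following is a minimal sketch in Python that applies Z=(X-M)/s and T=50+10Z from the earlier slides. The function names `z_score` and `t_score` are illustrative and do not come from the presentation.

```python
def z_score(raw, mean, sd):
    """Standard deviations that `raw` lies above (+) or below (-) the mean."""
    return (raw - mean) / sd


def t_score(z):
    """Standardized T score: T = 50 + 10Z."""
    return 50 + 10 * z


# Data from the slide: (raw score, test mean, test standard deviation)
students = {"A": (67, 63, 3), "B": (56, 51, 4)}

results = {}
for name, (raw, mean, sd) in students.items():
    z = z_score(raw, mean, sd)
    results[name] = (z, t_score(z))
    print(f"Student {name}: Z = {z:.2f}, T = {t_score(z):.1f}")
```

Student A's Z score (about 1.33) is slightly higher than Student B's (1.25). The comparison is only meaningful if the two groups of test takers are themselves comparable.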

  9. Table of Normal Distribution • Relation between Z score and Proportion

  10. Case • When we select a score at random from the population, what is the probability that it falls below (or above) a certain score? • For example: the probability that the score X is below 60 • P(X<60)

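  11. Case • Z<? • Z=(X-M)/s • Therefore, transform the inequality X<60 step by step: • X-M < 60-M • (X-M)/s < (60-M)/s • In this example (60-M)/s = -1, so Z<-1 • From the normal distribution table, P=0.1587 • The chance that a randomly selected score is below 60 is about 16%.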
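The table lookup for Z<-1 can also be done in code. This sketch uses `NormalDist` from the Python standard library in place of the printed table. It starts from the Z score, because the slide does not state the mean and standard deviation behind it.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of the standard normal distribution lying below Z = -1
p_below = standard_normal.cdf(-1)
print(f"P(Z < -1) = {p_below:.4f}")
```

The result agrees with the table value of 0.1587 to four decimal places.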

  12. Case • Xiamen University wants to give the freshmen a placement test upon admission and put them into 5 levels of English learning. Work out a plan for this test and inform the students, before the test, of the scores required for each level. • Total of freshmen: 5000 • Classes for each level: • B0: 4 • B1: remaining • B2: 20 • B3: 8 • B4: 4 • Normal class size: 35

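  13. Students per level (B0, B1, B2, B3, B4): 140, 3810, 700, 210, 140 • Z-score cutoffs between adjacent levels: -1.90, 0.80, 1.5, 1.90 • Raw-score cutoffs between adjacent levels: 44.6, 60.8, 65, 67.4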
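The cutoffs on slide 13 can be approximately reproduced from the head counts: find the cumulative proportion of students below each boundary, look up the matching Z score, and convert it to a raw score. The sketch below is hedged: the slides do not state the test mean or standard deviation, so the values 56 and 6 are assumptions back-calculated from the slide's own score cutoffs.

```python
from itertools import accumulate
from statistics import NormalDist

# Students per level, lowest to highest, as listed on slide 13
counts = {"B0": 140, "B1": 3810, "B2": 700, "B3": 210, "B4": 140}
total = sum(counts.values())  # 5000 freshmen

# ASSUMPTION: mean and standard deviation are not stated in the slides;
# 56 and 6 are back-calculated from the slide's own score cutoffs.
assumed_mean, assumed_sd = 56, 6

levels = list(counts)
cumulative = list(accumulate(counts.values()))  # students at or below each level

cutoffs = []
for lower, upper, below in zip(levels, levels[1:], cumulative):
    z = NormalDist().inv_cdf(below / total)  # Z score with that proportion below it
    score = assumed_mean + z * assumed_sd    # convert the Z score to a raw score
    cutoffs.append((lower, upper, z, score))
    print(f"{lower}/{upper}: Z = {z:+.2f}, score = {score:.1f}")
```

The computed values are close to the slide's figures. Small differences arise because the slide rounds each Z score before converting it to a raw score.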

  14. Statistical inference • Use a collection of observed values to make inferences about a larger set of potential values. • Classical problem of statistical inference: how to infer from the properties of a part the likely properties of the whole. • Because of the way in which samples are selected, it is often impossible to generalize beyond the samples.

  15. Population • The largest class to which we can generalize the results of an investigation based on a subclass; in other words, the set of all possible values of a variable. • A population, for statistical purposes, is a set of values. • We need to be sure that the values that constitute the sample somehow reflect the target statistical population.

  16. Sampling • Random sampling gives us reasonable confidence that our inferences from sample values to population values are valid. • The most common type of sampling frame is a list (actual or notional) of all the subjects in the group to which generalization is intended. • What the techniques of statistics offer is a common ground, a common measuring stick by which experimenters can measure and compare the strength of evidence for one hypothesis or another that can be obtained from a sample of subjects.

  17. Sampling • Careful consideration is needed to ensure that the sample represents the population. E.g., a study of the gravity of errors in written English as perceived by two different groups: native English-speaking teachers of English and Greek teachers of English. Both samples contained individuals from different institutions to avoid institutional attitude bias. • Researchers have an inescapable duty to describe carefully how their experimental material, including subjects, was actually obtained. It is also good practice to attempt to foresee some of the objections that might be made about the quality of the material and either attempt to forestall criticism or admit openly to any serious defects.

  18. Case Study • Study the population and sample for the following investigations: • Vocabulary size • Listening input and listening comprehension • Social backgrounds and learning strategy

  19. Random Sampling • Use the Table of Random Numbers • Other methods
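As one of the "other methods", a pseudo-random number generator can stand in for the Table of Random Numbers. This is a minimal sketch, assuming a hypothetical sampling frame of 5000 subject ID numbers and a sample size of 35. Neither figure is prescribed by the slide.

```python
import random

# HYPOTHETICAL sampling frame: ID numbers 1..5000 standing in for a list of subjects
sampling_frame = list(range(1, 5001))

rng = random.Random(42)  # fixed seed so that the draw can be reproduced

# Simple random sample without replacement: every subject is equally likely
# to be chosen, and no subject can be chosen twice.
sample = rng.sample(sampling_frame, k=35)

print(sorted(sample))
```

Recording the seed along with the sampling frame lets other researchers verify exactly how the subjects were selected.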

  20. Statistical parameters • Population parameters • Mean: μ (mu [mju], English counterpart: m) • Standard deviation: σ (sigma [sigm], English counterpart: s) • Sample statistics • Mean: M • Standard deviation: s

  21. Other Greek letters • Σ: sigma (capital), symbol of sum, English counterpart: S • ε: epsilon, symbol of error, English counterpart: e • α: alpha • χ: chi [kai], English counterpart: x

  22. Parameter estimation • Point estimator: a single number calculated from a sample and used to estimate a population parameter. • Interval estimator: a likely range within which the population value may lie.

  23. Standard error of the sample means • If we repeatedly draw samples of the same size from the population and calculate the mean of each sample, these means will be approximately normally distributed. The variability of these means around the population mean is called the standard error of the sample means, and is calculated as follows: • Standard error σx = σ/√n • When the population standard deviation σ is unknown, we often use the sample standard deviation s instead: σx = s/√n
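The claim that sample means vary with a standard deviation of σ/√n can be checked by simulation. The sketch below draws many samples from a made-up population and compares the observed spread of their means with the formula. The population (10,000 scores, roughly normal around 60 with standard deviation 8) and the sample size of 25 are assumptions chosen only for illustration.

```python
import random
from math import sqrt
from statistics import mean, pstdev, stdev

rng = random.Random(0)  # fixed seed for reproducibility

# ASSUMPTION: a made-up population of 10,000 scores, roughly normal around 60
population = [rng.gauss(60, 8) for _ in range(10_000)]
sigma = pstdev(population)  # population standard deviation

n = 25  # size of each sample
sample_means = [mean(rng.sample(population, n)) for _ in range(2_000)]

observed_se = stdev(sample_means)  # spread of the simulated sample means
predicted_se = sigma / sqrt(n)     # standard error formula from the slide

print(f"observed: {observed_se:.3f}  predicted: {predicted_se:.3f}")
```

The two values should agree closely, and the agreement tightens as the number of repeated samples grows.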

  24. Case • If the following data are obtained from a test • n=132 • M=67 • s=6.5 • What is the standard error of the sample means?

  25. Case • σx = s/√n • =6.5/√132 • =6.5/11.49 • =0.566

  26. Confidence • The confidence level is the probability with which we are confident that the population value falls within the estimated range, usually 95% or 99%. • Procedure: calculate the Z score • Look up in the Normal Distribution Table the critical value Zα/2, the Z score that cuts off a probability of α/2 in each tail. • Compare Z with Zα/2

  27. Case • n=132, M=67, s=6.5 • Estimate μ at α=0.05

  28. Case • Z=(M-μ)/σx • =(67-μ)/0.566 • -Zα/2 ≤ Z ≤ Zα/2 • -1.96 ≤ (67-μ)/0.566 ≤ 1.96 • -1.10936 ≤ 67-μ ≤ 1.10936 • -68.10936 ≤ -μ ≤ -65.89064 • 65.89064 ≤ μ ≤ 68.10936
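The interval derived above can be computed directly as M ± Zα/2 · σx. This minimal sketch uses `NormalDist.inv_cdf` from the Python standard library in place of the table lookup. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, sample_mean, sample_sd = 132, 67, 6.5  # data from the slide
alpha = 0.05

standard_error = sample_sd / sqrt(n)          # s / sqrt(n)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05

lower = sample_mean - z_crit * standard_error
upper = sample_mean + z_crit * standard_error
print(f"SE = {standard_error:.3f}, 95% CI: {lower:.2f} to {upper:.2f}")
```

The result, roughly 65.89 to 68.11, matches the hand calculation. Setting `alpha = 0.01` gives the wider 99% interval.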

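  29. t distribution • When the sample size is less than 30, the statistic (M-μ)/σx, computed with the sample standard deviation, follows a t distribution rather than the normal distribution. • The t distribution is a family of curves, one for each number of degrees of freedom. • Degrees of freedom: the number of values that are free to vary. In the t distribution, df=n-1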

  30. Case • Sample mean=63.16 • Sample standard deviation=7.25 • n=19 • Estimate μ at α=0.05

  31. Case • Standard error=s/√19=7.25/4.36=1.66 • t=(M-μ)/σx • =(63.16-μ)/1.66 • t0.05/2(18)=2.101 • -2.101 ≤ (63.16-μ)/1.66 ≤ 2.101 • -3.48766 ≤ 63.16-μ ≤ 3.48766 • -66.64766 ≤ -μ ≤ -59.67234 • 59.7 ≤ μ ≤ 66.6
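The same interval can be sketched in code with the t critical value in place of 1.96. The critical value 2.101 is copied from the slide, because the Python standard library does not provide a t-distribution quantile function. The variable names are illustrative.

```python
from math import sqrt

n, sample_mean, sample_sd = 19, 63.16, 7.25  # data from the slide
df = n - 1                                   # degrees of freedom = 18

# Critical value t_{0.05/2}(18), taken from the slide (a t table lookup)
t_crit = 2.101

standard_error = sample_sd / sqrt(n)  # s / sqrt(n)
lower = sample_mean - t_crit * standard_error
upper = sample_mean + t_crit * standard_error
print(f"df = {df}, SE = {standard_error:.2f}, 95% CI: {lower:.2f} to {upper:.2f}")
```

The bounds differ from the hand calculation in the second decimal place, because the slide rounds the standard error to 1.66 before multiplying.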
