Sampling & Probability Distributions: Research Methodology Lecture 13

This lecture covers sampling methods and probability distributions, with a focus on estimating the average student age at a university. It explains the concepts of mean, standard deviation, and Gaussian/normal probability distribution, as well as how to calculate z-scores and probabilities.

Presentation Transcript


  1. Research Methodology Lecture 13 Sampling & Probability Distributions Mazhar Hussain, Dept of Computer Science, ISP, Multan Mazhar.hussain@isp.edu.pk

  2. Road Map

  3. Sampling • How to find the average student age in the university? • Ask each student and compute the average • Randomly select 3 to 4 students from each discipline and find their average age – Estimation of the average age of students in the university

  4. Sampling • Why sampling? • Effort and resources required to carry out the study on the whole population • Examples • Average income of families living in a city • Results of an election • Opinion about a problem

  5. Sampling Sampling is the process of selecting a few (a sample) from a bigger group (the sampling population) to become the basis for estimating or predicting the prevalence of an unknown piece of information, situation or outcome regarding the bigger group

  6. Recap – Mean & Standard deviation • Mean/Average • Standard Deviation • On the average, how far the data values are from the mean
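The formula images for this recap slide did not survive in the transcript. For reference, the standard definitions are given below, using the symbols that appear later in the lecture (μ and σ for a population of size N, x̄ and s for a sample of size n). The n − 1 divisor for the sample standard deviation is an assumption, chosen because it reproduces the z-score of 1.57 in the worked example on slide 21.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i ,
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}

\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i ,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}
```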

  7. Population vs sample

  8. Gaussian Distribution Karl Friedrich Gauss 1777-1855

  9. Gaussian/Normal Probability Distribution • Most of the naturally occurring processes can be modeled by a bell-shaped curve

  10. Gaussian/Normal Probability Distribution • The Gaussian probability distribution is perhaps the most used distribution in all of science. • Sometimes it is called the “bell shaped curve” or normal distribution. • μ = mean of distribution • σ = standard deviation of distribution • x is a continuous variable (-∞ < x < ∞)
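The density formula itself is missing from the transcript of this slide. A minimal sketch of the standard Gaussian probability density, using the slide's symbols μ and σ, could look as follows; the function name `gaussian_pdf` is only an illustrative choice.

```python
import math


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)


# The curve peaks at the mean and is symmetric around it.
peak = gaussian_pdf(0.0)
left = gaussian_pdf(-1.0)
right = gaussian_pdf(1.0)
print(round(peak, 4), round(left, 4), round(right, 4))
```

The defaults μ = 0 and σ = 1 give the standard normal curve used in the later slides.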

  11. Gaussian/Normal Probability Distribution • The area within +/- σ is ≈ 68% • The area within +/- 2σ is ≈ 95% • The area within +/- 3σ is ≈ 99.7%
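These three areas can be checked numerically. The sketch below relies on the identity that the area of a normal curve within k standard deviations of the mean equals erf(k / √2); the helper name `area_within` is hypothetical.

```python
import math


def area_within(k):
    """Area under a normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    # Prints the fraction of the total area for 1, 2 and 3 standard deviations.
    print(k, round(area_within(k), 4))
```

The result does not depend on μ or σ, which is why the rule holds for every normal distribution.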

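  12. Gaussian/Normal Probability Distribution • Probability (P) of x being in the range [a, b] is given by the integral of the probability density function from a to b • Figure: Gaussian pdf with μ = 0 and σ = 1 • ≈ 95% of the area lies within 2σ of the mean • Only ≈ 5% of the area lies outside 2σ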
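The integral on this slide was an image and is not in the transcript. Written out with the symbols μ and σ from slide 10, the standard form is:

```latex
P(a \le x \le b) = \int_{a}^{b} \frac{1}{\sigma\sqrt{2\pi}}
    \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\,dx
```

This integral has no closed form in elementary functions, which is why the later slides read areas from a z-score chart instead.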

  13. Gaussian/Normal Probability Distribution Standard Normal Distribution

  14. Standard Normal Distribution • Normal distribution with mean of zero and standard deviation of one • Since mean and standard deviation define any normal distribution… • Standard normal distribution can be used for any normally distributed variable by converting its mean to zero and its standard deviation to one, that is, by converting values to z scores

  15. Z Scores • By itself, a raw score or X value provides very little information about how that particular score compares with other values in the distribution. • A score of X = 53, for example, may be a relatively low score, or an average score, or an extremely high score depending on the mean and standard deviation for the distribution from which the score was obtained. • If the raw score is transformed into a z-score, however, the value of the z-score tells exactly where the score is located relative to all the other scores in the distribution.

  16. Z Scores • The process of changing an X value into a z-score involves creating a signed number, called a z-score, such that • The sign of the z-score (+ or –) identifies whether the X value is located above the mean (positive) or below the mean (negative). • The numerical value of the z-score corresponds to the number of standard deviations between X and the mean of the distribution. • Thus, a score that is located two standard deviations above the mean will have a z-score of +2.00

  17. Z Scores • In addition to knowing the basic definition of a z-score and the formula for a z-score, it is useful to be able to visualize z-scores as locations in a distribution. • Remember, z = 0 is in the center (at the mean), and the extreme tails correspond to z-scores of approximately –2.00 on the left and +2.00 on the right. • Although more extreme z-score values are possible, most of the distribution is contained between z = –2.00 and z = +2.00.

  18. Z Scores • z-score for a sample value in a data set is obtained by subtracting the mean of the data set from the value and dividing the result by the standard deviation of the data set. • NOTE: When computing the value of the z-score, the data values can be population values or sample values. Hence we can compute either a population z-score or a sample z-score.

  19. Z Scores • The Sample z-score for a value x is given by the following formula: z = (x - x̄) / s • Where x̄ is the sample mean and s is the sample standard deviation.

  20. Z Scores • The Population z-score for a value x is given by the following formula: z = (x - μ) / σ • Where μ is the population mean and σ is the population standard deviation.

  21. Example • Example: What is the z-score for the value of 14 in the following sample values? 3, 8, 6, 14, 4, 12, 7, 10 • The sample mean is 8 and the sample standard deviation is ≈ 3.82, so z = (14 - 8) / 3.82 ≈ 1.57 • Thus, the data value of 14 is 1.57 standard deviations above the mean of 8, since the z-score is positive.
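The arithmetic of this example can be reproduced in a few lines. The sketch assumes the sample standard deviation (n − 1 divisor), which is what `statistics.stdev` computes.

```python
import statistics

sample = [3, 8, 6, 14, 4, 12, 7, 10]

mean = statistics.mean(sample)   # sample mean
s = statistics.stdev(sample)     # sample standard deviation (n - 1 divisor)
z = (14 - mean) / s              # sample z-score for the value 14

print(round(s, 2), round(z, 2))
```

Using the population standard deviation (`statistics.pstdev`) instead would give a slightly larger z-score, so the choice of divisor matters for small samples.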

  22. Example • Dot Plot of the data points with the location of the mean and the data value of 14.

  23. Z Score & Probability • What is the probability of finding a value between 100 and 110? • How to calculate this area using z scores?

  24. Reading area under the curve for z = 1.55 • Z Score Chart • Area to the left of z = 1.55 is 0.9394

  25. Z Score & Probability • Probability of z > 1.55 (area in the tail) • P = 1 - 0.9394 = 0.0606

  26. Z Score & Probability • Probability of z > 1.55 or z < -1.55 (area in both the tails) • P = .0606 + .0606 = .1212

  27. Z Score & Probability • Probability of z > 0 and z < 1.55 (area between the mean and z = 1.55) • P = .5 - .0606 = .4394
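The chart lookups on slides 24 to 27 can be sketched with the standard library's `statistics.NormalDist` (Python 3.8 or later), whose `cdf` method returns the area to the left of a z-score. The variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

area_below = std_normal.cdf(1.55)   # area to the left of z = 1.55 (chart value)
upper_tail = 1.0 - area_below       # P(z > 1.55)
both_tails = 2.0 * upper_tail       # P(z > 1.55 or z < -1.55), by symmetry
mean_to_z = area_below - 0.5        # P(0 < z < 1.55)

print(round(area_below, 4), round(upper_tail, 4), round(mean_to_z, 4))
# both_tails is computed from the unrounded tail, so it can differ from the
# slide's .1212 in the last digit (the slide adds two rounded values).
print(both_tails)
```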

  28. Example: 50 measures of pollution

  29. Example: 50 measures of pollution • Probability of a value > 45 • z = .4372 • P = .3300

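  30. Example: 50 measures of pollution • Probability of a value from 35 to 45 • z = -.5749 for 35 and z = .4372 for 45 • Area between 35 and the mean: P = .5 - .2843 = .2157 • Area between the mean and 45: P = .5 - .3300 = .1700 • Total: P = .2157 + .1700 = .3857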
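The two pollution probabilities can be reproduced from the z-scores alone. This sketch assumes that -.5749 and .4372 are the z-scores of 35 and 45, and that the slide read the chart at two decimal places (z = -0.57 and z = 0.44); the mean and standard deviation of the 50 measurements are not in the transcript, so they are not used.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# z-scores from the slide, rounded to two decimals as a printed chart would be.
z_35 = round(-0.5749, 2)
z_45 = round(0.4372, 2)

p_above_45 = 1.0 - std_normal.cdf(z_45)
p_35_to_45 = std_normal.cdf(z_45) - std_normal.cdf(z_35)

print(round(p_above_45, 4), round(p_35_to_45, 4))
```

Without the rounding step the results shift in the third decimal place, which is the usual price of reading a two-decimal chart.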

  31. Sampling

  32. Sampling • Pros • Saves time • Resources – financial, human • Cons • Not exact value for the population • An estimate or prediction • Compromise on accuracy of findings

  33. Sampling – Terminology • Examples • Average student age in the university • Average income of families living in a city • Results of an election • Population or study population (N) • The university students, families living in the city, electors • Sample • The small group of students, families or electors you chose to collect the required information

  34. Sampling – Terminology • Sample size (n) • The number of entities in your sample • Sampling design or strategy • The way you select the students, families or electors • Sampling unit or sampling element • Each student, family or elector in your study • Sample statistics • Your findings based on information obtained from your sample

  35. Sampling – Terminology • Population Parameters • Aim of research – find answers to research question for study population not the sample • Use sample statistics to estimate answers to research questions in study population • Estimates arrived at from sample statistics – population parameters • Saturation Point • When no new information is coming from your respondents

  36. Sampling – Terminology • Sampling Frame • A list identifying each student, family or elector in the study population

  37. Principles of sampling • Example – Four individuals A, B, C, D • A = 18 years • B = 20 years • C = 23 years • D = 25 years • Average age • (18+20+23+25) / 4 = 21.5 years • Use a sample of two individuals to estimate the average age of your study population (4 individuals)

  38. Principles of sampling • How many possible combinations of two individuals? • A and B • A and C • A and D • B and C • B and D • C and D

  39. Principles of sampling • A+B = 18+20 = 38/2 = 19.0 years • A+C = 18+23 = 41/2 = 20.5 years • A+D = 18+25 = 43/2 = 21.5 years • B+C = 20+23 = 43/2 = 21.5 years • B+D = 20+25 = 45/2 = 22.5 years • C+D = 23+25 = 48/2 = 24.0 years • In two cases – no difference between sample statistics and population parameters • Difference – Sampling error
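The six samples and their sampling errors can be enumerated mechanically. The sketch below uses `itertools.combinations` on the four ages from slide 37; the variable names are illustrative.

```python
from itertools import combinations

ages = {"A": 18, "B": 20, "C": 23, "D": 25}
population_mean = sum(ages.values()) / len(ages)  # 21.5 years

errors = []
for pair in combinations(ages, 2):
    sample_mean = sum(ages[person] for person in pair) / 2
    # Sampling error: sample statistic minus population parameter.
    error = sample_mean - population_mean
    errors.append(error)
    print("+".join(pair), sample_mean, error)

# Number of samples whose mean equals the population mean exactly.
print(errors.count(0.0))
```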

  40. Principles of sampling

  41. Principles of sampling • Principle I In the majority of cases of sampling, there will be a difference between sample statistics and the true population parameters which is attributable to the selection of the units in the sample

  42. Principles of sampling • Instead of samples of two – take a sample of three • Four possible combinations • A+B+C = 18+20+23 = 61/3 = 20.33 years • A+B+D = 18+20+25 = 63/3 = 21.00 years • A+C+D = 18+23+25 = 66/3 = 22.00 years • B+C+D = 20+23+25 = 68/3 = 22.67 years

  43. Principles of sampling

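  44. Principles of sampling • Difference between sample average and population average (sampling error) • Samples of two: -2.5 to +2.5 years • Samples of three: -1.17 to +1.17 years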

  45. Principles of sampling • The gap between sample statistics and population parameters is reduced • Principle II The greater the sample size, the more accurate will be the estimate of the true population parameters

  46. Principles of sampling • Same Example – Different Data • A = 18 years • B = 26 years • C = 32 years • D = 40 years • Variable (age) – markedly different

  47. Principles of sampling • Estimate average using • Samples of two • Samples of three • Difference in the average age: • Sample size of 2: -7.00 to +7.00 years • Sample size of 3: -3.67 to +3.67 years • Range of difference is greater than previously calculated

  48. Principles of sampling • Principle III The greater the difference in the variable under study in a population for a given sample size, the greater will be the difference between the sample statistics and the true population parameters
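Principles II and III can be illustrated together by computing the range of sampling error for both data sets and both sample sizes. This is a sketch only; `error_range` is a hypothetical helper, not something from the lecture.

```python
from itertools import combinations


def error_range(ages, n):
    """Smallest and largest sampling error over all samples of size n."""
    population_mean = sum(ages) / len(ages)
    errors = [sum(sample) / n - population_mean
              for sample in combinations(ages, n)]
    return min(errors), max(errors)


data_sets = {
    "similar ages": [18, 20, 23, 25],     # slide 37
    "different ages": [18, 26, 32, 40],   # slide 46
}

for label, ages in data_sets.items():
    for n in (2, 3):
        low, high = error_range(ages, n)
        print(label, n, round(low, 2), round(high, 2))
```

Within each data set the range narrows as n grows (Principle II), and for the same n the more varied data set has the wider range (Principle III).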

  49. Factors affecting the inference • Principles suggest that two factors may influence the degree of certainty about the inferences drawn from a sample • Size of sample • The larger the sample size, the more accurate will be the findings • The extent of variation in the sampling population • The greater the variation in the study population w.r.t. the characteristics under study, the greater will be the uncertainty for a given sample size

  50. Aims in selecting a sample • Achieve maximum precision in your estimate • Avoid bias in selection • Bias can occur if: • Non-random sampling – consciously or unconsciously affected by human choice • Sampling frame does not cover the sampling population accurately or completely • A section of sampling population is impossible to find or refuses to cooperate
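One common way to keep human choice out of the selection is simple random sampling from the sampling frame. The sketch below is an illustration under stated assumptions: the frame of 200 student IDs is invented for the example, and `simple_random_sample` is a hypothetical helper built on `random.Random.sample`.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, each equally likely."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)


# Hypothetical sampling frame: one entry per student in the study population.
frame = [f"student-{i:03d}" for i in range(1, 201)]

chosen = simple_random_sample(frame, 10, seed=13)
print(chosen)
```

Random selection addresses only the first source of bias on the slide; an incomplete frame or non-response still has to be handled separately.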
