## Data Collection & Sampling Techniques


**MEANING OF STATISTICS**
- Statistics is used to mean either statistical data or statistical methods.
- Statistics is a method of collecting, organising and analysing numerical data in order to understand a phenomenon or make wise decisions.

**FUNCTIONS OF STATISTICS**
1. To present facts in a proper form.
2. To simplify unwieldy and complex data and make them easily understandable.
3. To help classify data according to various characteristics.
4. To provide techniques for making comparisons.
5. To study relationships between different phenomena.
6. To indicate trend behaviour.

**LIMITATIONS OF STATISTICS**
1. Statistics does not study individuals.
2. Statistics does not study qualitative phenomena.
3. Statistical results are true only on average.
4. Statistical laws are not exact: unlike the laws of the physical and natural sciences, statistical laws are only approximations.
5. Statistics does not reveal the entire story.
6. Statistics is liable to be misused.

**Uses of Statistics**
- Describe data.
- Compare two or more data sets.
- Determine whether a relationship exists between variables.
- Make estimates about population characteristics.
- Predict past or future behavior of data.

**Misuse of statistics**
- "There are three kinds of lies: lies, damned lies, and statistics." (attributed to Benjamin Disraeli)
- "Figures don't lie, but liars figure."
- "Statistics can be used to prove anything, especially statisticians." (Franklin P. Jones)

**Sources of Misuse**
There are two main sources of misuse of statistics:
- An agenda on the part of a dishonest researcher.
- Unintentional errors on the part of a researcher.

**Misuses of Statistics**
- Survey questions.
- Loaded questions: wording, intentional or not, that elicits a desired response.
- Order of questions: the order in which questions are asked can influence the answers.
- Nonresponse (refusal): the subject refuses to answer questions.
- Self-interest: the sponsor of the survey could enjoy monetary gains from the results.

**Misuses of Statistics**
Exercise 1
- Missing data (partial pictures).
- Detached statistics: no comparison is made.
- Percentages.
- Implied connections.
- Correlation and causality: when we find a statistical association between two variables, we cannot conclude that one of the variables is the cause of (or directly affects) the other variable.

**Data Collection**
In research, statisticians use data in many different ways; for example, data can be used to describe situations. Data can be collected in a variety of ways, BUT if the sample data are not collected in an appropriate way, they may be so useless that no amount of statistical torturing can salvage them.

**Course objectives**
Trainees will analyze graphs.
- a. Analyze data presented in a graph.
- b. Compare and contrast multiple graphic representations (circle graphs, line graphs, line plot graphs, pictographs, Venn diagrams, and bar graphs) for a single set of data and discuss the advantages and disadvantages of each.
- c. Determine and justify the mean, range, mode, and median of a set of data.

**Terms**
- Mean: the sum of the numbers in a set of data divided by the number of pieces of data (used, for example, in D+X analysis, scan compliance and delivery percentage in the MNOP KPIs, and workload calculation for a post office).
- Median: the number in the middle of a set of data when the data are arranged in order from least to greatest. When there are two middle numbers, the median is the number halfway between them (their mean).
- Mode: the number that occurs most frequently in a set of numbers.
- Range: the difference between the largest and smallest values in a numerical data set.

**Finding the Mean**
- Step 1: Add all the numbers in your set of data.
- Step 2: Divide the sum by the number of pieces of data.
- Example: data set 15, 15, 14, 16; sum 60; number of pieces of data 4; mean 60 ÷ 4 = 15.

**Finding the Median**
- Step 1: Put all numbers in order from least to greatest.
- Step 2: Find the middle number.
- Example: data set 15, 15, 14, 16; ordered 14, 15, 15, 16; middle numbers 15 and 15; median 15.

**Example: test-check figures for two days for unregistered articles**
- The days chosen should be normal working days, e.g. Wednesday or Thursday.
- The assumption is that these days have normal transactions, so their figures can be taken as the median for normal transactions.
- Work on other days may vary between the minimum and the maximum.

**Finding the Mode**
- Step 1: Put all numbers in order from least to greatest.
- Step 2: Find the number that occurs most often.
- Example: data set 15, 15, 14, 16; ordered 14, 15, 15, 16; mode 15.

**Example: checking a postman on the beat by the PRIP**
The PRIP should select a point where the probability of the postman visiting is high, i.e. the mode: the point visited most frequently.

**Finding the Range**
- Step 1: Put all numbers in order from least to greatest.
- Step 2: Subtract the lowest number from the highest number.
- Example: data set 15, 15, 14, 16; ordered 14, 15, 15, 16; range 16 - 14 = 2.

**Activity**
The number of articles booked in an MPCM in a post office is as follows:
- Monday: 175
- Tuesday: 202
- Wednesday: 180
- Thursday: 130
- Friday: 198
- Saturday: 175

Find the mean, median, mode and range for the above set of data.

**Types of Graphs**
- Bar graph
- Circle graph
- Line graph
- Line plot graph
- Venn diagram
- Pictograph

**Bar Graph**
- Definition: a graph that shows data using horizontal or vertical bars.
- Advantages: easy to read; compares multiple sets of data.
- Disadvantages: not the best choice for showing trends.

**Circle (Pie) Graph**
- Definition: a graph that shows data in the form of a circle.
- Advantages: shows percentages; shows how a total is divided into parts.
- Disadvantages: not the best choice for showing trends.

**Line Graph**
- Definition: a graph that shows data in the form of a line.
- Advantages: shows change over time; helps you see trends.
- Disadvantages: not easy to use for comparing different categories of data.

**Pictograph**
- Definition: a graph that displays data using symbols or pictures.
- Advantages: compares multiple sets of data; visually appealing.
- Disadvantages: hard to read when parts of pictures are used.

**Venn Diagram**
- Definition: circles that show relationships among sets.
- Advantages: shows comparisons and contrasts easily.
- Disadvantages: does not show trends.

**Exercise 2**
- 100 trainees attended the PA induction program in your PTC.
- 80 trainees attended the IP induction program in your PTC.
- 20 attended both the IP and PA induction programs in the same PTC.

Make a Venn diagram.

**Sample and population (ASW, 15)**
- A population is the collection of all the elements of interest (census enumeration).
- A sample is a part of the population.
- Good or bad samples.
- Representative or non-representative samples. A researcher hopes to obtain a sample that represents the population, at least in the variables of interest for the issue being examined.
- Probabilistic samples are samples selected using the principles of probability. This may allow a researcher to determine the sampling distribution of a sample statistic.

**MEANING OF SAMPLING**
Sampling is a method in which only the items included in the sample are observed, for the purpose of drawing conclusions about the population from which the sample is drawn. Measures computed from the sample (such as measures of central tendency and measures of dispersion) are called statistics and are used as a basis for estimating population parameters.

**NEED FOR SAMPLING**
1. Savings in time and money.
2. When the population is infinitely large.

That the characteristics of a sample can give an approximately correct idea of the population parameters is borne out by the theory of probability.

**Methods of sampling: probabilistic**
- Random sampling methods: each member has an equal probability of being selected.
- Systematic: every kth case. Equivalent to random if patterns in the list are unrelated to the issues of interest. E.g. inspection of BOs by the divisional head.
- Stratified samples: sample from each stratum or subgroup of a population. E.g. SB withdrawal verification (more than 10000).
- Cluster samples: sample only certain clusters of members of a population. E.g. city blocks, firms, test cards only on the addressees in the periphery of the jurisdiction, SB withdrawals checked only for C class offices, inspection of bad BOs.
- Multistage samples: combinations of random, systematic, stratified, and cluster sampling. E.g. checking the transaction particulars of selected days during the inspection of a BO.
- If probability is involved at each stage, the distribution of sample statistics can be obtained.

**Basic Methods of Sampling**
- Random sampling
  - Subjects are selected by using chance or random numbers.
  - Each individual subject (human or otherwise) has an equal chance of being selected.
  - Examples: MO verification by the PRIP; drawing names from a hat; random numbers.
- Systematic sampling
  - Select a random starting point and then select every kth subject in the population.
  - Simple to use, so it is used often.
- Convenience sampling
  - Use subjects that are easily accessible.
  - Examples: using family members or students in a classroom; mall shoppers.
- Stratified sampling
  - Divide the population into at least two different groups with common characteristic(s), then draw SOME subjects from each group (each group is called a stratum; plural strata).
  - In effect, randomly sample each subgroup (stratum).
  - Results in a more representative sample.
- Cluster sampling
  - Divide the population into groups (called clusters), randomly select some of the groups, and then collect data from ALL members of the selected groups.
  - Used extensively by government and private research organizations.
  - Example: exit polls.

**Objects of sampling**
- To obtain information about the population on the basis of a sample drawn from it.
- To set up the limits of accuracy of the estimates of population parameters computed from sample statistics.

**Some terms used in sampling**
- Sampled population: the population from which the sample is drawn (ASW, 258). The researcher should define it clearly.
- Frame: the list of elements from which the sample is selected (ASW, 258), e.g. a telephone book or a city business directory. It may be possible to construct a frame.
- Parameter: a numerical characteristic of a population (ASW, 259), e.g. a total (annual GDP or exports), or the proportion p of the population that votes Liberal in a federal election. The µ or σ of a probability distribution are also termed parameters.
- Statistic: a numerical characteristic of a sample, e.g. pre-election polls.
- The sampling distribution of a statistic is the probability distribution of that statistic.

**Sampling distribution of a statistic**
The sampling distribution of a statistic is the distribution of the various values the statistic takes when it is computed from the various samples of the same size randomly drawn from the population. Any statistic, such as the mean or standard deviation, may be computed for each of the samples so drawn, giving a series of values of that statistic. Arranged as a frequency distribution, these values form the sampling distribution.

**Selecting a sample (ASW, 259-261)**
- N is the symbol for the size of the population, i.e. the number of elements in the population.
- n is the symbol for the size of the sample, i.e. the number of elements in the sample.
- A simple random sample is a sample of size n selected in such a manner that each possible sample of size n has the same probability of being selected.
- In the case of a random sample of size n = 1, each element has the same chance of being selected.

**Selecting a simple random sample**
- Sampling with replacement: after an element is randomly selected, it is replaced and another element is randomly selected. This can lead to the same element being selected more than once.
- More common is sampling without replacement: at each stage, make sure that each element remaining in the population has the same probability of being selected.

**Simple random sample of size 2 from a population of 4 elements**
The population elements are A, B, C, D, so N = 4 and n = 2.
The first element selected could be any one of the 4 elements, which leaves 3, so there are 4 × 3 = 12 possible ordered samples, each equally likely: AB, AC, AD, BA, BC, BD, CA, CB, CD, DA, DB, DC. If the order of selection does not matter (i.e. we are interested only in which elements are selected), this reduces to 6 combinations. Treating AB and BA as the same sample {A, B}, and so on, the equally likely random samples are {A, B}, {A, C}, {A, D}, {B, C}, {B, D}, {C, D}. This is the number of combinations of 4 elements taken 2 at a time, C(4, 2) = (4 × 3) ÷ 2 = 6 (ASW, 261, note 1).

**Standard error of a statistic**
- The standard deviation measures the variability of the observations in a population. The standard deviation of the sampling distribution of a statistic, which measures how much the statistic varies from sample to sample, is known as its standard error.

**Sampling from a process (ASW, 261)**
- Careful design of the sample is especially important.
- Sample the production of milk at random times.
- Sample data on various products in the department, such as Speed Post, Logistics Post and Business Post.
- We need to calculate the mean and standard deviation of the observations in the samples.
- How to estimate the mean and standard deviation of the population from them.
- (The standard deviation is the square root of the average of the squared distances of the observations from the mean.)
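The Finding the Mean, Median, Mode and Range slides describe step-by-step procedures that map directly onto short functions. The sketch below uses Python (an assumption; the slides name no language) and applies those steps to the slide example and to the MPCM figures from the Activity slide. The helper names `find_mean`, `find_median`, `find_modes` and `find_range` are hypothetical and were chosen for illustration.

```python
from collections import Counter

# Hypothetical helpers that follow the steps on the "Finding the ..." slides.


def find_mean(values):
    # Step 1: add all the numbers; Step 2: divide the sum by how many there are.
    return sum(values) / len(values)


def find_median(values):
    ordered = sorted(values)              # Step 1: order from least to greatest
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:             # odd count: a single middle number
        return ordered[mid]
    # even count: halfway between the two middle numbers
    return (ordered[mid - 1] + ordered[mid]) / 2


def find_modes(values):
    counts = Counter(values)              # how often each number occurs
    top = max(counts.values())
    # every number tied for most frequent (a data set can have several modes)
    return sorted(v for v, c in counts.items() if c == top)


def find_range(values):
    ordered = sorted(values)              # Step 1: order from least to greatest
    return ordered[-1] - ordered[0]       # Step 2: highest minus lowest


slide_example = [15, 15, 14, 16]
# Articles booked in the MPCM, Monday to Saturday (Activity slide).
mpcm_booked = [175, 202, 180, 130, 198, 175]

for name, data in [("slide example", slide_example), ("MPCM activity", mpcm_booked)]:
    print(name,
          "mean:", round(find_mean(data), 2),
          "median:", find_median(data),
          "mode:", find_modes(data),
          "range:", find_range(data))
```

Worked by hand, the Activity data give a mean of 1060 ÷ 6 ≈ 176.67 articles, a median of 177.5 (halfway between 175 and 180), a mode of 175 and a range of 202 - 130 = 72. Python's standard `statistics.mean`, `statistics.median` and `statistics.multimode` could replace the hand-written helpers.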
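Exercise 2 amounts to splitting the three counts into the regions of a two-circle Venn diagram. The sketch below assumes, as the exercise implies, that the 100 PA trainees and the 80 IP trainees each already include the 20 who attended both programs. The function name `venn_regions` is a hypothetical helper.

```python
def venn_regions(pa_total, ip_total, both):
    """Split two overlapping group sizes into the regions of a two-set Venn diagram.

    Assumes pa_total and ip_total each already include the trainees counted in `both`.
    """
    if both > min(pa_total, ip_total):
        raise ValueError("the overlap cannot be larger than either group")
    return {
        "PA only": pa_total - both,                            # PA circle, outside the overlap
        "IP only": ip_total - both,                            # IP circle, outside the overlap
        "both PA and IP": both,                                # the overlapping region
        "attended at least one": pa_total + ip_total - both,   # inclusion-exclusion
    }


print(venn_regions(pa_total=100, ip_total=80, both=20))
```

Under that assumption, the PA circle holds 80 trainees outside the overlap and the IP circle holds 60, with 20 in the overlap. That gives 160 distinct trainees, because the 20 counted in both totals are subtracted once.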
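The probabilistic methods on the Methods of sampling and Basic Methods of Sampling slides (simple random, systematic, stratified and cluster) can be sketched with Python's standard `random` module. The frame of 40 branch offices, their A/B/C class labels and their grouping into four clusters are invented purely for illustration and do not come from the slides. The function names are hypothetical.

```python
import random

# Hypothetical sampling frame of 40 branch offices (BOs). The office class
# (A/B/C) and the cluster number (0-3, e.g. a sub-division) are invented
# for illustration only.
FRAME = [
    {"office": f"BO-{i:02d}", "cls": "ABC"[i % 3], "cluster": i // 10}
    for i in range(40)
]


def simple_random(frame, n, rng):
    # Without replacement: every possible sample of size n is equally likely.
    return rng.sample(frame, n)


def systematic(frame, n, rng):
    # Random starting point within the first interval, then every k-th unit.
    k = len(frame) // n
    start = rng.randrange(k)
    return frame[start::k][:n]


def stratified(frame, per_stratum, rng, key="cls"):
    # Divide the frame into strata, then draw a simple random sample from each.
    strata = {}
    for unit in frame:
        strata.setdefault(unit[key], []).append(unit)
    return [unit for group in strata.values() for unit in rng.sample(group, per_stratum)]


def cluster(frame, n_clusters, rng, key="cluster"):
    # Randomly pick whole clusters, then keep ALL units in the chosen clusters.
    cluster_ids = sorted({unit[key] for unit in frame})
    chosen = set(rng.sample(cluster_ids, n_clusters))
    return [unit for unit in frame if unit[key] in chosen]


rng = random.Random(2024)  # fixed seed so the draws can be reproduced
samples = {
    "simple random": simple_random(FRAME, 8, rng),
    "systematic": systematic(FRAME, 8, rng),
    "stratified": stratified(FRAME, 3, rng),
    "cluster": cluster(FRAME, 2, rng),
}
for method, sample in samples.items():
    print(f"{method:>13}: {[unit['office'] for unit in sample]}")
```

A multistage design would chain these steps. For example, it could pick clusters first and then draw a simple random sample within each chosen cluster. Convenience sampling is left out because it does not use probability to select subjects.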
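The count on the Simple random sample of size 2 from a population of 4 elements slide can be checked by listing every sample with Python's `itertools`. This is a minimal sketch. `permutations` gives the 12 ordered samples and `combinations` gives the 6 unordered ones, and under simple random sampling each unordered sample has probability 1/6.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb, perm

population = ["A", "B", "C", "D"]   # N = 4
N, n = len(population), 2

# Ordered samples: the first pick has 4 choices, the second has the 3 that remain.
ordered = ["".join(p) for p in permutations(population, n)]

# Unordered samples: AB and BA count as the same sample {A, B}.
unordered = ["{" + ", ".join(c) + "}" for c in combinations(population, n)]

# Sanity checks against the closed-form counts 4 x 3 = 12 and C(4, 2) = 6.
assert len(ordered) == perm(N, n)
assert len(unordered) == comb(N, n)

# Every unordered sample is equally likely under simple random sampling.
p_each = Fraction(1, len(unordered))

print(len(ordered), ordered)
print(len(unordered), unordered)
print("probability of each sample:", p_each)
```

`math.perm` and `math.comb` require Python 3.8 or later.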
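The Sampling distribution of a statistic and Standard error of a statistic slides can be made concrete by brute force on a tiny population. The sketch enumerates every sample of size 2, builds the frequency distribution of the sample mean, and takes its standard deviation as the standard error. The four values 2, 4, 6, 8 are hypothetical. The closing formula (standard error of the mean with the finite population correction, for sampling without replacement) is the standard textbook result and does not appear on the slides.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical population of N = 4 observations (illustrative values only).
population = [2, 4, 6, 8]
N, n = len(population), 2


def mean(values):
    return sum(values) / len(values)


def population_sd(values):
    # Square root of the average squared distance of the observations from the mean.
    m = mean(values)
    return sqrt(sum((x - m) ** 2 for x in values) / len(values))


# All equally likely simple random samples of size n, drawn without replacement.
sample_means = [mean(s) for s in combinations(population, n)]

# Frequency distribution of the statistic, i.e. its sampling distribution.
sampling_distribution = Counter(sample_means)

mu = mean(population)
sigma = population_sd(population)
standard_error = population_sd(sample_means)   # SD of the sampling distribution

# Textbook standard error of the mean for sampling without replacement
# from a finite population (includes the finite population correction).
formula_se = sigma / sqrt(n) * sqrt((N - n) / (N - 1))

print("sampling distribution:", dict(sorted(sampling_distribution.items())))
print("mean of sample means:", mean(sample_means), "population mean:", mu)
print("standard error:", round(standard_error, 4), "formula:", round(formula_se, 4))
```

The six sample means are 3, 4, 5, 5, 6 and 7, and their average equals the population mean of 5. Their standard deviation, √(5/3) ≈ 1.291, is the standard error, and it matches the formula value.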