1 / 89

Collect, Explore & Summarise

This article discusses the various data collection techniques used in research, including using available information, observation, interviews, questionnaires, and focus group discussions. It also covers the process of exploring and summarizing data, including data screening and cleaning.

cathiej
Download Presentation

Collect, Explore & Summarise

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. FK6163 Collect, Explore & Summarise Dr Azmi Mohd TamilDept of Community HealthUniversiti Kebangsaan Malaysia

  2. Data Collection • Data collection begins after deciding on design of study and the sampling strategy

  3. Data Collection • Sample subjects are identified and the required individual information is obtained in an item-wise and structured manner.

  4. Data Collection • Information is collected on certain characteristics, attributes and the qualities of interest from the samples • These data may be quantitative or qualitative in nature.

  5. Data Collection Techniques • Use available information • Observation • Interviews • Questionnaires • Focus group discussion

  6. Using Available Information • Existing Records • Hospital records - case notes • National registry of births & deaths • Census data • Data from other surveys

  7. Disadvantages of using existing records • Incomplete records • Cause of death may not be verified by a physician/MD • Missing vital information • Difficult to decipher • May not be representative of the target group - only severe cases go to hospital

  8. Disadvantages of using existing records • Delayed publication - obsolete data • Different method of data recording between institutions, states, countries, making comparison & pooling of data incompatible • Comparisons across time difficult due to difference in classification, diagnostic tools etc

  9. Advantages of using existing records • Cheap • convenient • in some situations, it is the only data source i.e. accidents & suicides

  10. Observation • Involves systematically selecting, watching & recording behaviour and characteristics of living beings, objects or phenomena • Done using defined scales • Participant observation e.g. PEF and asthma symptom diary • Non-participant observation e.g. cholesterol levels

  11. Interviews • Oral questioning of respondents either individually or as a group. • Can be done loosely or highly structured using a questionnaire

  12. Administering Written Questionnaires • Self-administered • via mail • by gathering them in one place and getting them to fill it up • hand-delivering and collecting them later • Large non-response can distort results

  13. Questionnaires • Influenced by education & attitude of respondent esp. for self-administered • Interviewers need to be trained • open ended vs close ended • the need for pre-testing or pilot study

  14. Focus group discussion • Selecting relevant parties to the research questions at hand and discussing with them in focus groups • examples in your own field of interest?

  15. Plan for data collection • Permission to proceed • Logistics - who will collect what, when and with what resources • Quality control

  16. Accuracy & Reliability • Accuracy - the degree which a measurement actually measures the measures the characteristic it is supposed to measure • Reliability is the consistency of replicate measures

  17. Reliability & Accuracy

  18. Accuracy & Reliability • Both are reduced by random error and systematic error from the same sources of variability; • the data collectors • the respondents • the instrument

  19. Strategies to enhance precision & accuracy • Standardise procedures and measurement methods • training & certifying the data collectors • Repetition • Blinding

  20. Introduction Method of Exploring and Summarising Data differs According to Types of Variables

  21. Dependent/Independent Independent Variables Frequency of Exercise Food Intake Obesity Dependent Variable

  22. Explore • It is the first step in the analytic process • to explore the characteristics of the data • to screen for errors and correct them • to look for distribution patterns - normal distribution or not • May require transformation before further analysis using parametric methods • Or may need analysis using non-parametric techniques

  23. Data Screening • By running frequencies, we may detect inappropriate responses • How many in the audience have 15 children and currently pregnant with the 16th?

  24. Data Screening • See whether the data make sense or not. • E.g. Parity 10 but age only 25.

  25. Data Screening • By looking at measures of central tendency and range, we can also detect abnormal values for quantitative data

  26. Interpreting the Box Plot The whiskers extend to 1.5 times the box width from both ends of the box and ends at an observed value. Three times the box width marks the boundary between "mild" and "extreme" outliers. "mild" = closed dots "extreme"= open dots Largest non-outlier Upper quartile Median Lower quartile Smallest non-outlier Outlier Outlier

  27. Data Screening • We can also make use of graphical tools such as the box plot to detect wrong data entry

  28. Data Cleaning • Identify the extreme/wrong values • Check with original data source – i.e. questionnaire • If incorrect, do the necessary correction. • Correction must be done before transformation, recoding and analysis.

  29. Parameters of Data Distribution • Mean – central value of data • Standard deviation – measure of how the data scatter around the mean • Symmetry (skewness) – the degree of the data pile up on one side of the mean • Kurtosis – how far data scatter from the mean

  30. Normal distribution • The Normal distribution is represented by a family of curves defined uniquely by two parameters, which are the mean and the standard deviation of the population. • The curves are always symmetrically bell shaped, but the extent to which the bell is compressed or flattened out depends on the standard deviation of the population. • However, the mere fact that a curve is bell shaped does not mean that it represents a Normal distribution, because other distributions may have a similar sort of shape.

  31. Normal distribution 99.7% • If the observations follow a Normal distribution, a range covered by one standard deviation above the mean and one standard deviation below it includes about 68.3% of the observations; • a range of two standard deviations above and two below (+ 2sd) about 95.4% of the observations; and • of three standard deviations above and three below (+ 3sd) about 99.7% of the observations 95.4% 68.3%

  32. Normality • Why bother with normality?? • Because it dictates the type of analysis that you can run on the data

  33. Normality-Why?Parametric

  34. Normality-Why?Non-parametric

  35. Explored graphically Histogram Stem & Leaf Box plot Normal probability plot Detrended normal plot Explored statistically Kolmogorov-Smirnov statistic, with Lilliefors significance level and the Shapiro-Wilks statistic Skew ness (0) Kurtosis (0) + leptokurtic 0 mesokurtik - platykurtic Normality-How?

  36. Kolmogorov- Smirnov • In the 1930’s, Andrei Nikolaevich Kolmogorov (1903-1987) and N.V. Smirnov (his student) came out with the approach for comparison of distributions that did not make use of parameters. • This is known as the Kolmogorov-Smirnov test.

  37. Skew ness • Skewed to the right indicates the presence of large extreme values • Skewed to the left indicates the presence of small extreme values

  38. Kurtosis • For symmetrical distribution only. • Describes the shape of the curve • Mesokurtic - average shaped • Leptokurtic - narrow & slim • Platikurtic - flat & wide

  39. Skew ness & Kurtosis • Skew ness ranges from -3 to 3. • Acceptable range for normality is skew ness lying between -1 to 1. • Normality should not be based on skew ness alone; the kurtosis measures the “peak ness” of the bell-curve (see Fig. 4). • Likewise, acceptable range for normality is kurtosis lying between -1 to 1.

  40. Normality - ExamplesGraphically

  41. Q&Q Plot • This plot compares the quintiles of a data distribution with the quintiles of a standardised theoretical distribution from a specified family of distributions (in this case, the normal distribution). • If the distributional shapes differ, then the points will plot along a curve instead of a line. • Take note that the interest here is the central portion of the line, severe deviations means non-normality. Deviations at the “ends” of the curve signifies the existence of outliers.

  42. Normality - ExamplesGraphically

  43. Normal distributionMean=median=mode

  44. Normality - ExamplesStatistically Normal distributionMean=median=mode Skewness & kurtosis within +1 p > 0.05, so normal distribution Shapiro-Wilks; only if sample size less than 100.

  45. K-S Test

More Related