Data analysis: 1. Describing data

Data analysis: 1. Describing data. Ana Jerončić, PhD, Department for Research in Biomedicine and Health. Contact. E-mail: ana.jeroncic@mefst.hr Location: main building, 5th floor, room 512 Phone: 557-862. Contents of the 2nd week. Describing data - Central tendency and variability

Presentation Transcript


  1. Data analysis: 1. Describing data • Ana Jerončić, PhD • Department for Research in Biomedicine and Health

  2. Contact • E-mail: ana.jeroncic@mefst.hr • Location: main building, 5th floor, room 512 • Phone: 557-862

  3. Contents of the 2nd week • Describing data - Central tendency and variability • Estimation - Accuracy, precision, standard error, confidence intervals • Hypothesis testing - Test statistics, P-value, choice of a statistical test • Interpretation of data - Causality and association, odds ratio, risk, correlation, linear regression • Sources of error - Type 1 and type 2 errors, power, bias, confounding

  4. Learning outcome • Critical appraisal of scientific papers • NOT implementation of data analysis

  5. Why? • To identify the best available treatment • To prevent “medical zombies” • To perform your own research

  6. Data analysis: Describing data • How the data should be organized prior to data analysis • Data types • Graphical & tabular techniques for description, summary statistics • Qualitative Data • Quantitative Data

  7. Height measurements among 1st year medical students

  8. • What is the unit of measurement? • How many observations per subject?

  9. Variables and their values, units of observation [Figure: data table labelled Variables, Observations, and Measurement/Observation]

  10. Types of data (variables)

  11. Types of data

  12. Type of data? • Height • Grades • Age in years • Weight • Insulin concentration • Blood glucose

  13. Type of data? How many cigarettes do you smoke a day? • 1-5 • 6-10 • 11-15 • 16-20 • 21 and more

  14. Type of data? Have you ever had a heart attack? • Yes • No Do you suffer from hypertension? • Yes • No • ?

  15. Type of data? Gender: • Male • Female

  16. Type of data? Marital status: • married • divorced • widowed • single • lives alone • ?

  17. Type of data? Education: • elementary school • high school • two-year college • four-year college • ?

  18. Type of data? • Likert scale • Claim: Violence among the youth is becoming an increasing problem in Croatia. • 1 = I agree completely • 2 = I agree • 3 = Undecided • 4 = I disagree • 5 = I strongly disagree

  19. Type of data? • Visual analogue scale • E.g. the pain level that the examinee experiences, marked on a line between two endpoints: “I feel intolerable pain” and “I don’t feel pain”

  20. Variables – Transformation of variables

  21. Watch out for… inconsistency in the literature over how data types are classified

  22. Observe and Describe

  23. Observe and Describe Organized data are input for Graphical & Tabular data representations

  24. Qualitative data

  25. Tabular Techniques for Qualitative Variable(s) – YPEL5 example – Contingency Table • In one study, researchers investigated the genotype of the YPEL5 gene in a population sample from Split. They obtained the following results for 10 examinees: [Table: frequency distribution of YPEL5 genotypes, with proportion and percentage columns]

  26. Graphical Techniques for Qualitative Variable(s) – YPEL5 example – Bar Chart • Bar charts are often used to display frequencies… [Figure: bar chart; axes: counts or percentages, and category names]

  27. Is there an association between the medicine taken and the length of cold? [Cross table; cell percentages in reading order: 19%, 84%, 81%, 16%; totals: 100%, 100%]
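
  The percentages in a cross table like this one are typically computed within each group, so that every group sums to 100%. A minimal sketch of that calculation follows; the counts, group names, and the function `percentages_within_group` are hypothetical and are not the data behind the slide.

```python
# Hypothetical counts (NOT the slide's data): for each treatment group,
# the number of subjects whose cold was short or long.
counts_by_group = {
    "medicine": {"short cold": 30, "long cold": 10},
    "placebo": {"short cold": 12, "long cold": 28},
}

def percentages_within_group(counts):
    """Convert raw counts to percentages that sum to 100 within each group."""
    result = {}
    for group, outcomes in counts.items():
        total = sum(outcomes.values())
        result[group] = {name: 100 * n / total for name, n in outcomes.items()}
    return result

pct = percentages_within_group(counts_by_group)
for group, outcomes in pct.items():
    for name, value in outcomes.items():
        n = counts_by_group[group][name]
        print(f"{group:<9}{name:<11}{n:>3} ({value:.0f}%)")
```

  If the outcome percentages differ clearly between the groups, the table suggests an association; whether the difference is more than chance is a question for hypothesis testing.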

  28. Graphical & Tabular Techniques • The only allowable calculation is counting the frequency of each category. • We can summarize the data in a contingency table that presents the categories and their counts; this is called a frequency distribution. • A relative frequency distribution lists the categories and the proportion with which each occurs.
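
  Counting categories and turning the counts into proportions can be sketched in a few lines. The genotype labels below are made up for illustration (10 examinees, as in the YPEL5 example, but not the study's values), and `frequency_distribution` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical genotypes for 10 examinees (illustrative, not the study data).
genotypes = ["AA", "AG", "AG", "GG", "AA", "AG", "AA", "AG", "GG", "AG"]

def frequency_distribution(values):
    """Map each category to (count, proportion of all observations)."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, n / total) for category, n in counts.items()}

dist = frequency_distribution(genotypes)
for category, (count, proportion) in sorted(dist.items()):
    print(f"{category}: count={count}, proportion={proportion:.2f}, "
          f"percentage={proportion:.0%}")
```

  The counts form the frequency distribution; the proportions (or percentages) form the relative frequency distribution.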

  29. Graphical Techniques for Qualitative Variable(s) – Bar Chart – Pareto chart • Nominal data has no order. However, it is sometimes useful to arrange the outcomes from the most frequently occurring to the least frequently occurring. We call this bar chart representation a “Pareto chart”. [Figure: bar chart; axes: counts, and category names]

  30. Graphical Techniques for Qualitative Variable(s) – Bar Chart – Pareto chart • A chart with relative frequencies is more informative. [Figure: bar chart; axes: percentages, and category names]
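
  The ordering behind a Pareto chart is simply a sort by descending frequency. The sketch below assumes made-up nominal observations and a hypothetical helper `pareto_order`; it prints a rough text version of the chart with both counts and percentages.

```python
from collections import Counter

# Hypothetical nominal observations (made up for illustration).
observations = ["O", "A", "A", "B", "O", "A", "AB", "O", "A", "B"]

def pareto_order(values):
    """Return (category, count, percentage) from most to least frequent."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(cat, n, 100 * n / total) for cat, n in counts.most_common()]

ordered = pareto_order(observations)
for category, count, percentage in ordered:
    # One '#' per observation gives a crude horizontal bar.
    print(f"{category:<3}{'#' * count:<5}{count} ({percentage:.0f}%)")
```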

  31. Graphical Techniques for Qualitative Variable(s) – YPEL5 example – Pie Chart • Pie charts show relative frequencies…

  32. Watch out for… • Authors can use percentages to hide the true size of the data. • Saying that 50% of a sample has a certain condition when there are only four people in the sample clearly does not provide the same level of information as 50% of a sample of 400 people. • So, percentages should be used as an additional help for the reader rather than as a replacement for the actual data.
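
  One simple way to follow this advice is to report the count and the sample size next to every percentage. A minimal sketch, where `report_proportion` is a hypothetical helper and the numbers are the ones from the example above:

```python
def report_proportion(count, total):
    """Format a proportion as 'count/total (percentage%)' so that the
    sample size stays visible next to the percentage."""
    percentage = 100 * count / total
    return f"{count}/{total} ({percentage:.0f}%)"

small_sample = report_proportion(2, 4)      # 50% of four people
large_sample = report_proportion(200, 400)  # 50% of 400 people
print(small_sample)
print(large_sample)
```

  Both lines show the same percentage, but the reader can now see how much data stands behind each one.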

  33. The chart that changed medicine

  34. QuaNTItative data

  35. Graphical Technique for Quantitative Data • Height measurements among 1st year medical students • Frequency distribution for quantitative data: Building a Histogram

  36. Building a Histogram… Frequency distribution of height
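
  Building a histogram amounts to splitting the measurement range into equal-width class intervals and counting how many values fall into each. The sketch below uses hypothetical heights (not the students' measurements), an assumed interval width of 10 cm, and a hypothetical helper `histogram_counts`; intervals are half-open, so 170 belongs to 170–180.

```python
# Hypothetical heights in cm (illustrative, not the measured data).
heights_cm = [158, 162, 165, 167, 168, 170, 171, 172,
              174, 175, 176, 178, 180, 183, 191]

def histogram_counts(values, start, width, n_bins):
    """Count values per class interval [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_bins:  # values outside the covered range are ignored
            counts[index] += 1
    return counts

start, width = 150, 10
bins = histogram_counts(heights_cm, start, width, n_bins=5)
for i, count in enumerate(bins):
    lower = start + i * width
    print(f"{lower}-{lower + width} cm: {'#' * count} ({count})")
```

  The printed bars are the frequency distribution of height; a plotted histogram shows the same counts as adjacent bars on a continuous axis.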

  37. Graphical Techniques for Quantitative Data • There are several graphical methods that are used when the data are quantitative (numeric). • The most important of these graphical methods is the histogram. • The histogram is not only a powerful graphical technique used to summarize interval data, but it is also used to help explain probabilities.

  38. Histogram simulations • http://www.shodor.org/interactivate/activities/Histogram/

  39. Summary: Graphs/Tables for Describing Data • Qualitative: Frequency Distribution (tabular summary of data), Bar Chart, Pie Chart • Quantitative: Frequency Distribution (tabular summary of data), Histogram, Line Chart (Time-Series Plot), Stem and Leaf Display

  40. Relationship between two variables

  41. Relationship between two variables • To compare two variables we use: • Scatter plot/diagram (quantitative) • Cross table (qualitative)

  42. Scatter plot – for two quantitative variables • Scatter plot, showing the strong association between enzyme activity at pH 5.5 and the 5α-reductase 2-specific mRNA expression, as expressed on the basis of β-actin (n = 30; rs = 0.81; 95% confidence interval, 0.64–0.91; P < 0.0001).

  43. Patterns of Scatter Diagrams… Linearity and direction are the two concepts we are interested in: • Positive Linear Relationship • Negative Linear Relationship • Weak or Non-Linear Relationship
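
  Direction and linearity can also be summarized numerically. The slide does not name a coefficient, so the sketch below assumes the Pearson correlation coefficient as one common choice (slide 42 reports a rank-based coefficient, rs, instead). All data values and the function `pearson_r` are made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: +1 perfect positive line, -1 perfect negative line,
    near 0 when there is no linear relationship."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data for the three patterns.
x = [-2, -1, 0, 1, 2]
rising = [2.1, 3.9, 6.2, 8.0, 9.8]   # positive linear relationship
falling = [9.9, 8.1, 6.0, 4.2, 1.8]  # negative linear relationship
curved = [4, 1, 0, 1, 4]             # clear pattern (y = x squared), but not linear

for name, y in [("rising", rising), ("falling", falling), ("curved", curved)]:
    print(f"{name}: r = {pearson_r(x, y):.2f}")
```

  The curved example is why the scatter plot should be inspected first: the two variables are strongly related, yet a coefficient that measures linear association does not detect it.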

  44. Scatter plot • Analysis of expression level from microarray data • Squamous cell carcinoma tumor and perilesional tissue display distinctly different scatter plots from normal tissue. • Expression levels for gene subset 1 in patient 1

  45. Cross Table – for two qualitative variables • Used to compare two qualitative variables • If the first variable has r categories and the second variable has c categories, then we have an r×c cross table.
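
  An r×c cross table is a count of how often each pair of categories occurs together. The sketch below tallies hypothetical (genotype, disease) pairs into a 3×2 table; the data and the helper `cross_table` are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical paired observations: (genotype, has disease X).
pairs = [
    ("AA", "no"), ("AA", "no"), ("AA", "yes"),
    ("AG", "no"), ("AG", "yes"), ("AG", "yes"), ("AG", "no"),
    ("GG", "yes"), ("GG", "yes"), ("GG", "no"),
]

def cross_table(observations):
    """Return (row labels, column labels, r x c matrix of counts)."""
    counts = Counter(observations)
    rows = sorted({first for first, _ in observations})
    cols = sorted({second for _, second in observations})
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

rows, cols, table = cross_table(pairs)
print("     " + "".join(f"{c:>5}" for c in cols))
for label, row in zip(rows, table):
    print(f"{label:<5}" + "".join(f"{n:>5}" for n in row))
```

  Here the first variable has r = 3 categories and the second has c = 2, so the result is a 3×2 cross table.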

  46. Association of two qualitative variables • Based on the data presented, do you think that YPEL5 could be associated with disease X?

  47. Questions? Room 512 (5th floor) E-mail: ajeronci@mefst.hr

  48. The projected clinical cost for breast cancer detection program in 2011-12, broken down by service category.

  49. Histogram – common mistake! The results of measuring the height among medical students [Figure: two charts; axes: Height [cm], and subjects]
