Mastering Descriptive Statistics for Research

Welcome

The research process

The research process STAGE 1: DEFINING THE PROBLEM • Deciding on the research topic • Conducting a literature review • Specifying a research question • Formulating a hypothesis • Operationalizing concepts

The research process STAGE 2: OBTAINING THE INFORMATION • Ethics • Research design • Sampling • Data collection STAGE 3: ANALYSING AND INTERPRETING THE INFORMATION • Describing and interpreting quantitative data • Analysing and interpreting qualitative data

The research process STAGE 4: COMMUNICATING THE RESULTS • Report

Describing and interpreting quantitative data

Outcomes • The purpose of descriptive statistics • Frequency distribution table and graphs • Measures of central tendency – mean, median and mode • Measures of variability – range, variance, standard deviation • Interpretation • The relationship between variables

Descriptive statistics • The original data we collect is usually a large set of values, so it can be difficult to form an overall impression of the answer to the research question • We use descriptive statistics to organise, summarise and visualise quantitative data • These are mathematical techniques that researchers use to see the underlying patterns in the data

Procedures used to compile descriptive statistics Tables • Frequency distribution tables • Percentages Graphs • Graphic presentation of frequency distributions • Skewness and kurtosis

Tables • Frequency distribution tables • A frequency distribution table indicates the number of cases in a data set that obtained a certain score, or that fall in a particular category of a variable

Tables and graphs • Frequency distribution tables • Frequency distribution: a table or graph indicating how observations are distributed • It is a grouping of the raw data • The number of cases that obtained a score or fall in a category is called the frequency of that score / category; the symbol f is used to refer to the frequency • The sum of the frequencies should be the same as the number of cases in the sample
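
A minimal sketch of how a frequency distribution table could be built for a nominal variable. The gender values below are hypothetical and are not the data from Table 8.1; `frequency_table` is an illustrative helper name.

```python
from collections import Counter

# Hypothetical raw data: gender of 20 students (NOT the values from Table 8.1).
genders = ["male"] * 9 + ["female"] * 11


def frequency_table(values):
    """Return {category: f}, the number of cases (f) in each category."""
    return dict(Counter(values))


table = frequency_table(genders)
for category, f in sorted(table.items()):
    print(f"{category:<8} f = {f}")

# The frequencies must add up to the number of cases in the sample (n).
n = len(genders)
print("sum of f =", sum(table.values()), "| n =", n)
```

The final line is the check described on the slide: if the sum of f differs from n, a case was lost or counted twice.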

Example • A researcher does a study on aggression in adolescents • Convenience sample consisting of 20 secondary school students • Variables: gender (male or female) and score (0 – 40) on an aggression questionnaire • Gender is measured on a nominal level of measurement

Raw data / information • Table 8.1 (p. 218): List of gender and aggression scores • From this list the researcher creates a frequency distribution table

Frequency distribution table

Grouped frequency distribution tables • A grouped frequency distribution table is a frequency distribution table with a limited number of categories • We usually choose class intervals of equal size • Scores are grouped into class intervals that each include a series of scores

Grouped frequency distribution table • Class interval: the midpoint of the interval can be used to represent all the values in that interval • Tally • Frequency – how many cases there are in that class interval • Cumulative frequency (cf) of a class interval: the number of cases in the specified interval plus all the cases in the previous intervals, that is, the number of scores (cases) that fall below the lower limit of the next interval (p. 220)
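
The columns described above (class interval, midpoint, frequency, cumulative frequency) can be sketched as follows. The 20 scores, the starting value 0 and the interval width of 7 are assumptions made for illustration; they are not the textbook's data.

```python
# Hypothetical aggression scores (0-40) for 20 students; NOT the textbook data.
scores = [3, 8, 11, 12, 15, 17, 18, 19, 21, 22,
          22, 24, 25, 26, 27, 29, 30, 33, 36, 40]


def grouped_frequency_table(scores, start, width, n_intervals):
    """Build rows of (interval, midpoint, f, cf) for equal-size class intervals."""
    rows = []
    cf = 0
    for i in range(n_intervals):
        lower = start + i * width
        upper = lower + width - 1          # inclusive upper limit (integer scores)
        f = sum(lower <= s <= upper for s in scores)
        cf += f                            # this interval plus all previous ones
        rows.append({
            "interval": (lower, upper),
            "midpoint": (lower + upper) / 2,
            "f": f,
            "cf": cf,
        })
    return rows


rows = grouped_frequency_table(scores, start=0, width=7, n_intervals=6)
for r in rows:
    lo, hi = r["interval"]
    print(f"{lo:>2}-{hi:<2}  midpoint={r['midpoint']:>4}  f={r['f']}  cf={r['cf']}")
```

The cf of the last interval equals n, which is a quick way to confirm that every score landed in exactly one interval.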

Grouped frequency distribution table for aggression scores

Tables • Percentage of a category, a score value or a class interval • Indicates which part of the whole sample of scores it represents (for example, "this part represents 10% of the whole sample of scores") • Divide the frequency by the total number of cases (n) and then multiply it by 100 • Percentage = f / n x 100 • In the next example the class interval 21-27 has the highest percentage (40%)
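
A short sketch of the percentage calculation, f / n x 100, together with the cumulative percentage. The frequencies are hypothetical; they were only chosen so that the interval 21-27 comes out at 40%, as in the example mentioned above.

```python
# Hypothetical frequencies per class interval for n = 20 students, chosen so
# that 21-27 holds 40% of the cases. These are NOT the textbook's figures.
frequencies = [("0-6", 1), ("7-13", 2), ("14-20", 4),
               ("21-27", 8), ("28-34", 3), ("35-41", 2)]


def percentage_table(frequencies):
    """Return rows of (interval, f, percentage, cumulative percentage)."""
    n = sum(f for _, f in frequencies)
    rows = []
    cf = 0
    for label, f in frequencies:
        cf += f
        pct = round(f / n * 100, 1)        # f / n x 100
        cum_pct = round(cf / n * 100, 1)   # same formula applied to cf
        rows.append((label, f, pct, cum_pct))
    return rows


rows = percentage_table(frequencies)
for label, f, pct, cum_pct in rows:
    print(f"{label:>5}  f={f}  {pct:>5}%  cumulative {cum_pct:>5}%")
```

The cumulative percentage of the last interval should always be 100.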

Distribution of percentage and cumulative percentages for aggression scores

Graphic representation of frequency distributions • Bar Charts – frequency distribution of categorical data • Histogram – frequency distribution of successive scores or class intervals • Frequency Polygon – frequency distribution of class intervals that are connected by straight lines

Bar chart • A graph representing the frequency distribution of categorical data • Used if the data is measured on a nominal level, where measurement is in the form of categories • X axis – categories • Y axis – frequency • Figure 8.2 Bar chart for gender (n=20 students) p222

Histograms • A graph representing the frequency distribution of successive scores or class intervals • The scores or the midpoints of the class intervals are marked on the X axis and a bar is drawn above each of them • The height of the bar, as measured on the Y axis, corresponds with the frequency (the number of cases) for that particular score or in that particular class interval

Histograms • The bars represent successive scores or class intervals and there are no spaces between the bars • Used to illustrate the frequency distribution of numerical data, that is, data measured on an interval or ratio level of measurement • If we add up the frequencies represented by all the bars, this will give us the total number of cases in our sample • Figure 8.3 Histogram for aggression scores (n=20 students)
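
As a rough illustration only, a histogram can be approximated in plain text. In this sketch the axes are swapped relative to the slide (class intervals run down the page and bar length shows the frequency), and the frequencies are hypothetical rather than those behind Figure 8.3.

```python
# Hypothetical frequencies per class interval (n = 20); NOT the data of Figure 8.3.
frequencies = [("0-6", 1), ("7-13", 2), ("14-20", 4),
               ("21-27", 8), ("28-34", 3), ("35-41", 2)]


def text_histogram(frequencies):
    """Return one text line per class interval; bar length equals the frequency."""
    lines = []
    for label, f in frequencies:
        lines.append(f"{label:>5} | {'#' * f} ({f})")
    return lines


lines = text_histogram(frequencies)
print("\n".join(lines))

# Adding up the frequencies of all bars gives the number of cases in the sample.
total_cases = sum(f for _, f in frequencies)
print("n =", total_cases)
```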

Frequency polygon • A graph in which the frequencies of class intervals are connected by straight lines • In a histogram we assume that all cases within a class interval are uniformly distributed over the range of the interval • In a polygon we assume that the cases are concentrated at the midpoint of the interval • A polygon can accommodate more class intervals than a histogram

Frequency polygon • Smoothed polygons (the midpoints are linked by curved lines) are frequently used to display the distribution of scores for large data sets or populations • Figure 8.4 Frequency polygon for aggression scores (n=20 students)

Skewness and kurtosis • Distributions of data differ in terms of: 1. Central location – the middle point of the distribution 2. Variation – the spread of the scores around the middle point 3. Skewness – symmetry or asymmetry of the distribution 4. Kurtosis – flatness or peakedness of the distribution

Skewness • Skewness refers to the symmetry or asymmetry of the distribution around the midpoint • Figure 8.5 Frequency distributions differing in skewness • Symmetrical distributions have the same shape on both sides of the midpoint • Positively skewed – asymmetrical, with the larger frequencies concentrated towards the low end • Negatively skewed – asymmetrical, with the larger frequencies concentrated towards the high end • Normal – larger frequencies are concentrated towards the middle

Kurtosis • Figure 8.6 Frequency distributions differing in kurtosis • Kurtosis refers to the flatness or peakedness of the distribution • Normal distribution – symmetrical bell-shaped distribution – mesokurtic • Leptokurtic – a more peaked distribution • Platykurtic – a flatter distribution
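
The slides describe skewness and kurtosis visually and give no formulas. The sketch below uses the common moment-based coefficients as an assumption: skewness = m3 / m2^1.5 and excess kurtosis = m4 / m2^2 - 3, where mk is the k-th central moment. Other textbooks and software packages use slightly different (bias-corrected) versions. All data sets are hypothetical.

```python
def central_moment(scores, k):
    """k-th central moment: average of (x - mean)^k over all scores."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** k for x in scores) / n


def skewness(scores):
    """Moment-based skewness: 0 if symmetrical, > 0 positive skew, < 0 negative skew."""
    return central_moment(scores, 3) / central_moment(scores, 2) ** 1.5


def excess_kurtosis(scores):
    """Moment-based kurtosis minus 3: about 0 mesokurtic, > 0 leptokurtic, < 0 platykurtic."""
    return central_moment(scores, 4) / central_moment(scores, 2) ** 2 - 3


symmetrical = [1, 2, 3, 4, 5]
low_heavy = [1, 1, 1, 2, 2, 3, 6]    # most cases at the low end: positive skew
high_heavy = [6, 6, 6, 5, 5, 4, 1]   # most cases at the high end: negative skew

print("symmetrical skewness:", skewness(symmetrical))
print("low-heavy skewness:  ", skewness(low_heavy))
print("high-heavy skewness: ", skewness(high_heavy))
print("symmetrical excess kurtosis:", excess_kurtosis(symmetrical))
```

The evenly spread set 1 to 5 has a negative excess kurtosis, which matches the "flatter" (platykurtic) description.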

Measures of central tendency • Measure of central tendency – a score which represents all the scores in the sample • Three measures of central tendency • Mode • Median • Mean

Mode • The mode is the score value with the highest frequency, e.g. 23 26 28 37 37 37 45 48 49 (mode = 37) • If two or more successive scores have the highest frequency, the average of those scores is taken as the mode, e.g. 23 26 37 37 38 38 45 (mode = 37.5) • If the two scores with the highest frequency are not successive, the sample has two modes – a bimodal distribution, e.g. 23 26 37 37 38 45 45 50 (modes = 37 and 45) • The mode is the only measure of central tendency that can be used for nominal data • Grouped frequency distribution: mode = the midpoint of the interval with the highest frequency
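
The three mode rules above can be sketched with the slide's own example data. One assumption is made here: "successive" is read as modal values that differ by exactly 1. `slide_mode` is an illustrative name, not a standard function.

```python
from statistics import multimode


def slide_mode(scores):
    """Apply the slide's rules: single mode, averaged successive modes, or several modes."""
    if not scores:
        raise ValueError("scores must not be empty")
    modes = sorted(multimode(scores))      # all values sharing the highest frequency
    if len(modes) == 1:
        return modes[0]
    # Assumption: "successive" means the modal values differ by exactly 1.
    if all(b - a == 1 for a, b in zip(modes, modes[1:])):
        return sum(modes) / len(modes)
    return modes                           # e.g. two modes: bimodal distribution


single = slide_mode([23, 26, 28, 37, 37, 37, 45, 48, 49])
averaged = slide_mode([23, 26, 37, 37, 38, 38, 45])
bimodal = slide_mode([23, 26, 37, 37, 38, 45, 45, 50])
print(single, averaged, bimodal)
```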

Median • To work out the median of a sample of scores, we first have to arrange the scores in ascending or descending order • The median is the score that falls right in the middle of the list: 50% of the scores fall above it and 50% fall below it, e.g. 1 2 3 4 5 (median = 3) • If there is an even number of scores, the median is the average of the two middle scores, e.g. 1 2 3 4 5 6 (median = 3.5) • The median can be used with ordinal data
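
A minimal sketch of the median procedure described above: sort the scores, then take the middle score or the average of the two middle scores.

```python
def median(scores):
    """Middle score of the ordered list; average of the two middle scores if n is even."""
    ordered = sorted(scores)               # step 1: arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                         # odd n: a single middle score
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average of the two middle scores


odd_result = median([1, 2, 3, 4, 5])
even_result = median([1, 2, 3, 4, 5, 6])
print(odd_result, even_result)
```

Python's standard library offers the same calculation as `statistics.median`.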

Mean • The mean is the sum of a sample of scores (Σx) divided by the number of scores (n) in the sample • It represents all the scores in the sample • x refers to the raw scores • n refers to the number of scores in the distribution • Σ means "add up" • Mean = Σx / n
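
The formula Mean = Σx / n translates directly into code. The scores below are simply a small illustrative sample.

```python
def mean(scores):
    """Mean = sum of the raw scores (x) divided by the number of scores (n)."""
    n = len(scores)
    return sum(scores) / n


scores = [1, 2, 3, 4, 5]
result = mean(scores)
print("sum =", sum(scores), "| n =", len(scores), "| mean =", result)
```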

Measures of variability • Variability – the degree to which scores in a sample differ, that is, how spread out they are • Measures of variability • Range • Variance • Standard deviation

Range • A measure of variability taken as the difference between the highest and lowest scores • For the scores 1 2 3 4 5 the range is 5 - 1 = 4

Variance • Variance is based on the deviation of each score in a distribution from the mean of that distribution • Deviation score – obtained by subtracting the mean of the sample from each raw score in the sample • Deviation score = raw score (x) - mean • This score indicates the extent to which each raw score deviates from the mean

Variance • The deviation scores always add up to zero, therefore we square each deviation from the mean before adding them up • For a sample, the variance is calculated by dividing the sum of the squared deviation scores by n - 1 (one less than the number of scores), which gives approximately the average of the squared deviation scores • Variance = sum of squared deviations / (n - 1)
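
A worked sketch of the variance steps for the small sample 1 2 3 4 5: compute the deviation scores, confirm they add up to zero, square them, and divide the sum by n - 1.

```python
def sample_variance(scores):
    """Sum of squared deviation scores divided by n - 1."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]        # deviation score = x - mean
    squared = [d ** 2 for d in deviations]
    return sum(squared) / (n - 1)


scores = [1, 2, 3, 4, 5]
mean = sum(scores) / len(scores)
deviations = [x - mean for x in scores]

print("deviation scores:", deviations)
print("sum of deviations:", sum(deviations))       # zero, hence the squaring
variance = sample_variance(scores)
print("variance:", variance)
```

Here the squared deviations add up to 10, so the variance is 10 / 4 = 2.5.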

Standard deviation • The standard deviation is the square root of the variance • It is expressed in the same units as the original measures • It indicates roughly the average extent to which the scores in a distribution differ from the mean
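
A brief sketch of the standard deviation as the square root of the variance, assuming the sample variance (division by n - 1). The scores are an illustrative sample.

```python
import math


def sample_standard_deviation(scores):
    """Square root of the sample variance (sum of squared deviations / (n - 1))."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
    return math.sqrt(variance)


scores = [1, 2, 3, 4, 5]
sd = sample_standard_deviation(scores)
print("standard deviation:", round(sd, 2))
```

Because of the square root, the result is back in the units of the original scores, unlike the variance, which is in squared units.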

Relationships between variables • A relationship between two variables means that a person’s position on one variable is related to his or her position on the other variable • Direct / positive relationship – high scores on one variable are associated with high scores on the other, and low scores with low scores

Relationships between variables • Inverse or negative relationship – high scores on one variable correspond with low scores on the other variable • If the variables are not related, changes on one variable do not correspond with changes on the other • The statistical relationship between variables is referred to as correlation, and the statistic used to describe it is the correlation coefficient

Correlation Coefficient • The statistic used to describe the correlation between two variables • It is an index of the extent of the linear relationship between the two variables • It can range in value from -1.00 (a perfect negative correlation) to +1.00 (a perfect positive correlation) • A value close to 0 indicates a weak relationship, while 0 means there is no linear relationship • The numerical size of a correlation coefficient indicates the strength of the relationship

Correlation Coefficient • The sign indicates the direction of the relationship • Positive correlation – an increase in one variable is associated with an increase in the other • Negative correlation – as the value of one variable increases, the value of the other one decreases • If there is a correlation, it does not necessarily mean that the one variable causes the other
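
The slides do not say which correlation coefficient is meant. The sketch below assumes the Pearson product-moment coefficient, the usual index of a linear relationship: the sum of the products of the deviation scores divided by the square root of the product of the two sums of squared deviations. All data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists of scores."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]           # deviation scores on variable x
    dy = [b - mean_y for b in y]           # deviation scores on variable y
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a ** 2 for a in dx) * sum(b ** 2 for b in dy))
    return numerator / denominator


x = [1, 2, 3, 4, 5]
positive = pearson_r(x, [2, 4, 6, 8, 10])     # high goes with high
negative = pearson_r(x, [10, 8, 6, 4, 2])     # high goes with low
u_shaped = pearson_r(x, [2, 1, 0, 1, 2])      # related, but not linearly

print(positive, negative, round(u_shaped, 2))
```

The third case returns a coefficient of about 0 even though y clearly depends on x, which is why a value of 0 only rules out a linear relationship.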
