Data Analysis


Introduction • In this presentation we will examine: • Procedures for analysing data; • The value of plotting data; • Primary methods for analysing data: • regression; • correlation; • variance; • factor analysis.

Data analysis • All too often ‘enthusiastic’ researchers jump into the most complex statistical analyses, only to emerge some while later rather bemused. • Such an approach is neither rigorous nor wise. • Data analysis should be undertaken in stages.

Examining raw data • Whatever data is collected, the first stage in any analysis is to examine the raw data and search for patterns. • Patterns may: • be expected from theory/literature; • emerge from observation. • Pattern searching must be undertaken with an open mind.

Examining raw data • Pattern searching can be aided by the use of pictures. • The most common pictures used by researchers are data plots. • Plotting the data can help indicate the shape of the data's distribution and any relationships between variables.

Pie charts • Pie charts are a useful way of showing how the total for a given variable is divided among the various subgroups being investigated.
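The slice sizes in a pie chart are simply each subgroup's percentage share of the total. A minimal sketch of that calculation follows; the function name `subgroup_shares` and the subgroup labels are hypothetical, chosen only for illustration.

```python
from collections import Counter

def subgroup_shares(labels):
    """Return each subgroup's percentage share of all observations.

    These percentages are the quantities a pie chart displays as slice sizes.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: 100.0 * n / total for group, n in counts.items()}

# Hypothetical survey responses: one subgroup label per respondent.
responses = ["university", "industry", "industry", "government", "industry"]
shares = subgroup_shares(responses)
print(shares)
```

If matplotlib is available, `plt.pie(list(shares.values()), labels=list(shares))` would draw the corresponding chart.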

Bar charts • Bar charts are a simple way of viewing the range of values recorded for a variable. • Bar charts can also indicate the shape of the distribution.
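A bar chart of this kind is built from a frequency table: how many times each value was recorded. The sketch below, using hypothetical 1-5 ratings and illustrative function names, builds the table and renders it as a plain-text bar chart so the shape of the distribution is visible without any plotting library.

```python
from collections import Counter

def frequency_table(values):
    """Count how often each distinct value occurs, in ascending order of value."""
    return sorted(Counter(values).items())

def text_bar_chart(table):
    """Render a frequency table as a simple horizontal text bar chart."""
    return "\n".join(f"{value:>3} | {'#' * count}" for value, count in table)

# Hypothetical ratings recorded on a 1-5 scale.
ratings = [3, 4, 2, 3, 5, 3, 4, 1, 3, 4]
table = frequency_table(ratings)
print(text_bar_chart(table))
```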

Bar charts • Bar charts can be used to contrast the values for a given variable obtained from two or more subgroups. • Universities appear to be different from the other groups.

Bar charts • Bar charts can be used to investigate the overall distribution of a variable and its subgroup breakdown.

Scatter plots • Scatter plots can be used to examine possible relationships between variables. • In this example one might indeed suspect a relationship exists.

Plotting data • In summary, plotting data can be useful in identifying possible trends in data and suggesting possible relationships between variables. • If relationships are suggested, it is usual to support the graphical representation with statistics.

Statistics • However, statistics is a highly complex subject and the use of statistical analyses should not be taken lightly. • In the following slides we will introduce some of the more common statistical techniques that can be used to analyse research data.

Correlation & Regression • Correlation and regression are used to test possible relationships between two (or more) variables. • Correlation is used to establish an association between variables. • Regression is used to express the association in mathematical terms.

Correlation & Regression • Neither correlation nor regression can establish causality. • Only theory, evidence and logical reasoning, used in conjunction with statistics, can establish causality.

Correlation • Consider the general scatter plot shown opposite. • Conventionally the independent variable is plotted on the x-axis and the dependent variable on the y-axis.

Correlation • From inspection it would appear that a relationship may exist between the two variables. • The correlation coefficient measures the strength and direction of a linear relationship.

Correlation • The value of the correlation coefficient ranges from +1 to -1. • A correlation coefficient of +1 implies a perfect positive linear relationship: • any increase in the variable x is matched by a proportional increase in the variable y (the points lie exactly on a rising straight line).

Correlation • A correlation coefficient of zero implies that there is no linear relationship between the two variables; a strong non-linear relationship may still exist. • A correlation coefficient of -1 implies a perfect negative linear relationship: • any increase in x is matched by a proportional decrease in the variable y (the points lie exactly on a falling straight line).
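The correlation coefficient described on these slides is usually Pearson's product-moment coefficient. A minimal sketch, assuming paired numeric observations, computes it from deviations about the means and checks the three reference cases: points on a rising line, points on a falling line, and a perfect but non-linear (U-shaped) relationship that nevertheless gives r = 0.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_rising = pearson_r(x, [2 * xi + 1 for xi in x])        # points on a rising line
r_falling = pearson_r(x, [10 - 3 * xi for xi in x])      # points on a falling line
r_u_shape = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x**2, no linear trend
print(r_rising, r_falling, r_u_shape)
```

The third case shows why a coefficient near zero only rules out a linear relationship, and why plotting the data first matters.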

Correlation • The square of the correlation coefficient is known as the coefficient of determination; it measures the proportion of the variation in the dependent variable that can be accounted for by a linear relationship with the independent variable.

Correlation • For example, a correlation coefficient of 0.9 gives a coefficient of determination of 0.81. • This in turn implies that 81% of the observed variation in the dependent variable can be explained by variation in the independent variable.
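The arithmetic on this slide generalises directly. The short sketch below (function name illustrative) squares several correlation coefficients to show how quickly the explained proportion falls: a coefficient of 0.5 accounts for only a quarter of the variation, and the sign of r makes no difference.

```python
def coefficient_of_determination(r):
    """Return r squared: the proportion of variation in y accounted for by x."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    return r ** 2

for r in (0.9, 0.7, 0.5, -0.9):
    r2 = coefficient_of_determination(r)
    print(f"r = {r:+.1f}  ->  r^2 = {r2:.2f} ({r2:.0%} of variation explained)")
```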

Correlation • Finally, the confidence that can be placed in a correlation coefficient depends upon the number of observations used to calculate it, and can be assessed by comparing the calculated value with the critical values given in standard statistical tables.
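One common way to make this comparison, assuming the usual test of whether a Pearson coefficient differs from zero, is to convert r into a t statistic with n - 2 degrees of freedom and compare it with tabulated critical values. The sketch below hard-codes the standard two-tailed 5% values of Student's t for 1 to 10 degrees of freedom; the function names are illustrative. The same r = 0.6 is not significant with 5 observations but is with 12.

```python
import math

# Two-tailed 5% critical values of Student's t by degrees of freedom,
# as given in standard statistical tables (df = n - 2 for a correlation).
T_CRITICAL_5PC = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
                  6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def t_statistic(r, n):
    """t statistic for testing whether a Pearson r differs from zero."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))

def is_significant(r, n):
    """True if r differs from zero at the 5% level (two-tailed)."""
    df = n - 2
    if df not in T_CRITICAL_5PC:
        raise KeyError(f"no tabulated critical value for df = {df}")
    return abs(t_statistic(r, n)) > T_CRITICAL_5PC[df]

for n in (5, 12):
    print(n, round(t_statistic(0.6, n), 3), is_significant(0.6, n))
```

This test assumes the two variables are approximately bivariate normal; for larger samples, consult a fuller table or a statistics library.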

Regression • Regression analysis attempts to develop a mathematical equation that describes the relationship between two (or more) variables. • Consider the scatter plot presented previously.

Regression • The relationship between the two variables could be modelled by a straight line: y = ax + b.


Regression • The equation of the ‘best’ straight line is established by minimising the mismatch between the predicted values and the data, measured as the sum of the squared differences. • This is achieved by the method of least squares.
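For the straight line y = ax + b, the least-squares slope and intercept have a closed form in terms of deviations from the means. A minimal sketch with hypothetical data follows; the helper `sse` is included only to confirm that nudging the slope away from the fitted value increases the sum of squared residuals.

```python
def least_squares_line(x, y):
    """Fit y = a*x + b by ordinary least squares; return (a, b)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx            # slope minimising the sum of squared residuals
    b = mean_y - a * mean_x  # the fitted line passes through (mean_x, mean_y)
    return a, b

def sse(x, y, a, b):
    """Sum of squared differences between observed and predicted y."""
    return sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))

# Hypothetical observations.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares_line(x, y)
print(f"y = {a:.3f}x + {b:.3f}")
```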

Regression • Unfortunately, not all relationships can be modelled using a straight line. • If you suspect a relationship should exist between two variables BUT a straight line doesn’t ‘look right’, then you can examine the possibility that the relationship may be non-linear.

Regression • When examining a possible non-linear relationship you have to assume the basic form of the relationship, for example: • logarithmic ($y = a + b\ln x$); • exponential ($y = ab^{x}$, or equivalently $y = ae^{kx}$); • polynomial ($y = a + bx + cx^{2} + \dots$). • There should always be a logical reason for the form you choose.
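Some non-linear forms can be fitted with the same least-squares machinery after a transformation. For a relationship of the form y = a·b^x, taking logarithms gives ln y = ln a + x ln b, which is a straight line in x. The sketch below assumes all y values are positive and uses data generated exactly from y = 2·3^x; note that it minimises squared error in ln y, not in y itself, which is a design choice rather than the only option.

```python
import math

def fit_exponential(x, y):
    """Fit y = a * b**x via least squares on ln(y) = ln(a) + x*ln(b); return (a, b)."""
    if any(yi <= 0 for yi in y):
        raise ValueError("all y values must be positive to take logarithms")
    ly = [math.log(yi) for yi in y]
    n = len(x)
    mean_x = sum(x) / n
    mean_ly = sum(ly) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (li - mean_ly) for xi, li in zip(x, ly))
    slope = sxy / sxx                      # estimate of ln(b)
    intercept = mean_ly - slope * mean_x   # estimate of ln(a)
    return math.exp(intercept), math.exp(slope)

x = [0, 1, 2, 3, 4]
y = [2 * 3 ** xi for xi in x]  # data generated exactly from y = 2 * 3**x
a, b = fit_exponential(x, y)
print(f"y = {a:.3f} * {b:.3f}**x")
```

For the polynomial form, `numpy.polyfit(x, y, 2)` performs the corresponding least-squares fit for a quadratic.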

Correlation & Regression • One final word of warning: • Not all data lends itself to advanced statistical analysis. • The choice of statistical technique depends upon the type of data being analysed. • Inappropriate use of statistics is worse than no use at all.

Appropriate statistics • For nominal data scales: • number of cases; • mode; • contingency correlation. • For ordinal data scales: • median; • percentiles; • rank-order correlation.

Appropriate statistics • For interval scales: • mean; • standard deviation; • product-moment correlation. • For ratio scales: • coefficient of variation.

Nominal scales • The number of cases is a simple count of the number of times a variable takes a given value. • The mode is the most frequently recorded value of the variable. • Contingency correlation uses the chi-square statistic to measure the association between variables.
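A minimal sketch of the three nominal-scale statistics, assuming hypothetical category labels and a 2×2 table of counts. The contingency measure used here is Pearson's contingency coefficient, C = √(χ² / (χ² + N)), which is one standard way of turning the chi-square statistic into a correlation-like value; the function names are illustrative.

```python
import math
from collections import Counter

def mode(values):
    """Most frequent value (the first one encountered if there is a tie)."""
    return Counter(values).most_common(1)[0][0]

def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected if independent
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def contingency_coefficient(table):
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n))."""
    chi2 = chi_square(table)
    n = sum(sum(row) for row in table)
    return math.sqrt(chi2 / (chi2 + n))

# Hypothetical nominal data.
sectors = ["industry", "university", "industry", "government"]
number_of_cases = Counter(sectors)
associated = [[20, 10], [10, 20]]
independent = [[10, 20], [20, 40]]
print(number_of_cases["industry"], mode(sectors))
print(chi_square(associated), contingency_coefficient(associated))
```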

Ordinal scales • The median is a measure of central tendency: the middle value when the observations are ranked in order. • A percentile is the value below which a given percentage of the observations lie (for example, 25% of observations fall below the 25th percentile). • Rank-order (Spearman) correlation can be used to measure relationships between variables.
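The ordinal-scale statistics rely only on the order of the values. The sketch below uses `statistics.median` from the Python standard library, a percentile computed by linear interpolation between ranked values (one of several conventions in use, so other tools may give slightly different answers), and Spearman's coefficient computed as the Pearson correlation of the ranks, with tied values given their average rank. The data and function names are hypothetical.

```python
import statistics

def percentile(values, p):
    """p-th percentile (0-100) by linear interpolation between ranked values."""
    if not values or not 0 <= p <= 100:
        raise ValueError("need non-empty data and 0 <= p <= 100")
    data = sorted(values)
    pos = (len(data) - 1) * p / 100
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (data[upper] - data[lower]) * (pos - lower)

def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j share the mean of ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman rank-order correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5

scores = [2, 5, 3, 4, 1, 4]  # hypothetical ordinal ratings
print(statistics.median(scores), percentile(scores, 25))
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))  # monotonic but not linear
```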

Interval scales • The mean is the arithmetic average value of a given variable. • The standard deviation measures the dispersion of the variable around the mean. • Product-moment (Pearson) correlation can be used to measure relationships between variables.
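The mean and standard deviation are straightforward to compute by hand, and the sketch below checks a hand-written version against Python's `statistics` module. Which divisor to use is an assumption the slide does not settle: the sample standard deviation divides by n - 1 (`statistics.stdev`), while the population form divides by n (`statistics.pstdev`). The data are hypothetical.

```python
import math
import statistics

def mean(values):
    """Arithmetic mean."""
    return sum(values) / len(values)

def sample_std_dev(values):
    """Sample standard deviation: root of the squared deviations summed, divided by n - 1."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical interval-scale measurements
print(mean(data), sample_std_dev(data), statistics.pstdev(data))
```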

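Ratio scales • All of the techniques above can be used with ratio scales of measurement. • Because a ratio scale has a true zero, it also supports the coefficient of variation (the standard deviation divided by the mean).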
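The need for a true zero can be demonstrated directly. In the sketch below (hypothetical data, illustrative function name), lengths in metres and in centimetres give the same coefficient of variation, because both scales share the same zero. Temperatures in Celsius and in Kelvin do not, because Celsius is an interval scale whose zero is arbitrary.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (meaningful only on ratio scales)."""
    m = statistics.mean(values)
    if m == 0:
        raise ValueError("mean is zero; the coefficient of variation is undefined")
    return statistics.stdev(values) / m

lengths_m = [1.0, 2.0, 3.0]
lengths_cm = [100.0, 200.0, 300.0]
temps_c = [10.0, 20.0, 30.0]
temps_k = [283.15, 293.15, 303.15]  # the same temperatures in Kelvin
print(coefficient_of_variation(lengths_m), coefficient_of_variation(lengths_cm))
print(coefficient_of_variation(temps_c), coefficient_of_variation(temps_k))
```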

Appropriate statistics • In more complex multi-variable analyses another way of assessing the appropriateness of a statistical technique is to examine the nature of the measurement.

Appropriate statistics • [Figure: scales of measurement (interval, ordinal, nominal) and the nature of the measurement (continuous, discrete).]

Multi-variable methods • If more complex statistical techniques are needed then the following restrictions should be considered.

Multi-variable methods

Multi-variable methods • Factor analysis is a technique for identifying a small number of underlying factors that account for the correlations among a larger set of observed variables, so that related variables can be combined into distinguishable factors.
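The idea of combining correlated variables into factors can be sketched with NumPy. The example below uses principal-component extraction from the correlation matrix, retaining components whose eigenvalue exceeds 1 (the Kaiser criterion). This is a simplified stand-in: dedicated factor-analysis methods such as principal-axis or maximum-likelihood factoring also estimate each variable's unique variance. The data are simulated from two hypothetical latent factors, and the function name is illustrative.

```python
import numpy as np

def principal_factors(data, eigen_threshold=1.0):
    """Extract factors from the correlation matrix by principal-component extraction.

    data: array of shape (n_observations, n_variables).
    Returns (retained eigenvalues, loadings of shape (n_variables, k)).
    """
    corr = np.corrcoef(data, rowvar=False)            # variables are columns
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # ascending for symmetric matrices
    order = np.argsort(eigenvalues)[::-1]             # largest first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    keep = eigenvalues > eigen_threshold              # Kaiser criterion
    loadings = eigenvectors[:, keep] * np.sqrt(eigenvalues[keep])
    return eigenvalues[keep], loadings

# Simulated data: six observed variables driven by two hypothetical latent factors.
rng = np.random.default_rng(0)
n = 200
f1, f2 = rng.standard_normal(n), rng.standard_normal(n)
noise = 0.3 * rng.standard_normal((n, 6))
data = np.column_stack([f1, f1, f1, f2, f2, f2]) + noise
eigvals, loadings = principal_factors(data)
print(len(eigvals))
print(np.round(loadings, 2))
```

In practice the loadings are usually rotated (for example by varimax) before interpretation, because unrotated components can mix the underlying factors.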

Summary • In this presentation we have examined ways of analysing data: • plotting the data; • the use of statistics; • scales of measurement. • Next week: Results, Inferences & Conclusions.
