Quantitative data analysis

Module in research methods course for tourism program

Reza Mortazavi

2014

Lecture 4


Relationship between variables

  • Correlation

    • When two variables are linearly related (or covary) we say they are correlated either positively or negatively.

  • Correlation is not causation!

  • Open the file PLdata.dta

  • sum incd1000 age

  • scatter incd1000 age, yline(121.6) xline(22.9)


Scatter plot


Correlation

  • Interpret the scatterplot.

  • twoway scatter incd1000 age,by(female)

  • Correlation coefficient

    • Measures the strength of linear association between two variables. It always lies in the interval [-1, 1].

  • pwcorr incd1000 age,sig

    • H0:no correlation.

  • pwcorr incd1000 female,sig
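As a rough illustration of what the correlation coefficient reported by pwcorr measures, here is a minimal Python sketch of the Pearson formula. The helper name `pearson_r` and the toy numbers are assumptions made for illustration; they are not taken from PLdata.dta.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy values (age in years, income in thousands).
age = [20, 25, 30, 35, 40, 45]
income = [90, 110, 105, 140, 150, 170]

r = pearson_r(age, income)
print(round(r, 3))  # a value between -1 and 1; positive here
```

The sign of r gives the direction of the linear association and its absolute value gives the strength.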


Correlation

  • pwcorr incd1000 education,sig

  • pwcorr incd1000 totexpday age,sig star(0.05)

    Interpret the output!

  • gen neg5incd=-5*incd1000

  • What do we expect in terms of correlation between them?

  • scatter neg5incd incd1000

  • pwcorr neg5incd incd1000,sig


Correlation

  • gen x=rnormal()

  • gen y=rnormal()

  • What do we expect? (two independent variables have been drawn randomly…)

  • pwcorr x y,sig


Correlation

  • Zero correlation does not mean independence

    • gen seq = int((_n-_N/2))

    • gen seqsq=seq^2

    • scatter seqsq seq

    • pwcorr seqsq seq


Caution

  • Correlation does not imply causation.

  • Statistical significance is not the same as practical significance.

    • Use common sense when interpreting and drawing conclusions.

  • Correlation is about linear association

    • Use scatterplot to discover possible nonlinear association.


Some details

  • The significance test for the correlation coefficient assumes normally distributed data.

  • The correlation coefficient is sensitive to outliers (extreme values).

  • Sometimes a transformation (e.g. logarithmic) makes non-normally distributed data approximately normal.

  • Non-normal data may be converted into ordinal (ranked) data, and a non-parametric test, Spearman’s rank correlation, may be used.


Regression analysis

  • Note that the purpose is not to go into all details regarding regression analysis. Even though there are a couple of slides with some algebraic expressions the exposition is not intended to be technical.

  • The purpose is, however, to cover the basics so that you can run your own regression analysis using software and present, interpret and discuss results.


Purpose of Regression Analysis

  1. Estimate a relationship among some variables, such as y = f(x). Here y is the dependent and x is the independent variable.

    For example, food consumption or tourism demand depends on income.

  2. Forecast or predict the value of one variable, y, based on the value of another variable, x.


Terminology

  • Y is called the dependent variable, response variable, explained variable, output variable or regressand.

  • The X’s are called independent variables, predictor variables, explanatory variables, input variables or regressors.

  • A model is an abstraction from reality. It is a simplified representation focusing on some features while ignoring details.


Weekly food expenditure

y = dollars spent each week on food items.

x = consumer’s weekly income.

The relationship between x and the expected value of y, given x, might be linear: E(y|x) = β1 + β2 x


[Figure: probability distributions f(y|x) of food expenditure y given income x = $480 and x = $800, with conditional means μy|x=480 and μy|x=800.]


[Figure: average expenditure E(y|x) plotted against income x, showing the line E(y|x) = β1 + β2 x with intercept β1 and slope β2 = ΔE(y|x)/Δx. Caption: a linear relationship between average expenditure on food and income.]


The population parameters β1 and β2 are unknown population constants.

The formulas that produce the sample estimates b1 and b2 are called the estimators of β1 and β2.

When b1 and b2 are used to represent the formulas rather than specific values, they are called estimators of β1 and β2, which are random variables because they are different from sample to sample.
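The difference between fixed population parameters and sample-dependent estimates can be sketched with a small simulation. Everything below is hypothetical: the "true" values, the noise level, the sample size and the helper names `ols` and `draw_sample` are assumptions. The least-squares formulas used are the standard ones for simple regression.

```python
import random

def ols(x, y):
    """Least-squares intercept b1 and slope b2 for the line y = b1 + b2*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b2 = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
          / sum((a - mean_x) ** 2 for a in x))
    b1 = mean_y - b2 * mean_x
    return b1, b2

random.seed(1)              # arbitrary seed for repeatability
beta1, beta2 = 40.0, 0.13   # hypothetical population parameters

def draw_sample(n=40):
    """Draw one sample of weekly income x and food expenditure y."""
    x = [random.uniform(200, 1000) for _ in range(n)]
    y = [beta1 + beta2 * xi + random.gauss(0, 30) for xi in x]
    return x, y

# Same population, three different samples, three different estimates.
estimates = [ols(*draw_sample()) for _ in range(3)]
for b1, b2 in estimates:
    print(round(b1, 2), round(b2, 4))
```

Each run of the estimator gives a different pair (b1, b2), while β1 and β2 stay fixed. That is the sense in which the estimators are random variables.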


Simple regression: an example

  • twoway (scatter totexpday incd1000) (lfit totexpday incd1000)

  • regress totexpday incd1000

  • What is the “intercept” here? What does it mean?

  • What is the “slope” here? What does it mean?

  • Interpret your estimated model!


Simple regression: an example

  • In interpreting the results you have to be careful about the units of measurement.

  • regress totexpday inccont

  • What is the “intercept” here? What does it mean?

  • What is the “slope” here? What does it mean? Compare with the previous model.


Reading computer output


Simple regression: an example

  • Hypothesis tests:

    • Is income (statistically) significantly related to visitors’ expenditures?

      • The output table gives us several ways to answer this question.

  • How good is our model?

    • R-squared

      • R-squared = 0.0575 in our example. How can we interpret this number?


Simple regression: an example

  • Can we predict totexpday for, say, an average person earning 200,000 SEK per year?

    With income entered in thousands of SEK (200): 411.123 + 1.03526*200 ≈ 618.18

    This is a point estimate (prediction). We can also calculate, say, a 95% confidence (prediction) interval.

    95% PI: (570.1205, 666.2293)
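The arithmetic behind the point prediction can be checked in a few lines. The coefficients and interval limits are the ones quoted in this lecture; the variable names are chosen for illustration.

```python
# Estimated coefficients quoted in the lecture.
intercept = 411.123
slope = 1.03526

# Income is entered in thousands of SEK, so 200,000 SEK becomes 200.
income_thousands = 200_000 / 1000

prediction = intercept + slope * income_thousands
print(round(prediction, 3))

# The reported 95% interval is centred on the point prediction.
lower, upper = 570.1205, 666.2293
midpoint = (lower + upper) / 2
print(round(midpoint, 3))
```

Forgetting the change of units and plugging in 200000 instead of 200 would give a prediction that is far too large, which is why the units of the regressor need to be checked first.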


Exercises on simple regression

  • regr incd1000 age

  • regr incd1000 education

    Interpret the results!

