Fundamentals of Data Analysis

Four Types of Data
• Alphabetical / categorical / nominal data:
  • Information falls only into certain categories, never in between categories
  • No inferences are possible between groups, except that one group may contain more or fewer observations than another
  • Descriptive statistics: only frequencies, percentages and the mode make sense
  • Inferential statistics: the chi-square test of association
  • Examples: gender, age groups, income groups, etc.
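The descriptive statistics allowed for nominal data can be computed with Python's standard library. The sketch below reports frequencies, percentages and the mode. The `describe_categorical` helper and the gender responses are hypothetical illustrations and do not come from the deck.

```python
from collections import Counter

def describe_categorical(values):
    """Frequencies, percentages and mode(s) of a nominal variable."""
    counts = Counter(values)
    n = len(values)
    percentages = {category: 100 * c / n for category, c in counts.items()}
    top = max(counts.values())
    # Several categories can tie for the highest frequency, so return a list
    modes = [category for category, c in counts.items() if c == top]
    return counts, percentages, modes

# Hypothetical responses to a gender question
gender = ["F", "M", "F", "F", "M", "F", "M", "F"]
counts, percentages, modes = describe_categorical(gender)
print(dict(counts))   # {'F': 5, 'M': 3}
print(percentages)    # {'F': 62.5, 'M': 37.5}
print(modes)          # ['F']
```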

Four Types of Data
• Rank-order (ordinal) data:
  • Ranked according to some logic, e.g. preference
  • Again, an in-between rank does not make sense
  • The difference between ranks 1 and 2 need not be of the same magnitude as the difference between ranks 3 and 4
  • Descriptive statistics: frequencies, percentages and the mode; inferential statistics: Spearman's rho (rank correlation coefficient)
  • Examples: brand preferences, class rank on a test, etc.

Four Types of Data
• Interval-level data:
  • Numerical data in which the numbers denote the amount of a trait that is present
  • The zero point is arbitrary; zero does not necessarily mean complete absence of the trait
  • In-between numbers make sense
  • The magnitude of the difference between adjacent numbers on the scale is constant
  • All descriptive and inferential statistics are possible
  • Examples: attitude, satisfaction, temperature (°C or °F), etc.

Four Types of Data
• Ratio-level data:
  • Interval-level data with a meaningful zero point, i.e. zero means complete absence of the trait
  • The magnitude of the difference between numbers on the scale is constant AND the zero point denotes complete absence of the trait being measured
  • All descriptive and inferential statistics are possible
  • Examples: sales, profits, weight, height, etc.
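The practical difference between the interval and ratio levels appears as soon as ratios are taken. The following is a small sketch that treats Celsius and Fahrenheit as interval scales and Kelvin (whose absolute zero is a true zero point) as a ratio scale. The conversion helpers are illustrative.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

def celsius_to_kelvin(c):
    return c + 273.15

warm, cool = 20.0, 10.0  # two temperatures in degrees Celsius

# Interval scales: the "ratio" changes with the arbitrary zero point,
# so 20 °C is not "twice as hot" as 10 °C
print(warm / cool)                                                # 2.0
print(celsius_to_fahrenheit(warm) / celsius_to_fahrenheit(cool))  # 1.36

# Differences, by contrast, carry over consistently (10 °C = 18 °F)
print(celsius_to_fahrenheit(warm) - celsius_to_fahrenheit(cool))  # 18.0

# Kelvin starts at absolute zero, so its ratio is meaningful
print(round(celsius_to_kelvin(warm) / celsius_to_kelvin(cool), 3))  # 1.035
```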

Type of data?

Preparing the Data for Analysis
• Data editing – the process of identifying omissions, ambiguities and errors in the responses
• Coding – the process of assigning numerical values to responses according to a predefined system
• Statistically adjusting the data – the process of modifying the data to enhance its quality for analysis
  • Weighting, scale transformations, variable re-specification

Preparing the Data for Analysis: Problems Identified With Data Editing
• Omissions – some questions left unanswered
• Ambiguity – an illegible response, or two boxes checked when only one should be
• Inconsistencies – logically inconsistent responses
• Lack of cooperation – checking the same response regardless of the question
• Ineligible respondent – ignoring a filter question (answering items the respondent should have been screened out of)
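Several of these problems can be flagged mechanically before anyone reads the questionnaires. The sketch makes three assumptions: respondent IDs are hypothetical, every item uses a 1-5 rating scale, and `None` marks an unanswered item. It flags three problems: omissions, out-of-range codes, and straight-lining (the same answer to every item, which is one symptom of lack of cooperation). A flag only marks a response for review. It does not prove that the response is bad.

```python
def edit_checks(responses, low=1, high=5):
    """Flag common data-editing problems for each respondent.

    responses maps a respondent ID to a list of answers; None marks an omission.
    """
    problems = {}
    for rid, answers in responses.items():
        flags = []
        if any(a is None for a in answers):
            flags.append("omission")
        answered = [a for a in answers if a is not None]
        # Identical answers to every item may signal lack of cooperation
        if len(answered) > 1 and len(set(answered)) == 1:
            flags.append("straight-lining")
        # Codes outside the scale point to a recording or coding error
        if any(not low <= a <= high for a in answered):
            flags.append("out of range")
        if flags:
            problems[rid] = flags
    return problems

# Hypothetical answers to four 1-5 rating items
survey = {
    "r1": [4, 2, 5, 3],
    "r2": [3, None, 4, 2],
    "r3": [5, 5, 5, 5],
    "r4": [2, 7, 3, 1],
}
print(edit_checks(survey))
# {'r2': ['omission'], 'r3': ['straight-lining'], 'r4': ['out of range']}
```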

Preparing the Data for Analysis
• Solutions to such problems:
  • Contact the respondent again and make corrections
  • Discard the whole questionnaire as unusable
  • Disregard questions with missing values in the analysis
  • Code illegible or missing responses as ‘don’t know’
  • Impute missing values on the basis of means
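The last option is to fill each missing value with the mean. The following sketch assumes that `None` marks a missing answer and fills it with the mean of the answered values for that question. Mean imputation shrinks the variable's variance, so it is best reserved for a small number of gaps.

```python
from statistics import mean

def impute_mean(values):
    """Replace missing answers (None) with the mean of the answered values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical satisfaction ratings with two missing answers
satisfaction = [4, None, 5, 3, None, 3]
print(impute_mean(satisfaction))  # [4, 3.75, 5, 3, 3.75, 3]
```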

Preparing the Data for Analysis: Coding
• Closed-ended questions
  • Relatively simple and straightforward
• Open-ended questions
  • Define all possible responses, categorize each response, and then assign a numerical code
  • If judgment calls are needed, have several coders do the same task and check inter-coder reliability
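The slide does not say how to measure inter-coder reliability. Two common choices are simple percent agreement and Cohen's kappa, which corrects agreement for chance. The sketch computes both for two coders. It assumes a hypothetical codebook where 1 = price, 2 = quality and 3 = service.

```python
from collections import Counter

def percent_agreement(coder_a, coder_b):
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)

def cohens_kappa(coder_a, coder_b):
    """Agreement between two coders, corrected for agreement expected by chance."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: both coders independently pick the same category
    expected = sum(freq_a[c] * freq_b[c] for c in set(coder_a) | set(coder_b)) / n ** 2
    # Undefined (division by zero) if both coders use one identical category throughout
    return (observed - expected) / (1 - expected)

# Codes assigned by two coders to the same ten open-ended answers
# (hypothetical codebook: 1 = price, 2 = quality, 3 = service)
coder_a = [1, 1, 2, 2, 3, 3, 1, 2, 3, 1]
coder_b = [1, 1, 2, 3, 3, 3, 1, 2, 2, 1]
print(percent_agreement(coder_a, coder_b))       # 0.8
print(round(cohens_kappa(coder_a, coder_b), 3))  # 0.697
```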

Statistical Adjustment of Data
• Weighting
  • The process of increasing or reducing the importance of certain data by assigning each observation a number (a weight)
  • Usually done to increase the representativeness of the sample or to achieve study objectives
  • E.g. a sports drink survey might weight younger respondents more heavily than older respondents
• Scale transformations
  • Manipulation of scales to make them comparable with other scales, e.g. converting pounds to kilograms
  • Z-scores (standardized scales)
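The following sketches the three adjustments named above under several assumptions:
• Weights use simple cell weighting (target share divided by sample share). This is one common approach among several.
• The target shares and ratings are hypothetical.
• Z-scores use the population standard deviation.
• The pound-to-kilogram factor is the exact definition, 0.45359237.

```python
from statistics import mean, pstdev

LB_TO_KG = 0.45359237  # exact definition of the international avoirdupois pound

def cell_weights(sample_share, target_share):
    """Simple cell weighting: weight = target share / sample share for each group."""
    return {g: target_share[g] / sample_share[g] for g in sample_share}

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def z_scores(values):
    """Standardize to mean 0, standard deviation 1 (population SD used here)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical sports-drink survey: young respondents are half the sample,
# but the study wants them to count for 75% of the results
weights = cell_weights({"young": 0.5, "old": 0.5}, {"young": 0.75, "old": 0.25})
groups = ["young", "young", "old", "old"]
ratings = [6, 8, 2, 4]
print(weights)  # {'young': 1.5, 'old': 0.5}
print(mean(ratings), weighted_mean(ratings, [weights[g] for g in groups]))  # 5 6.0

print(round(150 * LB_TO_KG, 2))            # 68.04 (150 lb in kg)
print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```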

Preparing the Data for Analysis
• Variable re-specification
  • Existing data are modified to create new variables
  • A large number of variables is collapsed into fewer variables
  • Creates variables that are consistent with the research questions
• Determine whether each variable is categorical, rank-order, interval-level or ratio-level
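Two common re-specification moves illustrate the idea. The first averages several related items into one summated score. The second recodes a ratio-level age into a categorical age-group variable. The item values and cut points are hypothetical; the groups echo the under-18 and 25-34 groups used in the crosstab example.

```python
def summated_score(items):
    """Collapse several related items into one variable: their mean."""
    return sum(items) / len(items)

def age_group(age):
    """Re-specify ratio-level age (years) as a categorical age-group variable."""
    if age < 18:
        return "under 18"
    if age < 25:
        return "18-24"
    if age < 35:
        return "25-34"
    return "35 and over"

# Hypothetical respondent: age plus three attitude items on a 1-5 scale
respondent = {"age": 29, "attitude_items": [4, 5, 3]}
new_vars = {
    "attitude_index": summated_score(respondent["attitude_items"]),  # treated as interval
    "age_group": age_group(respondent["age"]),                        # categorical
}
print(new_vars)  # {'attitude_index': 4.0, 'age_group': '25-34'}
```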

Categorical Data Analysis - Objectives
• Describing the sample distribution for the variable (e.g. gender)
  • Frequencies, percentages, quartiles, percentiles, graphs (bar, line, histogram, pie)
• What are the typical characteristics of the sample?
  • Mode
• Is the distribution of one categorical variable related to the distribution of another (e.g. gender vs. whether or not the product is bought)?
  • Cross tabs, with chi-square as a measure of association
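The chi-square measure of association compares each observed cell count with the count expected if the two variables were unrelated. The expected count for a cell is (row total × column total) / grand total. The following standard-library sketch uses a hypothetical 2 × 2 gender-by-purchase table. Its p-value shortcut, `erfc(sqrt(x / 2))`, is valid only for 1 degree of freedom. Larger tables need a chi-square distribution function, such as the one in SciPy.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c crosstab."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if the row and column variables were unrelated
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical crosstab: rows = male, female; columns = bought, did not buy
table = [[30, 70],
         [50, 50]]
stat, df = chi_square(table)
p = math.erfc(math.sqrt(stat / 2))  # upper-tail p-value, valid ONLY for df == 1
print(f"chi-square = {stat:.3f}, df = {df}, p = {p:.4f}")
```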

Cross Tabulations – Example: Buyers by Age
Distribution of customer types by age
• If there were no differences between age groups, each age group’s distribution would match the distribution for the total sample.
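The cross-tab conclusions rest on comparing each age group's row percentages with the total-sample distribution. The following sketch shows that comparison. The counts are hypothetical and illustrate only the mechanics; they are not the table from the original slide.

```python
def row_percentages(table):
    """Percentage distribution of customer types within each age group."""
    return {group: {k: 100 * v / sum(row.values()) for k, v in row.items()}
            for group, row in table.items()}

def total_percentages(table):
    """Percentage distribution of customer types in the whole sample."""
    totals = {}
    for row in table.values():
        for k, v in row.items():
            totals[k] = totals.get(k, 0) + v
    grand = sum(totals.values())
    return {k: 100 * v / grand for k, v in totals.items()}

# Hypothetical counts of customer types by age group
table = {
    "under 18": {"first-time": 10, "repeat": 20, "brand loyal": 20},
    "18-24":    {"first-time": 20, "repeat": 20, "brand loyal": 10},
    "25-34":    {"first-time": 5,  "repeat": 30, "brand loyal": 15},
}
by_group, overall = row_percentages(table), total_percentages(table)
for group, row in by_group.items():
    for customer_type, pct in row.items():
        gap = pct - overall[customer_type]  # positive = more common than in the total sample
        print(f"{group:<9} {customer_type:<12} {pct:5.1f}% vs {overall[customer_type]:5.1f}% ({gap:+.1f})")
```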

Crosstabs - Conclusions
• The 25-34 age group is less likely than the sample average to be first-time buyers
• The under-18 group is more likely than the sample average to be brand loyal

Rank Order Data Analysis - Objectives
• What are respondents' preferences among several competing alternatives? (e.g. rank your preferences among ten different brands of cars)
  • Frequencies, percentages, graphs
• What is the typical preference pattern in the sample? (e.g. which car does the sample prefer the most, and which the least?)
  • Mode
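Suppose each respondent's ranking is stored as a list ordered from most to least preferred. The modal first-place choice answers the "most preferred" question, and the modal last-place choice answers the "least preferred" question. The sketch below uses hypothetical brands A, B and C.

```python
from collections import Counter

def first_and_last_choices(rankings):
    """Count first-place and last-place choices; each ranking runs best to worst."""
    first = Counter(r[0] for r in rankings)
    last = Counter(r[-1] for r in rankings)
    return first, last

# Hypothetical rankings of three car brands by five respondents
rankings = [
    ["A", "B", "C"],
    ["A", "C", "B"],
    ["B", "A", "C"],
    ["A", "B", "C"],
    ["C", "A", "B"],
]
first, last = first_and_last_choices(rankings)
print(first.most_common(1))  # [('A', 3)]: modal first choice
print(last.most_common(1))   # [('C', 3)]: modal last choice
```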

Rank Order Data Analysis - Objectives
• Are two sets of respondent preferences correlated? (e.g. wristwatch brand preferences vs. car brand preferences)
  • Spearman’s rank correlation coefficient
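Spearman's rho is the Pearson correlation computed on ranks. Tied values receive the average of the ranks they span. The following sketch applies it to two hypothetical sets of ranks for the same five items. With no ties, the result matches the shortcut formula 1 - 6Σd² / (n(n² - 1)).

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_rho(x, y):
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical ranks given to the same five items under two preference questions
ranks_1 = [1, 2, 3, 4, 5]
ranks_2 = [2, 1, 4, 3, 5]
print(spearman_rho(ranks_1, ranks_2))  # 0.8
```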

Interval-Level / Ratio-Level Data Analysis - Objectives
• What is the average response in the sample? (e.g. what is the mean attitude toward the brand?)
  • Mean / median
• How variable are the responses in the sample? (e.g. on average, how far do the sample's attitudes toward the brand lie from the mean?)
  • Standard deviation
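Python's `statistics` module covers these summaries directly. The sketch below uses hypothetical 7-point attitude scores. `stdev` is the sample standard deviation, with n - 1 in the denominator; `pstdev` would give the population version.

```python
from statistics import mean, median, stdev

attitude = [5, 6, 4, 7, 5, 6, 3, 5]  # hypothetical 7-point attitude scores
print(mean(attitude), median(attitude), round(stdev(attitude), 3))  # 5.125 5.0 1.246
```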

Interval-Level / Ratio-Level Data Analysis - Objectives
• Do two or more subgroups in the sample differ from each other on the response, or does the sample differ from a previously known or hypothesized value?
  • E.g. do males like the brand significantly more than females do? (t tests, z tests)
  • E.g. does attitude toward WU vary by student status (freshman, sophomore, junior, senior)? (ANOVA)
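The test statistics behind these comparisons can be computed with the standard library. The sketch covers a pooled two-sample t statistic, which assumes equal group variances, and the one-way ANOVA F ratio. The scores and group sizes are hypothetical. The sketch stops at the statistic, because p-values need the t and F distributions. Libraries such as SciPy provide these distributions, for example through `scipy.stats.ttest_ind` and `scipy.stats.f_oneway`.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances; returns (t, df)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2

def one_way_anova_f(groups):
    """One-way ANOVA F ratio; returns (F, (df_between, df_within))."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    means = [mean(g) for g in groups]
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, (k - 1, n - k)

# Hypothetical brand-liking scores (7-point scale)
males = [6, 7, 5, 6, 6]
females = [4, 5, 5, 4, 7]
print(pooled_t(males, females))  # t is about 1.58 with 8 df

# Hypothetical attitude-toward-WU scores: freshman, sophomore, junior, senior
status_groups = [[4, 5, 6], [5, 6, 7], [7, 8, 9], [6, 7, 8]]
print(one_way_anova_f(status_groups))  # (5.0, (3, 8))
```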

Interval-Level / Ratio-Level Data Analysis - Objectives
• Are sample responses on two variables correlated? (e.g. are sales related to advertising expenditure?)
  • Pearson correlation
• Can we predict the sample's response on one variable if we know its value on another? (e.g. if we need to achieve $1 million in sales next year, how much should we spend on advertising?)
  • Regression analysis
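The following sketch addresses both questions, using hypothetical advertising and sales figures in thousands of dollars. Pearson's r measures the linear association. The sketch then fits a least-squares line, sales = intercept + slope × advertising, and solves it for the spend that yields $1 million in sales. Inverting the fitted line this way is a simplification. It assumes the relationship is causal and holds at the target level, and both assumptions should be checked.

```python
from math import sqrt

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

advertising = [50, 70, 90, 110, 130]  # hypothetical, $ thousands
sales = [620, 810, 980, 1190, 1400]   # hypothetical, $ thousands

r = pearson_r(advertising, sales)
intercept, slope = fit_line(advertising, sales)
target_sales = 1000  # $1 million, in $ thousands
required_ads = (target_sales - intercept) / slope
print(f"r = {r:.3f}, sales = {intercept:.1f} + {slope:.2f} * advertising")
print(f"advertising needed for $1M in sales: about ${required_ads:.0f}k")
```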
