Data Analysis and a Brief Intro to Stats for Writers

For English 8125, by Dr. Bowie

A Brief Overview of Stats for Writers: Central Tendency
• Principle 1: “The smaller the variance in data, the more reliable the inference” (Hughes and Hayhoe 62)
• Mean: the average; the most commonly used measure of central tendency
  • Add all the numbers up and divide by n
• Mode: the most frequently obtained measure. Example: 3, 4, 4, 4, 5, 5, 6, 6, 7. 4 is the mode. This is a rougher measure and can be used to misrepresent the data.
• Median: the middle score/measure, corresponding to the 50th percentile
  • If there is an odd number of scores, it is the middle score once they are sorted into ascending/descending order
  • If there is an even number, it is the average of the two middle scores
• Range: the difference between the highest score and the lowest. R = Xmax - Xmin.
• Standard Deviation: how close the set of data is to the mean. A smaller standard deviation means a tighter, more precise data group; a larger one means the data is more spread out.
• The formula: (image from Process Dynamics and Controls Open Textbook)

Luckily Excel can do this for us!
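
For anyone working outside Excel, here is a minimal sketch of the same measures using Python's standard `statistics` module. It reuses the example scores from the mode bullet; the variable names are illustrative only. Note that there are two common versions of the standard deviation (sample and population), and the slide does not say which one its formula shows, so both are computed.

```python
import statistics

# The example data from the mode bullet above.
scores = [3, 4, 4, 4, 5, 5, 6, 6, 7]

mean = statistics.mean(scores)             # add them all up, divide by n
median = statistics.median(scores)         # middle score once sorted
mode = statistics.mode(scores)             # most frequently obtained score
score_range = max(scores) - min(scores)    # R = Xmax - Xmin

# Two versions of the standard deviation:
sample_sd = statistics.stdev(scores)       # divides by n - 1 (Excel: STDEV.S)
population_sd = statistics.pstdev(scores)  # divides by n (Excel: STDEV.P)

print(f"mean={mean:.2f} median={median} mode={mode} range={score_range}")
print(f"sample SD={sample_sd:.2f} population SD={population_sd:.2f}")
```

The sample version is the usual choice when the scores come from a subset of the people you want to say something about.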

Standard Deviation
• How close the set of data is to the mean. A smaller standard deviation means a tighter, more precise data group; a larger one means the data is more spread out.
• With a normal distribution (bell curve):
  • About 68% of the population falls within one standard deviation of the mean
  • About 95% falls within two standard deviations
  • About 99.7% falls within three standard deviations
• The formula: (image from Process Dynamics and Controls Open Textbook)
• Diagram: http://en.wikipedia.org/wiki/File:Standard_deviation_diagram.svg
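
The 68 / 95 / 99.7 figures can be checked directly against the normal curve. This is a small sketch using `NormalDist` from Python's standard library; the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

# A standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal population within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.1%}")
```

These percentages only hold when the data is roughly bell-shaped; skewed data can behave quite differently.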

The t-Test
• The t-Test: the comparison of sample populations to determine if there is a significant difference between their means. The result is a ‘t’ value used to find the p-value.
• One-tailed: the hypothesis states the direction of the difference or relationship. Students will score higher on assignment 1 than on assignment 2. After X training, women will run faster marathons than men.
• Two-tailed: the hypothesis states there is a difference, but not the direction of the difference. Students will score higher on one of the two assignments. After X training, men and women will have different average marathon times.
• The one-tailed probability is half the value of the two-tailed probability (when the difference is in the predicted direction).
• Three types of t-tests in Excel:
  • Type 1: One group of participants before and after treatment (paired)
  • Type 2: Two groups of participants with equal variances (standard deviations about the same)
  • Type 3: Two groups of participants with unequal variances (standard deviations not about the same)
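
To show what sits behind the three types, here is a rough sketch of the textbook t calculations in plain Python. The function names and the sample scores are hypothetical, and Type 3 is written as what is commonly called Welch's t-test. The sketch stops at the t value and degrees of freedom; turning those into a p-value needs a t-distribution table or a tool such as Excel.

```python
import math
from statistics import mean, stdev, variance

def paired_t(before, after):
    """Type 1: one group measured twice. Returns (t, degrees of freedom)."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1

def pooled_t(group_a, group_b):
    """Type 2: two groups, equal variances assumed."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_error, n_a + n_b - 2

def welch_t(group_a, group_b):
    """Type 3: two groups, unequal variances (Welch's approach)."""
    v_a = variance(group_a) / len(group_a)
    v_b = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / math.sqrt(v_a + v_b)
    df = (v_a + v_b) ** 2 / (
        v_a ** 2 / (len(group_a) - 1) + v_b ** 2 / (len(group_b) - 1))
    return t, df

# Hypothetical scores, invented for illustration.
assignment_1 = [78, 85, 90, 72, 88]
assignment_2 = [74, 80, 91, 70, 83]
print("paired:", paired_t(assignment_2, assignment_1))
print("pooled:", pooled_t(assignment_1, assignment_2))
print("welch: ", welch_t(assignment_1, assignment_2))
```

The larger the absolute t value for a given number of degrees of freedom, the smaller the p-value.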

Probability and Confidence
• P value (p): the probability of getting results at least this extreme by chance alone, that is, if there were no real difference. The smaller the p, the more confident you can be that the result is “real.” p is often considered “acceptable” at 0.05, 0.01, or 0.001, depending on how conservative you want to be and other factors.
• Confidence Interval: describes reliability. It is a range of plausible values within which the true value would lie a certain percentage of the time (normally 95% or 90%).
  • Narrow: implies high precision, a small range of plausible values. More reliable.
  • Wide: poor precision; the range is broad and uninformative.
  • Also “provides a way of determining whether the sample is large enough to make the trial definitive. If the lower boundary of a confidence interval is above the threshold considered clinically significant, then the trial is positive and definitive, if the lower boundary is somewhat below the threshold, the trial is positive, but studies with larger samples are needed.” (http://www.cmaj.ca/cgi/content/abstract/152/2/169)
  • Add the confidence value (the margin of error) to the mean, and subtract it from the mean, to find the two ends of the confidence interval range
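
The add/subtract step can be sketched as follows. This is a simplified version that assumes a normal approximation (a z value of about 1.96 for 95%); with small samples a t-based value is usually preferred and gives a slightly wider range. The function name and the sample scores are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) for the sample mean, normal approximation."""
    # z is about 1.96 for 95% confidence and about 1.645 for 90%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(sample) / math.sqrt(len(sample))  # the confidence value
    centre = mean(sample)
    return centre - margin, centre + margin

# Hypothetical scores, invented for illustration.
sample = [4, 5, 6, 5, 4, 6, 5, 5]
low_95, high_95 = confidence_interval(sample, 0.95)
low_90, high_90 = confidence_interval(sample, 0.90)
print(f"95% CI: {low_95:.2f} to {high_95:.2f}")
print(f"90% CI: {low_90:.2f} to {high_90:.2f}")
```

Asking for less confidence (90% instead of 95%) gives a narrower range, and a larger sample narrows it as well.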

Correlations
• Correlation: a measure of the relation between two or more variables. Correlation coefficients range from -1.00 to +1.00: -1.00 is a perfect negative correlation, +1.00 is a perfect positive correlation, and 0.00 is a lack of correlation.
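
The slide does not name a specific coefficient; the most common one is Pearson's r, sketched here in plain Python as an assumption. The function name and the tiny data sets are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # How the two variables move together, and how much each varies alone.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(ss_x * ss_y)

print(pearson_r([1, 2, 3], [2, 4, 6]))         # rises together
print(pearson_r([1, 2, 3], [6, 4, 2]))         # one rises as the other falls
print(pearson_r([1, 2, 3, 4], [1, -1, -1, 1])) # no straight-line relation
```

Pearson's r only captures straight-line relationships, so a value near 0 does not rule out a curved one.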

Analysis of Variance (ANOVA)
• The purpose of ANOVA: to test for significant differences between means (which means comparing variances, thus the name). If we are only comparing two means, then ANOVA gives the same results as the t-test. The ANOVA produces an F statistic, the ratio of the variance among the means to the variance within the samples.
• One-way ANOVA: for differences among two or more independent groups, typically 3 or more, as the t-test covers 2. Example: The times of masculine, feminine, androgynous, and undifferentiated genders in completing a task.
• Factorial ANOVA: for the effects of two or more treatment variables. Most common is the 2×2, with two independent variables, each with two levels or distinct values. Can be multi-level, such as 3×3, or higher order, such as 2×2×2. Example 2×2: Female and male scores before and after the treatment.
• Multivariate analysis of variance (MANOVA): for when there is more than one dependent variable. Example: Student scores in audience analysis and grammar after using a website or a textbook.
• To do in Excel: You need the Analysis ToolPak add-in
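
As a rough illustration of the F statistic for a one-way ANOVA, the sketch below computes the variance among the group means divided by the variance within the groups. The function name and the task times are hypothetical; converting F into a p-value needs an F table or a tool such as Excel.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for two or more independent groups."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    group_means = [mean(group) for group in groups]

    # Variation among the group means, weighted by group size.
    ss_between = sum(len(group) * (m - grand_mean) ** 2
                     for group, m in zip(groups, group_means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((score - m) ** 2
                    for group, m in zip(groups, group_means)
                    for score in group)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical task times (minutes) for three groups.
print(one_way_anova_f([1, 2, 3], [2, 3, 4], [3, 4, 5]))
```

With only two groups, the F value equals the square of the equal-variance t value, which is why the two tests agree.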

Qualitative Data Analysis
• Top down: start with categories from theory, literature, hypotheses, your topics. This is considered “more rigorous” by more empirical researchers. It is not “biased” by the data. Best when you know what you are looking for.
• Bottom up: develop your codes after the research, during the analysis; see what codes develop. This may result in more natural and reflective coding. Best for exploratory research.
• Both: Obviously you can use a bit of both
• Coding: Can do manually or with software. Look for:
  • Themes, Topics, Ideas, Concepts
  • Terms/phrases or Keywords
• Also consider developing codes for
  • Setting and context
  • Participant perspective
  • Process codes
  • Activity codes
  • Strategy codes
  • Relationships and social structure
  • Preassigned coding
  (all from Creswell 193, drawing on Bogdan & Biklen)

Coding
• Develop a coding system
  • Make it flexible and easy to use
  • Create a “memo” or resource with code definitions to refer to; add to this as new codes develop
  • Consider “quantifying” where possible
• Figure out what and how you will code
  • Whole texts?
  • Passages, lines, words?
• Include the coded material with the coding
  • Copy and paste it, or link to it, in a cell in your spreadsheet (use the time in the track for audio or video recordings)
  • Copy the print text and do old-fashioned copying and pasting, or highlighting, and put the material in folders by code
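
One possible way to keep a code “memo,” store the coded material with its code, and quantify the results is sketched below. Everything here is hypothetical: the code labels, source names, timestamps, and quotations are invented for illustration, and a spreadsheet would do the same job.

```python
from collections import Counter

# The "memo": code definitions to refer to and extend as new codes develop.
codebook = {
    "SET": "Setting and context",
    "PROC": "Process",
    "STRAT": "Strategy",
}

# Each entry keeps the coded material with its code:
# (source, position in track or text, code, passage). All invented examples.
coded_passages = [
    ("interview_01", "00:02:15", "PROC", "I always outline before drafting."),
    ("interview_01", "00:07:40", "STRAT", "I read it aloud to catch errors."),
    ("interview_02", "00:01:05", "SET", "We write in a shared office."),
    ("interview_02", "00:09:30", "PROC", "Drafts go to my editor first."),
]

# Flag any code used in the data that is missing from the memo.
undefined_codes = {code for _, _, code, _ in coded_passages} - set(codebook)

# "Quantifying": how often each code was applied.
code_counts = Counter(code for _, _, code, _ in coded_passages)

for code, count in code_counts.most_common():
    print(f"{codebook[code]} ({code}): {count}")
```

Keeping the position alongside each passage makes it easy to return to the original recording or text later.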

Some Data Analysis Methods
• Affinity diagram: a method for sorting all the ideas/points/items collected into groups and clusters, often (but not necessarily) resulting in a hierarchical diagram showing scope
• Work Models: provide a graphical, concrete, systematic view of work (or other) practice
  • Flow: shows how work is broken up across people, keeping track of individuals, responsibilities, groups, flow, artifacts, communication topics or actions, places, and breakdowns
  • Sequence Model: maps the sequence of work, including intent, triggers, steps, order, and breakdowns
  • Artifact Models: show the interpretation of the conceptual distinctions in the use of an artifact, including information, parts, structure, annotations, presentation, usage, and breakdowns
  • Cultural Model: maps out the intangible forces of culture, including influencers and influences

Coding Examples

This is just the start. Have fun & analyze well!
