Measures of centrality

SDA women – histogram of heights 2014 • n = 48 or N = 48


Last lecture summary • Which graphs did we meet? • scatter plot (bodový graf) • bar chart (sloupcový graf) • histogram • pie chart (koláčový graf) • How do they work, what are their advantages and/or disadvantages?

Distributions • e.g., body height (roughly symmetric) • negatively skewed (skewed to the left), e.g., life expectancy • positively skewed (skewed to the right), e.g., income • Source: http://turnthewheel.org/free-textbooks/street-smart-stats/

Life expectancy data • Watch TED talk by Hans Rosling, Gapminder Foundation: http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html

Female www.udacity.com – Introduction to statistics


Gender bias • What do you think, is there a gender bias? • Who do you think is favored, male or female?

Gender bias • male • female

Statistics is ambiguous • This example illustrates how ambiguous statistics can be. • How you choose to graph your data can strongly affect what people believe to be the case. • “I never believe in statistics I didn’t doctor myself.” • Who said that? The quote is commonly attributed to Winston Churchill.

What is statistics? • Statistics – the science of collecting, organizing, summarizing, analyzing and interpreting data • Goal – use imperfect information (our data) to infer facts, make predictions, and make decisions • Descriptive statistics – describing and summarizing data with numbers or pictures • Inferential statistics – making conclusions or decisions based on data

Variables • variable – a value or characteristic that can vary from individual to individual • example: favorite color, age • How are variables classified? • quantitative variable – numerical values, often with units of measurement, that answer a "how much" or "how many" question; example: age, annual income, number of children • continuous (spojitá proměnná), example: height, weight • discrete (diskrétní proměnná), example: number of children • continuous variables can be discretized

Variables • categorical (qualitative) variables • categories that have no particular order • example: favorite color, gender, nationality • ordinal • they are not numerical but their values have a natural order • example: temperature low/medium/high

Choosing a profession • We made an interval estimate. • But ideally we want one number that describes the entire dataset. This allows us to quickly summarize all our data. www.udacity.com – Statistics

Choosing a profession • The value at which the frequency is highest. • The value at which the frequency is lowest. • The value in the middle. • The biggest value on the x-axis. • The mean. • Figure panels: Geography, Chemistry

Three big M’s • The value at which the frequency is highest is called the mode, i.e. the most common value is the mode. • The value in the middle of the distribution is called the median. • The mean is the average (the two words are synonyms). • Figure panels: Geography, Chemistry

Quick quiz • What is the mode in our data? • 2, 5, 6, 5, 2, 6, 9, 8, 5, 2, 3, 5
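One way to check the quiz is to count how often each value occurs and pick the most frequent one. The following is a minimal sketch using Python's standard library; the variable names and the list of colors are illustrative assumptions, not part of the lecture.

```python
from collections import Counter
from statistics import mode

# Data from the quick quiz.
data = [2, 5, 6, 5, 2, 6, 9, 8, 5, 2, 3, 5]

# Count how many times each value occurs.
counts = Counter(data)

# The mode is the value with the highest count.
quiz_mode = mode(data)

# The mode is also defined for categorical data (hypothetical example values).
colors = ["red", "blue", "red", "green"]
color_mode = mode(colors)

print(counts)
print(quiz_mode, color_mode)
```

Counting first and then reading off the largest count mirrors what one does by eye on a histogram: find the tallest bar.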

More on the mode • True or False? • 1. The mode can be used to describe any type of data we have, whether it’s numerical or categorical. • 2. All scores in the dataset affect the mode. • 3. If we take a lot of samples from the same population, the mode will be the same in each sample. • 4. There is an equation for the mode. • Ad 3: http://onlinestatbook.com/stat_sim/sampling_dist/ • http://www.shodor.org/interactivate/activities/Histogram/ (the mode changes as you change the bin size) • Because 3. is not true, we can’t use the mode to learn something about our population. The mode depends on how you present the data.
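The dependence of the mode on the bin size can be illustrated with a small sketch. The data and the helper name `modal_bin` below are hypothetical and chosen only so that the tallest histogram bar moves when the bin width changes.

```python
from collections import Counter

def modal_bin(values, width):
    """Return (start, end) of the histogram bin that holds the most values."""
    # Assign each value to the bin [k * width, (k + 1) * width).
    counts = Counter(int(v // width) for v in values)
    index, _ = counts.most_common(1)[0]
    return (index * width, (index + 1) * width)

# Hypothetical scores, not taken from the lecture.
scores = [1, 1, 1, 4, 5, 6, 7]

narrow = modal_bin(scores, 1)  # bins of width 1
wide = modal_bin(scores, 4)    # bins of width 4

print(narrow, wide)
```

With width 1 the tallest bar sits at the value 1, which occurs three times. With width 4 the bin from 4 to 8 collects four values and becomes the tallest, so the reported mode moves even though the data did not change.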

Minimum minimum = 47.8 Sierra Leone www.coursera.org – Statistics: Making Sense of Data

Maximum • maximum = 83.4 (Japan)

Life expectancy data • all countries

Life expectancy data • median = 73.2 (Egypt, rank 99 of 197) • half of the values are smaller, half are larger

Life expectancy data • Maximum = 83.4 • Median = 73.2 • Minimum = 47.8

Q1 • 1st quartile = 64.7 (Sao Tomé & Príncipe, rank 50 of 197, ¼ of the way) • ¼ of the values are smaller, ¾ are larger

Q3 • 3rd quartile = 76.7 (Netherlands Antilles, rank 148 of 197, ¾ of the way) • ¾ of the values are smaller, ¼ are larger

Life expectancy data • Maximum = 83.4 • 3rd quartile = 76.7 • Median = 73.2 • 1st quartile = 64.7 • Minimum = 47.8

Box plot • maximum • 3rd quartile • median • 1st quartile • minimum

Modified box plot • IQR = interquartile range (Q3 − Q1) • whiskers reach at most 1.5 × IQR beyond the box • values farther away are drawn individually as outliers
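The 1.5 × IQR rule can be sketched as a short calculation. The example below applies it to the life expectancy quartiles quoted in the lecture (Q1 = 64.7, Q3 = 76.7) and checks the quoted minimum and maximum against the resulting limits. The function name `outlier_limits` is an illustrative assumption.

```python
def outlier_limits(q1, q3, k=1.5):
    """Return (lower, upper); values outside this range count as outliers."""
    iqr = q3 - q1  # interquartile range
    return q1 - k * iqr, q3 + k * iqr

# Quartiles of the life expectancy data quoted in the lecture.
q1, q3 = 64.7, 76.7
lower, upper = outlier_limits(q1, q3)

# Minimum and maximum from the lecture's summary of the same data.
minimum, maximum = 47.8, 83.4
has_outliers = minimum < lower or maximum > upper

print(round(lower, 1), round(upper, 1), has_outliers)
```

Here the IQR is 12.0, so the limits are roughly 46.7 and 94.7. Both the minimum and the maximum fall inside them, which suggests a modified box plot of these data would show no outliers.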

Quartiles, median – how to do it? • Find min, max, median, Q1, Q3 in these data. Then, draw the box plot. • Data: 79, 68, 88, 69, 90, 74, 87, 93, 76

Another example • Data: 78, 93, 68, 84, 90, 74 • Min. = 68.00 • 1st Qu. = 75.00 • Median = 81.00 • 3rd Qu. = 88.50 • Max. = 93.00
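Both examples can be reproduced with a short sketch. Several conventions exist for computing quartiles and they can give slightly different answers. The code below assumes linear interpolation between data points (the `"inclusive"` method in Python's `statistics` module), which matches the quartiles 75.00 and 88.50 shown for the second data set. The helper name `five_number_summary` is illustrative.

```python
from statistics import median, quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using linear interpolation."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median(values), q3, max(values)

# Data sets from the two examples in the lecture.
exercise = [79, 68, 88, 69, 90, 74, 87, 93, 76]
another = [78, 93, 68, 84, 90, 74]

exercise_summary = five_number_summary(exercise)
another_summary = five_number_summary(another)

print(exercise_summary)
print(another_summary)
```

Doing the same by hand starts with sorting the data: 68, 69, 74, 76, 79, 87, 88, 90, 93 for the first set, where the median is the 5th of 9 values.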

3rd M – Mean • Another measure of the center of the data: mean (average) • Data values: $x_1, x_2, \ldots, x_n$ • Mathematical notation: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ • $\Sigma$ … Greek letter capital sigma, means SUM in mathematics
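The mean is the sum of all data values divided by the number of values. As a worked example, the sketch below computes it step by step for the nine values from the quartile exercise and compares the result with the library function. The variable names are illustrative.

```python
from statistics import mean

# Data values from the quartile exercise.
values = [79, 68, 88, 69, 90, 74, 87, 93, 76]

n = len(values)      # number of data values
total = sum(values)  # the "capital sigma" step: add everything up
average = total / n  # divide the sum by the number of values

print(total, n, average)
print(mean(values))
```

The sum is 724 and there are 9 values, so the mean is 724 / 9, which is about 80.4.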

Robust statistic • Salaries of 25 players of the New York Red Bulls (an American soccer team) in 2012 • median = 112 495 • mean = 518 311 • The mean is not a robust statistic. • The median is a robust statistic.
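Why a few very large salaries pull the mean but not the median can be seen on a tiny example. The salaries below are hypothetical (not the team's real figures): four ordinary players and one star.

```python
from statistics import mean, median

# Hypothetical salaries: four ordinary players and one star.
salaries = [40_000, 60_000, 100_000, 120_000, 5_000_000]

# The same list without the star player.
without_star = salaries[:-1]

mean_all = mean(salaries)
median_all = median(salaries)
mean_rest = mean(without_star)
median_rest = median(without_star)

print(mean_all, median_all)
print(mean_rest, median_rest)
```

Removing the single star moves the mean from 1 064 000 to 80 000, while the median only moves from 100 000 to 80 000. This is what is meant by calling the median robust: one extreme value barely changes it.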