
measures of centrality - PowerPoint PPT Presentation

Presentation Transcript
1. measures of centrality

2. Last lecture summary • Which graphs did we meet? • scatter plot (bodový graf) • bar chart (sloupcový graf) • histogram • pie chart (koláčový graf) • How do they work, what are their advantages and/or disadvantages?

3. SDA women – histogram of heights, 2014 (n = 48 or N = 48, bin size = 3.8)

4. Distributions • roughly symmetric, e.g., body height • negatively skewed (skewed to the left), e.g., life expectancy • positively skewed (skewed to the right), e.g., income http://turnthewheel.org/free-textbooks/street-smart-stats/
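The direction of skew can also be checked numerically. This sketch assumes the moment-based sample skewness g1 = m3 / m2^(3/2), which is only one of several skewness definitions; the function name `skewness` and all sample values are hypothetical, made up only to show the sign convention.

```python
from statistics import mean, median

def skewness(data):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(data)
    m = mean(data)
    m2 = sum((x - m) ** 2 for x in data) / n  # second central moment (spread)
    m3 = sum((x - m) ** 3 for x in data) / n  # third central moment (asymmetry)
    return m3 / m2 ** 1.5

# Hypothetical samples, made up only to show the sign convention
right_tail = [1, 2, 2, 3, 3, 3, 4, 20]  # long tail to the right, like income
left_tail = [-x for x in right_tail]    # mirror image: long tail to the left
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tail) > 0)              # True: positively skewed
print(skewness(left_tail) < 0)               # True: negatively skewed
print(skewness(symmetric))                   # 0.0
print(mean(right_tail), median(right_tail))  # 4.75 3.0: the tail pulls the mean up
```

A long right tail gives a positive value and drags the mean above the median; a long left tail gives a negative value.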

5. statistics is beautiful – new stuff

6. Life expectancy data • Watch TED talk by Hans Rosling, Gapminder Foundation: http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html

7. statistics is deep

8. Simpson’s paradox: UC Berkeley admissions. Though the data here are made up, the paradox is the same. www.udacity.com – Introduction to statistics

9. Male www.udacity.com – Introduction to statistics

10. Male www.udacity.com – Introduction to statistics

11. Female www.udacity.com – Introduction to statistics

12. Female www.udacity.com – Introduction to statistics

13. Gender bias What do you think, is there a gender bias? Who do you think is favored? Male or female? www.udacity.com – Introduction to statistics

14. Gender bias male female www.udacity.com – Introduction to statistics

15. Gender bias male female www.udacity.com – Introduction to statistics
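The reversal behind Simpson’s paradox can be reproduced in a few lines. The admission counts below are hypothetical, invented for this sketch (they are not the Berkeley or Udacity figures): women are admitted at a higher rate in each major, yet at a lower rate overall, because most women apply to the major that admits few applicants.

```python
# Hypothetical (admitted, applied) counts, invented for this sketch;
# they are NOT the real UC Berkeley or Udacity numbers.
admissions = {
    "A": {"male": (480, 800), "female": (70, 100)},   # easy major
    "B": {"male": (20, 200), "female": (135, 900)},   # hard major
}

def rate(admitted, applied):
    """Fraction of applicants who were admitted."""
    return admitted / applied

# Per major, women are admitted at a higher rate in both A and B.
for major, groups in admissions.items():
    print(f"major {major}: male {rate(*groups['male']):.1%}, "
          f"female {rate(*groups['female']):.1%}")

def overall(gender):
    """Admission rate after pooling all majors together."""
    admitted = sum(groups[gender][0] for groups in admissions.values())
    applied = sum(groups[gender][1] for groups in admissions.values())
    return rate(admitted, applied)

# Pooled, the comparison flips: 500/1000 for men vs 205/1000 for women.
print(f"overall: male {overall('male'):.1%}, female {overall('female'):.1%}")
```

The lurking variable is the major: pooling hides that the two groups applied to majors with very different admission rates.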

16. Statistics is ambiguous • This example illustrates how ambiguous statistics can be. • In choosing how to graph your data, you may greatly influence what people believe to be the case. “I never believe in statistics I didn’t doctor myself.” “Nikdy nevěřím statistice, kterou si sám nezfalšuji.” (Czech version) Who said that? The quote is commonly attributed to Winston Churchill, though the attribution is disputed. www.udacity.com – Introduction to statistics

17. What is statistics? • Statistics – the science of collecting, organizing, summarizing, analyzing and interpreting data • Goal – use imperfect information (our data) to infer facts, make predictions, and make decisions • Descriptive statistics – describing and summarizing data with numbers or pictures • Inferential statistics – making conclusions or decisions based on data

18. Variables • variable – a value or characteristic that can vary from individual to individual • example: favorite color, age • How are variables classified? • quantitative variable – numerical values, often with units of measurement, arising from the “how much?” or “how many?” question, example: age, annual income, number of children • continuous (spojitá proměnná), example: height, weight • discrete (diskrétní proměnná), example: number of children • continuous variables can be discretized
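Discretizing a continuous variable means replacing each value by the interval (bin) it falls into, which is exactly what a histogram does. A minimal sketch, assuming equal-width bins whose edges are multiples of the width; the helper `discretize` and the heights are hypothetical.

```python
import math

def discretize(values, width, start=0.0):
    """Replace each value by the lower edge of its bin [edge, edge + width)."""
    return [start + math.floor((v - start) / width) * width for v in values]

heights_cm = [162.3, 168.9, 170.0, 171.4, 174.8, 175.2, 181.7]  # hypothetical heights
print(discretize(heights_cm, width=5))
# [160.0, 165.0, 170.0, 170.0, 170.0, 175.0, 180.0]
```

After this step, height behaves like an ordered set of categories (the bins) rather than a continuous measurement.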

19. Variables • categorical (qualitative) variables • categories that have no particular order • example: favorite color, gender, nationality • ordinal • they are not numerical but their values have a natural order • example: temperature low/medium/high

20. Variables (overview): a variable (proměnná) is quantitative (kvantitativní), categorical (kategorická), or ordinal (ordinální); a quantitative variable is either continuous (spojitá) or discrete (diskrétní).

21. Choosing a profession: salary ranges, Chemistry 50 000 – 60 000, Geography 40 000 – 55 000 www.udacity.com – Statistics

22. Choosing a profession • We made an interval estimate. • But ideally we want one number that describes the entire dataset. This allows us to quickly summarize all our data. www.udacity.com – Statistics

23. Choosing a profession • The value at which frequency is highest. • The value where frequency is lowest. • The value in the middle. • The biggest value on the x-axis. • The mean. (figure: Geography, Chemistry) www.udacity.com – Statistics

24. Three big M’s • The value at which frequency is highest is called the mode, i.e., the mode is the most common value. • The value in the middle of the distribution is called the median. • The mean is the arithmetic average (“average” is a synonym). (figure: Geography, Chemistry) www.udacity.com – Statistics

25. Quick quiz • What is the mode in our data? 2 5 6 5 2 6 9 8 5 2 3 5 www.udacity.com – Statistics
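The quiz answer can be checked by counting how often each value occurs. This sketch uses the standard library’s `collections.Counter` and `statistics.multimode` (Python 3.8+); `multimode` returns every most-frequent value, so it also covers multimodal data.

```python
from collections import Counter
from statistics import multimode

data = [2, 5, 6, 5, 2, 6, 9, 8, 5, 2, 3, 5]  # quiz data from the slide

counts = Counter(data)            # how often each value occurs
print(counts.most_common(2))      # [(5, 4), (2, 3)]
print(multimode(data))            # [5]: the mode
```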

26. Mode in negatively skewed distribution www.udacity.com – Statistics

27. Mode in uniform distribution www.udacity.com – Statistics

28. Multimodal distribution www.udacity.com – Statistics

29. Mode in categorical data www.udacity.com – Statistics

30. More on the mode. True or False? • The mode can be used to describe any type of data, whether numerical or categorical. • All scores in the dataset affect the mode. • If we take many samples from the same population, the mode will be the same in each sample. • There is an equation for the mode. • Regarding statement 3: • http://onlinestatbook.com/stat_sim/sampling_dist/ • http://www.shodor.org/interactivate/activities/Histogram/ – the mode changes as you change the bin size. • Because statement 3 is not true, we can’t use the mode to learn something about our population. The mode depends on how you present the data. www.udacity.com – Statistics
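The point made with the shodor.org histogram activity, that the mode of binned data depends on the bin width, can be reproduced directly. The helper `modal_bin` and the data are hypothetical; the sketch reports the most populated bin for two bin widths.

```python
import math
from collections import Counter

def modal_bin(values, width):
    """Return (lower edge, count) of the most populated histogram bin of a given width."""
    counts = Counter(math.floor(v / width) * width for v in values)
    return max(counts.items(), key=lambda item: item[1])

values = [1, 2, 3, 6, 7, 8, 11, 12, 13, 14, 16]  # hypothetical data

print(modal_bin(values, 5))    # (10, 4): tallest bar is [10, 15)
print(modal_bin(values, 10))   # (0, 6): tallest bar is [0, 10)
```

The same values give a different modal bin under each width, while their mean and median do not depend on the binning at all.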

31. Life expectancy data www.coursera.org – Statistics: Making Sense of Data

32. Minimum: minimum = 47.8 (Sierra Leone) www.coursera.org – Statistics: Making Sense of Data

33. Maximum: maximum = 84.3 (Japan) www.coursera.org – Statistics: Making Sense of Data

34. Life expectancy data all countries www.coursera.org – Statistics: Making Sense of Data

35. Life expectancy data: median = 73.2 (Egypt), the 99th of 197 countries; half of the countries have a smaller value, half a larger one www.coursera.org – Statistics: Making Sense of Data

36. Life expectancy data: maximum = 83.4, median = 73.2, minimum = 47.8 www.coursera.org – Statistics: Making Sense of Data

37. Q1: 1st quartile = 64.7 (Sao Tomé & Príncipe), the 50th of 197 countries (¼ of the way) www.coursera.org – Statistics: Making Sense of Data

38. Q1: 1st quartile = 64.7; ¼ of the countries have a smaller value, ¾ a larger one www.coursera.org – Statistics: Making Sense of Data

39. Q3: 3rd quartile = 76.7 (Netherlands Antilles), the 148th of 197 countries (¾ of the way) www.coursera.org – Statistics: Making Sense of Data

40. Q3: 3rd quartile = 76.7; ¾ of the countries have a smaller value, ¼ a larger one www.coursera.org – Statistics: Making Sense of Data

41. Life expectancy data: maximum = 83.4, 3rd quartile = 76.7, median = 73.2, 1st quartile = 64.7, minimum = 47.8 www.coursera.org – Statistics: Making Sense of Data
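The ranks quoted on the slides (50th, 99th and 148th of 197 countries for Q1, the median and Q3) are consistent with the 1-based position h = (n − 1)p + 1 used by linear-interpolation quantile rules such as R’s default. The sketch only computes that position; the function name `quantile_position` is hypothetical.

```python
def quantile_position(n, p):
    """1-based rank h = (n - 1) * p + 1 used by linear-interpolation quantiles."""
    return (n - 1) * p + 1

n_countries = 197
for p in (0.25, 0.5, 0.75):
    print(p, quantile_position(n_countries, p))
# 0.25 50.0
# 0.5 99.0
# 0.75 148.0
```

Because h is a whole number for n = 197, each quartile is an actual country’s value, which is why the slides can name a country for Q1 and Q3.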

42. Box Plot www.coursera.org – Statistics: Making Sense of Data

43. Box plot (labels, top to bottom: maximum, 3rd quartile, median, 1st quartile, minimum)

44. Modified box plot: values more than 1.5 × IQR below the 1st quartile or above the 3rd quartile are drawn individually as outliers (IQR = interquartile range = Q3 − Q1).
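Applying the 1.5 × IQR rule (Tukey’s fences) to the life expectancy quartiles from slide 41 shows where a country would have to lie to be drawn as an outlier. The helper `outlier_fences` is hypothetical; the numbers are the ones on the slides.

```python
def outlier_fences(q1, q3, k=1.5):
    """Tukey's fences: values below q1 - k*IQR or above q3 + k*IQR count as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles of the life expectancy data (slide 41)
low, high = outlier_fences(q1=64.7, q3=76.7)
print(round(low, 1), round(high, 1))  # 46.7 94.7

minimum, maximum = 47.8, 83.4          # slide 41
print(minimum < low, maximum > high)  # False False: no outliers
```

Both extremes lie inside the fences, so under this rule the modified box plot of these data would show no separate outlier points.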

45. Quartiles, median – how to do it? Find min, max, median, Q1, Q3 in these data. Then, draw the box plot. 79, 68, 88, 69, 90, 74, 87, 93, 76 www.coursera.org – Statistics: Making Sense of Data
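One common hand method: sort the data, find the median, then take the median of the lower half and of the upper half, leaving the overall median out of both halves when n is odd. The slide does not say which convention the course uses, so treat this as an assumption; `five_number_summary` is a hypothetical helper.

```python
from statistics import median

def five_number_summary(data):
    """(min, Q1, median, Q3, max) with the 'median of each half' hand method.

    For an odd number of values the overall median belongs to neither half.
    """
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]          # values below the median
    upper = xs[(n + 1) // 2 :]    # values above the median
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

scores = [79, 68, 88, 69, 90, 74, 87, 93, 76]  # data from slide 45
print(five_number_summary(scores))  # (68, 71.5, 79, 89.0, 93)
```

These five numbers are what the box plot draws: a box from Q1 = 71.5 to Q3 = 89.0 with a line at the median 79, and whiskers reaching 68 and 93.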

46. Another example
Data: 78, 93, 68, 84, 90, 74
Min.    1st Qu.   Median   3rd Qu.   Max.
68.00   75.00     81.00    88.50     93.00
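The header row (Min., 1st Qu., Median, 3rd Qu., Max.) looks like the output of R’s `summary()`, which by default interpolates linearly between sorted values. Python’s `statistics.quantiles` with `method="inclusive"` appears to apply the same rule and reproduces the slide’s numbers.

```python
from statistics import quantiles

data = [78, 93, 68, 84, 90, 74]  # data from slide 46

# method="inclusive": linear interpolation between sorted values, with the
# minimum and maximum treated as the 0th and 100th percentiles
q1, med, q3 = quantiles(data, n=4, method="inclusive")
print(min(data), q1, med, q3, max(data))  # 68 75.0 81.0 88.5 93
```

Rules differ on small samples: taking the median of each half of these six values instead gives Q1 = 74 and Q3 = 90.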

47. Percentiles: growth charts (axis: age [years]) http://www.rustovyhormon.cz/on-line-rustove-grafy
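A percentile rank tells what percentage of a reference group lies below a given value, which is how a growth chart places a child. The sketch assumes the “strictly below” convention (definitions differ); the helper `percentile_rank` and the heights are hypothetical.

```python
from bisect import bisect_left

def percentile_rank(data, x):
    """Percentage of observations strictly below x (one of several conventions)."""
    xs = sorted(data)
    return 100 * bisect_left(xs, x) / len(xs)

heights_cm = [98, 101, 103, 104, 106, 107, 109, 110, 112, 115]  # hypothetical children of one age
print(percentile_rank(heights_cm, 109))  # 60.0
```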

48. 3rd M – Mean • Mathematical notation: • … (Greek capital letter sigma) means SUM in mathematics • Another measure of the center of the data: the mean (average) • Data values:
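Written with the summation sign, the mean of n data values x_1, …, x_n is the standard formula below; the symbol x̄ and the worked numbers (the nine scores from slide 45) are added here for illustration.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \dots + x_n}{n}

\bar{x} = \frac{79 + 68 + 88 + 69 + 90 + 74 + 87 + 93 + 76}{9} = \frac{724}{9} \approx 80.44
```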

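49. Robust statistic. Salaries of 25 players of the soccer team New York Red Bulls (Major League Soccer) in 2012: median = 112 495, mean = 518 311. A few very high salaries pull the mean far up but barely move the median: the mean is not a robust statistic, the median is a robust statistic.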
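Why the mean is not robust can be seen by adding a single extreme value to a small dataset. The salaries below are hypothetical, not the Red Bulls payroll.

```python
from statistics import mean, median

salaries = [50_000, 60_000, 70_000, 80_000, 90_000]  # hypothetical salaries
print(mean(salaries), median(salaries))  # 70000 70000

with_star = salaries + [5_000_000]  # add one very highly paid (hypothetical) player
print(mean(with_star))    # about 891 667: the mean jumps
print(median(with_star))  # 75000.0: the median barely moves
```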