Dealing with Data



7th grade math


PowerPoint Slideshow about 'Dealing with Data' - tovah



What is data?
• Data is information.
• Raw data can come in many different forms; the two most common are:
• Categorical data – data with specific labels or names for categories (usually in word form)
• Numerical data – data that are counts or measures (usually in number form)
Variability
• Variability – indicates how widely spread or closely clustered data values are
• Students collect data on the amount of change in the pocket of every student at NHM. (Clustered or spread?)
• Students survey current students at NHM to find out their grade level – 6th, 7th, or 8th.

How do you display data?
• The easiest way to display data is in a graph or chart.
• Pictograph
• Histogram
• Bar Graph
• Line Graph
• Frequency Distribution
• Stem and Leaf Plot
• Circle Graph
• Line Plot
• Scatter Plot
• Box-and-Whisker Plot
What makes a good graph?
• A good graph…
• Fits the data you have collected.
• Has a title and labels.
• Allows a reader to easily draw conclusions.
• Is easy to read and understand.
Where does data come from?
• ????
• Surveys
• Studies
• Questionnaires
• Census data
Populations, Samples, and Statistics
• Population – the entire set of items from which data can be selected (ex. Every 7th grade student, every girl at NHM)
• If we collected data from EVERY member of a population we would refer to this as a census.
• Collecting data from an entire population can be a long and difficult process, but the data obtained would be extremely accurate and reliable.
Populations, Samples, and Statistics
• Sample – a selected group of a population that is representative of the entire population. (ex. Twenty 7th grade students in Mr. Ridley’s math class)
• Samples can be:
• Random – data is obtained from random members of a population
• Systematic – data is obtained using a system for selection (ex. Every 10th person)
• Convenient – data is obtained from the easiest source available within your population (ex. People who sit next to you in class)
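The three kinds of samples above can be sketched in a few lines of Python. This is only an illustration: the population of 100 students numbered 1 to 100 and the sample size of 10 are made-up assumptions, not values from the slides.

```python
import random

# Hypothetical population: 100 students, identified by the numbers 1-100.
population = list(range(1, 101))
sample_size = 10  # assumed sample size, chosen only for illustration

# Random sample: every student has the same chance of being picked.
random_sample = random.sample(population, sample_size)

# Systematic sample: follow a rule, here every 10th person on the list.
systematic_sample = population[9::10]

# Convenient sample: take whoever is easiest to reach,
# here simply the first 10 students on the list.
convenience_sample = population[:sample_size]

print("Random:     ", sorted(random_sample))
print("Systematic: ", systematic_sample)
print("Convenient: ", convenience_sample)
```

The random sample changes every time the code runs, while the systematic and convenient samples always come out the same.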
Populations, Samples, and Statistics
• Anytime you obtain data about a measured characteristic of your sample, you have collected a statistic.
• If you obtain data about a measured characteristic of an entire population, you have collected a parameter.
• If you find a data point that is not consistent with your other results (way too high or way too low), we call it an outlier, and it can be removed.
• Which data would be more reliable?
Interpreting Data
• Raw data does not come in a user-friendly format.
• It must be processed and presented in a form that is easy to read and understand.
• One system for doing this is graphing, which allows for a visual picture of a data set.
Measures of Central Tendency
• Another system for interpreting data is the measures of central tendency.
• Also called measures of center, these numbers attempt to summarize a data set by describing the overall clustering of data in a set.
• The goal of these numbers is to find one single numerical value that can represent the “average” value found in the entire set.
Measures of Central Tendency
• The 3 most common measures are:
• Mean – the average, found by dividing the sum of all the numbers in a data set by the number of pieces of data you collected.
• Median – the middle value, found by locating the middle number in an ordered data set.
• Mode – the most common value, found by locating the most frequently appearing value in a data set.
• Median – the cross out method
• Order your data set from least to greatest
• Repeatedly cross out the smallest and largest value in your data set until you arrive at the median
• If you have two values left, add them together and divide by two.
• Mode – it’s the “MOST”
• Both four letter words
• Both begin with MO
• Mean – sorry =(
• I really am sorry, but you just have to do the math.
• Add them up, divide by the number of pieces of data in your set.
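The three measures can also be checked with a short Python sketch. The function name `median_by_crossing_out` and the data set are made up for illustration; `mean` and `mode` come from Python's standard `statistics` module.

```python
from statistics import mean, mode

def median_by_crossing_out(values):
    """Find the median with the cross-out method (hypothetical helper)."""
    remaining = sorted(values)        # order from least to greatest
    while len(remaining) > 2:
        remaining = remaining[1:-1]   # cross out the smallest and largest
    # One value left: that is the median. Two left: add them, divide by two.
    return sum(remaining) / len(remaining)

scores = [3, 7, 7, 9, 14]  # hypothetical data, not from the slides

print(mean(scores))                    # add them up (40), divide by 5
print(median_by_crossing_out(scores))  # cross out 3 and 14, then 7 and 9
print(mode(scores))                    # 7 appears the most
```

For this data set the mean is 8, and both the median and the mode are 7.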
Practice
• It's almost report card time and Sam is worried about his grade. He has made the following scores on his 7 tests in math: 77, 84, 83, 78, 92, 90, 84. Help Sam out by finding his …
• Mean
• Median
• Mode
Practice
• Sam’s football coach told him he was going to be benched if his grade was below a “B.” Should Sam be worried? Explain.
• Which measure of central tendency would give Sam the best grade possible?
• Which measure of central tendency best reflects Sam’s actual test performance?
• Are there any outliers in his test scores?
• A statistician randomly selected twelve 7th grade students and asked them how much time they spend each night on homework. The responses were:
• 0 mins, 20 mins, 15 mins
• 1 hour, 30 mins, 45 mins
• 15 mins, 0 mins, 15 mins
• 30 mins, 1 hour, 1 hour 10 mins
• What is the average amount of time these students spent on homework?
• Does your answer reflect the mean, the median, or the mode? Explain how you know.
• If you had found a different measure of central tendency, would you expect your answer to be the same or different? Explain.
• If a 7th grader spends 15 hours per day at home, what percent of home time does the “average” student spend on homework?
Measures of Variability
• Measures of variability attempt to describe how spread out or clustered a set of numbers is.
• The two most common measures of variability are:
• Range (easy)
• Interquartile Range (complicated)
• Range is used quite often; interquartile range is really only seen when creating a box-and-whisker plot.
Range
• Range is quite simply the difference between the largest value and smallest value in a numerical data set.
• Code word: difference = subtraction
• EX. 12, 15, 19, 21, 41, 67
• The range is the largest value (67) minus the smallest value (12), which equals 55.
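The same subtraction, written as a quick Python check using the example values above:

```python
data = [12, 15, 19, 21, 41, 67]  # example values from the slide

# Range = largest value minus smallest value.
data_range = max(data) - min(data)
print(data_range)  # 67 - 12 = 55
```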
Interquartile Range
• Yes, it is as complicated as it sounds.
• First, what is a quartile?
• Think quad, which means four.
• Ok, so 4 of what?
• Quartile refers to one of 3 numbers that together break a set of data into 4 even sections.
• Quartile – one of the 3 numbers that divide an ordered distribution into 4 equal sections
Interquartile Range
• Let's see these quartiles in action!
• Step 1: Put a set of numbers in order
• 13, 15, 16, 18, 22, 25, 26
• Step 2: Find the median
• 13, 15, 16, 18, 22, 25, 26 (the median is 18)
• This separates the data into two sections; exclude the median.
• [13, 15, 16] 18 [22, 25, 26]
• The median is now called the Second Quartile or Q2.
Interquartile Range
• Step 3: Find the median of the set of numbers less than Q2.
• [13, 15, 16] 18, 22, 25, 26
• 13, 15, 16
• The middle number of this section, 15, is now called the First Quartile or Q1.
• Step 4: Find the median of the set of numbers greater than Q2.
• 13, 15, 16, 18, [22, 25, 26]
• 22, 25, 26
• The middle number of this section, 25, is now called the Third Quartile or Q3.
Interquartile Range
• Step 5: Find the distance between the Third Quartile and the First Quartile
• (Q3 – Q1)
• 13, 15, 16, 18, 22, 25, 26
• Q1 = 15, Q2 = 18, Q3 = 25
• (25 – 15) = 10
• This value is the interquartile range!
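Steps 1 through 5 can be sketched in Python as follows. The function names `median` and `quartiles` are made up for illustration. Splitting an even-sized data set straight down the middle is an assumption, since the slides only show an odd-sized example, and other tools may calculate quartiles in a slightly different way.

```python
def median(ordered):
    """Middle value of an already ordered list
    (average of the two middle values if the count is even)."""
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3). With an odd number of values,
    the median is excluded from both halves, as in the steps above."""
    ordered = sorted(values)          # Step 1: put the numbers in order
    n = len(ordered)
    q2 = median(ordered)              # Step 2: find the median (Q2)
    lower = ordered[: n // 2]         # the section less than Q2
    upper = ordered[(n + 1) // 2 :]   # the section greater than Q2
    q1 = median(lower)                # Step 3: median of the lower section
    q3 = median(upper)                # Step 4: median of the upper section
    return q1, q2, q3

data = [13, 15, 16, 18, 22, 25, 26]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)  # 15 18 25
print(q3 - q1)     # Step 5: 25 - 15 = 10
```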

Interquartile Range
• So why did we do all of that work?
• What does a range tell us?
• All values fall between the smallest and largest value……..well duh!!!
• What does the interquartile range tell us?
• Half (50%) of all values fall between the first and third quartiles.
• The interquartile range reflects the real “heart” of the data set.