Statistics

By:

M. Yasir Ali

Umer Saeed

Luqman Bashir

Waqas Hussain

Ahsan Raza

- Meaning of Statistics
- Observations and Variable
- Collection of Data

- Meaning of Statistics: Statistics is concerned with scientific methods for collecting, organizing, summarizing, presenting and analyzing data, as well as deriving valid conclusions and making reasonable decisions on the basis of this analysis. In short, it deals with the systematic collection of numerical data and its interpretation. The word ‘statistics’ is used to refer to:
- 1. Numerical facts, such as the number of people living in a particular area.
- 2. The study of ways of collecting, analyzing and interpreting the facts.

- The word “statistics” comes from the Latin word status, meaning a political state. It originally meant information useful to the state, for example information about the sizes of populations and armed forces.
- The word statistics also refers to “numerical facts systematically arranged”.
- Uses of statistical information: to inform the public; to explain things that have happened; to justify a claim; to provide general comparisons.

- In statistics, an observation means any sort of numerical recording of information, or a classification such as head or tail.
- A characteristic that varies from one individual or object to another is called a variable. For example, age is a variable because it varies from person to person.
- A quantitative variable may be classified as discrete or continuous.
- A discrete variable is one that can take only a discrete set of integers or whole numbers, that is the values are taken by jumps or breaks.
- A variable is called a continuous variable if it can take on any value—fractional or integer—within a given interval.

- The most important part of statistical work is perhaps the collection of data.
- Statistical data is collected either by a complete enumeration of the whole field, called a census, which in many cases would be too costly and too time consuming, or by a partial enumeration, called a sample.
- Data that have been originally collected (raw data) and have not undergone any sort of statistical treatment are called primary data.
- Data that have undergone any sort of treatment by statistical methods at least once, i.e. data that have been collected, classified, tabulated or presented in some form for a certain purpose, are called secondary data.
- Editing of data, uses and misuses of data.

- Data and Data Types
- Representing Data: pie chart, bar chart.
- Summarizing Data: box plot, histogram
- Central tendency
- Spread
- Distribution (shape)

Pizza Sales by Type

What do the data tell you?

How can you use the information?

What additional information would make these data more informative?

- Quantitative
- Discrete = count: Number of car accidents by city by time
- Continuous = measurement: Housing prices

- Qualitative
- Categorical: Shopping mall, car brand, trip mode
- Ordinal: Survey data on attitudes; “How do you feel about…?”
Strongly disagree, Disagree, Neutral, Agree, Strongly agree

Moody’s bond ratings: Aaa, Aa, A, Baa, Ba, B, and so on.

Ordered Qualitative Data

German Health Satisfaction Survey; 27,326 individuals. On a scale from 0 to 10, how do you feel about your health?

Qualitative Data:

No units of measurement

Arithmetic manipulation is usually meaningless. The average of Air and Bus is not Train

Quantitative Data:

Units of measurement make sense. Arithmetic computations make sense.

- Statistics is concerned with the collection and analysis of data.
- There are several different types of statistical studies that are used to collect data.
- Let's take a look at surveys, experimental studies and observational studies.

- 1. Survey - Statistical surveys are used to collect quantitative information from a specific population. A survey may focus on opinions or factual information depending upon the purpose of the study. Surveys may involve answering a questionnaire or being interviewed by a researcher. The census is a type of survey.

- 2. Experimental study - In an experimental study, the researcher first measures, or surveys, the sample population. The researcher then manipulates the sample population in some manner. After the manipulation, the researcher re-measures, or re-surveys, using the same procedures to determine whether the manipulation changed the measurements.
- During a "controlled" experiment, the researcher will separate the sample population into groups with one group established as the control group. All groups will be manipulated in some manner, except for the control group which will remain the same.

- 3. Observational study - In an observational study, the sample population being studied is measured, or surveyed, as it is. The researcher does not influence the population in any way or attempt to intervene in the study. There is no experimental manipulation. Instead, data is simply gathered and correlations are investigated.

- In raw form
- Transformed to a visual form
- Summarized graphically
- Summarized statistically

Pizza Pies Sold, by Type

Bar Chart vs. Pie Chart

Same data. Which is easier to understand?

Hawaii

Box and Whisker Plot for House Price Listings

Maximum = 31136

3rd Quartile = 24933

Median = 22610

1st Quartile = 21677

Minimum = 17043

Interquartile Range (IQR) = 24933 - 21677 = 3256
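
The five values in a box and whisker plot can be computed directly from a list of numbers. The sketch below is only an illustration: the `prices` list is invented (the slide does not give the raw listings), the function name `five_number_summary` is hypothetical, and the "inclusive" quartile convention is an assumption, since other conventions give slightly different quartiles.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # method="inclusive" is one of several quartile conventions (an assumption here).
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

# Hypothetical listing prices, invented for illustration only.
prices = [17, 20, 21, 22, 23, 24, 25, 26, 31]
minimum, q1, median, q3, maximum = five_number_summary(prices)
iqr = q3 - q1  # interquartile range = Q3 - Q1

# The same subtraction applied to the quartiles quoted on the slide.
slide_iqr = 24933 - 21677

print(minimum, q1, median, q3, maximum, iqr)
print(slide_iqr)
```

The box spans Q1 to Q3, so its height is the IQR; the whiskers reach toward the minimum and maximum.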

A histogram describes the sample data and suggests the nature of the underlying data generating process. Note the “skewness” of the distribution of listings.

HOG, pp. 16-18

Asymmetry (skewness) in the histogram of listing prices shows up in the box and whisker plot. Note the long whisker at the top of the figure.

Graphical tools can be very badly behaved when:

(1) The data have only a few observations.

(2) There are wild observations in the data set.

The box and whisker plot is distorted (and dominated) by one wildly errant observation.

- Often it is not possible to list all the data or draw a histogram; it would be nice to have one number which best represents a data set.
- Often the question of interest is where the data lie, and a measure of location answers it.

Measure of location

- Mean – arithmetic average = Σx / n (the sum of all values divided by the number of values)
- Median – the halfway point of the sorted values
- Mode – the most common value
- If each value occurs exactly once, every value in the list is a mode, since all are equally "most common."

Measure of location

Data: 1, 2, 2, 2, 3, 3, 4, 4, 27

Mean = 5.3

Median = 3

Mode = 2
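
As a quick check, the three measures for the data set 1, 2, 2, 2, 3, 3, 4, 4, 27 can be reproduced with Python's standard `statistics` module. This is a minimal sketch; the second list, `all_unique`, is invented to illustrate the case where every value is a mode.

```python
import statistics

data = [1, 2, 2, 2, 3, 3, 4, 4, 27]

mean = statistics.mean(data)        # sum of the values divided by their count
median = statistics.median(data)    # middle value of the sorted list
modes = statistics.multimode(data)  # every value tied for most frequent

print(round(mean, 1), median, modes)

# When each value occurs exactly once, every value is a mode.
all_unique = [5, 8, 13]  # hypothetical values
print(statistics.multimode(all_unique))
```

The single large value 27 pulls the mean above the median, while the median and mode stay among the typical values.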

Measure of location

Data: 0, 1, 2, 3, 4, 5, 6, 27, 27

Mean = 8.3

Median = 4

Mode = 27

Measure of location

Data: 1, 1, 1, 2, 2, 23, 24, 26, 27

Mean = 11.9

Median = 2

Mode = 1

Measure of Variability

- Range – Overall difference between the highest and lowest scores.
- SET OF SCORES:
7, 2, 7, 6, 5, 6, 2

RANGE = 7 - 2 = 5

- Variance – Average squared difference from the mean.
- CALCULATED BY SQUARING THE STANDARD DEVIATION (S²)
- STANDARD DEVIATION = S = 4
- VARIANCE = S² = 4² = 16
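
The range and variance of the set of scores 7, 2, 7, 6, 5, 6, 2 can be worked out as follows. This sketch assumes the population form of the variance (dividing by n); the sample form divides by n - 1 and gives a slightly larger value.

```python
import statistics

scores = [7, 2, 7, 6, 5, 6, 2]

# Range: highest score minus lowest score.
value_range = max(scores) - min(scores)

mean = statistics.mean(scores)
# Variance: average squared deviation from the mean (population form, an assumption).
variance = sum((x - mean) ** 2 for x in scores) / len(scores)
# Standard deviation: square root of the variance.
std_dev = variance ** 0.5

print(value_range, mean, variance, std_dev)
```

For these scores the standard deviation is 2 and the variance is 2² = 4, the same squaring relationship as the S = 4, S² = 16 illustration.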

Variability

Identical Range

[Figure: data sets with identical range]

Variability

Identical Variance

[Figure: data sets with identical variance]

Conclusions

- 1) Statistics are useful for distinguishing random noise from real effects
- 2) Numbers are not absolute, and they can be easily manipulated
- 3) Always scrutinize data closely, and draw your own conclusions.
- 4) 85% of all statistics are made up on the spot: the rest are all wrong

Frequency Distribution

- A frequency distribution is a table that organises data into classes
- A class is a group of values describing ONE characteristic of the data

- It shows the number of observations from the data that fall into each class
- Frequency distribution can be constructed by determining how often ('with what frequency') values occur inside each class of a data set

- Fewer classes mean more data compression

- Frequency of each value can be expressed as a fraction or percentage of the total number of observations
- This could help us compare data from samples that are of different sizes
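
A frequency distribution with relative frequencies can be sketched in a few lines. The `sample` data (number of children per household) is invented for illustration, and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

def frequency_table(observations):
    """Map each distinct value to (frequency, relative frequency)."""
    counts = Counter(observations)
    total = len(observations)
    return {value: (count, count / total)
            for value, count in sorted(counts.items())}

# Hypothetical sample: number of children in each of 10 households.
sample = [0, 1, 1, 2, 2, 2, 3, 0, 1, 2]
table = frequency_table(sample)

for value, (count, share) in table.items():
    print(value, count, f"{share:.0%}")
```

Because relative frequencies are fractions of the total, tables built from samples of different sizes can be compared directly.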

- DISCRETE : In this case, the data in a class can take ONE discrete value :
- 0, 1, 2, ...

- CONTINUOUS : In this case, the data in a class can take any value in a range
- > 0; <= 1
- > 1; <= 2
- > 2; <= 3
- And so on

- Discrete Classes can also be used to model Qualitative Classes
- Where the data do not take specific numerical values but fall into qualitative, that is non-numeric, categories

- Continuous classes cannot have qualitative data
- Unless you want to prove a point !!

- All Inclusive
- All the data must fall into one class or another
- Sum of relative frequencies must add up to 1

- Mutually Exclusive
- Greater Than ( > ) Lower Class Boundary
- Less Than OR Equal to ( <=) Upper Class Boundary
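
The two rules above can be illustrated with classes of the form "greater than the lower boundary, less than or equal to the upper boundary". In this sketch the `boundaries` and `measurements` are invented, and `classify` is a hypothetical helper; a value that fits no class raises an error, which enforces the all-inclusive rule.

```python
def classify(value, boundaries):
    """Return the index of the class (lower, upper] containing value, or None."""
    for index in range(len(boundaries) - 1):
        lower, upper = boundaries[index], boundaries[index + 1]
        # Strictly greater than lower, at most upper: a value on a shared
        # boundary belongs to exactly one class (mutually exclusive).
        if lower < value <= upper:
            return index
    return None

boundaries = [0, 1, 2, 3]                      # classes (0,1], (1,2], (2,3]
measurements = [0.4, 1.0, 1.2, 2.0, 2.7, 3.0]  # hypothetical data

counts = [0] * (len(boundaries) - 1)
for m in measurements:
    index = classify(m, boundaries)
    if index is None:
        raise ValueError(f"{m} falls outside every class")
    counts[index] += 1

relative = [c / len(measurements) for c in counts]
print(counts, relative)
```

Note that 1.0 is counted in the class (0, 1] and not in (1, 2], so no observation is counted twice.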

- First and Last Class open ended
- 0 count for ratings <= 10
- 1 count for ratings > 90 and <= 100
- 0 count for ratings > 100

- Decide on Type of Class
- Quantitative or Qualitative measure ?

- Decide on Number of Classes
- More classes : give more information
- Fewer classes : easier to interpret
- Rule of Thumb : Between 6 and 15 classes

- Determine width of class interval
Width = ([Largest Value] – [Unit Value before Smallest Value]) / [Total Number of Class Intervals]

- Determine the number of points in each class
- Illustrate the data in a chart
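
The steps above can be sketched as a short program. Everything specific here is an assumption for illustration: the `ratings` data is invented, the unit of measurement is taken to be 1, six classes are chosen from the rule of thumb, the width is rounded up to a whole number, and the function names are hypothetical.

```python
import math

def class_width(values, num_classes, unit=1):
    """(largest - unit value before smallest) / number of classes, rounded up."""
    start = min(values) - unit
    return math.ceil((max(values) - start) / num_classes)

def build_distribution(values, num_classes, unit=1):
    """Return a list of (lower, upper, count) for classes of the form (lower, upper]."""
    width = class_width(values, num_classes, unit)
    start = min(values) - unit
    counts = [0] * num_classes
    for v in values:
        # Index of the class with lower < v <= upper.
        index = math.ceil((v - start) / width) - 1
        counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, counts[i])
            for i in range(num_classes)]

# Hypothetical ratings, invented for illustration.
ratings = [12, 25, 37, 41, 48, 55, 63, 67, 72, 95]
distribution = build_distribution(ratings, num_classes=6)

# A simple text chart: one star per observation in the class.
for lower, upper, count in distribution:
    print(f"({lower}, {upper}]  {'*' * count}")
```

Starting one unit below the smallest value means the smallest observation falls inside the first class even though the lower boundary itself is excluded.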

- Functions used
- Max
- Min
- Roundup
- Round down
- Sum
- Frequency

- Grouped data is data that has been organized into groups known as classes. Grouped data has been 'classified' and thus some level of data analysis has taken place, which means that the data is no longer raw.
- A data class is a group of data related by some user-defined property. For example, if you were collecting the ages of the people you met as you walked down the street, you could group them into classes such as those in their teens, twenties, thirties, forties and so on. Each of those groups is called a class.

- Each of those classes is of a certain width, referred to as the Class Interval or Class Size. The class interval is very important when it comes to drawing histograms and frequency diagrams. All the classes may have the same class size, or they may have different class sizes, depending on how you group your data. The class interval is usually chosen to be a whole number.

- Below is an example of grouped data where the classes have the same class interval.

- Ungrouped data is data that has not been organized into groups.
- Ungrouped data looks like a big list of numbers.

QUESTIONS OR COMMENTS??

THANK YOU!!