Drawback of Using Frequency Distribution • Frequency of occurrence is the foundation for much of statistical analysis because it provides a meaningful arrangement of observed data. • But there are some drawbacks to basing evaluations directly on the number of data points found in each class interval. • It is clumsy to compare groups of different size in terms of straight tallies.
Drawback of Using Frequency Distribution • For example, knowing that there are 254 electrical engineering majors at one university and only 154 at a second says little about the relative importance of that concentration in the two engineering schools unless the respective total enrollments are also included in the comparison. • The first school has 1547 engineering students, so the proportion, or relative frequency, of electrical engineers is 254/1547 = .164. • The second university has 655 engineering students, and the relative frequency of the electrical concentration is much higher: 154/655 = .235. • Thus electrical engineering is more dominant in the second institution. • The foregoing suggests that it may be helpful to divide each of the original frequencies by the sample size, expressing the distribution in terms of relative frequencies.
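The comparison of the two schools can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name `relative_frequency` is an illustrative choice, and the totals are the ones that appear in the fractions above.

```python
def relative_frequency(count: int, total: int) -> float:
    """Return count / total, the share of the group that falls in the class."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total


# Figures taken from the fractions in the example above.
school_1 = relative_frequency(254, 1547)
school_2 = relative_frequency(154, 655)

print(f"School 1: {school_1:.3f}")
print(f"School 2: {school_2:.3f}")
print("Larger share at school 2:", school_2 > school_1)
```

The raw tally is larger at the first school, but the share is larger at the second, which is the point of the example.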
Relative Frequency Distribution Relative frequency: The ratio of the frequency of a class to the total number of observations. Relative-frequency distribution: A listing of all classes along with their relative frequencies. Relative-frequency histogram: A graph that displays the classes on the horizontal axis and the relative frequencies of the classes on the vertical axis. The relative frequency of each class is represented by a vertical bar whose height is equal to the relative frequency of the class.
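As a rough illustration of these definitions, the sketch below turns a frequency distribution into a relative-frequency distribution and prints a crude text version of the histogram. The class labels and counts are hypothetical values chosen for the example, not data from these slides, and `relative_frequency_distribution` is an illustrative name.

```python
def relative_frequency_distribution(frequencies: dict) -> dict:
    """Divide each class frequency by the total number of observations."""
    total = sum(frequencies.values())
    if total == 0:
        raise ValueError("no observations")
    return {label: count / total for label, count in frequencies.items()}


# Hypothetical class frequencies (not from the slides).
example = {"10-19": 4, "20-29": 10, "30-39": 6}
dist = relative_frequency_distribution(example)

# Text stand-in for a relative-frequency histogram:
# bar length is proportional to the relative frequency of the class.
for label, rel in dist.items():
    bar = "#" * round(rel * 40)
    print(f"{label:>6}  {rel:.2f}  {bar}")
```

Whatever the counts, the relative frequencies always sum to 1, which is what makes groups of different size comparable.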
HISTOGRAM Consider the following data that shows days to maturity for 40 short-term investments
RELATIVE FREQUENCY HISTOGRAM Relative-frequency distribution for the days-to-maturity data
RELATIVE FREQUENCY HISTOGRAM [Figure: relative-frequency histogram of the days-to-maturity data. Horizontal axis: Number of Days to Maturity (40 to 100). Vertical axis: Relative Frequency (0.00% to 30.00%).]
CUMULATIVE RELATIVE FREQUENCY GRAPH • When plotted on a graph, the cumulative relative frequency distribution gives another visual summary of the sample. • A cumulative relative frequency graph (ogive) represents the cumulative relative frequencies for the classes in a frequency distribution. • Each dot is plotted directly above the upper class limit at a height equal to the cumulative relative frequency for that interval.
OGIVE: CUMULATIVE RELATIVE FREQUENCY GRAPH [Figure: ogive for the days-to-maturity data. Horizontal axis: Number of Days to Maturity (40 to 100). Vertical axis: Cumulative Frequency (0.000 to 1.000). Labeled points, in increasing order: 0.075, 0.100, 0.300, 0.550, 0.725, 0.900, 1.000.]
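The points of such a graph can be computed with a running total. The sketch below assumes hypothetical class frequencies and upper class limits (neither comes from the slides) and pairs each upper limit with the cumulative relative frequency reached at that limit; `cumulative_relative_frequencies` is an illustrative name.

```python
from itertools import accumulate


def cumulative_relative_frequencies(frequencies: list) -> list:
    """Running total of the class frequencies, divided by the sample size."""
    total = sum(frequencies)
    if total <= 0:
        raise ValueError("no observations")
    return [running / total for running in accumulate(frequencies)]


# Hypothetical class frequencies and upper class limits (not from the slides).
counts = [2, 5, 3]
upper_limits = [10, 20, 30]

# Each point: (upper class limit, cumulative relative frequency).
points = list(zip(upper_limits, cumulative_relative_frequencies(counts)))
for limit, height in points:
    print(f"x = {limit}, height = {height:.3f}")
```

Accumulating the integer counts first and dividing once at the end keeps the last point at exactly 1, which a running sum of rounded relative frequencies does not always guarantee.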
PIE CHARTS • A pie chart is the most popular graphical method for summarizing qualitative data • A pie chart is a circle subdivided into a number of slices • Each slice represents a category • The angle allocated to a slice is proportional to the proportion of times the corresponding category is observed • Since the entire circle corresponds to 360°, every 1% of the observations corresponds to 0.01 × 360° = 3.6°
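The angle rule can be sketched as follows. The category names and counts are hypothetical, chosen so that the arithmetic is easy to follow, and `slice_angles` is an illustrative name.

```python
def slice_angles(counts: dict) -> dict:
    """Angle in degrees for each category: (share of observations) x 360."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("no observations")
    return {category: 360.0 * n / total for category, n in counts.items()}


# Hypothetical category counts (not from the slides).
counts = {"A": 50, "B": 30, "C": 20}
angles = slice_angles(counts)

for category, angle in angles.items():
    print(f"{category}: {angle:.1f} degrees")
```

With 100 observations in total, each observation is 1% of the sample, so category C with 20 observations receives 20 × 3.6° = 72°.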
CHOICE OF A CHART • Pie chart • Small / intermediate number of categories • Cannot show order of categories • Emphasizes relative values e.g., frequencies • Bar chart • Small / intermediate/large number of categories • Can present categories in a particular order, if any • Emphasizes relative values e.g., frequencies
CHOICE OF A CHART • Line chart • Small/intermediate/large number of categories • Can present categories in a particular order, if any • Emphasizes trend, if any
SCATTER DIAGRAMS • Often, we are interested in two variables. For example, we may want to know the relationship between • advertising and sales • experience and time required to produce a unit of a product
SCATTER DIAGRAMS • Scatter diagrams show how two variables are related to one another • To draw a scatter diagram, we need paired observations of two variables • Label one variable x and the other y • Each pair of values of x and y constitutes a point on the graph
SCATTER DIAGRAMS • In some cases, the value of one variable may depend on the value of the other variable. For example, • sales depend on advertising • time required to produce a unit of a product depends on the number of units produced before • In such cases, the first variable is called the dependent variable and the second variable is called the independent variable. For example: advertising (independent) and sales (dependent); number of units produced (independent) and production time/unit (dependent)
SCATTER DIAGRAMS • Usually, the independent variable is plotted on the horizontal axis (x axis) and the dependent variable on the vertical axis (y axis) • Sometimes, two variables show some relationship • positive relationship: the two variables move together, i.e., one variable increases (or decreases) whenever the other increases (or decreases). Example: advertising and sales. • negative relationship: one variable increases (or decreases) whenever the other decreases (or increases). Example: number of units produced and production time/unit
SCATTER DIAGRAMS • The relationship between two variables may be linear or nonlinear. For example, • the relationship between advertising and sales may be linear. • the relationship between number of units produced and the production time/unit may be nonlinear.
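One rough way to check the direction of a relationship numerically is to look at the sign of the summed cross-products of deviations from the means. This technique is not part of these slides; it is swapped in here only as a sketch, and all data values below are hypothetical.

```python
def relationship_direction(xs: list, ys: list) -> str:
    """Classify paired data as 'positive', 'negative', or 'none'.

    Uses the sign of sum((x - mean_x) * (y - mean_y)); this is an
    assumption of the sketch, not a method stated in the slides.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if co_movement > 0:
        return "positive"
    if co_movement < 0:
        return "negative"
    return "none"


# Hypothetical data: x is the independent variable, y the dependent one.
advertising = [1, 2, 3, 4, 5]
sales = [12, 15, 19, 22, 28]
units_produced = [1, 2, 4, 8, 16]
time_per_unit = [10.0, 8.0, 6.4, 5.1, 4.1]

print(relationship_direction(advertising, sales))
print(relationship_direction(units_produced, time_per_unit))
```

The sign only indicates direction. It says nothing about whether the relationship is linear or nonlinear, which still has to be judged from the scatter diagram itself.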
Problems to be solved in class 2.16 (textbook) The cumulative relative frequency distribution for the depth (in feet) of oil well shafts in a particular region is given here: Altogether there are 700 wells in the region. • (a) Make a table for the relative frequency distribution • (b) Determine the original frequency distribution • (c) Using the answer of part (b), determine the cumulative frequency distribution
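The three parts follow one pattern: difference the cumulative relative frequencies, multiply by the sample size, then accumulate. Since the table for problem 2.16 is not reproduced here, the sketch below runs that pattern on the seven cumulative relative frequencies labeled on the days-to-maturity ogive, with n = 40 investments; for problem 2.16 the same function would be called with n = 700. The function name `from_cumulative_relative` is an illustrative choice.

```python
from itertools import accumulate


def from_cumulative_relative(cum_rel: list, n: int):
    """Recover relative, absolute, and cumulative frequencies."""
    if not cum_rel:
        raise ValueError("need at least one class")
    # (a) relative frequency = difference of successive cumulative values
    rel = [cum_rel[0]] + [b - a for a, b in zip(cum_rel, cum_rel[1:])]
    # (b) frequency = relative frequency x sample size, rounded to a count
    freq = [round(r * n) for r in rel]
    # (c) cumulative frequency = running total of the frequencies
    cum_freq = list(accumulate(freq))
    return rel, freq, cum_freq


# Values labeled on the days-to-maturity ogive, in increasing order.
cum_rel = [0.075, 0.100, 0.300, 0.550, 0.725, 0.900, 1.000]
rel, freq, cum_freq = from_cumulative_relative(cum_rel, 40)

print([round(r, 3) for r in rel])
print(freq)
print(cum_freq)
```

A quick sanity check on any answer is that the recovered frequencies add up to the sample size and the last cumulative frequency equals it.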