1 / 75

Chapter 7: Statistical Analysis with Excel

Spreadsheet-Based Decision Support Systems. Chapter 7: Statistical Analysis with Excel. Prof. Name name@email.com Position (123) 456-7890 University Name. Overview. 7.1 Introduction 7.2 Understanding Data

danton
Download Presentation

Chapter 7: Statistical Analysis with Excel

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Spreadsheet-Based Decision Support Systems Chapter 7: Statistical Analysis with Excel Prof. Name name@email.com Position (123) 456-7890 University Name

  2. Overview • 7.1 Introduction • 7.2 Understanding Data • 7.3 Relationships in Data • 7.4 Distributions • 7.5 Summary

  3. Introduction • Performing basic statistical analysis of data using Excel functions • Statistical features of the Data Analysis ToolPak • Trend curves for analyzing data patterns • Basic linear regression techniques in Excel • Several different distribution functions in Excel

  4. Understanding Data • Statistical Functions • Descriptive Statistics • Histograms

  5. Statistical Functions • AVERAGE • Finds the mean of a set of data. • =AVERAGE(range or range_name) • MEDIAN • Finds the middle number in a list of sorted data. • =MEDIAN(range or range_name) • STDEV • Finds the standard deviation of a set of data. • This is equal to the square root of the variance, which measures the difference between the mean of the data set and the individual values. =STDEV.P(range or range_name) =STDEV.S(range or range_name)

  6. Figures 7.1 and 7.2

  7. Figures 7.3 and 7.4

  8. Data Analysis ToolPak • An Excel Add-In which includes several statistical analysis techniques • To ensure that it is an active Add-in: • Display Excel Options dialog box: • Select Options from the list of options in the File tab. • Select the Add-Ins tab on the left side of the dialog box. • Select Analysis ToolPak listed on the Add-ins window.

  9. Descriptive Statistics • Provides a list of statistical information about your data set including • Mean • Median • Standard deviation • Variance • Click on Data > Analysis > Data Analysis command to display the Data Analysis dialog box. • Choose the Descriptive Statistics option and click OK.

  10. Descriptive Statistics (cont’d) • The Input Range refers to the location of the data set. • Check the option button Columns or Rows to indicate how your data is grouped. • If there are labels in the first row of each column of data, then check the Labels in First Row box. • The Output Range refers to where the results of the analysis will be displayed in the current worksheet. • Check the Summary Statistics box to calculate the most commonly used statistics.

  11. Figure 7.7 • Quarterly stock returns for three different companies are recorded. We want to know • Average stock return • Variability of stock returns • Which quarters had the highest and lowest stock returns

  12. Figures 7.8 and 7.9

  13. Figure 7.10 • Almost all of the data points lie between +2s and –2s from the mean. • Outliers are data that are inconsistent with the main pattern of data.

  14. Figure 7.11 • The standard deviation is used to identify outliers in a data set.

  15. Figure 7.12 • Conditional Formatting with the Formula Is option is used to identify outliers. • Select the column of values in the data set; and fill in the Conditional Formatting dialog box to highlight outlier points.

  16. Figure 7.13 • The cell that holds an outlier is highlighted.

  17. More Descriptive Statistics • Confidence Level for Mean • The mean is calculated using the specified confidence level (for example, 95% or 99%), the standard deviation, and the size of the sample data. • The confidence level and calculated mean are then added to the analysis report. • You can compare the actual mean to this calculated mean based on the specified confidence level. • Kth Largest • Gives the largest ranked data value for a specified value of k. • For k = 1, the maximum data value would be returned. • Kth Smallest • Gives the smallest ranked data value for a specified value of k. • For k = 1, the minimum data value would be returned.

  18. Descriptive Statistics Functions • PERCENTILE.INC • Returns a value for which a desired percentile k of the specified data_set falls below. = PERCENTILE.INC(data_set, k) • For example, for the MSFT data, the value for which 95% of the data falls below is =PERCENTILE.INC(B4:B27,0.95) = 0.108 • PERCENTILE.EXC • Excludes the value of k-th percentile from the calculations = PERCENTILE.EXC (data_set, k) • For the MSFT data, the value for which 95% of the data falls below is =PERCENTILE.EXC(B4:B27,0.95) = 0.135

  19. Descriptive Statistics Functions (cont’d) • PERCENTRANK.INC • Returns the percentile of the data_set which falls below a given value. =PERCENTRANK.INC(data_set, value) • For example, percent of the MSFT data falls below the value 0.108, inclusive of 0.108 is =PERCENTRANK.INC(B4:B27, 0.108) = 0.95, or 95% • PERCENTRANK.EXC • Calculates the same percentile, exclusive of the value of k. =PERCENTRANK.EXC(data_set, value) • For example, percent of the MSFT data falls below the value 0.135, exclusive of 0.135 is =PERCENTRANK.EXC(B4:B27, 0.135) = 0.95, or 95%

  20. Histograms • Histograms calculate the number of occurrences, or frequency, with which values in a data set fall into various intervals. • Choose the Histogram option from the Analysis ToolPak list.

  21. Histograms (cont’d) • The Input Range is the range of the data set. • The Bin Range is used to specify the location of the bin values. • Bins are the intervals into which values can fall; they can be defined by the user or can be evenly distributed among the data by Excel. • The Output Range is the location of the output, or the frequency calculations for each bin. • The chart options include a simple Chart Output (the actual histogram), CumulativePercentage for each bin value, and a Pareto organization of the chart.

  22. Figures 7.15 and 7.16

  23. Figures 7.17 and 7.18 • To create your own bin values, make a list of upper bounds for each interval.

  24. Figure 7.19

  25. Histograms (cont’d) • To change the format of a Histogram: • Click on the histogram to activate the Chart Tools contextual tabs. • Use the commands listed on these tabs to change the design, layout and format of the histogram.

  26. Histograms (cont’d) • There are four basic shapes to a histogram: • Symmetric: has peaks and dips with equal amplitude • A curve with only one peak is also symmetric; that is, there is a central high part and almost equal lower parts to the left and right of this peak. • Positively skewed: has a peak on the left and many lower points (stretching) to the right. • Negatively skewed: has a peak on the right and many lower points (stretching) to the left. • Multiple peaks: imply that more than one source, or population, of data is being evaluated.

  27. Relationships in Data • Trend Curves • Regression

  28. Data Relationships • Relationships in data are usually identified by comparing two variables: the dependent variableand theindependent variable. • The dependent variable is the variable we are most interested in. By understanding its current behavior we can better predict its future behavior. • The independent variable is the variable we use as the comparison in order to make this prediction.

  29. Trend Curves • Trend curves are used to graph and analyze these relationships between data. • Trend curves graph the data with • The independent variable on the x-axis • The dependent variable on the y-axis • To add a trend curve to your chart: • Click on the data points in an XY Scatter chart to activate Chart Tools contextual tabs. • Click on the Chart Tools Layout > Analysis > Trendline command. • Select a trend curves from the trendlines options listed.

  30. Trend Curves (cont’d) • There are six types of trend curves which Excel can model: • Exponential • Linear • Logarithmic • Polynomial • Power • Moving Average

  31. Trend Curves (cont’d) • Double click on a trendline to activate the Format Trendline dialog box. • We can modify: • The type of the trendline by selecting one of the options listed. • The trendline’s name. • We can specify a period forward or backward for which we want to predict the behavior of our dependent variable.

  32. Linear Trend Curves • Number of Units Produced each month and the corresponding Monthly Plant Cost are recorded. • The company needs to estimate plant costs based on the planned production amounts. • The dependent variable is therefore the Monthly Plant Cost and the independent variable is the Units Produced.

  33. Figure 7.25 • Begin this analysis by making an XY Scatter chart of the data.

  34. Figure 7.26 • Right-click on any of the data points and choose Add Trendline from the short-cut menu. • The Format Trendline dialog box appears. • Select Linear from the Types listed. • Select Display Equation on Chart checkbox.

  35. Figure 7.27 • The trendline and the equation are then added to the chart.

  36. Figure 7.28 • Use the displayed equation to predict future values. • Check the accuracy of the equation by calculating the error from the known data. • Linear trends have the relationship: y = a*x - b

  37. Figure 7.29 • Copy the formula for “Predicted Cost” to the rest of the rows to calculate the predicted monthly costs.

  38. Exponential Trend Curves • Sales data for ten years is recorded. • We want to predict sales for the next few years. • The independent variable is Years and our dependent variable is Sales.

  39. Figure 7.31 • Exponential trends have the following relationship: • y = a*e^(b*x)or • y = a*EXP(b*x) • Build a XY Scatter chart of the data. • Right-click on a data point to add the trendline. • Choose the Exponential curve to fit the data.

  40. Figures 7.32 and 7.33

  41. Figure 7.34 • We use the formula to predict sales values for future years. • However, the Exponential trend curve has a sharply increasing slope that may not be accurate for many situations.

  42. Power Trend Curves • We are given yearly Production values and yearly Unit Cost for production. • We want to determine the relationship between Unit Cost and Production in order to predict future Unit Costs.

  43. Figure 7.36 • Power trends have the relationship: y = a*x^b • Begin by creating the XY Scatter chart. • Right-click on a data point to add a trendline. • Choose a Power curve to fit the data.

  44. Figures 7.37 and 7.38

  45. Regression Analysis • We can use some regression analysis parameters to ensure that the relationships we have chosen for our data are “good” fits. • These parameters include • R-Squared value • Standard error • Slope • Intercept

  46. R-Squared Value • The R-Squared value measures the amount of influence the independent variable has on the dependent variable. • The closer the R-Squared value is to 1, the stronger the relationship is between the independent and dependent variables. • If the R-Squared value is closer to 0, then there may not be a relationship between these two variables.

  47. Figure 7.39 • We fit a Linear trendline to the Monthly Plant Cost per Units Produced chart. • The R-Squared value is 0.8137, which is fairly close to 1, implying a good fit.

  48. Figure 7.40 • We fit an Exponential trendline to the Sales per year chart. • The R-Squared value is 0.9828, which is fairly close to 1, implying a sound fit.

  49. Figure 7.41 • We fit a Power trendline to the Unit Cost per Cumulative Production chart. • The R-Squared value is 0.9062, which is fairly close to 1, implying a good fit.

  50. Figure 7.42 • The RSQ Excel function can calculate the R-squared value from a set of data. • =RSQ(y_range, x_range) • Note that this function only works with Linear trend curves.

More Related