
Z519: Information Analytics

Social Statistics: Introduction

Social Statistics
• Statistics is a set of tools and techniques for describing, organizing, and interpreting information or data.
• Do we need statistics? When and why?
Why we need statistics
• Everybody relies on data in one way or another:
• corporate presidents decide company policy based on quarterly sales figures
• politicians decide on campaign strategy based on polls
• teachers decide grading curves based on a bell curve
• you and I decide whether to smoke or not based on health records of other people
• Therefore, we need a comprehensive and understandable way to deal with data:
• Statistics is the study of making sense of data.
Descriptive statistics
• Used to organize and describe the characteristics of a collection of data

Descriptive statistics
• How can you describe this table?
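As a rough illustration of what "describing" a collection of data means, the sketch below uses Python's standard `statistics` module to summarize a small list of values. The scores are hypothetical placeholders, not the table from the slide, and the `describe` helper is a name made up for this example.

```python
import statistics

# Hypothetical data: quiz scores for ten students (placeholder values only).
scores = [72, 85, 90, 66, 85, 78, 95, 88, 70, 81]

def describe(data):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "stdev": statistics.stdev(data),  # sample standard deviation
        "min": min(data),
        "max": max(data),
    }

summary = describe(scores)
for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

Each number in the summary condenses the whole list into one characteristic (center, spread, or range), which is the goal of descriptive statistics.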

Inferential statistics
• Draw conclusions about a larger group from a smaller group of data taken from it
• Sample: a smaller group of data
• Population: the entire group of individuals or items of interest

Population & Sample
• population
• the set of all photographs of Mars
• the set of heights of people in the US Army
• the set of all measurements of water quality taken from the Hudson River
• the set of all problems that can be solved using statistics.
• sample
• the pictures selected from a specific region of Mars
• the heights of people in a particular division of the US Army
• the set of water measurements of the Hudson River taken on 7/24/2009
• the statistical problems we are solving in this class
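To make the population/sample distinction concrete, the sketch below simulates a hypothetical population of heights (generated with a fixed seed, not real US Army data) and draws a random sample from it. The population size, sample size, mean of 175 cm, and spread of 7 cm are all arbitrary assumptions chosen for illustration.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated heights in centimetres.
# A fixed seed makes the example reproducible; these are not real measurements.
rng = random.Random(42)
population = [rng.gauss(175, 7) for _ in range(10_000)]

# A sample is a subset of the population: here 200 values drawn without replacement.
sample = rng.sample(population, k=200)

pop_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
print(f"population mean: {pop_mean:.2f} cm")
print(f"sample mean:     {sample_mean:.2f} cm")
```

In practice only the sample is observed; the sample mean is then used to infer the unknown population mean.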

Steps for statistical analysis
• Problem definition: identify the population of interest and the variables to be investigated
• Data collection: select the sample from the population and describe it
• Data analysis: make statistical inferences about the population from the sample
• Analysis reporting: report the inference together with a measure of its reliability
• Here, a variable is a characteristic or property of the individuals in a population that can vary from one observation to another.

An example
• Example: A tax auditor is responsible for 25,000 accounts. How many accounts are in error?
• Defining the problem: The population consists of all 25,000 accounts. Our goal is to obtain a reasonable estimate of the number of accounts that are likely to be in error. Our variable x records whether an account is in error.
• Data collection and summary: The auditor selects 2,000 accounts at random, examines each one, and finds that 84 of them are in error.
• Data analysis: Here the analysis amounts to computing the sample proportion of accounts in error: 84/2000 = 0.042, or 4.2%.
• Analysis reporting: Based on this analysis, we infer that approximately 4.2% of all 25,000 accounts, or about 1,050 accounts, are in error.
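The tax-auditor arithmetic can be sketched in a few lines of Python. The sample proportion and the scaled-up count follow directly from the example; the "measure of reliability" is an assumption of this sketch, since the slide does not give one. It uses a normal-approximation standard error with a finite population correction, which is one common choice, not necessarily the course's method.

```python
import math

# Figures from the tax-auditor example.
N = 25_000      # population size: all accounts
n = 2_000       # sample size: accounts examined
errors = 84     # accounts in the sample found to be in error

# Point estimate: the sample proportion of accounts in error.
p_hat = errors / n

# Scale the proportion up to the whole population (multiply first to keep it exact).
estimated_errors = errors * N / n

# Assumption (not from the slides): normal-approximation standard error with a
# finite population correction, used as a rough "measure of reliability".
fpc = math.sqrt((N - n) / (N - 1))
se = math.sqrt(p_hat * (1 - p_hat) / n) * fpc
z = 1.96  # multiplier for an approximate 95% interval
low, high = p_hat - z * se, p_hat + z * se

print(f"sample proportion: {p_hat:.1%}")
print(f"estimated accounts in error: {estimated_errors:.0f} of {N}")
print(f"approx. 95% interval: {low:.1%} to {high:.1%}")
```

Reporting the interval alongside the 4.2% point estimate is one way to carry out the "report the inference together with a measure of its reliability" step.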

Tools
• Excel
• Excel Toolpak
• SPSS/PASW

Excel Toolpak (1)
• Click the Microsoft Office Button, and then click Excel Options. (In Excel 2010 and later, click File, and then click Options.)
• Click Add-Ins, and then in the Manage box, select Excel Add-ins.
• Click Go.
• In the Add-Ins available box, select the Analysis ToolPak check box, and then click OK.
• If you are prompted that the Analysis ToolPak is not currently installed on your computer, click Yes to install it.
• After you load the Analysis ToolPak, the Data Analysis command is available in the Analysis group on the Data tab.

Excel Toolpak (2)
• Powerful, reliable, accessible, and easy to use; included with Excel at no extra cost

Formula

How does it work in Excel?

• Let's start by looking at what makes a spreadsheet work. A spreadsheet is made up of:
• columns
• rows
• cells
• Each cell can contain one of the following types of data:
• text (labels)
• number data (constants)
• formulas (mathematical equations)
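The three kinds of cell content can be told apart with a few simple rules. The sketch below uses a hypothetical `classify_cell` helper and deliberately simplified logic (it ignores dates, percentages, currency formats, and other special cases that Excel handles); the one rule taken from Excel itself is that input beginning with "=" is treated as a formula.

```python
def classify_cell(content: str) -> str:
    """Classify raw cell input as 'formula', 'number', or 'text' (simplified rules)."""
    # Excel treats input that starts with "=" as a formula.
    if content.startswith("="):
        return "formula"
    # Anything that parses as a number is treated as a numeric constant.
    try:
        float(content)
        return "number"
    except ValueError:
        # Everything else is a text label.
        return "text"

for entry in ["Quarterly sales", "1250", "=SUM(B2:B5)"]:
    print(f"{entry!r:>18} -> {classify_cell(entry)}")
```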

Column

Row

Cell

Types of Data

ALL formulas MUST begin with an equal sign (=).

Formulas – SUM
• The SUM function adds up the values in the specified cells. The syntax is =SUM(number1, number2, ...), where each argument can be a number, a cell reference, or a range such as A1:A10.
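To check a spreadsheet total outside Excel, the sketch below mimics how =SUM combines single numbers and ranges. Representing a range as a Python list and the name `excel_sum` are assumptions of this sketch, and the cell values are hypothetical. As in Excel, text inside a range is skipped rather than causing an error.

```python
def excel_sum(*args):
    """Rough analogue of =SUM(...): each argument is a number or a list standing in for a range."""
    total = 0
    for arg in args:
        if isinstance(arg, list):
            # Within a range, only numeric cells are added; text labels are skipped.
            total += sum(x for x in arg if isinstance(x, (int, float)) and not isinstance(x, bool))
        else:
            total += arg
    return total

# Hypothetical range A1:A4 with a text label at the top, roughly like =SUM(A1:A4, 10)
column_a = ["Sales", 120, 80, 50]
print(excel_sum(column_a, 10))  # 260
```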

Formulas – AVERAGE
• The AVERAGE function returns the arithmetic mean of the specified values. The syntax is =AVERAGE(number1, number2, ...).
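The arithmetic mean that AVERAGE returns is simply the sum of the values divided by how many there are. The sketch below computes it both by hand and with Python's `statistics.mean`, using hypothetical sales figures standing in for a range such as B2:B5.

```python
import statistics

# Hypothetical quarterly sales standing in for cells B2:B5.
quarterly_sales = [1200, 950, 1430, 1020]

# Analogue of =AVERAGE(B2:B5): the sum divided by the count.
by_hand = sum(quarterly_sales) / len(quarterly_sales)
by_library = statistics.mean(quarterly_sales)
print(by_hand, by_library)
```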

Formulas – MAX/MIN
• MAX returns the largest value in the selected range of cells, e.g., =MAX(A1:A10).
• MIN returns the smallest value in the selected range of cells, e.g., =MIN(A1:A10).
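MAX and MIN consider only the numbers in a range and ignore empty cells. The sketch below reproduces that with Python's built-in `max` and `min`; the temperatures are hypothetical, and using `None` to stand for an empty cell is an assumption of this sketch.

```python
# Hypothetical daily temperatures standing in for cells C2:C8; None marks an empty cell.
temperatures = [18.5, 21.0, None, 24.5, 19.0, 22.5, 17.0]

# Like MAX and MIN, skip empty cells and look only at numbers.
numeric = [t for t in temperatures if isinstance(t, (int, float))]
highest = max(numeric)  # analogue of =MAX(C2:C8)
lowest = min(numeric)   # analogue of =MIN(C2:C8)
print(highest, lowest)  # 24.5 17.0
```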

Formulas – COUNT
• COUNT returns the number of cells in the selected range that contain numbers; cells containing text and empty cells are not counted.
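Because COUNT looks only at numeric cells, a range that mixes numbers, text, and blanks can report fewer entries than it visibly has. The sketch below shows this with a hypothetical `excel_count` helper; the cell contents are made up, and `None` stands for an empty cell.

```python
# Hypothetical range D2:D7 mixing numbers, text, and an empty cell (None).
cells = [42, "n/a", 17.5, None, "pending", 8]

def excel_count(values):
    """Rough analogue of =COUNT(range): count only cells holding numbers."""
    return sum(
        1 for v in values
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    )

print(excel_count(cells))  # 3
```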

