# Data Collection - PowerPoint PPT Presentation

Chapter 2: Data Collection. Data Vocabulary; Level of Measurement; Time Series and Cross-sectional Data; Sampling Concepts; Sampling Methods; Data Sources; Survey Research. McGraw-Hill/Irwin. © 2008 The McGraw-Hill Companies, Inc. All rights reserved.

## PowerPoint Slideshow about 'Data Collection' - apria

Presentation Transcript

Chapter 2

### Data Collection

Data Vocabulary

Level of Measurement

Time Series and Cross-sectional Data

Sampling Concepts

Sampling Methods

Data Sources

Survey Research

McGraw-Hill/Irwin

• In business, data usually arise from accounting transactions or management processes.

Data Vocabulary

• Data is the plural form of the Latin datum (a “given” fact).

• Important decisions may depend on data.

• Subjects, Variables, Data Sets

• We treat “data” as plural and use “data set” for a particular collection of data taken as a whole.

• Observation – each data value.

• Subject (or individual) – an item for study (e.g., an employee in your company).

• Variable – a characteristic of the subject or individual (e.g., the employee’s income).

• Three types of data sets: univariate (one variable), bivariate (two variables) and multivariate (more than two variables).

Consider a multivariate data set with 5 variables and 8 subjects, giving 5 × 8 = 40 observations.

(Figure callouts: attribute (qualitative) variables; numerical (quantitative) variables; a coded value, X = 3, i.e., economics.)

• Data Types

• A data set may have a mixture of data types.

• Attribute Data

• Also called categorical, nominal or qualitative data.

• Values are described by words rather than numbers.

• For example, automobile style (e.g., X = full, midsize, compact, subcompact) or mutual fund (e.g., X = load, no-load).

• Data Coding

• Coding refers to using numbers to represent categories to facilitate statistical analysis.

• Coding an attribute as a number does not make the data numerical.

• For example, 1 = Bachelor’s, 2 = Master’s, 3 = Doctorate

• Codes may also reflect a ranking, for example, 1 = Liberal, 2 = Moderate, 3 = Conservative.

• Binary Data

• A binary variable has only two values, 1 = presence, 0 = absence of a characteristic of interest (codes themselves are arbitrary).

• For example, 1 = employed, 0 = not employed; 1 = married, 0 = not married; 1 = male, 0 = female; or 1 = female, 0 = male.

• The codes themselves have no numerical meaning, so binary variables are attribute data.
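A small Python sketch of attribute coding, using the degree codes and the employment example above; the dictionary name and the response lists are hypothetical. The numbers only label categories, so counting them (frequencies) is meaningful, while arithmetic on the codes is not.

```python
from collections import Counter

# Hypothetical coding scheme for an attribute variable: the codes are labels only
DEGREE_CODES = {"Bachelor's": 1, "Master's": 2, "Doctorate": 3}

responses = ["Master's", "Bachelor's", "Doctorate", "Bachelor's", "Master's", "Bachelor's"]
coded = [DEGREE_CODES[r] for r in responses]

# Counting how often each code occurs is a valid operation on attribute data
frequencies = Counter(coded)

# Binary coding: 1 = presence, 0 = absence of the characteristic (here, "employed")
employed = [True, False, True, True]
employed_coded = [1 if e else 0 for e in employed]

print(coded)           # [2, 1, 3, 1, 2, 1]
print(frequencies)     # Counter({1: 3, 2: 2, 3: 1})
print(employed_coded)  # [1, 0, 1, 1]
```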

• Numerical Data

• Numerical or quantitative data arise from counting, measuring something, or some kind of mathematical operation.

• For example, the number of auto insurance claims filed in March (e.g., X = 114 claims) or the ratio of profit to sales for last quarter (e.g., X = 0.0447).

• Numerical data can be broken down into two types: discrete and continuous.

• Discrete Data

• A numerical variable with a countable number of values that can be represented by an integer (no fractional values).

• For example, the number of Medicaid patients (e.g., X = 2) or the number of takeoffs at O’Hare (e.g., X = 37).

• Continuous Data

• A numerical variable that can have any value within an interval (e.g., length, weight, time, sales, price/earnings ratios).

• Any continuous interval contains infinitely many possible values (e.g., 426 < X < 428).

• Four levels of measurement for data: nominal, ordinal, interval and ratio.

• Nominal Measurement

• Nominal data merely identify a category.

• Nominal data are qualitative, attribute, categorical or classification data (e.g., Apple, Compaq, Dell, HP).

• Nominal data are usually coded numerically, but the codes are arbitrary (e.g., 1 = Apple, 2 = Compaq, 3 = Dell, 4 = HP).

• The only mathematical operations are counting (e.g., frequencies) and simple statistics.

• Ordinal Measurement

• Ordinal data codes can be ranked (e.g., 1 = Frequently, 2 = Sometimes, 3 = Rarely, 4 = Never).

• The distance between codes is not meaningful (e.g., the distance between 1 and 2, between 2 and 3, or between 3 and 4 lacks meaning).

• Many useful statistical tests exist for ordinal data, which are especially common in social science, marketing and human resource research.

• Interval Measurement

• Interval data not only can be ranked but also have meaningful intervals between scale points (e.g., the difference between 60°F and 70°F is the same as the difference between 20°F and 30°F).

• Because intervals between numbers represent distances, mathematical operations such as averaging can be performed.

• The zero point of an interval scale is arbitrary, so ratios are not meaningful (e.g., 60°F is not twice as warm as 30°F).
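The temperature example can be checked numerically. This Python sketch converts Fahrenheit to Celsius with the standard formula: equal intervals stay equal after the change of scale, but the ratio 60/30 = 2 does not survive, because each scale's zero point is arbitrary.

```python
import math


def fahrenheit_to_celsius(f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32.0) * 5.0 / 9.0


# Equal 10°F intervals remain equal after changing to the Celsius scale
interval_a = fahrenheit_to_celsius(70) - fahrenheit_to_celsius(60)
interval_b = fahrenheit_to_celsius(30) - fahrenheit_to_celsius(20)
print(math.isclose(interval_a, interval_b))  # True

# The ratio 60/30 = 2 does not survive the change of scale, because
# the zero point of each scale is arbitrary
ratio_f = 60 / 30
ratio_c = fahrenheit_to_celsius(60) / fahrenheit_to_celsius(30)
print(ratio_f, round(ratio_c, 6))  # 2.0 -14.0
```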

• Likert Scales

• A special case of interval data frequently used in survey research.

• The coarseness of a Likert scale refers to the number of scale points (typically 5 or 7).

• With an odd number of scale points, a neutral midpoint (“Neither Agree Nor Disagree”) is available; with an even number, the midpoint is omitted to force the respondent to “lean” one way or the other.

• Likert data are coded numerically (e.g., 1 to 5) but any equally spaced values will work.

• Careful choice of verbal anchors results in measurable intervals (e.g., the distance from 1 to 2 is “the same” as the interval, say, from 3 to 4).

• Ratios are not meaningful (e.g., here 4 is not twice 2).

• Many statistical calculations can be performed (e.g., averages, correlations, etc.).

• More variants of Likert scales:

• Ambiguity

• Grades are usually coded numerically (A = 4, B = 3, C = 2, D = 1, F = 0) and are used to calculate a mean GPA.

• Is the interval from 3.0 to 4.0 really the same as the interval from 1.0 to 2.0?

• What is the underlying reality ranging from 0 to 4 that we are measuring?

• It is best to be conservative and limit statistical tests to those for ordinal data.

• Ratio Measurement

• Ratio data have all properties of nominal, ordinal and interval data types and also possess a meaningful zero (absence of quantity being measured).

• Because of this zero point, ratios of data values are meaningful (e.g., \$20 million profit is twice as much as \$10 million).

• Zero need not be observable in the data; it is an absolute reference point.
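The nesting of the four levels (each level keeps the properties of the ones below it) can be sketched as an ordered enumeration. The enum and function names are hypothetical, and the operation labels paraphrase the text rather than list every valid statistic.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement, ordered so each level keeps the properties of those below it."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


def meaningful_operations(level: Level) -> list:
    """Return the kinds of operations the text describes as meaningful at a level."""
    ops = ["count frequencies"]                      # nominal: categories can only be counted
    if level >= Level.ORDINAL:
        ops.append("rank")                           # ordinal: codes have an order
    if level >= Level.INTERVAL:
        ops.append("take differences and averages")  # interval: equal spacing
    if level >= Level.RATIO:
        ops.append("form ratios")                    # ratio: a meaningful zero
    return ops


print(meaningful_operations(Level.INTERVAL))
# ['count frequencies', 'rank', 'take differences and averages']
```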

• Use the following procedure to recognize data types:

• Changing Data by Recoding

• In order to simplify data or when exact data magnitude is of little interest, ratio data can be recoded downward into ordinal or nominal measurements (but not conversely).

• For example, recode systolic blood pressure as “normal” (under 130), “elevated” (130 to 140), or “high” (over 140).

• The above recoded data are ordinal (ranking is preserved) but intervals are unequal and some information is lost.
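The blood-pressure recoding can be written as a small Python function. The cut points follow the text; treating a reading of exactly 140 as “elevated” is an assumption, since the text's ranges meet at that value.

```python
def recode_systolic(bp: float) -> str:
    """Recode systolic pressure (ratio data) into ordinal categories.

    Under 130 is "normal", 130 to 140 is "elevated", over 140 is "high".
    Placing exactly 140 in "elevated" is an assumed boundary rule.
    """
    if bp < 130:
        return "normal"
    if bp <= 140:
        return "elevated"
    return "high"


readings = [118, 130, 137, 140, 152]
print([recode_systolic(r) for r in readings])
# ['normal', 'elevated', 'elevated', 'elevated', 'high']
```

Readings of 131 and 139 both become “elevated”, which is exactly the information loss the text describes.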

• We are interested in trends and patterns over time (e.g., annual growth in consumer debit card use from 1999 to 2006).

Time Series and Cross-sectional Data

• Time Series Data

• Each observation in the sample represents a different equally spaced point in time (e.g., years, months, days).

• Periodicity may be annual, quarterly, monthly, weekly, daily, hourly, etc.

• Cross-sectional Data

• Each observation represents a different individual unit (e.g., person) at the same point in time (e.g., monthly VISA balances).

• We are interested in variation among observations or in relationships.

• We can combine the two data types to get pooled cross-sectional and time series data.

Sampling Concepts

• Sample or Census?

• A sample involves looking only at some items selected from the population.

• A census is an examination of all items in a defined population.

• Practical obstacles to a census include mobility; illegal immigrants; budget constraints; and incomplete responses or nonresponses.

• Parameters and Statistics

• Statistics are computed from a sample of n items, chosen from a population of N items.

• Statistics can be used as estimates of parameters found in the population.

• Symbols are used to represent population parameters and sample statistics.
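To make the parameter/statistic distinction concrete, this Python sketch uses the conventional symbols μ (population mean, a parameter) and x̄ (sample mean, a statistic). The population of 1,000 normally distributed values is simulated and purely hypothetical.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the example is reproducible

# Hypothetical population of N = 1,000 values
population = [rng.gauss(50, 10) for _ in range(1000)]
N = len(population)
mu = statistics.mean(population)      # parameter: uses every item in the population

n = 40
sample = rng.sample(population, n)    # simple random sample of n items
x_bar = statistics.mean(sample)       # statistic: estimates mu from n items only

print(f"N = {N}, mu = {mu:.2f}; n = {n}, x_bar = {x_bar:.2f}")
```

In practice μ is usually unknown; here it is computable only because the whole population is simulated.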

• The population must be carefully specified and the sample must be drawn scientifically so that the sample is representative.

• Target Population

• The target population is the population we are interested in (e.g., U.S. gasoline prices).

• The sampling frame is the group from which we take the sample (e.g., 115,000 stations).

• The frame should not differ from the target population.

(Figure: a sample of n items drawn from a population of N items.)

• Finite or Infinite?

• A population is finite if it has a definite size, even if its size is unknown.

• A population is infinite if it is of arbitrarily large size.

• Rule of thumb: a population may be treated as infinite when N is at least 20 times n (i.e., when N/n ≥ 20).

(Figure example: here, N/n > 20.)
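The rule of thumb is a one-line check. In this Python sketch the function name and the sample size of 1,000 stations are hypothetical; the 115,000-station frame comes from the gasoline example above.

```python
def treat_as_infinite(N: int, n: int) -> bool:
    """Rule of thumb: treat the population as infinite when N is at least 20 times n."""
    return N / n >= 20


print(treat_as_infinite(115_000, 1_000))  # True  (N/n = 115)
print(treat_as_infinite(78, 20))          # False (N/n = 3.9)
```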

Sampling Methods

(Figure: Excel example using =RANDBETWEEN(1,48).)

• Simple Random Sample

• Every item in the population of N items has the same chance of being chosen in the sample of n items.

• We rely on random numbers to select a name.
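A simple random sample can be drawn with Python's standard `random` module. This sketch assumes a hypothetical list of 48 names; `random.Random.sample` draws without replacement, giving every item the same chance of selection, and `randint(1, 48)` plays the role of Excel's `=RANDBETWEEN(1,48)` (both ends inclusive).

```python
import random


def simple_random_sample(items, n, seed=None):
    """Draw n items without replacement; every item has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(items, n)


names = [f"Name {i}" for i in range(1, 49)]  # hypothetical list of 48 names
print(simple_random_sample(names, 5, seed=42))

# Python analogue of Excel's =RANDBETWEEN(1,48): an integer from 1 to 48, inclusive
print(random.Random(42).randint(1, 48))
```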

• Random Number Tables

• A table of random digits used to select random numbers between 1 and N.

• Each digit 0 through 9 is equally likely to be chosen.

• Setting Up a Rule

• For example, NilCo wants to award cash prizes to 10 of its 875 loyal customers.

• To get 10 three-digit numbers between 001 and 875, we define any consistent rule for moving through the random number table.

• Randomly point at the table to choose a starting point.

• Choose the first three digits of the selected five-digit block, move to the right one column, down one row, and repeat.

• When we reach the end of a line, wrap around to the other side of the table and continue.

• Discard any number greater than 875 and any duplicates.
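The NilCo rule can be simulated in Python. Because the printed table is not reproduced here, the sketch builds a hypothetical stand-in of 1,000 seeded random digits (25 rows of 8 five-digit blocks). Wrapping the row index at the bottom of the table and discarding 000 (which is outside 001 to 875) are assumptions beyond the stated rule.

```python
import random


def build_digit_table(rows=25, blocks_per_row=8, seed=0):
    """Hypothetical stand-in for a printed table: 25 rows x 8 five-digit blocks = 1,000 digits."""
    rng = random.Random(seed)
    return [
        ["".join(str(rng.randint(0, 9)) for _ in range(5)) for _ in range(blocks_per_row)]
        for _ in range(rows)
    ]


def draw_with_rule(table, n, upper, start_row, start_col):
    """Use the first three digits of a block, then move right one column and down
    one row, wrapping around the table. Values above `upper`, duplicates, and 000
    are discarded."""
    rows, cols = len(table), len(table[0])
    chosen = []
    r, c = start_row, start_col
    # 25 and 8 share no common factor, so this diagonal path visits every block once
    for _ in range(rows * cols):
        value = int(table[r][c][:3])
        if 1 <= value <= upper and value not in chosen:
            chosen.append(value)
            if len(chosen) == n:
                return chosen
        r, c = (r + 1) % rows, (c + 1) % cols
    raise ValueError("ran out of table before finding n valid numbers")


table = build_digit_table()
pointer = random.Random(2024)  # "randomly point at the table" to pick a starting block
winners = draw_with_rule(table, n=10, upper=875,
                         start_row=pointer.randrange(25), start_col=pointer.randrange(8))
print(winners)
```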

(Figure: Table of 1,000 Random Digits, with the starting point marked “Start Here”.)

• With or Without Replacement

• If we allow duplicates when sampling, then we are sampling with replacement.

• Duplicates are unlikely when n is much smaller than N.

• If we do not allow duplicates when sampling, then we are sampling without replacement.
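Python's `random` module offers both sampling modes: `choices` samples with replacement and `sample` samples without it. The helper below, a hypothetical name, computes the standard probability that n draws with replacement from N items are all distinct, which shows why duplicates are unlikely when n is much smaller than N. The frame of customers numbered 1 to 875 is illustrative.

```python
import random


def prob_no_duplicates(N: int, n: int) -> float:
    """Probability that n draws WITH replacement from N items are all different."""
    p = 1.0
    for i in range(n):
        p *= (N - i) / N
    return p


rng = random.Random(3)
frame = list(range(1, 876))  # hypothetical frame: customers numbered 1..875

with_replacement = rng.choices(frame, k=10)    # duplicates are possible
without_replacement = rng.sample(frame, k=10)  # duplicates are impossible

print(round(prob_no_duplicates(875, 10), 3))   # close to 1 when n is much smaller than N
print(round(prob_no_duplicates(875, 100), 3))  # small once n is a sizable share of N
```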

• Computer Methods

• Computer random number generators are pseudo-random: they are deterministic algorithms, so even the best of them eventually repeat their sequence.
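The deterministic nature of a pseudo-random generator is easy to see in Python, whose `random` module is documented as using the Mersenne Twister (period 2**19937 − 1). Seeding two generators identically reproduces the same “random” stream.

```python
import random

# Two generators with the same seed produce exactly the same stream,
# because the numbers come from a deterministic algorithm
gen_a = random.Random(123)
gen_b = random.Random(123)
run_a = [gen_a.random() for _ in range(5)]
run_b = [gen_b.random() for _ in range(5)]
print(run_a == run_b)  # True

# A different seed starts the generator at a different point
gen_c = random.Random(124)
run_c = [gen_c.random() for _ in range(5)]
```

Reproducible seeds are also useful in practice: they let someone else re-draw exactly the same sample.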

• Randomizing a List

• In Excel, use the function =RAND() beside each row to create a column of random numbers between 0 and 1.

• Copy and paste these numbers into the same column using “Paste Special | Values” (to paste only the values and not the formulas).

• Sort the spreadsheet on the random number column.

• The first n items are a random sample of the entire list (they are as likely as any others).
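The spreadsheet procedure translates directly to Python: attach a uniform random key to each row (like =RAND()), keep those keys fixed (the “Paste Special | Values” step), sort on them, and take the first n rows. The function name and the customer list are hypothetical; Python's built-in `random.shuffle` achieves the same result in one call.

```python
import random


def randomize_list(rows, seed=None):
    """Attach a random key to each row, freeze the keys, then sort on them."""
    rng = random.Random(seed)
    keyed = [(rng.random(), row) for row in rows]  # keys are drawn once and kept
    keyed.sort(key=lambda pair: pair[0])           # sort on the random-number column only
    return [row for _, row in keyed]


customers = [f"Customer {i}" for i in range(1, 21)]  # hypothetical list of 20 rows
shuffled = randomize_list(customers, seed=5)
sample = shuffled[:5]  # the first n rows form a random sample
print(sample)
```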

• Systematic Sampling

• Sample by choosing every kth item from a list, starting from a randomly chosen entry on the list.

• For example, starting at item 2, we sample every k = 4 items to obtain a sample of n = 20 items from a list of N = 78 items.

• Note that N/n = 78/20 ≈ 4.

• A systematic sample of n items from a population of N items requires that periodicity k be approximately N/n.

• Systematic sampling should yield acceptable results unless patterns in the population happen to recur at periodicity k.

• Can be used with unlistable or infinite populations.

• Systematic samples are well-suited to linearly organized physical populations.

• For example, out of 501 companies, we want to obtain a sample of 25. What should the periodicity k be?

k = N/n = 501/25 ≈ 20

• So, we should choose every 20th company from a random starting point.
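Both systematic-sampling examples can be reproduced with Python slicing. The function names are hypothetical, and rounding N/n to the nearest integer is one reasonable way to get “approximately N/n”, not the only one.

```python
def periodicity(N: int, n: int) -> int:
    """Periodicity k is approximately N/n; rounding to the nearest integer is an assumption."""
    return round(N / n)


def systematic_sample(items, k, start):
    """Take every k-th item beginning at position `start` (1-based, as in the text)."""
    return items[start - 1::k]


items = list(range(1, 79))                       # N = 78 items numbered 1..78
sample = systematic_sample(items, k=4, start=2)  # start at item 2, every 4th item
print(len(sample), sample[:4], sample[-1])       # 20 [2, 6, 10, 14] 78

print(periodicity(78, 20), periodicity(501, 25))  # 4 20
```

The random starting point would normally be drawn from 1 to k. With N = 501 and k = 20, a start of 1 yields 26 companies and any other start yields 25, so the realized sample size can differ slightly from the target.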

• Stratified Sampling

• Utilizes prior information about the population.

• Applicable when the population can be divided into relatively homogeneous subgroups of known size (strata).

• A simple random sample of the desired size is taken within each stratum.

• For example, from a population containing 55% males and 45% females, randomly sample 110 males and 90 females (n = 200).
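Under proportional allocation, a 55%/45% split of n = 200 gives 110 males and 90 females. This Python sketch computes the allocation and then draws a simple random sample within each stratum; the population of 550 males and 450 females and the function names are hypothetical.

```python
import random


def proportional_allocation(shares, n):
    """Split total sample size n across strata in proportion to their population shares.
    Rounding can make the parts miss n by one in other cases, so check the total."""
    return {name: round(share * n) for name, share in shares.items()}


def stratified_sample(strata, allocation, seed=None):
    """Draw a simple random sample of the allocated size within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, allocation[name]) for name, members in strata.items()}


allocation = proportional_allocation({"male": 0.55, "female": 0.45}, 200)
print(allocation)  # {'male': 110, 'female': 90}

# Hypothetical population of 1,000 people: 550 males and 450 females
strata = {
    "male": [f"M{i}" for i in range(550)],
    "female": [f"F{i}" for i in range(450)],
}
sample = stratified_sample(strata, allocation, seed=9)
```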

• Or, take a random sample of the entire population and then combine individual strata estimates using appropriate weights.

• For a population with $L$ strata, the population size $N$ is the sum of the stratum sizes: $N = N_1 + N_2 + \cdots + N_L$

• The weight assigned to stratum $j$ is $w_j = N_j / N$
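A Python sketch of combining stratum estimates, using weights equal to each stratum's share of the population, N_j / N, where N is the sum of the stratum sizes. The region names, stratum sizes, and stratum sample means are hypothetical.

```python
def stratum_weights(stratum_sizes):
    """Weight for stratum j is N_j / N, where N is the sum of all stratum sizes."""
    N = sum(stratum_sizes.values())
    return {j: N_j / N for j, N_j in stratum_sizes.items()}


def combined_estimate(stratum_means, stratum_sizes):
    """Combine stratum sample means into one population estimate: sum of w_j * mean_j."""
    w = stratum_weights(stratum_sizes)
    return sum(w[j] * stratum_means[j] for j in stratum_sizes)


sizes = {"north": 600, "south": 300, "west": 100}     # hypothetical N_j
means = {"north": 40.0, "south": 55.0, "west": 70.0}  # hypothetical stratum sample means
print(stratum_weights(sizes))  # {'north': 0.6, 'south': 0.3, 'west': 0.1}
print(combined_estimate(means, sizes))
```

Here the estimate is 0.6 × 40 + 0.3 × 55 + 0.1 × 70 = 47.5, whereas an unweighted average of the three stratum means would give 55.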

• Cluster Sample

• The subgroups (clusters) consist of geographical regions.

• One-stage cluster sampling – the sample consists of all elements in each of k randomly chosen subregions (clusters).

• Two-stage cluster sampling – first choose k subregions (clusters), then choose a random sample of elements within each cluster.

• Here is an example of 4 elements sampled from each of 3 randomly chosen clusters (two-stage cluster sampling).
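The two-stage example (4 elements from each of 3 randomly chosen clusters) can be sketched in Python. The frame of 8 regions with 10 households each is hypothetical. Setting m equal to the cluster size would turn this into one-stage cluster sampling, since every element of each chosen cluster would be taken.

```python
import random


def two_stage_cluster_sample(clusters, k, m, seed=None):
    """Stage 1: choose k clusters at random. Stage 2: sample m elements within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), k)
    return {name: rng.sample(clusters[name], m) for name in chosen}


# Hypothetical frame: 8 geographic regions with 10 households each
clusters = {f"Region {r}": [f"R{r}-H{h}" for h in range(1, 11)] for r in range(1, 9)}
result = two_stage_cluster_sample(clusters, k=3, m=4, seed=11)
for region, households in result.items():
    print(region, households)
```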

• Cluster sampling is useful when the population frame and stratum characteristics are not readily available; when it is too expensive to obtain a simple or stratified sample; when the cost of obtaining data increases sharply with distance; and when some loss of reliability is acceptable.

• Judgment Sample

• A nonprobability sampling method that relies on the expertise of the sampler to choose items that are representative of the population.

• Can be affected by subconscious bias (i.e., nonrandomness in the choice).

• Quota sampling is a special kind of judgment sampling, in which the interviewer chooses a certain number of people in each category.

• Convenience Sample

• Takes advantage of whatever sample is available at the moment; a quick way to sample.

• Sample Size

• Sample size depends on the inherent variability of the quantity being measured and on the desired precision of the estimate.

Data Sources

• Useful Data Sources

Survey Research

• Basic Steps of Survey Research

• Step 1: State the goals of the research

• Step 2: Develop the budget (time, money, staff)

• Step 3: Create a research design (target population, frame, sample size)

• Step 4: Choose a survey type and method of administration

• Step 5: Design a data collection instrument (questionnaire)

• Step 6: Pretest the survey instrument and revise as needed

• Step 7: Administer the survey (follow up if needed)

• Step 8: Code the data and analyze it

• Survey Types

• Survey Guidelines

• Plan: What is the purpose of the survey? Consider staff expertise, needed skills, degree of precision and budget.

• Design: Invest time and money in designing the survey. Use books and references to avoid unnecessary errors.

• Quality: Take care in preparing a quality survey so that people will take you seriously.

• Pilot Test: Pretest on friends or co-workers to make sure the survey is clear.

• Buy-in: Improve response rates by stating the purpose of the survey, offering a token of appreciation or paving the way with endorsements.

• Expertise: Work with a consultant early on.

• Consider hiring a consultant in the early stages.

• Many resources are available to help, including the American Statistical Association, the Research Industry Coalition and the Council of American Survey Research Organizations.

• Questionnaire Design

• Use a lot of white space in layout.

• Begin with short, clear instructions.

• State the survey purpose.

• Assure anonymity.

• Instruct on how to submit the completed survey.

• Break survey into naturally occurring sections.

• Let respondents bypass sections that are not applicable (e.g., “If you answered no to Question 7, skip directly to Question 15”).

• Pretest and revise as needed.

• Keep as short as possible.

• Question Wording

• The way a question is asked has a profound influence on the response. For example,

• Shall state taxes be cut?

• Shall state taxes be cut, if it means reducing highway maintenance?

• Shall state taxes be cut, if it means firing teachers and police?

• Make sure you have covered all the possibilities. For example,

Are you married? □ Yes □ No

• Overlapping classes or unclear categories are a problem. For example,

How old is your father? □ 35 – 45 □ 45 – 55 □ 55 – 65 □ 65 or older

• Coding and Data Screening

• Responses are usually coded numerically (e.g., 1 = male, 2 = female).

• Missing values are typically denoted by special characters (e.g., blank, “.” or “*”).

• Discard questionnaires that are flawed or missing many responses.

• Watch for multiple responses, outrageous or inconsistent replies or range answers.

• Follow up if necessary, and always document your data-coding decisions.
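A minimal Python sketch of data screening: convert the missing-value markers named in the text (blank, “.” and “*”) to `None`, and set aside questionnaires with too many missing answers. The threshold `max_missing=2` and the raw responses are hypothetical; whatever rule is actually used should be documented, as the text advises. The sketch assumes answers are integer codes, so range answers such as “3-5” would need separate handling.

```python
MISSING_MARKERS = {"", ".", "*"}  # missing-value markers mentioned in the text


def screen(questionnaires, max_missing=2):
    """Convert raw answers to integer codes (None when missing) and set aside
    questionnaires with more than max_missing blanks (a hypothetical threshold)."""
    kept, discarded = [], []
    for answers in questionnaires:
        coded = [None if a.strip() in MISSING_MARKERS else int(a) for a in answers]
        n_missing = sum(1 for a in coded if a is None)
        if n_missing > max_missing:
            discarded.append(coded)
        else:
            kept.append(coded)
    return kept, discarded


raw = [
    ["1", "4", "3", "5"],
    ["2", ".", "3", "4"],
    ["1", "*", ".", " "],
]
kept, discarded = screen(raw)
print(kept)       # [[1, 4, 3, 5], [2, None, 3, 4]]
print(discarded)  # [[1, None, None, None]]
```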

• Sources of Error

• Data File Format

• Enter data into a spreadsheet or database as a “flat file” (an n × m matrix of n subjects by m variables).
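A flat file can be written as CSV with Python's standard `csv` module: one header row naming the m variables, then one row per subject. The variable names and coded values below are hypothetical, a blank cell marks a missing value, and an in-memory buffer stands in for a file on disk.

```python
import csv
import io

# Hypothetical coded survey data: each row is a subject, each column a variable
variables = ["id", "gender", "age_group", "satisfaction"]
rows = [
    [1, 1, 2, 4],
    [2, 2, 3, 5],
    [3, 1, 1, ""],  # blank = missing value
]

buffer = io.StringIO()      # stands in for a file opened with newline=""
writer = csv.writer(buffer)
writer.writerow(variables)  # header row: m variable names
writer.writerows(rows)      # n data rows
print(buffer.getvalue())
```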
