IS 4800 Empirical Research Methods for Information Science
Class Notes Feb 8, 2012
Instructor: Prof. Carole Hafner, 446 WVH
[email protected] Tel: 617-373-5116
Course Web site: www.ccs.neu.edu/course/is4800sp12/
Outline
Assignment 2: Relational Agents for Patient Education study
Restricted (closed-ended) items: respondents are given a list of alternatives and check the desired alternative
Open-ended items: respondents are asked to answer a question in their own words
Partially open-ended items: an “Other” alternative is added to a restricted item, allowing the respondent to write in an alternative
Rating scales: respondents circle a number on a scale (e.g., 0 to 10) or check a point on a line that best reflects their opinions
Two factors need to be considered
Number of points on the scale
How to label (“anchor”) the scale (e.g., endpoints only or each point)
A Likert Scale is a scale used to assess attitudes
Respondents indicate the degree of agreement or disagreement to a series of statements
I am happy.
Disagree 1 2 3 4 5 6 7 Agree
A Semantic Differential Scale allows participants to provide a rating within a bipolar space
How are you feeling right now?
Sad 1 2 3 4 5 6 7 Happy
Sample Survey Questions: http://www.custominsight.com/survey-question-types.asp
Composite Measures
Constructs are general codifications of experience and observations.
Observe differences in social standing -> concept of social status.
Observe differences in religious commitment -> concept of religiosity
Most psychological constructs have no ultimate definitions
Constructs are ad hoc summaries of experience and observations
Indexes (aka “scales”) provide an ordinal ranking of respondents with respect to a construct of interest (e.g., liking of computers)
Usually assessed through a series of related questions.
It is seldom possible to arrive at a single question that adequately represents a complex variable.
Any single item is likely to misrepresent some respondents (e.g., church-going)
A single item may not provide enough variation for your purposes.
Single items give crude assessments; several items give a more comprehensive and accurate assessment.
The process of specifying empirical observations that are indicators of the concept of interest
Begin by enumerating all the subdimensions (“factors”) of the concept
Review previous research
E.g., going to church
Acceptance of religious beliefs
Extent of knowledge about religion
Range of religious experiences
Extent to which religion guides social decisions
(there are many others)
Also think about related measures which should not be indicators of your construct
In particular if you will be measuring another related variable, make sure none of your indicators include any attributes of it.
Want to study the relationship between religiosity and attitudes towards war => including a question about adherence to “peace on earth” doctrine is not a good idea.
All items measure same concept
Should provide variance in responses
Don’t pick items that classify everyone one way.
If you are interested in a binary classification (e.g., liberal vs. conservative), each item should split respondents roughly in half
Negate up to half of the items to avoid response bias.
Every pair of items should be related, but not too strongly
Scoring high on item A should increase likelihood of scoring high on item B
But, if two items are perfectly correlated (e.g. one logically implies the other), then one can be dropped.
Should also look at combinations of >2 items to ensure that they all provide additional information.
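The pairwise check described above can be sketched in a few lines of Python. This is only an illustration: the function names and the cutoffs `low=0.2` and `high=0.9` are assumptions chosen for the example, not values from these notes. Items are assumed to be already reverse-scored where needed, so related items should correlate positively.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        # An item that everyone answers the same way provides no variance.
        raise ValueError("an item has zero variance")
    return cov / (spread_x * spread_y)

def screen_item_pairs(items, low=0.2, high=0.9):
    """Flag item pairs that look unrelated or nearly redundant.

    items: dict mapping item name -> list of scores (same respondent order).
    low, high: illustrative cutoffs (assumptions, not standard values).
    """
    flags = {}
    for (name_a, a), (name_b, b) in combinations(items.items(), 2):
        r = pearson(a, b)
        if r >= high:
            flags[(name_a, name_b)] = "redundant"
        elif r <= low:
            flags[(name_a, name_b)] = "weakly related"
    return flags

# Hypothetical responses from five respondents to three items
items = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],  # moves in lockstep with item a
    "c": [2, 1, 4, 3, 5],   # related to a, but not perfectly
}
flags = screen_item_pairs(items)
```

Here only the pair (a, b) is flagged, as redundant, so one of the two could be dropped. Checking combinations of more than two items needs other tools and is not covered by this sketch.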
Average the item scores
Weight items equally unless you have a compelling reason to do otherwise
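A minimal sketch of equal-weight scoring, assuming 7-point items and that negated items are reverse-scored before averaging. The item names `q1` to `q3` and both function names are hypothetical.

```python
from statistics import mean

def reverse_score(value, scale_min=1, scale_max=7):
    """Flip a response on a bounded scale (2 on a 1-7 scale becomes 6)."""
    return scale_min + scale_max - value

def composite_score(responses, negated_items=(), scale_min=1, scale_max=7):
    """Equal-weight average of item scores, reverse-scoring negated items.

    responses: dict mapping item name -> numeric response
    negated_items: names of items worded in the opposite direction
    """
    adjusted = [
        reverse_score(value, scale_min, scale_max) if name in negated_items else value
        for name, value in responses.items()
    ]
    return mean(adjusted)

# Hypothetical respondent: q2 and q3 are negatively worded items
respondent = {"q1": 6, "q2": 2, "q3": 1}
score = composite_score(respondent, negated_items={"q2", "q3"})
```

After reverse-scoring, the three responses become 6, 6 and 7, so the composite is their plain average. Unequal weights would need a weighted mean instead.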
For missing responses, impute an average/intermediate score
“Last value forward” for repeated measures
Many other strategies
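The two strategies named above can be sketched as follows. This assumes missing responses are stored as `None`, and that "average" means the mean of the scores that were observed; both are assumptions for illustration, and the function names are hypothetical.

```python
from statistics import mean

def impute_with_mean(scores):
    """Replace None entries with the mean of the observed scores."""
    observed = [s for s in scores if s is not None]
    if not observed:
        raise ValueError("no observed scores to impute from")
    fill = mean(observed)
    return [fill if s is None else s for s in scores]

def last_value_forward(series):
    """Carry the most recent observed value forward over None gaps.

    Leading gaps stay None because there is nothing to carry forward yet.
    """
    filled, last = [], None
    for s in series:
        if s is not None:
            last = s
        filled.append(last)
    return filled

# One respondent's item scores, with the second item unanswered
items_filled = impute_with_mean([4, None, 6])

# One respondent's repeated measurements over six sessions
sessions_filled = last_value_forward([None, 3, None, None, 5, None])
```

Both approaches make the data look more complete than it is, so it is worth reporting how many values were imputed.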
“NU Husky Fanatic”
What are some factors?
What are some items per factor?
Previous measures, theoretical concepts
Brainstorm on Factors
Brainstorm on Items
Preliminary Validity/Reliability testing
INTERNAL VALIDITY is the degree to which your design tests what it was intended to test
In an experiment, internal validity means showing that the observed difference in the dependent variable is truly caused by changes in the independent variable
In correlational research, internal validity means that observed differences in the value of the criterion variable are truly related to changes in the predictor variable
Internal validity is threatened by Extraneous and Confounding variables
Internal validity must be considered during the design phase of research
EXTERNAL VALIDITY is the degree to which results generalize beyond your sample and research setting
External validity is threatened by the use of a highly controlled laboratory setting, restricted populations, pretests, demand characteristics, experimenter bias, and subject selection bias (such as volunteer bias)
Steps taken to increase internal validity may decrease external validity and vice versa
Internal validity may be more important in basic research; external validity, in applied research
Internal vs. External Validity of a study
You want to evaluate a new sensor to detect whether people are happy or not.
You hire actors and randomly assign them to act happy or sad, and test your sensors on them.
What kind of validity (internal/external) might be challenged?
You conduct the “Conversational Agents to Promote Health Literacy” study by assigning the first 30 patients who volunteer to the intervention group, and the next 30 to the control group.
What kind of validity (internal/external) might be challenged?
The laboratory setting
Affords greatest control over extraneous variables
Attempt to recreate the real world in the laboratory
Realism is an issue
The field setting
Study conducted in a real world environment
Field experiment: Manipulate variables in the field
High degree of external validity, but internal validity may be low
Validating a Composite Measure
For psychological measures, these are collectively referred to as a measure’s “psychometrics”.
A reliable measure produces similar results when repeated measurements are made under identical conditions
Reliability can be established in several ways
Test-retest reliability: Administer the same test twice
Parallel-forms reliability: Alternate forms of the same test used
Split-half reliability: Parallel forms are included on one test and later separated for comparison
For surveys, this also encompasses internal consistency:
Do all of the questions address the same underlying construct of interest?
That is, do scores covary?
A standard measure is Cronbach’s alpha
0 = no correlation
1 = scores always covary in the same way
0.7 used as conventional threshold
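For reference, Cronbach's alpha for k items is k/(k-1) × (1 − sum of item variances / variance of respondents' total scores). A minimal sketch using only the Python standard library; the function name and data layout are assumptions made for the example.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a set of items.

    item_scores: list of lists, one inner list per item, each holding
    one score per respondent (same respondent order in every item).
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least two items")
    # Total score per respondent, summing across items
    totals = [sum(per_respondent) for per_respondent in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Hypothetical data: three items that rank four respondents identically
consistent = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])

# Hypothetical data: two items that are uncorrelated with each other
unrelated = cronbach_alpha([[1, 2, 1, 2], [1, 1, 2, 2]])
```

The first call gives an alpha of 1 (scores always covary) and the second gives 0 (no correlation), matching the endpoints listed above. With real data, alpha can also come out negative if items are negatively correlated, which usually signals an item that was not reverse-scored.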
Check to be sure the items on your questionnaire are clearly written and appropriate for those who will complete your questionnaire
Increase the number of items on your questionnaire
Standardize the conditions under which the test is administered (e.g., timing procedures, lighting, ventilation, instructions)
Make sure you score your questionnaire carefully, eliminating scoring errors
Characteristics of Individuals Who Volunteer for Research
1. tend to be more highly educated than nonvolunteers
2. tend to come from a higher social class than nonvolunteers
3. are of higher intelligence in general, but not when they volunteer for atypical research (such as hypnosis, sex research)
4. have a higher need for approval than nonvolunteers
5. are more social than nonvolunteers
Volunteers are more “arousal seeking” than nonvolunteers (especially when the research involves stress)
Individuals who volunteer for sex research are more unconventional than nonvolunteers
Females are more likely to volunteer than males, except when the research involves physical or emotional stress
Volunteers are less authoritarian than nonvolunteers
Jews are more likely to volunteer than Protestants; however, Protestants are more likely to volunteer than Catholics
Volunteers have a tendency to be less conforming than nonvolunteers, except when the volunteers are female and the research is clinically oriented
Source: Adapted from Rosenthal & Rosnow, 1975.
Remedies for Volunteer Bias
The degree to which a measure corresponds to what happens in the real world.
Assessing productivity/day in the lab vs.
Assessing productivity/day in the office
Is a dependent measure sensitive enough to detect behavior change?
An insensitive measure will not detect subtle behaviors
Range effects occur when a dependent measure has an upper or lower limit
Ceiling effect: When a dependent measure has an upper limit
Floor effect: When a dependent measure has a lower limit.
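One quick way to look for range effects is to count how many scores sit at the limits of the scale. A minimal sketch; the function name and the example scores are made up for illustration.

```python
def range_effect_report(scores, scale_min, scale_max):
    """Fraction of scores sitting at the bottom and top of the scale."""
    n = len(scores)
    if n == 0:
        raise ValueError("no scores given")
    return {
        "at_floor": sum(s <= scale_min for s in scores) / n,
        "at_ceiling": sum(s >= scale_max for s in scores) / n,
    }

# Hypothetical ratings from four participants on a 1-7 scale
report = range_effect_report([7, 7, 6, 7], scale_min=1, scale_max=7)
```

Here three of the four scores are at the maximum, which suggests a ceiling effect: the measure has little room left to show a difference between conditions.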
You want to assess the effect of TV viewing on whether people like large computer monitors or not (yes/no).
You run an experiment in which participants are randomized to watch either 2 hrs or 0 hrs of TV per day for a week, then answer your question.
What’s going on?
2  No TV  Yes
4  No TV  Yes
Say you decide you need a new survey measure, “attitude towards large computer monitors” (ATLCM)
I like big monitors.
Big monitors make me nervous.
I prefer small monitors, even if they cost more.
7-pt Likert scales
How would you validate this measure?
You want to assess the effect of TV viewing on attitude towards large computer monitors (ATLCM).
You run an experiment in which participants are randomized to watch either 2 hrs or 0 hrs of TV per day for a week, then fill out the ATLCM.
What’s going on?
2  No TV  6.7
4  No TV  7.0
A valid measure measures what you intend it to measure
Very important when using psychological tests (e.g., intelligence, aptitude, (un)favorable attitude)
Validity can be established in a variety of ways
Face validity: Assessment of adequacy of content. Least powerful method
Content validity: How adequately does a variable sample the full range of behavior it is intended to measure?
Criterion-related validity: How adequately does a test score match some criterion score? Takes two forms
Concurrent validity: Does test score correlate highly with score from a measure with known validity?
Predictive validity: Does test predict behavior known to be associated with the behavior being measured?
Construct validity: Do the results of a test correlate with what is theoretically known about the construct being evaluated?
Convergent validity (subtype): measures of constructs that should be related to each other are in fact related
Discriminant validity (subtype): measures of constructs that should not be related are in fact unrelated
You should obtain a representative sample
The sample closely matches the characteristics of the population
A biased sample occurs when your sample characteristics don’t match population characteristics
Biased samples often produce misleading or inaccurate results
Usually stem from inadequate sampling procedures