Experimental Methods in Social Ecological Systems


Juan-Camilo Cárdenas

Universidad de los Andes

Jim Murphy

University of Alaska Anchorage

- Noon – 12:15 Welcome, introductions
- 12:15 – 1:15 Play Game #1 (CPR: 1 species vs. 4 species)
- 1:15 – 2:00 Debrief Game #1 and other results from the field
- 2:00 – 2:15 Break
- 2:15 – 3:15 Game #2 (Beans game)
- 3:15 – 4:00 Debrief Game #2
- 4:00 – 4:15 Break
- 4:15 – 5:00 Basics of experimental design
- Homework for Day 2: Think of an interesting question or problem to be worked on in groups tomorrow

- 8:30 – 9:15 Designing and running experiments in the field
- 9:15 – 10:15 Classwork: work in groups solving experimental design problems
- 10:15 – 10:30 Break
- 10:30 – 11:15 Discussion of group solutions
- 11:15 – Noon Begin designing your own experiment (form groups based on best ideas proposed)
- Noon – 1:00 Lunch
- 1:00 – 1:30 Continue designing your own experiment (work in groups)
- 1:30 – 2:30 Present designs
- 2:30 – 3:00 Feedback: how could we make this workshop better?

- We will create a web site with materials from the workshop.
- Please give us your email address (write neatly!!) and we will send you a link when it is ready.

1. “Speaking to Theorists”

- Test a theory or discriminate between theories
- Compare theoretical predictions with experimental observations
- Does non-cooperative game theory accurately predict aggregate behavior in an unregulated CPR?

- Explore the causes of a theory’s failure
- If what you observe in the lab differs from theory, try to figure out why.
- Communication increases cooperation in a CPR even though it is “cheap talk”
- Why?

- Is my experiment designed correctly?
- What caused the failure?
- Theory stress tests (boundary experiments)


2. “Searching for Facts”

- Establish empirical regularities as a basis for new theory
- In most sciences, new theories are often preceded by much observation.
- “I keep noticing this. What’s going on here?”
- The Double Auction
- Years of experimental data showed its efficiency even though no formal models had been developed to explain why this was the case.

- Behavioral Economics
- Many experiments have identified anomalies, but no theory has yet been developed to explain them.


3. “Whispering in the Ears of Princes”

- Evaluate policy proposals
- Alternative institutions for auctioning emissions permits
- Allocating space shuttle resources

- Test bed for new institutions
- Electric power markets
- Water markets
- Pollution permits
- FCC spectrum licenses

- Common pool resource experiment
- Social dilemma
- Individual vs group interests
- Benefits to cooperation
- Incentives to not cooperate

- Field experiments in rural Colombia
- Groups of 5 people
- Decide how much to extract/harvest from a shared natural resource

Subjects choose a level of extraction 0 – 8

Low harvest levels (“conservative”)

High harvest levels

Payoffs also depend on choices of other 4 group members

Group earnings largest if all choose 1

Strong incentives to harvest more than 1

Social optimum: All choose 1

Nash equilibrium: All choose 6
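
The gap between the social optimum and the Nash equilibrium can be illustrated with a small sketch. The payoff function and the parameters `A`, `B`, and `W` below are hypothetical; they were picked only so that the optimum is 1 and the equilibrium is 6, as stated above, and they are not the payoff table used in the field.

```python
# Hypothetical CPR payoff; parameters are illustrative assumptions,
# picked so that the social optimum is 1 and the Nash equilibrium is 6.
N = 5                # group size (from the slides)
MAX_X = 8            # extraction choices run from 0 to 8 (from the slides)
A, B, W = 29, 4, 5   # assumed: private benefit, curvature, value of stock left


def payoff(x_i, others_total):
    """Earnings of one player: private harvest benefit plus the value of
    the resource left unharvested by the whole group."""
    stock_left = N * MAX_X - (x_i + others_total)
    return A * x_i - 0.5 * B * x_i ** 2 + W * stock_left


def best_response(others_total):
    """Extraction level that maximizes own payoff, given the others' total."""
    return max(range(MAX_X + 1), key=lambda x: payoff(x, others_total))


def symmetric_optimum():
    """Common extraction level that maximizes group earnings."""
    return max(range(MAX_X + 1), key=lambda x: N * payoff(x, (N - 1) * x))


nash = best_response(others_total=4 * 6)
optimum = symmetric_optimum()
print("Nash equilibrium choice:", nash)
print("Social optimum choice:", optimum)
print("Per-person payoff if all choose 1:", payoff(1, 4))
print("Per-person payoff if all choose 6:", payoff(6, 24))
```

With this assumed payoff, harvesting 6 is the best reply whatever the others do, yet everyone earns more when all choose 1. That is the structure of the social dilemma.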

- The early CPR experiments typically used payoff tables.
- We don’t live in a world of payoff tables
- Frames how a person should think about the game
- A lot of numbers, hard to read
- Too abstract??

- More recent CPR experiments using richer ecological contexts
- e.g., managing a fishery is different than an irrigation system

- To explore interaction between:
- Formal regulations imposed on a community to conserve local natural resources
- Informal non-binding verbal agreements to do the same.

- Groups of N=5 participants
- Play 10 rounds of one of the 6 treatments
- Enforcement
- Individual harvest quota = 1 (Social optimum)
- Exogenous probability of audit
- Fine (per unit violation) if caught exceeding quota

- Participants paid based on cumulative earnings in all 10 rounds
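
A rough sketch of how an audit probability and a per-unit fine change the incentive to over-harvest. The private benefit function, the audit probability of 0.2, and the fine levels are all illustrative assumptions, not the values used in the study.

```python
# Sketch of expected earnings under quota enforcement. The audit probability,
# fines, and benefit function are illustrative assumptions, not study values.
QUOTA = 1    # individual harvest quota (from the slides)
MAX_X = 8    # extraction choices run from 0 to 8 (from the slides)


def private_benefit(x):
    """Assumed net private gain from harvesting x units (hypothetical form)."""
    return 24 * x - 2 * x ** 2


def expected_earnings(x, audit_prob, fine_per_unit):
    """Private gain minus the expected fine for units above the quota."""
    violation = max(0, x - QUOTA)
    return private_benefit(x) - audit_prob * fine_per_unit * violation


def best_choice(audit_prob, fine_per_unit):
    """Harvest level with the highest expected earnings."""
    return max(range(MAX_X + 1),
               key=lambda x: expected_earnings(x, audit_prob, fine_per_unit))


for label, fine in [("no fine", 0), ("low fine", 20), ("medium fine", 60)]:
    print(label, "->", best_choice(audit_prob=0.2, fine_per_unit=fine))
```

Under these assumptions a risk-neutral participant responds only to the expected penalty (audit probability times fine), so a higher fine lowers the chosen harvest but does not push it all the way down to the quota.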

These 2 treatments have been conducted ad nauseam.

Are they necessary?

- Replication
- In any experimental science, it is important for key results to be replicated to test robustness
- Link to previous research. Is your sample unique?

- Baseline or control group
- The baseline treatment also gives us a basis for evaluating what the effects are of each treatment
- In any experimental study, it is crucial to think carefully about the relevant control!

- Stage 1 – Baseline CPR (5 rounds)
- Stage 2 – one of the 5 remaining treatments (5 rounds)
- Comm only
- Low
- Low + Comm
- Med
- Med + Comm

- Advantage – Having all groups play Stage 1 baseline facilitates a clean comparison across groups.
- Disadvantage – fewer rounds of the Stage 2 treatments. Enough time to converge??
- Disadvantage(?) – All stage 2 decisions conditioned upon having already played a baseline

- Groups of N=5 participants
- How many groups per treatment cell?

Also see: John A. List, Sally Sadoff, and Mathis Wagner (2011). “So you want to run an experiment, now what? Some simple rules of thumb for optimal experimental design.” Experimental Economics 14: 439–457.

A. 0 (control) / 1 (treatment), equal outcome variances

B. 0/1 treatment, unequal outcome variances

C. Treatment Intensity—no longer binary

D. Clusters

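Assume that $X_0 \sim N(\mu_0, \sigma_0^2)$ and $X_1 \sim N(\mu_1, \sigma_1^2)$, and that the minimum detectable effect is $\mu_1 - \mu_0 = \delta$. The hypotheses are $H_0: \mu_0 = \mu_1$ and $H_1: \mu_1 - \mu_0 = \delta$. We need the difference in sample means $\bar{X}_1 - \bar{X}_0$ to satisfy: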

1. Significance level (probability of Type I error) = α:

2. Power (1 – probability of Type II error) = 1 – β:

A. Our usual approach stems from the standard regression model: under a true null what is the probability of observing the coefficient that we observed?

B. Power calculations ask a different question: if the alternative hypothesis is true, what is the probability that the estimated coefficient lies outside the 95% confidence interval defined under the null?

- Solving equations 1 and 2 assuming equal variances $\sigma_0^2 = \sigma_1^2 = \sigma^2$ gives $n_0 = n_1 = n = 2\,(t_{\alpha/2} + t_{\beta})^2 (\sigma/\delta)^2$.
- Note that the necessary sample size
- Increases rapidly with the desired significance level ($t_{\alpha/2}$) and power ($t_{\beta}$).
- Increases proportionally with the variance of outcomes ($\sigma^2$).
- Is inversely proportional to the square of the minimum detectable effect size ($\delta$).

- Sample size depends on the ratio of effect size to standard deviation. Hence, effect sizes can just as easily be expressed in standard deviations.
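
As a sketch of the derivation, following the setup in List, Sadoff, and Wagner (2011): the slide's own equations are not shown, so the notation here is an assumption, with $n_0$, $n_1$ for the cell sizes and $t_{\alpha/2}$, $t_{\beta}$ for the critical values. The two conditions and their solution can be written as:

```latex
\begin{align}
\frac{\bar{X}_1 - \bar{X}_0}{\sqrt{\sigma_0^2/n_0 + \sigma_1^2/n_1}} &= t_{\alpha/2}
  && \text{(1) significance} \\
\frac{(\bar{X}_1 - \bar{X}_0) - \delta}{\sqrt{\sigma_0^2/n_0 + \sigma_1^2/n_1}} &= -t_{\beta}
  && \text{(2) power} \\
\intertext{Subtracting (2) from (1) eliminates the sample means:}
\delta &= (t_{\alpha/2} + t_{\beta}) \sqrt{\sigma_0^2/n_0 + \sigma_1^2/n_1} \\
\intertext{With equal variances $\sigma_0^2 = \sigma_1^2 = \sigma^2$ and equal cells $n_0 = n_1 = n$:}
n &= 2\,(t_{\alpha/2} + t_{\beta})^2 \left(\frac{\sigma}{\delta}\right)^2
\end{align}
```

Only the ratio $\sigma/\delta$ enters the last line, which is why effect sizes can be stated in standard deviation units.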

- Standard is to use α=0.05 and have power of 0.80 (β=0.20).
- So if we want to detect a one-standard deviation change using the standard approach, we would need:
- n = 2(1.96 + 0.84)² × (1)² = 15.68 observations in each cell
- A ½ std. dev. change is detectable with 4 × 15.68 ≈ 63 observations per cell (64 if n is first rounded up to 16)
- n = 30 seems to be the magic number in many experimental studies: it detects a change of roughly 0.7 std. dev.

- Assuming α = 0.05 and β = 0.20 requires n subjects, stricter thresholds require:
- α = 0.05 and β = 0.05: 1.65 × n
- α = 0.01 and β = 0.20: 1.49 × n
- α = 0.01 and β = 0.05: 2.27 × n
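
The rule of thumb and the multipliers above can be checked with a few lines of Python. This is a minimal sketch that assumes normal critical values (as the 1.96 and 0.84 above do) and equal variances; the function name `n_per_cell` is ours.

```python
from statistics import NormalDist


def n_per_cell(alpha=0.05, beta=0.20, effect_in_sd=1.0):
    """Observations per cell for a two-cell comparison with equal variances:
    n = 2 * (z_{alpha/2} + z_beta)^2 / (effect size in SD units)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    return 2 * (z_alpha + z_beta) ** 2 / effect_in_sd ** 2


baseline = n_per_cell()
print(f"1 SD effect: {baseline:.2f} per cell")
print(f"1/2 SD effect: {n_per_cell(effect_in_sd=0.5):.2f} per cell")
for alpha, beta in [(0.05, 0.05), (0.01, 0.20), (0.01, 0.05)]:
    ratio = n_per_cell(alpha, beta) / baseline
    print(f"alpha={alpha}, beta={beta}: {ratio:.2f} x n")
```

Because the code uses unrounded critical values, its results differ slightly from the hand calculation with 1.96 and 0.84 (about 15.7 rather than 15.68 per cell).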

- Local homeless shelter was conducting a fundraising campaign.
- They asked us to replicate List’s study about the effects of matching contributions.
- The shelter wanted the same 4 treatments as in List:
- No match, 1:1, 2:1, and 3:1 to test whether high match ratios would increase contributions.

- Local oil company agreed to donate up to $5000 to provide a match for money donated.

- The shelter had funds to send out 16,000 letters to high income women in Anchorage who had never donated before.
- Expected response rate was about 3 to 4% (n ≈ 480–640)
- Question: How many treatments should we run, if we expect about 500 responses?
- They said a “meaningful” treatment effect would be ~$25.
- Standard deviation from previous campaigns was ~$100.

- With only 500 expected responses, we could only conduct 2 treatments.
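
The reasoning behind that answer can be sketched by turning the sample size formula around: given the responses available per cell, what is the smallest effect we could detect? The sketch assumes α = 0.05, power of 0.80, equal variances, responses split evenly across treatments, and a comparison between two cells; the function name is ours.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_effect(n_per_cell, sd, alpha=0.05, beta=0.20):
    """Smallest difference in means detectable with n_per_cell observations
    in each of two cells, assuming equal variances."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(1 - beta)
    return z * sd * sqrt(2 / n_per_cell)


expected_responses = 500   # from the slides
sd_dollars = 100           # from the slides
for treatments in (2, 4):
    per_cell = expected_responses / treatments
    mde = min_detectable_effect(per_cell, sd_dollars)
    print(f"{treatments} treatments, {per_cell:.0f} per cell: "
          f"detectable effect of about ${mde:.0f}")
```

With two treatments the detectable effect is close to the $25 the shelter considered meaningful; with four treatments it rises to roughly $35, so a $25 effect would likely go undetected.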

Another Rule of Thumb—if the outcome variances are not equal then:

The ratio of the optimal proportions of the total sample in control and treatment groups is equal to the ratio of the standard deviations.

Example: Communication tends to reduce the variance, so perhaps fewer groups are needed in this treatment.
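
A minimal sketch of this rule of thumb, with hypothetical standard deviations (the treatment is assumed to halve the standard deviation, and 300 observations are assumed to be available). It also compares the precision of the proportional split with an equal split.

```python
def allocate(total_n, sd_control, sd_treatment):
    """Split total_n so that n_control / n_treatment = sd_control / sd_treatment."""
    share_control = sd_control / (sd_control + sd_treatment)
    n_control = round(total_n * share_control)
    return n_control, total_n - n_control


def variance_of_difference(n_control, n_treatment, sd_control, sd_treatment):
    """Variance of the difference in sample means."""
    return sd_control ** 2 / n_control + sd_treatment ** 2 / n_treatment


# Hypothetical numbers: the treatment halves the standard deviation.
n_c, n_t = allocate(300, sd_control=2.0, sd_treatment=1.0)
var_proportional = variance_of_difference(n_c, n_t, 2.0, 1.0)
var_equal = variance_of_difference(150, 150, 2.0, 1.0)
print("control:", n_c, "treatment:", n_t)
print("proportional split:", var_proportional)
print("equal split:", var_equal)
```

The proportional split gives a smaller variance for the estimated treatment effect than the equal split with the same total sample.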

- How many levels of enforcement do we need?

Do we need 3 levels of enforcement?

- Assume that you are interested in understanding the intensity of treatment:
- Level of enforcement (e.g., audit probability)
- Assume that the outcome variance is equal across various cells.

- How should you allocate the sample if audit probability could be between 0-1?
- For simplicity, say X=25%, 50%, or 75%

- Assume that you have 1000 subjects available.

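$Y = XB + e$

One goal in this case is to derive the most precise estimate of B by using exogenous variation in X.

Recall that the variance of the estimate of B is $\operatorname{var}(e) / (n \cdot \operatorname{var}(X))$, and its standard error is the square root of that quantity.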

Linear effect:

- ½ of sample @ X=25%
- 0 @ X=50%
- ½ @ X=75%

Quadratic effect:

- ¼ of sample @ X=25%
- ½ @ X=50%
- ¼ @ X=75%

Intuition: The test for a quadratic effect compares the mean of the outcomes at the extremes to the mean of the outcome at the midpoint.
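
For the linear case, the logic can be sketched numerically: the precision of the slope estimate depends on n · var(X), so the allocation that maximizes the variance of X gives the smallest standard error. The helper names below are ours, and the standard error is reported in units of the error standard deviation.

```python
def design_variance(levels, shares):
    """Variance of X when a share of the sample is assigned to each level."""
    mean = sum(s * x for x, s in zip(levels, shares))
    return sum(s * (x - mean) ** 2 for x, s in zip(levels, shares))


def relative_se(levels, shares, n):
    """Standard error of the slope, in units of the error standard deviation:
    sqrt(1 / (n * var(X)))."""
    return (1 / (n * design_variance(levels, shares))) ** 0.5


levels = [0.25, 0.50, 0.75]   # audit probabilities from the slides
n = 1000                      # subjects available, from the slides
designs = {
    "extremes only (1/2, 0, 1/2)": [0.5, 0.0, 0.5],
    "equal thirds": [1 / 3, 1 / 3, 1 / 3],
}
for name, shares in designs.items():
    print(name, round(relative_se(levels, shares, n), 4))
```

Placing half the sample at each extreme yields a smaller standard error than spreading subjects evenly, at the cost of learning nothing about curvature.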

- What happens when the level of randomization differs from the unit of observation? Think of randomization at the village level, or at the store level, and outcomes are observed at the individual level.
- Classic example: comparing two textbooks.
- Randomization over classrooms
- Observations at individual level
Another Example:

- To test robustness of results, you may want to conduct the experiments in multiple communities.
- How do you allocate treatments across communities, especially if number of participants per village is small?
- In our Colombian enforcement study, we replicated the entire design in three regions.
- In a separate CPR experiment in Russia, we visited 3 communities in one region. Each treatment was conducted 1x in each community.
- We are assuming that the differences across communities are small.
- Cannot make cross-community comparison

- Real Sample Size (RSS) = mk / CE
- m = number of subjects in a cluster
- k = number of clusters
- CE = 1 + ρ(m − 1)
- ρ = intracluster correlation coefficient = $s_B^2 / (s_B^2 + s_w^2)$
- $s_B^2$ = variance between clusters
- $s_w^2$ = variance within clusters
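
A short sketch of the formula, using hypothetical values of ρ with 6 clusters of 5 subjects each (5 matches the group size in the CPR experiments; the number of clusters is illustrative):

```python
def real_sample_size(m, k, rho):
    """Effective number of independent observations: m * k / CE,
    where CE = 1 + rho * (m - 1)."""
    clustering_effect = 1 + rho * (m - 1)
    return m * k / clustering_effect


# Hypothetical values of rho, from no correlation to perfect correlation.
for rho in (0.0, 0.3, 1.0):
    print(f"rho={rho}: RSS = {real_sample_size(m=5, k=6, rho=rho):.1f}")
```

With no intracluster correlation all 30 subjects count as independent observations; with perfect correlation only the 6 clusters do.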

- Advantages
- Independence among the factor variables
- Can explore interactions between factors

- Disadvantages
- Number of treatments grows quickly with increase in number of factors or levels within a factor
- Example: Conduct experiment in multiple communities and use community as a treatment variable


- Say we want to add informal sanctions with a 3:1 ratio
- I can pay $3 to reduce your earnings by $1
- 1 new “factor” with 2 “levels”

- To run all combinations would require 2x2x2 = 8 treatments
- Assume optimal sample size per cell is 6 groups of 5 people (30 total per cell)
- 8 treatments x 30 people/cell = 240 people
- Assume you can only recruit about half that (~120)
- You could run only 3 groups per cell (15 people) – lose power/significance

- Solution: conduct a balanced subset of treatments

- If you are considering this approach, there are a few different design options depending upon the effects you want to capture, number of treatments, etc.
- This is just one example!

[Table: balanced subset of treatments across three factors: Communication, Sanctions, External Enforcement]

- Advantage: dramatically reduces the number of trials
- Disadvantage: achieves balance by systematically confounding some direct effects with some interactions.
- It may not be serious, but you will lose the ability to analyze all of the different possible interactions.
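
One way to build such a balanced subset is a half fraction of the 2x2x2 design. The sketch below is a standard textbook construction, not necessarily the design on the slide; the factor names follow the slides, and the −1/+1 coding and the choice of which half to keep are our assumptions.

```python
from itertools import product

FACTORS = ("communication", "sanctions", "enforcement")

# Full 2x2x2 design, coding each factor as -1 (off) or +1 (on).
full_design = list(product((-1, 1), repeat=len(FACTORS)))

# Half fraction: keep runs where the product of the three codes is -1.
# This half contains the all-off baseline treatment.
half_fraction = [run for run in full_design if run[0] * run[1] * run[2] == -1]

for run in half_fraction:
    print(dict(zip(FACTORS, ("on" if level == 1 else "off" for level in run))))

# Balance: each factor is on in exactly half of the retained runs.
on_counts = [sum(1 for run in half_fraction if run[i] == 1) for i in range(3)]
print("times each factor is on:", on_counts)

# Confounding: in every retained run, the enforcement code is the negative of
# the communication x sanctions interaction code, so the two effects cannot
# be separated.
aliased = all(run[2] == -run[0] * run[1] for run in half_fraction)
print("enforcement aliased with communication x sanctions:", aliased)
```

Four treatments replace eight, and each factor is still on in half of them, but the main effect of enforcement can no longer be told apart from the communication-by-sanctions interaction.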

- Other factors of little or no primary interest that can also affect decisions. These nuisance effects could be significant.
- Common examples
- Gender, age, nationality (most socio-economic variables)
- Selection bias
- Recruitment -- open to whoever shows up vs random selection

- Experience
- Participated in previous experiments

- Learning
- Concern in multi-round experiments

- Non-experiment interactions
- People talking before an experiment while waiting to start
- In a community, people may hear about experiment from others

- Confounding occurs when the effects of two independent variables are intertwined so that you cannot determine which of the variables is responsible for the observed effect.
- Example:
- What are some potential confounds when comparing the Baseline with Low?

- If trying to identify factors that influence decisions, try adding them one at a time.
- Imposing a fine for non-compliance differs from the baseline CPR in multiple ways. Possible confounds:
- FRAME
- The simple existence of a quota may send a signal about expected behavior, independent of any audits or fines.

- GUILT = FRAME + audit
- Getting audited may generate feelings of guilt because the individual is privately reminded about anti-social choices

- FINE = FRAME + GUILT (audit) + fine for violations

- Are people responding to the expected penalty? Or are they responding to the frame from the quota?

- conditions of interest (wanted)
- measurement error (unwanted)
- People can make mistakes, misunderstand instructions, typos

- experimental material and process (unwanted)
- No two people are identical, and their responses to the same situation may not be the same, even if your theory predicts otherwise.

- Isolate the effects of interest
- Control what you can
- Randomize the rest
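
A minimal sketch of "randomize the rest": assigning groups to treatments at random while keeping cell sizes equal. The treatment names follow the slides; the group ids, the number of groups, and the seed are illustrative.

```python
import random


def assign_treatments(group_ids, treatments, seed=None):
    """Randomly assign groups to treatments, with equal numbers per treatment."""
    shuffled = list(group_ids)
    if len(shuffled) % len(treatments) != 0:
        raise ValueError("number of groups must be a multiple of treatments")
    rng = random.Random(seed)
    rng.shuffle(shuffled)
    per_cell = len(shuffled) // len(treatments)
    return {
        treatment: sorted(shuffled[i * per_cell:(i + 1) * per_cell])
        for i, treatment in enumerate(treatments)
    }


# Treatment names follow the slides; group ids and the seed are illustrative.
treatments = ["Comm only", "Low", "Low + Comm", "Med", "Med + Comm"]
assignment = assign_treatments(range(1, 31), treatments, seed=2024)
for name, groups in assignment.items():
    print(name, groups)
```

Fixing the seed makes the assignment reproducible, which helps when documenting the procedure.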

- Think carefully about your research question
- Formulate testable hypotheses grounded in theory
- How does your idea contribute to the literature?

- Think carefully about possible results and how they would be interpreted
- What if results are consistent with theory/expectations?
- What if they are not?
- Be prepared for either possibility

- Prepare code for data analysis BEFORE running experiments
- Forces you to think carefully about what your data will look like, and what you want to get out of it.

- Are your data discrete, binary or continuous?
- Multinomial logit, ordered probit, logit, Poisson, linear

- Repeated observations or one-shot decisions
- Random effects, hierarchical mixed models, nonparametrics
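
One way to prepare the analysis before collecting any data is to run it on simulated data. The sketch below generates outcomes under an assumed effect size and checks how often a simple test detects it. It is deliberately simplified: normal outcomes, known and equal variances, and a two-sided z-test, none of which is prescribed by the slides.

```python
import random
from statistics import NormalDist, mean


def simulated_power(n_per_cell, effect_in_sd, alpha=0.05, reps=4000, seed=1):
    """Share of simulated experiments in which a two-sided z-test (known,
    equal variances) rejects the null of no treatment effect."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n_per_cell) ** 0.5   # std. error of the difference in means
    rejections = 0
    for _ in range(reps):
        control = [rng.gauss(0, 1) for _ in range(n_per_cell)]
        treated = [rng.gauss(effect_in_sd, 1) for _ in range(n_per_cell)]
        z = (mean(treated) - mean(control)) / se
        if abs(z) > critical:
            rejections += 1
    return rejections / reps


power = simulated_power(16, 1.0)
false_positive_rate = simulated_power(16, 0.0)
print("power with a 1 SD effect, 16 per cell:", power)
print("rejection rate with no effect:", false_positive_rate)
```

The same scaffold can be pointed at whatever estimator is planned for the real data, which forces the data layout and the hypothesis test to be settled in advance.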

- Subject payments and salience
- One distinguishing feature of economic experiments is that subjects are paid based on their decisions and possibly the decisions of others
- Must pay enough for subjects to take experiment seriously
- Avoid tournaments
- E.g., giving a bonus to person who earns the most money

- Typically pay in cash; some field experiments may use another medium

- Never use deception!
- Keep earnings and decisions private

- Think carefully about every word in your instructions
- Framing effects
- Calling the other player your “partner” in the ultimatum game (UG) versus your “opponent”
- Could frame the UG as an offer to “sell” at a price

- Using examples
- I used the example of $14/$6 split.
- Does that suggest proposers should take more than half?
- What if I used a 10/10 split? Or 6/14?
- Could give multiple examples…

- Experiment length
- Be aware that people get tired and bored

- Strategy method
- Hot vs cold decisions

- Paying for just one round in multi-round game
- AB-BA designs for within-subject comparisons
- Playing multiple games and paying for just one
- Factor levels should allow for “enough distance” between hypotheses
- Social optimum is people will harvest 10% of the fish
- Nash equilibrium predicts 15%.
- Nash equilibrium & social optimum should be “farther apart”