Causal Graphical Models II: Applications with Search

Richard Scheines
Carnegie Mellon University


Case Studies

  • Foreign Investment
  • Welfare Reform
  • Online Learning
  • Charitable Giving
  • Stress & Prayer
  • Test Anxiety
  • Causal Connectivity among Brain Regions

Case Studies

  • Exceedingly simple
  • Background theory weak
  • Claim:
    • Not: search output is true
    • Is: search adds value
Case Study 1: Foreign Investment

Does foreign investment in Third World countries cause political repression?

Timberlake, M. and Williams, K. (1984). Dependence, political exclusion, and government repression: Some cross-national evidence. American Sociological Review 49, 141-146.

N = 72

PO: degree of political exclusivity

CV: lack of civil liberties

EN: energy consumption per capita (economic development)

FI: level of foreign investment

Case Study 1: Foreign Investment

Correlations

        po       fi       en
fi    -.175
en    -.480     .330
cv     .868    -.391    -.430

Case Study 1: Foreign Investment

Regression Results

po = .227*fi - .176*en + .880*cv

          fi       en       cv
coef     .227    -.176     .880
SE      (.058)   (.059)   (.060)
t       3.941    -2.99     14.6

Interpretation: foreign investment increases political repression
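The regression coefficients agree with standardized OLS computed from the correlation table alone. Standardized coefficients solve R_xx β = r_xy, and each standard error is sqrt((1 - R²)/(N - k - 1) · [R_xx⁻¹]_jj). The Python/NumPy sketch below assumes the reported coefficients are standardized. The helper name `standardized_ols` is introduced here. Because the published correlations are rounded to three decimals, the recomputed SEs and t-ratios match the slide only approximately.

```python
import numpy as np

# Correlation matrix from the table above; variable order: po, fi, en, cv
R = np.array([
    [1.000, -0.175, -0.480, 0.868],
    [-0.175, 1.000, 0.330, -0.391],
    [-0.480, 0.330, 1.000, -0.430],
    [0.868, -0.391, -0.430, 1.000],
])
N = 72


def standardized_ols(R, y, xs, n):
    """Standardized OLS coefficients, standard errors, and R^2 from a correlation matrix."""
    Rxx = R[np.ix_(xs, xs)]  # correlations among the regressors
    rxy = R[xs, y]           # correlation of each regressor with the outcome
    beta = np.linalg.solve(Rxx, rxy)
    r2 = float(beta @ rxy)
    k = len(xs)
    se = np.sqrt((1.0 - r2) / (n - k - 1) * np.diag(np.linalg.inv(Rxx)))
    return beta, se, r2


beta, se, r2 = standardized_ols(R, y=0, xs=[1, 2, 3], n=N)
for name, b, s in zip(["fi", "en", "cv"], beta, se):
    print(f"{name}: beta = {b:.3f}, SE = {s:.3f}, t = {b / s:.2f}")
print(f"R^2 = {r2:.3f}")
```

Working from the correlation matrix makes the point of the next slide concrete. The regression output is a deterministic function of these few numbers, so the causal reading depends entirely on which model is assumed.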


Case Study 1: Foreign Investment

Alternatives

There is no model with testable constraints (df > 0) in which FI has a positive effect on PO that is not rejected by the data.

Case Study 2: Welfare Reform

Single Mothers’ Self-Efficacy, Parenting in the Home Environment, and Children’s Development in a Two-Wave Study

(Social Work Research, 29, 1, 7-20)

Aurora Jackson, Richard Scheines


Case Study 2: Welfare Reform

Two-Wave Longitudinal Study
  • Longitudinal Data
    • Time 1: 1996-97 (N = 188)
    • Time 2: 1998-99 (N = 178)
  • Single black mothers in NYC
  • Current and former welfare recipients
  • With a child aged 3 to 5 at Time 1 and 6 to 8 at Time 2

Case Study 2: Welfare Reform

Constructs/Scales/Measures
  • Employment Status
  • Perceived Self-efficacy
  • Depressive Symptoms
  • Quality of Mother/Father Relationship
  • Father/Child Contact
  • Quality of Home Environment
  • Behavior Problems
  • Cognitive Development

Case Study 2: Welfare Reform

Background Knowledge
  • Tier 1:
    • Employment Status
  • Tier 2:
    • Depression
    • Self-efficacy
    • Mother/Father Relationship
    • Father/Child Contact
    • Mother’s Parenting/HOME
  • Tier 3:
    • Negative Behaviors
    • Cognitive Development

Over 22 million path models consistent with these constraints
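One way to read the tier ordering is as a set of forbidden edges: no edge may point from a later tier back to an earlier one. The minimal Python sketch below assumes that edges within a tier are allowed in either direction. The short variable labels and the candidate model are hypothetical stand-ins for the constructs listed above.

```python
# Hypothetical short labels for the constructs, keyed to their tier
TIERS = {
    "Employment": 1,
    "Depression": 2, "SelfEfficacy": 2, "MotherFatherRel": 2,
    "FatherChildContact": 2, "HOME": 2,
    "NegativeBehaviors": 3, "CognitiveDev": 3,
}


def allowed_edges(tiers=TIERS):
    """Ordered pairs (cause, effect) permitted by the tiers; within-tier edges allowed both ways."""
    return [(a, b) for a in tiers for b in tiers if a != b and tiers[a] <= tiers[b]]


def violates_tiers(edges, tiers=TIERS):
    """Edges of a candidate model that point from a later tier back to an earlier one."""
    return [(a, b) for a, b in edges if tiers[a] > tiers[b]]


candidate = [("Employment", "SelfEfficacy"), ("SelfEfficacy", "HOME"), ("CognitiveDev", "HOME")]
print(len(allowed_edges()))        # 39
print(violates_tiers(candidate))   # [('CognitiveDev', 'HOME')]
```

A search procedure can apply this filter before scoring, so that models contradicting the background knowledge are never considered.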


Case Study 2: Welfare Reform

Conceptual Model: χ² = 22.3, df = 20, p = .32

Tetrad Equivalence Class: χ² = 18.87, df = 19, p = .46
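The p-values on these slides are upper-tail probabilities of a chi-square distribution with the stated df. A large p means the model's testable constraints are not rejected by the data. The dependency-free Python sketch below uses the closed forms for integer df: a Poisson sum for even df, and an erfc-based series for odd df. The function name `chi2_sf` is introduced here; `scipy.stats.chi2.sf(x, df)` computes the same quantity.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        half, term, total = x / 2.0, 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{r=1}^{(df-1)/2} x^(r-1) / (2r-1)!!
    term, total = 1.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0) * total


fits = {"Conceptual Model": (22.3, 20), "Tetrad Equivalence Class": (18.87, 19)}
for name, (stat, df) in fits.items():
    print(f"{name}: chi2 = {stat}, df = {df}, p = {chi2_sf(stat, df):.2f}")
```

The same function can be used to check the Charitable Giving fits. For example, χ² = 3.99 on 5 df gives p ≈ .55.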


Case Study 2: Welfare Reform

Points of Agreement:

  • Mother’s Self-Efficacy mediates the effect of Employment on all other variables.
  • Home environment mediates the effect of all other factors on the outcomes: Cognitive Development and Problem Behaviors.

Points of Disagreement:

  • Depression: a key cause (Conceptual Model) vs. only an effect (Tetrad)

Case Study 3: Online Courseware

Variables
  • Pre-test (%)
  • Print-outs (% modules printed)
  • Quiz Scores (avg. %)
  • Voluntary Exercises (% completed)
  • Final Exam (%)
  • 9 other variables

(Figure: the variables arranged in Tier 1, Tier 2, and Tier 3)

Case Study 4: Charitable Giving

Variables

Cryder & Loewenstein (in prep)

  • Tangibility/Concreteness (experimental manipulation)
  • Imaginability (Likert, 1 to 7)
  • Impact (average of 2 Likert items)
  • Sympathy (Likert)
  • Donation ($)

Case Study 4: Charitable Giving

Theoretical Model

Study 1 (N = 94): χ² = 52.0, df = 5, p < 0.0001

Case Study 4: Charitable Giving

GES Outputs

Study 1: χ² = 5.88, df = 5, p = 0.32

Study 1: χ² = 3.99, df = 5, p = 0.55

Case Study 4: Charitable Giving

Theoretical Model

Study 2 (N = 115): χ² = 62.6, df = 5, p < 0.0001

Study 2: χ² = 8.23, df = 5, p = 0.14

Study 2: χ² = 7.48, df = 5, p = 0.18
Build Pure Clusters

Output, provably reliable (pointwise consistent): an equivalence class of measurement models over a pure subset of the measures.

(Figure: true model vs. output)
Build Pure Clusters
  • Qualitative Assumptions:
    • Two types of nodes: measured (M) and latent (L)
    • No edges from M to L (measured variables don’t cause latents)
    • Each m ∈ M measures (is a direct effect of) at least one l ∈ L
    • No cycles involving M
  • Quantitative Assumptions:
    • Each m ∈ M is a linear function of its parents plus noise
    • P(L) has second moments, positive variances, and no deterministic relations
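Build Pure Clusters relies on the vanishing tetrad differences that these linear assumptions imply. For four indicators of a single latent, σ_ij σ_kl = σ_ik σ_jl = σ_il σ_jk. The Python sketch below checks those constraints on covariance matrices implied by a hypothetical one-factor model, with made-up loadings and error variances. It compares a pure version with an impure one that has correlated errors between x0 and x1. It illustrates the constraint being tested, not the Build Pure Clusters algorithm itself.

```python
def implied_cov(loadings, error_vars, error_covs=None):
    """Covariance implied by x_i = loadings[i] * L + e_i with Var(L) = 1.

    error_covs maps an index pair (i, j) to a covariance between e_i and e_j,
    which makes that pair an impure measurement.
    """
    n = len(loadings)
    cov = [[loadings[i] * loadings[j] for j in range(n)] for i in range(n)]
    for i in range(n):
        cov[i][i] += error_vars[i]
    for (i, j), c in (error_covs or {}).items():
        cov[i][j] += c
        cov[j][i] += c
    return cov


def tetrad_differences(s, i, j, k, m):
    """The three tetrad differences for indicators i, j, k, m."""
    return (
        s[i][j] * s[k][m] - s[i][k] * s[j][m],
        s[i][j] * s[k][m] - s[i][m] * s[j][k],
        s[i][k] * s[j][m] - s[i][m] * s[j][k],
    )


def vanishing(cov, quad, tol=1e-12):
    """Which of the three tetrad differences are (numerically) zero."""
    return [abs(t) < tol for t in tetrad_differences(cov, *quad)]


loadings = [0.9, 0.8, 0.7, 0.6]     # hypothetical
error_vars = [0.3, 0.4, 0.5, 0.6]   # hypothetical
pure = implied_cov(loadings, error_vars)
impure = implied_cov(loadings, error_vars, {(0, 1): 0.2})
print(vanishing(pure, (0, 1, 2, 3)))     # [True, True, True]
print(vanishing(impure, (0, 1, 2, 3)))   # [False, False, True]
```

Only the tetrads that involve σ_01 fail in the impure case. That pattern is the kind of signal used to drop impure indicators and keep a pure subset.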
Case Study 5: Stress, Depression, and Religion
  • MSW students (N = 127), 61-item survey (Likert scale)
  • Stress: St1 - St21
  • Depression: D1 - D20
  • Religious Coping: C1 - C20

Specified Model: p = 0.00

Case Study 5: Stress, Depression, and Religion

  • Assume Stress is temporally prior
  • Use MIMbuild to find the latent structure

Resulting model: p = 0.28
Case Study 6: Test Anxiety

Bartholomew and Knott (1999), Latent variable models and factor analysis

12th Grade Males in British Columbia (N = 335)

20-item survey (Likert-scale items): X1 - X20

Exploratory Factor Analysis:

Case Study 6: Test Anxiety

Build Pure Clusters:

Case Study 6: Test Anxiety

Exploratory Factor Analysis: p-value = 0.00

Build Pure Clusters: p-value = 0.47

Case Study 6: Test Anxiety

Scales: no independencies or conditional independencies (uninformative)

MIMbuild: p = .43

Case Study 7: fMRI Brain Connectivity

  • Goals:
    • Identify relatively BIG brain regions (ROIs).
    • Figure out how they influence one another, with what timing sequences, in producing behaviors of interest.
    • Figure out individual differences.
Case Study 7: fMRI
  • Experiment: (Xue and Poldrack, unpublished)
    • 13 right-handed subjects
    • On each trial, the subject judged whether a pair of visual stimuli rhymed
    • 8 pairs of words/nonwords presented for 2.5 seconds each in eight 20-second blocks, separated by 20 seconds of visual fixation
    • TR = 2000 milliseconds
    • 160 time points.
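The timing figures in this design are mutually consistent, which is easy to check. The Python sketch below assumes one 20-second fixation period per task block and task-first ordering; the slide does not state the order. The `boxcar` helper is hypothetical and builds a per-volume task indicator.

```python
TR = 2.0                 # seconds per volume
N_BLOCKS = 8
PAIRS_PER_BLOCK = 8
PAIR_SECONDS = 2.5
FIXATION_SECONDS = 20.0

block_seconds = PAIRS_PER_BLOCK * PAIR_SECONDS                  # 20 s of rhyme judgments
total_seconds = N_BLOCKS * (block_seconds + FIXATION_SECONDS)   # assumes one fixation period per block
n_volumes = int(total_seconds / TR)


def boxcar(n_blocks=N_BLOCKS, task_vols=int(block_seconds / TR), rest_vols=int(FIXATION_SECONDS / TR)):
    """Task indicator per volume, assuming each task block is followed by fixation."""
    return ([1] * task_vols + [0] * rest_vols) * n_blocks


design = boxcar()
print(block_seconds, total_seconds, n_volumes, len(design), sum(design))  # 20.0 320.0 160 160 80
```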

Case Study 7: fMRI Brain Connectivity

  • Problems:
    • Criteria for identifying ROIs
    • Individuals differ
      • Brain ROIs
      • Parameter values
    • Brain processing is cyclic
    • Time:
      • Varying time delays between neuronal activity and the ROI BOLD response
      • Time series sampling rate vs. processing rate
    • Search Space
      • 11 ROIs: roughly 3.2 × 10^22 possible DAGs
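The number of possible DAGs grows super-exponentially with the number of ROIs. Robinson's recurrence counts labeled DAGs on n nodes exactly. The Python sketch below implements it; the function name `count_dags` is introduced here.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes, via Robinson's recurrence."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


for n in (3, 5, 11):
    print(n, count_dags(n))
```

For 11 nodes the count is about 3.16 × 10^22, far too many to score exhaustively. This is why greedy search such as GES is used instead.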

Case Study 7: fMRI

ROI Construction
  • Mean signal intensity across the voxels in a cluster at each time point
  • 1st through 4th principal component
  • Average of the top X% of voxels by variance
  • Maximum-variance voxel
  • Eyeballing (visual inspection)
  • Etc.
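Several of these ROI summaries are straightforward to compute from a time-by-voxel matrix for one cluster. The NumPy sketch below is illustrative only. The function name, the 10% cutoff standing in for "top X%", and the random test data are hypothetical, and principal-component signs are arbitrary.

```python
import numpy as np


def roi_summaries(voxels: np.ndarray, top_pct: float = 10.0) -> dict:
    """Candidate ROI time series from a (n_timepoints, n_voxels) array for one cluster."""
    mean_signal = voxels.mean(axis=1)
    centered = voxels - voxels.mean(axis=0)
    # Principal component time courses via SVD of the centered data (signs are arbitrary)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    n_pc = min(4, s.size)
    pcs = u[:, :n_pc] * s[:n_pc]
    variances = voxels.var(axis=0)
    max_var_voxel = voxels[:, int(np.argmax(variances))]
    k = max(1, int(np.ceil(voxels.shape[1] * top_pct / 100.0)))
    top_var_mean = voxels[:, np.argsort(variances)[-k:]].mean(axis=1)
    return {"mean": mean_signal, "pcs": pcs, "max_var": max_var_voxel, "top_var_mean": top_var_mean}


rng = np.random.default_rng(0)
cluster = rng.normal(size=(160, 50))  # hypothetical: 160 time points x 50 voxels
summary = roi_summaries(cluster)
print({name: arr.shape for name, arr in summary.items()})
```

Because these choices can yield noticeably different time series for the same cluster, the ROI construction method is itself a modeling decision.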

Case Study 7: fMRI Brain Connectivity

  • Individuals differ
    • Brain ROIs
    • Parameter values
  • Assume
    • the same qualitative causal structure
    • different quantitative causal structure (mixed effects)
  • iMAGES search
    • Score each candidate GES step on every subject’s data
    • Take the step with the highest average BIC score, applying it to every subject’s search
    • Repeat
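The key idea is that each candidate step is scored by the average BIC over subjects. The winning step is then applied to every subject, so all subjects share one qualitative structure while keeping their own parameters. The Python/NumPy sketch below shows only that idea, in deliberately simplified form. It greedily adds directed edges to a single DAG using a linear-Gaussian BIC, with no edge deletions and none of GES's equivalence-class operators, so it is not iMAGES itself. All function names and the simulated chain data are hypothetical.

```python
from itertools import permutations

import numpy as np


def node_bic(data, child, parents):
    """Linear-Gaussian BIC (higher is better) for one node regressed on its parents."""
    n = data.shape[0]
    X = np.column_stack([np.ones(n)] + [data[:, p] for p in parents])
    y = data[:, child]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / n
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    n_params = X.shape[1] + 1  # regression coefficients plus the error variance
    return 2 * loglik - n_params * np.log(n)


def creates_cycle(parents, frm, to):
    """Adding frm -> to creates a cycle iff `to` is already an ancestor of `frm`."""
    stack, seen = [frm], set()
    while stack:
        v = stack.pop()
        if v == to:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return False


def shared_greedy_search(datasets, max_steps=20):
    """Greedy edge additions to ONE graph shared by all subjects, scored by mean BIC."""
    p = datasets[0].shape[1]
    parents = {v: [] for v in range(p)}

    def mean_bic(v, pa):
        return np.mean([node_bic(d, v, pa) for d in datasets])

    for _ in range(max_steps):
        best_gain, best_edge = 0.0, None
        for frm, to in permutations(range(p), 2):
            if frm in parents[to] or creates_cycle(parents, frm, to):
                continue
            gain = mean_bic(to, parents[to] + [frm]) - mean_bic(to, parents[to])
            if gain > best_gain:
                best_gain, best_edge = gain, (frm, to)
        if best_edge is None:
            break
        parents[best_edge[1]].append(best_edge[0])  # the same step is applied for every subject
    return parents


# Hypothetical data: three subjects share the chain 0 -> 1 -> 2 with different coefficients
rng = np.random.default_rng(0)
subjects = []
for a, b in [(0.8, 0.7), (0.6, 0.9), (0.9, 0.5)]:
    x0 = rng.normal(size=500)
    x1 = a * x0 + rng.normal(size=500)
    x2 = b * x1 + rng.normal(size=500)
    subjects.append(np.column_stack([x0, x1, x2]))

graph = shared_greedy_search(subjects)
print(graph)
```

Only the adjacencies are meaningful here. A DAG search cannot orient a simple chain from observational data, which is one reason GES works with equivalence classes instead.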

Case Study 7: fMRI

Time Problem 1
  • fMRI recordings at time intervals can be analyzed as a collection of independent cases.
  • Or, they can be analyzed as an auto-regressive time series.
  • Which is better?
    • No general answer.
    • But if you think the neural activities measured at time t influence the measurements at time t+1, then the data should be treated as a lag-1 autoregressive time series.
    • But then Granger causality is not a consistent estimator of causal relations (for example, when causal processes are faster than the sampling rate).

Case Study 7: fMRI

Granger Causality Corrected

Causal processes faster than the sampling rate:

(Figure: X, Y, and Z at times t and t+1)

  • Regress each variable at t+1 on the variables at t
  • Apply GES to the RESIDUALS of the regression (Demiralp, Hoover)
  • Result: NO false path
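A small simulation shows why the residual step matters. It assumes a lag-1 process in which X affects Y within a single sampling interval. Regressing the t+1 values on the t values leaves a lagged X_t → Y_{t+1} coefficient of about 0.8 × 0.5 = 0.4, which a Granger-style reading would treat as a direct lagged path. The contemporaneous effect instead appears as a correlation between the X and Y residuals. The Python/NumPy sketch below takes the orientation X → Y of that residual edge as given; in practice a search over the residuals would supply it. Once that effect is modeled, the structural lag coefficient is near zero. All coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 10_000
data = np.zeros((T, 3))  # columns: X, Y, Z
for t in range(T - 1):
    e = rng.normal(size=3)
    x = 0.5 * data[t, 0] + e[0]
    y = 0.5 * data[t, 1] + 0.8 * x + e[1]  # X affects Y within one sampling interval
    z = 0.5 * data[t, 2] + e[2]
    data[t + 1] = (x, y, z)

# Lag-1 regression of the t+1 values on the t values (with intercept)
past, present = data[:-1], data[1:]
design = np.column_stack([np.ones(T - 1), past])
coef, *_ = np.linalg.lstsq(design, present, rcond=None)
A = coef[1:].T            # A[i, j]: reduced-form effect of variable j at t on variable i at t+1
resid = present - design @ coef
R = np.corrcoef(resid, rowvar=False)

# Assumed orientation of the residual edge: X -> Y
b_xy = np.polyfit(resid[:, 0], resid[:, 1], 1)[0]
structural_lag = A[1, 0] - b_xy * A[0, 0]  # direct X_t -> Y_{t+1} effect once X -> Y is modeled

print(f"reduced-form X_t -> Y_t+1: {A[1, 0]:.2f}")
print(f"residual corr(X, Y) = {R[0, 1]:.2f}, corr(X, Z) = {R[0, 2]:.2f}")
print(f"contemporaneous X -> Y: {b_xy:.2f}, structural X_t -> Y_t+1: {structural_lag:.2f}")
```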


Case Study 7: fMRI

Time Problem 2
  • Varying time delays between neural activity and BOLD responses
  • Try all time shifts of one or two units over all subsets of 3 variables; choose the shifts that lead to the best likelihood
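The sketch below is a minimal version of the shift search for one three-variable subset. It tries every combination of backward shifts from 0 to 2 steps, trims the series to a common length, and keeps the combination with the highest Gaussian log-likelihood. The details are hypothetical: it scores a saturated multivariate normal rather than a particular causal model, and the simulated delays are made up.

```python
from itertools import product

import numpy as np


def gaussian_loglik(data):
    """Maximized log-likelihood of a saturated multivariate normal fitted to the rows of data."""
    n, p = data.shape
    cov = np.cov(data, rowvar=False, bias=True)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * n * (p * np.log(2 * np.pi) + logdet + p)


def best_shifts(series, max_shift=2):
    """Try every per-variable backward shift in 0..max_shift; return the best-scoring tuple."""
    T, p = series.shape
    n = T - max_shift  # common length, so every candidate is scored on the same number of rows
    best_ll, best = -np.inf, None
    for shifts in product(range(max_shift + 1), repeat=p):
        aligned = np.column_stack([series[s:s + n, i] for i, s in enumerate(shifts)])
        ll = gaussian_loglik(aligned)
        if ll > best_ll:
            best_ll, best = ll, shifts
    return best


# Hypothetical data: one signal seen by three ROIs with BOLD delays of 0, 2, and 1 steps
rng = np.random.default_rng(2)
T = 400
signal = rng.normal(size=T + 2)
series = np.column_stack([
    signal[2:] + 0.3 * rng.normal(size=T),    # no delay
    signal[:-2] + 0.3 * rng.normal(size=T),   # delayed by 2 steps
    signal[1:-1] + 0.3 * rng.normal(size=T),  # delayed by 1 step
])
print(best_shifts(series))  # (0, 2, 1)
```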

Case Study 7: fMRI

Simulation Studies:
  • 11 ROIs, each consisting of 50 simulated neurons
  • Neuron output spikes simulated by thresholding a tanh function of the sum of the neuron’s inputs
  • Excitatory feedback
  • A random subset of neurons in one ROI provides input to a random subset of neurons in an “effectively connected” ROI
  • Measured variables: a BOLD function of the summed ROI neuron activity plus Gaussian error, with variance equal to the error variances of the empirical measured variables in the Xue/Poldrack experiment
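The Python/NumPy sketch below is a toy version of this kind of generator, shrunk to two connected ROIs. Neurons spike when tanh of their summed input exceeds a threshold, and each ROI has excitatory feedback. A random subset of ROI A feeds a random subset of ROI B, and the measured signal is a BOLD-like convolution of summed spikes plus Gaussian error. Every constant, the block stimulus, and the gamma-shaped response function are hypothetical choices, not the parameters used in the original simulations.

```python
import numpy as np

rng = np.random.default_rng(0)
N_NEURONS, T = 50, 300
THRESHOLD = 0.5   # hypothetical: a neuron spikes when tanh(input) exceeds this
FEEDBACK = 0.3    # hypothetical excitatory feedback weight within an ROI
NOISE_SD = 0.5    # hypothetical input noise


def update(prev_spikes, external):
    """Spike if tanh of the summed input (feedback + external + noise) exceeds the threshold."""
    drive = FEEDBACK * prev_spikes.mean() + external + NOISE_SD * rng.normal(size=N_NEURONS)
    return (np.tanh(drive) > THRESHOLD).astype(float)


# ROI A is driven by a block stimulus; a random subset of A projects to a random subset of B
stimulus = np.tile(np.r_[np.ones(10), np.zeros(10)], T // 20)
senders = rng.choice(N_NEURONS, size=10, replace=False)
receivers = rng.choice(N_NEURONS, size=10, replace=False)

roi_a = np.zeros((T, N_NEURONS))
roi_b = np.zeros((T, N_NEURONS))
for t in range(1, T):
    roi_a[t] = update(roi_a[t - 1], stimulus[t])
    external_b = np.zeros(N_NEURONS)
    external_b[receivers] = roi_a[t - 1, senders].mean()
    roi_b[t] = update(roi_b[t - 1], external_b)

# Hypothetical gamma-shaped hemodynamic response, normalized to sum to 1
lags = np.arange(15)
hrf = lags ** 5 * np.exp(-lags)
hrf = hrf / hrf.sum()


def measure(spikes, error_sd=1.0):
    """Measured variable: BOLD-like convolution of summed spikes plus Gaussian error."""
    bold = np.convolve(spikes.sum(axis=1), hrf)[:T]
    return bold + error_sd * rng.normal(size=T)


bold_a, bold_b = measure(roi_a), measure(roi_b)
print(np.corrcoef(bold_a, bold_b)[0, 1])
```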

Case Study 7: fMRI

Simulate the Xue/Poldrack Experiment Time Series:

  • Repeat 10 times:
    • Randomly generate a graphical structure with 11 nodes and 11 (feedforward) directed edges
    • Randomly select a subset of simulated ROIs.
    • Generate data
    • Randomly shift 0 to 3 variables one or two time steps forward.
    • Apply the iMAGES method with 0 lag and 1 lag, with backshifting.
  • Tabulate the errors.
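Tabulating the errors requires precise definitions. The Python sketch below assumes two of them. A false positive is an estimated adjacency that is absent from the true graph. A mis-directed edge is a true adjacency estimated with the opposite orientation. The two runs are hypothetical examples, not the simulation output.

```python
def tabulate_errors(true_edges, est_edges):
    """Return (false positives, mis-directed) for directed edges given as (cause, effect) pairs."""
    true_set = set(true_edges)
    true_adj = {frozenset(e) for e in true_set}
    false_pos = sum(1 for e in est_edges if frozenset(e) not in true_adj)
    misdirected = sum(1 for a, b in est_edges if (b, a) in true_set)
    return false_pos, misdirected


# Hypothetical runs: (true graph, estimated graph)
runs = [
    ({("A", "B"), ("B", "C")}, {("A", "B"), ("C", "B"), ("A", "C")}),
    ({("A", "B"), ("B", "C")}, {("A", "B"), ("B", "C")}),
]
counts = [tabulate_errors(true, est) for true, est in runs]
avg_fp = sum(fp for fp, _ in counts) / len(runs)
avg_md = sum(md for _, md in counts) / len(runs)
print(avg_fp, avg_md)  # 0.5 0.5
```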
Case Study 7: fMRI

Simulation Results

0 Lag:

Average number of false positive edges: 0.7

Average number of mis-directed edges: 1.6

1 Lag Residuals:

Average number of false positive edges: 1.2

Average number of mis-directed edges: 1.8


Other Cases

Climate Research

  • Glymour, Chu, , Teleconnections

Epidemiology

  • Scheines, Lead & IQ

Economics

  • Bessler, Pork Prices
  • Hoover, multiple
  • Cryder & Loewenstein, Charitable Giving

Biology

  • Shipley,
  • SGS, Spartina Grass

Educational Research

  • Easterday, Bias & Recall
  • Laski, Numerical coding

Neuroscience

  • Glymour & Ramsey, fMRI
Straw Men!
  • Model Search ignores theory
  • Model Search hides assumptions
  • Model Search needs more assumptions than standard statistical models
References

General

  • Spirtes, P., Glymour, C., Scheines, R. (2000). Causation, Prediction, and Search, 2nd Edition, MIT Press.
  • Pearl, J. (2000). Causality: Models, Reasoning, and Inference, Cambridge University Press.

Biology

  • Chu, T., Glymour, C., Scheines, R., & Spirtes, P. (2002). A Statistical Problem for Inference to Regulatory Structure from Associations of Gene Expression Measurement with Microarrays. Bioinformatics, 19, 1147-1152.
  • Shipley, B. Exploring hypothesis space: examples from organismal biology. In C. Glymour and G. Cooper (eds.), Computation, Causation, and Discovery. Cambridge, MA: MIT Press.
  • Shipley, B. (1995). Structured interspecific determinants of specific leaf area in 34 species of herbaceous angiosperms. Functional Ecology, 9.
References

Scheines, R. (2000). Estimating Latent Causal Influences: TETRAD III Variable Selection and Bayesian Parameter Estimation: the effect of Lead on IQ, Handbook of Data Mining, Pat Hayes, editor, Oxford University Press.

Jackson, A., and Scheines, R., (2005). Single Mothers' Self-Efficacy, Parenting in the Home Environment, and Children's Development in a Two-Wave Study, Social Work Research , 29, 1, pp. 7-20.

Timberlake, M. and Williams, K. (1984). Dependence, political exclusion, and government repression: Some cross-national evidence. American Sociological Review 49, 141-146.

References

Economics

Akleman, Derya G., David A. Bessler, and Diana M. Burton. (1999). ‘Modeling corn exports and exchange rates with directed graphs and statistical loss functions’, in Clark Glymour and Gregory F. Cooper (eds) Computation, Causation, and Discovery, American Association for Artificial Intelligence, Menlo Park, CA and MIT Press, Cambridge, MA, pp. 497-520.

Awokuse, T. O. (2005) “Export-led Growth and the Japanese Economy: Evidence from VAR and Directed Acyclical Graphs,” Applied Economics Letters 12(14), 849-858.

Bessler, David A. and N. Loper. (2001) “Economic Development: Evidence from Directed Acyclical Graphs” Manchester School 69(4), 457-476.

Bessler, David A. and Seongpyo Lee. (2002). ‘Money and prices: U.S. data 1869-1914 (a study with directed graphs)’, Empirical Economics, Vol. 27, pp. 427-46.

Demiralp, Selva and Kevin D. Hoover. (2003) “Searching for the Causal Structure of a Vector Autoregression,” Oxford Bulletin of Economics and Statistics 65(supplement), pp. 745-767.

Haigh, M.S., N.K. Nomikos, and D.A. Bessler (2004) “Integration and Causality in International Freight Markets: Modeling with Error Correction and Directed Acyclical Graphs,” Southern Economic Journal 71(1), 145-162.

Sheffrin, Steven M. and Robert K. Triest. (1998). ‘A new approach to causality and economic growth’, unpublished typescript, University of California, Davis.

References

Economics

Swanson, Norman R. and Clive W.J. Granger. (1997). ‘Impulse response functions based on a causal approach to residual orthogonalization in vector autoregressions’, Journal of the American Statistical Association, Vol. 92, pp. 357-67.

Demiralp, S., Hoover, K. D., & Perez, S. J. (2008). A Bootstrap Method for Identifying and Evaluating a Structural Vector Autoregression. Oxford Bulletin of Economics and Statistics, 70(4), 509-533.

Hoover, K. D., Demiralp, S., & Perez, S. J. (2007). Empirical Identification of the Vector Autoregression: The Causes and Effects of U.S. M2. Paper presented at the Conference in Honour of David F. Hendry, Oxford University, 23-25 August 2007.

Moneta, A., and Spirtes, P. (2006). “Graphical Models for the Identification of Causal Structures in Multivariate Time Series Model”, Proceedings of the 2006 Joint Conference on Information Sciences, JCIS 2006, Kaohsiung, Taiwan, ROC, October 8-11, 2006, Atlantis Press.
Lead and IQ: Variable Selection

Final Variables (Needleman)

- lead: lead in baby teeth
- fab: father’s age
- mab: mother’s age
- nlb: number of live births
- med: mother’s education
- piq: parent’s IQ
- ciq: child’s IQ
Needleman Regression

Standardized coefficients, t-ratios (in parentheses), and p-values:

ciq = - .143 lead - .204 fab - .159 nlb + .219 med + .237 mab + .247 piq

             lead     fab      nlb      med      mab      piq
coef        -.143    -.204    -.159     .219     .237     .247
t-ratio     (2.32)   (1.79)   (2.30)   (3.08)   (1.97)   (3.87)
p-value      0.02     0.09     0.02    <0.01     0.05    <0.01

All variables significant at .1; R² = .271
TETRAD Variable Selection

Regression (a zero coefficient corresponds to independence from ciq given all other regressors):

mab _||_ ciq | {lead, med, piq, nlb, fab}

fab _||_ ciq | {lead, med, piq, nlb, mab}

nlb _||_ ciq | {lead, med, piq, mab, fab}

Tetrad:

mab _||_ ciq

fab _||_ ciq

nlb _||_ ciq | med
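Each of these statements is a (conditional) independence claim. For linear Gaussian data, such a claim can be tested with a partial correlation and Fisher's z. The Python/NumPy sketch below demonstrates the test on a hypothetical chain a → b → c, not on the Needleman data; the function names are introduced here.

```python
import math

import numpy as np


def partial_corr(data, i, j, cond=()):
    """Partial correlation of columns i and j given the columns in cond, via the precision matrix."""
    idx = [i, j, *cond]
    prec = np.linalg.inv(np.cov(data[:, idx], rowvar=False))
    return -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])


def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for H0: the partial correlation is zero, with k conditioning variables."""
    z = math.atanh(r) * math.sqrt(n - k - 3)
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical chain a -> b -> c (columns 0, 1, 2): a and c are dependent, but independent given b
rng = np.random.default_rng(3)
n = 500
a = rng.normal(size=n)
b = 0.7 * a + rng.normal(size=n)
c = 0.7 * b + rng.normal(size=n)
data = np.column_stack([a, b, c])

for cond in [(), (1,)]:
    r = partial_corr(data, 0, 2, cond)
    print(f"a vs c given {cond}: r = {r:.3f}, p = {fisher_z_pvalue(r, n, len(cond)):.3g}")
```

Regression tests each variable only against the full conditioning set. A search procedure can also find independencies given smaller sets, as in the Tetrad column, and use them to drop a variable.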

Regressions

Standardized coefficients, t-ratios (in parentheses), and p-values:

Needleman (R² = .271)

ciq = - .143 lead - .204 fab - .159 nlb + .219 med + .237 mab + .247 piq

             lead     fab      nlb      med      mab      piq
coef        -.143    -.204    -.159     .219     .237     .247
t-ratio     (2.32)   (1.79)   (2.30)   (3.08)   (1.97)   (3.87)
p-value      0.02     0.09     0.02    <0.01     0.05    <0.01

TETRAD (R² = .243)

ciq = - .177 lead + .251 med + .253 piq

             lead     med      piq
coef        -.177     .251     .253
t-ratio     (2.89)   (3.50)   (3.59)
p-value     <0.01    <0.01    <0.01
Measurement Error
  • Measured regressor variables are proxies that involve measurement error
  • The errors-in-all-variables model for lead’s influence on IQ is underidentified
  • Strategies:
    • Sensitivity analysis
    • Bayesian analysis
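The simplest form of sensitivity analysis corrects for attenuation under an assumed share of measurement-error variance. With a single regressor, the observed correlation equals the true correlation times sqrt(1 - error share). The Python sketch below applies that classical single-regressor formula to the TETRAD lead coefficient as if lead were the only regressor. This is a deliberate simplification that ignores the other covariates, unlike the analysis on these slides. The Beta(3, 12) prior is a hypothetical choice that matches the stated mean .2 and SD .1 for lead's error share.

```python
import math
import random
import statistics


def disattenuate(observed_r: float, error_share: float) -> float:
    """Classical single-regressor correction: r_true = r_observed / sqrt(1 - error_share)."""
    return observed_r / math.sqrt(1.0 - error_share)


OBSERVED_R = -0.177  # treated here as if lead were the only regressor (a simplification)

# Sensitivity analysis: corrected value over a grid of assumed error shares
for share in (0.0, 0.1, 0.2, 0.3, 0.4):
    print(f"error share {share:.1f}: corrected r = {disattenuate(OBSERVED_R, share):.3f}")

# Crude Monte Carlo over a Beta(3, 12) prior on the error share (mean .2, SD .1)
rng = random.Random(0)
draws = [disattenuate(OBSERVED_R, rng.betavariate(3, 12)) for _ in range(10_000)]
qs = statistics.quantiles(draws, n=40)
lo, hi = qs[0], qs[-1]
print(f"mean {statistics.mean(draws):.3f}, central 95% interval ({lo:.3f}, {hi:.3f})")
```

In this single-regressor case the correction only ever strengthens the effect. With several error-prone regressors, the correction can move coefficients in either direction, which is why the full model needs a joint prior.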
Prior over Measurement Error

Proportion of Variance from Measurement Error

  • Measured lead: mean = .2, SD = .1
  • Parent’s IQ: mean = .3, SD = .15
  • Mother’s education: mean = .3, SD = .15

Prior otherwise uninformative
Posterior

(Figure: marginal posterior for lead’s effect on IQ, with zero marked)

Robust over similar priors
Using Needleman’s Covariates

With a similar prior, the marginal posterior:

(Figure: marginal posterior, with zero marked)

Very sensitive to the prior over the regressors TETRAD eliminated