
Causality challenge #2: Pot-Luck





Presentation Transcript


1. Causality challenge #2: Pot-Luck. Isabelle Guyon, Clopinet; Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ.; André Elisseeff and Jean-Philippe Pellet, IBM Zürich; Gregory F. Cooper, University of Pittsburgh; Peter Spirtes, Carnegie Mellon

2. Outline • Motivations • Learning causal structure: cross-sectional studies (… from experiments, … without experiments, equivalent MB); longitudinal studies • Bring your own problem(s). Current section: Motivations

3. Causality Workbench • February 2007: Project starts. Initial funding from the EU Pascal network. • August 15, 2007: Two-year grant from the US National Science Foundation. • December 15, 2007: Workbench goes live. First causality challenge: causation and prediction. • June 3-4, 2008: WCCI 2008, workshop to discuss the results of the first challenge. • September 15, 2008: Start of the pot-luck challenge. Target: NIPS 2008. • Fall 2008: Start developing an interactive workbench.

4. Why a new challenge? • Causality challenge #1 • Favor “depth” • Single well-defined task • Rigor of performance assessment • Causality challenge #2 • Favor “breadth” • Many different tasks • Encourage creativity

5. http://clopinet.com/causality

6. Pot-Luck challenge
• CYTO: Causal Protein-Signaling Networks in human T cells (real data). Learn a protein signaling network from multicolor flow cytometry data. N=11 proteins, P~800 samples per experimental condition, E=9 conditions.
• LOCANET: LOcal CAusal NETwork. Find the local causal structure around a given target variable (depth-3 network) in REGED, CINA, SIDO, MARTI.
• PROMO: Simulated marketing task. Time series of 1000 promotion variables and 100 product sales. Predict a 1000x100 boolean influence matrix, indicating for each (i,j) element whether the ith promotion has a causal influence on the sales of the jth product. Data is provided as time series, with a daily value for each variable for three years.
• SIGNET: Abscisic Acid Signaling Network. Determine the set of 43 boolean rules that describe the interactions of the nodes within a plant signaling network. 300 separate Boolean pseudodynamic simulations of the true rules. Model inspired by a true biological system.
• TIED: Target Information Equivalent Dataset. Illustrates a case in which there are many equivalent Markov boundaries. Find them all.

7. Outline • Motivations • Learning causal structure: cross-sectional studies (… from experiments, … without experiments, equivalent MB); longitudinal studies • Bring your own problem(s). Current section: Learning causal structure

  8. What is causality? • Many definitions. • Pragmatic (engineering) view: predicting the consequences of ACTIONS. • Distinct from making predictions in a stationary environment. • Canonical methodology: designed experiments. • Causal discovery from observational data.

9. The “language” of causal Bayesian networks • Bayesian network: • Graph with random variables X1, X2, …, Xn as nodes. • Dependencies represented by edges. • Allows us to compute P(X1, X2, …, Xn) as the product ∏i P(Xi | Parents(Xi)). • Edge directions have no causal meaning. • Causal Bayesian network: edge directions indicate causality.
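The product formula above can be made concrete in a few lines; the three-node chain A → B → C and all CPT numbers below are hypothetical, chosen only to illustrate the factorization:

```python
# Hypothetical chain A -> B -> C with binary variables.
# All conditional probability table (CPT) values are made up.
P_A = {0: 0.7, 1: 0.3}
P_B_given_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
P_C_given_B = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}

def joint(a, b, c):
    """P(A, B, C) as the product of P(node | parents)."""
    return P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]

# Sanity check: the factorized joint sums to 1 over all configurations.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
```

The factorization needs only the local CPTs, one per node, instead of a full 2^n-entry joint table.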

10. Small example (figure: LUCAS0 causal graph; nodes: Anxiety, Peer Pressure, Born an Even Day, Yellow Fingers, Smoking, Genetics, Allergy, Lung Cancer, Attention Disorder, Coughing, Fatigue, Car Accident). LUCAS0: natural Markov boundary.

11. Arrows indicate “mechanisms”. If Lung Cancer (LC) is determined by Smoking (S) and Genetics (G):
• In the language of BN, use the data table:
P(LC=1 | S=1, G=1)=… , P(LC=0 | S=1, G=1)=…
P(LC=1 | S=1, G=0)=… , P(LC=0 | S=1, G=0)=…
P(LC=1 | S=0, G=1)=… , P(LC=0 | S=0, G=1)=…
P(LC=1 | S=0, G=0)=… , P(LC=0 | S=0, G=0)=…
• In the language of Structural Equation Models (SEM), use: LC = f(S, G) + noise, where usually f is a linear function.
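The SEM view can be sketched as one structural equation; the weights and noise level below are made-up placeholders, not values from LUCAS:

```python
import random

random.seed(0)

def lung_cancer(s, g, noise_sd=0.1):
    """Linear structural equation LC = f(S, G) + noise.

    The weights 0.6 and 0.3 are invented for illustration only.
    """
    return 0.6 * s + 0.3 * g + random.gauss(0.0, noise_sd)

# With the noise switched off, the mechanism is a plain linear function:
deterministic_lc = lung_cancer(1, 1, noise_sd=0.0)
```

In contrast to the CPT, the SEM commits to a functional form of the mechanism, which an intervention can then overwrite.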

12. Common simplifications • Assume a Markov process • Assume a DAG • Assume causal sufficiency (no hidden common cause) • Assume stability or faithfulness (no particular parameterization implying dependencies not reflected by the structure) • Assume linearity of relationships • Assume Gaussianity of PDFs • Discard relationships of low statistical significance • Focus on a local neighborhood of a target variable • Learn unoriented or partially oriented graphs • Assume uniqueness of the Markov boundary

13. How about time? Cross-sectional study (figure: a 12-node causal graph).

14. How about time? Cross-sectional study vs. longitudinal study (figure: the same 12-node graph unrolled over successive time points).

  15. * Motivations * Learning causal structure * * Cross-sectional studies * … from experiments * … without experiments * Equivalent MB * * Longitudinal studies * Bring your own problem(s) * Learning causal structurefrom “cross-sectional” studies:CYTOLOCANETTIED

  16. Causal models as particular “generative models” • Imagine we have “prior knowledge” about a few alternative plausible “causal models” (we basically know the architecture). • Fit the parameters of the model to data. • Select the model based on goodness of fit (score), perhaps penalizing higher complexity models. • Could two models have identical scores?
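Score-based selection with a complexity penalty is often done with a BIC-style score; a minimal sketch, where both models' log-likelihoods and parameter counts are invented for illustration:

```python
import math

def bic(log_likelihood, n_params, n_samples):
    """BIC-style score: log L - (k/2) log N. Higher is better here."""
    return log_likelihood - 0.5 * n_params * math.log(n_samples)

# Two hypothetical causal models fit to the same 1000 samples
# (all numbers are made up):
simple_model = bic(log_likelihood=-520.0, n_params=5, n_samples=1000)
complex_model = bic(log_likelihood=-515.0, n_params=20, n_samples=1000)
# The simpler model wins despite a slightly worse raw fit.
```

This is the sense in which "penalizing higher complexity models" breaks ties between models with similar goodness of fit; Markov-equivalent structures, however, can still score identically.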

17. Key types of causal relationships 1: Direct cause. Smoking → Lung Cancer (LUCAS graph).

18. Key types of causal relationships 2: Indirect cause (chain). AN ⊥ LC | S: Anxiety is independent of Lung Cancer given Smoking (LUCAS graph).

19. Key types of causal relationships 3: Confounder (fork). YF ⊥ LC | S: Yellow Fingers and Lung Cancer are independent given Smoking (LUCAS graph).

20. How this might look in data (figure: scatter plot of Yellow Fingers vs. Lung Cancer, showing an apparent association).

21. How this might look in data: Simpson’s paradox. Splitting the same plot into Smoking and Non-smoking groups shows YF ⊥ LC | S.
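The confounding pattern is easy to reproduce by simulation; the generating probabilities below are made up, with Smoking causing both Yellow Fingers and Lung Cancer:

```python
import random

random.seed(1)
N = 20000
rows = []
for _ in range(N):
    s = random.random() < 0.5                    # Smoking
    yf = random.random() < (0.9 if s else 0.1)   # Yellow Fingers, caused by S
    lc = random.random() < (0.6 if s else 0.1)   # Lung Cancer, caused by S
    rows.append((s, yf, lc))

def lc_rate(cond):
    """Empirical P(LC=1) among rows satisfying cond."""
    sel = [r for r in rows if cond(r)]
    return sum(1 for r in sel if r[2]) / len(sel)

# Marginally, Yellow Fingers looks strongly "predictive" of Lung Cancer:
marginal_gap = lc_rate(lambda r: r[1]) - lc_rate(lambda r: not r[1])
# Within the smokers stratum, the association (nearly) vanishes:
gap_smokers = (lc_rate(lambda r: r[0] and r[1])
               - lc_rate(lambda r: r[0] and not r[1]))
```

The marginal association is large while the within-stratum gap is close to zero, which is exactly the YF ⊥ LC | S pattern of the fork.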

22. Markov equivalence. X1 ⊥ Y | X2 holds in all three structures:
• X1 ← X2 → Y: P(X1, X2, Y) = P(X1 | X2) P(Y | X2) P(X2)
• X1 → X2 → Y: P(X1, X2, Y) = P(Y | X2) P(X2 | X1) P(X1)
• X1 ← X2 ← Y: P(X1, X2, Y) = P(X1 | X2) P(X2 | Y) P(Y)
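The equivalence can be checked numerically: starting from a chain X1 → X2 → Y with made-up CPTs, the fork factorization P(X1 | X2) P(Y | X2) P(X2) reproduces exactly the same joint distribution:

```python
from itertools import product

# Hypothetical chain X1 -> X2 -> Y; all CPT numbers are made up.
pX1 = {0: 0.6, 1: 0.4}
pX2_X1 = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}
pY_X2 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.25, 1: 0.75}}

joint = {(x1, x2, y): pX1[x1] * pX2_X1[x1][x2] * pY_X2[x2][y]
         for x1, x2, y in product((0, 1), repeat=3)}

def marg(keep):
    """Marginalize the joint onto the given index set."""
    out = {}
    for k, v in joint.items():
        key = tuple(k[i] for i in keep)
        out[key] = out.get(key, 0.0) + v
    return out

pX2 = marg([1])
pX1X2 = marg([0, 1])
pX2Y = marg([1, 2])

# Fork factorization P(X1|X2) P(Y|X2) P(X2), rebuilt from marginals,
# matches the chain's joint because X1 is independent of Y given X2.
max_err = max(
    abs((pX1X2[(x1, x2)] / pX2[(x2,)])
        * (pX2Y[(x2, y)] / pX2[(x2,)])
        * pX2[(x2,)]
        - joint[(x1, x2, y)])
    for x1, x2, y in product((0, 1), repeat=3))
```

Since both structures produce the identical joint, no amount of observational data can tell them apart, which is the point of the slide.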

23. Key types of causal relationships 4: Collider (V-structure). AL ⊥̸ LC | C: Allergy and Lung Cancer become dependent given Coughing (LUCAS graph).

24. How this might look in data (figure: scatter plot of Allergy vs. Lung Cancer, no visible association).

25. How this might look in data: splitting by Coughing=1 and Coughing=0 reveals an association between Allergy and Lung Cancer within each stratum.

26. No Markov equivalence. Colliders (V-structures) X1 → X2 ← Y are identifiable: X1 ⊥̸ Y | X2, and P(X1, X2, Y) = P(X2 | X1, Y) P(X1) P(Y).
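A quick simulation shows the collider signature: X1 and Y are (nearly) independent marginally, but become dependent once we condition on X2. All generating probabilities below are made up:

```python
import random

random.seed(2)
N = 30000
data = []
for _ in range(N):
    x1 = random.random() < 0.5   # e.g. Allergy
    y = random.random() < 0.5    # e.g. Lung Cancer, independent of x1
    x2 = random.random() < (0.9 if (x1 or y) else 0.1)  # collider, e.g. Coughing
    data.append((x1, x2, y))

def p_y(cond):
    """Empirical P(Y=1) among samples satisfying cond."""
    sel = [d for d in data if cond(d)]
    return sum(1 for d in sel if d[2]) / len(sel)

# X1 carries (almost) no information about Y marginally:
marginal_gap = abs(p_y(lambda d: d[0]) - p_y(lambda d: not d[0]))
# Conditioning on the collider X2 induces a strong dependence:
collider_gap = abs(p_y(lambda d: d[1] and d[0]) - p_y(lambda d: d[1] and not d[0]))
```

Because this dependence pattern is the reverse of chains and forks, the V-structure orientation is recoverable from observational data alone.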

27. Structural methods (figure: unoriented skeleton and partially oriented graph on 12 nodes). • Build an unoriented graph (using conditional independencies). • Orient colliders. • Add more arrows by constraint propagation, without creating new colliders.
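The collider-orientation step can be sketched as follows; the skeleton and the independence "oracle" are hard-coded stand-ins for what a real implementation would estimate from data with conditional independence tests:

```python
# Toy skeleton X1 - X2 - Y, where X1 and Y are NOT adjacent
# (an "unshielded triple"). Edges are unordered pairs.
skeleton = {frozenset({"X1", "X2"}), frozenset({"Y", "X2"})}

def independent(a, b, given):
    # Stand-in oracle encoding the collider signature:
    # X1 and Y are marginally independent but dependent given X2.
    return {a, b} == {"X1", "Y"} and "X2" not in given

oriented = []
# Orientation rule: for an unshielded triple a - c - b, orient
# a -> c <- b when a is independent of b but NOT given c.
a, c, b = "X1", "X2", "Y"
if independent(a, b, set()) and not independent(a, b, {c}):
    oriented.extend([(a, c), (b, c)])   # a -> c and b -> c
```

Real constraint-based algorithms (e.g. PC-style methods) apply this rule to every unshielded triple, then propagate further orientations so that no new colliders or cycles appear.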

28. Outline • Motivations • Learning causal structure: cross-sectional studies (… from experiments, … without experiments, equivalent MB); longitudinal studies • Bring your own problem(s). Current section: … towards CYTO: using experiments to learn the causal structure

29. Manipulating a single variable 1 (LUCAS graph). Smoking manipulated (disconnected from its direct causes): remains predictive of LC. Direct cause.

30. Manipulating a single variable 2 (LUCAS graph). Anxiety manipulated: remains predictive of Lung Cancer. Indirect cause.

31. Manipulating a single variable 3 (LUCAS graph). Yellow Fingers manipulated: no longer predictive of LC. Consequence of a common cause (correlated, but not a cause).

32. Manipulating a single variable 4 (LUCAS graph). Genetics manipulated: remains predictive of LC and AD. Direct cause.

33. Manipulating a single variable 5 (LUCAS graph). Attention Disorder manipulated: no longer predictive of Genetics (Genetics is the direct cause, not the effect).

34. The CYTO problem (figure: T-cell signaling network including CD3, CD28, LFA-1, Lck, Zap70, SLP-76, LAT, VAV, Cytohesin, JAB-1, PKC, PKA, PLCg, PIP2, PIP3, PI3K, Akt, RAS, Raf, Mek1/2, MAPKKK, MEK4/7, MEK3/6, Erk1/2, JNK, p38). Activators: 1. a-CD3, 2. a-CD28, 3. ICAM-2, 4. PMA, 5. b2cAMP. Inhibitors: 6. G06976, 7. AKT inh, 8. Psitect, 9. U0126, 10. LY294002. Karen Sachs et al.

35. Outline • Motivations • Learning causal structure: cross-sectional studies (… from experiments, … without experiments, equivalent MB); longitudinal studies • Bring your own problem(s). Current section: … towards LOCANET: learning the causal structure without experiments to predict the consequences of future actions

36. What if we cannot experiment? • Experiments may be infeasible, costly, or unethical. • Using only observations, we may want to predict the effect of new policies. • Policies may consist of manipulating several variables.

37. Manipulating a few variables (LUCAS graph). LUCAS1: manipulated Markov boundary.

38. Manipulating all variables (LUCAS graph). LUCAS2: manipulated Markov boundary.

  39. Causality challenge #1:causation and prediction • Task: Predict the target (e.g., Lung cancer) in “unmanipulated” or “manipulated” test data. • Goals: • Introduce ML people to causal discovery problems. • Investigate ties between causation and prediction. • Findings: • Participants used either causal or non-causal feature selection. • Good causal discovery (feature set containing the “manipulated” MB) correlated with good predictions. • However, some participants using non-causal feature selection obtained good prediction results.

40. Causality challenge #2: The LOCANET problem • Task: Find the local causal structure around a given target variable (depth-3 network) in REGED, CINA, SIDO, MARTI. • Goal: Analyze more finely to what extent causal discovery methods recover the causal structure, and how this affects prediction of the target values.

41. Outline • Motivations • Learning causal structure: cross-sectional studies (… from experiments, … without experiments, equivalent MB); longitudinal studies • Bring your own problem(s). Current section: TIED: equivalent Markov boundaries

42. Equivalent Markov boundaries (figure: target Y with its Markov boundary). Many almost identical measurements of the same (hidden) variable can lead to many statistically indistinguishable Markov boundaries.

43. Target Information Equivalence (TIE). Two disjoint subsets of variables V1 and V2 are Target Information Equivalent (TIE) with respect to target Y iff: • V1 ⊥̸ Y • V2 ⊥̸ Y • V1 ⊥ Y | V2 • V2 ⊥ Y | V1. Alexander Statnikov & Constantin Aliferis
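A toy check of the TIE conditions: X1 and X2 below are exact copies of a hidden variable that drives Y noisily, so each is predictive of Y yet redundant given the other. The data-generating process is invented for illustration, not taken from TIED:

```python
import random

random.seed(3)
N = 20000
rows = []
for _ in range(N):
    h = random.random() < 0.5                   # hidden driver
    x1, x2 = h, h                               # two equivalent measurements
    y = random.random() < (0.8 if h else 0.2)   # noisy effect of h
    rows.append((x1, x2, y))

def p(pred, given=lambda r: True):
    """Empirical P(pred | given)."""
    sel = [r for r in rows if given(r)]
    return sum(1 for r in sel if pred(r)) / len(sel)

# V1 = {X1} is predictive of Y (V1 not independent of Y) ...
dep_gap = abs(p(lambda r: r[2], lambda r: r[0])
              - p(lambda r: r[2], lambda r: not r[0]))
# ... but adds nothing once V2 = {X2} is known: P(Y | X1, X2) = P(Y | X2).
ci_gap = abs(p(lambda r: r[2], lambda r: r[0] and r[1])
             - p(lambda r: r[2], lambda r: r[1]))
```

By symmetry the same holds with V1 and V2 swapped, so {X1} and {X2} are equally valid Markov boundaries for Y: no data-driven criterion can prefer one.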

44. TIE Data (TIED): exact equivalence. Small example of the type of relationships implemented in TIED:

X1 X2 X3 X11 Y
0  0  0  0   0
1  1  1  1   1
2  2  2  2   2
3  3  3  3   3

The following TIE relations hold in the data: TIE_Y(X1, X2), TIE_Y(X1, X3), TIE_Y(X1, X11), TIE_Y(X2, X3), TIE_Y(X2, X11), TIE_Y(X3, X11), TIE_X11(X1, X2), TIE_X11(X1, X3), TIE_X11(X2, X3). Notice that variables X1, X2, X3, X11, and Y are not deterministically related. Alexander Statnikov & Constantin Aliferis

45. Outline • Motivations • Learning causal structure: cross-sectional studies (… from experiments, … without experiments, equivalent MB); longitudinal studies • Bring your own problem(s). Current section: Learning causal structure from “longitudinal” studies: SIGNET, PROMO

46. SIGNET: a plant signaling network • Plants lose water and take in carbon dioxide through microscopic pores. • During drought, the plant hormone abscisic acid (ABA) inhibits pore opening (important for the genetic engineering of new drought-resistant plants). • Unraveling the ABA signal transduction network took years of research. A recent dynamic model synthesizes many findings (Li, Assmann, Albert, PLOS, 2006). • The model is used by Jenkins and Soni to generate artificial data. The problem is to reconstruct the network from the data.

47. Abscisic Acid Signaling Network (Li, Assmann, Albert, PLOS, 2006).

48. SIGNET: sample data • Boolean model; asynchronous updates • 43 nodes • 300 simulations. Each line is the state of the 43 nodes at one time step:
1011101110101101101101001010001011000011001
1100001110111101101101111111011001011101011
1100011110111110101101100011010001110101010
1100001110111110101101100011000011110101010
1100001110111110101101100011000011110101010
(Figure: example of asynchronous updates for a 4-node network.)
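Asynchronous Boolean updating can be sketched on a small hypothetical network; the four rules below are made up for illustration, not taken from the real 43-rule ABA model:

```python
import random

random.seed(4)

# Made-up Boolean rules for a 4-node network: each node's next value
# is a Boolean function of the current state.
rules = {
    "A": lambda s: s["D"],
    "B": lambda s: s["A"] and not s["C"],
    "C": lambda s: s["A"] or s["B"],
    "D": lambda s: not s["C"],
}

def simulate(state, steps=100):
    """Asynchronous updates: one randomly chosen node updates per step."""
    trace = [dict(state)]
    for _ in range(steps):
        node = random.choice(sorted(rules))
        state[node] = rules[node](state)
        trace.append(dict(state))
    return trace

trace = simulate({"A": True, "B": False, "C": False, "D": True})
```

Concatenating each state as a bit string reproduces the kind of rows shown above; the challenge is the inverse problem of recovering `rules` from such traces.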

49. PROMO: simulated marketing task • 100 products • 1000 promotions • 3 years of daily data • Goal: quantify the effect of promotions on sales. (Figure: promotions influencing products.) Jean-Philippe Pellet

50. PROMO: schematically (figure: 1000 promotions influencing 100 products). The difficulties include: • non-i.i.d. samples • seasonal effects • promotions are binary, sales are continuous • the problem is more about quantifying the relationships than determining the causal skeleton.
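As a naive baseline for quantifying one promotion's effect on one product (deliberately ignoring the seasonal and non-i.i.d. issues listed above), one might compare mean sales on promotion vs. non-promotion days; all numbers below are simulated, not from the PROMO data:

```python
import random

random.seed(5)
days = 3 * 365
true_effect = 4.0          # made-up lift of one promotion on one product
sales, promo = [], []
for t in range(days):
    p = random.random() < 0.1                   # promotion runs ~10% of days
    season = 10.0 + 3.0 * ((t % 365) / 365.0)   # crude seasonal drift
    promo.append(p)
    sales.append(season + true_effect * p + random.gauss(0.0, 1.0))

on = [s for s, p in zip(sales, promo) if p]
off = [s for s, p in zip(sales, promo) if not p]
estimated = sum(on) / len(on) - sum(off) / len(off)
```

The difference-of-means works here only because the simulated promotion days are independent of the season; in PROMO, where promotions cluster seasonally and sales are autocorrelated, this estimator would be biased, which is exactly why the task is hard.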
