Can causal models be evaluated?

Presentation Transcript



Can causal models be evaluated?

Isabelle Guyon

ClopiNet / ChaLearn

http://clopinet.com/causality

[email protected]



Acknowledgements and references

  • 1) Feature Extraction, Foundations and Applications. I. Guyon, S. Gunn, et al. Springer, 2006. http://clopinet.com/fextract-book

  • 2) Causation and Prediction Challenge. I. Guyon, C. Aliferis, G. Cooper, A. Elisseeff, J.-P. Pellet, P. Spirtes, and A. Statnikov, Eds. CiML, volume 2, Microtome, 2010. http://www.mtome.com/Publications/CiML/ciml.html



http://gesture.chalearn.org

Co-founders:

Constantin Aliferis, Alexander Statnikov

André Elisseeff, Jean-Philippe Pellet

Gregory F. Cooper, Peter Spirtes

ChaLearn directors and advisors:

Alexander Statnikov, Ioannis Tsamardinos

Richard Scheines, Frederick Eberhardt

Florin Popescu


Preparation of ExpDeCo: Experimental design in causal discovery

  • Motivations

  • Quiz

  • What we want to do (next challenge)

  • What we already set up (virtual lab)

  • What we could improve

  • Your input…

    Note: Experiment = manipulation = action


Causal discovery motivations (1): Interesting problems

What affects… your health? …climate change? …the economy?

…and which actions will have beneficial effects?



Predict the consequences of (new) actions

  • Predict the outcome of actions

    • What if we ate only raw foods?

    • What if we required all cars to be painted white?

    • What if we broke up the Euro?

  • Find the best action to get a desired outcome

    • Determine treatment (medicine)

    • Determine policies (economics)

  • Predict counterfactuals

    • A guy not wearing his seatbelt died in a car accident. Would he have died had he worn it?


Causal discovery motivations (2): Lots of data available

http://data.gov

http://data.uk.gov

http://www.who.int/research/en/

http://www.ncdc.noaa.gov/oa/ncdc.html

http://neurodatabase.org/

http://www.ncbi.nlm.nih.gov/Entrez/

http://www.internationaleconomics.net/data.html

http://www-personal.umich.edu/~mejn/netdata/

http://www.eea.europa.eu/data-and-maps/


Causal discovery motivations (3): Classical ML helpless

[Diagram: an outcome Y and a variable X]


Causal discovery motivations (3): Classical ML helpless

[Diagram: a causal graph relating X and Y]

Predict the consequences of actions:

Under “manipulations” by an external agent, only causes are predictive; consequences and confounders are not.


Causal discovery motivations (3): Classical ML helpless

[Diagram: X → Y, with X manipulated]

If manipulated, a cause influences the outcome…


Causal discovery motivations (3): Classical ML helpless

[Diagram: Y → X, with X manipulated]

… a consequence does not …


Causal discovery motivations (3): Classical ML helpless

[Diagram: X and Y share a common cause; X is manipulated]

… neither does a confounder (consequence of a common cause).
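
The following is a minimal numerical sketch of this claim, assuming a toy linear chain X → Y → Z (an illustrative model, not one of the challenge datasets): after an external agent clamps the consequence Z, only the cause X remains predictive of Y.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Natural distribution: X causes Y, Y causes Z.
X = rng.normal(size=n)
Y = 2.0 * X + rng.normal(size=n)
Z = 1.5 * Y + rng.normal(size=n)
print("natural:     corr(X,Y)=%.2f  corr(Z,Y)=%.2f"
      % (np.corrcoef(X, Y)[0, 1], np.corrcoef(Z, Y)[0, 1]))

# Manipulation: an external agent sets Z at random, cutting the Y -> Z edge.
Z_do = rng.normal(size=n)   # do(Z): Z no longer depends on Y
print("manipulated: corr(X,Y)=%.2f  corr(Z,Y)=%.2f"
      % (np.corrcoef(X, Y)[0, 1], np.corrcoef(Z_do, Y)[0, 1]))
# The cause X stays predictive of Y; the manipulated consequence Z does not.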


Causal discovery motivations (3): Classical ML helpless

  • Special case: stationary or cross-sectional data (no time series).

  • Superficially, the problem resembles a classical feature selection problem.

[Diagram: an m × n data matrix X of candidate features]



Quiz



What could be the causal graph?


Could it be that?

[Diagram: a candidate causal graph over X1, X2, and Y]


Let’s try

[Diagram: the candidate graph, with scatter plots of the data against x1 and x2]

X1 ⊥ X2 | Y

Simpson’s paradox
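
Simpson’s paradox is easy to reproduce numerically. The sketch below uses assumed numbers (a binary treatment X1, outcome Y, and group variable X2; not the quiz data): the overall X1-Y association has the opposite sign of the association within each group.

import numpy as np

rng = np.random.default_rng(0)
n = 200_000

X2 = rng.integers(0, 2, size=n)                    # group (e.g., severity)
X1 = rng.binomial(1, np.where(X2 == 1, 0.8, 0.2))  # treatment depends on group
p_y = 0.5 + 0.1 * X1 - 0.4 * X2                    # treatment helps within groups
Y = rng.binomial(1, p_y)

print("overall: P(Y|X1=1)-P(Y|X1=0) = %.3f"
      % (Y[X1 == 1].mean() - Y[X1 == 0].mean()))   # negative: looks harmful
for g in (0, 1):
    m = X2 == g
    print("X2=%d:    P(Y|X1=1)-P(Y|X1=0) = %.3f"   # positive within each group
          % (g, Y[m & (X1 == 1)].mean() - Y[m & (X1 == 0)].mean()))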


Could it be that?

[Diagram: another candidate causal graph over X1, X2, and Y]


Let’s try

[Diagram: the second candidate graph, with scatter plots of the data against x1 and x2]


Plausible explanation

[Diagram: baseline (X2) and health status (Y: disease vs. normal) both determine the peak (X1); scatter plots of the data against x1 and x2]

X2 ⊥ Y (marginally independent)

X2 and Y are dependent given X1 (conditioning on the collider)
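
A minimal sketch of this structure with assumed parameters (not the quiz data): X2 and Y are generated independently and both drive X1, so conditioning on X1 induces a dependence between them (“explaining away”).

import numpy as np

rng = np.random.default_rng(0)
n = 200_000
X2 = rng.normal(size=n)                       # baseline
Y = rng.integers(0, 2, size=n)                # disease status, independent of X2
X1 = X2 + 2.0 * Y + 0.5 * rng.normal(size=n)  # peak = baseline + disease effect

print("corr(X2, Y)            = %.3f" % np.corrcoef(X2, Y)[0, 1])  # ~0
mask = np.abs(X1 - 1.0) < 0.2                 # condition on a slice of X1
print("corr(X2, Y | X1 ~ 1)   = %.3f"         # clearly negative
      % np.corrcoef(X2[mask], Y[mask])[0, 1])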


What we would like

[Diagram: the true causal graph over X1, X2, and Y, alongside the observed data]


Manipulate X1

[Diagram: the graph with X1 clamped by an external agent]


Manipulate X2

[Diagram: the graph with X2 clamped by an external agent]



What we want to do


Causal data mining: How are we going to do it?

Obstacle 1: Practical

Many statements of the "causality problem"

Obstacle 2: Fundamental

It is very hard to assess solutions



Evaluation

  • Experiments are often:

    • Costly

    • Unethical

    • Infeasible

  • Non-experimental “observational” data is abundant and costs less.



New challenge: ExpDeCo

Experimental design in causal discovery

  • Goal: Find variables that strongly influence an outcome

  • Method:

    • Learn from a “natural” distribution (observational data)

    • Predict the consequences of given actions (checked against a test set of “real” experimental data)

    • Iteratively refine the model with experiments (using on-line learning from experimental data)



What we have already done


Virtual Lab

[Diagram: users send QUERIES to models of systems stored in a database and receive ANSWERS; the example model is the LUCAS lung-cancer network, with nodes Anxiety, Peer Pressure, Born an Even Day, Yellow Fingers, Smoking, Genetics, Allergy, Lung Cancer, Attention Disorder, Coughing, Fatigue, and Car Accident]



http://clopinet.com/causality

February 2007: Project starts. Pascal2 funding.

August 2007: Two-year NSF grant.

Dec. 2007: Workbench alive. 1st causality challenge.

Sept. 2008: 2nd causality challenge (Pot luck).

Fall 2009: Virtual lab alive.

Dec. 2009: Active Learning Challenge (Pascal2).

December 2010: Unsupervised and Transfer Learning Challenge (DARPA).

Fall 2012: ExpDeCo (Pascal2)

Planned: CoMSiCo



What remains to be done



ExpDeCo (new challenge)

Setup:

  • Several paired datasets (preferably real data):

    • “Natural” distribution

    • “Manipulated” distribution

  • Problems

    • Learn a causal model from the natural distribution

    • Assessment 1: test with natural distribution

    • Assessment 2: test with manipulated distribution

    • Assessment 3: on-line learning from the manipulated distribution (sequential design of experiments); see the sketch below
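
Schematically, the three assessments could be chained as below. The model interface (fit / partial_fit / score) and the data containers are hypothetical stand-ins, not the virtual lab's actual API.

# A schematic sketch of the three ExpDeCo assessments (hypothetical helpers).
def run_expdeco(model, natural_train, natural_test, manip_test, manip_stream):
    model.fit(natural_train.X, natural_train.y)

    # Assessment 1: standard prediction on the natural distribution.
    score_natural = model.score(natural_test.X, natural_test.y)

    # Assessment 2: the real test -- predict outcomes under manipulations.
    score_manip = model.score(manip_test.X, manip_test.y)

    # Assessment 3: sequential design -- refine the model on-line with
    # small batches of experimental (manipulated) data.
    scores_online = []
    for batch in manip_stream:        # each batch = one round of experiments
        model.partial_fit(batch.X, batch.y)
        scores_online.append(model.score(manip_test.X, manip_test.y))

    return score_natural, score_manip, scores_online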



Challenge design constraints

  • Largely not relying on “ground truth”, which is difficult or impossible to get (in real data)

  • Not biased towards particular methods

  • Realistic setting as close as possible to actual use

  • Statistically significant, not involving "chance"

  • Reproducible on other similar data

  • Not specific to very particular settings

  • No cheating possible

  • Capitalize on classical experimental design


Lessons learned from the Causation & Prediction Challenge


Causation and Prediction Challenge

[Diagram: toy datasets vs. challenge datasets]



Assessment w. manipulations (artificial data)


Causality assessment with manipulations

LUCAS0: natural

[Diagram: the unmanipulated LUCAS network: Anxiety, Peer Pressure, Born an Even Day, Yellow Fingers, Smoking, Genetics, Allergy, Lung Cancer, Attention Disorder, Coughing, Fatigue, Car Accident]


Causality assessment with manipulations

LUCAS1: manipulated

[Diagram: the LUCAS network with some variables set by an external agent]


Causality assessment with manipulations

LUCAS2: manipulated

[Diagram: the LUCAS network under a second set of manipulations]


Assessment w. ground truth

[Diagram: a twelve-node causal graph, with the variables of interest highlighted]

  • We define V = variables of interest (the theoretical minimal set of predictive variables, e.g. Markov blanket (MB), direct causes, ...)

  • Participants score feature relevance: S = ordered list of features

  • We assess causal relevance with AUC = f(V, S); see the sketch below
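
A minimal sketch of one such score, assuming f is the standard rank-based AUC with membership in V as the positive label and position in S as the score; the challenge's exact implementation may differ.

# Rank-based AUC of an ordered feature list S against a ground-truth set V.
def causal_auc(V, S):
    """V: set of truly relevant features; S: features ordered from most
    to least relevant. Returns the AUC of the ranking (1.0 = perfect)."""
    pos = [i for i, f in enumerate(S) if f in V]      # ranks of relevant features
    neg = [i for i, f in enumerate(S) if f not in V]  # ranks of the others
    wins = sum((p < q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

print(causal_auc({"Smoking", "Genetics"},
                 ["Smoking", "Coughing", "Genetics", "Allergy"]))  # 0.75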



Assessment without manip. (real data)


Using artificial “probes”

LUCAP0: natural

[Diagram: the LUCAS network augmented with artificial probe variables P1, P2, P3, …, PT]


Using artificial “probes”

LUCAP1&2: manipulated

[Diagram: the probed LUCAS network with the probes manipulated]



Scoring using “probes”

  • What we can compute (Fscore):

    • Negative class = probes (here, all “non-causes”, all manipulated).

    • Positive class = other variables (may include causes and non causes).

  • What we want (Rscore):

    • Positive class = causes.

    • Negative class = non-causes.

  • What we get (asymptotically):

    Fscore = (N_TruePos / N_Real) · Rscore + 0.5 · (N_TrueNeg / N_Real), as sketched below
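
A minimal sketch of the Fscore, assuming it is computed as a rank-based AUC with the probes as the negative class; the helper below is illustrative, not the challenge's official scoring code.

# Probe-based Fscore: rank all variables, take the artificial probes as the
# negative class, and compute an AUC-style score (sketch).
def fscore(ranked_vars, probes):
    """ranked_vars: all variables ordered from most to least causal;
    probes: set of artificial variables known to be non-causes."""
    real = [i for i, v in enumerate(ranked_vars) if v not in probes]
    fake = [i for i, v in enumerate(ranked_vars) if v in probes]
    wins = sum((r < p) + 0.5 * (r == p) for r in real for p in fake)
    return wins / (len(real) * len(fake))

# Real variables ranked ahead of probes score high, even though the true
# causal labels of the real variables are never revealed.
print(fscore(["Smoking", "Genetics", "P1", "Allergy", "P2"], {"P1", "P2"}))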



Pairwise comparisons



Causal vs. non-causal

[Plot: pairwise comparison of Jianxin Yin (causal method) vs. Vladimir Nikulin (non-causal method)]



Insensitivity to irrelevant features

Simple univariate predictive model, binary target and features; all relevant features correlate perfectly with the target, all irrelevant features are randomly drawn. With 98% confidence, abs(feat_weight) < w and Σi wi xi < v, where:

n_g = number of “good” (relevant) features

n_b = number of “bad” (irrelevant) features

m = number of training examples

A numerical sketch follows.
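
A Monte Carlo sketch of this effect, under the assumption that each weight is the empirical correlation of a ±1 feature with a ±1 target (an assumed setup, not necessarily the slide's exact model): the 98%-confidence bound on irrelevant-feature weights shrinks roughly as 1/√m.

import numpy as np

rng = np.random.default_rng(0)
for m in (100, 1000, 10000):                  # training examples
    y = rng.choice([-1, 1], size=m)
    Xb = rng.choice([-1, 1], size=(m, 2000))  # 2000 irrelevant features
    w = Xb.T @ y / m                          # univariate weights
    print("m=%5d  98%% of |w| below %.3f" % (m, np.quantile(np.abs(w), 0.98)))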



How to overcome this problem?

  • Learning curve in terms of number of features revealed

    • Without re-training on manipulated data

    • With on-line learning with manipulated data

  • Give pre-manipulation variable values and the value of the manipulation

  • Other metrics: stability, residuals, instrumental variables, missing features by design


Conclusion (more: http://clopinet.com/causality)

  • We want causal discovery to become “mainstream” data mining

  • We believe we need to start with “simple” standard procedures of evaluation

  • Our design is close to a typical prediction problem, but:

    • Training on natural distribution

    • Test on manipulated distribution

  • We want to avoid pitfalls of previous challenge designs:

    • Reveal only pre-manipulated variable values

    • Reveal variables progressively “on demand”

