Causal Inference因果推论

Of Intermediate 中级 Phenotypes 表型 and Biomarkers 生物标记 in Rheumatoid Arthritis 风湿性关节炎

[An Application of Machine Learning 机器学习 Techniques to Genetic Epidemiology 遗传流行病学]

Wentian Li 李问天, Ph.D

Feinstein Institute for Medical Research

Wentian Li, North Shore LIJ Health System

Genetic Association
  • Association 相关 is not equivalent to a causal 因果的 relationship
  • An association between wrinkles and cancer risk does not mean that one causes 导致 the other
  • Age is a confounding factor 混杂因素
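
The wrinkle example can be illustrated with a small simulation. This is only a sketch: the probabilities below are made-up values chosen for illustration, not estimates from any study, and the function names are hypothetical. Age drives both wrinkles and cancer, and neither affects the other.

```python
import random

def simulate(n=20000, seed=0):
    """Simulate people whose age group drives both wrinkles and cancer (hypothetical rates)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        old = rng.random() < 0.5                        # confounder: age group
        wrinkle = rng.random() < (0.8 if old else 0.1)  # depends on age only
        cancer = rng.random() < (0.3 if old else 0.05)  # depends on age only
        rows.append((old, wrinkle, cancer))
    return rows

def risk_difference(rows):
    """P(cancer | wrinkle) - P(cancer | no wrinkle) within the given rows."""
    with_w = [c for _, w, c in rows if w]
    without_w = [c for _, w, c in rows if not w]
    return sum(with_w) / len(with_w) - sum(without_w) / len(without_w)

rows = simulate()
overall = risk_difference(rows)                               # clearly positive
within_old = risk_difference([r for r in rows if r[0]])       # close to zero
within_young = risk_difference([r for r in rows if not r[0]]) # close to zero
print(overall, within_old, within_young)
```

The overall association appears only because age is ignored; within each age group it largely disappears.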

When do we need to know cause and effect?
  • Cause and effect are rarely discussed in genetic analysis, because the genotype is always the cause 原因 and the phenotype is always the effect 效果
  • In epidemiology 流行病学, a factor 因素-disease 疾病 association can arise in three situations: (1) the factor is a cause; (2) reverse causality; (3) a third, confounding factor
  • For two intermediate phenotypes (biomarkers), the causal arrow can point either way

Causal Inference in Machine Learning
  • Large text databases (e.g. Google)
  • Observational data (no controlled experiments, and no other approach to determine causality)
  • A two-point association indeed cannot be used to claim causality
  • The key is a third variable, together with the conditional 条件的 association given that third variable

Data Mining and Knowledge Discovery (2000) v4, pp.163-192

An Example

Cooper’s Local Causality Discovery (LCD) Rule
  • Six assumptions: 1. database completeness. 2. discrete variables. 3. Bayesian network model (directed acyclic 非环式的 graph: no loops). 4…. 5. no selection bias. 6. valid statistical testing.
  • Three variables: x, y, z
  • Hidden 潜在的 variables are allowed (but are not in the dataset)
  • Determine three correlations: unconditional C(x,y), C(y,z) and conditional C(x,z|y)
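
One way to compute these three quantities for binary variables is sketched below. It is an illustration only, not Cooper's implementation: the helper names are hypothetical, the association test is a plain Pearson chi-square, the conditional test simply sums the chi-squares over the two strata of y (2 degrees of freedom, assuming neither stratum is degenerate), and the data are simulated from a chain x => y => z with made-up probabilities.

```python
import math
import random

def chi2_2x2(pairs):
    """Pearson chi-square statistic for a list of (a, b) pairs with values 0/1."""
    n = len(pairs)
    if n == 0:
        return 0.0
    counts = [[0, 0], [0, 0]]
    for a, b in pairs:
        counts[a][b] += 1
    stat = 0.0
    for i in (0, 1):
        for j in (0, 1):
            expected = sum(counts[i]) * (counts[0][j] + counts[1][j]) / n
            if expected > 0:
                stat += (counts[i][j] - expected) ** 2 / expected
    return stat

def p_unconditional(a, b):
    """p-value for C(a, b): chi-square survival function with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2_2x2(list(zip(a, b))) / 2))

def p_conditional(a, c, b):
    """p-value for C(a, c | b): stratum chi-squares summed, 2 degrees of freedom."""
    stat = sum(
        chi2_2x2([(u, w) for u, w, v in zip(a, c, b) if v == level])
        for level in (0, 1)
    )
    return math.exp(-stat / 2)

# Simulated chain x => y => z (hypothetical probabilities).
rng = random.Random(1)
x = [int(rng.random() < 0.5) for _ in range(5000)]
y = [int(rng.random() < (0.8 if a else 0.2)) for a in x]
z = [int(rng.random() < (0.8 if b else 0.2)) for b in y]

print("C(x,y)   p =", p_unconditional(x, y))
print("C(y,z)   p =", p_unconditional(y, z))
print("C(x,z|y) p =", p_conditional(x, z, y))
```

A small p-value is read as "+" (association present) and a large one as "-" (association absent) in the notation used here.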

Between two variables, there are only 6 (4) causal relationships (allowing a confounding variable)
  • no relationship
  • confounding
  • causing
  • confounding + causing
  • reverse causing (marked “NO”)
  • confounding + reverse causing (marked “NO”)

Number of causal relationships among three variables
  • 6x6x6=216 possibilities
  • 4x4x6=96 if x is not caused by either y or z (but can receive an arrow from a hidden variable) [Cooper’97 paper]
  • 2x2x6=24 if x doesn’t even receive an arrow from hidden confounding variables [Li and Wang, unpublished]
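
The three counts can be reproduced by brute-force enumeration. The sketch below assumes a string encoding of the six pairwise relationships (with "h" standing for a hidden confounder); the encoding and function names are illustrative, and the code only reproduces the products quoted above without any further filtering of the combinations.

```python
from itertools import product

# Six relationships for a pair (a, b); "h" is a hidden confounder of a and b.
RELATIONS = [
    "none",            # no relationship
    "a<-h->b",         # confounding only
    "a->b",            # a causes b
    "a->b, a<-h->b",   # causing plus confounding
    "a<-b",            # reverse causing
    "a<-b, a<-h->b",   # reverse causing plus confounding
]

def allowed(relation, a_not_effect, a_no_hidden_arrow):
    """Check a pairwise relationship against the restrictions placed on a."""
    if a_not_effect and "a<-b" in relation:
        return False
    if a_no_hidden_arrow and "a<-h" in relation:
        return False
    return True

def count_models(x_not_effect=False, x_no_hidden_arrow=False):
    """Count combinations over the pairs (x, y), (x, z), (y, z)."""
    x_pairs = [r for r in RELATIONS if allowed(r, x_not_effect, x_no_hidden_arrow)]
    # The same restrictions apply to (x, y) and (x, z); (y, z) is unrestricted.
    return sum(1 for _ in product(x_pairs, x_pairs, RELATIONS))

print(count_models())            # 216
print(count_models(True))        # 96
print(count_models(True, True))  # 24
```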

Given a causal model…
  • Unconditional 无条件 association between any two variables can be determined by whether they are connected by a path
  • Conditional 条件的 association can be determined by the so-called “d-separation” rule
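
A d-separation check can be sketched in a few lines using the standard "ancestral moral graph" construction. The function name and the edge-list input format are assumptions made for this illustration. The example graphs are simple chains and confounded structures with a hidden variable h.

```python
def d_separated(edges, x, z, given=()):
    """True if x and z are d-separated by `given` in the DAG given as (cause, effect) edges."""
    parents = {}
    for cause, effect in edges:
        parents.setdefault(effect, set()).add(cause)
        parents.setdefault(cause, set())

    # Step 1: keep x, z, the conditioning set, and all of their ancestors.
    keep, stack = set(), [x, z, *given]
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, ()))

    # Step 2: moralize. Link each node to its parents, link parents that
    # share a child, and ignore edge directions from here on.
    neighbours = {node: set() for node in keep}
    for child in keep:
        ps = sorted(parents.get(child, ()))
        for i, p in enumerate(ps):
            neighbours[child].add(p)
            neighbours[p].add(child)
            for q in ps[i + 1:]:
                neighbours[p].add(q)
                neighbours[q].add(p)

    # Step 3: remove the conditioning nodes and search for a remaining path.
    blocked, seen, stack = set(given), set(), [x]
    while stack:
        node = stack.pop()
        if node == z:
            return False
        if node not in seen and node not in blocked:
            seen.add(node)
            stack.extend(neighbours[node])
    return True

chain = [("x", "y"), ("y", "z")]
hidden_on_x = [("h", "x"), ("h", "y"), ("y", "z")]
hidden_on_yz = [("x", "y"), ("h", "y"), ("h", "z")]

print(d_separated(chain, "x", "z"))                        # False: connected by a path
print(d_separated(chain, "x", "z", given={"y"}))           # True: y blocks the path
print(d_separated(hidden_on_x, "x", "z", given={"y"}))     # True
print(d_separated(hidden_on_yz, "x", "z", given={"y"}))    # False: conditioning on y opens x - h - z
```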

“CCC” causal inference rule

(Cooper version) If C(x,y)+, C(y,z)+, but C(x,z|y)-, then there are only three possible causal models:

x => y => z

x <= h => y => z

h => x => y => z

(Silverstein et al. version) If C(x,y)+, C(y,z)+, C(x,z)+, but C(x,z|y)-, C(x,y|z)+, C(y,z|x)+, then...

In a three-way correlated set

If one of the variables (x) is not an effect (only a cause)

AND

If the correlation between x and z is lost conditional on y,

THEN

y causes z

x: gene

y,z: two intermediate phenotypes
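
The rule on this slide can be written as a small decision function. This is a minimal sketch: the function name, argument names, and return strings are illustrative, and the three association flags are assumed to come from statistical tests performed elsewhere.

```python
def ccc_rule(c_xy, c_yz, c_xz_given_y, x_is_not_effect):
    """Apply the three-variable rule; every argument is a boolean.

    c_xy, c_yz       : unconditional associations C(x,y) and C(y,z) are present
    c_xz_given_y     : conditional association C(x,z|y) is present
    x_is_not_effect  : x can only be a cause (e.g. a gene)
    """
    if x_is_not_effect and c_xy and c_yz and not c_xz_given_y:
        return "y causes z"
    return "no conclusion"

# x: gene; y, z: two intermediate phenotypes
print(ccc_rule(c_xy=True, c_yz=True, c_xz_given_y=False, x_is_not_effect=True))
```

When the conditional association is still present, or x could itself be an effect, the function draws no conclusion rather than inferring the opposite direction.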

The use of a not-an-effect variable has an amazing parallel in epidemiology
  • Called “instrumental variable”
  • Martijn Katan’s idea on the cholesterol 胆固醇-cancer 癌症 association: he proposed to use a genotype (apolipoprotein 载脂蛋白 E) as the third variable (Lancet 1986, i:507-508)

  • Katan did not use conditional correlation
  • This idea is now called “Mendelian randomization”
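
The logic of using a genotype as the third variable can be illustrated with a simulation. Everything numeric below is a made-up assumption for illustration (carrier frequency, cancer rate, cholesterol units and shifts); it is not Katan's data. The scenario simulated is reverse causality: cancer lowers cholesterol, the genotype also lowers cholesterol, and low cholesterol itself does not cause cancer.

```python
import random

def mean(values):
    return sum(values) / len(values)

def simulate(n=20000, seed=0):
    """Return (carrier, cholesterol, cancer) records under reverse causality."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        carrier = rng.random() < 0.15   # hypothetical frequency of the cholesterol-lowering genotype
        cancer = rng.random() < 0.10    # hypothetical rate, independent of genotype
        chol = rng.gauss(200, 30)       # arbitrary units
        if carrier:
            chol -= 20                  # genotype lowers cholesterol
        if cancer:
            chol -= 20                  # disease lowers cholesterol (reverse causality)
        data.append((carrier, chol, cancer))
    return data

data = simulate()

# Cholesterol is lower in cancer cases (the observed association) ...
chol_gap_by_cancer = mean([c for _, c, d in data if d]) - mean([c for _, c, d in data if not d])
# ... and lower in carriers (the genotype works as intended) ...
chol_gap_by_gene = mean([c for g, c, _ in data if g]) - mean([c for g, c, _ in data if not g])
# ... but carriers do not have more cancer, which argues against low cholesterol being the cause.
cancer_gap_by_gene = mean([d for g, _, d in data if g]) - mean([d for g, _, d in data if not g])

print(chol_gap_by_cancer, chol_gap_by_gene, cancer_gap_by_gene)
```

Note that only unconditional comparisons are used here, in line with the remark that Katan did not use conditional correlation.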

Rheumatoid Arthritis (RA)
  • An autoimmune 自我免疫的 disease
  • Chronic inflammation 炎症 of joints 关节
  • Three times more likely to occur in women than men
  • Age of onset 40-60
  • Twin 双胞胎 concordance rates: 12-15% for MZ (monozygotic 单卵双生), 5% for DZ (dizygotic 异卵双生)
  • Genetic and environmental (e.g. smoking) risk factors

MHC/HLA: the main genetic contribution to RA
  • MHC (Major Histocompatibility Complex 主要组织相容性复合体) or HLA (Human Leukocyte Antigens 人类白血球抗原): HLA-DRB1 gene on chromosome 6 (6p21.3)
  • The RA-associated alleles are HLA-DRB1*0401, *0404, *0408 (Caucasian), but not *0402, *0403, *0407
  • In Asian populations, different DRB1 alleles are associated with RA (e.g. *0405, *0901)
  • A group of DRB1 risk alleles is called the “shared epitope” (SE) 共同表位, or rheumatoid epitope; it is coded at amino acid positions 70-74 in the third hypervariable region

Two Auto-antibodies are strongly associated with RA: RF and anti-CCP
  • RF (rheumatoid factor 类风湿因子): 80% of RA patients are RF positive
  • anti-CCP (anti-cyclic citrullinated peptide antibody 抗环瓜氨酸肽抗体,抗CCP抗体): an even better predictor of RA at an early stage
  • HLA-DRB1, RF, and anti-CCP are all associated with RA, and they are associated with each other. The CCC rule can be applied!

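Zhang Lifang, Yan Yougong, Huang Qianchuan, et al., “Application of anti-cyclic citrullinated peptide antibody in the diagnosis of rheumatoid arthritis”, Immunological Journal (免疫学杂志), 2004, 20:52-57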

Q: Between RF and anti-CCP, which one is the cause and which is the effect?

1723 Caucasian RA patients

[Tables for two strata: anti-CCP positive and anti-CCP negative]

Association between RF and DRB1 genotype is lost conditional on anti-CCP

By the CCC rule, anti-CCP is the cause, RF is the effect

Or, anti-CCP is upstream and RF is downstream in a pathway

Discussions/Issues
  • There is evidence that RA patients become anti-CCP positive before becoming RF positive
  • The three-way correlation might be lost in normal controls (here we have a “case-only” analysis)
  • Other factors may lie between anti-CCP and RF (so the cause-effect relationship may not be direct)
  • It is not clear where the smoking factor comes in (could be an intriguing analysis with smoking data!)

Revisit Katan’s “Mendelian Randomization” (MR) by LCD [Wang, Li, unpublished]

MR:
  • Needs a not-an-effect variable (gene)
  • Conditional association is not used
  • Only needs a counterexample (e.g. Apo E2 samples have low cholesterol, but NOT high cancer risk)

LCD:
  • Needs a variable that is not an effect
  • Conditional association is used
  • Needs complete information on the (G, IP, D) trio for all samples (e.g. Apo genotype, cholesterol level, cancer status)

Co-Authors
  • Mingyi WANG (Zhejiang Univ, Computer Science Department, causal inference)
  • Patricia Irigoyen, Peter Gregersen (North Shore LIJ, RA data)
