Mixed model analysis to discover cis-regulatory haplotypes in A. thaliana


Mixed model analysis to discover cis-regulatory haplotypes in A. thaliana. Fanghong Zhang*, Stijn Vansteelandt*, Olivier Thas*, Marnik Vuylsteke#. * Ghent University; # VIB (Flanders Institute for Biotechnology). Overview: Genetic background, Objectives, Data, Methodology


Mixed model analysis to discover cis-regulatory haplotypes in A. thaliana

Fanghong Zhang*, Stijn Vansteelandt*, Olivier Thas*, Marnik Vuylsteke#

* Ghent University; # VIB (Flanders Institute for Biotechnology)

Overview
  • Genetic background
  • Objectives
  • Data
  • Methodology
  • Results
  • Conclusions

Genetic background
  • Regulation of gene expression is affected either in:
    • Cis: affecting the expression of only one of the two alleles in a heterozygous individual;
    • Trans: affecting the expression of both alleles in a heterozygous individual.

Genetic background

  • Why search for Cis-regulatory variants?
  • “low hanging fruit”: window is a small genomic region
  • Fast screening for markers in LD with expression trait.
  • How to search for Cis-regulatory variants?
  • Using the GASED (Genome-wide Allelic Specific Expression Difference) approach (Kiekens et al., 2006)
    • Based on a diallel design, which is widely used in plant breeding to estimate GCA (general combining ability) and SCA (specific combining ability)


Genetic Background

  • What is the GASED approach?
    • The expression of a gene in an F1 hybrid, for the kth offspring of the cross i x j, can be written in terms of cis-elements (c) and trans-elements (t).

[Equations on this slide were not recovered. Surviving annotations: contributions from parent i, from parent j, and from both parents (cross-terms); genotypic variation; special cases: homozygous, no trans-effect, cis-effect present.]

  • A cis-regulatory divergence completely explains the difference between two parental lines.


Objectives of this study

  • Use mixed model analysis to discover cis-regulated Arabidopsis genes
    • Based on the GASED approach, partition the genotypic variation for mRNA abundance between F1 hybrids into additive and non-additive variance components, in order to differentiate between cis- and trans-regulatory changes and to assign allele-specific expression differences to cis-regulatory variation.
  • Find the associated haplotypes (sets of SNPs) for the selected cis-regulated genes.
    • Systematic surveys of cis-regulatory variation to identify “superior alleles”.

Flow chart

Data contains all expressed genes (25527 genes)

  • Step I: Choose genes with significant genotypic variation.
  • Step II: Choose genes from Step I with no trans-regulatory variation.
  • Step III: Choose genes from Step II displaying significant allelic imbalance due to cis-regulatory variation.
  • Step IV: Choose genes from Step III showing significant association with the haplotype blocks found.


Data

  • Data acquisition:
  • Scan the arrays
  • Quantitate each spot
  • Subtract noise from background
  • Normalize
  • Export table

Data for us to analyze

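
Methodology - Step I

Mixed-Model Equations

Full model (gene X): $y_{klnm} = \mu + \mathrm{dye}_k + \mathrm{replicate}_l + \mathrm{genotype}_n + \mathrm{array}_m + \mathrm{error}_{klnm}$

Reduced model: $y_{klnm} = \mu + \mathrm{dye}_k + \mathrm{replicate}_l + \mathrm{array}_m + \mathrm{error}_{klnm}$

  • $y_{klnm}$: expression values of gene X; $\mathrm{error}_{klnm}$: residual
  • FIXED effects: dye, replicate; RANDOM effects: genotype, array
  • $\mathrm{error} \sim N(0, \Sigma_e)$, $\Sigma_e = I_{220}\,\sigma^2_e$; $\mathrm{array} \sim N(0, \Sigma_a)$, $\Sigma_a = I_{110}\,\sigma^2_a$
  • $\mathrm{genotype} \sim N(0, \Sigma_{genotype})$, $\Sigma_{genotype} = G = K\sigma^2_g$
  • K = 55 x 55 marker-based relatedness matrix, calculated as $1 - d_R$, where $d_R$ is Rogers' distance (Rogers, 1972; Reif et al., 2005)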

Methodology - Step I

Mixed-Model Equations

K = 55 x 55 marker-based relatedness matrix, where:

  • $p_{ij}$ and $q_{ij}$ are the allele frequencies of the jth allele at the ith locus
  • $n_i$ is the number of alleles at the ith locus (i.e. $n_i = 2$)
  • m is the number of loci (i.e. m = 210,205)

Rogers (1972); Reif et al. (2005); Melchinger et al. (1991)
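
The defining formula for the distance did not survive in the transcript. As a minimal sketch, assuming the usual form of Rogers' distance, $d_R = \frac{1}{m}\sum_{i=1}^{m}\sqrt{\tfrac{1}{2}\sum_{j=1}^{n_i}(p_{ij}-q_{ij})^2}$, the relatedness entries $1 - d_R$ could be computed as follows. The function names and the toy frequencies are illustrative assumptions, not taken from the slides.

```python
from math import sqrt


def rogers_distance(p, q):
    """Rogers' distance between two lines.

    p[i][j] and q[i][j] are the frequencies of allele j at locus i
    in the two lines being compared (assumed input layout).
    """
    if len(p) != len(q) or not p:
        raise ValueError("p and q must cover the same non-empty set of loci")
    total = 0.0
    for p_i, q_i in zip(p, q):
        # Per-locus term: sqrt(1/2 * sum_j (p_ij - q_ij)^2)
        total += sqrt(0.5 * sum((a - b) ** 2 for a, b in zip(p_i, q_i)))
    return total / len(p)  # average over the m loci


def relatedness_matrix(freqs):
    """K[a][b] = 1 - d_R(line a, line b) for a list of lines."""
    n = len(freqs)
    return [[1.0 - rogers_distance(freqs[a], freqs[b]) for b in range(n)]
            for a in range(n)]


# Toy example: two inbred lines, two biallelic loci, differing at one locus.
line_a = [(1.0, 0.0), (1.0, 0.0)]
line_b = [(1.0, 0.0), (0.0, 1.0)]
K = relatedness_matrix([line_a, line_b])
```

For fully homozygous lines at biallelic loci, this distance reduces to the fraction of loci at which the two lines carry different alleles.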


Methodology - Step I

Multiple testing correction

  • Gene X: likelihood ratio test (REML) → p-value, with LRT ~ $0.5\,\chi^2(0) + 0.5\,\chi^2(1)$
  • 25527 genes: p-values → adjusted q-value (FDR)

FDR: false discovery rate

  • How many of the called positives are false?
  • A 5% FDR means that, in expectation, 5% of the called positives are false.
  • Storey et al. (2002): q-value to represent FDR
  • Estimate the proportion of features that are truly null ($\pi_0$)
  • We use an adjusted q-value to represent FDR
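
As a rough sketch of how a p-value might be obtained from the 50:50 mixture of $\chi^2(0)$ and $\chi^2(1)$, the code below uses the identity $P(\chi^2_1 \ge x) = \mathrm{erfc}(\sqrt{x/2})$. The function names are hypothetical, and the value returned when the statistic is exactly zero (`p_at_zero=0.5`) is an assumption chosen to match the deck's statement that half of the null p-values equal 0.5; other conventions return 1 in that case.

```python
from math import erfc, sqrt


def chi2_1_sf(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return erfc(sqrt(x / 2.0))


def mixture_pvalue(lrt, p_at_zero=0.5):
    """p-value under the 0.5*chi2(0) + 0.5*chi2(1) null mixture.

    lrt: likelihood ratio statistic, 2 * (loglik_full - loglik_reduced).
    p_at_zero: value assigned when lrt == 0 (assumption, see lead-in).
    """
    lrt = max(lrt, 0.0)  # guard against tiny negative values from optimisation
    if lrt == 0.0:
        return p_at_zero
    # chi2(0) is a point mass at 0, so only the chi2(1) half contributes.
    return 0.5 * chi2_1_sf(lrt)
```

With this convention, a statistic of about 2.71 gives a p-value of about 0.05, compared with about 0.10 under a plain $\chi^2(1)$ reference.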


Methodology - Step I

Multiple testing correction

  • Storey et al. estimate $\pi_0 = m_0/m$ under the assumption that the true null p-values are uniformly distributed on (0, 1).
  • We estimate $\pi_{0,\mathrm{adj}} = m_0/m$ under the assumption that half of the true null p-values are uniformly distributed on (0, 0.5) and the other half are exactly 0.5.
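
The adjusted estimator itself is not given in the transcript. For orientation only, here is a minimal sketch of the standard (unadjusted) approach: a Storey-type estimate $\hat{\pi}_0(\lambda) = \#\{p_i > \lambda\} / (m(1-\lambda))$ followed by step-up q-values. The function names, the choice $\lambda = 0.5$, and the toy p-values are assumptions for illustration.

```python
def storey_pi0(pvalues, lam=0.5):
    """Estimate the proportion of true nulls from p-values above lam."""
    m = len(pvalues)
    if m == 0 or not 0.0 <= lam < 1.0:
        raise ValueError("need at least one p-value and 0 <= lam < 1")
    pi0 = sum(1 for p in pvalues if p > lam) / (m * (1.0 - lam))
    return min(1.0, pi0)  # a proportion cannot exceed 1


def qvalues(pvalues, pi0=None):
    """q-values: pi0 * m * p_(i) / i, made monotone from the largest p down."""
    m = len(pvalues)
    if pi0 is None:
        pi0 = storey_pi0(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from largest to smallest p-value
        idx = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[idx] / rank)
        q[idx] = running_min
    return q


toy_p = [0.01, 0.02, 0.03, 0.6, 0.8]
toy_q = qvalues(toy_p)
```

Genes would then be called significant when their q-value falls below the chosen FDR threshold.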

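
Methodology - Step II

Mixed-Model Equations

Full model (gene X): $y_{klijm} = \mu + \mathrm{dye}_k + \mathrm{replicate}_l + \mathrm{gca}_i + \mathrm{gca}_j + \mathrm{sca}_{ij} + \mathrm{array}_m + \mathrm{error}_{klijm}$

Reduced model: $y_{klijm} = \mu + \mathrm{dye}_k + \mathrm{replicate}_l + \mathrm{gca}_i + \mathrm{gca}_j + \mathrm{array}_m + \mathrm{error}_{klijm}$

  • $y_{klijm}$: expression values of gene X; $\mathrm{error}_{klijm}$: residual
  • FIXED effects: dye, replicate; RANDOM effects: gca, sca, array
  • L is the Cholesky decomposition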


Methodology - Step II

Multiple testing correction

  • Gene X: likelihood ratio test (REML) → p-value, with LRT ~ $0.5\,\chi^2(0) + 0.5\,\chi^2(1)$
  • 20976 genes: p-values → qa-value (FNR)

  • FNR: false non-discovery rate (Genovese et al., 2002)
  • How many of the called negatives are false?
  • A 5% FNR means that, in expectation, 5% of the called negatives are false.
  • Since we are interested in selecting genes that test negative for the $\mathrm{sca}_{ij}$ effect, we control FNR instead of FDR.
  • We use the qa-value to represent FNR.


Methodology - Step II

Multiple testing correction

False non-discovery rate (FNR) :

π0 is the estimate of the proportion of features that are truly null


Methodology - Step III

Mixed-Model Equations

Model (gene X): $y_{klijm} = \mu + \mathrm{dye}_k + \mathrm{replicate}_l + \mathrm{gca}_i + \mathrm{gca}_j + \mathrm{array}_m + \mathrm{error}_{klijm}$

Test 45 pairs:

$g_1 = g_2$? $g_1 = g_3$? $g_1 = g_4$? … $g_1 = g_{10}$? $g_2 = g_3$? $g_2 = g_4$? $g_2 = g_5$? … $g_2 = g_{10}$? … $g_9 = g_{10}$?

  • Two-sample dependent t-test
  • Non-standard p-value: the true null p-values are not uniformly distributed on (0, 1)
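
The count of 45 corresponds to all unordered pairs among the ten effects $g_1, \dots, g_{10}$, i.e. $\binom{10}{2} = 45$. A short sketch enumerating them (the variable names are illustrative):

```python
from itertools import combinations

# Indices 1..10 stand for the effects g1..g10 listed on the slide.
parents = range(1, 11)

# Every unordered pair (i, j) with i < j is one hypothesis g_i = g_j.
pairs = list(combinations(parents, 2))
n_tests = len(pairs)
```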


Methodology - Step III

Multiple testing correction

  • Gene X: two-sample t-test on BLUPs → simulate the H0 distribution from real data → simulation-based p-value
  • 1380 genes: p-values → q-value (FDR)
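
The transcript does not say how the null distribution is simulated. Assuming a set of test statistics generated under H0 is already available, a simulation-based (empirical) two-sided p-value could be sketched as below. The function name and the add-one correction are assumptions, not the authors' stated procedure.

```python
def empirical_pvalue(observed, null_stats):
    """Two-sided empirical p-value of `observed` against simulated null statistics.

    Uses the (1 + exceedances) / (1 + B) form so the result is never exactly 0.
    """
    b = len(null_stats)
    if b == 0:
        raise ValueError("need at least one simulated null statistic")
    exceed = sum(1 for s in null_stats if abs(s) >= abs(observed))
    return (1 + exceed) / (1 + b)


# Toy null statistics standing in for t-statistics simulated under H0.
toy_null = [-3.0, -1.0, 0.0, 0.5, 1.0, 2.0, 2.5, 4.0, -0.2]
toy_p = empirical_pvalue(2.2, toy_null)
```

Because the reference distribution comes from the simulation rather than from a t table, the resulting p-values do not rely on the standard t null distribution.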

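
Methodology - Step IV

Mixed-Model Equations

Full model (gene X, cis-regulated): $y_{klim} = \mu + \mathrm{dye}_k + \mathrm{replicate}_l + (\text{tag-SNP effects}) + \mathrm{genotype}_i + \mathrm{array}_m + \mathrm{error}_{klim}$

Reduced model: $y_{klim} = \mu + \mathrm{dye}_k + \mathrm{replicate}_l + \mathrm{genotype}_i + \mathrm{array}_m + \mathrm{error}_{klim}$

  • The tag-SNP term is missing from the extracted equation and is shown above only as a placeholder; the slide diagram shows tag SNPs (SNP1, SNP2, SNP3, …, SNPi) on the chromosome around the gene.
  • FIXED effects, RANDOM effect, and residual are labelled on the slide.
  • $\mathrm{genotype} \sim N(0, \Sigma_{genotype})$, $\Sigma_{genotype} = G = K\sigma^2_g$
  • K = 55 x 55 marker-based relatedness matrix
  • $\mathrm{array} \sim N(0, \Sigma_a)$, $\Sigma_a = I_{110}\,\sigma^2_a$; $\mathrm{error} \sim N(0, \Sigma_e)$, $\Sigma_e = I_{220}\,\sigma^2_e$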


Methodology - Step IV

Multiple testing correction

  • Gene X (cis-regulated): likelihood ratio test (ML) → p-value, with LRT ~ $\chi^2(2n)$, where n is the number of SNPs
  • 836 genes: p-values → q-value (FDR)
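
Reading the reference distribution as a chi-square with 2n degrees of freedom (an interpretation of the garbled slide text), the p-value has a closed form because the degrees of freedom are even: $P(\chi^2_{2k} \ge x) = e^{-x/2}\sum_{i=0}^{k-1}\frac{(x/2)^i}{i!}$. A minimal sketch, with hypothetical function and argument names:

```python
from math import exp


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return exp(-half) * total


def snp_lrt_pvalue(loglik_full, loglik_reduced, n_snps):
    """p-value of the ML likelihood ratio test, assuming df = 2 * n_snps."""
    lrt = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    return chi2_sf_even_df(lrt, 2 * n_snps)
```

The two log-likelihoods would come from fitting the full and reduced models by ML for the same gene.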


Results

Data contains all expressed genes (25527 genes)

  • Step I: adjusted q-value < 0.0005 → 20979 genes
  • Step II: adjusted qa-value < 0.01 → 1328 genes
  • Step III: q-value < 0.01 → 972 genes
  • Step IV: q-value < 0.01 → 859 genes


Results

  • Among all 25527 genes, 20979 genes have significant genotypic variation (q-value < 0.0005). (Step I)
  • Among these 20979 genes, 1328 genes show no trans-regulatory effect (qa-value < 0.01). (Step II)
  • Among these 1328 genes, 972 genes show significantly different allelic expression (q-value < 0.01); these 972 genes are identified as cis-regulated. (Step III)
  • We confirm the discovery of these 972 cis-regulated genes in Step IV:
    • an allelic expression difference caused by a cis-regulatory variant implies a nearby polymorphism (SNP), in LD, that controls expression;
    • we indeed found that 96.5% of the selected cis-regulated genes have associated polymorphisms (haplotype blocks) nearby.
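
The four-step selection can be pictured as a cascade of threshold filters, each applied only to the genes that passed the previous step. The sketch below uses the thresholds quoted in the results; the dictionary keys, gene identifiers, and values are made up for illustration.

```python
# (key, cutoff) per step; cutoffs are the thresholds quoted in the results.
THRESHOLDS = [
    ("step1_q", 0.0005),   # Step I: adjusted q-value (FDR)
    ("step2_qa", 0.01),    # Step II: adjusted qa-value (FNR)
    ("step3_q", 0.01),     # Step III: q-value (FDR)
    ("step4_q", 0.01),     # Step IV: q-value (FDR)
]


def filter_cascade(genes):
    """Apply the step thresholds in order; return survivors after each step.

    genes: dict mapping gene id -> dict of per-step values (assumed layout).
    """
    survivors = list(genes)
    history = []
    for key, cutoff in THRESHOLDS:
        survivors = [g for g in survivors if genes[g][key] < cutoff]
        history.append((key, list(survivors)))
    return history


# Hypothetical toy input, not real data.
toy_genes = {
    "geneA": {"step1_q": 0.0001, "step2_qa": 0.005, "step3_q": 0.001, "step4_q": 0.002},
    "geneB": {"step1_q": 0.0001, "step2_qa": 0.005, "step3_q": 0.2, "step4_q": 0.5},
    "geneC": {"step1_q": 0.2, "step2_qa": 0.005, "step3_q": 0.001, "step4_q": 0.002},
}
toy_history = filter_cascade(toy_genes)
```

In the actual analysis the later-step values would be computed only for genes surviving the earlier steps; the toy input carries all four values per gene just to keep the example short.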

Conclusions

  • The mixed-model approach used here for association mapping, with the kinship matrix included, is more appropriate than other recent methods for identifying cis-regulated genes (more reliable p-values).
  • The statistical method at each step is controlled in a more accurate way to specify statistical significance (via FDR and FNR).
  • Using simulation-based p-values when testing differences between random effects increases the power to detect association.
  • A comprehensive analysis of gene expression variation in plant populations has been described.
  • Using this mixed-model analysis strategy, a detailed characterization of both the genetic and the positional effects in the genome is provided.
  • This detailed statistical analysis provides a robust and useful framework for the future analysis of gene expression variation in large sample sizes.
  • Advanced statistical methods look promising in identifying interesting discoveries in genetics.

Many thanks

for your attention !
