Mixed model analysis to discover cis-regulatory haplotypes in A. thaliana

Fanghong Zhang*, Stijn Vansteelandt*,

Olivier Thas*, Marnik Vuylsteke#

* Ghent University # VIB (Flanders Institute for Biotechnology)


Overview


  • Genetic background

  • Objectives

  • Data

  • Methodology

  • Results

  • Conclusions


Genetic background


  • Regulatory variation affecting gene expression acts either in:

    • Cis: affecting the expression of only one of the two alleles in a heterozygous individual;

    • Trans: affecting the expression of both alleles in a heterozygous individual.



Genetic background

  • Why search for cis-regulatory variants?

    • “Low-hanging fruit”: the search window is a small genomic region around the gene.

    • Fast screening for markers in linkage disequilibrium (LD) with the expression trait.

  • How to search for cis-regulatory variants?

    • Using the GASED (Genome-wide Allelic Specific Expression Difference) approach (Kiekens et al., 2006)

    - Based on a diallel design, which is widely used in plant breeding to estimate GCA (general combining ability) and SCA (specific combining ability)
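The diallel structure behind GCA and SCA can be made concrete with a small sketch. It assumes a half-diallel of 10 parental lines with the parents included (an inference from the 45 pairwise GCA comparisons in Step III and the 55 × 55 relatedness matrix, not something the slide states) and builds the GCA incidence matrix of the standard diallel model μ + gca_i + gca_j + sca_ij: an F1 from cross i × j loads once on gca_i and once on gca_j, while parental line i loads twice on gca_i. Function names are illustrative.

```python
from itertools import combinations

import numpy as np


def half_diallel(n_parents):
    """Genotypes of a half-diallel with parents: (i, i) parental lines plus all i < j crosses."""
    parents = [(i, i) for i in range(n_parents)]
    crosses = list(combinations(range(n_parents), 2))
    return parents + crosses


def gca_incidence(genotypes, n_parents):
    """Row g counts the parental gametes of genotype g (a parental line counts its parent twice)."""
    Z = np.zeros((len(genotypes), n_parents))
    for row, (i, j) in enumerate(genotypes):
        Z[row, i] += 1
        Z[row, j] += 1
    return Z


genotypes = half_diallel(10)
Z_gca = gca_incidence(genotypes, 10)
n_crosses = sum(1 for i, j in genotypes if i != j)
print(len(genotypes), n_crosses)  # 55 45
```

Every row of `Z_gca` sums to 2, reflecting the two gametes that make up each genotype.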



Genetic background

  • What is the GASED approach?

    • The expression of a gene in an F1 hybrid, measured on the kth offspring of cross i × j, can be written as follows (c: cis-element; t: trans-element):

[Figure: GASED model for the kth offspring of cross i × j; annotated terms: from parent i, from parent j, from both (cross-terms), genotypic variation; annotated cases: homozygous, no trans-effect, cis-effect.]

    • In the cis-effect case, a cis-regulatory divergence completely explains the difference between the two parental lines.
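The reasoning behind these special cases can be illustrated numerically. The toy sketch below is our own illustration, not the GASED model itself: each allele is expressed at a cis-determined level a, multiplied by the trans level in the nucleus, which is the parental mean without dominance or the larger parental level with full dominance (both assumptions). Under purely cis divergence the hybrid sits exactly at the mid-parent value and its allelic ratio equals the parental ratio; a dominant trans factor pushes the hybrid away from the mid-parent value. All parameter values are hypothetical.

```python
def f1_expression(a_i, a_j, t_i, t_j, dominant_trans=False):
    """Total expression of an i x j genotype: two alleles, each scaled by the shared trans level."""
    trans = max(t_i, t_j) if dominant_trans else (t_i + t_j) / 2
    return (a_i + a_j) * trans


def midparent_deviation(a_i, a_j, t_i, t_j, dominant_trans=False):
    """Hybrid expression minus the mean of the two homozygous parents (a non-additive effect)."""
    parent_i = f1_expression(a_i, a_i, t_i, t_i, dominant_trans)
    parent_j = f1_expression(a_j, a_j, t_j, t_j, dominant_trans)
    hybrid = f1_expression(a_i, a_j, t_i, t_j, dominant_trans)
    return hybrid - (parent_i + parent_j) / 2


# Cis-only divergence: alleles differ (3 vs 1), identical trans environment.
cis_dev = midparent_deviation(3.0, 1.0, 1.0, 1.0)
# Parental ratio 6 / 2 = 3 equals the allelic ratio a_i / a_j = 3 inside the hybrid.
cis_parent_ratio = f1_expression(3.0, 3.0, 1.0, 1.0) / f1_expression(1.0, 1.0, 1.0, 1.0)

# Trans-only divergence with a dominant trans factor: identical cis alleles.
trans_dev = midparent_deviation(1.0, 1.0, 2.0, 1.0, dominant_trans=True)

print(cis_dev, cis_parent_ratio, trans_dev)  # 0.0 3.0 1.0
```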



Objectives of this study

  • To use mixed-model analysis to discover cis-regulated Arabidopsis genes

    • Based on the GASED approach: partition the genotypic variation in mRNA abundance among F1 hybrids into additive and non-additive variance components, in order to differentiate between cis- and trans-regulatory changes and to assign allele-specific expression differences to cis-regulatory variation.

  • To find the associated haplotypes (sets of SNPs) for the selected cis-regulated genes

    • Systematic surveys of cis-regulatory variation to identify “superior alleles”.



Flow chart

Data contains all expressed genes (25527 genes)

Step I: choose genes with significant genotypic variation.

Step II: choose genes from Step I with no trans-regulatory variation.

Step III: choose genes from Step II displaying significant allelic imbalance attributable to cis-regulatory variation.

Step IV: choose genes from Step III showing significant association with the identified haplotype blocks.
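The four steps act as successive filters on the gene list. A minimal sketch of that bookkeeping, assuming each step has already produced a per-gene q-value (a qa-value for Step II) and using the cut-offs reported on the Results slide; the gene names and values are hypothetical.

```python
def run_pipeline(genes, step_q, thresholds):
    """Apply Steps I-IV in order; each step only sees genes retained by the previous one.

    step_q: list of four dicts mapping gene -> q-value (qa-value for Step II).
    A gene passes a step if its value is below that step's cut-off.
    """
    retained = list(genes)
    counts = [len(retained)]
    for q_values, cut in zip(step_q, thresholds):
        retained = [g for g in retained if q_values[g] < cut]
        counts.append(len(retained))
    return retained, counts


# Step I adjusted q, Step II adjusted qa, Step III q, Step IV q
THRESHOLDS = [0.0005, 0.01, 0.01, 0.01]

genes = ["g1", "g2", "g3", "g4"]
step_q = [
    {"g1": 1e-5, "g2": 1e-4, "g3": 0.2, "g4": 1e-6},
    {"g1": 0.001, "g2": 0.5, "g4": 0.005},
    {"g1": 0.002, "g4": 0.03},
    {"g1": 0.004},
]
final, counts = run_pipeline(genes, step_q, THRESHOLDS)
print(final, counts)  # ['g1'] [4, 3, 2, 1, 1]
```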



Data

  • Data acquisition:

    • Scan the arrays

    • Quantify each spot

    • Subtract background noise

    • Normalize

    • Export the table

  → Data for analysis
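The slide lists the preprocessing steps without naming methods. As a generic illustration only, not the authors' pipeline, the sketch below applies background subtraction with a small floor, a log2 transform and per-channel median centering to two-color spot intensities (the dye effect in the models suggests two-color arrays). The floor value and median centering are assumptions.

```python
import numpy as np


def preprocess(foreground, background, floor=1.0):
    """foreground, background: (n_spots, n_channels) raw intensities for one array."""
    signal = np.maximum(foreground - background, floor)   # background subtraction, floored
    log_signal = np.log2(signal)                          # log2 scale, as usual for expression
    return log_signal - np.median(log_signal, axis=0)     # median-center each channel


fg = np.array([[1200.0, 900.0], [300.0, 350.0], [80.0, 120.0]])
bg = np.array([[100.0, 100.0], [44.0, 94.0], [100.0, 100.0]])
norm = preprocess(fg, bg)
```

In this toy input the second spot has a background-corrected signal of 256 in both channels, which is also each channel's median, so it normalizes to 0.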



Methodology - Step I

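Mixed-Model Equations

Full model for gene X:

$y_{klnm} = \mu + \text{dye}_k + \text{replicate}_l + \text{genotype}_n + \text{array}_m + \text{error}_{klnm}$

where $y_{klnm}$ are the expression values; $\mu$, $\text{dye}_k$ and $\text{replicate}_l$ are FIXED effects; $\text{genotype}_n$ is the RANDOM effect under test ($\text{array}_m$ is also random); and $\text{error}_{klnm}$ is the residual.

Reduced model: $y_{klnm} = \mu + \text{dye}_k + \text{replicate}_l + \text{array}_m + \text{error}_{klnm}$

  • $\text{error} \sim N(0, \Sigma_e)$, $\Sigma_e = I_{220}\sigma^2_e$; $\text{array} \sim N(0, \Sigma_a)$, $\Sigma_a = I_{110}\sigma^2_a$

  • $\text{genotype} \sim N(0, \Sigma_{\text{genotype}})$, $\Sigma_{\text{genotype}} = G = K\sigma^2_g$

  • K = 55 × 55 marker-based relatedness matrix, calculated as $K = 1 - d_R$, where $d_R$ is Rogers' distance (Rogers, 1972; Reif et al., 2005)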
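The full and reduced models above can be fitted and compared with a small from-scratch REML sketch. It is an illustration under stated assumptions, not the authors' software: variance components are optimized on the log scale with Nelder-Mead; the layout (4 observations per genotype, 110 two-channel arrays with a random pairing of observations) and all effect sizes are toy values chosen to match the dimensions on this slide; and K is built from random 0/1 markers as 1 minus the proportion of differing markers, which equals 1 − d_R for homozygous lines. The p-value uses 0.5·P(χ²₁ > LRT) for the boundary test of a variance component.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2


def reml_loglik(y, X, V):
    """Restricted log-likelihood of y ~ N(X beta, V), up to an additive constant."""
    L = np.linalg.cholesky(V)                      # raises if V is not positive definite
    Vinv_X = np.linalg.solve(V, X)
    XtVinvX = X.T @ Vinv_X
    beta = np.linalg.solve(XtVinvX, Vinv_X.T @ y)  # GLS estimate of the fixed effects
    r = y - X @ beta
    logdet_V = 2.0 * np.sum(np.log(np.diag(L)))
    logdet_XtVinvX = np.linalg.slogdet(XtVinvX)[1]
    return -0.5 * (logdet_V + logdet_XtVinvX + r @ np.linalg.solve(V, r))


def fit_reml(y, X, kernels, start):
    """Maximize REML over log variance components, one per covariance kernel."""
    def neg_loglik(log_s2):
        V = sum(np.exp(s) * Kc for s, Kc in zip(log_s2, kernels))
        return -reml_loglik(y, X, V)

    res = minimize(neg_loglik, start, method="Nelder-Mead",
                   options={"maxiter": 5000, "xatol": 1e-6, "fatol": 1e-8})
    return -res.fun, res.x


# Toy layout: 55 genotypes x 2 replicates x 2 dyes = 220 observations on 110 arrays.
rng = np.random.default_rng(1)
n_geno, n_obs = 55, 220
geno = np.arange(n_obs) // 4
rep = (np.arange(n_obs) // 2) % 2
perm = rng.permutation(n_obs)                      # random pairing of observations onto arrays
array = np.empty(n_obs, dtype=int)
dye = np.empty(n_obs, dtype=int)
array[perm] = np.repeat(np.arange(n_obs // 2), 2)
dye[perm] = np.tile([0, 1], n_obs // 2)

markers = rng.integers(0, 2, size=(n_geno, 500))   # toy homozygous marker data
K = 1.0 - np.abs(markers[:, None, :] - markers[None, :, :]).mean(axis=2)

X = np.column_stack([np.ones(n_obs), dye, rep])    # mu, dye, replicate (fixed)
Zg = np.eye(n_geno)[geno]
Za = np.eye(n_obs // 2)[array]
Kg, Ka, Ke = Zg @ K @ Zg.T, Za @ Za.T, np.eye(n_obs)

u_g = rng.multivariate_normal(np.zeros(n_geno), K, method="eigh")
y = (X @ np.array([8.0, 0.3, -0.1]) + Zg @ u_g
     + Za @ rng.normal(0.0, 0.5, n_obs // 2) + rng.normal(0.0, 0.7, n_obs))

ll_red, theta_red = fit_reml(y, X, [Ka, Ke], np.zeros(2))
# Start the full model at the reduced optimum with a negligible genotype variance.
ll_full, _ = fit_reml(y, X, [Kg, Ka, Ke], np.r_[-10.0, theta_red])
lrt = max(0.0, 2.0 * (ll_full - ll_red))
p_value = 0.5 * chi2.sf(lrt, df=1)                 # null: 0.5*chi2(0) + 0.5*chi2(1)
```

Starting the full fit from the reduced solution keeps the nested comparison sensible; the `max(0.0, …)` guards against tiny negative differences from the optimizer.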



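Methodology - Step I

Mixed-Model Equations

K = 55 × 55 marker-based relatedness matrix, with entries $K_{xy} = 1 - d_R(x, y)$, where Rogers' distance between lines x and y is

$d_R(x, y) = \frac{1}{m} \sum_{i=1}^{m} \sqrt{\frac{1}{2} \sum_{j=1}^{n_i} (p_{ij} - q_{ij})^2}$

  • $p_{ij}$ and $q_{ij}$ are the frequencies of the jth allele at the ith locus in the two lines being compared

  • $n_i$ is the number of alleles at the ith locus (here $n_i = 2$)

  • $m$ is the number of loci (here $m = 210{,}205$)

Rogers (1972); Reif et al. (2005); Melchinger et al. (1991)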
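A direct transcription of Rogers' distance, d_R = (1/m) Σ_i √(½ Σ_j (p_ij − q_ij)²), for biallelic markers. It assumes each line is summarized by the frequency of one allele per locus (0 or 1 in an inbred parent, 0.5 where an F1 is heterozygous). With two alleles the per-locus term reduces to the absolute difference in allele frequency, so for inbred lines d_R is simply the proportion of loci at which they differ. Function names are illustrative.

```python
import numpy as np


def rogers_distance(freq_x, freq_y):
    """Rogers' distance between two lines at m biallelic loci.

    freq_x, freq_y: (m,) frequency of allele 1 in each line.
    """
    p = np.stack([freq_x, 1.0 - freq_x], axis=1)   # p_ij, j = 1..n_i with n_i = 2
    q = np.stack([freq_y, 1.0 - freq_y], axis=1)   # q_ij
    per_locus = np.sqrt(0.5 * np.sum((p - q) ** 2, axis=1))
    return per_locus.mean()                         # average over the m loci


def relatedness_matrix(freqs):
    """K[x, y] = 1 - d_R(x, y) for every pair of lines (rows of freqs)."""
    n = freqs.shape[0]
    K = np.ones((n, n))
    for x in range(n):
        for y in range(x + 1, n):
            K[x, y] = K[y, x] = 1.0 - rogers_distance(freqs[x], freqs[y])
    return K


parent_a = np.array([0.0, 0.0, 1.0, 1.0])
parent_b = np.array([0.0, 1.0, 1.0, 0.0])
f1_ab = (parent_a + parent_b) / 2                   # heterozygous where the parents differ
K = relatedness_matrix(np.vstack([parent_a, parent_b, f1_ab]))
# K[0, 1] = 0.5 (two of four loci differ); K[0, 2] = K[1, 2] = 0.75
```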



Methodology - Step I

Multiple testing correction

Gene X:

  • Likelihood ratio test (REML): $\text{LRT} \sim 0.5\chi^2_0 + 0.5\chi^2_1$ → p-value → adjusted q-value (FDR)

25527 genes

  • FDR: false discovery rate. How many of the called positives are false? An FDR of 5% means that, on average, 5% of the genes called significant are false positives.

  • Storey et al. (2002) use the q-value to represent the FDR, which requires an estimate of the proportion of features that are truly null ($\pi_0$).

  • We use an adjusted q-value to represent the FDR.



Methodology - Step I

Multiple testing correction

Storey et al. estimate $\pi_0 = m_0/m$ under the assumption that true null p-values are uniformly distributed on (0, 1).

We estimate $\pi_{0,\text{adj}} = m_0/m$ under the assumption that true null p-values follow a 50:50 mixture: half are uniformly distributed on (0, 0.5) and half are exactly 0.5 (the point mass produced by the $0.5\chi^2_0$ component of the LRT).
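A sketch of this correction under our own assumptions. The boundary-LRT p-value is taken as 0.5·P(χ²₁ > LRT), which yields exactly the null behavior of half uniform on (0, 0.5) and half equal to 0.5. Under that null P(p > λ) = 1 − λ only for λ < 0.5, so π0 is estimated with a single λ below 0.5 (λ = 0.25 is a hypothetical choice). The q-values follow Storey's step-up formula with the null CDF adapted to the point mass at 0.5. Function names are illustrative.

```python
import numpy as np
from scipy.stats import chi2


def boundary_lrt_pvalue(lrt):
    """p-value for LRT ~ 0.5*chi2(0) + 0.5*chi2(1); an LRT of 0 maps to exactly 0.5."""
    return 0.5 * chi2.sf(np.maximum(lrt, 0.0), df=1)


def null_cdf(p):
    """CDF of a true-null p-value: P(p <= t) = t below 0.5, jumping to 1 at 0.5."""
    return np.where(p < 0.5, p, 1.0)


def pi0_adjusted(p, lam=0.25):
    """Storey-type estimate using P_null(p > lam) = 1 - lam, valid only for lam < 0.5."""
    assert 0.0 <= lam < 0.5
    return min(1.0, np.mean(p > lam) / (1.0 - lam))


def q_values(p, pi0):
    """q_(i) = min over k >= i of pi0 * m * F0(p_(k)) / k, on sorted p-values."""
    m = len(p)
    order = np.argsort(p)
    fdr = pi0 * m * null_cdf(p[order]) / np.arange(1, m + 1)
    q_sorted = np.minimum.accumulate(fdr[::-1])[::-1]   # enforce monotonicity
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q


# Toy data: 900 null genes (half with LRT = 0) and 100 genes with large LRTs.
rng = np.random.default_rng(0)
null_lrt = np.where(rng.random(900) < 0.5, 0.0, rng.chisquare(1, 900))
alt_lrt = rng.chisquare(1, 100) + 25.0
p = boundary_lrt_pvalue(np.concatenate([null_lrt, alt_lrt]))
pi0 = pi0_adjusted(p)
q = q_values(p, pi0)
```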



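Methodology - Step II

Mixed-Model Equations

Full model for gene X:

$y_{klijm} = \mu + \text{dye}_k + \text{replicate}_l + \text{gca}_i + \text{gca}_j + \text{sca}_{ij} + \text{array}_m + \text{error}_{klijm}$

where $y_{klijm}$ are the expression values; $\mu$, $\text{dye}_k$ and $\text{replicate}_l$ are FIXED effects; $\text{sca}_{ij}$ is the RANDOM effect under test; and $\text{error}_{klijm}$ is the residual.

L is the Cholesky decomposition.

Reduced model: $y_{klijm} = \mu + \text{dye}_k + \text{replicate}_l + \text{gca}_i + \text{gca}_j + \text{array}_m + \text{error}_{klijm}$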



Methodology - Step II

Multiple testing correction

Gene X:

  • Likelihood ratio test (REML): $\text{LRT} \sim 0.5\chi^2_0 + 0.5\chi^2_1$ → p-value → qa-value (FNR)

20976 genes

  • FNR: false non-discovery rate (Genovese et al., 2002). How many of the called negatives are false? An FNR of 5% means that, on average, 5% of the genes called non-significant are false negatives.

  • Since we are interested in selecting genes with a non-significant (negative) $\text{sca}_{ij}$ effect, we control the FNR instead of the FDR.

  • We use the qa-value to represent the FNR.



Methodology - Step II

Multiple testing correction

False non-discovery rate (FNR):

$\pi_0$ is the estimate of the proportion of features that are truly null.
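A sketch of an FNR-based qa-value, assuming the usual plug-in estimator (by analogy with Storey's q-value) and the same null p-value distribution as in Step I (uniform on (0, 0.5) plus a point mass at 0.5). Calling every gene with p ≥ t negative, the estimated FNR is 1 − π0·m·(1 − t)/#{p ≥ t}; a gene's qa-value is the smallest FNR over thresholds that still call it negative (t ≤ p_i). Genes with small qa-values are confidently non-significant. The function name and toy p-values are hypothetical.

```python
import numpy as np


def qa_values(p, pi0):
    """FNR-based qa-values; uses P_null(p >= t) = 1 - t, valid for t <= 0.5."""
    m = len(p)
    order = np.argsort(p)
    p_sorted = p[order]
    n_neg = m - np.searchsorted(p_sorted, p_sorted, side="left")  # #{p >= t} at t = p_(k)
    fnr = np.clip(1.0 - pi0 * m * (1.0 - p_sorted) / n_neg, 0.0, 1.0)
    qa_sorted = np.minimum.accumulate(fnr)                         # min over t <= p_i
    qa = np.empty(m)
    qa[order] = qa_sorted
    return qa


# Toy p-values: 900 null genes (half exactly 0.5) and 100 genes with a real SCA effect.
rng = np.random.default_rng(2)
null_p = np.where(rng.random(900) < 0.5, 0.5, rng.uniform(0.0, 0.5, 900))
alt_p = rng.uniform(0.0, 1e-4, 100)
p = np.concatenate([null_p, alt_p])
qa = qa_values(p, pi0=0.9)
# Genes with qa < 0.01 would be kept as showing no significant SCA effect.
```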



Methodology - Step III

Mixed-Model Equations

Model for gene X:

$y_{klijm} = \mu + \text{dye}_k + \text{replicate}_l + \text{gca}_i + \text{gca}_j + \text{array}_m + \text{error}_{klijm}$

Test all 45 pairs of GCA effects ($g_i$ denotes $\text{gca}_i$):

$g_1 = g_2$? $g_1 = g_3$? $g_1 = g_4$? … $g_1 = g_{10}$? $g_2 = g_3$? $g_2 = g_4$? … $g_2 = g_{10}$? … $g_9 = g_{10}$?

  • Two-sample dependent t-test

  • Non-standard p-values: the distribution of true null p-values is not uniform on (0, 1)
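A sketch of the dependent pairwise test on BLUPs. It assumes the mixed-model fit provides the BLUPs of the 10 GCA effects and their prediction error covariance matrix; the difference of two BLUPs then has variance PEV_ii + PEV_jj − 2·PEV_ij, which is what makes the test "dependent". The BLUP and PEV values below are hypothetical.

```python
from itertools import combinations

import numpy as np


def pairwise_blup_t(blup, pev):
    """t statistics for H0: gca_i = gca_j over all pairs, keyed by 1-based (i, j).

    blup: (p,) BLUPs of the GCA effects; pev: (p, p) prediction error covariance matrix.
    """
    stats = {}
    for i, j in combinations(range(len(blup)), 2):
        var = pev[i, i] + pev[j, j] - 2.0 * pev[i, j]
        stats[(i + 1, j + 1)] = (blup[i] - blup[j]) / np.sqrt(var)
    return stats


blup = np.array([0.8, -0.2, 0.1, 0.0, -0.4, 0.3, -0.1, 0.2, -0.5, -0.2])  # hypothetical
pev = 0.02 * np.eye(10) + 0.005                                          # hypothetical
t = pairwise_blup_t(blup, pev)
print(len(t))  # 45
```

Because these statistics are computed from shrunken BLUPs, their null distribution is not the textbook t distribution, hence the non-standard p-values.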



Methodology - Step III

Multiple testing correction

Gene X:

  • Two-sample t-test on the BLUPs

  • Simulate the H0 distribution from the real data → simulation-based p-value → q-value (FDR)

1380 genes
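A generic sketch of a simulation-based p-value. It assumes one can draw test statistics under H0, for example by simulating data from the fitted model with the two GCA effects set equal and recomputing the BLUP t-test; the normal draws below are a hypothetical stand-in for that simulator. The add-one estimate (1 + #{|T*| ≥ |T|}) / (B + 1) is never exactly 0.

```python
import numpy as np


def simulation_p_value(t_obs, t_null):
    """Two-sided empirical p-value of t_obs against simulated null statistics t_null."""
    t_null = np.asarray(t_null)
    exceed = np.sum(np.abs(t_null) >= abs(t_obs))
    return (1 + exceed) / (len(t_null) + 1)


# Hypothetical stand-in for the H0 simulator: N(0, 0.6^2) draws, purely for illustration.
rng = np.random.default_rng(3)
t_null = rng.normal(0.0, 0.6, size=9999)
p_sim = simulation_p_value(2.0, t_null)
```

The resulting p-values can then be converted to q-values in the usual way.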



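Methodology - Step IV

Mixed-Model Equations

Full model for gene X (cis-regulated):

$y_{klim} = \mu + \text{dye}_k + \text{replicate}_l + \sum_{s=1}^{n} \text{SNP}_{s,i} + \text{genotype}_i + \text{array}_m + \text{error}_{klim}$

where $\mu$, $\text{dye}_k$, $\text{replicate}_l$ and the SNP terms are FIXED effects; $\text{genotype}_i$ is a RANDOM effect; and $\text{error}_{klim}$ is the residual. $\text{SNP}_1, \ldots, \text{SNP}_n$ are tag SNPs located near the gene on its chromosome.

  • $\text{genotype} \sim N(0, \Sigma_{\text{genotype}})$, $\Sigma_{\text{genotype}} = G = K\sigma^2_g$

  • K = 55 × 55 marker-based relatedness matrix

  • $\text{array} \sim N(0, \Sigma_a)$, $\Sigma_a = I_{110}\sigma^2_a$; $\text{error} \sim N(0, \Sigma_e)$, $\Sigma_e = I_{220}\sigma^2_e$

  • Reduced model: $y_{klim} = \mu + \text{dye}_k + \text{replicate}_l + \text{genotype}_i + \text{array}_m + \text{error}_{klim}$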



Methodology - Step IV

Multiple testing correction

Gene X (cis-regulated):

  • Likelihood ratio test (ML) → p-value → q-value (FDR)

  • $\text{LRT} \sim \chi^2_{2n}$, where n is the number of SNPs

836 genes
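The Step IV test has 2n degrees of freedom for n tag SNPs. One reading consistent with this, and an assumption on our part, is that each SNP enters as a three-level factor (two homozygous classes plus the heterozygous class carried by the F1 hybrids), which costs two indicator columns per SNP once a reference level is dropped. The sketch builds that coding and refers an ML likelihood ratio to χ² with 2n degrees of freedom; ML rather than REML is needed because REML likelihoods of models with different fixed effects are not comparable. The SNP codes and log-likelihood values are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def snp_design(snp_codes):
    """Dummy-code SNPs given as (n_lines, n_snps) arrays of 0/1/2 (copies of one allele).

    Each SNP contributes indicators for classes 1 and 2; class 0 is the reference.
    """
    cols = []
    for s in range(snp_codes.shape[1]):
        for level in (1, 2):
            cols.append((snp_codes[:, s] == level).astype(float))
    return np.column_stack(cols)


def snp_lrt_pvalue(loglik_full_ml, loglik_reduced_ml, n_snps):
    """LRT of the SNP fixed effects, referred to chi2 with 2n degrees of freedom."""
    lrt = 2.0 * (loglik_full_ml - loglik_reduced_ml)
    return chi2.sf(lrt, df=2 * n_snps)


# Two parental lines (codes 0 and 2) and their F1 (code 1) at three tag SNPs.
codes = np.array([[0, 2, 0], [2, 0, 2], [1, 1, 1]])
X_snp = snp_design(codes)
print(X_snp.shape)  # (3, 6)
p = snp_lrt_pvalue(-410.2, -418.9, n_snps=3)
```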



Results

Data contains all expressed genes (25527 genes)

  • Step I: adjusted q-value < 0.0005 → 20979 genes

  • Step II: adjusted qa-value < 0.01 → 1328 genes

  • Step III: q-value < 0.01 → 972 genes

  • Step IV: q-value < 0.01 → 859 genes



Results

    • Step I: among all 25527 genes, 20979 show significant genotypic variation (adjusted q-value < 0.0005).

    • Step II: among these 20979 genes, 1328 show no trans-regulatory effect (qa-value < 0.01).

    • Step III: among these 1328 genes, 972 show significantly different allelic expression (q-value < 0.01); these 972 genes are identified as cis-regulated.

    • Step IV: we validate these 972 cis-regulated genes:

      • an allelic expression difference caused by a cis-regulatory variant implies a nearby polymorphism that controls expression, detectable through SNPs in LD with it;

      • we indeed found that 96.5% of the selected cis-regulated genes have associated polymorphisms (haplotype blocks) nearby.



Conclusions

    • The mixed-model approach used here for association mapping, which includes the kinship matrix, is more appropriate than other recent methods for identifying cis-regulated genes (its p-values are more reliable).

    • At each step, statistical significance is controlled more accurately through the appropriate error rate (FDR or FNR).

    • Using simulation-based p-values when testing differences between random effects increases the power to detect association.

    • A comprehensive analysis of gene expression variation in plant populations has been described.

    • Using this mixed-model analysis strategy, a detailed characterization of both the genetic and the positional effects in the genome is provided.

    • This detailed statistical analysis provides a robust and useful framework for the future analysis of gene expression variation in large samples.

    • Advanced statistical methods look promising for making discoveries in genetics.



Many thanks for your attention!

