Jorge Viveros

Summer 2006 Workshop

June 29th, 2006

Aracne
Contents
  • Overview (the problem, the alternatives, the central idea of ARACNE’s algorithm)
  • Demo (reconstruction of gene regulatory networks from Affymetrix gene expression data)
  • Algorithm details (approximating the mutual information, comparative study results, ARACNE vs Bayesian and Relevance Networks)
  • Conclusions
  • Bibliography
1. Overview: ARACNE

Algorithm for the Reconstruction of Accurate Cellular Networks

“Reverse engineering” or “deconvolution” problem:

[Figure: expression samples for genes ga, gb, gc, gd, ge are processed with information theory and maximum-entropy methods to produce a gene regulatory network over the same genes.]

(overview, cont’d) Authors

A.A. Margolin [1,2], I. Nemenman [2], K. Basso [3], C. Wiggins [2,4], G. Stolovitzky [5], R. Dalla-Favera [3], A. Califano [1,2]

[1] Dept. of Biomedical Informatics, [2] Joint Centers for Systems Biology, [3] Institute for Cancer Genetics, [4] Dept. of Appl. Physics and Appl. Math.

Columbia University

[5]IBM T.J. Watson Research Center.

Main reference:

http://www.arxiv.org/abs/q-bio/0410037

BMC Bioinformatics 2006, 7(Suppl 1):S7

(overview, cont’d) Goal

Understand mammalian normal cell physiology and complex pathologic phenotypes by elucidating gene transcriptional regulatory networks.

Thesis

Statistical associations between mRNA abundance levels help to uncover gene regulatory mechanisms.

(overview: alternatives) ARACNE vs Clustering

ARACNE recovers specific transcriptional interactions but does not attempt to recover all of them (too complex a problem).

Genome-wide clustering of gene expression profiles cannot distinguish direct (irreducible) from “cascade” transcriptional gene interactions.

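[Figure: genes ga, gb, gc, gd, ge analyzed two ways. Clustering groups them as {ga, gb}, {gc, gd}, {ge}; ARACNE instead yields a network on nodes a, b, c, d, e.]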

(central idea) Gene network inference

edge = (direct) statistical dependency = direct regulatory interaction

nodes = genes

Temporal gene expression data for higher eukaryotes are difficult to obtain.

Only steady-state statistical dependencies are studied.

[Figure: nodes gi and gj.]

Accounting for dependence: definition and measurement

Gene expression values are samples from a joint probability distribution (JPD).

Consider the multi-information: the average log-deviation of the JPD from the product of its marginals (a Kullback-Leibler divergence, “KL-div”).

Use maximum entropy methods to approximate the JPD by an element of its “m-way” marginal Fréchet class (the m-way maximum-entropy estimate, m-MEE).

Use the m-MEE to define the mth-order connected information (m-cinfo), which accounts for m-way statistical dependencies (only!).

Multi-info = sum of all m-cinfo’s.

The multi-information

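Multi-information (KL-div) of the JPD $P(x)$, where $x = (x_1, \dots, x_N)$ are the “nodes”, “expressions” or “genes”:

$$I[P] = \int P(x)\,\log\frac{P(x)}{\prod_{i=1}^{N} P_i(x_i)}\,dx = \sum_{i=1}^{N} S[P_i] - S[P]$$

(integral in the continuous case; sum in the discrete case), where $S[P] = -\int P(x)\,\log P(x)\,dx$ is the entropy of $P(x)$ and $P_i$ is the marginal of $x_i$.

The JPD is not known: approximate it!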
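As a small illustration of the definition above (multi-information = sum of the marginal entropies minus the joint entropy), here is a minimal Python sketch for the discrete case. The function names and the two toy distributions are hypothetical and chosen only for this example; they are not part of ARACNE.

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy (in bits) of a dict mapping outcome -> probability."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(joint, axes):
    """Marginalize a joint dict {tuple: prob} onto the given coordinate indices."""
    out = defaultdict(float)
    for x, p in joint.items():
        out[tuple(x[a] for a in axes)] += p
    return dict(out)


def multi_information(joint):
    """Sum of single-variable entropies minus the joint entropy (bits)."""
    n = len(next(iter(joint)))
    singles = sum(entropy(marginal(joint, (i,))) for i in range(n))
    return singles - entropy(joint)


# Toy example 1 (assumption): three binary "genes" that are exact copies.
copies = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
# Toy example 2 (assumption): two independent fair binary "genes".
independent = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}

mi_copies = multi_information(copies)
mi_independent = multi_information(independent)
```

For the copied genes the three marginal entropies add up to 3 bits while the joint entropy is 1 bit, so the multi-information is 2 bits; for the independent pair it vanishes.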

m-way max entropy estimate of JPD

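The m-MEE, $P^{(m)}$, has the same m-way marginals as the JPD $P$.

The m-MEE has an exponential form whose exponent is a sum of Lagrange multipliers, one for each m-way marginal constraint.

The Lagrange multipliers have no analytical solution, BUT they can be obtained via the iterative proportional fitting procedure (IPFP).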
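The iterative proportional fitting idea can be sketched in a few lines of Python for small discrete problems. This is only an illustrative sketch under stated assumptions: the function names `marginal` and `ipfp`, the fixed number of sweeps, and the XOR toy distribution are all hypothetical and are not taken from ARACNE. Starting from the uniform distribution, each step rescales the current estimate so that one m-way marginal matches the target.

```python
import itertools
from collections import defaultdict


def marginal(joint, axes):
    """Marginalize a joint dict {tuple: prob} onto the given coordinate indices."""
    out = defaultdict(float)
    for x, p in joint.items():
        out[tuple(x[a] for a in axes)] += p
    return dict(out)


def ipfp(target, m=2, sweeps=200):
    """Approximate the m-way maximum-entropy estimate of `target` by IPFP."""
    n = len(next(iter(target)))
    states = [sorted({x[i] for x in target}) for i in range(n)]
    support = list(itertools.product(*states))
    q = {x: 1.0 / len(support) for x in support}  # start from the uniform distribution
    subsets = list(itertools.combinations(range(n), m))
    wanted = {s: marginal(target, s) for s in subsets}
    for _ in range(sweeps):
        for s in subsets:
            current = marginal(q, s)
            for x in support:
                key = tuple(x[a] for a in s)
                c = current[key]
                # Rescale so that the marginal on subset s matches the target.
                q[x] = q[x] * wanted[s].get(key, 0.0) / c if c > 0 else 0.0
    return q


# Toy example (assumption): x3 = x1 XOR x2 with fair, independent x1 and x2.
xor = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
xor_2mee = ipfp(xor, m=2)
```

In the XOR example every pairwise marginal is uniform, so the pairwise (m = 2) estimate is the uniform distribution over all eight states: the dependence is purely three-way and invisible to pairwise marginals.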

Connected and Multi informations

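mth-order connected information, where $P^{(m)}$ is the m-MEE and $S$ is the entropy:

$$I_C^{(m)} = S[P^{(m-1)}] - S[P^{(m)}]$$

Multi-information:

$$I[P] = \sum_{m=2}^{N} I_C^{(m)}$$

Compensate for the lack of knowledge of the JPD by using the (truncated!) multi-info to establish and quantify statistical dependencies.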

Detecting a particular m-way interaction

An m-way interaction contributes to the multi-info iff the minimum of the interaction multi-information (inter multi-info) over the interaction-specific Fréchet class is positive.

Inter multi-info =

Here the two distributions are m-MEEs that share the same m-way marginals except for, perhaps, the one under test.

Positivity of the minimal inter multi-info ⇒ the interaction is irreducible (direct).

Thus draw edges coming from the m interacting nodes and meeting at an m-edge vertex.

Examples

Regulatory cascade (Markov chain)

Data processing inequality

generically dependent (similarly, )

generically independent

No triplet interactions (coregulation)

(examples, cont’d) Other dependencies

2 regulates 1 and 3 OR 1 and 3 regulate 2 jointly

does not factor

but pairwise marginals do

2. Demo

Platforms

  • caWorkBench2.0 (downloadable through web site) (JAVA)

Most developed features: microarray data analysis, pathway analysis and reverse engineering, sequence analysis, transcription factor binding site analysis, pattern discovery.

http://amdec-bioinfo.cu-genome.org/html/caWorkBench.htm

  • Cygwin (for Windows). Windows and Linux versions are available on the web site.
(Demo) Sample input data file

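Input_file_name.exp

N = 3 (number of genes)

M = 2 (number of microarrays)

The input file has N+1 = 4 lines: one header line followed by one line per gene.

The header line has M+2 fields: AffyID, the annotation name, and the M microarray chip names. Each gene line has M+2 fields, or 2M+2 fields when every chip contributes a (value, p-value) pair, as in this sample.

AffyID HG_U95Av2 SudHL6.CHP ST486.CHP
G1 G1 16.477367 0.69939363 20.150969 0.5297595
G2 G2 7.6989274 0.55935365 26.04019 0.5445875
G3 G3 8.8098955 0.5445875 21.554955 0.31372303

In the G1 line, 16.477367 and 0.69939363 are the (value, p-value) pair for chip 1 (SudHL6.CHP).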
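To make the layout of the sample file concrete, the following Python sketch reads it into per-gene lists of values and p-values. It assumes whitespace-separated fields with no spaces inside the identifier columns, which holds for the sample shown but is only an assumption about real .exp files; the function name `parse_exp` is hypothetical and is not part of ARACNE.

```python
SAMPLE_EXP = """\
AffyID HG_U95Av2 SudHL6.CHP ST486.CHP
G1 G1 16.477367 0.69939363 20.150969 0.5297595
G2 G2 7.6989274 0.55935365 26.04019 0.5445875
G3 G3 8.8098955 0.5445875 21.554955 0.31372303
"""


def parse_exp(text):
    """Return (chip_names, {gene_id: (values, pvalues_or_None)})."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    chips = header[2:]          # the first two header fields are ID / annotation
    m = len(chips)
    genes = {}
    for fields in rows:
        gene_id = fields[0]
        numbers = [float(v) for v in fields[2:]]
        if len(numbers) == 2 * m:      # (value, p-value) pair per chip
            values, pvalues = numbers[0::2], numbers[1::2]
        elif len(numbers) == m:        # values only
            values, pvalues = numbers, None
        else:
            raise ValueError(f"unexpected field count for {gene_id}")
        genes[gene_id] = (values, pvalues)
    return chips, genes


chips, genes = parse_exp(SAMPLE_EXP)
```

With the sample above, `chips` holds the two chip names and `genes["G2"]` holds the expression values `[7.6989274, 26.04019]` together with their p-values.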

(Demo, cont’d) Syntax (Cygwin)

ARACNE: algorithm for gene regulatory network computation given microarray data.

Usage:

aracne

aracne GeneExpressionFile [-a | -k | -s | -t | -e | -f]

aracne -adj GeneExpressionFile AdjacencyFile [-t | -e]

-a accurate | fast [default: accurate]

-k gaussian kernel width [accurate method only; default: 0.15]

-s Averaging Window step size [fast method only; default: 6]

-t Mutual Info. threshold [default: 0]

-e DPI tolerance (btw 0 and 1) [default: 1]

-f mean stdev [default: no filtering]

(Demo, cont’d) Sample output data file

input_data_file_name[non-default_param_vals].adj

Number of lines = N = number of genes

G1:0 8 0.064729

G2:1 2 0.0298643 7 0.0521425

G3:2 1 0.0298643

G4:3 8 0.0427217

G5:4 5 0.403516

G6:5 4 0.403516 6 0.582265

G7:6 5 0.582265 9 0.38039

G8:7 1 0.0521425 8 0.743262

G9:8 0 0.064729 3 0.0427217 7 0.743262 9 0.333104

G10:9 6 0.38039 8 0.333104

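Each line has the form AffyID:ID# followed by one (associated gene ID#, MI value) pair per neighbor. For example, G2 (ID# 1) is linked to the genes with ID# 2 and ID# 7, with MI values 0.0298643 and 0.0521425.

[Figure: the resulting network drawn as a graph on nodes 1 to 10.]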
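The adjacency listing above can be turned into an undirected edge list with a short Python sketch. It assumes whitespace-separated fields and the "AffyID:ID#" label layout seen in the sample; the function name `parse_adj` is hypothetical and is not part of ARACNE.

```python
SAMPLE_ADJ = """\
G1:0 8 0.064729
G2:1 2 0.0298643 7 0.0521425
G3:2 1 0.0298643
G4:3 8 0.0427217
G5:4 5 0.403516
G6:5 4 0.403516 6 0.582265
G7:6 5 0.582265 9 0.38039
G8:7 1 0.0521425 8 0.743262
G9:8 0 0.064729 3 0.0427217 7 0.743262 9 0.333104
G10:9 6 0.38039 8 0.333104
"""


def parse_adj(text):
    """Return ({id: AffyID}, {frozenset({i, j}): MI}) from adjacency text."""
    names, edges = {}, {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        name, idx = fields[0].rsplit(":", 1)   # "G2:1" -> ("G2", "1")
        i = int(idx)
        names[i] = name
        rest = fields[1:]
        # Remaining fields alternate: neighbor ID#, MI value.
        for j, mi in zip(rest[0::2], rest[1::2]):
            edges[frozenset((i, int(j)))] = float(mi)
    return names, edges


names, edges = parse_adj(SAMPLE_ADJ)
strongest = max(edges, key=edges.get)
```

Because each edge is listed from both of its endpoints, storing it under an unordered pair collapses the two copies into one; in the sample, the strongest edge joins ID# 7 and ID# 8 (G8 and G9).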

3. Algorithm details

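Incorporate information-theoretic ideas (Markov networks) to model statistical dependencies (cf. [2]):

$$P(\{g_i\}) = \frac{1}{Z}\exp\Big[-\sum_{i}\phi_i(g_i) - \sum_{i,j}\phi_{ij}(g_i,g_j) - \sum_{i,j,k}\phi_{ijk}(g_i,g_j,g_k) - \cdots\Big] \equiv \frac{1}{Z}\,e^{-H(\{g_i\})}$$

$P(\{g_i\})$ = joint probability distribution function of the stationary expressions of all genes ($i = 1, \dots, N$)

N = number of genes, Z = partition function (normalization factor), $H$ = Hamiltonian,

$\phi_i$, $\phi_{ij}$, $\phi_{ijk}$, … = interaction potentials (e.g., genes i, j, k do not interact in the model iff $\phi_{ijk} = 0$).

Aim: identify nonzero potentials.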

(Algorithm details) Aracne’s model

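First-order approximation: genes are independent, i.e. the Hamiltonian is $H(\{g_i\}) = \sum_i \phi_i(g_i)$, with $P(\{g_i\}) = e^{-H}/Z$.

The first-order potentials are obtained from the marginal probabilities $P(g_i)$ (estimated experimentally).

ARACNE’s approximation: truncate the joint probability distribution function at pairwise potentials, $H(\{g_i\}) = \sum_i \phi_i(g_i) + \sum_{i,j} \phi_{ij}(g_i, g_j)$.

In this model $\phi_{ij} = 0$ for non-interacting genes. These include statistically independent genes and genes that do not interact directly, i.e. $\phi_{ij} = 0$ but $P(g_i, g_j) \neq P(g_i)P(g_j)$.

Reduce the number of potential pairwise interactions via realistic biological assumptions.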

(algorithm details, cont’d) MI estimation

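Assume two-way interactions: pairwise potentials determine all statistical dependencies.

Mutual information (MI) = measure of relatedness:

$$I(g_i, g_j) = S(g_i) + S(g_j) - S(g_i, g_j)$$

$I(g_i, g_j) = 0$ iff $P(g_i, g_j) = P(g_i)P(g_j)$.

MI approximation (Gaussian kernel estimator) from M samples $z_k = (x_k, y_k)$:

$$I(\{x_k\},\{y_k\}) = \frac{1}{M}\sum_{k=1}^{M} \log\frac{f(x_k, y_k)}{f(x_k)\,f(y_k)}, \qquad f(z) = \frac{1}{M}\sum_{k=1}^{M} h^{-2}\,G\!\left(h^{-1}|z - z_k|\right)$$

where $f(x)$ and $f(y)$ are the marginals of $f(z)$.

G = bivariate standard Gaussian density

h = kernel width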
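A Gaussian-kernel MI estimate of this kind can be sketched in plain Python as follows. This is a simplified illustration, not ARACNE's implementation: the function names, the rank transform used to make the marginals look uniform, the use of a product of two one-dimensional Gaussians as the bivariate kernel, and the toy data are all assumptions made for this example. The default h = 0.15 simply mirrors the default kernel width quoted in the usage text.

```python
import math


def rank_uniform(values):
    """Rank-transform values into (0, 1) so the marginal looks uniform."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = (r + 0.5) / len(values)
    return ranks


def kernel_mi(x, y, h=0.15):
    """Gaussian-kernel estimate of the mutual information (in nats)."""
    x, y = rank_uniform(x), rank_uniform(y)
    m = len(x)
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * h)  # 1-D Gaussian normalization

    def k(d):
        return norm * math.exp(-0.5 * (d / h) ** 2)

    total = 0.0
    for a in range(m):
        kx = [k(x[a] - x[b]) for b in range(m)]
        ky = [k(y[a] - y[b]) for b in range(m)]
        f_x = sum(kx) / m                                # marginal density of x
        f_y = sum(ky) / m                                # marginal density of y
        f_xy = sum(p * q for p, q in zip(kx, ky)) / m    # joint density
        total += math.log(f_xy / (f_x * f_y))
    return total / m


# Toy data (assumption): one strongly dependent pair, one scrambled pair.
m = 61
x = [float(i) for i in range(m)]
y_dependent = [2.0 * v + 1.0 for v in x]                # monotone function of x
y_scrambled = [float((37 * i) % m) for i in range(m)]   # ordering unrelated to x
mi_dependent = kernel_mi(x, y_dependent)
mi_scrambled = kernel_mi(x, y_scrambled)
```

Because the estimate is computed on ranks, any monotone transformation of a variable leaves it unchanged, and the dependent pair scores well above the scrambled one. The cost per gene pair grows as the square of the number of samples.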

(algorithm details, cont’d)

Some details and technicalities:

Transform x and y so that they lie in [0, 1] and their marginal distributions appear uniform.

There is no universal way of choosing h; however, the ranking of the MIs depends only weakly on it.

(algorithm details, cont’d) Establishing the network

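Define a threshold $I_0$ to discard small MIs (lower bound for an interaction).

Shuffle genes across microarray profiles and evaluate the MIs for the resulting, seemingly independent genes; choose $I_0$ based on what fraction of these MIs falls below the threshold.

Data processing inequality (DPI): if genes g1 and g2 interact only through g3, then

$$I(g_1, g_2) \le \min[I(g_1, g_3),\, I(g_2, g_3)]$$

ARACNE starts with a network in which every pair of genes with $I_{ij} > I_0$ is joined by an edge. It then looks at every gene triplet whose three MIs all exceed $I_0$ and removes the edge with the smallest MI.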
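The thresholding and triplet-pruning step can be sketched in Python as below. This is a simplified illustration, not ARACNE's code: the function name `aracne_prune`, the dictionary-based edge representation, and in particular the way the tolerance enters (the weakest edge of a triangle is removed only if it is smaller than (1 - tolerance) times the next-weakest one) are assumptions made for this example. Edges are marked for removal against the full thresholded graph and removed at the end, so the result does not depend on the order in which triplets are visited.

```python
import itertools


def aracne_prune(mi, threshold=0.0, tolerance=0.0):
    """Threshold MI values, then drop the weakest edge of every triangle.

    mi: dict mapping frozenset({i, j}) -> mutual information.
    Returns a dict with the surviving edges.
    """
    edges = {e: v for e, v in mi.items() if v > threshold}
    nodes = sorted({n for e in edges for n in e})
    doomed = set()
    for a, b, c in itertools.combinations(nodes, 3):
        tri = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if not all(e in edges for e in tri):
            continue                      # not a fully connected triplet
        tri.sort(key=lambda e: edges[e])
        weakest, runner_up = tri[0], tri[1]
        # Assumed tolerance rule: 0 = strict DPI, 1 = never remove anything.
        if edges[weakest] < (1.0 - tolerance) * edges[runner_up]:
            doomed.add(weakest)
    return {e: v for e, v in edges.items() if e not in doomed}


# Toy MI values (assumption): a cascade 0 - 1 - 2 plus a weak spurious edge.
toy_mi = {
    frozenset((0, 1)): 0.8,
    frozenset((1, 2)): 0.7,
    frozenset((0, 2)): 0.3,    # indirect: should be removed by the DPI
    frozenset((2, 3)): 0.01,   # below threshold: should be discarded
}
kept = aracne_prune(toy_mi, threshold=0.05, tolerance=0.0)
kept_loose = aracne_prune(toy_mi, threshold=0.05, tolerance=1.0)
```

With zero tolerance only the two direct edges of the cascade survive; with tolerance 1 the triangle is left intact, which matches the idea that a tolerance of 1 switches the pruning off.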

(algorithm details, cont’d) Establishing the network

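ARACNE’s algorithm complexity: $O(N^3 + N^2 M^2)$

N = number of genes, M = number of samples

The $N^3$ term comes from the DPI analysis; the $N^2 M^2$ term comes from MI estimation ($N^2$ = order of the number of pairwise interactions).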

Perfect network reconstruction theorems

Thm 1: If MIs are estimated with no errors and the true underlying interaction network is a tree with only pairwise interactions, then ARACNE will reconstruct it.

Thm 2: If the Chow-Liu maximum-MI tree is a subnetwork of ARACNE’s network, then this is the true network.

Thm 3: “ARACNE will reconstruct tree-network topologies exactly.”

Comparative study results

Reconstruction of a class of synthetic transcriptional networks by Mendes et al. (cf. [1]) and of a human B lymphocyte genetic network from gene expression profile data.

Performance of ARACNE is compared against Bayesian Networks (using the LibB package) and Relevance Networks (similar to ARACNE, but with a less accurate MI estimation procedure and a less developed method of assigning statistical significance).

(results) Synthetic networks

100 genes, 200 interactions organized in two types of networks

1. Erdős-Rényi: each interaction between a pair of vertices is equally likely

2. Scale-free topology: distribution of vertex connections obeys a power law

(results) Performance metrics

A pairwise gene interaction is

a “(true) positive” if the two genes are directly linked by a regulatory interaction;

a “(true) negative” if their interaction is not direct.

Precision = fraction of true interactions among all inferred ones (expected success rate in experimental validation of predicted interactions).

Recall = fraction of all true interactions that are correctly inferred.

Performance is assessed via Precision-Recall curves (PRCs).
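For a single inferred network, precision and recall can be computed from edge sets as in the Python sketch below. The function name and the toy networks are hypothetical; edges are treated as undirected, in line with the fact that the inferred networks carry no edge orientation. Sweeping a cut-off such as the MI threshold and recomputing both numbers at each setting would trace out one precision-recall curve.

```python
def precision_recall(inferred, truth):
    """Precision and recall of an inferred undirected edge set."""
    inferred = {frozenset(e) for e in inferred}
    truth = {frozenset(e) for e in truth}
    true_positives = len(inferred & truth)
    precision = true_positives / len(inferred) if inferred else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy networks (assumption): four true interactions, three predictions.
true_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
predicted_edges = [("a", "b"), ("c", "b"), ("a", "c")]
precision, recall = precision_recall(predicted_edges, true_edges)
```

Here two of the three predictions are real interactions (precision 2/3) and two of the four real interactions were found (recall 1/2).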

(results cont’d) PRCs for synthetic data

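[Figure: precision-recall curves for the two synthetic network types, (1) Erdős-Rényi and (2) scale-free.]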

ARACNE’s performance above 40% for both models

(results, cont’d) Quantitative results on synthetic data

ARACNE recovers far more true connections and predicts far fewer false ones.

(results cont’d) Results on Human B cells

Assembled expression profile data set of ~340 B lymphocytes from normal, tumor-related and experimentally manipulated populations.

Data set was deconvoluted by ARACNE to generate B-cell specific regulatory network of ~129,000 interactions.

Validation of the network’s quality was done by comparing inferred interactions

with those identified through biochemical methods.

See [3].

Conclusions and Discussions
  • Algorithm is robust enough for its application in other network reconstruction problems in biology and the social and engineering fields.
  • Pairwise interaction model ⇒ higher-order potential interactions will not be accounted for (ARACNE’s algorithm will open 3-gene loops).
  • A two-gene interaction will be detected iff there are no alternate paths.
  • To keep three-gene loops, relax the edge-removal criterion by introducing a tolerance parameter (the DPI tolerance, option -e).
  • ARACNE’s performance deteriorates as local (true) network topology deviates from a tree (tight loops may be a problem).
  • ARACNE achieved high precision and substantial recall even for few data points when compared to BN and RN (synthetic data).
  • ARACNE cannot predict the orientation of the edges of the networks.
  • The algorithm is suited for more complex (mammalian) networks.
Bibliography
  • [1] P. Mendes, W. Sha, K. Ye. Artificial gene networks for objective comparison of analysis algorithms. Bioinformatics 2003, 19 Suppl 2: II122-II129.
  • [2] I. Nemenman. Information theory, multivariate dependence and genetic network inference. Technical report: arXiv:q-bio/0406015; 2004.
  • [3] K. Basso, A.A. Margolin, G. Stolovitzky, U. Klein, R. Dalla-Favera, A. Califano. Reverse engineering of regulatory networks in human B cells. Nature Genetics 2005, 37(4):382-390.
Main web site
  • Important documentation and relevant publications, application download and support.

AMDeC Bioinformatics Core Facility at the Columbia Genome Center

AMDeC (Academic Medicine Development Company)

http://amdec-bioinfo.cu-genome.org/html/ARACNE.htm