
Time dependent gene analysis




  1. Time dependent gene analysis OVERVIEW Omer Berkman

  2. Contents • Biological background • Using Gene-arrays to decipher gene-regulatory interactions • Applications…

  3. Hybridization • A DNA double strand forms by “gluing” together complementary single strands. • Complementarity rule: A-T/U, G-C
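The complementarity rule on this slide can be illustrated with a minimal sketch. The function names `reverse_complement` and `hybridizes` are hypothetical, and the sketch assumes DNA strands and perfect matches only (real hybridization tolerates some mismatches).

```python
# Watson-Crick pairing for DNA: A-T and G-C (in RNA, U takes the place of T).
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the strand that pairs with `strand`, read in the opposite direction."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(strand.upper()))


def hybridizes(probe: str, target: str) -> bool:
    """True if `target` is the exact reverse complement of `probe` (assumption: perfect match only)."""
    return reverse_complement(probe) == target.upper()
```

For example, `hybridizes("ATGC", "GCAT")` is true, while a strand does not hybridize to an identical copy of itself unless it happens to be its own reverse complement.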

  4. Protein production

  5. From DNA to Protein • Gene → (transcription) → mRNA → (translation) → Protein • Cells express different subsets of their genes in different tissues and under different conditions.

  6. Functional genomics • The complete sequences of many microbial genomes are already known: the inventory of the building blocks of life has been collected. • The next stage is ‘‘re-assembling the pieces’’: • Defining the role of each gene in these genomes. • Understanding how the genome functions as a whole in the complex natural history of a living organism. • Knowing when and where a gene is expressed often provides a strong clue as to its biological role.

  7. Transcriptional process • This process is highly regulated. • One of the most important ways in which the cell regulates gene expression is by using a feedback loop. • Some of the proteins produced are transcription factors. • These proteins regulate the expression of other genes (and possibly their own expression) by either initiating or repressing transcription.

  8. Transcriptional networks • One gene can be a regulator of another gene. • Transcriptional networks are the biochemical networks responsible for regulating the expression of genes in cells. • In these networks, the nodes represent transcription factors (genes) and the edges represent direct transcriptional regulation. [Shen-Orr 2002, Thieffry 1998]

  9. Transcriptional networks example

  10. Gene-arrays for mRNA analysis • Differences in cell type or state are correlated with changes in the mRNA levels of its genes. • The only specific reagent required to measure the abundance of the mRNA for a specific gene is a cDNA sequence. • DNA microarrays provide a practical and economical tool for studying gene expression on a very large scale.

  11. Affymetrix model for DNA chip • We can now infer which genes were expressed and at what intensity. • Because of some biological processes, the correct sequence will not always hybridize to the oligo.

  12. Gene Arrays / DNA chips • From “one gene in one experiment” to “massively parallel biological data acquisition”. • Simultaneously analyzing the expression levels of large numbers of genes provides the opportunity to study the activity of whole genomes. • Large-scale gene expression analysis reveals the behavior of co-regulated gene networks.

  13. Raw Data • The curse of dimensionality: thousands of genes versus only a few observations

  14. Static versus dynamic We distinguish between static experiments and time series experiments: • Static: • A snapshot is measured in different samples. • Data are assumed to be independent and identically distributed. • Dynamic: • A temporal process is measured. • Data have strong autocorrelation between successive points.
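The contrast between independent samples and autocorrelated time series can be made concrete with a minimal sketch of the lag-1 sample autocorrelation. The function name `lag1_autocorrelation` is hypothetical, and the estimator shown is one common convention rather than anything prescribed by the slides.

```python
def lag1_autocorrelation(series):
    """Sample autocorrelation between each point and its successor."""
    n = len(series)
    mean = sum(series) / n
    variance_sum = sum((x - mean) ** 2 for x in series)
    if variance_sum == 0:
        return 0.0  # a constant series carries no autocorrelation signal
    # Covariance between successive points, normalized by the total variance.
    covariance_sum = sum(
        (series[t] - mean) * (series[t + 1] - mean) for t in range(n - 1)
    )
    return covariance_sum / variance_sum
```

A smoothly rising expression profile gives a positive value, while a profile that flips sign at every step gives a negative one; values near zero are what one would expect from independent samples.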

  15. Temporal observations • It’s possible to produce time-dependent measurements, termed expression matrices. • These expression matrices are the result of the underlying regulatory network. • Reverse engineering seeks to extract information from time-series measurements in order to identify regulatory interactions in these genetic networks.

  16. Complications • The curse of dimensionality • Extremely noisy observations • Expensive experiments • Stochastic nature • Population averaged • Feasible time scale • Partial information We are facing a hard problem…

  17. 1. The curse of dimensionality (Bellman, 1961) • The number of genes typically far exceeds the number of time points for which data are available, making the problem ill-posed. • “Traditional statistics” won’t help here: the number of samples, compared with the number of genes, does not provide enough information to construct a fully detailed model with high statistical significance. • New statistical methods and approaches were developed (bootstrap, interpolation, clustering, FDR…)

  18. 2. Stochastic nature • Deterministic versus stochastic • Biology has no deterministic processes…

  19. 3. Population averaged • Measurements are obtained as population-averaged data. • The measurement itself kills or alters the organism. • This masks the real regulatory interactions (quantization problem).

  20. 4. Feasible time scale • There is an empirical limit on the number of time points: • The average speed of the biological process determines the number of informative points. • The error of the measurement method has to be smaller than the expression-level difference. • Figure labels: missing regulatory interactions; cost and error.

  21. 5. Partial information • Biological systems are robust, adaptable, and redundant. • Genes are not the only actors in the game: transcription factors can be of many kinds. • The regulatory interactions between genes are not deterministic at the mRNA level: a gene has a few independently regulated derivatives. • mRNA expression data alone give only a partial picture that does not reflect key events such as translation and protein (in)activation.

  22. Fundamental question • How much information is needed to map the gene-regulatory interactions of a biological system? • Hertz’s estimate [1998] of the number of gene states that must be measured for successful reverse engineering: $P = K \log(N/K)$, where N is the size of the network (e.g. the number of genes) and K is the average number of interactions per gene.
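The estimate on this slide is easy to evaluate numerically. The sketch below assumes a natural logarithm (the slide does not state the base) and treats the result as a scaling estimate rather than an exact count; `hertz_estimate` is a hypothetical name.

```python
import math


def hertz_estimate(n_genes: int, k_interactions: float) -> float:
    """Evaluate P = K * log(N / K); the natural log is an assumption."""
    if k_interactions <= 0 or n_genes < k_interactions:
        raise ValueError("expected 0 < K <= N")
    return k_interactions * math.log(n_genes / k_interactions)
```

With illustrative values such as N = 6000 genes and K = 3 interactions per gene, the estimate grows only logarithmically with network size, which is why sparse networks are considered tractable at all.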

  23. Application 1 [DeRisi 1997] Exploring the metabolic and genetic control of gene expression • Investigation of gene expression accompanying the metabolic shift from fermentation to respiration in yeast. • Identify genes whose expression was affected by deletion of TUP1 or over-expression of YAP1.

  24. Yeast genome micro-array Genes induced or repressed appear in this image as red and green spots, respectively.

  25. Temporal samples

  26. Analysis • Stable gene expression during exponential growth. • A marked change was seen as glucose was progressively depleted from the growth media. - mRNA levels for 710 genes were induced. - mRNA levels for 1030 genes declined. • The expression patterns observed for previously characterized genes showed concordance with previously published results. • About half of these differentially expressed genes have no apparent homology to any gene whose function is known. This provides the first small clue to their possible roles.

  27. Coordinated regulation of functionally related genes Genes can be grouped on the basis of the similarities in their expression patterns
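One simple way to group genes by the similarity of their expression patterns is to compare temporal profiles with the Pearson correlation. The sketch below is a stand-in, not the procedure used in the cited study: the greedy grouping rule, the `min_corr` cutoff, and the function names are all assumptions made for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a flat profile has no defined correlation; treat as unrelated
    return cov / (norm_a * norm_b)


def group_by_profile(profiles, min_corr=0.9):
    """Greedy grouping: a gene joins the first group whose seed profile it matches."""
    groups = []  # list of (seed gene name, member names)
    for name, profile in profiles.items():
        for seed, members in groups:
            if pearson(profiles[seed], profile) >= min_corr:
                members.append(name)
                break
        else:
            groups.append((name, [name]))  # no match: start a new group
    return [members for _, members in groups]
```

Because correlation ignores scale, two genes that rise together end up in the same group even if one is expressed at twice the level of the other.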

  28. Distinct temporal patterns

  29. Metabolic Diagram Red boxes identify genes whose expression increases in the diauxic shift. Green boxes identify genes whose expression diminishes in the diauxic shift.

  30. Defining the contributions of individual regulatory genes • Using a DNA micro-array to identify genes whose expression is affected by mutations in each putative regulatory gene. • Performing: - Deletion of the transcriptional repressor TUP1. - Overexpression of the transcriptional activator YAP1.

  31. Deleting the TUP1 gene • Wild-type yeast cells and cells bearing a deletion of the TUP1 gene were grown. • mRNA was isolated from the two populations and used to prepare cDNA labeled green and red. • The labeled probes were mixed and simultaneously hybridized to the micro-array. • Red spots on the array represent genes that were induced in the TUP1-deletion strain, and thus presumably repressed by TUP1.

  32. Overexpressing the YAP1 gene • Complementary DNA from the control and YAP1 over-expressing strains, labeled with Cy3 and Cy5, respectively, was prepared from mRNA isolated from the two strains and hybridized to the micro-array. • Red spots on the array represent genes that were induced in the strain over-expressing YAP1.
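The red/green reading of a two-color array can be sketched as a log-ratio test on the two channel intensities. The two-fold cutoff, the function name `classify_spot`, and the assumption that intensities are positive and background-corrected are illustrative choices, not details taken from the slides.

```python
import math


def classify_spot(cy5_red: float, cy3_green: float, fold: float = 2.0) -> str:
    """Label a spot from its Cy5 (experiment) and Cy3 (control) intensities."""
    log_ratio = math.log2(cy5_red / cy3_green)  # > 0 means more signal in the red channel
    cutoff = math.log2(fold)                    # assumed threshold, symmetric in log space
    if log_ratio >= cutoff:
        return "induced"    # appears as a red spot
    if log_ratio <= -cutoff:
        return "repressed"  # appears as a green spot
    return "unchanged"
```

Working in log space makes a two-fold increase and a two-fold decrease equally far from zero, which is why log ratios are the usual scale for this kind of comparison.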

  33. Characterization of regulatory pathways and networks • Using a micro-array to characterize the transcriptional consequences of mutations provides a simple and powerful approach. • This strategy also has an important practical application in drug screening. • However, one should keep in mind that transcriptional regulation can be complicated.

  34. Application 1 summary • DNA micro-arrays provide a simple and economical way to explore gene expression patterns on a genomic scale. • “The greatest challenge now is to develop efficient methods for organizing, interpreting, and extracting insights from the large volumes of data these experiments provide.” • Technical advances have made array experiments fairly easy to do, but tools for analyzing the data they produce have lagged behind.

  35. Application 2 [Friedman 2000] Using Bayesian Networks to Analyze Expression Data • Probabilistic approach. • Bayesian network as a model for genetic networks.

  36. Bayesian networks – definitions • A representation of a joint probability distribution. This representation consists of two components: • G is a directed acyclic graph (DAG) whose vertices correspond to the random variables. • θ describes a conditional distribution for each variable, given its parents in G.
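The two components G and θ determine the joint distribution as a product of one conditional probability per variable. The sketch below uses a hypothetical two-gene network with binary states and made-up probabilities, purely to show the factorization.

```python
# G: hypothetical two-gene network A -> B (A is the only parent of B).
parents = {"A": (), "B": ("A",)}

# theta: P(variable = 1 | parent values); the numbers are made up for illustration.
theta = {
    "A": {(): 0.3},
    "B": {(0,): 0.1, (1,): 0.8},
}


def joint_probability(assignment):
    """P(assignment) as the product over variables of P(x | parents(x))."""
    probability = 1.0
    for variable, variable_parents in parents.items():
        parent_values = tuple(assignment[p] for p in variable_parents)
        p_one = theta[variable][parent_values]
        probability *= p_one if assignment[variable] == 1 else 1.0 - p_one
    return probability
```

Here `joint_probability({"A": 1, "B": 1})` multiplies P(A=1) by P(B=1 | A=1), and the probabilities of all four assignments sum to one.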

  37. Simple example

  38. Bayesian networks – properties • Encodes the Markov assumption : Each variable is independent of its non-descendants, given its parents in the graph • A graph-based model that captures properties of conditional independence between variables. • Useful for describing processes composed of locally interacting components. • Provide models of causal influence.

  39. Equivalence classes • Let Ind(G) be the set of independence statements (of the form X is independent of Y given Z). • More than one graph can imply exactly the same set of independencies. • Two graphs G’ and G’’ are equivalent if Ind(G’) = Ind(G’’), that is, both graphs are alternative ways of describing the same set of independencies. • Equivalent graphs have the same underlying undirected graph but might disagree on the direction of some of the arcs (we switch to PDAG).

  40. Learning Bayesian Networks • Given a training set D of independent instances of X, find a network B = {G, θ} that best matches D. • Several scoring functions are available. • Finding the structure G that maximizes the score is known to be NP-hard. • For heuristic search we need: • A decomposable score function, for example S(G:D) = log P(D|G) + log P(G) + C. • An iterative search method, for example greedy or stochastic hill climbing, simulated annealing…
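Greedy hill climbing with a decomposable score can be sketched as follows. This is a simplified illustration, not the implementation used in the cited work: the caller supplies a hypothetical `local_score(node, parents)` function, only single-edge additions and removals are tried (no reversals), and the `max_parents` and `min_gain` parameters are assumptions.

```python
from itertools import permutations


def creates_cycle(parents, child, new_parent):
    """True if adding new_parent -> child would close a directed cycle."""
    # A cycle appears exactly when child is already an ancestor of new_parent.
    stack, seen = [new_parent], set()
    while stack:
        node = stack.pop()
        if node == child:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False


def hill_climb(nodes, local_score, max_parents=2, min_gain=1e-9):
    """Repeatedly apply the single-edge change that improves the score the most."""
    parents = {v: frozenset() for v in nodes}  # start from the empty graph
    while True:
        best_gain, best_move = min_gain, None
        for src, dst in permutations(nodes, 2):
            current = parents[dst]
            if src in current:
                candidate = current - {src}        # try removing src -> dst
            elif len(current) < max_parents and not creates_cycle(parents, dst, src):
                candidate = current | {src}        # try adding src -> dst
            else:
                continue
            # Decomposability: only the family of dst changes, so only it is rescored.
            gain = local_score(dst, candidate) - local_score(dst, current)
            if gain > best_gain:
                best_gain, best_move = gain, (dst, candidate)
        if best_move is None:
            return parents                         # local optimum reached
        parents[best_move[0]] = best_move[1]
```

Decomposability is what keeps each step cheap: changing one edge alters the parent set of a single node, so the score difference depends on that node's family alone. Like any greedy search, the result is a local optimum, which is why stochastic variants and simulated annealing are listed as alternatives.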

  41. Biological (causal) interpretation • Edges: the parents of a variable are its immediate causes (the parent of a node is a transcription factor for this gene). • A causal network models the effects of interventions: If X causes Y, then manipulating the value of X affects the value of Y, but not the other way around (If we knockout gene X then this will affect the expression of gene Y, but a knockout of gene Y has no effect on the expression of gene X).

  42. Analyzing Expression Data • Random variables denote the expression levels of individual genes. • In addition, we can include random variables that denote other attributes that affect the system (experimental conditions, temporal indicators…). • We want to learn a network from the available data and use it to answer questions about the system.

  43. Find high-scoring networks • The data are not informative enough to determine which single model is the right one. • We therefore focus on features that are common to most of the possible models: • Markov relation - indicates that two genes are related in some joint biological interaction or process (there is either an edge between them, or both are parents of another variable (Pearl 1988)). • Order relation - X is an ancestor of Y in all the networks of a given equivalence class (the given PDAG contains a directed path from X to Y).
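Both features can be read off a graph stored as a map from each node to its parents. The sketch below checks them on a single DAG, which is a simplification: the order relation on the slide is defined over a whole equivalence class (PDAG). The function names and the dictionary representation are assumptions.

```python
def markov_related(parents, x, y):
    """True if x and y share an edge or are both parents of a common child."""
    if x in parents[y] or y in parents[x]:
        return True
    return any(x in family and y in family for family in parents.values())


def is_ancestor(parents, x, y):
    """True if a directed path x -> ... -> y exists in this DAG."""
    stack, seen = list(parents[y]), set()
    while stack:
        node = stack.pop()
        if node == x:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])  # keep walking up through y's ancestors
    return False
```

In a graph with edges A → B, B → C and D → C, the pair (B, D) is Markov related because both are parents of C, and A is an ancestor of C through B.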

  44. How can we estimate a measure of confidence in the features? • Bootstrap method (Efron & Tibshirani 1993) • A method to enlarge our data set by generating “perturbed” versions of our original data set. In this way we collect many networks, all of which are fairly reasonable models of the data. • For each feature f of interest, calculate $\mathrm{conf}(f) = \frac{1}{m}\sum_{i=1}^{m} f(G_i)$, where $G_i$ is the network learned from the i-th perturbed data set, m is the number of perturbed data sets, and f(G) is 1 if f is a feature in G, and 0 otherwise.
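The bootstrap confidence described on this slide is the fraction of networks, learned from resampled data, in which a feature appears. In the sketch below, `learn_network` is a hypothetical caller-supplied learner, `features` maps a label to an indicator function on a learned network, and the default number of resamples and the fixed seed are assumptions.

```python
import random


def bootstrap_confidence(data, learn_network, features, m=200, seed=0):
    """Fraction of bootstrap networks in which each feature is present."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    counts = {name: 0 for name in features}
    for _ in range(m):
        # Perturbed data set: sample with replacement, same size as the original.
        resample = [rng.choice(data) for _ in data]
        network = learn_network(resample)
        for name, indicator in features.items():
            if indicator(network):
                counts[name] += 1
    return {name: count / m for name, count in counts.items()}
```

A feature that appears in nearly every resampled network gets a confidence close to 1, while one that depends on a handful of samples is found only occasionally and scores low.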

  45. Local Probability Models In order to specify a Bayesian network model, we still need to choose the type of local probability model we learn. In the current work, we consider two approaches: • Multinomial model (discretizing expression levels to -1, 0, 1). • Linear Gaussian model.
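The three-level discretization for the multinomial model can be sketched as a threshold on the log expression ratio. The threshold value of 0.5 and the function name `discretize` are assumptions for illustration; the slides do not state the cutoff.

```python
def discretize(log_ratio: float, threshold: float = 0.5) -> int:
    """Map a log expression ratio to -1 (under-expressed), 0 (baseline) or 1 (over-expressed)."""
    if log_ratio > threshold:
        return 1
    if log_ratio < -threshold:
        return -1
    return 0
```

Discretization discards magnitude but makes no assumption about the shape of the dependency, whereas the linear Gaussian model keeps the magnitudes at the price of only capturing linear relationships.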

  46. Robustness analysis

  47. Multinomial versus Gaussian The two methods highlight different types of connections between genes.

  48. Biological Analysis • Order relations reveal the existence of dominant genes. Out of all 800 genes, only a few seem to dominate the order (i.e., appear before many genes). • The top Markov relations reveal genes that are mostly functionally related. • Nice presentation: http://www.cs.huji.ac.il/~nirf/GeneExpression/top800/

  49. An example of the graphical display of Markov features This suits biological knowledge!

  50. Application 2 summary • Using Bayesian networks to model genetic networks: • The problem involves thousands of genes, while current data sets contain a few dozen samples. This raises problems of computational complexity and of the statistical significance of the results. • Genetic regulation networks are sparse (a gene is assumed to have no more than a few dozen genes directly affecting its transcription). Bayesian networks are especially suited for learning in such sparse domains. • No (biological) prior knowledge was used. • This theory can provide tools for experimental design.
