
Modeling and Understanding Stress Response Mechanisms with Expresso

Ruth G. Alscher, Lenwood S. Heath, Naren Ramakrishnan. Virginia Tech, Blacksburg, VA 24061. ORNL Workshop on Genomics, Duke University, May 1, 2001.



  1. Modeling and Understanding Stress Response Mechanisms with Expresso • Ruth G. Alscher, Lenwood S. Heath, Naren Ramakrishnan • Virginia Tech, Blacksburg, VA 24061 • ORNL Workshop on Genomics, Duke University, May 1, 2001

  2. Who’s Who • Plant Biology, Virginia Tech: Ruth Alscher (Plant Stress); Boris Chevone (Plant Stress); Dawei Chen (Molecular Biology, Bioinformatics) • Computer Science, Virginia Tech: Lenwood Heath (CS, Algorithms); Naren Ramakrishnan (CS, Data Mining, Problem Solving Environments); Craig Struble and Vincent Jouenne (CS, Image Analysis) • Forest Biotechnology, North Carolina State Univ.: Ron Sederoff; Ross Whetten; Len van Zyl; Y-H. Sun • Statistics, Virginia Tech: Ina Hoeschele (DS, Statistical Genetics); Keying Ye (STAT, Bayesian Statistics)

  3. People • Ron Sederoff • Craig Struble • Lenny Heath • Ruth Alscher • Keying Ye • Ross Whetten • Vincent Jouenne • Boris Chevone • Len van Zyl • Y-H. Sun • Dawei Chen • Naren Ramakrishnan

  4. Overview • Plant responses to environmental stress • Stress on a chip • Summary of results obtained • Expresso • Managing expression experiments • Analyzing expression data • Reaching conclusions • Where we go from here • Modeling experiments • Modeling pathways

  5. Plant-Environment Interactions • Several defense systems that respond to environmental stress are known. • Their relative importance is not known. • Mechanistic details are not known. Redox sensing may be involved.

  6. Scenarios for Effect of Abiotic Stress on Plant Gene Expression

  7. The 1999 Experiment: A Measure of Long Term Adaptation to Drought Stress • Loblolly pine seedlings (two unrelated genotypes “C” and “D”) were subjected to mild or severe drought stress for four (mild) or three (severe) cycles. • Mild stress: needles dried down to –10 bars; little effect on growth, new flushes as in control trees. • Severe stress: needles dried down to –17 bars; growth retardation, fewer new flushes compared to controls. • Harvest RNA at the end of growing season, determine patterns of gene expression on DNA microarrays. • With algorithms incorporated into Expresso, identify genes and groups of genes involved in stress responses.

  8. Hypotheses • There is a group of genes whose expression confers resistance to drought stress. • Expression of this group of genes is lower under severe than under mild stress. • Individual members of gene families show distinct responses to drought stress.

  9. Selection of cDNAs for Arrays • 384 ESTs (xylem, shoot tip cDNAs of loblolly) were chosen on the basis of function and grouped into categories. • Major emphasis was on processes known to be stress responsive. • In cases where more than one EST had similar BLAST hits, all ESTs were used.

  10. [Figure: Categories within Protective and Protected Processes. Labels recovered from the diagram, in extraction order: Gene Expression; Signal Transduction; Protease-associated; ROS and Stress; Nucleus; Environmental Change; Protective Processes; Cell Wall Related; Trafficking; Phenylpropanoid Pathway; Secretion; Cells; Cytoskeleton; Development; Tissues; Protected Processes; Chloroplast Associated; Metabolism; Carbon Metabolism; Respiration and Nucleic Acids; Mitochondrion; Plant Growth Regulation.]

  11. A Note about Categories • Categories are not mutually exclusive; a gene may be assigned to more than one category. For example, heat shock proteins have been grouped under these different categories and subcategories: • Abiotic stress – heat • Gene expression – post-translational processing – chaperones • Abiotic stress – chaperones

  12. [Figure: Categories within “Protective Processes”. Labels recovered from the diagram, in extraction order: Drought (Dehydrins, Aquaporins); Heat (Heat shock proteins, Chaperones); Abiotic; Non-Plant; Biotic; Cytosolic ascorbate peroxidase; Xenobiotics (GSTs); “Isoflavone Reductases”; Chaperones; Antioxidant Processes; superoxide dismutase-Fe; NADPH/Ascorbate/Glutathione Scavenging Pathway; superoxide dismutase-Cu-Zn; Sucrose Metabolism; Stress; glutathione reductase; Cellulose; Protective Processes; Cell Wall Related; Arabinogalactan proteins; Extensins and proline rich proteins; Phenylpropanoid Pathway; Hemicellulose; Pectins; Xylose; 4-coumarate-CoA ligases; Other Cell Wall Proteins; Lignin Biosynthesis; CCoAOMTs; isoflavone reductases; cinnamyl-alcohol dehydrogenase; phenylalanine ammonia-lyases; S-adenosylmethionine decarboxylases; glycine hydroxymethyltransferases.]

  13. Quality Control • Positive: LP-3, a gene known to respond positively to drought stress in loblolly pine, was included. • LP-3 was positive in the moist versus mild comparison, and unchanged in the moist versus severe comparison. • Negative: Four clones of human genes used as negative controls in the Arabidopsis Functional Genomics project were included. The clones did not respond.

  14. [Figure: Categories that contained positives in genotypes C and D (Control versus Mild). Data from two slides (4 arrays) for C and two slides (4 arrays) for D were collected. Labels recovered from the diagram, in extraction order: Drought (Dehydrins, Aquaporins); Heat (Heat shock proteins); Abiotic; Non-Plant; Biotic; Xenobiotics (GSTs); Cytosolic ascorbate peroxidase; “Isoflavone Reductases”; Chaperones; Antioxidant Processes; superoxide dismutase-Fe; NADPH/Ascorbate/Glutathione Scavenging Pathway; superoxide dismutase-Cu-Zn; Sucrose Metabolism; ROS and Stress; glutathione reductase; Cellulose; Protective Processes; Cell Wall Related; Extensins, Arabinogalactan, and Proline Rich Proteins; Phenylpropanoid Pathway; Hemicellulose; Pectins; Xylose; 4-coumarate-CoA ligase; Other Cell Wall Proteins; Lignin Biosynthesis; CCoAOMT; isoflavone reductases; cinnamyl-alcohol dehydrogenase; phenylalanine ammonia-lyase; S-adenosylmethionine decarboxylase; glycine hydroxymethyltransferase.]

  15. Hypotheses versus Results • Among the genes responding to mild stress, there exists a population of genes whose expression confers resistance. • Genes in 69 categories responded positively to mild stress in Genotypes C and D (the positive response was not observed in the severe stress condition in Genotype D). • There is evidence for a response to drought among genes associated with other stresses. • Isoflavone reductase homologs and GSTs responded positively to mild drought stress. • These categories are previously documented to respond to biotic stress and xenobiotics, respectively.

  16. Relationships among HSP homologs In control versus mild stress, HSP 100, 70, and 23 responded in C and D; HSP 80s did not respond in either C or D.

  17. Candidate Categories • Include • Aquaporins • Dehydrins • Heat shock proteins/chaperones • Exclude • Isoflavone reductases

  18. Experimental Design: Computational and Statistical Issues • Numerous sources of error in microarray experiments: identify, control, and analyze • Clones on a microarray need to be replicated and randomly placed (Lee et al., PNAS 97, August 29, 2000, 9834-9) • Differing results among replicates can indicate sources of error; consistency gives confidence

  19. Expresso: A Problem Solving Environment (PSE) for Microarray Experiment Design and Analysis • Integration of design and procedures • Integration of image analysis tools and statistical analysis (via Perl scripts) • Connections to web database and sequence alignment tools • The software Aleph was used for inductive logic programming (ILP).

  20. Expresso: A Microarray Experiment Management System

  21. Design of Microarrays I • Selected 384 archived ESTs • Organized into 4 microtitre source plates after PCR • Pipetted into 8 sets of 4 randomized microtitre plates; each set a different arrangement of the 384 ESTs • Printed type A microarrays from first 4 sets (16 plates); printed type B microarrays from second 4 sets • Each array type has 4 replicates of each EST, randomly placed

  22. Design of Microarrays II • Each slide contained 2 identical arrays (of type A or B), 4 replicates of each EST per array • Each slide, therefore, has a total of 8 replicates of each EST • A second slide also contained 2 arrays of the other type, 4 replicates of each EST • Total of 16 replicates of each EST for a 2 slide set
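The replicate arithmetic on the two design slides can be checked with a short sketch. This is an illustration only, not the layout software used for the experiment: the function name `randomized_plate_set`, the EST labels, and the seeds are hypothetical, and the sketch assumes each set simply shuffles the 384 ESTs across four 96-well plates.

```python
import random

def randomized_plate_set(est_ids, n_plates=4, seed=None):
    """Shuffle the ESTs and split them evenly across n_plates plates.

    Hypothetical helper: the slides only say each set is a different
    random arrangement of the 384 ESTs.
    """
    rng = random.Random(seed)
    shuffled = list(est_ids)
    rng.shuffle(shuffled)
    per_plate = len(shuffled) // n_plates
    return [shuffled[i * per_plate:(i + 1) * per_plate] for i in range(n_plates)]

ests = [f"EST{i:03d}" for i in range(384)]  # placeholder identifiers

# 8 sets of 4 plates; the first 4 sets feed type A arrays, the last 4 type B.
plate_sets = [randomized_plate_set(ests, seed=s) for s in range(8)]
type_a_sets, type_b_sets = plate_sets[:4], plate_sets[4:]

replicates_per_array = len(type_a_sets)           # one copy of each EST per set
arrays_per_slide = 2                              # two identical arrays per slide
replicates_per_slide = replicates_per_array * arrays_per_slide
replicates_per_two_slide_set = replicates_per_slide * 2  # one type A + one type B slide

print(replicates_per_array, replicates_per_slide, replicates_per_two_slide_set)
```

Each plate set contains every EST exactly once, which is why four sets per array type give four replicates per array and sixteen per two-slide set.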

  23. Design of Microarrays II

  24. Spot and Clone Analysis • Image Analysis: gridding, spot identification, intensity and background calculation, normalization • Statistics: fold or ratio estimation, combining replicates • Higher-level Analysis: a slew of clustering methods, inductive logic programming (ILP)

  25. Analysis of Expression Data • Microarray Suite: manual gridding; extract intensities for each spot; compute ratios; compute calibrated ratios • Spot Statistics: • Each calibrated ratio is the uncalibrated ratio divided by the mean of all the uncalibrated ratios, so the mean of the calibrated ratios is 1.0 • Our tools use the logarithm of each calibrated ratio • Positive: expression increase • Negative: expression decrease • Zero: no change in expression
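A minimal sketch of the calibration step described on this slide, assuming the input is a plain list of per-spot intensity ratios. The function name `log_calibrated_ratios` is hypothetical, and the base-2 logarithm is an assumption; the slide does not say which base the tools use.

```python
import math

def log_calibrated_ratios(ratios):
    """Divide each ratio by the mean of all ratios, then take log2.

    Returns (calibrated, log_calibrated). Base 2 is an assumption.
    """
    mean_ratio = sum(ratios) / len(ratios)
    calibrated = [r / mean_ratio for r in ratios]
    return calibrated, [math.log2(c) for c in calibrated]

# Toy ratios, not experimental data.
calibrated, logs = log_calibrated_ratios([1.0, 2.0, 3.0, 2.0])
print(calibrated)  # mean of these values is 1.0 by construction
print(logs)        # sign gives the direction of the expression change
```

A positive log value marks a spot above the array-wide mean ratio, a negative value a spot below it, and zero a spot exactly at the mean.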

  26. Analysis of Expression Data • The multiple (typically 16) log calibrated ratios for a replicated clone do NOT follow a normal distribution. • The distribution is spread relatively evenly over a large range. • Statistical analysis based on mean and standard deviation will be overly pessimistic in identifying clones that are up- or down-expressed. • Given the observed even spread of the log ratios, we assume that a clone whose expression does not differ between the two members of a probe pair will show a distribution centered at a mean log ratio of 0.0.

  27. Computational Methods (A Probabilistic Analysis) • In a zero-centered distribution, the probability that any particular log ratio is positive (or negative) is 0.5. • The number of positive (or negative) log ratios follows a binomial distribution with parameters 16 and 0.5. • The probability of 12 or more positive log ratios (or 12 or more negative log ratios), out of 16, for a clone whose expression was unaffected by drought stress is 0.0384064. • A clone with 12 or more positive log ratios is therefore called up-expressed at a confidence level of about 0.96.
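The binomial figure on this slide can be reproduced directly. The sketch below uses only the Python standard library; the function name `binomial_upper_tail` is hypothetical. It shows that 0.0384064 is the upper-tail probability of 12 or more positive log ratios out of 16, whereas the probability of exactly 12 is smaller.

```python
from math import comb

def binomial_upper_tail(k, n=16, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p_exactly_12 = comb(16, 12) * 0.5**16    # 1820 / 65536
p_12_or_more = binomial_upper_tail(12)   # 2517 / 65536

print(f"P(X = 12)  = {p_exactly_12:.7f}")
print(f"P(X >= 12) = {p_12_or_more:.7f}")
print(f"1 - P(X >= 12) = {1 - p_12_or_more:.4f}")
```

The quantity 1 − P(X ≥ 12) is about 0.96, which matches the confidence figure quoted on the slide.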

  28. Computational Methods (Alternate Assumptions) • Our more general assumption avoids the trap of having to classify the response of each SPOT; rather, we classify the response of an EST as one of • Up-regulated • Down-regulated • No clear change • Response CLASSIFICATION rather than QUANTIFICATION allows us to develop unified relationships among genes and among treatments. • Provides sufficient results for the use of inductive logic programming (ILP).
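A sketch of response classification for a single EST, assuming the 12-of-16 sign count from the probabilistic analysis is used as the cutoff. The function name `classify_est`, the `threshold` parameter, and the sample values are hypothetical; the slides do not state the exact decision rule implemented in Expresso.

```python
def classify_est(log_ratios, threshold=12):
    """Classify an EST from the signs of its replicate log ratios.

    threshold=12 (of 16 replicates) is an assumed cutoff taken from
    the binomial argument, not a documented Expresso setting.
    """
    positives = sum(1 for x in log_ratios if x > 0)
    negatives = sum(1 for x in log_ratios if x < 0)
    if positives >= threshold:
        return "up-regulated"
    if negatives >= threshold:
        return "down-regulated"
    return "no clear change"

# Toy replicate values, not experimental data.
up = [0.4] * 13 + [-0.2] * 3
down = [-0.6] * 12 + [0.1] * 4
mixed = [0.3] * 8 + [-0.3] * 8

print(classify_est(up), classify_est(down), classify_est(mixed))
```

Because only the sign of each log ratio is counted, the classification does not depend on a distributional assumption about the magnitudes.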

  29. Related Statistical Results • Chen et al. (J. Biomed. Optics 2, 1997, 364-374) • Assume a normal distribution and normalize ratios • No replicates • Estimate a confidence interval for ratios that applies to each spot • Lee et al. (PNAS 97, August 29, 2000, 9834-9) emphasize need for replication • Black and Doerge (PNAS, to appear) • Investigate distributional assumptions of log-normal and gamma distributions on intensities • Determine the number of replicates needed for a particular confidence level under each distribution • Assume that normalization and location-dependent noise have been eliminated.

  30. Clustering Techniques [Figure: taxonomy of clustering techniques. Labels recovered from the diagram: Clustering; Conceptual Clustering; Attribute-Value Methods; Similarity-Metric; SVMs; SOMs; Divisive (top-down); Agglomerative (bottom-up).]

  31. Inductive Logic Programming • ILP is a data mining algorithm expressly designed for inferring relationships. • By expressing relationships as rules, it provides new information and resultant testable hypotheses. • ILP groups related data and chooses in favor of relationships having short descriptions. • ILP can also flexibly incorporate a priori biological knowledge (e.g., categories and alternate classifications).

  32. ILP subsumes two forms of reasoning • Unsupervised learning • “Find clusters of genes that have similar/consistent expression patterns” • Supervised learning • “Find a relationship between a priori functional categories and gene expression” • Hybrid reasoning • “Is there a relationship between genes in a given functional category and genes in a particular expression cluster?” • ILP mines this information in a single step

  33. Rule Inference in ILP • Infers rules relating gene expression levels to categories, both within a probe pair and across probe pairs, without explicit direction • Example Rule: • [Rule 142] [Pos cover = 69 Neg cover = 3] • ~level(A,moist_vs_severe,positive) :- level(A,moist_vs_mild,positive). • Interpretation: • “If the moist versus mild stress comparison was positive for some clone named A, it was negative or unchanged in the moist versus severe comparison for A, with a confidence of 95.8%.”
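The 95.8% confidence quoted for Rule 142 follows from its cover counts, as the sketch below shows. The confidence formula (positive cover divided by total cover) is the reading that reproduces the slide's number. The functions `rule_confidence` and `cover_counts` and the toy clone table are hypothetical illustrations, not Aleph output.

```python
def rule_confidence(pos_cover, neg_cover):
    """Fraction of covered examples that agree with the rule."""
    return pos_cover / (pos_cover + neg_cover)

def cover_counts(levels):
    """Count cover for: ~level(A, moist_vs_severe, positive)
    :- level(A, moist_vs_mild, positive).

    levels maps clone -> (moist_vs_mild, moist_vs_severe) labels.
    """
    pos = neg = 0
    for mild, severe in levels.values():
        if mild == "positive":          # rule body holds for this clone
            if severe != "positive":    # rule head holds as well
                pos += 1
            else:
                neg += 1
    return pos, neg

confidence = rule_confidence(69, 3)     # cover counts from the slide
print(f"{confidence:.1%}")

# Toy table with made-up clone names.
toy = {
    "clone_a": ("positive", "unchanged"),
    "clone_b": ("positive", "negative"),
    "clone_c": ("positive", "positive"),
    "clone_d": ("unchanged", "positive"),
}
print(cover_counts(toy))
```

Clones for which the rule body does not hold, such as `clone_d` above, contribute to neither count.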

  34. More Rules we Obtained • [Rule 6] • level(A,moist_vs_mild,positive) :- • category(A, transport_protein). • level(A,mild_vs_severe,negative) :- • category(A, transport_protein). • [Rule 13] • level(A,moist_vs_mild,positive) :- • category(A, heat). • [Rule 17] • level(A,moist_vs_mild,positive) :- • category(A, cellwallrelated).

  35. ILP in a Data Mining Context • ILP combines the expressiveness of conceptual clustering with the efficiency of attribute-value techniques. [Figure: taxonomy of clustering techniques. Labels recovered from the diagram: Clustering; Conceptual Clustering; Attribute-Value Methods; Similarity-Metric; SVMs; SOMs; Divisive (top-down); Agglomerative (bottom-up).]

  36. Current Status of Expresso • Completely automated and integrated • Statistical analysis • Data mining • Experiment capture in MEL • Current Work: Integrating • Image processing • Querying by semi-structured views • Automatic experiment composition • Future Work • Model-based design and management • Randomized experiment layout with constraints • Closing-the-loop

  37. Future Directions Next Generation Stress Chips • Time course, short and long term, to capture gene expression events underlying “emergency” and adaptive events following drought stress imposition. • (Use all available ESTs for candidate stress resistance genes.) • Generate cDNA library from stressed seedlings. Screen for full-length clones. Repeat Step 1. • Initiate modeling of kinetics of drought stress responses.

  38. Expresso: Future Directions • An open, integrated system for design, process, analysis, data mining, data storage, and integration of information from web-based resources. • Supports closing the experimental loop. Accumulated results influence later experiments, as well as enable construction of testable models of pathways. • Multiple models are refined and evaluated within Expresso. • Biologists have interactive access to models and control Expresso’s components.
