
AHM 2002


Albert_Lan



Presentation Transcript


    1. AHM 2002 Tutorial on Scientific Data Mediation Example 1

    2. SCENARIO: Genome-scale Modeling of Low-Dose Irradiation Responses Using Microarray Based Gene Networks. Hypotheses: Genes that show similar expression patterns in response to low-dose irradiation are components of coordinated expression groups (called synexpression groups), and understanding the differential regulation of these synexpression groups will (a) provide a DNA sequence-based understanding of the complex biological processes associated with low-dose radiation and (b) identify determinants of radiation dose and genetic susceptibility to radiation damage.

    3. SPECIFIC AIMS 1. Develop a web-accessible database resource to assemble microarray transcription profiles of radiation responsive genes and to link these genes to genomic and cDNA sequence information. 2. Apply statistical and bioinformatic tools to identify novel synergistic gene expression groups of radiation responsive genes. 3. Apply the model to the analysis of gene/pathway responses to low-dose IR.

    6. CLUSFAVOR (CLUSter and Factor Analysis with Varimax Orthogonal Rotation)
       A standalone program whose output consists of several clusters of named sequences that have similar expression characteristics in the current experiment.
       GOAL: Given gene expression data, end up with a set of related sequences from which to build a model.
       INPUT: gene expression data
       OUTPUT: collection of clustered cDNA fragments
       CLUSFAVOR is a Windows-based computer program for exploratory cluster and principal components analysis. It was designed for large sample problems devoted to genomic analysis of gene expression data from cDNA and oligonucleotide microarrays.
       Meanings of the components that form the name:
       CLUSTER ANALYSIS: An exploratory multivariate statistical method that attempts to find the natural groupings of objects based on attribute information about the objects. Here, the object is whatever is being clustered (variables, records, arrays, genes). The end result of cluster analysis using CLUSFAVOR is a cluster image display containing the dendrograms (tree diagrams) showing the grouping of arrays and genes according to the order in which they were joined during clustering.
       FACTOR ANALYSIS: Another exploratory multivariate statistical method, used to extract a smaller set of orthogonal (uncorrelated) variables from the data that explain a majority of the total variance.
       VARIMAX ORTHOGONAL ROTATION: Component loadings from a typical principal components analysis are often high on more than one component. The varimax orthogonal rotation transform was introduced in CLUSFAVOR to rotate the components so that each loading is maximized on (mostly) one of the components.
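The cluster-analysis step described in this slide can be illustrated with a small sketch. This is not CLUSFAVOR's code: it assumes average-linkage agglomerative clustering on the distance 1 - r (r being the Pearson correlation), which is one common choice for expression profiles, and the gene names and values are made up for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster(profiles, n_clusters):
    """Average-linkage agglomerative clustering on the distance 1 - r.

    Returns (clusters, merges). `merges` lists the joins in the order in
    which they happened, which is the information a dendrogram displays.
    """
    clusters = [[name] for name in profiles]
    merges = []

    def dist(a, b):
        # Mean pairwise distance between the members of two clusters.
        pairs = [(g, h) for g in a for h in b]
        total = sum(1 - pearson(profiles[g], profiles[h]) for g, h in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2), key=lambda p: dist(*p))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((tuple(a), tuple(b)))
    return clusters, merges


# Hypothetical expression values (not data from the presentation).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [3.9, 3.2, 1.8, 1.1],
}
clusters, merges = cluster(profiles, n_clusters=2)
print(clusters)
```

Genes whose profiles rise together end up in one group and those that fall together in the other; the factor-analysis and varimax steps are not covered by this sketch.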

    7. NCBI GenBank
       GOAL: Given the name (or, better, the accession number) of a cDNA fragment from the CLUSFAVOR results, do a lookup in GenBank to obtain the cDNA sequence.
       INPUT: The accession number or the name of a cDNA fragment
       OUTPUT: cDNA sequence for the input cDNA fragment
       GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences (Nucleic Acids Research 2000 Jan 1;28(1):15-8). There are approximately 15,850,000,000 bases in 14,976,000 sequence records as of December 2001 (see GenBank growth statistics). As an example, you may view the record for a Saccharomyces cerevisiae gene. The complete release notes for the current version of GenBank are available. A new release is made every two months. GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI. These three organizations exchange data on a daily basis.
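As a rough sketch of the accession-to-sequence lookup, the snippet below builds a request URL for NCBI's E-utilities `efetch` service and parses a FASTA reply. The slides do not say how the lookup was done, so using E-utilities is an assumption; the accession `XX000000` and the sample record are placeholders, and no network request is made here.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession):
    """Build an efetch URL asking for one nucleotide record as FASTA text."""
    params = {"db": "nuccore", "id": accession,
              "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def parse_fasta(text):
    """Return (header, sequence) for the first record in a FASTA string."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # stop at the start of a second record
            header = line[1:]
        elif header is not None:
            chunks.append(line.upper())
    return header, "".join(chunks)


# Placeholder accession and made-up record, for illustration only.
url = efetch_url("XX000000")
sample = ">XX000000.1 hypothetical cDNA clone\nacgtac\nGTTTGA\n"
header, sequence = parse_fasta(sample)
print(url)
print(header, sequence)
```

In practice the URL would be fetched (for example with `urllib.request.urlopen`) and the response body passed to `parse_fasta`; NCBI's usage guidelines for request rates apply.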

    8. BLAST (Basic Local Alignment Search Tool)
       INPUT: The cDNA sequence obtained from GenBank
       OUTPUT: A set of similar sequences
       BLAST® (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is protein or DNA. The BLAST programs have been designed for speed, with a minimal sacrifice of sensitivity to distant sequence relationships. The scores assigned in a BLAST search have a well-defined statistical interpretation, making real matches easier to distinguish from random background hits. BLAST uses a heuristic algorithm which seeks local as opposed to global alignments and is therefore able to detect relationships among sequences which share only isolated regions of similarity (Altschul et al., 1990).
       For a better understanding of BLAST you can refer to the BLAST Course, which explains the basics of the BLAST algorithm. There is also a simple BLAST tutorial located under the Education link in the sidebar of the NCBI home page (http://www.ncbi.nlm.nih.gov/).
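To make the idea of a local (as opposed to global) alignment concrete, here is a small sketch that computes the best local alignment score by exact dynamic programming (the Smith-Waterman recurrence). This is not BLAST's algorithm, which is a faster heuristic; the scoring values (match +2, mismatch -1, gap -2) and the sequences are arbitrary choices for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the score matrix
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Flooring at 0 lets an alignment start anywhere, which is
            # what makes the alignment local rather than global.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# Two made-up sequences that share only an isolated region (ACGTAC).
query = "GGGGACGTACGGGG"
subject = "TTTACGTACTTT"
print(local_alignment_score(query, subject))
```

The two sequences differ everywhere except in the shared six-base region, yet the local score is driven entirely by that region, which is the kind of relationship the slide says BLAST is able to detect.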

    9. MatInspector V2.2 based on TRANSFAC
       MatInspector - Matrix Inspector
       TRANSFAC - The Transcription Factor Database
       Search for potential transcription factor binding sites in your own sequences and detect consensus matches in nucleotide sequence data using the TRANSFAC 4.0 matrices.
       New fast and sensitive tools for detection of consensus matches in nucleotide sequence data:
       MatInd is a simple but powerful method to derive a matrix description of a consensus from a number of short sequences on which the definition of an IUPAC code would be based. A large library (~300 entries) of predefined matrix descriptions for protein binding sites exists and has been tested for accuracy and suitability. Information about the transcription factors connected to these matrices can be retrieved from the TRANSFAC database.
       MatInspector is a second software tool that utilizes this library of matrix descriptions to locate matches in sequences of unlimited length. MatInspector is almost as fast as an IUPAC search but has been shown to produce superior results. It assigns a quality rating to matches and thus allows quality-based filtering and selection of matches. MatInspector is able to compare one, several, or all sequences in a sequence file against all or selected subsets of matrices from the library in a single program run. It scans both strands of the sequence simultaneously.
       The methods are described in Quandt, K., Frech, K., Karas, H., Wingender, E., Werner, T.: MatInd and MatInspector - New fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Research 23, pp. 4878-4884 (1995).
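The matrix-based search described in this slide can be sketched as follows: derive a count matrix from a few aligned binding sites, then slide it over both strands of a sequence and keep the windows whose quality score passes a threshold. The score used here (window count sum divided by the best possible sum) is a simplified stand-in and not MatInspector's published similarity measure; the sites and the sequence are made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def build_matrix(sites):
    """Count matrix: one {base: count} dict per alignment column."""
    return [{base: sum(site[i] == base for site in sites) for base in "ACGT"}
            for i in range(len(sites[0]))]


def similarity(matrix, window):
    """Simplified quality score in [0, 1]; 1.0 means a perfect consensus hit."""
    got = sum(col.get(base, 0) for col, base in zip(matrix, window))
    best = sum(max(col.values()) for col in matrix)
    return got / best


def scan(matrix, seq, threshold=0.8):
    """Return (strand, forward_start, score) for windows scoring >= threshold."""
    width = len(matrix)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - width + 1):
            score = similarity(matrix, s[i:i + width])
            if score >= threshold:
                # Report minus-strand hits in forward-strand coordinates.
                start = i if strand == "+" else len(seq) - width - i
                hits.append((strand, start, round(score, 3)))
    return hits


# Hypothetical aligned binding sites and target sequence.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
matrix = build_matrix(sites)
print(scan(matrix, "CCCTGACTCACCC", threshold=0.9))
```

Raising the threshold filters out the weaker match, which mirrors the quality-based filtering and selection of matches mentioned above.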
