MDMS-A Web Tool to Manage & Analyze Gene Expression Microarray Data

Overview

  • Steps in analysis of Gene Expression Microarray Data
    • Preprocessing
    • Filtering
    • Statistical Analysis
    • Machine Learning & Data Mining (Clustering)
    • Functional Analysis
  • Data Analysis features in MDMS
  • Workflow in MDMS
  • Analysis of Early Lung Development dataset using MDMS
  • MDMS Demo

[Figure: analysis pipeline with the stages Image Quantification & Quality Control, Statistical Analysis, Machine Learning, Functional Analysis]

Analysis of Data ~ Deriving a knowledge base from data and mining information from the knowledge base

Steps in Microarray Data Analysis
  • Image Quantification
    • Check for artifacts, Segmentation
    • Extraction of expression values of genes
  • Preprocessing
    • Background Correction
    • Normalization
    • Summarization
    • MAS5, RMA, GC-RMA, DChip
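
To make the normalization step concrete, here is a minimal sketch of quantile normalization, the normalization used inside RMA. It is an illustration only, not the Bioconductor implementation: the function name and the toy intensities are assumptions, and ties are not averaged the way production code typically does.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of arrays (equal-length lists of intensities)."""
    n = len(columns[0])
    # Sort each array, then average across arrays at each rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[i] for col in sorted_cols) / len(columns) for i in range(n)]
    normalized = []
    for col in columns:
        # order[r] is the index of the probe holding rank r in this array.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]  # replace the value by the mean for its rank
        normalized.append(out)
    return normalized


# Hypothetical intensities: three arrays, four probes each.
arrays = [[5.0, 2.0, 3.0, 4.0], [4.0, 1.0, 4.5, 2.0], [3.0, 4.0, 6.0, 8.0]]
normalized = quantile_normalize(arrays)
```

After this step every array has the same distribution of values, so remaining differences between arrays reflect probe rank rather than overall brightness.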

Steps in Microarray Data Analysis
  • Filtering
    • About 10%-50% of the genome is not expressed in a given tissue
    • Aim is to isolate the genes that are expressed
    • Also improves the accuracy of statistical significance tests
    • Specific & Non-specific filtering
      • Filter of Presence/Absence calls
      • Filter on expression signal, Variability in gene expression
Steps in Microarray Data Analysis
  • Statistical Analysis
    • Many genes will be expressed to perform many routine tasks in the cell
    • Aim is to isolate genes responsible for phenotypic variation
    • Interesting Vs Random
    • Various significance tests ~ t-test, ANOVA
    • Multiple Testing Correction
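
Multiple testing correction matters because thousands of genes are tested at once. The sketch below shows three common adjustments (Bonferroni, Holm, and Benjamini-Hochberg FDR) in plain Python. The function names and the example p-values are assumptions made for illustration. This is not the code MDMS runs.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down adjustment; controls the family-wise error rate."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p-values
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max  # enforce monotone adjusted values
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjustment; controls the false discovery rate."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # descending
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = m - pos  # 1-based rank among ascending p-values
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adj_bonferroni = bonferroni(raw)
adj_holm = holm(raw)
adj_bh = benjamini_hochberg(raw)
```

Bonferroni and Holm control the chance of any false positive, while the FDR adjustment controls the expected fraction of false positives among the genes called significant and is therefore less conservative.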
Steps in Microarray Data Analysis
  • Machine Learning Approaches ~ Data Mining
    • Small changes in gene expression that are not statistically significant on their own can collectively regulate an important pathway
    • Statistical analysis is limited by small numbers of replicates and by fitting approximate models to the data
    • Aim is to find significant patterns in the data set.
      • Periodic, Time-lagged, cyclic
    • Machine learning approaches mine data for information (data mining) using computational and statistical techniques, e.g. clustering
Functional Analysis
  • Functional Analysis
    • Given a statistically significant pattern or list of significant genes, how significant is it biologically?
    • Aim is to find genes that are responsible for the phenotypic condition
    • Extracting annotations and finding functionally similar genes.
      • Gene Ontology
    • Gene set enrichment, relating genes to known pathways
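
One common way to ask whether a gene list is enriched for an annotation (for example a Gene Ontology term or a pathway) is a one-sided hypergeometric test. The sketch below is an assumption about how such a test can be written, with hypothetical counts. The slides do not say which enrichment statistic MDMS or GOAPhAR uses.

```python
from math import comb


def enrichment_p_value(universe, annotated, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn at random from `universe`
    genes, of which `annotated` carry the annotation of interest."""
    upper = min(annotated, selected)
    total = comb(universe, selected)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 genes on the array, 5 annotated to the term,
# 4 genes in the significant list, 3 of which carry the annotation.
p_enrich = enrichment_p_value(universe=20, annotated=5, selected=4, overlap=3)
```

A small p-value means the overlap is larger than expected by chance. When many terms are tested, the resulting p-values need multiple testing correction as well.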

Data Analysis Features in MDMS
  • All data analysis features in MDMS are implemented through the Bioconductor package
    • Covers many aspects of data analysis for Gene-Expression, SNP, Custom made arrays
    • Many different tests for quality control, preprocessing, filtering, statistical analysis, machine learning and functional analysis
    • Large user community, helpful mailing lists, used by many labs in many countries
    • Tutorials are available on the website and hands-on training is also available.
    • Broader coverage of data analysis aspects than other available packages
    • Open Source
Data Analysis Features in MDMS
  • MDMS supports Affymetrix Gene Expression arrays
  • No Image Quantification (usually done at microarray facility)
  • Quality Control
    • 3’/5’ bias
    • % Detection calls
    • Background signals
    • Correlation coefficients between arrays
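
The last quality-control check, correlation between arrays, can be sketched as a pairwise Pearson correlation over log2 expression values. The array names and values below are hypothetical, and the slides do not state which correlation measure MDMS reports, so Pearson is an assumption.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical log2 expression values for five probe sets on three arrays.
arrays = {
    "ctrl_1": [8.1, 6.5, 10.2, 7.3, 9.0],
    "ctrl_2": [8.0, 6.7, 10.0, 7.5, 9.1],
    "suspect": [7.0, 9.5, 6.2, 9.9, 6.8],
}
names = list(arrays)
corr = {(a, b): pearson(arrays[a], arrays[b]) for a in names for b in names}
```

Replicate arrays are expected to correlate strongly. An array that correlates poorly with its replicates, like `suspect` here, is a candidate for closer inspection or exclusion.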
MDMS - Preprocessing
  • Preprocessing
    • MAS5 – Default Affymetrix normalization
    • RMA – Robust Multi-array Average
    • GC-RMA, DChip (Li-Wong)
    • MAS5 and RMA are highly recommended
    • Available literature shows significant advantages of RMA over MAS5
MDMS - Filtering
  • Filtering
    • Expression value cut-off
      • Eg. All genes > 200
    • Detection calls
      • Eg. All genes that are detected as Present
    • Fold Change
      • Eg. All genes with a fold change greater than 2 or less than -2
    • Inter-Quartile Range (1st & 3rd quartiles)
      • For genes that show higher variability
  • All analysis is done on a log 2 scale
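
Because the analysis is done on a log 2 scale, a raw cut-off of 200 corresponds to log2(200) ≈ 7.64, and a 2-fold change in either direction corresponds to an absolute log2 difference of 1. The sketch below applies the three numeric filters to one gene at a time. The function name, the IQR threshold of 0.5, and the choice to test the maximum value against the expression cut-off are assumptions for illustration, not MDMS defaults.

```python
from math import log2
from statistics import mean, quantiles


def filter_flags(control, treated, min_expr=log2(200), min_abs_lfc=1.0, min_iqr=0.5):
    """Return which filters a gene passes, given log2 values per replicate."""
    values = control + treated
    log_fold_change = mean(treated) - mean(control)  # difference of log2 means
    q1, _, q3 = quantiles(values, n=4)  # 1st and 3rd quartiles
    return {
        "expression": max(values) > min_expr,
        "fold_change": abs(log_fold_change) >= min_abs_lfc,
        "iqr": (q3 - q1) >= min_iqr,
    }


# Hypothetical log2 expression values, three replicates per group.
genes = {
    "gene_up": ([8.0, 8.2, 7.9], [9.5, 9.6, 9.4]),
    "gene_low": ([5.0, 5.1, 4.9], [5.0, 5.2, 5.1]),
}
results = {name: filter_flags(c, t) for name, (c, t) in genes.items()}
```

Returning one flag per filter keeps the filters independent, so they can be combined or applied separately depending on the question being asked.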
MDMS – Statistical Analysis
  • Significance Tests
    • LIMMA (Linear Models for Microarray Data)
    • SAM (Significance Analysis of Microarrays)
    • EBAM (Empirical Bayes Analysis of Microarrays)
    • Correction for Multiple Testing
      • FDR, Bonferroni, Holm’s correction
  • Machine Learning
    • Clustering
      • Hierarchical Clustering, K-Means, Self Organizing Maps.
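
As an illustration of the clustering step, here is a minimal K-Means sketch (Lloyd's algorithm) over gene expression profiles. It is not the Bioconductor routine MDMS calls. The function name, the toy profiles, and the use of the first k profiles as initial centroids (chosen so the result is deterministic) are assumptions.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm with the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical log2 profiles of five genes across three conditions.
profiles = [
    [1.0, 2.0, 3.0],
    [5.0, 4.0, 3.0],
    [1.1, 2.1, 2.9],
    [4.9, 4.2, 3.1],
    [0.9, 1.8, 3.2],
]
labels, centroids = kmeans(profiles, k=2)
```

Genes with rising profiles end up in one cluster and genes with falling profiles in the other. Real analyses usually try several random initializations, since K-Means can converge to different local optima.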
MDMS - Functional Analysis
  • Functional Analysis through GOAPhAR
    • Gene Annotation
    • Protein Annotation
    • Biological Pathways
    • Gene Ontology Annotation
    • Protein Interaction Evidence
  • All gene lists generated using the data analysis options can be saved in the database for future use. They can also be downloaded as text files.
Workflow in MDMS
[Figure: MDMS workflow diagram. Recoverable labels: Data Repository; Rat2302, Hg133U; Statistical Analysis; Machine Learning]
Data Analysis Example
  • Data set specifications (GSE3541)
  • The aim of the study is to find genes involved in early lung development.
  • Mechanical stress was applied to fetal type II epithelial cells taken from 19-day-old rat embryos
  • Data set Processing
    • Data was preprocessed by MAS5
    • Expression > 200, Invariant change between pairs of control & experiment samples > 50 (75% filtered)
    • SAM statistical method was used to find significant genes (92 genes, 63 up and 29 down-regulated)
    • 34 up-regulated genes were selected for further analysis
Biological Significance of Clusterings
  • K-Means was applied to the 34 genes, with K = 2, 3, 4, …, 29
  • Random clusterings were generated for K = 2, 3, 4, …, 29 to compare the K-Means clusterings with random ones
  • Biological significance scores were calculated for all clusterings.
  • A z-score and a P-value were calculated for each K value
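
The comparison against random clusterings can be sketched as follows for a single K. The slides do not define the biological significance score or say how the P-value was derived, so the scores below are hypothetical numbers and the empirical P-value is one possible choice, not necessarily the one used in the study.

```python
from statistics import mean, stdev


def z_and_empirical_p(observed, random_scores):
    """Compare an observed clustering score with scores of random clusterings."""
    mu, sigma = mean(random_scores), stdev(random_scores)
    z = (observed - mu) / sigma
    # One-sided empirical p-value; the +1 terms keep it from being exactly zero.
    exceed = sum(s >= observed for s in random_scores)
    p = (1 + exceed) / (1 + len(random_scores))
    return z, p


# Hypothetical scores: one K-Means clustering and nine random clusterings at the same K.
observed_score = 0.80
random_scores = [0.50, 0.55, 0.45, 0.60, 0.40, 0.52, 0.48, 0.58, 0.42]
z_score, p_value = z_and_empirical_p(observed_score, random_scores)
```

Repeating this for each K gives one z-score and one P-value per K, which is enough to see at which cluster counts the K-Means result is more biologically coherent than chance. With only nine random clusterings the empirical P-value cannot go below 0.1, so many more random clusterings are needed in practice.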
Biological Significance of Clusterings
  • The study found that genes related to amino acid synthesis, amino acid transport and sodium ion transport contributed to lung development.
  • 1 gene for sodium ion transport
  • 4 genes for amino acid transport were found in 2 clusters
  • 4 genes for amino acid synthesis were found in 2 clusters
  • Demonstration - Using MDMS to analyze data
  • Questions, comments, suggestions