Dr Paul Lewis • Lecturer in Bioinformatics • Cardiff University • Biostatistics & Bioinformatics Unit
Biostatistics & Bioinformatics Unit (BBU) • Bioinformatics resource for institutions across Wales • Backing of the Higher Education Funding Council for Wales: £1.5 million grant through the Research Capacity Development Fund • 13 new posts in statistics & bioinformatics • UWCM, Cardiff University, Aberystwyth • MSc/Postgraduate Diploma/Postgraduate Certificate in: Bioinformatics; Genetic Epidemiology and Bioinformatics
Brief Overview of Microarray Bioinformatics • Introduce My Microarray Research Interests • My Microarray Analysis Software
Bioinformatics in a Microarray Experiment • Experimental Design • Hybridisation • Data Normalisation • Differential Gene Expression • Pattern Discovery • Class Prediction • Annotation
Normalisation • Remove non-biological influences on data (systematic variation)
3 categories of normalisation:
• Normalisation: transform data to make it more like a normal distribution (log, lowess, linlog)
• Standardisation: expand or contract the distribution so data from different experiments can be compared (calculate Z-scores)
• Centralisation: move the distribution so it is centred around the expected mean (mean / median / trimmed-mean centring)
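A minimal sketch of the three categories above, using only the Python standard library. The intensity values and function names are made up for illustration; lowess and linlog are not shown, and this is not the implementation used in any particular package.

```python
import math
import statistics

def log2_transform(values):
    """Normalisation: log-transform raw intensities (all must be positive)."""
    return [math.log2(v) for v in values]

def median_centre(values):
    """Centralisation: shift the values so that their median is 0."""
    med = statistics.median(values)
    return [v - med for v in values]

def z_scores(values):
    """Standardisation: rescale to mean 0 and sample standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical raw spot intensities for one array
raw = [120.0, 480.0, 960.0, 240.0, 1920.0]

logged = log2_transform(raw)
centred = median_centre(logged)
standardised = z_scores(logged)
```

After these steps `centred` has median 0 and `standardised` has mean 0 and standard deviation 1, so arrays processed the same way can be compared on a common scale.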
Find Differentially Expressed Genes • Is fold change significant?
With replicates:
• Parametric tests
• t-test (ANOVA) J. Comput. Biol. 2000 7: 817-838
• Bayesian t-test Bioinformatics 2001 17: 509-519
• Mixture modelling & bootstrapping (SAM) P.N.A.S. 2001 98: 5116-5121
• Regression modelling Genome Res. 2001 11: 1227-1236
• All give similar results but SAM reduces false positives
• Non-parametric tests
• Wilcoxon rank sum test Bioinformatics 2002 18: 1454-1461
• Non-parametric t-test Bioinformatics 2002 18: 1454-1461
• Ideal discriminator method Bioinformatics 2002 18: 1454-1461
• Low false positive rate but less power
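To make the parametric versus non-parametric distinction concrete, here is a small sketch for a single gene measured in replicate under two conditions. It computes Welch's unequal-variance t statistic (one common t-test variant, chosen here as an assumption) and the Wilcoxon rank-sum statistic. The expression values are invented, and turning either statistic into a p-value would need a reference distribution or a permutation scheme that is not shown.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    va = statistics.variance(group_a) / na
    vb = statistics.variance(group_b) / nb
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

def rank_sum(group_a, group_b):
    """Wilcoxon rank-sum statistic for group_a; tied values share an average rank."""
    pooled = sorted(group_a + group_b)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Positions i..j-1 hold the same value; their 1-based ranks average to this
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    return sum(ranks[v] for v in group_a)

# Hypothetical log2 expression values for one gene, four replicates per condition
control = [7.1, 7.3, 6.9, 7.2]
treated = [8.4, 8.9, 8.6, 8.7]

t_stat, dof = welch_t(control, treated)
w_control = rank_sum(control, treated)
```

The t statistic uses the size of the difference relative to the replicate variance, while the rank-sum statistic uses only the ordering of the values, which is one way to see why rank-based tests are robust but can have less power.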
Pattern Discovery & Class Prediction
Explore how genes or samples group: Clustering
• HIERARCHY: Hierarchical Cluster Analysis
• PARTITION: K-Means, Self Organising Maps (SOM), Fuzzy ART
• REDUCTION: Principal Components Analysis (PCA), Multidimensional Scaling (MDS), Correspondence Analysis (CoA)
Assign genes to known groupings: Classification
• Logistic regression
• Neural networks
• Linear discriminant analysis
Partitioning Clustering Methods: K-Means & SOM • The number of clusters must be specified in advance • Genes are partitioned into clusters • What are the relationships between clusters?
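The points above can be illustrated with a bare-bones K-Means (Lloyd's algorithm). This is a sketch under stated assumptions: the profiles are invented two-condition expression values, the first k profiles seed the cluster centres for reproducibility, and `kmeans` is a hypothetical helper rather than any package's API.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm; the first k points are used as starting centres."""
    centres = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centres[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centre to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres

# Hypothetical gene profiles measured in two conditions
profiles = [(0.1, 0.2), (5.0, 5.1), (0.0, 0.3), (5.2, 4.9), (0.2, 0.1), (4.8, 5.0)]

labels, centres = kmeans(profiles, k=2)
```

Note that `k` has to be supplied by the caller, and that the result is a flat list of labels: nothing in the output says how the clusters relate to one another.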
2D & 3D Mapping Methods: PCA, MDS, CoA • Data projected onto 2 or 3 dimensions • But what are the cluster boundaries?
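As an illustration of projecting data onto two dimensions, the sketch below performs a small PCA with power iteration and deflation, using only the standard library. It covers PCA only (not MDS or CoA), the sample matrix is invented, and the helper names are hypothetical. Power iteration is used here for brevity; a numerical library's eigen-decomposition would be the usual choice in practice.

```python
import math

def centre_columns(rows):
    """Subtract each column mean so every variable has mean 0."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance_matrix(centred):
    """Sample covariance matrix of rows that are already centred."""
    n, d = len(centred), len(centred[0])
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
            for i in range(d)]

def mat_vec(matrix, v):
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]

def leading_eigenpair(matrix, steps=500):
    """Power iteration; assumes the start vector is not orthogonal to the answer."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(steps):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v

def pca_project(rows, n_components=2):
    """Project rows onto their first n_components principal axes."""
    centred = centre_columns(rows)
    cov = covariance_matrix(centred)
    axes = []
    for _ in range(n_components):
        lam, v = leading_eigenpair(cov)
        axes.append(v)
        # Deflation: remove the axis just found before searching for the next
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    return [[sum(x * a for x, a in zip(row, axis)) for axis in axes]
            for row in centred]

# Hypothetical data: five samples, three measured variables
samples = [
    [2.0, 0.1, 1.0],
    [4.0, 0.3, 1.1],
    [6.0, 0.2, 0.9],
    [8.0, 0.4, 1.2],
    [10.0, 0.2, 1.0],
]

coords = pca_project(samples)  # one (PC1, PC2) pair per sample
```

The output is only a set of coordinates for plotting; the projection itself does not say where one cluster ends and the next begins.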
Annotation
Online Tools:
• ARROGANT http://lethargy.swmed.edu/
• DAVID http://apps1.niaid.nih.gov/david/
• DRAGON http://207.123.190.10/dragon.htm
• EASE http://apps1.niaid.nih.gov/david/
• FANTOM http://www.gsc.riken.go.jp/e/FANTOM/
• GoMiner http://discover.nci.nih.gov/gominer/
• MatchMiner http://discover.nci.nih.gov/matchminer/
• Onto-Express http://vortex.cs.wayne.edu/Projects.html
• RESOURCERER http://pga.tigr.org/tigr-scripts/magic/r1.pl
• Affymetrix GO http://www.affymetrix.com
Databases:
• Gene Ontology http://www.geneontology.org/
• OMIM http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM
• LocusLink http://www.ncbi.nlm.nih.gov/LocusLink/
• UniGene http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM
My Research Interests • Pattern Discovery • Algorithm Development: take 2D & 3D mapping methods, define cluster boundaries, make methods fuzzy • 2D & 3D Visualisation Tools • EASI: Biologist-Friendly Software Tools
Cluster Boundaries • MDS • CoA • PCA
Fuzzy Clustering • Differs from standard clustering by assigning each gene a membership in every cluster • Lets you see how strongly each gene is associated with a cluster • Can calculate the number of clusters in partitioning methods (Fuzzy ART) • Helps combine clusters • Helps resolve ambiguity
Fuzzy Mapping • Add membership values of each gene to clusters
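One way to obtain such membership values is the standard fuzzy c-means membership formula, in which a gene's membership in a cluster falls off with its distance to that cluster's centre and the memberships across all clusters sum to 1. The sketch below assumes that formula, fixed cluster centres, and a fuzziness exponent of 2; it is an illustration, not necessarily the method used in EASI.

```python
import math

def fuzzy_memberships(point, centres, fuzziness=2.0):
    """Fuzzy c-means membership of one point in every cluster (values sum to 1)."""
    dists = [math.dist(point, c) for c in centres]
    # A point lying exactly on a centre belongs fully to that cluster
    if 0.0 in dists:
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (fuzziness - 1.0)
    weights = [d ** -power for d in dists]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical cluster centres on a 2D map
centres = [(0.0, 0.0), (4.0, 0.0)]

near_first = fuzzy_memberships((1.0, 0.0), centres)   # mostly cluster 0
in_between = fuzzy_memberships((2.0, 0.0), centres)   # shared equally
```

A gene midway between two centres gets a membership of 0.5 in each, which is the kind of ambiguity a hard partition would hide.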
Fuzzy Partitioning • K-Means & SOM
EASI • Data Reduction • Visualisation
EASI BBUnit Microarray Pattern Discovery • Need for Comprehensive Pattern Discovery Software Suite • Fuzzy Data Analysis Suite • Visualisation Tools to explore data • Easy to use • Free • Web based version • Service by BBU • Increase traffic to BBU web site • Establish BBU for microarray • Cross platform
EASI INTERFACE • Menus: Differential Gene Expression, Pattern Discovery, Utilities, Normalisation
• Pattern Discovery: Hierarchical Cluster Analysis, SOM, K-Means, Fuzzy ART, PCA, MDS, CoA, Fuzzy C-Means
• Normalisation: Log, Normalise, Mean Centre, Median Centre
• Differential Gene Expression: T test, ANOVA, Regression
EASI INTERFACE
Contact lewispd@cf.ac.uk http://bbu.uwcm.ac.uk
Acknowledgements • Pete Kille • Alan Clarke • Gareth Hughes (EASI team) • Karen Reed (Data) • Lesley Jones (Data, & EASI Collaborator) • BBU