Tutorial 7

Gene expression analysis

Gene expression analysis

  • How to interpret an expression matrix
  • Expression data DBs - GEO
  • General clustering methods
  • Unsupervised clustering
      • Hierarchical clustering
      • K-means clustering
  • Tools for clustering - EPCLUST
  • Functional analysis - GO annotation

Gene expression data sources

Microarrays

RNA-seq experiments

How to interpret an expression data matrix

  • Each column holds the expression levels of all genes:
    • In a two-color array: from a single experiment.
    • In a one-color array: from a single sample.
  • Each row represents the expression of a gene across all experiments.

How to interpret an expression data matrix

Each element is a log-transformed value:

  • In a two-color array, a log ratio: log2(T/R)

T - the gene expression level in the test sample

R - the gene expression level in the reference sample

  • In a one-color array, a log intensity: log2(X)

X - the gene expression level in the current sample
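The two-color case can be sketched in a few lines of Python. The gene names and intensity values below are made up for illustration; only the log2(T/R) formula comes from the slide.

```python
import math

# Hypothetical raw intensities: rows are genes, columns are experiments.
genes = ["geneA", "geneB", "geneC"]
test = [[200.0, 800.0], [150.0, 150.0], [400.0, 100.0]]       # T
reference = [[100.0, 200.0], [150.0, 300.0], [100.0, 400.0]]  # R

def log_ratio_matrix(t, r):
    """Return log2(T/R) for every gene (row) and experiment (column)."""
    return [[math.log2(tv / rv) for tv, rv in zip(t_row, r_row)]
            for t_row, r_row in zip(t, r)]

matrix = log_ratio_matrix(test, reference)
for name, row in zip(genes, matrix):
    # Positive values mean T>R, zero means T=R, negative values mean T<R.
    print(name, row)
```

A value of 1 means the gene is expressed twice as strongly in the test sample as in the reference, and a value of -1 means half as strongly.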

How to interpret an expression data matrix

Color scale in a two-color array:

  • Red indicates a positive log ratio: T>R
  • Black indicates a log ratio of zero: T=~R
  • Green indicates a negative log ratio: T<R

Color scale in a one-color array:

  • Bright green indicates a high expression value
  • Black indicates no expression

Microarray data: different representations

[Figure: two panels with axes "Exp" and "Log ratio"; the labels T>R and T<R mark positive and negative log ratios.]

Expression profiles DBs

  • GEO (Gene Expression Omnibus)

http://www.ncbi.nlm.nih.gov/geo/

  • Human genome browser

http://genome.ucsc.edu/

  • ArrayExpress

http://www.ebi.ac.uk/arrayexpress/

The current rate of submission and processing is over 10,000 Samples per month.

In 2002, the Nature journals announced a requirement that microarray data be deposited in public databases.

Searching for expression profiles in the GEO

http://www.ncbi.nlm.nih.gov/geo/

GEO accession IDs

GPL**** - platform ID

GSM**** - sample ID

GSE**** - series ID

GDS**** - dataset ID

  • A Series record defines a set of related Samples considered to be part of a group.
  • A GDS record represents a collection of biologically and statistically comparable GEO samples. Not every experiment has a GDS.
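As a small illustration of the accession prefixes, the sketch below maps an accession string to its record type. The function name and the example IDs are hypothetical, and the pattern assumes an accession is just one of the four prefixes followed by digits.

```python
import re

# Prefix-to-record-type table taken from the list of GEO accession IDs.
RECORD_TYPES = {"GPL": "platform", "GSM": "sample",
                "GSE": "series", "GDS": "dataset"}

# Assumption: an accession is one of the four prefixes followed by digits.
ACCESSION = re.compile(r"^(GPL|GSM|GSE|GDS)(\d+)$")

def geo_record_type(accession):
    """Return the record type for a GEO accession, or None if unrecognized."""
    match = ACCESSION.match(accession.strip().upper())
    if match is None:
        return None
    return RECORD_TYPES[match.group(1)]

# Made-up accession numbers, used only to exercise the function.
for acc in ["GSE1234", "gds567", "GSM89", "GPL96", "XYZ1"]:
    print(acc, geo_record_type(acc))
```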
[Figure callouts: Clustering, Statistical analysis, Download dataset]

Clustering

Grouping together “similar” genes

Clustering
  • Unsupervised learning: The classes are unknown a priori and need to be “discovered” from the data.
  • Supervised learning: The classes are predefined and the task is to understand the basis for the classification from a set of labeled objects. This information is then used to classify future observations.

http://www.bioconductor.org/help/course-materials/2002/Seattle02/Cluster/cluster.pdf

Unsupervised Clustering
  • Hierarchical methods - These methods provide a hierarchy of clusters, from the smallest set, where all objects are in one cluster, through to the largest set, where each observation is in its own cluster.
  • Partitioning methods - These usually require the specification of the number of clusters. Then a mechanism for apportioning objects to clusters must be determined.

http://www.bioconductor.org/help/course-materials/2002/Seattle02/Cluster/cluster.pdf


Hierarchical Clustering

This clustering method is based on distances between expression profiles of different genes. Genes with similar expression patterns are grouped together.

Rings a bell?...

  • In both phylogenetic trees and clustering, we create a tree based on a distance matrix.
  • When computing phylogenetic trees, we compute distances between sequences.
  • When computing clustering dendrograms, we compute distances between expression values.

[Figure: a score computed between the sequences ATCTGTCCGCTCG and ATGTGTGCGCTTG, next to a score computed between two expression profiles.]

How to determine the similarity between two genes?

Patrik D'haeseleer, How does gene expression clustering work?, Nature Biotechnology 23, 1499-1501 (2005),

http://www.nature.com/nbt/journal/v23/n12/full/nbt1205-1499.html
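Two commonly used answers are Euclidean distance and a correlation-based distance. The sketch below compares them on made-up profiles; the choice of these two metrics and the function names are assumptions for illustration, not something the slides prescribe.

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance: sensitive to the magnitude of expression."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_distance(x, y):
    """1 - Pearson correlation: compares the shape of two profiles.

    0 means perfectly correlated, 2 means perfectly anti-correlated.
    Assumes neither profile is constant (non-zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (sd_x * sd_y)

# Hypothetical expression profiles across four experiments.
gene1 = [1.0, 2.0, 3.0, 4.0]
gene2 = [2.0, 4.0, 6.0, 8.0]  # same trend as gene1, larger amplitude
gene3 = [4.0, 3.0, 2.0, 1.0]  # opposite trend to gene1

print(euclidean_distance(gene1, gene2), pearson_distance(gene1, gene2))
print(euclidean_distance(gene1, gene3), pearson_distance(gene1, gene3))
```

With Euclidean distance, gene1 and gene2 are far apart because their magnitudes differ; with the correlation-based distance they are essentially identical because they rise together.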


Hierarchical clustering methods produce a tree or a dendrogram.

They avoid specifying how many clusters are appropriate by providing a partition for each K. The partitions are obtained from cutting the tree at different levels.
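A minimal sketch of the idea in plain Python, assuming Euclidean distance and average linkage (the slides do not fix either choice). Stopping the merging when K clusters remain gives the same partition as cutting the finished tree at the level with K branches. The profiles are invented.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerate(profiles, k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[i] for i in range(len(profiles))]  # start: one gene each
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], profiles)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical expression profiles (two experiments per gene).
profiles = [
    [0.0, 0.1], [0.2, 0.0],  # genes 0 and 1
    [5.0, 5.1], [5.2, 5.0],  # genes 2 and 3
    [5.0, 9.0], [5.2, 9.1],  # genes 4 and 5
]
for k in (1, 2, 3):
    print(k, agglomerate(profiles, k))
```

Each value of K yields one partition of the same tree, so K can be chosen after the clustering has been computed.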

[Figure: the same dendrogram cut at three levels, giving 2, 4, and 6 clusters.]


The more clusters you ask for, the higher the similarity within each cluster.

http://discoveryexhibition.org/pmwiki.php/Entries/Seo2009


Hierarchical clustering results

http://www.spandidos-publications.com/10.3892/ijo.2012.1644

Unsupervised Clustering – K-means clustering

An algorithm that partitions the data into K groups.

[Figure: example with K=4]


How does it work?

  1. k initial "means" (in this case k=3) are randomly selected from the data set (shown in color).
  2. k clusters are created by associating every observation with the nearest mean.
  3. The centroid of each of the k clusters becomes the new mean.
  4. Steps 2 and 3 are repeated until convergence has been reached.

The algorithm iteratively divides the genes into K groups and calculates the center of each group. The result is a grouping into K clusters that is locally optimal with respect to the distances to the cluster centers; it can depend on the initial means.
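The four steps can be sketched in plain Python as follows. This is a minimal illustration, assuming Euclidean distance, a fixed random seed, and invented data; the function names are not taken from any tool mentioned in the tutorial. It needs Python 3.8 or later for `math.dist`.

```python
import math
import random

def nearest(point, means):
    """Index of the mean closest to the point (Euclidean distance)."""
    return min(range(len(means)), key=lambda i: math.dist(point, means[i]))

def centroid(points):
    """Coordinate-wise average of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def k_means(data, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    means = rng.sample(data, k)  # step 1: pick k random observations
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every observation to its nearest mean.
        new_labels = [nearest(p, means) for p in data]
        if new_labels == labels:  # step 4: assignments stopped changing
            break
        labels = new_labels
        # Step 3: the centroid of each cluster becomes the new mean.
        for i in range(k):
            members = [p for p, lab in zip(data, labels) if lab == i]
            if members:  # an empty cluster keeps its previous mean
                means[i] = centroid(members)
    return labels, means

# Hypothetical profiles forming two well-separated groups.
data = [[0.0, 0.0], [0.5, 1.0], [1.0, 0.2],
        [20.0, 20.0], [20.5, 21.0], [21.0, 20.2]]
labels, means = k_means(data, k=2)
print(labels)
print(means)
```

Because the starting means are random, different seeds can give different groupings on less clearly separated data, which is why k-means is often run several times.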


How should we determine K?

  • Trial and error
  • Take K as the square root of the number of genes

Tool for clustering - EPCLUST

http://www.bioinf.ebc.ee/EP/EP/EPCLUST/


Choose distance metric

Choose algorithm

K-means clustering

Samples found in cluster

Graphical representation of the cluster


Now what?

Now that we have clusters, we want to know the function of each group.

There is a need for some kind of generalization for gene functions.


Gene Ontology (GO)

http://www.geneontology.org/

The Gene Ontology project provides an ontology of defined terms representing gene product properties. The ontology covers three domains:

  • Cellular Component (CC) - the parts of a cell or its extracellular environment.
  • Molecular Function (MF) - the elemental activities of a gene product at the molecular level, such as binding or catalysis.
  • Biological Process (BP) - operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms.


GO sources (evidence codes)

  • ISS - Inferred from Sequence/Structural Similarity
  • IDA - Inferred from Direct Assay
  • IPI - Inferred from Physical Interaction
  • TAS - Traceable Author Statement
  • NAS - Non-traceable Author Statement
  • IMP - Inferred from Mutant Phenotype
  • IGI - Inferred from Genetic Interaction
  • IEP - Inferred from Expression Pattern
  • IC - Inferred by Curator
  • ND - No Data available
  • IEA - Inferred from Electronic Annotation


DAVID 

http://david.abcc.ncifcrf.gov/

Functional Annotation Bioinformatics Microarray Analysis

  • Identify enriched biological themes, particularly GO terms
  • Discover enriched functional-related gene/protein groups
  • Cluster redundant annotation terms
  • Explore gene names in batch 
annotation

classification

ID conversion

Functional annotation

Genes from your list involved in this category

Upload

Charts for each category


Minimum number of genes for corresponding term

Maximum EASE score / E-value

Genes from your list involved in this category

Enriched terms associated with your genes

Source of term

E-Value
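To give a feel for what an enrichment score measures, here is a minimal sketch of a one-sided hypergeometric (Fisher-type) test in plain Python. The EASE-like variant, which removes one gene from the list hits before testing, reflects the usual description of the EASE score as a more conservative modification of that test; treat it as an approximation rather than DAVID's exact computation. All counts are invented, and `math.comb` needs Python 3.8 or later.

```python
from math import comb

def enrichment_p_value(list_hits, list_size, pop_hits, pop_size):
    """P(X >= list_hits) when list_size genes are drawn at random from a
    population of pop_size genes, pop_hits of which carry the term."""
    total = comb(pop_size, list_size)
    upper = min(list_size, pop_hits)
    tail = sum(comb(pop_hits, x) * comb(pop_size - pop_hits, list_size - x)
               for x in range(list_hits, upper + 1))
    return tail / total

def ease_like_score(list_hits, list_size, pop_hits, pop_size):
    """Assumed EASE-style penalty: test with one fewer gene in the category."""
    return enrichment_p_value(max(list_hits - 1, 0), list_size,
                              pop_hits, pop_size)

# Hypothetical counts: 3 of 300 genes in the list carry a term that
# annotates 40 of 30000 genes in the background.
p = enrichment_p_value(3, 300, 40, 30000)
ease = ease_like_score(3, 300, 40, 30000)
print(p, ease)
```

A small p-value means the term appears in the gene list more often than chance would explain; the penalized score is larger, so terms supported by only a few genes are less likely to pass a threshold.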

Gene expression analysis

  • How to interpret an expression matrix
  • Expression data DBs - GEO
  • General clustering methods
  • Unsupervised clustering
      • Hierarchical clustering
      • K-means clustering
  • Tools for clustering - EPCLUST
  • Functional analysis - GO annotation