Expression analysis 2 PowerPoint Presentation
eden

Presentation Transcript

  1. Expression analysis 2 • Introduction to Bioinformatics • morten@binf.ku.dk

  2. Program • Jeppe Vinther • Array quality • Finding significantly expressed genes • Spreadsheet exercise • dChip exercise • Overrepresented gene sets • dChip exercise • Web exercise (DAVID) • Clustering • Distance measure exercise • Clustering in dChip exercise

  3. Array quality • Open the CEL image for MCF7-AV_b_A • Look for artefacts • Also check the other arrays

  4. Finding significant genes • Often a combination of: • P-value from a t-test (high variability requires more replicates) • Fold change • Demonstrate in dChip • You do it! • Take a look at the resulting spreadsheet
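The sketch below shows how the two criteria can be combined for a single gene. It assumes two groups of replicate expression values on a linear scale. To stay within the Python standard library, it replaces the parametric t-test p-value with an exact permutation test on Welch's t-statistic. The thresholds (p < 0.05, fold change of at least 2 in either direction) and the helper names are hypothetical, not dChip defaults.

```python
import itertools
import math
import statistics


def welch_t(a, b):
    """Welch's t-statistic for the difference in group means (unequal variances)."""
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0:
        # Both groups are constant: treat any nonzero difference as infinitely large.
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se


def permutation_p(a, b):
    """Exact two-sided p-value: fraction of all relabellings with |t| >= observed |t|."""
    pooled = list(a) + list(b)
    observed = abs(welch_t(a, b))
    hits = total = 0
    for idx in itertools.combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(welch_t(group_a, group_b)) >= observed:
            hits += 1
    return hits / total


def fold_change(a, b):
    """Ratio of group means on a linear expression scale."""
    return statistics.mean(a) / statistics.mean(b)


def is_significant(a, b, p_cutoff=0.05, min_fold=2.0):
    """Hypothetical rule: require a small p-value AND a large fold change (up or down)."""
    fc = fold_change(a, b)
    return permutation_p(a, b) < p_cutoff and max(fc, 1 / fc) >= min_fold
```

With 4 versus 4 replicates there are only 70 possible relabellings, so the smallest attainable two-sided p-value is 2/70 (about 0.029). With fewer replicates this test cannot reach small p-values at all, which is another reason to add replicates.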

  5. Putting genes into classes • What can we do with our list of genes? (Figure: our genes compared with classes drawn from all genes: angiogenesis, on Y-chr, tyrosine kinases, targeted to mitochondria, skeletal development, glycolysis, DNA replication, upregulated in brainstem)

  6. Gene ontology • Effort to categorize gene products using a controlled vocabulary • Three organising principles (example: cytochrome c): • Molecular function (oxidoreductase activity) • Biological process (oxidative phosphorylation, induction of cell death) • Cellular component (mitochondrial matrix, mitochondrial inner membrane)

  7. Organisation of GO • Example: Interleukin-12 • Directed acyclic graph • Note the GO IDs • Tools for finding overrepresented GO terms in a set of genes: • dChip • EASE • DAVID • …many more
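GO is a directed acyclic graph, so a gene annotated to a term is usually also counted under every ancestor of that term (the "true path" idea). This is why enrichment tools can report the same genes at several levels of the hierarchy. The minimal sketch below uses a tiny hypothetical DAG. Its term names and edges are illustrative only and are not the real GO structure or real GO IDs.

```python
def ancestors(term, parents):
    """All terms reachable by following parent edges upward from `term` (excluding itself)."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def propagate(annotations, parents):
    """Map each gene to its direct terms plus every ancestor of those terms."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in annotations.items()
    }


# Hypothetical toy DAG: unlike in a tree, a term may have several parents.
parents = {
    "inner membrane": ["membrane", "organelle part"],
    "membrane": ["cellular component"],
    "organelle part": ["cellular component"],
}
annotations = {"gene1": ["inner membrane"], "gene2": ["membrane"]}
full = propagate(annotations, parents)
```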

  8. Other classification schemes • GO • Pathways – the KEGG database • Protein domains (from PFAM) • Chromosomal location

  9. Overrepresentation exercises • dChip: "Classify genes" • Find overrepresented annotations among the upregulated genes (instructions in the handouts) • DAVID • Do the same here
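A common way to score overrepresentation is a one-sided hypergeometric (Fisher's exact) test. Suppose the array has N genes, K of them are in a category, and our list has n genes of which k are in that category. The test asks how likely it is to see k or more category genes by chance. Below is a standard-library sketch of that tail probability. Individual tools (dChip, EASE, DAVID) may use modified versions of this test and may add multiple-testing correction; the function name is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric: n genes drawn from N, of which K are in the category."""
    total = comb(N, n)
    # Sum the probabilities of seeing exactly i category genes, for i = k .. min(n, K).
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / total
```

For example, with hypothetical counts of 10000 genes on the array, 200 in a category, and an upregulated list of 100 genes, about 100 * 200 / 10000 = 2 category genes are expected by chance. `hypergeom_upper_tail(8, 100, 200, 10000)` then gives the probability of seeing 8 or more.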

  10. Clustering

  11. Why cluster? • To find genes that behave similarly • Perhaps they have a common regulator? • To find samples that are similar • E.g. discover subtypes of disease samples

  12. Have you seen these? Ring a bell? (Figure: clustered expression matrix) • 1 row = 1 expression vector • Similar rows are grouped, or clustered • Experiments can also be clustered

  13. Agglomerative clustering (figure: items a, b, c, d, e; merge steps 0 to 4) • Step 1: a and b are joined into cluster a,b

  14. Agglomerative clustering • Step 2: d and e are joined into cluster d,e

  15. Agglomerative clustering • Step 3: c joins d,e to form cluster c,d,e

  16. Agglomerative clustering • Step 4: a,b and c,d,e are joined into a,b,c,d,e … and the tree is constructed
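A naive agglomerative procedure can reproduce the merge sequence in these slides. Start with every item in its own cluster, then repeatedly join the two closest clusters. The sketch assumes average linkage, meaning the mean of all pairwise distances between two clusters, as in UPGMA. The slides do not say which linkage was used. The one-dimensional positions for a to e are hypothetical values chosen to give the same merge order.

```python
from itertools import combinations


def average_linkage(labels, dist):
    """Naive agglomerative clustering with average linkage.

    labels: item names; dist: dict mapping frozenset({x, y}) to the distance between x and y.
    Returns the merges in order as (cluster1, cluster2, linkage distance) tuples.
    """
    clusters = [(x,) for x in labels]

    def linkage(c1, c2):
        # Average of all pairwise distances between members of the two clusters.
        return sum(dist[frozenset((x, y))] for x in c1 for y in c2) / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters and merge them.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
    return merges


# Hypothetical 1-D positions for the five items in the slides.
positions = {"a": 0.0, "b": 0.5, "c": 4.5, "d": 6.0, "e": 6.8}
dist = {frozenset((x, y)): abs(positions[x] - positions[y])
        for x, y in combinations(positions, 2)}
merges = average_linkage(list(positions), dist)
```

The merge distances serve as branch heights when the tree (dendrogram) is drawn. In practice one would use a library routine, such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"`, rather than this quadratic-per-step loop.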

  17. Expression vectors • Each gene can be represented as a point in space • Dimension of the space = the number of different experiments

  18. Requirement for hierarchical clustering • A distance matrix! • Rings a bell from phylogeny?

  19. Distance measures • Euclidean metrics • Non-Euclidean metrics • Semimetric distances

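  20. Euclidean metric • For points (x1, y1) and (x2, y2), the legs of the right triangle are a = |x2 - x1| and b = |y2 - y1|, and by Pythagoras $a^2 + b^2 = c^2$, so the distance is $c = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$ • Generalised to n dimensions: $d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$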

  21. Requirements for a metric • Non-negative • Symmetric • Distance to self is zero • Triangle inequality
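Written out for a distance function d on expression vectors x, y and z, the four requirements read:

```latex
\begin{aligned}
d(\mathbf{x},\mathbf{y}) &\ge 0 && \text{(non-negative)}\\
d(\mathbf{x},\mathbf{y}) &= d(\mathbf{y},\mathbf{x}) && \text{(symmetric)}\\
d(\mathbf{x},\mathbf{x}) &= 0 && \text{(distance to self is zero)}\\
d(\mathbf{x},\mathbf{z}) &\le d(\mathbf{x},\mathbf{y}) + d(\mathbf{y},\mathbf{z}) && \text{(triangle inequality)}
\end{aligned}
```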

  22. Non-Euclidean metrics • Manhattan metric: $d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{n} |x_i - y_i|$

  23. Semimetric distance: correlation • Similarity is inversely related to distance • Distance = 1 - similarity measure, e.g. $d = 1 - r$ for the Pearson correlation $r$
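The distance d = 1 - r is called "semimetric" because it satisfies the first three metric requirements but can break the triangle inequality. The small check below uses three hypothetical expression vectors, each measured over three experiments, and a hand-written Pearson correlation (the helper names are illustrative).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def corr_distance(x, y):
    """Semimetric distance: 1 - Pearson correlation (0 = same profile shape, 2 = opposite)."""
    return 1 - pearson(x, y)


# Hypothetical expression profiles of three genes over three experiments.
x, y, z = (3, 2, 1), (3, 1, 2), (2, 1, 3)
direct = corr_distance(x, z)                       # r = -0.5, so d = 1.5
via_y = corr_distance(x, y) + corr_distance(y, z)  # r = 0.5 twice, so d = 0.5 + 0.5
print(direct > via_y)  # True: the detour through y is "shorter" than the direct route
```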

  24. Clustering of high dimensional data • Unsupervised learning of patterns in the data • Hierarchical clustering • K-means clustering • Self-organising maps

  25. Mini exercise • Calculate different distance measures in a spreadsheet
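The three distance measures from the preceding slides each take only a few lines of Python, which is handy for checking spreadsheet results. The sketch below uses two hypothetical expression vectors. Correlation distance ignores scale: a gene whose profile is a scaled copy of another's has correlation distance 0, even though its Euclidean and Manhattan distances are large.

```python
import math


def euclidean(x, y):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """City-block distance: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def correlation_distance(x, y):
    """1 - Pearson correlation between the two profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    norm = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return 1 - cov / norm


gene1 = [1.0, 2.0, 3.0, 4.0]  # hypothetical expression over 4 experiments
gene2 = [2.0, 4.0, 6.0, 8.0]  # same shape, twice the level
for f in (euclidean, manhattan, correlation_distance):
    print(f.__name__, round(f(gene1, gene2), 3))
```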

  26. Mini exercise • Try hierarchical clustering in dChip • Do points 11 and 12 in the handouts • Try using different distance measures • Try exporting branches of the tree (Clustering->export branch) and do functional classification of those • Walkthrough afterwards

  27. Other ways of grouping data points • Hierarchical clustering => builds a tree • K-means => partitions points into k groups • Self-organising maps (a.k.a. Kohonen maps) • Demo
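K-means (Lloyd's algorithm) alternates two steps. First, assign each point to its nearest centroid. Second, move each centroid to the mean of its assigned points. It repeats until the assignments stop changing. The minimal sketch below assumes Euclidean distance and, for reproducibility, uses the first k points as the initial centroids. Real implementations usually use random or k-means++ starts and several restarts. The data are hypothetical 2-D expression vectors.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centroids, labels). Initial centroids = first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster became empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, labels


# Hypothetical expression vectors forming two obvious groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (0.9, 1.1), (5.1, 4.9), (4.8, 5.2)]
centroids, labels = kmeans(points, k=2)
```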

  28. In the clinic

  29. Clinical goals • Improve the diagnostic categorization • Identify useful predictive markers for outcome and therapeutic response • Identify points for intervention: • Critical pathways • Drug targets • Approach: supervised learning

  30. Supervised learning

  31. (Figure) A training set of positive examples (ovarian cancers) and negative examples (not ovarian cancers) is used to train the "machine"; given an unknown sample, it answers "I think this is an ovarian cancer! (confidence is xxx)" • Machine learning methods: neural networks, linear discriminant analysis, k-nearest neighbours, support vector machines, …

  32. A typical (easy) sample set I

  33. A typical (easy) sample set II Easy to distinguish by one measurement per individual.

  34. A harder sample set I We can tell apples from oranges. But can we distinguish different kinds of apples?

  35. kNN, K = 4 • Of the 4 nearest neighbours of the unknown sample (?): • 3 are green • 1 is red • So we conclude that ? is green
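The voting rule on this slide can be sketched as follows, assuming Euclidean distance and a plain majority vote. The training points and labels are hypothetical. With an even k such as 4, a 2-2 tie is possible. Here `Counter.most_common` breaks a tie in favour of the label it counted first, which is the nearer neighbour's label. That is only one of several possible tie-breaking choices; an odd k avoids ties when there are two classes.

```python
import math
from collections import Counter


def knn_predict(train, query, k=4):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of (vector, label) pairs. Returns (label, fraction of votes).
    """
    # Sort training points by distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / k


# Hypothetical 2-D training data.
train = [((1.0, 1.0), "green"), ((1.5, 1.2), "green"), ((0.8, 1.4), "green"),
         ((2.0, 2.0), "red"), ((4.0, 4.0), "red"), ((4.5, 3.8), "red")]
print(knn_predict(train, (1.2, 1.3)))  # ('green', 0.75): 3 of the 4 nearest neighbours are green
```

The vote fraction is only a crude confidence score.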

  36. Performance of machine learning • How correctly does it predict known examples? • Beware of overtraining • Assess performance on data not used for training, e.g. by cross-validation (figure: error on the training set vs. error on the test set)
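A 1-nearest-neighbour classifier shows the gap between training error and test error clearly. On its own training set, every point (barring duplicates) is its own nearest neighbour, so the training error is always zero. That says nothing about new samples. Leave-one-out cross-validation instead predicts each sample from all the others. The sketch below uses hypothetical one-dimensional data with overlapping classes (the helper names are illustrative).

```python
def nn_predict(train, x):
    """1-nearest-neighbour prediction for a 1-D value x; train is a list of (value, label)."""
    return min(train, key=lambda item: abs(item[0] - x))[1]


def error_rate(train, test):
    """Fraction of test samples whose predicted label differs from the true label."""
    wrong = sum(nn_predict(train, x) != label for x, label in test)
    return wrong / len(test)


def loocv_error(data):
    """Leave-one-out cross-validation: predict each sample from all the remaining ones."""
    wrong = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        wrong += nn_predict(rest, x) != label
    return wrong / len(data)


# Hypothetical measurements with overlapping classes A and B.
data = [(0.7, "A"), (1.0, "A"), (1.2, "B"), (1.4, "A"), (3.0, "B"), (3.2, "B")]
print(error_rate(data, data))  # 0.0: training error looks perfect
print(loocv_error(data))       # 0.5: a more honest estimate for unseen samples
```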

  37. Microarray summary • Very powerful technology: measure all genes at once • Noise issues: lots of data → more possibilities for wrong data • Results are not the "truth" but hypotheses for testing • Statistical significance ≠ biological significance • Changes in the analysis will change the results • Important to try different things and use judgement • Test your hypotheses using different approaches; the more different, the better • You have only scraped the surface, so when faced with problems, seek assistance

  38. Other uses of microarrays • DNA targets • Copy number analysis • SNP detection • Tiling arrays • Whole genome for transcript mapping • Promoter regions for chromatin immunoprecipitation