Expression analysis 2


  1. Expression analysis 2 Introduction to Bioinformatics morten@binf.ku.dk

  2. Program • Jeppe Vinther • Array quality • Finding significantly expressed genes • Spreadsheet exercise • dChip exercise • Overrepresented gene sets • dChip exercise • Web exercise (DAVID) • Clustering • Distance measure exercise • Clustering in dChip exercise

  3. Array quality • Open the CEL-image for MCF7-AV_b_A • Look for artefacts • Also check the others

  4. Finding significant genes • Often a combination of • P-value from t-statistics • High variability requires more replicates • Fold change • Demonstrate in dChip • You do it! • Take a look at the resulting spreadsheet
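A minimal sketch of the combined filter, assuming nothing about how dChip computes it. The gene names, expression values and both cut-offs are hypothetical. Because the Python standard library has no t-distribution, the sketch thresholds the t-statistic itself instead of a P-value; turning it into a P-value would need a statistics package.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t-statistic for two groups of replicate measurements."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def log2_fold_change(a, b):
    """log2 of the ratio of group means (positive = higher in group a)."""
    return math.log2(statistics.mean(a) / statistics.mean(b))


# Hypothetical expression values: (treated replicates, control replicates).
genes = {
    "gene_A": ([820.0, 790.0, 845.0], [210.0, 190.0, 205.0]),  # large, consistent change
    "gene_B": ([400.0, 150.0, 650.0], [390.0, 160.0, 640.0]),  # highly variable, no change
}

T_CUTOFF = 4.0    # assumed cut-off on |t|, for illustration only
LFC_CUTOFF = 1.0  # assumed cut-off: at least a two-fold change

selected = []
for name, (treated, control) in genes.items():
    t = welch_t(treated, control)
    lfc = log2_fold_change(treated, control)
    # Keep a gene only if it passes BOTH the statistical and the fold-change filter.
    if abs(t) >= T_CUTOFF and abs(lfc) >= LFC_CUTOFF:
        selected.append(name)

print(selected)
```

The highly variable gene_B fails the t-statistic filter even though its individual replicates span a wide range, which is the reason high variability calls for more replicates.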

  5. Putting genes into classes • What can we do with our list of genes? • Figure: ”our genes” as a subset of ”all genes”, compared with example classes: angiogenesis, on the Y chromosome, tyrosine kinases, targeted to mitochondria, skeletal development, glycolysis, DNA replication, upregulated in brainstem

  6. Gene ontology • Effort to categorize gene products using a controlled vocabulary • Three organising principles (example: cytochrome c) • Molecular function (oxidoreductase activity) • Biological process (oxidative phosphorylation, induction of cell death) • Cellular component (mitochondrial matrix, mitochondrial inner membrane)

  7. Organisation of GO • Example: Interleukin-12 • Directed acyclic graph • Note the GOIDs • Tools for finding overrepresented GO terms in a set of genes • dChip • EASE • DAVID • …many more
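One common way to score overrepresentation is a one-sided hypergeometric test. The sketch below is an illustration of that idea only; it is not claimed to be the exact statistic used by dChip, EASE or DAVID, and the gene counts in the example are made up.

```python
from math import comb


def hypergeom_upper_tail(total, in_category, list_size, overlap):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from total genes, of which in_category carry the annotation."""
    max_overlap = min(in_category, list_size)
    favourable = sum(
        comb(in_category, k) * comb(total - in_category, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return favourable / comb(total, list_size)


# Hypothetical numbers: 10000 genes on the array, 200 annotated with a GO term,
# 100 genes in our upregulated list, 10 of which carry the term (2 expected by chance).
p = hypergeom_upper_tail(total=10000, in_category=200, list_size=100, overlap=10)
print(f"P-value: {p:.2e}")
```

When many GO terms are tested at once, the resulting P-values would normally need a multiple-testing correction before they are interpreted.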

  8. Other classification schemes • GO • Pathways – the KEGG database • Protein domains (from PFAM) • Chromosomal location

  9. Overrepresentation exercises • ”classify genes” in dChip • Find overrepresented annotation in upregulated genes. Instructions in the handouts • DAVID • Do the same here

  10. Clustering

  11. Why cluster? • To find genes that behave similarly • Perhaps they have a common regulator? • To find samples that are similar • E.g. discover subtypes of disease samples.

  12. Have you seen these? Ring a bell? • 1 row = 1 expression vector • Similar rows are grouped or clustered • Experiments can also be clustered

  13. Agglomerative clustering • Figure: steps 0 to 4 over the items a, b, c, d, e • a and b merge into a,b

  14. Agglomerative clustering • d and e merge into d,e

  15. Agglomerative clustering • c joins d,e to form c,d,e

  16. Agglomerative clustering • a,b and c,d,e merge into a,b,c,d,e … and the tree is constructed
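The merge sequence on these slides can be reproduced with a small sketch. It assumes single linkage (cluster distance = distance between the two closest members) and uses made-up one-dimensional positions for a to e; the slides do not say which linkage rule or data the figure is based on.

```python
from itertools import combinations


def single_linkage(points):
    """Agglomerative clustering of labelled 1-D points.

    Returns the list of merged clusters, in the order they were formed."""
    clusters = [frozenset([name]) for name in points]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters whose closest members are nearest to each other.
        best = min(
            combinations(clusters, 2),
            key=lambda pair: min(
                abs(points[x] - points[y]) for x in pair[0] for y in pair[1]
            ),
        )
        merged = best[0] | best[1]
        clusters = [c for c in clusters if c not in best] + [merged]
        merges.append(sorted(merged))
    return merges


# Hypothetical positions chosen so that the merges follow the slides.
points = {"a": 0.0, "b": 1.0, "c": 3.2, "d": 5.0, "e": 6.5}
for step, members in enumerate(single_linkage(points), start=1):
    print(step, ",".join(members))
```

Each pass through the loop merges exactly one pair, so five items need four steps, matching the 0 to 4 axis of the figure.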

  17. Expression vectors • Each gene can be represented as a point in space • Dimension of the space = the number of different experiments

  18. Requirement for hierarchical clustering • A distance matrix!! • Rings a bell from phylogeny?

  19. Distance measures • Euclidean metrics • Non-Euclidean metrics • Semimetric distances

  20. Euclidean metric • Right triangle with legs a and b and hypotenuse c between the points (x1,y1) and (x2,y2): $a^2 + b^2 = c^2$ • Generalised to n dimensions
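As a small illustration, the n-dimensional generalisation sums the squared differences over all coordinates before taking the square root. The function name is chosen here for the example.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two expression vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


print(euclidean((0, 0), (3, 4)))              # the 3-4-5 right triangle
print(euclidean((1, 2, 3, 4), (2, 2, 5, 4)))  # same formula in 4 dimensions
```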

  21. Requirements for a metric • Non-negative • Symmetric • Distance to self is zero • Triangle inequality
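The four requirements can be checked numerically on a handful of points. The sketch below is only a spot check on the points supplied, not a proof; the helper name and the two example distances are chosen for illustration. The squared difference is included because it passes the first three requirements but fails the triangle inequality.

```python
from itertools import product


def violated_requirements(dist, points, tol=1e-12):
    """Return the names of the metric requirements that dist breaks on points."""
    failed = set()
    for x, y in product(points, repeat=2):
        if dist(x, y) < -tol:
            failed.add("non-negative")
        if abs(dist(x, y) - dist(y, x)) > tol:
            failed.add("symmetric")
    for x in points:
        if abs(dist(x, x)) > tol:
            failed.add("distance to self is zero")
    for x, y, z in product(points, repeat=3):
        # Going via y must never be shorter than going directly from x to z.
        if dist(x, z) > dist(x, y) + dist(y, z) + tol:
            failed.add("triangle inequality")
    return failed


def absolute(x, y):
    return abs(x - y)


def squared(x, y):
    return (x - y) ** 2


points = [0.0, 1.0, 2.0, 5.0]
print(violated_requirements(absolute, points))
print(violated_requirements(squared, points))
```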

  22. Non-Euclidean metrics • Manhattan metric
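For comparison with the Euclidean case, a short sketch of the Manhattan (city-block) metric, which sums the absolute differences per coordinate instead of squaring them. The function name is chosen for the example.

```python
def manhattan(u, v):
    """Manhattan (city-block) distance between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(abs(a - b) for a, b in zip(u, v))


# Same pair of points as the 3-4-5 triangle: walking along the legs gives 3 + 4.
print(manhattan((0, 0), (3, 4)))
```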

  23. Semimetric distance - correlation • Similarity is inversely related to distance • Distance = 1 – similarity measure
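A minimal sketch of a correlation-based distance, assuming the similarity measure is the Pearson correlation coefficient (the slide does not say which correlation is meant). The example profiles are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient of two equally long profiles.

    Undefined (division by zero) if either profile is constant."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


def correlation_distance(u, v):
    """1 - similarity: 0 for perfectly correlated profiles, 2 for anti-correlated."""
    return 1.0 - pearson(u, v)


low = [1.0, 2.0, 3.0, 2.0]        # weakly expressed gene
high = [10.0, 20.0, 30.0, 20.0]   # same shape at ten times the level
mirrored = [3.0, 2.0, 1.0, 2.0]   # opposite behaviour

print(correlation_distance(low, high))
print(correlation_distance(low, mirrored))
```

Two different profiles with the same shape end up at distance zero, so this measure groups genes by pattern rather than by expression level, and it is one reason the distance is not a true metric.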

  24. Clustering of high dimensional data • Unsupervised learning of patterns in the data • Hierarchical clustering • K-means clustering • Self-organising maps

  25. Mini exercise • Calculate different distance measures in a spreadsheet

  26. Mini exercise • Try hierarchical clustering in dChip • Do points 11 and 12 in the handouts • Try using different distance measures • Try exporting branches of the tree (Clustering->export branch) and do functional classification of those • Walkthrough afterwards

  27. Other ways of grouping data points • Hierarchical clustering => builds a tree • K-means => partitions points into k groups • Self-organising maps (a.k.a. Kohonen maps) • demo
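To make the contrast with tree building concrete, here is a minimal k-means sketch (Lloyd's algorithm) with fixed starting centroids so that the result is reproducible. The points, the starting centroids and the iteration count are assumptions for illustration; real implementations usually pick the starting centroids at random and stop when the assignment no longer changes.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Partition points into len(centroids) groups.

    Returns the groups from the last assignment step and the final centroids."""
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda i: math.dist(p, centroids[i])
            )
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its old centroid).
        centroids = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else old
            for g, old in zip(groups, centroids)
        ]
    return groups, centroids


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
groups, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(groups)
print(centroids)
```

Unlike hierarchical clustering, the number of groups k has to be chosen in advance, and no tree is produced.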

  28. In the clinic

  29. Clinical goals • Improve the diagnostic categorization • Identify useful predictive markers for outcome and therapeutic response • Identify points for intervention: • critical pathways • drug targets Supervised learning

  30. Supervised learning

  31. Machine learning • Training set: positive examples (ovarian cancers) and negative examples (not ovarian cancers) • An unknown sample is given to the trained ”machine”, which answers: I think this is an ovarian cancer! (confidence is xxx) • Methods: neural networks, linear discriminant analysis, k-nearest neighbours, support vector machines, …

  32. A typical (easy) sample set I

  33. A typical (easy) sample set II Easy to distinguish by one measurement per individual.

  34. A harder sample set I We can tell apples from oranges. But can we distinguish different kinds of apples?

  35. kNN K=4 • Of the 4 nearest neighbours: • 3 are green • 1 is red • So we conclude that the unknown sample (?) is green
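The vote on this slide can be sketched as follows. The coordinates and labels are made up so that the four nearest neighbours of the query are three green and one red; Euclidean distance is assumed.

```python
import math
from collections import Counter


def knn_predict(training, query, k=4):
    """Classify query by majority vote among its k nearest training samples."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0], votes


# Hypothetical training samples: (coordinates, label).
training = [
    ((1.0, 1.0), "green"),
    ((1.2, 0.8), "green"),
    ((0.8, 1.3), "green"),
    ((1.5, 1.5), "red"),
    ((4.0, 4.0), "red"),
    ((4.2, 3.9), "red"),
    ((0.0, 4.0), "green"),
]

label, votes = knn_predict(training, query=(1.1, 1.1), k=4)
print(label, dict(votes))
```

With an even k such as 4, a 2 to 2 tie is possible; this sketch then simply returns whichever label was counted first, so an odd k is often preferred for two classes.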

  36. Performance of machine learning • How correctly does it predict known examples? • Beware of overtraining • Assess performance on data not used for training (test set, cross-validation) • Figure: error on training set vs. error on test set
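A small sketch of why the training error can mislead. It assumes a 1-nearest-neighbour classifier on made-up one-dimensional data and uses leave-one-out cross-validation as the estimate on data not used for training; any other classifier or validation scheme could be substituted.

```python
def nearest_label(training, x):
    """Label of the training sample closest to x (1-nearest neighbour)."""
    return min(training, key=lambda item: abs(item[0] - x))[1]


def error_rate(training, test):
    """Fraction of test samples that are misclassified."""
    wrong = sum(1 for x, label in test if nearest_label(training, x) != label)
    return wrong / len(test)


def leave_one_out_error(data):
    """Classify each sample using all the OTHER samples as the training set."""
    wrong = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if nearest_label(rest, x) != label:
            wrong += 1
    return wrong / len(data)


# Hypothetical measurements: (value, class); the classes overlap around 2.7-3.0.
data = [
    (1.0, "neg"), (1.4, "neg"), (2.1, "neg"), (2.7, "neg"),
    (3.0, "pos"), (3.4, "pos"), (4.2, "pos"), (4.9, "pos"),
]

print("error on training set:", error_rate(data, data))
print("leave-one-out error:  ", leave_one_out_error(data))
```

The classifier is perfect on the samples it has memorised, yet it misclassifies the two overlapping samples as soon as they are held out, which is the overtraining effect the slide warns about.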

  37. Microarray summary • Very powerful technology – measure all genes • Noise issues. Lots of data → more possibilities for wrong data • Results are not the ”truth” but hypotheses for testing • Statistical significance != biological significance • Change in analysis will change results • Important to try different things and use judgement • Test your hypothesis using different approaches – the more different the better. • You have only scratched the surface – so when faced with problems, seek assistance

  38. Other uses of microarrays • DNA targets • Copy number analysis • SNP detection • Tiling arrays • Whole genome for transcript mapping • Promoter regions for chromatin immunoprecipitation