
Biology-Driven Clustering of Microarray Data

K.R. Coombes, K.A. Baggerly, D.N. Stivers, J. Wang, D. Gold, H.G. Sung, and S.J. Lee

Applications to the NCI60 Data Set


Introduction

  • Microarray data is more than a large, unstructured matrix.

    • We already know many genes important for studying cancer through their involvement in specific biological processes

    • We also know that reproducible chromosomal abnormalities play an important role in cancer

  • Need analytical methods that use biological information early


Methods

  • First, updated the annotations of the genes on the microarray

  • Performed separate analyses

    • using genes on individual chromosomes

    • using genes involved in different biological processes

  • Developed ways to assess how well each set of genes classified samples


Quality of Annotations

  • Problem:

    • I.M.A.G.E. clone IDs and GenBank accession numbers are archival

    • UniGene clusters, gene names, descriptions, functions, etc., are changeable

  • Solution:

    • Download latest UniGene (build 137) and LocusLink to update annotations


How many genes on the array have good annotations?

Only trust the 7478 spots where the UniGene clusters match.



How do we determine the functions of genes?

  • UniGene -> LocusLink -> Gene Ontology

  • Gene Ontology is a structured, hierarchical vocabulary for describing gene functions in three broad areas:

    • biological process (why)

    • molecular function (what)

    • cellular component (where)
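
The lookup chain above can be pictured as two table lookups in a row. The following is only a sketch: the function name, the dictionary layout, and every identifier in the toy tables are hypothetical placeholders, not real UniGene, LocusLink, or Gene Ontology entries, and the slides do not say how the authors implemented this step.

```python
def go_terms_for_cluster(unigene_id, unigene_to_locus, locus_to_go):
    """Follow UniGene -> LocusLink -> Gene Ontology for one cluster.

    Returns an empty list when either lookup fails, so spots without
    a complete annotation chain are easy to filter out.
    """
    locus_id = unigene_to_locus.get(unigene_id)
    if locus_id is None:
        return []
    return locus_to_go.get(locus_id, [])


# Toy tables with placeholder identifiers (assumptions for illustration).
unigene_to_locus = {"Hs.PLACEHOLDER_1": "LOCUS_A"}
locus_to_go = {
    "LOCUS_A": [
        ("biological process", "cell cycle"),
        ("cellular component", "nucleus"),
    ]
}

terms = go_terms_for_cluster("Hs.PLACEHOLDER_1", unigene_to_locus, locus_to_go)
processes = [name for area, name in terms if area == "biological process"]
```

Keeping the area alongside each term makes it simple to select only the biological process categories when building gene sets.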



Data Preprocessing

  • Remove spots with poor annotations and spots with median intensity below the 97th percentile of empty spots.

  • Normalize each array so the median log ratio between channels is zero (equivalently, the median ratio is one)

  • Center each gene so mean log ratio across experiments is zero

  • Use (1-correlation)/2 as distance metric
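
The normalization, centering, and distance steps above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names and the toy matrix are hypothetical, rows are assumed to be genes and columns arrays, and the normalization target is assumed to be a median log ratio of zero (a median ratio of one). The spot-filtering step is omitted.

```python
import math
from statistics import mean, median


def normalize_arrays(log_ratios):
    """Shift each array (column) so that its median log ratio is zero."""
    n_arrays = len(log_ratios[0])
    medians = [median(row[j] for row in log_ratios) for j in range(n_arrays)]
    return [[row[j] - medians[j] for j in range(n_arrays)] for row in log_ratios]


def center_genes(log_ratios):
    """Shift each gene (row) so that its mean log ratio across arrays is zero."""
    centered = []
    for row in log_ratios:
        m = mean(row)
        centered.append([x - m for x in row])
    return centered


def correlation_distance(x, y):
    """(1 - Pearson correlation) / 2, which lies between 0 and 1.

    Undefined (division by zero) if either vector is constant.
    """
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (1 - sxy / math.sqrt(sxx * syy)) / 2


# Toy log-ratio matrix: 3 genes (rows) by 3 arrays (columns), made-up values.
toy = [[1.0, 2.0, 4.0], [2.0, 2.5, 1.0], [3.0, 1.0, 2.0]]
prepared = center_genes(normalize_arrays(toy))

# Distance between the first two samples (columns) of the prepared matrix.
sample0 = [row[0] for row in prepared]
sample1 = [row[1] for row in prepared]
d01 = correlation_distance(sample0, sample1)
```

Dividing by two rescales the usual 1 - correlation distance from the range 0 to 2 onto 0 to 1, so perfectly correlated samples are at distance 0 and perfectly anti-correlated samples at distance 1.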


How well does a set of genes distinguish types of cancer?

  • Three methods for assessment:

    • Qualitative (PCA, MDS)

    • Quantitative (PCA + ANOVA)

    • Semi-quantitative (Grading Dendrograms)
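
For the quantitative assessment, the slides do not spell out how PCA and ANOVA are combined. One plausible reading, sketched here as an assumption, is to take each sample's score on a principal component and run a one-way ANOVA with cancer type as the factor: a large F statistic means the component separates the types well. The function names and the scores below are made up for illustration, and sample labels are assumed to have the form `type.cellline`.

```python
from statistics import mean


def group_by_type(labels, scores):
    """Group per-sample scores by the cancer type prefix of each label."""
    groups = {}
    for label, score in zip(labels, scores):
        groups.setdefault(label.split(".")[0], []).append(score)
    return groups


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of numbers.

    Needs at least two groups and some within-group variation;
    otherwise a division by zero occurs.
    """
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up principal component scores for six samples.
labels = ["colon.ht29", "colon.km12", "colon.hct15",
          "leukemia.k562", "leukemia.hl60", "leukemia.molt4"]
scores = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]

f_stat = anova_f(list(group_by_type(labels, scores).values()))
```

Converting the F statistic to a p-value requires the F distribution, which is left out here to keep the sketch free of third-party dependencies.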




How good is a dendrogram?

[Figure: dendrogram of the NCI60 cell lines; height axis from 0.0 to 0.6]

  • A = cluster contains all and only one kind of cancer

  • B = all, with extras

  • C = all except one

  • D = all except one, with extras

  • E = all except two

  • F = all except two, with extras
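
The grading scale above can be written as a small lookup on two quantities: how many samples of the cancer type are missing from the cluster, and whether the cluster contains samples of other types. The sketch below is an illustration under assumptions: the function name is hypothetical, labels are assumed to have the form `type.cellline`, and returning `None` for clusters missing more than two samples is a choice made here, since the slide does not define a grade for that case. How the authors picked which cluster of a dendrogram to grade is not stated and is not modeled.

```python
def grade_cluster(cluster, cancer_type, all_samples):
    """Grade one cluster for one cancer type on the A to F scale.

    cluster:      sample labels in the cluster being graded
    cancer_type:  type prefix, for example "colon"
    all_samples:  labels of every sample in the dendrogram
    """
    def kind(label):
        return label.split(".")[0]

    total = sum(1 for s in all_samples if kind(s) == cancer_type)
    inside = sum(1 for s in cluster if kind(s) == cancer_type)
    missing = total - inside                 # samples of this type left out
    has_extras = len(cluster) > inside       # samples of other types included

    grades = {
        (0, False): "A", (0, True): "B",
        (1, False): "C", (1, True): "D",
        (2, False): "E", (2, True): "F",
    }
    # None means the cluster misses more than two samples (ungraded here).
    return grades.get((missing, has_extras))


samples = ["colon.ht29", "colon.km12", "colon.hct15",
           "leukemia.k562", "leukemia.hl60"]

grade_all = grade_cluster(["colon.ht29", "colon.km12", "colon.hct15"],
                          "colon", samples)
grade_extra = grade_cluster(["colon.ht29", "colon.km12", "colon.hct15",
                             "leukemia.k562"], "colon", samples)
```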



Heterogeneity of different types of cancer

  • Some cancers (colon, leukemia) are fairly easy to distinguish from others

  • Some (breast, lung) are so heterogeneous as to be almost impossible to distinguish

  • Some chromosomes (1, 2, 6, 7, 9, 12, 17) can distinguish many cancers.

  • Some (16, 21) are essentially random


[Figure: dendrogram of the NCI60 cell lines; height axis from 0.0 to 0.6]


[Figure: dendrogram of the NCI60 cell lines; height axis from 0.0 to 0.6]


Can cancers be distinguished by genes of one function?

  • Table for functional categories looks a lot like the table for chromosomes

  • Some biological process categories (signal transduction, cell proliferation, cell cycle, protein metabolism) can distinguish many types of cancer

  • Others (apoptosis, energy pathways) cannot


[Figure: dendrogram of the NCI60 cell lines; height axis from 0.0 to 0.6]


[Figure: dendrogram of the NCI60 cell lines; height axis from 0.0 to 0.6]


[Figure: dendrogram of the NCI60 cell lines; height axis from 0.0 to 0.6]


[Figure: dendrogram of the NCI60 cell lines; height axis from 0.0 to 0.8]


Conclusions (I)

  • Multiple views into the data provide substantial insight into differences in cancer types and gene sets.

  • Cancer types differ greatly in their degree of heterogeneity, ranging from homogeneous (colon, leukemia) through moderately heterogeneous (renal, melanoma) to extremely heterogeneous (breast and lung).


Conclusions (II)

  • Homogeneous cancers exhibit strong identifying signals across most views of the data.

  • There are large differences in the ability of genes on different chromosomes, or involved in different biological processes, to distinguish cancer types.


Supplementary Material

Complete results of each analysis by chromosome and by function are available on our web site:

http://www.mdanderson.org/depts/cancergenomics

