Robust methodologies for partition clustering
Paulo Lisboa, Terence Etchells, Ian Jarman and Simon Chambers




Overview

  • Partition clustering - critique

  • Decomposition of the covariance matrix

  • Landscape mapping of cluster solutions

  • Validation for two synthetic data sets and metabolic sub-typing


Bioinformatics: Nottingham Tenovus Primary Breast Carcinoma Series

Consecutive series of 1,944 cases of primary operable invasive breast cancer (n=1,076 with all markers present)

Patients presenting during 1986-98

Protein expression comprising 25 immunohistochemical markers related to tumour malignancy, derived through high-throughput protein expression using tissue microarrays (TMA)

Abd El-Rehim et al, Int J Cancer, 116, 340-350, 2005.


Partition clustering – relevance to bioinformatics

[Figure labels: p53, CK 5/6, C-erbB-2, BRCA1, ER, PgR]


Partition clustering – open issues

K-means

i. Assume #K

ii. Initialise #N?

iii. Sort by optimality?

iv. Select best for #K?

v. Select #K(s)?

vi. Single cluster or ensemble?

  • Identify a suitable algorithm:

  • Model-based or model-free?

  • Hierarchical, K-means, PAM?

  • Return {Sa,...,Sz} solutions

  • Validate & interpret each solution
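The initialisation questions above (ii to iv) can be illustrated with a minimal sketch. It assumes plain Lloyd's K-means on 2-D points and uses the within-cluster sum of squares as the optimality measure. The function names, the toy data and the choice of 10 restarts are illustrative assumptions, not taken from the slides.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, seed, max_iter=100):
    """Lloyd's K-means from one random initialisation; returns (labels, ssq)."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)          # initial centres: k distinct data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        new_labels = [min(range(k), key=lambda c: dist2(p, centres[c]))
                      for p in points]
        if new_labels == labels:
            break                            # assignments stable: converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                      # keep the old centre if a cluster empties
                centres[c] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    ssq = sum(dist2(p, centres[lab]) for p, lab in zip(points, labels))
    return labels, ssq

# Toy data (hypothetical): three Gaussian blobs in 2-D, 30 points each.
data_rng = random.Random(0)
points = [(data_rng.gauss(cx, 1.0), data_rng.gauss(cy, 1.0))
          for cx, cy in [(0, 0), (5, 0), (0, 5)] for _ in range(30)]

# N initialisations for a fixed K, sorted from best to worst objective.
runs = sorted((kmeans(points, k=3, seed=s) for s in range(10)),
              key=lambda run: run[1])
best_labels, best_ssq = runs[0]
```

Sorting the restarts by objective answers questions iii and iv only in the narrow sense of "lowest sum of squares". Whether that solution is also reproducible across restarts is a separate question.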


Separation index: decomposition of the scatter matrix

[Figure labels: SW1, SW2, SB]

  • Scatter matrices
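The formulas on this slide did not survive in the transcript. For reference, the standard definitions of the within-cluster, between-cluster and total scatter matrices are sketched below. The notation (points $x_i$, cluster $C_k$ with mean $m_k$ and size $n_k$, overall mean $m$) is assumed here and may differ from the authors'.

```latex
S_{W_k} = \sum_{i \in C_k} (x_i - m_k)(x_i - m_k)^{T}, \qquad
S_W = \sum_{k=1}^{K} S_{W_k}

S_B = \sum_{k=1}^{K} n_k \, (m_k - m)(m_k - m)^{T}

S_T = \sum_{i=1}^{N} (x_i - m)(x_i - m)^{T} = S_W + S_B
```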


  • Invariant separation matrix and index
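The slide's own definition of the separation matrix and index is not in the transcript. As a hedged illustration, the sketch below computes the standard scatter matrices for 2-D data and assumes one common invariant criterion, J = trace(S_W^{-1} S_B), which is unchanged by any nonsingular linear transformation of the data. The helper names and toy clusters are hypothetical, and the authors' index may be defined differently.

```python
def mean2(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter(points, centre, weight=1.0):
    """weight * sum of (p - centre)(p - centre)^T over points, as a 2x2 nested list."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = (p[0] - centre[0], p[1] - centre[1])
        for i in range(2):
            for j in range(2):
                s[i][j] += weight * d[i] * d[j]
    return s

def add(a, b):
    """Element-wise sum of two 2x2 matrices."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def scatter_matrices(clusters):
    """clusters: list of lists of (x, y). Returns (S_W, S_B, S_T)."""
    all_points = [p for c in clusters for p in c]
    m = mean2(all_points)
    s_w = [[0.0, 0.0], [0.0, 0.0]]
    s_b = [[0.0, 0.0], [0.0, 0.0]]
    for c in clusters:
        m_k = mean2(c)
        s_w = add(s_w, scatter(c, m_k))                    # within-cluster part
        s_b = add(s_b, scatter([m_k], m, weight=len(c)))   # between-cluster part
    return s_w, s_b, scatter(all_points, m)

def separation_index(s_w, s_b):
    """Assumed index J = trace(S_W^{-1} S_B), using the explicit 2x2 inverse."""
    det = s_w[0][0] * s_w[1][1] - s_w[0][1] * s_w[1][0]
    inv = [[s_w[1][1] / det, -s_w[0][1] / det],
           [-s_w[1][0] / det, s_w[0][0] / det]]
    return sum(inv[i][k] * s_b[k][i] for i in range(2) for k in range(2))

# Toy clusters (hypothetical data).
clusters = [[(0, 0), (1, 0), (0, 1), (1, 1)],
            [(4, 4), (5, 4), (4, 5), (5, 6)]]
s_w, s_b, s_t = scatter_matrices(clusters)
j = separation_index(s_w, s_b)

# The same data after a nonsingular linear map (x, y) -> (2x + y, 3y).
mapped = [[(2 * x + y, 3 * y) for x, y in c] for c in clusters]
j_mapped = separation_index(*scatter_matrices(mapped)[:2])
```

Here `j` and `j_mapped` agree up to rounding, which is the invariance property this form of the index is chosen for. The explicit inverse fails when S_W is singular, so a general implementation would need a guard or a projection step.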


N.B. If |S_T| = 0 → project onto the subspace of cohort means

[Figure labels: a1, a2, a3]


Theorem: is invariant to dimensionality reduction under Mahalanobis rotations

[Figure labels: ã1, ã2, ã3]


K-means clustering


Adaptive Resonance Theory (ART) clustering



Concordance measure


Optimality principle

Reproducibility under repeated initialisations, with:

  • Best separation: max(J)

  • Best concordance: max(CV)

i. N initialisations

ii. Sort by J

iii. Select top p%

iv. Calculate pairwise CV

v. Retain med(CV)

vi. Plot (J, med_CV)
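Steps i to vi can be sketched as follows. The transcript does not define CV, so this example assumes it is Cramér's V computed between pairs of label vectors. It also assumes that med(CV) means, for each retained solution, the median of its CV against the other retained solutions. The function names and the toy runs are hypothetical.

```python
import math
from collections import Counter
from statistics import median

def cramers_v(a, b):
    """Cramér's V between two equal-length label vectors (assumed concordance measure)."""
    n = len(a)
    joint = Counter(zip(a, b))
    rows, cols = Counter(a), Counter(b)
    chi2 = 0.0
    for r, n_r in rows.items():
        for c, n_c in cols.items():
            expected = n_r * n_c / n
            chi2 += (joint.get((r, c), 0) - expected) ** 2 / expected
    dof = min(len(rows), len(cols)) - 1
    return math.sqrt(chi2 / (n * dof)) if dof > 0 else 0.0

def landscape_points(runs, top_fraction=0.5):
    """runs: list of (J, labels), one per initialisation.

    Returns [(J, median CV)] for the top fraction of runs ranked by J.
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to compare")
    ranked = sorted(runs, key=lambda run: run[0], reverse=True)    # ii. sort by J
    keep = max(2, math.ceil(top_fraction * len(ranked)))           # iii. top p%
    top = ranked[:keep]
    points = []
    for i, (j_i, labels_i) in enumerate(top):
        cvs = [cramers_v(labels_i, labels_k)                       # iv. pairwise CV
               for k, (_, labels_k) in enumerate(top) if k != i]
        points.append((j_i, median(cvs)))                          # v. retain med(CV)
    return points                                                  # vi. points to plot

# Toy runs (hypothetical): the first two are the same partition with relabelled clusters.
demo_runs = [
    (3.0, [0, 0, 1, 1, 2, 2]),
    (2.9, [1, 1, 2, 2, 0, 0]),
    (2.0, [0, 1, 0, 1, 0, 1]),
    (1.0, [0, 0, 0, 1, 1, 1]),
]
demo_points = landscape_points(demo_runs, top_fraction=0.5)
```

Because Cramér's V is computed from the contingency table, it does not depend on how clusters are numbered, so relabelled copies of the same partition score as fully concordant. A solution that is both well separated and reproducible lies towards the top right of the (J, med_CV) plot.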


Synthetic data (10 cohorts)



Synthetic data – mixing structure (Sammon Map)


Synthetic data – Visualisation in data space


Synthetic data (10 cohorts)

[Figure: diagram annotated with counts between 1 and 978; its layout is not recoverable from the transcript]


Synthetic data (10 cohorts)

[Figure panels: Max J, SeCo, Max Cv]


Bioinformatics: Nottingham Tenovus Primary Breast Carcinoma Series


Marginal distributions


Landscape map (SeCo)


Stability index (Cv)


Landscape map (SeCo)


Cluster hierarchy (1)

[Figure: cluster hierarchy diagram. Nodes carry a cluster label and size (for example "C1, 781" and "C2, 295"), interleaved with unlabelled counts; the tree layout is not recoverable from the transcript]

Cluster hierarchy (2)

[Figure: cluster hierarchy diagram. Nodes carry a cluster label and size (for example "C1, 781" and "C2, 295"), interleaved with unlabelled counts; the tree layout is not recoverable from the transcript]


Solution A

Solution B

Solution A


Sub-type profiling

[Figure labels: Clusters A, Clusters B, Luminal New 2, Luminal N]

Sub-type profiling

[Figure labels: Clusters A, Clusters B, Luminal A, HER2]

Sub-type profiling

[Figure labels: Clusters A, Clusters B, Basal p53 -, Basal muc1 +, Basal p53 +, Basal muc1 -]


Consistency with consensus clustering


Molecular sub-typing


Summary

  • Partition clustering - critique

  • Decomposition of the covariance matrix

  • Landscape mapping of cluster solutions

  • Validation for two synthetic data sets and metabolic sub-typing


Ferrara data (n=633)

[Figure labels: JMU Cluster 1/5, JMU Cluster 2/5, JMU Cluster 3/5, JMU Cluster 4/5, JMU Cluster 5/5]

