Predicting patterns of biological performance using chemical substructure features

Diego Borges-Rivera

08/04/08

Introduction
  • cheminformatics: lets us describe similarity between compounds computationally
  • synthetic chemists: describe similarity through visual inspection
  • we will describe compounds by the presence of chemical substructures
  • we will attempt to identify sets of substructures that predict biological performance
Previous work
  • Clemons/Kahne/Wagner et al. -- disaccharide profiling in multiple cell states
  • found sets of substructures relevant to biological activity patterns
  • those substructures were highly specific to disaccharides

[Figure: plot with an axis labeled "substructures", ticks from 10 to 60]
Biological performance profile
  • 400 compounds, 8 assays in duplicate
  • tested for cell proliferation in 8 different cell lines
  • class labels are active (A) or inactive (I)

[Figure: example of an active compound]

What are fingerprints?
  • the compound collection was fed into commercial software
  • each substructure = 1 bit
  • a compound's fingerprint shows which substructures are present in it

[Figure: example substructures #7017, #886, and #1725]
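
The "one bit per substructure" idea can be sketched in Python as follows. This is a minimal illustration, not the output format of the commercial software mentioned above: the function name `make_fingerprint`, the 0-based indexing, and the tiny 8-bit example are all assumptions made for the sketch.

```python
def make_fingerprint(present_substructures, n_bits):
    """Return a list of 0/1 bits; bit i is 1 if substructure i is present.

    Assumption: substructures are numbered 0 .. n_bits - 1.
    """
    bits = [0] * n_bits
    for index in present_substructures:
        bits[index] = 1
    return bits


# Hypothetical compound containing substructures 2 and 5 out of 8 possible ones.
fingerprint = make_fingerprint({2, 5}, n_bits=8)
print(fingerprint)
```

In the real data set each fingerprint would have one bit for every substructure the software enumerates, rather than 8.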

Overview of cheminformatic methods
  • produced fingerprints: 7700 substructures in total
  • filtered the set
  • this left 2166 substructures
Overview of computational methods
  • two steps, independent of each other:
    • feature (substructure) selection to find predictive subsets
    • evaluation of the methods for predictive value

ReliefF: substructure selection

[Figure: 2166 ReliefF substructure weights on a scale from -1 to +1, with the top 5 and bottom 5 substructures labeled]
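
How a ReliefF-style method arrives at one weight per substructure can be sketched as below. This is a simplified two-class variant written for illustration, not necessarily the implementation used in this work: it uses every compound as a reference instead of a random sample, ranks neighbors by Tanimoto similarity, and the names `relieff_weights`, `X`, `y`, and `k` are assumptions. Substructures that differ between a compound and its nearest neighbors of the other class gain weight; substructures that differ between a compound and its nearest neighbors of the same class lose weight, so each weight stays between -1 and +1.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two 0/1 fingerprints (0.0 if both are empty)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def relieff_weights(X, y, k):
    """Simplified two-class ReliefF-style weights, one per substructure."""
    n, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    for i in range(n):
        # Rank all other compounds from most to least similar to compound i.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: tanimoto(X[i], X[j]), reverse=True)
        hits = [j for j in others if y[j] == y[i]][:k]     # same class
        misses = [j for j in others if y[j] != y[i]][:k]   # other class
        for f in range(n_features):
            for j in hits:
                weights[f] -= abs(X[i][f] - X[j][f]) / (n * len(hits))
            for j in misses:
                weights[f] += abs(X[i][f] - X[j][f]) / (n * len(misses))
    return weights


# Hypothetical toy data: bit 0 separates the classes, bit 2 is constant.
X = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 1]]
y = ["A", "A", "I", "I"]
weights = relieff_weights(X, y, k=1)
print(weights)
```

In this toy example the class-separating bit receives the largest weight and the constant bit receives a weight of zero.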

Similarity between compounds
  • similarity between two fingerprints is measured with the Tanimoto coefficient
  • this is used twice:
    • in ReliefF
    • in kNN

Example:

Compound a: 0 0 1

Compound b: 1 0 1

Tanimoto coefficient = (bits set in both) / (bits set in either) = 1 / 2 = 0.5
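
The worked example above can be reproduced with a short Python sketch. The function name `tanimoto` and the choice to return 0.0 when neither fingerprint has any bit set are assumptions made here; the slides do not say how that edge case is handled.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two equal-length 0/1 fingerprints."""
    both = sum(1 for x, y in zip(a, b) if x and y)    # bits set in both
    either = sum(1 for x, y in zip(a, b) if x or y)   # bits set in either
    # Assumption: two all-zero fingerprints are given similarity 0.0.
    return both / either if either else 0.0


compound_a = [0, 0, 1]
compound_b = [1, 0, 1]
print(tanimoto(compound_a, compound_b))  # 0.5
```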

Cross-validation: predictive accuracy
  • the compounds are split into 10 subsets
  • test set: one of the subsets
  • training set: the remaining 9 subsets
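
The split described above can be sketched in Python as follows. This is an illustrative sketch only: the function name `k_fold_indices`, the random shuffling, and the fixed seed are assumptions, since the slides do not say how the 10 subsets were formed.

```python
import random


def k_fold_indices(n_items, n_folds=10, seed=0):
    """Yield (train, test) index lists; each subset is the test set once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # assumption: random assignment
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for t in range(n_folds):
        test = folds[t]
        train = [j for f, fold in enumerate(folds) if f != t for j in fold]
        yield train, test


# With 400 compounds, each round tests on 40 and trains on the other 360.
splits = list(k_fold_indices(400, n_folds=10))
for train, test in splits:
    print(len(train), len(test))
```

Predictive accuracy would then be the fraction of test-set compounds whose class label is predicted correctly, averaged over the 10 rounds.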
Picking parameters for methods
  • which parameters produce the best predictive accuracy?
    • number of neighbors used in ReliefF {1, 2, 4, etc.}
    • number of neighbors used in kNN {1, 2, 4, etc.}
    • number of ReliefF substructures used to predict classes in kNN {1, 20, 100, etc.}
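
One way to search over these three parameters is an exhaustive grid, sketched below. Everything here is an assumption made for illustration: the function names, the use of leave-one-out accuracy instead of 10-fold cross-validation (chosen only to keep the sketch short), the alphabetical tie-breaking in the vote, and the hypothetical weights and toy data. The ReliefF weights are taken as precomputed, one weight list per ReliefF neighbor count.

```python
def tanimoto(a, b):
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k most Tanimoto-similar training compounds."""
    ranked = sorted(range(len(train_X)),
                    key=lambda j: tanimoto(query, train_X[j]), reverse=True)
    votes = [train_y[j] for j in ranked[:k]]
    return max(sorted(set(votes)), key=votes.count)


def loo_accuracy(X, y, features, k):
    """Leave-one-out accuracy using only the selected substructure bits."""
    correct = 0
    for i in range(len(X)):
        train_X = [[row[f] for f in features]
                   for j, row in enumerate(X) if j != i]
        train_y = [label for j, label in enumerate(y) if j != i]
        query = [X[i][f] for f in features]
        correct += knn_predict(train_X, train_y, query, k) == y[i]
    return correct / len(X)


def pick_parameters(X, y, weights_by_relieff_k, knn_ks, n_substructures):
    """Return (accuracy, ReliefF k, kNN k, number of substructures)."""
    best = None
    for relieff_k, weights in weights_by_relieff_k.items():
        ranked = sorted(range(len(weights)),
                        key=lambda f: weights[f], reverse=True)
        for k in knn_ks:
            for n in n_substructures:
                acc = loo_accuracy(X, y, ranked[:n], k)
                if best is None or acc > best[0]:
                    best = (acc, relieff_k, k, n)
    return best


# Hypothetical toy data: bits 0 and 1 separate the classes, bits 2 and 3 do not.
X = [[1, 0, 0, 1], [1, 0, 1, 0], [1, 0, 1, 1],
     [0, 1, 0, 1], [0, 1, 1, 0], [0, 1, 1, 1]]
y = ["A", "A", "A", "I", "I", "I"]
weights_by_relieff_k = {1: [0.9, 0.8, -0.1, -0.2]}  # made-up weights
best = pick_parameters(X, y, weights_by_relieff_k,
                       knn_ks=[1, 3], n_substructures=[2, 4])
print(best)
```

A more careful evaluation would recompute the ReliefF weights inside each training set so that the test compounds never influence substructure selection.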
Picking number of substructures

[Figure: predictive accuracy (0.0 to 1.0) versus number of substructures used to predict (1, 20, all)]
Future work
  • multi-class prediction
  • different feature selection methods
Acknowledgements

Computational Chemical Biology

Joshua Gilbert

Paul Clemons

Hyman Carrinski

Summer Research Program in Genomics

Shawna Young

Lucia Vielma

Maura Silverstein