Leveraging Genetic Algorithm and Neural Networks in Automated Protein Crystal Recognition. Ming Jack Po and Andrew Laine, Department of Biomedical Engineering, Columbia University, New York, NY, USA. August 22nd, 2008. IEEE EMBS Annual Conference 2008, Vancouver, Canada.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Leveraging Genetic Algorithm and Neural Networks in Automated Protein Crystal Recognition' - chiku


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
leveraging genetic algorithm and neural networks in automated protein crystal recognition

Leveraging Genetic Algorithm and Neural Networks in Automated Protein Crystal Recognition

Ming Jack Po and Andrew Laine

Department of Biomedical Engineering

Columbia University

New York, NY USA

August 22nd 2008

IEEE EMBS Annual Conference 2008, Vancouver, Canada

Agenda
  • Introduction
  • Current Algorithm
  • Future Direction
Protein Structure Determination currently relies on X-ray crystallography
  • The production of protein crystals is crucial to protein structure determination via X-ray crystallography.
  • In 2000, the US National Institute of General Medical Sciences of the National Institutes of Health funded the Protein Structure Initiative (PSI), a ten-year project to uncover the three-dimensional shapes of a wide range of proteins (1).
  • Unfortunately, there is currently no reliable methodology to predict environments that would lead to protein crystallization.
    • High throughput experiments with varying crystallization parameters are being performed in order to “brute force” the problem.

1) http://www.nature.com/nmeth/journal/v5/n2/full/nmeth0208-203.html

HTP Protein Crystallization Screening is currently the bottleneck in protein crystal discovery
  • An extensive backlog of images has accumulated
    • 1536 Wells / Plate * 5K Plates * 6 time points ~ 46M Images*
  • Manual Inspection of images from HTP experiments is not practical
    • Qualified and trained crystallographers are in short supply.
    • Crystallographers cannot keep up with the speed of robotic systems used in production experiments.
  • Automated Protein Crystallization Screening is needed to handle both the existing backlog and future images

* Feb 2002 to October 2006 only
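
The backlog estimate above can be checked with a direct product, assuming "5K" means 5,000 plates:

```latex
\[
1536 \;\tfrac{\text{wells}}{\text{plate}} \times 5000 \;\text{plates} \times 6 \;\text{time points}
= 46{,}080{,}000 \approx 4.6 \times 10^{7} \;\text{images}
\]
```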

Several key challenges have to be overcome for automated protein crystal recognition
  • Arbitrary geometric orientation and structure of crystals
  • Presence of organic matter
  • Non-uniform lighting conditions
  • Irregular droplet boundaries and size.

[Figure: example images labeled "Hits"]

Our Solution to the problem – Neural Networks
  • Advantages
    • Allows for incremental learning
    • Can deal with the seemingly arbitrary geometric orientation and structure of crystals
    • Fast classification speed once neural net has been trained.
  • Disadvantages
    • Black-box methodology
    • Identification of good feature set necessary for good performance
    • Need sufficiently large training set to be robust
Training database has been compiled by HWI expert crystallographers
  • Dr. George DeTitta et al. at HWI (Buffalo, NY) have compiled a data set of 73,632 manually classified images.
    • Three independent crystallographers each categorized 75,000 images into one of a fixed set of categories.
    • 73,632 of these images have consensus between at least two crystallographers. Only these images were used for validation and training.
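
The consensus rule described above (keep an image only when at least two of the three crystallographers agree) can be sketched as follows. This is an illustrative sketch, not the authors' code: the category names and image identifiers are hypothetical, since the actual category list is not included in this transcript.

```python
from collections import Counter

def consensus_label(labels, min_agree=2):
    """Return the label chosen by at least `min_agree` annotators, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agree else None

# Hypothetical category names and image IDs, for illustration only.
annotations = {
    "img_001": ["crystal", "crystal", "precipitate"],   # two agree: kept
    "img_002": ["clear", "precipitate", "crystal"],     # no agreement: dropped
    "img_003": ["clear", "clear", "clear"],             # all agree: kept
}

# Keep only images with a consensus label.
kept = {}
for image_id, labels in annotations.items():
    label = consensus_label(labels)
    if label is not None:
        kept[image_id] = label
```

Images without a two-way agreement are simply excluded, which matches the statement that only consensus images were used for training and validation.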
Agenda
  • Introduction
  • Current Algorithm
  • Future Direction
Pre-Processing Steps
  • Stages shown in the slide diagram: Image Normalization; MPGA 1 (ROI Detection); MPGA 2 (Area of Crystals); Linearity Detector; Laplacian Pyramidal Decomposition; Feature Extraction.
  • Images are converted to Sobel edge sets, and single (isolated) edge points are removed.
  • A multi-population genetic algorithm (MPGA) is run on the image to find the ellipsoidal Region of Interest (ROI); this is elaborated upon on the next few slides.
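
The first pre-processing step can be sketched in plain Python as below. This is a minimal illustration under stated assumptions: the image is a list of rows of grayscale values, the gradient-magnitude `threshold` is a free parameter, and a "single edge point" is taken to mean an edge pixel with no edge pixel among its 8 neighbours. The function names are hypothetical, and the original implementation (C++ with IT++) may differ.

```python
def sobel_edges(img, threshold):
    """Return a binary edge map (rows of 0/1) using 3x3 Sobel gradients."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):            # border pixels are left as non-edges
        for x in range(1, w - 1):
            gx = (img[y-1][x+1] + 2*img[y][x+1] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y][x-1] - img[y+1][x-1])
            gy = (img[y+1][x-1] + 2*img[y+1][x] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y-1][x] - img[y-1][x+1])
            if (gx*gx + gy*gy) ** 0.5 >= threshold:
                edges[y][x] = 1
    return edges

def remove_isolated(edges):
    """Drop edge pixels that have no edge pixel among their 8 neighbours."""
    h, w = len(edges), len(edges[0])
    out = [row[:] for row in edges]
    for y in range(h):
        for x in range(w):
            if not edges[y][x]:
                continue
            neighbours = sum(
                edges[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 0:
                out[y][x] = 0
    return out

# A vertical step edge: dark columns on the left, bright on the right.
step = [[0, 0, 10, 10, 10] for _ in range(5)]
edge_map = remove_isolated(sobel_edges(step, threshold=20))
```

The surviving edge pixels are the candidate points from which the genetic algorithm later draws its 5-point chromosomes.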
Multiple Population Genetic Algorithm
  • Randomly select 100 “chromosomes” of 5 points.
  • Fitness based on similarity and distance metric.
    • Similarity and distance definitions: [equations not captured in this transcript]
  • Evolution proceeds through selection and diversification.
    • Optimize for high fitness score based on a combination of similarity and distance scores.
    • Selection eliminates low-fitness populations.
    • Diversification is realized through crossover, mutation and clustering.
  • Significant speed and accuracy improvements vs. Randomized Hough Transforms.
    • Processing time dropped 50% to ~ 10 seconds for ROI detection.

Yao, J., Kharma, N., and Grogono, P., "A multi-population genetic algorithm for robust and fast ellipse detection", Pattern Analysis & Applications, Volume 8, Issue 1-2, Sep 2005, pp. 149-162.
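
The evolutionary loop outlined on this slide can be sketched as a small skeleton. Everything here is an assumption made for illustration: the 100 chromosomes are split into 4 sub-populations of 25, the least-fit sub-population is reseeded each generation, crossover is one-point, and the clustering step is omitted. The `fitness` callable is supplied by the caller because the similarity and distance formulas are not included in this transcript; it is not the method of Yao et al.

```python
import random

def evolve(points, fitness, n_pops=4, pop_size=25, generations=30, seed=0):
    """Toy multi-population GA over chromosomes of 5 edge points."""
    rng = random.Random(seed)

    def new_chromosome():
        return tuple(rng.sample(points, 5))

    pops = [[new_chromosome() for _ in range(pop_size)] for _ in range(n_pops)]
    for _ in range(generations):
        # Selection across populations: reseed the least-fit sub-population.
        pops.sort(key=lambda pop: max(map(fitness, pop)), reverse=True)
        pops[-1] = [new_chromosome() for _ in range(pop_size)]
        for i, pop in enumerate(pops):
            pop.sort(key=fitness, reverse=True)
            parents = pop[: pop_size // 2]           # keep the fitter half
            children = []
            while len(parents) + len(children) < pop_size:
                a, b = rng.sample(parents, 2)
                cut = rng.randint(1, 4)
                child = list(a[:cut] + b[cut:])      # one-point crossover
                if rng.random() < 0.2:               # mutation: swap in a random point
                    child[rng.randrange(5)] = rng.choice(points)
                children.append(tuple(child))
            pops[i] = parents + children
    return max((c for pop in pops for c in pop), key=fitness)

# Toy usage: a placeholder fitness that prefers points near the origin.
grid = [(x, y) for x in range(10) for y in range(10)]
toy_fitness = lambda chrom: -sum(x + y for x, y in chrom)
best = evolve(grid, toy_fitness)
```

In a real detector the fitness would score the ellipse passing through the 5 points against the edge map; the toy fitness above only exercises the loop.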

Ellipsoidal Geometry
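  • The equation of a conic through 5 points can be written, with the constant term normalized to 1, as $ax^2 + 2hxy + by^2 + 2gx + 2fy + 1 = 0$.
    • This conic is an ellipse iff $ab - h^2 > 0$.
  • With 5 (x,y) pairs, it is possible to solve for the parameters (a,h,b,g,f), and thus in turn for the physically meaningful ellipse parameters (such as the center, r_maj, and r_min) shown on the slide.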
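
As a worked illustration of this step, the sketch below solves for (a, h, b, g, f) from five points and converts them to a center and semi-axes. It assumes the conic is written as a x^2 + 2h xy + b y^2 + 2g x + 2f y + 1 = 0, which requires that the conic does not pass through the origin; the function names are hypothetical and the orientation angle is left out for brevity.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point set")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def conic_through(points):
    """Return (a, h, b, g, f) for the conic through five (x, y) points."""
    A = [[x*x, 2*x*y, y*y, 2*x, 2*y] for x, y in points]
    return solve_linear(A, [-1.0] * 5)

def ellipse_params(a, h, b, g, f):
    """Return ((xc, yc), r_maj, r_min), or None if the conic is not an ellipse."""
    det = a*b - h*h
    if det <= 0:
        return None
    xc, yc = (h*f - b*g) / det, (g*h - a*f) / det   # center: gradient is zero
    c0 = g*xc + f*yc + 1.0                          # conic value at the center
    root = math.sqrt((a - b)**2 + 4*h*h)
    lam1, lam2 = (a + b + root) / 2, (a + b - root) / 2
    r1_sq, r2_sq = -c0 / lam1, -c0 / lam2
    if r1_sq <= 0 or r2_sq <= 0:
        return None
    r1, r2 = math.sqrt(r1_sq), math.sqrt(r2_sq)
    return (xc, yc), max(r1, r2), min(r1, r2)

# Five points sampled from an ellipse centered at (3, 2) with semi-axes 5 and 2.
pts = [(3 + 5*math.cos(t), 2 + 2*math.sin(t)) for t in (0.3, 1.1, 2.0, 3.5, 5.0)]
center, r_maj, r_min = ellipse_params(*conic_through(pts))
```

Chromosomes whose five points give `None` here (a hyperbola or parabola) would simply be treated as unfit candidates.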
MPGA is run twice due to variations in fitness criteria
  • Similarity and distance definitions: [equations not captured in this transcript]
  • Multiple population genetic algorithm allows for significantly faster and more robust search results than Randomized Hough Transform.
  • MPGA 1 – ROI Detection
    • Heavy distance penalties for points that do not line up exactly on the perimeter of the projected ellipse.
    • Looks for r_maj close to r_min (more circular shapes, such as droplets and wells).
    • r_maj and r_min are bounded at empirically determined values.
  • MPGA 2 – Crystal Detection
    • Only run inside ROI
    • Heavy distance penalties apply only to far-away points, which allows the ellipsoidal shape to be more “flexible”.
    • Looks for r_maj far from r_min (more elongated shapes, closer to crystals).
    • r_maj and r_min are bounded at no more than ½ of the ROI’s r_maj and r_min.
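
The contrast between the two runs can be illustrated with a small sketch of the shape preference and axis bounds. The scoring formula, the function names, and the numeric ROI values are all hypothetical; the slides do not give the actual fitness terms or the empirically determined bounds.

```python
def shape_score(r_maj, r_min, prefer_round):
    """Score in [0, 1]: high for circles in MPGA 1, high for elongated shapes in MPGA 2."""
    ratio = r_min / r_maj          # 1.0 for a circle, near 0 for a needle
    return ratio if prefer_round else 1.0 - ratio

def within_bounds(r_maj, r_min, max_maj, max_min):
    """Check that candidate axes respect the upper bounds for the current run."""
    return 0 < r_min <= r_maj <= max_maj and r_min <= max_min

# MPGA 2 bounds: at most half of the ROI's axes (the ROI values here are made up).
roi_r_maj, roi_r_min = 40.0, 30.0
crystal_ok = within_bounds(18.0, 4.0, roi_r_maj / 2, roi_r_min / 2)
too_large = within_bounds(25.0, 4.0, roi_r_maj / 2, roi_r_min / 2)
```

A term like `shape_score` would be combined with the similarity and distance scores; how the original weights them is not stated in the slides.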
Crystal Recognition Code Execution Speed

* Not scale invariant, and done on original scale

Performance for current algorithm
  • Performance metrics derived using 10% randomized holdout averaged over 3 iterations.
  • Current false negative rate ~ 10%.
    • Working to reduce this to below 5%, at a minimum, before putting it into production.*
    • Current false negatives are total misses, so they cannot be corrected through thresholding. They also show no intuitive visual correlation.
  • Current true negative rate ~ 99%.

* Conversations with John Hunt
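
The evaluation protocol (a 10% randomized holdout averaged over 3 iterations) can be sketched as below. The definitions used are the conventional ones, false negative rate = FN / (FN + TP) and true negative rate = TN / (TN + FP), which is an assumption about how the slide's figures were computed. `train_and_score` is a hypothetical callback standing in for training the network and scoring it on the held-out set.

```python
import random

def holdout_split(items, test_frac=0.10, rng=random):
    """Shuffle and split into (train, test) with roughly `test_frac` held out."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_frac))
    return shuffled[n_test:], shuffled[:n_test]

def error_rates(y_true, y_pred):
    """Return (false negative rate, true negative rate) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fnr = fn / (fn + tp) if fn + tp else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return fnr, tnr

def repeated_holdout(items, train_and_score, repeats=3, seed=0):
    """Average the (fnr, tnr) returned by `train_and_score(train, test)`."""
    rng = random.Random(seed)
    scores = [train_and_score(*holdout_split(items, rng=rng)) for _ in range(repeats)]
    return tuple(sum(col) / repeats for col in zip(*scores))

# Example: 4 crystal images (1) and 6 non-crystal images (0).
fnr, tnr = error_rates([1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                       [1, 1, 1, 0, 0, 0, 0, 0, 0, 1])
```

Averaging over several random splits reduces the dependence of the reported rates on any single partition of the data.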

Agenda
  • Introduction
  • Current Algorithm
  • Future Direction
Future Directions
  • Incremental Neural Network training has been implemented in Matlab.
    • Allows us to learn new crystal shapes and precipitates, with a negligible performance hit.
  • Porting the simulation portion of the network classifier to C++.
    • The current program consists of
      • Preprocessing done in C++ inside the IT++ framework
      • Neural network toolbox in Matlab
  • Currently working on making new training data sets.
    • Selectively biasing the training data set in order to increase accuracy.
  • Expansion of feature sets in order to improve false negative rates.

Bishop, C. Neural Networks for Pattern Recognition.

Acknowledgements
  • This project is part of the Northeast Structural Genomics Consortium (NESG) sponsored by the NIH for evaluating the feasibility, costs, economies of scale, and value of structural genomics.
  • Protein crystal images acquired from Hauptman-Woodward Medical Research Institute, Buffalo, NY.