1 / 29

Artificial Neural Networks Thomas Nordahl Petersen & Morten Nielsen

This article explores the use of artificial neural networks for predicting biological features such as secondary structure, signal peptides, surface accessibility, and propeptides. It discusses the similarities between biological and artificial neural networks and highlights the ability to generalize to new data. The article also discusses the encoding of amino acid sequences and the architecture of neural networks for prediction.

broom
Download Presentation

Artificial Neural Networks Thomas Nordahl Petersen & Morten Nielsen

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Artificial Neural Networks Thomas Nordahl Petersen &Morten Nielsen

  2. Use of artificial neural networks • A data-driven method to predict a feature, given a set of training data • In biology input features could be amino acid sequence or nucleotides • Secondary structure prediction • Signal peptide prediction • Surface accessibility • Propeptide prediction C N Signal peptide Propeptide Mature/active protein

  3. Neural network prediction methodshttp://www.cbs.dtu.dk/services/

  4. Pattern recognition

  5. Biological Neural network

  6. Biological neuron structure Terminal Several connections Neuron Synapse

  7. Diversity of interactions in a network enables complex calculations • Similar in biological and artificial systems • Excitatory (+) and inhibitory (-) relations • between compute units 1 fire 0

  8. Transfer of biological principles to artificial neural network algorithms • Non-linear relation between input and output • Massively parallel information processing • Data-driven construction of algorithms • Ability to generalize to new data items

  9. Sparse encoding of amino acid sequence windows

  10. Sparse encoding Inp Neuron 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 AAcid A 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 R 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 N 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 D 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 C 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Q 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 E 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0

  11. BLOSUM encoding (Blosum50 matrix) A R N D C Q E G H I L K M F P S T W Y V A 4 -1 -2 -2 0 -1 -1 0 -2 -1 -1 -1 -1 -2 -1 1 0 -3 -2 0 R -1 5 0 -2 -3 1 0 -2 0 -3 -2 2 -1 -3 -2 -1 -1 -3 -2 -3 N -2 0 6 1 -3 0 0 0 1 -3 -3 0 -2 -3 -2 1 0 -4 -2 -3 D -2 -2 1 6 -3 0 2 -1 -1 -3 -4 -1 -3 -3 -1 0 -1 -4 -3 -3 C 0 -3 -3 -3 9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1 Q -1 1 0 0 -3 5 2 -2 0 -3 -2 1 0 -3 -1 0 -1 -2 -1 -2 E -1 0 0 2 -4 2 5 -2 0 -3 -3 1 -2 -3 -1 0 -1 -3 -2 -2 G 0 -2 0 -1 -3 -2 -2 6 -2 -4 -4 -2 -3 -3 -2 0 -2 -2 -3 -3 H -2 0 1 -1 -3 0 0 -2 8 -3 -3 -1 -2 -1 -2 -1 -2 -2 2 -3 I -1 -3 -3 -3 -1 -3 -3 -4 -3 4 2 -3 1 0 -3 -2 -1 -3 -1 3 L -1 -2 -3 -4 -1 -2 -3 -4 -3 2 4 -2 2 0 -3 -2 -1 -2 -1 1 K -1 2 0 -1 -3 1 1 -2 -1 -3 -2 5 -1 -3 -1 0 -1 -3 -2 -2 M -1 -1 -2 -3 -1 0 -2 -3 -2 1 2 -1 5 0 -2 -1 -1 -1 -1 1 F -2 -3 -3 -3 -2 -3 -3 -3 -1 0 0 -3 0 6 -4 -2 -2 1 3 -1 P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4 7 -1 -1 -4 -3 -2 S 1 -1 1 0 -1 0 0 0 -1 -2 -2 0 -1 -2 -1 4 1 -3 -2 -2 T 0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 1 5 -2 -2 0 W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1 1 -4 -3 -2 11 2 -3 Y -2 -2 -2 -3 -2 -1 -2 -3 2 -1 -1 -2 -1 3 -3 -2 -2 2 7 -1 V 0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4

  12. Sequence encoding (continued) • Sparse encoding • V:0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 • L:0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 • V.L=0 (unrelated) • Blosum encoding • V: 0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4 • L:-1 -2 -3 -4 -1 -2 -3 -4 -3 2 4 -2 2 0 -3 -2 -1 -2 -1 1 • V.L = 0.88 (highly related) • V.R = -0.08 (close to unrelated)

  13. I1 I2 I3 Input w1,2 w1,1 w3,1 h1 h2 hidden v1,1 v2,1 O1 output h1  h1  = 1/ (1+e-x) o=H1*v1,1 + H2*v2,1 O1 = (o) Error = O - True

  14. Sigmodial or logistic function

  15. Training and error reduction

  16. Training and error reduction

  17. Training and error reduction Size matters 

  18. Helix Bend Turn Secondary Structure Elements ß-strand

  19. Weights Input Layer I K H E Output Layer E E C H V I I Q A E Hidden Layer Window IKEEHVIIQAEFYLNPDQSGEF….. Neural Network Architecture

  20. Predictions and reliability of a prediction • Normally the best prediction is obtained by averaging • results from several predictions - “wisdom of the crowd • Two types of neural networks • Prediction of features in classes/bins e.g. H, E or C (1,0,0) • Values close to 1 or 0 are more accurate than values close to 1/2 • Prediction of real values e.g. Surface accessibility (0.43) • Reliability of a prediction is more difficult to estimate

  21. Eukaryotic SP & TM Signal peptide cleavage 1523 seq C-terminal end of TM-regions 669 seq

  22. Signal peptide prediction Signal pepdide likeness Cleavage site Combined information

  23. Propeptide prediction Many secretory proteins and peptides are synthesized as inactive precursors that in addition to signal peptide cleavage undergo post-translational processing to become biologically active polypeptides. Precursors are usually cleaved at sites composed of single or paired basic amino acid residues by members of the subtilisin/kexin-like proprotein convertase (PC) family. In mammals, seven members have been identified, with furin being the one first discovered and best characterized. Recently, the involvement of furin in diseases ranging from Alzheimer's disease and cancer to anthrax and Ebola fever has created additional focus on proprotein processing. We have developed a method for prediction of cleavage sites for PCs based on artificial neural networks. Two different types of neural networks have been constructed: a furin-specific network based on experimental results derived from the literature, and a general PC-specific network trained on data from the Swiss-Prot protein database. The method predicts cleavage sites in independent sequences with a sensitivity of 95% for the furin neural network and 62% for the general PC network. Protein Engineering, Design and Selection: 17: 107-112, 2004. General cleavage: R/K-Xn-R/K , n=0, 2, 4, 6 Furin cleavage: R-X-R/K-R

  24. Propeptide prediction Furin cleavage

More Related