
Protein structure prediction

Protein structure prediction. Einat Granot, Liron Atedgi. Protein folding is determined by the amino-acid sequence. Knowing the fold is important for determining a protein's function, finding distant evolutionary relationships, and designing drugs. Protein structures: primary, secondary and tertiary.


Presentation Transcript


  1. Protein structure prediction • Einat Granot, Liron Atedgi

  2. Protein folding • Protein folding is determined by the amino-acid (AA) sequence • Why is knowing the fold important? • To determine the protein's function • To find distant evolutionary relationships • To design drugs

  3. Protein structures • Primary structure • Secondary structure • Tertiary structure

  4. Two prediction methods • PSI-PRED: secondary structure prediction based on PSI-BLAST • GenTHREADER: tertiary structure prediction • Both were developed by the group of David T. Jones, University of Warwick

  5. General format of the methods • Sequence alignment + additional data → neural networks → structure prediction

  6. Neural networks

  7. Neural networks • Numerical inputs → units → output • Why do we call it a neural network? • Every unit performs a weighted calculation
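
The "weighted calculation" performed by a single unit can be sketched as follows. This is a minimal illustration, not the presenters' code: the slide does not name the activation function, so the logistic function used here is an assumption, and `unit_output` is a hypothetical name.

```python
import math


def unit_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, squashed into (0, 1).

    The logistic activation is an assumption; the slide only says that
    every unit performs a weighted calculation.
    """
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))


# Two numerical inputs feeding one unit.
print(unit_output([1.0, 2.0], [0.5, -0.25], 0.0))
```

A network is built by feeding the outputs of such units into further units.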

  8. Neural network hidden layers • As more hidden layers are added, the mean squared error decreases

  9. Neural network training • The network connections and weights are determined by a training process • Training is performed with samples of input and expected output • The learning algorithm is called backpropagation
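
A toy version of training by backpropagation might look like the sketch below. Everything in it is an illustrative assumption rather than the setup used for PSI-PRED or GenTHREADER: a single hidden layer, logistic units, a squared-error loss, the XOR samples, and the names `forward`, `train` and `mean_squared_error`.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    # Each weight row ends with a bias weight, fed by a constant 1.0 input.
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    output = sigmoid(sum(w * v for w, v in zip(w_out, hidden + [1.0])))
    return hidden, output


def mean_squared_error(samples, w_hidden, w_out):
    return sum((forward(x, w_hidden, w_out)[1] - t) ** 2 for x, t in samples) / len(samples)


def train(samples, n_hidden=3, epochs=2000, rate=0.5, seed=0):
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, target in samples:
            hidden, output = forward(x, w_hidden, w_out)
            # Error signal at the output unit (derivative of the squared error).
            delta_out = (output - target) * output * (1.0 - output)
            # Propagate the error back to the hidden units before updating w_out.
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(n_hidden)]
            for j, v in enumerate(hidden + [1.0]):
                w_out[j] -= rate * delta_out * v
            for j in range(n_hidden):
                for i, v in enumerate(x + [1.0]):
                    w_hidden[j][i] -= rate * delta_hidden[j] * v
    return w_hidden, w_out


# Samples of input and expected output (XOR, chosen only as a small example).
xor = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
weights = train(xor)
print(mean_squared_error(xor, *weights))
```

The error measured at the output is pushed backwards through the weights, which is where the name of the algorithm comes from.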

  10. Network training & testing • After training we perform testing • The training and testing groups must be chosen very carefully • What problems can arise? • Insufficient training or testing • The testing group may be biased

  11. A neural network is a “black box” • The specific algorithm that a working neural network implements is not known • It is hard to deduce new biological principles about the solved problem

  12. PSI-PRED • Secondary structure prediction

  13. Secondary structure prediction • DSSP defines 8 secondary-structure categories • PSI-PRED merges them into 3: Strand (E), Helix (H) and Coil (C)
      AA:   RLMPHIKRSAIPVNHGQCRWEDNVDERTNCMIQYVLIMRD
      Pred: CCCCCHHHCCCCCCEEEEEECCCCCCHHHHEEEEEECCCC
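
The reduction from 8 DSSP categories to 3 states can be sketched as a lookup table. The slide does not say which DSSP codes go where, so the mapping below (H and G to helix, E and B to strand, everything else to coil) is an assumption based on one commonly used reduction, and `reduce_dssp` is a hypothetical name.

```python
# Assumed reduction: H, G -> helix (H); E, B -> strand (E); all others -> coil (C).
DSSP_TO_THREE = {"H": "H", "G": "H", "E": "E", "B": "E"}


def reduce_dssp(dssp_string):
    """Map a string of 8-state DSSP codes to the 3 states H, E and C."""
    return "".join(DSSP_TO_THREE.get(code, "C") for code in dssp_string)


print(reduce_dssp("HHGGEEBTS-I"))
```

Other reductions exist (for example, treating isolated bridges as coil), and the choice affects reported accuracy figures.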

  14. PSI-PRED pipeline: sequence alignment (find homologs) → create protein profile → first neural network → second neural network → final prediction

  15. Sequence alignment • Finding homologs of the target protein using PSI-BLAST • Reminder: what is PSI-BLAST? Position-Specific Iterated BLAST, which outputs a PSSM (position-specific scoring matrix)

  16. PSI-BLAST pros & cons • Pros: • Sensitive to distant homologs • Reliable • Accessible from every workstation • Cons: • Sensitive to distant homologs, so the result might be biased • Sensitive to repetitive sequences

  17. Solving PSI-BLAST problems • A special database of 340,000 sequences was constructed for PSI-PRED • This database contains only unique, non-repetitive sequences

  18. PSI-PRED pipeline: sequence alignment (find homologs) → create protein profile → first neural network → second neural network → final prediction

  19. Create protein profile • PSI-PRED uses the PSSM produced by PSI-BLAST after 3 iterations • The matrix is transformed with the logistic function f(x) = 1 / (1 + e^(-x)), so the final values lie between 0 and 1
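
The scaling of PSSM scores into the 0 to 1 range can be sketched as below. The sketch assumes the transformation is the standard logistic function; `scale_pssm` is a hypothetical name and the scores are made-up example values, not real PSI-BLAST output.

```python
import math


def scale_pssm(pssm):
    """Apply the logistic function to every score of a PSSM (a list of rows)."""
    return [[1.0 / (1.0 + math.exp(-score)) for score in row] for row in pssm]


# One made-up row of three scores: strongly disfavoured, neutral, strongly favoured.
print(scale_pssm([[-5, 0, 5]]))
```

A score of 0 maps to 0.5, and large positive or negative scores saturate towards 1 or 0.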

  20. PSSM: the output of PSI-BLAST, followed by the transformation

  21. Create protein profile • The matrix size is M x 20, where M is the sequence length • An additional column is added to mark the N/C terminus, giving an M x 21 matrix

  22. PSI-PRED pipeline: sequence alignment (find homologs) → create protein profile → first neural network → second neural network → final prediction

  23. Network training & testing • 187 proteins were selected according to CATH and PSI-BLAST • CATH classifies proteins by the fold of their domains; the filtering was done at the topology (T) level • This is considered a strict selection
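
One way to keep training and testing sets apart at the CATH T-level is to assign whole topology groups to one set or the other. The sketch below is only an illustration of that idea: the slide does not describe the actual selection procedure, and the function names, the `test_every` parameter and the example codes are hypothetical.

```python
from collections import defaultdict


def topology(cath_code):
    """Return the Class.Architecture.Topology prefix of a C.A.T.H code."""
    return ".".join(cath_code.split(".")[:3])


def split_by_topology(proteins, test_every=3):
    """Split a {name: CATH code} mapping so that no topology is in both sets."""
    groups = defaultdict(list)
    for name, code in proteins.items():
        groups[topology(code)].append(name)
    train, test = [], []
    for index, key in enumerate(sorted(groups)):
        # Every `test_every`-th topology group goes to the test set as a whole.
        (test if index % test_every == 0 else train).extend(sorted(groups[key]))
    return train, test


example = {"a": "1.10.8.10", "b": "1.10.8.20", "c": "3.40.50.720", "d": "2.60.40.10"}
print(split_by_topology(example))
```

Splitting by group rather than by individual protein prevents close structural relatives of a training protein from appearing in the test set.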

  24. First neural network • Each time, a window of 15 amino acids is fed into the first network • The output is a 15 x 3 matrix
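
Cutting a profile into 15-residue windows, with the extra terminus column, can be sketched as follows. The padding convention is an assumption: window positions that fall off either end of the sequence are filled with zeros and have the terminus flag set to 1. The function name `windows` is hypothetical.

```python
WINDOW = 15       # residues per window, as on the slide
N_COLUMNS = 20    # profile columns per residue, before the terminus flag


def windows(profile, window=WINDOW):
    """Return one flattened window (window x 21 values) per residue.

    `profile` is an M x 20 list of rows; each window is centred on one residue.
    """
    half = window // 2
    padding_row = [0.0] * N_COLUMNS + [1.0]  # off-the-end position, flag set
    result = []
    for centre in range(len(profile)):
        flat = []
        for pos in range(centre - half, centre + half + 1):
            if 0 <= pos < len(profile):
                flat.extend(list(profile[pos]) + [0.0])  # real residue, flag clear
            else:
                flat.extend(padding_row)
        result.append(flat)
    return result


toy_profile = [[0.5] * N_COLUMNS for _ in range(4)]
print(len(windows(toy_profile)), len(windows(toy_profile)[0]))
```

Each window therefore carries 15 x 21 = 315 input values.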

  25. PSI-PRED pipeline: sequence alignment (find homologs) → create protein profile → first neural network → second neural network → final prediction

  26. Second neural network • The input to the 2nd network is the output of the 1st one • Again, another column is added to indicate the N/C terminus

  27. Why do we need a second network? • Let’s examine a possible prediction from the 1st network. What is the problem with this prediction?
      Seq:  VLFLNDNLDDVVIGRPKRTYTAITL
      Pred: EEEECCCCHHHCCCHCCCEEEECC
      • A helix consisting of a single amino acid does not exist • The 2nd network maintains coherence between adjacent amino acids and improves the accuracy
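
The problem in the example prediction can be located with a simple run-length check. This is not what the second network does (it learns the correction from data); it is only a hand-written sketch that flags helix runs too short to exist, with hypothetical function names and a hypothetical `min_length` threshold.

```python
from itertools import groupby


def runs(prediction):
    """Return (state, start, length) for each maximal run of identical states."""
    out, start = [], 0
    for state, group in groupby(prediction):
        length = len(list(group))
        out.append((state, start, length))
        start += length
    return out


def isolated_helices(prediction, min_length=2):
    """Return (start, length) of helix runs shorter than `min_length`."""
    return [(start, length) for state, start, length in runs(prediction)
            if state == "H" and length < min_length]


print(isolated_helices("EEEECCCCHHHCCCHCCCEEEECC"))
```

Positions are zero-based, so the flagged run is the lone H between the two coil segments.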

  28. PSI-PRED pipeline: sequence alignment (find homologs) → create protein profile → first neural network → second neural network → final prediction

  29. Final prediction • The prediction image shows the degree of confidence, the target sequence and the predicted secondary structure

  30. PSI-PRED evaluation • CASP: Critical Assessment of techniques for protein Structure Prediction • At CASP3, PSI-PRED achieved the best results of all participating methods

  31. PSI-PRED evaluation • Q3 score: the percentage of amino acids predicted correctly • Average Q3: PSI-PRED 76.3%, JPRED 72.4%, DSC 67.3%
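
The Q3 score defined on the slide can be computed directly from a predicted and an observed 3-state string. A minimal sketch, with a hypothetical function name and made-up example strings:

```python
def q3(predicted, observed):
    """Percentage of residues whose 3-state label (H, E or C) is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Made-up strings: five of the six residues agree.
print(q3("CCHHEE", "CCHHEC"))
```

The averages on the slide are means of this per-protein score over a test set.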

  32. Reasons for success • The use of PSI-BLAST: more sensitive (iterative algorithm) and more accurate (pairwise local alignments) • The use of neural networks • Strict selection of the training and testing sets

  33. Possible improvements • Larger databases (for training and alignment) • Combination with other methods (JPRED) • Predicting more than 3 secondary-structure classes

  34. Bring out the food…

  35. GenTHREADER • Tertiary structure prediction

  36. Threading methods • Try to thread a target amino-acid sequence onto a template 3D structure (example sequence shown on the slide: M Q S N I L D V R E R A Q T V L C N K)

  37. Templates collection • The target sequence is compared against a collection of sequences with known folds • The collection was taken from the Brookhaven Protein Data Bank and includes unique sequences

  38. GenTHREADER pipeline: sequence alignment → calculate threading potential → neural network → final prediction

  39. Sequence alignment • The target sequence is aligned against each template twice: target profile against template sequence, and target sequence against template profile • The best result is taken

  40. Creating a profile • Steps for creating a profile: • Alignment against the OWL database (a database of coding sequences) • Selection of sequences with an E-value lower than 0.01 • Construction of a profile using BLOSUM50

  41. Creating a profile (example residues shown on the slide: A L M P H I K R S A I P V N H G Y V I M Q C R W E D N S T K V)

  42. GenTHREADER pipeline: sequence alignment → calculate threading potential → neural network → final prediction

  43. Calculate threading potential • The threading potential includes: • Pairwise potential • Solvation potential

  44. Pairwise potential • The potential for interaction between two amino acids • Based on analysis of known structures and favorable energy configurations • A lower pairwise potential indicates a more favorable state

  45. Solvation potential • Calculated per amino acid and proportional to its degree of burial • Degree of burial (DOB): the number of other amino acids located within a radius of 10 Å • Hydrophobic amino acids: a high DOB is preferred • Hydrophilic amino acids: a low DOB is preferred
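
The degree of burial as defined on the slide can be counted from residue coordinates. In this sketch each residue is represented by a single (x, y, z) point in Å; which atom that point stands for (for example C-alpha or C-beta) is not stated on the slide and is left as an assumption, and the function name and coordinates are made up.

```python
import math


def degree_of_burial(coords, radius=10.0):
    """For each residue, count the other residues within `radius` Å of it."""
    return [
        sum(1 for j, other in enumerate(coords)
            if j != i and math.dist(point, other) <= radius)
        for i, point in enumerate(coords)
    ]


# Three residues close together and one far away (made-up coordinates).
print(degree_of_burial([(0, 0, 0), (3, 0, 0), (6, 0, 0), (30, 0, 0)]))
```

This brute-force count compares every pair of residues, which is adequate for single proteins; a spatial index would be the usual choice for scanning a large template library.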

  46. GenTHREADER pipeline: sequence alignment → calculate threading potential → neural network → final prediction

  47. Insert to neural network • The prediction is very complex, so a neural network is used

  48. Neural network • Again, the 6 input parameters are converted to values between 0 and 1 using the logistic function f(x) = 1 / (1 + e^(-x)) • The output is a value between 0 and 1 that expresses the confidence of the match

  49. Network training & testing • The network was trained using pairs of proteins with known folding patterns • Again, the training and testing sets were kept separate to avoid bias

  50. GenTHREADER pipeline: sequence alignment → calculate threading potential → neural network → final prediction
