
Protein Prediction II Exercise


Presentation Transcript


  1. Protein Prediction II Exercise

  2. Exercise – Project Layout • General remarks – recap: Report 60 pts, Exam 40 pts, weekly presentations by each group, one bad presentation allowed, groups of 3-4 students • Contact & Questions: pp2ex@rostlab.org only! • The exercise is taken from the CAFA competition • Prediction of HPO terms • HPO: Human Phenotype Ontology

  3. Terms – Definitions and Explanations • Amino acids (aa): Building blocks of proteins; 20 different aa are found in proteins • Protein sequence: String of characters representing a sequence of amino acids (a string over a 20-letter alphabet) • The protein sequence defines the protein structure and the protein function (within some limits) • Protein sequences are stored in large, publicly available repositories • One of the best-known repositories is UniProt (http://www.uniprot.org/) and its section Swiss-Prot • Besides the sequence, these databases also hold additional information about the protein
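As an illustration of "a string over a 20-letter alphabet", here is a minimal Python sketch that checks whether a sequence uses only the 20 standard one-letter amino acid codes. The function name and the example sequences are made up for this illustration and are not part of the provided exercise files. Real database entries may also contain additional letters (for example X for an unknown residue), which this sketch deliberately rejects.

```python
# The 20 standard amino acids in one-letter code.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def is_standard_protein_sequence(sequence: str) -> bool:
    """Return True if the sequence is non-empty and uses only the 20 standard letters."""
    return len(sequence) > 0 and set(sequence.upper()) <= STANDARD_AMINO_ACIDS


# Hypothetical example sequences, not taken from any database.
print(is_standard_protein_sequence("MKTAYIAKQR"))  # only standard letters
print(is_standard_protein_sequence("MKTXB"))       # contains non-standard letters
```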

  4. Ontology (in information science) • Ontology: An ontology represents knowledge as a set of concepts within a domain, using a shared vocabulary to denote types, properties and interrelationships of those concepts • Human Phenotype Ontology (HPO): Set of concepts describing human appearance (shape, health, and so forth) • HPO concepts are hierarchically ordered, i.e. there is an “is-a” relationship between them • They are arranged in a tree-like fashion
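The "is-a" hierarchy can be sketched in a few lines of Python. The toy ontology below uses invented term names ("root", "A", "A1", ...), not real HPO identifiers, and the helper names are hypothetical. It only shows the general idea that a term implies all of its ancestors; it assumes every parent term also appears as a key in the mapping.

```python
# Toy is-a hierarchy: each term maps to the list of its parent terms.
# Term names are invented for illustration; they are not HPO identifiers.
IS_A = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
}


def ancestors(term, is_a):
    """Return all terms reachable from `term` via is-a edges (excluding `term`)."""
    found = set()
    stack = list(is_a[term])
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(is_a[parent])
    return found


def propagate(terms, is_a):
    """Add every ancestor of every term, so the set is closed under is-a."""
    closed = set(terms)
    for term in terms:
        closed |= ancestors(term, is_a)
    return closed


print(sorted(propagate({"A1", "B"}, IS_A)))
```

Because the traversal keeps a set of already visited terms, the same sketch also works when a term has more than one parent.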

  5. Our competition • Proteins are annotated (described) with experimentally determined information • As time goes by: Proteins are associated with information about experimentally confirmed effects on the human phenotype • The associated terms are taken from the Human Phenotype Ontology • Experimental determination is slow and expensive • => We try to predict the associated HPO terms for proteins that are not yet annotated

  6. More formal steps • Find a function that assigns a set of HPO terms T to a sequence s so that the number of false assignments is minimal and the number of true assignments is maximal • Remember: The true evaluation is done after submission, when sequences that are not yet annotated receive experimentally determined annotations
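One common way to quantify "few false assignments, many true assignments" for a single sequence is to compare the predicted term set with the true term set using precision, recall and their harmonic mean (F1). The sketch below is an assumption about how such a comparison could look; it is not taken from the exercise material, and the function name and the term names are invented.

```python
def precision_recall_f1(predicted: set, true: set):
    """Compare a predicted term set with the true term set for one sequence."""
    true_positives = len(predicted & true)
    # Precision: fraction of predicted terms that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of true terms that were predicted.
    recall = true_positives / len(true) if true else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented term names: four terms predicted, two of them correct.
p, r, f1 = precision_recall_f1({"A", "B", "C", "D"}, {"A", "B"})
print(p, r, f1)
```

Predicting many terms tends to raise recall at the cost of precision, and predicting few terms does the opposite, which is why a combined score is useful.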

  7. Tasks • Download the files from www.rostlab.org/~richter/pp2_files.tgz • Get familiar with the provided files • Pay special attention to the column names (look them up at UniProt and HPO) • Read: http://biofunctionprediction.org/sites/default/files/IntroductionCAFA_pedja.pdf
