
Biological sequences and SO

Biological sequences and SO. Karen Eilbeck University of Utah Towards Interoperability of Biomedical Ontologies 27.03.2007 - 30.03.2007. SO categorizes the kinds of, parts of and properties of sequence. How SO is organized - from features and qualities to cross-products


Presentation Transcript


  1. Biological sequences and SO Karen Eilbeck University of Utah Towards Interoperability of Biomedical Ontologies 27.03.2007 - 30.03.2007

  2. SO categorizes the kinds of, parts of and properties of sequence. • How SO is organized - from features and qualities to cross-products • Let’s interoperate - questions that need to be answered.

  3. From nucleotide molecule to sequence

  4. What does sequence look like? • >3R:21066761,21072884 • tcaacgaaaactcggaggccatttacaaggagacggccaaagcaatcgaccgatcctttggcaaactttacctgggcgtcgtcaaaggtgtgttctccaaactgccgtatgccaagttttttgcggatgaatcgtgagttagctcttcaaagtgggcagagtccacataaagatactagatcatgttgtttgcgtactgacagatctaagttttgaggctagcaatcatcattaggtttaatggagttcgtgtttcgcgtttgaaagtgagaacacaagtaactactattaagccatctcagctaaataatctgtaagtgttgtgtggcaataaaagttacatatatgtagttagcacattgtaaattatttataagtgatacaaagaattctgtaaaataccataaaaacatttaaaactatgacccattattaattaagttacagtgagtggaaccctatagatcaggttgatccaaaagatgaaggaccgcctgaaagtatgtgttattcgcgcgcggagattccgaaaggcagggaatatctgtaactggaaaaggcagttacattaaaaaaaagcttgataaacaatctttgttgacttagccattaattagacgttgaaacgggaattaatgtgcgttttggggaaggccgatccaatttgcatatatcgagcaaattgcacccaaaacgcgattaggagcgattgaatgggacggggtcgatgtctggcttgggagttgggaacttgggagttcattaaagggaatcgtaaaatgaattcgccggctataaccagccactttgccatacagccagcctgccggtttcggtttataatccatttaactgactcaactgccaaacggtctaaagtcaaattctgtgcggctgaaacgcaaaagcggtttacggcaacaaaaacatgatacatttcaattgacgaaagtgactatataagtgttaacgcccgcggctaatggatcagtactcgattacgttcgccgccagcaattatggagctaactctcgccctcgtcctgctgttcggatgtgcgtccacctacggacacgcctccgatttccgtgagttgcacacaccccctttccgtaatatataatgtttatgtatatttcacaatcgccacgccccaatttacagcctcgggcatcgagaggtgcgccataatggacgagcagtgcctggaggacagggtgaacttcgtgctcaggaactacgccaaaagcggtatcaaggagctgggcttgatccccctcgatccgctgcacgtcaagaagttcaaaatcggacgcaatccgcacagtccggtcaacatcgatctcagcttccacgagatggacatcttgggtctgcatcagggagttgcgaagcgagtgaggtgagtggatcctcatttcattttatgatcgctctgcttactacatttttctgtttcggattttagcggattcacaagggatctcagccgctccatcgagctggtcatggaagttccagaaataggagtcagaggaccctactcggtggacggaagaatactcattctgcccatcaccggaaatggcattgctgacatacgcctcagtaagatttgcctcccacagctttgaaatctaaaatttttaatgtgtttctggaattcgcagctagaacaaaggtacgtgcacagatcaaattgaagcgcgtctccaagggcgatcatcaaacctacgccgaggtgatgaacataaaggttgagctggatccatcccatgtgacctaccagctggaaaatctgttcaacggccagaaagatctcagcgagaacatgcacgcgcttatcaatgagaactggaaggacatcttcaatgaactgaaaccgggcattggcgaggccttcggactgatagccaagtcggtggtggacaggatctt
tggcaaactgccgctcgaacagctctttgtagtctaagaccttagtacaaacaccctaattagtccaaacacaaatcgtaaatatttatttgactttcaaaatacaaatgcaaagcaaataagaaaaactggtaagttcctcatacaaataaaacgtagttgcaaaataaattcaggcactaaaggatttcttatttctaaagtttaagtaaaatacagatttataaaagtgaaaagcaaacacatttgtagttttgccaaataaatgtaaacacagttaaacttatataaatttgttatcaatctcaaaacaggggtaataaatcgttttcattttgattttgtttgtatcgatttgataaatatttttaaaaagcttatataagcttattcacgaatacaaatatggagtccgcactattggacaaatatatcttacactatagatatgtttactttacgaaattattgcttcccatgagaagagtagcttttttaaattgcatatttgctgtcattcttttatcgatgtgcacagcattagtttagcttctgaagcgaggtacacgtccggtgtgacgaggtggcgatgatggcacttcctcggtctcctcaaactcctcctccgacgctgacggctgctccatgctcactgcgccaagcttctgtggcccgcaaatggcgacaacactgggcgggaaataggtctctggattggcgatcagagcggagaggaactcatgctgctccagaacgcaatagatcggacctcccagtttgtatctctgggcgatgaacagatcctccgcactgattccacgcgcttcggccgcccttagctcggcaccggacaaaaggatggcccgctcgttcaacgacttgtggctttcggtcat
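The record on slide 4 looks like a FASTA-style entry whose header names a region. A minimal sketch of reading such a record follows. It assumes the header pattern `>id:start,end` and that the two numbers are start and end coordinates; the slide does not state this, and the names `Region`, `parse_header` and `read_record` are hypothetical.

```python
import re
from typing import NamedTuple, Tuple


class Region(NamedTuple):
    seq_id: str  # e.g. "3R"; read here as a sequence identifier (assumption)
    start: int   # first number in the header (assumed to be a start coordinate)
    end: int     # second number in the header (assumed to be an end coordinate)


# Assumed header layout: ">id:start,end"
HEADER = re.compile(r"^>(?P<seq_id>[^:\s]+):(?P<start>\d+),(?P<end>\d+)$")


def parse_header(line: str) -> Region:
    """Parse a header such as '>3R:21066761,21072884' into a Region."""
    m = HEADER.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised header: {line!r}")
    return Region(m["seq_id"], int(m["start"]), int(m["end"]))


def read_record(text: str) -> Tuple[Region, str]:
    """Return the parsed header and the sequence with line breaks removed."""
    lines = text.strip().splitlines()
    region = parse_header(lines[0])
    sequence = "".join(part.strip() for part in lines[1:])
    return region, sequence
```

Calling `parse_header(">3R:21066761,21072884")` yields a `Region` with the identifier and the two integers, which is the minimum needed before any coordinate-based question can be asked of the sequence.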

  5. What does the sequence mean? • cgtaaaactttggccaggcgctctccggtctggtctggtctggtctgttcgtactgctcc gctctctttttccctcaaatgggccaaaaggaggcgacgtcgctgccgcggtcgcagcgc tgccgctgccgcagctaccgccgctgcagacgtcgcttacctgccgaagaagaagagcag cgttcAGTCGCGCAGCGCACGTCGTCCAACGCACACACGCTCAGAGACACACCGACACGC ACACAGATACAGATACGTTGAGTCGCCGCCGCCGCGAAAGATACCAGATACTATCTGCCA GATACGAAGAGTTGGGCCCTATAGTCGTCCCGCTTGCACCCATGGCCGCCTGAGTgtgag tgcaagagcggattggattgagtggaatacgaacgcgattccattccggtccacatccga acccacatccgaatcctatccgaagccacctaacccttgccgaccagcgcttaacccatg tcttcgtctttgtctcgtttcagAGTTGCAAGCGACCATGCGCGCATGGCTTCTACTCCT CGCAGTGCTGGCGACTTTTCAAACGATTGTTCGAGTTGCTAGCACCGAGGATATATCCCA GAGATTCATCGCCGCCATAGCGCCCGTTGCCGCTCATATTCCGCTGGCATCAGCATCAGG ATCAGGATCAGGACGATCTGGATCTAGATCGGTAGGAGCCTCGACCAGCACAGCATTAGC AAAAGCATTTAATCCATTCAGCGAGCCCGCCTCGTTCAGTGATAGTGATAAAAGCCATCG GAGTAAAACAAACAAAAAACCTAGCAAAAGTGACGCGAACCGACAGTTCAACGAAGTGCA TAAGCCAAGAACAGACCAATTAGAAAATTCCAAAAATAAGTCTAAACAATTAGTTAATAA ACCCAACCACAACAAAATGGCTGTCAAGGAGCAGAGGAGCCACCACAAGAAGAGCCACCA CCATCGCAGCCACCAGCCAAAGCAGGCCAGTGCATCCACAGAATCTCATCAATCCTCGTC GATTGAATCAATCTTCGTGGAGGAGCCGACGCTGGTGCTCGACCGCGAGGTGGCCTCCAT CAACGTGCCCGCCAACGCCAAGGCCATCATCGCCGAGCAGGGCCCGTCCACCTACAGCAA GGAGGCGCTCATCAAGGACAAGCTGAAGCCAGACCCCTCCACTCTAGTCGAGATCGAGAA GAGCCTGCTCTCGCTGTTCAACATGAAGCGGCCGCCCAAGATCGACCGCTCCAAGATCAT CATCCCCGAGCCGATGAAGAAGCTCTACGCCGAGATCATGGGCCACGAGCTCGACTCGGT CAACATCCCCAAGCCGGGTCTGCTGACCAAGTCGGCCAACACAGTGCGAAGTTTTACACA CAAAGgtgagtctccttttcaaatgtttaaaaccagaactagaaaaccggaagcggatat agaaaaactttgcattctaatggtattacttttaatacagcgagtatgattccttttgga • Labels on the slide: 5’ UTR (1st exon); intron; 5’ UTR (part of 2nd exon); start codon; coding part of 2nd exon; intron
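The sequence on slide 5 mixes upper and lower case, and the slide labels name exons and introns. A minimal sketch of recovering those blocks programmatically follows. It assumes that upper-case runs mark exonic sequence and lower-case runs mark intronic or flanking sequence; this is a common display convention, not something the slide states. The function name `split_by_case` is hypothetical.

```python
from itertools import groupby


def split_by_case(sequence):
    """Split a sequence into runs of constant letter case.

    Returns a list of (is_upper, start, end, text) tuples with 0-based,
    half-open coordinates. Whitespace in the input is ignored.
    """
    bases = "".join(sequence.split())
    runs = []
    pos = 0
    for is_upper, group in groupby(bases, key=str.isupper):
        text = "".join(group)
        runs.append((is_upper, pos, pos + len(text), text))
        pos += len(text)
    return runs
```

Under the stated assumption, each upper-case run is a candidate exon and each lower-case run between two upper-case runs is a candidate intron, so the runs are features with coordinates of the kind SO is meant to type.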

  6. We can make pictures to help us understand the sequence. • 5 alternate transcripts of the gene decapentaplegic (dpp)

  7. SO categorizes the kinds of, parts of and properties of sequence. • How SO is organized - from features and qualities to cross-products • Let’s interoperate - questions that need to be answered.

  8. Structure of SO • Sequence features • 662 terms • have coordinates • Examples: exon, 5’ UTR, promoter

  9. Structure of SO part 2 • Sequence attributes • 358 terms • describe sequence features • Examples: imprinted, trans-spliced, fragment • Consequence of mutation - describes mutations such as SNPs. • Example: mutation_causes_exon_loss • (SNP is_a sequence_variant; synonym = mutation) • Chromosome variation - describes weird chromosomes • Example: interchromosomal_transposition

  10. Cross product terms • 156 terms • A new SO term can be composed from a feature and an attribute • What makes a silenced_gene a special kind of gene is that it has the quality ‘silenced’. • Diagram: gene (genus) + silenced (differentia) = silenced_gene
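The genus and differentia composition on slide 10 can be sketched as a small data structure. This is an illustration only: the class `CrossProduct` is hypothetical, and the `differentia_genus` naming pattern is inferred from the single `silenced_gene` example rather than taken from any SO rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrossProduct:
    genus: str        # the feature term, e.g. "gene"
    differentia: str  # the attribute (quality) term, e.g. "silenced"

    @property
    def name(self) -> str:
        # Naming pattern assumed from the slide's one example, silenced_gene.
        return f"{self.differentia}_{self.genus}"

    def definition(self) -> str:
        # Genus-differentia definition: a <genus> that has the quality <differentia>.
        return f"A {self.genus} that has the quality '{self.differentia}'."
```

For example, `CrossProduct("gene", "silenced").name` gives `silenced_gene`, and `definition()` spells out why it is a special kind of gene.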

  11. SO categorizes the kinds of, parts of and properties of sequence. • How SO is organized - from features and qualities to cross-products • Let’s interoperate - questions that need to be answered.

  12. Lots of potential to interoperate with SO • GO: ATP binding; eye pigment precursor transporter activity; permease activity • PATO (phenotype qualities): [Term] id: PATO:0000952 name: brown is_a: PATO:0000014 ! color • GO: Can GO annotators use SO terms to annotate cellular locations? • SO: annotation of the scarlet gene • RNA Ontology • MGED Ontology • Protein Ontology
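The PATO entry quoted on slide 12 is a stanza in OBO flat-file style. A minimal sketch of reading one such stanza follows. It handles only plain `tag: value` lines and trailing `! comment` text, ignoring escapes, quoted strings and trailing modifiers. The line breaks in `STANZA` are restored by assumption, since the transcript flattened them, and `parse_stanza` is a hypothetical helper.

```python
def parse_stanza(text):
    """Parse one OBO-style stanza into (stanza_type, {tag: [values]})."""
    stanza_type = None
    tags = {}
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            stanza_type = line[1:-1]  # e.g. "Term"
            continue
        tag, _, value = line.partition(":")
        # Drop the trailing "! comment" used for human-readable names.
        value = value.split(" ! ", 1)[0].strip()
        tags.setdefault(tag.strip(), []).append(value)
    return stanza_type, tags


# The stanza from the slide, with line breaks restored by assumption.
STANZA = """
[Term]
id: PATO:0000952
name: brown
is_a: PATO:0000014 ! color
"""
```

Because identifiers such as `PATO:0000952` carry their ontology prefix, a parsed `is_a` value shows which ontology a parent term belongs to, which is where cross-ontology links would be read from.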

  13. Question from the GO group. • Should GO annotators locate gene products to SO terms? An annotator wanted to further specify a protein with DNA binding function. • Examples: promoter, intergenic_region. • GOC decided not to use SO terms directly in the GO annotations, but to allow them to be used as “contextual information”.

  14. Aim: Work out how SO fits into the grand scheme of things… • Diagram labels: entity; continuant; occurrent (exists in 4 dimensions); independent entity (objects, aggregates, fiat parts, site, boundary); dependent entity (role, function, quality)

  15. Questions we need to be able to answer • Generally, • What kind of thing is an instance of SO? • Specifically, • What is a gene? • What is a genotype? • What is an allele?

  16. Things people have said about sequence • Sequence is a molecular thing • Sequence is a mathematical thing • Sequence is abstract

  17. Is an SO sequence a molecule? • The intron sequence has relationships that relate it to other sequences. It is part of a gene, and adjacent to exon sequences. • GATACGAAGAGTTGGGCCCTAGTCGTCCCGCTTGCACCATGCCGCCTGAGTgtgagtgcaagagcggattggattgatggaatacgaacgcgattccattccggtccacatccgaacccacatccgaatcctatccgaagccacctaacccttgccgaccagcgcttaacccatgtcttcgtctttgtctcgtttcagAGTTGCAAGCGACCATGCGCGCATGGCTTCTACTCCT • The intron molecule is not related to other sequences. It has three-dimensional structure. The intron molecule has sequence.

  18. Retroviral gag, pol and env genes are encoded in both RNA and DNA • The retrovirus genome exists as RNA. • It integrates into the host DNA via reverse transcriptase. • The host now contains gag, pol and env.

  19. Is biological sequence mathematical? • It is sequential. • Therefore we can do coordinate-based calculations such as ‘Are the coordinates of this exon located within the coordinates of the transcript?’
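The containment question on slide 19 reduces to comparing intervals. A minimal sketch follows, assuming 1-based inclusive coordinates (the convention GFF3 uses) and a hypothetical `Feature` class; the slides do not specify either.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    so_type: str  # an SO feature name such as "exon" or "transcript"
    seq_id: str   # the sequence the coordinates refer to
    start: int    # 1-based, inclusive (assumed convention)
    end: int      # 1-based, inclusive (assumed convention)

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError("start must not exceed end")

    def is_within(self, other: "Feature") -> bool:
        """True if this feature lies entirely inside `other` on the same sequence."""
        return (
            self.seq_id == other.seq_id
            and other.start <= self.start
            and self.end <= other.end
        )
```

With made-up coordinates, `Feature("exon", "3R", 150, 300).is_within(Feature("transcript", "3R", 100, 900))` is true, while an exon that runs past position 900 or sits on another sequence is not contained.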

  20. Is sequence abstract? • Does sequence exist or is it a quality that is dependent on another substrate? • An exon can be located on a genomic sequence, and an mRNA sequence. • SO is used in a representational way. People annotate where the interesting things are located on a genome.
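The observation that one exon can be located on both a genomic sequence and an mRNA sequence can be made concrete with a coordinate mapping. The sketch below is deliberately simplified: it assumes forward-strand, non-overlapping exons with 1-based inclusive coordinates, and an mRNA that is just the exons joined in order. The function name `exon_coords_on_mrna` is hypothetical.

```python
def exon_coords_on_mrna(exons):
    """Map genomic exon intervals to their positions on the spliced mRNA.

    `exons` is a list of (start, end) genomic coordinates, 1-based inclusive,
    on the forward strand and non-overlapping (assumptions of this sketch).
    Returns a list of (start, end) mRNA coordinates in genomic order.
    """
    mapped = []
    offset = 0  # number of mRNA bases already contributed by earlier exons
    for start, end in sorted(exons):
        length = end - start + 1
        mapped.append((offset + 1, offset + length))
        offset += length
    return mapped
```

With made-up coordinates, exons at genomic positions 101 to 200 and 301 to 350 map to mRNA positions 1 to 100 and 101 to 150: the same exons, located on two different sequences.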

  21. cgtaaaactttggccaggcgctctccggtctggtctggtctggtctgttcgtactgctcc gctctctttttccctcaaatgggccaaaaggaggcgacgtcgctgccgcggtcgcagcgc tgccgctgccgcagctaccgccgctgcagacgtcgcttacctgccgaagaagaagagcag cgttcAGTCGCGCAGCGCACGTCGTCCAACGCACACACGCTCAGAGACACACCGACACGC ACACAGATACAGATACGTTGAGTCGCCGCCGCCGCGAAAGATACCAGATACTATCTGCCA GATACGAAGAGTTGGGCCCTATAGTCGTCCCGCTTGCACCCATGGCCGCCTGAGTgtgag tgcaagagcggattggattgagtggaatacgaacgcgattccattccggtccacatccga acccacatccgaatcctatccgaagccacctaacccttgccgaccagcgcttaacccatg tcttcgtctttgtctcgtttcagAGTTGCAAGCGACCATGCGCGCATGGCTTCTACTCCT CGCAGTGCTGGCGACTTTTCAAACGATTGTTCGAGTTGCTAGCACCGAGGATATATCCCA GAGATTCATCGCCGCCATAGCGCCCGTTGCCGCTCATATTCCGCTGGCATCAGCATCAGG ATCAGGATCAGGACGATCTGGATCTAGATCGGTAGGAGCCTCGACCAGCACAGCATTAGC AAAAGCATTTAATCCATTCAGCGAGCCCGCCTCGTTCAGTGATAGTGATAAAAGCCATCG GAGTAAAACAAACAAAAAACCTAGCAAAAGTGACGCGAACCGACAGTTCAACGAAGTGCA TAAGCCAAGAACAGACCAATTAGAAAATTCCAAAAATAAGTCTAAACAATTAGTTAATAA ACCCAACCACAACAAAATGGCTGTCAAGGAGCAGAGGAGCCACCACAAGAAGAGCCACCA CCATCGCAGCCACCAGCCAAAGCAGGCCAGTGCATCCACAGAATCTCATCAATCCTCGTC GATTGAATCAATCTTCGTGGAGGAGCCGACGCTGGTGCTCGACCGCGAGGTGGCCTCCAT CAACGTGCCCGCCAACGCCAAGGCCATCATCGCCGAGCAGGGCCCGTCCACCTACAGCAA GGAGGCGCTCATCAAGGACAAGCTGAAGCCAGACCCCTCCACTCTAGTCGAGATCGAGAA GAGCCTGCTCTCGCTGTTCAACATGAAGCGGCCGCCCAAGATCGACCGCTCCAAGATCAT CATCCCCGAGCCGATGAAGAAGCTCTACGCCGAGATCATGGGCCACGAGCTCGACTCGGT CAACATCCCCAAGCCGGGTCTGCTGACCAAGTCGGCCAACACAGTGCGAAGTTTTACACA CAAAGgtgagtctccttttcaaatgtttaaaaccagaactagaaaaccggaagcggatat agaaaaactttgcattctaatggtattacttttaatacagcgagtatgattccttttgga

  22. Acknowledgements • SO is funded by the NIH via the Gene Ontology Consortium grant. • Suzi Lewis, Michael Ashburner, Chris Mungall, John Richter, Judy Blake. (GOC)
